September 2, 2022

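How to stream completions

By default, when you request a completion from OpenAI, the entire completion is generated before being sent back in a single response.

If you are generating long completions, waiting for the response can take many seconds.

To get responses sooner, you can "stream" the completion as it is being generated. This lets you start printing or processing the beginning of the completion before the full completion is finished.

To stream completions, set stream=True when calling the chat completions or completions endpoints. This returns an object that streams back the response as data-only server-sent events. Extract chunks from the delta field rather than the message field.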
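To make "data-only server-sent events" concrete, here is a minimal sketch of what decoding such a stream could look like. The official Python client does this parsing for you, so this is illustration only. The sample lines are hand-written, not captured from a real response, and the `data: [DONE]` end-of-stream sentinel is an assumption about the wire format rather than something shown in this notebook. The name `parse_sse_data_lines` is hypothetical.

```python
import json

def parse_sse_data_lines(raw_lines):
    """Yield decoded JSON payloads from data-only server-sent events."""
    for line in raw_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        yield json.loads(payload)

# Hand-written sample lines (hypothetical, for illustration only)
sample_lines = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}}]}',
    '',
    'data: {"choices": [{"index": 0, "delta": {"content": "Two"}}]}',
    '',
    'data: {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}',
    '',
    'data: [DONE]',
]

events = list(parse_sse_data_lines(sample_lines))
# Text lives under "delta", not "message"; missing content counts as empty
sse_text = "".join(e["choices"][0]["delta"].get("content") or "" for e in events)
print(sse_text)
```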

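Downsides

Note that using stream=True in a production application makes it more difficult to moderate the content of the completions, because partial completions may be harder to evaluate. This may have implications for approved usage.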
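One possible way to soften this downside is to buffer streamed fragments and review them in sentence-sized pieces instead of token by token. The sketch below is an illustration under stated assumptions, not a recommended moderation method: `sentence_batches` and `looks_acceptable` are hypothetical names, the sentence splitting is deliberately naive, and the check is a placeholder for whatever review step your application actually uses.

```python
import re

def sentence_batches(deltas):
    """Group streamed text fragments into sentence-sized pieces."""
    buffer = ""
    for delta in deltas:
        if not delta:  # skip None and empty fragments
            continue
        buffer += delta
        # Emit each sentence that is followed by whitespace (naive boundary rule)
        while True:
            match = re.search(r"[.!?]\s", buffer)
            if match is None:
                break
            yield buffer[:match.end()].strip()
            buffer = buffer[match.end():]
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains when the stream ends

def looks_acceptable(text):
    # Hypothetical placeholder; substitute your own review logic
    return "forbidden" not in text.lower()

# Fragments shaped like the delta contents of a stream (hand-written sample)
fragments = ["", "Hello", " there", ".", " This", " is", " fine", ".", None]
reviewed = [(s, looks_acceptable(s)) for s in sentence_batches(fragments)]
print(reviewed)
```

The trade-off is latency: text is held back until a sentence boundary is seen, so the user sees output slightly later than with raw token-by-token display.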

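Example code

Below, this notebook shows:

  1. What a typical chat completion response looks like
  2. What a streaming chat completion response looks like
  3. How much time is saved by streaming a chat completion
  4. How to get token usage data for a streamed chat completion response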
# !pip install openai
# imports
import time  # for measuring time duration of API calls
from openai import OpenAI
import os
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))
# Example of an OpenAI ChatCompletion request
# https://platform.openai.com/docs/guides/text-generation/chat-completions-api

# record the time before the request is sent
start_time = time.time()

# send a ChatCompletion request to count to 100
response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=[
        {'role': 'user', 'content': 'Count to 100, with a comma between each number and no newlines. E.g., 1, 2, 3, ...'}
    ],
    temperature=0,
)
# calculate the time it took to receive the response
response_time = time.time() - start_time

# print the time delay and text received
print(f"Full response received {response_time:.2f} seconds after request")
print(f"Full response received:\n{response}")
Full response received 1.88 seconds after request
Full response received:
ChatCompletion(id='chatcmpl-9lMgdoiMfxVHPDNVCtvXuTWcQ2GGb', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100', role='assistant', function_call=None, tool_calls=None))], created=1721075651, model='gpt-july-test', object='chat.completion', system_fingerprint='fp_e9b8ed65d2', usage=CompletionUsage(completion_tokens=298, prompt_tokens=36, total_tokens=334))

The reply can be extracted with response.choices[0].message.

The content of the reply can be extracted with response.choices[0].message.content.

reply = response.choices[0].message
print(f"Extracted reply: \n{reply}")

reply_content = response.choices[0].message.content
print(f"Extracted content: \n{reply_content}")
Extracted reply: 
ChatCompletionMessage(content='1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100', role='assistant', function_call=None, tool_calls=None)
Extracted content: 
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100

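2. How to stream a chat completion

With a streaming API call, the response is sent back incrementally in chunks via an event stream. In Python, you can iterate over these events with a for loop.

Let's see what it looks like: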

# Example of an OpenAI ChatCompletion request with stream=True
# https://platform.openai.com/docs/api-reference/streaming#chat/create-stream

# a ChatCompletion request
response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=[
        {'role': 'user', 'content': "What's 1+1? Answer in one word."}
    ],
    temperature=0,
    stream=True  # this time, we set stream=True
)

for chunk in response:
    print(chunk)
    print(chunk.choices[0].delta.content)
    print("****************")
ChatCompletionChunk(id='chatcmpl-9lMgfRSWPHcw51s6wxKT1YEO2CKpd', choices=[Choice(delta=ChoiceDelta(content='', function_call=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1721075653, model='gpt-july-test', object='chat.completion.chunk', system_fingerprint='fp_e9b8ed65d2', usage=None)

****************
ChatCompletionChunk(id='chatcmpl-9lMgfRSWPHcw51s6wxKT1YEO2CKpd', choices=[Choice(delta=ChoiceDelta(content='Two', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1721075653, model='gpt-july-test', object='chat.completion.chunk', system_fingerprint='fp_e9b8ed65d2', usage=None)
Two
****************
ChatCompletionChunk(id='chatcmpl-9lMgfRSWPHcw51s6wxKT1YEO2CKpd', choices=[Choice(delta=ChoiceDelta(content='.', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1721075653, model='gpt-july-test', object='chat.completion.chunk', system_fingerprint='fp_e9b8ed65d2', usage=None)
.
****************
ChatCompletionChunk(id='chatcmpl-9lMgfRSWPHcw51s6wxKT1YEO2CKpd', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1721075653, model='gpt-july-test', object='chat.completion.chunk', system_fingerprint='fp_e9b8ed65d2', usage=None)
None
****************

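As you can see above, streaming responses have a delta field rather than a message field. delta can hold things like:

  • a role token (e.g., {"role": "assistant"})
  • a content token (e.g., {"content": "\n\n"})
  • nothing (e.g., {}), when the stream is over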
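The three delta cases listed above can be handled in one loop. The following is a minimal sketch that runs without an API call: `make_chunk` builds stand-in objects with the same attribute layout as the chunks printed above, and both `make_chunk` and `assemble_message` are hypothetical helper names, not part of the openai library.

```python
from types import SimpleNamespace

def make_chunk(role=None, content=None, finish_reason=None):
    """Build a stand-in object shaped like a streamed chat completion chunk."""
    delta = SimpleNamespace(role=role, content=content)
    choice = SimpleNamespace(delta=delta, finish_reason=finish_reason, index=0)
    return SimpleNamespace(choices=[choice])

def assemble_message(chunks):
    """Fold a sequence of chunks into a single message dictionary."""
    role = None
    parts = []
    finish_reason = None
    for chunk in chunks:
        choice = chunk.choices[0]
        if choice.delta.role is not None:  # role token, normally the first chunk
            role = choice.delta.role
        if choice.delta.content:  # content token; skips None and ''
            parts.append(choice.delta.content)
        if choice.finish_reason is not None:  # empty delta that ends the stream
            finish_reason = choice.finish_reason
    return {"role": role, "content": "".join(parts), "finish_reason": finish_reason}

# Mirrors the four chunks shown in the output above
fake_stream = [
    make_chunk(role="assistant", content=""),
    make_chunk(content="Two"),
    make_chunk(content="."),
    make_chunk(finish_reason="stop"),
]
assembled = assemble_message(fake_stream)
print(assembled)
# prints {'role': 'assistant', 'content': 'Two.', 'finish_reason': 'stop'}
```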
# Example of an OpenAI ChatCompletion request with stream=True
# https://platform.openai.com/docs/api-reference/streaming#chat/create-stream

# record the time before the request is sent
start_time = time.time()

# send a ChatCompletion request to count to 100
response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=[
        {'role': 'user', 'content': 'Count to 100, with a comma between each number and no newlines. E.g., 1, 2, 3, ...'}
    ],
    temperature=0,
    stream=True  # again, we set stream=True
)
# create variables to collect the stream of chunks
collected_chunks = []
collected_messages = []
# iterate through the stream of events
for chunk in response:
    chunk_time = time.time() - start_time  # calculate the time delay of the chunk
    collected_chunks.append(chunk)  # save the event response
    chunk_message = chunk.choices[0].delta.content  # extract the message
    collected_messages.append(chunk_message)  # save the message
    print(f"Message received {chunk_time:.2f} seconds after request: {chunk_message}")  # print the delay and text

# print the time delay and text received
print(f"Full response received {chunk_time:.2f} seconds after request")
# clean None in collected_messages
collected_messages = [m for m in collected_messages if m is not None]
full_reply_content = ''.join(collected_messages)
print(f"Full conversation received: {full_reply_content}")
Message received 1.14 seconds after request: 
Message received 1.14 seconds after request: 1
Message received 1.14 seconds after request: ,
Message received 1.14 seconds after request:  
Message received 1.14 seconds after request: 2
Message received 1.16 seconds after request: ,
Message received 1.16 seconds after request:  
Message received 1.16 seconds after request: 3
Message received 1.35 seconds after request: ,
Message received 1.35 seconds after request:  
Message received 1.35 seconds after request: 4
Message received 1.36 seconds after request: ,
Message received 1.36 seconds after request:  
Message received 1.36 seconds after request: 5
Message received 1.36 seconds after request: ,
Message received 1.36 seconds after request:  
Message received 1.36 seconds after request: 6
Message received 1.36 seconds after request: ,
Message received 1.36 seconds after request:  
Message received 1.36 seconds after request: 7
Message received 1.36 seconds after request: ,
Message received 1.36 seconds after request:  
Message received 1.36 seconds after request: 8
Message received 1.36 seconds after request: ,
Message received 1.36 seconds after request:  
Message received 1.36 seconds after request: 9
Message received 1.36 seconds after request: ,
Message received 1.36 seconds after request:  
Message received 1.36 seconds after request: 10
Message received 1.36 seconds after request: ,
Message received 1.36 seconds after request:  
Message received 1.36 seconds after request: 11
Message received 1.36 seconds after request: ,
Message received 1.36 seconds after request:  
Message received 1.36 seconds after request: 12
Message received 1.36 seconds after request: ,
Message received 1.36 seconds after request:  
Message received 1.45 seconds after request: 13
Message received 1.45 seconds after request: ,
Message received 1.45 seconds after request:  
Message received 1.45 seconds after request: 14
Message received 1.45 seconds after request: ,
Message received 1.45 seconds after request:  
Message received 1.45 seconds after request: 15
Message received 1.45 seconds after request: ,
Message received 1.45 seconds after request:  
Message received 1.46 seconds after request: 16
Message received 1.46 seconds after request: ,
Message received 1.46 seconds after request:  
Message received 1.47 seconds after request: 17
Message received 1.47 seconds after request: ,
Message received 1.47 seconds after request:  
Message received 1.49 seconds after request: 18
Message received 1.49 seconds after request: ,
Message received 1.49 seconds after request:  
Message received 1.52 seconds after request: 19
Message received 1.52 seconds after request: ,
Message received 1.52 seconds after request:  
Message received 1.53 seconds after request: 20
Message received 1.53 seconds after request: ,
Message received 1.53 seconds after request:  
Message received 1.55 seconds after request: 21
Message received 1.55 seconds after request: ,
Message received 1.55 seconds after request:  
Message received 1.56 seconds after request: 22
Message received 1.56 seconds after request: ,
Message received 1.56 seconds after request:  
Message received 1.58 seconds after request: 23
Message received 1.58 seconds after request: ,
Message received 1.58 seconds after request:  
Message received 1.59 seconds after request: 24
Message received 1.59 seconds after request: ,
Message received 1.59 seconds after request:  
Message received 1.62 seconds after request: 25
Message received 1.62 seconds after request: ,
Message received 1.62 seconds after request:  
Message received 1.62 seconds after request: 26
Message received 1.62 seconds after request: ,
Message received 1.62 seconds after request:  
Message received 1.65 seconds after request: 27
Message received 1.65 seconds after request: ,
Message received 1.65 seconds after request:  
Message received 1.67 seconds after request: 28
Message received 1.67 seconds after request: ,
Message received 1.67 seconds after request:  
Message received 1.69 seconds after request: 29
Message received 1.69 seconds after request: ,
Message received 1.69 seconds after request:  
Message received 1.80 seconds after request: 30
Message received 1.80 seconds after request: ,
Message received 1.80 seconds after request:  
Message received 1.80 seconds after request: 31
Message received 1.80 seconds after request: ,
Message received 1.80 seconds after request:  
Message received 1.80 seconds after request: 32
Message received 1.80 seconds after request: ,
Message received 1.80 seconds after request:  
Message received 1.80 seconds after request: 33
Message received 1.80 seconds after request: ,
Message received 1.80 seconds after request:  
Message received 1.80 seconds after request: 34
Message received 1.80 seconds after request: ,
Message received 1.80 seconds after request:  
Message received 1.80 seconds after request: 35
Message received 1.80 seconds after request: ,
Message received 1.80 seconds after request:  
Message received 1.80 seconds after request: 36
Message received 1.80 seconds after request: ,
Message received 1.80 seconds after request:  
Message received 1.82 seconds after request: 37
Message received 1.82 seconds after request: ,
Message received 1.82 seconds after request:  
Message received 1.83 seconds after request: 38
Message received 1.83 seconds after request: ,
Message received 1.83 seconds after request:  
Message received 1.84 seconds after request: 39
Message received 1.84 seconds after request: ,
Message received 1.84 seconds after request:  
Message received 1.87 seconds after request: 40
Message received 1.87 seconds after request: ,
Message received 1.87 seconds after request:  
Message received 1.88 seconds after request: 41
Message received 1.88 seconds after request: ,
Message received 1.88 seconds after request:  
Message received 1.91 seconds after request: 42
Message received 1.91 seconds after request: ,
Message received 1.91 seconds after request:  
Message received 1.93 seconds after request: 43
Message received 1.93 seconds after request: ,
Message received 1.93 seconds after request:  
Message received 1.93 seconds after request: 44
Message received 1.93 seconds after request: ,
Message received 1.93 seconds after request:  
Message received 1.95 seconds after request: 45
Message received 1.95 seconds after request: ,
Message received 1.95 seconds after request:  
Message received 2.00 seconds after request: 46
Message received 2.00 seconds after request: ,
Message received 2.00 seconds after request:  
Message received 2.00 seconds after request: 47
Message received 2.00 seconds after request: ,
Message received 2.00 seconds after request:  
Message received 2.00 seconds after request: 48
Message received 2.00 seconds after request: ,
Message received 2.00 seconds after request:  
Message received 2.00 seconds after request: 49
Message received 2.00 seconds after request: ,
Message received 2.00 seconds after request:  
Message received 2.00 seconds after request: 50
Message received 2.00 seconds after request: ,
Message received 2.00 seconds after request:  
Message received 2.00 seconds after request: 51
Message received 2.00 seconds after request: ,
Message received 2.04 seconds after request:  
Message received 2.04 seconds after request: 52
Message received 2.04 seconds after request: ,
Message received 2.04 seconds after request:  
Message received 2.04 seconds after request: 53
Message received 2.04 seconds after request: ,
Message received 2.13 seconds after request:  
Message received 2.13 seconds after request: 54
Message received 2.14 seconds after request: ,
Message received 2.14 seconds after request:  
Message received 2.14 seconds after request: 55
Message received 2.14 seconds after request: ,
Message received 2.14 seconds after request:  
Message received 2.14 seconds after request: 56
Message received 2.14 seconds after request: ,
Message received 2.14 seconds after request:  
Message received 2.16 seconds after request: 57
Message received 2.16 seconds after request: ,
Message received 2.16 seconds after request:  
Message received 2.17 seconds after request: 58
Message received 2.17 seconds after request: ,
Message received 2.17 seconds after request:  
Message received 2.19 seconds after request: 59
Message received 2.19 seconds after request: ,
Message received 2.19 seconds after request:  
Message received 2.21 seconds after request: 60
Message received 2.21 seconds after request: ,
Message received 2.21 seconds after request:  
Message received 2.34 seconds after request: 61
Message received 2.34 seconds after request: ,
Message received 2.34 seconds after request:  
Message received 2.34 seconds after request: 62
Message received 2.34 seconds after request: ,
Message received 2.34 seconds after request:  
Message received 2.34 seconds after request: 63
Message received 2.34 seconds after request: ,
Message received 2.34 seconds after request:  
Message received 2.34 seconds after request: 64
Message received 2.34 seconds after request: ,
Message received 2.34 seconds after request:  
Message received 2.34 seconds after request: 65
Message received 2.34 seconds after request: ,
Message received 2.34 seconds after request:  
Message received 2.34 seconds after request: 66
Message received 2.34 seconds after request: ,
Message received 2.34 seconds after request:  
Message received 2.34 seconds after request: 67
Message received 2.34 seconds after request: ,
Message received 2.34 seconds after request:  
Message received 2.36 seconds after request: 68
Message received 2.36 seconds after request: ,
Message received 2.36 seconds after request:  
Message received 2.36 seconds after request: 69
Message received 2.36 seconds after request: ,
Message received 2.36 seconds after request:  
Message received 2.38 seconds after request: 70
Message received 2.38 seconds after request: ,
Message received 2.38 seconds after request:  
Message received 2.39 seconds after request: 71
Message received 2.39 seconds after request: ,
Message received 2.39 seconds after request:  
Message received 2.39 seconds after request: 72
Message received 2.39 seconds after request: ,
Message received 2.39 seconds after request:  
Message received 2.39 seconds after request: 73
Message received 2.39 seconds after request: ,
Message received 2.39 seconds after request:  
Message received 2.39 seconds after request: 74
Message received 2.39 seconds after request: ,
Message received 2.39 seconds after request:  
Message received 2.39 seconds after request: 75
Message received 2.39 seconds after request: ,
Message received 2.40 seconds after request:  
Message received 2.40 seconds after request: 76
Message received 2.40 seconds after request: ,
Message received 2.42 seconds after request:  
Message received 2.42 seconds after request: 77
Message received 2.42 seconds after request: ,
Message received 2.51 seconds after request:  
Message received 2.51 seconds after request: 78
Message received 2.51 seconds after request: ,
Message received 2.52 seconds after request:  
Message received 2.52 seconds after request: 79
Message received 2.52 seconds after request: ,
Message received 2.52 seconds after request:  
Message received 2.52 seconds after request: 80
Message received 2.52 seconds after request: ,
Message received 2.52 seconds after request:  
Message received 2.52 seconds after request: 81
Message received 2.52 seconds after request: ,
Message received 2.52 seconds after request:  
Message received 2.52 seconds after request: 82
Message received 2.52 seconds after request: ,
Message received 2.60 seconds after request:  
Message received 2.60 seconds after request: 83
Message received 2.60 seconds after request: ,
Message received 2.64 seconds after request:  
Message received 2.64 seconds after request: 84
Message received 2.64 seconds after request: ,
Message received 2.64 seconds after request:  
Message received 2.64 seconds after request: 85
Message received 2.64 seconds after request: ,
Message received 2.64 seconds after request:  
Message received 2.66 seconds after request: 86
Message received 2.66 seconds after request: ,
Message received 2.66 seconds after request:  
Message received 2.66 seconds after request: 87
Message received 2.66 seconds after request: ,
Message received 2.66 seconds after request:  
Message received 2.68 seconds after request: 88
Message received 2.68 seconds after request: ,
Message received 2.68 seconds after request:  
Message received 2.69 seconds after request: 89
Message received 2.69 seconds after request: ,
Message received 2.69 seconds after request:  
Message received 2.72 seconds after request: 90
Message received 2.72 seconds after request: ,
Message received 2.72 seconds after request:  
Message received 2.82 seconds after request: 91
Message received 2.82 seconds after request: ,
Message received 2.82 seconds after request:  
Message received 2.82 seconds after request: 92
Message received 2.82 seconds after request: ,
Message received 2.82 seconds after request:  
Message received 2.82 seconds after request: 93
Message received 2.82 seconds after request: ,
Message received 2.82 seconds after request:  
Message received 2.82 seconds after request: 94
Message received 2.82 seconds after request: ,
Message received 2.82 seconds after request:  
Message received 2.82 seconds after request: 95
Message received 2.82 seconds after request: ,
Message received 2.82 seconds after request:  
Message received 2.82 seconds after request: 96
Message received 2.82 seconds after request: ,
Message received 2.82 seconds after request:  
Message received 2.82 seconds after request: 97
Message received 2.82 seconds after request: ,
Message received 2.82 seconds after request:  
Message received 2.82 seconds after request: 98
Message received 2.82 seconds after request: ,
Message received 2.82 seconds after request:  
Message received 2.82 seconds after request: 99
Message received 2.82 seconds after request: ,
Message received 2.82 seconds after request:  
Message received 2.82 seconds after request: 100
Message received 2.82 seconds after request: None
Full response received 2.82 seconds after request
Full conversation received: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100

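Time comparison

In the runs shown above, the non-streaming request took about 1.9 seconds to complete and the streaming request about 2.8 seconds. Request times vary with load and other stochastic factors.

However, with the streaming request, we received the first token about 1.1 seconds after the request, and the remaining chunks arrived in short bursts over the following 1.7 seconds or so. The non-streaming request returned nothing until the full response was ready.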
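The same comparison can be computed rather than read off the log. The sketch below measures time to first chunk, total time, and the mean gap between chunks for any iterable. It uses a simulated stream so that it runs without an API call. `measure_stream` and `slow_fake_stream` are hypothetical names, and the injectable `clock` parameter exists only to make the function easy to test.

```python
import time

def measure_stream(stream, clock=time.monotonic):
    """Return arrival-time statistics for the items of an iterable stream."""
    start = clock()
    arrival_times = []
    for _ in stream:
        arrival_times.append(clock() - start)  # delay of this chunk since start
    if not arrival_times:
        return None
    gaps = [later - earlier for earlier, later in zip(arrival_times, arrival_times[1:])]
    return {
        "time_to_first_chunk": arrival_times[0],
        "total_time": arrival_times[-1],
        "mean_gap": sum(gaps) / len(gaps) if gaps else 0.0,
        "chunk_count": len(arrival_times),
    }

def slow_fake_stream(n_chunks, delay):
    """Simulated stream that yields one item every `delay` seconds."""
    for i in range(n_chunks):
        time.sleep(delay)
        yield i

stats = measure_stream(slow_fake_stream(5, 0.01))
print(f"first chunk after {stats['time_to_first_chunk']:.3f}s, "
      f"all {stats['chunk_count']} chunks after {stats['total_time']:.3f}s")
```

With a real request, the create call would need to happen after the clock starts (for example inside a generator) so that the wait for the first chunk is included in the measurement.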

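4. How to get token usage data for a streamed chat completion response

You can get token usage statistics for your streamed response by setting stream_options={"include_usage": True}. When you do so, an extra chunk is streamed as the final chunk. You can access the usage data for the entire request via the usage field on this chunk. A few important notes when you set stream_options={"include_usage": True}:

  • The value of the usage field on all chunks except the last one will be null.
  • The usage field on the last chunk contains token usage statistics for the entire request.
  • The choices field on the last chunk will always be an empty array [].

Let's see how it works using the example from section 2.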

# Example of an OpenAI ChatCompletion request with stream=True and stream_options={"include_usage": True}

# a ChatCompletion request
response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=[
        {'role': 'user', 'content': "What's 1+1? Answer in one word."}
    ],
    temperature=0,
    stream=True,
    stream_options={"include_usage": True}, # retrieving token usage for stream response
)

for chunk in response:
    print(f"choices: {chunk.choices}\nusage: {chunk.usage}")
    print("****************")
choices: [Choice(delta=ChoiceDelta(content='', function_call=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)]
usage: None
****************
choices: [Choice(delta=ChoiceDelta(content='Two', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)]
usage: None
****************
choices: [Choice(delta=ChoiceDelta(content='.', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)]
usage: None
****************
choices: [Choice(delta=ChoiceDelta(content=None, function_call=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)]
usage: None
****************
choices: []
usage: CompletionUsage(completion_tokens=2, prompt_tokens=18, total_tokens=20)
****************
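Because the final chunk has an empty choices list, a loop that indexes chunk.choices[0] on every chunk, as the earlier collection loop does, would raise an IndexError once include_usage is enabled. The sketch below shows one way to guard against that. It runs on stand-in objects shaped like the output above rather than on a live response, and `collect_text_and_usage` and `fake_chunk` are hypothetical helper names.

```python
from types import SimpleNamespace

def collect_text_and_usage(chunks):
    """Join streamed content and pick up the usage object from the final chunk."""
    parts = []
    usage = None
    for chunk in chunks:
        if chunk.usage is not None:  # only set on the extra final chunk
            usage = chunk.usage
        if not chunk.choices:  # the usage chunk carries no choices
            continue
        content = chunk.choices[0].delta.content
        if content:  # skips None and ''
            parts.append(content)
    return "".join(parts), usage

def fake_chunk(content=None, usage=None, with_choice=True):
    """Build a stand-in chunk; with_choice=False mimics the final usage chunk."""
    choices = []
    if with_choice:
        choices = [SimpleNamespace(delta=SimpleNamespace(content=content))]
    return SimpleNamespace(choices=choices, usage=usage)

fake_usage = SimpleNamespace(completion_tokens=2, prompt_tokens=18, total_tokens=20)
fake_chunks = [
    fake_chunk(""),
    fake_chunk("Two"),
    fake_chunk("."),
    fake_chunk(None),
    fake_chunk(usage=fake_usage, with_choice=False),
]
text, usage = collect_text_and_usage(fake_chunks)
print(text, usage.total_tokens)
# prints: Two. 20
```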