March 1, 2023

How to format inputs to ChatGPT models

ChatGPT is powered by gpt-3.5-turbo and gpt-4, OpenAI's most advanced models.

You can build your own applications with gpt-3.5-turbo or gpt-4 using the OpenAI API.

Chat models take a series of messages as input, and return an AI-written message as output.

This guide illustrates the chat format with a few example API calls.

1. Import the openai library

# if needed, install and/or upgrade to the latest version of the OpenAI Python library
%pip install --upgrade openai
# import the OpenAI Python library for calling the OpenAI API
from openai import OpenAI
import os
import json  # used below to pretty-print API responses

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))

2. An example chat completion API call

A chat completions API call has two required parameters:

  • model: the name of the model you want to use (e.g., gpt-3.5-turbo, gpt-4, gpt-3.5-turbo-16k-1106)
  • messages: a list of message objects, where each object has two required fields:
    • role: the role of the messenger (either system, user, assistant, or tool)
    • content: the content of the message (e.g., Write me a beautiful poem)

Messages can also contain an optional name field, which gives the messenger a name, e.g., example-user, Alice, or BlackbeardBot. Names may not contain spaces.
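For illustration, a message object carrying the optional name field might look like this (a hypothetical example, not part of the API calls below):

# a hypothetical message object using the optional "name" field
message = {"role": "user", "name": "example-user", "content": "Hello there!"}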

Optional parameters:

  • frequency_penalty: Penalizes tokens based on their frequency, reducing repetition.
  • logit_bias: Modifies the likelihood of specified tokens with bias values.
  • logprobs: Returns the log probabilities of the output tokens if true.
  • top_logprobs: Specifies the number of most likely tokens to return at each position.
  • max_tokens: Sets the maximum number of tokens that can be generated in the chat completion.
  • n: Generates a specified number of chat completion choices for each input.
  • presence_penalty: Penalizes new tokens based on their presence in the text so far.
  • response_format: Specifies the output format, e.g., JSON mode.
  • seed: Ensures deterministic sampling with a specified seed.
  • stop: Specifies up to 4 sequences at which the API should stop generating tokens.
  • stream: Sends partial message deltas as tokens become available.
  • temperature: Sets the sampling temperature, between 0 and 2.
  • top_p: Uses nucleus sampling; considers tokens with top_p probability mass.
  • tools: Lists the functions the model may call.
  • tool_choice: Controls how the model calls functions (none/auto/function).
  • user: A unique identifier for end-user monitoring and abuse detection.

As of January 2024, you can also optionally submit a list of functions that tells GPT whether it can generate JSON to feed into a function. For details, see the documentation, the API reference, or the guide How to call functions with chat models.
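As a quick illustration of the optional parameters listed above, here is a hedged sketch of a request that combines several of them; the parameter values are illustrative, not recommendations:

# hypothetical request exercising a few of the optional parameters
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Name three ocean animals."}],
    temperature=0.7,   # moderate randomness
    max_tokens=50,     # cap the completion length
    n=2,               # ask for two completion choices
    stop=["\n\n"],     # stop at the first blank line
    seed=123,          # request (best-effort) deterministic sampling
)
for choice in response.choices:
    print(choice.index, choice.message.content)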

Typically, a conversation will start with a system message that tells the assistant how to behave, followed by alternating user and assistant messages, but you are not required to follow this format.

Let's look at an example chat API call to see how the chat format works in practice.

# Example OpenAI Python library request
MODEL = "gpt-3.5-turbo"
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Knock knock."},
        {"role": "assistant", "content": "Who's there?"},
        {"role": "user", "content": "Orange."},
    ],
    temperature=0,
)
print(json.dumps(json.loads(response.model_dump_json()), indent=4))
{
    "id": "chatcmpl-8dee9DuEFcg2QILtT2a6EBXZnpirM",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "Orange who?",
                "role": "assistant",
                "function_call": null,
                "tool_calls": null
            }
        }
    ],
    "created": 1704461729,
    "model": "gpt-3.5-turbo-0613",
    "object": "chat.completion",
    "system_fingerprint": null,
    "usage": {
        "completion_tokens": 3,
        "prompt_tokens": 35,
        "total_tokens": 38
    }
}

As you can see, the response object has a few fields:

  • id: the ID of the request
  • choices: a list of completion objects (only one, unless you set n greater than 1)
    • finish_reason: the reason the model stopped generating text (either stop, or length if the max_tokens limit was reached)
    • index: the index of the choice in the list of choices
    • logprobs: log probability information for the choice
    • message: the message object generated by the model
      • content: the content of the message
      • role: the role of the author of this message
      • tool_calls: the tool calls generated by the model, such as function calls (if the tools parameter was supplied)
  • created: the timestamp of the request
  • model: the full name of the model used to generate the response
  • object: the type of object returned (e.g., chat.completion)
  • system_fingerprint: a fingerprint representing the backend configuration the model ran with
  • usage: the number of tokens used to generate the reply, counting prompt, completion, and total

Extract just the reply with:

response.choices[0].message.content
'Orange who?'
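Relatedly, if you pass stream=True (one of the optional parameters above), partial message deltas are sent back as they are generated. A minimal sketch, assuming the same client and MODEL as above:

# minimal streaming sketch: concatenate the content deltas as they arrive
stream = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Knock knock."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        print(delta, end="")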

Even non-conversation-based tasks can fit into the chat format by placing the instruction in the first user message.

For example, to ask the model to explain asynchronous programming in the style of the pirate Blackbeard, we can structure the conversation as follows:

# example with a system message
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain asynchronous programming in the style of the pirate Blackbeard."},
    ],
    temperature=0,
)

print(response.choices[0].message.content)
Arr, me matey! Let me tell ye a tale of asynchronous programming, in the style of the fearsome pirate Blackbeard!

Picture this, me hearties. In the vast ocean of programming, there be times when ye need to perform multiple tasks at once. But fear not, for asynchronous programming be here to save the day!

Ye see, in traditional programming, ye be waitin' for one task to be done before movin' on to the next. But with asynchronous programming, ye can be takin' care of multiple tasks at the same time, just like a pirate multitaskin' on the high seas!

Instead of waitin' for a task to be completed, ye can be sendin' it off on its own journey, while ye move on to the next task. It be like havin' a crew of trusty sailors, each takin' care of their own duties, without waitin' for the others.

Now, ye may be wonderin', how does this sorcery work? Well, me matey, it be all about callbacks and promises. When ye be sendin' off a task, ye be attachin' a callback function to it. This be like leavin' a message in a bottle, tellin' the task what to do when it be finished.

While the task be sailin' on its own, ye can be movin' on to the next task, without wastin' any precious time. And when the first task be done, it be sendin' a signal back to ye, lettin' ye know it be finished. Then ye can be takin' care of the callback function, like openin' the bottle and readin' the message inside.

But wait, there be more! With promises, ye can be makin' even fancier arrangements. Instead of callbacks, ye be makin' a promise that the task will be completed. It be like a contract between ye and the task, swearin' that it will be done.

Ye can be attachin' multiple promises to a task, promisin' different outcomes. And when the task be finished, it be fulfillin' the promises, lettin' ye know it be done. Then ye can be handlin' the fulfillments, like collectin' the rewards of yer pirate adventures!

So, me hearties, that be the tale of asynchronous programming, told in the style of the fearsome pirate Blackbeard! With callbacks and promises, ye can be takin' care of multiple tasks at once, just like a pirate conquerin' the seven seas!
# example without a system message
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "user", "content": "Explain asynchronous programming in the style of the pirate Blackbeard."},
    ],
    temperature=0,
)

print(response.choices[0].message.content)
Arr, me hearties! Gather 'round and listen up, for I be tellin' ye about the mysterious art of asynchronous programming, in the style of the fearsome pirate Blackbeard!

Now, ye see, in the world of programming, there be times when we need to perform tasks that take a mighty long time to complete. These tasks might involve fetchin' data from the depths of the internet, or performin' complex calculations that would make even Davy Jones scratch his head.

In the olden days, we pirates used to wait patiently for each task to finish afore movin' on to the next one. But that be a waste of precious time, me hearties! We be pirates, always lookin' for ways to be more efficient and plunder more booty!

That be where asynchronous programming comes in, me mateys. It be a way to tackle multiple tasks at once, without waitin' for each one to finish afore movin' on. It be like havin' a crew of scallywags workin' on different tasks simultaneously, while ye be overseein' the whole operation.

Ye see, in asynchronous programming, we be breakin' down our tasks into smaller chunks called "coroutines." Each coroutine be like a separate pirate, workin' on its own task. When a coroutine be startin' its work, it don't wait for the task to finish afore movin' on to the next one. Instead, it be movin' on to the next task, lettin' the first one continue in the background.

Now, ye might be wonderin', "But Blackbeard, how be we know when a task be finished if we don't wait for it?" Ah, me hearties, that be where the magic of callbacks and promises come in!

When a coroutine be startin' its work, it be attachin' a callback or a promise to it. This be like leavin' a message in a bottle, tellin' the coroutine what to do when it be finished. So, while the coroutine be workin' away, the rest of the crew be movin' on to other tasks, plunderin' more booty along the way.

When a coroutine be finished with its task, it be sendin' a signal to the callback or fulfillin' the promise, lettin' the rest of the crew know that it be done. Then, the crew can gather 'round and handle the results of the completed task, celebratin' their victory and countin' their plunder.

So, me hearties, asynchronous programming be like havin' a crew of pirates workin' on different tasks at once, without waitin' for each one to finish afore movin' on. It be a way to be more efficient, plunder more booty, and conquer the vast seas of programming!

Now, set sail, me mateys, and embrace the power of asynchronous programming like true pirates of the digital realm! Arr!

3. Tips for instructing gpt-3.5-turbo-0301

System messages

The system message can be used to prime the assistant with different personalities or behaviors.

Be aware that gpt-3.5-turbo-0301 does not generally pay as much attention to the system message as gpt-4-0314 or gpt-3.5-turbo-0613. Therefore, for gpt-3.5-turbo-0301, we recommend placing important instructions in the user message instead. Some developers have found that continually moving the system message near the end of the conversation keeps the model's attention from drifting away as conversations get longer.
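A hedged sketch of that last tip; the helper below is hypothetical, not part of the API:

# hypothetical helper: keep the model's attention by re-inserting the system
# message just before the latest user turn in a long conversation
def refresh_system_message(messages, system_content):
    trimmed = [m for m in messages if m["role"] != "system"]
    trimmed.insert(-1, {"role": "system", "content": system_content})
    return trimmed

conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "First question..."},
    {"role": "assistant", "content": "First answer..."},
    {"role": "user", "content": "A much later question..."},
]
conversation = refresh_system_message(conversation, "You are a helpful assistant.")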

# An example of a system message that primes the assistant to explain concepts in great depth
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a friendly and helpful teaching assistant. You explain concepts in great depth using simple terms, and you give examples to help people learn. At the end of each explanation, you ask a question to check for understanding"},
        {"role": "user", "content": "Can you explain how fractions work?"},
    ],
    temperature=0,
)

print(response.choices[0].message.content)
Of course! Fractions are a way to represent parts of a whole. They are made up of two numbers: a numerator and a denominator. The numerator tells you how many parts you have, and the denominator tells you how many equal parts make up the whole.

Let's take an example to understand this better. Imagine you have a pizza that is divided into 8 equal slices. If you eat 3 slices, you can represent that as the fraction 3/8. Here, the numerator is 3 because you ate 3 slices, and the denominator is 8 because the whole pizza is divided into 8 slices.

Fractions can also be used to represent numbers less than 1. For example, if you eat half of a pizza, you can write it as 1/2. Here, the numerator is 1 because you ate one slice, and the denominator is 2 because the whole pizza is divided into 2 equal parts.

Now, let's talk about equivalent fractions. Equivalent fractions are different fractions that represent the same amount. For example, 1/2 and 2/4 are equivalent fractions because they both represent half of something. To find equivalent fractions, you can multiply or divide both the numerator and denominator by the same number.

Here's a question to check your understanding: If you have a cake divided into 12 equal slices and you eat 4 slices, what fraction of the cake did you eat?
# An example of a system message that primes the assistant to give brief, to-the-point answers
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a laconic assistant. You reply with brief, to-the-point answers with no elaboration."},
        {"role": "user", "content": "Can you explain how fractions work?"},
    ],
    temperature=0,
)

print(response.choices[0].message.content)
Fractions represent parts of a whole. They have a numerator (top number) and a denominator (bottom number).

Few-shot prompting

In some cases, it's easier to show the model what you want rather than tell the model what you want.

One way to show the model what you want is with faked example messages.

For example:

# An example of a faked few-shot conversation to prime the model into translating business jargon to simpler speech
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful, pattern-following assistant."},
        {"role": "user", "content": "Help me translate the following corporate jargon into plain English."},
        {"role": "assistant", "content": "Sure, I'd be happy to!"},
        {"role": "user", "content": "New synergies will help drive top-line growth."},
        {"role": "assistant", "content": "Things working well together will increase revenue."},
        {"role": "user", "content": "Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage."},
        {"role": "assistant", "content": "Let's talk later when we're less busy about how to do better."},
        {"role": "user", "content": "This late pivot means we don't have time to boil the ocean for the client deliverable."},
    ],
    temperature=0,
)

print(response.choices[0].message.content)
This sudden change in direction means we don't have enough time to complete the entire project for the client.

To help clarify that the example messages are not part of a real conversation, and shouldn't be referred back to by the model, you can try setting the name field of system messages to example_user and example_assistant.

Transforming the few-shot example above, we could write:

# The business jargon translation example, but with example names for the example messages
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful, pattern-following assistant that translates corporate jargon into plain English."},
        {"role": "system", "name":"example_user", "content": "New synergies will help drive top-line growth."},
        {"role": "system", "name": "example_assistant", "content": "Things working well together will increase revenue."},
        {"role": "system", "name":"example_user", "content": "Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage."},
        {"role": "system", "name": "example_assistant", "content": "Let's talk later when we're less busy about how to do better."},
        {"role": "user", "content": "This late pivot means we don't have time to boil the ocean for the client deliverable."},
    ],
    temperature=0,
)

print(response.choices[0].message.content)
This sudden change in direction means we don't have enough time to complete the entire project for the client.

Not every attempt at engineering conversations will succeed at first.

If your first attempts fail, don't be afraid to experiment with different ways of priming or conditioning the model.

As an example, one developer discovered an increase in accuracy when they inserted a user message that said "Great job so far, these have been perfect" to help condition the model into providing higher quality responses.
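A hypothetical sketch of where such a conditioning message might go, just before the real request:

# hypothetical conditioning message inserted before the real request
messages = [
    {"role": "system", "content": "You are a helpful, pattern-following assistant."},
    # ... earlier few-shot example exchanges ...
    {"role": "user", "content": "Great job so far, these have been perfect."},
    {"role": "user", "content": "The real request goes here."},
]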

For more ideas on how to lift the reliability of the models, consider reading our guide on techniques to increase reliability. It was written for non-chat models, but many of its principles still apply.

4. Counting tokens

When you submit your request, the API transforms the messages into a sequence of tokens.

The number of tokens used affects:

  • the cost of the request
  • the time it takes to generate the response
  • whether the reply gets cut off from hitting the maximum token limit (4,096 for gpt-3.5-turbo or 8,192 for gpt-4)
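One way to detect that last case, assuming response is a chat completion object like the ones above: a truncated reply reports a finish_reason of length.

# check whether the reply was cut off by the token limit
if response.choices[0].finish_reason == "length":
    print("Warning: the reply was truncated at the max token limit.")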

You can use the following function to count the number of tokens that a list of messages will use.

Note that the exact way tokens are counted from messages may change from model to model. Consider the counts from the function below an estimate, not a timeless guarantee.

In particular, requests that use the optional functions input will consume extra tokens on top of the estimates calculated below.

Read more about counting tokens in How to count tokens with tiktoken.

import tiktoken


def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
    """Return the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print("Warning: model not found. Using cl100k_base encoding.")
        encoding = tiktoken.get_encoding("cl100k_base")
    if model in {
        "gpt-3.5-turbo-0613",
        "gpt-3.5-turbo-16k-0613",
        "gpt-4-0314",
        "gpt-4-32k-0314",
        "gpt-4-0613",
        "gpt-4-32k-0613",
        }:
        tokens_per_message = 3
        tokens_per_name = 1
    elif model == "gpt-3.5-turbo-0301":
        tokens_per_message = 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_name = -1  # if there's a name, the role is omitted
    elif "gpt-3.5-turbo" in model:
        print("Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.")
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613")
    elif "gpt-4" in model:
        print("Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.")
        return num_tokens_from_messages(messages, model="gpt-4-0613")
    else:
        raise NotImplementedError(
            f"""num_tokens_from_messages() is not implemented for model {model}."""
        )
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens
# let's verify the function above matches the OpenAI API response
example_messages = [
    {
        "role": "system",
        "content": "You are a helpful, pattern-following assistant that translates corporate jargon into plain English.",
    },
    {
        "role": "system",
        "name": "example_user",
        "content": "New synergies will help drive top-line growth.",
    },
    {
        "role": "system",
        "name": "example_assistant",
        "content": "Things working well together will increase revenue.",
    },
    {
        "role": "system",
        "name": "example_user",
        "content": "Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage.",
    },
    {
        "role": "system",
        "name": "example_assistant",
        "content": "Let's talk later when we're less busy about how to do better.",
    },
    {
        "role": "user",
        "content": "This late pivot means we don't have time to boil the ocean for the client deliverable.",
    },
]

for model in [
    # "gpt-3.5-turbo-0301",
    # "gpt-4-0314",
    # "gpt-4-0613",
    "gpt-3.5-turbo-1106",
    "gpt-3.5-turbo",
    "gpt-4",
    "gpt-4-1106-preview",
    ]:
    print(model)
    # example token count from the function defined above
    print(f"{num_tokens_from_messages(example_messages, model)} prompt tokens counted by num_tokens_from_messages().")
    # example token count from the OpenAI API
    response = client.chat.completions.create(
        model=model,
        messages=example_messages,
        temperature=0,
        max_tokens=1,
    )
    token = response.usage.prompt_tokens
    print(f'{token} prompt tokens counted by the OpenAI API.')
    print()
gpt-3.5-turbo-1106
Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.

gpt-3.5-turbo
Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.

gpt-4
Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.

gpt-4-1106-preview
Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.