April 25, 2025

Handling Function Calls with Reasoning Models

OpenAI now offers function calling with its reasoning models. Reasoning models are trained to follow logical chains of thought, making them better suited to complex or multi-step tasks.

Reasoning models like o3 and o4-mini are large language models trained with reinforcement learning to perform reasoning. They think before they answer, producing a long internal chain of thought before responding to the user. These models excel at complex problem solving, coding, scientific reasoning, and multi-step planning for agentic workflows. They are also the best models for Codex CLI, our lightweight coding agent.

For most cases, using these models via the API is straightforward and comparable to using the familiar 'chat' models.

However, there are some nuances to bear in mind, particularly when it comes to features such as function calling.

All the examples in this notebook use the newer Responses API, which provides convenient abstractions for managing conversation state. However, the principles here apply equally when using the older Chat Completions API.

# pip install openai
# Import libraries 
import json
from openai import OpenAI
from uuid import uuid4
from typing import Callable

client = OpenAI()
MODEL_DEFAULTS = {
    "model": "o4-mini", # 200,000 token context window
    "reasoning": {"effort": "low", "summary": "auto"}, # Automatically summarise the reasoning process. Can also choose "detailed" or "none"
}

Let's make a simple call to a reasoning model using the Responses API. We specify a low reasoning effort and retrieve the response with the handy output_text attribute. We can then ask a follow-up question, using previous_response_id to let OpenAI manage the conversation history automatically.

response = client.responses.create(
    input="Which of the last four Olympic host cities has the highest average temperature?",
    **MODEL_DEFAULTS
)
print(response.output_text)

response = client.responses.create(
    input="what about the lowest?",
    previous_response_id=response.id,
    **MODEL_DEFAULTS
)
print(response.output_text)
Among the last four Summer Olympic host cities—Tokyo (2020), Rio de Janeiro (2016), London (2012) and Beijing (2008)—Rio de Janeiro has by far the warmest climate. Average annual temperatures are roughly:

• Rio de Janeiro: ≈ 23 °C  
• Tokyo: ≈ 16 °C  
• Beijing: ≈ 13 °C  
• London: ≈ 11 °C  

So Rio de Janeiro has the highest average temperature.
Among those four, London has the lowest average annual temperature, at about 11 °C.

Nice and easy!

We asked a fairly complex question that may have required the model to reason out a plan and proceed through it in steps, but that reasoning was hidden from us; we simply waited a little longer before seeing the final response.

However, if we inspect the output we can see that the model made use of a hidden set of 'reasoning' tokens. These tokens were included in the model's context window but were not exposed to us as end users. We can see those tokens, along with a summary of the reasoning, in the response (but not the literal tokens used).

print(next(rx for rx in response.output if rx.type == 'reasoning').summary[0].text)
response.usage.to_dict()
**Determining lowest temperatures**

The user is asking about the lowest average temperatures of the last four Olympic host cities: Tokyo, Rio, London, and Beijing. I see London has the lowest average temperature at around 11°C. If I double-check the annual averages: Rio is about 23°C, Tokyo is around 16°C, and Beijing is approximately 13°C. So, my final answer is London with an average of roughly 11°C. I could provide those approximate values clearly for the user.
{'input_tokens': 136,
 'input_tokens_details': {'cached_tokens': 0},
 'output_tokens': 89,
 'output_tokens_details': {'reasoning_tokens': 64},
 'total_tokens': 225}

It's important to be aware of these reasoning tokens, because they mean we consume the available context window more quickly than with traditional chat models.
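Since reasoning tokens count against the same window, it can be worth tracking cumulative usage as a conversation grows. Here is a minimal sketch; the helper name and the 80% threshold are our own illustration, not part of the OpenAI SDK, and in practice you would feed it response.usage.total_tokens from each turn:

```python
def context_budget_used(total_tokens: int, context_window: int = 200_000) -> float:
    """Return the fraction of the model's context window consumed so far."""
    return total_tokens / context_window

# Example with made-up per-turn totals in the shape of response.usage.total_tokens
total_tokens = 0
for turn_tokens in (225, 1_200, 3_400):
    total_tokens += turn_tokens
    if context_budget_used(total_tokens) > 0.8:
        print("Warning: nearing the context window; consider pruning or summarising")

print(f"{context_budget_used(total_tokens):.2%} of the context window used")
```

The later manual-orchestration example in this notebook does the same arithmetic inline when it reports total token usage.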

Calling custom functions

What happens if we ask the model a complex request that also requires the use of custom tools?

  • Let's imagine we have more questions about Olympic cities, but we also have an internal database that contains an ID for each city.
  • It's possible that the model will need to invoke our tool partway through its reasoning before returning a result.
  • Let's make a function that generates a random UUID, and ask the model to reason about these UUIDs.

def get_city_uuid(city: str) -> str:
    """Just a fake tool to return a fake UUID"""
    uuid = str(uuid4())
    return f"{city} ID: {uuid}"

# The tool schema that we will pass to the model
tools = [
    {
        "type": "function",
        "name": "get_city_uuid",
        "description": "Retrieve the internal ID for a city from the internal database. Only invoke this function if the user needs to know the internal ID for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "The name of the city to get information about"}
            },
            "required": ["city"]
        }
    }
]

# This is a general practice - we need a mapping of the tool names we tell the model about, and the functions that implement them.
tool_mapping = {
    "get_city_uuid": get_city_uuid
}

# Let's add this to our defaults so we don't have to pass it every time
MODEL_DEFAULTS["tools"] = tools

response = client.responses.create(
    input="What's the internal ID for the lowest-temperature city?",
    previous_response_id=response.id,
    **MODEL_DEFAULTS)
print(response.output_text)

This time, there's no output_text available. Let's look at the response output instead.

response.output
[ResponseReasoningItem(id='rs_68246219e8288191af051173b1d53b3f0c4fbdb0d4a46f3c', summary=[], type='reasoning', status=None),
 ResponseFunctionToolCall(arguments='{"city":"London"}', call_id='call_Mx6pyTjCkSkmASETsVASogoC', name='get_city_uuid', type='function_call', id='fc_6824621b8f6c8191a8095df7230b611e0c4fbdb0d4a46f3c', status='completed')]

Along with the reasoning step, the model has successfully identified the need for a tool call and passed back instructions for the function call it wants us to make.

Let's invoke the function and send the results back to the model so it can continue reasoning. Function responses are a special kind of message, so we need to structure our next message as a special kind of input:

{
    "type": "function_call_output",
    "call_id": function_call.call_id,
    "output": tool_output
}
# Extract the function call(s) from the response
new_conversation_items = []
function_calls = [rx for rx in response.output if rx.type == 'function_call']
for function_call in function_calls:
    target_tool = tool_mapping.get(function_call.name)
    if not target_tool:
        raise ValueError(f"No tool found for function call: {function_call.name}")
    arguments = json.loads(function_call.arguments) # Load the arguments as a dictionary
    tool_output = target_tool(**arguments) # Invoke the tool with the arguments
    new_conversation_items.append({
        "type": "function_call_output",
        "call_id": function_call.call_id, # We map the response back to the original function call
        "output": tool_output
    })
response = client.responses.create(
    input=new_conversation_items,
    previous_response_id=response.id,
    **MODEL_DEFAULTS
)
print(response.output_text)
The internal ID for London is 816bed76-b956-46c4-94ec-51d30b022725.

This works great here, since we know only a single function call is needed for the model to respond, but we also need to consider cases where multiple tool calls might be required for the reasoning to complete.

Let's add a second call to run a web search.

OpenAI's built-in web search tool is not available to reasoning models out of the box (as of May 2025; this may well change soon), but it's not difficult to create a custom web search function using 4o-mini or another web-search-capable model.

def web_search(query: str) -> str:
    """Search the web for information and return back a summary of the results"""
    result = client.responses.create(
        model="gpt-4o-mini",
        input=f"Search the web for '{query}' and reply with only the result.",
        tools=[{"type": "web_search_preview"}],
    )
    return result.output_text

tools.append({
        "type": "function",
        "name": "web_search",
        "description": "Search the web for information and return back a summary of the results",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The query to search the web for."}
            },
            "required": ["query"]
        }
    })
tool_mapping["web_search"] = web_search

Executing multiple functions in series

Some OpenAI models support the parameter parallel_tool_calls, which allows the model to return an array of functions to be executed in parallel. However, reasoning models may produce a sequence of function calls that must be made in series, particularly as some steps may depend on the results of earlier ones. As such, we ought to define a general pattern that can handle arbitrarily complex reasoning workflows:

  • At each step in the conversation, initialise a loop
  • If the response contains function calls, we must assume the reasoning is ongoing and should feed the function results (and any intermediate reasoning) back into the model for further inference
  • If there are no function calls and we instead receive a Response.output item of type 'message', we can safely assume the agent has finished reasoning and we can break out of the loop
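The pattern above can be condensed into a small driver function. This is only an outline with stand-in callables (run_reasoning_loop, create_response, and invoke_functions are illustrative names, not part of the SDK); the concrete version, built on client.responses.create, follows in the cells below.

```python
def run_reasoning_loop(create_response, invoke_functions, first_input):
    """Keep feeding function outputs back to the model until a turn
    produces no function calls, then return the final response."""
    response = create_response(first_input, previous_id=None)
    while True:
        function_outputs = invoke_functions(response)
        if not function_outputs:  # no function calls: reasoning is done
            return response
        # Continue the chain of thought with the tool results
        response = create_response(function_outputs, previous_id=response.id)
```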
# Let's wrap our logic above into a function which we can use to invoke tool calls.
def invoke_functions_from_response(response,
                                   tool_mapping: dict[str, Callable] = tool_mapping
                                   ) -> list[dict]:
    """Extract all function calls from the response, look up the corresponding tool function(s) and execute them.
    (This would be a good place to handle asynchronous tool calls, or ones that take a while to execute.)
    This returns a list of messages to be added to the conversation history.
    """
    intermediate_messages = []
    for response_item in response.output:
        if response_item.type == 'function_call':
            target_tool = tool_mapping.get(response_item.name)
            if target_tool:
                try:
                    arguments = json.loads(response_item.arguments)
                    print(f"Invoking tool: {response_item.name}({arguments})")
                    tool_output = target_tool(**arguments)
                except Exception as e:
                    msg = f"Error executing function call: {response_item.name}: {e}"
                    tool_output = msg
                    print(msg)
            else:
                msg = f"ERROR - No tool registered for function call: {response_item.name}"
                tool_output = msg
                print(msg)
            intermediate_messages.append({
                "type": "function_call_output",
                "call_id": response_item.call_id,
                "output": tool_output
            })
        elif response_item.type == 'reasoning':
            print(f'Reasoning step: {response_item.summary}')
    return intermediate_messages

Now let's demonstrate the looping concept we discussed before.

initial_question = (
    "What are the internal IDs for the cities that have hosted the Olympics in the last 20 years, "
    "and which of those cities have recent news stories (in 2025) about the Olympics? "
    "Use your internal tools to look up the IDs and the web search tool to find the news stories."
)

# We fetch a response and then kick off a loop to handle the response
response = client.responses.create(
    input=initial_question,
    **MODEL_DEFAULTS,
)
while True:   
    function_responses = invoke_functions_from_response(response)
    if len(function_responses) == 0: # We're done reasoning
        print(response.output_text)
        break
    else:
        print("More reasoning required, continuing...")
        response = client.responses.create(
            input=function_responses,
            previous_response_id=response.id,
            **MODEL_DEFAULTS
        )
Reasoning step: []
Invoking tool: get_city_uuid({'city': 'Beijing'})
More reasoning required, continuing...
Reasoning step: []
Invoking tool: get_city_uuid({'city': 'London'})
More reasoning required, continuing...
Reasoning step: []
Invoking tool: get_city_uuid({'city': 'Rio de Janeiro'})
More reasoning required, continuing...
Reasoning step: []
Invoking tool: get_city_uuid({'city': 'Tokyo'})
More reasoning required, continuing...
Reasoning step: []
Invoking tool: get_city_uuid({'city': 'Paris'})
More reasoning required, continuing...
Reasoning step: []
Invoking tool: get_city_uuid({'city': 'Turin'})
More reasoning required, continuing...
Reasoning step: []
Invoking tool: get_city_uuid({'city': 'Vancouver'})
More reasoning required, continuing...
Reasoning step: []
Invoking tool: get_city_uuid({'city': 'Sochi'})
More reasoning required, continuing...
Reasoning step: []
Invoking tool: get_city_uuid({'city': 'Pyeongchang'})
More reasoning required, continuing...
Reasoning step: []
Invoking tool: web_search({'query': '2025 Beijing Olympics news'})
More reasoning required, continuing...
Reasoning step: []
Invoking tool: web_search({'query': '2025 London Olympics news'})
More reasoning required, continuing...
Reasoning step: []
Invoking tool: web_search({'query': '2025 Rio de Janeiro Olympics news'})
More reasoning required, continuing...
Reasoning step: []
Invoking tool: web_search({'query': '2025 Tokyo Olympics news'})
More reasoning required, continuing...
Reasoning step: []
Invoking tool: web_search({'query': '2025 Paris Olympics news'})
More reasoning required, continuing...
Reasoning step: []
Invoking tool: web_search({'query': '2025 Turin Olympics news'})
More reasoning required, continuing...
Reasoning step: []
Invoking tool: web_search({'query': '2025 Vancouver Olympics news'})
More reasoning required, continuing...
Reasoning step: [Summary(text='**Focusing on Olympic News**\n\nI need to clarify that the Invictus Games are not related to the Olympics, so I should exclude them from my search. That leaves me with Olympic-specific news focusing on Paris. I also want to consider past events, like Sochi and Pyeongchang, so I think it makes sense to search for news related to Sochi as well. Let’s focus on gathering relevant Olympic updates to keep things organized.', type='summary_text')]
Invoking tool: web_search({'query': '2025 Sochi Olympics news'})
More reasoning required, continuing...
Reasoning step: []
Invoking tool: web_search({'query': '2025 Pyeongchang Olympics news'})
More reasoning required, continuing...
Reasoning step: []
Here are the internal IDs for all cities that have hosted Olympic Games in the last 20 years (2005–2025), along with those cities that have notable 2025 news stories specifically about the Olympics:

1. Beijing (2008 Summer; 2022 Winter)  
   • UUID: 5b058554-7253-4d9d-a434-5d4ccc87c78b  
   • 2025 Olympic News? No major Olympic-specific news in 2025

2. London (2012 Summer)  
   • UUID: 9a67392d-c319-4598-b69a-adc5ffdaaba2  
   • 2025 Olympic News? No

3. Rio de Janeiro (2016 Summer)  
   • UUID: ad5eaaae-b280-4c1d-9360-3a38b0c348c3  
   • 2025 Olympic News? No

4. Tokyo (2020 Summer)  
   • UUID: 66c3a62a-840c-417a-8fad-ce87b97bb6a3  
   • 2025 Olympic News? No

5. Paris (2024 Summer)  
   • UUID: a2da124e-3fad-402b-8ccf-173f63b4ff68  
   • 2025 Olympic News? Yes  
     – Olympic cauldron balloon to float annually over Paris into 2028 ([AP News])  
     – IOC to replace defective Paris 2024 medals ([NDTV Sports])  
     – IOC elects Kirsty Coventry as president at March 2025 session ([Wikipedia])  
     – MLB cancels its planned 2025 Paris regular-season games ([AP News])

6. Turin (2006 Winter)  
   • UUID: 3674750b-6b76-49dc-adf4-d4393fa7bcfa  
   • 2025 Olympic News? No (Host of Special Olympics World Winter Games, but not mainline Olympics)

7. Vancouver (2010 Winter)  
   • UUID: 22517787-5915-41c8-b9dd-a19aa2953210  
   • 2025 Olympic News? No

8. Sochi (2014 Winter)  
   • UUID: f7efa267-c7da-4cdc-a14f-a4844f47b888  
   • 2025 Olympic News? No

9. Pyeongchang (2018 Winter)  
   • UUID: ffb19c03-5212-42a9-a527-315d35efc5fc  
   • 2025 Olympic News? No

Summary of cities with 2025 Olympic-related news:  
• Paris (a2da124e-3fad-402b-8ccf-173f63b4ff68)

Manual conversation orchestration

So far so good! It's really cool to watch the model pause its execution to run a function before continuing. In practice, the example above is quite a simple one, and production use cases may be much more complex:

  • Our context window may grow too large, and we may wish to prune older and less relevant messages, or summarise the conversation so far
  • We may wish to allow users to navigate back and forth through the conversation and regenerate answers
  • We may wish to store messages in our own database for auditing purposes, rather than relying on OpenAI's storage and orchestration
  • etc.

In these situations we may wish to take full control of the conversation flow. Rather than using previous_response_id, we can instead treat the API as 'stateless': we create and maintain an array of conversation items ourselves and send that array to the model as input each time.

This introduces some nuances specific to reasoning models.

  • In particular, we must make sure to preserve all reasoning and function call responses in our conversation history.
  • This is how the model keeps track of the chain-of-thought steps it has already run. The API will error if these are not included.
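To make the required ordering concrete, here is what a single tool-calling turn looks like in a manually managed history. The IDs and values below are made up for illustration; in practice you pass the items back exactly as they appear in response.output (e.g. via .to_dict()):

```python
conversation = [
    {"role": "user", "type": "message", "content": "What's the internal ID for London?"},
    # 1. The model's reasoning item goes back into the history first...
    {"type": "reasoning", "id": "rs_hypothetical", "summary": []},
    # 2. ...followed by the function call it emitted...
    {"type": "function_call", "id": "fc_hypothetical", "call_id": "call_hypothetical",
     "name": "get_city_uuid", "arguments": '{"city": "London"}'},
    # 3. ...followed by our tool's output, matched to the call via call_id.
    {"type": "function_call_output", "call_id": "call_hypothetical",
     "output": "London ID: <uuid>"},
]
```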

Let's run the example above again, this time orchestrating the messages ourselves and keeping track of token usage.


Note that the code below is structured for readability; in practice you may wish to consider a more sophisticated workflow to handle edge cases.

# Let's initialise our conversation with the first user message
total_tokens_used = 0
user_messages = [
    (
        "Of those cities that have hosted the summer Olympic games in the last 20 years - "
        "do any of them have IDs beginning with a number and a temperate climate? "
        "Use your available tools to look up the IDs for each city and make sure to search the web to find out about the climate."
    ),
    "Great thanks! We've just updated the IDs - could you please check again?"
    ]

conversation = []
for message in user_messages:
    conversation_item = {
        "role": "user",
        "type": "message",
        "content": message
    }
    print(f"{'*' * 79}\nUser message: {message}\n{'*' * 79}")
    conversation.append(conversation_item)
    while True: # Response loop
        response = client.responses.create(
            input=conversation,
            **MODEL_DEFAULTS
        )
        total_tokens_used += response.usage.total_tokens
        reasoning = [rx.to_dict() for rx in response.output if rx.type == 'reasoning']
        function_calls = [rx.to_dict() for rx in response.output if rx.type == 'function_call']
        messages = [rx.to_dict() for rx in response.output if rx.type == 'message']
        if len(reasoning) > 0:
            print("More reasoning required, continuing...")
            # Ensure we capture any reasoning steps
            conversation.extend(reasoning)
            print('\n'.join(s['text'] for r in reasoning for s in r['summary']))
        if len(function_calls) > 0:
            function_outputs = invoke_functions_from_response(response)
            # Preserve order of function calls and outputs in case of multiple function calls (currently not supported by reasoning models, but worth considering)
            interleaved = [val for pair in zip(function_calls, function_outputs) for val in pair]
            conversation.extend(interleaved)
        if len(messages) > 0:
            print(response.output_text)
            conversation.extend(messages)
        if len(function_calls) == 0:  # No more functions = We're done reasoning and we're ready for the next user message
            break
print(f"Total tokens used: {total_tokens_used} ({total_tokens_used / 200_000:.2%} of o4-mini's context window)")
*******************************************************************************
User message: Of those cities that have hosted the summer Olympic games in the last 20 years - do any of them have IDs beginning with a number and a temperate climate? Use your available tools to look up the IDs for each city and make sure to search the web to find out about the climate.
*******************************************************************************
More reasoning required, continuing...
**Clarifying Olympic Cities**

The user is asking about cities that hosted the Summer Olympics in the last 20 years. The relevant years to consider are 2004 Athens, 2008 Beijing, 2012 London, 2016 Rio de Janeiro, and 2020 Tokyo. If we're considering 2025, then 2004 would actually be 21 years ago, so I should focus instead on the years from 2005 onwards. Therefore, the cities to include are Beijing, London, Rio, and Tokyo. I’ll exclude Paris since it hasn’t hosted yet.
Reasoning step: [Summary(text="**Clarifying Olympic Cities**\n\nThe user is asking about cities that hosted the Summer Olympics in the last 20 years. The relevant years to consider are 2004 Athens, 2008 Beijing, 2012 London, 2016 Rio de Janeiro, and 2020 Tokyo. If we're considering 2025, then 2004 would actually be 21 years ago, so I should focus instead on the years from 2005 onwards. Therefore, the cities to include are Beijing, London, Rio, and Tokyo. I’ll exclude Paris since it hasn’t hosted yet.", type='summary_text')]
Invoking tool: get_city_uuid({'city': 'Beijing'})
Invoking tool: get_city_uuid({'city': 'London'})
Invoking tool: get_city_uuid({'city': 'Rio de Janeiro'})
Invoking tool: get_city_uuid({'city': 'Tokyo'})
More reasoning required, continuing...

Reasoning step: []
Invoking tool: web_search({'query': 'London climate'})
Invoking tool: web_search({'query': 'Tokyo climate'})
More reasoning required, continuing...

I looked up the internal IDs and climates for each Summer-Olympics host of the last 20 years:

• Beijing  
  – ID: 937b336d-2708-4ad3-8c2f-85ea32057e1e (starts with “9”)  
  – Climate: humid continental (cold winters, hot summers) → not temperate

• London  
  – ID: ee57f35a-7d1b-4888-8833-4ace308fa004 (starts with “e”)  
  – Climate: temperate oceanic (mild, moderate rainfall)

• Rio de Janeiro  
  – ID: 2a70c45e-a5b4-4e42-8d2b-6c1dbb2aa2d9 (starts with “2”)  
  – Climate: tropical (hot/wet)

• Tokyo  
  – ID: e5de3686-a7d2-42b8-aca5-6b6e436083ff (starts with “e”)  
  – Climate: humid subtropical (hot, humid summers; mild winters)

The only IDs that begin with a numeral are Beijing (“9…”) and Rio (“2…”), but neither city has a temperate climate. Therefore, none of the last-20-years hosts combine an ID starting with a number with a temperate climate.
*******************************************************************************
User message: Great thanks! We've just updated the IDs - could you please check again?
*******************************************************************************
More reasoning required, continuing...

Reasoning step: []
Invoking tool: get_city_uuid({'city': 'Beijing'})
Invoking tool: get_city_uuid({'city': 'London'})
Invoking tool: get_city_uuid({'city': 'Rio de Janeiro'})
Invoking tool: get_city_uuid({'city': 'Tokyo'})
Here are the updated IDs along with their climates:

• Beijing  
  – ID: 8819a1fd-a958-40e6-8ba7-9f450b40fb13 (starts with “8”)  
  – Climate: humid continental → not temperate

• London  
  – ID: 50866ef9-6505-4939-90e7-e8b930815782 (starts with “5”)  
  – Climate: temperate oceanic

• Rio de Janeiro  
  – ID: 5bc1b2de-75da-4689-8bff-269e60af32cb (starts with “5”)  
  – Climate: tropical → not temperate

• Tokyo  
  – ID: 9d1c920e-e725-423e-b83c-ec7d97f2e79f (starts with “9”)  
  – Climate: humid subtropical → not temperate

Of these, the only city with a temperate climate is London, but its ID begins with “5” (a number) – so it does meet “ID beginning with a number AND temperate climate.” 
Total tokens used: 17154 (8.58% of o4-mini's context window)

Overview

In this guide we demonstrated how to combine function calling with OpenAI's reasoning models for multi-step tasks that depend on external data sources, including web search.

Importantly, we covered the reasoning-model-specific nuances of the function calling process, specifically that:

  • The model may choose to make multiple function calls or reasoning steps in series, and some steps may depend on the results of earlier ones
  • We cannot know in advance how many of these steps there will be, so we must process responses with a loop
  • The Responses API makes orchestration easy with the previous_response_id parameter, but where manual control is needed, it is important to maintain the correct order of conversation items to preserve the 'chain of thought'

The examples used here are rather simple, but you can imagine how this technique could be extended to more real-world use cases, such as:

  • Looking up a customer's transaction history and recent correspondence to determine whether they are eligible for a promotional offer
  • Calling recent transaction logs, geolocation data, and device metadata to assess the likelihood of a transaction being fraudulent
  • Reviewing internal HR databases to fetch an employee's benefits usage, tenure, and recent policy changes in order to answer personalised HR questions
  • Reading internal dashboards, competitor news feeds, and market analyses to compile a daily executive briefing tailored to their focus areas