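This guide offers practical advice for AI engineers who want to get the most out of OpenAI's Realtime API, particularly when handling data-intensive function calls. We focus on scenarios common in speech-to-speech agents, where large amounts of data must be handled smoothly and efficiently.
This post does not cover the basics of setting up a Realtime API solution. Instead, it provides clear insights and actionable strategies for improving the performance and reliability of real-time conversational agents, and it addresses the specific challenges of handling large amounts of data in real-time conversation contexts.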
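Before diving in, here is a quick recap of the API for newcomers. The OpenAI Realtime API is a recently introduced service that supports low-latency, multimodal interactions such as speech-to-speech conversations and live transcription. Think of real-time voice-based customer support or live movie subtitles.
Agents need access to tools and relevant data to perform their tasks. For instance, a financial analyst agent might pull in real-time market data. In many cases, services already exist in your environment that expose this information through APIs.
Historically, APIs were not designed with agents in mind and often return large volumes of data, depending on the service. As engineers, we frequently wrap these APIs with function calls to accelerate agent development, which makes perfect sense. Why reinvent the wheel?
Without careful optimization, these data-intensive function calls can quickly overwhelm the Realtime API, leading to slow responses or even a failure to process user requests.
Our examples center on an NBA scouting agent that calls multiple functions to deliver in-depth analysis of upcoming draft prospects. To demonstrate practical guidelines for Realtime API interactions, we use large, realistic payloads inspired by NBA draft prospects. Below is a complete searchDraftProspects function, defined in the real-time session, to set the stage.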
// "Hey, pull up point guards projected in the top 10 in the 2025 draft"
{
"type": "session.update",
"session": {
"tools": [
{
"type": "function",
"name": "searchDraftProspects",
"description": "Search draft prospects for a given year e.g., Point Guard",
"parameters": {
"type": "object",
"properties": {
"position": {
"type": "string",
"description": "The player position",
"enum": [
"Point Guard",
"Shooting Guard",
"Small Forward",
"Power Forward",
"Center",
"Any"
]
},
"year": { "type": "number", "description": "Draft year e.g., 2025" },
"mockDraftRanking": { "type": "number", "description": "Predicted Draft Ranking" }
},
"required": ["position", "year"]
}
}
],
"tool_choice": "auto"
}
}
The searchDraftProspects function call returns a hefty payload. The structure and size of the example come from real-world scenarios we have encountered.
// Example Payload
{
"status": {
"code": 200,
"message": "SUCCESS"
},
"found": 4274,
"offset": 0,
"limit": 10,
"data": [
{
"prospectId": 10001,
"data": {
"ProspectInfo": {
"league": "NCAA",
"collegeId": 301,
"isDraftEligible": true,
"Player": {
"personalDetails": {
"firstName": "Jalen",
"lastName": "Storm",
"dateOfBirth": "2003-01-15",
"nationality": "USA"
},
"physicalAttributes": {
"position": "PG",
"height": {
"feet": 6,
"inches": 4
},
"weightPounds": 205
},
"hometown": {
"city": "Springfield",
"state": "IL"
}
},
"TeamInfo": {
"collegeTeam": "Springfield Tigers",
"conference": "Big West",
"teamRanking": 12,
"coach": {
"coachId": 987,
"coachName": "Marcus Reed",
"experienceYears": 10
}
}
},
"Stats": {
"season": "2025",
"gamesPlayed": 32,
"minutesPerGame": 34.5,
"shooting": {
"FieldGoalPercentage": 47.2,
"ThreePointPercentage": 39.1,
"FreeThrowPercentage": 85.6
},
"averages": {
"points": 21.3,
"rebounds": 4.1,
"assists": 6.8,
"steals": 1.7,
"blocks": 0.3
}
},
"Scouting": {
"evaluations": {
"strengths": ["Court vision", "Clutch shooting"],
"areasForImprovement": ["Defensive consistency"]
},
"scouts": [
{
"scoutId": 501,
"name": "Greg Hamilton",
"organization": "National Scouting Bureau"
}
]
},
"DraftProjection": {
"mockDraftRanking": 5,
"lotteryPickProbability": 88,
"historicalComparisons": [
{
"player": "Chris Paul",
"similarityPercentage": 85
}
]
},
"Media": {
"highlightReelUrl": "https://example.com/highlights/jalen-storm",
"socialMedia": {
"twitter": "@jstorm23",
"instagram": "@jstorm23_ig"
}
},
"Agent": {
"agentName": "Rick Allen",
"agency": "Elite Sports Management",
"contact": {
"email": "rallen@elitesports.com",
"phone": "555-123-4567"
}
}
}
},
// ... Many thousands of tokens later.
]
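}
It almost goes without saying: when building function calls, your top priority is to design clear, well-defined functions. Doing so helps trim response sizes and avoids overwhelming the model. Each function call should be easy to explain, sharply scoped, and return only the information its purpose requires. Overlapping responsibilities between functions inevitably invite confusion.
For example, we can limit the searchDraftProspects function call to return only general details for each prospect, such as player stats, which considerably reduces the response size. If more information is needed, a new getProspectDetails function call provides expanded details. There is no universal solution; the right approach depends on your use case and data model.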
{
"tools": [
{
"type": "function",
"name": "searchDraftProspects",
"description": "Search NBA draft prospects by position, draft year, and projected ranking, returning only general statistics to optimize response size.",
"parameters": {
"type": "object",
"properties": {
"position": {
"type": "string",
"description": "The player's basketball position.",
"enum": [
"Point Guard",
"Shooting Guard",
"Small Forward",
"Power Forward",
"Center",
"Any"
]
},
"year": {
"type": "number",
"description": "Draft year, e.g., 2025"
},
"maxMockDraftRanking": {
"type": "number",
"description": "Maximum predicted draft ranking (e.g., top 10)"
}
},
"required": ["position", "year"]
}
},
{
"type": "function",
"name": "getProspectDetails",
"description": "Fetch detailed information for a specific NBA prospect, including comprehensive stats, agent details, and scouting reports.",
"parameters": {
"type": "object",
"properties": {
"playerName": {
"type": "string",
"description": "Full name of the prospect (e.g., Jalen Storm)"
},
"year": {
"type": "number",
"description": "Draft year, e.g., 2025"
},
"includeAgentInfo": {
"type": "boolean",
"description": "Include agent information"
},
"includeStats": {
"type": "boolean",
"description": "Include detailed player statistics"
},
"includeScoutingReport": {
"type": "boolean",
"description": "Include scouting report details"
}
},
"required": ["playerName", "year"]
}
}
],
"tool_choice": "auto"
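}
Realtime conversations allow sessions of up to 30 minutes, but the rolling context window supports only about 16,000 tokens (the exact limit depends on the model snapshot, and context window limits are improving). As a result, you may notice performance gradually decline during extended exchanges. As the conversation progresses and more function calls are made, the conversation state can expand quickly with both important information and unnecessary noise, so it is important to focus on keeping the most relevant details. This approach helps maintain strong performance and reduces cost.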
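To get a feel for how quickly a payload eats into that window, a rough estimate can be made before a result is added to the conversation. The sketch below is only illustrative: the four-characters-per-token ratio is a coarse assumption rather than a real tokenizer, and `estimateTokens`, `shareOfContext`, and the sample payload are hypothetical names that do not come from the Realtime API.

```javascript
// Rough budget check for a function-call result before it enters the conversation.
// ASSUMPTION: ~4 characters per token is a coarse heuristic, not a real tokenizer.
const CONTEXT_WINDOW_TOKENS = 16000; // approximate figure quoted in this guide
const CHARS_PER_TOKEN = 4; // hypothetical heuristic

// Estimates the token count of a string, or of a value once serialized as JSON.
function estimateTokens(value) {
  const text = typeof value === "string" ? value : JSON.stringify(value);
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Returns the fraction of the context window the value would occupy.
function shareOfContext(value, windowTokens = CONTEXT_WINDOW_TOKENS) {
  return estimateTokens(value) / windowTokens;
}

// Hypothetical payload: ten prospects, each carrying a long free-text field.
const samplePayload = {
  data: Array.from({ length: 10 }, (_, i) => ({
    prospectId: 10001 + i,
    notes: "x".repeat(2000),
  })),
};

const percent = (shareOfContext(samplePayload) * 100).toFixed(0);
console.log(`~${estimateTokens(samplePayload)} tokens, about ${percent}% of the window`);
```

A check like this is mainly useful as an early warning: if a single function result takes a large share of the window, that function is a good candidate for the trimming techniques discussed in this guide.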
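i) Periodically summarize the conversation state
Periodically summarizing the conversation as it unfolds is an excellent way to reduce context size, which cuts both cost and latency.
See @Minhajul's epic guide on implementing automatic summarization in Realtime conversations (link).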
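As a minimal sketch of the idea (not the method from the linked guide), older items can be replaced by a single summary item using the `conversation.item.create` and `conversation.item.delete` client events. The `history` log, the `buildSummaryEvents` helper, the `keepLast` parameter, and the source of `summaryText` are all assumptions made for illustration; `previous_item_id: "root"` is used here on the understanding that it places the item at the start of the conversation.

```javascript
// Builds the client events that replace older conversation items with one summary item.
// ASSUMPTIONS: `history` is a client-side log of { id, text } items that you maintain,
// and `summaryText` comes from a summarizer of your choice (hypothetical, not shown).
function buildSummaryEvents(history, summaryText, keepLast = 2) {
  const cut = Math.max(history.length - keepLast, 0);
  const oldItems = history.slice(0, cut);
  if (oldItems.length === 0) return []; // nothing old enough to summarize

  // One system message that carries the summary, placed at the start of the conversation.
  const createSummary = {
    type: "conversation.item.create",
    previous_item_id: "root",
    item: {
      type: "message",
      role: "system",
      content: [
        { type: "input_text", text: `Summary of the conversation so far: ${summaryText}` },
      ],
    },
  };

  // One delete event per summarized item.
  const deletions = oldItems.map((item) => ({
    type: "conversation.item.delete",
    item_id: item.id,
  }));

  return [createSummary, ...deletions];
}

// Usage with a hypothetical history of four items, keeping the two most recent.
const exampleHistory = [
  { id: "item_1", text: "User asked for top point guards." },
  { id: "item_2", text: "Function result with ten prospects." },
  { id: "item_3", text: "User asked about Jalen Storm." },
  { id: "item_4", text: "Assistant described Jalen Storm." },
];
const summaryEvents = buildSummaryEvents(exampleHistory, "User is reviewing 2025 point guards.");
for (const event of summaryEvents) console.log(JSON.stringify(event));
```

In a live session each event would be sent over the same channel as other client events, for example with `dataChannel.send(JSON.stringify(event))`.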
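ii) Periodically remind the model of its role and responsibilities
Data-heavy payloads can quickly fill the context window. If you notice the model losing track of its instructions or available tools, periodically remind it of its system prompt and tools by calling session.update. This keeps it focused on its role and responsibilities.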
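One way to do this is to resend the prompt and tool list on a fixed cadence. The sketch below assumes a simple turn counter; `buildReminderEvent`, `shouldRemind`, the five-turn interval, and the stand-in channel are hypothetical and chosen only for illustration.

```javascript
// Builds a session.update event that restates the system prompt and the tools.
// ASSUMPTION: `systemPrompt` and `tools` are the same values used when the session was set up.
function buildReminderEvent(systemPrompt, tools) {
  return {
    type: "session.update",
    session: {
      instructions: systemPrompt,
      tools,
      tool_choice: "auto",
    },
  };
}

// Returns true on every `every`-th turn (hypothetical cadence; tune it for your agent).
function shouldRemind(turnCount, every = 5) {
  return turnCount > 0 && turnCount % every === 0;
}

// Usage with a stand-in channel; in a live session this would be the real data channel.
const sentMessages = [];
const fakeChannel = { send: (message) => sentMessages.push(message) };
const scoutPrompt = "You are an NBA scouting assistant."; // placeholder prompt

for (let turn = 1; turn <= 10; turn++) {
  if (shouldRemind(turn)) {
    fakeChannel.send(JSON.stringify(buildReminderEvent(scoutPrompt, [])));
  }
}
console.log(`Reminders sent: ${sentMessages.length}`);
```

A fixed interval is the simplest trigger. Sending the reminder right after an unusually large function result is another option worth testing.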
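i) Use filtering in your function calls to trim data-heavy responses down to only the essential fields needed to answer the question
Generally, the fewer tokens a function call returns, the better the quality of the response. A common pitfall is a function call that returns an oversized payload spanning thousands of tokens. Focus on applying data-level or function-level filters in each function call to minimize response size.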
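A data-level filter of this kind might be sketched as follows, applied to the API response before it is returned to the model. The helpers `getPath`, `pickFields`, and `filterResponse`, and the dot-separated path convention, are assumptions for illustration rather than part of any API.

```javascript
// Reads a nested value by a dot-separated path; returns undefined if any step is missing.
function getPath(obj, path) {
  return path.split(".").reduce((node, key) => (node == null ? undefined : node[key]), obj);
}

// Copies only the requested paths into a new object, keyed by each path's last segment.
// ASSUMPTION: last segments are unique; a later path would overwrite an earlier one otherwise.
function pickFields(record, paths) {
  const out = {};
  for (const path of paths) {
    const value = getPath(record, path);
    if (value !== undefined) out[path.split(".").pop()] = value;
  }
  return out;
}

// Trims a search response to `limit` records, each containing only the wanted fields.
function filterResponse(response, paths, limit = 5) {
  return {
    found: response.found,
    data: response.data.slice(0, limit).map((record) => pickFields(record, paths)),
  };
}

// Usage with a cut-down version of the prospect payload shown earlier.
const rawResponse = {
  found: 4274,
  data: [
    {
      prospectId: 10001,
      data: {
        ProspectInfo: { Player: { personalDetails: { firstName: "Jalen", lastName: "Storm" } } },
        Stats: { averages: { points: 21.3 } },
        Agent: { agentName: "Rick Allen" },
      },
    },
  ],
};
const slimResponse = filterResponse(rawResponse, [
  "prospectId",
  "data.ProspectInfo.Player.personalDetails.firstName",
  "data.ProspectInfo.Player.personalDetails.lastName",
  "data.Stats.averages.points",
]);
console.log(JSON.stringify(slimResponse, null, 2));
```

Where the upstream API accepts field or limit parameters, filtering at the source is usually cheaper than trimming the response afterwards.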
// Filtered response
{
"status": {
"code": 200,
"message": "SUCCESS"
},
"found": 4274,
"offset": 0,
"limit": 5,
"data": [
{
"zpid": 7972122,
"data": {
"PropertyInfo": {
"houseNumber": "19661",
"directionPrefix": "N ",
"streetName": "Central",
"streetSuffix": "Ave",
"city": "Phoenix",
"state": "AZ",
"postalCode": "85024",
"zipPlusFour": "1641",
"bedroomCount": 2,
"bathroomCount": 2,
"storyCount": 1,
"livingAreaSize": 1089,
"livingAreaSizeUnits": "Square Feet",
"yearBuilt": "1985"
}
}
}
]
// ...
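}
ii) Flatten hierarchical payloads without losing key information
Hierarchical payloads from API calls can sometimes include repeated level titles, such as "ProspectInfo" or "Stats", which add noise and make the data harder for the model to process. As you explore ways to make your data more efficient, try flattening these structures by trimming unnecessary labels. This can help improve performance, but consider which information is important to keep for your particular use case.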
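A minimal sketch of such a flattening step is shown here. It assumes leaf key names are mostly unique; the `flatten` helper and its collision rule (prefixing the immediate parent key) are illustrative choices, and arrays are left untouched.

```javascript
// Collapses nested objects into a single level, dropping intermediate labels.
// ASSUMPTION: on a leaf-name collision, the later value is stored under "<parent>_<key>".
// Arrays are copied as-is and are not flattened.
function flatten(record, out = {}, parent = "") {
  for (const [key, value] of Object.entries(record)) {
    const isPlainObject = value !== null && typeof value === "object" && !Array.isArray(value);
    if (isPlainObject) {
      flatten(value, out, key); // descend, remembering the immediate parent label
    } else {
      const name = Object.hasOwn(out, key) ? `${parent}_${key}` : key;
      out[name] = value;
    }
  }
  return out;
}

// Usage with a cut-down version of the prospect record shown earlier.
const nestedProspect = {
  prospectId: 10001,
  data: {
    ProspectInfo: {
      league: "NCAA",
      Player: {
        personalDetails: { firstName: "Jalen", lastName: "Storm" },
        hometown: { city: "Springfield", state: "IL" },
      },
    },
    Stats: { averages: { points: 21.3 } },
    Scouting: { evaluations: { strengths: ["Court vision", "Clutch shooting"] } },
  },
};
console.log(JSON.stringify(flatten(nestedProspect), null, 2));
```

Renaming ambiguous leaves by hand (for example `points` to `averagePoints`) is often worth the extra step, since a label such as `Stats` may have carried meaning that the flattened key no longer conveys.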
// Flattened payload
{
"status": {
"code": 200,
"message": "SUCCESS"
},
"found": 4274,
"offset": 0,
"limit": 2,
"data": [
{
"prospectId": 10001,
"league": "NCAA",
"collegeId": 301,
"isDraftEligible": true,
"firstName": "Jalen",
"lastName": "Storm",
"position": "PG",
"heightFeet": 6,
"heightInches": 4,
"weightPounds": 205,
"hometown": "Springfield",
"state": "IL",
"collegeTeam": "Springfield Tigers",
"conference": "Big West",
"teamRanking": 12,
"coachId": 987,
"coachName": "Marcus Reed",
"gamesPlayed": 32,
"minutesPerGame": 34.5,
"FieldGoalPercentage": 47.2,
"ThreePointPercentage": 39.1,
"FreeThrowPercentage": 85.6,
"averagePoints": 21.3,
"averageRebounds": 4.1,
"averageAssists": 6.8,
"stealsPerGame": 1.7,
"blocksPerGame": 0.3,
"strengths": ["Court vision", "Clutch shooting"],
"areasForImprovement": ["Defensive consistency"],
"mockDraftRanking": 5,
"lotteryPickProbability": 88,
"highlightReelUrl": "https://example.com/highlights/jalen-storm",
"agentName": "Rick Allen",
"agency": "Elite Sports Management",
"contactEmail": "rallen@elitesports.com"
},
// ...
]
}
iii) Try different data formats
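The way you structure your data has a direct impact on how well the model processes and summarizes API responses. In our experience, clear, key-based formats like JSON or YAML help the model interpret data more accurately than tabular formats such as Markdown. Large tables, in particular, tend to overwhelm the model, resulting in less fluent and less accurate outputs. Still, it is worth experimenting with different formats to find what works best for your use case.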
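For quick experiments, the same payload can be serialized in more than one key-based format and compared. The emitter below is a deliberately small sketch for plain JSON-like data. `toYaml` is a hypothetical helper, not a library function, and a maintained YAML library is the better choice for production use because this version does not quote unusual keys or handle multi-line strings specially.

```javascript
// True for objects and arrays that have at least one entry.
function isNonEmptyContainer(value) {
  return value !== null && typeof value === "object" && Object.keys(value).length > 0;
}

// Minimal YAML emitter for plain JSON-like data (objects, arrays, strings, numbers,
// booleans, null). Scalars and empty containers are written in JSON syntax, which
// YAML also accepts. ASSUMPTION: keys are simple identifiers that need no quoting.
function toYaml(value, indent = 0) {
  const pad = "  ".repeat(indent);
  if (Array.isArray(value)) {
    return value
      .map((item) => {
        if (isNonEmptyContainer(item)) {
          // Render the item one level deeper, then replace its leading indent with "- ".
          return `${pad}- ${toYaml(item, indent + 1).trimStart()}`;
        }
        return `${pad}- ${JSON.stringify(item)}`;
      })
      .join("\n");
  }
  if (value !== null && typeof value === "object") {
    return Object.entries(value)
      .map(([key, child]) =>
        isNonEmptyContainer(child)
          ? `${pad}${key}:\n${toYaml(child, indent + 1)}`
          : `${pad}${key}: ${JSON.stringify(child)}`
      )
      .join("\n");
  }
  return `${pad}${JSON.stringify(value)}`;
}

// Usage: serialize the same data both ways and compare the character counts.
const formatSample = {
  status: { code: 200, message: "SUCCESS" },
  data: [{ prospectId: 10001, strengths: ["Court vision", "Clutch shooting"] }],
};
const asJson = JSON.stringify(formatSample, null, 2);
const asYaml = toYaml(formatSample);
console.log(asYaml);
console.log(`JSON: ${asJson.length} chars, YAML: ${asYaml.length} chars`);
```

Character counts are only a proxy. The more meaningful comparison is how accurately and fluently the model answers when given each format.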
status:
  code: 200
  message: "SUCCESS"
found: 4274
offset: 0
limit: 10
data:
  - prospectId: 10001
    data:
      ProspectInfo:
        league: "NCAA"
        collegeId: 301
        isDraftEligible: true
        Player:
          firstName: "Jalen"
          lastName: "Storm"
          position: "PG"
          heightFeet: 6
          heightInches: 4
          weightPounds: 205
          hometown: "Springfield"
          state: "IL"
        TeamInfo:
          collegeTeam: "Springfield Tigers"
          conference: "Big West"
          teamRanking: 12
          coachId: 987
          coachName: "Marcus Reed"
      Stats:
        gamesPlayed: 32
        minutesPerGame: 34.5
        FieldGoalPercentage: 47.2
        ThreePointPercentage: 39.1
        FreeThrowPercentage: 85.6
        averagePoints: 21.3
        averageRebounds: 4.1
        averageAssists: 6.8
        stealsPerGame: 1.7
        blocksPerGame: 0.3
      Scouting:
        strengths:
          - "Court vision"
          - "Clutch shooting"
        areasForImprovement:
          - "Defensive consistency"
      DraftProjection:
        mockDraftRanking: 5
        lotteryPickProbability: 88
      Media:
        highlightReelUrl: "https://example.com/highlights/jalen-storm"
      Agent:
        agentName: "Rick Allen"
        agency: "Elite Sports Management"
        contactEmail: "rallen@elitesports.com"
The underlying models often struggle to transition smoothly from a data-heavy response to an accurate answer. To improve fluency and accuracy when working with complex data, provide a function-call hint immediately after the function call. These hints guide the model on the specific task and teach it how to interpret key fields and domain-specific values.
The following example shows an effective hint prompt.
// Function call hint
let prospectSearchPrompt = `
Parse NBA prospect data and provide a concise, engaging response.
General Guidelines
- Act as an NBA scouting expert.
- Highlight key strengths and notable attributes.
- Use conversational language.
- Mention identical attributes once.
- Ignore IDs and URLs.
Player Details
- State height conversationally ("six-foot-eight").
- Round weights to nearest 5 lbs.
Stats & Draft Info
- Round stats to nearest whole number.
- Use general terms for draft ranking ("top-five pick").
Experience
- Refer to players as freshman, sophomore, etc., or mention professional experience.
Location & Team
- Mention hometown city and state/country.
- Describe teams conversationally.
Skip (unless asked explicitly)
- Exact birth dates
- IDs
- Agent/contact details
- URLs
Examples
- "Jalen Storm, a dynamic six-foot-four point guard from Springfield, Illinois, averages 21 points per game."
- "Known for his clutch shooting, he's projected as a top-five pick."
Important: Respond based strictly on provided data, without inventing details.
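`;
In practice, we first append the function call result to the conversation. Then we emit a response from the Realtime API with the hint prompt. Voilà: the model gracefully handles all the information.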
// Add new conversation item for the model
const conversationItem = {
type: 'conversation.item.create',
previous_item_id: output.id,
item: {
call_id: output.call_id,
type: 'function_call_output',
output: `Draft Prospect Search Results: ${result}`
}
};
dataChannel.send(JSON.stringify(conversationItem));
// Emit a response from the model including the hint prompt
const event = {
type: 'response.create',
conversation: "none",
response: {
instructions: prospectSearchPrompt // function call hint
}
};
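dataChannel.send(JSON.stringify(event));
Building effective agents with the Realtime API is an ongoing process of exploration and adaptation.
Summary of key recommendations
Remember that experimentation is essential. Realtime models keep improving, and we will continue sharing tips to help you get the most out of the Realtime API.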