May 29, 2025

Practical guide to data-intensive apps with the Realtime API


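This guide offers practical advice to help AI engineers get the most out of the OpenAI Realtime API, specifically when dealing with data-intensive function calls. We focus on scenarios common in speech-to-speech agents, where large amounts of data must be handled smoothly and efficiently.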

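This post won't cover the basics of setting up a Realtime API solution. Instead, you'll get clear insights and actionable strategies for improving the performance and reliability of your real-time conversational agents. It addresses the challenges specific to handling large amounts of data in real-time conversations.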

What is the Realtime API?

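Before we dive in, here is a quick recap of the API for new users. The OpenAI Realtime API is a recently launched offering that supports low-latency, multimodal interactions, such as speech-to-speech conversations and live transcription. Picture scenarios like real-time voice-based customer support or live movie subtitles.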

What is a data-intensive function call?

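Agents need access to tools and relevant data to perform their tasks. For instance, a financial analyst agent might pull real-time market data. In many cases, services already exist in your environment that expose this information through APIs.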

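Historically, APIs weren't designed with agents in mind and, depending on the service, often return large amounts of data. As engineers, we frequently wrap these APIs in function calls to speed up agent development, which makes perfect sense. Why reinvent the wheel?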

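Without careful optimization, these data-intensive function calls can quickly overwhelm the Realtime API, leading to slow responses or even failures to handle user requests.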

Setting the stage

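Our example centers on an NBA scouting agent that calls multiple functions to deliver in-depth analysis of upcoming draft prospects. To demonstrate practical guidelines for Realtime API interactions, we use large, realistic payloads inspired by NBA draft prospects. Below, you'll find a monolithic searchDraftProspects function defined in the Realtime session to set the stage.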

// "Hey, pull up point guards projected in the top 10 in the 2025 draft"
{
  "type": "session.update",
  "session": {
    "tools": [
      {
        "type": "function",
        "name": "searchDraftProspects",
        "description": "Search draft prospects for a given year e.g., Point Guard",
        "parameters": {
          "type": "object",
          "properties": {
            "position": {
              "type": "string",
              "description": "The player position",
              "enum": [
                "Point Guard",
                "Shooting Guard",
                "Small Forward",
                "Power Forward",
                "Center",
                "Any"
              ]
            },
            "year": {
              "type": "number",
              "description": "Draft year, e.g., 2025"
            },
            "mockDraftRanking": {
              "type": "number",
              "description": "Predicted draft ranking"
            }
          },
          "required": ["position", "year"]
        }
      }
    ],
    "tool_choice": "auto"
  }
}

The searchDraftProspects function call returns a hefty payload. The example's structure and size are drawn from real-world scenarios we've encountered.

// Example Payload
{
  "status": {
    "code": 200,
    "message": "SUCCESS"
  },
  "found": 4274,
  "offset": 0,
  "limit": 10,
  "data": [
    {
      "prospectId": 10001,
      "data": {
        "ProspectInfo": {
          "league": "NCAA",
          "collegeId": 301,
          "isDraftEligible": true,
          "Player": {
            "personalDetails": {
              "firstName": "Jalen",
              "lastName": "Storm",
              "dateOfBirth": "2003-01-15",
              "nationality": "USA"
            },
            "physicalAttributes": {
              "position": "PG",
              "height": {
                "feet": 6,
                "inches": 4
              },
              "weightPounds": 205
            },
            "hometown": {
              "city": "Springfield",
              "state": "IL"
            }
          },
          "TeamInfo": {
            "collegeTeam": "Springfield Tigers",
            "conference": "Big West",
            "teamRanking": 12,
            "coach": {
              "coachId": 987,
              "coachName": "Marcus Reed",
              "experienceYears": 10
            }
          }
        },
        "Stats": {
          "season": "2025",
          "gamesPlayed": 32,
          "minutesPerGame": 34.5,
          "shooting": {
            "FieldGoalPercentage": 47.2,
            "ThreePointPercentage": 39.1,
            "FreeThrowPercentage": 85.6
          },
          "averages": {
            "points": 21.3,
            "rebounds": 4.1,
            "assists": 6.8,
            "steals": 1.7,
            "blocks": 0.3
          }
        },
        "Scouting": {
          "evaluations": {
            "strengths": ["Court vision", "Clutch shooting"],
            "areasForImprovement": ["Defensive consistency"]
          },
          "scouts": [
            {
              "scoutId": 501,
              "name": "Greg Hamilton",
              "organization": "National Scouting Bureau"
            }
          ]
        },
        "DraftProjection": {
          "mockDraftRanking": 5,
          "lotteryPickProbability": 88,
          "historicalComparisons": [
            {
              "player": "Chris Paul",
              "similarityPercentage": 85
            }
          ]
        },
        "Media": {
          "highlightReelUrl": "https://example.com/highlights/jalen-storm",
          "socialMedia": {
            "twitter": "@jstorm23",
            "instagram": "@jstorm23_ig"
          }
        },
        "Agent": {
          "agentName": "Rick Allen",
          "agency": "Elite Sports Management",
          "contact": {
            "email": "rallen@elitesports.com",
            "phone": "555-123-4567"
          }
        }
      }
    },
    // ... Many thousands of tokens later.
  ]
}
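Before deciding whether a payload like this needs trimming, it helps to measure it. The sketch below is a rough size check that assumes the common rule of thumb of about four characters per token for English text; JSON punctuation, keys, and numbers may tokenize differently, so treat the result as an estimate. `estimateTokens` and `checkPayloadBudget` are hypothetical helper names, and the 2,000-token budget is an illustrative value, not a documented limit.

```javascript
// Rough size check for a function-call result before it goes back to the model.
// Assumption: ~4 characters per token, a rule of thumb for English text; JSON
// punctuation, keys, and numbers may tokenize differently. Use a real tokenizer
// when you need exact counts.
const CHARS_PER_TOKEN = 4;

function estimateTokens(payload) {
  // Serialize the payload the way it would be sent as function_call_output.
  const text = typeof payload === "string" ? payload : JSON.stringify(payload);
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function checkPayloadBudget(payload, budgetTokens) {
  const tokens = estimateTokens(payload);
  return { tokens, withinBudget: tokens <= budgetTokens };
}

// Usage with an illustrative 2,000-token budget for a single tool result.
const sample = { status: { code: 200, message: "SUCCESS" }, found: 4274, data: [] };
console.log(checkPayloadBudget(sample, 2000));
```

Running a full search response through a check like this makes it obvious when a single tool result would eat a large share of the context window.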

1. Break down unwieldy functions into smaller ones with clear roles and responsibilities

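It almost goes without saying: when building function calls, your top priority is to design clear, well-defined functions. This keeps response sizes small and avoids overwhelming the model. Each function call should be easy to explain, tightly scoped, and return only the information needed for its purpose. Overlapping responsibilities between functions inevitably cause confusion.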

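For example, we can limit the searchDraftProspects function call to return only general details about each prospect, such as player stats, which significantly reduces the response size. When more information is needed, a new getProspectDetails function call provides expanded details. There's no one-size-fits-all solution; the right approach depends on your specific use case and data model.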

{
  "tools": [
    {
      "type": "function",
      "name": "searchDraftProspects",
      "description": "Search NBA draft prospects by position, draft year, and projected ranking, returning only general statistics to optimize response size.",
      "parameters": {
        "type": "object",
        "properties": {
          "position": {
            "type": "string",
            "description": "The player's basketball position.",
            "enum": [
              "Point Guard",
              "Shooting Guard",
              "Small Forward",
              "Power Forward",
              "Center",
              "Any"
            ]
          },
          "year": {
            "type": "number",
            "description": "Draft year, e.g., 2025"
          },
          "maxMockDraftRanking": {
            "type": "number",
            "description": "Maximum predicted draft ranking (e.g., top 10)"
          }
        },
        "required": ["position", "year"]
      }
    },
    {
      "type": "function",
      "name": "getProspectDetails",
      "description": "Fetch detailed information for a specific NBA prospect, including comprehensive stats, agent details, and scouting reports.",
      "parameters": {
        "type": "object",
        "properties": {
          "playerName": {
            "type": "string",
            "description": "Full name of the prospect (e.g., Jalen Storm)"
          },
          "year": {
            "type": "number",
            "description": "Draft year, e.g., 2025"
          },
          "includeAgentInfo": {
            "type": "boolean",
            "description": "Include agent information"
          },
          "includeStats": {
            "type": "boolean",
            "description": "Include detailed player statistics"
          },
          "includeScoutingReport": {
            "type": "boolean",
            "description": "Include scouting report details"
          }
        },
        "required": ["playerName", "year"]
      }
    }
  ],
  "tool_choice": "auto"
}
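On the application side, the split can be as simple as two handlers that share a data source but return different amounts of it. This is a minimal sketch: `PROSPECTS` is hypothetical in-memory data standing in for your real prospects API, and `handleFunctionCall` is a hypothetical dispatcher that receives the function name and the JSON-encoded arguments from the model's function call.

```javascript
// Hypothetical in-memory data standing in for the real prospects API.
const PROSPECTS = [
  {
    name: "Jalen Storm",
    year: 2025,
    position: "Point Guard",
    mockDraftRanking: 5,
    stats: { points: 21.3, rebounds: 4.1, assists: 6.8 },
    scouting: { strengths: ["Court vision", "Clutch shooting"] },
    agent: { agentName: "Rick Allen", agency: "Elite Sports Management" },
  },
];

// Broad search: filter by the tool parameters and return only general stats.
function searchDraftProspects({ position, year, maxMockDraftRanking }) {
  const maxRank = maxMockDraftRanking ?? Infinity; // optional parameter
  return PROSPECTS.filter(
    (p) =>
      p.year === year &&
      (position === "Any" || p.position === position) &&
      p.mockDraftRanking <= maxRank
  ).map((p) => ({
    name: p.name,
    position: p.position,
    mockDraftRanking: p.mockDraftRanking,
    stats: p.stats,
  }));
}

// Narrow lookup: one prospect, with optional sections controlled by boolean flags.
function getProspectDetails({ playerName, year, includeAgentInfo, includeStats, includeScoutingReport }) {
  const p = PROSPECTS.find((x) => x.name === playerName && x.year === year);
  if (!p) return { error: `No prospect named ${playerName} in the ${year} class` };
  const details = { name: p.name, position: p.position, mockDraftRanking: p.mockDraftRanking };
  if (includeStats) details.stats = p.stats;
  if (includeScoutingReport) details.scouting = p.scouting;
  if (includeAgentInfo) details.agent = p.agent;
  return details;
}

// Route a function call from the model by name; arguments arrive as a JSON string.
const HANDLERS = { searchDraftProspects, getProspectDetails };

function handleFunctionCall(name, argsJson) {
  const result = Object.hasOwn(HANDLERS, name)
    ? HANDLERS[name](JSON.parse(argsJson))
    : { error: `Unknown function: ${name}` };
  return JSON.stringify(result); // function_call_output expects a string
}
```

Because each handler returns only what its parameters ask for, a broad search stays small, and the model can request agent or scouting details one prospect at a time.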

2. Optimize context as conversations unfold

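Realtime conversations allow sessions of up to 30 minutes, but the rolling context window supports only about 16,000 tokens (the exact limit depends on the model snapshot, and context window limits are steadily improving). As a result, you may notice performance gradually degrade during long exchanges. As the conversation progresses and more function calls occur, the conversation state can quickly bloat with both important information and unnecessary noise, so it's essential to focus on retaining the most relevant details. This helps maintain strong performance and reduces cost.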

i) Periodically summarize the conversation state

Periodically summarizing the conversation as it unfolds is an excellent way to reduce context size, lowering both cost and latency.

Check out @Minhajul's epic guide on implementing automatic summarization in Realtime conversations.
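One way to decide when to summarize is to track a rough token count of the conversation items on the client and, once it passes a budget, replace the oldest items with a single summary. The sketch below assumes you keep a local list of `{ id, text }` items, that the summary text comes from a separate summarization call (not shown), and that the four-characters-per-token heuristic is good enough for a trigger; `planCompaction` and `buildCompactionEvents` are hypothetical names. The events follow the shapes of the Realtime API's `conversation.item.create` and `conversation.item.delete` client events, with `previous_item_id: "root"` placing the summary at the start of the conversation.

```javascript
// Rough heuristic: ~4 characters per token (an assumption, not an exact count).
const approxTokens = (text) => Math.ceil(text.length / 4);

// Decide which items to fold into a summary once the transcript exceeds a budget.
// `items` is a locally tracked list of { id, text } in conversation order.
// The most recent `keepLast` items are always kept verbatim.
function planCompaction(items, budgetTokens, keepLast = 2) {
  const total = items.reduce((sum, item) => sum + approxTokens(item.text), 0);
  if (total <= budgetTokens) return [];
  return items.slice(0, Math.max(0, items.length - keepLast)).map((item) => item.id);
}

// Build client events that insert one summary message at the start of the
// conversation and delete the items it replaces.
function buildCompactionEvents(idsToDrop, summaryText) {
  const summaryEvent = {
    type: "conversation.item.create",
    previous_item_id: "root", // insert at the beginning of the conversation
    item: {
      type: "message",
      role: "system",
      content: [{ type: "input_text", text: `Conversation summary so far: ${summaryText}` }],
    },
  };
  const deleteEvents = idsToDrop.map((id) => ({ type: "conversation.item.delete", item_id: id }));
  return [summaryEvent, ...deleteEvents];
}

// Usage (hypothetical names): when planCompaction returns ids, summarize those
// items with a separate model call, then send each event over the data channel:
// for (const e of buildCompactionEvents(ids, summary)) dataChannel.send(JSON.stringify(e));
```

Keeping the last few turns verbatim preserves the immediate context the model needs to answer the current question, while older, data-heavy tool outputs collapse into a short summary.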

ii) Periodically remind the model of its role and responsibilities

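Data-heavy payloads can quickly fill the context window. If you notice the model drifting from its instructions or losing track of its available tools, periodically remind it of the system prompt and tools by calling session.update. This helps keep it focused on its role and responsibilities.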
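A reminder can be a small counter that re-sends the same `session.update` every few turns. This sketch assumes you call the returned function once per `response.done` server event and that `instructions` and `tools` are the exact values from your initial `session.update`; we assume the fields it sets are replaced rather than merged, so sending a shorter tool list would drop tools. `createReminder` and the five-turn default are illustrative choices, not values from the Realtime API docs.

```javascript
// Returns a callback to invoke once per "response.done" server event.
// Every `everyNTurns` calls it returns a session.update event that re-sends the
// original instructions and tools; otherwise it returns null.
function createReminder(instructions, tools, everyNTurns = 5) {
  let turns = 0;
  return function onResponseDone() {
    turns += 1;
    if (turns % everyNTurns !== 0) return null;
    return {
      type: "session.update",
      session: { instructions, tools, tool_choice: "auto" },
    };
  };
}

// Usage (hypothetical names): keep the same instructions and tools you sent initially.
// const remind = createReminder(systemPrompt, tools, 5);
// On each "response.done" server event:
//   const event = remind();
//   if (event) dataChannel.send(JSON.stringify(event));
```

A fixed interval is the simplest trigger; you could instead send the reminder only after a data-heavy function call, when drift is most likely.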

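3. Process and optimize data

i) Use filtering in your function calls to trim data-heavy responses down to only the fields needed to answer the question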

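As a general rule, the fewer tokens a function call returns, the better the quality of the response. A common pitfall is function calls that return overly large payloads spanning thousands of tokens. Focus on applying data-level or function-level filters in each function call to minimize response size.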

// Filtered response
{
  "status": {
    "code": 200,
    "message": "SUCCESS"
  },
  "found": 4274,
  "offset": 0,
  "limit": 5,
  "data": [
    {
      "prospectId": 10001,
      "data": {
        "ProspectInfo": {
          "Player": {
            "personalDetails": {
              "firstName": "Jalen",
              "lastName": "Storm"
            },
            "physicalAttributes": {
              "position": "PG",
              "height": {
                "feet": 6,
                "inches": 4
              },
              "weightPounds": 205
            }
          },
          "TeamInfo": {
            "collegeTeam": "Springfield Tigers"
          }
        },
        "Stats": {
          "averages": {
            "points": 21.3,
            "rebounds": 4.1,
            "assists": 6.8
          }
        },
        "DraftProjection": {
          "mockDraftRanking": 5
        }
      }
    }
    // ...
  ]
}
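A field filter like this can run on the raw API response before it is returned from the function call. This is a minimal sketch of a path-based filter: `pickFields` and `applySelection` are hypothetical helpers, paths are dot-separated, and any array along a path is filtered element by element, so `data.prospectId` keeps `prospectId` in every prospect.

```javascript
// Keep only the listed dot-separated paths from a nested API response.
// Arrays along a path are transparent: the rest of the path applies to each element.
function pickFields(value, paths) {
  // Turn ["a.b", "a.c"] into a selection tree { a: { b: {}, c: {} } }.
  const tree = {};
  for (const path of paths) {
    let node = tree;
    for (const key of path.split(".")) {
      if (!Object.hasOwn(node, key)) node[key] = {};
      node = node[key];
    }
  }
  return applySelection(value, tree);
}

function applySelection(value, tree) {
  if (Object.keys(tree).length === 0) return value; // end of a path: keep the whole value
  if (Array.isArray(value)) return value.map((item) => applySelection(item, tree));
  if (value === null || typeof value !== "object") return value; // path is deeper than the data
  const out = {};
  for (const [key, subtree] of Object.entries(tree)) {
    if (Object.hasOwn(value, key)) out[key] = applySelection(value[key], subtree);
  }
  return out;
}
```

For example, `pickFields(response, ["found", "data.prospectId", "data.data.DraftProjection.mockDraftRanking"])` keeps the result count plus each prospect's ID and ranking and drops everything else, including agent contact details.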

ii) Flatten hierarchical payloads without losing key information

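Hierarchical payloads from API calls sometimes include repeated level labels, such as "ProspectInfo" or "Stats", which add noise and make the data harder for the model to process. As you look for ways to make your data more efficient, try flattening these structures by removing some of the unnecessary labels. This can improve performance, but consider which key information you need to keep for your specific use case.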

// Flattened payload
{
  "status": {
    "code": 200,
    "message": "SUCCESS"
  },
  "found": 4274,
  "offset": 0,
  "limit": 2,
  "data": [
    {
      "prospectId": 10001,
      "league": "NCAA",
      "collegeId": 301,
      "isDraftEligible": true,
      "firstName": "Jalen",
      "lastName": "Storm",
      "position": "PG",
      "heightFeet": 6,
      "heightInches": 4,
      "weightPounds": 205,
      "hometown": "Springfield",
      "state": "IL",
      "collegeTeam": "Springfield Tigers",
      "conference": "Big West",
      "teamRanking": 12,
      "coachId": 987,
      "coachName": "Marcus Reed",
      "gamesPlayed": 32,
      "minutesPerGame": 34.5,
      "FieldGoalPercentage": 47.2,
      "ThreePointPercentage": 39.1,
      "FreeThrowPercentage": 85.6,
      "averagePoints": 21.3,
      "averageRebounds": 4.1,
      "averageAssists": 6.8,
      "stealsPerGame": 1.7,
      "blocksPerGame": 0.3,
      "strengths": ["Court vision", "Clutch shooting"],
      "areasForImprovement": ["Defensive consistency"],
      "mockDraftRanking": 5,
      "lotteryPickProbability": 88,
      "highlightReelUrl": "https://example.com/highlights/jalen-storm",
      "agentName": "Rick Allen",
      "agency": "Elite Sports Management",
      "contactEmail": "rallen@elitesports.com"
    },
    // ...
  ]
}
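Flattening can also be automated. The sketch below walks a nested record, skips the wrapper labels you list explicitly, and camel-joins whatever path remains, so `height.feet` becomes `heightFeet`. `flattenPayload`, `camelJoin`, and the wrapper list are hypothetical; the function throws on a key collision rather than overwriting a field, which keeps the "without losing key information" rule enforceable.

```javascript
// Flatten a nested record into a single level, dropping wrapper labels that add
// no meaning (e.g. "ProspectInfo", "Stats") and camel-joining the remaining path.
// Arrays are kept as-is. Throws on key collisions so no field is silently lost.
function flattenPayload(obj, dropLabels = new Set(), path = [], out = {}) {
  for (const [key, value] of Object.entries(obj)) {
    const isObject = value !== null && typeof value === "object" && !Array.isArray(value);
    // Wrapper labels are skipped only when they wrap an object.
    const nextPath = isObject && dropLabels.has(key) ? path : [...path, key];
    if (isObject) {
      flattenPayload(value, dropLabels, nextPath, out);
    } else {
      const name = camelJoin(nextPath);
      if (Object.hasOwn(out, name)) {
        throw new Error(`Flattening would overwrite "${name}"; keep a distinguishing label`);
      }
      out[name] = value;
    }
  }
  return out;
}

// ["height", "feet"] -> "heightFeet"
function camelJoin(segments) {
  return segments
    .map((s, i) => (i === 0 ? s : s.charAt(0).toUpperCase() + s.slice(1)))
    .join("");
}
```

Apply it per record, for example `response.data.map((p) => flattenPayload(p, wrappers))`, and choose the wrapper set per API: a label that is noise in one payload can be the only thing distinguishing two fields in another.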

iii) Experiment with different data formats

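How you structure data directly affects how well the model processes and summarizes API responses. In our experience, clear key-value formats such as JSON or YAML help the model interpret data more accurately than tabular formats such as Markdown tables. Large tables in particular tend to overwhelm the model, resulting in less fluent and less accurate output. Still, it's worth experimenting with different formats to find what works best for your use case.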

status:
  code: 200
  message: "SUCCESS"
found: 4274
offset: 0
limit: 10
data:
  - prospectId: 10001
    data:
      ProspectInfo:
        league: "NCAA"
        collegeId: 301
        isDraftEligible: true
        Player:
          firstName: "Jalen"
          lastName: "Storm"
          position: "PG"
          heightFeet: 6
          heightInches: 4
          weightPounds: 205
          hometown: "Springfield"
          state: "IL"
        TeamInfo:
          collegeTeam: "Springfield Tigers"
          conference: "Big West"
          teamRanking: 12
          coachId: 987
          coachName: "Marcus Reed"
      Stats:
        gamesPlayed: 32
        minutesPerGame: 34.5
        FieldGoalPercentage: 47.2
        ThreePointPercentage: 39.1
        FreeThrowPercentage: 85.6
        averagePoints: 21.3
        averageRebounds: 4.1
        averageAssists: 6.8
        stealsPerGame: 1.7
        blocksPerGame: 0.3
      Scouting:
        strengths:
          - "Court vision"
          - "Clutch shooting"
        areasForImprovement:
          - "Defensive consistency"
      DraftProjection:
        mockDraftRanking: 5
        lotteryPickProbability: 88
      Media:
        highlightReelUrl: "https://example.com/highlights/jalen-storm"
      Agent:
        agentName: "Rick Allen"
        agency: "Elite Sports Management"
        contactEmail: "rallen@elitesports.com"
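To compare formats without adding a dependency, a small serializer is enough for plain API data. This is a minimal sketch, not a full YAML emitter: `toYaml` and its helpers are hypothetical names, it assumes keys are plain identifiers that need no quoting, and it emits strings as JSON-style double-quoted scalars, which YAML accepts. For production use, a maintained YAML library is the safer choice.

```javascript
// Minimal JSON-to-YAML serializer for plain data (objects, arrays, strings,
// numbers, booleans, null). Assumes keys need no quoting.
function yamlScalar(v) {
  // A JSON double-quoted string is also a valid YAML double-quoted string.
  if (typeof v === "string") return JSON.stringify(v);
  return String(v); // numbers, booleans, null
}

const isContainer = (v) => v !== null && typeof v === "object";
const isEmpty = (v) => (Array.isArray(v) ? v.length === 0 : Object.keys(v).length === 0);

// Leaves are scalars or empty containers, written inline.
function formatLeaf(v) {
  if (isContainer(v)) return Array.isArray(v) ? "[]" : "{}";
  return yamlScalar(v);
}

function toYaml(value, indent = 0) {
  const pad = "  ".repeat(indent);
  const lines = [];
  if (Array.isArray(value)) {
    for (const item of value) {
      if (isContainer(item) && !isEmpty(item)) {
        // Nested block: move its first line up onto the "- " marker.
        const nested = toYaml(item, indent + 1).split("\n");
        lines.push(`${pad}- ${nested[0].trimStart()}`, ...nested.slice(1));
      } else {
        lines.push(`${pad}- ${formatLeaf(item)}`);
      }
    }
  } else {
    for (const [key, v] of Object.entries(value)) {
      if (isContainer(v) && !isEmpty(v)) {
        lines.push(`${pad}${key}:`, toYaml(v, indent + 1));
      } else {
        lines.push(`${pad}${key}: ${formatLeaf(v)}`);
      }
    }
  }
  return lines.join("\n");
}
```

Serializing the same filtered result as JSON and as YAML, then running both through a few test conversations, is a cheap way to see which format your agent handles better.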

4. Follow up with hint prompts after data-heavy function calls

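The underlying models often struggle to transition smoothly from a data-heavy response to an accurate answer. To improve fluency and accuracy when working with complex data, provide a function call hint immediately after the function call. These hints guide the model on the specific task, teaching it how to interpret key fields and domain-specific values.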

The following example shows an effective hint prompt.

// Function call hint
let prospectSearchPrompt = `
Parse NBA prospect data and provide a concise, engaging response.
 
General Guidelines
- Act as an NBA scouting expert.
- Highlight key strengths and notable attributes.
- Use conversational language.
- Mention identical attributes once.
- Ignore IDs and URLs.
 
Player Details
- State height conversationally ("six-foot-eight").
- Round weights to nearest 5 lbs.
 
Stats & Draft Info
- Round stats to nearest whole number.
- Use general terms for draft ranking ("top-five pick").

Experience
- Refer to players as freshman, sophomore, etc., or mention professional experience.

Location & Team
- Mention hometown city and state/country.
- Describe teams conversationally.
 
Skip (unless asked explicitly)
- Exact birth dates
- IDs
- Agent/contact details
- URLs
 
Examples
- "Jalen Storm, a dynamic six-foot-four point guard from Springfield, Illinois, averages 21 points per game."
- "Known for his clutch shooting, he's projected as a top-five pick."
 
Important: Respond based strictly on provided data, without inventing details.
`;

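In practice, we first append the function call result to the conversation. Then we emit a response from the Realtime API with the hint prompt. Voilà: the model handles all the information gracefully.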

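// Append the function call result to the conversation.
// Assumes `output` is the function_call item from the model's response,
// `result` is the function's return value serialized as a string, and
// `dataChannel` is the open WebRTC data channel for the Realtime session.
const conversationItem = {
  type: 'conversation.item.create',
  previous_item_id: output.id,
  item: {
    call_id: output.call_id,
    type: 'function_call_output',
    output: `Draft Prospect Search Results: ${result}`
  }
};

dataChannel.send(JSON.stringify(conversationItem));

// Ask the model for a response, passing the hint prompt as this response's instructions.
// conversation: "none" belongs inside `response`; it generates the response
// outside the default conversation.
const event = {
  type: 'response.create',
  response: {
    conversation: "none",
    instructions: prospectSearchPrompt // function call hint
  }
};

dataChannel.send(JSON.stringify(event));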
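With more than one data-heavy tool, the same pattern generalizes to a lookup of hint prompts keyed by function name. In this sketch, `HINTS` and `buildFollowUpEvents` are hypothetical; `functionCall` is assumed to be the `function_call` item from the model's response (with `id`, `call_id`, and `name`). It serializes non-string results because `output` on a `function_call_output` item is a string. As we understand the API, `instructions` on `response.create` apply to that one response in place of the session instructions, so a hint may need to restate any persona rules you rely on.

```javascript
// Hypothetical hint prompts keyed by function name; the prospectSearchPrompt
// above would be the searchDraftProspects entry.
const HINTS = {
  searchDraftProspects: "Parse NBA prospect data and provide a concise, engaging response.",
  getProspectDetails: "Summarize this prospect conversationally. Skip IDs, URLs, and contact details.",
};

// Build the client events to send after a function call finishes.
// `functionCall` is the function_call item from the model's response output.
function buildFollowUpEvents(functionCall, result, hints = HINTS) {
  const outputEvent = {
    type: "conversation.item.create",
    previous_item_id: functionCall.id,
    item: {
      type: "function_call_output",
      call_id: functionCall.call_id,
      // The output field is a string, so serialize structured results explicitly.
      output: typeof result === "string" ? result : JSON.stringify(result),
    },
  };
  const hint = Object.hasOwn(hints, functionCall.name) ? hints[functionCall.name] : null;
  // With a hint, it becomes this response's instructions; without one, the
  // response uses the session instructions. Add conversation: "none" inside
  // `response` to mirror the example above.
  const responseEvent = hint
    ? { type: "response.create", response: { instructions: hint } }
    : { type: "response.create" };
  return [outputEvent, responseEvent];
}

// Usage (hypothetical names):
// for (const e of buildFollowUpEvents(functionCallItem, result)) dataChannel.send(JSON.stringify(e));
```

Keeping hints in one table makes it easy to tune each tool's guidance independently as you test with real payloads.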

Conclusion

Building efficient agents with the Realtime API is an ongoing process of exploration and adaptation.

Summary of key recommendations

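  • Filter data: Include only the fields and details directly relevant to the user's request or the model's next step. Trim the rest.
  • Flatten and simplify structures: Reduce deeply nested or redundant data. Present information in a way that both models and humans can scan easily.
  • Prefer clear, structured formats: Use JSON (or YAML) with consistent field names and minimal noise. Avoid large tables or Markdown formatting for data-heavy responses.
  • Guide the model with hint prompts: After returning a lot of data, use a targeted prompt that states explicitly what the model should extract or summarize.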

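Remember: experimentation is essential. Realtime models keep improving, and we'll continue sharing tips to help you get the most out of the Realtime API.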