May 10, 2025

Context Summarization with the Realtime API

1. Overview

Build an end-to-end voice bot that listens to your microphone in real time, responds instantly, and summarises long conversations so quality stays consistent.

What you'll learn

  1. Real-time microphone streaming → the OpenAI Realtime (speech-to-speech) endpoint.
  2. Instant transcription and audio playback on every turn.
  3. A conversation-state container that stores all user/assistant messages.
  4. Automatic "context trimming" – when the token window grows very large (configurable), earlier turns are compressed into a summary.
  5. An extensible design you can adapt for customer-service bots, kiosks, or multilingual assistants.

Prerequisites

Requirement | Details
Python ≥ 3.10 | Ensures you won't run into compatibility issues
OpenAI API key | Set OPENAI_API_KEY in your shell or paste it inline (not ideal for prod)
Microphone + speakers | Grant OS permission if prompted

Need help setting up a key?

Follow the official quickstart guide.

Notes:

  1. GPT-4o-Realtime supports a 128k-token context window, but in some use cases you may notice performance degrade as you fill the window with more tokens.
  2. Token window = all tokens (text and audio) the model currently holds in memory for the session.

One-line install (run in a fresh cell)

# Run once to install or upgrade dependencies (comment out if already installed)
# !pip install --upgrade openai websockets sounddevice simpleaudio
# Standard library imports
import os
import sys
import io
import json
import base64
import pathlib
import wave
import asyncio
from dataclasses import dataclass, field
from typing import List, Literal

# Third-party imports
import numpy as np
import sounddevice as sd         # microphone capture
import simpleaudio               # speaker playback
import websockets                # WebSocket client
import openai                    # OpenAI Python SDK >= 1.14.0
# Set your API key safely
openai.api_key = os.getenv("OPENAI_API_KEY", "")
if not openai.api_key:
    raise ValueError("OPENAI_API_KEY not found – please set env var or edit this cell.")

2. Token usage – text vs. speech

A large token window is precious: every extra token adds latency and cost.
With audio, the input token window grows far faster than with plain text, because amplitude, timing, and other acoustic detail must be represented.

In practice you'll often see roughly 10× as many audio tokens as text tokens for the same sentence.

  • GPT-4o Realtime supports up to 128k tokens, but instruction-following can degrade as the token count grows.
  • Every user/assistant interaction consumes tokens → the window only ever grows.
  • Strategy: summarise earlier turns into a single assistant message, keep the last few turns verbatim, and carry on.
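As a toy, API-independent sketch of this strategy (the name `split_for_summary` and the string turns are illustrative only), the policy is simply a split of the history:

```python
# Toy sketch of the context-trimming policy: split history into turns to
# compress and turns to keep verbatim. Names and data here are illustrative.

def split_for_summary(history: list[str], keep_last: int = 2) -> tuple[list[str], list[str]]:
    """Return (turns_to_summarise, turns_to_keep_verbatim)."""
    if len(history) <= keep_last:
        return [], history          # Nothing old enough to compress yet
    return history[:-keep_last], history[-keep_last:]

old, recent = split_for_summary(["t1", "t2", "t3", "t4"], keep_last=2)
print(old, recent)   # ['t1', 't2'] ['t3', 't4']
```

The real implementation in section 3.3 applies exactly this split to `state.history` before calling the summariser.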

3. Helper functions

The helpers below give us everything we need to run the full script.

3.1 Conversation state

Unlike HTTP-based Chat Completions, the Realtime API maintains an open, stateful session built around two key components:

Component | Purpose
Session | Controls global settings: model, voice, turn-taking mode, voice-activity detection, and so on.
Conversation | Stores the turn-by-turn messages between user and assistant, both audio and text.

This notebook wraps these in a simple ConversationState object to keep the logic clear, track history, and manage summarisation when the context window fills up.

@dataclass
class Turn:
    """One utterance in the dialogue (user **or** assistant)."""
    role: Literal["user", "assistant"]
    item_id: str                    # Server‑assigned identifier
    text: str | None = None         # Filled once transcript is ready

@dataclass
class ConversationState:
    """All mutable data the session needs — nothing more, nothing less."""
    history: List[Turn] = field(default_factory=list)         # Ordered log
    waiting: dict[str, asyncio.Future] = field(default_factory=dict)  # Pending transcript fetches
    summary_count: int = 0

    latest_tokens: int = 0          # Window size after last reply
    summarising: bool = False       # Guard so we don’t run two summaries at once

A small helper to inspect the transcript at a glance:

def print_history(state) -> None:
    """Pretty-print the running transcript so far."""
    print("—— Conversation so far ———————————————")
    for turn in state.history:
        text_preview = (turn.text or "").strip().replace("\n", " ")
        print(f"[{turn.role:<9}] {text_preview}  ({turn.item_id})")
    print("——————————————————————————————————————————")

3.2 · Audio streaming

We'll stream raw PCM-16 microphone data straight into the Realtime API.

The pipeline is: microphone ─► async queue ─► WebSocket ─► Realtime API

3.2.1 Capturing microphone input

We start with a coroutine that:

  • Opens the default microphone at 24 kHz, mono, PCM-16 (one of the formats Realtime supports).
  • Slices the stream into ≈ 40 ms chunks.
  • Dumps each chunk into an asyncio.Queue so another task (next section) can forward it to OpenAI.
async def mic_to_queue(pcm_queue: asyncio.Queue[bytes]) -> None:
    """
    Capture raw PCM‑16 microphone audio and push ~CHUNK_DURATION_MS chunks
    to *pcm_queue* until the surrounding task is cancelled.

    Parameters
    ----------
    pcm_queue : asyncio.Queue[bytes]
        Destination queue for PCM‑16 frames (little‑endian int16).
    """
    blocksize = int(SAMPLE_RATE_HZ * CHUNK_DURATION_MS / 1000)

    def _callback(indata, _frames, _time, status):
        if status:                               # XRuns, device changes, etc.
            print("⚠️", status, file=sys.stderr)
        try:
            pcm_queue.put_nowait(bytes(indata))  # 1‑shot enqueue
        except asyncio.QueueFull:
            # Drop frame if upstream (WebSocket) can’t keep up.
            pass

    # RawInputStream is synchronous; wrap in context manager to auto‑close.
    with sd.RawInputStream(
        samplerate=SAMPLE_RATE_HZ,
        blocksize=blocksize,
        dtype="int16",
        channels=1,
        callback=_callback,
    ):
        try:
            # Keep coroutine alive until cancelled by caller.
            await asyncio.Event().wait()
        finally:
            print("⏹️  Mic stream closed.")

3.2.2 Sending audio chunks to the API

Our microphone task now fills an asyncio.Queue with raw PCM-16 chunks.
Next step: pull chunks off the queue, base-64 encode them (the protocol requires JSON-safe text), and send each one to the Realtime WebSocket as an input_audio_buffer.append event.

# Helper function to encode audio chunks in base64
b64 = lambda blob: base64.b64encode(blob).decode()

async def queue_to_websocket(pcm_queue: asyncio.Queue[bytes], ws):
    """Read audio chunks from queue and send as JSON events."""
    try:
        while (chunk := await pcm_queue.get()) is not None:
            await ws.send(json.dumps({
                "type": "input_audio_buffer.append",
                "audio": b64(chunk),
            }))
    except websockets.ConnectionClosed:
        print("WebSocket closed – stopping uploader")
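To see the wire format concretely, here's what a single `input_audio_buffer.append` event looks like for a dummy 20 ms chunk of silence (the chunk contents are illustrative; on the wire the audio bytes come from the microphone):

```python
import base64
import json

# 20 ms of silence at 24 kHz mono PCM-16 → 480 samples × 2 bytes each
chunk = b"\x00\x00" * 480

event = json.dumps({
    "type": "input_audio_buffer.append",
    "audio": base64.b64encode(chunk).decode(),
})

# The payload is JSON-safe text; decoding recovers the original bytes exactly
decoded = base64.b64decode(json.loads(event)["audio"])
print(len(chunk), "bytes →", len(json.loads(event)["audio"]), "base64 chars")
# 960 bytes → 1280 base64 chars
```

Base-64 inflates the payload by about a third (4 output characters per 3 input bytes), which is part of why audio turns consume tokens so much faster than text.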

3.2.3 Handling incoming events

Once audio reaches the server, the Realtime API pushes a stream of JSON events back over the same WebSocket.
Understanding these events is essential for:

  • Printing transcripts in real time
  • Playing incremental audio back to the user
  • Keeping an accurate Conversation State for later context trimming

Event type | When it fires | Why it matters | Typical handling
session.created | Immediately after the WebSocket handshake | Confirms the session is open and provides the session.id. | Log the ID for traceability and verify the connection.
session.updated | After you send a session.update call | Acknowledges that the server applied new session settings. | Inspect the echoed settings and update any local cache.
conversation.item.created (user) | A few ms after the user stops speaking (client VAD fires) | Reserves a timeline slot; transcript may still be null. | Insert a placeholder user turn in state.history marked "pending transcript".
conversation.item.retrieved | ~100 – 300 ms later, once audio transcription is complete | Supplies the final user transcript (with timing). | Replace the placeholder with the transcript and print it if desired.
response.audio.delta | Every 20 – 60 ms while the assistant is speaking | Streams PCM-16 audio chunks (and optional incremental text). | Buffer each chunk and play it; optionally show partial text in the console.
response.done | After the assistant's last token | Signals both audio & text are complete; includes usage stats. | Finalize the assistant turn, update state.latest_tokens, and log usage.
conversation.item.deleted | Whenever you prune with conversation.item.delete | Confirms a turn was removed, freeing tokens on the server. | Mirror the deletion locally so your context window matches the server's.
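One common way to organise a main loop around this table is a small dispatch map from event type to handler. This is a sketch with two illustrative handlers; the notebook's actual loop (section 4) uses an if/elif chain instead, but the structure is equivalent:

```python
import json

# Illustrative handlers: in the real loop these mutate ConversationState
def on_session_created(evt):
    return f"session {evt.get('session', {}).get('id', '?')} open"

def on_response_done(evt):
    return f"window ≈{evt['response']['usage']['total_tokens']} tokens"

HANDLERS = {
    "session.created": on_session_created,
    "response.done": on_response_done,
}

def dispatch(raw: str):
    evt = json.loads(raw)
    handler = HANDLERS.get(evt["type"])
    return handler(evt) if handler else None   # Silently ignore unhandled types

print(dispatch('{"type": "response.done", "response": {"usage": {"total_tokens": 979}}}'))
# window ≈979 tokens
```

A dict-based dispatcher scales better as you handle more event types, and unhandled events fall through harmlessly, which matters because the server sends many event types you may not care about.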

3.3 Detecting when to summarise

Realtime models keep a huge 128k-token window, but output quality can degrade well before that limit as you feed the model more context.

Our goal: when the running window approaches a safe threshold (the notebook defaults to 2,000 tokens), summarise automatically, then prune the superseded turns both locally and server-side.

We watch the latest_tokens value reported by response.done. When it exceeds SUMMARY_TRIGGER and we hold more than KEEP_LAST_TURNS turns, we launch a background summarisation coroutine.

We compress everything except the last two turns into a single French paragraph, then:

  1. Insert that paragraph at the top of the conversation as a new assistant message.

  2. Delete the message items that went into the summary.

Later we'll ask the voice agent which language the summary is in, as a test that inserting the summary into the Realtime conversation context actually worked.
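The trigger condition described above can be captured in a small predicate (a sketch mirroring the check in section 4's main loop; the function name is illustrative):

```python
SUMMARY_TRIGGER = 2_000   # Summarise when the window reaches this size
KEEP_LAST_TURNS = 2       # Always keep this many turns verbatim

def should_summarise(latest_tokens: int, n_turns: int, summarising: bool) -> bool:
    """Fire only when the window is big enough, there is something to prune,
    and no summary task is already running."""
    return (latest_tokens >= SUMMARY_TRIGGER
            and n_turns > KEEP_LAST_TURNS
            and not summarising)

# Values from the demo run below: a 2755-token window with 4 turns in history
print(should_summarise(2755, 4, False))   # True
print(should_summarise(979, 2, False))    # False – window still small enough
```

The `summarising` guard is what lets the summary run in a background task without a second summary being launched while the first is in flight.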

async def run_summary_llm(text: str) -> str:
    """Call a lightweight model to summarise `text`."""
    resp = await asyncio.to_thread(lambda: openai.chat.completions.create(
        model=SUMMARY_MODEL,
        temperature=0,
        messages=[
            {"role": "system", "content": "Summarise in French the following conversation "
                            "in one concise paragraph so it can be used as "
                            "context for future dialogue."},
            {"role": "user", "content": text},
        ],
    ))
    return resp.choices[0].message.content.strip()
async def summarise_and_prune(ws, state):
    """Summarise old turns, delete them server‑side, and prepend a single summary
    turn locally + remotely."""
    state.summarising = True
    print(
        f"⚠️  Token window ≈{state.latest_tokens} ≥ {SUMMARY_TRIGGER}. Summarising…",
    )
    old_turns, recent_turns = state.history[:-KEEP_LAST_TURNS], state.history[-KEEP_LAST_TURNS:]
    convo_text = "\n".join(f"{t.role}: {t.text}" for t in old_turns if t.text)
    
    if not convo_text:
        print("Nothing to summarise (transcripts still pending).")
        state.summarising = False
        return

    summary_text = await run_summary_llm(convo_text)
    state.summary_count += 1
    summary_id = f"sum_{state.summary_count:03d}"
    state.history[:] = [Turn("assistant", summary_id, summary_text)] + recent_turns
    
    print_history(state)    

    # Create summary on server
    await ws.send(json.dumps({
        "type": "conversation.item.create",
        "previous_item_id": "root",
        "item": {
            "id": summary_id,
            "type": "message",
            "role": "assistant",
            "content": [{"type": "text", "text": summary_text}],
        },
    }))

    # Delete old items
    for turn in old_turns:
        await ws.send(json.dumps({
            "type": "conversation.item.delete",
            "item_id": turn.item_id,
        }))

    print(f"✅ Summary inserted ({summary_id})")
    
    state.summarising = False

The following helper lets us poll for a transcript. This is useful when user audio wasn't transcribed right away, so we can fetch the final result later.

async def fetch_full_item(
    ws, item_id: str, state: ConversationState, attempts: int = 1
):
    """
    Ask the server for a full conversation item; retry up to 5× if the
    transcript field is still null.  Resolve the waiting future when done.
    """
    # If there is already a pending fetch, just await it
    if item_id in state.waiting:
        return await state.waiting[item_id]

    fut = asyncio.get_running_loop().create_future()
    state.waiting[item_id] = fut

    await ws.send(json.dumps({
        "type": "conversation.item.retrieve",
        "item_id": item_id,
    }))
    item = await fut

    # If transcript still missing retry (max 5×)
    if attempts < 5 and not item.get("content", [{}])[0].get("transcript"):
        await asyncio.sleep(0.4 * attempts)
        return await fetch_full_item(ws, item_id, state, attempts + 1)

    # Done – remove the marker
    state.waiting.pop(item_id, None)
    return item
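The retry loop above sleeps `0.4 * attempts` seconds between tries, so the worst-case extra wait under the 5-attempt cap is easy to bound (sleeps occur after attempts 1–4; the fifth attempt returns whatever it has):

```python
MAX_ATTEMPTS = 5
DELAY_FACTOR = 0.4   # seconds, matches the `0.4 * attempts` sleep above

# Rounding keeps the floats tidy for display; the schedule is linear back-off
waits = [round(DELAY_FACTOR * a, 1) for a in range(1, MAX_ATTEMPTS)]
total = round(sum(waits), 1)
print(waits, "→ total", total, "s")   # [0.4, 0.8, 1.2, 1.6] → total 4.0 s
```

So a transcript that never materialises costs at most about four seconds of polling before `fetch_full_item` gives up and returns the item as-is.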

4. End-to-end workflow demo

Run the two cells below to start an interactive session. Interrupt the cell to stop recording.

Note:
This notebook uses SUMMARY_TRIGGER = 2000 and KEEP_LAST_TURNS = 2 so the summarisation feature is quick to demo.
In a real application, tune these values to your needs.

  • Typical SUMMARY_TRIGGER thresholds range from 20,000–32,000 tokens, depending on how much a larger context degrades performance in your use case.
# Audio/config knobs
SAMPLE_RATE_HZ    = 24_000   # Required by pcm16
CHUNK_DURATION_MS = 40       # chunk size for audio capture
BYTES_PER_SAMPLE  = 2        # pcm16 = 2 bytes/sample
SUMMARY_TRIGGER   = 2_000    # Summarise when context ≥ this
KEEP_LAST_TURNS   = 2       # Keep these turns verbatim
SUMMARY_MODEL     = "gpt-4o-mini"  # Cheaper, fast summariser
# --------------------------------------------------------------------------- #
# 🎤 Realtime session                                                          #
# --------------------------------------------------------------------------- #
async def realtime_session(model="gpt-4o-realtime-preview", voice="shimmer", enable_playback=True):
    """
    Main coroutine: connects to the Realtime endpoint, spawns helper tasks,
    and processes incoming events in a big async‑for loop.
    """
    state = ConversationState()  # Reset state for each run

    pcm_queue: asyncio.Queue[bytes] = asyncio.Queue()
    assistant_audio: List[bytes] = []

    # ----------------------------------------------------------------------- #
    # Open the WebSocket connection to the Realtime API                       #
    # ----------------------------------------------------------------------- #
    url = f"wss://api.openai.com/v1/realtime?model={model}"
    headers = {"Authorization": f"Bearer {openai.api_key}", "OpenAI-Beta": "realtime=v1"}

    async with websockets.connect(url, extra_headers=headers, max_size=1 << 24) as ws:
        # ------------------------------------------------------------------- #
        # Wait until server sends session.created                             #
        # ------------------------------------------------------------------- #
        while json.loads(await ws.recv())["type"] != "session.created":
            pass
        print("session.created ✅")

        # ------------------------------------------------------------------- #
        # Configure session: voice, modalities, audio formats, transcription  #
        # ------------------------------------------------------------------- #
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "voice": voice,
                "modalities": ["audio", "text"],
                "input_audio_format": "pcm16",
                "output_audio_format": "pcm16",
                "input_audio_transcription": {"model": "gpt-4o-transcribe"},
            },
        }))

        # ------------------------------------------------------------------- #
        # Launch background tasks: mic capture → queue → websocket            #
        # ------------------------------------------------------------------- #
        mic_task = asyncio.create_task(mic_to_queue(pcm_queue))
        upl_task = asyncio.create_task(queue_to_websocket(pcm_queue, ws))

        print("🎙️ Speak now (Ctrl‑C to quit)…")

        try:
            # ------------------------------------------------------------------- #
            # Main event loop: process incoming events from the websocket         #
            # ------------------------------------------------------------------- #
            async for event_raw in ws:
                event = json.loads(event_raw)
                etype = event["type"]

                # --------------------------------------------------------------- #
                # User just spoke ⇢ conversation.item.created (role = user)        #
                # --------------------------------------------------------------- #
                if etype == "conversation.item.created" and event["item"]["role"] == "user":
                    item = event["item"]
                    text = None
                    if item["content"]:
                        text = item["content"][0].get("transcript")
                    
                    state.history.append(Turn("user", event["item"]["id"], text))
                    
                    # If transcript not yet available, fetch it later
                    if text is None:
                        asyncio.create_task(fetch_full_item(ws, item["id"], state))

                # --------------------------------------------------------------- #
                # Transcript fetched ⇢ conversation.item.retrieved                 #
                # --------------------------------------------------------------- #
                elif etype == "conversation.item.retrieved":
                    content = event["item"]["content"][0]
                    # Fill missing transcript in history
                    for t in state.history:
                        if t.item_id == event["item"]["id"]:
                            t.text = content.get("transcript")
                            break

                # --------------------------------------------------------------- #
                # Assistant audio arrives in deltas                               #
                # --------------------------------------------------------------- #
                elif etype == "response.audio.delta":
                    assistant_audio.append(base64.b64decode(event["delta"]))

                # --------------------------------------------------------------- #
                # Assistant reply finished ⇢ response.done                        #
                # --------------------------------------------------------------- #
                elif etype == "response.done":
                    for item in event["response"]["output"]:
                        if item.get("role") == "assistant":
                            txt = item["content"][0]["transcript"]
                            state.history.append(Turn("assistant", item["id"], txt))
                            # print(f"\n🤖 {txt}\n")
                    state.latest_tokens = event["response"]["usage"]["total_tokens"]
                    print(f"—— response.done  (window ≈{state.latest_tokens} tokens) ——")
                    print_history(state)
                    
                    # Fetch any still‑missing user transcripts
                    for turn in state.history:
                        if (turn.role == "user"
                            and turn.text is None
                            and turn.item_id not in state.waiting):
                            asyncio.create_task(
                                fetch_full_item(ws, turn.item_id, state)
                            )

                    # Playback collected audio once reply completes
                    if enable_playback and assistant_audio:
                        simpleaudio.play_buffer(b"".join(assistant_audio), 1, BYTES_PER_SAMPLE, SAMPLE_RATE_HZ)
                        assistant_audio.clear()

                    # Summarise if context too large – fire in background so we don't block dialogue
                    if state.latest_tokens >= SUMMARY_TRIGGER and len(state.history) > KEEP_LAST_TURNS and not state.summarising:
                        asyncio.create_task(summarise_and_prune(ws, state))

        except KeyboardInterrupt:
            print("\nStopping…")
        finally:
            mic_task.cancel()
            await pcm_queue.put(None)
            await upl_task
# Run the realtime session (this cell blocks until you stop it)
await realtime_session()
session.created ✅
🎙️ Speak now (Ctrl‑C to quit)…
—— response.done  (window ≈979 tokens) ——
—— Conversation so far ———————————————
[user     ] Can you tell me a quick story?  (item_BTuMOcpUqp8qknKhLzlkA)
[assistant] Once upon a time, in a cozy little village, there was a cat named Whiskers who was always getting into trouble. One sunny day, Whiskers found a mysterious glowing stone in the garden. Curious, he pawed at it, and poof! The stone granted him the ability to talk to birds. Whiskers and his new bird friends had grand adventures, solving mysteries and exploring the village. And from that day on, Whiskers was known as the most adventurous cat in the village. The end.  (item_BTuMPRWxqpv0ph6QM46DK)
——————————————————————————————————————————
—— response.done  (window ≈2755 tokens) ——
—— Conversation so far ———————————————
[user     ] Can you tell me a quick story?  (item_BTuMOcpUqp8qknKhLzlkA)
[assistant] Once upon a time, in a cozy little village, there was a cat named Whiskers who was always getting into trouble. One sunny day, Whiskers found a mysterious glowing stone in the garden. Curious, he pawed at it, and poof! The stone granted him the ability to talk to birds. Whiskers and his new bird friends had grand adventures, solving mysteries and exploring the village. And from that day on, Whiskers was known as the most adventurous cat in the village. The end.  (item_BTuMPRWxqpv0ph6QM46DK)
[user     ] Can you tell me three extremely funny stories?  (item_BTuNN64LdULM21OyC4vzN)
[assistant] Sure, let's dive into some giggle-worthy tales:  **Story One:** There was a forgetful baker named Benny who baked a hundred cakes for a big wedding. But on the big day, he forgot where he put them! The entire town joined in to find the missing cakes, only to discover Benny had stored them in his neighbor's garage, thinking it was his pantry. The wedding turned into a town-wide cake feast!  **Story Two:** A mischievous dog named Sparky loved to play pranks. One day, he swapped his owner's phone with a squeaky toy, causing a hilarious mix-up of barks, squeaks, and confused calls. Sparky's owner ended up having a full conversation with the mailman, all in squeaks!  **Story Three:** In a small town, a parrot named Polly became a local celebrity for reciting tongue twisters. One day, Polly challenged the mayor to a tongue twister duel. The mayor, tongue-tied and laughing, declared Polly the official town jester. Polly squawked with pride, and the town rang with laughter for days.  (item_BTuNNpNxki5ynSQ5c3Xsa)
——————————————————————————————————————————
⚠️  Token window ≈2755 ≥ 2000. Summarising…
—— Conversation so far ———————————————
[assistant] L'utilisateur a demandé une histoire rapide, et l'assistant a raconté celle d'un chat nommé Whiskers qui, après avoir trouvé une pierre mystérieuse dans son jardin, a obtenu le pouvoir de parler aux oiseaux. Avec ses nouveaux amis oiseaux, Whiskers a vécu de grandes aventures, résolvant des mystères et explorant le village, devenant ainsi le chat le plus aventurier du village.  (sum_001)
[user     ] Can you tell me three extremely funny stories?  (item_BTuNN64LdULM21OyC4vzN)
[assistant] Sure, let's dive into some giggle-worthy tales:  **Story One:** There was a forgetful baker named Benny who baked a hundred cakes for a big wedding. But on the big day, he forgot where he put them! The entire town joined in to find the missing cakes, only to discover Benny had stored them in his neighbor's garage, thinking it was his pantry. The wedding turned into a town-wide cake feast!  **Story Two:** A mischievous dog named Sparky loved to play pranks. One day, he swapped his owner's phone with a squeaky toy, causing a hilarious mix-up of barks, squeaks, and confused calls. Sparky's owner ended up having a full conversation with the mailman, all in squeaks!  **Story Three:** In a small town, a parrot named Polly became a local celebrity for reciting tongue twisters. One day, Polly challenged the mayor to a tongue twister duel. The mayor, tongue-tied and laughing, declared Polly the official town jester. Polly squawked with pride, and the town rang with laughter for days.  (item_BTuNNpNxki5ynSQ5c3Xsa)
——————————————————————————————————————————
✅ Summary inserted (sum_001)
—— response.done  (window ≈2147 tokens) ——
—— Conversation so far ———————————————
[assistant] L'utilisateur a demandé une histoire rapide, et l'assistant a raconté celle d'un chat nommé Whiskers qui, après avoir trouvé une pierre mystérieuse dans son jardin, a obtenu le pouvoir de parler aux oiseaux. Avec ses nouveaux amis oiseaux, Whiskers a vécu de grandes aventures, résolvant des mystères et explorant le village, devenant ainsi le chat le plus aventurier du village.  (sum_001)
[user     ] Can you tell me three extremely funny stories?  (item_BTuNN64LdULM21OyC4vzN)
[assistant] Sure, let's dive into some giggle-worthy tales:  **Story One:** There was a forgetful baker named Benny who baked a hundred cakes for a big wedding. But on the big day, he forgot where he put them! The entire town joined in to find the missing cakes, only to discover Benny had stored them in his neighbor's garage, thinking it was his pantry. The wedding turned into a town-wide cake feast!  **Story Two:** A mischievous dog named Sparky loved to play pranks. One day, he swapped his owner's phone with a squeaky toy, causing a hilarious mix-up of barks, squeaks, and confused calls. Sparky's owner ended up having a full conversation with the mailman, all in squeaks!  **Story Three:** In a small town, a parrot named Polly became a local celebrity for reciting tongue twisters. One day, Polly challenged the mayor to a tongue twister duel. The mayor, tongue-tied and laughing, declared Polly the official town jester. Polly squawked with pride, and the town rang with laughter for days.  (item_BTuNNpNxki5ynSQ5c3Xsa)
[user     ]   (item_BTuPLaCv8ATdIwAQ2rLgO)
[assistant] Sure! The first summary I provided between us was in French.  (item_BTuPLa7BaSQToGCVOmfBK)

We had a conversation with the voice AI. After a few turns, the total token count exceeded SUMMARY_TRIGGER, which kicked off the summarisation step. That step produced a summary of the earlier messages.

Since there were N = 4 messages in total, we summarised the first N - 2 = 2:

—— Conversation so far ———————————————
[user     ] Can you tell me a quick story?  (item_BTuMOcpUqp8qknKhLzlkA)
[assistant] Once upon a time, in a cozy little village, there was a cat named Whiskers who was always getting into trouble. One sunny day, Whiskers found a mysterious glowing stone in the garden. Curious, he pawed at it, and poof! The stone granted him the ability to talk to birds. Whiskers and his new bird friends had grand adventures, solving mysteries and exploring the village. And from that day on, Whiskers was known as the most adventurous cat in the village. The end.  (item_BTuMPRWxqpv0ph6QM46DK)

We then created a French summary and inserted it at the top of the conversation history using "previous_item_id": "root". This ensures the summary appears as the first message in the conversation. After that, we deleted the original items that had been summarised using "type": "conversation.item.delete".

To verify the summary was inserted correctly, we asked the voice AI which language the summary was in. It answered correctly:

[assistant] Sure! The first summary I provided between us was in French.  (item_BTuPLa7BaSQToGCVOmfBK)

5 · Real-world applications

Context summarisation is useful for long-running voice experiences.
Some use-case ideas:

Use case | Added value | Why it helps
Customer-support voice bot | 24/7 natural voice menus; auto-generated ticket summaries | Summarising long customer calls enables efficient hand-offs and record-keeping, lightening agent workload and improving response quality.
Language tutor | Live conversation practice with corrective feedback | Helps track learner progress and surface recurring mistakes, enabling personalised feedback and more effective language acquisition.
AI therapist / coach | A safe, always-available listener that remembers the session | Maintains continuity across sessions by recalling key topics and emotional tone, for a more empathetic and effective experience.
Meeting copilot | Live transcription + crisp action-item summaries in Slack | Distils long meetings into actionable summaries, saving team time and ensuring nothing important is missed.

6 · Next steps & further reading

Experiment with the notebook and integrate context summarisation into your own applications.

A few things to try:

Experiment | Try this… | What you'll learn
A/B test summarisation | Run your eval suite with summarisation on vs. off. | Whether summary-pruning actually improves quality in your domain, and how it affects latency and cost.
Swap summary styles | Change the system prompt to bullet points, JSON, English vs. French, etc. | Which format downstream assistants absorb best; how language choice affects follow-up answers.
Vary thresholds | Play with SUMMARY_TRIGGER (2 k → 8 k). | The sweet spot between model drift and summarisation overhead.
Cost tracing | Log usage.total_tokens before/after summarisation. | Concrete ROI: token savings per hour of conversation.

Resources: