Model Clients#
AutoGen provides a suite of built-in model clients for using the ChatCompletion API.
All model clients implement the ChatCompletionClient protocol class.
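Because every client implements the same protocol, you can write code against ChatCompletionClient and swap one backend for another without changes. A minimal sketch (the ask helper below is illustrative, not part of AutoGen):
from autogen_core.models import ChatCompletionClient, UserMessage
async def ask(client: ChatCompletionClient, question: str) -> str:
    # Works with any client that implements the protocol: OpenAI, Azure OpenAI, Ollama, etc.
    result = await client.create([UserMessage(content=question, source="user")])
    assert isinstance(result.content, str)
    return result.content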
The following built-in model clients are currently available:
OpenAI#
To use the OpenAIChatCompletionClient, you need to install the openai extra.
# pip install "autogen-ext[openai]"
You also need to provide the API key, either through the OPENAI_API_KEY environment variable or through the api_key argument.
from autogen_core.models import UserMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient
# Create an OpenAI model client.
model_client = OpenAIChatCompletionClient(
model="gpt-4o",
# api_key="sk-...", # Optional if you have an API key set in the environment.
)
You can call the create() method to create a chat completion request and await a CreateResult object in return.
# Send a message list to the model and await the response.
messages = [
UserMessage(content="What is the capital of France?", source="user"),
]
response = await model_client.create(messages=messages)
# Print the response
print(response.content)
The capital of France is Paris.
# Print the response token usage
print(response.usage)
RequestUsage(prompt_tokens=15, completion_tokens=7)
Azure OpenAI#
To use the AzureOpenAIChatCompletionClient, you need to provide the deployment ID, the Azure Cognitive Services endpoint, the API version, and the model capabilities. For authentication, you can provide either an API key or an Azure Active Directory (AAD) token credential.
# pip install "autogen-ext[openai,azure]"
The following code snippet shows how to use AAD authentication. The identity used must be assigned the Cognitive Services OpenAI User role.
from autogen_ext.models.openai import AzureOpenAIChatCompletionClient
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
# Create the token provider
token_provider = get_bearer_token_provider(DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")
az_model_client = AzureOpenAIChatCompletionClient(
azure_deployment="{your-azure-deployment}",
model="{model-name, such as gpt-4o}",
api_version="2024-06-01",
azure_endpoint="https://{your-custom-endpoint}.openai.azure.com/",
azure_ad_token_provider=token_provider, # Optional if you choose key-based authentication.
# api_key="sk-...", # For key-based authentication.
)
Note
See here for how to use the Azure client directly, or for more information.
Azure AI Foundry#
Azure AI Foundry (previously known as Azure AI Studio) offers models hosted on Azure.
To use those models, you can use the AzureAIChatCompletionClient.
You need to install the azure extra to use this client.
# pip install "autogen-ext[openai,azure]"
Below is an example of using this client with the Phi-4 model from GitHub Marketplace.
import os
from autogen_core.models import UserMessage
from autogen_ext.models.azure import AzureAIChatCompletionClient
from azure.core.credentials import AzureKeyCredential
client = AzureAIChatCompletionClient(
model="Phi-4",
endpoint="https://models.inference.ai.azure.com",
# To authenticate with the model you will need to generate a personal access token (PAT) in your GitHub settings.
# Create your PAT token by following instructions here: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens
credential=AzureKeyCredential(os.environ["GITHUB_TOKEN"]),
model_info={
"json_output": False,
"function_calling": False,
"vision": False,
"family": "unknown",
},
)
result = await client.create([UserMessage(content="What is the capital of France?", source="user")])
print(result)
finish_reason='stop' content='The capital of France is Paris.' usage=RequestUsage(prompt_tokens=14, completion_tokens=8) cached=False logprobs=None
Ollama (experimental)#
Ollama is a local model server that can run models locally on your machine.
Note
Small local models are typically not as capable as larger models on the cloud. For some tasks they may not perform as well, and their output may be surprising.
To use Ollama, install the ollama extra and use the OllamaChatCompletionClient.
pip install -U "autogen-ext[ollama]"
from autogen_core.models import UserMessage
from autogen_ext.models.ollama import OllamaChatCompletionClient
# Assuming your Ollama server is running locally on port 11434.
ollama_model_client = OllamaChatCompletionClient(model="llama3.2")
response = await ollama_model_client.create([UserMessage(content="What is the capital of France?", source="user")])
print(response)
finish_reason='unknown' content='The capital of France is Paris.' usage=RequestUsage(prompt_tokens=32, completion_tokens=8) cached=False logprobs=None thought=None
Gemini (Beta)#
The example below shows how to use a Gemini model via the OpenAIChatCompletionClient.
from autogen_core.models import UserMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient
model_client = OpenAIChatCompletionClient(
model="gemini-1.5-flash-8b",
# api_key="GEMINI_API_KEY",
)
response = await model_client.create([UserMessage(content="What is the capital of France?", source="user")])
print(response)
finish_reason='stop' content='Paris\n' usage=RequestUsage(prompt_tokens=7, completion_tokens=2) cached=False logprobs=None thought=None
Semantic Kernel Adapter#
The SKChatCompletionAdapter allows you to use Semantic Kernel model clients as a ChatCompletionClient by adapting them to the required interface.
You need to install the relevant provider extras to use this adapter.
The list of extras that can be installed:
- semantic-kernel-anthropic: Install this extra to use Anthropic models.
- semantic-kernel-google: Install this extra to use Google Gemini models.
- semantic-kernel-ollama: Install this extra to use Ollama models.
- semantic-kernel-mistralai: Install this extra to use MistralAI models.
- semantic-kernel-aws: Install this extra to use AWS models.
- semantic-kernel-hugging-face: Install this extra to use Hugging Face models.
For example, to use Anthropic models, you need to install semantic-kernel-anthropic.
# pip install "autogen-ext[semantic-kernel-anthropic]"
To use this adapter, you need to create a Semantic Kernel model client and pass it to the adapter.
For example, to use the Anthropic model:
import os
from autogen_core.models import UserMessage
from autogen_ext.models.semantic_kernel import SKChatCompletionAdapter
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.anthropic import AnthropicChatCompletion, AnthropicChatPromptExecutionSettings
from semantic_kernel.memory.null_memory import NullMemory
sk_client = AnthropicChatCompletion(
ai_model_id="claude-3-5-sonnet-20241022",
api_key=os.environ["ANTHROPIC_API_KEY"],
service_id="my-service-id", # Optional; for targeting specific services within Semantic Kernel
)
settings = AnthropicChatPromptExecutionSettings(
temperature=0.2,
)
anthropic_model_client = SKChatCompletionAdapter(
sk_client, kernel=Kernel(memory=NullMemory()), prompt_settings=settings
)
# Call the model directly.
model_result = await anthropic_model_client.create(
messages=[UserMessage(content="What is the capital of France?", source="User")]
)
print(model_result)
finish_reason='stop' content='The capital of France is Paris. It is also the largest city in France and one of the most populous metropolitan areas in Europe.' usage=RequestUsage(prompt_tokens=0, completion_tokens=0) cached=False logprobs=None
Read more about the Semantic Kernel Adapter.
Streaming Response#
You can use the create_stream() method to create a chat completion request with a streaming response.
messages = [
UserMessage(content="Write a very short story about a dragon.", source="user"),
]
# Create a stream.
stream = model_client.create_stream(messages=messages)
# Iterate over the stream and print the responses.
print("Streamed responses:")
async for response in stream: # type: ignore
if isinstance(response, str):
# A partial response is a string.
print(response, flush=True, end="")
else:
# The last response is a CreateResult object with the complete message.
print("\n\n------------\n")
print("The complete response:", flush=True)
print(response.content, flush=True)
print("\n\n------------\n")
print("The token usage was:", flush=True)
print(response.usage, flush=True)
Streamed responses:
In the heart of an ancient forest, beneath the shadow of snow-capped peaks, a dragon named Elara lived secretly for centuries. Elara was unlike any dragon from the old tales; her scales shimmered with a deep emerald hue, each scale engraved with symbols of lost wisdom. The villagers in the nearby valley spoke of mysterious lights dancing across the night sky, but none dared venture close enough to solve the enigma.
One cold winter's eve, a young girl named Lira, brimming with curiosity and armed with the innocence of youth, wandered into Elara’s domain. Instead of fire and fury, she found warmth and a gentle gaze. The dragon shared stories of a world long forgotten and in return, Lira gifted her simple stories of human life, rich in laughter and scent of earth.
From that night on, the villagers noticed subtle changes—the crops grew taller, and the air seemed sweeter. Elara had infused the valley with ancient magic, a guardian of balance, watching quietly as her new friend thrived under the stars. And so, Lira and Elara’s bond marked the beginning of a timeless friendship that spun tales of hope whispered through the leaves of the ever-verdant forest.
------------
The complete response:
In the heart of an ancient forest, beneath the shadow of snow-capped peaks, a dragon named Elara lived secretly for centuries. Elara was unlike any dragon from the old tales; her scales shimmered with a deep emerald hue, each scale engraved with symbols of lost wisdom. The villagers in the nearby valley spoke of mysterious lights dancing across the night sky, but none dared venture close enough to solve the enigma.
One cold winter's eve, a young girl named Lira, brimming with curiosity and armed with the innocence of youth, wandered into Elara’s domain. Instead of fire and fury, she found warmth and a gentle gaze. The dragon shared stories of a world long forgotten and in return, Lira gifted her simple stories of human life, rich in laughter and scent of earth.
From that night on, the villagers noticed subtle changes—the crops grew taller, and the air seemed sweeter. Elara had infused the valley with ancient magic, a guardian of balance, watching quietly as her new friend thrived under the stars. And so, Lira and Elara’s bond marked the beginning of a timeless friendship that spun tales of hope whispered through the leaves of the ever-verdant forest.
------------
The token usage was:
RequestUsage(prompt_tokens=0, completion_tokens=0)
Note
The last response in a streaming response is always the final response of type CreateResult.
Note
The default usage response is to return zero values.
Comparing the usage returned by the non-streaming model_client.create(messages=messages) above with the streaming model_client.create_stream(messages=messages), we see a difference.
By default, the non-streaming response returns valid prompt and completion token usage counts.
By default, the streaming response returns zero values.
As documented in the OpenAI API reference, an additional parameter stream_options can be specified to return valid usage counts; see stream_options.
Only set this when you are using streaming, i.e., when using create_stream.
To enable this in create_stream, set extra_create_args={"stream_options": {"include_usage": True}}.
Note
While other APIs such as LiteLLM also support this, it is not always guaranteed to be fully supported or correct.
See the example below on how to use the stream_options parameter to return usage.
messages = [
UserMessage(content="Write a very short story about a dragon.", source="user"),
]
# Create a stream.
stream = model_client.create_stream(messages=messages, extra_create_args={"stream_options": {"include_usage": True}})
# Iterate over the stream and print the responses.
print("Streamed responses:")
async for response in stream: # type: ignore
if isinstance(response, str):
# A partial response is a string.
print(response, flush=True, end="")
else:
# The last response is a CreateResult object with the complete message.
print("\n\n------------\n")
print("The complete response:", flush=True)
print(response.content, flush=True)
print("\n\n------------\n")
print("The token usage was:", flush=True)
print(response.usage, flush=True)
Streamed responses:
In a lush, emerald valley hidden by towering peaks, there lived a dragon named Ember. Unlike others of her kind, Ember cherished solitude over treasure, and the songs of the stream over the roar of flames. One misty dawn, a young shepherd stumbled into her sanctuary, lost and frightened.
Instead of fury, he was met with kindness as Ember extended a wing, guiding him back to safety. In gratitude, the shepherd visited yearly, bringing tales of his world beyond the mountains. Over time, a friendship blossomed, binding man and dragon in shared stories and laughter.
As the years passed, the legend of Ember the gentle-hearted spread far and wide, forever changing the way dragons were seen in the hearts of many.
------------
The complete response:
In a lush, emerald valley hidden by towering peaks, there lived a dragon named Ember. Unlike others of her kind, Ember cherished solitude over treasure, and the songs of the stream over the roar of flames. One misty dawn, a young shepherd stumbled into her sanctuary, lost and frightened.
Instead of fury, he was met with kindness as Ember extended a wing, guiding him back to safety. In gratitude, the shepherd visited yearly, bringing tales of his world beyond the mountains. Over time, a friendship blossomed, binding man and dragon in shared stories and laughter.
As the years passed, the legend of Ember the gentle-hearted spread far and wide, forever changing the way dragons were seen in the hearts of many.
------------
The token usage was:
RequestUsage(prompt_tokens=17, completion_tokens=146)
Structured Output#
Structured output can be enabled by setting the response_format field in OpenAIChatCompletionClient and AzureOpenAIChatCompletionClient to a Pydantic BaseModel class.
Note
Structured output is only available for models that support it. It also requires the model client to support structured output as well.
Currently, the OpenAIChatCompletionClient and AzureOpenAIChatCompletionClient support structured output.
from typing import Literal
from pydantic import BaseModel
# The response format for the agent as a Pydantic base model.
class AgentResponse(BaseModel):
thoughts: str
response: Literal["happy", "sad", "neutral"]
# Create an agent that uses the OpenAI GPT-4o model with the custom response format.
model_client = OpenAIChatCompletionClient(
model="gpt-4o",
response_format=AgentResponse, # type: ignore
)
# Send a message list to the model and await the response.
messages = [
UserMessage(content="I am happy.", source="user"),
]
response = await model_client.create(messages=messages)
assert isinstance(response.content, str)
parsed_response = AgentResponse.model_validate_json(response.content)
print(parsed_response.thoughts)
print(parsed_response.response)
I'm glad to hear that you're feeling happy! It's such a great emotion that can brighten your whole day. Is there anything in particular that's bringing you joy today? 😊
happy
You can also set the response_format field in the extra_create_args parameter of the create() method to configure structured output for each request.
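For instance, a minimal sketch of a per-request override, reusing the AgentResponse model from above (this assumes the client accepts response_format through extra_create_args, as described):
response = await model_client.create(
    messages=messages,
    # Per-request structured output; overrides any client-level setting.
    extra_create_args={"response_format": AgentResponse},
)
assert isinstance(response.content, str)
parsed_response = AgentResponse.model_validate_json(response.content)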
缓存模型响应#
autogen_ext 实现了 ChatCompletionCache,它可以包裹任何 ChatCompletionClient。使用此包装器可以在多次使用相同提示查询底层客户端时避免产生令牌使用。
ChatCompletionCache 使用了一个 CacheStore 协议。我们已经实现了一些有用的 CacheStore 变体,包括 DiskCacheStore 和 RedisStore。
以下是使用 diskcache 进行本地缓存的示例:
# pip install -U "autogen-ext[openai, diskcache]"
import asyncio
import tempfile
from autogen_core.models import UserMessage
from autogen_ext.cache_store.diskcache import DiskCacheStore
from autogen_ext.models.cache import CHAT_CACHE_VALUE_TYPE, ChatCompletionCache
from autogen_ext.models.openai import OpenAIChatCompletionClient
from diskcache import Cache
async def main() -> None:
with tempfile.TemporaryDirectory() as tmpdirname:
# Initialize the original client
openai_model_client = OpenAIChatCompletionClient(model="gpt-4o")
# Then initialize the CacheStore, in this case with diskcache.Cache.
# You can also use redis like:
# from autogen_ext.cache_store.redis import RedisStore
# import redis
# redis_instance = redis.Redis()
# cache_store = RedisStore[CHAT_CACHE_VALUE_TYPE](redis_instance)
cache_store = DiskCacheStore[CHAT_CACHE_VALUE_TYPE](Cache(tmpdirname))
cache_client = ChatCompletionCache(openai_model_client, cache_store)
response = await cache_client.create([UserMessage(content="Hello, how are you?", source="user")])
print(response) # Should print response from OpenAI
response = await cache_client.create([UserMessage(content="Hello, how are you?", source="user")])
print(response) # Should print cached response
asyncio.run(main())
True
Inspecting cache_client.total_usage() (or model_client.total_usage()) before and after a cached response should yield the same counts.
Note that caching is sensitive to the exact arguments provided to cache_client.create or cache_client.create_stream, so changing the tools or json_output arguments might lead to a cache miss.
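A short sketch of the total_usage() check described above, continuing inside main() from the previous example (the second, identical request should be served from the cache, so the counters should not grow):
usage_before = cache_client.total_usage()
await cache_client.create([UserMessage(content="Hello, how are you?", source="user")])
usage_after = cache_client.total_usage()
# The repeated prompt hits the cache, so no new tokens should be consumed.
assert usage_before.prompt_tokens == usage_after.prompt_tokens
assert usage_before.completion_tokens == usage_after.completion_tokens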
Build an Agent Using a Model Client#
Let's create a simple AI agent that can respond to messages using the ChatCompletion API.
from dataclasses import dataclass
from autogen_core import MessageContext, RoutedAgent, SingleThreadedAgentRuntime, message_handler
from autogen_core.models import ChatCompletionClient, SystemMessage, UserMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient
@dataclass
class Message:
content: str
class SimpleAgent(RoutedAgent):
def __init__(self, model_client: ChatCompletionClient) -> None:
super().__init__("A simple agent")
self._system_messages = [SystemMessage(content="You are a helpful AI assistant.")]
self._model_client = model_client
@message_handler
async def handle_user_message(self, message: Message, ctx: MessageContext) -> Message:
# Prepare input to the chat completion model.
user_message = UserMessage(content=message.content, source="user")
response = await self._model_client.create(
self._system_messages + [user_message], cancellation_token=ctx.cancellation_token
)
# Return with the model's response.
assert isinstance(response.content, str)
return Message(content=response.content)
The SimpleAgent class is a subclass of the autogen_core.RoutedAgent class, for the convenience of automatically routing messages to the appropriate handlers.
It has a single handler, handle_user_message, which handles messages from the user. It uses the ChatCompletionClient to generate a response to the message.
It then returns the response to the user, following the direct communication model.
Note
The cancellation_token of the type autogen_core.CancellationToken is used to cancel asynchronous operations. It is linked to async calls inside the message handlers and can be used by the caller to cancel the handlers. (A sketch of caller-side cancellation follows the runtime example below.)
# Create the runtime and register the agent.
from autogen_core import AgentId
runtime = SingleThreadedAgentRuntime()
await SimpleAgent.register(
runtime,
"simple_agent",
lambda: SimpleAgent(
OpenAIChatCompletionClient(
model="gpt-4o-mini",
# api_key="sk-...", # Optional if you have an OPENAI_API_KEY set in the environment.
)
),
)
# Start the runtime processing messages.
runtime.start()
# Send a message to the agent and get the response.
message = Message("Hello, what are some fun things to do in Seattle?")
response = await runtime.send_message(message, AgentId("simple_agent", "default"))
print(response.content)
# Stop the runtime processing messages.
await runtime.stop()
Seattle is a vibrant city with a wide range of activities and attractions. Here are some fun things to do in Seattle:
1. **Space Needle**: Visit this iconic observation tower for stunning views of the city and surrounding mountains.
2. **Pike Place Market**: Explore this historic market where you can see the famous fish toss, buy local produce, and find unique crafts and eateries.
3. **Museum of Pop Culture (MoPOP)**: Dive into the world of contemporary culture, music, and science fiction at this interactive museum.
4. **Chihuly Garden and Glass**: Marvel at the beautiful glass art installations by artist Dale Chihuly, located right next to the Space Needle.
5. **Seattle Aquarium**: Discover the diverse marine life of the Pacific Northwest at this engaging aquarium.
6. **Seattle Art Museum**: Explore a vast collection of art from around the world, including contemporary and indigenous art.
7. **Kerry Park**: For one of the best views of the Seattle skyline, head to this small park on Queen Anne Hill.
8. **Ballard Locks**: Watch boats pass through the locks and observe the salmon ladder to see salmon migrating.
9. **Ferry to Bainbridge Island**: Take a scenic ferry ride across Puget Sound to enjoy charming shops, restaurants, and beautiful natural scenery.
10. **Olympic Sculpture Park**: Stroll through this outdoor park with large-scale sculptures and stunning views of the waterfront and mountains.
11. **Underground Tour**: Discover Seattle's history on this quirky tour of the city's underground passageways in Pioneer Square.
12. **Seattle Waterfront**: Enjoy the shops, restaurants, and attractions along the waterfront, including the Seattle Great Wheel and the aquarium.
13. **Discovery Park**: Explore the largest green space in Seattle, featuring trails, beaches, and views of Puget Sound.
14. **Food Tours**: Try out Seattle’s diverse culinary scene, including fresh seafood, international cuisines, and coffee culture (don’t miss the original Starbucks!).
15. **Attend a Sports Game**: Catch a Seahawks (NFL), Mariners (MLB), or Sounders (MLS) game for a lively local experience.
Whether you're interested in culture, nature, food, or history, Seattle has something for everyone to enjoy!
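To illustrate the note on cancellation_token above, here is a hedged sketch of caller-side cancellation. It assumes that runtime.send_message accepts a cancellation_token argument and that cancelling the token surfaces to the awaiting caller as asyncio.CancelledError; it also requires the runtime to still be running.
import asyncio
from autogen_core import CancellationToken
token = CancellationToken()
# Send the message as a background task so we can cancel it from the caller.
task = asyncio.create_task(
    runtime.send_message(message, AgentId("simple_agent", "default"), cancellation_token=token)
)
token.cancel()  # Cancels the async calls linked to this token inside the handler.
try:
    await task
except asyncio.CancelledError:
    print("The request was cancelled.")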
The SimpleAgent above always responds with a fresh context that contains only the system message and the latest user message. We can use model context classes from autogen_core.model_context to make the agent "remember" previous conversations. See the Model Context page for more details.
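As a brief sketch of that idea (assuming the BufferedChatCompletionContext class from autogen_core.model_context), the agent could store recent messages in a context object and replay them on each model call:
from autogen_core.model_context import BufferedChatCompletionContext
from autogen_core.models import AssistantMessage, UserMessage
# Keep only the last 5 messages in the context.
model_context = BufferedChatCompletionContext(buffer_size=5)
await model_context.add_message(UserMessage(content="What is the capital of France?", source="user"))
await model_context.add_message(AssistantMessage(content="The capital of France is Paris.", source="assistant"))
# Retrieve the remembered messages to prepend to the next model call.
context_messages = await model_context.get_messages()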
API Keys From Environment Variables#
In the examples above, we show that you can provide the API key through the api_key argument. Importantly, the OpenAI and Azure OpenAI clients use the openai package, which will automatically read an API key from an environment variable if one is not provided.
For OpenAI, you can set the OPENAI_API_KEY environment variable.
For Azure OpenAI, you can set the AZURE_OPENAI_API_KEY environment variable.
In addition, for Gemini (Beta), you can set the GEMINI_API_KEY environment variable.
This is a good practice to explore, as it avoids including sensitive API keys in your code.
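For example, a minimal sketch that relies on the environment variable rather than hard-coding the key (the shell command is illustrative):
# In your shell, before running the program:
#   export OPENAI_API_KEY="sk-..."
from autogen_ext.models.openai import OpenAIChatCompletionClient
# No api_key argument: the openai package reads the OPENAI_API_KEY environment variable.
model_client = OpenAIChatCompletionClient(model="gpt-4o")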