Agents#

AutoGen AgentChat provides a set of preset agents, each with variations in how it responds to messages. All agents share the following attributes and methods:

  • name: The unique name of the agent.

  • description: The description of the agent in text.

  • on_messages(): Send the agent a sequence of messages and get a Response.

  • on_messages_stream(): Same as on_messages(), but returns an iterator of messages with the Response as the last item.

  • on_reset(): Reset the agent to its initial state.

  • run() and run_stream(): Convenience methods that call on_messages() and on_messages_stream() respectively, but offer the same interface as Teams.

See autogen_agentchat.messages for more information on AgentChat message types.

Assistant Agent#

AssistantAgent is a built-in agent that uses a language model and has the ability to use tools.

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_agentchat.ui import Console
from autogen_core import CancellationToken
from autogen_ext.models.openai import OpenAIChatCompletionClient


# Define a tool that searches the web for information.
async def web_search(query: str) -> str:
    """Find information on the web"""
    return "AutoGen is a programming framework for building multi-agent applications."


# Create an agent that uses the OpenAI GPT-4o model.
model_client = OpenAIChatCompletionClient(
    model="gpt-4o",
    # api_key="YOUR_API_KEY",
)
agent = AssistantAgent(
    name="assistant",
    model_client=model_client,
    tools=[web_search],
    system_message="Use tools to solve tasks.",
)

Getting Responses#

We can use the on_messages() method to get the agent's response to a given message.

async def assistant_run() -> None:
    response = await agent.on_messages(
        [TextMessage(content="Find information on AutoGen", source="user")],
        cancellation_token=CancellationToken(),
    )
    print(response.inner_messages)
    print(response.chat_message)


# Use asyncio.run(assistant_run()) when running in a script.
await assistant_run()
[ToolCallRequestEvent(source='assistant', models_usage=RequestUsage(prompt_tokens=598, completion_tokens=16), content=[FunctionCall(id='call_9UWYM1CgE3ZbnJcSJavNDB79', arguments='{"query":"AutoGen"}', name='web_search')], type='ToolCallRequestEvent'), ToolCallExecutionEvent(source='assistant', models_usage=None, content=[FunctionExecutionResult(content='AutoGen is a programming framework for building multi-agent applications.', call_id='call_9UWYM1CgE3ZbnJcSJavNDB79', is_error=False)], type='ToolCallExecutionEvent')]
source='assistant' models_usage=None content='AutoGen is a programming framework for building multi-agent applications.' type='ToolCallSummaryMessage'

Calling the on_messages() method returns a Response that contains the agent's final response in the chat_message attribute, as well as a list of inner messages in the inner_messages attribute, which stores the agent's "thought process" that led to the final response.

Note

It is important to note that on_messages() updates the internal state of the agent: it adds the messages to the agent's history. So you should call this method with new messages, and not repeatedly call it with the same messages or the complete history.

Note

Unlike in AgentChat v0.2, tools are executed directly by the same agent within the same call to on_messages(). By default, the agent returns the result of the tool call as its final response.

You can also call the run() method, a convenience method that calls on_messages(). It follows the same interface as Teams and returns a TaskResult object.
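For example, the earlier task can also be issued through run(); a minimal sketch reusing the agent defined above:

async def assistant_run_task() -> None:
    # run() wraps on_messages() and returns a TaskResult whose
    # .messages list ends with the agent's final response.
    result = await agent.run(task="Find information on AutoGen")
    print(result.messages[-1].content)


# Use asyncio.run(assistant_run_task()) when running in a script.
await assistant_run_task()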

Multi-Modal Input#

The AssistantAgent can handle multi-modal input by being given a MultiModalMessage.

from io import BytesIO

import PIL
import requests
from autogen_agentchat.messages import MultiModalMessage
from autogen_core import Image

# Create a multi-modal message with random image and text.
pil_image = PIL.Image.open(BytesIO(requests.get("https://picsum.photos/300/200").content))
img = Image(pil_image)
multi_modal_message = MultiModalMessage(content=["Can you describe the content of this image?", img], source="user")
img
# Use asyncio.run(...) when running in a script.
response = await agent.on_messages([multi_modal_message], CancellationToken())
print(response.chat_message.content)
The image depicts a vintage car, likely from the 1930s or 1940s, with a sleek, classic design. The car seems to be customized or well-maintained, as indicated by its shiny exterior and lowered stance. It has a prominent grille and round headlights. There's a license plate on the front with the text "FARMER BOY." The setting appears to be a street with old-style buildings in the background, suggesting a historical or retro theme.

You can also use a MultiModalMessage as the task input to the run() method.
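For example, a minimal sketch reusing the multi_modal_message created above:

# Use asyncio.run(...) when running in a script.
result = await agent.run(task=multi_modal_message)
print(result.messages[-1].content)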

Streaming Messages#

We can also stream each message as it is generated by the agent by using the on_messages_stream() method, and use Console to print the messages to the console as they appear.

async def assistant_run_stream() -> None:
    # Option 1: read each message from the stream (as shown in the previous example).
    # async for message in agent.on_messages_stream(
    #     [TextMessage(content="Find information on AutoGen", source="user")],
    #     cancellation_token=CancellationToken(),
    # ):
    #     print(message)

    # Option 2: use Console to print all messages as they appear.
    await Console(
        agent.on_messages_stream(
            [TextMessage(content="Find information on AutoGen", source="user")],
            cancellation_token=CancellationToken(),
        ),
        output_stats=True,  # Enable stats printing.
    )


# Use asyncio.run(assistant_run_stream()) when running in a script.
await assistant_run_stream()
---------- assistant ----------
[FunctionCall(id='call_fSp5iTGVm2FKw5NIvfECSqNd', arguments='{"query":"AutoGen information"}', name='web_search')]
[Prompt tokens: 61, Completion tokens: 16]
---------- assistant ----------
[FunctionExecutionResult(content='AutoGen is a programming framework for building multi-agent applications.', call_id='call_fSp5iTGVm2FKw5NIvfECSqNd')]
---------- assistant ----------
AutoGen is a programming framework designed for building multi-agent applications. If you need more detailed information or specific aspects about AutoGen, feel free to ask!
[Prompt tokens: 93, Completion tokens: 32]
---------- Summary ----------
Number of inner messages: 2
Total prompt tokens: 154
Total completion tokens: 48
Duration: 4.30 seconds

The on_messages_stream() method returns an asynchronous generator that yields each inner message generated by the agent, with the final item being the response message in the chat_message attribute.

From the messages, you can see that the assistant agent used the web_search tool to gather information and responded based on the search results.

You can also use run_stream() to get the same streaming behavior as on_messages_stream(). It follows the same interface as Teams.
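For example, a minimal sketch that streams a task with run_stream() through Console:

# Use asyncio.run(...) when running in a script.
await Console(agent.run_stream(task="Find information on AutoGen"))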

Using Tools#

Large language models (LLMs) are typically limited to generating text or code responses. However, many complex tasks benefit from the ability to use external tools that perform specific actions, such as fetching data from an API or database.

To address this limitation, modern LLMs can now accept a list of available tool schemas (descriptions of tools and their arguments) and generate tool call messages. This capability, known as tool calling or function calling, is becoming a popular pattern for building agent-based applications. For more information about tool calling in LLMs, see the documentation from OpenAI and Anthropic.

In AgentChat, the AssistantAgent can use tools to perform specific actions. The web_search tool is one such tool, allowing the assistant agent to search the web for information. A custom tool can be a Python function or a subclass of BaseTool.

By default, when AssistantAgent executes a tool, it returns the tool's output as a string in a ToolCallSummaryMessage in its response. If your tool does not return a well-formed string in natural language, you can add a reflection step to have the model summarize the tool's output, by setting the reflect_on_tool_use=True parameter in the AssistantAgent constructor.
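For example, a minimal sketch enabling the reflection step, reusing the web_search tool and model client defined earlier:

agent_with_reflection = AssistantAgent(
    name="assistant",
    model_client=model_client,
    tools=[web_search],
    reflect_on_tool_use=True,  # Have the model summarize the tool output.
    system_message="Use tools to solve tasks.",
)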

Built-in Tools#

The AutoGen extension provides a set of built-in tools that can be used with the assistant agent. Head over to the API documentation for all the available tools under the autogen_ext.tools namespace. For example, you can find the following tools:

  • graphrag: Tools for using a GraphRAG index.

  • http: Tools for making HTTP requests.

  • langchain: Adapters for using LangChain tools.

  • mcp: Tools for using Model Context Protocol (MCP) servers.

Function Tool#

The AssistantAgent automatically converts a Python function into a FunctionTool that can be used by the agent, and automatically generates the tool schema from the function signature and docstring.

The web_search_func tool is an example of a function tool. The schema is automatically generated.

from autogen_core.tools import FunctionTool


# Define a tool using a Python function.
async def web_search_func(query: str) -> str:
    """Find information on the web"""
    return "AutoGen is a programming framework for building multi-agent applications."


# This step is automatically performed inside the AssistantAgent if the tool is a Python function.
web_search_function_tool = FunctionTool(web_search_func, description="Find information on the web")
# The schema is provided to the model during AssistantAgent's on_messages call.
web_search_function_tool.schema
{'name': 'web_search_func',
 'description': 'Find information on the web',
 'parameters': {'type': 'object',
  'properties': {'query': {'description': 'query',
    'title': 'Query',
    'type': 'string'}},
  'required': ['query'],
  'additionalProperties': False},
 'strict': False}

Model Context Protocol Tools#

The AssistantAgent can also use tools served from a Model Context Protocol (MCP) server via mcp_server_tools().

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.tools.mcp import StdioServerParams, mcp_server_tools

# Get the fetch tool from mcp-server-fetch.
fetch_mcp_server = StdioServerParams(command="uvx", args=["mcp-server-fetch"])
tools = await mcp_server_tools(fetch_mcp_server)

# Create an agent that can use the fetch tool.
model_client = OpenAIChatCompletionClient(model="gpt-4o")
agent = AssistantAgent(name="fetcher", model_client=model_client, tools=tools, reflect_on_tool_use=True)  # type: ignore

# Let the agent fetch the content of a URL and summarize it.
result = await agent.run(task="Summarize the content of https://en.wikipedia.org/wiki/Seattle")
print(result.messages[-1].content)
Seattle, located in Washington state, is the most populous city in the state and a major city in the Pacific Northwest region of the United States. It's known for its vibrant cultural scene, significant economic presence, and rich history. Here are some key points about Seattle from the Wikipedia page:

1. **History and Geography**: Seattle is situated between Puget Sound and Lake Washington, with the Cascade Range to the east and the Olympic Mountains to the west. Its history is deeply rooted in Native American heritage and its development was accelerated with the arrival of settlers in the 19th century. The city was officially incorporated in 1869.

2. **Economy**: Seattle is a major economic hub with a diverse economy anchored by sectors like aerospace, technology, and retail. It's home to influential companies such as Amazon and Starbucks, and has a significant impact on the tech industry due to companies like Microsoft and other technology enterprises in the surrounding area.

3. **Cultural Significance**: Known for its music scene, Seattle was the birthplace of grunge music in the early 1990s. It also boasts significant attractions like the Space Needle, Pike Place Market, and the Seattle Art Museum. 

4. **Education and Innovation**: The city hosts important educational institutions, with the University of Washington being a leading research university. Seattle is recognized for fostering innovation and is a leader in environmental sustainability efforts.

5. **Demographics and Diversity**: Seattle is noted for its diverse population, reflected in its rich cultural tapestry. It has seen a significant increase in population, leading to urban development and changes in its social landscape.

These points highlight Seattle as a dynamic city with a significant cultural, economic, and educational influence within the United States and beyond.

Langchain Tools#

You can also use tools from the Langchain library by wrapping them in a LangChainToolAdapter.

import pandas as pd
from autogen_ext.tools.langchain import LangChainToolAdapter
from langchain_experimental.tools.python.tool import PythonAstREPLTool

df = pd.read_csv("https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv")
tool = LangChainToolAdapter(PythonAstREPLTool(locals={"df": df}))
model_client = OpenAIChatCompletionClient(model="gpt-4o")
agent = AssistantAgent(
    "assistant", tools=[tool], model_client=model_client, system_message="Use the `df` variable to access the dataset."
)
await Console(
    agent.on_messages_stream(
        [TextMessage(content="What's the average age of the passengers?", source="user")], CancellationToken()
    ),
    output_stats=True,
)
---------- assistant ----------
[FunctionCall(id='call_BEYRkf53nBS1G2uG60wHP0zf', arguments='{"query":"df[\'Age\'].mean()"}', name='python_repl_ast')]
[Prompt tokens: 111, Completion tokens: 22]
---------- assistant ----------
[FunctionExecutionResult(content='29.69911764705882', call_id='call_BEYRkf53nBS1G2uG60wHP0zf')]
---------- assistant ----------
29.69911764705882
---------- Summary ----------
Number of inner messages: 2
Total prompt tokens: 111
Total completion tokens: 22
Duration: 0.62 seconds
Response(chat_message=ToolCallSummaryMessage(source='assistant', models_usage=None, content='29.69911764705882', type='ToolCallSummaryMessage'), inner_messages=[ToolCallRequestEvent(source='assistant', models_usage=RequestUsage(prompt_tokens=111, completion_tokens=22), content=[FunctionCall(id='call_BEYRkf53nBS1G2uG60wHP0zf', arguments='{"query":"df[\'Age\'].mean()"}', name='python_repl_ast')], type='ToolCallRequestEvent'), ToolCallExecutionEvent(source='assistant', models_usage=None, content=[FunctionExecutionResult(content='29.69911764705882', call_id='call_BEYRkf53nBS1G2uG60wHP0zf')], type='ToolCallExecutionEvent')])

Parallel Tool Calls#

Some models support parallel tool calls, which can be useful for tasks that require calling multiple tools at the same time. By default, if the model client produces multiple tool calls, AssistantAgent will call the tools in parallel.

You may want to disable parallel tool calls when the tools have side effects that may interfere with one another, or when agent behavior needs to be consistent across different models. This should be done at the model-client level.

For OpenAIChatCompletionClient and AzureOpenAIChatCompletionClient, set parallel_tool_calls=False to disable parallel tool calls.

model_client_no_parallel_tool_call = OpenAIChatCompletionClient(
    model="gpt-4o",
    parallel_tool_calls=False,  # type: ignore
)
agent_no_parallel_tool_call = AssistantAgent(
    name="assistant",
    model_client=model_client_no_parallel_tool_call,
    tools=[web_search],
    system_message="Use tools to solve tasks.",
)

Running an Agent in a Loop#

The AssistantAgent executes one step at a time: one model call, followed by one tool call (or parallel tool calls), and then an optional reflection.

To run it in a loop, for example until it stops producing tool calls, refer to Single-Agent Team.
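As a rough sketch of that pattern (assuming a text-mention termination condition; the linked tutorial covers the full recipe):

from autogen_agentchat.conditions import TextMentionTermination
from autogen_agentchat.teams import RoundRobinGroupChat

# A single-agent team calls the agent in a loop until the termination
# condition is met: here, until the agent says "TERMINATE".
loop_team = RoundRobinGroupChat([agent], termination_condition=TextMentionTermination("TERMINATE"))
# result = await loop_team.run(task="Find information on AutoGen")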

Structured Output#

Structured output allows models to return structured JSON text with a pre-defined schema provided by the application. Unlike JSON mode, the schema can be provided as a Pydantic BaseModel class, which can also be used to validate the output.

Note

Structured output is only available for models that support it. It also requires the model client to support structured output as well. Currently, OpenAIChatCompletionClient and AzureOpenAIChatCompletionClient support structured output.

Structured output is also useful for incorporating chain-of-thought reasoning in the agent's responses. See the example below for how to use structured output with the assistant agent.

from typing import Literal

from pydantic import BaseModel


# The response format for the agent as a Pydantic base model.
class AgentResponse(BaseModel):
    thoughts: str
    response: Literal["happy", "sad", "neutral"]


# Create an agent that uses the OpenAI GPT-4o model with the custom response format.
model_client = OpenAIChatCompletionClient(
    model="gpt-4o",
    response_format=AgentResponse,  # type: ignore
)
agent = AssistantAgent(
    "assistant",
    model_client=model_client,
    system_message="Categorize the input as happy, sad, or neutral following the JSON format.",
)

await Console(agent.run_stream(task="I am happy."))
---------- user ----------
I am happy.
---------- assistant ----------
{"thoughts":"The user explicitly states that they are happy.","response":"happy"}
TaskResult(messages=[TextMessage(source='user', models_usage=None, content='I am happy.', type='TextMessage'), TextMessage(source='assistant', models_usage=RequestUsage(prompt_tokens=89, completion_tokens=18), content='{"thoughts":"The user explicitly states that they are happy.","response":"happy"}', type='TextMessage')], stop_reason=None)

Streaming Tokens#

You can stream the tokens generated by the model client by setting model_client_stream=True. This will make the agent yield ModelClientStreamingChunkEvent messages in on_messages_stream() and run_stream().

The underlying model API must support streaming tokens for this to work. Please check with your model provider to see whether it is supported.

model_client = OpenAIChatCompletionClient(model="gpt-4o")

streaming_assistant = AssistantAgent(
    name="assistant",
    model_client=model_client,
    system_message="You are a helpful assistant.",
    model_client_stream=True,  # Enable streaming tokens.
)

# Use an async function and asyncio.run() in a script.
async for message in streaming_assistant.on_messages_stream(  # type: ignore
    [TextMessage(content="Name two cities in South America", source="user")],
    cancellation_token=CancellationToken(),
):
    print(message)
source='assistant' models_usage=None content='Two' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' cities' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' in' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' South' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' America' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' are' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' Buenos' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' Aires' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' in' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' Argentina' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' and' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' São' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' Paulo' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' in' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' Brazil' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content='.' type='ModelClientStreamingChunkEvent'
Response(chat_message=TextMessage(source='assistant', models_usage=RequestUsage(prompt_tokens=0, completion_tokens=0), content='Two cities in South America are Buenos Aires in Argentina and São Paulo in Brazil.', type='TextMessage'), inner_messages=[])

You can see the streaming chunks in the output above. The chunks are generated by the model client and yielded by the agent as they are received. The final response, the concatenation of all the chunks, is yielded right after the last chunk.

Similarly, run_stream() also yields the same streaming chunks, followed by a full text message right after the last chunk.

async for message in streaming_assistant.run_stream(task="Name two cities in North America."):  # type: ignore
    print(message)
source='user' models_usage=None content='Name two cities in North America.' type='TextMessage'
source='assistant' models_usage=None content='Two' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' cities' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' in' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' North' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' America' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' are' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' New' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' York' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' City' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' in' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' the' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' United' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' States' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' and' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' Toronto' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' in' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' Canada' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content='.' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=RequestUsage(prompt_tokens=0, completion_tokens=0) content='Two cities in North America are New York City in the United States and Toronto in Canada.' type='TextMessage'
TaskResult(messages=[TextMessage(source='user', models_usage=None, content='Name two cities in North America.', type='TextMessage'), TextMessage(source='assistant', models_usage=RequestUsage(prompt_tokens=0, completion_tokens=0), content='Two cities in North America are New York City in the United States and Toronto in Canada.', type='TextMessage')], stop_reason=None)

Using Model Context#

AssistantAgent has a model_context parameter that can be used to pass in a ChatCompletionContext object. This allows the agent to use different model contexts, such as a BufferedChatCompletionContext, to limit the context sent to the model.

By default, AssistantAgent uses the UnboundedChatCompletionContext, which sends the full conversation history to the model. To limit the context to the last n messages, you can use a BufferedChatCompletionContext.

from autogen_core.model_context import BufferedChatCompletionContext

# Create an agent that uses only the last 5 messages in the context to generate responses.
agent = AssistantAgent(
    name="assistant",
    model_client=model_client,
    tools=[web_search],
    system_message="Use tools to solve tasks.",
    model_context=BufferedChatCompletionContext(buffer_size=5),  # Only use the last 5 messages in the context.
)

Other Preset Agents#

The following preset agents are available:

  • UserProxyAgent: An agent that takes user input and returns it as its response (see the sketch after this list).

  • CodeExecutorAgent: An agent that can execute code.

  • OpenAIAssistantAgent: An agent backed by an OpenAI Assistant, with the ability to use custom tools.

  • MultimodalWebSurfer: A multi-modal agent that can search the web and visit web pages for information.

  • FileSurfer: An agent that can search and browse local files for information.

  • VideoSurfer: An agent that can watch videos for information.
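For instance, a UserProxyAgent can be created with just a name; by default it prompts for input on the console (a minimal sketch):

from autogen_agentchat.agents import UserProxyAgent

# By default, UserProxyAgent reads its response from standard input.
user_proxy = UserProxyAgent(name="user_proxy")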

Next Step#

Having explored the usage of the AssistantAgent, we can now move on to the next section to learn about the teams feature in AgentChat.