Agent: LangGraph

LangGraph is an open-source library for building stateful, multi-actor applications with LLMs. It lets you define diverse control flows to create agent and multi-agent workflows.

This document demonstrates how to serve a LangGraph agent application with BentoML.

The example LangGraph agent calls DuckDuckGo to retrieve the latest information when the LLM in use lacks the necessary knowledge. For example:

{
   "query": "Who won the gold medal at the men's 100 metres event at the 2024 Summer Olympic?"
}

Example output:

Noah Lyles (USA) won the gold medal at the men's 100 metres event at the 2024 Summer Olympic Games. He won by five thousandths of a second over Jamaica's Kishane Thompson.

This example is ready for easy deployment and scaling on BentoCloud. You can use an external LLM API or deploy an open-source LLM together with the LangGraph agent. With a single command, you get a production-grade application with fast autoscaling, secure deployment in your cloud, and comprehensive observability.

[Image: the LangGraph agent running on BentoCloud]

Architecture

This project consists of two main components: a BentoML Service that serves the LangGraph agent as a REST API, and an LLM that generates text. The LLM can be an external API such as Claude 3.5 Sonnet, or an open-source model served with BentoML (Mistral 7B in this example).

[Image: architecture of the LangGraph agent and BentoML Services]

After a user submits a query, it is processed through the LangGraph agent, which includes:

  • An agent node that uses the LLM to understand the query and decide on actions

  • A tools node that calls external tools when needed

In this example, if the LLM needs additional information, the tools node calls DuckDuckGo to search the internet for the necessary data. DuckDuckGo then returns the search results to the agent, which collates the information and delivers the final response to the user.
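
This agent/tools loop is a classic LangGraph pattern. Below is a minimal, hypothetical sketch of such a graph, assuming the langgraph and langchain-community packages; the actual wiring used by this example lives in service.py, explained below.

from langchain_community.tools import DuckDuckGoSearchRun
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START
from langgraph.prebuilt import ToolNode, tools_condition

# DuckDuckGo search is the only external tool in this sketch
tools = [DuckDuckGoSearchRun()]
# Hypothetical placeholder model; this project points it at Mistral 7B or Claude
model = ChatOpenAI(model="gpt-4o-mini").bind_tools(tools)

def agent(state: MessagesState):
    # The LLM reads the conversation and decides whether to call a tool
    return {"messages": [model.invoke(state["messages"])]}

workflow = StateGraph(MessagesState)
workflow.add_node("agent", agent)
workflow.add_node("tools", ToolNode(tools))
workflow.add_edge(START, "agent")
# Route to the tools node when the LLM requests a tool call, otherwise finish
workflow.add_conditional_edges("agent", tools_condition)
workflow.add_edge("tools", "agent")  # feed tool results back to the agent
app = workflow.compile()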

Code explanations

This example contains the following two sub-projects, showcasing the use of different LLMs:

  • langgraph-mistral: serves the open-source Mistral 7B model with BentoML

  • langgraph-anthropic: uses Claude 3.5 Sonnet through an external API

Both sub-projects follow the same logic when implementing the LangGraph agent. This document explains the key code implementation in langgraph-mistral.

mistral.py

The mistral.py file defines a BentoML Service, MistralService, that serves the Mistral 7B model. You can switch to a different model by changing MODEL_ID if needed.

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.3"

MistralService exposes OpenAI-compatible APIs and uses vLLM as the inference backend. It is a dependent BentoML Service that the LangGraph agent can call.
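
The full implementation lives in the repository. Below is a hedged structural sketch of such a Service, assuming vLLM's async engine API; the OpenAI-compatible route wiring is omitted here and covered in the document linked below.

import bentoml

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.3"

@bentoml.service(
    resources={"gpu": 1},      # assumption: a single GPU with enough VRAM
    traffic={"timeout": 300},  # assumption: a generous timeout for generation
)
class MistralService:
    def __init__(self):
        # Load the model into vLLM's async engine when the Service starts
        from vllm import AsyncEngineArgs, AsyncLLMEngine

        self.engine = AsyncLLMEngine.from_engine_args(
            AsyncEngineArgs(model=MODEL_ID)
        )

    # OpenAI-compatible /v1/* routes are mounted on top of this engine
    ...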

For more information about the code, see LLM inference: vLLM.

service.py

The service.py file defines SearchAgentService, a BentoML Service that wraps the LangGraph agent and calls MistralService.

  1. Create a Python class and decorate it with @bentoml.service, which transforms it into a BentoML Service. You can optionally set configurations like workers and concurrency.

    @bentoml.service(
        workers=2,
        resources={
            "cpu": "2000m"
        },
        traffic={
            "concurrency": 16,
            "external_queue": True
        }
    )
    class SearchAgentService:
        ...
    

    For deployment on BentoCloud, we recommend you set concurrency and enable external_queue. Concurrency refers to the number of requests the Service can handle at the same time. With external_queue enabled, if the application receives more than 16 requests simultaneously, the extra requests are placed in an external queue. They will be processed once the current ones are completed, allowing you to handle traffic spikes without dropping requests.

  2. Define the logic to call the MistralService. Use the bentoml.depends() function to invoke it, which allows SearchAgentService to utilize all its functionalities, such as calling its OpenAI-compatible API endpoints.

    from mistral import MistralService
    from langchain_openai import ChatOpenAI
    
    ...
    class SearchAgentService:
        # OpenAI compatible API
        llm_service = bentoml.depends(MistralService)
    
        def __init__(self):
            openai_api_base = f"{self.llm_service.client_url}/v1"
            self.model = ChatOpenAI(
                model="mistralai/Mistral-7B-Instruct-v0.3",
                openai_api_key="N/A",
                openai_api_base=openai_api_base,
                temperature=0,
                verbose=True,
                http_client=self.llm_service.to_sync.client,
            )
    
            # Logic to call the model, create LangGraph graph and add nodes & edge
            ...
    

    Once the Mistral Service is injected, use the ChatOpenAI API from langchain_openai to configure an interface to interact with it. Since the MistralService provides OpenAI-compatible API endpoints, you can use its HTTP client (to_sync.client) and client URL (client_url) to easily construct an OpenAI client for interaction.

    After that, define the LangGraph workflow that uses the model. The LangGraph agent will call this model and build its flow with nodes and edges, connecting the outputs of the LLM with the rest of the system. For detailed explanations of implementing LangGraph workflows, see the LangGraph documentation.
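
    The elided graph-building code ends by compiling the workflow into self.app, which the endpoints below call. A minimal sketch of that final step, assuming a StateGraph named workflow as in the architecture sketch above:

    # Compile the graph once at startup; the endpoints below call
    # self.app.ainvoke(...) and self.app.astream_events(...)
    self.app = workflow.compile()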

  3. Define a BentoML task endpoint invoke with @bentoml.task to handle the LangGraph workflow asynchronously. It is a background task that supports long-running operations. This ensures that complex LangGraph workflows involving external tools can complete without timing out.

    After sending the user’s query to the LangGraph agent, the task retrieves the final state and provides the results back to the user.

    # Define a task endpoint
    @bentoml.task
    async def invoke(
        self,
        input_query: str = "What is the weather in San Francisco today?",
    ) -> str:
        try:
            # Invoke the LangGraph agent workflow asynchronously
            final_state = await self.app.ainvoke(
                {"messages": [HumanMessage(content=input_query)]}
            )
            # Return the final message from the workflow
            return final_state["messages"][-1].content
        # Handle errors that may occur during model invocation
        except OpenAIError as e:
            print(f"An error occurred: {e}")
            import traceback
            print(traceback.format_exc())
            return "I'm sorry, but I encountered an error while processing your request. Please try again later."
    

    Tip

    We recommend you use a task endpoint for this LangGraph agent application. This is because the LangGraph agent often uses multi-step workflows including querying an LLM and invoking external tools. Such workflows may take longer than the typical HTTP request cycle. If handled synchronously, your application could face request timeouts, especially under high traffic. BentoML task endpoints solve this problem by offloading long-running tasks to the background. You can send a query and check back later for the results, ensuring smooth inference without timeouts.
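
    In practice, a client submits a query to this endpoint as a background task and checks back for the result. A minimal sketch, assuming a BentoCloud Deployment URL and the task APIs of BentoML's Python client:

    import bentoml

    with bentoml.SyncHTTPClient("<your_deployment_endpoint_url>") as client:
        # Enqueue the query as a background task instead of waiting inline
        task = client.invoke.submit(
            input_query="Who won the gold medal at the men's 100 metres event at the 2024 Summer Olympic?",
        )
        # Check back later: poll the status, then fetch the result
        status = task.get_status()
        if status.value == "success":
            print(task.get())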

  4. Optionally, add a streaming API to send intermediate results in real time. Use @bentoml.api to turn the stream function into an API endpoint and call astream_events to stream events generated by the LangGraph agent.

    @bentoml.api
    async def stream(
        self,
        input_query: str = "What is the weather in San Francisco today?",
    ) -> AsyncGenerator[str, None]:
        # Loop through the events generated by the LangGraph workflow
        async for event in self.app.astream_events(
            {"messages": [HumanMessage(content=input_query)]},
            version="v2"
        ):
            # Yield each event and stream it back
            yield str(event) + "\n"
    

    For more information about the astream_events API, see the LangGraph documentation.
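
    On the client side, you can consume the stream as it arrives. A short sketch, assuming a running server and BentoML's SyncHTTPClient:

    import bentoml

    with bentoml.SyncHTTPClient("http://localhost:3000") as client:
        # Print each event as the LangGraph agent emits it
        for chunk in client.stream(
            input_query="What is the weather in San Francisco today?",
        ):
            print(chunk, end="")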

bentofile.yaml

This configuration file defines the build options for a Bento, the unified distribution format in BentoML, which contains the source code, Python packages, model references, and environment setup. It helps ensure reproducibility across development and production environments.

Here is an example file for BentoLangGraph/langgraph-mistral:

service: "service:SearchAgentService"
labels:
  author: "bentoml-team"
  project: "langgraph-example"
include:
  - "*.py"
python:
  requirements_txt: "./requirements.txt"
  lock_packages: false
envs:
  # Set HF environment variable here or use BentoCloud secret
  - name: HF_TOKEN
docker:
  python_version: "3.11"
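
With this file in place, you can package the project into a Bento (bentoml deploy, used in the next section, builds one implicitly):

# Build a Bento using the options in bentofile.yaml
bentoml build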

Try it out

You can run this example project on BentoCloud, or serve it locally, containerize it as an OCI-compliant image, and deploy it anywhere.

BentoCloud

BentoCloud provides fast and scalable infrastructure for building and scaling AI applications with BentoML in the cloud.

  1. Install BentoML and log in to BentoCloud through the BentoML CLI. If you don't have a BentoCloud account, you can sign up here for free and get $10 in free credits.

    pip install bentoml
    bentoml cloud login
    
  2. Clone the repository and choose the project you want to deploy. We recommend you create a BentoCloud secret to store the required environment variables.

    git clone https://github.com/bentoml/BentoLangGraph.git
    
    # Use Mistral 7B
    cd BentoLangGraph/langgraph-mistral
    bentoml secret create huggingface HF_TOKEN=$HF_TOKEN
    bentoml deploy . --secret huggingface
    
    # Use Claude 3.5 Sonnet
    cd BentoLangGraph/langgraph-anthropic
    bentoml secret create anthropic ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY
    bentoml deploy . --secret anthropic
    
  3. Once it is up and running on BentoCloud, you can call the endpoint in the following ways:

    import bentoml
    
    with bentoml.SyncHTTPClient("<your_deployment_endpoint_url>") as client:
        result = client.invoke(
            input_query="Who won the gold medal at the men's 100 metres event at the 2024 Summer Olympic?",
        )
        print(result)
    
    curl -s -X POST \
        'https://<your_deployment_endpoint_url>/invoke' \
        -H 'Content-Type: application/json' \
        -d '{
            "input_query": "Who won the gold medal at the men'\''s 100 metres event at the 2024 Summer Olympic?"
    }'
    
  4. To ensure the Deployment automatically scales within a certain replica range, add the scaling flags:

    bentoml deploy . --secret huggingface --scaling-min 0 --scaling-max 3 # Set the desired replica counts
    

    If it is already deployed, update its allowed replicas as follows:

    bentoml deployment update <deployment-name> --scaling-min 0 --scaling-max 3 # Set the desired replica counts
    

    For more information, see how to configure concurrency and autoscaling.

Local serving

BentoML allows you to run and test your code locally, so you can quickly validate your code with local compute resources.

  1. Clone the repository and choose the project you want.

    git clone https://github.com/bentoml/BentoLangGraph.git
    
    # Python 3.11 is recommended
    
    # Use Mistral 7B
    cd BentoLangGraph/langgraph-mistral
    pip install -r requirements.txt
    export HF_TOKEN=
    
    # Use Claude 3.5 Sonnet
    cd BentoLangGraph/langgraph-anthropic
    pip install -r requirements.txt
    export ANTHROPIC_API_KEY=
    
  2. Run it locally.

    bentoml serve .
    

    Note

    To run this project locally with Mistral 7B, you need an NVIDIA GPU with at least 16G VRAM.

  3. Visit or send API requests to http://localhost:3000.
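
    For example, a quick smoke test of the task endpoint with curl:

    # Send a query to the local server's task endpoint
    curl -s -X POST \
        'http://localhost:3000/invoke' \
        -H 'Content-Type: application/json' \
        -d '{"input_query": "What is the weather in San Francisco today?"}'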

For custom deployment in your own infrastructure, use BentoML to generate an OCI-compliant image.