Gradio chatbots can natively display intermediate thoughts and tool usage, which makes them ideal for building UIs for LLM agents and chain-of-thought (CoT) demos. This guide will show you how.
In addition to the `content` and `role` keys, message dictionaries also accept a `metadata` key. Currently, `metadata` accepts a dictionary with a single key, `title`. If you specify a `title` for a message, it will be displayed in a collapsible box.
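The later examples in this guide build messages with Gradio's `ChatMessage` dataclass rather than raw dictionaries; for a `type="messages"` chatbot the two forms can be used interchangeably. A minimal sketch of a "thinking" message in dataclass form:

```python
from gradio import ChatMessage

# A `title` in `metadata` renders the message in a collapsible box,
# exactly as with the dictionary form.
thinking = ChatMessage(
    role="assistant",
    content="I need to use the weather API tool",
    metadata={"title": "🧠 Thinking"},
)
```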
Here is an example where we show the agent's thought process while it uses a weather API tool to answer a user query.
```python
import gradio as gr

with gr.Blocks() as demo:
    chatbot = gr.Chatbot(
        type="messages",
        value=[
            {"role": "user", "content": "What is the weather in San Francisco?"},
            {"role": "assistant", "content": "I need to use the weather API tool",
             "metadata": {"title": "🧠 Thinking"}},
        ],
    )

demo.launch()
```

Next, we will create a simple Gradio app agent that has access to a text-to-image tool.
Tip: Make sure you read the transformers agents [documentation](https://huggingface.co/docs/transformers/en/agents) first.
We will begin by importing the necessary classes from transformers and gradio.
```python
import gradio as gr
from dataclasses import asdict
from gradio import ChatMessage
from transformers import Tool, ReactCodeAgent  # type: ignore
from transformers.agents import stream_to_gradio, HfApiEngine  # type: ignore

# Import tool from Hub
image_generation_tool = Tool.from_space(
    space_id="black-forest-labs/FLUX.1-schnell",
    name="image_generator",
    description="Generates an image following your prompt. Returns a PIL Image.",
    api_name="/infer",
)

llm_engine = HfApiEngine("Qwen/Qwen2.5-Coder-32B-Instruct")

# Initialize the agent with both tools and engine
agent = ReactCodeAgent(tools=[image_generation_tool], llm_engine=llm_engine)
```

Then we will build the user interface:
```python
def interact_with_agent(prompt, history):
    messages = []
    yield messages
    for msg in stream_to_gradio(agent, prompt):
        messages.append(asdict(msg))
        yield messages
    yield messages

demo = gr.ChatInterface(
    interact_with_agent,
    chatbot=gr.Chatbot(
        label="Agent",
        type="messages",
        avatar_images=(
            None,
            "https://em-content.zobj.net/source/twitter/53/robot-face_1f916.png",
        ),
    ),
    examples=[
        ["Generate an image of an astronaut riding an alligator"],
        ["I am writing a children's book for my daughter. Can you help me with some illustrations?"],
    ],
    type="messages",
)

demo.launch()
```

You can see the full demo code here.
Next, we will create a UI for a langchain agent that has access to a search engine.

We will begin with the imports and setting up the langchain agent. Note that you'll need a .env file with the following environment variables set:
```
SERPAPI_API_KEY=
HF_TOKEN=
OPENAI_API_KEY=
```

```python
from langchain import hub
from langchain.agents import AgentExecutor, create_openai_tools_agent, load_tools
from langchain_openai import ChatOpenAI
from gradio import ChatMessage
import gradio as gr
from dotenv import load_dotenv
load_dotenv()
model = ChatOpenAI(temperature=0, streaming=True)
tools = load_tools(["serpapi"])
# Get the prompt to use - you can modify this!
prompt = hub.pull("hwchase17/openai-tools-agent")
# print(prompt.messages) -- to see the prompt
agent = create_openai_tools_agent(
    model.with_config({"tags": ["agent_llm"]}), tools, prompt
)

agent_executor = AgentExecutor(agent=agent, tools=tools).with_config(
    {"run_name": "Agent"}
)
```

Then we will create the Gradio UI:
```python
async def interact_with_langchain_agent(prompt, messages):
    messages.append(ChatMessage(role="user", content=prompt))
    yield messages
    async for chunk in agent_executor.astream(
        {"input": prompt}
    ):
        if "steps" in chunk:
            for step in chunk["steps"]:
                messages.append(ChatMessage(role="assistant", content=step.action.log,
                                            metadata={"title": f"🛠️ Used tool {step.action.tool}"}))
                yield messages
        if "output" in chunk:
            messages.append(ChatMessage(role="assistant", content=chunk["output"]))
            yield messages

with gr.Blocks() as demo:
    gr.Markdown("# Chat with a LangChain Agent 🦜⛓️ and see its thoughts 💭")
    chatbot = gr.Chatbot(
        type="messages",
        label="Agent",
        avatar_images=(
            None,
            "https://em-content.zobj.net/source/twitter/141/parrot_1f99c.png",
        ),
    )
    input = gr.Textbox(lines=1, label="Chat Message")
    input.submit(interact_with_langchain_agent, [input, chatbot], [chatbot])

demo.launch()
```

That's it! Check out our finished langchain demo here.
Gradio chatbots can also natively display the intermediate thoughts of a reasoning LLM, which makes them ideal for building UIs that show how an AI model "thinks" while generating a response. The guide below shows you how to build a chatbot that displays Gemini AI's thought process in real time.

Let's create a complete chatbot that shows its thoughts and responses in real time. We'll use Google's Gemini API to access the Gemini 2.0 Flash Thinking LLM, and Gradio for the UI.

We will begin with the imports and setting up the gemini client. Note that you'll need to get a Google Gemini API key first:
```python
import gradio as gr
from gradio import ChatMessage
from typing import Iterator
import google.generativeai as genai

genai.configure(api_key="your-gemini-api-key")
model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp-1219")
```

First, let's set up the streaming function that handles the model's outputs:
```python
def stream_gemini_response(user_message: str, messages: list) -> Iterator[list]:
    """
    Streams both thoughts and responses from the Gemini model.
    """
    # Initialize response from Gemini
    response = model.generate_content(user_message, stream=True)

    # Initialize buffers
    thought_buffer = ""
    response_buffer = ""
    thinking_complete = False

    # Add initial thinking message
    messages.append(
        ChatMessage(
            role="assistant",
            content="",
            metadata={"title": "⏳Thinking: *The thoughts produced by the Gemini2.0 Flash model are experimental"}
        )
    )

    for chunk in response:
        parts = chunk.candidates[0].content.parts
        current_chunk = parts[0].text

        if len(parts) == 2 and not thinking_complete:
            # Complete thought and start response
            thought_buffer += current_chunk
            messages[-1] = ChatMessage(
                role="assistant",
                content=thought_buffer,
                metadata={"title": "⏳Thinking: *The thoughts produced by the Gemini2.0 Flash model are experimental"}
            )

            # Add response message
            messages.append(
                ChatMessage(
                    role="assistant",
                    content=parts[1].text
                )
            )
            thinking_complete = True
        elif thinking_complete:
            # Continue streaming response
            response_buffer += current_chunk
            messages[-1] = ChatMessage(
                role="assistant",
                content=response_buffer
            )
        else:
            # Continue streaming thoughts
            thought_buffer += current_chunk
            messages[-1] = ChatMessage(
                role="assistant",
                content=thought_buffer,
                metadata={"title": "⏳Thinking: *The thoughts produced by the Gemini2.0 Flash model are experimental"}
            )

        yield messages
```
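The interface below wires its event chain to a `user_message` handler that isn't defined in this excerpt. A minimal sketch of such a helper, assuming it only needs to append the user's message to the chat history and clear the textbox:

```python
def user_message(msg: str, history: list) -> tuple[str, list]:
    """Hypothetical helper assumed by the event chain below: appends the
    user's message to the chat history and clears the input box."""
    history.append(ChatMessage(role="user", content=msg))
    return "", history
```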
Then, let's create the Gradio interface:

```python
with gr.Blocks() as demo:
    gr.Markdown("# Chat with Gemini 2.0 Flash and See its Thoughts 💭")

    chatbot = gr.Chatbot(
        type="messages",
        label="Gemini2.0 'Thinking' Chatbot",
        render_markdown=True,
    )

    input_box = gr.Textbox(
        lines=1,
        label="Chat Message",
        placeholder="Type your message here and press Enter..."
    )

    # Set up event handlers
    msg_store = gr.State("")  # Store for preserving user message

    input_box.submit(
        lambda msg: (msg, msg, ""),  # Store message and clear input
        inputs=[input_box],
        outputs=[msg_store, input_box, input_box],
        queue=False
    ).then(
        user_message,  # Add user message to chat
        inputs=[msg_store, chatbot],
        outputs=[input_box, chatbot],
        queue=False
    ).then(
        stream_gemini_response,  # Generate and stream response
        inputs=[msg_store, chatbot],
        outputs=chatbot
    )

demo.launch()
```

This creates a chatbot that:
- Displays the model's thought process in a collapsible section
- Streams the thoughts and final response in real time
- Maintains a clean chat history
That's it! You now have a chatbot that not only responds to users but also shows its thinking process, creating more transparent and engaging interactions. Check out our finished Gemini 2.0 Flash Thinking demo here.