使用AutoGen的检索增强生成（RAG）应用

October 18, 2023 · 10 min read

李江

Senior Software Engineer at Microsoft

最后更新：2024年9月23日；AutoGen版本：v0.2.35

RAG Architecture

简要说明：

我们介绍了RetrieveUserProxyAgent，这是AutoGen中的RAG代理，允许进行检索增强生成，以及其基本用法。
我们展示了RAG代理的自定义功能，例如自定义嵌入函数、文本分割函数和向量数据库。
我们还展示了RAG代理的两个高级用法，包括与群聊集成以及使用Gradio构建聊天应用。

介绍

检索增强已经成为一种通过融入外部文件来缓解LLMs内在局限性的实用且有效的方法。在这篇博客文章中，我们介绍了AutoGen的RAG代理，它允许检索增强生成。该系统由两个代理组成：一个检索增强的用户代理代理，称为RetrieveUserProxyAgent，和一个助手代理，称为RetrieveAssistantAgent；RetrieveUserProxyAgent是从AutoGen内置代理扩展而来，而RetrieveAssistantAgent可以是任何配置了LLM的对话代理。RAG代理的整体架构如上图所示。

要使用检索增强聊天功能，需要初始化两个代理，包括检索增强用户代理和检索增强助手。初始化检索增强用户代理需要指定文档集合的路径。随后，检索增强用户代理可以下载文档，将其分割成特定大小的块，计算嵌入，并将其存储在向量数据库中。一旦聊天开始，代理们将按照以下程序协作进行代码生成或问答：

检索增强的用户代理根据嵌入相似度检索文档块，并将它们与问题一起发送给检索增强的助手。
Retrieval-Augmented Assistant 使用 LLM 根据提供的问题和上下文生成代码或文本作为答案。如果 LLM 无法生成满意的响应，则会指示其向 Retrieval-Augmented User Proxy 回复“更新上下文”。
如果响应中包含代码块，Retrieval-Augmented User Proxy 会执行代码并将输出作为反馈发送。如果没有代码块或更新上下文的指令，它将终止对话。否则，它将更新上下文并将问题与新上下文一起转发给 Retrieval-Augmented Assistant。请注意，如果启用了人工输入请求，个人可以主动发送任何反馈，包括“更新上下文”，给 Retrieval-Augmented Assistant。
如果检索增强助手收到“更新上下文”的指令，它会从检索增强用户代理请求下一个最相似的文档块作为新的上下文。否则，它会根据反馈和聊天记录生成新的代码或文本。如果LLM未能生成答案，它将再次回复“更新上下文”。这个过程可以重复多次。如果上下文中没有更多的文档可用，会话将终止。

RAG代理的基本使用

安装依赖项

在使用RAG代理之前，请先使用[retrievechat]选项安装autogen-agentchat。

pip install "autogen-agentchat[retrievechat]~=0.2"

如果您看到类似 #3551 的问题，您需要安装 chromadb<=0.5.0。

RetrieveChat 可以处理各种类型的文档。默认情况下，它可以处理纯文本和PDF文件，包括格式如 'txt', 'json', 'csv', 'tsv', 'md', 'html', 'htm', 'rtf', 'rst', 'jsonl', 'log', 'xml', 'yaml', 'yml' 和 'pdf'。如果你安装 unstructured，还将支持其他文档类型，如 'docx', 'doc', 'odt', 'pptx', 'ppt', 'xlsx', 'eml', 'msg', 'epub'。

在ubuntu中安装unstructured

sudo apt-get update
sudo apt-get install -y tesseract-ocr poppler-utils
pip install unstructured[all-docs]

您可以通过使用 autogen.retrieve_utils.TEXT_FORMATS 找到所有支持的文档类型的列表。

导入代理

import autogen
from autogen import AssistantAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

创建一个名为“assistant”的'AssistantAgent'实例和一个名为“ragproxyagent”的'RetrieveUserProxyAgent'实例

请参考 doc 以获取关于详细配置的更多信息。

assistant = AssistantAgent(
    name="assistant",
    system_message="You are a helpful assistant.",
    llm_config=llm_config,
)

ragproxyagent = RetrieveUserProxyAgent(
    name="ragproxyagent",
    retrieve_config={
        "task": "qa",
        "docs_path": "https://raw.githubusercontent.com/microsoft/autogen/main/README.md",
    },
)

初始化聊天并提问

assistant.reset()
ragproxyagent.initiate_chat(assistant, message=ragproxyagent.message_generator, problem="What is autogen?")

输出如下：

--------------------------------------------------------------------------------
assistant (to ragproxyagent):

AutoGen is a framework that enables the development of large language model (LLM) applications using multiple agents that can converse with each other to solve tasks. The agents are customizable, conversable, and allow human participation. They can operate in various modes that employ combinations of LLMs, human inputs, and tools.

--------------------------------------------------------------------------------

创建一个UserProxyAgent并询问相同的问题

assistant.reset()
userproxyagent = autogen.UserProxyAgent(name="userproxyagent")
userproxyagent.initiate_chat(assistant, message="What is autogen?")

输出如下：

--------------------------------------------------------------------------------
assistant (to userproxyagent):

In computer software, autogen is a tool that generates program code automatically, without the need for manual coding. It is commonly used in fields such as software engineering, game development, and web development to speed up the development process and reduce errors. Autogen tools typically use pre-programmed rules, templates, and data to create code for repetitive tasks, such as generating user interfaces, database schemas, and data models. Some popular autogen tools include Visual Studio's Code Generator and Unity's Asset Store.

--------------------------------------------------------------------------------

你可以看到UserProxyAgent的输出与我们的autogen无关，因为autogen的最新信息不在ChatGPT的训练数据中。RetrieveUserProxyAgent的输出是正确的，因为它可以根据给定的文档文件执行检索增强生成。

自定义RAG代理

RetrieveUserProxyAgent 可以通过 retrieve_config 进行自定义。根据不同的使用场景，有多个参数可以配置。在本节中，我们将展示如何自定义嵌入函数、文本分割函数和向量数据库。

自定义嵌入函数

默认情况下，Sentence Transformers及其预训练模型将用于计算嵌入。您可能希望使用OpenAI、Cohere、HuggingFace或其他嵌入函数。

OpenAI

from chromadb.utils import embedding_functions

openai_ef = embedding_functions.OpenAIEmbeddingFunction(
                api_key="YOUR_API_KEY",
                model_name="text-embedding-ada-002"
            )

ragproxyagent = RetrieveUserProxyAgent(
    name="ragproxyagent",
    retrieve_config={
        "task": "qa",
        "docs_path": "https://raw.githubusercontent.com/microsoft/autogen/main/README.md",
        "embedding_function": openai_ef,
    },
)

HuggingFace

huggingface_ef = embedding_functions.HuggingFaceEmbeddingFunction(
    api_key="YOUR_API_KEY",
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

更多示例可以在这里找到。

自定义文本分割函数

在我们将文档存储到向量数据库之前，需要将文本分割成块。尽管我们在autogen中实现了一个灵活的文本分割器，你可能仍然希望使用不同的文本分割器。此外，还有一些现有的文本分割工具也非常适合重用。

例如，你可以使用langchain中的所有文本分割器。

from langchain.text_splitter import RecursiveCharacterTextSplitter

recur_spliter = RecursiveCharacterTextSplitter(separators=["\n", "\r", "\t"])

ragproxyagent = RetrieveUserProxyAgent(
    name="ragproxyagent",
    retrieve_config={
        "task": "qa",
        "docs_path": "https://raw.githubusercontent.com/microsoft/autogen/main/README.md",
        "custom_text_split_function": recur_spliter.split_text,
    },
)

自定义向量数据库

我们使用chromadb作为默认的向量数据库，你也可以通过简单地设置vector_db在retrieve_config中分别为mongodb、pgvector、qdrant和couchbase来使用mongodb、pgvectordb、qdrantdb和couchbase。

要插件任何其他数据库，你也可以扩展类 agentchat.contrib.vectordb.base，查看代码这里。

RAG代理的高级用法

与其他代理在群聊中集成

在群聊中使用RetrieveUserProxyAgent与在双代理聊天中使用几乎相同。唯一需要注意的是，你需要用RetrieveUserProxyAgent初始化聊天。在群聊中，RetrieveAssistantAgent不是必需的。

然而，在某些情况下，您可能希望使用另一个代理初始化聊天。为了充分利用RetrieveUserProxyAgent，您需要从一个函数中调用它。

boss = autogen.UserProxyAgent(
    name="Boss",
    is_termination_msg=termination_msg,
    human_input_mode="TERMINATE",
    system_message="The boss who ask questions and give tasks.",
)

boss_aid = RetrieveUserProxyAgent(
    name="Boss_Assistant",
    is_termination_msg=termination_msg,
    system_message="Assistant who has extra content retrieval power for solving difficult problems.",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=3,
    retrieve_config={
        "task": "qa",
    },
    code_execution_config=False,  # we don't want to execute code in this case.
)

coder = autogen.AssistantAgent(
    name="Senior_Python_Engineer",
    is_termination_msg=termination_msg,
    system_message="You are a senior python engineer. Reply `TERMINATE` in the end when everything is done.",
    llm_config={"config_list": config_list, "timeout": 60, "temperature": 0},
)

pm = autogen.AssistantAgent(
    name="Product_Manager",
    is_termination_msg=termination_msg,
    system_message="You are a product manager. Reply `TERMINATE` in the end when everything is done.",
    llm_config={"config_list": config_list, "timeout": 60, "temperature": 0},
)

reviewer = autogen.AssistantAgent(
    name="Code_Reviewer",
    is_termination_msg=termination_msg,
    system_message="You are a code reviewer. Reply `TERMINATE` in the end when everything is done.",
    llm_config={"config_list": config_list, "timeout": 60, "temperature": 0},
)

def retrieve_content(
    message: Annotated[
        str,
        "Refined message which keeps the original meaning and can be used to retrieve content for code generation and question answering.",
    ],
    n_results: Annotated[int, "number of results"] = 3,
) -> str:
    boss_aid.n_results = n_results  # Set the number of results to be retrieved.
    _context = {"problem": message, "n_results": n_results}
    ret_msg = boss_aid.message_generator(boss_aid, None, _context)
    return ret_msg or message

for caller in [pm, coder, reviewer]:
    d_retrieve_content = caller.register_for_llm(
        description="retrieve content for code generation and question answering.", api_style="function"
    )(retrieve_content)

for executor in [boss, pm]:
    executor.register_for_execution()(d_retrieve_content)

groupchat = autogen.GroupChat(
    agents=[boss, pm, coder, reviewer],
    messages=[],
    max_round=12,
    speaker_selection_method="round_robin",
    allow_repeat_speaker=False,
)

llm_config = {"config_list": config_list, "timeout": 60, "temperature": 0}
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)

# Start chatting with the boss as this is the user proxy agent.
boss.initiate_chat(
    manager,
    message="How to use spark for parallel training in FLAML? Give me sample code.",
)

使用Gradio构建一个聊天应用

现在，让我们总结一下，并使用AutoGen和Gradi来制作一个聊天应用。

RAG ChatBot with AutoGen

# Initialize Agents
def initialize_agents(config_list, docs_path=None):
    ...
    return assistant, ragproxyagent

# Initialize Chat
def initiate_chat(config_list, problem, queue, n_results=3):
    ...
    assistant.reset()
    try:
        ragproxyagent.a_initiate_chat(
            assistant, problem=problem, silent=False, n_results=n_results
        )
        messages = ragproxyagent.chat_messages
        messages = [messages[k] for k in messages.keys()][0]
        messages = [m["content"] for m in messages if m["role"] == "user"]
        print("messages: ", messages)
    except Exception as e:
        messages = [str(e)]
    queue.put(messages)

# Wrap AutoGen part into a function
def chatbot_reply(input_text):
    """Chat with the agent through terminal."""
    queue = mp.Queue()
    process = mp.Process(
        target=initiate_chat,
        args=(config_list, input_text, queue),
    )
    process.start()
    try:
        messages = queue.get(timeout=TIMEOUT)
    except Exception as e:
        messages = [str(e) if len(str(e)) > 0 else "Invalid Request to OpenAI, please check your API keys."]
    finally:
        try:
            process.terminate()
        except:
            pass
    return messages

...

# Set up UI with Gradio
with gr.Blocks() as demo:
    ...
    assistant, ragproxyagent = initialize_agents(config_list)

    chatbot = gr.Chatbot(
        [],
        elem_id="chatbot",
        bubble_full_width=False,
        avatar_images=(None, (os.path.join(os.path.dirname(__file__), "autogen.png"))),
        # height=600,
    )

    txt_input = gr.Textbox(
        scale=4,
        show_label=False,
        placeholder="Enter text and press enter",
        container=False,
    )

    with gr.Row():
        txt_model = gr.Dropdown(
            label="Model",
            choices=[
                "gpt-4",
                "gpt-35-turbo",
                "gpt-3.5-turbo",
            ],
            allow_custom_value=True,
            value="gpt-35-turbo",
            container=True,
        )
        txt_oai_key = gr.Textbox(
            label="OpenAI API Key",
            placeholder="Enter key and press enter",
            max_lines=1,
            show_label=True,
            value=os.environ.get("OPENAI_API_KEY", ""),
            container=True,
            type="password",
        )
        ...

    clear = gr.ClearButton([txt_input, chatbot])

...

if __name__ == "__main__":
    demo.launch(share=True)

在线应用和源代码托管在HuggingFace。欢迎随时尝试！

介绍​

RAG代理的基本使用​

自定义RAG代理​

自定义嵌入函数​

自定义文本分割函数​

自定义向量数据库

RAG代理的高级用法​

与其他代理在群聊中集成​

使用Gradio构建一个聊天应用​

阅读更多​

介绍