AgentOptimizer - 一种训练您的LLM代理的代理方式

December 23, 2023 · 7 min read

Shaokun Zhang

PhD student at the Pennsylvania State University

Jieyu Zhang

PhD student at University of Washington

Overall structure of AgentOptimizer

TL;DR: 介绍AgentOptimizer，这是一个在LLM即服务时代用于训练LLM代理的新类。 AgentOptimizer能够根据历史对话和性能，提示LLM迭代优化AutoGen代理的功能/技能。

更多信息可以在以下位置找到：

论文: https://arxiv.org/abs/2402.11359.

笔记本: https://github.com/microsoft/autogen/blob/0.2/notebook/agentchat_agentoptimizer.ipynb.

介绍

在传统的机器学习流程中，我们通过根据训练集上的损失更新模型的权重来训练模型，而在大语言模型（LLM）代理时代，我们应该如何训练一个代理呢？这里，我们朝着代理训练迈出了第一步。受到OpenAI提供的函数调用能力的启发，我们在模型权重和代理的函数/技能之间做了一个类比，并根据其在训练集上的历史表现来更新代理的函数/技能。具体来说，我们提议利用函数调用能力来将优化代理函数的行为表述为一组函数调用，以支持迭代地添加、修改和删除现有函数。我们还引入了两种策略，回滚和早停，以简化训练流程，克服训练过程中性能下降的问题。作为一种代理式的训练方法，我们的方法有助于提升代理的能力，而无需访问大语言模型的权重。

AgentOptimizer

AgentOptimizer 是一个旨在通过改进函数调用来优化代理的类。它包含三个主要方法：

record_one_conversation:

该方法记录了解问题时代理的对话历史记录和表现。它包含两个输入：conversation_history (List[Dict]) 和 is_satisfied (bool)。 conversation_history 是一个字典列表，可以从AgentChat类中的chat_messages_for_summary获取。 is_satisfied 是一个布尔值，表示用户是否对解决方案感到满意。如果为None，则要求用户输入满意度。

示例：

optimizer = AgentOptimizer(max_actions_per_step=3, llm_config = llm_config)
# ------------ code to solve a problem ------------
# ......
# -------------------------------------------------
history = assistant.chat_messages_for_summary(UserProxy)
optimizer.record_one_conversation(history, is_satisfied=result)

step():

step() 是 AgentOptimizer 的核心方法。在每次优化迭代中，它将返回两个字段 register_for_llm 和 register_for_executor，分别用于更新 assistant 和 UserProxy 代理。

register_for_llm, register_for_exector = optimizer.step()
for item in register_for_llm:
    assistant.update_function_signature(**item)
if len(register_for_exector.keys()) > 0:
    user_proxy.register_function(function_map=register_for_exector)

reset_optimizer:

该方法会将优化器重置为初始状态，这在您希望从头开始训练代理时非常有用。

AgentOptimizer 包含机制来检查 (1) 函数的有效性以及 (2) 在返回 register_for_llm、register_for_exector 之前的代码实现。此外，它还包含机制来检查每次更新是否可行，例如避免移除由于幻觉而不在当前函数中的函数。

优化过程的伪代码

优化过程如下：

optimizer = AgentOptimizer(max_actions_per_step=3, llm_config = llm_config)
for i in range(EPOCH):
    is_correct = user_proxy.initiate_chat(assistant, message = problem)
    history = assistant.chat_messages_for_summary(user_proxy)
    optimizer.record_one_conversation(history, is_satisfied=is_correct)
    register_for_llm, register_for_exector = optimizer.step()
    for item in register_for_llm:
        assistant.update_function_signature(**item)
    if len(register_for_exector.keys()) > 0:
        user_proxy.register_function(function_map=register_for_exector)

给定一个准备好的训练数据集，代理程序会迭代地解决训练集中的问题，以获得对话历史和统计信息。然后使用AgentOptimizer改进这些函数。每次迭代可以被视为一个类似于传统机器学习的训练步骤，优化元素是代理程序所拥有的函数。经过EPOCH次迭代后，代理程序预计会获得更好的函数，这些函数可用于未来的任务。

AgentOptimizer背后的实现技术

为了从AgentOptimizer获得稳定且结构化的函数签名和代码实现，我们利用OpenAI提供的函数调用功能，将操作函数的行为制定为一组函数调用。具体来说，我们引入了三个函数调用来在每个步骤中操作当前的函数：add_function、remove_function和revise_function。这些调用分别在现有函数列表中添加、删除和修改函数。这种做法可以充分利用GPT-4的函数调用能力，并输出具有更稳定签名和代码实现的结构化函数。以下是这些函数调用的JSON模式：

add_function: 添加一个新函数，可以在未来的任务中使用。

ADD_FUNC = {
    "type": "function",
    "function": {
        "name": "add_function",
        "description": "Add a function in the context of the conversation. Necessary Python packages must be declared. The name of the function MUST be the same with the function name in the code you generated.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "The name of the function in the code implementation."},
                "description": {"type": "string", "description": "A short description of the function."},
                "arguments": {
                    "type": "string",
                    "description": 'JSON schema of arguments encoded as a string. Please note that the JSON schema only supports specific types including string, integer, object, array, boolean. (do not have float type) For example: { "url": { "type": "string", "description": "The URL", }}. Please avoid the error \'array schema missing items\' when using array type.',
                },
                "packages": {
                    "type": "string",
                    "description": "A list of package names imported by the function, and that need to be installed with pip prior to invoking the function. This solves ModuleNotFoundError. It should be string, not list.",
                },
                "code": {
                    "type": "string",
                    "description": "The implementation in Python. Do not include the function declaration.",
                },
            },
            "required": ["name", "description", "arguments", "packages", "code"],
        },
    },
}

revise_function: 根据对话历史记录和性能，在当前函数列表中修订一个现有的函数（代码实现，函数签名）。

REVISE_FUNC = {
    "type": "function",
    "function": {
        "name": "revise_function",
        "description": "Revise a function in the context of the conversation. Necessary Python packages must be declared. The name of the function MUST be the same with the function name in the code you generated.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "The name of the function in the code implementation."},
                "description": {"type": "string", "description": "A short description of the function."},
                "arguments": {
                    "type": "string",
                    "description": 'JSON schema of arguments encoded as a string. Please note that the JSON schema only supports specific types including string, integer, object, array, boolean. (do not have float type) For example: { "url": { "type": "string", "description": "The URL", }}. Please avoid the error \'array schema missing items\' when using array type.',
                },
                "packages": {
                    "type": "string",
                    "description": "A list of package names imported by the function, and that need to be installed with pip prior to invoking the function. This solves ModuleNotFoundError. It should be string, not list.",
                },
                "code": {
                    "type": "string",
                    "description": "The implementation in Python. Do not include the function declaration.",
                },
            },
            "required": ["name", "description", "arguments", "packages", "code"],
        },
    },
}

remove_function: 从当前功能列表中移除一个现有的功能。它用于移除在未来任务中没有用处（冗余）的功能。

REMOVE_FUNC = {
    "type": "function",
    "function": {
        "name": "remove_function",
        "description": "Remove one function in the context of the conversation. Once remove one function, the assistant will not use this function in future conversation.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "The name of the function in the code implementation."}
            },
            "required": ["name"],
        },
    },
}

限制与未来工作

目前，它仅支持优化一个典型的user_proxy和assistant代理对。在未来的工作中，我们将使此功能更加通用，以支持其他类型的代理。
当前实现的AgentOptimizer仅对OpenAI GPT-4模型有效。将此功能/概念扩展到其他LLMs是下一步。

介绍​

AgentOptimizer​

优化过程的伪代码​

AgentOptimizer背后的实现技术​

限制与未来工作​

介绍

AgentOptimizer

优化过程的伪代码

AgentOptimizer背后的实现技术

限制与未来工作