Using Transform Messages during Speaker Selection
When using "auto" mode for speaker selection in a group chat, a nested chat is used to determine the next speaker. This nested chat includes all of the group chat's messages, which can result in a significant number of tokens that the LLM must process when determining the next speaker. As conversations progress, it can become challenging to keep the context length within the LLM's workable window. Furthermore, reducing the overall number of tokens will improve inference time and reduce token costs.

Using Transform Messages, you gain control over which messages are used for speaker selection, as well as the context length of each message and of the history overall.
All of the transforms available for Transform Messages, such as MessageHistoryLimiter, MessageTokenLimiter, and TextMessageCompressor, can be applied to the speaker selection nested chat.
How do you apply them?
When instantiating your GroupChat object, all you need to do is assign a TransformMessages object to the select_speaker_transform_messages parameter, and the transforms within that object will be applied to the nested speaker selection chats.

And, because you are passing in a TransformMessages object, multiple transforms can be applied to that nested chat.

As part of the nested chat, an agent called 'checking_agent' is used to direct the LLM in selecting the next speaker. It is best to avoid compressing or truncating the content from this agent. How to do this is shown in the text compression example below.
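To make the wiring concrete before building up the transforms step by step, here is a minimal sketch; the agent names, the llm_config=False setting, and the single MessageHistoryLimiter transform are purely illustrative placeholders.

# Minimal sketch: the TransformMessages object is assigned to the
# select_speaker_transform_messages parameter of GroupChat, so its
# transforms apply only to the nested speaker selection chat.
import autogen
from autogen.agentchat.contrib.capabilities import transform_messages, transforms

# Placeholder agents (no LLM configured) purely to make the sketch runnable
agent_a = autogen.ConversableAgent("agent_a", llm_config=False)
agent_b = autogen.ConversableAgent("agent_b", llm_config=False)

speaker_selection_transforms = transform_messages.TransformMessages(
    transforms=[transforms.MessageHistoryLimiter(max_messages=10)],
)

group_chat = autogen.GroupChat(
    agents=[agent_a, agent_b],
    messages=[],
    speaker_selection_method="auto",  # the nested chat is used in "auto" mode
    select_speaker_transform_messages=speaker_selection_transforms,
)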
Creating transforms for speaker selection in a GroupChat
We will progressively create a TransformMessages object to show how you can build up the transforms for speaker selection.

Each iteration will replace the previous one, enabling you to use the code in each cell as-is.

Importantly, the transforms are applied in the order in which they appear in the transforms list.
# Start by importing the transform capabilities
import autogen
from autogen.agentchat.contrib.capabilities import transform_messages, transforms
# Limit the number of messages
# Let's start by limiting the number of messages to consider for speaker selection using a
# MessageHistoryLimiter transform. This example will use the latest 10 messages.
select_speaker_transforms = transform_messages.TransformMessages(
    transforms=[
        transforms.MessageHistoryLimiter(max_messages=10),
    ]
)
# Compress messages through an LLM
# An interesting and very powerful method of reducing tokens is by "compressing" the text of
# a message by using an LLM that's specifically designed to do that. The default LLM used for
# this purpose is LLMLingua (https://github.com/microsoft/LLMLingua) and it aims to reduce the
# number of tokens without reducing the message's meaning. We use the TextMessageCompressor
# transform to compress messages.
# There are multiple LLMLingua models available and it defaults to the first version, LLMLingua.
# This example will use LLMLingua-2 (note the use_llmlingua2=True flag in the compression
# arguments below), a smaller, faster model designed for task-agnostic compression.
# Create the compression arguments, which allow us to specify the model and other related
# parameters, such as whether to use the CPU or GPU.
select_speaker_compression_args = dict(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank", use_llmlingua2=True, device_map="cpu"
)
# Now we can add the TextMessageCompressor as the second step
# Important notes on the parameters used:
# min_tokens - will only apply text compression if the message has at least 1,000 tokens
# cache - enables caching, if a message has previously been compressed it will use the
# cached version instead of recompressing it (making it much faster)
# filter_dict - to minimise the chance of compressing key information, we can include or
# exclude messages based on role and name.
# Here, we are excluding any 'system' messages as well as any messages from
# 'ceo' (just for example) and the 'checking_agent', which is an agent in the
# nested chat speaker selection chat. Change the 'ceo' name or add additional
# agent names for any agents that have critical content.
# exclude_filter - As we are setting this to True, the filter will be an exclusion filter.
# Import the cache functionality
from autogen.cache.in_memory_cache import InMemoryCache
select_speaker_transforms = transform_messages.TransformMessages(
    transforms=[
        transforms.MessageHistoryLimiter(max_messages=10),
        transforms.TextMessageCompressor(
            min_tokens=1000,
            text_compressor=transforms.LLMLingua(select_speaker_compression_args, structured_compression=True),
            cache=InMemoryCache(seed=43),
            filter_dict={"role": ["system"], "name": ["ceo", "checking_agent"]},
            exclude_filter=True,
        ),
    ]
)
# Limit the total number of tokens and tokens per message
# As a final example, we can manage the total tokens and individual message tokens. We have added a
# MessageTokenLimiter transform that will limit the total number of tokens for the messages to
# 3,000 with a maximum of 500 per individual message. Additionally, if a message is less than 300
# tokens it will not be truncated.
select_speaker_compression_args = dict(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank", use_llmlingua2=True, device_map="cpu"
)

select_speaker_transforms = transform_messages.TransformMessages(
    transforms=[
        transforms.MessageHistoryLimiter(max_messages=10),
        transforms.TextMessageCompressor(
            min_tokens=1000,
            text_compressor=transforms.LLMLingua(select_speaker_compression_args, structured_compression=True),
            cache=InMemoryCache(seed=43),
            filter_dict={"role": ["system"], "name": ["ceo", "checking_agent"]},
            exclude_filter=True,
        ),
        transforms.MessageTokenLimiter(max_tokens=3000, max_tokens_per_message=500, min_tokens=300),
    ]
)
# Now, we apply the transforms to a group chat. We do this by assigning the message
# transforms from above to the `select_speaker_transform_messages` parameter on the GroupChat.
import os
llm_config = {
    "config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}],
}
# Define your agents
chief_executive_officer = autogen.ConversableAgent(
    "ceo",
    llm_config=llm_config,
    max_consecutive_auto_reply=1,
    system_message="You are leading this group chat, and the business, as the chief executive officer.",
)

general_manager = autogen.ConversableAgent(
    "gm",
    llm_config=llm_config,
    max_consecutive_auto_reply=1,
    system_message="You are the general manager of the business, running the day-to-day operations.",
)

financial_controller = autogen.ConversableAgent(
    "fin_controller",
    llm_config=llm_config,
    max_consecutive_auto_reply=1,
    system_message="You are the financial controller, ensuring all financial matters are managed accordingly.",
)
your_group_chat = autogen.GroupChat(
    agents=[chief_executive_officer, general_manager, financial_controller],
    messages=[],
    select_speaker_transform_messages=select_speaker_transforms,
)
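With the transforms assigned, the group chat runs as usual via a GroupChatManager; each time a next speaker is selected in "auto" mode, the transforms above are applied to the nested speaker selection chat. A brief sketch follows; the opening message is illustrative only.

# Sketch: run the group chat through a GroupChatManager.
# The speaker selection transforms are applied automatically whenever
# the manager needs to pick the next speaker.
group_chat_manager = autogen.GroupChatManager(
    groupchat=your_group_chat,
    llm_config=llm_config,
)

chat_result = chief_executive_officer.initiate_chat(
    group_chat_manager,
    message="Let's discuss the budget priorities for next quarter.",  # illustrative
)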