Simple Composable Memory

Note: This memory example is deprecated in favor of the newer and more flexible Memory class. See the latest documentation.

In this notebook, we demonstrate how to inject multiple memory sources into an agent. Specifically, we use SimpleComposableMemory, which is comprised of a primary_memory as well as potentially several secondary memory sources (stored in secondary_memory_sources). The main difference is that primary_memory is used as the agent's main chat buffer, whereas any messages retrieved from secondary_memory_sources are injected only into the system prompt message.

Multiple memory sources can be useful in situations where you want to use a long-term memory (such as VectorMemory) in addition to the default ChatMemoryBuffer. As you'll see in this notebook, with a SimpleComposableMemory you can effectively "load" the desired messages from your long-term memory into the main memory (i.e., the ChatMemoryBuffer).

How SimpleComposableMemory Works?

We begin with the basic usage of SimpleComposableMemory. Here we construct a VectorMemory as well as a default ChatMemoryBuffer. The VectorMemory will be our secondary memory source, whereas the ChatMemoryBuffer will be the primary one. To instantiate a SimpleComposableMemory object, we need to supply a primary_memory and (optionally) a list of secondary_memory_sources.

[Figure: SimpleComposableMemory illustration]

import os

os.environ["OPENAI_API_KEY"] = "sk-..."

from llama_index.core.memory import (
    VectorMemory,
    SimpleComposableMemory,
    ChatMemoryBuffer,
)
from llama_index.core.llms import ChatMessage
from llama_index.embeddings.openai import OpenAIEmbedding

vector_memory = VectorMemory.from_defaults(
    vector_store=None,  # leave as None to use default in-memory vector store
    embed_model=OpenAIEmbedding(),
    retriever_kwargs={"similarity_top_k": 1},
)
# let's set some initial messages in our secondary vector memory
msgs = [
    ChatMessage.from_str("You are a SOMEWHAT helpful assistant.", "system"),
    ChatMessage.from_str("Bob likes burgers.", "user"),
    ChatMessage.from_str("Indeed, Bob likes apples.", "assistant"),
    ChatMessage.from_str("Alice likes apples.", "user"),
]
vector_memory.set(msgs)

chat_memory_buffer = ChatMemoryBuffer.from_defaults()

composable_memory = SimpleComposableMemory.from_defaults(
    primary_memory=chat_memory_buffer,
    secondary_memory_sources=[vector_memory],
)
composable_memory.primary_memory
ChatMemoryBuffer(chat_store=SimpleChatStore(store={}), chat_store_key='chat_history', token_limit=3000, tokenizer_fn=functools.partial(<bound method Encoding.encode of <Encoding 'cl100k_base'>>, allowed_special='all'))
composable_memory.secondary_memory_sources
[VectorMemory(vector_index=<llama_index.core.indices.vector_store.base.VectorStoreIndex object at 0x11a2d24b0>, retriever_kwargs={'similarity_top_k': 1}, batch_by_user_message=True, cur_batch_textnode=TextNode(id_='97f800fe-1988-44d8-a6dc-7a07bfd30f8e', embedding=None, metadata={'sub_dicts': [{'role': <MessageRole.USER: 'user'>, 'additional_kwargs': {}, 'blocks': [{'block_type': 'text', 'text': 'Alice likes apples.'}], 'content': 'Alice likes apples.'}]}, excluded_embed_metadata_keys=['sub_dicts'], excluded_llm_metadata_keys=['sub_dicts'], relationships={}, metadata_template='{key}: {value}', metadata_separator='\n', text='Alice likes apples.', mimetype='text/plain', start_char_idx=None, end_char_idx=None, metadata_seperator='\n', text_template='{metadata_str}\n\n{content}'))]

Since SimpleComposableMemory is itself a subclass of BaseMemory, we add messages to it in the same way as we do for other memory modules. Note that for SimpleComposableMemory, invoking .put() effectively calls .put() on all memory sources. In other words, the message gets added to both the primary and secondary sources.

msgs = [
    ChatMessage.from_str("You are a REALLY helpful assistant.", "system"),
    ChatMessage.from_str("Jerry likes juice.", "user"),
]

# load into all memory source modules
for m in msgs:
    composable_memory.put(m)
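
As a quick sanity check, we can confirm the new messages landed in both sources. Both calls below use only objects defined above; the exact retrieval result depends on embedding similarity, so treat the expected values as a sketch:

# the primary chat buffer should now end with the new user message
print(composable_memory.primary_memory.get()[-1].content)  # expected: Jerry likes juice.
# the same message should also be retrievable from the secondary vector memory
print(vector_memory.get("What does Jerry like?")[-1].content)  # expected: Jerry likes juice.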

When .get() is invoked, we similarly execute the .get() methods of the primary memory as well as of all the secondary sources. This leaves us with a sequence of lists of messages that we must "compose" into a sensible single set of messages (to pass downstream to our agents). Special care must be taken to ensure that the final sequence of messages is both sensible and conforms to the chat API of the LLM provider.

For SimpleComposableMemory, we take the messages from the secondary sources and inject them into the system message of the primary memory. The rest of the message history of the primary source is left intact, and this composition is what is ultimately returned.

msgs = composable_memory.get("What does Bob like?")
msgs
[ChatMessage(role=<MessageRole.SYSTEM: 'system'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text='You are a REALLY helpful assistant.\n\nBelow are a set of relevant dialogues retrieved from potentially several memory sources:\n\n=====Relevant messages from memory source 1=====\n\n\tUSER: Bob likes burgers.\n\tASSISTANT: Indeed, Bob likes apples.\n\n=====End of relevant messages from memory source 1======\n\nThis is the end of the retrieved message dialogues.')]),
ChatMessage(role=<MessageRole.USER: 'user'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text='Jerry likes juice.')])]
# see the memory injected into the system message of the primary memory
print(msgs[0])
system: You are a REALLY helpful assistant.

Below are a set of relevant dialogues retrieved from potentially several memory sources:

=====Relevant messages from memory source 1=====

	USER: Bob likes burgers.
	ASSISTANT: Indeed, Bob likes apples.

=====End of relevant messages from memory source 1======

This is the end of the retrieved message dialogues.

Successive calls to get() will simply replace the previously loaded secondary memory messages in the system prompt.

msgs = composable_memory.get("What does Alice like?")
msgs
[ChatMessage(role=<MessageRole.SYSTEM: 'system'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text='You are a REALLY helpful assistant.\n\nBelow are a set of relevant dialogues retrieved from potentially several memory sources:\n\n=====Relevant messages from memory source 1=====\n\n\tUSER: Alice likes apples.\n\n=====End of relevant messages from memory source 1======\n\nThis is the end of the retrieved message dialogues.')]),
ChatMessage(role=<MessageRole.USER: 'user'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text='Jerry likes juice.')])]
# see the memory injected into the system message of the primary memory
print(msgs[0])
system: You are a REALLY helpful assistant.

Below are a set of relevant dialogues retrieved from potentially several memory sources:

=====Relevant messages from memory source 1=====

	USER: Alice likes apples.

=====End of relevant messages from memory source 1======

This is the end of the retrieved message dialogues.

What if get() retrieves secondary messages that already exist in primary memory?

If messages retrieved from the secondary memory already exist in the primary memory, then these rather redundant secondary messages will not get added to the system message. In the example below, the message "Jerry likes juice." was put into all memory sources, so the system message is not altered.

msgs = composable_memory.get("What does Jerry like?")
msgs
[ChatMessage(role=<MessageRole.SYSTEM: 'system'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text='You are a REALLY helpful assistant.')]),
ChatMessage(role=<MessageRole.USER: 'user'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text='Jerry likes juice.')])]

Similar to the other methods put() and get(), calling reset() will execute reset() on both the primary and secondary memory sources. If you want to reset only the primary memory, call its reset() method directly.

composable_memory.primary_memory.reset()
composable_memory.primary_memory.get()
[]
composable_memory.secondary_memory_sources[0].get("What does Alice like?")
[ChatMessage(role=<MessageRole.USER: 'user'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text='Alice likes apples.')])]
composable_memory.reset()
composable_memory.primary_memory.get()
[]
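
To round out the example, querying the secondary source after the composed reset() should now come back empty as well (a quick illustrative check; the exact return value of a query against an empty VectorMemory may vary):

# after composable_memory.reset(), the secondary vector memory is cleared too,
# so the query that previously returned Alice's message should now find nothing
composable_memory.secondary_memory_sources[0].get("What does Alice like?")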

Use SimpleComposableMemory With An Agent

Here we use a SimpleComposableMemory with an agent and demonstrate how a secondary, long-term memory source can be used to carry messages from one agent conversation into another conversation with a different agent.

from llama_index.llms.openai import OpenAI
from llama_index.core.tools import FunctionTool
from llama_index.core.agent.workflow import FunctionAgent

vector_memory = VectorMemory.from_defaults(
    vector_store=None,  # leave as None to use default in-memory vector store
    embed_model=OpenAIEmbedding(),
    retriever_kwargs={"similarity_top_k": 2},
)

chat_memory_buffer = ChatMemoryBuffer.from_defaults()

composable_memory = SimpleComposableMemory.from_defaults(
    primary_memory=chat_memory_buffer,
    secondary_memory_sources=[vector_memory],
)


def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the result."""
    return a * b


def mystery(a: int, b: int) -> int:
    """Mystery function on two numbers."""
    return a**2 - b**2


multiply_tool = FunctionTool.from_defaults(fn=multiply)
mystery_tool = FunctionTool.from_defaults(fn=mystery)

llm = OpenAI(model="gpt-4.1-mini")

agent = FunctionAgent(
    tools=[multiply_tool, mystery_tool],
    llm=llm,
)

When .run() is invoked, the messages are put into the composable memory, which, as we understood from the previous section, means that all messages are put into both the primary and secondary sources.

response = await agent.run(
    "What is the mystery function on 5 and 6?", memory=composable_memory
)
print(str(response))
The mystery function on 5 and 6 returns -11.
response = await agent.run(
    "What happens if you multiply 2 and 3?", memory=composable_memory
)
print(str(response))
If you multiply 2 and 3, the result is 6.
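
Before moving on, we can peek into the secondary source to confirm the conversation was recorded there. This is a quick illustrative check using the vector_memory defined above; the exact messages returned depend on retrieval:

# the user/assistant (and tool) messages from the runs above should now be
# retrievable from the long-term vector memory
for m in vector_memory.get("mystery function on 5 and 6"):
    print(m.role, ":", m.content)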

Now that we've added messages to our vector_memory, we can see the effect of using this memory in a new agent session versus not using it. Specifically, we ask the new agent to "recall" the outputs of the function calls, rather than recomputing them.

An Agent without our past memory

response = await agent.run(
    "What was the output of the mystery function on 5 and 6 again? Don't recompute."
    # memory=composable_memory
)
print(str(response))
I don't have the previous output of the mystery function on 5 and 6 stored. If you want, I can recompute it for you. Would you like me to do that?

An Agent with our past memory

We see that the agent without access to our past memory is unable to complete the task. With this next agent, we do pass in our previous long-term memory (i.e., the vector_memory). Note that even with a brand-new ChatMemoryBuffer (i.e., no chat_history), the agent would still be able to retrieve the past dialogue it needs from our long-term memory.

response = await agent.run(
    "What was the output of the mystery function on 5 and 6 again? Don't recompute.",
    memory=composable_memory,
)
print(str(response))
The output of the mystery function on 5 and 6 is -11.
response = await agent.run(
    "What was the output of the multiply function on 2 and 3 again? Don't recompute.",
    memory=composable_memory,
)
print(str(response))
The output of the multiply function on 2 and 3 was 6.

Under the hood, the .run(user_input) call actually invokes the memory's .get() method with user_input as its argument. As we learned in the previous section, this ultimately returns a composition of the primary and all secondary memory sources. These composed messages are exactly what gets passed to the LLM's chat API as the chat history.
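
To make this concrete, here is a minimal sketch of that composition step, using the composable_memory built above (the query string is just an example):

# roughly what .run() does under the hood: fetch the composed history for the
# new user input; these messages are what get sent to the LLM chat API
combined_history = composable_memory.get(
    "What was the output of the mystery function on 5 and 6 again?"
)
for msg in combined_history:
    print(msg.role, ":", str(msg.content)[:100])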