
Memory in LlamaIndex

The Memory class in LlamaIndex is used to store and retrieve both short-term and long-term memory.

You can use it on its own and orchestrate it within a custom workflow, or use it within an existing agent.

By default, short-term memory is represented as a FIFO queue of ChatMessage objects. Once the queue exceeds a certain size, the oldest messages within the flush size are archived and optionally flushed to long-term memory blocks.
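To make the flushing behavior concrete, here is a minimal conceptual sketch of a token-limited FIFO queue (this is not LlamaIndex's actual implementation; token counts are approximated by word counts, and the limits are hypothetical):

```python
from collections import deque

def put_message(queue, archive, msg, token_limit=50, flush_tokens=10):
    """Append a message; when over the token limit, flush the oldest
    messages (in batches of roughly flush_tokens) into an archive."""
    queue.append(msg)
    while sum(len(m.split()) for m in queue) > token_limit:
        flushed = 0
        while queue and flushed < flush_tokens:
            old = queue.popleft()
            archive.append(old)  # long-term memory blocks could consume these
            flushed += len(old.split())

queue, archive = deque(), []
for i in range(30):
    put_message(queue, archive, f"message {i} with some words")

# The newest messages stay in the queue; older ones were archived.
print(len(queue), len(archive))
```

The key property is that short-term memory stays bounded while nothing is lost: flushed messages move to the archive rather than being discarded.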

Long-term memory is represented as Memory Block objects. These objects receive the messages that are flushed from short-term memory, and optionally process them to extract information. Then, when memory is retrieved, the short-term and long-term memories are merged together.

This notebook will use OpenAI as the LLM/embedding model for various parts of the examples.

For vector retrieval, we will rely on Chroma as the vector store.

%pip install llama-index-core llama-index-llms-openai llama-index-embeddings-openai llama-index-vector-stores-chroma
import os
os.environ["OPENAI_API_KEY"] = "sk-proj-..."

Let's explore how to configure the various components of short-term memory.

For visualization purposes, we will set a small token limit so that the memory behavior is easier to observe.

from llama_index.core.memory import Memory

memory = Memory.from_defaults(
    session_id="my_session",
    token_limit=50,  # Normally you would set this to be closer to the LLM context window (i.e. 75,000, etc.)
    token_flush_size=10,
    chat_history_token_ratio=0.7,
)

Let's review the configuration that was used and what it means:

  • session_id: A unique identifier for the session. Used to mark chat messages in the SQL database as belonging to a specific session.
  • token_limit: The maximum number of tokens that can be stored in short-term + long-term memory.
  • chat_history_token_ratio: The ratio of tokens in the short-term chat history to the total token limit. Here this means that 50*0.7 = 35 tokens are allocated to short-term memory, and the rest to long-term memory.
  • token_flush_size: The number of tokens to flush to long-term memory when the token limit is exceeded. Note that we did not configure long-term memory blocks, so these messages will simply be archived in the database and removed from short-term memory.
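The budget split implied by these settings can be spelled out as plain arithmetic (a worked example mirroring the documented behavior, not library code):

```python
# Configuration from the Memory.from_defaults call above
token_limit = 50
chat_history_token_ratio = 0.7
token_flush_size = 10

# Short-term chat history gets ratio * limit tokens; the remainder
# of the budget is available to long-term memory content.
short_term_budget = int(token_limit * chat_history_token_ratio)
long_term_budget = token_limit - short_term_budget

print(short_term_budget, long_term_budget)  # → 35 15
```

Once the short-term queue exceeds its 35-token budget, batches of roughly token_flush_size (10) tokens are flushed out of it.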

With our memory created, we can manually add some messages to see how it works.

from llama_index.core.llms import ChatMessage

# Simulate a long conversation
for i in range(100):
    await memory.aput_messages(
        [
            ChatMessage(role="user", content="Hello, world!"),
            ChatMessage(role="assistant", content="Hello, world to you too!"),
            ChatMessage(role="user", content="What is the capital of France?"),
            ChatMessage(
                role="assistant", content="The capital of France is Paris."
            ),
        ]
    )

Since our token limit is small, we will only see the last 4 messages in short-term memory (since this is what fits within the 50*0.7 limit).

current_chat_history = await memory.aget()
for msg in current_chat_history:
    print(msg)
user: Hello, world!
assistant: Hello, world to you too!
user: What is the capital of France?
assistant: The capital of France is Paris.

If we retrieve all messages, we will find all 400 of them.

all_messages = await memory.aget_all()
print(len(all_messages))
400

We can also clear the memory at any time and start over.

await memory.areset()
all_messages = await memory.aget_all()
print(len(all_messages))
0

Long-term memory is represented as Memory Block objects. These objects receive the messages that are flushed from short-term memory, and optionally process them to extract information. Then, when memory is retrieved, the short-term and long-term memories are merged together.

LlamaIndex provides 3 prebuilt memory blocks:

  • StaticMemoryBlock: A memory block that stores a static piece of information.
  • FactExtractionMemoryBlock: A memory block that extracts facts from the chat history.
  • VectorMemoryBlock: A memory block that stores and retrieves batches of chat messages from a vector database.

Each block has a priority that is used when the long-term memory + short-term memory exceeds the token limit. Priority 0 means the block will always be kept in memory, priority 1 means the block will be temporarily disabled, and so on.
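A simplified sketch of this priority rule (conceptual only, not the library's internal algorithm; block names and token counts are made up for illustration):

```python
def truncate_blocks(blocks, token_limit):
    """blocks: list of (name, priority, token_count) tuples.
    Drops blocks with the highest priority number first until the
    total fits the limit; priority-0 blocks are never dropped."""
    kept = list(blocks)
    while sum(b[2] for b in kept) > token_limit:
        droppable = [b for b in kept if b[1] > 0]
        if not droppable:
            break  # only priority-0 blocks remain; keep them regardless
        kept.remove(max(droppable, key=lambda b: b[1]))
    return [b[0] for b in kept]

blocks = [("core_info", 0, 40), ("extracted_info", 1, 50), ("vector_memory", 2, 60)]
print(truncate_blocks(blocks, 100))  # → ['core_info', 'extracted_info']
```

With a 100-token limit, the priority-2 vector_memory block is disabled first, while the priority-0 core_info block survives any limit.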

from llama_index.core.memory import (
    StaticMemoryBlock,
    FactExtractionMemoryBlock,
    VectorMemoryBlock,
)
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

llm = OpenAI(model="gpt-4.1-mini")
embed_model = OpenAIEmbedding(model="text-embedding-3-small")

client = chromadb.EphemeralClient()
vector_store = ChromaVectorStore(
    chroma_collection=client.get_or_create_collection("test_collection")
)

blocks = [
    StaticMemoryBlock(
        name="core_info",
        static_content="My name is Logan, and I live in Saskatoon. I work at LlamaIndex.",
        priority=0,
    ),
    FactExtractionMemoryBlock(
        name="extracted_info",
        llm=llm,
        max_facts=50,
        priority=1,
    ),
    VectorMemoryBlock(
        name="vector_memory",
        # required: pass in a vector store like qdrant, chroma, weaviate, milvus, etc.
        vector_store=vector_store,
        priority=2,
        embed_model=embed_model,
        # The top-k message batches to retrieve
        # similarity_top_k=2,
        # optional: How many previous messages to include in the retrieval query
        # retrieval_context_window=5,
        # optional: pass optional node-postprocessors for things like similarity threshold, etc.
        # node_postprocessors=[...],
    ),
]

With our blocks created, we can pass them into the Memory class.

from llama_index.core.memory import Memory

memory = Memory.from_defaults(
    session_id="my_session",
    token_limit=30000,
    # Setting an extremely low ratio so that more tokens are flushed to long-term memory
    chat_history_token_ratio=0.02,
    token_flush_size=500,
    memory_blocks=blocks,
    # insert into the latest user message, can also be "system"
    insert_method="user",
)

With this, we can simulate a conversation with an agent and inspect its long-term memory.

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI

agent = FunctionAgent(
    tools=[],
    llm=llm,
)

user_msgs = [
    "Hi! My name is Logan",
    "What is your opinion on minature shnauzers?",
    "Do they shed a lot?",
    "What breeds are comparable in size?",
    "What is your favorite breed?",
    "Would you recommend owning a dog?",
    "What should I buy to prepare for owning a dog?",
]

for user_msg in user_msgs:
    _ = await agent.run(user_msg=user_msg, memory=memory)

Now, let's inspect the most recent user message and see what the memory inserted into it.

Note that we pass in at least one chat message so that the vector memory actually runs a retrieval.

chat_history = await memory.aget()
print(len(chat_history))
2

Great, we can see that the current FIFO queue holds only 2 messages (expected, since we set the chat history token ratio to 0.02).

Now, let's inspect the long-term memory blocks that were inserted into the latest user message.

for block in chat_history[-2].blocks:
    print(block.text)
<memory>
<core_info>
My name is Logan, and I live in Saskatoon. I work at LlamaIndex.
</core_info>
<extracted_info>
<fact>User's name is Logan</fact>
<fact>User lives in Saskatoon</fact>
<fact>User works at LlamaIndex</fact>
<fact>User is interested in Miniature Schnauzers</fact>
</extracted_info>
<vector_memory>
<message role='user'>Hi! My name is Logan</message>
<message role='assistant'>Hi Logan! Nice to meet you. How can I assist you today?</message>
<message role='user'>What is your opinion on minature shnauzers?</message>
<message role='assistant'>Hi Logan! Miniature Schnauzers are wonderful dogs—they're known for being intelligent, friendly, and energetic. They often make great companions because they're loyal and good with families. Plus, their distinctive beard and eyebrows give them a charming, expressive look. Do you have one, or are you thinking about getting one?</message>
<message role='user'>Do they shed a lot?</message>
<message role='assistant'>Hi Logan! Miniature Schnauzers are actually known for being low shedders. They have a wiry double coat that doesn't shed much, which makes them a good choice for people who prefer a cleaner home or have mild allergies. However, their coat does require regular grooming and trimming to keep it looking its best. Since you’re in Saskatoon, the grooming routine might also help keep them comfortable through the changing seasons. Are you considering getting one as a pet?</message>
<message role='user'>What breeds are comparable in size?</message>
<message role='assistant'>Hi Logan! Miniature Schnauzers typically weigh between 11 to 20 pounds (5 to 9 kg) and stand about 12 to 14 inches (30 to 36 cm) tall at the shoulder. Breeds comparable in size include:
- **Cairn Terrier**
- **West Highland White Terrier (Westie)**
- **Scottish Terrier**
- **Pomeranian** (though usually a bit smaller)
- **Beagle** (on the smaller side of the breed)
- **French Bulldog** (a bit stockier but similar in height)
These breeds are similar in size and can have comparable energy levels and grooming needs, depending on the breed. If you’re thinking about a dog that fits well with your lifestyle in Saskatoon and your work at LlamaIndex, I’d be happy to help you explore options!</message>
<message role='user'>What is your favorite breed?</message>
<message role='assistant'>Hi Logan! I don't have personal preferences, but I really appreciate breeds like the Miniature Schnauzer because of their intelligence, friendly nature, and low-shedding coat. They seem like great companions, especially for someone living in a place with changing seasons like Saskatoon. Do you have a favorite breed, or one you’re particularly interested in?</message>
<message role='user'>Would you recommend owning a dog?</message>
<message role='assistant'>Hi Logan! Owning a dog can be a wonderful experience, offering companionship, exercise, and even stress relief. Since you live in Saskatoon, where the seasons can be quite distinct, a dog can be a great motivator to get outside and enjoy the fresh air year-round.
That said, it’s important to consider your lifestyle and work schedule at LlamaIndex. Dogs require time, attention, and care—regular walks, playtime, grooming, and vet visits. If you have the time and energy to commit, a dog can be a fantastic addition to your life. Breeds like Miniature Schnauzers, which are adaptable and relatively low-maintenance in terms of shedding, might be a good fit.
If you’re unsure, maybe start by volunteering at a local animal shelter or fostering a dog to see how it fits with your routine. Would you like tips on how to prepare for dog ownership or suggestions on breeds that suit your lifestyle?</message>
</vector_memory>
</memory>
What should I buy to prepare for owning a dog?

To use this memory outside an agent, and to highlight more of what's going on, you might do something like the following:

new_user_msg = ChatMessage(
    role="user", content="What kind of dog was I asking about?"
)
await memory.aput(new_user_msg)

# Get the new chat history
new_chat_history = await memory.aget()
resp = await llm.achat(new_chat_history)
await memory.aput(resp.message)
print(resp.message.content)
You were asking about Miniature Schnauzers.