Knowledge - CrewAI

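What Is Knowledge?

In CrewAI, knowledge is a system that lets AI agents access and use external information sources while they perform tasks. Think of it as giving your agents a reference library they can consult as they work.

Key benefits of using knowledge:

Enhance agents with domain-specific information
Support decisions with real-world data
Maintain context across conversations
Ground responses in factual information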

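Supported Knowledge Sources

CrewAI supports several types of knowledge sources out of the box:

Text Sources

Raw strings
Text files (.txt)
PDF documents

Structured Data

CSV files
Excel spreadsheets
JSON documents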

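Supported Knowledge Parameters

Parameter	Type	Required	Description
`sources`	List[BaseKnowledgeSource]	Yes	List of knowledge sources that provide the content to store and query. Can include PDF, CSV, Excel, JSON, text files, or string content.
`collection_name`	str	No	Name of the collection where the knowledge is stored. Used to tell different sets of knowledge apart. Defaults to "knowledge" if not provided.
`storage`	Optional[KnowledgeStorage]	No	Custom storage configuration for managing how the knowledge is stored and retrieved. If not provided, a default storage is created.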

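Quickstart Example

For file-based knowledge sources, place your files in a knowledge directory at the root of your project. When you create the source, use paths relative to that knowledge directory.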
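The path convention above can be checked before any source is created. The following is a minimal sketch using only the standard library; `resolve_knowledge_paths` is a hypothetical helper for illustration and is not part of CrewAI.

```python
from pathlib import Path


def resolve_knowledge_paths(relative_paths, project_root="."):
    """Map paths relative to the knowledge directory onto the filesystem.

    Hypothetical helper, not part of CrewAI: it only verifies that each
    file exists under <project_root>/knowledge, so that the same relative
    paths can then be handed to a file-based knowledge source.
    """
    knowledge_dir = Path(project_root) / "knowledge"
    resolved = []
    for rel in relative_paths:
        full_path = knowledge_dir / rel
        if not full_path.is_file():
            raise FileNotFoundError(f"Expected knowledge file at {full_path}")
        resolved.append(full_path)
    return resolved
```

Calling `resolve_knowledge_paths(["notes.txt"])` from the project root raises `FileNotFoundError` early if `knowledge/notes.txt` is missing, which can be easier to diagnose than a failure inside the crew run.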

Here is an example that uses string-based knowledge:

Code
from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Create a knowledge source
content = "The user's name is John. He is 30 years old and lives in San Francisco."
string_source = StringKnowledgeSource(
    content=content,
)

# Create an LLM with a temperature of 0 to ensure deterministic outputs
llm = LLM(model="gpt-4o-mini", temperature=0)

# Create an agent with the knowledge store
agent = Agent(
    role="About User",
    goal="You know everything about the user.",
    backstory="""You are a master at understanding people and their preferences.""",
    verbose=True,
    allow_delegation=False,
    llm=llm,
)
task = Task(
    description="Answer the following questions about the user: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[string_source], # Enable knowledge by adding the sources here. You can also add more sources to the sources list.
)

result = crew.kickoff(inputs={"question": "What city does John live in and how old is he?"})

Here is another example, this time using CrewDoclingSource. CrewDoclingSource is versatile and can handle multiple file formats, including TXT, PDF, DOCX, HTML, and more.

Code
from crewai import LLM, Agent, Crew, Process, Task
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource

# Create a knowledge source
content_source = CrewDoclingSource(
    file_paths=[
        "https://lilianweng.github.io/posts/2024-11-28-reward-hacking",
        "https://lilianweng.github.io/posts/2024-07-07-hallucination",
    ],
)

# Create an LLM with a temperature of 0 to ensure deterministic outputs
llm = LLM(model="gpt-4o-mini", temperature=0)

# Create an agent with the knowledge store
agent = Agent(
    role="About papers",
    goal="You know everything about the papers.",
    backstory="""You are a master at understanding papers and their content.""",
    verbose=True,
    allow_delegation=False,
    llm=llm,
)
task = Task(
    description="Answer the following questions about the papers: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[
        content_source
    ],  # Enable knowledge by adding the sources here. You can also add more sources to the sources list.
)

result = crew.kickoff(
    inputs={
        "question": "What is the reward hacking paper about? Be sure to provide sources."
    }
)

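Knowledge Configuration

Chunking Configuration

Knowledge sources automatically split content into chunks for better processing. You can configure the chunking behavior on the knowledge source: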

from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

source = StringKnowledgeSource(
    content="Your content here",
    chunk_size=4000,      # Maximum size of each chunk (default: 4000)
    chunk_overlap=200     # Overlap between chunks (default: 200)
)

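Chunking configuration helps to:

Break large documents into manageable pieces
Maintain context through chunk overlap
Optimize retrieval accuracy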
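To make the interaction between chunk_size and chunk_overlap concrete, here is a minimal sketch of overlapping chunking. It assumes a character-based sliding window; `chunk_text` is an illustrative function, and CrewAI's internal chunking may differ in detail.

```python
def chunk_text(text, chunk_size=4000, chunk_overlap=200):
    """Split text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# Tiny values make the overlap visible: each chunk repeats the last
# character of the previous one.
chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij', 'j']
```

A larger overlap keeps more shared context between neighbouring chunks at the cost of storing more duplicated text.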

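Embeddings Configuration

You can also configure the embedder for the knowledge store. This is useful when you want the knowledge store to use a different embedder than the one used by the agents. The embedder parameter supports a range of embedding model providers, including:

openai: OpenAI's embedding models
google: Google's text embedding models
azure: Azure OpenAI embeddings
ollama: Local embeddings with Ollama
vertexai: Google Cloud VertexAI embeddings
cohere: Cohere's embedding models
bedrock: AWS Bedrock embeddings
huggingface: Hugging Face models
watson: IBM Watson embeddings

For example, the knowledge embedder can be configured to use Google's text-embedding-004 model.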
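A configuration for the google provider might look like the sketch below. The "provider"/"config" dictionary layout, the "models/text-embedding-004" model string, and the GEMINI_API_KEY environment variable name are assumptions rather than confirmed details of the text above, so verify them against the CrewAI version you have installed. `google_embedder_config` is a hypothetical helper.

```python
import os


def google_embedder_config(api_key):
    """Build an embedder configuration dict for the google provider.

    Assumed layout: a "provider" name plus a provider-specific "config".
    """
    return {
        "provider": "google",
        "config": {
            "model": "models/text-embedding-004",  # assumed model identifier
            "api_key": api_key,
        },
    }


# GEMINI_API_KEY is an assumed environment variable name.
embedder = google_embedder_config(os.environ.get("GEMINI_API_KEY", ""))

# The dict would then be passed to the crew through its embedder parameter,
# for example:
# crew = Crew(agents=[agent], tasks=[task],
#             knowledge_sources=[string_source], embedder=embedder)
```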

Clearing Knowledge

If you need to clear the knowledge stored in CrewAI, use the crewai reset-memories command with the --knowledge option.

Command
crewai reset-memories --knowledge

This is useful when you have updated your knowledge sources and want to make sure the agents use the latest information.

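Agent-Specific Knowledge

Knowledge can be provided at the crew level through crew.knowledge_sources, but individual agents can also have their own knowledge sources through the knowledge_sources parameter: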

Code
from crewai import Agent, Task, Crew
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Create agent-specific knowledge about a product
product_specs = StringKnowledgeSource(
    content="""The XPS 13 laptop features:
    - 13.4-inch 4K display
    - Intel Core i7 processor
    - 16GB RAM
    - 512GB SSD storage
    - 12-hour battery life""",
    metadata={"category": "product_specs"}
)

# Create a support agent with product knowledge
support_agent = Agent(
    role="Technical Support Specialist",
    goal="Provide accurate product information and support.",
    backstory="You are an expert on our laptop products and specifications.",
    knowledge_sources=[product_specs]  # Agent-specific knowledge
)

# Create a task that requires product knowledge
support_task = Task(
    description="Answer this customer question: {question}",
    expected_output="An answer to the customer's question.",  # Task requires an expected_output
    agent=support_agent
)

# Create and run the crew
crew = Crew(
    agents=[support_agent],
    tasks=[support_task]
)

# Get answer about the laptop's specifications
result = crew.kickoff(
    inputs={"question": "What is the storage capacity of the XPS 13?"}
)

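Benefits of agent-specific knowledge:

Give agents specialized information relevant to their role
Maintain separation of concerns between agents
Combine with crew-level knowledge for layered information access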

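Custom Knowledge Sources

CrewAI lets you create custom knowledge sources for any type of data by extending the BaseKnowledgeSource class. Let's walk through a practical example that fetches and processes space news articles.

Space News Knowledge Source Example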

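Key Components Explained

Custom knowledge source (SpaceNewsKnowledgeSource):
- Extends BaseKnowledgeSource for integration with CrewAI
- Configurable API endpoint and article limit
- Implements three key methods:
  - load_content(): fetches articles from the API
  - _format_articles(): structures the articles into readable text
  - add(): processes and stores the content
Agent configuration:
- Specialized role as a space news analyst
- Uses the knowledge source to access space news
Task setup:
- Takes the user's question as input through {user_question}
- Designed to provide detailed answers based on the knowledge source
Crew orchestration:
- Manages the workflow between the agent and the task
- Handles input/output through the kickoff method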
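The formatting step listed above can be sketched as a standalone function. This is an illustration only: `format_articles` is a hypothetical stand-in for the _format_articles() method, the field names (title, published_at, summary, news_site, url) are assumed to match the API's JSON, and the sample article is placeholder data.

```python
def format_articles(articles):
    """Turn a list of article dicts into one readable text block.

    Missing fields fall back to "n/a" so a partial record does not
    raise a KeyError.
    """
    lines = ["Space News Articles:", ""]
    for article in articles:
        lines.extend([
            f"Title: {article.get('title', 'n/a')}",
            f"Published: {article.get('published_at', 'n/a')}",
            f"Summary: {article.get('summary', 'n/a')}",
            f"News Site: {article.get('news_site', 'n/a')}",
            f"Original URL: {article.get('url', 'n/a')}",
            "-" * 19,
        ])
    return "\n".join(lines)


# Placeholder article, not real API output.
sample = [{
    "title": "Example headline",
    "published_at": "2024-01-01T00:00:00Z",
    "summary": "Example summary.",
    "news_site": "Example Site",
    "url": "https://example.com/article",
}]
print(format_articles(sample))
```

In a custom source, load_content() would return this text keyed by the endpoint, and add() would chunk and store it.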

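This example demonstrates how to:

Create a custom knowledge source that fetches real-time data
Process and format external data for AI consumption
Use the knowledge source to answer specific user questions
Integrate everything with CrewAI's agent system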

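About the Spaceflight News API

The example uses the Spaceflight News API, which:

Provides free access to space-related news articles
Requires no authentication
Returns structured data about space news
Supports pagination and filtering

You can customize the API query by modifying the endpoint URL: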

# Fetch more articles
recent_news = SpaceNewsKnowledgeSource(
    api_endpoint="https://api.spaceflightnewsapi.net/v4/articles",
    limit=20,  # Increase the number of articles
)

# Add search parameters
recent_news = SpaceNewsKnowledgeSource(
    api_endpoint="https://api.spaceflightnewsapi.net/v4/articles?search=NASA", # Search for NASA news
    limit=10,
)
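One caveat when the endpoint already carries a query string: if a custom source appends its limit by plain string concatenation, a URL such as the NASA search above could end up with two ? characters. The sketch below merges parameters with the standard library instead; `with_query_params` is a hypothetical helper and not part of CrewAI or the API.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_params(url, **params):
    """Return url with params merged into its existing query string."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))  # keep parameters already present
    query.update({key: str(value) for key, value in params.items()})
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_query_params(
    "https://api.spaceflightnewsapi.net/v4/articles?search=NASA", limit=10
))
# https://api.spaceflightnewsapi.net/v4/articles?search=NASA&limit=10
```

A load_content() implementation could call such a helper with the configured limit so that both plain and pre-filtered endpoints produce valid URLs.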
