Starter Tutorial (Using Local LLMs)
This tutorial will show you how to get started building agents with LlamaIndex. We will start with a basic example and then show how to add RAG (Retrieval-Augmented Generation) capabilities.

We will use BAAI/bge-base-en-v1.5 as our embedding model and llama3.1 8B, served through Ollama.
Ollama is a tool that helps you run large language models locally with minimal setup.

Follow the README to learn how to install it.

To download the Llama3.1 model, just run `ollama pull llama3.1`.

NOTE: You will need a machine with at least ~32GB of RAM.
As explained in our installation guide, llama-index is actually a collection of packages. To run Ollama and HuggingFace embeddings, we need to install those integrations:

```bash
pip install llama-index-llms-ollama llama-index-embeddings-huggingface
```

The package names spell out the import paths, which is very helpful for remembering how to import or install them!

```python
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
```

More integrations are all listed on https://llamahub.ai.
Let's start with a simple example using an agent that can perform basic multiplication by calling a tool. Create a file called starter.py:
```python
import asyncio

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.ollama import Ollama


# Define a simple calculator tool
def multiply(a: float, b: float) -> float:
    """Useful for multiplying two numbers."""
    return a * b


# Create an agent workflow with our calculator tool
agent = FunctionAgent(
    tools=[multiply],
    llm=Ollama(
        model="llama3.1",
        request_timeout=360.0,
        # Manually set the context window to limit memory usage
        context_window=8000,
    ),
    system_prompt="You are a helpful assistant that can multiply two numbers.",
)


async def main():
    # Run the agent
    response = await agent.run("What is 1234 * 4567?")
    print(str(response))


# Run the agent
if __name__ == "__main__":
    asyncio.run(main())
```

This will output something like: The answer to 1234 * 4567 is 5635678.
What's happening here:

- The agent was given a question: `What is 1234 * 4567?`
- Under the hood, this question, plus the schema of the tools (name, docstring, and arguments), was passed to the LLM (see the sketch after this list)
- The agent selected the `multiply` tool and wrote the arguments to the tool
- The agent received the result from the tool and interpolated it into the final response
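If you are curious what the LLM actually receives, you can wrap the same function in a FunctionTool and inspect its metadata yourself. A minimal sketch; the explicit wrapping here illustrates what FunctionAgent does with plain functions under the hood:

```python
from llama_index.core.tools import FunctionTool


def multiply(a: float, b: float) -> float:
    """Useful for multiplying two numbers."""
    return a * b


# Wrap the plain function in a tool, as the agent does internally
tool = FunctionTool.from_defaults(fn=multiply)

# The name, the description (derived from the signature and docstring),
# and the argument schema are what get sent to the LLM with your question
print(tool.metadata.name)         # multiply
print(tool.metadata.description)  # signature plus docstring
print(tool.metadata.fn_schema)    # a Pydantic model describing the arguments
```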
The agent is also able to remember previous messages. This conversation history is contained in the Context. If the Context is passed in, the agent will use it to continue the conversation.
```python
from llama_index.core.workflow import Context

# create context
ctx = Context(agent)

# run agent with context
response = await agent.run("My name is Logan", ctx=ctx)
response = await agent.run("What is my name?", ctx=ctx)
```
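If the conversation needs to survive a process restart, the Context can be serialized to a plain dict and restored later. A minimal sketch, hedged on the workflow serializer API shipped with llama-index at the time of writing (JsonSerializer, Context.to_dict/from_dict):

```python
from llama_index.core.workflow import Context, JsonSerializer

# serialize the context to a plain dict (e.g. to store as JSON on disk)
ctx_dict = ctx.to_dict(serializer=JsonSerializer())

# ... later, rebuild the context for the same agent and keep chatting
restored_ctx = Context.from_dict(agent, ctx_dict, serializer=JsonSerializer())
response = await agent.run("What is my name?", ctx=restored_ctx)
```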
Now let's enhance our agent by adding the ability to search through documents. First, let's get some example data using our terminal:

```bash
mkdir data
wget https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt -O data/paul_graham_essay.txt
```

Your directory structure should look like this now:

```
├── starter.py
└── data
    └── paul_graham_essay.txt
```
Now we can create a tool for searching through documents using LlamaIndex. By default, our VectorStoreIndex would use text-embedding-ada-002 embeddings from OpenAI to embed and retrieve the text; since we are running everything locally, we point the global Settings at our HuggingFace embedding model instead.
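If you would like to sanity-check the local embedding model on its own before building the index, here is a small sketch (assuming the bge-base model, which produces 768-dimensional vectors):

```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")

# Embed a single string and check the vector size
embedding = embed_model.get_text_embedding("Paul Graham is an essayist.")
print(len(embedding))  # 768 for bge-base-en-v1.5
```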
Our modified starter.py should look like this:
```python
import asyncio

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

# Settings control global defaults
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
Settings.llm = Ollama(
    model="llama3.1",
    request_timeout=360.0,
    # Manually set the context window to limit memory usage
    context_window=8000,
)

# Create a RAG tool using LlamaIndex
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(
    documents,
    # we can optionally override the embed_model here
    # embed_model=Settings.embed_model,
)
query_engine = index.as_query_engine(
    # we can optionally override the llm here
    # llm=Settings.llm,
)


def multiply(a: float, b: float) -> float:
    """Useful for multiplying two numbers."""
    return a * b


async def search_documents(query: str) -> str:
    """Useful for answering natural language questions about a personal essay written by Paul Graham."""
    response = await query_engine.aquery(query)
    return str(response)


# Create an enhanced workflow with both tools
agent = AgentWorkflow.from_tools_or_functions(
    [multiply, search_documents],
    llm=Settings.llm,
    system_prompt="""You are a helpful assistant that can perform calculations
    and search through documents to answer questions.""",
)


# Now we can ask questions about the documents or do calculations
async def main():
    response = await agent.run(
        "What did the author do in college? Also, what's 7 * 8?"
    )
    print(response)


# Run the agent
if __name__ == "__main__":
    asyncio.run(main())
```

The agent can now seamlessly switch between using the calculator and searching through documents to answer questions.
To avoid reprocessing your documents every time, you can persist the index to disk:
```python
# Save the index
index.storage_context.persist("storage")

# Later, load the index
from llama_index.core import StorageContext, load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir="storage")
index = load_index_from_storage(
    storage_context,
    # we can optionally override the embed_model here
    # it's important to use the same embed_model as the one used to build the index
    # embed_model=Settings.embed_model,
)
query_engine = index.as_query_engine(
    # we can optionally override the llm here
    # llm=Settings.llm,
)
```

If you used a vector store integration rather than the default in-memory store, chances are you can just reload the index from the vector store:

```python
index = VectorStoreIndex.from_vector_store(
    vector_store,
    # it's important to use the same embed_model as the one used to build the index
    # embed_model=Settings.embed_model,
)
```
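Putting this together for a local script, a common pattern is to build the index on the first run and reload it on subsequent runs. A minimal sketch using only the pieces shown above (it assumes the same Settings.embed_model is configured before loading, since an index must be queried with the embedding model that built it):

```python
import os

from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

PERSIST_DIR = "storage"

if os.path.exists(PERSIST_DIR):
    # Subsequent runs: reload the index that was persisted earlier
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)
else:
    # First run: build the index from the documents and persist it
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(PERSIST_DIR)

query_engine = index.as_query_engine()
```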
This is just the beginning of what you can do with LlamaIndex agents! You can:

- Add more tools to your agent
- Use a different LLM
- Customize the agent's behavior with a system prompt
- Add streaming capabilities (see the sketch after this list)
- Implement human-in-the-loop workflows
- Use multiple agents to collaborate on a task
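As one example of streaming, the handler returned by agent.run exposes an event stream; at the time of writing, per-token output arrives as AgentStream events. A minimal sketch, assuming the agent defined above:

```python
import asyncio

from llama_index.core.agent.workflow import AgentStream


async def stream_main():
    handler = agent.run("What did the author do in college?")
    # Print token deltas as the LLM produces them
    async for event in handler.stream_events():
        if isinstance(event, AgentStream):
            print(event.delta, end="", flush=True)
    # Awaiting the handler gives the final response once the stream ends
    final_response = await handler
    print()


if __name__ == "__main__":
    asyncio.run(stream_main())
```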
Some helpful next links: