An Introduction to LlamaIndex Query Pipelines
Overview
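LlamaIndex provides a declarative query API that allows you to chain together different modules in order to orchestrate simple-to-advanced workflows over your data.
This is centered around our QueryPipeline abstraction. Load in a variety of modules (from LLMs to prompts to retrievers to other pipelines), connect them all together into a sequential chain or DAG, and run it end-to-end.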
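NOTE: You can orchestrate all of these workflows without the declarative pipeline abstraction (by using the modules imperatively and writing your own functions). So what are the advantages of QueryPipeline?
- Express common workflows with fewer lines of code/boilerplate
- Greater readability
- Greater compatibility with, and better integration points into, common low-code/no-code solutions (e.g. LangFlow)
- [In the future] A declarative interface will allow easy serialization of pipeline components, providing portability of pipelines and easier deployment to different systems.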
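Cookbook
In this cookbook we give you an introduction to our QueryPipeline interface and show you some basic workflows you can tackle.
- Chain together a prompt and an LLM
- Chain together query rewriting (prompt + LLM) with retrieval
- Chain together a full RAG query pipeline (query rewriting, retrieval, reranking, response synthesis)
- Set up a custom query component
- Execute a pipeline step by step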
Setup
Here we set up some data + indexes (from Paul Graham's essay) that we'll be using in the rest of the cookbook.
%pip install llama-index-embeddings-openai
%pip install llama-index-postprocessor-cohere-rerank
%pip install llama-index-llms-openai
# setup Arize Phoenix for logging/observability
import phoenix as px
px.launch_app()
import llama_index.core
llama_index.core.set_global_handler("arize_phoenix")
🌍 To view the Phoenix app in your browser, visit http://127.0.0.1:6006/ 📺 To view the Phoenix app in a notebook, run `px.active_session().view()` 📖 For more information on how to use Phoenix, check out https://docs.arize.com/phoenix
import os
os.environ["OPENAI_API_KEY"] = "sk-..."
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings
Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
from llama_index.core import SimpleDirectoryReader
reader = SimpleDirectoryReader("../data/paul_graham")
docs = reader.load_data()
import os
from llama_index.core import (
StorageContext,
VectorStoreIndex,
load_index_from_storage,
)
if not os.path.exists("storage"):
    index = VectorStoreIndex.from_documents(docs)
    # save index to disk
    index.set_index_id("vector_index")
    index.storage_context.persist("./storage")
else:
    # rebuild storage context
    storage_context = StorageContext.from_defaults(persist_dir="storage")
    # load index
    index = load_index_from_storage(storage_context, index_id="vector_index")
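1. Chain Together Prompt and LLM
In this section we show a super-simple workflow of chaining together a prompt with an LLM.
We simply define `chain` on initialization. This is a special case of a query pipeline where the components are purely sequential, and we automatically convert outputs into the right format for the next inputs.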
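As a rough mental model of a purely sequential chain, here is a stdlib-only sketch; it is not LlamaIndex's implementation. `Module`, `run_chain`, and `FakeChatResponse` are hypothetical names, and the only "conversion" assumed is calling `str()` on each output before it becomes the next module's single input. That is also why a stringified reply keeps an `assistant: ` role prefix, much like the verbose logs in this notebook.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class FakeChatResponse:
    """Hypothetical stand-in for an LLM chat response."""

    role: str
    content: str

    def __str__(self) -> str:
        # Stringifying keeps the role prefix, e.g. "assistant: ..."
        return f"{self.role}: {self.content}"


@dataclass
class Module:
    """Hypothetical module that takes exactly one named input.

    Only the first module's key is read; later keys just document the slot.
    """

    input_key: str
    fn: Callable[[Any], Any]


def run_chain(modules: List[Module], **kwargs: Any) -> Any:
    """Run modules in order, piping each output into the next module."""
    # The first module reads its single input from the keyword arguments.
    first = modules[0]
    value = first.fn(kwargs[first.input_key])
    for module in modules[1:]:
        # Convert the previous output to a string before passing it on.
        value = module.fn(str(value))
    return value


prompt = Module("movie_name", lambda name: f"Please generate related movies to {name}")
fake_llm = Module("messages", lambda msg: FakeChatResponse("assistant", f"(reply to: {msg})"))

output = run_chain([prompt, fake_llm], movie_name="The Departed")
print(str(output))
# assistant: (reply to: Please generate related movies to The Departed)
```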
from llama_index.core.query_pipeline import QueryPipeline
from llama_index.core import PromptTemplate
# try chaining basic prompts
prompt_str = "Please generate related movies to {movie_name}"
prompt_tmpl = PromptTemplate(prompt_str)
llm = OpenAI(model="gpt-3.5-turbo")
p = QueryPipeline(chain=[prompt_tmpl, llm], verbose=True)
output = p.run(movie_name="The Departed")
> Running module 8dc57d24-9691-4d8d-87d7-151865a7cd1b with input: movie_name: The Departed > Running module 7ed9e26c-a704-4b0b-9cfd-991266e754c0 with input: messages: Please generate related movies to The Departed
print(str(output))
assistant: 1. Infernal Affairs (2002) - The original Hong Kong film that inspired The Departed 2. The Town (2010) - A crime thriller directed by and starring Ben Affleck 3. Mystic River (2003) - A crime drama directed by Clint Eastwood 4. Goodfellas (1990) - A classic mobster film directed by Martin Scorsese 5. The Irishman (2019) - Another crime drama directed by Martin Scorsese, starring Robert De Niro and Al Pacino 6. The Departed (2006) - The Departed is a 2006 American crime film directed by Martin Scorsese and written by William Monahan. It is a remake of the 2002 Hong Kong film Infernal Affairs. The film stars Leonardo DiCaprio, Matt Damon, Jack Nicholson, and Mark Wahlberg, with Martin Sheen, Ray Winstone, Vera Farmiga, and Alec Baldwin in supporting roles.
Viewing Intermediate Inputs/Outputs
For debugging and other purposes, we can also view the inputs and outputs of each step.
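Conceptually, collecting intermediates just means recording each module's inputs and outputs under its module id. The sketch below is a hypothetical stand-in (`toy_run_with_intermediates` and `ToyIntermediates` are illustrative names, and it uses readable keys instead of the UUIDs in the logs), not LlamaIndex's code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

# Hypothetical step spec: (input_key, output_key, function).
Step = Tuple[str, str, Callable[[Any], Any]]


@dataclass
class ToyIntermediates:
    """Inputs and outputs recorded for one module."""

    inputs: Dict[str, Any]
    outputs: Dict[str, Any]


def toy_run_with_intermediates(
    steps: Dict[str, Step], **kwargs: Any
) -> Tuple[Any, Dict[str, ToyIntermediates]]:
    """Run steps in insertion order and record each one under its module id."""
    intermediates: Dict[str, ToyIntermediates] = {}
    value: Any = None
    for i, (module_id, (in_key, out_key, fn)) in enumerate(steps.items()):
        # The first step reads the caller's kwargs; later steps get the previous output.
        arg = kwargs[in_key] if i == 0 else value
        value = fn(arg)
        intermediates[module_id] = ToyIntermediates({in_key: arg}, {out_key: value})
    return value, intermediates


steps: Dict[str, Step] = {
    "prompt": ("movie_name", "prompt", lambda n: f"Please generate related movies to {n}"),
    "llm": ("messages", "output", lambda m: m.upper()),  # fake "LLM"
}
output, intermediates = toy_run_with_intermediates(steps, movie_name="The Departed")
print(intermediates["prompt"])
# ToyIntermediates(inputs={'movie_name': 'The Departed'}, outputs={'prompt': 'Please generate related movies to The Departed'})
```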
output, intermediates = p.run_with_intermediates(movie_name="The Departed")
> Running module 8dc57d24-9691-4d8d-87d7-151865a7cd1b with input: movie_name: The Departed > Running module 7ed9e26c-a704-4b0b-9cfd-991266e754c0 with input: messages: Please generate related movies to The Departed
intermediates["8dc57d24-9691-4d8d-87d7-151865a7cd1b"]
ComponentIntermediates(inputs={'movie_name': 'The Departed'}, outputs={'prompt': 'Please generate related movies to The Departed'})
intermediates["7ed9e26c-a704-4b0b-9cfd-991266e754c0"]
ComponentIntermediates(inputs={'messages': 'Please generate related movies to The Departed'}, outputs={'output': ChatResponse(message=ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content='1. Infernal Affairs (2002) - The original Hong Kong film that inspired The Departed\n2. The Town (2010) - A crime thriller directed by Ben Affleck\n3. Mystic River (2003) - A crime drama directed by Clint Eastwood\n4. Goodfellas (1990) - A classic crime film directed by Martin Scorsese\n5. The Irishman (2019) - Another crime film directed by Martin Scorsese, starring Robert De Niro and Al Pacino\n6. The Godfather (1972) - A classic crime film directed by Francis Ford Coppola\n7. Heat (1995) - A crime thriller directed by Michael Mann, starring Al Pacino and Robert De Niro\n8. The Departed (2006) - A crime thriller directed by Martin Scorsese, starring Leonardo DiCaprio and Matt Damon.', additional_kwargs={}), raw={'id': 'chatcmpl-9EKf2nZ4latFJvHy0gzOUZbaB8xwY', 'choices': [Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='1. Infernal Affairs (2002) - The original Hong Kong film that inspired The Departed\n2. The Town (2010) - A crime thriller directed by Ben Affleck\n3. Mystic River (2003) - A crime drama directed by Clint Eastwood\n4. Goodfellas (1990) - A classic crime film directed by Martin Scorsese\n5. The Irishman (2019) - Another crime film directed by Martin Scorsese, starring Robert De Niro and Al Pacino\n6. The Godfather (1972) - A classic crime film directed by Francis Ford Coppola\n7. Heat (1995) - A crime thriller directed by Michael Mann, starring Al Pacino and Robert De Niro\n8. 
The Departed (2006) - A crime thriller directed by Martin Scorsese, starring Leonardo DiCaprio and Matt Damon.', role='assistant', function_call=None, tool_calls=None))], 'created': 1713203040, 'model': 'gpt-3.5-turbo-0125', 'object': 'chat.completion', 'system_fingerprint': 'fp_c2295e73ad', 'usage': CompletionUsage(completion_tokens=184, prompt_tokens=15, total_tokens=199)}, delta=None, logprobs=None, additional_kwargs={})})
Try Output Parsing
Let's parse the outputs into a structured Pydantic object.
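To show what "parsing the output" involves, here is a stdlib-only sketch that uses dataclasses instead of Pydantic, so it is not how `PydanticOutputParser` works internally. It assumes the LLM reply contains a single JSON object, possibly surrounded by extra text such as the `assistant: ` prefix seen in the logs, and slices out the outermost braces before decoding.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Movie:
    name: str
    year: int


@dataclass
class Movies:
    movies: List[Movie]


def parse_movies(llm_text: str) -> Movies:
    """Extract the outermost JSON object from raw LLM text and build Movies."""
    # Replies may carry extra text (e.g. an "assistant: " prefix), so slice
    # from the first "{" to the last "}" before decoding.
    start, end = llm_text.find("{"), llm_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in LLM output")
    data = json.loads(llm_text[start : end + 1])
    # Minimal validation: coerce each field to its declared type.
    return Movies(
        movies=[Movie(name=str(m["name"]), year=int(m["year"])) for m in data["movies"]]
    )


raw = 'assistant: { "movies": [ { "name": "Finding Nemo", "year": 2003 }, { "name": "Cars", "year": 2006 } ] }'
print(parse_movies(raw))
# Movies(movies=[Movie(name='Finding Nemo', year=2003), Movie(name='Cars', year=2006)])
```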
from typing import List
from pydantic import BaseModel, Field
from llama_index.core.output_parsers import PydanticOutputParser
class Movie(BaseModel):
    """Object representing a single movie."""
    name: str = Field(..., description="Name of the movie.")
    year: int = Field(..., description="Year of the movie.")
class Movies(BaseModel):
    """Object representing a list of movies."""
    movies: List[Movie] = Field(..., description="List of movies.")
llm = OpenAI(model="gpt-3.5-turbo")
output_parser = PydanticOutputParser(Movies)
json_prompt_str = """\
Please generate related movies to {movie_name}. Output with the following JSON format:
"""
json_prompt_str = output_parser.format(json_prompt_str)
# add JSON spec to prompt template
json_prompt_tmpl = PromptTemplate(json_prompt_str)
p = QueryPipeline(chain=[json_prompt_tmpl, llm, output_parser], verbose=True)
output = p.run(movie_name="Toy Story")
> Running module 2e4093c5-ae62-420a-be91-9c28c057bada with input: movie_name: Toy Story > Running module 3b41f95c-f54b-41d7-8ef0-8e45b5d7eeb0 with input: messages: Please generate related movies to Toy Story. Output with the following JSON format: Here's a JSON schema to follow: {"title": "Movies", "description": "Object representing a list of movies.", "typ... > Running module 27e79a16-72de-4ce2-8b2e-94932c4069c3 with input: input: assistant: { "movies": [ { "name": "Finding Nemo", "year": 2003 }, { "name": "Monsters, Inc.", "year": 2001 }, { "name": "Cars", "year": 2006 ...
output
Movies(movies=[Movie(name='Finding Nemo', year=2003), Movie(name='Monsters, Inc.', year=2001), Movie(name='Cars', year=2006), Movie(name='The Incredibles', year=2004), Movie(name='Ratatouille', year=2007)])
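Streaming Support
The query pipelines support LLM streaming (simply do `as_query_component(streaming=True)`). Intermediate outputs are converted automatically, and the final output can be a streaming output. Here are some examples.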
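The streaming behaviour can be illustrated with plain generators. In this hypothetical sketch (`Delta`, `fake_stream`, and `to_text` are made-up names), a streamed value that feeds a prompt is first drained into one string, while the last stream is left for the caller to iterate. The real pipeline does pass generator objects between modules, as the logs show, but its conversion logic may differ.

```python
from collections.abc import Iterator
from dataclasses import dataclass
from typing import Any


@dataclass
class Delta:
    """Hypothetical streamed chunk with a `.delta` text field."""

    delta: str


def fake_stream(text: str) -> Iterator[Delta]:
    """Pretend streaming LLM: yield the reply one word at a time."""
    for word in text.split(" "):
        yield Delta(word + " ")


def to_text(value: Any) -> str:
    """Convert an intermediate value for a prompt: drain streams, stringify the rest."""
    if isinstance(value, Iterator):
        return "".join(chunk.delta for chunk in value).strip()
    return str(value)


# Step 1 streams; step 2 (a prompt template) needs one complete string.
first = fake_stream("Batman Begins, The Dark Knight Rises")
prompt = (
    "Here's some text:\n"
    f"{to_text(first)}\n"
    "Can you rewrite this with a summary of each movie?"
)
# The final step streams again, and the caller iterates over the chunks.
final = fake_stream("Summaries would go here")
for chunk in final:
    print(chunk.delta, end="")
```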
1. Chain multiple prompts with streaming
prompt_str = "Please generate related movies to {movie_name}"
prompt_tmpl = PromptTemplate(prompt_str)
# let's add some subsequent prompts for fun
prompt_str2 = """\
Here's some text:
{text}
Can you rewrite this with a summary of each movie?
"""
prompt_tmpl2 = PromptTemplate(prompt_str2)
llm = OpenAI(model="gpt-3.5-turbo")
llm_c = llm.as_query_component(streaming=True)
p = QueryPipeline(
chain=[prompt_tmpl, llm_c, prompt_tmpl2, llm_c], verbose=True
)
# p = QueryPipeline(chain=[prompt_tmpl, llm_c], verbose=True)
output = p.run(movie_name="The Dark Knight")
for o in output:
    print(o.delta, end="")
> Running module 213af6d4-3450-46af-9087-b80656ae6951 with input: movie_name: The Dark Knight > Running module 3ff7e987-f5f3-4b36-a3e1-be5a4821d9d9 with input: messages: Please generate related movies to The Dark Knight > Running module a2841bd3-c833-4427-9a7e-83b19872b064 with input: text: <generator object llm_chat_callback.<locals>.wrap.<locals>.wrapped_llm_chat.<locals>.wrapped_gen at 0x298d338b0> > Running module c7e0a454-213a-460e-b029-f2d42fd7d938 with input: messages: Here's some text: 1. Batman Begins (2005) 2. The Dark Knight Rises (2012) 3. Batman v Superman: Dawn of Justice (2016) 4. Man of Steel (2013) 5. The Avengers (2012) 6. Iron Man (2008) 7. Captain Amer... 1. Batman Begins (2005): A young Bruce Wayne becomes Batman to fight crime in Gotham City, facing his fears and training under the guidance of Ra's al Ghul. 2. The Dark Knight Rises (2012): Batman returns to protect Gotham City from the ruthless terrorist Bane, who plans to destroy the city and its symbol of hope. 3. Batman v Superman: Dawn of Justice (2016): Batman and Superman clash as their ideologies collide, leading to an epic battle while a new threat emerges that threatens humanity. 4. Man of Steel (2013): The origin story of Superman, as he embraces his powers and faces General Zod, a fellow Kryptonian seeking to destroy Earth. 5. The Avengers (2012): Earth's mightiest heroes, including Iron Man, Captain America, Thor, and Hulk, join forces to stop Loki and his alien army from conquering the world. 6. Iron Man (2008): Billionaire Tony Stark builds a high-tech suit to escape captivity and becomes the superhero Iron Man, using his technology to fight against evil. 7. Captain America: The Winter Soldier (2014): Captain America teams up with Black Widow and Falcon to uncover a conspiracy within S.H.I.E.L.D. while facing a deadly assassin known as the Winter Soldier. 8. 
The Amazing Spider-Man (2012): Peter Parker, a high school student bitten by a radioactive spider, becomes Spider-Man and battles the Lizard, a monstrous villain threatening New York City. 9. Watchmen (2009): Set in an alternate reality, a group of retired vigilantes investigates the murder of one of their own, uncovering a conspiracy that could have catastrophic consequences. 10. Sin City (2005): A neo-noir anthology film set in the crime-ridden city of Basin City, following various characters as they navigate through corruption, violence, and redemption. 11. V for Vendetta (2005): In a dystopian future, a masked vigilante known as V fights against a totalitarian government, inspiring the people to rise up and reclaim their freedom. 12. Blade Runner 2049 (2017): A young blade runner uncovers a long-buried secret that leads him to seek out former blade runner Rick Deckard, while unraveling the mysteries of a future society. 13. Inception (2010): A skilled thief enters people's dreams to steal information, but is tasked with planting an idea instead, leading to a mind-bending journey through multiple layers of reality. 14. The Matrix (1999): A computer hacker discovers the truth about reality, joining a group of rebels fighting against sentient machines that have enslaved humanity in a simulated world. 15. The Crow (1994): A musician, resurrected by a supernatural crow, seeks vengeance against the gang that murdered him and his fiancée, unleashing a dark and atmospheric tale of revenge.
2. Feed streaming output to an output parser
p = QueryPipeline(
chain=[
json_prompt_tmpl,
llm.as_query_component(streaming=True),
output_parser,
],
verbose=True,
)
output = p.run(movie_name="Toy Story")
print(output)
> Running module fe1dbf6a-56e0-44bf-97d7-a2a1fe9d9b8c with input: movie_name: Toy Story > Running module a8eaaf91-df9d-46c4-bbae-06c15cd15123 with input: messages: Please generate related movies to Toy Story. Output with the following JSON format: Here's a JSON schema to follow: {"title": "Movies", "description": "Object representing a list of movies.", "typ... > Running module fcbc0b09-0ef5-43e0-b007-c4508fd6742f with input: input: <generator object llm_chat_callback.<locals>.wrap.<locals>.wrapped_llm_chat.<locals>.wrapped_gen at 0x298d32dc0> movies=[Movie(name='Finding Nemo', year=2003), Movie(name='Monsters, Inc.', year=2001), Movie(name='The Incredibles', year=2004), Movie(name='Cars', year=2006), Movie(name='Ratatouille', year=2007)]
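Chain Together Query Rewriting Workflow (Prompts + LLM) with Retrieval
Here we try a slightly more complex workflow where we send the input through two prompts before initiating retrieval.
- Generate a question about the given topic.
- Hallucinate an answer to the question, for better retrieval (HyDE).
Since each prompt takes in only one input, note that the QueryPipeline automatically chains the LLM output into the next prompt and then into the LLM.
You'll see how to define links more explicitly in the next section.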
# !pip install llama-index-postprocessor-cohere-rerank
from llama_index.postprocessor.cohere_rerank import CohereRerank
# generate question regarding topic
prompt_str1 = "Please generate a concise question about Paul Graham's life regarding the following topic {topic}"
prompt_tmpl1 = PromptTemplate(prompt_str1)
# use HyDE to hallucinate answer.
prompt_str2 = (
"Please write a passage to answer the question\n"
"Try to include as many key details as possible.\n"
"\n"
"\n"
"{query_str}\n"
"\n"
"\n"
'Passage:"""\n'
)
prompt_tmpl2 = PromptTemplate(prompt_str2)
llm = OpenAI(model="gpt-3.5-turbo")
retriever = index.as_retriever(similarity_top_k=5)
p = QueryPipeline(
chain=[prompt_tmpl1, llm, prompt_tmpl2, llm, retriever], verbose=True
)
nodes = p.run(topic="college")
len(nodes)
> Running module f5435516-61b6-49e9-9926-220cfb6443bd with input: topic: college > Running module 1dcaa097-cedc-4466-81bb-f8fd8768762b with input: messages: Please generate a concise question about Paul Graham's life regarding the following topic college > Running module 891afa10-5fe0-47ed-bdee-42a59d0e916d with input: query_str: assistant: How did Paul Graham's college experience shape his career and entrepreneurial mindset? > Running module 5bcd9964-b972-41a9-960d-96894c57a372 with input: messages: Please write a passage to answer the question Try to include as many key details as possible. How did Paul Graham's college experience shape his career and entrepreneurial mindset? Passage:""" > Running module 0b81a91a-2c90-4700-8ba1-25ffad5311fd with input: input: assistant: Paul Graham's college experience played a pivotal role in shaping his career and entrepreneurial mindset. As a student at Cornell University, Graham immersed himself in the world of compute...
5
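Create a Full RAG Pipeline as a DAG
Here we chain together a full RAG pipeline consisting of query rewriting, retrieval, reranking, and response synthesis.
Here we can't use the `chain` syntax because certain modules depend on multiple inputs (for instance, response synthesis expects both the retrieved nodes and the original question). Instead we'll construct a DAG explicitly, through `add_modules` and then `add_link`.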
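Why a DAG rather than a chain? The summarizer needs outputs from two different upstream modules. A topological sort over the dependencies used in this section gives a valid execution order. The sketch below uses the standard library's `graphlib` (Python 3.9+) and is not how QueryPipeline schedules modules internally.

```python
from graphlib import TopologicalSorter

# Each module maps to the set of modules it depends on (its upstream sources).
# These mirror the query-rewriting RAG links built in this section.
deps = {
    "llm": {"prompt_tmpl"},
    "retriever": {"llm"},
    "reranker": {"retriever", "llm"},
    "summarizer": {"reranker", "llm"},
}
order = list(TopologicalSorter(deps).static_order())
print(order)
# ['prompt_tmpl', 'llm', 'retriever', 'reranker', 'summarizer']
```

The summarizer comes last because it must wait for both the reranker and the LLM. A linear chain can only express one incoming edge per module, so it cannot capture that second dependency.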
1. RAG Pipeline with Query Rewriting
We use an LLM to rewrite the query before passing it to the downstream modules (retrieval, reranking, synthesis).
from llama_index.postprocessor.cohere_rerank import CohereRerank
from llama_index.core.response_synthesizers import TreeSummarize
# define modules
prompt_str = "Please generate a question about Paul Graham's life regarding the following topic {topic}"
prompt_tmpl = PromptTemplate(prompt_str)
llm = OpenAI(model="gpt-3.5-turbo")
retriever = index.as_retriever(similarity_top_k=3)
reranker = CohereRerank()
summarizer = TreeSummarize(llm=llm)
# define query pipeline
p = QueryPipeline(verbose=True)
p.add_modules(
{
"llm": llm,
"prompt_tmpl": prompt_tmpl,
"retriever": retriever,
"summarizer": summarizer,
"reranker": reranker,
}
)
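Next we draw links between modules with `add_link`. `add_link` takes in the source and destination module ids, and optionally a `source_key` and a `dest_key`. Specify the `source_key` when the source module has multiple outputs, and the `dest_key` when the destination module has multiple inputs.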
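The key-selection rule above can be expressed as a tiny helper. `resolve_key` is hypothetical and only models the idea that a key may be inferred when a module exposes exactly one input or output, and must be named otherwise.

```python
from typing import Optional, Set


def resolve_key(available: Set[str], key: Optional[str], side: str) -> str:
    """Choose which key one end of a link uses.

    A key can be inferred only when the module exposes exactly one;
    otherwise it must be named explicitly.
    """
    if key is not None:
        if key not in available:
            raise ValueError(f"{side} key {key!r} not in {sorted(available)}")
        return key
    if len(available) == 1:
        return next(iter(available))
    raise ValueError(f"{side} key is ambiguous; pick one of {sorted(available)}")


# The LLM has a single output, so no source_key is needed.
print(resolve_key({"output"}, None, "source"))  # output
# The summarizer takes two inputs, so dest_key must be given.
print(resolve_key({"query_str", "nodes"}, "nodes", "dest"))  # nodes
```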
You can view the set of input/output keys for each module through `module.as_query_component().input_keys` and `module.as_query_component().output_keys`.
Here we explicitly specify `dest_key` for the `reranker` and `summarizer` modules because they take in two inputs (`query_str` and `nodes`).
p.add_link("prompt_tmpl", "llm")
p.add_link("llm", "retriever")
p.add_link("retriever", "reranker", dest_key="nodes")
p.add_link("llm", "reranker", dest_key="query_str")
p.add_link("reranker", "summarizer", dest_key="nodes")
p.add_link("llm", "summarizer", dest_key="query_str")
# look at summarizer input keys
print(summarizer.as_query_component().input_keys)
required_keys={'query_str', 'nodes'} optional_keys=set()
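We use `networkx` to store the graph representation (`p.dag`). This gives us an easy way to view the DAG, for example by rendering it with `pyvis`.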
## create graph
from pyvis.network import Network
net = Network(notebook=True, cdn_resources="in_line", directed=True)
net.from_nx(p.dag)
net.show("rag_dag.html")
## another option using `pygraphviz`
# from networkx.drawing.nx_agraph import to_agraph
# from IPython.display import Image
# agraph = to_agraph(p.dag)
# agraph.layout(prog="dot")
# agraph.draw('rag_dag.png')
# display(Image('rag_dag.png'))
rag_dag.html
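If `pyvis` or `pygraphviz` is unavailable, a dependency-free option is to write the same links out as Graphviz DOT text and render them with the `dot` command-line tool (e.g. `dot -Tpng rag_dag.dot -o rag_dag.png`). The edge list below is typed by hand to match the `add_link` calls above. With a real pipeline, `list(p.dag.edges())` should give similar pairs, assuming `p.dag` is a networkx graph as described.

```python
from typing import List, Tuple

# Hand-written copy of the add_link calls above: (source, destination).
edges: List[Tuple[str, str]] = [
    ("prompt_tmpl", "llm"),
    ("llm", "retriever"),
    ("retriever", "reranker"),
    ("llm", "reranker"),
    ("reranker", "summarizer"),
    ("llm", "summarizer"),
]


def to_dot(edge_list: List[Tuple[str, str]], name: str = "rag_dag") -> str:
    """Render a directed edge list as Graphviz DOT text."""
    lines = [f"digraph {name} {{"]
    for src, dst in edge_list:
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot(edges)
print(dot_text)
# To render: save dot_text to rag_dag.dot, then run `dot -Tpng rag_dag.dot -o rag_dag.png`
```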
response = p.run(topic="YC")
> Running module prompt_tmpl with input: topic: YC > Running module llm with input: messages: Please generate a question about Paul Graham's life regarding the following topic YC > Running module retriever with input: input: assistant: What role did Paul Graham play in the founding and development of Y Combinator (YC)? > Running module reranker with input: query_str: assistant: What role did Paul Graham play in the founding and development of Y Combinator (YC)? nodes: [NodeWithScore(node=TextNode(id_='ccd39041-5a64-4bd3-aca7-48f804b5a23f', embedding=None, metadata={'file_path': '../data/paul_graham/paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file... > Running module summarizer with input: query_str: assistant: What role did Paul Graham play in the founding and development of Y Combinator (YC)? nodes: [NodeWithScore(node=TextNode(id_='120574dd-a5c9-4985-ab3e-37b1070b500a', embedding=None, metadata={'file_path': '../data/paul_graham/paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file...
print(str(response))
Paul Graham played a significant role in the founding and development of Y Combinator (YC). He was one of the co-founders of YC and provided the initial funding for the investment firm. Along with his partners, he implemented the ideas they had been discussing and started their own investment firm. Paul Graham also played a key role in shaping the unique batch model of YC, where a group of startups is funded and provided intensive support for a period of three months. He was actively involved in selecting and helping the founders, and he also wrote essays and worked on YC's internal software.
# you can do async too
response = await p.arun(topic="YC")
print(str(response))
> Running modules and inputs in parallel: Module key: prompt_tmpl. Input: topic: YC > Running modules and inputs in parallel: Module key: llm. Input: messages: Please generate a question about Paul Graham's life regarding the following topic YC > Running modules and inputs in parallel: Module key: retriever. Input: input: assistant: What role did Paul Graham play in the founding and development of Y Combinator (YC)? > Running modules and inputs in parallel: Module key: reranker. Input: query_str: assistant: What role did Paul Graham play in the founding and development of Y Combinator (YC)? nodes: [NodeWithScore(node=TextNode(id_='ccd39041-5a64-4bd3-aca7-48f804b5a23f', embedding=None, metadata={'file_path': '../data/paul_graham/paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file... > Running modules and inputs in parallel: Module key: summarizer. Input: query_str: assistant: What role did Paul Graham play in the founding and development of Y Combinator (YC)? nodes: [NodeWithScore(node=TextNode(id_='120574dd-a5c9-4985-ab3e-37b1070b500a', embedding=None, metadata={'file_path': '../data/paul_graham/paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file... Paul Graham played a significant role in the founding and development of Y Combinator (YC). He was one of the co-founders of YC and provided the initial funding for the investment firm. Along with his partners, he implemented the ideas they had been discussing and decided to start their own investment firm. Paul Graham also played a key role in shaping the unique batch model of YC, where a group of startups is funded and provided intensive support for a period of three months. He was actively involved in selecting and helping the founders and worked on various projects related to YC, including writing essays and developing internal software.
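The verbose log above reports running modules "in parallel". The hypothetical sketch below shows the general idea with `asyncio.gather`: at each step, every module whose dependencies are finished is awaited concurrently. The DAG (two retrievers feeding a merge step) and all function names are made up for illustration. In a notebook, use `await run_dag(deps, fns)` instead of `asyncio.run(...)`, since an event loop is already running there.

```python
import asyncio
from graphlib import TopologicalSorter
from typing import Awaitable, Callable, Dict, Set

# Hypothetical async module: takes {upstream_name: output} and returns a string.
Fn = Callable[[Dict[str, str]], Awaitable[str]]


async def run_dag(deps: Dict[str, Set[str]], fns: Dict[str, Fn]) -> Dict[str, str]:
    """Await every ready module concurrently, one batch at a time."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    results: Dict[str, str] = {}
    while sorter.is_active():
        ready = list(sorter.get_ready())
        # Modules in the same batch do not depend on each other.
        outputs = await asyncio.gather(
            *(fns[k]({d: results[d] for d in deps.get(k, set())}) for k in ready)
        )
        for key, out in zip(ready, outputs):
            results[key] = out
            sorter.done(key)
    return results


async def source(_: Dict[str, str]) -> str:
    return "what did the author do in YC"


def make_retriever(tag: str) -> Fn:
    async def retrieve(inputs: Dict[str, str]) -> str:
        await asyncio.sleep(0.01)  # simulate I/O latency
        return f"{tag} nodes for '{inputs['input']}'"
    return retrieve


async def merge(inputs: Dict[str, str]) -> str:
    return " | ".join(inputs[k] for k in sorted(inputs))


deps = {
    "retriever_a": {"input"},
    "retriever_b": {"input"},
    "merge": {"retriever_a", "retriever_b"},
}
fns: Dict[str, Fn] = {
    "input": source,
    "retriever_a": make_retriever("A"),
    "retriever_b": make_retriever("B"),
    "merge": merge,
}
results = asyncio.run(run_dag(deps, fns))
print(results["merge"])
```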
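2. RAG Pipeline without Query Rewriting
Here we set up a RAG pipeline without the query rewriting step.
Here we need a way to feed the input query into more than one module. In the code below it goes to both the retriever and the summarizer (a reranker is instantiated but not added to this pipeline). We can do this by defining a special `InputComponent`, which allows us to link the input to multiple downstream modules.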
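To see why an explicit entry node helps, here is a stdlib-only sketch of the fan-out used below: a passthrough `input_component` whose output is routed to two modules, one of them under a named `dest_key`. The link format and the `run_links` helper are hypothetical, and the retriever and summarizer are fakes.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical link format: (source, dest, dest_key); None means "input".
Link = Tuple[str, str, Optional[str]]


def input_component(**kwargs: Any) -> Any:
    """Entry node: pass the pipeline's `input` value straight through."""
    return kwargs["input"]


def fake_retriever(input: str) -> List[str]:  # "input" mirrors the key in the logs
    return [f"node about {input}"]


def fake_summarizer(query_str: str, nodes: List[str]) -> str:
    return f"{query_str} -> {len(nodes)} node(s)"


def run_links(
    modules: Dict[str, Callable[..., Any]],
    links: List[Link],
    order: List[str],
    **kwargs: Any,
) -> Any:
    """Run modules in a hand-given order, routing outputs along the links."""
    outputs: Dict[str, Any] = {order[0]: modules[order[0]](**kwargs)}
    for key in order[1:]:
        # Collect this module's arguments from every incoming link.
        inputs = {
            (dest_key or "input"): outputs[src]
            for src, dest, dest_key in links
            if dest == key
        }
        outputs[key] = modules[key](**inputs)
    return outputs[order[-1]]


modules = {"input": input_component, "retriever": fake_retriever, "summarizer": fake_summarizer}
links: List[Link] = [
    ("input", "retriever", None),
    ("input", "summarizer", "query_str"),
    ("retriever", "summarizer", "nodes"),
]
result = run_links(
    modules, links, ["input", "retriever", "summarizer"], input="what did the author do in YC"
)
print(result)  # what did the author do in YC -> 1 node(s)
```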
from llama_index.postprocessor.cohere_rerank import CohereRerank
from llama_index.core.response_synthesizers import TreeSummarize
from llama_index.core.query_pipeline import InputComponent
retriever = index.as_retriever(similarity_top_k=5)
summarizer = TreeSummarize(llm=OpenAI(model="gpt-3.5-turbo"))
reranker = CohereRerank()
p = QueryPipeline(verbose=True)
p.add_modules(
{
"input": InputComponent(),
"retriever": retriever,
"summarizer": summarizer,
}
)
p.add_link("input", "retriever")
p.add_link("input", "summarizer", dest_key="query_str")
p.add_link("retriever", "summarizer", dest_key="nodes")
output = p.run(input="what did the author do in YC")
> Running module input with input: input: what did the author do in YC > Running module retriever with input: input: what did the author do in YC > Running module summarizer with input: query_str: what did the author do in YC nodes: [NodeWithScore(node=TextNode(id_='86dea730-ca35-4bcb-9f9b-4c99e8eadd08', embedding=None, metadata={'file_path': '../data/paul_graham/paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file...
print(str(output))
The author worked on various projects at YC, including writing essays and working on YC's internal software. They also played a key role in the creation and operation of YC by funding the program with their own money and organizing a batch model where they would fund a group of startups twice a year. They provided support and guidance to the startups during a three-month intensive program and used their building in Cambridge as the headquarters for YC. Additionally, they hosted weekly dinners where experts on startups would give talks.
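Defining a Custom Component in a Query Pipeline
You can easily define a custom component: simply subclass a `QueryComponent` (here via the `CustomQueryComponent` helper), implement the validation/run functions plus a few helper methods, and plug it in.
Let's wrap the related-movie-generation prompt+LLM chain from the first example into a custom component.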
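Before looking at the LlamaIndex version, it may help to see the overall shape of such a component in plain Python. `MiniComponent` is a hypothetical base class that only mimics the hook names used below (`_validate_component_inputs`, `_input_keys`, `_output_keys`, `_run_component`). The real `CustomQueryComponent` is a Pydantic model with more machinery, and this toy subclass builds the prompt without calling an LLM.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Set


class MiniComponent(ABC):
    """Hypothetical base class shaped like a custom query component."""

    def _validate_component_inputs(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # Optional hook: subclasses may check or coerce inputs here.
        return inputs

    @property
    @abstractmethod
    def _input_keys(self) -> Set[str]: ...

    @property
    @abstractmethod
    def _output_keys(self) -> Set[str]: ...

    @abstractmethod
    def _run_component(self, **kwargs: Any) -> Dict[str, Any]: ...

    def run_component(self, **kwargs: Any) -> Dict[str, Any]:
        """Check required keys, validate, run, then check the output keys."""
        missing = self._input_keys - set(kwargs)
        if missing:
            raise ValueError(f"missing inputs: {sorted(missing)}")
        outputs = self._run_component(**self._validate_component_inputs(kwargs))
        if set(outputs) != self._output_keys:
            raise ValueError(f"unexpected outputs: {sorted(outputs)}")
        return outputs


class RelatedMoviePrompt(MiniComponent):
    """Toy component: builds the related-movies prompt (no LLM call)."""

    @property
    def _input_keys(self) -> Set[str]:
        return {"movie"}

    @property
    def _output_keys(self) -> Set[str]:
        return {"output"}

    def _run_component(self, **kwargs: Any) -> Dict[str, Any]:
        return {"output": f"Please generate related movies to {kwargs['movie']}"}


print(RelatedMoviePrompt().run_component(movie="Love Actually"))
# {'output': 'Please generate related movies to Love Actually'}
```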
from llama_index.core.query_pipeline import (
CustomQueryComponent,
InputKeys,
OutputKeys,
)
from typing import Dict, Any
from llama_index.core.llms.llm import LLM
from pydantic import Field
class RelatedMovieComponent(CustomQueryComponent):
    """Related movie component."""
    llm: LLM = Field(..., description="OpenAI LLM")
    def _validate_component_inputs(
        self, input: Dict[str, Any]
    ) -> Dict[str, Any]:
        """Validate component inputs during run_component."""
        # NOTE: this is OPTIONAL but we show you here how to do validation as an example
        return input
    @property
    def _input_keys(self) -> set:
        """Input keys dict."""
        # NOTE: These are required inputs. If you have optional inputs please override
        # `optional_input_keys_dict`
        return {"movie"}
    @property
    def _output_keys(self) -> set:
        return {"output"}
    def _run_component(self, **kwargs) -> Dict[str, Any]:
        """Run the component."""
        # use QueryPipeline itself here for convenience
        prompt_str = "Please generate related movies to {movie_name}"
        prompt_tmpl = PromptTemplate(prompt_str)
        # use the component's own LLM (self.llm) rather than a global variable
        p = QueryPipeline(chain=[prompt_tmpl, self.llm])
        return {"output": p.run(movie_name=kwargs["movie"])}
Let's try the custom component out! We'll also add a step to rewrite the output in the voice of Shakespeare.
llm = OpenAI(model="gpt-3.5-turbo")
component = RelatedMovieComponent(llm=llm)
# let's add some subsequent prompts for fun
prompt_str = """\
Here's some text:
{text}
Can you rewrite this in the voice of Shakespeare?
"""
prompt_tmpl = PromptTemplate(prompt_str)
p = QueryPipeline(chain=[component, prompt_tmpl, llm], verbose=True)
output = p.run(movie="Love Actually")
> Running module 31ca224a-f226-4956-882b-73878843d869 with input: movie: Love Actually > Running module febb41b5-2528-416a-bde7-6accdb0f9c51 with input: text: assistant: 1. "Valentine's Day" (2010) 2. "New Year's Eve" (2011) 3. "The Holiday" (2006) 4. "Crazy, Stupid, Love" (2011) 5. "Notting Hill" (1999) 6. "Four Weddings and a Funeral" (1994) 7. "Bridget J... > Running module e834ffbe-e97c-4ab0-9726-24f1534745b2 with input: messages: Here's some text: 1. "Valentine's Day" (2010) 2. "New Year's Eve" (2011) 3. "The Holiday" (2006) 4. "Crazy, Stupid, Love" (2011) 5. "Notting Hill" (1999) 6. "Four Weddings and a Funeral" (1994) 7. "B...
print(str(output))
assistant: 1. "Valentine's Day" (2010) - "A day of love, where hearts entwine, And Cupid's arrow finds its mark divine." 2. "New Year's Eve" (2011) - "When old year fades, and new year dawns, We gather 'round, to celebrate the morns." 3. "The Holiday" (2006) - "Two souls, adrift in search of cheer, Find solace in a holiday so dear." 4. "Crazy, Stupid, Love" (2011) - "A tale of love, both wild and mad, Where hearts are lost, then found, and glad." 5. "Notting Hill" (1999) - "In London town, where love may bloom, A humble man finds love, and breaks the gloom." 6. "Four Weddings and a Funeral" (1994) - "Four times the vows, and one time mourn, Love's journey, with laughter and tears adorned." 7. "Bridget Jones's Diary" (2001) - "A maiden fair, with wit and charm, Records her life, and love's alarm." 8. "About Time" (2013) - "A tale of time, where love transcends, And moments cherished, never truly ends." 9. "The Best Exotic Marigold Hotel" (2011) - "In India's land, where dreams unfold, A hotel blooms, where hearts find gold." 10. "The Notebook" (2004) - "A love that spans both time and space, Where words and memories find their place." 11. "Serendipity" (2001) - "By chance or fate, two souls collide, In search of love, they cannot hide." 12. "P.S. I Love You" (2007) - "In letters penned, from love's embrace, A departed soul, still finds its trace." 13. "500 Days of Summer" (2009) - "A tale of love, both sweet and sour, Where seasons change, and hearts devour." 14. "The Fault in Our Stars" (2014) - "Two hearts, aflame, in starlit skies, Love's tragedy, where hope never dies." 15. "La La Land" (2016) - "In dreams and songs, two hearts entwine, A city's magic, where love's stars align."
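Stepwise Execution of a Pipeline
Executing a pipeline one step at a time is a great idea if you want to:
- better debug the order of execution
- log data in between each step
- give feedback to a user as to what is being processed
- and more!
To execute a pipeline step by step, you create a `run_state` and then loop through the execution. Below is a basic example.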
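The loop below drives LlamaIndex's own run state. To make the bookkeeping visible, here is a hypothetical stdlib-only re-implementation of the same pattern. A `RunState` holds pending inputs, `get_next_module_keys` returns modules whose upstream work is done, and `process_component_output` pushes outputs downstream. The names echo the API used below, but the code is an illustration, not LlamaIndex's implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set, Tuple

# Hypothetical link format: (source, source_key, dest, dest_key).
Link = Tuple[str, str, str, str]


@dataclass
class RunState:
    """Toy run state; field names loosely echo the loop below."""

    module_dict: Dict[str, Callable[..., Dict[str, Any]]]
    links: List[Link]
    all_module_inputs: Dict[str, Dict[str, Any]]
    executed: Set[str] = field(default_factory=set)
    result_outputs: Dict[str, Dict[str, Any]] = field(default_factory=dict)


def get_next_module_keys(state: RunState) -> List[str]:
    """Return modules not yet run whose upstream modules have all finished."""
    ready = []
    for key in state.module_dict:
        if key in state.executed:
            continue
        upstream = {src for src, _, dest, _ in state.links if dest == key}
        if upstream <= state.executed:
            ready.append(key)
    return ready


def process_component_output(state: RunState, key: str, output: Dict[str, Any]) -> None:
    """Mark a module as done and push its outputs into downstream inputs."""
    state.executed.add(key)
    for src, src_key, dest, dest_key in state.links:
        if src == key:
            state.all_module_inputs.setdefault(dest, {})[dest_key] = output[src_key]


state = RunState(
    module_dict={
        "prompt": lambda movie_name: {"prompt": f"Please generate related movies to {movie_name}"},
        "llm": lambda messages: {"output": f"assistant: reply to '{messages}'"},  # fake LLM
    },
    links=[("prompt", "prompt", "llm", "messages")],
    all_module_inputs={"prompt": {"movie_name": "The Departed"}},
)

last_key: str = ""
last_out: Dict[str, Any] = {}
next_keys = get_next_module_keys(state)
while next_keys:
    for key in next_keys:
        last_out = state.module_dict[key](**state.all_module_inputs[key])
        last_key = key
        print(f"> ran {key}")  # a natural place for per-step logging or UI feedback
        process_component_output(state, key, last_out)
    next_keys = get_next_module_keys(state)

state.result_outputs[last_key] = last_out
print(state.result_outputs[last_key]["output"])
# assistant: reply to 'Please generate related movies to The Departed'
```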
from llama_index.core.query_pipeline import QueryPipeline
from llama_index.core import PromptTemplate
from llama_index.llms.openai import OpenAI
# try chaining basic prompts
prompt_str = "Please generate related movies to {movie_name}"
prompt_tmpl = PromptTemplate(prompt_str)
llm = OpenAI(model="gpt-3.5-turbo")
p = QueryPipeline(chain=[prompt_tmpl, llm], verbose=True)
run_state = p.get_run_state(movie_name="The Departed")
next_module_keys = p.get_next_module_keys(run_state)
while True:
    for module_key in next_module_keys:
        # get the module and input
        module = run_state.module_dict[module_key]
        module_input = run_state.all_module_inputs[module_key]
        # run the module
        output_dict = module.run_component(**module_input)
        # process the output
        p.process_component_output(
            output_dict,
            module_key,
            run_state,
        )
    # get the next module keys
    next_module_keys = p.get_next_module_keys(
        run_state,
    )
    # if no more modules to run, break
    if not next_module_keys:
        run_state.result_outputs[module_key] = output_dict
        break
# the final result is at `module_key`
# it is a dict of 'output' -> ChatResponse object in this case
print(run_state.result_outputs[module_key]["output"].message.content)
1. Infernal Affairs (2002) - The original Hong Kong film that inspired The Departed 2. The Town (2010) - A crime thriller directed by Ben Affleck 3. Mystic River (2003) - A crime drama directed by Clint Eastwood 4. Goodfellas (1990) - A classic mobster film directed by Martin Scorsese 5. The Irishman (2019) - Another crime drama directed by Martin Scorsese, starring Robert De Niro and Al Pacino 6. The Departed (2006) - The Departed is a 2006 American crime film directed by Martin Scorsese and written by William Monahan. It is a remake of the 2002 Hong Kong film Infernal Affairs. The film stars Leonardo DiCaprio, Matt Damon, Jack Nicholson, and Mark Wahlberg, with Martin Sheen, Ray Winstone, Vera Farmiga, and Alec Baldwin in supporting roles.