Multi-Step Query Engine¶
We have a multi-step query engine that can decompose a complex query into sequential subquestions. This guide walks you through how to set it up!
If you're opening this Notebook on Colab, you will probably need to install LlamaIndex 🦙.
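Conceptually, the engine repeatedly transforms the remaining question into a sub-question, answers it against the index, and stops when the transform produces no further sub-question. A minimal sketch of that loop (toy helper names, not the LlamaIndex implementation):

```python
# Toy sketch of the multi-step idea (hypothetical helpers, not LlamaIndex code):
# keep asking the transform for the next sub-question until it returns None,
# answering each one and accumulating (question, answer) pairs as context.
def multi_step_query(query, decompose, answer, max_steps=3):
    context = []
    for _ in range(max_steps):
        sub_q = decompose(query, context)
        if sub_q is None:  # the transform decided no more steps are needed
            break
        context.append((sub_q, answer(sub_q)))
    return context
```

The real `MultiStepQueryEngine` below plays the role of this loop, with `StepDecomposeQueryTransform` as the `decompose` step and the underlying query engine as `answer`.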
In [ ]:
%pip install llama-index-llms-openai
In [ ]:
!pip install llama-index
Download Data¶
In [ ]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
Load Documents, Build the VectorStoreIndex¶
In [ ]:
import os

os.environ["OPENAI_API_KEY"] = "sk-..."
In [ ]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
from IPython.display import Markdown, display
In [ ]:
# LLM (gpt-3.5)
gpt35 = OpenAI(temperature=0, model="gpt-3.5-turbo")

# LLM (gpt-4)
gpt4 = OpenAI(temperature=0, model="gpt-4")
In [ ]:
# load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
In [ ]:
index = VectorStoreIndex.from_documents(documents)
Query Index¶
In [ ]:
from llama_index.core.indices.query.query_transform.base import (
    StepDecomposeQueryTransform,
)

# gpt-4
step_decompose_transform = StepDecomposeQueryTransform(llm=gpt4, verbose=True)

# gpt-3
step_decompose_transform_gpt3 = StepDecomposeQueryTransform(
    llm=gpt35, verbose=True
)
In [ ]:
index_summary = "Used to answer questions about the author"
In [ ]:
# set Logging to DEBUG for more detailed outputs
from llama_index.core.query_engine import MultiStepQueryEngine

query_engine = index.as_query_engine(llm=gpt4)
query_engine = MultiStepQueryEngine(
    query_engine=query_engine,
    query_transform=step_decompose_transform,
    index_summary=index_summary,
)
response_gpt4 = query_engine.query(
    "Who was in the first batch of the accelerator program the author"
    " started?",
)
> Current query: Who was in the first batch of the accelerator program the author started?
> New query: Who is the author of the accelerator program?
> Current query: Who was in the first batch of the accelerator program the author started?
> New query: Who was in the first batch of the accelerator program started by Paul Graham?
> Current query: Who was in the first batch of the accelerator program the author started?
> New query: None
In [ ]:
display(Markdown(f"<b>{response_gpt4}</b>"))
The first batch of the accelerator program the author started included the founders of Reddit, Justin Kan and Emmett Shear who later founded Twitch, Aaron Swartz who had helped write the RSS spec and later became a martyr for open access, and Sam Altman who later became the second president of YC.
In [ ]:
sub_qa = response_gpt4.metadata["sub_qa"]
tuples = [(t[0], t[1].response) for t in sub_qa]
print(tuples)
[('Who is the author of the accelerator program?', 'The author of the accelerator program is Paul Graham.'), ('Who was in the first batch of the accelerator program started by Paul Graham?', 'The first batch of the accelerator program started by Paul Graham included the founders of Reddit, Justin Kan and Emmett Shear who later founded Twitch, Aaron Swartz who had helped write the RSS spec and later became a martyr for open access, and Sam Altman who later became the second president of YC.')]
In [ ]:
response_gpt4 = query_engine.query(
    "In which city did the author found his first company, Viaweb?",
)
> Current query: In which city did the author found his first company, Viaweb?
> New query: Who is the author who founded Viaweb?
> Current query: In which city did the author found his first company, Viaweb?
> New query: In which city did Paul Graham found his first company, Viaweb?
> Current query: In which city did the author found his first company, Viaweb?
> New query: None
In [ ]:
print(response_gpt4)
The author founded his first company, Viaweb, in Cambridge.
In [ ]:
query_engine = index.as_query_engine(llm=gpt35)
query_engine = MultiStepQueryEngine(
    query_engine=query_engine,
    query_transform=step_decompose_transform_gpt3,
    index_summary=index_summary,
)
response_gpt3 = query_engine.query(
    "In which city did the author found his first company, Viaweb?",
)
> Current query: In which city did the author found his first company, Viaweb?
> New query: None
In [ ]:
print(response_gpt3)
Empty Response
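Here gpt-3.5 fails to produce a decomposition and the multi-step engine returns an empty response, while gpt-4 succeeded above. One possible guard (a sketch, not part of the LlamaIndex API) is a small wrapper that falls back to a second engine when the first produces nothing:

```python
# Hypothetical fallback wrapper: `primary` and `fallback` are any callables
# that take a question string and return a response. If the primary result
# stringifies to nothing (or the literal "Empty Response"), try the fallback.
def query_with_fallback(primary, fallback, question):
    response = primary(question)
    text = str(response).strip()
    if not text or text == "Empty Response":
        return fallback(question)
    return response
```

In this notebook it might be used as `query_with_fallback(query_engine.query, index.as_query_engine(llm=gpt35).query, question)`, so a failed multi-step decomposition still yields a single-step answer.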