Benchmarking RAG Pipelines With a `LabelledRagDataset`¶

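The LabelledRagDataset is meant for evaluating any given RAG pipeline, which may come in several configurations (e.g. the choice of LLM, or values for similarity_top_k, chunk_size, and other parameters). The abstraction is modeled on traditional machine learning datasets, where X features are used to predict a ground-truth label y. Here, the query and the retrieved contexts serve as the "features", and the answer to the query, called reference_answer, serves as the ground-truth label.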

As with any dataset, it is made up of observations, or examples. A LabelledRagDataset consists of a set of LabelledRagDataExample objects.
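To make the analogy concrete, the sketch below uses a hypothetical `Example` dataclass (a stand-in written for illustration, not the LlamaIndex class) and splits one observation into its "features" and its ground-truth label.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Example:
    """Hypothetical stand-in for a labelled RAG example (not the LlamaIndex class)."""

    query: str
    reference_contexts: List[str]
    reference_answer: str


def split_features_and_label(ex: Example) -> Tuple[Tuple[str, List[str]], str]:
    """Return ((query, contexts), reference_answer), i.e. the (X, y) pair."""
    return (ex.query, ex.reference_contexts), ex.reference_answer


example = Example(
    query="This is a test query, is it not?",
    reference_contexts=["This is a sample context"],
    reference_answer="Yes it is.",
)
features, label = split_features_and_label(example)
print(features)
print(label)
```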

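In this notebook, we show how to construct a LabelledRagDataset from scratch. The alternative is to download a community-supplied LabelledRagDataset from llama-hub and evaluate/benchmark your own RAG pipeline on it.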

The `LabelledRagDataExample` Class¶

In [ ]:

%pip install llama-index-llms-openai
%pip install llama-index-readers-wikipedia

In [ ]:

from llama_index.core.llama_dataset import (
    LabelledRagDataExample,
    CreatedByType,
    CreatedBy,
)

# constructing a LabelledRagDataExample
query = "This is a test query, is it not?"
query_by = CreatedBy(type=CreatedByType.AI, model_name="gpt-4")
reference_answer = "Yes it is."
reference_answer_by = CreatedBy(type=CreatedByType.HUMAN)
reference_contexts = ["This is a sample context"]

rag_example = LabelledRagDataExample(
    query=query,
    query_by=query_by,
    reference_contexts=reference_contexts,
    reference_answer=reference_answer,
    reference_answer_by=reference_answer_by,
)

LabelledRagDataExample is a Pydantic model, so it can be converted to JSON or a dict and reconstructed from either.

In [ ]:

print(rag_example.json())

{"query": "This is a test query, is it not?", "query_by": {"model_name": "gpt-4", "type": "ai"}, "reference_contexts": ["This is a sample context"], "reference_answer": "Yes it is.", "reference_answer_by": {"model_name": "", "type": "human"}}
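Since the serialized form is plain JSON, it can also be inspected with the standard library alone. A minimal sketch, using the string printed above as input (no LlamaIndex import is involved, so nothing here validates the fields):

```python
import json

# The JSON string printed by rag_example.json() above.
raw = (
    '{"query": "This is a test query, is it not?", '
    '"query_by": {"model_name": "gpt-4", "type": "ai"}, '
    '"reference_contexts": ["This is a sample context"], '
    '"reference_answer": "Yes it is.", '
    '"reference_answer_by": {"model_name": "", "type": "human"}}'
)

record = json.loads(raw)

# Creator metadata is nested: one object for the query, one for the answer.
print(record["query_by"]["type"])
print(record["reference_answer_by"]["type"])

# Re-serializing and parsing again yields an equal dict.
assert json.loads(json.dumps(record)) == record
```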

In [ ]:

LabelledRagDataExample.parse_raw(rag_example.json())

Out[ ]:

LabelledRagDataExample(query='This is a test query, is it not?', query_by=CreatedBy(model_name='gpt-4', type=<CreatedByType.AI: 'ai'>), reference_contexts=['This is a sample context'], reference_answer='Yes it is.', reference_answer_by=CreatedBy(model_name='', type=<CreatedByType.HUMAN: 'human'>))

In [ ]:

rag_example.dict()

Out[ ]:

{'query': 'This is a test query, is it not?',
 'query_by': {'model_name': 'gpt-4', 'type': <CreatedByType.AI: 'ai'>},
 'reference_contexts': ['This is a sample context'],
 'reference_answer': 'Yes it is.',
 'reference_answer_by': {'model_name': '',
  'type': <CreatedByType.HUMAN: 'human'>}}

In [ ]:

LabelledRagDataExample.parse_obj(rag_example.dict())

Out[ ]:

LabelledRagDataExample(query='This is a test query, is it not?', query_by=CreatedBy(model_name='gpt-4', type=<CreatedByType.AI: 'ai'>), reference_contexts=['This is a sample context'], reference_answer='Yes it is.', reference_answer_by=CreatedBy(model_name='', type=<CreatedByType.HUMAN: 'human'>))

Let's create a second example, so that we have a (slightly) more interesting LabelledRagDataset.

In [ ]:

query = "This is a test query, is it so?"
reference_answer = "I think yes, it is."
reference_contexts = ["This is a second sample context"]

rag_example_2 = LabelledRagDataExample(
    query=query,
    query_by=query_by,
    reference_contexts=reference_contexts,
    reference_answer=reference_answer,
    reference_answer_by=reference_answer_by,
)

The `LabelledRagDataset` Class¶

In [ ]:

from llama_index.core.llama_dataset import LabelledRagDataset

rag_dataset = LabelledRagDataset(examples=[rag_example, rag_example_2])

A convenience method is available for viewing the dataset as a pandas.DataFrame.

In [ ]:

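rag_dataset.to_pandas()

Out[ ]:

	query	reference_contexts	reference_answer	reference_answer_by	query_by
0	This is a test query, is it not?	[This is a sample context]	Yes it is.	human	ai (gpt-4)
1	This is a test query, is it so?	[This is a second sample context]	I think yes, it is.	human	ai (gpt-4)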

Serialization¶

To persist the dataset to disk and load it back, use the save_json and from_json methods.

In [ ]:

rag_dataset.save_json("rag_dataset.json")

In [ ]:

reload_rag_dataset = LabelledRagDataset.from_json("rag_dataset.json")

In [ ]:

reload_rag_dataset.to_pandas()

Out[ ]:

	query	reference_contexts	reference_answer	reference_answer_by	query_by
0	This is a test query, is it not?	[This is a sample context]	Yes it is.	human	ai (gpt-4)
1	This is a test query, is it so?	[This is a second sample context]	I think yes, it is.	human	ai (gpt-4)

Building a Synthetic `LabelledRagDataset` Over Wikipedia¶

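In this section, we create a LabelledRagDataset using a synthetic generator. An LLM (gpt-3.5-turbo in this notebook) produces both the query and the reference_answer for each synthetic LabelledRagDataExample.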

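NOTE: if you already have queries, reference answers, and contexts over a text corpus, data synthesis is not needed; you can make predictions and evaluate them against that data directly.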
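If such records already exist, one way to prepare them is to reshape each into the same field layout as the serialized example shown earlier. The sketch below is a hypothetical helper: `to_example_dict` and the sample records are illustrative and not part of LlamaIndex, and it assumes that both queries and answers were written by humans.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str, List[str]]  # (query, answer, contexts)


def to_example_dict(query: str, answer: str, contexts: List[str]) -> Dict:
    """Reshape one record into the field layout of a serialized example."""
    # Assumption: both the query and the answer are human-authored.
    human = {"model_name": "", "type": "human"}
    return {
        "query": query,
        "query_by": dict(human),
        "reference_contexts": list(contexts),
        "reference_answer": answer,
        "reference_answer_by": dict(human),
    }


# Illustrative records only; replace with your own corpus data.
records: List[Record] = [
    ("What is a sample question?", "A sample answer.", ["A sample context."]),
    ("What is a second sample question?", "Another answer.", ["Context A.", "Context B."]),
]

example_dicts = [to_example_dict(q, a, c) for q, a, c in records]
print(len(example_dicts))
```

Each resulting dict follows the layout that `LabelledRagDataExample.parse_obj` was given earlier, so the parsed examples could then be collected into a `LabelledRagDataset(examples=...)`.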

In [ ]:

import nest_asyncio

nest_asyncio.apply()

In [ ]:

!pip install wikipedia -q

In [ ]:

# wikipedia pages
from llama_index.readers.wikipedia import WikipediaReader
from llama_index.core import VectorStoreIndex

cities = [
    "San Francisco",
]

documents = WikipediaReader().load_data(
    pages=[f"History of {x}" for x in cities]
)
index = VectorStoreIndex.from_documents(documents)

A RagDatasetGenerator can be built over a set of documents to generate LabelledRagDataExample objects.

In [ ]:

# generate questions against chunks
from llama_index.core.llama_dataset.generator import RagDatasetGenerator
from llama_index.llms.openai import OpenAI

# set context for llm provider
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.3)

# instantiate a DatasetGenerator
dataset_generator = RagDatasetGenerator.from_documents(
    documents,
    llm=llm,
    num_questions_per_chunk=2,  # set the number of questions per nodes
    show_progress=True,
)

Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]

In [ ]:

len(dataset_generator.nodes)

Out[ ]:

13

In [ ]:

# since there are 13 nodes, there should be a total of 26 questions
rag_dataset = dataset_generator.generate_dataset_from_nodes()

100%|███████████████████████████████████████████████████████| 13/13 [00:02<00:00,  5.04it/s]
100%|█████████████████████████████████████████████████████████| 2/2 [00:02<00:00,  1.14s/it]
100%|█████████████████████████████████████████████████████████| 2/2 [00:05<00:00,  2.95s/it]
100%|█████████████████████████████████████████████████████████| 2/2 [00:13<00:00,  6.55s/it]
100%|█████████████████████████████████████████████████████████| 2/2 [00:07<00:00,  3.89s/it]
100%|█████████████████████████████████████████████████████████| 2/2 [00:05<00:00,  2.66s/it]
100%|█████████████████████████████████████████████████████████| 2/2 [00:05<00:00,  2.85s/it]
100%|█████████████████████████████████████████████████████████| 2/2 [00:04<00:00,  2.03s/it]
100%|█████████████████████████████████████████████████████████| 2/2 [00:08<00:00,  4.07s/it]
100%|█████████████████████████████████████████████████████████| 2/2 [00:06<00:00,  3.48s/it]
100%|█████████████████████████████████████████████████████████| 2/2 [00:04<00:00,  2.34s/it]
100%|█████████████████████████████████████████████████████████| 2/2 [00:02<00:00,  1.50s/it]
100%|█████████████████████████████████████████████████████████| 2/2 [00:08<00:00,  4.35s/it]
100%|█████████████████████████████████████████████████████████| 2/2 [00:08<00:00,  4.34s/it]
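The progress output is consistent with the comment in the cell: one bar over the 13 nodes, followed by one bar of 2 questions for each node. A back-of-the-envelope check of the expected dataset size follows; the function name is illustrative, and the actual count depends on how many questions the LLM returns for each chunk.

```python
def expected_question_count(num_nodes: int, num_questions_per_chunk: int) -> int:
    """Estimate the dataset size, assuming every node yields the requested number of questions."""
    return num_nodes * num_questions_per_chunk


print(expected_question_count(13, 2))  # 26
```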

In [ ]:

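rag_dataset.to_pandas()

Out[ ]:

	query	reference_contexts	reference_answer	reference_answer_by	query_by
0	How did the gold rush of 1849 impact the development of San Francisco?	[The history of the city of San Francisco, California...	The gold rush of 1849 had a significant impact on the development of San Francisco...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
1	Where were the early European settlements established...	[The history of the city of San Francisco, California...	The early European settlements established in the San Francisco area of California...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
2	How did the arrival of Europeans impact...	[== Arrival of Europeans and early settlement...	The arrival of Europeans had a significant impact on...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
3	What were the main challenges faced by the early settlers of San Francisco...	[== Arrival of Europeans and early settlement...	The early settlers of San Francisco faced several challenges...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
4	How did the California gold rush impact the population...	[== 1848 gold rush ==\nThe California gold rush...	The California gold rush of the mid-19th century...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
5	Discuss the role of Chinese immigrants in...	[== 1848 gold rush ==\nThe California gold rush...	Chinese immigrants played a significant role in...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
6	How did San Francisco develop into a major city...	[== Paris of the West ==\n\nIt was during this period...	San Francisco developed into a major city during...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
7	What were some significant developments and changes...	[== Paris of the West ==\n\nIt was during this period...	In the late 19th and early 20th centuries,...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
8	How did Abe Ruef contribute to Eugene Schmitz's...	[== Corruption and graft trials ==\n\nMayor Eu...	Abe Ruef contributed $16,000 to Eugene Schmitz's campaign...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
9	Describe the impact of the 1906 earthquake and...	[== Corruption and graft trials ==\n\nMayor Eu...	The 1906 earthquake and fire had a devastating impact on...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
10	How did the 1906 San Francisco earthquake impact...	[=== Reconstruction ===\nAlmost immediately after the earthquake...	The 1906 San Francisco earthquake had a significant impact on...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
11	What major events and developments took place...	[=== Reconstruction ===\nAlmost immediately after...	During the 1930s and World War II, several major...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
12	How did the post-World War II era contribute to...	[== Post-World War II ==\nAfter World War II, ...	After World War II, many American military personnel...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
13	Discuss the impact of the urban renewal initiatives...	[== Post-World War II ==\nAfter World War II, ...	M. Justin Herman led the urban renewal initiatives...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
14	How did San Francisco become a center of the counterculture...	[== 1960 – 1970s ==\n\n\n=== "Summer of Love" ...	San Francisco became a center of the counterculture...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
15	Explain the role of San Francisco as a "gay mecca"...	[== 1960 – 1970s ==\n\n\n=== "Summer of Love" ...	During the 1960s and beyond, San Francisco became...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
16	How did the construction of BART and Muni impact...	[=== New public infrastructure ===\nThe 1970s...	The construction of BART and Muni in the 1970s...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
17	What were the main challenges faced by San Francisco...	[=== New public infrastructure ===\nThe 1970s...	In the 1980s, San Francisco faced several major challenges...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
18	How did the 1989 Loma Prieta earthquake impact...	[=== 1989 Loma Prieta earthquake ===\n\nIn October 1989...	The 1989 Loma Prieta earthquake had a significant impact on...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
19	Discuss the impact of the dot-com bubble on...	[=== 1989 Loma Prieta earthquake ===\n\nIn October 1989...	The dot-com bubble of the late 1990s had a significant impact on...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
20	How did the redevelopment of the Mission Bay neighborhood...	[== 2010s ==\nThe early 2000s and into the 2010s...	The redevelopment of the Mission Bay neighborhood...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
21	What significant events took place in San Francisco...	[== 2010s ==\nThe early 2000s and into the 2010s...	In 2010, the San Francisco Giants won their first...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
22	In the context of San Francisco's history, dis...	[=== Cultural themes ===\nBerglund, Barbara (2...	The 1906 earthquake...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
23	How did different ethnic and religious groups...	[=== Cultural themes ===\nBerglund, Barbara (2...	Two specific groups mentioned in the sources...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
24	In the context of San Francisco's history, what...	[=== Gold rush and early days ===\nHittell, John...	Some of the major events and developments in San Francisco's history...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)
25	How did politics shape the growth and transformation of...	[=== Gold rush and early days ===\nHittell, John...	The sources provided offer a comprehensive...	ai (gpt-3.5-turbo)	ai (gpt-3.5-turbo)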

In [ ]:

rag_dataset.save_json("rag_dataset.json")
