# Install the latest version of DSPy
%pip install -U dspy
# Install the Hugging Face datasets library to load the CoNLL-2003 dataset
%pip install datasets
Recommended: Set up MLflow Tracing to understand what's happening under the hood.
MLflow DSPy Integration

MLflow is an LLMOps tool that natively integrates with DSPy and offers explainability and experiment tracking. In this tutorial, you can use MLflow to visualize prompts and optimization progress as traces to better understand DSPy's behavior. You can set up MLflow easily by following the four steps below.

- Install MLflow

%pip install mlflow>=2.20

- Launch the MLflow UI in a separate terminal

mlflow ui --port 5000

- Connect the notebook to MLflow

import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("DSPy")

- Enable tracing.

mlflow.dspy.autolog()

To learn more about the integration, visit the MLflow DSPy Documentation.
Load and Prepare the Dataset

In this section, we prepare the CoNLL-2003 dataset, which is commonly used for entity extraction tasks. The dataset includes tokens annotated with entity labels such as persons, organizations, and locations.

We will:

- Load the dataset using the Hugging Face `datasets` library.
- Define a function to extract tokens referring to people.
- Slice the dataset to create smaller subsets for training and testing.

DSPy expects examples in a structured format, so we'll also convert the dataset into DSPy `Example` objects for easy integration.
import os
import tempfile
from typing import Any

import dspy
from datasets import load_dataset


def load_conll_dataset() -> dict:
    """
    Loads the CoNLL-2003 dataset into train, validation, and test splits.

    Returns:
        dict: Dataset splits with keys 'train', 'validation', and 'test'.
    """
    with tempfile.TemporaryDirectory() as temp_dir:
        # Use a temporary Hugging Face cache directory for compatibility with certain hosted notebook
        # environments that don't support the default Hugging Face cache directory.
        os.environ["HF_DATASETS_CACHE"] = temp_dir
        return load_dataset("conll2003", trust_remote_code=True)
def extract_people_entities(data_row: dict[str, Any]) -> list[str]:
    """
    Extracts entities referring to people from a row of the CoNLL-2003 dataset.

    Args:
        data_row (dict[str, Any]): A row from the dataset containing tokens and NER tags.

    Returns:
        list[str]: List of tokens tagged as people.
    """
    return [
        token
        for token, ner_tag in zip(data_row["tokens"], data_row["ner_tags"])
        if ner_tag in (1, 2)  # CoNLL entity codes 1 and 2 refer to people
    ]
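To see what the tag filter does concretely, here is a quick check on a hand-built row (the tokens and tags below are illustrative, not pulled from the dataset; in the Hugging Face encoding of CoNLL-2003, codes 1 and 2 are B-PER and I-PER):

sample_row = {
    "tokens": ["Werner", "Zwingmann", "said", "on", "Wednesday"],
    "ner_tags": [1, 2, 0, 0, 0],  # B-PER, I-PER, then non-entity tokens
}
extract_people_entities(sample_row)  # -> ['Werner', 'Zwingmann']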
def prepare_dataset(data_split, start: int, end: int) -> list[dspy.Example]:
    """
    Prepares a sliced dataset split for use with DSPy.

    Args:
        data_split: The dataset split (e.g., train or test).
        start (int): Starting index of the slice.
        end (int): Ending index of the slice.

    Returns:
        list[dspy.Example]: List of DSPy Examples with tokens and expected labels.
    """
    return [
        dspy.Example(
            tokens=row["tokens"],
            expected_extracted_people=extract_people_entities(row)
        ).with_inputs("tokens")
        for row in data_split.select(range(start, end))
    ]
# Load the dataset
dataset = load_conll_dataset()
# Prepare the training and test sets
train_set = prepare_dataset(dataset["train"], 0, 50)
test_set = prepare_dataset(dataset["test"], 0, 200)
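As a quick sanity check, you can inspect one of the prepared examples; a dspy.Example exposes its fields as attributes, and .with_inputs("tokens") marks tokens as the only input field:

# Peek at the first training example
print(train_set[0].tokens)
print(train_set[0].expected_extracted_people)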
Configure DSPy and Create the Entity Extraction Program

Here, we define a DSPy program for extracting entities referring to people from tokenized text.

Then, we configure DSPy to use a particular language model (`gpt-4o-mini`) for all of the program's invocations.

Key DSPy concepts introduced:

- Signatures: Define structured input/output schemas for your program.
- Modules: Encapsulate program logic in reusable, composable units.

Specifically, we will:

- Create a `PeopleExtraction` DSPy Signature that specifies the input (`tokens`) and output (`extracted_people`) fields.
- Define a `people_extractor` program that applies DSPy's built-in `dspy.ChainOfThought` module to implement the `PeopleExtraction` signature. The program extracts entities referring to people from a list of input tokens by prompting a language model (LM).
- Use the `dspy.LM` class and the `dspy.settings.configure()` method to configure the language model that DSPy will use when invoking the program.
class PeopleExtraction(dspy.Signature):
    """
    Extract contiguous tokens referring to specific people, if any, from a list of string tokens.
    Output a list of tokens. In other words, do not combine multiple tokens into a single value.
    """
    tokens: list[str] = dspy.InputField(desc="tokenized text")
    extracted_people: list[str] = dspy.OutputField(desc="all tokens referring to specific people extracted from the tokenized text")

people_extractor = dspy.ChainOfThought(PeopleExtraction)
Here, we tell DSPy to use OpenAI's `gpt-4o-mini` model in our program. For authentication, DSPy reads your `OPENAI_API_KEY` environment variable. You can easily swap this out for other providers or local models, as sketched after the configuration code below.
lm = dspy.LM(model="openai/gpt-4o-mini")
dspy.settings.configure(lm=lm)
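Swapping providers only requires changing the model string passed to dspy.LM (and, for hosted providers, setting the corresponding API key). A hedged sketch; the model identifiers below are examples, and availability depends on your accounts and local setup:

# Anthropic, authenticated via the ANTHROPIC_API_KEY environment variable
# lm = dspy.LM(model="anthropic/claude-3-5-sonnet-20240620")

# A local model served by Ollama; no API key required
# lm = dspy.LM(model="ollama_chat/llama3", api_base="http://localhost:11434")

# dspy.settings.configure(lm=lm)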
Define Metric and Evaluation Functions

In DSPy, evaluating a program's performance is critical for iterative development. A good evaluation framework allows us to:

- Measure the quality of our program's outputs.
- Compare outputs against ground-truth labels.
- Identify areas for improvement.

What we'll do:

- Define a custom metric (`extraction_correctness_metric`) to evaluate whether the extracted entities match the ground truth.
- Create an evaluation function (`evaluate_correctness`) that applies this metric to a training or test dataset and computes the overall accuracy.

The evaluation function uses DSPy's `Evaluate` utility to handle parallelism and visualization of results.
def extraction_correctness_metric(example: dspy.Example, prediction: dspy.Prediction, trace=None) -> bool:
    """
    Computes correctness of entity extraction predictions.

    Args:
        example (dspy.Example): The dataset example containing expected people entities.
        prediction (dspy.Prediction): The prediction from the DSPy people extraction program.
        trace: Optional trace object for debugging.

    Returns:
        bool: True if predictions match expectations, False otherwise.
    """
    return prediction.extracted_people == example.expected_extracted_people

evaluate_correctness = dspy.Evaluate(
    devset=test_set,
    metric=extraction_correctness_metric,
    num_threads=24,
    display_progress=True,
    display_table=True
)
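Before launching the full parallel evaluation, it can be useful to smoke-test the metric on a single example (a minimal check we're adding here, not part of the original tutorial flow):

# Run the extractor on one test example and score it
example = test_set[1]  # the ["Nadim", "Ladki"] example
prediction = people_extractor(tokens=example.tokens)
print(extraction_correctness_metric(example, prediction))  # True only on an exact list match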
Evaluate the Initial Extractor

Before optimizing our program, we need a baseline evaluation to understand its current performance. This helps us:

- Establish a reference point for comparison after optimization.
- Identify potential weaknesses in the initial implementation.

In this step, we'll run our `people_extractor` program on the test set and measure its accuracy using the evaluation framework defined above.
evaluate_correctness(people_extractor, devset=test_set)
Average Metric: 172.00 / 200 (86.0%): 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:16<00:00, 11.94it/s]
2024/11/18 21:08:04 INFO dspy.evaluate.evaluate: Average Metric: 172 / 200 (86.0%)
| | tokens | expected_extracted_people | rationale | extracted_people | extraction_correctness_metric |
|---|---|---|---|---|---|
| 0 | [SOCCER, -, JAPAN, GET, LUCKY, WIN, ,, CHINA, IN, SURPRISE, DEFEAT... | [CHINA] | We extracted "JAPAN" and "CHINA" because they refer to specific countries... | [JAPAN, CHINA] | |
| 1 | [Nadim, Ladki] | [Nadim, Ladki] | We extracted the tokens "Nadim" and "Ladki" because they refer to a specific... | [Nadim, Ladki] | ✔️ [True] |
| 2 | [AL-AIN, ,, United, Arab, Emirates, 1996-12-06] | [] | There are no tokens referring to specific people in the provided list... | [] | ✔️ [True] |
| 3 | [Japan, began, the, defence, of, their, Asian, Cup, title, with... | [] | We did not find any tokens referring to specific people in the processed text... | [] | ✔️ [True] |
| 4 | [But, China, saw, their, luck, desert, them, in, the, second, match... | [] | The extracted tokens referring to specific people are "China" and... | [China, Uzbekistan] | |
| ... | ... | ... | ... | ... | ... |
| 195 | ['The', 'Wallabies', 'have', 'their', 'sights', 'set', 'on', 'a', ... | [David, Campese] | The extracted people include "David Campese" because it refers to a specific... | [David, Campese] | ✔️ [True] |
| 196 | ['The', 'Wallabies', 'currently', 'have', 'no', 'plans', 'to', 'ma... | [] | The extracted people include "Wallabies" because it refers to a specific... | [] | ✔️ [True] |
| 197 | ['Campese', 'will', 'be', 'up', 'against', 'a', 'familiar', 'foe',... | [Campese, Rob, Andrew] | The extracted tokens refer to specific people mentioned in the text... | [Campese, Rob, Andrew] | ✔️ [True] |
| 198 | ['"', 'Campo', 'has', 'a', 'massive', 'following', 'in', 'this', '... | [Campo, Andrew] | The extracted tokens referring to specific people include "Campo"... | [Campo, Andrew] | ✔️ [True] |
| 199 | ['On', 'tour', ',', 'Australia', 'have', 'won', 'all', 'four', 'ma... | [] | We extracted the names of specific people from the tokenized text... | [] | ✔️ [True] |

200 rows × 5 columns
86.0
Tracking Evaluation Results in an MLflow Experiment

To track and visualize the evaluation results over time, you can record the results within an MLflow experiment.
import mlflow

with mlflow.start_run(run_name="extractor_evaluation"):
    evaluate_correctness = dspy.Evaluate(
        devset=test_set,
        metric=extraction_correctness_metric,
        num_threads=24,
        display_progress=True,
    )

    # Evaluate the program as usual
    result = evaluate_correctness(people_extractor)

    # Log the aggregated score
    mlflow.log_metric("exact_match", result.score)
    # Log the detailed evaluation results as a table
    mlflow.log_table(
        {
            "Tokens": [example.tokens for example in test_set],
            "Expected": [example.expected_extracted_people for example in test_set],
            "Predicted": [output[1] for output in result.results],
            "Exact match": [output[2] for output in result.results],
        },
        artifact_file="eval_results.json",
    )
To learn more about the integration, visit the MLflow DSPy Documentation.
Optimize the Model

DSPy includes powerful optimizers that can improve the quality of your system.

Here, we use DSPy's `MIPROv2` optimizer to:

- Automatically tune the program's language model (LM) prompt by 1. using the LM to tune the prompt's instructions and 2. building few-shot examples from the training dataset that are augmented with reasoning generated by `dspy.ChainOfThought`.
- Maximize correctness on the training set.

This optimization process is automated, saving time and effort while improving accuracy.
mipro_optimizer = dspy.MIPROv2(
    metric=extraction_correctness_metric,
    auto="medium",
)
optimized_people_extractor = mipro_optimizer.compile(
    people_extractor,
    trainset=train_set,
    max_bootstrapped_demos=4,
    minibatch=False
)
Evaluate the Optimized Program

After optimization, we re-evaluate the program on the test set to measure improvements. Comparing the pre- and post-optimization results allows us to:

- Quantify the benefits of optimization.
- Validate that the program generalizes well to unseen data.

In this case, we see that the program's accuracy on the test dataset improved significantly.
evaluate_correctness(optimized_people_extractor, devset=test_set)
Average Metric: 186.00 / 200 (93.0%): 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:23<00:00, 8.58it/s]
2024/11/18 21:15:00 INFO dspy.evaluate.evaluate: Average Metric: 186 / 200 (93.0%)
| | tokens | expected_extracted_people | rationale | extracted_people | extraction_correctness_metric |
|---|---|---|---|---|---|
| 0 | [SOCCER, -, JAPAN, GET, LUCKY, WIN, ,, CHINA, IN, SURPRISE, DEFEAT... | [CHINA] | There are no specific people mentioned in the provided tokens. The... | [] | |
| 1 | [Nadim, Ladki] | [Nadim, Ladki] | The tokens "Nadim Ladki" refer to a specific individual. Both tokens... | [Nadim, Ladki] | ✔️ [True] |
| 2 | [AL-AIN, ,, United, Arab, Emirates, 1996-12-06] | [] | There are no tokens referring to specific people in the provided list... | [] | ✔️ [True] |
| 3 | [Japan, began, the, defence, of, their, Asian, Cup, title, with... | [] | There are no specific people mentioned in the provided tokens. The... | [] | ✔️ [True] |
| 4 | [But, China, saw, their, luck, desert, them, in, the, second, match... | [] | There are no tokens referring to specific people in the provided text... | [] | ✔️ [True] |
| ... | ... | ... | ... | ... | ... |
| 195 | ['The', 'Wallabies', 'have', 'their', 'sights', 'set', 'on', 'a', ... | [David, Campese] | The extracted tokens refer to specific people mentioned in the text... | [David, Campese] | ✔️ [True] |
| 196 | ['The', 'Wallabies', 'currently', 'have', 'no', 'plans', 'to', 'ma... | [] | There are no specific individuals mentioned in the provided tokens... | [] | ✔️ [True] |
| 197 | ['Campese', 'will', 'be', 'up', 'against', 'a', 'familiar', 'foe',... | [Campese, Rob, Andrew] | The tokens contain the names "Campese" and "Rob Andrew", both of which... | [Campese, Rob, Andrew] | ✔️ [True] |
| 198 | ['"', 'Campo', 'has', 'a', 'massive', 'following', 'in', 'this', '... | [Campo, Andrew] | The extracted tokens refer to specific people mentioned in the text... | [Campo, Andrew] | ✔️ [True] |
| 199 | ['On', 'tour', ',', 'Australia', 'have', 'won', 'all', 'four', 'ma... | [] | There are no specific people mentioned in the provided tokens. The... | [] | ✔️ [True] |

200 rows × 5 columns
93.0
Inspect the Optimized Program's Prompt

After optimizing the program, we can inspect the interaction history to see how DSPy has augmented the program's prompt with few-shot examples. This step demonstrates:

- The structure of the prompt used by the program.
- How few-shot examples are added to guide the model's behavior.

Use `inspect_history(n=1)` to view the last interaction and analyze the generated prompt.
dspy.inspect_history(n=1)
[2024-11-18T21:15:00.584497]

System message:

Your input fields are:
1. `tokens` (list[str]): tokenized text

Your output fields are:
1. `rationale` (str): ${produce the extracted_people}. We ...
2. `extracted_people` (list[str]): all tokens referring to specific people extracted from the tokenized text

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## tokens ## ]]
{tokens}

[[ ## rationale ## ]]
{rationale}

[[ ## extracted_people ## ]]
{extracted_people}        # note: the value you produce must be pareseable according to the following JSON schema: {"type": "array", "items": {"type": "string"}}

[[ ## completed ## ]]

In adhering to this structure, your objective is:
        In a high-stakes situation where accurate identification of individuals is critical for regulatory compliance and public health communication, extract contiguous tokens referring to specific people from the provided list of string tokens. Ensure that you output each identified individual as separate tokens without combining multiple tokens into a single value. This task is essential for ensuring clarity and accountability in communications pertaining to EU regulations and health matters.

User message:

[[ ## tokens ## ]]
["He", "said", "further", "scientific", "study", "was", "required", "and", "if", "it", "was", "found", "that", "action", "was", "needed", "it", "should", "be", "taken", "by", "the", "European", "Union", "."]

Respond with the corresponding output fields, starting with the field `[[ ## rationale ## ]]`, then `[[ ## extracted_people ## ]]` (must be formatted as a valid Python list[str]), and then ending with the marker for `[[ ## completed ## ]]`.

Assistant message:

[[ ## rationale ## ]]
There are no specific people mentioned in the provided tokens. The only reference is to the "European Union," which is an organization rather than a specific person. We therefore conclude that there are no extracted people.

[[ ## extracted_people ## ]]
[]

[[ ## completed ## ]]

User message:

[[ ## tokens ## ]]
["BRUSSELS", "1996-08-22"]

Respond with the corresponding output fields, starting with the field `[[ ## rationale ## ]]`, then `[[ ## extracted_people ## ]]` (must be formatted as a valid Python list[str]), and then ending with the marker for `[[ ## completed ## ]]`.

Assistant message:

[[ ## rationale ## ]]
There are no tokens referring to specific people in the provided list. We only have a location and a date.

[[ ## extracted_people ## ]]
[]

[[ ## completed ## ]]

User message:

[[ ## tokens ## ]]
["Fischler", "proposed", "EU-wide", "measures", "after", "reports", "from", "Britain", "and", "France", "that", "under", "laboratory", "conditions", "sheep", "could", "contract", "Bovine", "Spongiform", "Encephalopathy", "(", "BSE", ")", "--", "mad", "cow", "disease", "."]

Respond with the corresponding output fields, starting with the field `[[ ## rationale ## ]]`, then `[[ ## extracted_people ## ]]` (must be formatted as a valid Python list[str]), and then ending with the marker for `[[ ## completed ## ]]`.

Assistant message:

[[ ## rationale ## ]]
The extracted tokens refer to specific people mentioned in the text. In this case, "Fischler" is the only token that refers to a specific person. We extract this token as it is contiguous and represents an individual involved in the proposed measures.

[[ ## extracted_people ## ]]
["Fischler"]

[[ ## completed ## ]]

User message:

[[ ## tokens ## ]]
["Germany", "'s", "representative", "to", "the", "European", "Union", "'s", "veterinary", "committee", "Werner", "Zwingmann", "said", "on", "Wednesday", "consumers", "should", "buy", "sheepmeat", "from", "countries", "other", "than", "Britain", "until", "the", "scientific", "advice", "was", "clearer", "."]

Respond with the corresponding output fields, starting with the field `[[ ## rationale ## ]]`, then `[[ ## extracted_people ## ]]` (must be formatted as a valid Python list[str]), and then ending with the marker for `[[ ## completed ## ]]`.

Assistant message:

[[ ## rationale ## ]]
We extracted the name "Werner Zwingmann" from the tokenized text as it refers to a specific person mentioned in the context of a statement regarding sheepmeat consumption.

[[ ## extracted_people ## ]]
["Werner", "Zwingmann"]

[[ ## completed ## ]]

User message:

[[ ## tokens ## ]]
["LONDON", "1996-12-06"]

Respond with the corresponding output fields, starting with the field `[[ ## rationale ## ]]`, then `[[ ## extracted_people ## ]]` (must be formatted as a valid Python list[str]), and then ending with the marker for `[[ ## completed ## ]]`.

Response:

[[ ## rationale ## ]]
There are no tokens referring to specific people in the provided list. The tokens only include a location and a date.

[[ ## extracted_people ## ]]
[]

[[ ## completed ## ]]
Keeping an Eye on Cost

DSPy allows you to track the cost of your programs. The following code demonstrates how to obtain the cost of all LM calls made by the DSPy extractor program so far.
cost = sum([x['cost'] for x in lm.history if x['cost'] is not None]) # cost in USD, as calculated by LiteLLM for certain providers
cost
0.26362742999999983
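Each entry in lm.history is a dictionary describing one LM call. Beyond summing cost, you can use it for simple accounting, as in this sketch (which keys are populated varies by provider; cost may be None for providers LiteLLM cannot price):

print(f"LM calls made so far: {len(lm.history)}")
total = sum(x["cost"] for x in lm.history if x["cost"] is not None)
print(f"Total cost (USD): {total:.4f}")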
Saving and Loading Optimized Programs

DSPy supports saving and loading programs, enabling you to reuse optimized systems without re-optimizing from scratch. This feature is especially useful for deploying your programs in production environments or sharing them with collaborators.

In this step, we'll save our optimized program to a file and demonstrate how to load it back for future use.
optimized_people_extractor.save("optimized_extractor.json")
loaded_people_extractor = dspy.ChainOfThought(PeopleExtraction)
loaded_people_extractor.load("optimized_extractor.json")
loaded_people_extractor(tokens=["Italy", "recalled", "Marcello", "Cuttitta"]).extracted_people
['Marcello', 'Cuttitta']
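The JSON file above stores only the program's state (instructions and demos), which is why we re-declare dspy.ChainOfThought(PeopleExtraction) before loading. Recent DSPy versions can also serialize the whole program, architecture and state together, to a directory. A hedged sketch, worth checking against your installed version's documentation:

# Save architecture + state together (assumes the save_program option is available)
optimized_people_extractor.save("./optimized_extractor/", save_program=True)

# Load without re-declaring the module
loaded_program = dspy.load("./optimized_extractor/")
loaded_program(tokens=["Italy", "recalled", "Marcello", "Cuttitta"]).extracted_people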
Saving Programs in an MLflow Experiment

Instead of saving the program to a local file, you can track it in MLflow for better reproducibility and collaboration.

- Dependency management: MLflow automatically saves frozen environment metadata along with the program to ensure reproducibility.
- Experiment tracking: With MLflow, you can track the program's performance and cost along with the program itself.
- Collaboration: You can share the program and results with your team members by sharing the MLflow experiment.

To save the program in MLflow, run the following code:
import mlflow

# Start an MLflow Run and save the program
with mlflow.start_run(run_name="optimized_extractor"):
    model_info = mlflow.dspy.log_model(
        optimized_people_extractor,
        artifact_path="model",  # Any name to save the program in MLflow
    )

# Load the program back from MLflow
loaded = mlflow.dspy.load_model(model_info.model_uri)
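The object returned by mlflow.dspy.load_model is the DSPy program itself, so it can be invoked exactly like the original:

# Call the program loaded from MLflow (same interface as before saving)
loaded(tokens=["Italy", "recalled", "Marcello", "Cuttitta"]).extracted_people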
To learn more about the integration, visit the MLflow DSPy Documentation.
Conclusion

In this tutorial, we demonstrated how to:

- Build a modular, interpretable entity extraction system with DSPy.
- Evaluate and optimize the system using DSPy's built-in tools.

By leveraging structured inputs and outputs, we ensured the system was easy to understand and improve. The optimization process allowed us to quickly improve performance without manually writing prompts or tuning parameters.

Next steps:

- Try extracting other entity types (e.g., locations or organizations), as sketched after this list.
- Explore other built-in DSPy modules, such as `ReAct`, for more complex reasoning tasks.
- Use the system in larger workflows, such as large-scale document processing or summarization.
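As a starting point for the first suggestion, here is a minimal sketch of an organization extractor, assuming the Hugging Face encoding of CoNLL-2003 in which NER codes 3 (B-ORG) and 4 (I-ORG) mark organizations; the names OrganizationExtraction and extract_org_entities are our own, not from the tutorial:

class OrganizationExtraction(dspy.Signature):
    """
    Extract contiguous tokens referring to specific organizations, if any, from a list of string tokens.
    Output a list of tokens. In other words, do not combine multiple tokens into a single value.
    """
    tokens: list[str] = dspy.InputField(desc="tokenized text")
    extracted_organizations: list[str] = dspy.OutputField(desc="all tokens referring to organizations")

def extract_org_entities(data_row: dict[str, Any]) -> list[str]:
    # Codes 3 and 4 are B-ORG and I-ORG in the Hugging Face CoNLL-2003 encoding
    return [
        token
        for token, ner_tag in zip(data_row["tokens"], data_row["ner_tags"])
        if ner_tag in (3, 4)
    ]

org_extractor = dspy.ChainOfThought(OrganizationExtraction)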