DSPy

Programming, not prompting, language models

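DSPy is a declarative framework for building modular AI software. It lets you iterate quickly on structured code rather than brittle strings, and it provides algorithms that compile AI programs into effective prompts and weights for your language models, whether you are building simple classifiers, sophisticated RAG pipelines, or agent loops.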

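Instead of wrangling prompts or training jobs, DSPy (Declarative Self-improving Python) lets you build AI software from natural-language modules and compose them generically with different models, inference strategies, or learning algorithms. This makes AI software more reliable, maintainable, and portable across models and strategies.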

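TL;DR: Think of DSPy as a higher-level language for AI programming (lecture), like the shift from assembly to C or from pointer arithmetic to SQL. Meet the community, seek help, or start contributing via GitHub and Discord.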

Getting Started I: Install DSPy and set up your LM

> pip install -U dspy

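Choose the setup that matches your provider: OpenAI, Anthropic, Databricks, Gemini, local LMs on your laptop, local LMs on a GPU server, or other providers. The snippets below follow that order.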

You can authenticate by setting the OPENAI_API_KEY environment variable or by passing api_key below.

import dspy
lm = dspy.LM("openai/gpt-4o-mini", api_key="YOUR_OPENAI_API_KEY")
dspy.configure(lm=lm)

You can authenticate by setting the ANTHROPIC_API_KEY environment variable or by passing api_key below.

import dspy
lm = dspy.LM("anthropic/claude-3-opus-20240229", api_key="YOUR_ANTHROPIC_API_KEY")
dspy.configure(lm=lm)

If you are on the Databricks platform, authentication is automatic via their SDK. If not, you can set the environment variables DATABRICKS_API_KEY and DATABRICKS_API_BASE, or pass api_key and api_base below.

import dspy
lm = dspy.LM(
    "databricks/databricks-llama-4-maverick",
    api_key="YOUR_DATABRICKS_ACCESS_TOKEN",
    api_base="YOUR_DATABRICKS_WORKSPACE_URL",  # e.g.: https://dbc-64bf4923-e39e.cloud.databricks.com/serving-endpoints
)
dspy.configure(lm=lm)

You can authenticate by setting the GEMINI_API_KEY environment variable or by passing api_key below.

import dspy
lm = dspy.LM("gemini/gemini-2.5-flash", api_key="YOUR_GEMINI_API_KEY")
dspy.configure(lm=lm)

First, install Ollama and launch its server with your LM.

> curl -fsSL https://ollama.ai/install.sh | sh
> ollama run llama3.2:1b

Then, connect to it from your DSPy code.

import dspy
lm = dspy.LM("ollama_chat/llama3.2:1b", api_base="http://localhost:11434", api_key="")
dspy.configure(lm=lm)

First, install SGLang and launch its server with your LM.

> pip install "sglang[all]"
> pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/ 

> CUDA_VISIBLE_DEVICES=0 python -m sglang.launch_server --port 7501 --model-path meta-llama/Llama-3.1-8B-Instruct

If you do not have access from Meta to download meta-llama/Llama-3.1-8B-Instruct, you can use Qwen/Qwen2.5-7B-Instruct instead, for example.

Next, connect to your local LM from your DSPy code as an OpenAI-compatible endpoint.

import dspy
lm = dspy.LM("openai/meta-llama/Llama-3.1-8B-Instruct",
             api_base="http://localhost:7501/v1",  # ensure this points to your port
             api_key="local", model_type="chat")
dspy.configure(lm=lm)

In DSPy, you can use any of the dozens of LLM providers supported by LiteLLM. Follow their instructions for which {PROVIDER}_API_KEY to set and how to pass {provider_name}/{model_name} to the constructor.

Some examples:

anyscale/mistralai/Mistral-7B-Instruct-v0.1, with ANYSCALE_API_KEY
together_ai/togethercomputer/llama-2-70b-chat, with TOGETHERAI_API_KEY
sagemaker/, with AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION_NAME
azure/, with AZURE_API_KEY, AZURE_API_BASE, AZURE_API_VERSION, and the optional AZURE_AD_TOKEN and AZURE_API_TYPE

If your provider offers an OpenAI-compatible endpoint, just add an openai/ prefix to your full model name.

import dspy
lm = dspy.LM("openai/your-model-name", api_key="PROVIDER_API_KEY", api_base="YOUR_PROVIDER_URL")
dspy.configure(lm=lm)
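To make the {provider_name}/{model_name} convention concrete, here is a minimal sketch that splits a model string at its first slash. The helper `split_model_id` is hypothetical and written only for illustration; it is not how DSPy or LiteLLM parse model strings internally.

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split 'provider/model_name' at the first slash only.

    Hypothetical helper for illustration; not part of DSPy or LiteLLM.
    """
    provider, sep, model_name = model_id.partition("/")
    if not sep or not provider or not model_name:
        raise ValueError(f"expected 'provider/model_name', got {model_id!r}")
    return provider, model_name


# Model strings taken from the examples on this page.
for model_id in [
    "openai/gpt-4o-mini",
    "anyscale/mistralai/Mistral-7B-Instruct-v0.1",
    "ollama_chat/llama3.2:1b",
]:
    provider, model_name = split_model_id(model_id)
    print(f"{provider:12s} {model_name}")
```

Only the first segment names the provider, which is why a model name that itself contains slashes (as in the anyscale example) stays intact.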

Calling the LM directly.

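Idiomatic DSPy involves using modules, which we define in the rest of this page. However, it is still easy to call the lm you configured above directly. This gives you a unified API and lets you benefit from utilities like automatic caching.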

lm("Say this is a test!", temperature=0.7)  # => ['This is a test!']
lm(messages=[{"role": "user", "content": "Say this is a test!"}])  # => ['This is a test!']
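As a rough picture of what automatic caching means for repeated calls, the sketch below memoizes a fake backend by prompt and keyword arguments. `CachedLM` and `fake_backend` are hypothetical stand-ins; how DSPy actually keys and stores its cache is not described on this page, so treat the keying scheme as an assumption.

```python
class CachedLM:
    """Toy wrapper that memoizes calls by (prompt, keyword arguments)."""

    def __init__(self, backend):
        self.backend = backend
        self.cache = {}
        self.backend_calls = 0

    def __call__(self, prompt, **kwargs):
        # Assumption: identical prompt + identical kwargs means identical request.
        key = (prompt, tuple(sorted(kwargs.items())))
        if key not in self.cache:
            self.backend_calls += 1
            self.cache[key] = self.backend(prompt, **kwargs)
        return self.cache[key]


def fake_backend(prompt, **kwargs):
    # Hypothetical stand-in for a network call to a model provider.
    return [prompt.upper()]


cached_lm = CachedLM(fake_backend)
cached_lm("Say this is a test!", temperature=0.7)
cached_lm("Say this is a test!", temperature=0.7)  # served from the cache
cached_lm("Say this is a test!", temperature=0.0)  # different arguments, new call
print(cached_lm.backend_calls)  # 2
```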

1) Modules help you describe AI behavior as code, not strings.

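To build reliable AI systems, you must iterate fast. But maintaining prompts makes that hard: it forces you to tinker with strings or data every time you change your LM, metrics, or pipeline. Having built over a dozen best-in-class compound LM systems since 2020, we learned this the hard way, and so built DSPy to decouple AI system design from messy incidental choices about specific LMs or prompting strategies.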

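DSPy shifts your focus from tinkering with prompt strings to programming with structured, declarative natural-language modules. For every AI component in your system, you specify the input/output behavior as a signature and select a module to assign a strategy for invoking your LM. DSPy expands your signatures into prompts and parses your typed outputs, so you can compose different modules into ergonomic, portable, and optimizable AI systems.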

Getting Started II: Build DSPy modules for various tasks

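Try the examples below after configuring your lm above. Adjust the fields to explore what tasks your LM can do well out of the box. Each example below sets up a DSPy module, like dspy.Predict, dspy.ChainOfThought, or dspy.ReAct, with a task-specific signature. For example, question -> answer: float tells the module to take a question and produce a float answer.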
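To show what a string signature such as question -> answer: float declares, here is a small sketch that splits it into input and output fields. `parse_signature` is a hypothetical helper, not DSPy's own parser, and it assumes that a field without an annotation is treated as a string.

```python
def parse_signature(spec: str) -> tuple[dict[str, str], dict[str, str]]:
    """Return (inputs, outputs) as {field_name: type_name}.

    Naive illustration: it does not handle commas inside a type,
    such as dict[str, str]. Untyped fields are assumed to be 'str'.
    """
    if spec.count("->") != 1:
        raise ValueError(f"expected exactly one '->' in {spec!r}")
    left, right = spec.split("->")

    def parse_fields(part: str) -> dict[str, str]:
        fields = {}
        for item in part.split(","):
            name, _, type_name = item.partition(":")
            name = name.strip()
            if not name:
                raise ValueError(f"empty field name in {spec!r}")
            fields[name] = type_name.strip() or "str"
        return fields

    return parse_fields(left), parse_fields(right)


print(parse_signature("question -> answer: float"))
print(parse_signature("context, question -> response"))
```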

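The examples below cover, in order: math, RAG, classification, information extraction, agents, and multi-stage pipelines.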

math = dspy.ChainOfThought("question -> answer: float")
math(question="Two dice are tossed. What is the probability that the sum equals two?")

Possible Output:

Prediction(
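    reasoning='When two dice are tossed, each die has 6 faces, resulting in a total of 6 x 6 = 36 possible outcomes. The sum of the numbers on the two dice equals two only when both dice show a 1. This is just one specific outcome: (1, 1). Therefore, there is only one favorable outcome. The probability of the sum being two is the number of favorable outcomes divided by the total number of possible outcomes, which is 1/36.',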
    answer=0.0277776
)
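The reasoning in the sample output above can be checked by brute-force enumeration. This short sketch uses only the Python standard library and is independent of DSPy.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of tossing two six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))
favorable = [pair for pair in outcomes if sum(pair) == 2]

probability = Fraction(len(favorable), len(outcomes))
print(probability)                   # 1/36
print(round(float(probability), 6))  # 0.027778
```

The exact value is 1/36, so the float in the sample output is a close approximation produced by the LM rather than an exact result.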

def search_wikipedia(query: str) -> list[str]:
    results = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")(query, k=3)
    return [x["text"] for x in results]

rag = dspy.ChainOfThought("context, question -> response")

question = "What's the name of the castle that David Gregory inherited?"
rag(context=search_wikipedia(question), question=question)

Possible Output:

Prediction(
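    reasoning='The context provides information about David Gregory, a Scottish physician and inventor. It specifically mentions that he inherited Kinnairdy Castle in 1664. This detail directly answers the question about the name of the castle that David Gregory inherited.',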
    response='Kinnairdy Castle'
)

from typing import Literal

class Classify(dspy.Signature):
    """Classify sentiment of a given sentence."""

    sentence: str = dspy.InputField()
    sentiment: Literal["positive", "negative", "neutral"] = dspy.OutputField()
    confidence: float = dspy.OutputField()

classify = dspy.Predict(Classify)
classify(sentence="This book was super fun to read, though not the last chapter.")

Possible Output:

Prediction(
    sentiment='positive',
    confidence=0.75
)
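The Literal annotation on sentiment restricts the output to three labels. As a sketch of what that constraint means in plain Python, the hypothetical helper `check_sentiment` below validates a value against the same Literal type; it is not DSPy's own validation code.

```python
from typing import Literal, get_args

Sentiment = Literal["positive", "negative", "neutral"]


def check_sentiment(value: str) -> str:
    """Return the value if it is one of the allowed labels, else raise."""
    allowed = get_args(Sentiment)
    if value not in allowed:
        raise ValueError(f"{value!r} is not one of {allowed}")
    return value


print(check_sentiment("positive"))
try:
    check_sentiment("mixed")
except ValueError as err:
    print("rejected:", err)
```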

class ExtractInfo(dspy.Signature):
    """Extract structured information from text."""

    text: str = dspy.InputField()
    title: str = dspy.OutputField()
    headings: list[str] = dspy.OutputField()
    entities: list[dict[str, str]] = dspy.OutputField(desc="a list of entities and their metadata")

module = dspy.Predict(ExtractInfo)

text = "Apple Inc. announced its latest iPhone 14 today. " \
    "The CEO, Tim Cook, highlighted its new features in a press release."
response = module(text=text)

print(response.title)
print(response.headings)
print(response.entities)

Possible Output:

Apple Inc. Announces iPhone 14
['Introduction', "CEO's Statement", 'New Features']
[{'name': 'Apple Inc.', 'type': 'Organization'}, {'name': 'iPhone 14', 'type': 'Product'}, {'name': 'Tim Cook', 'type': 'Person'}]

def evaluate_math(expression: str):
    return dspy.PythonInterpreter({}).execute(expression)

def search_wikipedia(query: str):
    results = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")(query, k=3)
    return [x["text"] for x in results]

react = dspy.ReAct("question -> answer: float", tools=[evaluate_math, search_wikipedia])

pred = react(question="What is 9362158 divided by the year of birth of David Gregory of Kinnairdy castle?")
print(pred.answer)

Possible Output:

5761.328
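The sample output above is consistent with a birth year of 1625. That year is an assumption inferred from the printed result, not something stated on this page. The division can be checked exactly:

```python
from fractions import Fraction

numerator = 9362158
birth_year = 1625  # assumption: the year implied by the sample output above

quotient = Fraction(numerator, birth_year)
print(float(quotient))  # 5761.328
```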

class Outline(dspy.Signature):
    """Outline a thorough overview of a topic."""

    topic: str = dspy.InputField()
    title: str = dspy.OutputField()
    sections: list[str] = dspy.OutputField()
    section_subheadings: dict[str, list[str]] = dspy.OutputField(desc="mapping from section headings to subheadings")

class DraftSection(dspy.Signature):
    """Draft a top-level section of an article."""

    topic: str = dspy.InputField()
    section_heading: str = dspy.InputField()
    section_subheadings: list[str] = dspy.InputField()
    content: str = dspy.OutputField(desc="markdown-formatted section")

class DraftArticle(dspy.Module):
    def __init__(self):
        self.build_outline = dspy.ChainOfThought(Outline)
        self.draft_section = dspy.ChainOfThought(DraftSection)

    def forward(self, topic):
        outline = self.build_outline(topic=topic)
        sections = []
        for heading, subheadings in outline.section_subheadings.items():
            section, subheadings = f"## {heading}", [f"### {subheading}" for subheading in subheadings]
            section = self.draft_section(topic=outline.title, section_heading=section, section_subheadings=subheadings)
            sections.append(section.content)
        return dspy.Prediction(title=outline.title, sections=sections)

draft_article = DraftArticle()
article = draft_article(topic="World Cup 2002")

Possible Output:

A 1500-word article on the topic, e.g.

## Qualification Process

The qualification process for the 2002 FIFA World Cup involved a series of..... [shortened here for presentation].

### UEFA Qualifiers

The UEFA qualifiers involved 50 teams competing for 13..... [shortened here for presentation].

.... [rest of the article]

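Note that DSPy makes it straightforward to optimize multi-stage modules like this. As long as you can evaluate the final output of the system, every DSPy optimizer can tune all of the intermediate modules.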

Using DSPy in practice: from quick scripting to building sophisticated systems.

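Standard prompts conflate interface ("what should the LM do?") with implementation ("how do we tell it to do that?"). DSPy isolates the former as signatures so that we can infer the latter, or learn it from data, in the context of a bigger program.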

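Even before you start using optimizers, DSPy's modules let you script effective LM systems as ergonomic, portable code. Across many tasks and LMs, we maintain signature test suites that assess the reliability of the built-in DSPy adapters. Adapters are the components that map signatures to prompts prior to optimization. If you find a task where a simple prompt consistently outperforms idiomatic DSPy for your LM, consider that a bug and file an issue. We will use it to improve the built-in adapters.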
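To illustrate the adapter idea, the sketch below renders a toy signature into a prompt and parses a typed value back out of a reply. `ToySignature`, `render_prompt`, and `parse_completion` are all hypothetical names, the prompt layout is invented for this example, and the reply is hand-written rather than produced by an LM. DSPy's built-in adapters use their own formats.

```python
from dataclasses import dataclass


@dataclass
class ToySignature:
    instructions: str
    inputs: list[str]
    outputs: dict[str, type]  # output field name -> Python type


def render_prompt(sig: ToySignature, **values) -> str:
    """Turn a signature plus input values into a plain-text prompt."""
    lines = [sig.instructions, ""]
    for name in sig.inputs:
        lines.append(f"{name}: {values[name]}")
    lines.append("")
    lines.append("Respond with one 'field: value' line for: " + ", ".join(sig.outputs))
    return "\n".join(lines)


def parse_completion(sig: ToySignature, completion: str) -> dict:
    """Read 'field: value' lines and cast each output to its declared type."""
    raw = {}
    for line in completion.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            raw[name.strip()] = value.strip()
    return {name: cast(raw[name]) for name, cast in sig.outputs.items()}


sig = ToySignature("Answer the question.", ["question"], {"answer": float})
prompt = render_prompt(sig, question="What is 1/4 as a decimal?")
parsed = parse_completion(sig, "answer: 0.25")  # hand-written stand-in for an LM reply
print(prompt)
print(parsed)  # {'answer': 0.25}
```

Keeping rendering and parsing behind the signature is what allows the prompt format to change without touching the code that consumes the typed outputs.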

2) Optimizers tune the prompts and weights of your AI modules.

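DSPy gives you the tools to compile high-level code with natural-language annotations into the low-level computations, prompts, or weight updates that align your LM with your program's structure and metrics. If you change your code or your metrics, you can simply re-compile accordingly.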

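Given a few tens or hundreds of representative inputs for your task and a metric that can measure the quality of your system's outputs, you can use a DSPy optimizer. Different optimizers in DSPy work by synthesizing good few-shot examples for every module (like dspy.BootstrapRS¹), by proposing and intelligently exploring better natural-language instructions for every prompt (like dspy.MIPROv2²), and by building datasets for your modules and using them to finetune the LM weights in your system (like dspy.BootstrapFinetune³).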
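A metric here is just a function that compares an example with a prediction. The sketch below follows the (example, prediction, trace=None) calling shape used by the metric lambda elsewhere on this page, but everything else is a stand-in: `SimpleNamespace` objects replace DSPy examples and predictions, and `toy_program` is a hypothetical lookup table rather than an LM-backed module.

```python
from types import SimpleNamespace


def exact_match(example, prediction, trace=None) -> bool:
    """True when the predicted answer equals the gold answer,
    ignoring case and surrounding whitespace."""
    return example.answer.strip().lower() == prediction.answer.strip().lower()


def average_score(program, devset, metric) -> float:
    """Fraction of examples on which the metric accepts the program's output."""
    scores = [metric(ex, program(ex.question)) for ex in devset]
    return sum(scores) / len(scores)


devset = [
    SimpleNamespace(question="2+2?", answer="4"),
    SimpleNamespace(question="Capital of France?", answer="Paris"),
]


def toy_program(question: str):
    # Hypothetical stand-in for a DSPy module; one answer is deliberately wrong.
    answers = {"2+2?": "4", "Capital of France?": "Lyon"}
    return SimpleNamespace(answer=answers[question])


print(average_score(toy_program, devset, exact_match))  # 0.5
```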

Getting Started III: Optimizing the LM prompts or weights in DSPy programs

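A typical simple optimization run costs on the order of $2 USD and takes around 20 minutes, but be careful when running optimizers with very large LMs or very large datasets. Depending on your LM, dataset, and configuration, optimization can cost as little as a few cents or as much as tens of dollars.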

The examples below rely on HuggingFace/datasets, which you can install with the command below.

> pip install -U datasets

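The examples below cover, in order: optimizing prompts for a ReAct agent, optimizing prompts for RAG, and optimizing weights for classification.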

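This is a minimal but fully runnable example of setting up a dspy.ReAct agent that answers questions via search from Wikipedia and then optimizing it with dspy.MIPROv2 in the cheap light mode on 500 question-answer pairs sampled from the HotPotQA dataset.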

import dspy
from dspy.datasets import HotPotQA

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

def search_wikipedia(query: str) -> list[str]:
    results = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")(query, k=3)
    return [x["text"] for x in results]

trainset = [x.with_inputs('question') for x in HotPotQA(train_seed=2024, train_size=500).train]
react = dspy.ReAct("question -> answer", tools=[search_wikipedia])

tp = dspy.MIPROv2(metric=dspy.evaluate.answer_exact_match, auto="light", num_threads=24)
optimized_react = tp.compile(react, trainset=trainset)

An informal run like this raises ReAct's score from 24% to 51% by teaching gpt-4o-mini more about the specifics of the task.
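When reading score changes like this one, it helps to separate the absolute gain (in percentage points) from the relative gain. The helper below is a hypothetical convenience function applied to the two scores reported above.

```python
def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain, relative gain) between two scores."""
    absolute = after - before
    relative = absolute / before
    return absolute, relative


absolute, relative = gains(24, 51)
print(absolute)  # 27 percentage points
print(relative)  # 1.125, i.e. a 112.5% relative improvement
```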

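Given a retrieval index to search, your favorite dspy.LM, and a small trainset of questions and ground-truth responses, the following code snippet can optimize your RAG system with long outputs against the built-in SemanticF1 metric, which is implemented as a DSPy module.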

class RAG(dspy.Module):
    def __init__(self, num_docs=5):
        self.num_docs = num_docs
        self.respond = dspy.ChainOfThought("context, question -> response")

    def forward(self, question):
        context = search(question, k=self.num_docs)   # defined in tutorial linked below
        return self.respond(context=context, question=question)

tp = dspy.MIPROv2(metric=dspy.evaluate.SemanticF1(decompositional=True), auto="medium", num_threads=24)
optimized_rag = tp.compile(RAG(), trainset=trainset, max_bootstrapped_demos=2, max_labeled_demos=2)

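For a complete RAG example that you can run, start this tutorial. It improves the quality of a RAG system over a subset of StackExchange communities by a 10% relative gain.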

Dataset setup code:

import random
from typing import Literal

from datasets import load_dataset

import dspy
from dspy.datasets import DataLoader

# Load the Banking77 dataset.
CLASSES = load_dataset("PolyAI/banking77", split="train", trust_remote_code=True).features["label"].names
kwargs = {"fields": ("text", "label"), "input_keys": ("text",), "split": "train", "trust_remote_code": True}

# Load the first 2000 examples from the dataset, and assign a hint to each *training* example.
trainset = [
    dspy.Example(x, hint=CLASSES[x.label], label=CLASSES[x.label]).with_inputs("text", "hint")
    for x in DataLoader().from_huggingface(dataset_name="PolyAI/banking77", **kwargs)[:2000]
]
random.Random(0).shuffle(trainset)

import dspy
lm=dspy.LM('openai/gpt-4o-mini-2024-07-18')

# Define the DSPy module for classification. It will use the hint at training time, if available.
signature = dspy.Signature("text, hint -> label").with_updated_fields("label", type_=Literal[tuple(CLASSES)])
classify = dspy.ChainOfThought(signature)
classify.set_lm(lm)

# Optimize via BootstrapFinetune.
optimizer = dspy.BootstrapFinetune(metric=(lambda x, y, trace=None: x.label == y.label), num_threads=24)
optimized = optimizer.compile(classify, trainset=trainset)

optimized(text="What does a pending cash withdrawal mean?")

# For a complete fine-tuning tutorial, see: https://dspy.ai/tutorials/classification_finetuning/

Possible Output (from the last line):

Prediction(
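    reasoning='A pending cash withdrawal indicates that a request to withdraw cash has been initiated but has not yet been completed or processed. This status means that the transaction is still in progress and the funds have not yet been deducted from the account or made available to the user.',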
    label='pending_cash_withdrawal'
)

A similar informal run on DSPy 2.5.29 raises GPT-4o-mini's score from 66% to 87%.

What's an example of a DSPy optimizer? How do different optimizers work?

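Take the dspy.MIPROv2 optimizer as an example. First, MIPRO starts with the bootstrapping stage. It takes your program, which may be unoptimized at this point, and runs it many times across different inputs to collect traces of input/output behavior for each of your modules. It filters these traces to keep only those that appear in trajectories scored highly by your metric. Second, MIPRO enters its grounded proposal stage. It previews your DSPy program's code, your data, and traces from running your program, and uses them to draft many potential instructions for every prompt in your program. Third, MIPRO launches the discrete search stage. It samples mini-batches from your training set, proposes a combination of instructions and traces to use for constructing every prompt in the pipeline, and evaluates the candidate program on the mini-batch. Using the resulting score, MIPRO updates a surrogate model that helps the proposals get better over time.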
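The three stages can be pictured with a deliberately tiny toy. This is not MIPROv2: there is no LM, the "instructions" are hard-coded strings paired with fixed Python behaviors, and an exhaustive loop over candidates replaces the surrogate-model-guided search. Every name in the sketch is hypothetical; it only shows the shape of bootstrapping, proposing, and searching over mini-batches.

```python
import random

# Toy task: map a lowercase word to its upper-case form.
trainset = [("a", "A"), ("ok", "OK"), ("dspy", "DSPY"), ("i", "I"), ("lm", "LM"), ("rag", "RAG")]


def metric(gold: str, output: str) -> bool:
    return gold == output


def bootstrap(program, examples):
    """Stage 1: run the program and keep only traces the metric accepts."""
    return [(x, program(x)) for x, gold in examples if metric(gold, program(x))]


# Stage 2 stand-in: a real optimizer would ask an LM to draft instructions.
# Here each made-up instruction is paired with a hard-coded behavior.
candidates = {
    "Return the text unchanged.": str,
    "Capitalize the text.": str.capitalize,
    "Return the text in upper case.": str.upper,
}


def search(candidates, examples, rounds=5, batch_size=4, seed=0):
    """Stage 3: score every candidate on random mini-batches, keep the best total."""
    rng = random.Random(seed)
    totals = dict.fromkeys(candidates, 0.0)
    for _ in range(rounds):
        batch = rng.sample(examples, batch_size)
        for instruction, program in candidates.items():
            correct = sum(metric(gold, program(x)) for x, gold in batch)
            totals[instruction] += correct / batch_size
    return max(totals, key=totals.get)


demos = bootstrap(str.capitalize, trainset)  # unoptimized starting program
best = search(candidates, trainset)
print(demos)  # [('a', 'A'), ('i', 'I')]
print(best)   # Return the text in upper case.
```

In this toy the bootstrapped demos are not fed back into scoring, because there is no LM prompt to place them in. In the process described above, instructions and traces are combined when constructing each prompt.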

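One thing that makes DSPy optimizers so powerful is that they can be composed. You can run dspy.MIPROv2 and pass the resulting program to dspy.MIPROv2 again or, say, to dspy.BootstrapFinetune to get better results. This is partly the essence of dspy.BetterTogether. Alternatively, you can run the optimizer, extract the top-5 candidate programs, and build a dspy.Ensemble of them. This lets you scale inference-time compute (e.g., ensembles) as well as DSPy's unique pre-inference-time compute (i.e., optimization budget) in highly systematic ways.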
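The ensembling idea can be sketched as a majority vote over several candidate programs. This is a plain-Python illustration with hypothetical stand-in programs; it does not use or reproduce the interface of dspy.Ensemble.

```python
from collections import Counter


def majority_vote(programs, question):
    """Run every program on the question and return the most common answer."""
    answers = [program(question) for program in programs]
    return Counter(answers).most_common(1)[0][0]


# Hypothetical stand-ins for the top candidate programs from an optimizer run.
programs = [
    lambda q: "Paris",
    lambda q: "Paris",
    lambda q: "Lyon",
]

print(majority_vote(programs, "Capital of France?"))  # Paris
```

Each extra program costs one more call at inference time, which is the trade-off behind scaling inference-time compute with ensembles.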

3) DSPy's ecosystem advances open-source AI research.

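Compared to monolithic LMs, DSPy's modular paradigm enables a large community to improve the compositional architectures, inference-time strategies, and optimizers for LM programs in an open, distributed way. This gives DSPy users more control, helps them iterate much faster, and allows their programs to get better over time by applying the latest optimizers or modules.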

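The DSPy research effort started at Stanford NLP in February 2022, building on what we had learned from developing early compound LM systems like ColBERT-QA, Baleen, and Hindsight. The first version was released as DSP in December 2022 and evolved into DSPy by October 2023. Thanks to 250 contributors, DSPy has introduced tens of thousands of people to building and optimizing modular LM programs.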

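Since then, DSPy's community has produced a large body of work on optimizers, like MIPROv2, BetterTogether, and LeReT; on program architectures, like STORM, IReRa, and DSPy Assertions; and on successful applications to new problems, like PAPILLON, PATH, WangLab@MEDIQA, UMD's Prompting Case Study, and Haize's Red-Teaming Program, in addition to many open-source projects, production applications, and other use cases.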
