使用和自定义DSPy缓存

在本教程中，我们将探讨DSPy缓存机制的设计，并演示如何有效使用和自定义它。

DSPy 缓存结构

DSPy的缓存系统架构分为三个不同的层级：

内存缓存: 使用 cachetools.LRUCache 实现，该层提供对常用数据的快速访问。
磁盘缓存: 利用 diskcache.FanoutCache，该层为缓存项提供持久化存储。
提示缓存（服务器端缓存）: 该层由LLM服务提供商（例如OpenAI、Anthropic）管理。

虽然DSPy不直接控制服务器端提示缓存，但它为用户提供了启用、禁用和自定义内存与磁盘缓存的灵活性，以满足他们的特定需求。

使用DSPy缓存

默认情况下，DSPy会自动启用内存和磁盘缓存。无需特定操作即可开始使用缓存。当缓存命中发生时，您会观察到模块调用的执行时间显著减少。此外，如果启用了使用跟踪，缓存调用的使用指标将为None。

考虑以下示例：

import dspy
import os
import time

os.environ["OPENAI_API_KEY"] = "{your_openai_key}"

dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"), track_usage=True)

predict = dspy.Predict("question->answer")

start = time.time()
result1 = predict(question="Who is the GOAT of basketball?")
print(f"Time elapse: {time.time() - start: 2f}\n\nTotal usage: {result1.get_lm_usage()}")

start = time.time()
result2 = predict(question="Who is the GOAT of basketball?")
print(f"Time elapse: {time.time() - start: 2f}\n\nTotal usage: {result2.get_lm_usage()}")

示例输出如下：

Time elapse:  4.384113
Total usage: {'openai/gpt-4o-mini': {'completion_tokens': 97, 'prompt_tokens': 144, 'total_tokens': 241, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'text_tokens': None}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0, 'text_tokens': None, 'image_tokens': None}}}

Time elapse:  0.000529
Total usage: {}

禁用/启用DSPy缓存

在某些场景下，您可能需要完全或选择性地禁用内存或磁盘缓存。例如：

对于相同的LM请求，你需要不同的响应。
您缺少磁盘写入权限，需要禁用磁盘缓存。
您拥有有限的内存资源，并希望禁用内存缓存。

DSPy 提供了 dspy.configure_cache() 工具函数用于此目的。您可以使用相应的标志来控制每种缓存类型的启用/禁用状态：

dspy.configure_cache(
    enable_disk_cache=False,
    enable_memory_cache=False,
)

此外，你可以管理内存和磁盘缓存的大小：

dspy.configure_cache(
    enable_disk_cache=True,
    enable_memory_cache=True,
    disk_size_limit_bytes=YOUR_DESIRED_VALUE,
    memory_max_entries=YOUR_DESIRED_VALUE,
)

请注意，disk_size_limit_bytes定义了磁盘缓存的最大字节大小，而memory_max_entries则指定了内存缓存的最大条目数。

理解与自定义缓存

在特定情况下，您可能希望实现自定义缓存，例如，以获得对缓存密钥生成方式的更精细控制。默认情况下，缓存密钥源自发送到 litellm 的所有请求参数的哈希值，不包括像 api_key 这样的凭据。

要创建自定义缓存，你需要继承 dspy.clients.Cache 并重写相关方法：

class CustomCache(dspy.clients.Cache):
    def __init__(self, **kwargs):
        {write your own constructor}

    def cache_key(self, request: dict[str, Any], ignored_args_for_cache_key: Optional[list[str]] = None) -> str:
        {write your logic of computing cache key}

    def get(self, request: dict[str, Any], ignored_args_for_cache_key: Optional[list[str]] = None) -> Any:
        {write your cache read logic}

    def put(
        self,
        request: dict[str, Any],
        value: Any,
        ignored_args_for_cache_key: Optional[list[str]] = None,
        enable_memory_cache: bool = True,
    ) -> None:
        {write your cache write logic}

为确保与DSPy其余部分的无缝集成，建议使用与基类相同的方法签名来实现您的自定义缓存，或者至少包含**kwargs在方法定义中，以防止缓存读写操作期间的运行时错误。

一旦您的自定义缓存类定义完成，您可以指示 DSPy 使用它：

dspy.cache = CustomCache()

让我们通过一个实际例子来说明这一点。假设我们希望缓存键的计算仅依赖于请求消息内容，忽略其他参数（如被调用的具体LM）。我们可以按如下方式创建一个自定义缓存：

class CustomCache(dspy.clients.Cache):

    def cache_key(self, request: dict[str, Any], ignored_args_for_cache_key: Optional[list[str]] = None) -> str:
        messages = request.get("messages", [])
        return sha256(ujson.dumps(messages, sort_keys=True).encode()).hexdigest()

dspy.cache = CustomCache(enable_disk_cache=True, enable_memory_cache=True, disk_cache_dir=dspy.clients.DISK_CACHE_DIR)

作为对比，考虑执行以下代码而不使用自定义缓存：

import dspy
import os
import time

os.environ["OPENAI_API_KEY"] = "{your_openai_key}"

dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"))

predict = dspy.Predict("question->answer")

start = time.time()
result1 = predict(question="Who is the GOAT of soccer?")
print(f"Time elapse: {time.time() - start: 2f}")

start = time.time()
with dspy.context(lm=dspy.LM("openai/gpt-4.1-mini")):
    result2 = predict(question="Who is the GOAT of soccer?")
print(f"Time elapse: {time.time() - start: 2f}")

经过的时间将表明第二次调用未命中缓存。但是，当使用自定义缓存时：

import dspy
import os
import time
from typing import Dict, Any, Optional
import ujson
from hashlib import sha256

os.environ["OPENAI_API_KEY"] = "{your_openai_key}"

dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class CustomCache(dspy.clients.Cache):

    def cache_key(self, request: dict[str, Any], ignored_args_for_cache_key: Optional[list[str]] = None) -> str:
        messages = request.get("messages", [])
        return sha256(ujson.dumps(messages, sort_keys=True).encode()).hexdigest()

dspy.cache = CustomCache(enable_disk_cache=True, enable_memory_cache=True, disk_cache_dir=dspy.clients.DISK_CACHE_DIR)

predict = dspy.Predict("question->answer")

start = time.time()
result1 = predict(question="Who is the GOAT of volleyball?")
print(f"Time elapse: {time.time() - start: 2f}")

start = time.time()
with dspy.context(lm=dspy.LM("openai/gpt-4.1-mini")):
    result2 = predict(question="Who is the GOAT of volleyball?")
print(f"Time elapse: {time.time() - start: 2f}")

你会观察到第二次调用时缓存被命中，这展示了自定义缓存键逻辑的效果。