Prompt Caching 101

OpenAI为超过1024个标记的提示提供折扣化的提示缓存服务，对于超过10,000个标记的长提示可降低高达80%的延迟。通过在LLM API请求间缓存重复信息，您可以显著减少延迟和成本。提示缓存的适用范围是组织层级，意味着只有同一组织的成员才能访问共享缓存。此外，该缓存过程符合零数据保留政策，因为在此过程中不会存储任何数据。

对于超过1024个标记的提示词，系统会自动启用提示缓存功能——您无需在补全请求中进行任何更改。当发起API请求时，系统会首先检查该提示词的开头部分（前缀）是否已被缓存。若找到匹配项（缓存命中），则使用已缓存的提示词，从而降低延迟和成本。若未找到匹配项，系统将从零开始处理完整提示词，并将此前缀缓存以供后续使用。

考虑到这些优势，提示缓存特别有利的一些关键应用场景包括：

使用工具和结构化输出的智能体: 缓存扩展的工具和模式列表。
编程与写作助手: 在提示词中直接插入大段代码库或工作区内容摘要。
聊天机器人: 缓存多轮对话中的静态部分，以在长时间对话中高效保持上下文。

在本指南中，我们将通过几个示例来介绍缓存工具和图像的使用。请注意，通常您需要将静态内容（如指令和示例）放在提示的开头，而将可变内容（如用户特定信息）放在末尾。这也适用于图像和工具，它们的顺序在请求之间必须保持一致。所有请求（包括少于1024个token的请求）都会在usage.prompt_tokens_details聊天补全对象中显示一个cached_tokens字段，表示提示token中有多少是缓存命中的。对于少于1024个token的请求，cached_tokens将为零。缓存折扣基于实际处理的token数量，包括用于图像的token，这些token也会计入您的速率限制。

示例1：缓存工具与多轮对话

在这个示例中，我们为客服助手定义了工具和交互功能，使其能够处理诸如查询配送日期、取消订单和更新支付方式等任务。该助手会处理两条独立消息：首先回复初始查询，随后对后续查询进行延迟响应。

在缓存工具时，确保工具定义及其顺序保持一致非常重要，这样才能将它们包含在提示前缀中。要在多轮对话中缓存消息历史记录，需将新元素追加到消息数组的末尾。在响应对象和下方输出中，对于第二次补全run2，您可以看到cached_tokens值大于零，表明缓存成功。

from openai import OpenAI
import os
import json 
import time


api_key = os.getenv("OPENAI_API_KEY")
client = OpenAI(organization='org-l89177bnhkme4a44292n5r3j', api_key=api_key)

import time
import json

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_delivery_date",
            "description": "Get the delivery date for a customer's order. Call this whenever you need to know the delivery date, for example when a customer asks 'Where is my package'.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID.",
                    },
                },
                "required": ["order_id"],
                "additionalProperties": False,
            },
        }
    },
    {
        "type": "function",
        "function": {
            "name": "cancel_order",
            "description": "Cancel an order that has not yet been shipped. Use this when a customer requests order cancellation.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID."
                    },
                    "reason": {
                        "type": "string",
                        "description": "The reason for cancelling the order."
                    }
                },
                "required": ["order_id", "reason"],
                "additionalProperties": False
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "return_item",
            "description": "Process a return for an order. This should be called when a customer wants to return an item and the order has already been delivered.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID."
                    },
                    "item_id": {
                        "type": "string",
                        "description": "The specific item ID the customer wants to return."
                    },
                    "reason": {
                        "type": "string",
                        "description": "The reason for returning the item."
                    }
                },
                "required": ["order_id", "item_id", "reason"],
                "additionalProperties": False
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "update_shipping_address",
            "description": "Update the shipping address for an order that hasn't been shipped yet. Use this if the customer wants to change their delivery address.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID."
                    },
                    "new_address": {
                        "type": "object",
                        "properties": {
                            "street": {
                                "type": "string",
                                "description": "The new street address."
                            },
                            "city": {
                                "type": "string",
                                "description": "The new city."
                            },
                            "state": {
                                "type": "string",
                                "description": "The new state."
                            },
                            "zip": {
                                "type": "string",
                                "description": "The new zip code."
                            },
                            "country": {
                                "type": "string",
                                "description": "The new country."
                            }
                        },
                        "required": ["street", "city", "state", "zip", "country"],
                        "additionalProperties": False
                    }
                },
                "required": ["order_id", "new_address"],
                "additionalProperties": False
            }
        }
    },
    # New tool: Update payment method
    {
        "type": "function",
        "function": {
            "name": "update_payment_method",
            "description": "Update the payment method for an order that hasn't been completed yet. Use this if the customer wants to change their payment details.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID."
                    },
                    "payment_method": {
                        "type": "object",
                        "properties": {
                            "card_number": {
                                "type": "string",
                                "description": "The new credit card number."
                            },
                            "expiry_date": {
                                "type": "string",
                                "description": "The new credit card expiry date in MM/YY format."
                            },
                            "cvv": {
                                "type": "string",
                                "description": "The new credit card CVV code."
                            }
                        },
                        "required": ["card_number", "expiry_date", "cvv"],
                        "additionalProperties": False
                    }
                },
                "required": ["order_id", "payment_method"],
                "additionalProperties": False
            }
        }
    }
]

# Enhanced system message with guardrails
messages = [
    {
        "role": "system", 
        "content": (
            "You are a professional, empathetic, and efficient customer support assistant. Your mission is to provide fast, clear, "
            "and comprehensive assistance to customers while maintaining a warm and approachable tone. "
            "Always express empathy, especially when the user seems frustrated or concerned, and ensure that your language is polite and professional. "
            "Use simple and clear communication to avoid any misunderstanding, and confirm actions with the user before proceeding. "
            "In more complex or time-sensitive cases, assure the user that you're taking swift action and provide regular updates. "
            "Adapt to the user’s tone: remain calm, friendly, and understanding, even in stressful or difficult situations."
            "\n\n"
            "Additionally, there are several important guardrails that you must adhere to while assisting users:"
            "\n\n"
            "1. **Confidentiality and Data Privacy**: Do not share any sensitive information about the company or other users. When handling personal details such as order IDs, addresses, or payment methods, ensure that the information is treated with the highest confidentiality. If a user requests access to their data, only provide the necessary information relevant to their request, ensuring no other user's information is accidentally revealed."
            "\n\n"
            "2. **Secure Payment Handling**: When updating payment details or processing refunds, always ensure that payment data such as credit card numbers, CVVs, and expiration dates are transmitted and stored securely. Never display or log full credit card numbers. Confirm with the user before processing any payment changes or refunds."
            "\n\n"
            "3. **Respect Boundaries**: If a user expresses frustration or dissatisfaction, remain calm and empathetic but avoid overstepping professional boundaries. Do not make personal judgments, and refrain from using language that might escalate the situation. Stick to factual information and clear solutions to resolve the user's concerns."
            "\n\n"
            "4. **Legal Compliance**: Ensure that all actions you take comply with legal and regulatory standards. For example, if the user requests a refund, cancellation, or return, follow the company’s refund policies strictly. If the order cannot be canceled due to being shipped or another restriction, explain the policy clearly but sympathetically."
            "\n\n"
            "5. **Consistency**: Always provide consistent information that aligns with company policies. If unsure about a company policy, communicate clearly with the user, letting them know that you are verifying the information, and avoid providing false promises. If escalating an issue to another team, inform the user and provide a realistic timeline for when they can expect a resolution."
            "\n\n"
            "6. **User Empowerment**: Whenever possible, empower the user to make informed decisions. Provide them with relevant options and explain each clearly, ensuring that they understand the consequences of each choice (e.g., canceling an order may result in loss of loyalty points, etc.). Ensure that your assistance supports their autonomy."
            "\n\n"
            "7. **No Speculative Information**: Do not speculate about outcomes or provide information that you are not certain of. Always stick to verified facts when discussing order statuses, policies, or potential resolutions. If something is unclear, tell the user you will investigate further before making any commitments."
            "\n\n"
            "8. **Respectful and Inclusive Language**: Ensure that your language remains inclusive and respectful, regardless of the user’s tone. Avoid making assumptions based on limited information and be mindful of diverse user needs and backgrounds."
        )
    },
    {
        "role": "user", 
        "content": (
            "Hi, I placed an order three days ago and haven’t received any updates on when it’s going to be delivered. "
            "Could you help me check the delivery date? My order number is #9876543210. I’m a little worried because I need this item urgently."
        )
    }
]

# Enhanced user_query2
user_query2 = {
    "role": "user", 
    "content": (
        "Since my order hasn't actually shipped yet, I would like to cancel it. "
        "The order number is #9876543210, and I need to cancel because I’ve decided to purchase it locally to get it faster. "
        "Can you help me with that? Thank you!"
    )
}

# Function to run completion with the provided message history and tools
def completion_run(messages, tools):
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        tools=tools,
        messages=messages,
        tool_choice="required"
    )
    usage_data = json.dumps(completion.to_dict(), indent=4)
    return usage_data

# Main function to handle the two runs
def main(messages, tools, user_query2):
    # Run 1: Initial query
    print("Run 1:")
    run1 = completion_run(messages, tools)
    print(run1)

    # Delay for 7 seconds
    time.sleep(7)

    # Append user_query2 to the message history
    messages.append(user_query2)

    # Run 2: With appended query
    print("\nRun 2:")
    run2 = completion_run(messages, tools)
    print(run2)


# Run the main function
main(messages, tools, user_query2)

Run 1:
{
    "id": "chatcmpl-ADeOueQSi2DIUMdLXnZIv9caVfnro",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": null,
                "refusal": null,
                "role": "assistant",
                "tool_calls": [
                    {
                        "id": "call_5TnLcdD9tyVMVbzNGdejlJJa",
                        "function": {
                            "arguments": "{\"order_id\":\"9876543210\"}",
                            "name": "get_delivery_date"
                        },
                        "type": "function"
                    }
                ]
            }
        }
    ],
    "created": 1727816928,
    "model": "gpt-4o-mini-2024-07-18",
    "object": "chat.completion",
    "system_fingerprint": "fp_f85bea6784",
    "usage": {
        "completion_tokens": 17,
        "prompt_tokens": 1079,
        "total_tokens": 1096,
        "prompt_tokens_details": {
            "cached_tokens": 0
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0
        }
    }
}

Run 2:
{
    "id": "chatcmpl-ADeP2i0frELC4W5RVNNkKz6TQ7hig",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": null,
                "refusal": null,
                "role": "assistant",
                "tool_calls": [
                    {
                        "id": "call_viwwDZPuQh8hJFPf2Co1dYJK",
                        "function": {
                            "arguments": "{\"order_id\": \"9876543210\"}",
                            "name": "get_delivery_date"
                        },
                        "type": "function"
                    },
                    {
                        "id": "call_t1FFdAhrfvRc5IgqA6WkPKYj",
                        "function": {
                            "arguments": "{\"order_id\": \"9876543210\", \"reason\": \"Decided to purchase locally to get it faster.\"}",
                            "name": "cancel_order"
                        },
                        "type": "function"
                    }
                ]
            }
        }
    ],
    "created": 1727816936,
    "model": "gpt-4o-mini-2024-07-18",
    "object": "chat.completion",
    "system_fingerprint": "fp_f85bea6784",
    "usage": {
        "completion_tokens": 64,
        "prompt_tokens": 1136,
        "total_tokens": 1200,
        "prompt_tokens_details": {
            "cached_tokens": 1024
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0
        }
    }
}

示例2：图像

在第二个示例中，我们在消息数组中包含多个食品杂货商品的图片URL，同时附带用户查询，并设置了延迟运行三次。无论是通过链接还是以base64编码形式嵌入用户消息的图片，都符合缓存条件。请确保detail参数保持一致，因为它会影响图片的分词方式。请注意，GPT-4o-mini会额外添加token来覆盖图片处理成本，尽管它对文本使用了低成本的token模型。缓存折扣基于实际处理的token数量（包括用于图片的token），这些token也会计入您的速率限制。

这个示例的输出显示第二次运行时命中了缓存，但第三次运行由于使用了不同的第一个URL（eggs_url而非veggie_url）而未命中缓存，尽管用户查询是相同的。

sauce_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/9/97/12-04-20-saucen-by-RalfR-15.jpg/800px-12-04-20-saucen-by-RalfR-15.jpg"
veggie_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/3/31/Veggies.jpg/800px-Veggies.jpg"
eggs_url= "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a2/Egg_shelf.jpg/450px-Egg_shelf.jpg"
milk_url= "https://upload.wikimedia.org/wikipedia/commons/thumb/c/cd/Lactaid_brand.jpg/800px-Lactaid_brand.jpg"

def multiimage_completion(url1, url2, user_query):
    completion = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {
        "role": "user",
        "content": [
            {
            "type": "image_url",
            "image_url": {
                "url": url1,
                "detail": "high"
            },
            },
            {
            "type": "image_url",
            "image_url": {
                "url": url2,
                "detail": "high"
            },
            },
            {"type": "text", "text": user_query}
        ],
        }
    ],
    max_tokens=300,
    )
    print(json.dumps(completion.to_dict(), indent=4))
    

def main(sauce_url, veggie_url):
    multiimage_completion(sauce_url, veggie_url, "Please list the types of sauces are shown in these images")
    #delay for 20 seconds
    time.sleep(20)
    multiimage_completion(sauce_url, veggie_url, "Please list the types of vegetables are shown in these images")
    time.sleep(20)
    multiimage_completion(milk_url, sauce_url, "Please list the types of sauces are shown in these images")

if __name__ == "__main__":
    main(sauce_url, veggie_url)

{
    "id": "chatcmpl-ADeV3IrUqhpjMXEgv29BFHtTQ0Pzt",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "The images show the following types of sauces:\n\n1. **Soy Sauce** - Kikkoman brand.\n2. **Worcester Sauce** - Appel brand, listed as \"Dresdner Art.\"\n3. **Tabasco Sauce** - Original pepper sauce.\n\nThe second image shows various vegetables, not sauces.",
                "refusal": null,
                "role": "assistant"
            }
        }
    ],
    "created": 1727817309,
    "model": "gpt-4o-2024-08-06",
    "object": "chat.completion",
    "system_fingerprint": "fp_2f406b9113",
    "usage": {
        "completion_tokens": 65,
        "prompt_tokens": 1548,
        "total_tokens": 1613,
        "prompt_tokens_details": {
            "cached_tokens": 0
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0
        }
    }
}
{
    "id": "chatcmpl-ADeVRSI6zFINkx99k7V6ux1v5iF5f",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "The images show different types of items. In the first image, you'll see bottles of sauces like soy sauce, Worcester sauce, and Tabasco. The second image features various vegetables, including:\n\n1. Napa cabbage\n2. Kale\n3. Carrots\n4. Bok choy\n5. Swiss chard\n6. Leeks\n7. Parsley\n\nThese vegetables are arranged on shelves in a grocery store setting.",
                "refusal": null,
                "role": "assistant"
            }
        }
    ],
    "created": 1727817333,
    "model": "gpt-4o-2024-08-06",
    "object": "chat.completion",
    "system_fingerprint": "fp_2f406b9113",
    "usage": {
        "completion_tokens": 86,
        "prompt_tokens": 1548,
        "total_tokens": 1634,
        "prompt_tokens_details": {
            "cached_tokens": 1280
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0
        }
    }
}
{
    "id": "chatcmpl-ADeVphj3VALQVrdnt2efysvSmdnBx",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "The second image shows three types of sauces:\n\n1. Soy Sauce (Kikkoman)\n2. Worcestershire Sauce\n3. Tabasco Sauce",
                "refusal": null,
                "role": "assistant"
            }
        }
    ],
    "created": 1727817357,
    "model": "gpt-4o-2024-08-06",
    "object": "chat.completion",
    "system_fingerprint": "fp_2f406b9113",
    "usage": {
        "completion_tokens": 29,
        "prompt_tokens": 1548,
        "total_tokens": 1577,
        "prompt_tokens_details": {
            "cached_tokens": 0
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0
        }
    }
}

总体提示

要充分利用提示缓存功能，请考虑遵循以下最佳实践：

将静态或频繁重复使用的内容放在提示的开头：通过将动态数据保留在提示的末尾，有助于确保更好的缓存效率。
保持一致的用法模式：未定期使用的提示会自动从缓存中移除。为防止缓存被清除，请保持提示用法的一致性。
监控关键指标：定期跟踪缓存命中率、延迟和缓存令牌的比例。利用这些洞察来优化您的缓存策略并最大化性能。

通过实施这些实践，您可以充分利用提示缓存，确保您的应用程序既响应迅速又经济高效。一个管理良好的缓存策略将显著减少处理时间、降低成本，并有助于保持流畅的用户体验。

2024年10月1日

提示缓存基础入门

示例1：缓存工具与多轮对话

示例2：图像

总体提示