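💰 Budgets and Rate Limits

Requirements:

A PostgreSQL database is required (for example Supabase, Neon, etc.). See Setup.

Set budgets

You can set budgets at the following levels:

For the proxy
For internal users
For customers (end users)
For keys
For keys (model-specific budgets)

Apply a budget across all calls on the proxy.

Step 1. Modify config.yaml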

general_settings:
  master_key: sk-1234

litellm_settings:
  # other litellm settings
  max_budget: 0 # (float) sets the max budget to $0 USD
  budget_duration: 30d # (str) how often the budget resets; you can set the duration in seconds ("30s"), minutes ("30m"), hours ("30h"), or days ("30d")

Step 2. Start the proxy

litellm --config /path/to/config.yaml

Step 3. Send a test call

curl --location 'http://0.0.0.0:4000/chat/completions' \
    --header 'Authorization: Bearer sk-1234' \
    --header 'Content-Type: application/json' \
    --data '{
    "model": "gpt-3.5-turbo",
    "messages": [
        {
        "role": "user",
        "content": "what llm are you"
        }
    ]
}'

You can add budgets to teams.

info

Step-by-step tutorial on setting and resetting team budgets (via the API or the admin UI):

👉 https://docs.litellm.ai/docs/proxy/team_budgets

Add budgets to teams

curl --location 'http://localhost:4000/team/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "team_alias": "my-new-team_4",
  "members_with_roles": [{"role": "admin", "user_id": "5c4a0aa3-a1e1-43dc-bd87-3c2da8382a3a"}],
  "rpm_limit": 99
}' 

See Swagger

Sample response

{
    "team_alias": "my-new-team_4",
    "team_id": "13e83b19-f851-43fe-8e93-f96e21033100",
    "admins": [],
    "members": [],
    "members_with_roles": [
        {
            "role": "admin",
            "user_id": "5c4a0aa3-a1e1-43dc-bd87-3c2da8382a3a"
        }
    ],
    "metadata": {},
    "tpm_limit": null,
    "rpm_limit": 99,
    "max_budget": null,
    "models": [],
    "spend": 0.0,
    "max_parallel_requests": null,
    "budget_duration": null,
    "budget_reset_at": null
}

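Add a budget duration to teams

budget_duration: the budget is reset at the end of the specified duration. If not set, the budget is never reset. You can set the duration in seconds ("30s"), minutes ("30m"), hours ("30h"), or days ("30d").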

curl 'http://0.0.0.0:4000/team/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "team_alias": "my-new-team_4",
  "members_with_roles": [{"role": "admin", "user_id": "5c4a0aa3-a1e1-43dc-bd87-3c2da8382a3a"}],
  "budget_duration": "10s"
}'
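
To make the budget_duration format concrete, here is a minimal Python sketch that converts the four suffixes listed above into seconds. It is an illustration only, not LiteLLM's own parser; the function name duration_to_seconds is hypothetical, and other suffixes the proxy may accept are deliberately rejected here.

```python
import re

# Seconds per unit for the suffixes listed above: s, m, h, d.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def duration_to_seconds(duration: str) -> int:
    """Convert a string such as "30s" or "30d" into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", duration.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


print(duration_to_seconds("10s"))  # 10
print(duration_to_seconds("30d"))  # 2592000
```

A team created with "budget_duration": "10s" would therefore have its spend reset roughly every 10 seconds, subject to how often the proxy checks for resets.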

Use this when you want to set a budget for a user within a team.

Step 1. Create a user

Create a user with user_id=ishaan

curl --location 'http://0.0.0.0:4000/user/new' \
    --header 'Authorization: Bearer sk-1234' \
    --header 'Content-Type: application/json' \
    --data '{
        "user_id": "ishaan"
}'

Step 2. Add the user to an existing team and set `max_budget_in_team`

Set max_budget_in_team when adding the user to the team. We use the same user_id that was set in step 1.

curl -X POST 'http://0.0.0.0:4000/team/member_add' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{"team_id": "e8d1460f-846c-45d7-9b43-55f3cc52ac32", "max_budget_in_team": 0.000000000001, "member": {"role": "user", "user_id": "ishaan"}}'

Step 3. Create a key for the team member from step 1

Set user_id=ishaan from step 1.

curl --location 'http://0.0.0.0:4000/key/generate' \
    --header 'Authorization: Bearer sk-1234' \
    --header 'Content-Type: application/json' \
    --data '{
        "user_id": "ishaan",
        "team_id": "e8d1460f-846c-45d7-9b43-55f3cc52ac32"
}'

Response from /key/generate

We use the key from this response in step 4.

{"key":"sk-RV-l2BJEZ_LYNChSx2EueQ", "models":[],"spend":0.0,"max_budget":null,"user_id":"ishaan","team_id":"e8d1460f-846c-45d7-9b43-55f3cc52ac32","max_parallel_requests":null,"metadata":{},"tpm_limit":null,"rpm_limit":null,"budget_duration":null,"allowed_cache_controls":[],"soft_budget":null,"key_alias":null,"duration":null,"aliases":{},"config":{},"permissions":{},"model_max_budget":{},"key_name":null,"expires":null,"token_id":null}

Step 4. Make a /chat/completions request for the team member

Use the key from step 3 for this request. After 2-3 requests, expect to see the following error: ExceededBudget: Crossed spend within team

curl --location 'http://localhost:4000/chat/completions' \
    --header 'Authorization: Bearer sk-RV-l2BJEZ_LYNChSx2EueQ' \
    --header 'Content-Type: application/json' \
    --data '{
    "model": "llama3",
    "messages": [
        {
        "role": "user",
        "content": "tes4"
        }
    ]
}'
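
When scripting these steps, the key returned by /key/generate has to be carried into the Authorization header of the /chat/completions request. A minimal Python sketch of that hand-off follows; the helper name auth_headers_from_key_response is hypothetical, and the sample body is an abbreviated copy of the response shown above.

```python
import json


def auth_headers_from_key_response(body: str) -> dict:
    """Build request headers from the JSON body returned by /key/generate."""
    data = json.loads(body)
    return {
        "Authorization": f"Bearer {data['key']}",
        "Content-Type": "application/json",
    }


# Abbreviated copy of the /key/generate response for the team member.
sample = '{"key": "sk-RV-l2BJEZ_LYNChSx2EueQ", "user_id": "ishaan", "max_budget": null}'
print(auth_headers_from_key_response(sample)["Authorization"])
# Bearer sk-RV-l2BJEZ_LYNChSx2EueQ
```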

Use this to set a budget for the user passed to /chat/completions, without creating a key for every user.

Step 1. Modify config.yaml to define litellm.max_end_user_budget

general_settings:
  master_key: sk-1234

litellm_settings:
  max_end_user_budget: 0.0001 # budget for the 'user' passed to /chat/completions

Make a /chat/completions call and pass 'user'. The first call succeeds.

curl --location 'http://0.0.0.0:4000/chat/completions' \
        --header 'Content-Type: application/json' \
        --header 'Authorization: Bearer sk-zi5onDRdHGD24v0Zdn7VBA' \
        --data ' {
        "model": "azure-gpt-3.5",
        "user": "ishaan3",
        "messages": [
            {
            "role": "user",
            "content": "what time is it"
            }
        ]
        }'

Make another /chat/completions call and pass 'user'. The call fails because 'ishaan3' is over budget.

curl --location 'http://0.0.0.0:4000/chat/completions' \
        --header 'Content-Type: application/json' \
        --header 'Authorization: Bearer sk-zi5onDRdHGD24v0Zdn7VBA' \
        --data ' {
        "model": "azure-gpt-3.5",
        "user": "ishaan3",
        "messages": [
            {
            "role": "user",
            "content": "what time is it"
            }
        ]
        }'

Error

{"error":{"message":"Budget has been exceeded: User ishaan3 has exceeded their budget. Current spend: 0.0008869999999999999; Max Budget: 0.0001","type":"auth_error","param":"None","code":401}}
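
A client that wants to react to this error can pull the spend and budget figures out of the message. The sketch below is Python and relies only on the message format of the sample above, which is an assumption and may differ between proxy versions; parse_budget_error is a hypothetical helper.

```python
import json
import re

# Matches the "Current spend: X; Max Budget: Y" part of the sample message.
_BUDGET_RE = re.compile(r"Current spend: ([0-9.eE+-]+); Max Budget: ([0-9.eE+-]+)")


def parse_budget_error(body: str):
    """Return (spend, max_budget) from a budget error body, or None if absent."""
    message = json.loads(body).get("error", {}).get("message", "")
    match = _BUDGET_RE.search(message)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


sample_error = (
    '{"error":{"message":"Budget has been exceeded: User ishaan3 has exceeded '
    'their budget. Current spend: 0.0008869999999999999; Max Budget: 0.0001",'
    '"type":"auth_error","param":"None","code":401}}'
)
spend, max_budget = parse_budget_error(sample_error)
print(spend > max_budget)  # True
```

The first call cost more than the whole 0.0001 budget, which is why the second call for the same user is rejected.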

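Apply a budget to a key.

You can:

Add a budget to a key
Add a budget duration, to reset spend

Expected behaviour

The cost per key is auto-populated in the LiteLLM_VerificationToken table
After the key crosses its max_budget, requests fail
If a duration is set, spend is reset at the end of the duration

By default, max_budget is set to null and is not checked for keys.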
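
The expected behaviour for key budgets can be summarised as a small state machine. The Python sketch below is a simulation under stated assumptions, not the proxy's implementation: the KeyBudget class, its field names, and the exact boundary comparison (a request is allowed while spend is less than or equal to max_budget) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyBudget:
    max_budget: Optional[float] = None         # None: no budget check at all
    budget_duration_s: Optional[float] = None  # None: spend is never reset
    spend: float = 0.0
    window_start: float = 0.0

    def allow(self, now: float) -> bool:
        """Reset spend if the duration has elapsed, then check the budget."""
        if (self.budget_duration_s is not None
                and now - self.window_start >= self.budget_duration_s):
            self.spend = 0.0
            self.window_start = now
        return self.max_budget is None or self.spend <= self.max_budget

    def record(self, cost: float) -> None:
        """Add the cost of a completed request to the key's spend."""
        self.spend += cost


key = KeyBudget(max_budget=10, budget_duration_s=10)
key.record(12.0)
print(key.allow(now=5))   # False: over budget, duration not yet elapsed
print(key.allow(now=10))  # True: spend was reset at the end of the duration
```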

Add a budget to a key

team_id is optional.

curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "team_id": "core-infra",
  "max_budget": 10
}'

Example request to /chat/completions when the key has crossed its budget

curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer <generated-key>' \
  --data ' {
  "model": "azure-gpt-3.5",
  "user": "e09b4da8-ed80-4b05-ac93-e16d9eb56fca",
  "messages": [
      {
      "role": "user",
      "content": "respond in 50 lines"
      }
  ]
}'

Expected response from /chat/completions when the key has crossed its budget

{
  "detail":"Authentication Error, ExceededTokenBudget: Current spend for token: 7.2e-05; Max Budget for Token: 2e-07"
}   

Add a budget duration to a key

budget_duration: the budget is reset at the end of the specified duration. If not set, the budget is never reset. You can set the duration in seconds ("30s"), minutes ("30m"), hours ("30h"), or days ("30d").

team_id is optional.

curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "team_id": "core-infra",
  "max_budget": 10,
  "budget_duration": "10s"
}'
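
The request body must be valid JSON: the duration is a quoted string, and JSON allows neither comments nor trailing commas. One way to avoid hand-written mistakes is to build the body programmatically. A minimal Python sketch, with the hypothetical helper key_generate_body:

```python
import json
from typing import Optional


def key_generate_body(max_budget: float,
                      budget_duration: Optional[str] = None,
                      team_id: Optional[str] = None) -> str:
    """Serialise a /key/generate request body, omitting unset optional fields."""
    body = {"max_budget": max_budget}
    if budget_duration is not None:
        body["budget_duration"] = budget_duration  # sent as a JSON string, e.g. "10s"
    if team_id is not None:
        body["team_id"] = team_id
    return json.dumps(body)


print(key_generate_body(10, budget_duration="10s", team_id="core-infra"))
# {"max_budget": 10, "budget_duration": "10s", "team_id": "core-infra"}
```

The resulting string can be passed to curl with --data-raw or sent with any HTTP client.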

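Apply a budget across all calls that an internal user (key owner) makes on the proxy.

info

For most use cases, we recommend setting team member budgets.

LiteLLM exposes a /user/new endpoint to create budgets for this.

You can:

Add a budget to a user
Add a budget duration, to reset spend

By default, max_budget is set to null and is not checked for keys.

Add a budget to a user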

curl --location 'http://localhost:4000/user/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{"models": ["azure-models"], "max_budget": 0, "user_id": "krrish3@berri.ai"}' 

See Swagger

Sample response

{
    "key": "sk-YF2OxDbrgd1y2KgwxmEA2w",
    "expires": "2023-12-22T09:53:13.861000Z",
    "user_id": "krrish3@berri.ai",
    "max_budget": 0.0
}

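Add a budget duration to a user

budget_duration: the budget is reset at the end of the specified duration. If not set, the budget is never reset. You can set the duration in seconds ("30s"), minutes ("30m"), hours ("30h"), or days ("30d").

team_id is optional.

curl 'http://0.0.0.0:4000/user/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "team_id": "core-infra",
  "max_budget": 10,
  "budget_duration": "10s"
}'

Create a new key for an existing user

You can do this by calling /key/generate and passing the user_id (for example krrish3@berri.ai):

Budget check: the key is checked against the budget of krrish3@berri.ai (for example $10)
Spend tracking: spend on this key also updates the spend of krrish3@berri.ai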

curl --location 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data '{"models": ["azure-models"], "user_id": "krrish3@berri.ai"}'

Apply model-specific budgets to a key.

Expected behaviour

model_spend is auto-populated in the LiteLLM_VerificationToken table
After the key crosses the budget set for a model in model_max_budget, calls fail

By default, model_max_budget is set to {} and is not checked for keys.

info

LiteLLM tracks cost and budgets for the model passed to the LLM endpoints (/chat/completions, /embeddings).

Add a model-specific budget to a key

curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "model_max_budget": {"gpt4": 0.5, "gpt-5": 0.01}
}'
}'
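
The per-model check can be pictured as a lookup keyed by model name. The Python sketch below only illustrates the behaviour described above; the function name and the shape of model_spend (a dict of model name to spend) are assumptions, not the proxy's schema.

```python
def model_budget_exceeded(model: str, model_spend: dict, model_max_budget: dict) -> bool:
    """True if spend for `model` has crossed the budget set for it on the key."""
    if model not in model_max_budget:  # {} or an unlisted model: nothing to check
        return False
    return model_spend.get(model, 0.0) > model_max_budget[model]


budgets = {"gpt4": 0.5, "gpt-5": 0.01}
print(model_budget_exceeded("gpt-5", {"gpt-5": 0.02}, budgets))   # True
print(model_budget_exceeded("llama3", {"llama3": 9.0}, budgets))  # False
```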

Reset budgets

Reset budgets across keys, internal users, teams, and customers. The key change in each request below is budget_duration.

Internal users

curl 'http://0.0.0.0:4000/user/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "max_budget": 10,
  "budget_duration": "10s"
}'

Keys

curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "max_budget": 10,
  "budget_duration": "10s"
}'

Teams

curl 'http://0.0.0.0:4000/team/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "max_budget": 10,
  "budget_duration": "10s"
}'

Note: by default, the server checks for resets every 10 minutes, to minimize database calls.

To change this, set proxy_budget_rescheduler_min_time and proxy_budget_rescheduler_max_time.

For example, to check every 1 second:

general_settings: 
  proxy_budget_rescheduler_min_time: 1
  proxy_budget_rescheduler_max_time: 1
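
The two settings bound the interval between reset checks. The Python sketch below assumes, without confirmation from this page, that the interval is drawn at random between the two values (in seconds); next_reset_check_delay is a hypothetical name used only for illustration.

```python
import random


def next_reset_check_delay(min_time: int, max_time: int, rng: random.Random) -> int:
    """Pick the delay in seconds before the next budget-reset check.

    Assumption: the delay is a random integer in [min_time, max_time].
    """
    if min_time > max_time:
        raise ValueError("min_time must not exceed max_time")
    return rng.randint(min_time, max_time)


rng = random.Random(0)
print(next_reset_check_delay(1, 1, rng))  # 1: equal bounds give a fixed interval
```

Setting both values to 1, as in the config above, removes any spread and yields a check every second, at the cost of more database calls.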

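Set rate limits

You can set:

tpm limits (tokens per minute)
rpm limits (requests per minute)
max parallel requests
model-specific rpm / tpm limits for a given key

Per team
Per internal user
Per key
Per API key, per model
For customers

Use /team/new or /team/update to persist rate limits across multiple keys for a team.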

curl --location 'http://0.0.0.0:4000/team/new' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{"team_id": "my-prod-team", "max_parallel_requests": 10, "tpm_limit": 20, "rpm_limit": 4}' 

See Swagger

Expected response

{
    "key": "sk-sA7VDkyhlQ7m8Gt77Mbt3Q",
    "expires": "2024-01-19T01:21:12.816168",
    "team_id": "my-prod-team"
}

Use /user/new or /user/update to persist rate limits across multiple keys for an internal user.

curl --location 'http://0.0.0.0:4000/user/new' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{"user_id": "krrish@berri.ai", "max_parallel_requests": 10, "tpm_limit": 20, "rpm_limit": 4}' 

See Swagger

Expected response

{
    "key": "sk-sA7VDkyhlQ7m8Gt77Mbt3Q",
    "expires": "2024-01-19T01:21:12.816168",
    "user_id": "krrish@berri.ai"
}

Use /key/generate if you want the rate limits to apply only to that key.

curl --location 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{"max_parallel_requests": 10, "tpm_limit": 20, "rpm_limit": 4}' 

Expected response

The user_id in this response is auto-generated.

{
    "key": "sk-ulGNRXWtv7M0lFnnsQk0wQ",
    "expires": "2024-01-18T20:48:44.297973",
    "user_id": "78c2c8fc-c233-43b9-b0c3-eb931da27b84"
}
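
To show how the three limits interact, here is a minimal fixed-window limiter in Python. It is a simulation for illustration, not the proxy's algorithm: the class KeyRateLimiter, the one-minute fixed window, and counting tokens only after a request finishes are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyRateLimiter:
    rpm_limit: Optional[int] = None
    tpm_limit: Optional[int] = None
    max_parallel_requests: Optional[int] = None
    _minute: int = -1
    _requests: int = 0
    _tokens: int = 0
    _in_flight: int = 0

    def try_start(self, now_s: float) -> bool:
        """Admit a request if no limit is exhausted in the current minute."""
        minute = int(now_s // 60)
        if minute != self._minute:  # new window: per-minute counters start over
            self._minute, self._requests, self._tokens = minute, 0, 0
        if (self.max_parallel_requests is not None
                and self._in_flight >= self.max_parallel_requests):
            return False
        if self.rpm_limit is not None and self._requests >= self.rpm_limit:
            return False
        if self.tpm_limit is not None and self._tokens >= self.tpm_limit:
            return False
        self._requests += 1
        self._in_flight += 1
        return True

    def finish(self, tokens_used: int) -> None:
        """Mark a request as done and count the tokens it consumed."""
        self._in_flight -= 1
        self._tokens += tokens_used


limiter = KeyRateLimiter(rpm_limit=4, tpm_limit=20, max_parallel_requests=10)
results = []
for _ in range(5):
    admitted = limiter.try_start(now_s=0)
    if admitted:
        limiter.finish(tokens_used=3)
    results.append(admitted)
print(results)  # [True, True, True, True, False]
```

With the limits from the request above, the fifth request in the same minute is rejected by rpm_limit before tpm_limit is reached.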

Set rate limits per model, per API key

Set model_rpm_limit and model_tpm_limit to apply rate limits per model, per API key.

Here gpt-4 is the model_name set in the litellm config.yaml.

curl --location 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{"model_rpm_limit": {"gpt-4": 2}, "model_tpm_limit": {"gpt-4": <tpm-limit>}}'

Expected response

{
    "key": "sk-ulGNRXWtv7M0lFnnsQk0wQ",
    "expires": "2024-01-18T20:48:44.297973"
}

Verify that the model rate limits are set correctly for this key

Make a /chat/completions request and check that x-litellm-key-remaining-requests-gpt-4 is returned.

curl -i http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-ulGNRXWtv7M0lFnnsQk0wQ" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Hello, Claude!ss eho ares"}
    ]
  }'

Expected headers

x-litellm-key-remaining-requests-gpt-4: 1
x-litellm-key-remaining-tokens-gpt-4: 179

These headers indicate:

1 request remaining for the gpt-4 model for key=sk-ulGNRXWtv7M0lFnnsQk0wQ
179 tokens remaining for the gpt-4 model for key=sk-ulGNRXWtv7M0lFnnsQk0wQ
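
A client can read these headers to throttle itself before hitting the limit. The Python sketch below groups them by model; remaining_limits is a hypothetical helper, and it relies only on the two header name patterns shown above. Header names are lowercased because HTTP headers are case-insensitive, so model names come back in lowercase.

```python
def remaining_limits(headers: dict) -> dict:
    """Group x-litellm-key-remaining-* headers by model name."""
    prefixes = {
        "x-litellm-key-remaining-requests-": "requests",
        "x-litellm-key-remaining-tokens-": "tokens",
    }
    result = {}
    for name, value in headers.items():
        lowered = name.lower()
        for prefix, kind in prefixes.items():
            if lowered.startswith(prefix):
                model = lowered[len(prefix):]  # everything after the prefix
                result.setdefault(model, {})[kind] = int(value)
    return result


print(remaining_limits({
    "x-litellm-key-remaining-requests-gpt-4": "1",
    "x-litellm-key-remaining-tokens-gpt-4": "179",
    "content-type": "application/json",
}))
# {'gpt-4': {'requests': 1, 'tokens': 179}}
```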

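info

You can also create a budget id for a customer in the UI, under the "Rate Limits" tab.

Use this to set rate limits for the user passed to /chat/completions, without creating a key for every user.

Step 1. Create a budget

Set a tpm_limit on the budget (you can also pass rpm_limit if needed).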

curl --location 'http://0.0.0.0:4000/budget/new' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
    "budget_id" : "free-tier",
    "tpm_limit": 5
}'

Step 2. Create a `Customer` with the budget

We use budget_id="free-tier" from step 1 when creating this new customer.

curl --location 'http://0.0.0.0:4000/customer/new' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
    "user_id" : "palantir",
    "budget_id": "free-tier"
}'

Step 3. Pass `user_id` in the `/chat/completions` request

Pass the user_id from step 2 as user="palantir".

curl --location 'http://localhost:4000/chat/completions' \
    --header 'Authorization: Bearer sk-1234' \
    --header 'Content-Type: application/json' \
    --data '{
    "model": "llama3",
    "user": "palantir",
    "messages": [
        {
        "role": "user",
        "content": "gm"
        }
    ]
}'
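
The three steps above can also be scripted. The Python sketch below builds the same three requests with the standard library and leaves sending them to the caller; BASE_URL and MASTER_KEY mirror the example values used in the curl commands, and the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://0.0.0.0:4000"  # proxy address used in the curl examples
MASTER_KEY = "sk-1234"            # example master key from the curl examples


def post(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to the proxy."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {MASTER_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def customer_rate_limit_requests(budget_id, customer_id, tpm_limit, model, prompt):
    """Return the requests for steps 1 to 3, in the order they should be sent."""
    return [
        post("/budget/new", {"budget_id": budget_id, "tpm_limit": tpm_limit}),
        post("/customer/new", {"user_id": customer_id, "budget_id": budget_id}),
        post("/chat/completions", {
            "model": model,
            "user": customer_id,
            "messages": [{"role": "user", "content": prompt}],
        }),
    ]


requests_to_send = customer_rate_limit_requests("free-tier", "palantir", 5, "llama3", "gm")
for req in requests_to_send:
    print(req.get_method(), req.full_url)
# Against a running proxy, each one could be sent with urllib.request.urlopen(req).
```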

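Set a default budget for all internal users

Use this to set a default budget for the users you give keys to.

This applies when a user has user_role="internal_user" (set via /user/new or /user/update).

This does not apply if a key has a team_id (team budgets apply in that case). Tell us how we can improve this!

Define the max budget in your config.yaml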

model_list: 
  - model_name: "gpt-3.5-turbo"
    litellm_params:
      model: gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  max_internal_user_budget: 0 # amount in USD
  internal_user_budget_duration: "1mo" # reset every month

Create a key for the user

curl -L -X POST 'http://0.0.0.0:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{}'

Expected response:

{
  ...
  "key": "sk-X53RdxnDhzamRwjKXR4IHg"
}

Test it!

curl -L -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-X53RdxnDhzamRwjKXR4IHg' \
-d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hey, how's it going?"}]
}'

Expected response:

{
    "error": {
        "message": "ExceededBudget: User=<user_id> over budget. Spend=3.7e-05, Budget=0.0",
        "type": "budget_exceeded",
        "param": null,
        "code": "400"
    }
}
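
The conditions under which the default budget applies can be written as a single predicate. This Python sketch encodes only the two rules stated in this section; the function name and arguments are hypothetical, and how the proxy assigns a role to a key's owner is not covered here.

```python
from typing import Optional


def default_internal_user_budget_applies(user_role: Optional[str],
                                         key_team_id: Optional[str]) -> bool:
    """True when max_internal_user_budget should govern a key's requests.

    Rules from this section: the user must have user_role="internal_user",
    and the key must not belong to a team (team budgets take over then).
    """
    return user_role == "internal_user" and key_team_id is None


print(default_internal_user_budget_applies("internal_user", None))          # True
print(default_internal_user_budget_applies("internal_user", "core-infra"))  # False
```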

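Grant access to new models

Use model access groups to give users access to select models, and add new models to the group over time (e.g. mistral, llama-2, etc.).

What is the difference between doing this with /key/generate and /user/new? If you do it on /user/new, it persists across multiple keys generated for that user.

Step 1. Assign models and access groups in config.yaml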

model_list:
  - model_name: text-embedding-ada-002
    litellm_params:
      model: azure/azure-embedding-model
      api_base: "os.environ/AZURE_API_BASE"
      api_key: "os.environ/AZURE_API_KEY"
      api_version: "2023-07-01-preview"
    model_info:
      access_groups: ["beta-models"] # 👈 model access group
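
The effect of an access group is that a key listing the group name gains every model tagged with it, including models added to the group later. The Python sketch below illustrates that expansion over a parsed model_list; models_for_key is a hypothetical helper and the "mistral" entry is an invented example, not part of the config above.

```python
def models_for_key(key_models: list, model_list: list) -> set:
    """Expand model names and access-group names into concrete model names."""
    allowed = set()
    for entry in model_list:
        groups = entry.get("model_info", {}).get("access_groups", [])
        named_directly = entry["model_name"] in key_models
        in_allowed_group = any(group in key_models for group in groups)
        if named_directly or in_allowed_group:
            allowed.add(entry["model_name"])
    return allowed


model_list = [
    {"model_name": "text-embedding-ada-002",
     "model_info": {"access_groups": ["beta-models"]}},
    # Hypothetical model added to the group at a later date.
    {"model_name": "mistral", "model_info": {"access_groups": ["beta-models"]}},
    {"model_name": "gpt-3.5-turbo"},
]
print(sorted(models_for_key(["beta-models"], model_list)))
# ['mistral', 'text-embedding-ada-002']
```

A key created with "models": ["beta-models"] needs no change when a new model joins the group; only the config does.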

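Step 2. Create a key with the access group

In the models list below, "beta-models" is the model access group.

curl --location 'http://localhost:4000/user/new' \
-H 'Authorization: Bearer <your-master-key>' \
-H 'Content-Type: application/json' \
-d '{"models": ["beta-models"], "max_budget": 0}'

Create a new key for an existing internal user

Just include user_id in the /key/generate request.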

curl --location 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data '{"models": ["azure-models"], "user_id": "krrish@berri.ai"}'
