API Reference: Inference

`POST /inference`

The inference endpoint is the core of the TensorZero Gateway API.

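Under the hood, the gateway validates the request, samples a variant from the function, handles templating when applicable, and routes the inference to the appropriate model provider. If a problem occurs, it attempts to gracefully fall back to a different model provider or variant. After a successful inference, it returns the data to the client and asynchronously stores structured information in the database.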

Request

`additional_tools`

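Type: list of tools (see below)
Required: no (default: [])

A list of tools defined at inference time that the model is allowed to call. This field enables dynamic tool use, i.e. defining tools at runtime.

You should prefer to define tools in the configuration file if possible. Only use this field if dynamic tool use is necessary for your use case.

Each tool is an object with the following fields: description, name, parameters, and strict.

These fields are the same as in the configuration file, except that the parameters field should contain the JSON schema itself rather than a path to it. See the Configuration Reference for details.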
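As a sketch, a dynamic tool could be assembled in Python as follows. The helper `make_dynamic_tool` is hypothetical (it is not part of the TensorZero client); it only packages the four fields listed above, with `parameters` holding the JSON Schema inline. The `strict=False` default is an assumption made for this example.

```python
import json
from typing import Any, Dict


def make_dynamic_tool(
    name: str,
    description: str,
    parameters: Dict[str, Any],
    strict: bool = False,  # assumed default for this sketch
) -> Dict[str, Any]:
    """Package one `additional_tools` entry (hypothetical helper)."""
    if not isinstance(parameters, dict):
        # Unlike the configuration file, the schema must be inline, not a file path.
        raise TypeError("parameters must be an inline JSON Schema object")
    return {
        "name": name,
        "description": description,
        "parameters": parameters,
        "strict": strict,
    }


tool = make_dynamic_tool(
    name="get_temperature",
    description="Get the current temperature in a given location",
    parameters={
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
        "additionalProperties": False,
    },
)

# The tool travels in the request body next to `function_name` and `input`.
request_body = {
    "function_name": "weather_bot",
    "input": {"messages": [{"role": "user", "content": "What is the weather like in Tokyo?"}]},
    "additional_tools": [tool],
}
print(json.dumps(request_body, indent=2))
```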

`allowed_tools`

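Type: list of strings
Required: no

A list of tool names that the model is allowed to call. The tools must be defined in the configuration file.

Any tools provided in additional_tools are always allowed, regardless of this field.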

`cache_options`

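Type: object
Required: no (default: {"enabled": "write_only"})

Options for controlling inference caching behavior. The object has the fields below.

See Inference Caching for more details.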

`cache_options.enabled`

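Type: string
Required: no (default: "write_only")

The caching mode to use. Must be one of:

"write_only" (default): only write to the cache, but don't serve cached responses
"read_only": only read from the cache, but don't write new entries
"on": both read from and write to the cache
"off": disable caching completely

Note: when dryrun=true, the gateway never writes to the cache.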
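The four modes, plus the dryrun exception, reduce to two flags. The lookup below is a hypothetical helper that restates the documented semantics in Python; it is not gateway code.

```python
from typing import Tuple

# (reads_from_cache, writes_to_cache) for each documented mode.
CACHE_MODES = {
    "write_only": (False, True),
    "read_only": (True, False),
    "on": (True, True),
    "off": (False, False),
}


def cache_behavior(enabled: str = "write_only", dryrun: bool = False) -> Tuple[bool, bool]:
    """Return (reads, writes) for a `cache_options.enabled` value."""
    if enabled not in CACHE_MODES:
        raise ValueError(f"unknown cache mode: {enabled!r}")
    reads, writes = CACHE_MODES[enabled]
    # With dryrun=true the gateway never writes to the cache.
    return reads, writes and not dryrun


# Request-body fragment: read and write the cache, ignoring entries older than one hour.
cache_options = {"enabled": "on", "max_age_s": 3600}
print(cache_behavior(cache_options["enabled"]))  # (True, True)
```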

`cache_options.max_age_s`

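Type: integer
Required: no (default: null)

The maximum age of cache entries, in seconds. If set, cached responses older than this value will not be used.

For example, if you set max_age_s=3600, the gateway will only use cache entries created within the last hour.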
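The freshness rule can be written as a one-line comparison. `is_cache_entry_usable` is a hypothetical helper for illustration; treating an entry that is exactly `max_age_s` old as still usable is an assumption, since the text above only says that older entries are skipped.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_cache_entry_usable(
    created_at: datetime, now: datetime, max_age_s: Optional[int] = None
) -> bool:
    """Whether a cached response is fresh enough under `max_age_s` (None means no limit)."""
    if max_age_s is None:
        return True
    return now - created_at <= timedelta(seconds=max_age_s)


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_cache_entry_usable(now - timedelta(minutes=30), now, max_age_s=3600))  # True
print(is_cache_entry_usable(now - timedelta(hours=2), now, max_age_s=3600))  # False
```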

`credentials`

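Type: object (a map from dynamic credential names to API keys)
Required: no (default: no credentials)

Each model provider in your TensorZero configuration can be configured to accept credentials at inference time by using a dynamic location (e.g. dynamic::my_dynamic_api_key_name). See the Configuration Reference for more details. The gateway expects the credentials to be provided in the credentials field of the request body, as shown below. If a model provider is configured to use dynamic credentials and they are not provided, the gateway will return a 400 error.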

Example

[models.my_model_name.providers.my_provider_name]
# ...
# Note: the name of the credential field (e.g. `api_key_location`) depends on the provider type
api_key_location = "dynamic::my_dynamic_api_key_name"
# ...

{
  // ...
  "credentials": {
    // ...
    "my_dynamic_api_key_name": "sk-..."
    // ...
  }
  // ...
}
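To catch the 400 error case before sending a request, a client could check which dynamic credential names its providers reference. The sketch below is hypothetical: `missing_dynamic_credentials` is not a TensorZero API, `my_function` is a placeholder, and `env::OPENAI_API_KEY` only stands in for a non-dynamic location (its exact syntax is an assumption here).

```python
from typing import Any, Dict, List

DYNAMIC_PREFIX = "dynamic::"


def missing_dynamic_credentials(
    key_locations: List[str], credentials: Dict[str, str]
) -> List[str]:
    """Names of dynamic credentials referenced by providers but absent from the request."""
    missing = []
    for location in key_locations:
        if location.startswith(DYNAMIC_PREFIX):
            name = location[len(DYNAMIC_PREFIX):]
            if name not in credentials:
                missing.append(name)
    return missing


body: Dict[str, Any] = {
    "function_name": "my_function",  # hypothetical function name
    "input": {"messages": [{"role": "user", "content": "Hello"}]},
    "credentials": {"my_dynamic_api_key_name": "sk-..."},
}
locations = ["dynamic::my_dynamic_api_key_name", "env::OPENAI_API_KEY"]  # second entry: assumed syntax
print(missing_dynamic_credentials(locations, body["credentials"]))  # []
```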

`dryrun`

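Type: boolean
Required: no

If true, the inference request will be executed but won't be stored in the database. The gateway will still call the downstream model providers.

This field is primarily for debugging and testing, and you should generally not use it in production.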

`episode_id`

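Type: UUID
Required: no

The ID of an existing episode to associate the inference with.

For the first inference of a new episode, you should not provide an episode_id. If null, the gateway will generate a new episode ID and return it in the response.

Only use episode IDs that were returned by the TensorZero gateway.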

`extra_body`

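Type: array of objects (see below)
Required: no

The extra_body field allows you to modify the request body that TensorZero sends to a model provider. This advanced feature is an "escape hatch" that lets you use provider-specific functionality that TensorZero hasn't implemented yet.

Each object in the array must have three fields:

variant_name or model_provider_name: the modification will only be applied to the specified variant or model provider
pointer: a JSON Pointer string specifying where to modify the request body
value: the value to insert at that location; it can be of any type, including nested types

Example: extra_body

If TensorZero would normally send this request body to the provider…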

{
  "project": "tensorzero",
  "safety_checks": {
    "no_internet": false,
    "no_agi": true
  }
}

…then the following extra_body in the inference request…

{
  // ...
  "extra_body": [
    {
      "variant_name": "my_variant", // or "model_provider_name": "my_model_provider"
      "pointer": "/agi",
      "value": true
    },
    {
      "variant_name": "my_variant", // or "model_provider_name": "my_model_provider"
      "pointer": "/safety_checks/no_agi",
      "value": {
        "bypass": "on"
      }
    }
  ]
}

…overrides the request body (for my_variant only) to:

{
  "agi": true,
  "project": "tensorzero",
  "safety_checks": {
    "no_internet": false,
    "no_agi": {
      "bypass": "on"
    }
  }
}
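To reason about what a given `extra_body` entry will do, it can help to replay the example above locally. The sketch below is a hypothetical client-side model, not TensorZero's implementation: `set_json_pointer` writes a value at an RFC 6901 JSON Pointer, and `apply_extra_body` applies the entries that target one `variant_name`. It assumes the parent of every pointer already exists and ignores `model_provider_name` targeting.

```python
import copy
import json
from typing import Any, Dict, List


def _unescape(token: str) -> str:
    # RFC 6901: decode "~1" before "~0".
    return token.replace("~1", "/").replace("~0", "~")


def set_json_pointer(doc: Any, pointer: str, value: Any) -> Any:
    """Return a copy of `doc` with `value` written at the JSON Pointer `pointer`."""
    if pointer == "":
        return copy.deepcopy(value)
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    result = copy.deepcopy(doc)
    tokens = [_unescape(t) for t in pointer[1:].split("/")]
    parent = result
    for token in tokens[:-1]:
        parent = parent[int(token)] if isinstance(parent, list) else parent[token]
    last = tokens[-1]
    if isinstance(parent, list):
        if last == "-":
            parent.append(value)  # "-" means "after the last element"
        else:
            parent[int(last)] = value
    else:
        parent[last] = value
    return result


def apply_extra_body(body: Dict[str, Any], extra_body: List[Dict[str, Any]], variant_name: str) -> Dict[str, Any]:
    """Apply, in order, the entries that target `variant_name`."""
    for entry in extra_body:
        if entry.get("variant_name") == variant_name:
            body = set_json_pointer(body, entry["pointer"], entry["value"])
    return body


original = {"project": "tensorzero", "safety_checks": {"no_internet": False, "no_agi": True}}
extra_body = [
    {"variant_name": "my_variant", "pointer": "/agi", "value": True},
    {"variant_name": "my_variant", "pointer": "/safety_checks/no_agi", "value": {"bypass": "on"}},
]
print(json.dumps(apply_extra_body(original, extra_body, "my_variant"), indent=2, sort_keys=True))
```

Running it reproduces the overridden body shown above, including the new top-level `agi` key, while requests for other variants keep the original body.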

`extra_headers`

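Type: array of objects (see below)
Required: no

The extra_headers field allows you to modify the request headers that TensorZero sends to a model provider. This advanced feature is an "escape hatch" that lets you use provider-specific functionality that TensorZero hasn't implemented yet.

Each object in the array must have three fields:

variant_name or model_provider_name: the modification will only be applied to the specified variant or model provider
name: the name of the header to modify
value: the value to set the header to

Example: extra_headers

If TensorZero would normally send the following request headers to the provider…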

Safety-Checks: on

…then the following extra_headers…

{
  "extra_headers": [
    {
      "variant_name": "my_variant", // or "model_provider_name": "my_model_provider"
      "name": "Safety-Checks",
      "value": "off"
    },
    {
      "variant_name": "my_variant", // or "model_provider_name": "my_model_provider"
      "name": "Intelligence-Level",
      "value": "AGI"
    }
  ]
}

…overrides the request headers (for my_variant only) to:

Safety-Checks: off
Intelligence-Level: AGI
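The header example can be replayed the same way. `apply_extra_headers` below is a hypothetical helper that only models the documented override behavior; matching header names case-insensitively is an assumption based on HTTP semantics, not something this page states.

```python
from typing import Dict, List


def apply_extra_headers(
    headers: Dict[str, str], extra_headers: List[Dict[str, str]], variant_name: str
) -> Dict[str, str]:
    """Return a copy of `headers` with the entries for `variant_name` applied in order."""
    result = dict(headers)
    for entry in extra_headers:
        if entry.get("variant_name") != variant_name:
            continue
        # HTTP header names are case-insensitive, so drop any existing spelling first.
        for existing in [k for k in result if k.lower() == entry["name"].lower()]:
            del result[existing]
        result[entry["name"]] = entry["value"]
    return result


original = {"Safety-Checks": "on"}
extra_headers = [
    {"variant_name": "my_variant", "name": "Safety-Checks", "value": "off"},
    {"variant_name": "my_variant", "name": "Intelligence-Level", "value": "AGI"},
]
for name, value in apply_extra_headers(original, extra_headers, "my_variant").items():
    print(f"{name}: {value}")
# prints:
# Safety-Checks: off
# Intelligence-Level: AGI
```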

`function_name`

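Type: string
Required: either function_name or model_name must be provided

The name of the function to call.

The function must be defined in the configuration file.

Alternatively, you can use the model_name field to call a model directly, without defining a function. See below for details.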

`include_original_response`

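Type: boolean
Required: no

If true, the original response from the model will be included in the response, as a string, in the original_response field.

Currently, this field can't be used with streaming inferences.

See original_response in the response section for more details.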

`input`

Type: varies
Required: yes

The input to the function.

The type of the input depends on the function type.

`input.messages`

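Type: list of messages (see below)
Required: no (default: [])

A list of messages to provide to the model.

Each message is an object with the following fields:

role: the role of the message (assistant or user).
content: the content of the message (see below).

The content field can be one of the following:

string: the text for a text message (only allowed if there is no schema for that role)
list of content blocks: the content blocks for the message (see below)

A content block is an object with the field type and additional fields depending on the type.

If the content block has type text, it must have either of the following additional fields:

text: the text for the content block.
arguments: a JSON object containing the function arguments for TensorZero functions with templates and schemas (see Prompt Templates & Schemas for details).

If the content block has type tool_call, it must have the following additional fields:

arguments: the arguments for the tool call.
id: the ID for the content block.
name: the name of the tool for the content block.

If the content block has type tool_result, it must have the following additional fields:

id: the ID of the tool call that this result responds to.
name: the name of the tool for the content block.
result: the result of the tool call.

If the content block has type image, it must have either of the following additional fields:

url: the URL for a remote image.
mime_type and data: the MIME type and base64-encoded data for an embedded image.
- We support the following MIME types: image/png, image/jpeg, and image/webp.

See the Multimodal Inference guide for more details on how to use images in inference.

If the content block has type raw_text, it must have the following additional field:

value: the text for the content block. This content block ignores any templates and schemas for this function.

If the content block has type unknown, it must have the following additional fields:

data: the original content block from the provider, without any validation or transformation by TensorZero.
model_provider_name (optional): a string specifying which model provider should receive this content block. If set, the content block will only be provided to that specific model provider. If not set, the content block will be passed to all model providers.

For example, the following hypothetical unknown content block will send the daydreaming content block to inference requests targeting the your_model_provider_name model provider.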

{
  "type": "unknown",
  "data": {
    "type": "daydreaming",
    "dream": "..."
  },
  "model_provider_name": "tensorzero::model_name::your_model_name::provider_name::your_model_provider_name"
}

This is the most complex field in the entire API. See the example below for details.

Example

{
  // ...
  "input": {
    "messages": [
      // If you don't have a user (or assistant) schema...
      {
        "role": "user", // (or "assistant")
        "content": "What is the weather in Tokyo?"
      },
      // If you have a user (or assistant) schema...
      {
        "role": "user", // (or "assistant")
        "content": [
          {
            "type": "text",
            "arguments": {
              "location": "Tokyo"
            }
          }
        ]
      },
      // If the model previously called a tool...
      {
        "role": "assistant",
        "content": [
          {
            "type": "tool_call",
            "id": "0",
            "name": "get_temperature",
            "arguments": "{\"location\": \"Tokyo\"}"
          }
        ]
      },
      // ...and you're providing the result of that tool call...
      {
        "role": "user",
        "content": [
          {
            "type": "tool_result",
            "id": "0",
            "name": "get_temperature",
            "result": "70"
          }
        ]
      },
      // You can also specify a text message using a content block...
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What about NYC?" // (or object if there is a schema)
          }
        ]
      },
      // You can also provide multiple content blocks in a single message...
      {
        "role": "assistant",
        "content": [
          {
            "type": "text",
            "text": "Sure, I can help you with that." // (or object if there is a schema)
          },
          {
            "type": "tool_call",
            "id": "0",
            "name": "get_temperature",
            "arguments": "{\"location\": \"New York\"}"
          }
        ]
      }
      // ...
    ]
    // ...
  }
  // ...
}
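Building these messages by hand is error-prone, so small constructor functions can help. The helpers below (`text_block`, `tool_call_block`, `tool_result_block`, `image_block`) are hypothetical and not part of the TensorZero client; they only emit the shapes described above. Following the example, tool-call arguments are serialized to a JSON string.

```python
import base64
import json
from typing import Any, Dict

SUPPORTED_IMAGE_MIME_TYPES = {"image/png", "image/jpeg", "image/webp"}


def text_block(text: str) -> Dict[str, Any]:
    return {"type": "text", "text": text}


def tool_call_block(call_id: str, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    # As in the example above, the arguments are sent as a JSON-encoded string.
    return {"type": "tool_call", "id": call_id, "name": name, "arguments": json.dumps(arguments)}


def tool_result_block(call_id: str, name: str, result: str) -> Dict[str, Any]:
    # `id` repeats the ID of the tool call that this result answers.
    return {"type": "tool_result", "id": call_id, "name": name, "result": result}


def image_block(data: bytes, mime_type: str) -> Dict[str, Any]:
    # Embedded image: MIME type plus base64-encoded bytes.
    if mime_type not in SUPPORTED_IMAGE_MIME_TYPES:
        raise ValueError(f"unsupported MIME type: {mime_type}")
    return {"type": "image", "mime_type": mime_type, "data": base64.b64encode(data).decode("ascii")}


messages = [
    {"role": "user", "content": "What is the weather in Tokyo?"},
    {"role": "assistant", "content": [tool_call_block("0", "get_temperature", {"location": "Tokyo"})]},
    {"role": "user", "content": [tool_result_block("0", "get_temperature", "70")]},
    {"role": "user", "content": [text_block("What about NYC?")]},
]
print(json.dumps({"function_name": "weather_bot", "input": {"messages": messages}}, indent=2))
```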

`input.system`

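Type: string or object
Required: no

The input for the system message.

If the function does not have a system schema, this field should be a string.

If the function has a system schema, this field should be an object that matches the schema.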

`model_name`

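Type: string
Required: either model_name or function_name must be provided

The name of the model to call.

Under the hood, the gateway will use a built-in passthrough chat function called tensorzero::default.

| To call… | Use this format… |
| --- | --- |
| A function defined as `[functions.my_function]` in your `tensorzero.toml` configuration file | `function_name="my_function"` (not `model_name`) |
| A model defined as `[models.my_model]` in your `tensorzero.toml` configuration file | `model_name="my_model"` |
| A model offered by a model provider, without defining it in your `tensorzero.toml` configuration file (if supported, see below) | `model_name="{provider_type}::{model_name}"` |

For example, if you have the following configuration: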

[models.gpt-4o]
routing = ["openai", "azure"]

[models.gpt-4o.providers.openai]
# ...

[models.gpt-4o.providers.azure]
# ...

[functions.extract-data]
# ...

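Then:

function_name="extract-data" calls the extract-data function defined above.
model_name="gpt-4o" calls the gpt-4o model in your configuration, which supports fallback from openai to azure. See Retries & Fallbacks for details.
model_name="openai::gpt-4o" calls the OpenAI API directly for the gpt-4o model, ignoring the gpt-4o model defined above.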
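The rule that a request names exactly one target can be enforced on the client side. `inference_target` and `split_shorthand` below are hypothetical helpers; the second only illustrates the `{provider_type}::{model_name}` naming format from the table, and the gateway (not the client) decides how a name is actually resolved.

```python
from typing import Any, Dict, Optional, Tuple


def inference_target(function_name: Optional[str] = None, model_name: Optional[str] = None) -> Dict[str, Any]:
    """Return the routing part of a request body; exactly one name must be given."""
    if (function_name is None) == (model_name is None):
        raise ValueError("provide exactly one of function_name or model_name")
    if function_name is not None:
        return {"function_name": function_name}
    return {"model_name": model_name}


def split_shorthand(model_name: str) -> Tuple[Optional[str], str]:
    """Split "{provider_type}::{model_name}"; names without "::" have no provider part."""
    provider_type, sep, name = model_name.partition("::")
    return (provider_type, name) if sep else (None, model_name)


for target in (
    inference_target(function_name="extract-data"),
    inference_target(model_name="gpt-4o"),
    inference_target(model_name="openai::gpt-4o"),
):
    print(target)
```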

`output_schema`

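Type: object (valid JSON Schema)
Required: no

If set, this schema will override the output_schema defined in the function configuration for a JSON function. This dynamic output schema is used to validate the function's output and is sent to providers that support structured outputs.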

`parallel_tool_calls`

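Type: boolean
Required: no

If true, the function will be allowed to request multiple tool calls in a single conversation turn. If not set, it defaults to the value in the configuration of the function being called.

Most model providers do not support parallel tool calls. In those cases, the gateway ignores this field. Currently, only Fireworks AI and OpenAI support parallel tool calls.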

`params`

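Type: object (see below)
Required: no (default: {})

Overrides inference-time parameters for a particular variant type. This field enables dynamic inference parameters, i.e. defining parameters at runtime.

This field's format is { variant_type: { param: value, ... }, ... }. You should prefer to set these parameters in the configuration file if possible. Only use this field if you need to set these parameters dynamically at runtime.

Note that the parameters will apply to every variant of the specified type.

Currently, we support the following:

chat_completion
- frequency_penalty
- max_tokens
- presence_penalty
- seed
- temperature
- top_p

See the Configuration Reference for more details on these parameters, and the example below for usage.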

Example

For example, if you want to dynamically override the temperature parameter for chat_completion variants, include the following in the request body:

{
  // ...
  "params": {
    "chat_completion": {
      "temperature": 0.7
    }
  }
  // ...
}

See "Chat Function with Dynamic Inference Parameters" for a complete example.
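The effect of `params` on a single variant can be pictured as a key-by-key override of its configured values. The sketch below assumes exactly that merge behavior (the text above does not spell out the merge), and `effective_params` is a hypothetical helper. The configured values mirror the `prompt_v1` variant in the complete example, with `max_tokens` added for illustration.

```python
from typing import Any, Dict


def effective_params(
    variant_type: str,
    config_params: Dict[str, Any],
    request_params: Dict[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Parameters a variant of `variant_type` would use after request-level overrides."""
    merged = dict(config_params)
    merged.update(request_params.get(variant_type, {}))
    return merged


config = {"temperature": 0.5, "max_tokens": 512}  # illustrative configured values
request_params = {"chat_completion": {"temperature": 0.7}}
print(effective_params("chat_completion", config, request_params))
# {'temperature': 0.7, 'max_tokens': 512}
```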

`stream`

Type: boolean
Required: no

If true, the gateway will stream the response from the model provider.

`tags`

Type: flat JSON object with string keys and values
Required: no

User-provided tags to associate with the inference.

For example, {"user_id": "123"} or {"author": "Alice"}.

`tool_choice`

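Type: string (or object, for a specific tool)
Required: no

If set, overrides the tool choice strategy for the request.

The supported tool choice strategies are:

none: the function should not use any tools.
auto: the model decides whether or not to use a tool. If it decides to use a tool, it also decides which tools to use.
required: the model must use a tool. If multiple tools are available, the model decides which tool to use.
{ specific = "tool_name" }: the model should use a specific tool. The tool must be defined in the tools section of the configuration file or provided in additional_tools.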
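A client might validate the strategy before sending it. The sketch below assumes that the TOML-style `{ specific = "tool_name" }` is written as `{"specific": "tool_name"}` in a JSON request body; verify that encoding against your gateway version. `tool_choice_value` is a hypothetical helper.

```python
from typing import Dict, FrozenSet, Optional, Union

SIMPLE_STRATEGIES = {"none", "auto", "required"}


def tool_choice_value(
    strategy: str,
    tool_name: Optional[str] = None,
    available_tools: FrozenSet[str] = frozenset(),
) -> Union[str, Dict[str, str]]:
    """Build a `tool_choice` value for a JSON request body (assumed encoding)."""
    if strategy in SIMPLE_STRATEGIES:
        return strategy
    if strategy == "specific":
        # The tool must come from the configuration file or from additional_tools.
        if tool_name not in available_tools:
            raise ValueError(f"tool {tool_name!r} is not configured or provided")
        return {"specific": tool_name}
    raise ValueError(f"unknown tool choice strategy: {strategy!r}")


tools = frozenset({"get_temperature"})
print(tool_choice_value("auto"))
print(tool_choice_value("specific", "get_temperature", tools))
```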

`variant_name`

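Type: string
Required: no

If set, pins the inference request to a particular variant (not recommended).

You should generally not set this field, and instead let the TensorZero gateway assign a variant. This field is primarily intended for testing or debugging.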

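Response

The response format depends on the function type (as defined in the configuration file) and on whether the response is streamed.

Chat Function

When the function type is chat, the response is structured as follows.

In regular (non-streaming) mode, the response is a JSON object with the following fields: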

`content`

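Type: list of content blocks (see below)

The content blocks generated by the model.

A content block can have type equal to text or tool_call. Reasoning models (e.g. DeepSeek R1) might also include thought content blocks.

If type is text, the content block has the following fields:

text: the text for the content block.

If type is tool_call, the content block has the following fields:

arguments (object): the validated arguments for the tool call (null if invalid).
id (string): the ID of the content block.
name (string): the validated name of the tool (null if invalid).
raw_arguments (string): the arguments for the tool call generated by the model (which might be invalid).
raw_name (string): the name of the tool generated by the model (which might be invalid).

If type is thought, the content block has the following fields:

text (string): the text of the thought.

If the model provider responds with a content block of an unknown type, it will be included in the response as a content block of type unknown with the following additional fields:

data: the original content block from the provider, without any validation or transformation by TensorZero.
model_provider_name: the fully-qualified name of the model provider that returned the content block.

For example, if the model provider your_model_provider_name returns a content block of type daydreaming, it will be included in the response like this: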

{
  "type": "unknown",
  "data": {
    "type": "daydreaming",
    "dream": "..."
  },
  "model_provider_name": "tensorzero::model_name::your_model_name::provider_name::your_model_provider_name"
}
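Because `name` and `arguments` are null when validation fails, client code should branch on them rather than assume every tool call is usable. The sketch below is a hypothetical way to sort the content of a non-streaming chat response; `summarize_chat_content` is not part of any TensorZero library, and it simply skips `thought` blocks.

```python
from typing import Any, Dict, List


def summarize_chat_content(response: Dict[str, Any]) -> Dict[str, List[Any]]:
    """Split chat content into text, valid tool calls, invalid tool calls, and unknown blocks."""
    summary: Dict[str, List[Any]] = {"text": [], "tool_calls": [], "invalid_tool_calls": [], "unknown": []}
    for block in response.get("content", []):
        kind = block.get("type")
        if kind == "text":
            summary["text"].append(block["text"])
        elif kind == "tool_call":
            # `name`/`arguments` are null when invalid; `raw_*` always hold the model output.
            if block.get("name") is None or block.get("arguments") is None:
                summary["invalid_tool_calls"].append((block["raw_name"], block["raw_arguments"]))
            else:
                summary["tool_calls"].append((block["name"], block["arguments"]))
        elif kind == "unknown":
            summary["unknown"].append(block["model_provider_name"])
        # Other types (e.g. `thought`) are ignored in this sketch.
    return summary


response = {
    "content": [
        {"type": "text", "text": "Let me check."},
        {
            "type": "tool_call",
            "id": "123456789",
            "name": "get_temperature",
            "arguments": {"location": "Tokyo", "units": "celsius"},
            "raw_name": "get_temperature",
            "raw_arguments": '{"location": "Tokyo", "units": "celsius"}',
        },
    ]
}
print(summarize_chat_content(response))
```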

`episode_id`

Type: UUID

The ID of the episode associated with the inference.

`inference_id`

Type: UUID

The ID assigned to the inference.

`original_response`

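Type: string (optional)

The original response from the model provider (only available when include_original_response is true).

The returned data depends on the variant type:

chat_completion: the raw response from the inference to the model
experimental_best_of_n_sampling: the raw response from the inference to the evaluator
experimental_mixture_of_n_sampling: the raw response from the inference to the fuser
experimental_dynamic_in_context_learning: the raw response from the inference to the model
experimental_chain_of_thought: the raw response from the inference to the model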

`variant_name`

Type: string

The name of the variant used for the inference.

`usage`

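Type: object (optional)

The usage metrics for the inference.

The object has the following fields:

input_tokens: the number of input tokens used for the inference.
output_tokens: the number of output tokens used for the inference.

In streaming mode, the response is an SSE stream of JSON messages, followed by a final [DONE] message.

Each JSON message has the following fields: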

`content`

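Type: list of content block chunks (see below)

The content deltas for the inference.

A content block chunk can have type equal to text or tool_call. Reasoning models (e.g. DeepSeek R1) might also include thought content block chunks.

If type is text, the chunk has the following fields:

id: the ID of the content block.
text: the text delta for the chunk.

If type is tool_call, the chunk has the following fields (all strings):

id: the ID of the content block.
raw_name: the name of the tool. The gateway does not validate this field during streaming inference.
raw_arguments: the arguments delta for the tool call. The gateway does not validate this field during streaming inference.

If type is thought, the chunk has the following fields:

id: the ID of the content block.
text: the text delta for the thought.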
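Since chunks carry deltas keyed by content block `id`, a client reassembles the full blocks by concatenation. The sketch below is a hypothetical accumulator (`accumulate_chunks` is not a TensorZero API). It assumes `raw_name` carries the full tool name rather than a delta, and it parses the arguments itself because the gateway does not validate them while streaming.

```python
import json
from typing import Any, Dict, List


def accumulate_chunks(chunks: List[Dict[str, Any]]) -> Dict[str, Dict[str, str]]:
    """Concatenate streamed content deltas per content block ID."""
    blocks: Dict[str, Dict[str, str]] = {}
    for chunk in chunks:
        for delta in chunk.get("content", []):
            block = blocks.setdefault(delta["id"], {"type": delta["type"]})
            if delta["type"] in ("text", "thought"):
                block["text"] = block.get("text", "") + delta["text"]
            elif delta["type"] == "tool_call":
                block["raw_arguments"] = block.get("raw_arguments", "") + delta.get("raw_arguments", "")
                if delta.get("raw_name"):
                    block["raw_name"] = delta["raw_name"]
    return blocks


chunks = [
    {"content": [{"type": "text", "id": "0", "text": "Hi "}]},
    {"content": [{"type": "text", "id": "0", "text": "Gabriel,"}]},
    {"content": [{"type": "tool_call", "id": "1", "raw_name": "get_temperature", "raw_arguments": '{"location":'}]},
    {"content": [{"type": "tool_call", "id": "1", "raw_name": "", "raw_arguments": ' "Tokyo"}'}]},
]
blocks = accumulate_chunks(chunks)
# Validate the tool arguments yourself once the stream has finished.
arguments = json.loads(blocks["1"]["raw_arguments"])
print(blocks["0"]["text"], arguments)
```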

`episode_id`

Type: UUID

The ID of the episode associated with the inference.

`inference_id`

Type: UUID

The ID assigned to the inference.

`variant_name`

Type: string

The name of the variant used for the inference.

`usage`

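Type: object (optional)

The usage metrics for the inference.

The object has the following fields:

input_tokens: the number of input tokens used for the inference.
output_tokens: the number of output tokens used for the inference.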
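If you consume the stream over raw HTTP rather than through a client library, you need to split the SSE stream into JSON messages and stop at `[DONE]`. The sketch below is a hypothetical parser (`iter_sse_json`) over any iterable of decoded text lines. It follows the general SSE rules (`data:` fields, events terminated by a blank line) and ignores other SSE fields. It assumes each message fits in the `data:` lines of one event.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def iter_sse_json(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield JSON payloads from SSE `data:` lines until the `[DONE]` sentinel."""
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # SSE drops a single leading space after the colon.
            if value.startswith(" "):
                value = value[1:]
            data_lines.append(value)
        elif line == "" and data_lines:
            # A blank line ends the event; multi-line data is joined with "\n".
            payload = "\n".join(data_lines)
            data_lines = []
            if payload == "[DONE]":
                return
            yield json.loads(payload)
        # Other fields (event:, id:, retry:) and comments are ignored in this sketch.


stream = [
    'data: {"inference_id": "00000000-0000-0000-0000-000000000000", "content": [{"type": "text", "id": "0", "text": "Hi "}]}\n',
    "\n",
    'data: {"inference_id": "00000000-0000-0000-0000-000000000000", "content": [{"type": "text", "id": "0", "text": "Gabriel,"}]}\n',
    "\n",
    "data: [DONE]\n",
    "\n",
]
text = "".join(
    delta["text"]
    for message in iter_sse_json(stream)
    for delta in message.get("content", [])
    if delta["type"] == "text"
)
print(text)  # Hi Gabriel,
```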

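JSON Function

When the function type is json, the response is structured as follows.

In regular (non-streaming) mode, the response is a JSON object with the following fields: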

`inference_id`

Type: UUID

The ID assigned to the inference.

`episode_id`

Type: UUID

The ID of the episode associated with the inference.

`original_response`

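Type: string (optional)

The original response from the model provider (only available when include_original_response is true).

The returned data depends on the variant type:

chat_completion: the raw response from the inference to the model
experimental_best_of_n_sampling: the raw response from the inference to the evaluator
experimental_mixture_of_n_sampling: the raw response from the inference to the fuser
experimental_dynamic_in_context_learning: the raw response from the inference to the model
experimental_chain_of_thought: the raw response from the inference to the model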

`output`

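Type: object (see below)

The output object contains the following fields:

raw: the raw response from the model provider (which might be invalid JSON).
parsed: the parsed response from the model provider (null if the JSON is invalid).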

`variant_name`

Type: string

The name of the variant used for the inference.

`usage`

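Type: object (optional)

The usage metrics for the inference.

The object has the following fields:

input_tokens: the number of input tokens used for the inference.
output_tokens: the number of output tokens used for the inference.

In streaming mode, the response is an SSE stream of JSON messages, followed by a final [DONE] message.

Each JSON message has the following fields: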

`episode_id`

Type: UUID

The ID of the episode associated with the inference.

`inference_id`

Type: UUID

The ID assigned to the inference.

`raw`

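Type: string

The raw response delta from the model provider.

The TensorZero gateway does not provide a parsed field for streaming JSON inferences. If your application depends on a well-formed JSON response, we recommend using regular (non-streaming) inference.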

`variant_name`

Type: string

The name of the variant used for the inference.

`usage`

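Type: object (optional)

The usage metrics for the inference.

The object has the following fields:

input_tokens: the number of input tokens used for the inference.
output_tokens: the number of output tokens used for the inference.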
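If you stream a JSON function anyway, the closest equivalent to `parsed` is to join the `raw` deltas and parse the result once the stream ends. `parse_streamed_json` below is a hypothetical helper. It mirrors the non-streaming `parsed: null` behavior for invalid JSON but does not validate against the output schema.

```python
import json
from typing import Any, Dict, Iterable, Optional, Tuple


def parse_streamed_json(chunks: Iterable[Dict[str, Any]]) -> Tuple[str, Optional[Any]]:
    """Join `raw` deltas from a streaming JSON inference and try to parse the result."""
    raw = "".join(chunk.get("raw", "") for chunk in chunks)
    try:
        return raw, json.loads(raw)
    except json.JSONDecodeError:
        # Like `parsed: null` in non-streaming mode when the JSON is invalid.
        return raw, None


chunks = [{"raw": '{"email":'}, {"raw": ' "user@example.com"}'}]
raw, parsed = parse_streamed_json(chunks)
print(parsed)
```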

Examples

Chat Function

Configuration

# ...
[functions.draft_email]
type = "chat"
# ...

Request

from tensorzero import AsyncTensorZeroGateway

async with await AsyncTensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    result = await client.inference(
        function_name="draft_email",
        input={
            "system": "You are an AI assistant...",
            "messages": [
                {
                  "role": "user",
                  "content": "I need to write an email to Gabriel explaining..."
                }
            ]
        },
        # optional: stream=True,
    )

curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "draft_email",
    "input": {
      "system": "You are an AI assistant...",
      "messages": [
        {
          "role": "user",
          "content": "I need to write an email to Gabriel explaining..."
        }
      ]
    }
    // optional: "stream": true
  }'

Response

In regular (non-streaming) mode, the response looks like this:

{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "content": [
    {
      "type": "text",
      "text": "Hi Gabriel,\n\nI noticed..."
    }
  ],
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}

In streaming mode, the response is an SSE stream of JSON messages, followed by a final [DONE] message.

Each JSON message looks like this:

{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "content": [
    {
      "type": "text",
      "id": "0",
      "text": "Hi Gabriel," // a text content delta
    }
  ],
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}

Chat Function with Schemas

Configuration

# ...
[functions.draft_email]
type = "chat"
system_schema = "system_schema.json"
user_schema = "user_schema.json"
# ...

`system_schema.json`

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "tone": {
      "type": "string"
    }
  },
  "required": ["tone"],
  "additionalProperties": false
}

`user_schema.json`

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "recipient": {
      "type": "string"
    },
    "email_purpose": {
      "type": "string"
    }
  },
  "required": ["recipient", "email_purpose"],
  "additionalProperties": false
}

Request

from tensorzero import AsyncTensorZeroGateway

async with await AsyncTensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    result = await client.inference(
        function_name="draft_email",
        input={
            "system": {"tone": "casual"},
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "arguments": {
                                "recipient": "Gabriel",
                                "email_purpose": "Request a meeting to..."
                            }
                        }
                    ]
                }
            ]
        },
        # optional: stream=True,
    )

curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "draft_email",
    "input": {
      "system": {"tone": "casual"},
      "messages": [
        {
          "role": "user",
          "content": [
            {
              "type": "text",
              "arguments": {
                "recipient": "Gabriel",
                "email_purpose": "Request a meeting to..."
              }
            }
          ]
        }
      ]
    }
    // optional: "stream": true
  }'

Response

In regular (non-streaming) mode, the response looks like this:

{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "content": [
    {
      "type": "text",
      "text": "Hi Gabriel,\n\nI noticed..."
    }
  ],
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}

In streaming mode, the response is an SSE stream of JSON messages, followed by a final [DONE] message.

Each JSON message looks like this:

{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "content": [
    {
      "type": "text",
      "id": "0",
      "text": "Hi Gabriel," // a text content delta
    }
  ],
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}

Chat Function with Tool Use

Configuration

# ...

[functions.weather_bot]
type = "chat"
tools = ["get_temperature"]

# ...

[tools.get_temperature]
description = "Get the current temperature in a given location"
parameters = "get_temperature.json"

# ...

`get_temperature.json`

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "location": {
      "type": "string",
      "description": "The location to get the temperature for (e.g. \"New York\")"
    },
    "units": {
      "type": "string",
      "description": "The units to get the temperature in (must be \"fahrenheit\" or \"celsius\")",
      "enum": ["fahrenheit", "celsius"]
    }
  },
  "required": ["location"],
  "additionalProperties": false
}

Request

from tensorzero import AsyncTensorZeroGateway

async with await AsyncTensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    result = await client.inference(
        function_name="weather_bot",
        input={
            "messages": [
                {
                    "role": "user",
                    "content": "What is the weather like in Tokyo?"
                }
            ]
        },
        # optional: stream=True,
    )

curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "weather_bot",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "What is the weather like in Tokyo?"
        }
      ]
    }
    // optional: "stream": true
  }'

Response

In regular (non-streaming) mode, the response looks like this:

{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "content": [
    {
      "type": "tool_call",
      "arguments": {
        "location": "Tokyo",
        "units": "celsius"
      },
      "id": "123456789",
      "name": "get_temperature",
      "raw_arguments": "{\"location\": \"Tokyo\", \"units\": \"celsius\"}",
      "raw_name": "get_temperature"
    }
  ],
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}

In streaming mode, the response is an SSE stream of JSON messages, followed by a final [DONE] message.

Each JSON message looks like this:

{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "content": [
    {
      "type": "tool_call",
      "id": "123456789",
      "raw_name": "get_temperature",
      "raw_arguments": "{\"location\":" // a tool arguments delta
    }
  ],
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}

Chat Function with Multi-Turn Tool Use

Configuration

# ...

[functions.weather_bot]
type = "chat"
tools = ["get_temperature"]

# ...

[tools.get_temperature]
description = "Get the current temperature in a given location"
parameters = "get_temperature.json"

# ...

`get_temperature.json`

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "location": {
      "type": "string",
      "description": "The location to get the temperature for (e.g. \"New York\")"
    },
    "units": {
      "type": "string",
      "description": "The units to get the temperature in (must be \"fahrenheit\" or \"celsius\")",
      "enum": ["fahrenheit", "celsius"]
    }
  },
  "required": ["location"],
  "additionalProperties": false
}

Request

from tensorzero import AsyncTensorZeroGateway

async with await AsyncTensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    result = await client.inference(
        function_name="weather_bot",
        input={
            "messages": [
                {
                    "role": "user",
                    "content": "What is the weather like in Tokyo?"
                },
                {
                    "role": "assistant",
                    "content": [
                        {
                            "type": "tool_call",
                            "arguments": {
                                "location": "Tokyo",
                                "units": "celsius"
                            },
                            "id": "123456789",
                            "name": "get_temperature",
                        }
                    ]
                },
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "tool_result",
                            "id": "123456789",
                            "name": "get_temperature",
                            "result": "25"  # the tool result must be a string
                        }
                    ]
                }
            ]
        },
        # optional: stream=True,
    )

curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "weather_bot",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "What is the weather like in Tokyo?"
        },
        {
          "role": "assistant",
          "content": [
            {
              "type": "tool_call",
              "arguments": {
                "location": "Tokyo",
                "units": "celsius"
              },
              "id": "123456789",
              "name": "get_temperature"
            }
          ]
        },
        {
          "role": "user",
          "content": [
            {
              "type": "tool_result",
              "id": "123456789",
              "name": "get_temperature",
              "result": "25"  // the tool result must be a string
            }
          ]
        }
      ]
    }
    // optional: "stream": true
  }'
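Between turns, your application has to turn the model's `tool_call` blocks into an assistant message and answer each one with a `tool_result`, as in the request above. The sketch below assumes a non-streaming chat response shaped like the one in Chat Function with Tool Use; `continue_after_tool_calls` and `fake_get_temperature` are hypothetical names. It skips tool calls whose validated `name` or `arguments` is null and, like the request above, passes `arguments` as an object (the `input.messages` example earlier uses a JSON-encoded string instead).

```python
from typing import Any, Callable, Dict, List


def continue_after_tool_calls(
    messages: List[Dict[str, Any]],
    response: Dict[str, Any],
    run_tool: Callable[[str, Dict[str, Any]], str],
) -> List[Dict[str, Any]]:
    """Append the assistant's valid tool calls and their results to `messages`."""
    calls = [
        b for b in response["content"]
        if b["type"] == "tool_call" and b.get("name") is not None and b.get("arguments") is not None
    ]
    if not calls:
        return list(messages)
    assistant = {
        "role": "assistant",
        "content": [{"type": "tool_call", "id": c["id"], "name": c["name"], "arguments": c["arguments"]} for c in calls],
    }
    results = {
        "role": "user",
        # Each result reuses its tool call's ID, and the result must be a string.
        "content": [
            {"type": "tool_result", "id": c["id"], "name": c["name"], "result": run_tool(c["name"], c["arguments"])}
            for c in calls
        ],
    }
    return [*messages, assistant, results]


def fake_get_temperature(name: str, arguments: Dict[str, Any]) -> str:
    # Hypothetical stand-in for a real weather lookup.
    return "25"


messages = [{"role": "user", "content": "What is the weather like in Tokyo?"}]
first_response = {
    "content": [
        {
            "type": "tool_call",
            "arguments": {"location": "Tokyo", "units": "celsius"},
            "id": "123456789",
            "name": "get_temperature",
            "raw_arguments": '{"location": "Tokyo", "units": "celsius"}',
            "raw_name": "get_temperature",
        }
    ]
}
next_messages = continue_after_tool_calls(messages, first_response, fake_get_temperature)
```

The resulting `next_messages` has the same shape as the `messages` list in the request above and can be sent as the next `input`.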

Response

In regular (non-streaming) mode, the response looks like this:

{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "content": [
    {
      "type": "text",
      "text": "The weather in Tokyo is 25 degrees Celsius."
    }
  ],
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}

In streaming mode, the response is an SSE stream of JSON messages, followed by a final [DONE] message.

Each JSON message looks like this:

{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "content": [
    {
      "type": "text",
      "id": "0",
      "text": "The weather in" // a text content delta
    }
  ],
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}

Chat Function with Dynamic Tool Use

Configuration

# ...

[functions.weather_bot]
type = "chat"
# Note: no `tools = ["get_temperature"]` field in configuration

# ...

Request

from tensorzero import AsyncTensorZeroGateway

async with await AsyncTensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    result = await client.inference(
        function_name="weather_bot",
        input={
            "messages": [
                {
                    "role": "user",
                    "content": "What is the weather like in Tokyo?"
                }
            ]
        },
        additional_tools=[
            {
                "name": "get_temperature",
                "description": "Get the current temperature in a given location",
                "parameters": {
                    "$schema": "http://json-schema.org/draft-07/schema#",
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The location to get the temperature for (e.g. \"New York\")"
                        },
                        "units": {
                            "type": "string",
                            "description": "The units to get the temperature in (must be \"fahrenheit\" or \"celsius\")",
                            "enum": ["fahrenheit", "celsius"]
                        }
                    },
                    "required": ["location"],
                    "additionalProperties": False
                }
            }
        ],
        # optional: stream=True,
    )

curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "weather_bot",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "What is the weather like in Tokyo?"
        }
      ]
    },
    "additional_tools": [
      {
        "name": "get_temperature",
        "description": "Get the current temperature in a given location",
        "parameters": {
          "$schema": "http://json-schema.org/draft-07/schema#",
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The location to get the temperature for (e.g. \"New York\")"
            },
            "units": {
              "type": "string",
              "description": "The units to get the temperature in (must be \"fahrenheit\" or \"celsius\")",
              "enum": ["fahrenheit", "celsius"]
            }
          },
          "required": ["location"],
          "additionalProperties": false
        }
      }
    ]
    // optional: "stream": true
  }'

Response

In regular (non-streaming) mode, the response looks like this:

{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "content": [
    {
      "type": "tool_call",
      "arguments": {
        "location": "Tokyo",
        "units": "celsius"
      },
      "id": "123456789",
      "name": "get_temperature",
      "raw_arguments": "{\"location\": \"Tokyo\", \"units\": \"celsius\"}",
      "raw_name": "get_temperature"
    }
  ],
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}

In streaming mode, the response is an SSE stream of JSON messages, followed by a final [DONE] message.

Each JSON message looks like this:

{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "content": [
    {
      "type": "tool_call",
      "id": "123456789",
      "raw_name": "get_temperature",
      "raw_arguments": "{\"location\":" // a tool arguments delta
    }
  ],
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}

Chat Function with Dynamic Inference Parameters

Configuration

# ...
[functions.draft_email]
type = "chat"
# ...

[functions.draft_email.variants.prompt_v1]
type = "chat_completion"
temperature = 0.5  # the API request will override this value
# ...

Request

from tensorzero import AsyncTensorZeroGateway

async with await AsyncTensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    result = await client.inference(
        function_name="draft_email",
        input={
            "system": "You are an AI assistant...",
            "messages": [
                {
                    "role": "user",
                    "content": "I need to write an email to Gabriel explaining..."
                }
            ]
        },
        # Override parameters for every variant with type "chat_completion"
        params={
            "chat_completion": {
                "temperature": 0.7,
            }
        },
        # optional: stream=True,
    )

curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "draft_email",
    "input": {
      "system": "You are an AI assistant...",
      "messages": [
        {
          "role": "user",
          "content": "I need to write an email to Gabriel explaining..."
        }
      ]
    },
    "params": {
      // Override parameters for every variant with type "chat_completion"
      "chat_completion": {
        "temperature": 0.7
      }
    }
    // optional: "stream": true
  }'

Response

In regular (non-streaming) mode, the response looks like this:

{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "content": [
    {
      "type": "text",
      "text": "Hi Gabriel,\n\nI noticed..."
    }
  ],
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}

In streaming mode, the response is an SSE stream of JSON messages, followed by a final [DONE] message.

Each JSON message looks like this:

{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "content": [
    {
      "type": "text",
      "id": "0",
      "text": "Hi Gabriel," // a text content delta
    }
  ],
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}

JSON Function

Configuration

# ...
[functions.extract_email]
type = "json"
output_schema = "output_schema.json"
# ...

`output_schema.json`

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "email": {
      "type": "string"
    }
  },
  "required": ["email"]
}

Request

from tensorzero import AsyncTensorZeroGateway

async with await AsyncTensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    result = await client.inference(
        function_name="extract_email",
        input={
            "system": "You are an AI assistant...",
            "messages": [
                {
                    "role": "user",
                    "content": "...blah blah blah [email protected] blah blah blah..."
                }
            ]
        },
        # optional: stream=True,
    )

curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "extract_email",
    "input": {
      "system": "You are an AI assistant...",
      "messages": [
        {
          "role": "user",
          "content": "...blah blah blah [email protected] blah blah blah..."
        }
      ]
    }
    // optional: "stream": true
  }'

Response

In regular (non-streaming) mode, the response looks like this:

{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "output": {
    "raw": "{\"email\": \"[email protected]\"}",
    "parsed": {
      "email": "[email protected]"
    }
  },
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}

In streaming mode, the response is an SSE stream of JSON messages, followed by a final [DONE] message.

Each JSON message looks like this:

{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "raw": "{\"email\":", // a JSON content delta
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}