API Reference: Inference (OpenAI-Compatible)

`POST /openai/v1/chat/completions`

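The /openai/v1/chat/completions endpoint lets TensorZero users make TensorZero inferences with the OpenAI client. The gateway translates the OpenAI request parameters into the arguments expected by the inference endpoint and calls the same underlying implementation. This endpoint supports most of the features of the inference endpoint, with a few limitations. Most notably, it does not support dynamic credentials, so credentials must be specified by other means.

Request

The OpenAI-compatible inference endpoint translates OpenAI request parameters into the arguments expected by the inference endpoint.

TensorZero-specific parameters are prefixed with tensorzero:: (e.g. tensorzero::episode_id). These fields should be provided as extra parameters in the request body.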

`tensorzero::dryrun`

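Type: boolean
Required: no

If true, the inference request is executed but not stored in the database. The gateway still calls the downstream model provider.

This field is primarily for debugging and testing; you generally should not use it in production.

This field should be provided as an extra parameter in the request body.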

`tensorzero::episode_id`

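Type: UUID
Required: no

The ID of an existing episode to associate the inference with.

For the first inference of a new episode, do not provide an episode_id. If the field is omitted, the gateway generates a new episode ID and returns it in the response. Only use episode IDs that were returned by the TensorZero gateway.

This field should be provided as an extra parameter in the request body.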
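
As a rough illustration of the prefix convention, the sketch below builds the extra request-body fields as a plain dictionary. The helper name `tensorzero_params` is hypothetical (it is not part of TensorZero or the OpenAI client), and the episode ID is a placeholder. With the OpenAI Python client, a dictionary like this would typically be passed through the `extra_body` argument.

```python
import uuid


def tensorzero_params(episode_id=None, dryrun=None, variant_name=None):
    """Build extra request-body fields, adding the `tensorzero::` prefix.

    Hypothetical helper for illustration; fields left as None are omitted.
    """
    fields = {
        "episode_id": str(episode_id) if episode_id is not None else None,
        "dryrun": dryrun,
        "variant_name": variant_name,
    }
    return {f"tensorzero::{key}": value for key, value in fields.items() if value is not None}


# First inference of a new episode: no episode_id, the gateway assigns one.
first_call = tensorzero_params(dryrun=True)

# Follow-up inference: reuse the episode ID returned by the gateway (placeholder value here).
returned_episode_id = uuid.UUID("11111111-1111-1111-1111-111111111111")
follow_up = tensorzero_params(episode_id=returned_episode_id)

print(first_call)
print(follow_up)
```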

`tensorzero::extra_body`

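Type: array of objects (see below)
Required: no

The tensorzero::extra_body field lets you modify the request body that TensorZero sends to a model provider. This advanced feature is an "escape hatch" that lets you use provider-specific functionality that TensorZero has not implemented yet.

Each object in the array must have three fields:

variant_name or model_provider_name: the modification is applied only to the specified variant or model provider
pointer: a JSON Pointer string specifying where to modify the request body
value: the value to insert at that location; it can be of any type, including nested types

Example: tensorzero::extra_body

If TensorZero would normally send this request body to the provider…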

{
  "project": "tensorzero",
  "safety_checks": {
    "no_internet": false,
    "no_agi": true
  }
}

…then the following extra_body in the inference request…

{
  // ...
  "tensorzero::extra_body": [
    {
      "variant_name": "my_variant", // or "model_provider_name": "my_model_provider"
      "pointer": "/agi",
      "value": true
    },
    {
      "variant_name": "my_variant", // or "model_provider_name": "my_model_provider"
      "pointer": "/safety_checks/no_agi",
      "value": {
        "bypass": "on"
      }
    }
  ]
}

…overrides the request body (for my_variant only) to:

{
  "agi": true,
  "project": "tensorzero",
  "safety_checks": {
    "no_internet": false,
    "no_agi": {
      "bypass": "on"
    }
  }
}
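
To make the pointer semantics concrete, here is a minimal sketch that reproduces the example above in plain Python. It is an illustration of the behavior described here, not TensorZero's implementation: the function `apply_extra_body` is hypothetical, it only matches on variant_name, and it only walks JSON objects (no array indices).

```python
import copy


def apply_extra_body(body, extra_body, variant_name):
    """Return a copy of `body` with every entry targeting `variant_name` applied."""
    result = copy.deepcopy(body)
    for entry in extra_body:
        if entry.get("variant_name") != variant_name:
            continue
        # Split "/a/b" into ["a", "b"] and undo the JSON Pointer escapes (~1 is "/", ~0 is "~").
        tokens = [
            token.replace("~1", "/").replace("~0", "~")
            for token in entry["pointer"].split("/")[1:]
        ]
        target = result
        for token in tokens[:-1]:
            target = target[token]  # intermediate objects must already exist
        target[tokens[-1]] = entry["value"]
    return result


original_body = {
    "project": "tensorzero",
    "safety_checks": {"no_internet": False, "no_agi": True},
}
extra_body = [
    {"variant_name": "my_variant", "pointer": "/agi", "value": True},
    {
        "variant_name": "my_variant",
        "pointer": "/safety_checks/no_agi",
        "value": {"bypass": "on"},
    },
]

modified_body = apply_extra_body(original_body, extra_body, "my_variant")
print(modified_body)
```

Requests routed to any other variant are left unchanged in this sketch, which mirrors the "for my_variant only" behavior in the example.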

`tensorzero::extra_headers`

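Type: array of objects (see below)
Required: no

The tensorzero::extra_headers field lets you modify the request headers that TensorZero sends to a model provider. This advanced feature is an "escape hatch" that lets you use provider-specific functionality that TensorZero has not implemented yet.

Each object in the array must have three fields:

variant_name or model_provider_name: the modification is applied only to the specified variant or model provider
name: the name of the header to modify
value: the value to set the header to

Example: tensorzero::extra_headers

If TensorZero would normally send the following request headers to the provider…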

Safety-Checks: on

…then the following extra_headers…

{
  "tensorzero::extra_headers": [
    {
      "variant_name": "my_variant", // or "model_provider_name": "my_model_provider"
      "name": "Safety-Checks",
      "value": "off"
    },
    {
      "variant_name": "my_variant", // or "model_provider_name": "my_model_provider"
      "name": "Intelligence-Level",
      "value": "AGI"
    }
  ]
}

…overrides the request headers to:

Safety-Checks: off
Intelligence-Level: AGI
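
The same example can be sketched in a few lines of Python. As before, `apply_extra_headers` is a hypothetical function that only illustrates the described behavior (matching on variant_name only); it is not the gateway's code.

```python
def apply_extra_headers(headers, extra_headers, variant_name):
    """Return a copy of `headers` with every entry targeting `variant_name` applied."""
    result = dict(headers)
    for entry in extra_headers:
        if entry.get("variant_name") == variant_name:
            # Existing headers are overwritten, new ones are added.
            result[entry["name"]] = entry["value"]
    return result


default_headers = {"Safety-Checks": "on"}
extra_headers = [
    {"variant_name": "my_variant", "name": "Safety-Checks", "value": "off"},
    {"variant_name": "my_variant", "name": "Intelligence-Level", "value": "AGI"},
]

modified_headers = apply_extra_headers(default_headers, extra_headers, "my_variant")
print(modified_headers)
```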

`frequency_penalty`

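Type: float
Required: no (default: null)

Penalizes new tokens based on their frequency in the text so far if positive, and encourages them if negative. Overrides the frequency_penalty setting for any chat completion variants being used.

`max_completion_tokens`

Type: integer
Required: no (default: null)

Limits the number of tokens that the model can generate in a chat completion variant. If both this and max_tokens are set, the smaller value is used.

`max_tokens`

Type: integer
Required: no (default: null)

Limits the number of tokens that the model can generate in a chat completion variant. If both this and max_completion_tokens are set, the smaller value is used.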
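
The "smaller value wins" rule can be written down as a short sketch. The function name `effective_token_limit` is hypothetical and only mirrors the rule stated above.

```python
def effective_token_limit(max_tokens=None, max_completion_tokens=None):
    """Return the limit that applies when either, both, or neither field is set."""
    limits = [value for value in (max_tokens, max_completion_tokens) if value is not None]
    # With both fields set, the smaller one is used; with none set, there is no override.
    return min(limits) if limits else None


print(effective_token_limit(max_tokens=512, max_completion_tokens=256))
```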

`messages`

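Type: list
Required: yes

A list of messages to provide to the model.

Each message is an object with the following fields:

role (required): the role of the message sender in an OpenAI message (assistant, system, tool, or user).
content (required for user and system messages, optional for assistant and tool messages): the content of the message. The content must be either a string or an array of content blocks (see below).
tool_calls (optional for assistant messages, otherwise disallowed): a list of tool calls. Each tool call is an object with the following fields:
- id: a unique identifier for the tool call
- type: the type of tool being called (currently only "function" is supported)
- function: an object containing:
  - name: the name of the function to call
  - arguments: a JSON string containing the function arguments
tool_call_id (required for tool messages, otherwise disallowed): the ID of the tool call to associate with the message. This should be the ID originally returned by the gateway in the tool call's id field.

A content block is an object of type text or image_url.

If a content block has type text, it must have one of the following additional fields:

text: the text of the content block.
tensorzero::arguments: a JSON object containing the function arguments for TensorZero functions with templates and schemas (see Prompt Templates & Schemas for details).

If a content block has type image_url, it must have the following additional field:

image_url: a JSON object with the following field:
- url: the URL of a remote image (e.g. "https://example.com/image.png") or base64-encoded data for an embedded image (e.g. "data:image/png;base64,...").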
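
The field rules above can be illustrated with a small message list and a checker. This is a sketch only: `check_message` is a hypothetical helper that encodes the rules as listed here (the gateway performs its own validation), and the tool name `describe_image` and the message contents are made up for the example.

```python
import json

ROLES = {"assistant", "system", "tool", "user"}


def check_message(message):
    """Check one message against the field rules listed above; raise ValueError on a violation."""
    role = message.get("role")
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    if role in ("user", "system") and "content" not in message:
        raise ValueError(f"{role} messages require content")
    if "tool_calls" in message and role != "assistant":
        raise ValueError("tool_calls is only allowed on assistant messages")
    if role == "tool" and "tool_call_id" not in message:
        raise ValueError("tool messages require tool_call_id")
    if role != "tool" and "tool_call_id" in message:
        raise ValueError("tool_call_id is only allowed on tool messages")


messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
        ],
    },
    {
        "role": "assistant",
        "tool_calls": [
            {
                "id": "123456789",
                "type": "function",
                "function": {
                    "name": "describe_image",  # hypothetical tool name
                    "arguments": json.dumps({"detail": "high"}),  # arguments are a JSON string
                },
            }
        ],
    },
    # The tool result refers back to the tool call through tool_call_id.
    {"role": "tool", "tool_call_id": "123456789", "content": "A cat on a sofa."},
]

for message in messages:
    check_message(message)
```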

`model`

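Type: string
Required: yes

The name of the TensorZero function or model being called, with the appropriate prefix.

| To call… | Use this format… |
| --- | --- |
| A function defined as `[functions.my_function]` in your `tensorzero.toml` configuration file | `tensorzero::function_name::my_function` |
| A model defined as `[models.my_model]` in your `tensorzero.toml` configuration file | `tensorzero::model_name::my_model` |
| A model offered by a model provider, without defining it in your `tensorzero.toml` configuration file (if supported, see below) | `tensorzero::model_name::{provider_type}::{model_name}` |

For example, if you have the following configuration: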

[models.gpt-4o]
routing = ["openai", "azure"]

[models.gpt-4o.providers.openai]
# ...

[models.gpt-4o.providers.azure]
# ...

[functions.extract-data]
# ...

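Then:

tensorzero::function_name::extract-data calls the extract-data function defined above.
tensorzero::model_name::gpt-4o calls the gpt-4o model in your configuration, which supports fallback from openai to azure. See Retries & Fallbacks for details.
tensorzero::model_name::openai::gpt-4o calls the OpenAI API directly for the gpt-4o model, ignoring the gpt-4o model defined above.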
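
To show how the three formats differ, the sketch below splits a model string into its parts. `parse_model` is a hypothetical helper, and it assumes that function, model, and provider names do not themselves contain `::`.

```python
FUNCTION_PREFIX = "tensorzero::function_name::"
MODEL_PREFIX = "tensorzero::model_name::"


def parse_model(model):
    """Classify a `model` value according to the three formats in the table above."""
    if model.startswith(FUNCTION_PREFIX):
        return {"kind": "function", "name": model[len(FUNCTION_PREFIX):]}
    if model.startswith(MODEL_PREFIX):
        rest = model[len(MODEL_PREFIX):]
        provider_type, separator, model_name = rest.partition("::")
        if separator:
            # {provider_type}::{model_name}: a provider model not defined in the configuration
            return {"kind": "provider_model", "provider_type": provider_type, "name": model_name}
        return {"kind": "model", "name": rest}
    raise ValueError(f"missing tensorzero:: prefix: {model!r}")


print(parse_model("tensorzero::function_name::extract-data"))
print(parse_model("tensorzero::model_name::gpt-4o"))
print(parse_model("tensorzero::model_name::openai::gpt-4o"))
```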

`parallel_tool_calls`

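Type: boolean
Required: no (default: null)

Overrides the parallel_tool_calls setting for the function being called.

`presence_penalty`

Type: float
Required: no (default: null)

Penalizes new tokens based on whether they already appear in the text so far if positive, and encourages them if negative. Overrides the presence_penalty setting for any chat completion variants being used.

`response_format`

Type: either a string or an object
Required: no (default: null)

The options are "text", "json_object", and {"type": "json_schema", "schema": ...}, where the schema field contains a valid JSON Schema. In practice, this field is only respected for the "json_schema" option, in which case the schema field can be used to dynamically set the output schema for a json function.

`seed`

Type: integer
Required: no (default: null)

Overrides the seed setting for any chat completion variants being used.

`stream`

Type: boolean
Required: no (default: false)

If true, the gateway streams the response to the client in an OpenAI-compatible format.

`stream_options`

Type: object with the field "include_usage"
Required: no (default: null)

If "include_usage" is true, the gateway includes usage information in the response.

Example: stream_options

If the following stream_options is provided…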

{
  ...
  "stream_options": {
    "include_usage": true
  }
  ...
}

…then the gateway includes usage information in the response.

{
  ...
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 456,
    "total_tokens": 579
  }
  ...
}

`temperature`

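Type: float
Required: no (default: null)

Overrides the temperature setting for any chat completion variants being used.

`tools`

Type: list of tool objects (see below)
Required: no (default: null)

Allows the user to specify tools dynamically at inference time, in addition to the tools specified in the configuration.

Each tool object has the following structure:

type: must be "function"
function: an object containing:
- name: the name of the function (string, required)
- description: a description of what the function does (string, optional)
- parameters: a JSON Schema object describing the function's parameters (required)
- strict: whether to enforce strict schema validation (boolean, defaults to false)

`tool_choice`

Type: string or object
Required: no (default: "none" if there are no tools, "auto" if there are tools)

Controls which tool (if any) is called by the model, overriding the value in the configuration. Supported values:

"none": the model does not call any tool and generates a message instead
"auto": the model can choose between generating a message and calling one or more tools
"required": the model must call one or more tools
{"type": "function", "function": {"name": "my_function"}}: forces the model to call the specified tool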

`top_p`

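Type: float
Required: no (default: null)

Overrides the top_p setting for any chat completion variants being used.

`tensorzero::variant_name`

Type: string
Required: no

If set, pins the inference request to a particular variant (not recommended).

You should generally not set this field, and instead let the TensorZero gateway assign a variant. This field is primarily used for testing or debugging purposes.

This field should be provided as an extra parameter in the request body.

Response

In regular (non-streaming) mode, the response is a JSON object with the following fields: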

`choices`

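Type: list of choice objects, where each choice contains:
- index: a zero-based index indicating the choice's position in the list (integer)
- finish_reason: always "stop".
- message: an object containing:
  - content: the message content (string, optional)
  - tool_calls: a list of tool calls made by the model (optional). The format is the same as in the request.
  - role: the role of the message sender (always "assistant").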
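
Because the arguments of a tool call arrive as a JSON string, a client usually decodes them before dispatching. A minimal sketch follows, assuming a response shaped like the one described here. The helper `extract_tool_calls` is hypothetical and the sample response is abbreviated.

```python
import json


def extract_tool_calls(response):
    """Return (tool_call_id, function_name, decoded_arguments) for every tool call in a response."""
    calls = []
    for choice in response["choices"]:
        # tool_calls is optional and may be missing or null when the model returns text.
        for call in choice["message"].get("tool_calls") or []:
            function = call["function"]
            calls.append((call["id"], function["name"], json.loads(function["arguments"])))
    return calls


sample_response = {
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {
                "content": None,
                "role": "assistant",
                "tool_calls": [
                    {
                        "id": "123456789",
                        "type": "function",
                        "function": {
                            "name": "get_temperature",
                            "arguments": "{\"location\": \"Tokyo\", \"units\": \"celsius\"}",
                        },
                    }
                ],
            },
        }
    ]
}

print(extract_tool_calls(sample_response))
```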

`created`

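Type: integer

The Unix timestamp (in seconds) of when the inference was created.

`episode_id`

Type: UUID

The ID of the episode that the inference belongs to.

`id`

Type: UUID

The inference ID.

`model`

Type: string

The name of the variant that was actually used for the inference.

`object`

Type: string

The type of the inference object (always "chat.completion").

`system_fingerprint`

Type: string

Always ""

`usage`

Type: object

Contains token usage information for the request and response, with the following fields:

prompt_tokens: the number of tokens in the prompt (integer)
completion_tokens: the number of tokens in the completion (integer)
total_tokens: the total number of tokens used (integer)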

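In streaming mode, the response is an SSE stream of JSON messages, followed by a final [DONE] message.

Each JSON message has the following fields:

`choices`

Type: list

A list of choices generated by the model, where each choice contains:

index: the index of the choice (integer)
finish_reason: always ""
delta: an object containing either:
- content: the next piece of generated text (string), or
- tool_calls: a list of tool calls, each containing the next piece of the tool call being generated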

`created`

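Type: integer

The Unix timestamp (in seconds) of when the inference was created.

`episode_id`

Type: UUID

The ID of the episode that the inference belongs to.

`id`

Type: UUID

The inference ID.

`model`

Type: string

The name of the variant that was actually used for the inference.

`object`

Type: string

The type of the inference object (always "chat.completion").

`system_fingerprint`

Type: string

Always ""

`usage`

Type: object
Required: no

Contains token usage information for the request and response, with the following fields:

prompt_tokens: the number of tokens in the prompt (integer)
completion_tokens: the number of tokens in the completion (integer)
total_tokens: the total number of tokens used (integer)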
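
A client consuming the stream typically concatenates the content deltas until it sees [DONE]. The sketch below does this over a list of already-received lines, assuming the usual SSE framing in which each message arrives on a line starting with `data:`. The function `collect_stream_text` and the sample chunks are made up for illustration.

```python
import json


def collect_stream_text(lines):
    """Join the text deltas of an SSE stream and return (text, last usage object seen or None)."""
    parts = []
    usage = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        usage = chunk.get("usage") or usage
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts), usage


sample_stream = [
    'data: {"choices": [{"index": 0, "finish_reason": "", "delta": {"content": "Hi "}}]}',
    "",
    'data: {"choices": [{"index": 0, "finish_reason": "", "delta": {"content": "Gabriel"}}]}',
    "",
    'data: {"choices": [], "usage": {"prompt_tokens": 1, "completion_tokens": 2, "total_tokens": 3}}',
    "",
    "data: [DONE]",
]

text, usage = collect_stream_text(sample_stream)
print(text)
print(usage)
```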

Examples

Chat Function with Structured System Prompt

Configuration

# ...
[functions.draft_email]
type = "chat"
system_schema = "functions/draft_email/system_schema.json"
# ...

{
  "type": "object",
  "properties": {
    "assistant_name": { "type": "string" }
  }
}

Request

Python
HTTP

from openai import AsyncOpenAI

async with AsyncOpenAI(
    base_url="http://localhost:3000/openai/v1"
) as client:
    result = await client.chat.completions.create(
        # there already was an episode_id from an earlier inference
        extra_body={"tensorzero::episode_id": str(episode_id)},
        messages=[
            {
                "role": "system",
                "content": [{"assistant_name": "Alfred Pennyworth"}]
                # NOTE: the JSON is in an array here so that a structured system message can be sent
            },
            {
                "role": "user",
                "content": "I need to write an email to Gabriel explaining..."
            }
        ],
        model="tensorzero::function_name::draft_email",
        temperature=0.4,
        # Optional: stream=True
    )

curl -X POST http://localhost:3000/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "tensorzero::episode_id": "your_episode_id_here",
    "messages": [
      {
        "role": "system",
        "content": [{"assistant_name": "Alfred Pennyworth"}]
      },
      {
        "role": "user",
        "content": "I need to write an email to Gabriel explaining..."
      }
    ],
    "model": "tensorzero::function_name::draft_email",
    "temperature": 0.4
  }'
# Optional: add "stream": true to the request body to stream the response

Response

Regular
Streaming

{
  "id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "model": "email_draft_variant",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "content": "Hi Gabriel,\n\nI noticed...",
        "role": "assistant"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 100,
    "total_tokens": 200
  }
}

In streaming mode, the response is an SSE stream of JSON messages, followed by a final [DONE] message.

Each JSON message has the following fields:

{
  "id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "model": "email_draft_variant",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "delta": {
        "content": "Hi Gabriel,\n\nI noticed..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 100,
    "total_tokens": 200
  }
}

Chat Function with Dynamic Tool Use

Configuration

# ...

[functions.weather_bot]
type = "chat"
# Note: no `tools = ["get_temperature"]` field in configuration

# ...

Request

Python
HTTP

from openai import AsyncOpenAI

async with AsyncOpenAI(
    base_url="http://localhost:3000/openai/v1"
) as client:
    result = await client.chat.completions.create(
        model="tensorzero::function_name::weather_bot",
        messages=[
            {
                "role": "user",
                "content": "What is the weather like in Tokyo?"
            }
        ],
        tools=[
            {
              "type": "function",
              "function": {
                  "name": "get_temperature",
                  "description": "Get the current temperature in a given location",
                  "parameters": {
                    "$schema": "http://json-schema.org/draft-07/schema#",
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The location to get the temperature for (e.g. \"New York\")"
                        },
                        "units": {
                            "type": "string",
                            "description": "The units to get the temperature in (must be \"fahrenheit\" or \"celsius\")",
                            "enum": ["fahrenheit", "celsius"]
                        }
                    },
                    "required": ["location"],
                    "additionalProperties": False
                }
              }
            }
        ],
        # optional: stream=True,
    )

curl -X POST http://localhost:3000/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tensorzero::function_name::weather_bot",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather like in Tokyo?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_temperature",
          "description": "Get the current temperature in a given location",
          "parameters": {
            "$schema": "http://json-schema.org/draft-07/schema#",
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The location to get the temperature for (e.g. \"New York\")"
              },
              "units": {
                "type": "string",
                "description": "The units to get the temperature in (must be \"fahrenheit\" or \"celsius\")",
                "enum": ["fahrenheit", "celsius"]
              }
            },
            "required": ["location"],
            "additionalProperties": false
          }
        }
      }
    ]
  }'
# Optional: add "stream": true to the request body to stream the response

Response

Regular
Streaming

{
  "id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "model": "weather_bot_variant",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "content": null,
        "tool_calls": [
          {
            "id": "123456789",
            "type": "function",
            "function": {
              "name": "get_temperature",
              "arguments": "{\"location\": \"Tokyo\", \"units\": \"celsius\"}"
            }
          }
        ],
        "role": "assistant"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 100,
    "total_tokens": 200
  }
}

In streaming mode, the response is an SSE stream of JSON messages, followed by a final [DONE] message.

Each JSON message has the following fields:

{
  "id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "model": "weather_bot_variant",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "delta": {
        "content": null,
        "tool_calls": [
          {
            "id": "123456789",
            "type": "function",
            "function": {
              "name": "get_temperature",
              "arguments": "{\"location\":" // a tool arguments delta
            }
          }
        ]
      }
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 100,
    "total_tokens": 200
  }
}

JSON Function with Dynamic Output Schema

Configuration

# ...
[functions.extract_email]
type = "json"
output_schema = "output_schema.json"
# ...

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "email": {
      "type": "string"
    }
  },
  "required": ["email"]
}

Request

Python
HTTP

from openai import AsyncOpenAI

dynamic_output_schema = {
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "email": { "type": "string" },
    "domain": { "type": "string" }
  },
  "required": ["email", "domain"]
}

async with AsyncOpenAI(
    base_url="http://localhost:3000/openai/v1"
) as client:
    result = await client.chat.completions.create(
        model="tensorzero::function_name::extract_email",
        messages=[
            {
                "role": "system",
                "content": "You are an AI assistant..."
            },
            {
                "role": "user",
                "content": "...blah blah blah [email protected] blah blah blah..."
            }
        ],
        # Override the output schema using the `response_format` field
        response_format={"type": "json_schema", "schema": dynamic_output_schema},
        # optional: stream=True,
    )

curl -X POST http://localhost:3000/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tensorzero::function_name::extract_email",
    "messages": [
      {
        "role": "system",
        "content": "You are an AI assistant..."
      },
      {
        "role": "user",
        "content": "...blah blah blah [email protected] blah blah blah..."
      }
    ],
    "response_format": {
      "type": "json_schema",
      "schema": {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
          "email": { "type": "string" },
          "domain": { "type": "string" }
        },
        "required": ["email", "domain"]
      }
    }
  }'
# Optional: add "stream": true to the request body to stream the response

Response

Regular
Streaming

{
  "id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "model": "extract_email_variant",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "content": "{\"email\": \"[email protected]\", \"domain\": \"tensorzero.com\"}"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 100,
    "total_tokens": 200
  }
}

In streaming mode, the response is an SSE stream of JSON messages, followed by a final [DONE] message.

Each JSON message has the following fields:

{
  "id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "model": "extract_email_variant",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "delta": {
        "content": "{\"email\":" // a JSON content delta
      }
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 100,
    "total_tokens": 200
  }
}