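Multimodal Inference (Vision-Language Models)

The TensorZero Gateway supports multimodal inference (e.g. image inputs) with vision-language models (VLMs).

See Integrations for the list of supported models.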

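Setup

Object Storage

TensorZero uses object storage to store the images used during multimodal inference. It supports any S3-compatible object storage service, including AWS S3, GCP Cloud Storage, and Cloudflare R2. You can configure the object storage service in the object_storage section of the configuration file.

In this example, we'll use a local deployment of MinIO, an open-source S3-compatible object storage service.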

[object_storage]
type = "s3_compatible"
endpoint = "http://minio:9000"  # optional: defaults to AWS S3
# region = "us-east-1"  # optional: depends on your S3-compatible storage provider
bucket_name = "tensorzero"  # optional: depends on your S3-compatible storage provider
# IMPORTANT: for production environments, remove the following setting and use a secure method of authentication in
# combination with a production-grade object storage service.
allow_http = true

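You can also store images in a local directory (type = "filesystem") or disable image storage (type = "disabled"). See the Configuration Reference for details.

The TensorZero Gateway looks for credentials in the following sources, in order of priority:

S3_ACCESS_KEY_ID and S3_SECRET_ACCESS_KEY environment variables
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables
The AWS SDK's default credential configuration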
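
The priority order above can be sketched in a few lines of Python. This is an illustration only, not the gateway's actual implementation: `resolve_s3_credentials` and `CREDENTIAL_ENV_PAIRS` are hypothetical names, the sketch assumes both variables of a pair must be set for that pair to be used, and the AWS SDK fallback is represented simply by returning `None`.

```python
import os
from typing import Mapping, Optional, Tuple

# Hypothetical helper mirroring the documented lookup order.
# Environment variable pairs are listed from highest to lowest priority.
CREDENTIAL_ENV_PAIRS = [
    ("S3_ACCESS_KEY_ID", "S3_SECRET_ACCESS_KEY"),
    ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"),
]


def resolve_s3_credentials(
    env: Mapping[str, str] = os.environ,
) -> Optional[Tuple[str, str, str]]:
    """Return (source, access_key_id, secret_access_key) for the first pair found.

    Returns None when neither pair is set, which stands in for
    "fall back to the AWS SDK's default credential configuration".
    """
    for key_var, secret_var in CREDENTIAL_ENV_PAIRS:
        key, secret = env.get(key_var), env.get(secret_var)
        # Assumption: a pair only counts if both of its variables are non-empty.
        if key and secret:
            return key_var, key, secret
    return None
```

Calling `resolve_s3_credentials()` with no arguments inspects the current process environment, which can help when checking which credentials a container would pick up.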

Docker Compose

We'll use Docker Compose to deploy the TensorZero Gateway, ClickHouse, and MinIO.

docker-compose.yml

# This is a simplified example for learning purposes. Do not use this in production.
# For production-ready deployments, see: https://www.tensorzero.com/docs/gateway/deployment

services:
  clickhouse:
    image: clickhouse/clickhouse-server:24.12-alpine
    environment:
      - CLICKHOUSE_USER=chuser
      - CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT=1
      - CLICKHOUSE_PASSWORD=chpassword
    ports:
      - "8123:8123"
    healthcheck:
      test: wget --spider --tries 1 http://chuser:chpassword@clickhouse:8123/ping
      start_period: 30s
      start_interval: 1s
      timeout: 1s

  gateway:
    image: tensorzero/gateway
    volumes:
      # Mount our tensorzero.toml file into the container
      - ./config:/app/config:ro
    command: --config-file /app/config/tensorzero.toml
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY:?Environment variable OPENAI_API_KEY must be set.}
      - S3_ACCESS_KEY_ID=miniouser
      - S3_SECRET_ACCESS_KEY=miniopassword
      - TENSORZERO_CLICKHOUSE_URL=http://chuser:chpassword@clickhouse:8123/tensorzero
    ports:
      - "3000:3000"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    depends_on:
      clickhouse:
        condition: service_healthy
      minio:
        condition: service_healthy

  # For a production deployment, you can use AWS S3, GCP Cloud Storage, Cloudflare R2, etc.
  minio:
    image: bitnami/minio
    ports:
      - "9000:9000" # API port
      - "9001:9001" # Console port
    environment:
      - MINIO_ROOT_USER=miniouser
      - MINIO_ROOT_PASSWORD=miniopassword
      - MINIO_DEFAULT_BUCKETS=tensorzero
    healthcheck:
      test: "mc ls local/tensorzero || exit 1"
      start_period: 30s
      start_interval: 1s
      timeout: 1s

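Inference

With the setup complete, you can now use the TensorZero Gateway to perform multimodal inference.

The TensorZero Gateway accepts both embedded images (encoded as base64 strings) and remote images (specified by a URL). The examples below send the same request with the TensorZero Python client, the OpenAI Python client, and the HTTP API.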

from tensorzero import TensorZeroGateway

with TensorZeroGateway.build_http(
    gateway_url="http://localhost:3000",
) as client:
    response = client.inference(
        model_name="openai::gpt-4o-mini",
        input={
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": "Do the images share any common features?",
                        },
                        # Remote image of Ferris the crab
                        {
                            "type": "image",
                            "url": "https://raw.githubusercontent.com/tensorzero/tensorzero/ff3e17bbd3e32f483b027cf81b54404788c90dc1/tensorzero-internal/tests/e2e/providers/ferris.png",
                        },
                        # One-pixel orange image encoded as a base64 string
                        {
                            "type": "image",
                            "mime_type": "image/png",
                            "data": "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAA1JREFUGFdj+O/P8B8ABe0CTsv8mHgAAAAASUVORK5CYII=",
                        },
                    ],
                }
            ],
        },
    )

    print(response)

from openai import OpenAI

with OpenAI(base_url="http://localhost:3000/openai/v1") as client:
    response = client.chat.completions.create(
        model="tensorzero::model_name::openai::gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Do the images share any common features?",
                    },
                    # Remote image of Ferris the crab
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://raw.githubusercontent.com/tensorzero/tensorzero/ff3e17bbd3e32f483b027cf81b54404788c90dc1/tensorzero-internal/tests/e2e/providers/ferris.png",
                        },
                    },
                    # One-pixel orange image encoded as a base64 string
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAA1JREFUGFdj+O/P8B8ABe0CTsv8mHgAAAAASUVORK5CYII=",
                        },
                    },
                ],
            }
        ],
    )

    print(response)

curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "openai::gpt-4o-mini",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": [
            {
              "type": "text",
              "text": "Do the images share any common features?"
            },
            {
              "type": "image",
              "url": "https://raw.githubusercontent.com/tensorzero/tensorzero/ff3e17bbd3e32f483b027cf81b54404788c90dc1/tensorzero-internal/tests/e2e/providers/ferris.png"
            },
            {
              "type": "image",
              "mime_type": "image/png",
              "data": "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAA1JREFUGFdj+O/P8B8ABe0CTsv8mHgAAAAASUVORK5CYII="
            }
          ]
        }
      ]
    }
  }'
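
The examples above embed a hard-coded base64 string. As a minimal sketch of how such a content block might be built from image bytes or a local file, the helpers below use only the Python standard library. The function names (`image_block_from_bytes`, `image_block_from_file`, `to_openai_image_part`) are hypothetical and not part of the TensorZero client; the dictionary shapes simply follow the request examples shown above.

```python
import base64
import mimetypes
from pathlib import Path


def image_block_from_bytes(data: bytes, mime_type: str) -> dict:
    """Build an embedded-image content block in the shape used by the inference examples."""
    return {
        "type": "image",
        "mime_type": mime_type,
        # base64-encode the raw bytes and convert to a plain ASCII string
        "data": base64.b64encode(data).decode("ascii"),
    }


def image_block_from_file(path: str) -> dict:
    """Read a local image and build an embedded-image content block.

    The MIME type is guessed from the file extension.
    """
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    return image_block_from_bytes(Path(path).read_bytes(), mime_type)


def to_openai_image_part(block: dict) -> dict:
    """Convert an embedded-image block into the data-URL form used by the OpenAI client example."""
    url = f"data:{block['mime_type']};base64,{block['data']}"
    return {"type": "image_url", "image_url": {"url": url}}
```

A block returned by `image_block_from_file("photo.png")` could be placed in the `content` list of a user message in place of the one-pixel image, where `photo.png` is a placeholder path.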