Getting Started with Text Generation Inference (TGI)

This guide shows how to set up a minimal deployment that uses the TensorZero Gateway with a self-hosted LLM served by Text Generation Inference (TGI).

We use Phi-4 in this example, but you can use any model that TGI supports.

Setup

This guide assumes that you are running TGI locally with:

```
# --gpus all:          expose all host GPUs to the container
# --shm-size 64g:      shared memory size, needed for loading large models and processing requests
# -p 8080:80:          map the host's port 8080 to the container's port 80
# -v $PWD/data:/data:  mount the host's ./data directory at /data in the container
docker run \
    --gpus all \
    --shm-size 64g \
    -p 8080:80 \
    -v "$PWD/data:/data" \
    ghcr.io/huggingface/text-generation-inference:3.0.1 \
    --model-id microsoft/phi-4
```
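TGI can take several minutes to download and load Phi-4, so requests sent right after `docker run` may fail. The Python sketch below waits for the server to come up. It assumes that TGI's `GET /health` route, on the host port 8080 mapped above, returns HTTP 200 once the model is ready. The function names, timeouts and polling interval are illustrative choices and are not part of TGI or TensorZero.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def tgi_health_status(url: str) -> int:
    """Return the HTTP status of GET url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # server answered with a non-2xx status
        return err.code
    except (urllib.error.URLError, OSError):  # connection refused, DNS failure, timeout
        return 0


def wait_for_tgi(
    url: str = "http://localhost:8080/health",
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    probe: Callable[[str], int] = tgi_health_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` until it returns 200 (True) or `timeout_s` elapses (False)."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url) == 200:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Call `wait_for_tgi()` before you send your first inference request. The loop takes `probe` and `sleep` as parameters so that you can test it without a live server.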

Make sure to update the api_base in the configuration below to match your TGI server.

For this minimal setup, you only need two files in your project directory:

- config/
  - tensorzero.toml
- docker-compose.yml

For production deployments, see our Deployment Guide.

Configuration

Create a minimal configuration file (config/tensorzero.toml) that defines a model and a simple chat function:

```
[models.phi_4]
routing = ["tgi"]

[models.phi_4.providers.tgi]
type = "tgi"
api_base = "http://host.docker.internal:8080/v1/"  # for TGI running locally on the host
api_key_location = "none"  # by default, TGI requires no API key

[functions.my_function_name]
type = "chat"

[functions.my_function_name.variants.my_variant_name]
type = "chat_completion"
model = "phi_4"
```

Credentials

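The api_key_location field in the model provider configuration specifies how API key authentication is handled.

If your endpoint does not require an API key (as with TGI by default):
```
api_key_location = "none"
```
If your endpoint requires an API key, you have two options:
1. Configure it in advance through an environment variable:
```
api_key_location = "env::ENVIRONMENT_VARIABLE_NAME"
```
  You need to set the environment variable before starting the gateway.
2. Provide it at inference time:
```
api_key_location = "dynamic::ARGUMENT_NAME"
```
  You can then pass the API key in each inference request.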
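With the dynamic option, the API key is sent in the request body. The Python sketch below builds such a body. It assumes that the `/inference` endpoint accepts a top-level `credentials` object whose keys match the name after `dynamic::`, which is the placeholder ARGUMENT_NAME here. Confirm the field name against the API reference. `build_inference_body` is a hypothetical helper, and "your-tgi-api-key" is a placeholder value.

```python
import json
from typing import Mapping, Optional


def build_inference_body(
    function_name: str,
    user_message: str,
    credentials: Optional[Mapping[str, str]] = None,
) -> dict:
    """Build an /inference request body, optionally carrying dynamic credentials."""
    body = {
        "function_name": function_name,
        "input": {"messages": [{"role": "user", "content": user_message}]},
    }
    if credentials:
        # Assumed: keys are the names that follow "dynamic::" in api_key_location.
        body["credentials"] = dict(credentials)
    return body


body = build_inference_body(
    "my_function_name",
    "What is the capital of Japan?",
    credentials={"ARGUMENT_NAME": "your-tgi-api-key"},
)
print(json.dumps(body, indent=2))
```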

For more details, see the Configuration Reference and the API Reference.

In this example, TGI runs locally and requires no authentication, so we use api_key_location = "none".

Deployment (Docker Compose)

Create a minimal Docker Compose configuration (docker-compose.yml):

```
# This is a simplified example for learning purposes. Do not use this in production.
# For production-ready deployments, see: https://www.tensorzero.com/docs/gateway/deployment

services:
  gateway:
    image: tensorzero/gateway
    volumes:
      - ./config:/app/config:ro
    command: --config-file /app/config/tensorzero.toml
    # environment:
    #   - TGI_API_KEY=${TGI_API_KEY:?Environment variable TGI_API_KEY must be set.}
    ports:
      - "3000:3000"
    extra_hosts:
      - "host.docker.internal:host-gateway"
```

You can start the gateway with docker compose up.

Inference

Make an inference request to the gateway:

```
curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "my_function_name",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "What is the capital of Japan?"
        }
      ]
    }
  }'
```
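You can send the same request from Python using only the standard library. The sketch below makes two assumptions. First, the gateway is reachable at http://localhost:3000. Second, a chat function's response contains a `content` list whose text blocks look like `{"type": "text", "text": "..."}`. Check the API reference for the exact response schema. `make_inference_request`, `send` and `extract_text` are hypothetical helper names.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:3000/inference"


def make_inference_request(
    function_name: str, user_message: str, url: str = GATEWAY_URL
) -> urllib.request.Request:
    """Build the same POST request as the curl command above."""
    body = {
        "function_name": function_name,
        "input": {"messages": [{"role": "user", "content": user_message}]},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout_s: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.load(resp)


def extract_text(response: dict) -> str:
    """Concatenate the text blocks of a chat response (assumed schema)."""
    return "".join(
        block.get("text", "")
        for block in response.get("content", [])
        if block.get("type") == "text"
    )
```

With the gateway running, `print(extract_text(send(make_inference_request("my_function_name", "What is the capital of Japan?"))))` prints the model's answer. Building the request separately from sending it lets you inspect the payload without a live gateway.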