Getting Started with SGLang
This guide shows how to set up a minimal deployment to use the TensorZero Gateway with a self-hosted large language model (LLM) served by SGLang.
In this example we use Llama-3.1-8B-Instruct, but you can use virtually any model supported by SGLang.
Setup
This guide assumes that you are running SGLang locally with the following command (adapted from https://docs.sglang.ai/start/install.html):
```bash
# --shm-size sets the shared memory size - needed for loading large models and processing requests
# -v mounts the host's ~/.cache/huggingface directory to the container's /root/.cache/huggingface directory
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --host 0.0.0.0 --port 30000
```

Make sure to update the `api_base` in the configuration below to match your SGLang server.
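Before wiring the gateway to the server, you can sanity-check that SGLang is reachable. SGLang serves an OpenAI-compatible API, so listing models is a cheap probe; this is an illustrative sketch using only the Python standard library (the port matches the `-p 30000:30000` mapping above, and the `probe` helper is not part of SGLang or TensorZero):

```python
import urllib.request

# Endpoint assumed from the OpenAI-compatible API SGLang exposes,
# on the port published by the docker run command above.
MODELS_URL = "http://localhost:30000/v1/models"

def probe(url: str = MODELS_URL) -> bool:
    """Return True if the server answers the request, False if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:  # covers connection refused, timeouts, DNS errors
        return False
```

Once the container finishes loading the model, `probe()` should return `True`.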
For this minimal setup, you only need two files in your project directory:
- config/
  - tensorzero.toml
- docker-compose.yml
For production deployments, see our Deployment Guide.

Configuration

Create a minimal configuration file that defines the model and a simple chat function:
```toml
[models.llama]
routing = ["sglang"]

[models.llama.providers.sglang]
type = "sglang"
api_base = "http://host.docker.internal:30000/v1/" # for SGLang running locally on the host
api_key_location = "none" # by default, SGLang requires no API key
model_name = "my-sglang-model"

[functions.my_function_name]
type = "chat"

[functions.my_function_name.variants.my_variant_name]
type = "chat_completion"
model = "llama"
```

Credentials
The `api_key_location` field in the model provider configuration specifies how API key authentication is handled:
- If your endpoint doesn't require an API key (e.g. SGLang by default):

  `api_key_location = "none"`

- If your endpoint requires an API key, you have two options:

  - Configure it in advance through an environment variable:

    `api_key_location = "env::ENVIRONMENT_VARIABLE_NAME"`

    The environment variable must be set before starting the gateway.

  - Provide it at inference time:

    `api_key_location = "dynamic::ARGUMENT_NAME"`

    The API key can then be passed in the inference request.

In this example, SGLang runs locally without authentication, so we use `api_key_location = "none"`.
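With the dynamic option, the key travels with the request instead of living in the gateway's environment. As a sketch (assuming a provider configured with `api_key_location = "dynamic::ARGUMENT_NAME"`; the `credentials` field follows the TensorZero inference API, where the key name must match the configured argument name):

```python
import json

# Sketch of an inference request body carrying a dynamic API key.
# Assumes the provider config uses api_key_location = "dynamic::ARGUMENT_NAME";
# the "credentials" map pairs that argument name with the runtime key.
payload = {
    "function_name": "my_function_name",
    "input": {
        "messages": [{"role": "user", "content": "What is the capital of Japan?"}]
    },
    "credentials": {"ARGUMENT_NAME": "YOUR_API_KEY"},  # supplied per request
}

body = json.dumps(payload)  # this is what you'd POST to the gateway's /inference endpoint
```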
Deployment (Docker Compose)

Create a minimal Docker Compose configuration:
```yaml
# This is a simplified example for learning purposes. Do not use this in production.
# For production-ready deployments, see: https://www.tensorzero.com/docs/gateway/deployment

services:
  gateway:
    image: tensorzero/gateway
    volumes:
      - ./config:/app/config:ro
    command: --config-file /app/config/tensorzero.toml
    # environment:
    #   - SGLANG_API_KEY=${SGLANG_API_KEY:?Environment variable SGLANG_API_KEY must be set.}
    ports:
      - "3000:3000"
    extra_hosts:
      - "host.docker.internal:host-gateway"
```

You can start the gateway with `docker compose up`.
Inference

Make an inference request to the gateway:
```bash
curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "my_function_name",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "What is the capital of Japan?"
        }
      ]
    }
  }'
```
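The same request can be issued from Python with only the standard library. This is an illustrative sketch, not an official client: the URL and function name simply mirror the curl example above, and the `build_request` helper is our own.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:3000/inference"  # matches the "3000:3000" port mapping

def build_request(question: str) -> urllib.request.Request:
    """Build the POST request for the gateway's /inference endpoint."""
    body = json.dumps({
        "function_name": "my_function_name",
        "input": {"messages": [{"role": "user", "content": question}]},
    }).encode("utf-8")
    return urllib.request.Request(
        GATEWAY_URL, data=body, headers={"Content-Type": "application/json"}
    )

req = build_request("What is the capital of Japan?")
# Once the gateway is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```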