Authorization

SkyServe provides authorization at the replica level, letting you control access to service endpoints with API keys.

Setting Up API Keys

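SkyServe relies on the authorization provided by the service running on the underlying replicas, such as the inference engine. This page uses the vLLM inference engine as an example; it supports static API key authorization through the --api-key argument.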
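To make the idea of static API key authorization concrete, here is a minimal sketch of the kind of check a server performs on each request. It is an illustration only, not vLLM's actual implementation; the function name is_authorized is hypothetical.

```python
import hmac


def is_authorized(authorization_header, expected_token):
    """Return True only if the header is 'Bearer <expected_token>'."""
    if not authorization_header:
        return False
    scheme, _, presented = authorization_header.partition(" ")
    # HTTP authentication scheme names are case-insensitive.
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(presented.encode(), expected_token.encode())


print(is_authorized("Bearer yyy", "yyy"))            # True
print(is_authorized("Bearer random-string", "yyy"))  # False
print(is_authorized(None, "yyy"))                    # False
```

Because the key is static, every client shares the same secret, so rotating it means redeploying the service with a new value.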

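The following SkyServe service spec serves a Llama-3 chatbot with vLLM, protected by an API key. In the example YAML below, the authorization token is defined as an environment variable, AUTH_TOKEN, and is passed to two places: the service field, so that readiness_probe can access the replicas, and the vllm entrypoint, so that the service on each replica starts with the API key.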

# auth.yaml
envs:
  MODEL_NAME: meta-llama/Meta-Llama-3-8B-Instruct
  HF_TOKEN: # TODO: Fill with your own huggingface token, or use --env to pass.
  AUTH_TOKEN: # TODO: Fill with your own auth token (a random string), or use --env to pass.

service:
  readiness_probe:
    path: /v1/models
    headers:
      Authorization: Bearer $AUTH_TOKEN
  replicas: 2

resources:
  accelerators: {L4, A10g, A10, L40, A40, A100, A100-80GB}
  ports: 8000

setup: |
  pip install vllm==0.4.0.post1 flash-attn==2.5.7 gradio openai
  # python -c "import huggingface_hub; huggingface_hub.login('${HF_TOKEN}')"

run: |
  export PATH=$PATH:/sbin
  python -m vllm.entrypoints.openai.api_server \
    --model $MODEL_NAME --trust-remote-code \
    --gpu-memory-utilization 0.95 \
    --host 0.0.0.0 --port 8000 \
    --api-key $AUTH_TOKEN

To deploy the service, run the following command:

HF_TOKEN=xxx AUTH_TOKEN=yyy sky serve up auth.yaml -n auth --env HF_TOKEN --env AUTH_TOKEN
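The yyy value in the command above is a placeholder; the auth token should be a hard-to-guess random string. One possible way to produce one is sketched below using Python's standard secrets module. The helper name generate_auth_token and the 32-byte length are assumptions chosen for illustration.

```python
import secrets


def generate_auth_token(num_bytes=32):
    """Return a URL-safe random string suitable for use as a static API key."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    print(generate_auth_token())
```

The printed value can then be exported as AUTH_TOKEN before running sky serve up.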

To send a request to the service endpoint, a service client needs to include the static API key in a request header:

$ ENDPOINT=$(sky serve status --endpoint auth)
$ AUTH_TOKEN=yyy
$ curl http://$ENDPOINT/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $AUTH_TOKEN" \
    -d '{
      "model": "meta-llama/Meta-Llama-3-8B-Instruct",
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant."
        },
        {
          "role": "user",
          "content": "Who are you?"
        }
      ],
      "stop_token_ids": [128009, 128001]
    }' | jq
Example output
{
  "id": "cmpl-cad2c1a2a6ee44feabed0b28be294d6f",
  "object": "chat.completion",
  "created": 1716819147,
  "model": "meta-llama/Meta-Llama-3-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm so glad you asked! I'm LLaMA, an AI assistant developed by Meta AI that can understand and respond to human input in a conversational manner. I'm here to help you with any questions, tasks, or topics you'd like to discuss. I can provide information on a wide range of subjects, from science and history to entertainment and culture. I can also assist with language-related tasks such as language translation, text summarization, and even writing and proofreading. My goal is to provide accurate and helpful responses to your inquiries, while also being friendly and engaging. So, what's on your mind? How can I assist you today?"
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": 128009
    }
  ],
  "usage": {
    "prompt_tokens": 26,
    "total_tokens": 160,
    "completion_tokens": 134
  }
}
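The same request can be issued from a program. The sketch below mirrors the curl call with Python's standard library only; the function names build_chat_request and send_chat_request are hypothetical, and the endpoint and token are assumed to be supplied by the caller.

```python
import json
import urllib.request


def build_chat_request(endpoint, auth_token, user_message):
    """Build (but do not send) a chat completion request carrying the API key."""
    payload = {
        "model": "meta-llama/Meta-Llama-3-8B-Instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "stop_token_ids": [128009, 128001],
    }
    return urllib.request.Request(
        url=f"http://{endpoint}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The static API key travels as a Bearer token.
            "Authorization": f"Bearer {auth_token}",
        },
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (requires a running service):
#   req = build_chat_request(endpoint, auth_token, "Who are you?")
#   print(send_chat_request(req))
```

Separating request construction from sending keeps the header logic easy to inspect without contacting the service.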

A service client without the API key, or with an incorrect one, cannot access the service and receives a 401 Unauthorized error:

$ curl http://$ENDPOINT/v1/models
{"error": "Unauthorized"}

$ curl http://$ENDPOINT/v1/models -H "Authorization: Bearer random-string"
{"error": "Unauthorized"}