[Beta] Vertex AI Endpoints

Use the VertexAI SDK to call endpoints on the LiteLLM Gateway (native provider format).

Tip

Looking for the unified API (OpenAI format) for VertexAI? See the guide on using VertexAI with the LiteLLM SDK or LiteLLM Proxy Server.

Supported API Endpoints

Gemini API
Embeddings API
Imagen API
Code Completion API
Batch prediction API
Tuning API
CountTokens API

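Authentication to Vertex AI

The LiteLLM Proxy Server supports two methods of authenticating to Vertex AI:

Pass Vertex credentials from the client to the proxy server
Set Vertex AI credentials on the proxy server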

Quick Start

Pass Vertex credentials from the client to the proxy server
Set Vertex AI credentials on the proxy server

1. Start the litellm proxy

litellm --config /path/to/config.yaml

2. Test it

import vertexai
from vertexai.preview.generative_models import GenerativeModel

LITE_LLM_ENDPOINT = "http://localhost:4000"

vertexai.init(
    project="<your-vertex-ai-project-id>",  # enter your Vertex project ID
    location="<your-vertex-ai-location>",  # enter your Vertex location
    api_endpoint=f"{LITE_LLM_ENDPOINT}/vertex-ai",  # route on litellm
    api_transport="rest",
)

model = GenerativeModel(model_name="gemini-1.0-pro")
response = model.generate_content("hi")
print(response.text)

1. Set `default_vertex_config` in your `config.yaml`

Add the following credentials to your litellm config.yaml to use the Vertex AI endpoints.

default_vertex_config:
  vertex_project: "adroit-crow-413218"
  vertex_location: "us-central1"
  vertex_credentials: "/Users/ishaanjaffer/Downloads/adroit-crow-413218-a956eef1a2a8.json" # path to your service account .json file

2. Start the litellm proxy

litellm --config /path/to/config.yaml

3. Test it

import vertexai
from google.auth.credentials import Credentials
from vertexai.generative_models import GenerativeModel

LITELLM_PROXY_API_KEY = "sk-1234"
LITELLM_PROXY_BASE = "http://0.0.0.0:4000/vertex-ai"

import datetime

class CredentialsWrapper(Credentials):
    def __init__(self, token=None):
        super().__init__()
        self.token = token
        self.expiry = None  # or set to a future date if needed

    def refresh(self, request):
        pass

    def apply(self, headers, token=None):
        headers["Authorization"] = f"Bearer {self.token}"

    @property
    def expired(self):
        return False  # always treat the token as non-expired

    @property
    def valid(self):
        return True  # always treat the credentials as valid

credentials = CredentialsWrapper(token=LITELLM_PROXY_API_KEY)

vertexai.init(
    project="adroit-crow-413218",
    location="us-central1",
    api_endpoint=LITELLM_PROXY_BASE,
    credentials=credentials,
    api_transport="rest",
)

model = GenerativeModel("gemini-1.5-flash-001")

response = model.generate_content(
    "What's a good name for a flower shop that specializes in selling bouquets of dried flowers?"
)

print(response.text)

Usage Examples

Gemini API (Generate Content)

Vertex Python SDK (client-side Vertex credentials)
Vertex Python SDK (litellm virtual key, client side)
Curl

import vertexai
from vertexai.generative_models import GenerativeModel

LITELLM_PROXY_API_KEY = "sk-1234"
LITELLM_PROXY_BASE = "http://0.0.0.0:4000/vertex-ai"

vertexai.init(
    project="adroit-crow-413218",
    location="us-central1",
    api_endpoint=LITELLM_PROXY_BASE,
    api_transport="rest",
)

model = GenerativeModel("gemini-1.5-flash-001")

response = model.generate_content(
    "What's a good name for a flower shop that specializes in selling bouquets of dried flowers?"
)

print(response.text)

import vertexai
from google.auth.credentials import Credentials
from vertexai.generative_models import GenerativeModel

LITELLM_PROXY_API_KEY = "sk-1234"
LITELLM_PROXY_BASE = "http://0.0.0.0:4000/vertex-ai"

import datetime


class CredentialsWrapper(Credentials):
    def __init__(self, token=None):
        super().__init__()
        self.token = token
        self.expiry = None  # or set to a future date if needed

    def refresh(self, request):
        pass

    def apply(self, headers, token=None):
        headers["Authorization"] = f"Bearer {self.token}"

    @property
    def expired(self):
        return False  # always treat the token as non-expired

    @property
    def valid(self):
        return True  # always treat the credentials as valid


credentials = CredentialsWrapper(token=LITELLM_PROXY_API_KEY)

vertexai.init(
    project="adroit-crow-413218",
    location="us-central1",
    api_endpoint=LITELLM_PROXY_BASE,
    credentials=credentials,
    api_transport="rest",
)

model = GenerativeModel("gemini-1.5-flash-001")

response = model.generate_content(
    "What's a good name for a flower shop that specializes in selling bouquets of dried flowers?"
)

print(response.text)

curl http://localhost:4000/vertex-ai/publishers/google/models/gemini-1.5-flash-001:generateContent \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{"contents":[{"role": "user", "parts":[{"text": "hi"}]}]}'
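
Where installing the Vertex SDK is not an option, the curl call above can be reproduced with Python's standard library. The following is a minimal sketch rather than an official LiteLLM client: the helper name `build_generate_content_request` is hypothetical, and the base URL, key, and model are the placeholder values used elsewhere on this page.

```python
import json
import urllib.request


def build_generate_content_request(
    proxy_base: str, api_key: str, model: str, text: str
) -> urllib.request.Request:
    """Build (but do not send) a generateContent request for the proxy."""
    url = f"{proxy_base}/vertex-ai/publishers/google/models/{model}:generateContent"
    payload = {"contents": [{"role": "user", "parts": [{"text": text}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The LiteLLM virtual key goes in the bearer token, as in the curl call.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_generate_content_request(
    "http://localhost:4000", "sk-1234", "gemini-1.5-flash-001", "hi"
)
print(request.get_method(), request.full_url)

# Sending requires a running proxy, so it is left commented out:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```

Building the request separately from sending it makes the URL and headers easy to inspect before any traffic reaches the proxy.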

Embeddings API

Vertex Python SDK (client-side Vertex credentials)
Vertex Python SDK (litellm virtual key, client side)
Curl

from typing import List, Optional
from vertexai.language_models import TextEmbeddingInput, TextEmbeddingModel
import vertexai
from vertexai.generative_models import GenerativeModel

LITELLM_PROXY_API_KEY = "sk-1234"
LITELLM_PROXY_BASE = "http://0.0.0.0:4000/vertex-ai"

import datetime

vertexai.init(
    project="adroit-crow-413218",
    location="us-central1",
    api_endpoint=LITELLM_PROXY_BASE,
    api_transport="rest",
)


def embed_text(
    texts: List[str] = ["banana muffins?", "banana bread? banana muffins?"],
    task: str = "RETRIEVAL_DOCUMENT",
    model_name: str = "text-embedding-004",
    dimensionality: Optional[int] = 256,
) -> List[List[float]]:
    """Embed texts with a pre-trained foundation model."""
    model = TextEmbeddingModel.from_pretrained(model_name)
    inputs = [TextEmbeddingInput(text, task) for text in texts]
    kwargs = dict(output_dimensionality=dimensionality) if dimensionality else {}
    embeddings = model.get_embeddings(inputs, **kwargs)
    return [embedding.values for embedding in embeddings]

from typing import List, Optional
from vertexai.language_models import TextEmbeddingInput, TextEmbeddingModel
import vertexai
from google.auth.credentials import Credentials
from vertexai.generative_models import GenerativeModel

LITELLM_PROXY_API_KEY = "sk-1234"
LITELLM_PROXY_BASE = "http://0.0.0.0:4000/vertex-ai"

import datetime


class CredentialsWrapper(Credentials):
    def __init__(self, token=None):
        super().__init__()
        self.token = token
        self.expiry = None  # or set to a future date if needed

    def refresh(self, request):
        pass

    def apply(self, headers, token=None):
        headers["Authorization"] = f"Bearer {self.token}"

    @property
    def expired(self):
        return False  # always treat the token as non-expired

    @property
    def valid(self):
        return True  # always treat the credentials as valid


credentials = CredentialsWrapper(token=LITELLM_PROXY_API_KEY)

vertexai.init(
    project="adroit-crow-413218",
    location="us-central1",
    api_endpoint=LITELLM_PROXY_BASE,
    credentials=credentials,
    api_transport="rest",
)


def embed_text(
    texts: List[str] = ["banana muffins?", "banana bread? banana muffins?"],
    task: str = "RETRIEVAL_DOCUMENT",
    model_name: str = "text-embedding-004",
    dimensionality: Optional[int] = 256,
) -> List[List[float]]:
    """Embed texts with a pre-trained foundation model."""
    model = TextEmbeddingModel.from_pretrained(model_name)
    inputs = [TextEmbeddingInput(text, task) for text in texts]
    kwargs = dict(output_dimensionality=dimensionality) if dimensionality else {}
    embeddings = model.get_embeddings(inputs, **kwargs)
    return [embedding.values for embedding in embeddings]

curl http://localhost:4000/vertex-ai/publishers/google/models/textembedding-gecko@001:predict \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{"instances":[{"content": "gm"}]}'
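
The curl call above returns raw JSON rather than SDK objects, so the vectors have to be extracted by hand. The sketch below assumes the body follows the Vertex AI text-embedding `:predict` shape, with each prediction holding an `embeddings.values` list. That shape is an assumption to check against your own responses, and `extract_embedding_values` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def extract_embedding_values(response_body: Dict[str, Any]) -> List[List[float]]:
    """Pull the raw vectors out of a text-embedding `:predict` response."""
    vectors = []
    for prediction in response_body.get("predictions", []):
        # Assumed shape: {"embeddings": {"values": [...], "statistics": {...}}}
        vectors.append(prediction["embeddings"]["values"])
    return vectors


# Hand-written sample body for illustration only; real vectors are much longer.
sample_body = {
    "predictions": [
        {"embeddings": {"values": [0.1, -0.2, 0.3], "statistics": {"token_count": 1}}}
    ]
}
print(extract_embedding_values(sample_body))
```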

Imagen API

Vertex Python SDK (client-side Vertex credentials)
Vertex Python SDK (litellm virtual key, client side)
Curl

from typing import List, Optional
from vertexai.preview.vision_models import ImageGenerationModel
import vertexai
from google.auth.credentials import Credentials

LITELLM_PROXY_API_KEY = "sk-1234"
LITELLM_PROXY_BASE = "http://0.0.0.0:4000/vertex-ai"

import datetime

vertexai.init(
    project="adroit-crow-413218",
    location="us-central1",
    api_endpoint=LITELLM_PROXY_BASE,
    api_transport="rest",
)

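model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-001")

# Example values; replace with your own prompt and output path.
prompt = "make an otter"
output_file = "output.png"

images = model.generate_images(
    prompt=prompt,
    # Optional parameters
    number_of_images=1,
    language="en",
    # You can't use a seed value and a watermark at the same time.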
    # add_watermark=False,
    # seed=100,
    aspect_ratio="1:1",
    safety_filter_level="block_some",
    person_generation="allow_adult",
)

images[0].save(location=output_file, include_generation_parameters=False)

# Optional. View the generated image in a notebook.
# images[0].show()

print(f"Created output image using {len(images[0]._image_bytes)} bytes")

from typing import List, Optional
from vertexai.preview.vision_models import ImageGenerationModel
import vertexai
from google.auth.credentials import Credentials

LITELLM_PROXY_API_KEY = "sk-1234"
LITELLM_PROXY_BASE = "http://0.0.0.0:4000/vertex-ai"

import datetime


class CredentialsWrapper(Credentials):
    def __init__(self, token=None):
        super().__init__()
        self.token = token
        self.expiry = None  # or set to a future date if needed

    def refresh(self, request):
        pass

    def apply(self, headers, token=None):
        headers["Authorization"] = f"Bearer {self.token}"

    @property
    def expired(self):
        return False  # always treat the token as non-expired

    @property
    def valid(self):
        return True  # always treat the credentials as valid


credentials = CredentialsWrapper(token=LITELLM_PROXY_API_KEY)

vertexai.init(
    project="adroit-crow-413218",
    location="us-central1",
    api_endpoint=LITELLM_PROXY_BASE,
    credentials=credentials,
    api_transport="rest",
)

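model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-001")

# Example values; replace with your own prompt and output path.
prompt = "make an otter"
output_file = "output.png"

images = model.generate_images(
    prompt=prompt,
    # Optional parameters
    number_of_images=1,
    language="en",
    # You can't use a seed value and a watermark at the same time.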
    # add_watermark=False,
    # seed=100,
    aspect_ratio="1:1",
    safety_filter_level="block_some",
    person_generation="allow_adult",
)

images[0].save(location=output_file, include_generation_parameters=False)

# Optional. View the generated image in a notebook.
# images[0].show()

print(f"Created output image using {len(images[0]._image_bytes)} bytes")

curl http://localhost:4000/vertex-ai/publishers/google/models/imagen-3.0-generate-001:predict \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{"instances":[{"prompt": "make an otter"}], "parameters": {"sampleCount": 1}}'

CountTokens API

Vertex Python SDK (client-side Vertex credentials)
Vertex Python SDK (litellm virtual key, client side)
Curl

from typing import List, Optional
from vertexai.generative_models import GenerativeModel
import vertexai

LITELLM_PROXY_API_KEY = "sk-1234"
LITELLM_PROXY_BASE = "http://0.0.0.0:4000/vertex-ai"

import datetime

vertexai.init(
    project="adroit-crow-413218",
    location="us-central1",
    api_endpoint=LITELLM_PROXY_BASE,
    api_transport="rest",
)


model = GenerativeModel("gemini-1.5-flash-001")

prompt = "Why is the sky blue?"

# Prompt token count
response = model.count_tokens(prompt)
print(f"Prompt Token Count: {response.total_tokens}")
print(f"Prompt Character Count: {response.total_billable_characters}")

# Send the text to Gemini
response = model.generate_content(prompt)

# Response token count
usage_metadata = response.usage_metadata
print(f"Prompt Token Count: {usage_metadata.prompt_token_count}")
print(f"Candidates Token Count: {usage_metadata.candidates_token_count}")
print(f"Total Token Count: {usage_metadata.total_token_count}")

from typing import List, Optional
from vertexai.generative_models import GenerativeModel
import vertexai
from google.auth.credentials import Credentials

LITELLM_PROXY_API_KEY = "sk-1234"
LITELLM_PROXY_BASE = "http://0.0.0.0:4000/vertex-ai"

import datetime


class CredentialsWrapper(Credentials):
    def __init__(self, token=None):
        super().__init__()
        self.token = token
        self.expiry = None  # or set to a future date if needed

    def refresh(self, request):
        pass

    def apply(self, headers, token=None):
        headers["Authorization"] = f"Bearer {self.token}"

    @property
    def expired(self):
        return False  # always treat the token as non-expired

    @property
    def valid(self):
        return True  # always treat the credentials as valid


credentials = CredentialsWrapper(token=LITELLM_PROXY_API_KEY)

vertexai.init(
    project="adroit-crow-413218",
    location="us-central1",
    api_endpoint=LITELLM_PROXY_BASE,
    credentials=credentials,
    api_transport="rest",
)


model = GenerativeModel("gemini-1.5-flash-001")

prompt = "Why is the sky blue?"

# Prompt token count
response = model.count_tokens(prompt)
print(f"Prompt Token Count: {response.total_tokens}")
print(f"Prompt Character Count: {response.total_billable_characters}")

# Send the text to Gemini
response = model.generate_content(prompt)

# Response token count
usage_metadata = response.usage_metadata
print(f"Prompt Token Count: {usage_metadata.prompt_token_count}")
print(f"Candidates Token Count: {usage_metadata.candidates_token_count}")
print(f"Total Token Count: {usage_metadata.total_token_count}")

curl http://localhost:4000/vertex-ai/publishers/google/models/gemini-1.5-flash-001:countTokens \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{"contents":[{"role": "user", "parts":[{"text": "hi"}]}]}'
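
One use for the countTokens route is a pre-flight check before sending a large prompt. The sketch below assumes the JSON body carries a `totalTokens` field, as the Vertex AI countTokens REST response does. The helper name `fits_token_budget` and the budget value are illustrative assumptions.

```python
from typing import Any, Dict


def fits_token_budget(count_tokens_body: Dict[str, Any], max_prompt_tokens: int) -> bool:
    """Return True when the counted prompt tokens are within the budget."""
    # Assumed field name from the countTokens REST response.
    total = int(count_tokens_body.get("totalTokens", 0))
    return total <= max_prompt_tokens


# Hand-written sample body for illustration only.
sample_count = {"totalTokens": 6, "totalBillableCharacters": 17}
print(fits_token_budget(sample_count, max_prompt_tokens=8192))
```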

Tuning API

Create a fine-tuning job

Vertex Python SDK (client-side Vertex credentials)
Vertex Python SDK (litellm virtual key, client side)
Curl

import time

import vertexai
from vertexai.preview.tuning import sft

LITELLM_PROXY_API_KEY = "sk-1234"
LITELLM_PROXY_BASE = "http://0.0.0.0:4000/vertex-ai"


vertexai.init(
    project="adroit-crow-413218",
    location="us-central1",
    api_endpoint=LITELLM_PROXY_BASE,
    api_transport="rest",
)


# TODO(developer): set your own project ID and location in the vertexai.init() call above.

sft_tuning_job = sft.train(
    source_model="gemini-1.0-pro-002",
    train_dataset="gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl",
)

# Poll until the job completes
while not sft_tuning_job.has_ended:
    time.sleep(60)
    sft_tuning_job.refresh()

print(sft_tuning_job.tuned_model_name)
print(sft_tuning_job.tuned_model_endpoint_name)
print(sft_tuning_job.experiment)

from typing import List, Optional
from vertexai.preview.tuning import sft
import vertexai
from google.auth.credentials import Credentials

LITELLM_PROXY_API_KEY = "sk-1234"
LITELLM_PROXY_BASE = "http://0.0.0.0:4000/vertex-ai"

import time


class CredentialsWrapper(Credentials):
    def __init__(self, token=None):
        super().__init__()
        self.token = token
        self.expiry = None  # or set to a future date if needed

    def refresh(self, request):
        pass

    def apply(self, headers, token=None):
        headers["Authorization"] = f"Bearer {self.token}"

    @property
    def expired(self):
        return False  # always treat the token as non-expired

    @property
    def valid(self):
        return True  # always treat the credentials as valid


credentials = CredentialsWrapper(token=LITELLM_PROXY_API_KEY)

vertexai.init(
    project="adroit-crow-413218",
    location="us-central1",
    api_endpoint=LITELLM_PROXY_BASE,
    credentials=credentials,
    api_transport="rest",
)


# TODO(developer): set your own project ID and location in the vertexai.init() call above.

sft_tuning_job = sft.train(
    source_model="gemini-1.0-pro-002",
    train_dataset="gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl",
)

# Poll until the job completes
while not sft_tuning_job.has_ended:
    time.sleep(60)
    sft_tuning_job.refresh()

print(sft_tuning_job.tuned_model_name)
print(sft_tuning_job.tuned_model_endpoint_name)
print(sft_tuning_job.experiment)

curl http://localhost:4000/vertex-ai/tuningJobs \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer sk-1234" \
      -d '{
  "baseModel": "gemini-1.0-pro-002",
  "supervisedTuningSpec" : {
      "training_dataset_uri": "gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl"
  }
}'
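
The SDK examples above poll `has_ended` in a loop with no upper bound. A job created through the curl route needs a similar loop written by hand. The sketch below shows one way to bound it. `fetch_state` is a hypothetical callable that would wrap a GET request for the tuning job and return its `state` field. The state names come from the Vertex AI `JobState` enum and are assumed to apply to tuning jobs.

```python
import time
from typing import Callable

# Terminal states from the Vertex AI JobState enum (assumed to apply to tuning jobs).
TERMINAL_STATES = {"JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}


def wait_for_tuning_job(
    fetch_state: Callable[[], str],
    poll_seconds: float = 60.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches a terminal state, then return that state."""
    for _ in range(max_polls):
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)
    raise TimeoutError(f"job did not finish after {max_polls} polls")


# Simulated job for illustration: two in-progress polls, then success.
states = iter(["JOB_STATE_PENDING", "JOB_STATE_RUNNING", "JOB_STATE_SUCCEEDED"])
print(wait_for_tuning_job(lambda: next(states), sleep=lambda _: None))
```

Passing `sleep` as a parameter keeps the loop testable without real delays.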

Context Caching

Use Vertex AI context caching

Relevant VertexAI docs

LiteLLM Proxy

Add the model to config.yaml

model_list:
  # used for the /chat/completions, /completions, /embeddings endpoints
  - model_name: gemini-1.5-pro-001
    litellm_params:
      model: vertex_ai/gemini-1.5-pro-001
      vertex_project: "project-id"
      vertex_location: "us-central1"
      vertex_credentials: "adroit-crow-413218-a956eef1a2a8.json" # path to your service account .json file

# used for /cachedContent and the VertexAI native endpoints
default_vertex_config:
  vertex_project: "adroit-crow-413218"
  vertex_location: "us-central1"
  vertex_credentials: "adroit-crow-413218-a956eef1a2a8.json" # path to your service account .json file

Start the proxy

$ litellm --config /path/to/config.yaml

Make the request. It takes two steps:

Create a cachedContents object
Use the cachedContents object in your /chat/completions request

Create a cachedContents object

First, create a cachedContents object by calling the Vertex cachedContents endpoint. The LiteLLM proxy forwards the /cachedContents request to the VertexAI API.

import httpx

# Set LiteLLM proxy variables
LITELLM_BASE_URL = "http://0.0.0.0:4000"
LITELLM_PROXY_API_KEY = "sk-1234"

httpx_client = httpx.Client(timeout=30)

print("Creating cached content")
create_cache = httpx_client.post(
    url=f"{LITELLM_BASE_URL}/vertex-ai/cachedContents",
    headers={"Authorization": f"Bearer {LITELLM_PROXY_API_KEY}"},
    json={
        "model": "gemini-1.5-pro-001",
        "contents": [
            {
                "role": "user",
                "parts": [{
                    "text": "This is sample text to demonstrate explicit caching." * 4000
                }]
            }
        ],
    }
)

print("Response from create_cache:", create_cache)
create_cache_response = create_cache.json()
print("JSON from create_cache:", create_cache_response)
cached_content_name = create_cache_response["name"]
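
The snippet above reads `name` straight from the JSON body, which raises a bare `KeyError` if the proxy or Vertex AI returns an error payload. A minimal sketch of a guard follows. The helper name `get_cached_content_name` is hypothetical, and the sample resource name is invented for illustration.

```python
from typing import Any, Dict


def get_cached_content_name(status_code: int, body: Dict[str, Any]) -> str:
    """Return the cachedContents resource name or raise a descriptive error."""
    if status_code >= 400:
        raise RuntimeError(f"cachedContents request failed ({status_code}): {body}")
    name = body.get("name")
    if not name:
        raise KeyError("response has no 'name' field; the cache was not created")
    return name


# Illustrative values only; the resource name below is made up.
ok_body = {"name": "projects/123/locations/us-central1/cachedContents/456"}
print(get_cached_content_name(200, ok_body))
```

With the httpx response from the previous snippet, this would be called as `get_cached_content_name(create_cache.status_code, create_cache.json())`.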

Use the cachedContents object in your /chat/completions request to VertexAI

import openai

# Set LiteLLM proxy variables
LITELLM_BASE_URL = "http://0.0.0.0:4000"
LITELLM_PROXY_API_KEY = "sk-1234"

client = openai.OpenAI(api_key=LITELLM_PROXY_API_KEY, base_url=LITELLM_BASE_URL)

response = client.chat.completions.create(
    model="gemini-1.5-pro-001",
    max_tokens=8192,
    messages=[
        {
            "role": "user",
            "content": "What is the sample text about?",
        },
    ],
    temperature=0.7,
    extra_body={"cached_content": cached_content_name},  # use the cached content created in the previous step
)

print("Response from proxy:", response)
