# Replicate

LiteLLM supports all models on Replicate.
## Usage

### API KEYS

```python
import os
os.environ["REPLICATE_API_KEY"] = ""
```
### Example Call

```python
from litellm import completion
import os

## set ENV variables
os.environ["REPLICATE_API_KEY"] = "replicate key"

# replicate llama-3 call
response = completion(
    model="replicate/meta/meta-llama-3-8b-instruct",
    messages=[{"content": "Hello, how are you?", "role": "user"}]
)
```
### Add model to your config.yaml

```yaml
model_list:
  - model_name: llama-3
    litellm_params:
      model: replicate/meta/meta-llama-3-8b-instruct
      api_key: os.environ/REPLICATE_API_KEY
```
### Start the proxy

```shell
$ litellm --config /path/to/config.yaml --debug
```

### Send a request to the LiteLLM proxy server
```python
import openai

client = openai.OpenAI(
    api_key="sk-1234",  # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000"  # litellm-proxy-base url
)

response = client.chat.completions.create(
    model="llama-3",
    messages=[
        {
            "role": "system",
            "content": "Be a good human!"
        },
        {
            "role": "user",
            "content": "What do you know about earth?"
        }
    ]
)

print(response)
```

```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
    "model": "llama-3",
    "messages": [
        {
            "role": "system",
            "content": "Be a good human!"
        },
        {
            "role": "user",
            "content": "What do you know about earth?"
        }
    ]
}'
```
### Expected Replicate Call

This is the call litellm will make to Replicate from the example above:

```shell
POST request sent by LiteLLM:
curl -X POST \
https://api.replicate.com/v1/models/meta/meta-llama-3-8b-instruct \
-H 'Authorization: Token your-api-key' -H 'Content-Type: application/json' \
-d '{"version": "meta/meta-llama-3-8b-instruct", "input": {"prompt": "<|start_header_id|>system<|end_header_id|>\n\nBe a good human!<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nWhat do you know about earth?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"}}'
```
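The `prompt` string in that request follows the Llama-3 instruct chat format. As a rough illustration (not LiteLLM's internal implementation), the assembly looks like this: each message is wrapped in header/eot tokens, and an empty assistant header at the end invites the model to respond.

```python
# Illustrative sketch of Llama-3 instruct prompt assembly
# (not LiteLLM's actual code).

def llama3_prompt(messages):
    prompt = ""
    for m in messages:
        # wrap each message in its role header and end-of-turn token
        prompt += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # open an assistant turn for the model to complete
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

msgs = [
    {"role": "system", "content": "Be a good human!"},
    {"role": "user", "content": "What do you know about earth?"},
]
print(llama3_prompt(msgs))
```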
## Advanced Usage - Prompt Formatting

LiteLLM has prompt template mappings for all `meta-llama` llama3 instruct models. See code

To apply a custom prompt template:
```python
import litellm
import os

os.environ["REPLICATE_API_KEY"] = ""

# Create your own custom prompt template
litellm.register_prompt_template(
    model="togethercomputer/LLaMA-2-7B-32K",
    initial_prompt_value="You are a good assistant",  # [OPTIONAL]
    roles={
        "system": {
            "pre_message": "[INST] <<SYS>>\n",  # [OPTIONAL]
            "post_message": "\n<</SYS>>\n [/INST]\n"  # [OPTIONAL]
        },
        "user": {
            "pre_message": "[INST] ",  # [OPTIONAL]
            "post_message": " [/INST]"  # [OPTIONAL]
        },
        "assistant": {
            "pre_message": "\n",  # [OPTIONAL]
            "post_message": "\n"  # [OPTIONAL]
        }
    },
    final_prompt_value="Now answer as best you can:"  # [OPTIONAL]
)
```
```python
from litellm import completion

def test_replicate_custom_model():
    model = "replicate/togethercomputer/LLaMA-2-7B-32K"
    messages = [{"role": "user", "content": "Hello, how are you?"}]
    response = completion(model=model, messages=messages)
    print(response['choices'][0]['message']['content'])
    return response

test_replicate_custom_model()
```
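To see what a registered template like the one above produces, here is a minimal sketch of the composition (an illustration only, not LiteLLM's internal code): `initial_prompt_value` comes first, each message is wrapped in its role's `pre_message`/`post_message` markers, and `final_prompt_value` closes the prompt.

```python
# Hypothetical sketch of how the role templates compose a prompt.
# This is NOT LiteLLM's internal implementation.

ROLES = {
    "system": {"pre_message": "[INST] <<SYS>>\n", "post_message": "\n<</SYS>>\n [/INST]\n"},
    "user": {"pre_message": "[INST] ", "post_message": " [/INST]"},
    "assistant": {"pre_message": "\n", "post_message": "\n"},
}

def format_prompt(messages, initial="You are a good assistant",
                  final="Now answer as best you can:"):
    parts = [initial]
    for m in messages:
        role = ROLES[m["role"]]
        parts.append(role["pre_message"] + m["content"] + role["post_message"])
    parts.append(final)
    return "".join(parts)

print(format_prompt([{"role": "user", "content": "Hello, how are you?"}]))
```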
## Model-Specific Params

```yaml
model_list:
  - model_name: mistral-7b # model alias
    litellm_params: # actual params for litellm.completion()
      model: "replicate/mistralai/Mistral-7B-Instruct-v0.1"
      api_key: os.environ/REPLICATE_API_KEY
      initial_prompt_value: "\n"
      roles: {"system":{"pre_message":"<|im_start|>system\n", "post_message":"<|im_end|>"}, "assistant":{"pre_message":"<|im_start|>assistant\n","post_message":"<|im_end|>"}, "user":{"pre_message":"<|im_start|>user\n","post_message":"<|im_end|>"}}
      final_prompt_value: "\n"
      bos_token: "<s>"
      eos_token: "</s>"
      max_tokens: 4096
```
## Advanced Usage - Calling Replicate Deployments

Calling a deployed Replicate LLM:

Add the `replicate/deployments/` prefix to your model, so litellm will call the `deployments` endpoint. This will call the `ishaan-jaff/ishaan-mistral` deployment on Replicate.
```python
response = completion(
    model="replicate/deployments/ishaan-jaff/ishaan-mistral",
    messages=[{"content": "Hello, how are you?", "role": "user"}]
)
```
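As a rough sketch of how the prefix could steer the request, the snippet below builds the target URL from the model string. The endpoint shapes are assumptions based on Replicate's `/v1/models` and `/v1/deployments` API paths; this is not LiteLLM's actual routing code.

```python
# Hypothetical sketch of prefix-based endpoint selection
# (not LiteLLM's actual routing code; endpoint shapes are assumed).

BASE = "https://api.replicate.com/v1"

def replicate_endpoint(model: str) -> str:
    # litellm-style model strings carry the "replicate/" provider prefix
    name = model.removeprefix("replicate/")
    if name.startswith("deployments/"):
        # replicate/deployments/<owner>/<name> -> deployments endpoint
        return f"{BASE}/{name}/predictions"
    return f"{BASE}/models/{name}/predictions"

print(replicate_endpoint("replicate/deployments/ishaan-jaff/ishaan-mistral"))
print(replicate_endpoint("replicate/meta/meta-llama-3-8b-instruct"))
```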
### Replicate Cold Boots

Responses can take 3-5 minutes due to Replicate cold boots; if you're trying to debug, make the request with `litellm.set_verbose=True`. More info on Replicate cold boots.
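Since a cold boot can take minutes, it can also help to retry failed calls with backoff. The helper below is our own illustration (not a LiteLLM API); you would pass it a closure around `completion(...)`.

```python
import time

def call_with_retries(fn, max_attempts=3, base_delay=2.0):
    # Retry a callable with exponential backoff; useful while a
    # Replicate model is cold-booting. Illustrative helper only,
    # not part of LiteLLM.
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * (2 ** attempt))

# usage (sketch): call_with_retries(lambda: completion(model=..., messages=...))
```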
## Replicate Models

liteLLM supports all of Replicate's LLMs.

For Replicate models, make sure to add a `replicate/` prefix to the `model` parameter. liteLLM detects the provider from this prefix.

Below are examples of calling Replicate LLMs with liteLLM:
| Model Name | Function Call | Required OS Variables |
|---|---|---|
| replicate/llama-2-70b-chat | completion(model='replicate/llama-2-70b-chat:2796ee9483c3fd7aa2e171d38f4ca12251a30609463dcfd4cd76703f22e96cdf', messages) | os.environ['REPLICATE_API_KEY'] |
| a16z-infra/llama-2-13b-chat | completion(model='replicate/a16z-infra/llama-2-13b-chat:2a7f981751ec7fdf87b5b91ad4db53683a98082e9ff7bfd12c8cd5ea85980a52', messages) | os.environ['REPLICATE_API_KEY'] |
| replicate/vicuna-13b | completion(model='replicate/vicuna-13b:6282abe6a492de4145d7bb601023762212f9ddbbe78278bd6771c8b3b2f2a13b', messages) | os.environ['REPLICATE_API_KEY'] |
| daanelson/flan-t5-large | completion(model='replicate/daanelson/flan-t5-large:ce962b3f6792a57074a601d3979db5839697add2e4e02696b3ced4c022d4767f', messages) | os.environ['REPLICATE_API_KEY'] |
| custom-llm | completion(model='replicate/custom-llm-version-id', messages) | os.environ['REPLICATE_API_KEY'] |
| replicate deployment | completion(model='replicate/deployments/ishaan-jaff/ishaan-mistral', messages) | os.environ['REPLICATE_API_KEY'] |
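The provider detection mentioned above can be sketched minimally: split the model string on its first `/` and treat the leading segment as the provider. This is an illustration, not LiteLLM's actual parsing code.

```python
# Minimal sketch of provider detection via the model-string prefix
# (an illustration, not LiteLLM's actual parsing code).

def split_provider(model: str):
    provider, _, model_id = model.partition("/")
    # no "/" means there is no provider prefix to detect
    return (provider, model_id) if model_id else (None, model)

print(split_provider("replicate/custom-llm-version-id"))
```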
## Passing additional params - max_tokens, temperature

See all litellm.completion supported params here
```python
# !pip install litellm
from litellm import completion
import os

## set ENV variables
os.environ["REPLICATE_API_KEY"] = "replicate key"

# replicate llama-2 call
response = completion(
    model="replicate/llama-2-70b-chat:2796ee9483c3fd7aa2e171d38f4ca12251a30609463dcfd4cd76703f22e96cdf",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    max_tokens=20,
    temperature=0.5
)
```
### Proxy

```yaml
model_list:
  - model_name: llama-3
    litellm_params:
      model: replicate/meta/meta-llama-3-8b-instruct
      api_key: os.environ/REPLICATE_API_KEY
      max_tokens: 20
      temperature: 0.5
```
## Passing Replicate-specific params

Send params that are not supported by litellm.completion() but are supported by Replicate by passing them directly to litellm.completion.

Example: `seed` and `min_tokens` are Replicate-specific params.
```python
# !pip install litellm
from litellm import completion
import os

## set ENV variables
os.environ["REPLICATE_API_KEY"] = "replicate key"

# replicate llama-2 call
response = completion(
    model="replicate/llama-2-70b-chat:2796ee9483c3fd7aa2e171d38f4ca12251a30609463dcfd4cd76703f22e96cdf",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    seed=-1,
    min_tokens=2,
    top_k=20,
)
```
### Proxy

```yaml
model_list:
  - model_name: llama-3
    litellm_params:
      model: replicate/meta/meta-llama-3-8b-instruct
      api_key: os.environ/REPLICATE_API_KEY
      min_tokens: 2
      top_k: 20
```