# AWS Sagemaker

LiteLLM supports all Sagemaker Huggingface Jumpstart models.

We support ALL Sagemaker models. Just set `model=sagemaker/<any-model-on-sagemaker>` as a prefix when sending litellm requests.
### API KEYS

```python
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""
```
### Usage

```python
import os
from litellm import completion

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""

response = completion(
    model="sagemaker/<your-endpoint-name>",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    temperature=0.2,
    max_tokens=80,
)
```
### Usage - Streaming

Sagemaker does not currently support streaming, so LiteLLM simulates streaming by returning the response string in chunks.

```python
import os
from litellm import completion

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""

response = completion(
    model="sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    temperature=0.2,
    max_tokens=80,
    stream=True,
)

for chunk in response:
    print(chunk)
```
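To reassemble the full response from the simulated stream, concatenate the chunk deltas instead of printing each chunk. A minimal sketch, assuming LiteLLM's OpenAI-compatible chunk shape (`choices[0].delta.content`, which may be `None` on the final chunk):

```python
# Replaces the print loop above: collect the simulated stream into one string.
full_text = ""
for chunk in response:
    delta = chunk.choices[0].delta.content  # may be None on the final chunk
    if delta is not None:
        full_text += delta
print(full_text)
```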
## LiteLLM Proxy Usage

Here's how to call Sagemaker with the LiteLLM Proxy Server.

### 1. Setup config.yaml

```yaml
model_list:
  - model_name: jumpstart-model
    litellm_params:
      model: sagemaker/jumpstart-dft-hf-textgeneration1-mp-20240815-185614
      aws_access_key_id: os.environ/CUSTOM_AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/CUSTOM_AWS_SECRET_ACCESS_KEY
      aws_region_name: os.environ/CUSTOM_AWS_REGION_NAME
```
All possible auth params:

```python
aws_access_key_id: Optional[str],
aws_secret_access_key: Optional[str],
aws_session_token: Optional[str],
aws_region_name: Optional[str],
aws_session_name: Optional[str],
aws_profile_name: Optional[str],
aws_role_name: Optional[str],
aws_web_identity_token: Optional[str],
```
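For example, if you authenticate by assuming an IAM role rather than with static keys, a config.yaml entry might look like the sketch below (the role ARN and session name are placeholders, not values from this doc):

```yaml
model_list:
  - model_name: jumpstart-model
    litellm_params:
      model: sagemaker/jumpstart-dft-hf-textgeneration1-mp-20240815-185614
      aws_role_name: arn:aws:iam::<account-id>:role/<role-name>  # hypothetical role ARN
      aws_session_name: my-litellm-session                       # hypothetical session name
      aws_region_name: os.environ/AWS_REGION_NAME
```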
### 2. Start the proxy

```bash
litellm --config /path/to/config.yaml
```
### 3. Test it

**Curl Request**
```bash
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "jumpstart-model",
    "messages": [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ]
}'
```
**OpenAI v1.0.0+**

```python
import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

response = client.chat.completions.create(
    model="jumpstart-model",
    messages=[
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ],
)

print(response)
```
**Langchain**

```python
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)
from langchain.schema import HumanMessage, SystemMessage

chat = ChatOpenAI(
    openai_api_base="http://0.0.0.0:4000",  # set openai_api_base to the LiteLLM proxy
    model="jumpstart-model",
    temperature=0.1,
)

messages = [
    SystemMessage(
        content="You are a helpful assistant that I'm using to make a test request to."
    ),
    HumanMessage(
        content="test from litellm. tell me why it's amazing in 1 sentence"
    ),
]
response = chat(messages)

print(response)
```
## Set temperature, top p, etc.

**SDK**
```python
import os
from litellm import completion

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""

response = completion(
    model="sagemaker/jumpstart-dft-hf-textgeneration1-mp-20240815-185614",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    temperature=0.7,
    top_p=1,
)
```
**PROXY**

**Set on yaml**

```yaml
model_list:
  - model_name: jumpstart-model
    litellm_params:
      model: sagemaker/jumpstart-dft-hf-textgeneration1-mp-20240815-185614
      temperature: <your-temp>
      top_p: <your-top-p>
```
**Set on request**

```python
import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(
    model="jumpstart-model",
    messages=[
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ],
    temperature=0.7,
    top_p=1,
)

print(response)
```
## Allow setting temperature=0 for Sagemaker

By default, when a request with temperature=0 is sent to LiteLLM, LiteLLM rounds it up to temperature=0.1, since Sagemaker fails most requests when temperature=0.

If you want to send temperature=0 for your model, here's how to set it up (since Sagemaker can host any kind of model, some models allow zero temperature):

**SDK**
```python
import os
from litellm import completion

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""

response = completion(
    model="sagemaker/jumpstart-dft-hf-textgeneration1-mp-20240815-185614",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    temperature=0,
    aws_sagemaker_allow_zero_temp=True,
)
```
**PROXY**

**Set `aws_sagemaker_allow_zero_temp` on yaml**

```yaml
model_list:
  - model_name: jumpstart-model
    litellm_params:
      model: sagemaker/jumpstart-dft-hf-textgeneration1-mp-20240815-185614
      aws_sagemaker_allow_zero_temp: true
```
**Set temperature=0 on request**

```python
import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(
    model="jumpstart-model",
    messages=[
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ],
    temperature=0,
)

print(response)
```
## Pass provider-specific params

If you pass a non-OpenAI param to litellm, we'll assume it's provider-specific and send it as a kwarg in the request body.

**SDK**
```python
import os
from litellm import completion

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""

response = completion(
    model="sagemaker/jumpstart-dft-hf-textgeneration1-mp-20240815-185614",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    top_k=1  # 👈 PROVIDER-SPECIFIC PARAM
)
```
**PROXY**

**Set on yaml**

```yaml
model_list:
  - model_name: jumpstart-model
    litellm_params:
      model: sagemaker/jumpstart-dft-hf-textgeneration1-mp-20240815-185614
      top_k: 1  # 👈 PROVIDER-SPECIFIC PARAM
```
**Set on request**

```python
import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(
    model="jumpstart-model",
    messages=[
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ],
    temperature=0.7,
    extra_body={
        "top_k": 1  # 👈 PROVIDER-SPECIFIC PARAM
    },
)

print(response)
```
## Pass inference component name

If you have multiple models on an endpoint, you'll need to specify the individual model name via `model_id`.

```python
import os
from litellm import completion

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""

response = completion(
    model="sagemaker/<your-endpoint-name>",
    model_id="<your-model-name>",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    temperature=0.2,
    max_tokens=80,
)
```
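On the proxy, the same parameter can live in `litellm_params` alongside the model string. A minimal sketch, assuming `model_id` is passed through like the other litellm params (the alias, endpoint, and component names are placeholders):

```yaml
model_list:
  - model_name: my-component-model            # hypothetical alias
    litellm_params:
      model: sagemaker/<your-endpoint-name>
      model_id: <your-model-name>             # inference component on the shared endpoint
```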
## Passing credentials as parameters - Completion()

Pass AWS credentials as parameters to litellm.completion.

```python
import os
from litellm import completion

response = completion(
    model="sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    aws_access_key_id="",
    aws_secret_access_key="",
    aws_region_name="",
)
```
## Applying Prompt Templates

To apply the correct prompt template for your Sagemaker deployment, pass in its hf model name as well.

```python
import os
from litellm import completion

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""

messages = [{"content": "Hello, how are you?", "role": "user"}]

response = completion(
    model="sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b",
    messages=messages,
    temperature=0.2,
    max_tokens=80,
    hf_model_name="meta-llama/Llama-2-7b",
)
```

You can also pass in your own custom prompt template.
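A minimal sketch of registering one, assuming LiteLLM's `litellm.register_prompt_template` API; the Llama-2-style tags below are illustrative, not taken from this doc:

```python
import litellm

# Register a custom chat template for this deployment (illustrative Llama-2-style tags).
litellm.register_prompt_template(
    model="sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b",
    roles={
        "system": {"pre_message": "[INST] <<SYS>>\n", "post_message": "\n<</SYS>>\n [/INST]\n"},
        "user": {"pre_message": "[INST] ", "post_message": " [/INST]\n"},
        "assistant": {"pre_message": "\n", "post_message": "\n"},
    },
)
```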
## Sagemaker Messages API

Use the route `sagemaker_chat/*` to route to the Sagemaker Messages API.

```
model: sagemaker_chat/<your-endpoint-name>
```

**SDK**
```python
import os
import litellm
from litellm import completion

litellm.set_verbose = True  # 👈 see raw request

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""

response = completion(
    model="sagemaker_chat/<your-endpoint-name>",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    temperature=0.2,
    max_tokens=80,
)
```
**PROXY**

**1. Setup config.yaml**

```yaml
model_list:
  - model_name: "sagemaker-model"
    litellm_params:
      model: "sagemaker_chat/jumpstart-dft-hf-textgeneration1-mp-20240815-185614"
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: os.environ/AWS_REGION_NAME
```
**2. Start the proxy**

```bash
litellm --config /path/to/config.yaml
```
**3. Test it**

```bash
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "sagemaker-model",
    "messages": [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ]
}'
```
### Completion Models

We support ALL Sagemaker models. Just set `model=sagemaker/<any-model-on-sagemaker>` as a prefix when sending litellm requests.

Here are some examples of using Sagemaker models with LiteLLM:
| Model Name | Function Call |
|---|---|
| Custom Huggingface Model | `completion(model='sagemaker/<your-deployment-name>', messages=messages)` |
| Meta Llama 2 7B | `completion(model='sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b', messages=messages)` |
| Meta Llama 2 7B (Chat/Fine-tuned) | `completion(model='sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b-f', messages=messages)` |
| Meta Llama 2 13B | `completion(model='sagemaker/jumpstart-dft-meta-textgeneration-llama-2-13b', messages=messages)` |
| Meta Llama 2 13B (Chat/Fine-tuned) | `completion(model='sagemaker/jumpstart-dft-meta-textgeneration-llama-2-13b-f', messages=messages)` |
| Meta Llama 2 70B | `completion(model='sagemaker/jumpstart-dft-meta-textgeneration-llama-2-70b', messages=messages)` |
| Meta Llama 2 70B (Chat/Fine-tuned) | `completion(model='sagemaker/jumpstart-dft-meta-textgeneration-llama-2-70b-b-f', messages=messages)` |
### Embedding Models

LiteLLM supports all Sagemaker Jumpstart Huggingface Embedding models. Here's how to call one:

```python
import os
import litellm

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""

response = litellm.embedding(
    model="sagemaker/<your-deployment-name>",
    input=["good morning from litellm", "this is another item"],
)
print(f"response: {response}")
```