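Llama2 - Huggingface Tutorial

Huggingface is an open-source platform for deploying machine learning models.

Call Llama2 with Huggingface Inference Endpoints

LiteLLM makes it easy to call your public, private, or default Huggingface endpoints.

In this tutorial, we call 3 models: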

Model	Endpoint type
deepset/deberta-v3-large-squad2	Default Huggingface endpoint
meta-llama/Llama-2-7b-hf	Public endpoint
meta-llama/Llama-2-7b-chat-hf	Private endpoint

Case 1: Call the default Huggingface endpoint

Here's the complete example:

from litellm import completion

model = "deepset/deberta-v3-large-squad2"
messages = [{"role": "user", "content": "Hey, how's it going?"}]  # LiteLLM follows the OpenAI format

# Call the endpoint
response = completion(model=model, messages=messages, custom_llm_provider="huggingface")
print(response)

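What's happening?

model: the name of the model deployed on Huggingface.
messages: the input. We accept the OpenAI chat format. For Huggingface, by default we iterate over the list and append each message["content"] to the prompt.
custom_llm_provider: an optional flag, needed only for Azure, Replicate, Huggingface, and Together-ai (platforms where you deploy your own models). It lets litellm pick the correct provider for your model.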
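
The default message handling described above can be pictured with a small sketch. This is not LiteLLM's actual implementation: the helper name messages_to_prompt and the choice to join contents with no separator are assumptions made purely for illustration.

```python
from typing import Dict, List


def messages_to_prompt(messages: List[Dict[str, str]]) -> str:
    """Hypothetical helper: flatten OpenAI-style chat messages into one prompt."""
    prompt = ""
    for message in messages:
        # Only the "content" field is used; the "role" is ignored in this sketch.
        prompt += message["content"]
    return prompt


messages = [
    {"role": "system", "content": "You are a helpful assistant. "},
    {"role": "user", "content": "Hey, how's it going?"},
]
print(messages_to_prompt(messages))
```

Because roles are dropped in this sketch, any turn structure is lost in the resulting prompt, which may matter for chat-tuned models.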

Case 2: Call the Llama2 public Huggingface endpoint

We've deployed meta-llama/Llama-2-7b-hf behind a public endpoint: https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud.

Let's try it out:

from litellm import completion

model = "meta-llama/Llama-2-7b-hf"
messages = [{"role": "user", "content": "Hey, how's it going?"}]  # LiteLLM follows the OpenAI format
api_base = "https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud"

# Call the endpoint
response = completion(model=model, messages=messages, custom_llm_provider="huggingface", api_base=api_base)
print(response)

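What's happening?

api_base: optional parameter. Since this call uses a deployed endpoint (not the default Huggingface inference endpoint), we pass the endpoint URL to LiteLLM.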
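
One way to picture the role of api_base is as an override for the request URL. The sketch below is illustrative only and is not taken from LiteLLM: the helper name resolve_completion_url is hypothetical, and the default base URL is an assumption about where the hosted Huggingface Inference API serves models.

```python
from typing import Optional

# Assumption: the hosted Inference API serves models under this base URL.
DEFAULT_HF_API_BASE = "https://api-inference.huggingface.co/models"


def resolve_completion_url(model: str, api_base: Optional[str] = None) -> str:
    """Hypothetical helper: choose the URL a request would be sent to."""
    if api_base:
        # A dedicated endpoint already points at one deployed model.
        return api_base
    # Otherwise fall back to the default endpoint, addressed by model name.
    return f"{DEFAULT_HF_API_BASE}/{model}"


print(resolve_completion_url("deepset/deberta-v3-large-squad2"))
print(resolve_completion_url(
    "meta-llama/Llama-2-7b-hf",
    api_base="https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud",
))
```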

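Case 3: Call the Llama2 private Huggingface endpoint

The only difference from the public endpoint is that you need an api_key.

LiteLLM gives you three ways to pass the api_key:

set it as an environment variable, set it as a package variable, or pass it when calling completion().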
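
To make the three options concrete, here is a minimal sketch of how such a lookup could work. The precedence shown (explicit argument first, then package variable, then environment variable) is an assumption for illustration and is not confirmed by this tutorial; the helper resolve_api_key is hypothetical and is not part of LiteLLM.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    api_key: Optional[str] = None,
    package_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical lookup: explicit argument, package variable, then HF_TOKEN."""
    env = os.environ if env is None else env
    # Assumed precedence: the most specific source wins.
    for candidate in (api_key, package_key, env.get("HF_TOKEN")):
        if candidate:
            return candidate
    return None


fake_env = {"HF_TOKEN": "key-from-env"}
print(resolve_api_key(env=fake_env))
print(resolve_api_key(package_key="key-from-package", env=fake_env))
print(resolve_api_key(api_key="key-from-call", package_key="key-from-package", env=fake_env))
```

Passing env explicitly keeps the sketch easy to test without touching the real process environment.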

Setting it as an environment variable
Here's the line of code you need to add:

os.environ["HF_TOKEN"] = "..."

Here's the full code:

import os

from litellm import completion

os.environ["HF_TOKEN"] = "..."  # your Huggingface API key

model = "meta-llama/Llama-2-7b-hf"
messages = [{"role": "user", "content": "Hey, how's it going?"}]  # LiteLLM follows the OpenAI format
api_base = "https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud"

# Call the endpoint
response = completion(model=model, messages=messages, custom_llm_provider="huggingface", api_base=api_base)
print(response)

Setting it as a package variable
Here's the line of code you need to add:

litellm.huggingface_key = "..."

Here's the full code:

import litellm
from litellm import completion

litellm.huggingface_key = "..."  # your Huggingface API key

model = "meta-llama/Llama-2-7b-hf"
messages = [{"role": "user", "content": "Hey, how's it going?"}]  # LiteLLM follows the OpenAI format
api_base = "https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud"

# Call the endpoint
response = completion(model=model, messages=messages, custom_llm_provider="huggingface", api_base=api_base)
print(response)

Passing it in the completion call

completion(..., api_key="...")

Here's the full code:

from litellm import completion

model = "meta-llama/Llama-2-7b-hf"
messages = [{"role": "user", "content": "Hey, how's it going?"}]  # LiteLLM follows the OpenAI format
api_base = "https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud"

# Call the endpoint
response = completion(model=model, messages=messages, custom_llm_provider="huggingface", api_base=api_base, api_key="...")
print(response)
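
The three cases differ only in which optional arguments are passed, so they can be summarized with one small helper. This is a sketch, not part of LiteLLM: build_hf_kwargs is a hypothetical name, and it only assembles the keyword arguments used in the examples in this tutorial.

```python
from typing import Any, Dict, List, Optional


def build_hf_kwargs(
    model: str,
    messages: List[Dict[str, str]],
    api_base: Optional[str] = None,
    api_key: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for a Huggingface call."""
    kwargs: Dict[str, Any] = {
        "model": model,
        "messages": messages,
        "custom_llm_provider": "huggingface",
    }
    if api_base is not None:
        # Cases 2 and 3: a deployed (public or private) endpoint.
        kwargs["api_base"] = api_base
    if api_key is not None:
        # Case 3: a private endpoint also needs a key.
        kwargs["api_key"] = api_key
    return kwargs


messages = [{"role": "user", "content": "Hey, how's it going?"}]
default_call = build_hf_kwargs("deepset/deberta-v3-large-squad2", messages)
print(sorted(default_call))
```

With litellm installed, the resulting dictionary could then be passed along as completion(**default_call).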
