
Batch Inference (Beta)

⚠️ This is a beta feature. The API and its behavior may change.

Use the Batch Inference API to process inference requests in bulk at a lower cost.

Prepare your dataset

Batch inference takes a dataset as input, where each row of the dataset is a prompt for the model. The input dataset must conform to the OpenAI batch inference format.

Here is a (pretty-printed) example of a single row from a dataset:

{
  "custom_id": "1",  # Each row in your dataset must have a unique string ID.
  "method": "POST",
  "url": "/v1/chat/completions",
  "body": {
    "model": "",  # Empty string means prompt the base model without any adapters applied.
    "messages": [
      {
        "role": "user",
        "content": "Formulate an equation to calculate the height of a triangle given the angle, side lengths and opposite side length."
      }
    ],
    "max_tokens": 1000
  }
}

To use an adapter for any given row, specify the adapter ID in the model field as shown above. The base model for the inference job is specified later. All other parameters behave as described for the LoRAX OpenAI-compatible API.

Once your JSONL file is ready, you can upload it to Predibase. You're now ready to run batch inference!
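The row format above can be produced programmatically. The sketch below builds a small JSONL dataset file using only the Python standard library; the prompts and the output filename are illustrative placeholders.

```python
import json

# Hypothetical prompts to batch; replace with your own data.
prompts = [
    "Formulate an equation to calculate the height of a triangle.",
    "Summarize the water cycle in two sentences.",
]

with open("my_inference_dataset.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        row = {
            "custom_id": str(i),  # must be a unique string per row
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "",  # "" = base model; or set an adapter ID
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 1000,
            },
        }
        f.write(json.dumps(row) + "\n")
```

Each line of the resulting file is one standalone JSON object, which is what the JSONL upload expects.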

Run batch inference

⚠️ Currently, only base models with at most 16 billion parameters are supported for batch inference.

When you launch a batch inference job, Predibase takes care of deploying the target base model and loading all necessary adapters. Note that batch jobs may not be scheduled immediately!

Many of the options available when creating a deployment can also be configured for batch inference. To start your inference job, use the Predibase SDK:

from predibase import Predibase
from predibase.beta.config import BatchInferenceServerConfig

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Parameters have the same meaning as those used to create deployments.
config = BatchInferenceServerConfig(
    base_model="mistral-7b-instruct-v0-2",
    lorax_image_tag=None,  # Optional.
    hf_token=None,  # Optional.
    quantization=None,  # Optional.
)

# Or, if you want to re-use the configuration of an existing deployment:
dep = pb.deployments.get("my_deployment")
config_from_deployment = BatchInferenceServerConfig.from_deployment(dep)

job = pb.beta.batch_inference.create(
    dataset="my_inference_dataset",
    server_config=config,
)
# => Successfully requested batch inference over my_inference_dataset using mistral-7b-instruct-v0-2 as <JOB UUID>.

Check the progress of a batch inference job

# To check the status of a specific job:
print(pb.beta.batch_inference.get(job).status)
print(pb.beta.batch_inference.get("<JOB_UUID>").status)

# To see a list of all batch inference jobs:
jobs = pb.beta.batch_inference.list()
for j in jobs:
    print(j)

Download batch inference results

Once your job reaches the completed state, you can download the output like so:

pb.beta.batch_inference.download_results(job, dest="path/to/my_output.jsonl")
pb.beta.batch_inference.download_results("<JOB_UUID>", dest="path/to/my_output.jsonl")

A result row will look like this (formatted for readability):

{
  "custom_id": "1",
  "response": {
    "status_code": 200,
    "body": {
      "id": "null",
      "choices": [
        {
          "finish_reason": "stop",
          "index": 0,
          "logprobs": null,
          "message": {
            "content": " To calculate the height of a triangle when the angle, side lengths, and the length ...",
            "refusal": null,
            "role": "assistant",
            "audio": null,
            "function_call": null,
            "tool_calls": null
          }
        }
      ],
      "created": 1737575137,
      "model": "predibase/Mistral-7B-Instruct-v0.2-dequantized",
      "object": "text_completion",
      "service_tier": null,
      "system_fingerprint": "0.1.0-native",
      "usage": {
        "completion_tokens": 379,
        "prompt_tokens": 30,
        "total_tokens": 409,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
  },
  "error": null
}

Note that the row order in the downloaded results file may not match the order of the original dataset. Use the unique custom_id to match result rows back to their input rows.
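Re-aligning results with inputs by custom_id can be sketched as below. The rows here are inline samples shaped like the formats shown above; in practice you would read them from your input and output JSONL files.

```python
# Minimal sketch: join out-of-order result rows back to input rows by custom_id.
input_rows = [
    {"custom_id": "1", "body": {"messages": [{"role": "user", "content": "Question about triangles"}]}},
    {"custom_id": "2", "body": {"messages": [{"role": "user", "content": "Question about circles"}]}},
]
result_rows = [  # note: results arrive in a different order than the inputs
    {"custom_id": "2", "response": {"status_code": 200, "body": {"choices": [{"message": {"content": "circle answer"}}]}}},
    {"custom_id": "1", "response": {"status_code": 200, "body": {"choices": [{"message": {"content": "triangle answer"}}]}}},
]

# Index results by custom_id, then walk the inputs in their original order.
results_by_id = {r["custom_id"]: r for r in result_rows}
for row in input_rows:
    result = results_by_id[row["custom_id"]]
    if result["response"]["status_code"] == 200:
        prompt = row["body"]["messages"][0]["content"]
        answer = result["response"]["body"]["choices"][0]["message"]["content"]
        print(f"{prompt} -> {answer}")
```

The same dictionary-index pattern works when streaming both JSONL files line by line, since custom_id is guaranteed unique per row.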

Pricing

Batch inference is priced at $0.50 per million tokens. There is no price difference between input and output tokens.
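Since each result row carries a usage block, the total cost of a job can be estimated by summing total_tokens across rows. A quick sketch at $0.50 per million tokens (input and output priced the same); the usage dicts below are illustrative samples mirroring the result-row format:

```python
PRICE_PER_MILLION_TOKENS = 0.50  # USD, flat rate for input and output tokens

# Illustrative usage blocks taken from two hypothetical result rows.
usages = [
    {"prompt_tokens": 30, "completion_tokens": 379, "total_tokens": 409},
    {"prompt_tokens": 25, "completion_tokens": 175, "total_tokens": 200},
]

total_tokens = sum(u["total_tokens"] for u in usages)
cost_usd = total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS
print(f"{total_tokens} tokens -> ${cost_usd:.6f}")  # 609 tokens -> $0.000305
```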