
Batch Inference (Beta)

⚠️ This is a beta feature. The API and its behavior may change.

Use the Batch Inference API to process inference requests in bulk at a lower cost.

Prepare your dataset

Batch inference takes a dataset as input, where each row of the dataset is a prompt for the model. The input dataset must conform to the OpenAI batch inference format.

Here is a (pretty-printed) example of a single row from a dataset:

{
  "custom_id": "1",  # Each row in your dataset must have a unique string ID.
  "method": "POST",
  "url": "/v1/chat/completions",
  "body": {
    "model": "",  # Empty string means prompt the base model without any adapters applied.
    "messages": [
      {
        "role": "user",
        "content": "Formulate an equation to calculate the height of a triangle given the angle, side lengths and opposite side length."
      }
    ],
    "max_tokens": 1000
  }
}

To use an adapter for any given row, specify the adapter ID in the model field as shown above. The base model for the inference job is specified later. All other parameters behave as described for the LoRAX OpenAI-compatible API.

Once your JSONL file is ready, you can upload it to Predibase. You're now ready to run batch inference!
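The row format above can be produced programmatically. The sketch below builds a small JSONL dataset file using only the Python standard library; the prompts and the output filename are illustrative placeholders.

```python
import json

# Hypothetical prompts to batch; replace with your own data.
prompts = [
    "Formulate an equation to calculate the height of a triangle.",
    "Summarize the water cycle in two sentences.",
]

with open("my_inference_dataset.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        row = {
            "custom_id": str(i),  # must be a unique string per row
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "",  # "" = base model; or set an adapter ID
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 1000,
            },
        }
        f.write(json.dumps(row) + "\n")
```

Each line of the resulting file is one standalone JSON object, which is what the JSONL upload expects.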

Run batch inference

⚠️ Currently, only base models with at most 16 billion parameters are supported for batch inference.

When you launch a batch inference job, Predibase takes care of deploying the target base model and loading all necessary adapters. Note that batch jobs may not be scheduled immediately!

Many of the options available when creating a deployment can also be configured for batch inference. To start your inference job, use the Predibase SDK:

from predibase import Predibase
from predibase.beta.config import BatchInferenceServerConfig

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Parameters have the same meaning as those used to create deployments.
config = BatchInferenceServerConfig(
    base_model="mistral-7b-instruct-v0-2",
    lorax_image_tag=None,  # Optional.
    hf_token=None,  # Optional.
    quantization=None,  # Optional.
)

# Or, if you want to re-use the configuration of an existing deployment:
dep = pb.deployments.get("my_deployment")
config_from_deployment = BatchInferenceServerConfig.from_deployment(dep)

job = pb.beta.batch_inference.create(
    dataset="my_inference_dataset",
    server_config=config,
)
# => Successfully requested batch inference over my_inference_dataset using mistral-7b-instruct-v0-2 as <JOB UUID>.

Check the progress of a batch inference job

# To check the status of a specific job:
print(pb.beta.batch_inference.get(job).status)
print(pb.beta.batch_inference.get("<JOB_UUID>").status)

# To see a list of all batch inference jobs:
jobs = pb.beta.batch_inference.list()
for j in jobs:
    print(j)

Download batch inference results

Once your job reaches the completed state, you can download the output like so:

pb.beta.batch_inference.download_results(job, dest="path/to/my_output.jsonl")
pb.beta.batch_inference.download_results("<JOB_UUID>", dest="path/to/my_output.jsonl")

A result row will look like this (formatted for readability):

{
  "custom_id": "1",
  "response": {
    "status_code": 200,
    "body": {
      "id": "null",
      "choices": [
        {
          "finish_reason": "stop",
          "index": 0,
          "logprobs": null,
          "message": {
            "content": " To calculate the height of a triangle when the angle, side lengths, and the length ...",
            "refusal": null,
            "role": "assistant",
            "audio": null,
            "function_call": null,
            "tool_calls": null
          }
        }
      ],
      "created": 1737575137,
      "model": "predibase/Mistral-7B-Instruct-v0.2-dequantized",
      "object": "text_completion",
      "service_tier": null,
      "system_fingerprint": "0.1.0-native",
      "usage": {
        "completion_tokens": 379,
        "prompt_tokens": 30,
        "total_tokens": 409,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      }
    }
  },
  "error": null
}

Note that the row order in the downloaded results file may not match the order of the original dataset. Use the unique custom_id to match result rows back to their input rows.
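Re-aligning results with inputs by custom_id can be sketched as below. The rows here are inline samples shaped like the formats shown above; in practice you would read them from your input and output JSONL files.

```python
# Minimal sketch: join out-of-order result rows back to input rows by custom_id.
input_rows = [
    {"custom_id": "1", "body": {"messages": [{"role": "user", "content": "Question about triangles"}]}},
    {"custom_id": "2", "body": {"messages": [{"role": "user", "content": "Question about circles"}]}},
]
result_rows = [  # note: results arrive in a different order than the inputs
    {"custom_id": "2", "response": {"status_code": 200, "body": {"choices": [{"message": {"content": "circle answer"}}]}}},
    {"custom_id": "1", "response": {"status_code": 200, "body": {"choices": [{"message": {"content": "triangle answer"}}]}}},
]

# Index results by custom_id, then walk the inputs in their original order.
results_by_id = {r["custom_id"]: r for r in result_rows}
for row in input_rows:
    result = results_by_id[row["custom_id"]]
    if result["response"]["status_code"] == 200:
        prompt = row["body"]["messages"][0]["content"]
        answer = result["response"]["body"]["choices"][0]["message"]["content"]
        print(f"{prompt} -> {answer}")
```

The same dictionary-index pattern works when streaming both JSONL files line by line, since custom_id is guaranteed unique per row.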

Pricing

Batch inference is priced at $0.50 per million tokens. There is no price difference between input and output tokens.
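Since each result row carries a usage block, the total cost of a job can be estimated by summing total_tokens across rows. A quick sketch at $0.50 per million tokens (input and output priced the same); the usage dicts below are illustrative samples mirroring the result-row format:

```python
PRICE_PER_MILLION_TOKENS = 0.50  # USD, flat rate for input and output tokens

# Illustrative usage blocks taken from two hypothetical result rows.
usages = [
    {"prompt_tokens": 30, "completion_tokens": 379, "total_tokens": 409},
    {"prompt_tokens": 25, "completion_tokens": 175, "total_tokens": 200},
]

total_tokens = sum(u["total_tokens"] for u in usages)
cost_usd = total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS
print(f"{total_tokens} tokens -> ${cost_usd:.6f}")  # 609 tokens -> $0.000305
```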