LLM Examples Introduction

Here is a simple example that shows how to use the LLM API with TinyLlama.

from tensorrt_llm import LLM, SamplingParams


def main():

    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    outputs = llm.generate(prompts, sampling_params)

    # Print the outputs.
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


# The entry point of the program needs to be protected for spawning processes.
if __name__ == '__main__':
    main()
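
The example above passes temperature=0.8 and top_p=0.95 to SamplingParams. As a rough illustration of what these two knobs mean conceptually, the following pure-Python sketch applies temperature scaling and top-p (nucleus) filtering to a made-up set of logits. It is not TensorRT-LLM's implementation; the function names `softmax_with_temperature` and `top_p_filter` and the logit values are hypothetical and exist only for this illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative mass reaches top_p,
    then renormalize the kept probabilities so they sum to 1."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


# Made-up scores for a 4-token vocabulary (illustrative values only).
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax_with_temperature(logits, temperature=0.8)
candidates = top_p_filter(probs, top_p=0.95)
```

With these values, the least likely token falls outside the 0.95 nucleus and is excluded from `candidates`. A temperature below 1.0 also sharpens the distribution, so the top token receives more probability than it would at temperature 1.0.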

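The LLM API can be used for both offline and online usage. See more examples of the LLM API here:

LLM API Examples

For more details on how to make full use of this API, see:

Supported Models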

Llama (including variants Mistral, Mixtral, InternLM)
GPT (including variants Starcoder-1/2, Santacoder)
Gemma-1/2
Phi-1/2/3
ChatGLM (including variants glm-10b, chatglm, chatglm2, chatglm3, glm4)
QWen-1/1.5/2
Falcon
Baichuan-1/2
GPT-J
Mamba-1/2

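Model Preparation

The LLM class accepts input from any of the following:

Hugging Face Hub: triggers a download from the Hugging Face model hub, for example TinyLlama/TinyLlama-1.1B-Chat-v1.0.
Local Hugging Face models: uses a Hugging Face model already stored locally.
Local TensorRT-LLM engine: built by the trtllm-build tool or saved by the Python LLM API.

Any of these formats can be used interchangeably with the LLM(model=) constructor.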
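
Because a single model argument covers all three input kinds, it can be handy to log which kind a given string refers to before constructing the LLM. The sketch below is a heuristic under stated assumptions, not the logic the library itself uses: the helper name `describe_model_source` is hypothetical, and it assumes that an engine directory contains at least one file ending in `.engine`.

```python
from pathlib import Path


def describe_model_source(model: str) -> str:
    """Guess which kind of input a `model` string refers to (heuristic only)."""
    path = Path(model)
    if not path.is_dir():
        # Not a local directory: treat it as a Hugging Face Hub repo name.
        return "hub"
    if any(path.glob("*.engine")):
        # Assumption: engine directories contain files ending in ".engine".
        return "engine"
    # Any other local directory is assumed to be a Hugging Face checkpoint.
    return "local_hf"
```

The returned label is for diagnostics only; the same string is passed to LLM(model=) in every case.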

The following sections show how to use each of these formats with the LLM API.

Hugging Face Hub

Using the Hugging Face Hub is as simple as specifying the repository name in the LLM constructor:

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

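Local Hugging Face Models

Given the popularity of the Hugging Face model hub, the API supports the Hugging Face format as one of its starting points. To use the API with Llama 3.1 models, download the model from the Meta Llama 3.1 8B model page with the following commands: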

git lfs install
git clone https://huggingface.co/meta-llama/Meta-Llama-3.1-8B

After the model download is complete, load the model as follows:

llm = LLM(model=<path_to_meta_llama_from_hf>)

Note: Use of this model is subject to a specific license. Agree to the terms and authenticate with Hugging Face to begin the download.

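From TensorRT-LLM Engine

There are two ways to build a TensorRT-LLM engine:

Using the ``trtllm-build`` tool: you can build a TensorRT-LLM engine directly from a Hugging Face model with the trtllm-build tool, then save the engine to disk for later use. Refer to the README in the examples/llama directory of the repository on GitHub.

After the engine is built, load the model as follows: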

llm = LLM(model=<path_to_trt_engine>)

Using an ``LLM`` instance: create the engine with an LLM instance and persist it to local disk:

llm = LLM(<model-path>)

# Save engine to local disk
llm.save(<engine-dir>)

The engine can then be reloaded as shown above.
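
The save-then-reload flow can be wrapped in a small caching helper so the engine is built only on the first run. This is a minimal sketch, not part of the LLM API: the name `load_or_build` is hypothetical, the LLM class is passed in as `llm_cls` so the helper itself has no import-time dependency, and a non-empty engine directory is assumed to hold a previously saved engine. Only the LLM(model=...) constructor and the save() method shown above are used.

```python
from pathlib import Path


def load_or_build(llm_cls, model_path: str, engine_dir: str):
    """Reuse a saved engine if one exists; otherwise build from the model and save it."""
    engine_path = Path(engine_dir)
    if engine_path.is_dir() and any(engine_path.iterdir()):
        # Assumption: a non-empty directory holds a previously saved engine.
        return llm_cls(model=str(engine_path))
    # No saved engine yet: build from the model, then persist it for later runs.
    llm = llm_cls(model=model_path)
    llm.save(str(engine_path))
    return llm
```

A call would look like load_or_build(LLM, "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "./tinyllama_engine"), with LLM imported from tensorrt_llm and the engine directory name chosen freely.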