Using Torch-TensorRT in Python

Compared with the CLI and the C++ API, which only support TorchScript compilation, the Torch-TensorRT Python API supports a number of unique use cases.

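The Torch-TensorRT Python API accepts a torch.nn.Module, torch.jit.ScriptModule, or torch.fx.GraphModule as input. Depending on the type of module provided, one of two frontends (TorchScript or FX) is selected to compile it. If the module type is supported, users can explicitly set the frontend they want with the ir flag. If a torch.nn.Module is given and the ir flag is set to default or torchscript, the module is implicitly converted to a TorchScript module with torch.jit.script.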
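As a minimal sketch of setting the frontend explicitly, the helper below passes an ir value through to torch_tensorrt.compile. The names check_ir, compile_with_frontend, and DOCUMENTED_IR_VALUES are hypothetical, and the "fx" spelling is assumed from the frontend name; the installed version may accept other values.

```python
# Frontend names taken from the text above. This tuple is an assumption:
# the installed Torch-TensorRT version may accept other values as well.
DOCUMENTED_IR_VALUES = ("default", "torchscript", "fx")


def check_ir(ir):
    """Return `ir` unchanged if it is one of the frontend names listed above."""
    if ir not in DOCUMENTED_IR_VALUES:
        raise ValueError(
            f"unexpected ir value {ir!r}; expected one of {DOCUMENTED_IR_VALUES}"
        )
    return ir


def compile_with_frontend(model, inputs, ir="default"):
    """Compile `model` with an explicitly chosen frontend (hypothetical helper)."""
    ir = check_ir(ir)
    # Imported lazily so this helper can be defined without Torch-TensorRT
    # installed; the compile call itself needs the library and a CUDA GPU.
    import torch_tensorrt

    return torch_tensorrt.compile(model, ir=ir, inputs=inputs)
```

Calling compile_with_frontend(model, inputs, ir="torchscript") on a torch.nn.Module should then take the path described above, where the module is first scripted and then compiled.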

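To compile your input torch.nn.Module with Torch-TensorRT, you only need to provide the module and its inputs; Torch-TensorRT returns an optimized TorchScript module that you can run directly or add into another PyTorch module. Inputs are given as a list of torch_tensorrt.Input objects, each defining the shape, datatype, and memory format of an input tensor. Alternatively, if your inputs are a more complex data type, such as a tuple or list of tensors, you can use the input_signature argument to specify collection-based inputs, such as (List[Tensor], Tuple[Tensor, Tensor]). See the second sample below for an example. You can also specify settings such as the operating precision of the engine or the target device. After compilation, you can save the module just like any other module and load it in a deployment application. To load a TensorRT/TorchScript module, make sure you first import torch_tensorrt.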

import torch
import torch_tensorrt

...

model = MyModel().eval()  # torch module needs to be in eval (not training) mode

inputs = [
    torch_tensorrt.Input(
        min_shape=[1, 1, 16, 16],
        opt_shape=[1, 1, 32, 32],
        max_shape=[1, 1, 64, 64],
        dtype=torch.half,
    )
]
enabled_precisions = {torch.float, torch.half}  # Allow both FP32 and FP16 kernels

trt_ts_module = torch_tensorrt.compile(
    model, inputs=inputs, enabled_precisions=enabled_precisions
)

input_data = input_data.to("cuda").half()
result = trt_ts_module(input_data)
torch.jit.save(trt_ts_module, "trt_ts_module.ts")
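
The sample above gives a dynamic shape range through min_shape, opt_shape, and max_shape. The sketch below checks such a range before it is handed to torch_tensorrt.Input, assuming the usual TensorRT requirement that min <= opt <= max holds in every dimension. Both helper names are hypothetical and are not part of Torch-TensorRT.

```python
def check_shape_range(min_shape, opt_shape, max_shape):
    """Raise ValueError unless min <= opt <= max holds in every dimension."""
    if not (len(min_shape) == len(opt_shape) == len(max_shape)):
        raise ValueError("min_shape, opt_shape and max_shape must have the same rank")
    for axis, (lo, opt, hi) in enumerate(zip(min_shape, opt_shape, max_shape)):
        if not (lo <= opt <= hi):
            raise ValueError(
                f"axis {axis}: expected min <= opt <= max, got {lo}, {opt}, {hi}"
            )


def dynamic_axes(min_shape, max_shape):
    """Return the indices of the dimensions whose size is allowed to vary."""
    return [i for i, (lo, hi) in enumerate(zip(min_shape, max_shape)) if lo != hi]


# The range used in the sample above: batch and channel are fixed,
# height and width may vary between 16 and 64.
check_shape_range([1, 1, 16, 16], [1, 1, 32, 32], [1, 1, 64, 64])
varying = dynamic_axes([1, 1, 16, 16], [1, 1, 64, 64])
```

Running a check like this up front can surface a mistyped range as a plain ValueError instead of a failure during engine building.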

# Sample using collection-based inputs via the input_signature argument
import torch
import torch_tensorrt

...

model = MyModel().eval()

# input_signature expects a tuple of individual input arguments to the module
# The module below, for example, would have a forward signature of the form:
# def forward(self, input0: List[torch.Tensor], input1: Tuple[torch.Tensor, torch.Tensor])
input_signature = (
    [torch_tensorrt.Input(shape=[64, 64], dtype=torch.half), torch_tensorrt.Input(shape=[64, 64], dtype=torch.half)],
    (torch_tensorrt.Input(shape=[64, 64], dtype=torch.half), torch_tensorrt.Input(shape=[64, 64], dtype=torch.half)),
)
enabled_precisions = {torch.float, torch.half}

trt_ts_module = torch_tensorrt.compile(
    model, input_signature=input_signature, enabled_precisions=enabled_precisions
)

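# Call the module with arguments nested the same way as input_signature.
# input0 (a list of two tensors) and input1 (a tuple of two tensors) are
# assumed to be half-precision CUDA tensors prepared elsewhere.
result = trt_ts_module(input0, input1)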
torch.jit.save(trt_ts_module, "trt_ts_module.ts")
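
The nesting of input_signature mirrors the arguments of forward: a list for input0 and a tuple for input1. The sketch below shows one way to derive a second structure with the same nesting from a signature-like object. TensorSpec is a hypothetical stand-in for torch_tensorrt.Input so that the example runs without the library, and map_leaves is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorSpec:
    """Hypothetical stand-in for torch_tensorrt.Input (shape and dtype only)."""
    shape: tuple
    dtype: str = "half"


def map_leaves(fn, spec):
    """Apply `fn` to every leaf of a nested list/tuple, preserving the nesting."""
    if isinstance(spec, list):
        return [map_leaves(fn, item) for item in spec]
    if isinstance(spec, tuple):
        return tuple(map_leaves(fn, item) for item in spec)
    return fn(spec)


# Same nesting as the input_signature in the sample above:
# (List[Tensor], Tuple[Tensor, Tensor])
signature = (
    [TensorSpec((64, 64)), TensorSpec((64, 64))],
    (TensorSpec((64, 64)), TensorSpec((64, 64))),
)

# Extract the shapes while keeping the list/tuple structure intact.
shapes = map_leaves(lambda spec: spec.shape, signature)
```

With PyTorch available, the same helper could build matching example data, for instance map_leaves(lambda spec: torch.randn(spec.shape, device="cuda").half(), signature), giving one nested argument per forward parameter.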

# Deployment application
import torch
import torch_tensorrt

trt_ts_module = torch.jit.load("trt_ts_module.ts")
input_data = input_data.to("cuda").half()
result = trt_ts_module(input_data)

The Torch-TensorRT Python API also provides torch_tensorrt.ts.compile, which accepts a TorchScript module as input, and torch_tensorrt.fx.compile, which accepts an FX GraphModule as input.