Dynamic shapes with Torch-TensorRT

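By default, you can run a PyTorch model with varying input shapes, and the output shapes are determined eagerly. Torch-TensorRT, however, is an AOT compiler and needs some prior information about the input shapes in order to compile and optimize the model.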

Dynamic shapes using torch.export (AOT)

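For dynamic input shapes, we must provide the (min_shape, opt_shape, max_shape) arguments so that the model can be optimized for this range of input shapes. Examples of static and dynamic shapes are shown below.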

Note: The following code uses the Dynamo frontend. With the TorchScript frontend, replace ir=dynamo with ir=ts; the behavior is exactly the same.

import torch
import torch_tensorrt

model = MyModel().eval().cuda()
# Compile with static shapes
inputs = torch_tensorrt.Input(shape=[1, 3, 224, 224], dtype=torch.float32)
# or compile with dynamic shapes
inputs = torch_tensorrt.Input(min_shape=[1, 3, 224, 224],
                              opt_shape=[4, 3, 224, 224],
                              max_shape=[8, 3, 224, 224],
                              dtype=torch.float32)
# inputs must be passed by keyword here: a positional argument cannot follow ir="dynamo"
trt_gm = torch_tensorrt.compile(model, ir="dynamo", inputs=[inputs])
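
The relationship between the three shapes can be made concrete with a small, pure-Python sketch. The helper `dynamic_dims` below is hypothetical and is not part of Torch-TensorRT; it only illustrates the assumption that a dimension is treated as dynamic when its minimum and maximum differ, and that min <= opt <= max should hold for every dimension.

```python
def dynamic_dims(min_shape, opt_shape, max_shape):
    """Return {dim_index: (min, max)} for every dimension whose range is not fixed.

    Hypothetical helper for illustration only; Torch-TensorRT performs its own
    validation inside torch_tensorrt.Input.
    """
    if not (len(min_shape) == len(opt_shape) == len(max_shape)):
        raise ValueError("min_shape, opt_shape and max_shape must have the same rank")

    dynamic = {}
    for index, (lo, opt, hi) in enumerate(zip(min_shape, opt_shape, max_shape)):
        if not lo <= opt <= hi:
            raise ValueError(f"dimension {index}: expected min <= opt <= max")
        if lo != hi:
            # Only dimensions with a real range need to be optimized dynamically.
            dynamic[index] = (lo, hi)
    return dynamic


# Same shapes as the dynamic example above: only the batch dimension varies.
batch_only = dynamic_dims([1, 3, 224, 224], [4, 3, 224, 224], [8, 3, 224, 224])
print(batch_only)  # {0: (1, 8)}
```

Keeping the dynamic range as narrow as the workload allows is usually preferable, since the engine is optimized for the whole range.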

Under the hood

When we use the torch_tensorrt.compile API with ir=dynamo (the default), compilation takes place in two phases.

torch_tensorrt.dynamo.trace (uses torch.export to trace the graph with the given inputs)

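We use the torch.export.export() API to trace and export a PyTorch module into a torch.export.ExportedProgram. For dynamically shaped inputs, the (min_shape, opt_shape, max_shape) range provided via the torch_tensorrt.Input API is used to construct torch.export.Dim objects, which are passed as the dynamic_shapes argument of the export API. See the _tracer.py file to understand how this works under the hood.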

torch_tensorrt.dynamo.compile (compiles the torch.export.ExportedProgram object using TensorRT)

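During conversion to TensorRT, the graph already carries the dynamic shape information in the node metadata, and this information is used during the engine building phase.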

Custom dynamic shape constraints

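Given an input x = torch_tensorrt.Input(min_shape, opt_shape, max_shape, dtype), Torch-TensorRT attempts to set the constraints automatically during torch.export tracing by constructing torch.export.Dim objects from the provided dynamic dimensions. Sometimes additional constraints are needed, and Torchdynamo raises an error if we do not specify them. If you have to set any custom constraints for your model (by using torch.export.Dim), we recommend exporting your program first, before compiling it with Torch-TensorRT. Please refer to the PyTorch documentation on exporting modules with dynamic shapes. Here is a simple example that exports a matmul layer with some restrictions on its dynamic dimensions.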

import torch
import torch_tensorrt

class MatMul(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, query, key):
        attn_weight = torch.matmul(query, key.transpose(-1, -2))
        return attn_weight

model = MatMul().eval().cuda()
inputs = [torch.randn(1, 12, 7, 64).cuda(), torch.randn(1, 12, 7, 64).cuda()]
seq_len = torch.export.Dim("seq_len", min=1, max=10)
dynamic_shapes=({2: seq_len}, {2: seq_len})
# Export the model first with custom dynamic shape constraints
exp_program = torch.export.export(model, tuple(inputs), dynamic_shapes=dynamic_shapes)
trt_gm = torch_tensorrt.dynamo.compile(exp_program, inputs)
# Run inference
trt_gm(*inputs)

Dynamic shapes using torch.compile (JIT)

torch_tensorrt.compile(model, inputs, ir="torch_compile") returns a torch.compile boxed function with the backend configured to TensorRT. With ir=torch_compile, users can provide dynamic shape information for the inputs using the torch._dynamo.mark_dynamic API (https://pytorch.org/docs/stable/torch.compiler_dynamic_shapes.html) to avoid recompilation of TensorRT engines.

import torch
import torch_tensorrt

model = MyModel().eval().cuda()
inputs = torch.randn((1, 3, 224, 224), dtype=torch.float32).cuda()
# This indicates the dimension 0 is dynamic and the range is [1, 8]
torch._dynamo.mark_dynamic(inputs, 0, min=1, max=8)
trt_gm = torch.compile(model, backend="tensorrt")
# Compilation happens when you call the model
trt_gm(inputs)

# No recompilation of TRT engines with modified batch size
inputs_bs2 = torch.randn((2, 3, 224, 224), dtype=torch.float32).cuda()
trt_gm(inputs_bs2)