torch_tensorrt.ts

Functions

torch_tensorrt.ts.compile(module: ScriptModule, inputs: Optional[Sequence[Input | torch.Tensor]] = None, input_signature: Optional[Tuple[Union[Input, Tensor, Sequence[Any]]]] = None, device: Device = Device(type=DeviceType.GPU, gpu_id=0), disable_tf32: bool = False, sparse_weights: bool = False, enabled_precisions: Optional[Set[Union[dtype, dtype]]] = None, refit: bool = False, debug: bool = False, capability: EngineCapability = EngineCapability.STANDARD, num_avg_timing_iters: int = 1, workspace_size: int = 0, dla_sram_size: int = 1048576, dla_local_dram_size: int = 1073741824, dla_global_dram_size: int = 536870912, calibrator: object = None, truncate_long_and_double: bool = False, require_full_compilation: bool = False, min_block_size: int = 3, torch_executed_ops: Optional[List[str]] = None, torch_executed_modules: Optional[List[str]] = None, allow_shape_tensors: bool = False) → ScriptModule

Compile a TorchScript module for NVIDIA GPUs using TensorRT

Takes an existing TorchScript module and a set of settings to configure the compiler, and converts methods to JIT graphs that call equivalent TensorRT engines.

Converts specifically the forward method of a TorchScript module.

Parameters

module (torch.jit.ScriptModule) – Source module, the result of tracing or scripting a PyTorch torch.nn.Module

Keyword Arguments

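inputs (List[Union(Input, torch.Tensor)]) –

Required. List of specifications of input shape, dtype and memory layout for the inputs to the module. Input sizes can be specified as torch sizes, tuples or lists. dtypes can be specified using torch datatypes or torch_tensorrt datatypes, and you can use either torch devices or the torch_tensorrt device type enum to select the device type.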

inputs=[
    torch_tensorrt.Input((1, 3, 224, 224)), # Static NCHW input shape for input #1
    torch_tensorrt.Input(
        min_shape=(1, 224, 224, 3),
        opt_shape=(1, 512, 512, 3),
        max_shape=(1, 1024, 1024, 3),
        dtype=torch.int32,
        format=torch.channels_last,
    ), # Dynamic input shape for input #2
    torch.randn((1, 3, 224, 244)) # Use an example tensor and let torch_tensorrt infer settings
]

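input_signature (Tuple[Union(Input, torch.Tensor, Sequence[Any])]) –

A formatted collection of input specifications for the module. Input sizes can be specified as torch sizes, tuples or lists. dtypes can be specified using torch datatypes or torch_tensorrt datatypes, and you can use either torch devices or the torch_tensorrt device type enum to select the device type. This API should be considered beta-level stable and may change in the future.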

input_signature=([
    torch_tensorrt.Input((1, 3, 224, 224)), # Static NCHW input shape for input #1
    torch_tensorrt.Input(
        min_shape=(1, 224, 224, 3),
        opt_shape=(1, 512, 512, 3),
        max_shape=(1, 1024, 1024, 3),
        dtype=torch.int32,
        format=torch.channels_last,
    ), # Dynamic input shape for input #2
], torch.randn((1, 3, 224, 244))) # Use an example tensor and let torch_tensorrt infer settings for input #3

device (Union(Device, torch.device, dict)) –
Target device for TensorRT engines to run on
```
device=torch_tensorrt.Device("dla:1", allow_gpu_fallback=True)
```
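disable_tf32 (bool) – Force FP32 layers to use the traditional FP32 format instead of the default behavior, which rounds the inputs to a 10-bit mantissa before multiplying but accumulates the sum using 23-bit mantissas.
sparse_weights (bool) – Enable sparsity for convolution and fully connected layers.
enabled_precisions (Set(Union(torch.dtype, torch_tensorrt.dtype))) – The set of data types that TensorRT can use when selecting kernels
refit (bool) – Enable refitting
debug (bool) – Enable a debuggable engine
capability (EngineCapability) – Restrict kernel selection to safe GPU kernels or safe DLA kernels
num_avg_timing_iters (int) – Number of averaging timing iterations used to select kernels
workspace_size (int) – Maximum size of the workspace given to TensorRT
dla_sram_size (int) – Fast software-managed RAM used by DLA to communicate within a layer.
dla_local_dram_size (int) – Host RAM used by DLA to share intermediate tensor data across operations
dla_global_dram_size (int) – Host RAM used by DLA to store weights and metadata for execution
truncate_long_and_double (bool) – Truncate weights provided in int64 or double (float64) to int32 and float32
calibrator (Union(torch_tensorrt._C.IInt8Calibrator, tensorrt.IInt8Calibrator)) – Calibrator object that provides data to the PTQ system for INT8 calibration
require_full_compilation (bool) – Require the module to be compiled end to end or return an error, as opposed to returning a hybrid graph in which operations that cannot run in TensorRT run in PyTorch.
min_block_size (int) – The minimum number of contiguous TensorRT-convertible operations required for a set of operations to run in TensorRT
torch_executed_ops (List[str]) – List of aten operators that must run in PyTorch. An error is thrown if this list is not empty but require_full_compilation is True.
torch_executed_modules (List[str]) – List of modules that must run in PyTorch. An error is thrown if this list is not empty but require_full_compilation is True.
allow_shape_tensors (bool) – (Experimental) Allow aten::size to output shape tensors using IShapeLayer in TensorRT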
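
The interaction between min_block_size, torch_executed_ops and require_full_compilation can be illustrated with a toy model. The sketch below is not the library's partitioner: the `partition` function and the operator list are hypothetical, and the code only mimics the rules stated above (operators in torch_executed_ops stay in PyTorch, convertible runs shorter than min_block_size fall back to PyTorch, and a non-empty torch_executed_ops combined with require_full_compilation=True is an error).

```python
from typing import List, Sequence, Tuple


def partition(
    ops: Sequence[str],
    torch_executed_ops: Sequence[str] = (),
    min_block_size: int = 3,
    require_full_compilation: bool = False,
) -> List[Tuple[str, List[str]]]:
    """Toy model of hybrid-graph partitioning (hypothetical, illustration only)."""
    if require_full_compilation and torch_executed_ops:
        raise ValueError(
            "torch_executed_ops must be empty when require_full_compilation is True"
        )

    forced = set(torch_executed_ops)

    # Step 1: group consecutive ops by whether they are allowed to run in TensorRT.
    runs: List[Tuple[str, List[str]]] = []
    for op in ops:
        target = "torch" if op in forced else "tensorrt"
        if runs and runs[-1][0] == target:
            runs[-1][1].append(op)
        else:
            runs.append((target, [op]))

    # Step 2: TensorRT runs shorter than min_block_size fall back to PyTorch,
    # and neighbouring PyTorch runs are merged into one block.
    blocks: List[Tuple[str, List[str]]] = []
    for target, run in runs:
        if target == "tensorrt" and len(run) < min_block_size:
            target = "torch"
        if blocks and blocks[-1][0] == target:
            blocks[-1][1].extend(run)
        else:
            blocks.append((target, list(run)))
    return blocks


ops = ["aten::conv2d", "aten::relu", "aten::max_pool2d", "aten::nonzero", "aten::add"]
blocks = partition(ops, torch_executed_ops=["aten::nonzero"], min_block_size=3)
for target, block in blocks:
    print(target, block)
```

In this toy run the first three operators form one TensorRT block, while the trailing aten::add is too short a run on its own and joins aten::nonzero in PyTorch.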

Returns

Compiled TorchScript module; when run, it executes via TensorRT

Return type

torch.jit.ScriptModule
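
A minimal usage sketch for torch_tensorrt.ts.compile, assuming a CUDA-capable GPU and installed torch and torch_tensorrt packages. `TinyNet` and `compile_torchscript_module` are hypothetical names introduced for illustration, and the imports are kept inside the function so that merely defining it does not require a GPU.

```python
def compile_torchscript_module():
    """Trace a small model and compile its forward method with TensorRT.

    Needs a CUDA GPU plus the torch and torch_tensorrt packages at call time.
    """
    import torch
    import torch_tensorrt

    class TinyNet(torch.nn.Module):  # hypothetical example model
        def __init__(self):
            super().__init__()
            self.conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)

        def forward(self, x):
            return torch.relu(self.conv(x))

    model = TinyNet().eval().cuda()
    example = torch.randn(1, 3, 224, 224, device="cuda")

    # The source module must be the result of tracing or scripting.
    traced = torch.jit.trace(model, example)

    trt_module = torch_tensorrt.ts.compile(
        traced,
        inputs=[torch_tensorrt.Input((1, 3, 224, 224), dtype=torch.float32)],
        enabled_precisions={torch.float32},
    )
    # The result is a ScriptModule and is called like the original module.
    return trt_module(example)
```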

torch_tensorrt.ts.convert_method_to_trt_engine(module: ScriptModule, method_name: str = 'forward', inputs: Optional[Sequence[Input | torch.Tensor]] = None, device: Device = Device(type=DeviceType.GPU, gpu_id=0), disable_tf32: bool = False, sparse_weights: bool = False, enabled_precisions: Optional[Set[Union[dtype, dtype]]] = None, refit: bool = False, debug: bool = False, capability: EngineCapability = EngineCapability.STANDARD, num_avg_timing_iters: int = 1, workspace_size: int = 0, dla_sram_size: int = 1048576, dla_local_dram_size: int = 1073741824, dla_global_dram_size: int = 536870912, truncate_long_and_double: int = False, calibrator: object = None, allow_shape_tensors: bool = False) → bytes

Convert a TorchScript module method to a serialized TensorRT engine

Converts a specified method of a module to a serialized TensorRT engine, given a dictionary of conversion settings.

Parameters

module (torch.jit.ScriptModule) – Source module, the result of tracing or scripting a PyTorch torch.nn.Module

Keyword Arguments

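inputs (List[Union(Input, torch.Tensor)]) –

Required. List of specifications of input shape, dtype and memory layout for the inputs to the module. Input sizes can be specified as torch sizes, tuples or lists. dtypes can be specified using torch datatypes or torch_tensorrt datatypes, and you can use either torch devices or the torch_tensorrt device type enum to select the device type.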

inputs=[
    torch_tensorrt.Input((1, 3, 224, 224)), # Static NCHW input shape for input #1
    torch_tensorrt.Input(
        min_shape=(1, 224, 224, 3),
        opt_shape=(1, 512, 512, 3),
        max_shape=(1, 1024, 1024, 3),
        dtype=torch.int32,
        format=torch.channels_last,
    ), # Dynamic input shape for input #2
    torch.randn((1, 3, 224, 244)) # Use an example tensor and let torch_tensorrt infer settings
]

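method_name (str) – Name of the method to convert

input_signature (Tuple[Union(Input, torch.Tensor, Sequence[Any])]) –

A formatted collection of input specifications for the module. Input sizes can be specified as torch sizes, tuples or lists. dtypes can be specified using torch datatypes or torch_tensorrt datatypes, and you can use either torch devices or the torch_tensorrt device type enum to select the device type. This API should be considered beta-level stable and may change in the future.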

input_signature=([
    torch_tensorrt.Input((1, 3, 224, 224)), # Static NCHW input shape for input #1
    torch_tensorrt.Input(
        min_shape=(1, 224, 224, 3),
        opt_shape=(1, 512, 512, 3),
        max_shape=(1, 1024, 1024, 3),
        dtype=torch.int32,
        format=torch.channels_last,
    ), # Dynamic input shape for input #2
], torch.randn((1, 3, 224, 244))) # Use an example tensor and let torch_tensorrt infer settings for input #3

device (Union(Device, torch.device, dict)) –
Target device for TensorRT engines to run on
```
device=torch_tensorrt.Device("dla:1", allow_gpu_fallback=True)
```
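disable_tf32 (bool) – Force FP32 layers to use the traditional FP32 format instead of the default behavior, which rounds the inputs to a 10-bit mantissa before multiplying but accumulates the sum using 23-bit mantissas.
sparse_weights (bool) – Enable sparsity for convolution and fully connected layers.
enabled_precisions (Set(Union(torch.dtype, torch_tensorrt.dtype))) – The set of data types that TensorRT can use when selecting kernels
refit (bool) – Enable refitting
debug (bool) – Enable a debuggable engine
capability (EngineCapability) – Restrict kernel selection to safe GPU kernels or safe DLA kernels
num_avg_timing_iters (int) – Number of averaging timing iterations used to select kernels
workspace_size (int) – Maximum size of the workspace given to TensorRT
dla_sram_size (int) – Fast software-managed RAM used by DLA to communicate within a layer.
dla_local_dram_size (int) – Host RAM used by DLA to share intermediate tensor data across operations
dla_global_dram_size (int) – Host RAM used by DLA to store weights and metadata for execution
truncate_long_and_double (bool) – Truncate weights provided in int64 or double (float64) to int32 and float32
calibrator (Union(torch_tensorrt._C.IInt8Calibrator, tensorrt.IInt8Calibrator)) – Calibrator object that provides data to the PTQ system for INT8 calibration
allow_shape_tensors (bool) – (Experimental) Allow aten::size to output shape tensors using IShapeLayer in TensorRT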
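
The DLA memory defaults in the signature are easier to read as binary sizes. The snippet below is a small arithmetic sketch, assuming the values are byte counts (the parameter descriptions do not state the unit); `format_bytes` is a hypothetical helper.

```python
# Default values copied from the function signature.
DLA_DEFAULTS = {
    "dla_sram_size": 1048576,
    "dla_local_dram_size": 1073741824,
    "dla_global_dram_size": 536870912,
}


def format_bytes(n: int) -> str:
    """Render a byte count using the largest binary unit that divides it evenly."""
    for unit, size in (("GiB", 2**30), ("MiB", 2**20), ("KiB", 2**10)):
        if n % size == 0:
            return f"{n // size} {unit}"
    return f"{n} B"


readable = {name: format_bytes(value) for name, value in DLA_DEFAULTS.items()}
for name, text in readable.items():
    print(f"{name}: {text}")
```

Under that assumption the defaults correspond to 1 MiB of SRAM, 1 GiB of local DRAM and 512 MiB of global DRAM.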

Returns

Serialized TensorRT engine, which can be saved to a file or deserialized through the TensorRT APIs

Return type

bytes
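
Because the return value is plain bytes, persisting an engine is ordinary binary file I/O. The sketch below separates the two concerns: `build_engine` (hypothetical name) calls convert_method_to_trt_engine and needs a GPU with torch_tensorrt installed, while `save_engine` and `load_engine` (also hypothetical) are exercised here with placeholder bytes standing in for a real engine.

```python
import os
import tempfile


def build_engine(scripted_module) -> bytes:
    """Serialize the forward method; requires torch_tensorrt and a CUDA GPU."""
    import torch_tensorrt

    return torch_tensorrt.ts.convert_method_to_trt_engine(
        scripted_module,
        method_name="forward",
        inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    )


def save_engine(engine: bytes, path: str) -> int:
    """Write the serialized engine to disk and return the number of bytes written."""
    with open(path, "wb") as f:
        return f.write(engine)


def load_engine(path: str) -> bytes:
    """Read a serialized engine back from disk."""
    with open(path, "rb") as f:
        return f.read()


# Round trip with placeholder bytes; a real engine would come from build_engine().
placeholder = b"not-a-real-engine"
with tempfile.TemporaryDirectory() as tmp:
    engine_path = os.path.join(tmp, "model.engine")
    written = save_engine(placeholder, engine_path)
    restored = load_engine(engine_path)
```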

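torch_tensorrt.ts.check_method_op_support(module: ScriptModule, method_name: str = 'forward') → bool

Checks whether a method is fully supported by torch_tensorrt

Checks whether a method of a TorchScript module can be compiled by torch_tensorrt. If it cannot, a list of the unsupported operators is printed and the function returns False; otherwise it returns True.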

Parameters

module (torch.jit.ScriptModule) – Source module, the result of tracing or scripting a PyTorch torch.nn.Module
method_name (str) – Name of the method to check

Returns

True if the method is supported

Return type

bool
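
One way to use the boolean result is to decide whether to demand full compilation. This is a sketch of that idea rather than a recommended workflow; `compile_if_supported` is a hypothetical helper, and calling it requires torch_tensorrt and a CUDA GPU.

```python
def compile_if_supported(scripted_module, inputs):
    """Request end-to-end compilation only when every operator is supported."""
    import torch_tensorrt

    # Prints the unsupported operators and returns False if any are found.
    fully_supported = torch_tensorrt.ts.check_method_op_support(
        scripted_module, "forward"
    )

    # When some operators are unsupported, allow a hybrid graph in which
    # those operators keep running in PyTorch.
    return torch_tensorrt.ts.compile(
        scripted_module,
        inputs=inputs,
        require_full_compilation=fully_supported,
    )
```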

torch_tensorrt.ts.embed_engine_in_new_module(serialized_engine: bytes, input_binding_names: Optional[List[str]] = None, output_binding_names: Optional[List[str]] = None, device: Device = Device(type=DeviceType.GPU, gpu_id=0)) → ScriptModule

Takes a pre-built serialized TensorRT engine and embeds it within a TorchScript module

Takes a pre-built serialized TensorRT engine (as bytes) and embeds it within a TorchScript module. Registers the forward method to execute the TensorRT engine with the following function signature:

forward(Tensor[]) -> Tensor[]

TensorRT bindings must either be explicitly specified using [in/out]put_binding_names or have names in the following format:

[symbol].[index in the input/output array]

For example: [x.0, x.1, x.2] -> [y.0]
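
The naming convention is simple enough to generate and parse mechanically. In the sketch below, `default_binding_names` and `parse_binding_name` are hypothetical helpers, and the symbols x and y are taken from the example above rather than mandated by the library.

```python
from typing import List, Tuple


def default_binding_names(symbol: str, count: int) -> List[str]:
    """Build names of the form [symbol].[index], e.g. x.0, x.1, ..."""
    return [f"{symbol}.{index}" for index in range(count)]


def parse_binding_name(name: str) -> Tuple[str, int]:
    """Split a binding name into its symbol and its position in the array."""
    symbol, _, index = name.rpartition(".")
    return symbol, int(index)


input_names = default_binding_names("x", 3)
output_names = default_binding_names("y", 1)
print(input_names, "->", output_names)
```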

The module, with the engine embedded, can be saved with torch.jit.save and moved or loaded according to torch_tensorrt portability rules.

Parameters

serialized_engine (bytes) – Serialized TensorRT engine from torch_tensorrt or the TensorRT APIs

Keyword Arguments

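input_binding_names (List[str]) – List of names of TensorRT bindings, in the order in which they are passed to the enclosing PyTorch module
output_binding_names (List[str]) – List of names of TensorRT bindings, in the order in which they should be returned from the enclosing PyTorch module
device (Union(Device, torch.device, dict)) – Target device to run the engine on. Must be compatible with the engine provided. Default: the current active device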

Returns

New TorchScript module with the engine embedded

Return type

torch.jit.ScriptModule
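
A minimal sketch of embedding an engine and saving the resulting module, assuming the engine's bindings already follow the naming format described above (otherwise the binding names would need to be passed explicitly). `embed_and_save` is a hypothetical helper; calling it requires torch, torch_tensorrt and a compatible GPU.

```python
def embed_and_save(serialized_engine: bytes, path: str):
    """Wrap a serialized engine in a TorchScript module and save it to disk."""
    import torch
    import torch_tensorrt

    trt_module = torch_tensorrt.ts.embed_engine_in_new_module(serialized_engine)

    # The module carries the engine with it, so torch.jit.save is sufficient.
    torch.jit.save(trt_module, path)
    return trt_module
```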

torch_tensorrt.ts.TensorRTCompileSpec(inputs: Optional[List[torch.Tensor | Input]] = None, input_signature: Optional[Any] = None, device: Optional[Union[device, Device]] = None, disable_tf32: bool = False, sparse_weights: bool = False, enabled_precisions: Optional[Set[Union[dtype, dtype]]] = None, refit: bool = False, debug: bool = False, capability: EngineCapability = EngineCapability.STANDARD, num_avg_timing_iters: int = 1, workspace_size: int = 0, dla_sram_size: int = 1048576, dla_local_dram_size: int = 1073741824, dla_global_dram_size: int = 536870912, truncate_long_and_double: bool = False, calibrator: object = None, allow_shape_tensors: bool = False) → torch.classes.tensorrt.CompileSpec

Utility to create a formatted spec dictionary for using the PyTorch TensorRT backend

Keyword Arguments

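inputs (List[Union(Input, torch.Tensor)]) –

Required. List of specifications of input shape, dtype and memory layout for the inputs to the module. Input sizes can be specified as torch sizes, tuples or lists. dtypes can be specified using torch datatypes or torch_tensorrt datatypes, and you can use either torch devices or the torch_tensorrt device type enum to select the device type.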

inputs=[
    torch_tensorrt.Input((1, 3, 224, 224)), # Static NCHW input shape for input #1
    torch_tensorrt.Input(
        min_shape=(1, 224, 224, 3),
        opt_shape=(1, 512, 512, 3),
        max_shape=(1, 1024, 1024, 3),
        dtype=torch.int32,
        format=torch.channels_last,
    ), # Dynamic input shape for input #2
    torch.randn((1, 3, 224, 244)) # Use an example tensor and let torch_tensorrt infer settings
]

device (Union(Device, torch.device, dict)) –
Target device for TensorRT engines to run on
```
device=torch_tensorrt.Device("dla:1", allow_gpu_fallback=True)
```
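disable_tf32 (bool) – Force FP32 layers to use the traditional FP32 format instead of the default behavior, which rounds the inputs to a 10-bit mantissa before multiplying but accumulates the sum using 23-bit mantissas.
sparse_weights (bool) – Enable sparsity for convolution and fully connected layers.
enabled_precisions (Set(Union(torch.dtype, torch_tensorrt.dtype))) – The set of data types that TensorRT can use when selecting kernels
refit (bool) – Enable refitting
debug (bool) – Enable a debuggable engine
capability (EngineCapability) – Restrict kernel selection to safe GPU kernels or safe DLA kernels
num_avg_timing_iters (int) – Number of averaging timing iterations used to select kernels
workspace_size (int) – Maximum size of the workspace given to TensorRT
truncate_long_and_double (bool) – Truncate weights provided in int64 or double (float64) to int32 and float32
calibrator (Union(torch_tensorrt._C.IInt8Calibrator, tensorrt.IInt8Calibrator)) – Calibrator object that provides data to the PTQ system for INT8 calibration
allow_shape_tensors (bool) – (Experimental) Allow aten::size to output shape tensors using IShapeLayer in TensorRT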
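
The effect that disable_tf32 switches off can be approximated in pure Python. The sketch below simulates rounding a float32 input to a 10-bit mantissa by manipulating its bit pattern; `round_to_tf32` is a hypothetical helper, the round-to-nearest-even mode is an assumption (hardware may round differently), and only finite values are handled.

```python
import struct


def round_to_tf32(x: float) -> float:
    """Round x to float32, then keep only the top 10 of its 23 mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 13  # 23 mantissa bits in float32 minus the 10 bits that are kept
    # Round to nearest, ties to even (assumed rounding mode).
    bits += (1 << (drop - 1)) - 1 + ((bits >> drop) & 1)
    bits &= ~((1 << drop) - 1) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


a = 1.0 + 2.0**-12  # exactly representable in float32, but not with 10 mantissa bits

full_product = a * a  # inputs used at full precision
tf32_product = round_to_tf32(a) * round_to_tf32(a)  # inputs rounded before multiplying

print("full precision inputs:", full_product)
print("TF32-rounded inputs:  ", tf32_product)
```

Here the small term in `a` is lost before the multiplication, which is the kind of precision loss that setting disable_tf32=True avoids.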

Returns

torch.classes.tensorrt.CompileSpec: List of methods and formatted spec objects to be provided to torch._C._jit_to_tensorrt