torch_tensorrt

Functions

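torch_tensorrt.compile(module: Any, ir: str = 'default', inputs: Optional[Sequence[Input | torch.Tensor | InputTensorSpec]] = None, arg_inputs: Optional[Sequence[Sequence[Any]]] = None, kwarg_inputs: Optional[dict[Any, Any]] = None, enabled_precisions: Optional[Set[Union[torch.dtype, torch_tensorrt.dtype]]] = None, **kwargs: Any) → Union[Module, ScriptModule, GraphModule, Callable[[...], Any]]

Compile a PyTorch module for NVIDIA GPUs using TensorRT

Takes an existing PyTorch module and a set of settings to configure the compiler, then lowers and compiles the module to TensorRT using the path specified in ir, returning a PyTorch module.

Only the module's forward method is converted.

Parameters

module (Union(torch.nn.Module, torch.jit.ScriptModule)) – Source module

Keyword Arguments

inputs (List[Union(Input, torch.Tensor)]) –

Required. A list of specifications of the shape, dtype, and memory layout of the module's inputs. Input sizes can be specified as torch sizes, tuples, or lists. Dtypes can be specified using torch datatypes or torch_tensorrt datatypes, and you can use either torch devices or the torch_tensorrt device type enum to select the device type.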

inputs=[
    torch_tensorrt.Input((1, 3, 224, 224)), # Static NCHW input shape for input #1
    torch_tensorrt.Input(
        min_shape=(1, 224, 224, 3),
        opt_shape=(1, 512, 512, 3),
        max_shape=(1, 1024, 1024, 3),
        dtype=torch.int32,
        format=torch.channels_last
    ), # Dynamic input shape for input #2
    torch.randn((1, 3, 224, 244)) # Use an example tensor and let torch_tensorrt infer settings
]

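arg_inputs (Tuple[Any, ...]) – Same as inputs. An alias that pairs more naturally with kwarg_inputs.
kwarg_inputs (dict[Any, ...]) – Optional, kwarg inputs to the module's forward function.
enabled_precisions (Set(Union(torch.dtype, torch_tensorrt.dtype))) – The set of datatypes that TensorRT can use when selecting kernels
ir (str) – The requested compilation strategy. (Options: default - let Torch-TensorRT decide, ts - TorchScript with the scripting path)
**kwargs – Additional settings for the specific requested strategy (see the submodules for more information)

Returns

Compiled module; when run, it will execute via TensorRT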

Return type

torch.nn.Module
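
A minimal usage sketch of the call described above, not a definitive recipe. The helper name compile_module is hypothetical, and the input shape and precision set are assumptions chosen for illustration. Actually running it requires torch, torch_tensorrt, and an NVIDIA GPU, so the imports are deferred until the function is called.

```python
def compile_module(model):
    """Compile `model` with Torch-TensorRT using one static NCHW input."""
    # Imported lazily so that defining this helper does not require a GPU stack.
    import torch
    import torch_tensorrt

    inputs = [
        # Assumed shape; use min_shape/opt_shape/max_shape for dynamic shapes.
        torch_tensorrt.Input((1, 3, 224, 224), dtype=torch.float32),
    ]
    return torch_tensorrt.compile(
        model.eval().cuda(),
        ir="default",                        # let Torch-TensorRT decide the path
        inputs=inputs,
        enabled_precisions={torch.float32},  # kernels may only use FP32
    )
```

The returned object is called like the original module, for example compile_module(my_model)(x) with x a CUDA tensor of the declared shape.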

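torch_tensorrt.convert_method_to_trt_engine(module: Any, method_name: str = 'forward', inputs: Optional[Sequence[Input | torch.Tensor | InputTensorSpec]] = None, arg_inputs: Optional[Sequence[Sequence[Any]]] = None, kwarg_inputs: Optional[dict[Any, Any]] = None, ir: str = 'default', enabled_precisions: Optional[Set[Union[torch.dtype, torch_tensorrt.dtype]]] = None, **kwargs: Any) → bytes

Convert a TorchScript module method to a serialized TensorRT engine

Converts the specified method of a module to a serialized TensorRT engine, given a dictionary of conversion settings

Parameters

module (Union(torch.nn.Module, torch.jit.ScriptModule)) – Source module

Keyword Arguments

inputs (List[Union(Input, torch.Tensor)]) –

Required. A list of specifications of the shape, dtype, and memory layout of the module's inputs. Input sizes can be specified as torch sizes, tuples, or lists. Dtypes can be specified using torch datatypes or torch_tensorrt datatypes, and you can use either torch devices or the torch_tensorrt device type enum to select the device type.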

inputs=[
    torch_tensorrt.Input((1, 3, 224, 224)), # Static NCHW input shape for input #1
    torch_tensorrt.Input(
        min_shape=(1, 224, 224, 3),
        opt_shape=(1, 512, 512, 3),
        max_shape=(1, 1024, 1024, 3),
        dtype=torch.int32,
        format=torch.channels_last
    ), # Dynamic input shape for input #2
    torch.randn((1, 3, 224, 244)) # Use an example tensor and let torch_tensorrt infer settings
]

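arg_inputs (Tuple[Any, ...]) – Same as inputs. An alias that pairs more naturally with kwarg_inputs.
kwarg_inputs (dict[Any, ...]) – Optional, kwarg inputs to the module's forward function.
enabled_precisions (Set(Union(torch.dtype, torch_tensorrt.dtype))) – The set of datatypes that TensorRT can use when selecting kernels
ir (str) – The requested compilation strategy. (Options: default - let Torch-TensorRT decide, ts - TorchScript with the scripting path)
**kwargs – Additional settings for the specific requested strategy (see the submodules for more information)

Returns

Serialized TensorRT engine, which can be saved to a file or deserialized through the TensorRT APIs

Return type

bytes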

torch_tensorrt.save(module: Any, file_path: str = '', *, output_format: str = 'exported_program', inputs: Optional[Sequence[Tensor]] = None, arg_inputs: Optional[Sequence[Tensor]] = None, kwarg_inputs: Optional[dict[str, Any]] = None, retrace: bool = False) → None

Save the model to disk in the specified output format.

Parameters

module (Optional(torch.jit.ScriptModule | torch.export.ExportedProgram | torch.fx.GraphModule | CudaGraphsTorchTensorRTModule)) – Compiled Torch-TensorRT module
inputs (torch.Tensor) – Torch input tensors
arg_inputs (Tuple[Any, ...]) – Same as inputs. An alias that pairs more naturally with kwarg_inputs.
kwarg_inputs (dict[Any, ...]) – Optional, kwarg inputs to the module's forward function.
output_format (str) – Format in which to save the model. Options are exported_program | torchscript.
retrace (bool) – When the module type is fx.GraphModule, this option re-exports the graph using torch.export.export(strict=False) in order to save it. This flag is currently experimental.

torch_tensorrt.load(file_path: str = '') → Any

Load either a TorchScript model or an ExportedProgram.

Loads a TorchScript or ExportedProgram file from disk. The file type is detected with try/except.

Parameters: file_path (str) – Path to the file on disk
Raises: ValueError – If there is no file, or the file is neither a TorchScript file nor an ExportedProgram file
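
The try/except detection described above can be sketched in plain Python. This is an illustration of the pattern only, not the library's implementation: load_with_fallback and the two stand-in loaders are hypothetical names, and the stand-ins decide by file suffix purely so that the sketch runs without torch installed. A real version would presumably pass actual TorchScript and ExportedProgram loaders instead.

```python
from typing import Any, Callable, Sequence


def load_with_fallback(file_path: str, loaders: Sequence[Callable[[str], Any]]) -> Any:
    """Try each loader in order and return the first result that loads cleanly."""
    for loader in loaders:
        try:
            return loader(file_path)
        except Exception:
            # This loader does not understand the file; try the next format.
            continue
    raise ValueError(f"The file {file_path} does not match any supported format")


# Hypothetical stand-in loaders (assumption: real loaders raise on a wrong format).
def fake_torchscript_loader(path: str) -> str:
    if not path.endswith(".ts"):
        raise RuntimeError("not a TorchScript archive")
    return "torchscript"


def fake_exported_program_loader(path: str) -> str:
    if not path.endswith(".ep"):
        raise RuntimeError("not an ExportedProgram")
    return "exported_program"


LOADERS = [fake_torchscript_loader, fake_exported_program_loader]
```

Calling load_with_fallback("model.ep", LOADERS) falls through the first loader and succeeds with the second. A path that neither loader accepts raises ValueError, which mirrors the documented behavior.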

Classes

class torch_tensorrt.MutableTorchTensorRTModule(pytorch_model: Module, *, device: Optional[Union[Device, torch.device, str]] = None, disable_tf32: bool = False, assume_dynamic_shape_support: bool = False, sparse_weights: bool = False, enabled_precisions: Set[Union[torch.dtype, torch_tensorrt.dtype]] = {dtype.f32}, engine_capability: EngineCapability = EngineCapability.STANDARD, immutable_weights: bool = True, debug: bool = False, num_avg_timing_iters: int = 1, workspace_size: int = 0, dla_sram_size: int = 1048576, dla_local_dram_size: int = 1073741824, dla_global_dram_size: int = 536870912, truncate_double: bool = False, require_full_compilation: bool = False, min_block_size: int = 5, torch_executed_ops: Optional[Collection[Union[Callable[[...], Any], str]]] = None, torch_executed_modules: Optional[List[str]] = None, pass_through_build_failures: bool = False, max_aux_streams: Optional[int] = None, version_compatible: bool = False, optimization_level: Optional[int] = None, use_python_runtime: bool = False, use_fast_partitioner: bool = True, enable_experimental_decompositions: bool = False, dryrun: bool = False, hardware_compatible: bool = False, timing_cache_path: str = '/tmp/torch_tensorrt_engine_cache/timing_cache.bin', **kwargs: Any)

Initializes a MutableTorchTensorRTModule so that it can be manipulated seamlessly like a regular PyTorch module. All TensorRT compilation and refitting are handled automatically as you work with the module. Any change to its attributes, or loading a different state_dict, triggers a refit or a recompilation, which is carried out during the next forward pass.

MutableTorchTensorRTModule takes a PyTorch module and a set of configuration settings for the compiler. Once compilation is complete, the module maintains the connection between the TensorRT graph module and the original PyTorch module. Any modification made to the MutableTorchTensorRTModule is reflected in both the TensorRT graph module and the original PyTorch module.

__init__(pytorch_model: Module, *, device: Optional[Union[Device, torch.device, str]] = None, disable_tf32: bool = False, assume_dynamic_shape_support: bool = False, sparse_weights: bool = False, enabled_precisions: Set[Union[torch.dtype, torch_tensorrt.dtype]] = {dtype.f32}, engine_capability: EngineCapability = EngineCapability.STANDARD, immutable_weights: bool = True, debug: bool = False, num_avg_timing_iters: int = 1, workspace_size: int = 0, dla_sram_size: int = 1048576, dla_local_dram_size: int = 1073741824, dla_global_dram_size: int = 536870912, truncate_double: bool = False, require_full_compilation: bool = False, min_block_size: int = 5, torch_executed_ops: Optional[Collection[Union[Callable[[...], Any], str]]] = None, torch_executed_modules: Optional[List[str]] = None, pass_through_build_failures: bool = False, max_aux_streams: Optional[int] = None, version_compatible: bool = False, optimization_level: Optional[int] = None, use_python_runtime: bool = False, use_fast_partitioner: bool = True, enable_experimental_decompositions: bool = False, dryrun: bool = False, hardware_compatible: bool = False, timing_cache_path: str = '/tmp/torch_tensorrt_engine_cache/timing_cache.bin', **kwargs: Any) → None

Parameters

pytorch_model (torch.nn.Module) – Source module that needs to be accelerated

Keyword Arguments

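device (Union(Device, torch.device, dict)) –
Target device for TensorRT engines to run on
```
device=torch_tensorrt.Device("dla:1", allow_gpu_fallback=True)
```
disable_tf32 (bool) – Force FP32 layers to use the traditional FP32 format, instead of the default behavior of rounding the inputs to a 10-bit mantissa before multiplying while accumulating the sum with a 23-bit mantissa.
assume_dynamic_shape_support (bool) – Setting this to True enables the converters to work for both dynamic and static shapes. Default: False
sparse_weights (bool) – Enable sparsity for convolution and fully connected layers.
enabled_precisions (Set(Union(torch.dtype, torch_tensorrt.dtype))) – The set of datatypes that TensorRT can use when selecting kernels
immutable_weights (bool) – Build non-refittable engines. This is useful for some layers that are not refittable.
debug (bool) – Enable a debuggable engine
engine_capability (EngineCapability) – Restrict kernel selection to safe GPU kernels or safe DLA kernels
num_avg_timing_iters (int) – Number of averaging timing iterations used to select kernels
workspace_size (int) – Maximum size of the workspace given to TensorRT
dla_sram_size (int) – Fast software-managed RAM used by DLA to communicate within a layer.
dla_local_dram_size (int) – Host RAM used by DLA to share intermediate tensor data across operations
dla_global_dram_size (int) – Host RAM used by DLA to store weights and metadata for execution
truncate_double (bool) – Truncate weights provided in double precision (float64) to single precision (float32)
calibrator (Union(torch_tensorrt._C.IInt8Calibrator, tensorrt.IInt8Calibrator)) – Calibrator object that provides data to the PTQ system for INT8 calibration
require_full_compilation (bool) – Require the module to be compiled end to end or return an error, as opposed to returning a hybrid graph where operations that cannot run in TensorRT are run in PyTorch.
min_block_size (int) – The minimum number of contiguous TensorRT-convertible operations required for a set of operations to run in TensorRT
torch_executed_ops (Collection[Target]) – Set of aten operators that must be run in PyTorch. An error is thrown if this set is not empty but require_full_compilation is True.
torch_executed_modules (List[str]) – List of modules that must be run in PyTorch. An error is thrown if this list is not empty but require_full_compilation is True.
pass_through_build_failures (bool) – Error out if there are issues during compilation (only applicable to torch.compile workflows)
max_aux_streams (Optional[int]) – Maximum number of streams in the engine
version_compatible (bool) – Build TensorRT engines that are compatible with future versions of TensorRT (restricted to lean runtime operators to provide forward version compatibility for the engines)
optimization_level (Optional[int]) – Setting a higher optimization level allows TensorRT to spend longer building the engine in search of more optimization options. The resulting engine may perform better than one built with a lower optimization level. The default optimization level is 3. Valid values are integers from 0 to the maximum optimization level, which is currently 5. Setting it higher than the maximum level results in the same behavior as the maximum level.
use_python_runtime (bool) – Return a graph that uses a pure Python runtime; this reduces the options for serialization
use_fast_partitioner (bool) – Use the adjacency-based partitioning scheme instead of the global partitioner. Adjacency partitioning is faster but may not be optimal. Use the global partitioner (False) if you are looking for the best performance.
enable_experimental_decompositions (bool) – Use the full set of operator decompositions. These decompositions may not be tested but serve to make the graph easier to convert to TensorRT, potentially increasing the amount of the graph that runs in TensorRT.
dryrun (bool) – Toggle "Dryrun" mode, which runs everything except the conversion to TRT and logs the outputs
hardware_compatible (bool) – Build TensorRT engines that are compatible with GPU architectures other than that of the GPU on which the engine was built (currently works for NVIDIA Ampere and newer)
timing_cache_path (str) – Path to the timing cache if it exists, or the path where it will be saved after compilation
lazy_engine_init (bool) – Defer setting up engines until the compilation of all engines is complete. This can allow larger models with multiple graph breaks to compile, but can lead to oversubscription of GPU memory at runtime.
**kwargs – Any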

Returns

MutableTorchTensorRTModule

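compile() → None: Recompile the TRT graph module using the PyTorch module. This function should be called whenever the weight structure changes (shape, more layers, and so on). MutableTorchTensorRTModule automatically catches weight value updates and calls this function to recompile. If it fails to catch a change, call this function manually to recompile the TRT graph module.

refit_gm() → None: Refit the TRT graph module with any updates. This function should be called whenever the weight values change but the weight structure stays the same. MutableTorchTensorRTModule automatically catches weight value updates and calls this function to refit the module. If it fails to catch a change, call this function manually to update the TRT graph module.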

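class torch_tensorrt.Input(*args: Any, **kwargs: Any)

Defines an input to a module in terms of expected shape, data type, and tensor format.

Variables

shape_mode (torch_tensorrt.Input._ShapeMode) – Whether the input has a static or a dynamic shape
shape (Tuple or Dict) –
Either a single tuple or a dict of tuples defining the input shape. Statically shaped inputs have a single tuple. Dynamic inputs have a dict of the form
```
{"min_shape": Tuple, "opt_shape": Tuple, "max_shape": Tuple}
```
dtype (torch_tensorrt.dtype) – The expected data type of the input tensor (default: torch_tensorrt.dtype.float32)
format (torch_tensorrt.TensorFormat) – The expected format of the input tensor (default: torch_tensorrt.TensorFormat.NCHW)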

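__init__(*args: Any, **kwargs: Any) → None

__init__ method for torch_tensorrt.Input

Input accepts one of a few construction patterns

Parameters

shape (Tuple or List, optional) – Static shape of the input tensor

Keyword Arguments

shape (Tuple or List, optional) – Static shape of the input tensor
min_shape (Tuple or List, optional) – Minimum size of the input tensor's shape range. Note: all three of min_shape, opt_shape, and max_shape must be provided, there must be no positional arguments, and shape must not be defined. This implicitly sets the Input's shape_mode to DYNAMIC
opt_shape (Tuple or List, optional) – Optimal size of the input tensor's shape range. Note: all three of min_shape, opt_shape, and max_shape must be provided, there must be no positional arguments, and shape must not be defined. This implicitly sets the Input's shape_mode to DYNAMIC
max_shape (Tuple or List, optional) – Maximum size of the input tensor's shape range. Note: all three of min_shape, opt_shape, and max_shape must be provided, there must be no positional arguments, and shape must not be defined. This implicitly sets the Input's shape_mode to DYNAMIC
dtype (torch.dtype or torch_tensorrt.dtype) – The expected data type of the input tensor (default: torch_tensorrt.dtype.float32)
format (torch.memory_format or torch_tensorrt.TensorFormat) – The expected format of the input tensor (default: torch_tensorrt.TensorFormat.NCHW)
tensor_domain (Tuple(float, float), optional) – The domain of allowed values for the tensor, in interval notation: [tensor_domain[0], tensor_domain[1]). Note: entering "None" (or not specifying a value) sets the bounds to [0, 2)
torch_tensor (torch.Tensor) – Holds a torch tensor corresponding to this Input.
name (str, optional) – The name of this input in the forward function of the input nn.Module. Used to specify dynamic shapes for the corresponding input in the dynamo tracer.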

Examples

Input([1,3,32,32], dtype=torch.float32, format=torch.channels_last)
Input(shape=(1,3,32,32), dtype=torch_tensorrt.dtype.int32, format=torch_tensorrt.TensorFormat.NCHW)
Input(min_shape=(1,3,32,32), opt_shape=[2,3,32,32], max_shape=(3,3,32,32)) # Implicitly dtype=torch_tensorrt.dtype.float32, format=torch_tensorrt.TensorFormat.NCHW
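
The construction rules for static and dynamic shapes can be restated as a small validator. This is a sketch of the documented rules only, assuming nothing about how Input is implemented internally; resolve_shape_mode is a hypothetical helper, and the returned strings stand in for the shape_mode values.

```python
from typing import Optional, Sequence


def resolve_shape_mode(
    *args: Sequence[int],
    shape: Optional[Sequence[int]] = None,
    min_shape: Optional[Sequence[int]] = None,
    opt_shape: Optional[Sequence[int]] = None,
    max_shape: Optional[Sequence[int]] = None,
) -> str:
    """Return "STATIC" or "DYNAMIC" according to the documented rules."""
    dynamic = [min_shape, opt_shape, max_shape]
    if any(s is not None for s in dynamic):
        # A dynamic range needs all three bounds and no static shape at all.
        if not all(s is not None for s in dynamic):
            raise ValueError("min_shape, opt_shape and max_shape must all be provided")
        if args or shape is not None:
            raise ValueError("a dynamic range cannot be combined with a static shape")
        return "DYNAMIC"
    # A static shape is given either positionally or as shape=, but not both.
    if len(args) == 1 and shape is None:
        return "STATIC"
    if not args and shape is not None:
        return "STATIC"
    raise ValueError("provide exactly one static shape or a full min/opt/max range")
```

Applied to the three examples above, the first two resolve to a static shape and the third to a dynamic one.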

example_tensor(optimization_profile_field: Optional[str] = None) → Tensor

Get an example tensor of the shape specified by the Input object

Parameters: optimization_profile_field (Optional(str)) – Name of the field to use for the shape when the Input is dynamically shaped
Returns: A PyTorch tensor

classmethod from_tensor(t: Tensor, disable_memory_format_check: bool = False) → Input

Produce an Input that contains the information of the given PyTorch tensor.

Parameters

t (torch.Tensor) – A PyTorch tensor.
disable_memory_format_check (bool) – Whether to validate the memory format of the input tensor

Returns

An Input object.

classmethod from_tensors(ts: Sequence[Tensor], disable_memory_format_check: bool = False) → List[Input]

Produce a list of Inputs that contain the information of all the given PyTorch tensors.

Parameters

ts (Iterable[torch.Tensor]) – A list of PyTorch tensors.
disable_memory_format_check (bool) – Whether to validate the memory formats of the input tensors

Returns

A list of Inputs.

dtype: dtype = 1: The expected data type of the input tensor (default: torch_tensorrt.dtype.float32)

format: memory_format = 1: The expected format of the input tensor (default: torch_tensorrt.memory_format.linear)

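class torch_tensorrt.Device(*args: Any, **kwargs: Any)

Defines a device that can be used to specify the target device for engines

Variables

device_type (DeviceType) – Target device type (GPU or DLA). Set implicitly based on whether dla_core is specified.
gpu_id (int) – Device ID of the target GPU
dla_core (int) – Core ID of the target DLA core
allow_gpu_fallback (bool) – Whether falling back to the GPU is allowed if DLA cannot support an operation

__init__(*args: Any, **kwargs: Any)

__init__ method for torch_tensorrt.Device

Device accepts one of a few construction patterns

Parameters

spec (str) – String with the device spec, e.g. "dla:0" for DLA, core ID 0

Keyword Arguments

gpu_id (int) – ID of the target GPU (overridden with the ID of the GPU managing the DLA if dla_core is specified). If specified, no positional arguments should be provided
dla_core (int) – ID of the target DLA core. If specified, no positional arguments should be provided.
allow_gpu_fallback (bool) – Allow TensorRT to schedule operations on the GPU if they are not supported on DLA (ignored if the device type is not DLA)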

Examples

Device("gpu:1")
Device("cuda:1")
Device("dla:0", allow_gpu_fallback=True)
Device(gpu_id=0, dla_core=0, allow_gpu_fallback=True)
Device(dla_core=0, allow_gpu_fallback=True)
Device(gpu_id=1)
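
To make the spec-string form concrete, the sketch below parses strings such as "gpu:1", "cuda:1", and "dla:0" into their parts. It is a hypothetical illustration, not the parser used by torch_tensorrt.Device: ParsedDevice and parse_device_spec are invented names, and leaving the unspecified ID at -1 is an assumption borrowed from the attribute defaults documented for this class.

```python
from typing import NamedTuple


class ParsedDevice(NamedTuple):
    device_type: str  # "GPU" or "DLA"
    gpu_id: int       # -1 when the spec does not name a GPU
    dla_core: int     # -1 when the spec does not name a DLA core


def parse_device_spec(spec: str) -> ParsedDevice:
    """Split a "<kind>:<index>" device spec into its device type and index."""
    kind, sep, index = spec.strip().lower().partition(":")
    if not sep or not index.isdigit():
        raise ValueError(f"expected '<kind>:<index>', got {spec!r}")
    if kind in ("gpu", "cuda"):
        return ParsedDevice("GPU", int(index), -1)
    if kind == "dla":
        # Specifying a DLA core implicitly selects the DLA device type.
        return ParsedDevice("DLA", -1, int(index))
    raise ValueError(f"unknown device kind {kind!r}")
```

In this sketch "gpu:1" and "cuda:1" parse to the same result, which matches the way the examples above treat the two prefixes as interchangeable.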

device_type: DeviceType = 1: Target device type (GPU or DLA). Set implicitly based on whether dla_core is specified.

dla_core: int = -1: Core ID of the target DLA core

gpu_id: int = -1: Device ID of the target GPU

Enums

class torch_tensorrt.dtype(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Enum describing the data types of Torch-TensorRT, compatible with the torch, tensorrt, and numpy dtypes

to(t: Union[Type[torch.dtype], Type[tensorrt.DataType], Type[numpy.dtype], Type[dtype]], use_default: bool = False) → Union[torch.dtype, tensorrt.DataType, numpy.dtype, dtype]

Get the equivalent of this dtype in [torch, numpy, tensorrt]

Converts self into the equivalent data type in numpy, torch, or tensorrt. If self is not supported in the target library, an exception is raised, so using this method directly is not recommended.

Use torch_tensorrt.dtype.try_to() instead.

Parameters

t (Union(Type(torch.dtype), Type(tensorrt.DataType), Type(numpy.dtype), Type(dtype))) – Data type enum of the library to convert to
use_default (bool) – In some cases a catch-all type (e.g. torch.float) is sufficient, so instead of raising an exception, return the default value.

Returns

The dtype in library enum t that is equivalent to this torch_tensorrt.dtype

Return type

Union(torch.dtype, tensorrt.DataType, numpy.dtype, dtype)

Raises

TypeError – Unsupported data type or unknown target

Examples

# Succeeds
float_dtype = torch_tensorrt.dtype.f32.to(torch.dtype) # Returns torch.float

# Failure
float_dtype = torch_tensorrt.dtype.bf16.to(numpy.dtype) # Throws exception

classmethod try_from(t: Union[torch.dtype, tensorrt.DataType, numpy.dtype, dtype], use_default: bool = False) → Optional[dtype]

Create a Torch-TensorRT dtype from another library's dtype system.

Takes a dtype enum from numpy, torch, or tensorrt and creates a torch_tensorrt.dtype. If the source dtype system is not supported, or the type is not supported in Torch-TensorRT, returns None.

Parameters

t (Union(torch.dtype, tensorrt.DataType, numpy.dtype, dtype)) – Data type enum from another library
use_default (bool) – In some cases a catch-all type (e.g. torch_tensorrt.dtype.f32) is sufficient, so instead of raising an exception, return the default value.

Returns

The torch_tensorrt.dtype equivalent to t, or None

Return type

Optional(dtype)

Examples

# Succeeds
float_dtype = torch_tensorrt.dtype.try_from(torch.float) # Returns torch_tensorrt.dtype.f32

# Unsupported type
float_dtype = torch_tensorrt.dtype.try_from(torch.complex128) # Returns None

try_to(t: Union[Type[torch.dtype], Type[tensorrt.DataType], Type[numpy.dtype], Type[dtype]], use_default: bool) → Optional[Union[torch.dtype, tensorrt.DataType, numpy.dtype, dtype]]

Get the equivalent of this dtype in [torch, numpy, tensorrt]

Converts self into the equivalent data type in numpy, torch, or tensorrt. If self is not supported in the target library, returns None.

Parameters

t (Union(Type(torch.dtype), Type(tensorrt.DataType), Type(numpy.dtype), Type(dtype))) – Data type enum of the library to convert to
use_default (bool) – In some cases a catch-all type (e.g. torch.float) is sufficient, so instead of returning None, return the default value.

Returns

The dtype in library enum t that is equivalent to this torch_tensorrt.dtype, or None

Return type

Optional(Union(torch.dtype, tensorrt.DataType, numpy.dtype, dtype))

Examples

# Succeeds
float_dtype = torch_tensorrt.dtype.f32.try_to(torch.dtype, use_default=False) # Returns torch.float

# Failure
float_dtype = torch_tensorrt.dtype.bf16.try_to(numpy.dtype, use_default=False) # Returns None
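
The difference between the raising to() and the None-returning try_to() is a common enum-conversion pattern. The sketch below reproduces it with a toy enum so that it runs without torch or tensorrt; MiniDType, the string-valued lookup table, and the library names used as keys are all hypothetical and only illustrate the control flow.

```python
from enum import Enum
from typing import Optional


class MiniDType(Enum):
    f32 = "f32"
    bf16 = "bf16"

    def to(self, target: str) -> str:
        """Strict conversion: raise TypeError when no equivalent exists."""
        table = _SUPPORTED.get(target)
        if table is None:
            raise TypeError(f"unknown target library {target!r}")
        if self not in table:
            raise TypeError(f"{self.name} is not supported by {target}")
        return table[self]

    def try_to(self, target: str) -> Optional[str]:
        """Lenient conversion: return None instead of raising."""
        try:
            return self.to(target)
        except TypeError:
            return None


# Toy lookup table (assumption for the sketch); numpy has no native bfloat16.
_SUPPORTED = {
    "torch": {MiniDType.f32: "torch.float32", MiniDType.bf16: "torch.bfloat16"},
    "numpy": {MiniDType.f32: "numpy.float32"},
}
```

With this shape, callers that can tolerate a missing equivalent use try_to and check for None, while to is reserved for code paths where a missing equivalent is a genuine error.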

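b

Boolean value, equivalent to dtype.bool

bf16

16-bit "Brain" floating-point number, equivalent to dtype.bfloat16

f16

16-bit floating-point number, equivalent to dtype.half, dtype.fp16 and dtype.float16

f32

32-bit floating-point number, equivalent to dtype.float, dtype.fp32 and dtype.float32

f64

64-bit floating-point number, equivalent to dtype.double, dtype.fp64 and dtype.float64

f8

8-bit floating-point number, equivalent to dtype.fp8 and dtype.float8

i32

Signed 32-bit integer, equivalent to dtype.int32 and dtype.int

i64

Signed 64-bit integer, equivalent to dtype.int64 and dtype.long

i8

Signed 8-bit integer, equivalent to dtype.int8. When enabled as a kernel precision, it typically requires the model to support quantization

u8

Unsigned 8-bit integer, equivalent to dtype.uint8

unknown

Sentinel value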

class torch_tensorrt.DeviceType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

The type of device that TensorRT will target

to(t: Union[Type[tensorrt.DeviceType], Type[DeviceType]], use_default: bool = False) → Union[tensorrt.DeviceType, DeviceType]

Get the equivalent of this DeviceType in tensorrt

Converts self into the equivalent device type in torch or tensorrt. If self is not supported in the target library, an exception is raised, so using this method directly is not recommended.

Use torch_tensorrt.DeviceType.try_to() instead.

Parameters: t (Union(Type(tensorrt.DeviceType), Type(DeviceType))) – Device type enum of the library to convert to
Returns: The device type in enum t that is equivalent to this torch_tensorrt.DeviceType
Return type: Union(tensorrt.DeviceType, DeviceType)
Raises: TypeError – Unknown target type or unsupported device type

Examples

# Succeeds
trt_dla = torch_tensorrt.DeviceType.DLA.to(tensorrt.DeviceType) # Returns tensorrt.DeviceType.DLA

classmethod try_from(d: Union[tensorrt.DeviceType, DeviceType]) → Optional[DeviceType]

Create a Torch-TensorRT device type enum from a TensorRT device type enum.

Takes a device type enum from tensorrt and creates a torch_tensorrt.DeviceType. If the source is not supported, or the device type is not supported in Torch-TensorRT, returns None.

Parameters: d (Union(tensorrt.DeviceType, DeviceType)) – Device type enum from another library
Returns: The torch_tensorrt.DeviceType equivalent to d, or None
Return type: Optional(DeviceType)

Examples

torchtrt_dla = torch_tensorrt.DeviceType.try_from(tensorrt.DeviceType.DLA)

try_to(t: Union[Type[tensorrt.DeviceType], Type[DeviceType]], use_default: bool = False) → Optional[Union[tensorrt.DeviceType, DeviceType]]

Get the equivalent of this DeviceType in tensorrt

Converts self into the equivalent device type in torch or tensorrt. If self is not supported in the target library, returns None.

Parameters: t (Union(Type(tensorrt.DeviceType), Type(DeviceType))) – Device type enum of the library to convert to
Returns: The device type in enum t that is equivalent to this torch_tensorrt.DeviceType, or None
Return type: Optional(Union(tensorrt.DeviceType, DeviceType))

Examples

# Succeeds
trt_dla = torch_tensorrt.DeviceType.DLA.try_to(tensorrt.DeviceType) # Returns tensorrt.DeviceType.DLA

DLA

Target is a DLA core

GPU

Target is a GPU

UNKNOWN

Sentinel value

class torch_tensorrt.EngineCapability(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

EngineCapability determines the restrictions placed on a network at build time and the runtime it targets.

to(t: Union[Type[tensorrt.EngineCapability], Type[EngineCapability]]) → Union[tensorrt.EngineCapability, EngineCapability]

Get the equivalent of this EngineCapability in tensorrt

Converts self into the equivalent engine capability in torch or tensorrt. If self is not supported in the target library, an exception is raised, so using this method directly is not recommended.

Use torch_tensorrt.EngineCapability.try_to() instead.

Parameters: t (Union(Type(tensorrt.EngineCapability), Type(EngineCapability))) – Engine capability enum of the library to convert to
Returns: The engine capability in enum t that is equivalent to this torch_tensorrt.EngineCapability
Return type: Union(tensorrt.EngineCapability, EngineCapability)
Raises: TypeError – Unknown target type or unsupported engine capability

Examples

# Succeeds
trt_dla_ec = torch_tensorrt.EngineCapability.DLA_STANDALONE.to(tensorrt.EngineCapability) # Returns tensorrt.EngineCapability.DLA_STANDALONE

classmethod try_from(c: Union[tensorrt.EngineCapability, EngineCapability]) → Optional[EngineCapability]

Create a Torch-TensorRT engine capability enum from a TensorRT engine capability enum.

Takes an engine capability enum from tensorrt and creates a torch_tensorrt.EngineCapability. If the source is not supported, or the engine capability level is not supported in Torch-TensorRT, returns None.

Parameters: c (Union(tensorrt.EngineCapability, EngineCapability)) – Engine capability enum from another library
Returns: The torch_tensorrt.EngineCapability equivalent to c, or None
Return type: Optional(EngineCapability)

Examples

torchtrt_safety_ec = torch_tensorrt.EngineCapability.try_from(tensorrt.EngineCapability.SAFETY)

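try_to(t: Union[Type[tensorrt.EngineCapability], Type[EngineCapability]]) → Optional[Union[tensorrt.EngineCapability, EngineCapability]]

Get the equivalent of this EngineCapability in tensorrt

Converts self into the equivalent engine capability in torch or tensorrt. If self is not supported in the target library, returns None.

Parameters: t (Union(Type(tensorrt.EngineCapability), Type(EngineCapability))) – Engine capability enum of the library to convert to
Returns: The engine capability in enum t that is equivalent to this torch_tensorrt.EngineCapability, or None
Return type: Optional(Union(tensorrt.EngineCapability, EngineCapability))

Examples

# Succeeds
trt_dla_ec = torch_tensorrt.EngineCapability.DLA_STANDALONE.try_to(tensorrt.EngineCapability) # Returns tensorrt.EngineCapability.DLA_STANDALONE

DLA_STANDALONE

EngineCapability.DLA_STANDALONE provides a restricted subset of network operations that are DLA compatible, and the resulting serialized engine can be executed using the standalone DLA runtime APIs.

SAFETY

EngineCapability.SAFETY provides a restricted subset of network operations that are safety certified, and the resulting serialized engine can be executed with TensorRT's safe runtime APIs in the tensorrt.safe namespace.

STANDARD

EngineCapability.STANDARD places no restrictions on functionality, and the resulting serialized engine can be executed with TensorRT's standard runtime APIs.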

class torch_tensorrt.memory_format(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

to(t: Union[Type[torch.memory_format], Type[tensorrt.TensorFormat], Type[memory_format]]) → Union[torch.memory_format, tensorrt.TensorFormat, memory_format]

Get the equivalent of this memory_format in torch or tensorrt

Converts self into the equivalent memory format in torch or tensorrt. If self is not supported in the target library, an exception is raised, so using this method directly is not recommended.

Use torch_tensorrt.memory_format.try_to() instead.

Parameters: t (Union(Type(torch.memory_format), Type(tensorrt.TensorFormat), Type(memory_format))) – Memory format type enum of the library to convert to
Returns: The memory format in enum t that is equivalent to this torch_tensorrt.memory_format
Return type: Union(torch.memory_format, tensorrt.TensorFormat, memory_format)
Raises: TypeError – Unknown target type or unsupported memory format

Examples

# Succeeds
tf = torch_tensorrt.memory_format.linear.to(torch.memory_format) # Returns torch.contiguous_format

classmethod try_from(f: Union[torch.memory_format, tensorrt.TensorFormat, memory_format]) → Optional[memory_format]

Create a Torch-TensorRT memory format enum from another library's memory format enum.

Takes a memory format enum from torch or tensorrt and creates a torch_tensorrt.memory_format. If the source is not supported, or the memory format is not supported in Torch-TensorRT, returns None.

Parameters: f (Union(torch.memory_format, tensorrt.TensorFormat, memory_format)) – Memory format enum from another library
Returns: The torch_tensorrt.memory_format equivalent to f, or None
Return type: Optional(memory_format)

Examples

torchtrt_linear = torch_tensorrt.memory_format.try_from(torch.contiguous_format)

try_to(t: Union[Type[torch.memory_format], Type[tensorrt.TensorFormat], Type[memory_format]]) → Optional[Union[torch.memory_format, tensorrt.TensorFormat, memory_format]]

Get the equivalent of this memory_format in torch or tensorrt

Converts self into the equivalent memory format in torch or tensorrt. If self is not supported in the target library, returns None.

Parameters: t (Union(Type(torch.memory_format), Type(tensorrt.TensorFormat), Type(memory_format))) – Memory format type enum of the library to convert to
Returns: The memory format in enum t that is equivalent to this torch_tensorrt.memory_format, or None
Return type: Optional(Union(torch.memory_format, tensorrt.TensorFormat, memory_format))

Examples

# Succeeds
tf = torch_tensorrt.memory_format.linear.try_to(torch.memory_format) # Returns torch.contiguous_format

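cdhw32

Thirty-two wide channel vectorized row major format with 3 spatial dimensions.

This format is bound to FP16 and INT8. It is only available for dimensions >= 4.

For a tensor with dimensions {N, C, D, H, W}, the memory layout is equivalent to a C array with dimensions [N][(C+31)/32][D][H][W][32], with the tensor coordinates (n, c, d, h, w) mapping to array subscript [n][c/32][d][h][w][c%32].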
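
As a worked form of the subscript mapping above, the sketch below turns a coordinate into a flat element offset for the 5-D layout. The function name cdhw32_offset is hypothetical; the arithmetic simply flattens the C-array subscript [n][c/32][d][h][w][c%32] in row-major order.

```python
def cdhw32_offset(n, c, d, h, w, C, D, H, W):
    """Flat element offset of (n, c, d, h, w) in a [N][(C+31)/32][D][H][W][32] array."""
    c_blocks = (C + 31) // 32  # number of 32-channel blocks, rounded up
    return ((((n * c_blocks + c // 32) * D + d) * H + h) * W + w) * 32 + c % 32
```

Neighbouring channels within one block sit next to each other in memory, while stepping along W skips a whole 32-element vector.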

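chw16

Sixteen wide channel vectorized row major format.

This format is bound to FP16. It is only available for dimensions >= 3.

For a tensor with dimensions {N, C, H, W}, the memory layout is equivalent to a C array with dimensions [N][(C+15)/16][H][W][16], with the tensor coordinates (n, c, h, w) mapping to array subscript [n][c/16][h][w][c%16].

chw2

Two wide channel vectorized row major format.

This format is bound to FP16 in TensorRT. It is only available for dimensions >= 3.

For a tensor with dimensions {N, C, H, W}, the memory layout is equivalent to a C array with dimensions [N][(C+1)/2][H][W][2], with the tensor coordinates (n, c, h, w) mapping to array subscript [n][c/2][h][w][c%2].

chw32

Thirty-two wide channel vectorized row major format.

This format is only available for dimensions >= 3.

For a tensor with dimensions {N, C, H, W}, the memory layout is equivalent to a C array with dimensions [N][(C+31)/32][H][W][32], with the tensor coordinates (n, c, h, w) mapping to array subscript [n][c/32][h][w][c%32].

chw4

Four wide channel vectorized row major format. This format is bound to INT8. It is only available for dimensions >= 3.

For a tensor with dimensions {N, C, H, W}, the memory layout is equivalent to a C array with dimensions [N][(C+3)/4][H][W][4], with the tensor coordinates (n, c, h, w) mapping to array subscript [n][c/4][h][w][c%4].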
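
The chw2, chw4, chw16, and chw32 layouts above share one pattern and differ only in the vector width. The sketch below parameterizes that width; chw_vectorized_offset and chw_vectorized_size are hypothetical helper names, and vec stands for 2, 4, 16, or 32.

```python
def chw_vectorized_offset(n, c, h, w, C, H, W, vec):
    """Flat element offset of (n, c, h, w) in a [N][(C+vec-1)/vec][H][W][vec] array."""
    c_blocks = (C + vec - 1) // vec  # channel blocks, rounded up
    return (((n * c_blocks + c // vec) * H + h) * W + w) * vec + c % vec


def chw_vectorized_size(N, C, H, W, vec):
    """Total number of elements stored, including channel padding."""
    return N * ((C + vec - 1) // vec) * H * W * vec
```

For example, a 3-channel 2x2 image stored as chw4 occupies 16 elements rather than 12, because each pixel's channels are padded up to a full 4-wide vector.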

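dhwc

Non-vectorized channel-last format. This format is bound to FP32. It is only available for dimensions >= 4.

Equivalent to memory_format.channels_last_3d

dhwc8

Eight channel format where C is padded to a multiple of 8.

This format is bound to FP16 and is only available for dimensions >= 4.

For a tensor with dimensions {N, C, D, H, W}, the memory layout is equivalent to an array with dimensions [N][D][H][W][(C+7)/8*8], with the tensor coordinates (n, c, d, h, w) mapping to array subscript [n][d][h][w][c].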

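dla_hwc4

DLA image format. Channel-last format. C can only be 1, 3, or 4. If C == 3, it is rounded up to 4. The stride for stepping along the H axis is rounded up to 32 bytes.

This format is bound to FP16/Int8 and is only available for dimensions >= 3.

For a tensor with dimensions {N, C, H, W}, with C' being 1, 4, 4 when C is 1, 3, 4 respectively, the memory layout is equivalent to a C array with dimensions [N][H][roundUp(W, 32/C'/elementSize)][C'], where elementSize is 2 for FP16 and 1 for Int8, and C' is the rounded-up C. The tensor coordinates (n, c, h, w) map to array subscript [n][h][w][c].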
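
The padded array dimensions for this format follow directly from the formula above. The sketch below computes them; dla_hwc4_dims is a hypothetical helper, and element_size is the size in bytes (2 for FP16, 1 for Int8) as stated in the text.

```python
def dla_hwc4_dims(N, C, H, W, element_size):
    """Return the padded array dimensions (N, H, W', C') of the dla_hwc4 layout."""
    if C not in (1, 3, 4):
        raise ValueError("dla_hwc4 only supports C in {1, 3, 4}")
    if element_size not in (1, 2):
        raise ValueError("element_size must be 1 (Int8) or 2 (FP16)")
    c_padded = 4 if C == 3 else C            # C' in the text
    step = 32 // c_padded // element_size    # W is rounded up to a multiple of this
    w_padded = -(-W // step) * step          # ceiling division, then scale back
    return (N, H, w_padded, c_padded)
```

With this rounding, one row of W' pixels always occupies a multiple of 32 bytes, which is the H-axis stride rule stated for the format.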

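dla_linear

DLA planar format. Row major format. The stride for stepping along the H axis is rounded up to 64 bytes.

This format is bound to FP16/Int8 and is only available for dimensions >= 3.

For a tensor with dimensions {N, C, H, W}, the memory layout is equivalent to a C array with dimensions [N][C][H][roundUp(W, 64/elementSize)], where elementSize is 2 for FP16 and 1 for Int8, with the tensor coordinates (n, c, h, w) mapping to array subscript [n][c][h][w].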
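
A short sketch of the row padding described above, assuming element_size is given in bytes as in the text. The helper name dla_linear_dims is hypothetical.

```python
def dla_linear_dims(N, C, H, W, element_size):
    """Return the padded array dimensions (N, C, H, W') of the dla_linear layout."""
    if element_size not in (1, 2):
        raise ValueError("element_size must be 1 (Int8) or 2 (FP16)")
    step = 64 // element_size        # elements per 64-byte unit
    w_padded = -(-W // step) * step  # round W up to a multiple of step
    return (N, C, H, w_padded)
```

A width that is already a multiple of 64 bytes is left unchanged, while any other width grows to the next 64-byte boundary.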

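hwc

Non-vectorized channel-last format. This format is bound to FP32 and is only available for dimensions >= 3.

Equivalent to memory_format.channels_last

hwc16

Sixteen channel format where C is padded to a multiple of 16. This format is bound to FP16. It is only available for dimensions >= 3.

For a tensor with dimensions {N, C, H, W}, the memory layout is equivalent to an array with dimensions [N][H][W][(C+15)/16*16], with the tensor coordinates (n, c, h, w) mapping to array subscript [n][h][w][c].

hwc8

Eight channel format where C is padded to a multiple of 8.

This format is bound to FP16. It is only available for dimensions >= 3.

For a tensor with dimensions {N, C, H, W}, the memory layout is equivalent to an array with dimensions [N][H][W][(C+7)/8*8], with the tensor coordinates (n, c, h, w) mapping to array subscript [n][h][w][c].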
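
The hwc8 and hwc16 layouts keep the channel as the innermost axis and pad it up to a multiple of 8 or 16. The sketch below computes the flat offset for either; hwc_padded_offset is a hypothetical helper and pad stands for 8 or 16.

```python
def hwc_padded_offset(n, c, h, w, C, H, W, pad):
    """Flat element offset of (n, c, h, w) in a [N][H][W][(C+pad-1)/pad*pad] array."""
    c_padded = (C + pad - 1) // pad * pad  # channels rounded up to a multiple of pad
    return ((n * H + h) * W + w) * c_padded + c
```

With C = 3 and pad = 8, consecutive pixels along W are 8 elements apart even though only 3 of those slots hold data.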

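linear

Row major linear format.

For a tensor with dimensions {N, C, H, W}, the W axis always has unit stride, and the stride of every other axis is at least the product of the next dimension and the next stride. The strides are the same as for a C array with dimensions [N][C][H][W].

Equivalent to memory_format.contiguous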
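
The stride rule for the linear format can be checked with a few lines of arithmetic. The sketch below computes the tightest strides that satisfy it, which are the strides of a plain C array; contiguous_strides is a hypothetical helper name.

```python
def contiguous_strides(dims):
    """Element strides of a row-major (C-order) array with the given dimensions."""
    strides = [1] * len(dims)
    # Walk from the second-to-last axis outward: stride = next dim * next stride.
    for i in range(len(dims) - 2, -1, -1):
        strides[i] = strides[i + 1] * dims[i + 1]
    return tuple(strides)
```

For dimensions (N, C, H, W) = (2, 3, 4, 5) this gives (60, 20, 5, 1), so W has unit stride and each outer stride equals the next dimension times the next stride.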
