torch_tensorrt.ts

Functions

torch_tensorrt.ts.compile(module: ScriptModule, inputs: Optional[Sequence[Input | torch.Tensor]] = None, input_signature: Optional[Tuple[Union[Input, Tensor, Sequence[Any]]]] = None, device: Device = Device(type=DeviceType.GPU, gpu_id=0), disable_tf32: bool = False, sparse_weights: bool = False, enabled_precisions: Optional[Set[Union[torch.dtype, torch_tensorrt.dtype]]] = None, refit: bool = False, debug: bool = False, capability: EngineCapability = EngineCapability.STANDARD, num_avg_timing_iters: int = 1, workspace_size: int = 0, dla_sram_size: int = 1048576, dla_local_dram_size: int = 1073741824, dla_global_dram_size: int = 536870912, truncate_long_and_double: bool = False, require_full_compilation: bool = False, min_block_size: int = 3, torch_executed_ops: Optional[List[str]] = None, torch_executed_modules: Optional[List[str]] = None, allow_shape_tensors: bool = False) → ScriptModule

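Compile a TorchScript module for NVIDIA GPUs using TensorRT.

Takes an existing TorchScript module and a set of settings that configure the compiler, and converts methods into JIT graphs that call equivalent TensorRT engines.

Specifically converts the forward method of the TorchScript module.

Parameters

module (torch.jit.ScriptModule) – Source module, the result of tracing or scripting a PyTorch torch.nn.Module

Keyword Arguments

inputs (List[Union(Input, torch.Tensor)]) –

Required. List of specifications of the input shape, dtype, and memory layout for the module's inputs. Input sizes can be specified as torch sizes, tuples, or lists. dtypes can be specified using torch datatypes or torch_tensorrt datatypes, and either torch devices or the torch_tensorrt device type enum can be used to select the device type.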

inputs=[
    torch_tensorrt.Input((1, 3, 224, 224)), # Static NCHW input shape for input #1
    torch_tensorrt.Input(
        min_shape=(1, 224, 224, 3),
        opt_shape=(1, 512, 512, 3),
        max_shape=(1, 1024, 1024, 3),
        dtype=torch.int32,
        format=torch.channels_last,
    ), # Dynamic input shape for input #2
    torch.randn((1, 3, 224, 244)) # Use an example tensor and let torch_tensorrt infer settings
]
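
For a dynamic input, each dimension of `min_shape`, `opt_shape`, and `max_shape` is expected to satisfy min ≤ opt ≤ max, which is the ordering TensorRT optimization profiles require. A minimal pure-Python sketch of that check follows; `check_shape_range` is a hypothetical helper for illustration and is not part of torch_tensorrt.

```python
from typing import Sequence


def check_shape_range(
    min_shape: Sequence[int],
    opt_shape: Sequence[int],
    max_shape: Sequence[int],
) -> bool:
    """Return True when the shapes have equal rank and min <= opt <= max per dimension."""
    if not (len(min_shape) == len(opt_shape) == len(max_shape)):
        return False
    return all(lo <= mid <= hi for lo, mid, hi in zip(min_shape, opt_shape, max_shape))


# The dynamic-shape values used in the example above pass the check.
ok = check_shape_range((1, 224, 224, 3), (1, 512, 512, 3), (1, 1024, 1024, 3))
# Swapping min and max violates the ordering.
bad = check_shape_range((1, 1024, 1024, 3), (1, 512, 512, 3), (1, 224, 224, 3))
```

Running a check like this before building `Input` objects can surface a mistyped shape early, before a potentially long engine build starts.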

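input_signature (Tuple[Union[Input, torch.Tensor, Sequence[Any]]]) –

A formatted collection of input specifications for the module. Input sizes can be specified as torch sizes, tuples, or lists. dtypes can be specified using torch datatypes or torch_tensorrt datatypes, and either torch devices or the torch_tensorrt device type enum can be used to select the device type. This API should be considered beta-level stable and may change in the future.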

input_signature=([
    torch_tensorrt.Input((1, 3, 224, 224)), # Static NCHW input shape for input #1
    torch_tensorrt.Input(
        min_shape=(1, 224, 224, 3),
        opt_shape=(1, 512, 512, 3),
        max_shape=(1, 1024, 1024, 3),
        dtype=torch.int32,
        format=torch.channels_last,
    ), # Dynamic input shape for input #2
], torch.randn((1, 3, 224, 244))) # Use an example tensor and let torch_tensorrt infer settings for input #3
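
The nesting of `input_signature` is meant to mirror how the module's `forward` groups its arguments: in the example above, the first argument is a list of two tensors and the second is a single tensor. The sketch below counts the leaf specifications in such a nested collection. `count_leaf_specs` is a hypothetical helper, and plain strings stand in for `torch_tensorrt.Input` objects so that it runs without a GPU.

```python
from typing import Any


def count_leaf_specs(signature: Any) -> int:
    """Count leaf entries in a nested tuple/list input signature."""
    if isinstance(signature, (list, tuple)):
        return sum(count_leaf_specs(item) for item in signature)
    return 1  # Anything that is not a list or tuple is treated as one input spec.


# Placeholder strings stand in for Input objects or example tensors.
signature = (["static_input", "dynamic_input"], "example_tensor")
leaf_count = count_leaf_specs(signature)  # two list entries plus one tensor
```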

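device (Union(Device, torch.device, dict)) –
Target device for TensorRT engines to run on
```
device=torch_tensorrt.Device("dla:1", allow_gpu_fallback=True)
```
disable_tf32 (bool) – Force FP32 layers to use the traditional FP32 format instead of the default behavior, which rounds inputs to 10-bit mantissas before multiplying but accumulates the sum using 23-bit mantissas
sparse_weights (bool) – Enable sparsity for convolution and fully connected layers
enabled_precisions (Set(Union(torch.dtype, torch_tensorrt.dtype))) – The set of datatypes that TensorRT can use when selecting kernels
refit (bool) – Enable refitting
debug (bool) – Enable a debuggable engine
capability (EngineCapability) – Restrict kernel selection to safe GPU kernels or safe DLA kernels
num_avg_timing_iters (int) – Number of averaging timing iterations used to select kernels
workspace_size (int) – Maximum size of the workspace given to TensorRT
dla_sram_size (int) – Fast software-managed RAM used by DLA to communicate within a layer
dla_local_dram_size (int) – Host RAM used by DLA to share intermediate tensor data across operations
dla_global_dram_size (int) – Host RAM used by DLA to store weights and metadata for execution
truncate_long_and_double (bool) – Truncate weights provided in int64 or double (float64) to int32 and float32
require_full_compilation (bool) – Require the module to be compiled end to end, or return an error, as opposed to returning a hybrid graph in which operations that cannot run in TensorRT run in PyTorch
min_block_size (int) – The minimum number of contiguous TensorRT-convertible operations needed to run a set of operations in TensorRT
torch_executed_ops (List[str]) – List of aten operators that must run in PyTorch. An error is thrown if this list is not empty but require_full_compilation is True
torch_executed_modules (List[str]) – List of modules that must run in PyTorch. An error is thrown if this list is not empty but require_full_compilation is True
allow_shape_tensors (bool) – (Experimental) Allow aten::size to output shape tensors using IShapeLayer in TensorRT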
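
The interaction between `require_full_compilation`, `torch_executed_ops`, and `torch_executed_modules` described above can be sketched as a pre-flight check. `validate_fallback_settings` is a hypothetical helper written for illustration. The library performs its own validation, and the use of `ValueError` here is an assumption, not the library's documented error type.

```python
from typing import List, Optional


def validate_fallback_settings(
    require_full_compilation: bool = False,
    torch_executed_ops: Optional[List[str]] = None,
    torch_executed_modules: Optional[List[str]] = None,
) -> None:
    """Raise ValueError when PyTorch fallback is requested together with full compilation."""
    ops = torch_executed_ops or []
    modules = torch_executed_modules or []
    # Forcing anything to run in PyTorch contradicts end-to-end TensorRT compilation.
    if require_full_compilation and (ops or modules):
        raise ValueError(
            "torch_executed_ops and torch_executed_modules must be empty "
            "when require_full_compilation is True"
        )


# Allowed: fallback operators with hybrid (partial) compilation.
validate_fallback_settings(False, torch_executed_ops=["aten::add"])
# Allowed: full compilation with no forced fallback.
validate_fallback_settings(True)
```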

Returns

Compiled TorchScript module that, when run, executes via TensorRT

Return type

torch.jit.ScriptModule
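
A minimal usage sketch, assuming `torch`, `torch_tensorrt`, and a CUDA-capable GPU are available. `compile_scripted_module` is a hypothetical wrapper name, and the input shape and precision set are illustrative choices rather than recommendations. Imports are deferred into the function so that the definition itself can be loaded on machines without TensorRT.

```python
import importlib.util

# True only when the torch_tensorrt package can be found in this environment.
HAS_TORCH_TENSORRT = importlib.util.find_spec("torch_tensorrt") is not None


def compile_scripted_module(scripted_module, input_shape=(1, 3, 224, 224)):
    """Compile the forward method of a torch.jit.ScriptModule with TensorRT."""
    import torch
    import torch_tensorrt

    return torch_tensorrt.ts.compile(
        scripted_module,
        inputs=[torch_tensorrt.Input(input_shape)],  # one static-shape input
        enabled_precisions={torch.float32},  # restrict kernel selection to FP32
    )
```

A typical call would script a model first, for example `compile_scripted_module(torch.jit.script(model.eval().cuda()))`, where `model` is a placeholder for your own `torch.nn.Module`, and would be guarded by `HAS_TORCH_TENSORRT`.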

torch_tensorrt.ts.convert_method_to_trt_engine(module: ScriptModule, method_name: str = 'forward', inputs: Optional[Sequence[Input | torch.Tensor]] = None, device: Device = Device(type=DeviceType.GPU, gpu_id=0), disable_tf32: bool = False, sparse_weights: bool = False, enabled_precisions: Optional[Set[Union[torch.dtype, torch_tensorrt.dtype]]] = None, refit: bool = False, debug: bool = False, capability: EngineCapability = EngineCapability.STANDARD, num_avg_timing_iters: int = 1, workspace_size: int = 0, dla_sram_size: int = 1048576, dla_local_dram_size: int = 1073741824, dla_global_dram_size: int = 536870912, truncate_long_and_double: int = False, allow_shape_tensors: bool = False) → bytes

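Convert a TorchScript module method to a serialized TensorRT engine.

Converts the specified method of a module to a serialized TensorRT engine, given a dictionary of conversion settings.

Parameters

module (torch.jit.ScriptModule) – Source module, the result of tracing or scripting a PyTorch torch.nn.Module

Keyword Arguments

inputs (List[Union(Input, torch.Tensor)]) –

Required. List of specifications of the input shape, dtype, and memory layout for the module's inputs. Input sizes can be specified as torch sizes, tuples, or lists. dtypes can be specified using torch datatypes or torch_tensorrt datatypes, and either torch devices or the torch_tensorrt device type enum can be used to select the device type.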

inputs=[
    torch_tensorrt.Input((1, 3, 224, 224)), # Static NCHW input shape for input #1
    torch_tensorrt.Input(
        min_shape=(1, 224, 224, 3),
        opt_shape=(1, 512, 512, 3),
        max_shape=(1, 1024, 1024, 3),
        dtype=torch.int32,
        format=torch.channels_last,
    ), # Dynamic input shape for input #2
    torch.randn((1, 3, 224, 244)) # Use an example tensor and let torch_tensorrt infer settings
]

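method_name (str) – Name of the method to convert

input_signature (Tuple[Union[Input, torch.Tensor, Sequence[Any]]]) –

A formatted collection of input specifications for the module. Input sizes can be specified as torch sizes, tuples, or lists. dtypes can be specified using torch datatypes or torch_tensorrt datatypes, and either torch devices or the torch_tensorrt device type enum can be used to select the device type. This API should be considered beta-level stable and may change in the future.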

input_signature=([
    torch_tensorrt.Input((1, 3, 224, 224)), # Static NCHW input shape for input #1
    torch_tensorrt.Input(
        min_shape=(1, 224, 224, 3),
        opt_shape=(1, 512, 512, 3),
        max_shape=(1, 1024, 1024, 3),
        dtype=torch.int32,
        format=torch.channels_last,
    ), # Dynamic input shape for input #2
], torch.randn((1, 3, 224, 244))) # Use an example tensor and let torch_tensorrt infer settings for input #3

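device (Union(Device, torch.device, dict)) –
Target device for TensorRT engines to run on
```
device=torch_tensorrt.Device("dla:1", allow_gpu_fallback=True)
```
disable_tf32 (bool) – Force FP32 layers to use the traditional FP32 format instead of the default behavior, which rounds inputs to 10-bit mantissas before multiplying but accumulates the sum using 23-bit mantissas
sparse_weights (bool) – Enable sparsity for convolution and fully connected layers
enabled_precisions (Set(Union(torch.dtype, torch_tensorrt.dtype))) – The set of datatypes that TensorRT can use when selecting kernels
refit (bool) – Enable refitting
debug (bool) – Enable a debuggable engine
capability (EngineCapability) – Restrict kernel selection to safe GPU kernels or safe DLA kernels
num_avg_timing_iters (int) – Number of averaging timing iterations used to select kernels
workspace_size (int) – Maximum size of the workspace given to TensorRT
dla_sram_size (int) – Fast software-managed RAM used by DLA to communicate within a layer
dla_local_dram_size (int) – Host RAM used by DLA to share intermediate tensor data across operations
dla_global_dram_size (int) – Host RAM used by DLA to store weights and metadata for execution
truncate_long_and_double (bool) – Truncate weights provided in int64 or double (float64) to int32 and float32
allow_shape_tensors (bool) – (Experimental) Allow aten::size to output shape tensors using IShapeLayer in TensorRT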
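
The DLA memory defaults in the signature are given in bytes. The arithmetic below converts them to binary units, which shows that the values correspond to 1 MiB, 1 GiB, and 512 MiB. The dictionary simply restates the defaults shown in the signature above.

```python
# Default DLA memory pool sizes in bytes, copied from the function signature.
DLA_DEFAULTS = {
    "dla_sram_size": 1048576,
    "dla_local_dram_size": 1073741824,
    "dla_global_dram_size": 536870912,
}

MIB = 1024 ** 2  # bytes per mebibyte

# Express each default in MiB.
dla_defaults_mib = {name: size // MIB for name, size in DLA_DEFAULTS.items()}
# → {"dla_sram_size": 1, "dla_local_dram_size": 1024, "dla_global_dram_size": 512}
```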

Returns

Serialized TensorRT engine, which can be saved to a file or deserialized via TensorRT APIs

Return type

bytes
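
Because the return value is a plain `bytes` object, it can be written to disk with ordinary binary file I/O. The sketch below uses hypothetical helper names (`save_engine`, `load_engine`) and a placeholder payload in place of a real engine. Deserializing the loaded bytes through the TensorRT runtime is outside its scope.

```python
import os
import tempfile
from pathlib import Path


def save_engine(engine_bytes: bytes, path: str) -> int:
    """Write the serialized engine to disk and return the number of bytes written."""
    return Path(path).write_bytes(engine_bytes)


def load_engine(path: str) -> bytes:
    """Read a serialized engine back from disk."""
    return Path(path).read_bytes()


# Placeholder payload standing in for the output of convert_method_to_trt_engine.
fake_engine = b"\x00serialized-engine-placeholder"
with tempfile.TemporaryDirectory() as tmp_dir:
    engine_path = os.path.join(tmp_dir, "model.engine")
    written = save_engine(fake_engine, engine_path)
    restored = load_engine(engine_path)
```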

torch_tensorrt.ts.check_method_op_support(module: ScriptModule, method_name: str = 'forward') → bool

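Checks whether a method is fully supported by torch_tensorrt.

Checks whether a method of a TorchScript module can be compiled by torch_tensorrt. If it cannot, the list of unsupported operators is printed and the function returns False; otherwise it returns True.

Parameters

module (torch.jit.ScriptModule) – Source module, the result of tracing or scripting a PyTorch torch.nn.Module
method_name (str) – Name of the method to check

Returns

True if the method is supported

Return type

bool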
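
One way to use the boolean result is to gate compilation on it and keep the original module otherwise. The sketch below expresses that pattern with injected callables so it runs without torch_tensorrt installed. `compile_if_supported` is a hypothetical helper, and the lambdas are stand-ins for the real support check and compiler.

```python
from typing import Any, Callable


def compile_if_supported(
    module: Any,
    is_supported: Callable[[Any], bool],
    compile_fn: Callable[[Any], Any],
) -> Any:
    """Compile the module only when the support check passes; otherwise return it unchanged."""
    if is_supported(module):
        return compile_fn(module)
    return module


# Stand-in callables so the sketch runs without torch_tensorrt installed.
result_supported = compile_if_supported("module", lambda m: True, lambda m: f"compiled({m})")
result_unsupported = compile_if_supported("module", lambda m: False, lambda m: f"compiled({m})")
```

With torch_tensorrt installed, `is_supported` could be `lambda m: torch_tensorrt.ts.check_method_op_support(m, "forward")` and `compile_fn` a wrapper around `torch_tensorrt.ts.compile`.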

torch_tensorrt.ts.embed_engine_in_new_module(serialized_engine: bytes, input_binding_names: Optional[List[str]] = None, output_binding_names: Optional[List[str]] = None, device: Device = Device(type=DeviceType.GPU, gpu_id=0)) → ScriptModule

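Takes a pre-built serialized TensorRT engine and embeds it within a TorchScript module.

Takes a pre-built serialized TensorRT engine (as bytes) and embeds it within a TorchScript module. Registers a forward method that executes the TensorRT engine, with the function signature:

forward(Tensor[]) -> Tensor[]

TensorRT bindings can either be specified explicitly via [in/out]put_binding_names, or have names in the following format:

[symbol].[index in input / output array]

For example: [x.0, x.1, x.2] -> [y.0]

The module can be saved with the engine embedded using torch.jit.save, and moved or loaded according to torch_tensorrt portability rules.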
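
The default binding-name format described above, `[symbol].[index]`, can be generated mechanically. The helper below is a hypothetical illustration, not a torch_tensorrt function; it reproduces the `[x.0, x.1, x.2] -> [y.0]` example.

```python
from typing import List


def default_binding_names(symbol: str, count: int) -> List[str]:
    """Build names of the form '<symbol>.<index>' for indices 0..count-1."""
    return [f"{symbol}.{index}" for index in range(count)]


input_names = default_binding_names("x", 3)   # ["x.0", "x.1", "x.2"]
output_names = default_binding_names("y", 1)  # ["y.0"]
```

Lists built this way could be passed as `input_binding_names` and `output_binding_names` when an engine's bindings follow this convention.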

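Parameters

serialized_engine (bytes) – Pre-built serialized TensorRT engine, from either torch_tensorrt or the TensorRT APIs

Keyword Arguments

input_binding_names (List[str]) – List of names of TensorRT bindings, in order, to be passed to the encompassing PyTorch module
output_binding_names (List[str]) – List of names of TensorRT bindings, in order, that should be returned from the encompassing PyTorch module
device (Union(Device, torch.device, dict)) – Target device to run the engine on. Must be compatible with the engine provided. Default: current active device

Returns

New TorchScript module with the engine embedded

Return type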

torch.jit.ScriptModule

torch_tensorrt.ts.TensorRTCompileSpec(inputs: Optional[List[torch.Tensor | Input]] = None, input_signature: Optional[Any] = None, device: Optional[Union[device, Device]] = None, disable_tf32: bool = False, sparse_weights: bool = False, enabled_precisions: Optional[Set[Union[torch.dtype, torch_tensorrt.dtype]]] = None, refit: bool = False, debug: bool = False, capability: EngineCapability = EngineCapability.STANDARD, num_avg_timing_iters: int = 1, workspace_size: int = 0, dla_sram_size: int = 1048576, dla_local_dram_size: int = 1073741824, dla_global_dram_size: int = 536870912, truncate_long_and_double: bool = False, allow_shape_tensors: bool = False) → torch.classes.tensorrt.CompileSpec

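Utility to create a formatted spec dictionary for using the PyTorch TensorRT backend.

Keyword Arguments

inputs (List[Union(Input, torch.Tensor)]) –

Required. List of specifications of the input shape, dtype, and memory layout for the module's inputs. Input sizes can be specified as torch sizes, tuples, or lists. dtypes can be specified using torch datatypes or torch_tensorrt datatypes, and either torch devices or the torch_tensorrt device type enum can be used to select the device type.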

inputs=[
    torch_tensorrt.Input((1, 3, 224, 224)), # Static NCHW input shape for input #1
    torch_tensorrt.Input(
        min_shape=(1, 224, 224, 3),
        opt_shape=(1, 512, 512, 3),
        max_shape=(1, 1024, 1024, 3),
        dtype=torch.int32,
        format=torch.channels_last,
    ), # Dynamic input shape for input #2
    torch.randn((1, 3, 224, 244)) # Use an example tensor and let torch_tensorrt infer settings
]

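device (Union(Device, torch.device, dict)) –
Target device for TensorRT engines to run on
```
device=torch_tensorrt.Device("dla:1", allow_gpu_fallback=True)
```
disable_tf32 (bool) – Force FP32 layers to use the traditional FP32 format instead of the default behavior, which rounds inputs to 10-bit mantissas before multiplying but accumulates the sum using 23-bit mantissas
sparse_weights (bool) – Enable sparsity for convolution and fully connected layers
enabled_precisions (Set(Union(torch.dtype, torch_tensorrt.dtype))) – The set of datatypes that TensorRT can use when selecting kernels
refit (bool) – Enable refitting
debug (bool) – Enable a debuggable engine
capability (EngineCapability) – Restrict kernel selection to safe GPU kernels or safe DLA kernels
num_avg_timing_iters (int) – Number of averaging timing iterations used to select kernels
workspace_size (int) – Maximum size of the workspace given to TensorRT
truncate_long_and_double (bool) – Truncate weights provided in int64 or double (float64) to int32 and float32
allow_shape_tensors (bool) – (Experimental) Allow aten::size to output shape tensors using IShapeLayer in TensorRT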
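
The `disable_tf32` description refers to TF32 arithmetic, in which FP32 inputs are reduced to a 10-bit mantissa before multiplication. The sketch below emulates that reduction on a single value by manipulating the IEEE 754 bit pattern. `round_to_tf32` is a hypothetical helper, and its round-half-up-in-magnitude rule is an assumption made for simplicity that may differ from the hardware's rounding mode. Special values such as NaN and infinity are not handled.

```python
import struct


def round_to_tf32(value: float) -> float:
    """Round a float32 value to a 10-bit mantissa, keeping the sign and 8-bit exponent."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    # float32 has 23 mantissa bits; TF32 keeps the top 10, so 13 low bits are dropped.
    # Adding half of the dropped range (0x1000) before masking rounds to nearest.
    bits = (bits + 0x1000) & 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


exact = round_to_tf32(1.0 + 2 ** -10)         # representable with 10 mantissa bits
rounded_down = round_to_tf32(1.0 + 2 ** -12)  # below the halfway point, so rounds to 1.0
rounded_up = round_to_tf32(1.0 + 2 ** -11)    # exactly halfway, so rounds up to 1 + 2**-10
```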

Returns

torch.classes.tensorrt.CompileSpec: List of methods and formatted spec objects to be provided to torch._C._jit_to_tensorrt
