Refitting Torch-TensorRT Programs with New Weights¶

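Compilation is an expensive operation because it applies many graph transformations, translations, and optimizations to a model. When a model's weights are updated occasionally (for example, when inserting LoRA adapters), the cost of rebuilding the compiled program from scratch each time can make TensorRT impractical. Torch-TensorRT provides a PyTorch-native mechanism, weight refitting, that updates the weights of an already compiled TensorRT program without recompiling from scratch.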

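In this tutorial, we walk through:

1. Compiling a PyTorch model to a TensorRT graph module
2. Saving and loading the graph module
3. Refitting the graph module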

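This tutorial focuses mainly on the AOT workflow, since that is where users are most likely to need to refit a module manually. In the JIT workflow, a weight change triggers recompilation. When engine caching is enabled and a matching engine has already been built, Torch-TensorRT can automatically recognize the prebuilt engine, trigger a refit, and shortcut recompilation on the user's behalf (see: Engine Caching).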
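
For comparison with the AOT workflow, the JIT path described above might look like the following minimal sketch. It assumes the `"tensorrt"` backend for `torch.compile` and the option names `cache_built_engines` and `reuse_cached_engines` as described in the Engine Caching documentation; the helper name `compile_jit_with_engine_cache` is hypothetical and not part of Torch-TensorRT.

```python
def compile_jit_with_engine_cache(model, example_inputs):
    """Sketch: JIT-compile `model` so that later weight changes can reuse a cached engine.

    `compile_jit_with_engine_cache` is a hypothetical helper for illustration.
    The option names below are assumptions taken from the Engine Caching docs.
    """
    import torch
    import torch_tensorrt  # noqa: F401  (importing registers the "tensorrt" backend)

    compiled = torch.compile(
        model,
        backend="tensorrt",
        options={
            "immutable_weights": False,    # build a refittable engine
            "cache_built_engines": True,   # assumed option: store built engines
            "reuse_cached_engines": True,  # reuse (and refit) a matching cached engine
        },
    )
    # torch.compile is lazy: the first call triggers the actual compilation.
    compiled(*example_inputs)
    return compiled
```

With settings along these lines, changing the model's weights and calling the compiled module again should let Torch-TensorRT refit the cached engine rather than rebuild it, provided the graph itself is unchanged.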

Standard Workflow¶

Imports and Model Definition¶

import numpy as np
import torch
import torch_tensorrt as torch_trt
import torchvision.models as models
from torch_tensorrt.dynamo import refit_module_weights

np.random.seed(0)
torch.manual_seed(0)
inputs = [torch.rand((1, 3, 224, 224)).to("cuda")]

Make a Refittable Compilation Program¶

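The first step is to compile the module as usual and save it. Note the one additional parameter, immutable_weights, which is set to False. It indicates that the engine being built should support weight refitting later. Engines built without this setting cannot be refit.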

In this case, we compile a ResNet18 model with randomly initialized weights and save it.

# weights=None gives random initialization (replaces the deprecated pretrained=False)
model = models.resnet18(weights=None).to("cuda").eval()
exp_program = torch.export.export(model, tuple(inputs))
enabled_precisions = {torch.float}
workspace_size = 20 << 30
min_block_size = 0
use_python_runtime = False
torch_executed_ops = set()  # no ops are forced to run in PyTorch
trt_gm = torch_trt.dynamo.compile(
    exp_program,
    tuple(inputs),
    use_python_runtime=use_python_runtime,
    enabled_precisions=enabled_precisions,
    min_block_size=min_block_size,
    torch_executed_ops=torch_executed_ops,
    immutable_weights=False,
    reuse_cached_engines=False,
)  # Output is a torch.fx.GraphModule

# Save the graph module as an exported program
torch_trt.save(trt_gm, "./compiled.ep", inputs=inputs)

Refit the Program with Pretrained Weights¶

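Random weights are not useful for inference. Instead of recompiling the model, we can now refit it with pretrained weights. To do so, set up another PyTorch module that holds the target weights and export it as an ExportedProgram. The refit_module_weights function then updates the compiled module's weights to the new ones.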

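# Create the model holding the target (pretrained) weights and export it.
# ResNet18_Weights.DEFAULT replaces the deprecated pretrained=True.
model2 = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).to("cuda").eval()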
exp_program2 = torch.export.export(model2, tuple(inputs))


compiled_trt_ep = torch_trt.load("./compiled.ep")

# This returns a new module with updated weights
new_trt_gm = refit_module_weights(
    compiled_module=compiled_trt_ep,
    new_weight_module=exp_program2,
    arg_inputs=inputs,
)

# Check that the refitted module matches the pretrained PyTorch model
with torch.no_grad():
    expected_outputs = exp_program2.module()(*inputs)
    refitted_outputs = new_trt_gm(*inputs)
    for expected_output, refitted_output in zip(expected_outputs, refitted_outputs):
        assert torch.allclose(
            expected_output, refitted_output, rtol=1e-2, atol=1e-2
        ), "Refit Result is not correct. Refit failed"

print("Refit successfully!")

Advanced Usage¶

A number of settings can be used to control the refit process.

Weight Map Cache¶

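Weight refitting works by matching the weights of the compiled module against the new weights in the user-supplied ExportedProgram. Because one-to-one name matching between PyTorch and TensorRT is hard to achieve, the only guaranteed way to match weights *at refit time* is to pass the new ExportedProgram through the early stages of the compilation pipeline so that it produces nearly identical weight names. This can be expensive and is not always necessary.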

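To avoid this, **at initial compile time** Torch-TensorRT tries to cache a direct mapping from PyTorch weights to TensorRT weights. The cache is stored as metadata in the compiled module and can be used to speed up refitting. If the cache is not present, the refit system falls back to rebuilding the mapping at refit time. Use of the cache is controlled by the use_weight_map_cache parameter.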
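
As a minimal sketch of how the cache might be toggled, the wrapper below forwards a flag to `use_weight_map_cache`. The helper name `refit_with_cache_setting` and its parameters are hypothetical; only `refit_module_weights` and its keyword arguments come from the text above.

```python
def refit_with_cache_setting(compiled_module, new_exported_program, inputs, use_cache=True):
    """Sketch: refit with or without the weight map cache.

    `refit_with_cache_setting` is a hypothetical helper for illustration.
    use_cache=True  -> use the PyTorch-to-TensorRT mapping cached at compile time.
    use_cache=False -> rebuild the mapping at refit time (slower, no heuristics).
    """
    from torch_tensorrt.dynamo import refit_module_weights

    return refit_module_weights(
        compiled_module=compiled_module,
        new_weight_module=new_exported_program,
        arg_inputs=inputs,
        use_weight_map_cache=use_cache,
    )
```

Under the names used earlier in this tutorial, a call could look like `refit_with_cache_setting(compiled_trt_ep, exp_program2, inputs, use_cache=False)` when the slower but non-heuristic path is preferred.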

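Because the cache uses a heuristic-based approach to match PyTorch and TensorRT weights, you may want to verify the refit. To do so, set verify_output to True and provide sample arg_inputs and kwarg_inputs. The refit system then runs the refitted module and the user-supplied module on the same inputs and compares their outputs.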
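
A cached-mapping refit with verification turned on might be sketched as follows. The helper name `refit_and_verify` is hypothetical, and the model is assumed to take only positional inputs, so `kwarg_inputs` is left at its default.

```python
def refit_and_verify(compiled_module, new_exported_program, inputs):
    """Sketch: refit using the cached weight map and ask the refitter to check outputs.

    `refit_and_verify` is a hypothetical helper for illustration. The sample
    `inputs` are what the refit system runs through both modules to compare them.
    """
    from torch_tensorrt.dynamo import refit_module_weights

    return refit_module_weights(
        compiled_module=compiled_module,
        new_weight_module=new_exported_program,
        arg_inputs=inputs,          # sample positional inputs used for verification
        use_weight_map_cache=True,  # fast path based on the heuristic mapping
        verify_output=True,         # compare refitted vs. reference outputs
    )
```

How a failed verification is reported is not covered in this tutorial, so an explicit comparison like the `torch.allclose` check in the standard workflow remains a reasonable extra safeguard.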

In-Place Refit¶

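in_place allows the user to refit a module in place. This is useful when the user wants to update the weights of the compiled module without creating a new module.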
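
An in-place refit might be sketched as below. The helper name `refit_in_place` is hypothetical, and the sketch assumes the module being refit is a `torch.fx.GraphModule` such as `trt_gm` from the compile step; whether a loaded ExportedProgram can be refit in place is not covered here.

```python
def refit_in_place(compiled_gm, new_exported_program, inputs):
    """Sketch: update the weights of `compiled_gm` itself instead of building a copy.

    `refit_in_place` is a hypothetical helper for illustration. `compiled_gm`
    is assumed to be a torch.fx.GraphModule produced by torch_trt.dynamo.compile.
    """
    from torch_tensorrt.dynamo import refit_module_weights

    refit_module_weights(
        compiled_module=compiled_gm,
        new_weight_module=new_exported_program,
        arg_inputs=inputs,
        in_place=True,  # mutate compiled_gm rather than creating a new module
    )
    # The caller's existing reference now holds the new weights.
    return compiled_gm
```

This avoids holding two copies of the engine at once, at the cost of losing the module with the old weights.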
