Overloading Torch-TensorRT Converters with Custom Converters

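If for some reason you want to change how a specific PyTorch operation is converted to TensorRT, you can do so by writing a custom converter that overloads Torch-TensorRT's default behavior. This might be because you want to use a custom kernel instead of one of TensorRT's native kernels, or because you want to use a different TensorRT layer implementation from the one Torch-TensorRT would normally use.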

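In this tutorial, we demonstrate how to overload Torch-TensorRT's conversion of the torch.nn.functional.gelu operation to TensorRT with a custom converter that uses a different implementation of the GeLU layer.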

import logging
import sys

import torch
import torch_tensorrt

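GeLU has two modes in PyTorch: one uses the erf function and the other uses a tanh approximation. TensorRT natively supports both implementations as activation layers, but suppose we want to use a custom implementation of GeLU in tanh mode only.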

class GeLU(torch.nn.Module):
    def __init__(self, mode="tanh"):
        super().__init__()
        self.mode = mode

    def forward(self, x):
        return torch.nn.functional.gelu(x, approximate=self.mode)


my_mod = GeLU(mode="tanh").to("cuda").eval()
ex_input = torch.randn(2, 5).to("cuda")
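
To make the two modes concrete, the following is a minimal sketch in plain Python (standard library only, no PyTorch or TensorRT) of the two standard GeLU formulas. The function names gelu_erf and gelu_tanh are illustrative and not part of any library API. The erf variant corresponds to approximate="none" and the tanh variant to approximate="tanh".

```python
import math


def gelu_erf(x: float) -> float:
    """Exact GeLU: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Compare the two modes on a grid of sample points in [-4, 4]
samples = [i / 10.0 for i in range(-40, 41)]
max_gap = max(abs(gelu_erf(x) - gelu_tanh(x)) for x in samples)
print(f"largest gap between the two modes on [-4, 4]: {max_gap:.2e}")
```

The two modes are close but not identical, which is why the converter in this tutorial is restricted to the tanh case rather than applied to every GeLU node.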

As a baseline, we can compile our module with the standard Torch-TensorRT GeLU converter (in tanh approximation mode).

my_standard_gelu = torch_tensorrt.compile(
    my_mod, arg_inputs=(ex_input,), min_block_size=1
)
print(my_standard_gelu.graph)
print(my_standard_gelu(ex_input))

Writing a Custom Converter

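Converters are functions that take a specific instance of a PyTorch operation in a PyTorch graph and convert it into an equivalent set of TensorRT operations in the TensorRT graph under construction. They are registered with Torch-TensorRT using the @torch_tensorrt.dynamo.conversion.dynamo_tensorrt_converter decorator. At the code level, a converter takes the current conversion state (ConversionContext), the next operation in the graph to convert, and the arguments to that node. It returns the placeholder outputs for that operation and, as a side effect, inserts the necessary TensorRT layers into the TensorRT network.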

from typing import Dict, Sequence, Tuple, Union

import tensorrt as trt
from torch.fx.node import Argument, Node, Target
from torch_tensorrt.dynamo import CompilationSettings
from torch_tensorrt.dynamo.conversion import ConversionContext

Converter Metadata

@torch_tensorrt.dynamo.conversion.dynamo_tensorrt_converter(
    # The PyTorch operation to convert, when this operation is encountered, this converter will be called
    torch.ops.aten.gelu.default,
    # Validators are functions that determine that given a specific node, if it can be converted by the converter
    capability_validator=lambda node, settings: (
        "approximate" in node.kwargs and node.kwargs["approximate"] == "tanh"
    ),
    # Can this converter be used in cases where the input shapes are dynamic
    supports_dynamic_shapes=True,
    # Set the priority of the converter to supersede the default one
    priority=torch_tensorrt.dynamo.conversion.ConverterPriority.HIGH,
    # Whether the converter requires a dynamic output allocator to run (e.g. data dependent ops)
    requires_output_allocator=True,
)

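The decorator that defines a converter takes one required argument and several optional ones. Every converter needs a target operation to run against: whenever an instance of torch.ops.aten.gelu.default appears in the graph, this converter is called.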

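Following the target operation, you can provide additional metadata that defines the capabilities of the converter and its priority relative to other possible converters for the same target.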

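The primary tool for defining the capabilities of a converter is the capability_validator argument, a function (here a lambda) that takes a specific node in the graph along with the user's compilation settings and returns a boolean indicating whether the converter can be used for that node. This validator is run against each instance of the converter's target operation before the graph partitioning phase. Nodes for which no converter passes its validator at this phase are executed in PyTorch at runtime. This is useful when you want to use a custom converter only in specific cases, as in this example, where we want to use our converter only when approximate == "tanh".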

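Distinct from the validator is the supports_dynamic_shapes argument, a boolean indicating whether the converter can be used when the input shapes are dynamic. If it is set to False and the user-provided inputs are dynamic, this converter is disabled. If there is no alternative converter that supports dynamic shapes, the operation runs in PyTorch.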

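Finally there is the priority argument, a member of the torch_tensorrt.dynamo.conversion.ConverterPriority enum that defines the priority of the converter. The two options are HIGH and STANDARD. Converters registered with STANDARD are appended to the converter list for a given operation, while converters registered with HIGH are prepended to the list. Candidate converters are evaluated for applicability in this priority order, and the first converter that passes its validator is used.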
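
The selection rules described above can be pictured with a toy model. The sketch below is plain Python and is not Torch-TensorRT's actual registry: Priority, Entry, REGISTRY, register, and select are hypothetical names introduced only for illustration, and graph nodes are faked with SimpleNamespace. It assumes only what the text states: HIGH entries are prepended, STANDARD entries are appended, converters without dynamic-shape support are skipped for dynamic inputs, and the first candidate whose validator passes wins.

```python
from dataclasses import dataclass
from enum import Enum
from types import SimpleNamespace
from typing import Any, Callable, Dict, List, Optional


class Priority(Enum):  # hypothetical stand-in for ConverterPriority
    STANDARD = 0
    HIGH = 1


@dataclass
class Entry:  # hypothetical record for one registered converter
    name: str
    validator: Callable[[Any, Any], bool]
    supports_dynamic_shapes: bool


# Toy registry: target name -> ordered list of candidate converters
REGISTRY: Dict[str, List[Entry]] = {}


def register(target: str, entry: Entry, priority: Priority = Priority.STANDARD) -> None:
    candidates = REGISTRY.setdefault(target, [])
    if priority is Priority.HIGH:
        candidates.insert(0, entry)  # HIGH: front of the list
    else:
        candidates.append(entry)  # STANDARD: back of the list


def select(target: str, node: Any, settings: Any = None, dynamic: bool = False) -> Optional[str]:
    for entry in REGISTRY.get(target, []):
        if dynamic and not entry.supports_dynamic_shapes:
            continue  # disabled when inputs are dynamic
        if entry.validator(node, settings):
            return entry.name  # first passing candidate is used
    return None  # no converter applies: the op would run in PyTorch


register("gelu", Entry("standard_gelu", lambda node, settings: True, True))
register(
    "gelu",
    Entry(
        "custom_tanh_gelu",
        lambda node, settings: (
            "approximate" in node.kwargs and node.kwargs["approximate"] == "tanh"
        ),
        True,
    ),
    Priority.HIGH,
)

tanh_node = SimpleNamespace(kwargs={"approximate": "tanh"})
erf_node = SimpleNamespace(kwargs={"approximate": "none"})
bare_node = SimpleNamespace(kwargs={})

print(select("gelu", tanh_node))  # custom_tanh_gelu
print(select("gelu", erf_node))   # standard_gelu
print(select("gelu", bare_node))  # standard_gelu
```

In this model the custom entry is consulted first because of its HIGH priority, but it only claims nodes whose approximate keyword argument is "tanh". Every other GeLU node falls through to the standard entry.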

Converter Implementation

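The converter function itself takes the following arguments: the current conversion context, the target operation, the arguments to the target operation, the keyword arguments to the target operation, and the name of the target operation. Arguments can be Python primitives, torch.Tensor, np.ndarray, or ITensor objects. The converter function should return the outputs of the target operation, primarily as TensorRT ITensor objects. These inputs and outputs should correspond to the schema of the target PyTorch operation, which can be found here: https://pytorch.ac.cn/docs/stable/torch.compiler_ir.html.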

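Because Torch-TensorRT covers the core ATen opset, it has already abstracted many common low-level operations into helper functions that can be used to build up a TensorRT network. This lets developers avoid the boilerplate of creating TensorRT layers directly and focus on the high-level logic of the conversion. The helper functions live in the torch_tensorrt.dynamo.conversion.impl module and are designed to be composable and interoperable with raw TensorRT implementations. Here, we use the Torch-TensorRT mul, add, and tanh functions from impl to implement our alternative GeLU layer.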

def aten_ops_gelu(
    ctx: ConversionContext,
    target: Target,
    args: Tuple[Argument, ...],
    kwargs: Dict[str, Argument],
    name: str,
) -> Union[trt.ITensor, Sequence[trt.ITensor]]:
    # The schema for torch.ops.aten.gelu.default is gelu(Tensor self, *, str approximate='none') -> Tensor

    from torch_tensorrt.dynamo import SourceIR
    from torch_tensorrt.dynamo.conversion import impl

    # Cheap way to keep layer names unique
    op_count = 0

    def get_op_count():
        nonlocal op_count
        op_count += 1
        return op_count

    mul = lambda x, y: impl.elementwise.mul(
        ctx,
        target,
        name=f"mul_{get_op_count()}",
        source_ir=SourceIR.ATEN,
        lhs_val=x,
        rhs_val=y,
    )
    add = lambda x, y: impl.elementwise.add(
        ctx,
        target,
        name=f"add_{get_op_count()}",
        source_ir=SourceIR.ATEN,
        lhs_val=x,
        rhs_val=y,
    )
    tanh = lambda x: impl.activation.tanh(
        ctx, target, name=f"tanh_{get_op_count()}", source_ir=SourceIR.ATEN, input_val=x
    )

    # So we know that our custom converter is being run instead of the standard one
    print("\n\n---------------------------")
    print("Using custom GeLU converter")
    print("---------------------------\n\n")

    x_7 = mul(args[0], 0.5)
    x_8 = mul(args[0], 0.79788456080000003)
    x_9 = mul(args[0], 0.044714999999999998)
    x_10 = mul(x_9, args[0])
    x_11 = add(x_10, 1.0)
    x_12 = mul(x_8, x_11)
    x_13 = tanh(x_12)
    x_14 = add(x_13, 1.0)
    x_15 = mul(x_7, x_14)

    return x_15
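
As a sanity check on the arithmetic, the sequence of mul, add, and tanh steps in the converter can be replayed on plain Python floats and compared against the closed-form tanh GeLU. This is only a sketch of the math: it uses the standard library instead of TensorRT layers, and the function names are illustrative.

```python
import math


def gelu_tanh_closed_form(x: float) -> float:
    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))


def gelu_tanh_decomposed(x: float) -> float:
    # Same step order and intermediate names as the converter body
    x_7 = x * 0.5
    x_8 = x * 0.7978845608  # sqrt(2/pi), rounded
    x_9 = x * 0.044715
    x_10 = x_9 * x          # 0.044715 * x^2
    x_11 = x_10 + 1.0       # 1 + 0.044715 * x^2
    x_12 = x_8 * x_11       # sqrt(2/pi) * (x + 0.044715 * x^3)
    x_13 = math.tanh(x_12)
    x_14 = x_13 + 1.0
    return x_7 * x_14       # 0.5 * x * (1 + tanh(...))


for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    assert math.isclose(
        gelu_tanh_closed_form(x), gelu_tanh_decomposed(x), abs_tol=1e-9
    )
print("decomposition matches the closed-form tanh GeLU")
```

The factoring works because sqrt(2/pi) * x * (1 + 0.044715 * x^2) equals sqrt(2/pi) * (x + 0.044715 * x^3), so the cubic term never has to be computed explicitly.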

Using Our Custom Converter

We can now recompile and see that our custom converter is called to convert GeLU to TensorRT.

my_custom_gelu = torch_tensorrt.compile(
    my_mod, arg_inputs=(ex_input,), min_block_size=1
)
with torch.no_grad():
    print(my_custom_gelu.graph)
    print(my_custom_gelu(ex_input))

We can verify that our implementation matches TensorRT's implementation for the tanh approximation.

print(
    f"tanh approximations are close: {torch.allclose(my_standard_gelu(ex_input), my_custom_gelu(ex_input))}"
)

Finally, we want to verify that our custom converter is not used when the approximate argument is not set to tanh.

my_mod_erf = GeLU(mode="none").to("cuda").eval()
my_gelu_erf = torch_tensorrt.compile(
    my_mod_erf, arg_inputs=(ex_input,), min_block_size=1
)

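Notice that we do not see the print statements from our custom converter, which indicates that it was not used. However, looking at the graph, we can still see that a TensorRT engine was created to run the GeLU operation. In this case, the validator for our custom converter returned False, so the conversion system moved on to the next converter in the list, the standard GeLU converter, and used it to convert the operation.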

with torch.no_grad():
    print(my_gelu_erf.graph)
    print(my_gelu_erf(ex_input))
