Arm(R) Ethos(TM)-U NPU Backend

The Arm Ethos-U backend is the ExecuTorch solution for executing quantized models on Ethos-U55, Ethos-U65, and Ethos-U85 NPUs. It leverages the TOSA operator set, which the ethos-u-vela graph compiler can compile.

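Features

Wide operator support, so that the vast majority of a model can be delegated to the highly optimized, low-power Ethos-U NPU.
A quantizer that optimizes quantization for the NPU target.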

Target Requirements

The target system must include an Ethos-U NPU.

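Development Requirements

Compiling for the NPU requires the Ethos-U Vela compiler. Building the runtime also requires a target-specific toolchain. Finally, to test models, Arm provides freely available Fixed Virtual Platforms (FVPs), which emulate reference designs and allow code to run on the Ethos-U without a physical development board. For Ethos-U55 there is Corstone-300, and for Ethos-U85 there is Corstone-320.

These dependencies can be downloaded with the script examples/arm/setup.sh.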

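Using the Arm Ethos-U Backend

The example below demonstrates the lowering process of a MobileNet V2 model from torchvision for an Ethos-U55 target. Because the model is a floating-point model, it is first quantized with the EthosUQuantizer. Then an instance of the EthosUPartitioner is passed to to_edge_transform_and_lower. Both the quantizer and the partitioner need a compilation specification created with ArmCompileSpecBuilder.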

import torch
from executorch.backends.arm.arm_backend import ArmCompileSpecBuilder
from executorch.backends.arm.ethosu_partitioner import EthosUPartitioner
from executorch.backends.arm.quantizer.arm_quantizer import (
    EthosUQuantizer,
    get_symmetric_quantization_config,
)
from executorch.exir import (
    EdgeCompileConfig,
    ExecutorchBackendConfig,
    to_edge_transform_and_lower,
)
from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e
from torchvision.models import mobilenetv2
import executorch.kernels.quantized  # noqa: F401 (imported for its side effect of loading the quantized ops)

mobilenet_v2 = mobilenetv2.mobilenet_v2(
    weights=mobilenetv2.MobileNet_V2_Weights.DEFAULT
).eval()
example_inputs = (torch.randn(1, 3, 224, 224),)

compile_spec = ArmCompileSpecBuilder().ethosu_compile_spec(
    "ethos-u55-128",
    system_config="Ethos_U55_High_End_Embedded",
    memory_mode="Shared_Sram",
    extra_flags="--output-format=raw --debug-force-regor",
).build()

# Post training quantization
graph_module = torch.export.export_for_training(mobilenet_v2, example_inputs).module()
quantizer = EthosUQuantizer(compile_spec)
operator_config = get_symmetric_quantization_config(is_per_channel=False)
quantizer.set_global(operator_config)
graph_module = prepare_pt2e(graph_module, quantizer)
# Calibration pass; random input is used here for illustration only.
graph_module(*example_inputs)
graph_module = convert_pt2e(graph_module)
exported_program = torch.export.export_for_training(graph_module, example_inputs)

# Lower the exported program to the Ethos-U backend and save pte file.
edge_program_manager = to_edge_transform_and_lower(
    exported_program,
    partitioner=[EthosUPartitioner(compile_spec)],
    compile_config=EdgeCompileConfig(
        _check_ir_validity=False,
    ),
).to_executorch(config=ExecutorchBackendConfig(extract_delegate_segments=False))

with open("mv2_arm_ethos_u55.pte", "wb") as file:
    edge_program_manager.write_to_file(file)

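Partitioner API

EthosUPartitioner tries to partition as much of the model as possible. It never delegates unsupported operators, but a user can pass additional checks to the constructor to keep further operators out of the partition. To do this, subclass OperatorSupportBase and implement the is_node_supported function. A few such checks already exist in executorch.exir.backend.operator_support:

DontPartition: do not partition based on operator type.
DontPartitionModule: do not partition based on the Python module the operator comes from.
DontPartitionName: do not partition based on the operator name.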
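
As a rough illustration of the subclassing approach described above, the sketch below defines a hypothetical check, RejectByNamePrefix, that reports nodes whose name starts with a given prefix as unsupported. The class name, the node names, and the additional_checks keyword in the trailing comment are assumptions made for illustration and are not confirmed parts of the backend API. The fallback base class exists only so the sketch can be exercised without PyTorch installed.

```python
from types import SimpleNamespace

try:
    # The real base class is provided by torch.fx.
    from torch.fx.passes.operator_support import OperatorSupportBase
except ImportError:
    # Fallback so this sketch can run without PyTorch installed.
    OperatorSupportBase = object


class RejectByNamePrefix(OperatorSupportBase):
    """Hypothetical check: report nodes with a given name prefix as unsupported."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix

    def is_node_supported(self, submodules, node):
        # Returning False is assumed to keep the node out of the delegated partition.
        return not node.name.startswith(self.prefix)


check = RejectByNamePrefix("aten_softmax")

# Stand-in objects with a .name attribute; real calls receive torch.fx.Node instances.
rejected = check.is_node_supported({}, SimpleNamespace(name="aten_softmax_default"))
accepted = check.is_node_supported({}, SimpleNamespace(name="aten_convolution_default"))
print(rejected, accepted)

# Assumed usage (keyword name unverified):
# partitioner = EthosUPartitioner(compile_spec, additional_checks=[check])
```

A check of this shape only narrows what is delegated; operators the backend does not support stay on the CPU regardless of what the check returns.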

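Quantization

The Arm Ethos-U backend requires a fully integer model. As shown above, a floating-point model can be quantized with the EthosUQuantizer. Quantizers are backend specific, which means the EthosUQuantizer is configured to quantize models correctly for the target.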
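
To make "fully integer" more concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is generic textbook arithmetic under stated assumptions (a single scale, a zero point fixed at 0, values clamped to [-127, 127]) and does not reproduce what EthosUQuantizer does internally. All function names are hypothetical.

```python
def symmetric_scale(values, qmax=127):
    """Pick one scale so that the largest magnitude maps to qmax."""
    largest = max(abs(v) for v in values)
    return largest / qmax if largest > 0 else 1.0


def quantize(values, scale, qmin=-127, qmax=127):
    """Map floats to clamped integers; the zero point is 0 by construction."""
    return [max(qmin, min(qmax, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map integers back to approximate floats."""
    return [q * scale for q in q_values]


weights = [-1.27, -0.5, 0.0, 0.64, 1.27]
scale = symmetric_scale(weights)
q = quantize(weights, scale)
print(q)  # [-127, -50, 0, 64, 127]
print(dequantize(q, scale))
```

For values inside the representable range, the round-trip error is at most half a quantization step (scale / 2); values outside the range are clamped.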

Runtime Integration

To run the model on-device, build the executorch library and EthosUDelegate with the script executorch/backends/arm/scripts/build_executorch.sh. Then build the Arm executorch runtime with the script executorch/backends/arm/scripts/build_executor_runner.sh --pte=mv2_arm_ethos_u55.pte --target=ethos-u55-128.

Finally, run the elf file on an FVP with the script executorch/backends/arm/scripts/run_fvp.sh --elf=executorch/mv2_arm_ethos_u55/cmake-out/arm_executor_runner --target=ethos-u55-128.
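
The three steps above can be sketched as a small Python driver. The script paths and flags are taken from the text. The helper name build_commands and the choice to print the commands instead of running them are illustrative, and the sketch assumes it is invoked from the directory that contains the executorch checkout.

```python
import shlex


def build_commands(pte, elf, target):
    """Return the build and run steps as argv lists, in execution order."""
    scripts = "executorch/backends/arm/scripts"
    return [
        [f"{scripts}/build_executorch.sh"],
        [f"{scripts}/build_executor_runner.sh", f"--pte={pte}", f"--target={target}"],
        [f"{scripts}/run_fvp.sh", f"--elf={elf}", f"--target={target}"],
    ]


commands = build_commands(
    pte="mv2_arm_ethos_u55.pte",
    elf="executorch/mv2_arm_ethos_u55/cmake-out/arm_executor_runner",
    target="ethos-u55-128",
)
for argv in commands:
    # Dry run: print each step. To execute, call subprocess.run(argv, check=True).
    print(shlex.join(argv))
```

Keeping the target string in one place avoids building the runner for one NPU configuration and launching the FVP with another.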

See Also

Arm Ethos-U Backend Tutorial