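Core ML Backend

The Core ML delegate is the ExecuTorch solution for on-device machine learning using Apple's CoreML framework. With CoreML, a model can run on the CPU, the GPU, and the Apple Neural Engine (ANE).

Features

Dynamic dispatch to the CPU, GPU, and ANE.
Supports fp32 and fp16 computation.

Target Requirements

The minimum OS versions required to run a CoreML-delegated ExecuTorch model on each platform are: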

macOS >= 13.0
iOS >= 16.0
iPadOS >= 16.0
tvOS >= 16.0

Development Requirements

For development, you need:

macOS >= 13.0
Xcode >= 14.1

Before starting, make sure the Xcode Command Line Tools are installed:

xcode-select --install

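Using the CoreML Backend

To target the CoreML backend during export and lowering, pass an instance of CoreMLPartitioner to to_edge_transform_and_lower. The example below demonstrates this process with the MobileNet V2 model from torchvision.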

import torch
import torchvision.models as models
from torchvision.models.mobilenetv2 import MobileNet_V2_Weights
from executorch.backends.apple.coreml.partition import CoreMLPartitioner
from executorch.exir import to_edge_transform_and_lower

mobilenet_v2 = models.mobilenetv2.mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).eval()
sample_inputs = (torch.randn(1, 3, 224, 224), )

et_program = to_edge_transform_and_lower(
    torch.export.export(mobilenet_v2, sample_inputs),
    partitioner=[CoreMLPartitioner()],
).to_executorch()

with open("mv2_coreml.pte", "wb") as file:
    et_program.write_to_file(file)

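Partitioner API

The CoreML partitioner API lets you configure how a model is delegated to CoreML. Passing a CoreMLPartitioner instance with no additional arguments uses the default settings and runs as much of the model as possible on the CoreML backend. This is the most common use case. For advanced use cases, the partitioner exposes the following options through its constructor:

skip_ops_for_coreml_delegation: Lets you exclude specific ops from delegation to CoreML. By default, every op that CoreML supports is delegated. For an example of skipping ops from delegation, see here.
compile_specs: A list of CompileSpec objects for the CoreML backend. These control low-level details of CoreML delegation, such as the compute unit (CPU, GPU, ANE), the iOS deployment target, and the compute precision (FP16, FP32). They are discussed further below.
take_over_mutable_buffer: A boolean indicating whether PyTorch mutable buffers in stateful models should be converted to CoreML MLState. If set to False, mutable buffers in the PyTorch graph are converted under the hood to graph inputs and outputs of the CoreML-lowered module. Setting take_over_mutable_buffer to True generally gives better performance, but using MLState requires iOS >= 18.0, macOS >= 15.0, and Xcode >= 16.0.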
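The constructor options above can be combined as in the following minimal sketch. It assumes ExecuTorch is installed with the CoreML backend. The helper name build_partitioner and the op name "aten.mm.default" are illustrative assumptions that do not come from this page, so check the partitioner source for the exact op-name format it matches.

```python
def build_partitioner(skip_ops=None, use_mlstate=False):
    """Return a CoreMLPartitioner configured with the constructor options above.

    Hypothetical helper for illustration; it is not part of ExecuTorch.
    """
    # Imported lazily so this snippet can be loaded on machines without ExecuTorch.
    from executorch.backends.apple.coreml.partition import CoreMLPartitioner

    return CoreMLPartitioner(
        # Ops listed here are kept out of the CoreML delegate.
        skip_ops_for_coreml_delegation=list(skip_ops or []),
        # True maps mutable buffers to MLState
        # (requires iOS >= 18.0, macOS >= 15.0, Xcode >= 16.0).
        take_over_mutable_buffer=use_mlstate,
    )


# Example call (the op name is an assumed, illustrative value):
# partitioner = build_partitioner(skip_ops=["aten.mm.default"], use_mlstate=False)
```

The resulting object is passed to to_edge_transform_and_lower through the partitioner list, in the same way as the default CoreMLPartitioner() in the MobileNet V2 example.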

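CoreML CompileSpec

A list of CompileSpec objects is constructed with CoreMLBackend.generate_compile_specs. The available options are:

compute_unit: Controls the compute units (CPU, GPU, ANE) that CoreML uses. The default is coremltools.ComputeUnit.ALL. The options available from coremltools are:
- coremltools.ComputeUnit.ALL (uses the CPU, GPU, and ANE)
- coremltools.ComputeUnit.CPU_ONLY (uses the CPU only)
- coremltools.ComputeUnit.CPU_AND_GPU (uses both the CPU and GPU, but not the ANE)
- coremltools.ComputeUnit.CPU_AND_NE (uses both the CPU and ANE, but not the GPU)
minimum_deployment_target: The minimum iOS deployment target (for example, coremltools.target.iOS18). The default is coremltools.target.iOS15.
compute_precision: The compute precision used by CoreML (coremltools.precision.FLOAT16 or coremltools.precision.FLOAT32). The default is coremltools.precision.FLOAT16. Note that the compute precision is applied regardless of the dtype specified in the exported PyTorch model. For example, by default an FP32 PyTorch model is converted to FP16 when it is delegated to the CoreML backend. Also note that the ANE supports only FP16 precision.
model_type: Whether the model should be compiled to the CoreML mlmodelc format when the .pte is created (CoreMLBackend.MODEL_TYPE.COMPILED_MODEL), or compiled to mlmodelc on the device (CoreMLBackend.MODEL_TYPE.MODEL). Compiling ahead of time with CoreMLBackend.MODEL_TYPE.COMPILED_MODEL should shorten the first model load on the device.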
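As an illustration of the options above, the sketch below builds compile specs that keep FP32 numerics on the CPU and GPU and compile the model ahead of time. It assumes coremltools and ExecuTorch are installed. The helper name build_fp32_gpu_partitioner and the particular combination of values are choices made for this example, not recommendations from the page.

```python
def build_fp32_gpu_partitioner():
    """Build a CoreMLPartitioner that runs in FP32 on the CPU and GPU only.

    Hypothetical helper for illustration; it is not part of ExecuTorch.
    """
    # Imported lazily so this snippet can be loaded on machines without these packages.
    import coremltools as ct
    from executorch.backends.apple.coreml.compiler import CoreMLBackend
    from executorch.backends.apple.coreml.partition import CoreMLPartitioner

    compile_specs = CoreMLBackend.generate_compile_specs(
        # Skip the ANE, which supports only FP16.
        compute_unit=ct.ComputeUnit.CPU_AND_GPU,
        # Example target; the default is ct.target.iOS15.
        minimum_deployment_target=ct.target.iOS17,
        # Override the FP16 default to keep FP32 numerics.
        compute_precision=ct.precision.FLOAT32,
        # Compile to mlmodelc when the .pte is created to shorten the first load.
        model_type=CoreMLBackend.MODEL_TYPE.COMPILED_MODEL,
    )
    return CoreMLPartitioner(compile_specs=compile_specs)
```

Because the ANE supports only FP16, a configuration that targets CPU_AND_NE would normally keep the default FLOAT16 precision instead.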

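Testing the Model

After generating a CoreML-delegated .pte, the model can be tested from Python using the ExecuTorch runtime Python bindings. This is useful for quickly sanity-checking the model and evaluating numerical accuracy. For more information, see Testing the Model.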
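A minimal sketch of such a check is shown here. It assumes the installed ExecuTorch Python bindings expose executorch.runtime.Runtime and were built with the CoreML backend, and that the program has a method named "forward". The helper name run_pte is hypothetical.

```python
def run_pte(pte_path, inputs, method_name="forward"):
    """Load a .pte file and run one method on the given list of input tensors.

    Hypothetical helper for illustration; it is not part of ExecuTorch.
    """
    # Imported lazily so this snippet can be loaded on machines without ExecuTorch.
    from executorch.runtime import Runtime

    runtime = Runtime.get()
    program = runtime.load_program(pte_path)
    method = program.load_method(method_name)
    # execute() takes a sequence of inputs and returns a sequence of outputs.
    return method.execute(list(inputs))


# Example call, assuming mv2_coreml.pte was written by the export example:
# outputs = run_pte("mv2_coreml.pte", [torch.randn(1, 3, 224, 224)])
```

Comparing these outputs with the eager PyTorch model on the same inputs gives a quick estimate of the numerical difference introduced by delegation, including the default FP16 conversion.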

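Quantization

To quantize a PyTorch model for the CoreML backend, use the CoreMLQuantizer.

8-bit Quantization Using the PT2E Flow

Quantization with the CoreML backend requires exporting the model for iOS 17 or later. To perform 8-bit quantization with the PT2E flow, follow these steps:

Create a coremltools.optimize.torch.quantization.LinearQuantizerConfig and use it to create an instance of CoreMLQuantizer.
Use torch.export.export_for_training to export a graph module that will be prepared for quantization.
Call prepare_pt2e to prepare the model for quantization.
Run the prepared model on representative samples to calibrate the activation ranges of the quantized tensors.
Call convert_pt2e to quantize the model.
Export and lower the model using the standard flow.

The output of convert_pt2e is a PyTorch model that can be exported and lowered using the normal flow. Because it is a regular PyTorch model, standard PyTorch techniques can also be used to evaluate the accuracy of the quantized model.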

import torch
import coremltools as ct
import torchvision.models as models
from torchvision.models.mobilenetv2 import MobileNet_V2_Weights
from executorch.backends.apple.coreml.quantizer import CoreMLQuantizer
from executorch.backends.apple.coreml.partition import CoreMLPartitioner
from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e
from executorch.exir import to_edge_transform_and_lower
from executorch.backends.apple.coreml.compiler import CoreMLBackend

mobilenet_v2 = models.mobilenetv2.mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).eval()
sample_inputs = (torch.randn(1, 3, 224, 224), )

# Step 1: Define a LinearQuantizerConfig and create an instance of a CoreMLQuantizer
# Note that "linear" here does not mean only linear layers are quantized, but that linear (aka affine) quantization
# is being performed
static_8bit_config = ct.optimize.torch.quantization.LinearQuantizerConfig(
    global_config=ct.optimize.torch.quantization.ModuleLinearQuantizerConfig(
        quantization_scheme="symmetric",
        activation_dtype=torch.quint8,
        weight_dtype=torch.qint8,
        weight_per_channel=True,
    )
)
quantizer = CoreMLQuantizer(static_8bit_config)

# Step 2: Export the model for training
training_gm = torch.export.export_for_training(mobilenet_v2, sample_inputs).module()

# Step 3: Prepare the model for quantization
prepared_model = prepare_pt2e(training_gm, quantizer)

# Step 4: Calibrate the model on representative data
# Replace with your own calibration data
for calibration_sample in [torch.randn(1, 3, 224, 224)]:
    prepared_model(calibration_sample)

# Step 5: Convert the calibrated model to a quantized model
quantized_model = convert_pt2e(prepared_model)

# Step 6: Export the quantized model to CoreML
et_program = to_edge_transform_and_lower(
    torch.export.export(quantized_model, sample_inputs),
    partitioner=[
        CoreMLPartitioner(
            # iOS17 is required for the quantized ops in this example
            compile_specs=CoreMLBackend.generate_compile_specs(
                minimum_deployment_target=ct.target.iOS17
            )
        )
    ],
).to_executorch()

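The example above performs static quantization (both activations and weights are quantized).

A full description of the available quantization configurations can be found in the coremltools documentation. For example, the configuration below performs weight-only quantization: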

weight_only_8bit_config = ct.optimize.torch.quantization.LinearQuantizerConfig(
    global_config=ct.optimize.torch.quantization.ModuleLinearQuantizerConfig(
        quantization_scheme="symmetric",
        activation_dtype=torch.float32,
        weight_dtype=torch.qint8,
        weight_per_channel=True,
    )
)
quantizer = CoreMLQuantizer(weight_only_8bit_config)

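Quantizing activations requires calibrating the model on representative data. Also note that PT2E currently requires at least one calibration sample to be passed before convert_pt2e is called, even for data-free quantization.

For more information, see PyTorch 2 Export Post Training Quantization.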
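Because the output of convert_pt2e is a regular PyTorch model, its outputs can be compared directly with those of the float model. The sketch below uses the signal-to-quantization-noise ratio (SQNR) as one possible metric. SQNR is a choice made for this example rather than something the page prescribes, and the function name sqnr_db is hypothetical. It works on flat sequences of floats so that it has no dependencies beyond the standard library.

```python
import math


def sqnr_db(reference, candidate):
    """Signal-to-quantization-noise ratio in dB between two equal-length float sequences."""
    reference = list(reference)
    candidate = list(candidate)
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must have the same length")
    # Power of the reference signal and of the error introduced by quantization.
    signal = sum(r * r for r in reference)
    noise = sum((r - c) ** 2 for r, c in zip(reference, candidate))
    if noise == 0.0:
        return math.inf  # outputs are identical
    if signal == 0.0:
        return -math.inf  # all-zero reference with a nonzero error
    return 10.0 * math.log10(signal / noise)
```

With the models from the quantization example, a call could look like sqnr_db(mobilenet_v2(x).flatten().tolist(), quantized_model(x).flatten().tolist()) for an input tensor x. Higher values mean the quantized outputs are closer to the float outputs.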

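Runtime Integration

To run the model on a device, use the standard ExecuTorch runtime APIs. For more information, including how to build the iOS frameworks, see Running on Device.

When building from source, pass -DEXECUTORCH_BUILD_COREML=ON when configuring the CMake build to compile the CoreML backend.

Because registration uses static initializers, linking to the coremldelegate target may require whole-archive. This can usually be done by passing "$<LINK_LIBRARY:WHOLE_ARCHIVE,coremldelegate>" to target_link_libraries.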

# CMakeLists.txt
add_subdirectory("executorch")
...
target_link_libraries(
    my_target
    PRIVATE executorch
    extension_module_static
    extension_tensor
    optimized_native_cpu_ops_lib
    $<LINK_LIBRARY:WHOLE_ARCHIVE,coremldelegate>)

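Beyond linking the target, no additional steps are needed to use the backend. A CoreML-delegated .pte file will automatically run on the registered backend.

Advanced

Extracting the mlpackage

CoreML *.mlpackage files can be extracted from a CoreML-delegated *.pte file. This can help with debugging and profiling for users who are more familiar with *.mlpackage files.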

python examples/apple/coreml/scripts/extract_coreml_models.py -m /path/to/model.pte

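Note that multiple *.mlpackage files may be extracted if the ExecuTorch model contains graph breaks.

Common Issues and What to Do

During Lowering

"ValueError: In op, of type [X], named [Y], the named input [Z] must have the same data type as the named input x. However, [Z] has dtype fp32 whereas x has dtype fp16."

This happens because the model is in FP16, but CoreML interprets some of the arguments as FP32, which leads to a type mismatch. The solution is to keep the PyTorch model in FP32. Note that the model will still be converted to FP16 when it is lowered to CoreML unless specified otherwise in the compute_precision CoreML CompileSpec. See also the related issue in coremltools.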
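One way to apply this fix is sketched below, assuming the model and its floating-point sample inputs can simply be cast to FP32 before export. The helper name lower_as_fp32 is hypothetical, and the default CoreMLPartitioner() is used, so CoreML still computes in FP16 unless compute_precision is overridden.

```python
def lower_as_fp32(model, sample_inputs):
    """Cast a model and its float inputs to FP32, then lower it to the CoreML backend.

    Hypothetical helper for illustration; it is not part of ExecuTorch.
    """
    # Imported lazily so this snippet can be loaded on machines without these packages.
    import torch
    from executorch.backends.apple.coreml.partition import CoreMLPartitioner
    from executorch.exir import to_edge_transform_and_lower

    # Keep the PyTorch side in FP32 to avoid the fp16/fp32 mismatch during conversion.
    model = model.to(torch.float32).eval()
    sample_inputs = tuple(
        t.to(torch.float32) if torch.is_tensor(t) and t.is_floating_point() else t
        for t in sample_inputs
    )
    return to_edge_transform_and_lower(
        torch.export.export(model, sample_inputs),
        partitioner=[CoreMLPartitioner()],
    ).to_executorch()
```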

coremltools/converters/mil/backend/mil/load.py", line 499, in export raise RuntimeError("BlobWriter not loaded")

If you are using Python 3.13, try downgrading to Python 3.12. According to coremltools issue #2487, coremltools does not support Python 3.13.
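A small guard at the top of an export script can surface this problem before lowering starts. The sketch below only encodes the restriction described above (Python 3.13 is unsupported). The function name is hypothetical, and later coremltools releases may lift the restriction, so treat the version bound as an assumption to revisit.

```python
import sys


def coremltools_python_supported(version_info=sys.version_info):
    """Return True if the interpreter is older than Python 3.13."""
    return tuple(version_info[:2]) < (3, 13)


if not coremltools_python_supported():
    # Warn early instead of failing later during lowering.
    print("Python 3.13+ detected; coremltools may fail with 'BlobWriter not loaded'.")
```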

At Runtime

[ETCoreMLModelCompiler.mm:55] [Core ML] Failed to compile model, error = Error Domain=com.apple.mlassetio Code=1 "Failed to parse the model specification. Error: Unable to parse ML Program: at unknown location: Unknown opset 'CoreML7'." UserInfo={NSLocalizedDescription=Failed to par$

This means the model requires the CoreML opset 'CoreML7', which requires running the model on iOS >= 17 or macOS >= 14.
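The requirement can be written down as a small lookup, which is handy when deciding whether a given device can load a model. This is a sketch with hypothetical names. Only the CoreML7 row is taken from the text above, and rows for other opsets would need to be filled in from the coremltools documentation.

```python
# Minimum OS major version needed to load a model that uses a given CoreML opset.
# Only the CoreML7 entry comes from this page; other opsets are intentionally omitted.
MIN_OS_FOR_OPSET = {
    "CoreML7": {"iOS": 17, "macOS": 14},
}


def can_run_opset(opset, platform, os_major):
    """Return True if the given OS major version can load models that use the opset."""
    minimums = MIN_OS_FOR_OPSET.get(opset)
    if minimums is None or platform not in minimums:
        raise KeyError(f"no minimum OS recorded for {opset} on {platform}")
    return os_major >= minimums[platform]
```

If the target devices cannot be updated, the alternative is to export with a lower minimum_deployment_target, provided the model does not rely on ops that need the newer opset.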