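Benchmarking API Guide

This tutorial walks you through the TorchAO benchmarking framework and shows how to integrate new APIs with the framework and its dashboard.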

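1. Add an API to benchmarking recipes
2. Add a model architecture to benchmarking recipes
3. Add an HF model to benchmarking recipes
4. Add an API to the micro-benchmarking CI dashboard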

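Add an API to Benchmarking Recipes

The framework currently supports quantization and sparsity recipes, which are applied with the quantize_() or sparsify_() functions.

To add a new recipe, add the corresponding string configuration to the string_to_config() function in benchmarks/microbenchmarks/utils.py: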

def string_to_config(
    quantization: Optional[str], sparsity: Optional[str], **kwargs
) -> AOBaseConfig:
    # ... existing code ...

    elif quantization == "my_new_quantization":
        # If additional information needs to be passed as kwargs, process it here
        return MyNewQuantizationConfig(**kwargs)
    elif sparsity == "my_new_sparsity":
        return MyNewSparsityConfig(**kwargs)

    # ... rest of existing code ...

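The new recipe can now be used throughout the benchmarking framework.

Note: If the AOBaseConfig takes input parameters such as bit width or group size, you can append them to the string configuration. For example, for GemliteUIntXWeightOnlyConfig, the bit width and group size are passed as gemlitewo-<bit_width>-<group_size>.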
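As an illustration of the parameterized string format described in the note, here is a minimal sketch of how such a string could be split into its parts. `ParsedRecipe` and `parse_recipe` are hypothetical names, not part of TorchAO, and the sketch only handles numeric suffixes; recipe names with non-numeric suffixes (for example float8dq-tensor) would need separate handling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedRecipe:
    """Hypothetical container for a parsed recipe string (not a TorchAO type)."""

    name: str
    bit_width: Optional[int] = None
    group_size: Optional[int] = None


def parse_recipe(recipe: str) -> ParsedRecipe:
    """Split '<name>-<bit_width>-<group_size>' into its parts.

    Both numeric fields are optional, so 'gemlitewo' and 'gemlitewo-4' also parse.
    """
    name, *params = recipe.lower().split("-")
    if len(params) > 2:
        raise ValueError(f"too many parameters in recipe string: {recipe!r}")
    values = [int(p) for p in params]  # raises ValueError on non-numeric fields
    bit_width = values[0] if len(values) > 0 else None
    group_size = values[1] if len(values) > 1 else None
    return ParsedRecipe(name, bit_width, group_size)


print(parse_recipe("gemlitewo-4-64"))
```

Inside string_to_config(), values parsed this way could then be forwarded to the config constructor; the exact keyword names depend on the config class in question.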

Add a Model Architecture to Benchmarking Recipes

To add a new model architecture to the benchmarking system, modify torchao/testing/model_architectures.py.

First, define your model class in that file:

class MyCustomModel(torch.nn.Module):
    def __init__(self, input_dim, output_dim, dtype=torch.bfloat16):
        super().__init__()
        # Define your model architecture
        self.layer1 = torch.nn.Linear(input_dim, 512, bias=False).to(dtype)
        self.activation = torch.nn.ReLU()
        self.layer2 = torch.nn.Linear(512, output_dim, bias=False).to(dtype)

    def forward(self, x):
        x = self.layer1(x)
        x = self.activation(x)
        x = self.layer2(x)
        return x

Next, update the create_model_and_input_data function to handle your new model type:

def create_model_and_input_data(
    model_type: str,
    m: int,
    k: int,
    n: int,
    high_precision_dtype: torch.dtype = torch.bfloat16,
    device: str = "cuda",
    activation: str = "relu",
):
    # ... existing code ...

    elif model_type == "my_custom_model":
        model = MyCustomModel(k, n, high_precision_dtype).to(device)
        input_data = torch.randn(m, k, device=device, dtype=high_precision_dtype)

    # ... rest of existing code ...

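Model Design Considerations

When adding a new model, keep the following in mind:

- Input/output dimensions: make sure your model follows the (m, k, n) dimension convention, where
  - m: batch size or sequence length
  - k: input feature dimension
  - n: output feature dimension
- Data types: support the high_precision_dtype parameter (typically torch.bfloat16)
- Device compatibility: make sure your model works on CUDA, CPU, and any other target devices
- Quantization compatibility: design your model so that it is compatible with TorchAO quantization methods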
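The dimension and dtype points above can be checked with a quick smoke test before wiring a model into the framework. The following is a minimal sketch, not part of TorchAO: `check_mkn_convention` is a hypothetical helper, the model is a copy of the MyCustomModel example from this guide, and the check runs on CPU in float32 so that it does not depend on a GPU.

```python
import torch


class MyCustomModel(torch.nn.Module):
    """Copy of the example model from this guide, repeated so the sketch is self-contained."""

    def __init__(self, input_dim, output_dim, dtype=torch.bfloat16):
        super().__init__()
        self.layer1 = torch.nn.Linear(input_dim, 512, bias=False).to(dtype)
        self.activation = torch.nn.ReLU()
        self.layer2 = torch.nn.Linear(512, output_dim, bias=False).to(dtype)

    def forward(self, x):
        return self.layer2(self.activation(self.layer1(x)))


def check_mkn_convention(model, m, k, n, dtype, device="cpu"):
    """Hypothetical helper: feed an (m, k) input and verify an (m, n) output."""
    model = model.to(device)
    x = torch.randn(m, k, device=device, dtype=dtype)
    with torch.no_grad():
        out = model(x)
    assert out.shape == (m, n), f"expected {(m, n)}, got {tuple(out.shape)}"
    assert out.dtype == dtype, f"expected {dtype}, got {out.dtype}"
    return out


out = check_mkn_convention(
    MyCustomModel(64, 32, dtype=torch.float32), m=8, k=64, n=32, dtype=torch.float32
)
print(tuple(out.shape))
```

Repeating the call with the dtype and device you intend to benchmark (for example torch.bfloat16 on "cuda") covers the data type and device points as well.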

Add an HF Model to Benchmarking Recipes

(Coming soon!)

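Add an API to the Micro-benchmarking CI Dashboard

To integrate your API with the CI dashboard, follow the steps below.

1. Modify the Existing CI Configuration

Add your quantization method to the existing CI configuration file, benchmarks/dashboard/microbenchmark_quantization_config.yml: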

# benchmarks/dashboard/microbenchmark_quantization_config.yml
benchmark_mode: "inference"
quantization_config_recipe_names:
  - "int8wo"
  - "int8dq"
  - "float8dq-tensor"
  - "float8dq-row"
  - "float8wo"
  - "my_new_quantization"  # Add your method here

output_dir: "benchmarks/microbenchmarks/results"

model_params:
  - name: "small_bf16_linear"
    matrix_shapes:
      - name: "small_sweep"
        min_power: 10
        max_power: 15
    high_precision_dtype: "torch.bfloat16"
    use_torch_compile: true
    torch_compile_mode: "max-autotune"
    device: "cuda"
    model_type: "linear"

2. Run the CI Benchmarks

Use the CI runner to generate results in the PyTorch OSS benchmark database format:

python benchmarks/dashboard/ci_microbenchmark_runner.py \
    --config benchmarks/dashboard/microbenchmark_quantization_config.yml \
    --output benchmark_results.json

3. CI Output Format

The CI runner writes results in the specific JSON format required by the PyTorch OSS benchmark database:

[
  {
    "benchmark": {
      "name": "micro-benchmark api",
      "mode": "inference",
      "dtype": "int8wo",
      "extra_info": {
        "device": "cuda",
        "arch": "NVIDIA A100-SXM4-80GB"
      }
    },
    "model": {
      "name": "1024-1024-1024",
      "type": "micro-benchmark custom layer",
      "origins": ["torchao"]
    },
    "metric": {
      "name": "speedup(wrt bf16)",
      "benchmark_values": [1.25],
      "target_value": 0.0
    },
    "runners": [],
    "dependencies": {}
  }
]
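To make the record structure above concrete, here is a minimal sketch that assembles one entry with the same fields using only the standard library. `make_record` is a hypothetical helper, not the CI runner's actual code, and it assumes that the model name encodes the matrix shape as m-k-n.

```python
import json


def make_record(recipe, shape, speedup, device="cuda", arch="unknown"):
    """Hypothetical helper: build one result entry shaped like the sample above."""
    m, k, n = shape  # assumption: the model name is "<m>-<k>-<n>"
    return {
        "benchmark": {
            "name": "micro-benchmark api",
            "mode": "inference",
            "dtype": recipe,  # the recipe string, e.g. "int8wo"
            "extra_info": {"device": device, "arch": arch},
        },
        "model": {
            "name": f"{m}-{k}-{n}",
            "type": "micro-benchmark custom layer",
            "origins": ["torchao"],
        },
        "metric": {
            "name": "speedup(wrt bf16)",
            "benchmark_values": [speedup],
            "target_value": 0.0,
        },
        "runners": [],
        "dependencies": {},
    }


records = [
    make_record("int8wo", (1024, 1024, 1024), 1.25, arch="NVIDIA A100-SXM4-80GB")
]
print(json.dumps(records, indent=2))
```

Note that the "dtype" field carries the recipe name rather than a tensor dtype, as in the sample output.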

4. Integrate with Your CI Pipeline

To integrate with your CI pipeline, add the benchmark step to your workflow:

# Example GitHub Actions step
- name: Run Microbenchmarks
  run: |
    python benchmarks/dashboard/ci_microbenchmark_runner.py \
      --config benchmarks/dashboard/microbenchmark_quantization_config.yml \
      --output benchmark_results.json

- name: Upload Results
  # Upload benchmark_results.json to your dashboard system
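Between the benchmark run and the upload, a pipeline may want to inspect the results file. The sketch below is one possible post-processing step, not part of the TorchAO tooling: `find_slow_entries` is a hypothetical helper, the threshold is an arbitrary choice, and the demo uses a hand-written file with made-up values in place of a real benchmark_results.json.

```python
import json
import tempfile
from pathlib import Path


def find_slow_entries(results_path, min_speedup=1.0):
    """Hypothetical helper: list (recipe, shape, speedup) entries below a threshold."""
    records = json.loads(Path(results_path).read_text())
    slow = []
    for record in records:
        metric = record["metric"]
        if not metric["name"].startswith("speedup"):
            continue  # ignore metrics other than speedup
        values = metric["benchmark_values"]
        if values and min(values) < min_speedup:
            slow.append(
                (record["benchmark"]["dtype"], record["model"]["name"], min(values))
            )
    return slow


# Demo with a small hand-written results file (values are made up).
sample = [
    {
        "benchmark": {"dtype": "int8wo"},
        "model": {"name": "1024-1024-1024"},
        "metric": {"name": "speedup(wrt bf16)", "benchmark_values": [1.25]},
    },
    {
        "benchmark": {"dtype": "float8wo"},
        "model": {"name": "2048-2048-2048"},
        "metric": {"name": "speedup(wrt bf16)", "benchmark_values": [0.9]},
    },
]
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "benchmark_results.json"
    path.write_text(json.dumps(sample))
    slow = find_slow_entries(path, min_speedup=1.0)

print(slow)
```

A workflow step could call such a script and exit with a non-zero status when the list is non-empty, if failing the build on slowdowns is desired.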

Troubleshooting

Running Tests

To verify your setup, run the test suite:

python -m unittest discover benchmarks/microbenchmarks/test

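Common Issues

- CUDA out of memory: reduce the batch size or matrix dimensions
- Compilation errors: set use_torch_compile: false while debugging
- Missing quantization methods: make sure TorchAO is installed correctly
- Device not available: check device availability and drivers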
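For the device availability issue, a small guard can make the failure mode explicit before a benchmark starts. This is a minimal sketch assuming a fallback to CPU is acceptable; `resolve_device` is a hypothetical helper and not part of the benchmarking framework.

```python
import torch


def resolve_device(requested: str = "cuda") -> str:
    """Hypothetical helper: return `requested` if usable, otherwise fall back to "cpu"."""
    if requested.startswith("cuda") and not torch.cuda.is_available():
        print(f"{requested!r} requested but CUDA is not available; falling back to 'cpu'")
        return "cpu"
    return requested


device = resolve_device("cuda")
print(device)
```

Keep in mind that CPU timings are not comparable with GPU results, so a fallback like this is mainly useful for checking that a configuration runs at all.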

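Best Practices

- Use small_sweep for basic testing and custom shapes for comprehensive or model-specific analysis
- Enable profiling only when needed (it adds overhead)
- Test on multiple devices when possible
- Use consistent naming conventions for reproducibility

For information on different benchmarking use cases, see the Benchmarking User Guide.

For more details about the framework components, see the README file in the benchmarks/microbenchmarks/ directory.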