Accelerator Integration

Created On: Sep 02, 2025 | Last Updated On: Sep 02, 2025

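Since PyTorch 2.1, the community has made significant progress in streamlining the integration of new accelerators into the PyTorch ecosystem. These improvements include, but are not limited to: refinements to the PrivateUse1 Dispatch Key, the introduction and enhancement of extension mechanisms for core subsystems, and the device-agnostic refactoring of key modules (e.g., torch.accelerator, memory management). Taken together, these advances lay the foundation for a robust, flexible, and developer-friendly pathway for accelerator integration.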

Why Does This Matter?

This integration pathway offers several major benefits:

Speed: Extensibility is built into all core PyTorch modules. Developers can integrate new accelerators into their downstream codebases independently, without modifying upstream code and without being limited by community review bandwidth.
Future-proofing: This is the default integration path for all future PyTorch features, meaning that as new modules and features are added, they will automatically support scaling to new accelerators if this path is followed.
Autonomy: Vendors maintain full control over their accelerator integration timelines, enabling fast iteration cycles and reducing reliance on upstream coordination.

About This Document

This guide provides a comprehensive overview of the modern integration pathway for new accelerators in PyTorch. It walks through the full integration surface, from low-level device primitives to higher-level domain modules such as compilation and quantization. The structure is modular and scenario-driven: each topic is paired with corresponding code examples from torch_openreg, an official reference implementation.

The goal is to help developers:

Understand the full scope of accelerator integration;
Follow best practices to quickly launch new accelerators;
Avoid common pitfalls through clear, targeted examples.

Target Audience

This document is intended for:

Accelerator Developers who are integrating an accelerator into PyTorch;
Advanced PyTorch Users interested in the inner workings of key modules.

Quick Overview

This document outlines the key processes and practical scenarios involved in integrating new devices into PyTorch, giving developers a detailed guide for bringing up new backends. The discussion is structured around four major axes:

Runtime: Covers core components such as Event, Stream, Memory, Generator, Guard, and Hooks, as well as the supporting C++ scaffolding.
Operators: Covers the minimum necessary set of operators, forward and backward operators, fallback operators, fallthroughs, STUBs, and so on, in both C++ and Python implementations.
Python Frontend: Focuses on Python bindings for modules and device-agnostic APIs.
High-level Modules: Explores integration with major subsystems such as AMP, Compiler, ONNX, and Distributed.
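
To make the "device-agnostic APIs" item above more concrete, the following is a minimal sketch of what user code looks like once a backend is reachable through torch.accelerator. It is an illustration under stated assumptions, not part of torch_openreg: pick_device is a hypothetical helper introduced here, and the sketch assumes a PyTorch release recent enough to ship torch.accelerator (it falls back to the CPU otherwise).

```python
import torch


def pick_device() -> torch.device:
    """Hypothetical helper: prefer the current accelerator, else use the CPU."""
    # torch.accelerator only exists in recent PyTorch releases,
    # so look it up defensively instead of assuming it is present.
    accelerator = getattr(torch, "accelerator", None)
    if accelerator is not None and accelerator.is_available():
        return accelerator.current_accelerator()
    return torch.device("cpu")


device = pick_device()

# The same tensor code runs unchanged on whichever backend was selected.
x = torch.ones(2, 3, device=device)
y = (x + x).cpu()  # copy the result back to host memory for inspection
print(device.type, y.sum().item())
```

The point of the sketch is that the calling code never names a specific backend; a newly integrated accelerator that plugs into the same device-agnostic entry points would be picked up without changes on the user side.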

Next, we will officially embark on the integration journey for a new PyTorch accelerator.

Note

This guide is a work in progress. For more details, please refer to the roadmap.
