Compile Time Caching in torch.compile

Created On: Jun 20, 2024 | Last Updated: Jun 24, 2025 | Last Verified: Nov 05, 2024

Author: Oguz Ulgen

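Introduction

PyTorch Compiler provides several caching offerings to reduce compilation latency. This recipe explains these offerings in detail to help users pick the best option for their use case.

For how to configure these caches, see Compile Time Caching Configuration.

Also check out our caching benchmark at PT CacheBench Benchmarks.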

Prerequisites

Before starting this recipe, make sure that you have the following:

- A basic understanding of torch.compile.
- PyTorch 2.4 or later.

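Caching Offerings

torch.compile provides the following caching offerings:

- End-to-end caching (also known as Mega-Cache)
- Modular caching of TorchDynamo, TorchInductor, and Triton

Note that caching validates that the cache artifacts are used with the same PyTorch and Triton versions, and with the same GPU when the device is set to cuda.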
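Because artifacts are only valid for the same PyTorch version, Triton version, and GPU, it can help to key any artifacts you persist yourself on those same values. The following is a minimal sketch under that assumption. `artifact_key` is a hypothetical helper, not part of PyTorch, and it does not reproduce how the compiler computes its internal cache keys.

```python
import hashlib


def artifact_key(torch_version: str, triton_version: str, gpu_name: str) -> str:
    """Build a storage key from the properties the cache validates (hypothetical helper)."""
    # Join the fields with a newline so that adjacent fields cannot blur together.
    fingerprint = "\n".join([torch_version, triton_version, gpu_name])
    return hashlib.sha256(fingerprint.encode("utf-8")).hexdigest()


# Example values only. In practice they could come from torch.__version__,
# triton.__version__, and torch.cuda.get_device_name().
key_a = artifact_key("2.4.0", "3.0.0", "NVIDIA A100")
key_b = artifact_key("2.5.0", "3.0.0", "NVIDIA A100")
```

With a key like this, artifacts saved under one PyTorch version are never looked up from an environment running a different one.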

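End-to-end caching of `torch.compile` (`Mega-Cache`)

End-to-end caching, from here onward referred to as Mega-Cache, is the ideal solution for users looking for a portable caching solution that can be stored in a database and later fetched, possibly on a separate machine.

Mega-Cache provides two compiler APIs:

- torch.compiler.save_cache_artifacts()
- torch.compiler.load_cache_artifacts()

The intended use case is that, after compiling and executing a model, the user calls torch.compiler.save_cache_artifacts(), which returns the compiler artifacts in a portable form. Later, potentially on a different machine, the user calls torch.compiler.load_cache_artifacts() with these artifacts to pre-populate the torch.compile caches and jump-start compilation.

Consider the following example. First, compile the function and save the cache artifacts.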

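import torch

# Example device and dtype; adjust for your setup.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float32

@torch.compile
def fn(x, y):
    return x.sin() @ y

a = torch.rand(100, 100, dtype=dtype, device=device)
b = torch.rand(100, 100, dtype=dtype, device=device)

# The first call triggers compilation and populates the caches.
result = fn(a, b)

# Collect the compiler artifacts produced so far in a portable form.
artifacts = torch.compiler.save_cache_artifacts()

assert artifacts is not None
artifact_bytes, cache_info = artifacts

# Now, potentially store artifact_bytes in a database
# You can use cache_info for logging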

Later, you can jump-start the cache as follows:

# Potentially download/fetch the artifacts from the database
torch.compiler.load_cache_artifacts(artifact_bytes)

This operation populates all the modular caches discussed in the next section, including PGO, AOTAutograd, Inductor, Triton, and Autotuning.
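The "store in a database, fetch later" step can be sketched with a plain file standing in for the database. `store_artifacts` and `fetch_artifacts` are hypothetical helpers introduced only for illustration, and the bytes used here are a placeholder for the value returned by torch.compiler.save_cache_artifacts().

```python
import tempfile
from pathlib import Path
from typing import Optional


def store_artifacts(path: Path, artifact_bytes: bytes) -> None:
    """Persist Mega-Cache bytes to a file (stand-in for a database write)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(artifact_bytes)


def fetch_artifacts(path: Path) -> Optional[bytes]:
    """Return the stored bytes, or None when nothing has been saved yet."""
    return path.read_bytes() if path.exists() else None


# Demonstration with placeholder bytes; real bytes would come from
# torch.compiler.save_cache_artifacts().
store_path = Path(tempfile.mkdtemp()) / "mega_cache.bin"
missing = fetch_artifacts(store_path)  # nothing stored yet
store_artifacts(store_path, b"placeholder-artifact-bytes")
restored = fetch_artifacts(store_path)
```

On the loading side, a non-None result would be passed to torch.compiler.load_cache_artifacts() before the first call to the compiled function. A None result simply means compilation starts cold.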

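Modular caching of `TorchDynamo`, `TorchInductor`, and `Triton`

The Mega-Cache described above is composed of individual components that can be used without any user intervention. By default, PyTorch Compiler comes with local on-disk caches for TorchDynamo, TorchInductor, and Triton. These caches include:

- FXGraphCache: a cache of the graph-based IR components used in compilation.
- TritonCache: a cache of Triton compilation results, including the cubin files generated by Triton and other caching artifacts.
- InductorCache: a bundle of FXGraphCache and the Triton cache.
- AOTAutogradCache: a cache of joint graph artifacts.
- PGO-cache: a cache of dynamic shape decisions, used to reduce the number of recompilations.
- AutotuningCache:
  - Inductor generates Triton kernels and benchmarks them to select the fastest ones.
  - torch.compile's built-in AutotuningCache caches these results.

All of these cache artifacts are written to TORCHINDUCTOR_CACHE_DIR, which by default looks like /tmp/torchinductor_myusername.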
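To keep the on-disk caches somewhere other than the default location, TORCHINDUCTOR_CACHE_DIR can be set in the environment. The sketch below assumes the variable is set before any compilation happens (the simplest way is to set it before importing torch). The directory name is a made-up example.

```python
import os
import tempfile

# Hypothetical location; any writable directory should work.
cache_dir = os.path.join(tempfile.gettempdir(), "my_project_inductor_cache")
os.makedirs(cache_dir, exist_ok=True)

# Set this before torch.compile runs, ideally before importing torch.
os.environ["TORCHINDUCTOR_CACHE_DIR"] = cache_dir
```

Pointing this at a directory that survives between runs (for example, a persistent volume in a container) is what lets later runs reuse the artifacts.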

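Remote Caching

We also provide a remote caching option for users who would like to take advantage of a Redis-based cache. For more information on how to enable Redis-based caching, see Compile Time Caching Configuration.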
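As a rough sketch of what enabling the Redis-based cache might look like, the snippet below sets environment variables from Python. The variable names are not given in this recipe and are stated here as assumptions. Confirm them, along with the host and port for your deployment, against the configuration recipe before relying on them.

```python
import os

# Assumed variable names; verify against Compile Time Caching Configuration.
remote_cache_env = {
    "TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE": "1",  # assumed switch for the remote FX graph cache
    "TORCHINDUCTOR_REDIS_HOST": "localhost",     # placeholder host
    "TORCHINDUCTOR_REDIS_PORT": "6379",          # conventional Redis port
}

# Apply the settings before any compilation takes place.
os.environ.update(remote_cache_env)
```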

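Conclusion

In this recipe, we have learned that PyTorch Inductor's caching mechanisms significantly reduce compilation latency by using both local and remote caches, which operate in the background without requiring user intervention.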
