Distributed

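Distributed training is a model training paradigm that spreads the training workload across multiple worker nodes, which can significantly improve training speed and, in some cases, model accuracy. While distributed training can be used for any type of ML model training, it is most beneficial for large models and compute-intensive tasks such as deep learning.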

There are several ways to perform distributed training in PyTorch, each with its own advantages for particular use cases:

Distributed Data Parallel (DDP)
Fully Sharded Data Parallel (FSDP2)
Tensor Parallel (TP)
Device Mesh
Remote Procedure Call (RPC) distributed training
Custom Extensions

Read more about these options in the Distributed Overview.

Learn DDP

DDP Intro Video Tutorials

A step-by-step video series on how to get started with DistributedDataParallel and advance to more complex topics.

Code Video

https://pytorch.ac.cn/tutorials/beginner/ddp_series_intro.html?utm_source=distr_landing&utm_medium=ddp_series_intro

Getting Started with Distributed Data Parallel

This tutorial provides a short and gentle introduction to PyTorch Distributed Data Parallel.

Code

https://pytorch.ac.cn/tutorials/intermediate/ddp_tutorial.html?utm_source=distr_landing&utm_medium=intermediate_ddp_tutorial
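
As a rough illustration of the topic this tutorial covers, the sketch below wraps a small model in DistributedDataParallel and runs one training step. It assumes a single CPU process with the gloo backend and a file-based rendezvous, so it can run without GPUs or a launcher. The helper name `ddp_train_step`, the layer sizes, and the learning rate are illustrative choices and are not taken from the tutorial.

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def ddp_train_step() -> float:
    """Run one DDP training step in a single CPU process and return the loss."""
    # Assumption: one process (rank 0, world size 1) using the gloo backend.
    # The rendezvous file lives in a fresh temporary directory.
    rendezvous = os.path.join(tempfile.mkdtemp(), "rendezvous")
    dist.init_process_group(
        backend="gloo",
        init_method=f"file://{rendezvous}",
        rank=0,
        world_size=1,
    )
    try:
        torch.manual_seed(0)
        # DDP wraps the local model; gradients are all-reduced during backward().
        model = DDP(nn.Linear(10, 5))
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

        inputs, targets = torch.randn(8, 10), torch.randn(8, 5)
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(inputs), targets)
        loss.backward()
        optimizer.step()
        return loss.item()
    finally:
        # Always release the process group, even if the step fails.
        dist.destroy_process_group()
```

In a real multi-process job, each rank would typically be started by a launcher such as torchrun and would pass its own rank and the shared world size to `init_process_group`.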

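Distributed Training with Uneven Inputs Using the Join Context Manager

This tutorial describes the Join context manager and demonstrates its use with Distributed Data Parallel.

Code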

https://pytorch.ac.cn/tutorials/advanced/generic_join.html?utm_source=distr_landing&utm_medium=generic_join
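
The following is a minimal sketch of how the Join context manager can wrap a DDP training loop. It assumes a single CPU process with the gloo backend, so the inputs cannot actually be uneven across ranks here; in a multi-rank job, `num_batches` could differ per rank and Join would keep the early finishers participating in collectives. The function name `train_with_join` and the tensor shapes are hypothetical.

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.algorithms.join import Join
from torch.nn.parallel import DistributedDataParallel as DDP


def train_with_join(num_batches: int = 3) -> int:
    """Train for num_batches steps inside a Join context and return the step count."""
    # Assumption: single process, gloo backend, file-based rendezvous.
    rendezvous = os.path.join(tempfile.mkdtemp(), "rendezvous")
    dist.init_process_group(
        backend="gloo",
        init_method=f"file://{rendezvous}",
        rank=0,
        world_size=1,
    )
    try:
        torch.manual_seed(0)
        model = DDP(nn.Linear(4, 1))
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
        # In a multi-rank job this list could have a different length on each rank.
        batches = [torch.randn(2, 4) for _ in range(num_batches)]

        steps = 0
        # Join takes a list of Joinable objects; DDP is one of them.
        with Join([model]):
            for batch in batches:
                optimizer.zero_grad()
                model(batch).sum().backward()
                optimizer.step()
                steps += 1
        return steps
    finally:
        dist.destroy_process_group()
```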

Learn FSDP2

Getting Started with FSDP2

This tutorial demonstrates how to perform distributed training with FSDP2 on a Transformer model.

Code

https://pytorch.ac.cn/tutorials/intermediate/FSDP_tutorial.html?utm_source=distr_landing&utm_medium=FSDP_getting_started

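Learn Tensor Parallel (TP)

Large Scale Transformer Model Training with Tensor Parallel (TP)

This tutorial demonstrates how to train a large Transformer-like model across hundreds to thousands of GPUs using Tensor Parallel and Fully Sharded Data Parallel.

Code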

https://pytorch.ac.cn/tutorials/intermediate/TP_tutorial.html

Learn DeviceMesh

Getting Started with DeviceMesh

In this tutorial you will learn about DeviceMesh and how it can help with distributed training.

Code

https://pytorch.ac.cn/tutorials/recipes/distributed_device_mesh.html?highlight=devicemesh

Learn RPC

Getting Started with Distributed RPC Framework

This tutorial demonstrates how to get started with RPC-based distributed training.

Code

https://pytorch.ac.cn/tutorials/intermediate/rpc_tutorial.html?utm_source=distr_landing&utm_medium=rpc_getting_started
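
The basic RPC calls can be sketched as follows. This example assumes a single worker that sends requests to itself, with a file-based rendezvous, so it runs in one process; a real application would have several named workers. The worker name `"worker0"` and the function name `rpc_add_on_self` are hypothetical.

```python
import os
import tempfile

import torch
import torch.distributed.rpc as rpc


def rpc_add_on_self() -> torch.Tensor:
    """Start a one-worker RPC group and run a sync and an async call on it."""
    # Assumption: one worker that targets itself, so no second process is needed.
    rendezvous = os.path.join(tempfile.mkdtemp(), "rendezvous")
    options = rpc.TensorPipeRpcBackendOptions(init_method=f"file://{rendezvous}")
    rpc.init_rpc("worker0", rank=0, world_size=1, rpc_backend_options=options)
    try:
        # Blocking call: returns the result of torch.add(ones, 3).
        result = rpc.rpc_sync("worker0", torch.add, args=(torch.ones(2), 3))
        # Non-blocking call: returns a Future that is resolved later.
        future = rpc.rpc_async("worker0", torch.add, args=(result, 1))
        return future.wait()
    finally:
        # Waits for outstanding work and tears down the RPC agent.
        rpc.shutdown()
```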

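Implementing a Parameter Server Using Distributed RPC Framework

This tutorial walks you through a simple example of implementing a parameter server using PyTorch's Distributed RPC framework.

Code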

https://pytorch.ac.cn/tutorials/intermediate/rpc_param_server_tutorial.html?utm_source=distr_landing&utm_medium=rpc_param_server_tutorial

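Implementing Batch RPC Processing Using Asynchronous Executions

In this tutorial you will build batch-processing RPC applications with the @rpc.functions.async_execution decorator.

Code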

https://pytorch.ac.cn/tutorials/intermediate/rpc_async_execution.html?utm_source=distr_landing&utm_medium=rpc_async_execution
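
The sketch below shows only the mechanics of the `@rpc.functions.async_execution` decorator, not the batching logic the tutorial builds on top of it. A decorated function returns a Future, and the RPC reply is sent once that Future completes, so the serving thread is not blocked in the meantime. It assumes a single worker calling itself and that the code runs as a script, so the decorated function can be pickled by name; `async_add_chained`, `run_async_execution`, and `"worker0"` are illustrative names.

```python
import os
import tempfile

import torch
import torch.distributed.rpc as rpc


@rpc.functions.async_execution
def async_add_chained(to: str, x: torch.Tensor, y: torch.Tensor, z: torch.Tensor):
    # Return a Future immediately; the callback adds z once the nested RPC finishes.
    return rpc.rpc_async(to, torch.add, args=(x, y)).then(
        lambda fut: fut.wait() + z
    )


def run_async_execution() -> torch.Tensor:
    """Call the async-execution function over RPC from a single worker."""
    # Assumption: one worker that targets itself, file-based rendezvous.
    rendezvous = os.path.join(tempfile.mkdtemp(), "rendezvous")
    options = rpc.TensorPipeRpcBackendOptions(init_method=f"file://{rendezvous}")
    rpc.init_rpc("worker0", rank=0, world_size=1, rpc_backend_options=options)
    try:
        ones = torch.ones(2)
        # The caller sees the final value, not the intermediate Future.
        return rpc.rpc_sync(
            "worker0", async_add_chained, args=("worker0", ones, ones, ones)
        )
    finally:
        rpc.shutdown()
```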

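Combining Distributed DataParallel with Distributed RPC Framework

In this tutorial you will learn how to combine distributed data parallelism with distributed model parallelism.

Code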

https://pytorch.ac.cn/tutorials/advanced/rpc_ddp_tutorial.html?utm_source=distr_landing&utm_medium=rpc_plus_ddp

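Custom Extensions

Customize Process Group Backends Using Cpp Extensions

In this tutorial you will learn how to implement a custom ProcessGroup backend and plug it into the PyTorch distributed package using C++ extensions.

Code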

https://pytorch.ac.cn/tutorials/intermediate/process_group_cpp_extension_tutorial.html?utm_source=distr_landing&utm_medium=custom_extensions_cpp