Welcome to PyTorch Tutorials

What’s new in PyTorch tutorials?

- Integrating Custom Operators with SYCL for Intel GPU
- Supporting Custom C++ Classes in torch.compile/torch.export
- Accelerating torch.save and torch.load with GPUDirect Storage
- Getting Started with Fully Sharded Data Parallel (FSDP2)

Learn the Basics
Familiarize yourself with PyTorch concepts and modules. Learn how to load data, build deep neural networks, and train and save your models in this quickstart guide.
Get started with PyTorch

PyTorch Recipes
Bite-size, ready-to-deploy PyTorch code examples.
Explore Recipes

All

- Learn the Basics. A step-by-step guide to building a complete ML workflow with PyTorch. (Getting-Started)
- Introduction to PyTorch on YouTube. An introduction to building a complete ML workflow with PyTorch. Follows the PyTorch Beginner Series on YouTube. (Getting-Started)
- Learning PyTorch with Examples. This tutorial introduces the fundamental concepts of PyTorch through self-contained examples. (Getting-Started)
- What is torch.nn really? Use torch.nn to create and train a neural network. (Getting-Started)
- Visualizing Models, Data, and Training with TensorBoard. Learn to use TensorBoard to visualize data and model training. (Interpretability, Getting-Started, TensorBoard)
- Good usage of `non_blocking` and `pin_memory()` in PyTorch. A guide on best practices for copying data from CPU to GPU; a minimal sketch follows this list. (Getting-Started)
- Understanding requires_grad, retain_grad, Leaf, and Non-leaf Tensors. Learn the subtleties of requires_grad, retain_grad, leaf, and non-leaf tensors. (Getting-Started)
- Visualizing Gradients in PyTorch. Visualize the gradient flow of a network. (Getting-Started)
- TorchVision Object Detection Finetuning Tutorial. Finetune a pre-trained Mask R-CNN model. (Image/Video)
- Transfer Learning for Computer Vision Tutorial. Train a convolutional neural network for image classification using transfer learning. (Image/Video)
- Adversarial Example Generation. Learn how small, carefully crafted perturbations can fool an image classifier by generating adversarial examples with the Fast Gradient Sign Method (FGSM). (Image/Video)
- DCGAN Tutorial. Train a generative adversarial network (GAN) to generate new celebrities. (Image/Video)
- Spatial Transformer Networks Tutorial. Learn how to augment your network using a visual attention mechanism. (Image/Video)
- Semi-Supervised Learning Tutorial Based on USB. Learn how to train semi-supervised learning algorithms (on custom data) using USB and PyTorch. (Image/Video)
- Audio IO. Learn to load data with torchaudio. (Audio)
- Audio Resampling. Learn to resample audio waveforms using torchaudio. (Audio)
- Audio Data Augmentation. Learn to apply data augmentations using torchaudio. (Audio)
- Audio Feature Extractions. Learn to extract features using torchaudio. (Audio)
- Audio Feature Augmentation. Learn to augment features using torchaudio. (Audio)
- Audio Datasets. Learn to use torchaudio datasets. (Audio)
- Automatic Speech Recognition with Wav2Vec2 in torchaudio. Learn how to use torchaudio's pretrained models for building a speech recognition application. (Audio)
- Speech Command Classification. Learn how to correctly format an audio dataset and then train/test an audio classifier network on the dataset. (Audio)
- Text-to-Speech with torchaudio. Learn how to use torchaudio's pretrained models for building a text-to-speech application. (Audio)
- Forced Alignment with Wav2Vec2 in torchaudio. Learn how to use torchaudio's Wav2Vec2 pretrained models for aligning text to speech. (Audio)
- NLP from Scratch: Classifying Names with a Character-level RNN. Build and train a basic character-level RNN to classify words from scratch without the use of torchtext. First in a series of three tutorials. (NLP)
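As a quick illustration of the `non_blocking`/`pin_memory()` recipe listed above, here is a minimal sketch of an asynchronous host-to-device copy. The tensor sizes are arbitrary and a CUDA device is assumed to be available; see the recipe itself for the full discussion.

```python
import torch

if torch.cuda.is_available():
    # Pinned (page-locked) host memory allows truly asynchronous host-to-device copies.
    cpu_batch = torch.randn(1024, 1024).pin_memory()
    gpu_batch = cpu_batch.to("cuda", non_blocking=True)  # copy is queued, CPU is not blocked
    result = gpu_batch.sum()          # kernels on the same stream run after the copy completes
    torch.cuda.synchronize()          # wait for GPU work before reading the result on the CPU
    print(result.item())
```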
- NLP from Scratch: Generating Names with a Character-level RNN. After using a character-level RNN to classify names, learn how to generate names from languages. Second in a series of three tutorials. (NLP)
- NLP from Scratch: Translation with a Sequence-to-sequence Network and Attention. This is the third and final tutorial on doing “NLP From Scratch”, where we write our own classes and functions to preprocess the data to do our NLP modeling tasks. (NLP)
- Exporting a PyTorch model to ONNX using TorchDynamo backend and Running it using ONNX Runtime. Build an image classifier model in PyTorch and convert it to ONNX before deploying it with ONNX Runtime. (Production, ONNX, Backends)
- Extending the ONNX exporter operator support. Demonstrate end-to-end how to address unsupported operators in ONNX. (Production, ONNX, Backends)
- Exporting a model with control flow to ONNX. Demonstrate how to handle control flow logic while exporting a PyTorch model to ONNX. (Production, ONNX, Backends)
- Reinforcement Learning (DQN). Learn how to use PyTorch to train a Deep Q Learning (DQN) agent on the CartPole-v0 task from the OpenAI Gym. (Reinforcement-Learning)
- Reinforcement Learning (PPO) with TorchRL. Learn how to use PyTorch and TorchRL to train a Proximal Policy Optimization agent on the Inverted Pendulum task from Gym. (Reinforcement-Learning)
- Train a Mario-playing RL Agent. Use PyTorch to train a Double Q-learning agent to play Mario. (Reinforcement-Learning)
- Recurrent DQN. Use TorchRL to train recurrent policies. (Reinforcement-Learning)
- Code a DDPG Loss. Use TorchRL to code a DDPG loss. (Reinforcement-Learning)
- Writing your environment and transforms. Use TorchRL to code a Pendulum environment. (Reinforcement-Learning)
- Profiling PyTorch. Learn how to profile a PyTorch application; a minimal sketch follows this list. (Profiling)
- Introduction to Holistic Trace Analysis. (Profiling)
- Trace Diff using Holistic Trace Analysis. (Profiling)
- Building a Simple Performance Profiler with FX. Build a simple FX interpreter to record the runtime of op, module, and function calls and report statistics. (FX)
- (beta) Channels Last Memory Format in PyTorch. Get an overview of the Channels Last memory format and understand how it is used to order NCHW tensors in memory while preserving dimensions. (Memory-Format, Best-Practice, Frontend-APIs)
- Using the PyTorch C++ Frontend. Walk through an end-to-end example of training a model with the C++ frontend by training a DCGAN, a kind of generative model, to generate images of MNIST digits. (Frontend-APIs, C++)
- PyTorch Custom Operators Landing Page. This is the landing page for all things related to custom operators in PyTorch. (Extending-PyTorch, Frontend-APIs, C++, CUDA)
- Custom Python Operators. Create custom operators in Python. Useful for black-boxing a Python function for use with torch.compile. (Extending-PyTorch, Frontend-APIs, C++, CUDA)
- Compiled Autograd: Capturing a larger backward graph for torch.compile. Learn how to use compiled autograd to capture a larger backward graph. (Model-Optimization, CUDA)
- Custom C++ and CUDA Operators. How to extend PyTorch with custom C++ and CUDA operators. (Extending-PyTorch, Frontend-APIs, C++, CUDA)
- Autograd in C++ Frontend. The autograd package helps build flexible and dynamic neural networks. In this tutorial, explore several examples of doing autograd in the PyTorch C++ frontend. (Frontend-APIs, C++)
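The Profiling PyTorch entry above centers on the torch.profiler API. Here is a minimal sketch of the pattern; the model, input shapes, and the "model_inference" label are illustrative assumptions, not part of any specific tutorial.

```python
import torch
from torch.profiler import profile, record_function, ProfilerActivity

model = torch.nn.Linear(128, 64)   # placeholder model for illustration
inputs = torch.randn(32, 128)

with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with record_function("model_inference"):   # label a region in the trace
        model(inputs)

# Print an aggregated table of operator costs
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```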
- Registering a Dispatched Operator in C++. The dispatcher is an internal component of PyTorch which is responsible for figuring out what code should actually get run when you call a function like torch::add. (Extending-PyTorch, Frontend-APIs, C++)
- Extending Dispatcher For a New Backend in C++. Learn how to extend the dispatcher to add a new device living outside of the pytorch/pytorch repo and maintain it to keep in sync with native PyTorch devices. (Extending-PyTorch, Frontend-APIs, C++)
- Facilitating New Backend Integration by PrivateUse1. Learn how to integrate a new backend living outside of the pytorch/pytorch repo and maintain it to keep in sync with the native PyTorch backend. (Extending-PyTorch, Frontend-APIs, C++)
- Custom Function Tutorial: Double Backward. Learn how to write a custom autograd Function that supports double backward. (Extending-PyTorch, Frontend-APIs)
- Custom Function Tutorial: Fusing Convolution and Batch Norm. Learn how to create a custom autograd Function that fuses batch norm into a convolution to improve memory usage. (Extending-PyTorch, Frontend-APIs)
- Forward-mode Automatic Differentiation. Learn how to use forward-mode automatic differentiation. (Frontend-APIs)
- Jacobians, Hessians, hvp, vhp, and more. Learn how to compute advanced autodiff quantities using torch.func. (Frontend-APIs)
- Model Ensembling. Learn how to ensemble models using torch.vmap. (Frontend-APIs)
- Per-Sample-Gradients. Learn how to compute per-sample gradients using torch.func. (Frontend-APIs)
- Neural Tangent Kernels. Learn how to compute neural tangent kernels using torch.func. (Frontend-APIs)
- Performance Profiling in PyTorch. Learn how to use the PyTorch Profiler to benchmark your module's performance. (Model-Optimization, Best-Practice, Profiling)
- Performance Profiling in TensorBoard. Learn how to use the TensorBoard plugin to profile and analyze your model's performance. (Model-Optimization, Best-Practice, Profiling, TensorBoard)
- Hyperparameter Tuning Tutorial. Learn how to use Ray Tune to find the best performing set of hyperparameters for your model. (Model-Optimization, Best-Practice)
- Parametrizations Tutorial. Learn how to use torch.nn.utils.parametrize to put constraints on your parameters (e.g. make them orthogonal, symmetric positive definite, low-rank...). (Model-Optimization, Best-Practice)
- Pruning Tutorial. Learn how to use torch.nn.utils.prune to sparsify your neural networks, and how to extend it to implement your own custom pruning technique. (Model-Optimization, Best-Practice)
- How to save memory by fusing the optimizer step into the backward pass. Learn a memory-saving technique through fusing the optimizer step into the backward pass using memory snapshots. (Model-Optimization, Best-Practice, CUDA, Frontend-APIs)
- (beta) Accelerating BERT with semi-structured sparsity. Train BERT, prune it to be 2:4 sparse, and then accelerate it to achieve 2x inference speedups with semi-structured sparsity and torch.compile. (Text, Model-Optimization)
- Multi-Objective Neural Architecture Search with Ax. Learn how to use Ax to search over architectures and find optimal tradeoffs between accuracy and latency. (Model-Optimization, Best-Practice, Ax, TorchX)
- torch.compile Tutorial. Speed up your models with minimal code changes using torch.compile, the latest PyTorch compiler solution; a minimal sketch follows this list. (Model-Optimization)
- torch.compile End-to-End Tutorial. An example of applying torch.compile to a real model, demonstrating speedups. (Model-Optimization)
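As a taste of the torch.compile Tutorial listed above, the opt-in really is a single call wrapped around an existing function or module. The toy function below is an illustrative assumption; meaningful speedups come from larger models.

```python
import torch

def pointwise_fn(x):
    # A toy function; real gains come from larger models and repeated calls.
    return torch.sin(x) + torch.cos(x) ** 2

compiled_fn = torch.compile(pointwise_fn)   # one-line opt-in to the compiler

x = torch.randn(10_000)
print(torch.allclose(pointwise_fn(x), compiled_fn(x), atol=1e-5))  # same results, compiled kernel
```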
- Building a Convolution/Batch Norm fuser in torch.compile. Build a simple pattern matcher pass that fuses batch norm into convolution to improve performance during inference. (Model-Optimization)
- Inductor CPU Backend Debugging and Profiling. Learn about usage, debugging, and performance profiling for torch.compile with the Inductor CPU backend. (Model-Optimization)
- (beta) Implementing High-Performance Transformers with Scaled Dot Product Attention. This tutorial explores the new torch.nn.functional.scaled_dot_product_attention and how it can be used to construct Transformer components; a minimal sketch follows this list. (Model-Optimization, Attention, Transformer)
- Knowledge Distillation in Convolutional Neural Networks. Learn how to improve the accuracy of lightweight models using more powerful models as teachers. (Model-Optimization, Image/Video)
- Accelerating PyTorch Transformers by replacing nn.Transformer with Nested Tensors and torch.compile(). This tutorial goes over recommended best practices for implementing Transformers with native PyTorch. (Transformer)
- PyTorch Distributed Overview. Briefly go over all concepts and features in the distributed package. Use this document to find the distributed training technology that can best serve your application. (Parallel-and-Distributed-Training)
- Distributed Data Parallel in PyTorch - Video Tutorials. This series of video tutorials walks you through distributed training in PyTorch via DDP. (Parallel-and-Distributed-Training)
- Single-Machine Model Parallel Best Practices. Learn how to implement model parallel, a distributed training technique which splits a single model onto different GPUs, rather than replicating the entire model on each GPU. (Parallel-and-Distributed-Training)
- Getting Started with Distributed Data Parallel. Learn the basics of when to use distributed data parallel versus data parallel, and work through an example to set it up. (Parallel-and-Distributed-Training)
- Writing Distributed Applications with PyTorch. Set up the distributed package of PyTorch, use the different communication strategies, and go over some of the internals of the package. (Parallel-and-Distributed-Training)
- Large Scale Transformer model training with Tensor Parallel. Learn how to train large models with the Tensor Parallel package. (Parallel-and-Distributed-Training)
- Customize Process Group Backends Using Cpp Extensions. Extend ProcessGroup with custom collective communication implementations. (Parallel-and-Distributed-Training)
- Getting Started with Distributed RPC Framework. Learn how to build distributed training using the torch.distributed.rpc package. (Parallel-and-Distributed-Training)
- Implementing a Parameter Server Using Distributed RPC Framework. Walk through a simple example of implementing a parameter server using PyTorch's Distributed RPC framework. (Parallel-and-Distributed-Training)
- Introduction to Distributed Pipeline Parallelism. Demonstrate how to implement pipeline parallelism using torch.distributed.pipelining. (Parallel-and-Distributed-Training)
- Implementing Batch RPC Processing Using Asynchronous Executions. Learn how to use rpc.functions.async_execution to implement batch RPC. (Parallel-and-Distributed-Training)
- Combining Distributed DataParallel with Distributed RPC Framework. Walk through a simple example of how to combine distributed data parallelism with distributed model parallelism. (Parallel-and-Distributed-Training)
- Getting Started with Fully Sharded Data Parallel (FSDP2). Learn how to train models with the Fully Sharded Data Parallel (fully_shard) package. (Parallel-and-Distributed-Training)
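To illustrate the scaled dot product attention entry above, here is a minimal sketch of calling torch.nn.functional.scaled_dot_product_attention directly. The batch size, head count, sequence length, and head dimension below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# (batch, num_heads, seq_len, head_dim); shapes are illustrative assumptions
q = torch.randn(2, 8, 128, 64)
k = torch.randn(2, 8, 128, 64)
v = torch.randn(2, 8, 128, 64)

# The fused kernel selection (flash / memory-efficient / math) happens under the hood.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 128, 64])
```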
- Introduction to Libuv TCPStore Backend. TCPStore now uses a new server backend for faster connections and better scalability; a minimal sketch appears at the end of this page. (Parallel-and-Distributed-Training)
- Interactive Distributed Applications with Monarch. Learn how to spin up distributed applications using Monarch's single-controller model. (Parallel-and-Distributed-Training)
- Exporting to ExecuTorch Tutorial. Learn about how to use ExecuTorch, a unified ML stack for lowering PyTorch models to edge devices. (Edge)
- Running an ExecuTorch Model in C++ Tutorial. Learn how to load and execute an ExecuTorch model in C++. (Edge)
- Using the ExecuTorch SDK to Profile a Model. Explore how to use the ExecuTorch SDK to profile, debug, and visualize ExecuTorch models. (Edge)
- Building an ExecuTorch iOS Demo App. Explore how to set up the ExecuTorch iOS Demo App, which uses the MobileNet v3 model to process live camera images leveraging three different backends: XNNPACK, Core ML, and Metal Performance Shaders (MPS). (Edge)
- Building an ExecuTorch Android Demo App. Learn how to set up the ExecuTorch Android Demo App for image segmentation tasks using the DeepLab v3 model and XNNPACK FP32 backend. (Edge)
- Lowering a Model as a Delegate. Learn to accelerate your program using ExecuTorch by applying delegates through three methods: lowering the whole module, composing it with another module, and partitioning parts of a module. (Edge)
- Introduction to TorchRec. TorchRec is a PyTorch domain library built to provide common sparsity and parallelism primitives needed for large-scale recommender systems. (TorchRec, Recommender)
- Exploring TorchRec sharding. This tutorial covers the sharding schemes of embedding tables by using the EmbeddingPlanner and DistributedModelParallel API. (TorchRec, Recommender)

Additional Resources

Examples of PyTorch
A set of examples around PyTorch in Vision, Text, and Reinforcement Learning that you can incorporate into your existing work.
Check Out Examples

Run Tutorials on Google Colab
Learn how to copy tutorial data into Google Drive so that you can run tutorials on Google Colab.
Open
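Tying back to the Introduction to Libuv TCPStore Backend entry above, here is a minimal, hedged sketch of creating a TCPStore rendezvous. The host, port, and world size are illustrative assumptions; the libuv-based server backend covered by that tutorial is used by default in recent releases, so no extra flag is shown here.

```python
from datetime import timedelta
import torch.distributed as dist

# Hypothetical host/port; in recent PyTorch releases the TCPStore server side
# is backed by libuv by default, which is what the tutorial above covers.
store = dist.TCPStore(
    "127.0.0.1", 29500,
    world_size=2, is_master=True,
    timeout=timedelta(seconds=30),
    wait_for_workers=False,   # don't block this sketch waiting for peer processes
)
store.set("status", "ready")   # key/value rendezvous shared between processes
print(store.get("status"))     # b'ready'
```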