
Build Instructions

Note: The most up-to-date build instructions are embedded in a set of scripts bundled in the FBGEMM repo under setup_env.bash

The FBGEMM GenAI build variants currently available are:

  • CUDA

  • ROCm

The general steps for building FBGEMM GenAI are as follows:

  1. Set up an isolated build environment.

  2. Set up the toolchain for a CUDA (or ROCm) build.

  3. Install PyTorch.

  4. Run the build script.

Set Up an Isolated Build Environment

Follow the instructions for setting up the Conda environment:

  1. Set up an isolated build environment

  2. Set up for CUDA build

  3. Install the build tools

  4. Install PyTorch

Other Pre-Build Setup

Since FBGEMM GenAI uses the same build process as FBGEMM_GPU, see Preparing the Build for additional pre-build setup information.

Prepare the Build

Clone the repo along with its submodules, and install requirements_genai.txt:

# !! Run inside the Conda environment !!

# Select a version tag
FBGEMM_VERSION=v1.3.0

# Clone the repo along with its submodules
git clone --recursive -b ${FBGEMM_VERSION} https://github.com/pytorch/FBGEMM.git fbgemm_${FBGEMM_VERSION}

# Install additional required packages for building and testing
cd fbgemm_${FBGEMM_VERSION}/fbgemm_gpu
pip install -r requirements_genai.txt

Set Wheel Build Variables

When building the Python wheel, the package name, the Python version tag, and the Python platform name must first be properly set:

# Set the package name depending on the build variant, e.g. fbgemm_genai_cuda
export package_name=fbgemm_genai_cuda

# Set the Python version tag.  It should follow the convention `py<major><minor>`,
# e.g. Python 3.13 --> py313
export python_tag=py313

# Determine the processor architecture
export ARCH=$(uname -m)

# Set the Python platform name for the Linux case
export python_plat_name="manylinux_2_28_${ARCH}"
# For the macOS (x86_64) case
export python_plat_name="macosx_10_9_${ARCH}"
# For the macOS (arm64) case
export python_plat_name="macosx_11_0_${ARCH}"
# For the Windows case
export python_plat_name="win_${ARCH}"
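
Rather than hard-coding these values, the version tag and platform name can also be derived from the environment. A minimal sketch, assuming `python3` on PATH is the interpreter the wheel targets (for macOS, `macosx_11_0` is used here; pick `macosx_10_9` for older x86_64 targets as shown above):

```shell
# Derive the `py<major><minor>` tag from the active interpreter
export python_tag="py$(python3 -c 'import sys; print("%d%d" % sys.version_info[:2])')"

# Pick the platform name based on OS and processor architecture
ARCH=$(uname -m)
case "$(uname -s)" in
  Linux)  export python_plat_name="manylinux_2_28_${ARCH}" ;;
  Darwin) export python_plat_name="macosx_11_0_${ARCH}" ;;
  *)      export python_plat_name="win_${ARCH}" ;;
esac
```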

CUDA Build

Building FBGEMM GenAI for CUDA requires NVML and cuDNN to be installed and made available to the build through environment variables. However, a CUDA device is not required for building the package.

Similar to the CPU-only build, a Clang + libstdc++ build can be enabled by appending --cxxprefix=$CONDA_PREFIX to the build command, assuming the toolchains have been properly installed.

# !! Run in fbgemm_gpu/ directory inside the Conda environment !!

# [OPTIONAL] Specify the CUDA installation paths
# This may be required if CMake is unable to find nvcc
export CUDACXX=/path/to/nvcc
export CUDA_BIN_PATH=/path/to/cuda/installation

# [OPTIONAL] Provide the CUB installation directory (applicable only to CUDA versions prior to 11.1)
export CUB_DIR=/path/to/cub

# [OPTIONAL] Allow NVCC to use host compilers that are newer than what NVCC officially supports
nvcc_prepend_flags=(
  -allow-unsupported-compiler
)

# [OPTIONAL] If clang is the host compiler, set NVCC to use libstdc++ since libc++ is not supported
nvcc_prepend_flags+=(
  -Xcompiler -stdlib=libstdc++
  -ccbin "/path/to/clang++"
)

# [OPTIONAL] Set NVCC_PREPEND_FLAGS as needed
export NVCC_PREPEND_FLAGS="${nvcc_prepend_flags[@]}"

# [OPTIONAL] Enable verbose NVCC logs
export NVCC_VERBOSE=1

# Specify cuDNN header and library paths
export CUDNN_INCLUDE_DIR=/path/to/cudnn/include
export CUDNN_LIBRARY=/path/to/cudnn/lib

# Specify NVML filepath
export NVML_LIB_PATH=/path/to/libnvidia-ml.so

# Specify NCCL filepath
export NCCL_LIB_PATH=/path/to/libnccl.so.2

# Build for SM70/80 (V100/A100 GPU); update as needed
# If not specified, only the CUDA architecture supported by the current system will be targeted
# If not specified and no CUDA device is present either, all CUDA architectures will be targeted
cuda_arch_list="7.0;8.0"

# Unset TORCH_CUDA_ARCH_LIST if it exists, because it takes precedence over
# -DTORCH_CUDA_ARCH_LIST during the invocation of setup.py
unset TORCH_CUDA_ARCH_LIST

# Build the wheel artifact only
python setup.py bdist_wheel \
    --build-target=genai \
    --build-variant=cuda \
    --python-tag="${python_tag}" \
    --plat-name="${python_plat_name}" \
    --nvml_lib_path=${NVML_LIB_PATH} \
    --nccl_lib_path=${NCCL_LIB_PATH} \
    -DTORCH_CUDA_ARCH_LIST="${cuda_arch_list}"

# Build and install the library into the Conda environment
python setup.py install \
    --build-target=genai \
    --build-variant=cuda \
    --nvml_lib_path=${NVML_LIB_PATH} \
    --nccl_lib_path=${NCCL_LIB_PATH} \
    -DTORCH_CUDA_ARCH_LIST="${cuda_arch_list}"
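
If the build succeeds, bdist_wheel places the artifact under dist/, with a filename assembled from the variables set earlier. A hypothetical reconstruction of that filename (the exact version and ABI tag depend on the build):

```shell
# Illustrative only: how the wheel filename is assembled from the build variables
package_name=fbgemm_genai_cuda
package_version=1.3.0              # hypothetical; follows FBGEMM_VERSION
python_tag=py313
python_plat_name=manylinux_2_28_x86_64
wheel_file="dist/${package_name}-${package_version}-${python_tag}-none-${python_plat_name}.whl"
echo "${wheel_file}"
```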

ROCm Build

For a ROCm build, ROCM_PATH and PYTORCH_ROCM_ARCH need to be specified. However, a ROCm device is not required for building the package.

Similar to the CUDA build, a Clang + libstdc++ build can be enabled by appending --cxxprefix=$CONDA_PREFIX to the build command, assuming the toolchains have been properly installed.

# !! Run in fbgemm_gpu/ directory inside the Conda environment !!

export ROCM_PATH=/path/to/rocm

# [OPTIONAL] Enable verbose HIPCC logs
export HIPCC_VERBOSE=1

# Build for the target architecture of the ROCm device installed on the machine (e.g. 'gfx908,gfx90a,gfx942')
# See https://rocm.docs.amd.com/en/latest/reference/gpu-arch-specs.html for list
export PYTORCH_ROCM_ARCH=$(${ROCM_PATH}/bin/rocminfo | grep -o -m 1 'gfx.*')
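
The grep pipeline above simply keeps the first `gfx*` token that rocminfo reports. Its behavior can be checked against canned rocminfo-style output (the sample text here is made up for illustration):

```shell
# Demonstrate the arch extraction on sample rocminfo-style output
sample="  Name:                    gfx90a
  Name:                    gfx1030"
arch=$(printf '%s\n' "${sample}" | grep -o -m 1 'gfx.*')
echo "${arch}"   # prints: gfx90a
```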

# Build the wheel artifact only
python setup.py bdist_wheel \
    --build-target=genai \
    --build-variant=rocm \
    --python-tag="${python_tag}" \
    --plat-name="${python_plat_name}" \
    -DAMDGPU_TARGETS="${PYTORCH_ROCM_ARCH}" \
    -DHIP_ROOT_DIR="${ROCM_PATH}" \
    -DCMAKE_C_FLAGS="-DTORCH_USE_HIP_DSA" \
    -DCMAKE_CXX_FLAGS="-DTORCH_USE_HIP_DSA"

# Build and install the library into the Conda environment
python setup.py install \
    --build-target=genai \
    --build-variant=rocm \
    -DAMDGPU_TARGETS="${PYTORCH_ROCM_ARCH}" \
    -DHIP_ROOT_DIR="${ROCM_PATH}" \
    -DCMAKE_C_FLAGS="-DTORCH_USE_HIP_DSA" \
    -DCMAKE_CXX_FLAGS="-DTORCH_USE_HIP_DSA"

Post-Build Checks (For Developers)

Since FBGEMM GenAI uses the same build process as FBGEMM_GPU, see Post-Build Checks (For Developers) for information on additional post-build checks.
