Codegen migration guide

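As PyTorch/XLA migrates to LTC (Lazy Tensor Core), we need to clean up the existing stub code (spread across 6+ files) that was used for op lowering. The complete process and file structure of the old op lowering can be found in the op lowering guide.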

For more information about the different op configurations, see codegen/xla_native_functions.yaml (https://github.com/pytorch/xla/blob/master/codegen/xla_native_functions.yaml). Replacing a supported op (under the supported configuration) with its codegen equivalent (under the full_codegen configuration) SHOULD NOT introduce any new behavior; it is purely for cleanup. Ops under other configurations may behave differently; the same file describes those configurations as well.

For more information about dispatching in PyTorch, see ezyang's blog post.

Before you start

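Follow the instructions here to install the required dependencies and build PyTorch and PyTorch/XLA from source. You do not need TPU access to implement a lowering. We recommend experimenting on a workstation configured to use XLA:CPU. You can configure PyTorch/XLA to use XLA:CPU by running: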

export PJRT_DEVICE=CPU

We also recommend that you get familiar with our op lowering process before working on codegen.

PyTorch/XLA tracks the status of the codegen migration in https://github.com/pytorch/xla/issues/3560. When you work on a codegen, add your GitHub alias and the PR link to that issue to avoid duplicated work.

File structure

All files mentioned below live under the xla/torch_xla/csrc folder, except for xla_native_functions.yaml.

PyTorch codegen files

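torch/csrc/lazy/core/shape_inference.h
- Defines the shape inference function for each op. Each function takes torch::lazy::Shape inputs and returns the output torch::lazy::Shape. Only non-structured ops need a manual shape inference function.
torchgen/gen_lazy_tensor.py
- Builds on top of the existing data models and helpers used by all ATen backends, and adds functionality specific to lazy tensor backends. run_gen_lazy_tensor is defined in this file.
torchgen/dest/lazy_ir.py
- Contains the data class GenLazyIR, which backends can override to define the generated IR classes.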
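To make the role of shape_inference.h concrete, here is a minimal sketch of what a manual shape inference function computes for a broadcasting binary op. The Shape struct and ComputeShapeBroadcastBinary are hypothetical stand-ins written for this guide, not the torch::lazy API, and the sketch assumes both inputs share a dtype instead of modeling type promotion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for torch::lazy::Shape: a dtype name plus sizes.
struct Shape {
  std::string dtype;
  std::vector<int64_t> sizes;
};

// Sketch of a manual shape inference function for a broadcasting binary op.
// Sizes are aligned from the trailing dimension; each pair must be equal or
// one of them must be 1 (PyTorch-style broadcasting).
std::vector<Shape> ComputeShapeBroadcastBinary(const Shape& a, const Shape& b) {
  // Simplification: require identical dtypes instead of modeling promotion.
  if (a.dtype != b.dtype) {
    throw std::invalid_argument("dtype mismatch");
  }
  const size_t rank = std::max(a.sizes.size(), b.sizes.size());
  std::vector<int64_t> out(rank, 1);
  for (size_t i = 0; i < rank; ++i) {
    // Walk from the last dimension backwards; missing dimensions count as 1.
    const int64_t da = i < a.sizes.size() ? a.sizes[a.sizes.size() - 1 - i] : 1;
    const int64_t db = i < b.sizes.size() ? b.sizes[b.sizes.size() - 1 - i] : 1;
    if (da != db && da != 1 && db != 1) {
      throw std::invalid_argument("shapes are not broadcastable");
    }
    out[rank - 1 - i] = (da == 1) ? db : da;
  }
  // A vector is returned because an op may produce several outputs.
  return {Shape{a.dtype, out}};
}
```

For example, inputs of sizes {3, 1} and {4} produce a single output of sizes {3, 4}, while {2} and {3} are rejected.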

PyTorch/XLA codegen files

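xla/codegen/xla_native_functions.yaml
- Contains all the ops that XLA supports today. Most ops are under the supported category; the goal of this document is to move most of them to the full_codegen category.
xla/scripts/gen_lazy_tensor.py
- Provides the XLA-specific versions of the codegen classes and calls the upstream codegen API.
xla/torch_xla/csrc/XLANativeFunctions.cpp
- Generated from the full_codegen column of xla/codegen/xla_native_functions.yaml. The op functions defined here implement the ops declared in XLANativeFunctions.h. Each op takes at::Tensor arguments and returns an at::Tensor that wraps an XLATensor. Note that this part used to be written manually in tensor_method.cpp.
xla/torch_xla/csrc/LazyIr.h
- Generated from the full_codegen column of xla/codegen/xla_native_functions.yaml. Defines the IR used to construct the full_codegen ops.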

PyTorch/XLA old op lowering files

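xla/torch_xla/csrc/generated/aten_xla_type.cpp
- Manually implements the ops defined in xla/codegen/xla_native_functions.yaml. Will be replaced by XLANativeFunctions.cpp.
xla/torch_xla/csrc/generated/tensor.h
- Defines the XLATensor class and the declarations of its methods. These declarations usually map one-to-one to the at::Tensor functions declared in XLANativeFunctions.h. The XLATensor methods of full_codegen ops will be removed.
xla/torch_xla/csrc/generated/tensor_method.cpp
- Implements the tensor methods declared in tensor.h. The implementations for full_codegen ops will be removed from this file.
xla/torch_xla/csrc/generated/ops/…
- Defines IR classes for "most" ops. Multiple ops may share the same IR.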
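Since several ops may share one IR class, the following sketch shows the general idea with a hypothetical GenericUnaryNode that stores a per-op lowering callback. It is a simplified illustration, not PyTorch/XLA's GenericOp, and its "lowering" only produces a text expression instead of an xla::XlaOp.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical sketch: one IR class shared by several unary ops. The
// op-specific part is a lowering callback rather than a dedicated class.
class GenericUnaryNode {
 public:
  using LowerFn = std::function<std::string(const std::string&)>;

  GenericUnaryNode(std::string op_name, LowerFn lower_fn)
      : op_name_(std::move(op_name)), lower_fn_(std::move(lower_fn)) {}

  const std::string& op_name() const { return op_name_; }

  // "Lowering" here turns the operand's text into the op's text.
  std::string Lower(const std::string& operand) const { return lower_fn_(operand); }

 private:
  std::string op_name_;
  LowerFn lower_fn_;
};

// Two different ops built from the same IR class.
GenericUnaryNode MakeAbsNode() {
  return GenericUnaryNode("abs", [](const std::string& x) { return "abs(" + x + ")"; });
}

GenericUnaryNode MakeNegNode() {
  return GenericUnaryNode("neg", [](const std::string& x) { return "neg(" + x + ")"; });
}
```

The trade-off is that per-op details live in lambdas scattered across call sites, whereas the codegen path generates one dedicated class per op.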

Codegen step-by-step guide

1. Identify the op

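When working on your first few codegens, we generally recommend starting with simpler ops. This guide uses one unary op and one binary op as examples, but you should avoid ops with the following characteristics:

1. Contains custom fallback code. For example, _adaptive_avg_pool3d has a conditional fallback: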

if (!IsSupportedAdaptivePool(XlaHelpers::I64List(self.sizes()),
                             output_size_list, /*pool_dim=*/3)) {
  return at::native::call_fallback_fn<&xla_fallback, ATEN_OP(_adaptive_avg_pool3d)>::call(self, output_size);
}

2. Produces dynamic shapes. These ops are still under development and may evolve over time; we may bring them into codegen at some point in the future.
3. Does not invoke a tensor_method directly. For example:

if (!self_tensor) {
  static bool sync_update =
      torch_xla::runtime::sys_util::GetEnvBool("XLA_TENSOR_UPDATE_SYNC", true);
  XLA_CHECK(dst_tensor);
  dst_tensor->UpdateFromTensor(self, /*sync=*/sync_update);
}

4. Has a complex tensor_method. Ideally, the tensor_method should be a direct mapping from the op to its IR.
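The sketch below illustrates why an op with a conditional fallback (the first characteristic listed above) does not fit a fully generated body: the op must branch at runtime between building IR and calling the fallback, and the generated code has no such branch. IsSupportedAdaptivePoolSketch and AdaptivePoolDispatchSketch are hypothetical, and the divisibility rule is an assumption made for illustration, not the actual IsSupportedAdaptivePool logic.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical support check. Assumption for this sketch: the fast path only
// handles output sizes that evenly divide the corresponding input sizes.
bool IsSupportedAdaptivePoolSketch(const std::vector<int64_t>& input_spatial,
                                   const std::vector<int64_t>& output_size) {
  if (input_spatial.size() != output_size.size()) return false;
  for (size_t i = 0; i < input_spatial.size(); ++i) {
    if (output_size[i] <= 0 || input_spatial[i] % output_size[i] != 0) {
      return false;
    }
  }
  return true;
}

// The op body branches at runtime: either build the lazy IR node ("lowered")
// or hand the call to the CPU fallback ("fallback").
std::string AdaptivePoolDispatchSketch(const std::vector<int64_t>& input_spatial,
                                       const std::vector<int64_t>& output_size) {
  if (!IsSupportedAdaptivePoolSketch(input_spatial, output_size)) {
    return "fallback";
  }
  return "lowered";
}
```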

A good example of a "simple" op is abs:

at::Tensor XLANativeFunctions::abs(const at::Tensor& self) {
  TORCH_LAZY_FN_COUNTER("xla::");
  return bridge::AtenFromXlaTensor(XLATensor::abs(bridge::GetXlaTensor(self)));
}

2. Codegen the op and inspect the generated code

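Find the op in xla/codegen/xla_native_functions.yaml, move it to the full_codegen column, and run python setup.py install in the xla directory again. The build will fail (the reason is explained later in this guide), but you can still inspect the generated code.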

If you hit errors related to shape_inference.h (https://github.com/pytorch/pytorch/blob/main/torch/csrc/lazy/core/shape_inference.h) while the files are generated, PyTorch probably does not yet provide the implementation needed for the function you are generating. You can try adding the missing function to shape_inference.h to fix this.

The snippets below use abs as an example.

XLANativeFunctions.cpp

at::Tensor XLANativeFunctions::abs(const at::Tensor & self) {
  TORCH_LAZY_FN_COUNTER("xla::");
  auto common_device = torch_xla::bridge::GetXlaDevice(self);
  TORCH_INTERNAL_ASSERT(common_device);

  torch_xla::XLATensorPtr lazy_self = torch_xla::bridge::GetXlaTensorOrCreateForWrappedNumber(self, *common_device);

  torch::lazy::NodePtr node = torch::lazy::ReuseNode<Abs>(lazy_self->GetIrValue());
  if (!node) {
    node = torch_xla::MakeNode<Abs>(lazy_self->GetIrValue());
    CacheNode(node);
  }

  auto result = torch_xla::bridge::AtenFromXlaTensor(
        torch_xla::XLATensor::Create(std::move(node), *common_device));
  return result;
};

Line-by-line description of the generated code:

- Get and verify the device of the input tensor:

auto common_device = torch_xla::bridge::GetXlaDevice(self);
TORCH_INTERNAL_ASSERT(common_device);

- Check whether a previously created node can be reused. If not, create the corresponding IR node and cache it:

torch::lazy::NodePtr node = torch::lazy::ReuseNode<Abs>(lazy_self->GetIrValue());
if (!node) {
  node = torch_xla::MakeNode<Abs>(lazy_self->GetIrValue());
  CacheNode(node);
}

- Wrap the newly created IR node in an XLATensor, wrap that XLATensor in an at::Tensor, and return it as the result. Note that this part used to be done manually in tensor_method.cpp.

auto result = torch_xla::bridge::AtenFromXlaTensor(
      torch_xla::XLATensor::Create(std::move(node), *common_device));
return result;
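The reuse-or-create step can be modeled in isolation. The sketch below is a simplified, hypothetical model of the ReuseNode/CacheNode control flow: it keys a process-wide map on the op name and the operand identities. It does not reflect how torch::lazy actually stores or looks up nodes; Node, ReuseNodeSketch, CacheNodeSketch, and GetOrCreateNode are invented for illustration.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical minimal IR node: an op name plus operand nodes.
struct Node {
  std::string op;
  std::vector<std::shared_ptr<Node>> operands;
};
using NodePtr = std::shared_ptr<Node>;

// Combines the op name with operand identities (pointer values).
size_t NodeKey(const std::string& op, const std::vector<NodePtr>& operands) {
  size_t h = std::hash<std::string>{}(op);
  for (const NodePtr& operand : operands) {
    h = h * 31 + std::hash<const Node*>{}(operand.get());
  }
  return h;
}

// Process-wide cache; the real implementation may use a different structure.
std::unordered_map<size_t, NodePtr>& NodeCache() {
  static std::unordered_map<size_t, NodePtr> cache;
  return cache;
}

// Returns a cached node with the same op and operands, or nullptr.
NodePtr ReuseNodeSketch(const std::string& op, const std::vector<NodePtr>& operands) {
  auto it = NodeCache().find(NodeKey(op, operands));
  if (it == NodeCache().end()) return nullptr;
  // Guard against hash collisions by comparing the actual contents.
  if (it->second->op != op || it->second->operands != operands) return nullptr;
  return it->second;
}

void CacheNodeSketch(const NodePtr& node) {
  NodeCache()[NodeKey(node->op, node->operands)] = node;
}

// Same control flow as the generated XLANativeFunctions::abs body.
NodePtr GetOrCreateNode(const std::string& op, const std::vector<NodePtr>& operands) {
  NodePtr node = ReuseNodeSketch(op, operands);
  if (!node) {
    node = std::make_shared<Node>(Node{op, operands});
    CacheNodeSketch(node);
  }
  return node;
}
```

Creating abs over the same operand twice returns the same node, while a different op over that operand gets a new one.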

LazyIr.h

class Abs : public XlaNode {
 public:
  Abs(const torch_xla::XlaValue& self)
      : XlaNode(torch::lazy::OpKind(at::aten::abs), {self},
                [&]() { return AbsOutputShape(self); },
                /* num_outputs */ 1, torch::lazy::MHash())
  {}

  std::string ToString() const override {
    std::stringstream ss;
    ss << XlaNode::ToString();
    return ss.str();
  }
  torch_xla::XlaOpVector Lower(LoweringContext* loctx) const override;
};

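A few things to note:

- Codegen does not generate a Clone method; this is expected. The Clone method is not used even in today's PyTorch/XLA, and we will remove it as part of the migration.
- For every op, the generated class calls an {OP}OutputShape function. You need to manually declare and implement this function in a separate file.
- For every op, codegen generates a Lower declaration. You need to manually implement this lowering function in a separate file.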
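The generated Abs constructor passes its output shape as a callback ([&]() { return AbsOutputShape(self); }) rather than as a ready-made value. One plausible benefit, sketched below under that assumption, is that a node can skip the shape computation when a shape for the same node hash is already cached. ShapeSketch, ShapeCache, and NodeWithShapeFn are hypothetical. Because the callback runs inside the constructor, capturing by reference is safe in this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical shape type used only in this sketch.
struct ShapeSketch {
  std::string dtype;
  std::vector<int64_t> sizes;
};

// Hypothetical process-wide cache of computed shapes, keyed by node hash.
std::unordered_map<std::size_t, ShapeSketch>& ShapeCache() {
  static std::unordered_map<std::size_t, ShapeSketch> cache;
  return cache;
}

// A node that receives its output shape as a callback, similar in spirit to
// the shape lambda passed by the generated Abs constructor.
class NodeWithShapeFn {
 public:
  NodeWithShapeFn(std::size_t hash, const std::function<ShapeSketch()>& shape_fn) {
    auto it = ShapeCache().find(hash);
    if (it == ShapeCache().end()) {
      // Call the (possibly expensive) shape function only on a cache miss.
      // It runs inside the constructor, so by-reference captures stay valid.
      it = ShapeCache().emplace(hash, shape_fn()).first;
    }
    shape_ = it->second;
  }

  const ShapeSketch& shape() const { return shape_; }

 private:
  ShapeSketch shape_;
};
```

Constructing two nodes with the same hash invokes the shape callback only once.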

3. Implement the missing IR functions

torch_xla/csrc/ops/ops_xla_shape_fn.h

Declare {OP}OutputShape:

xla::Shape AbsOutputShape(const XlaValue& input);

torch_xla/csrc/ops/ops_xla_shape_fn.cpp

Implement {OP}OutputShape:

xla::Shape AbsOutputShape(const XlaValue& input) { return input.xla_shape(); }

Abs is an oversimplified example. In the usual case, you call the BuildXXXOp function again to compute the output shape. A slightly better example is:

xla::Shape MaximumOutputShape(const XlaValue& input, const XlaValue& other) {
  auto lower_for_shape_fn =
      [&](absl::Span<const xla::XlaOp> operands) -> xla::XlaOp {
    auto promoted = XlaHelpers::Promote(operands[0], operands[1]);
    return xla::Max(promoted.first, promoted.second);
  };
  return InferOutputShape({input.xla_shape(), other.xla_shape()},
                          lower_for_shape_fn);
}

Note that you should not start from scratch: find the xla::Shape computation logic in the existing op and move it into these two files.
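The MaximumOutputShape pattern infers a shape by running a lowering lambda on placeholder operands and reading the shape of the result. The self-contained sketch below models that idea with a fake builder whose ops carry only a shape. DType, FakeOp, PromoteSketch, MaxSketch, and InferOutputShapeSketch are hypothetical, and the promotion rule (pick the higher-ranked dtype) is a simplification, not PyTorch's type-promotion semantics.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical element types, declared from lowest to highest promotion rank.
enum class DType { kBool, kInt32, kInt64, kFloat32, kFloat64 };

struct OpShape {
  DType dtype;
  std::vector<int64_t> sizes;
};

// A fake builder op that only carries a shape, which is all shape inference needs.
struct FakeOp {
  OpShape shape;
};

// Simplified promotion: convert both operands to the higher-ranked dtype.
std::pair<FakeOp, FakeOp> PromoteSketch(const FakeOp& a, const FakeOp& b) {
  const DType t = a.shape.dtype >= b.shape.dtype ? a.shape.dtype : b.shape.dtype;
  return {FakeOp{OpShape{t, a.shape.sizes}}, FakeOp{OpShape{t, b.shape.sizes}}};
}

// Element-wise max; this sketch requires identical sizes and dtypes.
FakeOp MaxSketch(const FakeOp& a, const FakeOp& b) {
  if (a.shape.dtype != b.shape.dtype || a.shape.sizes != b.shape.sizes) {
    throw std::invalid_argument("MaxSketch: operand shapes differ");
  }
  return a;
}

using LowerForShapeFn = std::function<FakeOp(const std::vector<FakeOp>&)>;

// Runs the lowering lambda on placeholder operands and returns the result shape.
OpShape InferOutputShapeSketch(const std::vector<OpShape>& input_shapes,
                               const LowerForShapeFn& lower_fn) {
  std::vector<FakeOp> params;
  for (const OpShape& s : input_shapes) params.push_back(FakeOp{s});
  return lower_fn(params).shape;
}

// Same structure as MaximumOutputShape in this guide.
OpShape MaximumOutputShapeSketch(const OpShape& input, const OpShape& other) {
  auto lower_for_shape_fn = [](const std::vector<FakeOp>& operands) -> FakeOp {
    auto promoted = PromoteSketch(operands[0], operands[1]);
    return MaxSketch(promoted.first, promoted.second);
  };
  return InferOutputShapeSketch({input, other}, lower_for_shape_fn);
}
```

The appeal of this design is that the shape logic cannot drift from the lowering: both run the same promote-then-max sequence.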

4. Implement the lowering function

torch_xla/csrc/ops/ops_lower_fn.cpp

torch_xla::XlaOpVector Abs::Lower(LoweringContext* loctx) const {
  xla::XlaOp xla_input = loctx->GetOutputOp(operand(0));
  return ReturnOp(BuildAbs(xla_input), loctx);
}

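Note that this function should be moved over directly from the existing lowering. Some ops that were originally implemented in torch_xla/csrc/ops/ops.cpp use GenericOp; you will need to slightly modify their lowering implementation to fit the one shown above.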
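To show what Abs::Lower relies on, here is a hypothetical, self-contained model of a lowering context: it records the lowered result of each node so that a node's Lower can fetch its operand with GetOutputOp and build on it. Lowered ops are plain strings instead of xla::XlaOp, and all class and function names are invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct IrNode;
using IrNodePtr = std::shared_ptr<const IrNode>;

// Hypothetical lowering context: remembers the lowered "op" of each IR node.
class LoweringContextSketch {
 public:
  bool HasOutputOp(const IrNode* node) const { return ops_.count(node) > 0; }
  void AssignOutputOp(const IrNode* node, std::string op) { ops_[node] = std::move(op); }
  const std::string& GetOutputOp(const IrNode* node) const {
    auto it = ops_.find(node);
    if (it == ops_.end()) throw std::logic_error("operand has not been lowered");
    return it->second;
  }

 private:
  std::unordered_map<const IrNode*, std::string> ops_;
};

struct IrNode {
  virtual ~IrNode() = default;
  const IrNode* operand(std::size_t i) const { return operands.at(i).get(); }
  virtual std::string Lower(const LoweringContextSketch& ctx) const = 0;
  std::vector<IrNodePtr> operands;
};

// A graph input; it lowers to its own name.
struct ParameterNode : IrNode {
  explicit ParameterNode(std::string n) : name(std::move(n)) {}
  std::string Lower(const LoweringContextSketch&) const override { return name; }
  std::string name;
};

// Hypothetical builder, playing the role of BuildAbs.
std::string BuildAbsSketch(const std::string& input) { return "abs(" + input + ")"; }

// Mirrors Abs::Lower: fetch the operand's lowered op, then build on it.
struct AbsNode : IrNode {
  explicit AbsNode(IrNodePtr input) { operands.push_back(std::move(input)); }
  std::string Lower(const LoweringContextSketch& ctx) const override {
    return BuildAbsSketch(ctx.GetOutputOp(operand(0)));
  }
};

// Lowers operands before their users (post-order), skipping nodes that
// were already lowered.
std::string LowerGraph(const IrNodePtr& node, LoweringContextSketch& ctx) {
  if (ctx.HasOutputOp(node.get())) return ctx.GetOutputOp(node.get());
  for (const IrNodePtr& input : node->operands) LowerGraph(input, ctx);
  std::string lowered = node->Lower(ctx);
  ctx.AssignOutputOp(node.get(), lowered);
  return lowered;
}
```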

5. Clean up

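Delete the existing op from aten_xla_type.cpp, tensor_methods.h, tensor_methods.cpp, and ops/…. Note that sometimes you have to keep the tensor_method because it is used in tensor_ops, so cross-reference tensor_ops.cpp before removing the op. For example, tensor_ops.cpp contains calls such as: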

XLATensor s1 = XLATensor::sub(XLATensor::mul(u2, v3), XLATensor::mul(u3, v2), one);

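Sometimes other IR nodes use the IR node you migrated. In that case, you also need to update the lowering logic of those IR nodes. In the long term, we need to get rid of these composite IRs and provide a lowering function for each op. For example, change: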

torch::lazy::NodePtr exp = Pow(Abs(input), norm_exp);

to:

torch::lazy::NodePtr exp =
    Pow(torch_xla::MakeNode<Abs>(input, std::vector<torch::lazy::Shape>()),
        norm_exp);

Run the test and verify the result

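Run the C++ op test, or a simple test that involves only the generated op. To run the C++ test:

1. Build xla with python setup.py install (note: do not use the BUILD_CPP_TESTS=0 flag, since it skips building the C++ tests).
2. Go to the test/cpp/build directory in your pytorch/xla checkout.
3. Run the desired C++ test (for example, the Abs C++ test):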

./test_ptxla --gtest_filter=AtenXlaTensorTest.TestAbs

As usual, verify two things: correctness, and that the xla counters are incremented as expected.
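In essence, verifying counters means comparing counter values before and after running the op. The sketch below models that with a hypothetical counter registry. AbsSketch, Counters, and CounterValue are invented; the xla::abs name only mimics the naming suggested by TORCH_LAZY_FN_COUNTER("xla::") in the generated code, and using an aten:: prefix to flag a fallback is an assumption of this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical counter registry standing in for the real metrics counters.
std::map<std::string, int>& Counters() {
  static std::map<std::string, int> counters;
  return counters;
}

int CounterValue(const std::string& name) {
  auto it = Counters().find(name);
  return it == Counters().end() ? 0 : it->second;
}

// Sketch of an op entry point that bumps "xla::abs" on the XLA path and
// "aten::abs" when it (hypothetically) falls back to the CPU implementation.
double AbsSketch(double x, bool use_fallback) {
  ++Counters()[use_fallback ? "aten::abs" : "xla::abs"];
  return x < 0 ? -x : x;
}
```

A test would record the counter, run the op once, and then expect the xla counter to have grown by one while the fallback counter stays unchanged.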

Example PRs

- Unary/binary OP -> Codegen erf, erfc, erfinv, and exp (https://github.com/pytorch/xla/pull/3659)
- OP with optional arguments -> Codegen binary_cross_entropy/backward (https://github.com/pytorch/xla/pull/3809)
- OP with at::Scalar -> Codegen addcdiv and addcmul (https://github.com/pytorch/xla/pull/3768)
- OP with a vector argument that supports negative indices -> Codegen amin amax (https://github.com/pytorch/xla/pull/3771)
- OP with special fallback logic -> Partially codegen adaptive_avgpool3d and backward (https://github.com/pytorch/xla/pull/3790)

For more examples, see the tracking issue (https://github.com/pytorch/xla/issues/3560).