torch.cuda.jiterator._create_jit_fn

torch.cuda.jiterator._create_jit_fn(code_string, **kwargs)

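Create a jiterator-generated CUDA kernel for an elementwise op.

code_string must be a valid CUDA function that describes the computation for a single element. It must follow the C++ template pattern, as shown in the example below. The function is inlined into an elementwise kernel template and compiled on the fly. Compiled kernels are cached in memory and in a local temporary directory.

Jiterator-generated kernels accept noncontiguous tensors and support broadcasting and type promotion.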
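To make "supports broadcasting" concrete, the following is a minimal pure-Python sketch of the usual PyTorch broadcasting shape rule, on the assumption that Jiterator kernels follow the same rule as other elementwise ops. The helper name broadcast_shape is hypothetical and is not part of torch.cuda.jiterator.

```python
from itertools import zip_longest


def broadcast_shape(*shapes: tuple[int, ...]) -> tuple[int, ...]:
    """Compute the broadcast result shape by aligning dimensions from the right.

    Hypothetical illustration of the standard broadcasting rule; not a PyTorch API.
    """
    result = []
    # Walk all shapes from the last dimension to the first; missing dims count as 1.
    for dims in zip_longest(*(reversed(s) for s in shapes), fillvalue=1):
        # Sizes other than 1 must all agree, otherwise the shapes are incompatible.
        sizes = {d for d in dims if d != 1}
        if len(sizes) > 1:
            raise ValueError(f"shapes {shapes} are not broadcastable")
        result.append(sizes.pop() if sizes else 1)
    return tuple(reversed(result))


print(broadcast_shape((3, 1), (1, 4)))  # (3, 4)
print(broadcast_shape((5, 3), (3,)))    # (5, 3)
```

Under this rule, inputs of shape (3, 1) and (1, 4) passed to a jitted function would be expected to produce a (3, 4) result, while shapes such as (2, 3) and (4, 3) would be rejected.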

Parameters

code_string (str) – CUDA code string to be compiled by jiterator. The entry functor must return by value.
kwargs (Dict, optional) – Keyword arguments for the generated function

Return type

Callable

Example

import torch
from torch.cuda.jiterator import _create_jit_fn as create_jit_fn

code_string = "template <typename T> T my_kernel(T x, T y, T alpha) { return -x + alpha * y; }"
jitted_fn = create_jit_fn(code_string, alpha=1.0)
a = torch.rand(3, device="cuda")
b = torch.rand(3, device="cuda")
# invoke jitted function like a regular python function
result = jitted_fn(a, b, alpha=3.14)

code_string may also define multiple functions; the last function is treated as the entry function.

Example

code_string = (
    "template <typename T> T util_fn(T x, T y) { return ::sin(x) + ::cos(y); }"
)
code_string += "template <typename T> T my_kernel(T x, T y, T val) { return ::min(val, util_fn(x, y)); }"
jitted_fn = create_jit_fn(code_string, val=0.0)
a = torch.rand(3, device="cuda")
b = torch.rand(3, device="cuda")
# invoke jitted function like a regular python function
result = jitted_fn(a, b)  # using default val=0.0
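As a rough illustration of the "last function is the entry function" rule, the sketch below lists the templated functions in a code string and picks the final one. The helpers list_functions and entry_function are hypothetical, use a deliberately simple regular expression, and are not how PyTorch itself parses code_string.

```python
import re

# Hypothetical helper pattern, not part of torch.cuda.jiterator: matches
# "template <...> <return type> <name>(" and captures the function name.
_FN_HEADER = re.compile(r"template\s*<[^>]*>\s*\w+\s+(\w+)\s*\(")


def list_functions(code_string: str) -> list[str]:
    """Return the templated function names in the order they are defined."""
    return _FN_HEADER.findall(code_string)


def entry_function(code_string: str) -> str:
    """Return the name of the last function, which acts as the entry point."""
    names = list_functions(code_string)
    if not names:
        raise ValueError("no templated function found in code_string")
    return names[-1]


code_string = (
    "template <typename T> T util_fn(T x, T y) { return ::sin(x) + ::cos(y); }"
)
code_string += "template <typename T> T my_kernel(T x, T y, T val) { return ::min(val, util_fn(x, y)); }"

print(list_functions(code_string))  # ['util_fn', 'my_kernel']
print(entry_function(code_string))  # my_kernel
```

A check like this can be handy before compiling, for example to confirm that a utility function was not accidentally appended after the intended entry function.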

Jiterator can be combined with Python registration to override an operator's CUDA kernel. The following example overrides gelu's CUDA kernel with relu.

Example

code_string = "template <typename T> T my_gelu(T a) { return a > 0 ? a : 0; }"
my_gelu = create_jit_fn(code_string)
my_lib = torch.library.Library("aten", "IMPL")
my_lib.impl("aten::gelu", my_gelu, "CUDA")
# torch.nn.GELU and torch.nn.functional.gelu are now overridden
a = torch.rand(3, device="cuda")
torch.allclose(torch.nn.functional.gelu(a), torch.nn.functional.relu(a))

Warning

This API is in beta and may change in future releases.

Warning

This API supports at most 8 inputs and 1 output.

Warning

All input tensors must reside on a CUDA device.
