
InplaceFunction#

class torch.autograd.function.InplaceFunction(inplace=False)[source]#

This class is here only for backward compatibility reasons. Use Function instead of this for any new use case.

static backward(ctx, *grad_outputs)[source]#

Define a formula for differentiating the operation with backward mode automatic differentiation.

This function is to be overridden by all subclasses. (Defining this function is equivalent to defining the vjp function.)

It must accept a context ctx as the first argument, followed by as many outputs as the forward() returned (None will be passed in for non-tensor outputs of the forward function), and it should return as many tensors as there were inputs to forward(). Each argument is the gradient w.r.t. the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor or is a Tensor not requiring grads, you can just pass None as a gradient for that input.

The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.

Return type

Any
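
As an illustrative sketch (not part of the original reference; the MulAdd class is hypothetical), backward can consult ctx.needs_input_grad to skip gradients that are not required:

>>> class MulAdd(Function):
>>>     @staticmethod
>>>     def forward(ctx, x, y):
>>>         ctx.save_for_backward(x, y)
>>>         return x * y + y
>>>
>>>     @staticmethod
>>>     def backward(ctx, grad_output):
>>>         x, y = ctx.saved_tensors
>>>         grad_x = grad_y = None
>>>         if ctx.needs_input_grad[0]:  # gradient w.r.t. x
>>>             grad_x = grad_output * y
>>>         if ctx.needs_input_grad[1]:  # gradient w.r.t. y
>>>             grad_y = grad_output * (x + 1)
>>>         return grad_x, grad_y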

static forward(*args, **kwargs)[source]#

Define the forward of the custom autograd Function.

This function is to be overridden by all subclasses. There are two ways to define forward:

Usage 1 (Combined forward and ctx)

@staticmethod
def forward(ctx: Any, *args: Any, **kwargs: Any) -> Any:
    pass

Usage 2 (Separate forward and ctx)

@staticmethod
def forward(*args: Any, **kwargs: Any) -> Any:
    pass

@staticmethod
def setup_context(ctx: Any, inputs: Tuple[Any, ...], output: Any) -> None:
    pass

  • The forward no longer accepts a ctx argument.

  • Instead, you must also override the torch.autograd.Function.setup_context() staticmethod to handle setting up the ctx object. output is the output of the forward, inputs are a Tuple of inputs to the forward.

  • See Extending torch.autograd for more details

The context can be used to store arbitrary data that can then be retrieved during the backward pass. Tensors should not be stored directly on ctx (though this is not currently enforced for backward compatibility). Instead, tensors should be saved either with ctx.save_for_backward() if they are intended to be used in backward (equivalently, vjp) or ctx.save_for_forward() if they are intended to be used in jvp.

Return type

Any
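
As a minimal sketch (assumed, not part of the original reference; the Mul class is hypothetical), Usage 2 separates the ctx handling into setup_context:

>>> class Mul(Function):
>>>     @staticmethod
>>>     def forward(x, y):  # Usage 2: no ctx argument here
>>>         return x * y
>>>
>>>     @staticmethod
>>>     def setup_context(ctx, inputs, output):
>>>         x, y = inputs  # inputs is the tuple of arguments given to forward
>>>         ctx.save_for_backward(x, y)
>>>
>>>     @staticmethod
>>>     def backward(ctx, grad_output):
>>>         x, y = ctx.saved_tensors
>>>         return grad_output * y, grad_output * x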

static jvp(ctx, *grad_inputs)[source]#

Define a formula for differentiating the operation with forward mode automatic differentiation.

This function is to be overridden by all subclasses. It must accept a context ctx as the first argument, followed by as many inputs as the forward() got (None will be passed in for non-tensor inputs of the forward function), and it should return as many tensors as there were outputs to forward(). Each argument is the gradient w.r.t. the given input, and each returned value should be the gradient w.r.t. the corresponding output. If an output is not a Tensor or the function is not differentiable with respect to that output, you can just pass None as a gradient for that output.

You can use the ctx object to pass any value from the forward to this function.

Return type

Any
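
As a hedged sketch (not part of the original reference; the ScaleByTwo class is hypothetical), jvp receives one tangent per input of forward() and returns one tangent per output:

>>> class ScaleByTwo(Function):
>>>     @staticmethod
>>>     def forward(ctx, x):
>>>         return x * 2
>>>
>>>     @staticmethod
>>>     def jvp(ctx, x_t):
>>>         # the tangent of the output is the input tangent scaled the same way
>>>         return x_t * 2
>>>
>>>     @staticmethod
>>>     def backward(ctx, grad_output):
>>>         return grad_output * 2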

mark_dirty(*args)[source]#

Mark given tensors as modified in an in-place operation.

This should be called at most once, in either the setup_context() or forward() methods, and all arguments should be inputs.

Every tensor that’s been modified in-place in a call to forward() should be given to this function, to ensure correctness of our checks. It doesn’t matter whether the function is called before or after modification.

Example:
>>> class Inplace(Function):
>>>     @staticmethod
>>>     def forward(ctx, x):
>>>         x_npy = x.numpy() # x_npy shares storage with x
>>>         x_npy += 1
>>>         ctx.mark_dirty(x)
>>>         return x
>>>
>>>     @staticmethod
>>>     @once_differentiable
>>>     def backward(ctx, grad_output):
>>>         return grad_output
>>>
>>> a = torch.tensor(1., requires_grad=True, dtype=torch.double).clone()
>>> b = a * a
>>> Inplace.apply(a)  # This would lead to wrong gradients!
>>>                   # but the engine would not know unless we mark_dirty
>>> b.backward() # RuntimeError: one of the variables needed for gradient
>>>              # computation has been modified by an inplace operation

mark_non_differentiable(*args)[source]#

Mark outputs as non-differentiable.

This should be called at most once, in either the setup_context() or forward() methods, and all arguments should be tensor outputs.

This will mark outputs as not requiring gradients, increasing the efficiency of backward computation. You still need to accept a gradient for each output in backward(), but it's always going to be a zero tensor with the same shape as the corresponding output.

This is used e.g. for indices returned from a sort. See example:
>>> class Func(Function):
>>>     @staticmethod
>>>     def forward(ctx, x):
>>>         sorted, idx = x.sort()
>>>         ctx.mark_non_differentiable(idx)
>>>         ctx.save_for_backward(x, idx)
>>>         return sorted, idx
>>>
>>>     @staticmethod
>>>     @once_differentiable
>>>     def backward(ctx, g1, g2):  # still need to accept g2
>>>         x, idx = ctx.saved_tensors
>>>         grad_input = torch.zeros_like(x)
>>>         grad_input.index_add_(0, idx, g1)
>>>         return grad_input

save_for_backward(*tensors)[source]#

Save given tensors for a future call to backward().

save_for_backward should be called at most once, in either the setup_context() or forward() methods, and only with tensors.

All tensors intended to be used in the backward pass should be saved with save_for_backward (as opposed to directly on ctx) to prevent incorrect gradients and memory leaks, and to enable the application of saved tensor hooks. See torch.autograd.graph.saved_tensors_hooks for more details. See also Extending torch.autograd.

Note that if intermediary tensors (tensors that are neither inputs nor outputs of forward()) are saved for backward, your custom Function may not support double backward. Custom Functions that do not support double backward should decorate their backward() method with @once_differentiable so that performing double backward raises an error. If you'd like to support double backward, you can either recompute intermediaries based on the inputs during backward or return the intermediaries as outputs of the custom Function. See the double backward tutorial for more details.

In backward(), saved tensors can be accessed through the saved_tensors attribute. Before returning them to the user, a check is made to ensure they weren't used in any in-place operation that modified their content.

Arguments can also be None. This is a no-op.

See Extending torch.autograd for more details on how to use this.

Example:

>>> class Func(Function):
>>>     @staticmethod
>>>     def forward(ctx, x: torch.Tensor, y: torch.Tensor, z: int):
>>>         w = x * z
>>>         out = x * y + y * z + w * y
>>>         ctx.save_for_backward(x, y, w, out)
>>>         ctx.z = z  # z is not a tensor
>>>         return out
>>>
>>>     @staticmethod
>>>     @once_differentiable
>>>     def backward(ctx, grad_out):
>>>         x, y, w, out = ctx.saved_tensors
>>>         z = ctx.z
>>>         gx = grad_out * (y + y * z)
>>>         gy = grad_out * (x + z + w)
>>>         gz = None
>>>         return gx, gy, gz
>>>
>>> a = torch.tensor(1., requires_grad=True, dtype=torch.double)
>>> b = torch.tensor(2., requires_grad=True, dtype=torch.double)
>>> c = 4
>>> d = Func.apply(a, b, c)

save_for_forward(*tensors)[source]#

Save given tensors for a future call to jvp().

save_for_forward should be called at most once, in either the setup_context() or forward() methods, and all arguments should be tensors.

In jvp(), saved objects can be accessed through the saved_tensors attribute.

Arguments can also be None. This is a no-op.

See Extending torch.autograd for more details on how to use this.

Example:

>>> class Func(torch.autograd.Function):
>>>     @staticmethod
>>>     def forward(ctx, x: torch.Tensor, y: torch.Tensor, z: int):
>>>         ctx.save_for_backward(x, y)
>>>         ctx.save_for_forward(x, y)
>>>         ctx.z = z
>>>         return x * y * z
>>>
>>>     @staticmethod
>>>     def jvp(ctx, x_t, y_t, _):
>>>         x, y = ctx.saved_tensors
>>>         z = ctx.z
>>>         return z * (y * x_t + x * y_t)
>>>
>>>     @staticmethod
>>>     def vjp(ctx, grad_out):
>>>         x, y = ctx.saved_tensors
>>>         z = ctx.z
>>>         return z * grad_out * y, z * grad_out * x, None
>>>
>>> import torch.autograd.forward_ad as fwAD
>>>
>>> a = torch.tensor(1., requires_grad=True, dtype=torch.double)
>>> t = torch.tensor(1., dtype=torch.double)
>>> b = torch.tensor(2., requires_grad=True, dtype=torch.double)
>>> c = 4
>>>
>>> with fwAD.dual_level():
>>>     a_dual = fwAD.make_dual(a, t)
>>>     d = Func.apply(a_dual, b, c)

set_materialize_grads(value)[source]#

Set whether to materialize grad tensors. Default is True.

This should be called only from either the setup_context() or forward() methods.

If True, undefined grad tensors will be expanded to tensors full of zeros prior to calling the backward() and jvp() methods.

Example:

>>> class SimpleFunc(Function):
>>>     @staticmethod
>>>     def forward(ctx, x):
>>>         return x.clone(), x.clone()
>>>
>>>     @staticmethod
>>>     @once_differentiable
>>>     def backward(ctx, g1, g2):
>>>         return g1 + g2  # No check for None necessary
>>>
>>> # We modify SimpleFunc to handle non-materialized grad outputs
>>> class Func(Function):
>>>     @staticmethod
>>>     def forward(ctx, x):
>>>         ctx.set_materialize_grads(False)
>>>         ctx.save_for_backward(x)
>>>         return x.clone(), x.clone()
>>>
>>>     @staticmethod
>>>     @once_differentiable
>>>     def backward(ctx, g1, g2):
>>>         x, = ctx.saved_tensors
>>>         grad_input = torch.zeros_like(x)
>>>         if g1 is not None:  # We must check for None now
>>>             grad_input += g1
>>>         if g2 is not None:
>>>             grad_input += g2
>>>         return grad_input
>>>
>>> a = torch.tensor(1., requires_grad=True)
>>> b, _ = Func.apply(a)  # induces g2 to be undefined

static setup_context(ctx, inputs, output)[source]#

There are two ways to define the forward pass of an autograd.Function.

Either:

  1. Override forward with the signature forward(ctx, *args, **kwargs). setup_context is not overridden. Setting up the ctx for backward happens inside the forward.

  2. Override forward with the signature forward(*args, **kwargs) and override setup_context. Setting up the ctx for backward happens inside setup_context (as opposed to inside the forward).

See torch.autograd.Function.forward() and Extending torch.autograd for more details.

Return type

Any

static vjp(ctx, *grad_outputs)[source]#

Define a formula for differentiating the operation with backward mode automatic differentiation.

This function is to be overridden by all subclasses. (Defining this function is equivalent to defining the backward function.)

It must accept a context ctx as the first argument, followed by as many outputs as the forward() returned (None will be passed in for non-tensor outputs of the forward function), and it should return as many tensors as there were inputs to forward(). Each argument is the gradient w.r.t. the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor or is a Tensor not requiring grads, you can just pass None as a gradient for that input.

The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.

Return type

Any

static vmap(info, in_dims, *args)[source]#

Define the behavior of this autograd.Function under torch.vmap().

In order for a torch.autograd.Function() to support torch.vmap(), you must either override this static method or set generate_vmap_rule to True (you may not do both).

If you choose to override this static method: it must accept

  • An info object as the first argument. info.batch_size specifies the size of the dimension being vmapped over, while info.randomness is the randomness option passed to torch.vmap().

  • An in_dims tuple as the second argument. For each arg in args, in_dims has a corresponding Optional[int]. It is None if the arg is not a Tensor or if the arg is not being vmapped over; otherwise, it is an integer specifying which dimension of the Tensor is being vmapped over.

  • *args, which is the same as the args to forward().

The return of the vmap staticmethod is a tuple of (output, out_dims). Similar to in_dims, out_dims should be of the same structure as output and contain one out_dim per output that specifies whether the output has a vmapped dimension and the index it is in.

See Extending torch.func with autograd.Function for more details.
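
As an illustrative sketch (assumed, not part of the original reference; the Scale class is hypothetical), an elementwise operation can implement the vmap staticmethod by operating on the unbatched args and propagating the batched dimension unchanged:

>>> class Scale(Function):
>>>     @staticmethod
>>>     def forward(x):
>>>         return x * 2
>>>
>>>     @staticmethod
>>>     def setup_context(ctx, inputs, output):
>>>         pass  # nothing needs to be saved for this simple op
>>>
>>>     @staticmethod
>>>     def backward(ctx, grad_output):
>>>         return grad_output * 2
>>>
>>>     @staticmethod
>>>     def vmap(info, in_dims, x):
>>>         x_bdim, = in_dims  # dimension of x being vmapped over, or None
>>>         # info.batch_size gives the size of that dimension; an
>>>         # elementwise multiply leaves the batched dimension in place
>>>         out = x * 2
>>>         return out, x_bdim
>>>
>>> # torch.vmap(Scale.apply) would then dispatch through this vmap rule.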