InplaceFunction#
- class torch.autograd.function.InplaceFunction(inplace=False)[source]#
This class is here only for backward compatibility reasons. Use Function instead of this for any new use case.
- static backward(ctx, *grad_outputs)[source]#
Define a formula for differentiating the operation with backward mode automatic differentiation.
This function is to be overridden by all subclasses. (Defining this function is equivalent to defining the vjp function.)
It must accept a context ctx as the first argument, followed by as many outputs as the forward() returned (None will be passed in for non-tensor outputs of the forward function), and it should return as many tensors as there were inputs to forward(). Each argument is the gradient w.r.t. the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor or is a Tensor not requiring grads, you can just pass None as a gradient for that input.
The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.
- Return type
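A minimal sketch (not part of the original reference) of a Function whose backward() consults ctx.needs_input_grad and returns None for a non-tensor input; the name MulConstant is illustrative only:
>>> import torch
>>> from torch.autograd import Function
>>>
>>> class MulConstant(Function):
>>>     @staticmethod
>>>     def forward(ctx, x, constant):
>>>         ctx.constant = constant  # constant is not a tensor, store it on ctx
>>>         return x * constant
>>>
>>>     @staticmethod
>>>     def backward(ctx, grad_output):
>>>         # One returned value per forward() input; None for the non-tensor constant
>>>         grad_x = grad_output * ctx.constant if ctx.needs_input_grad[0] else None
>>>         return grad_x, None
>>>
>>> x = torch.tensor(2., requires_grad=True)
>>> MulConstant.apply(x, 3.).backward()
>>> x.grad  # tensor(3.)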
- static forward(*args, **kwargs)[source]#
Define the forward of the custom autograd Function.
This function is to be overridden by all subclasses. There are two ways to define forward:
Usage 1 (Combined forward and ctx)
@staticmethod
def forward(ctx: Any, *args: Any, **kwargs: Any) -> Any:
    pass
It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types).
See Combined or separate forward() and setup_context() for more details.
Usage 2 (Separate forward and ctx)
@staticmethod
def forward(*args: Any, **kwargs: Any) -> Any:
    pass

@staticmethod
def setup_context(ctx: Any, inputs: Tuple[Any, ...], output: Any) -> None:
    pass
The forward no longer accepts a ctx argument.
Instead, you must also override the torch.autograd.Function.setup_context() staticmethod to handle setting up the ctx object. output is the output of the forward, inputs are a Tuple of inputs to the forward.
See Extending torch.autograd for more details.
The context can be used to store arbitrary data that can then be retrieved during the backward pass. Tensors should not be stored directly on ctx (though this is not currently enforced for backward compatibility). Instead, tensors should be saved either with ctx.save_for_backward() if they are intended to be used in backward (equivalently, vjp) or ctx.save_for_forward() if they are intended to be used in jvp.
- Return type
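A minimal sketch (not part of the original reference) of Usage 2, where setup_context() rather than forward() populates ctx; the Cube Function and its formula are illustrative only:
>>> import torch
>>> from torch.autograd import Function
>>>
>>> class Cube(Function):
>>>     @staticmethod
>>>     def forward(x):
>>>         # No ctx here: just compute and return the output
>>>         return x ** 3
>>>
>>>     @staticmethod
>>>     def setup_context(ctx, inputs, output):
>>>         x, = inputs
>>>         ctx.save_for_backward(x)
>>>
>>>     @staticmethod
>>>     def backward(ctx, grad_output):
>>>         x, = ctx.saved_tensors
>>>         return 3 * x ** 2 * grad_output
>>>
>>> x = torch.tensor(2., requires_grad=True)
>>> Cube.apply(x).backward()
>>> x.grad  # tensor(12.)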
- static jvp(ctx, *grad_inputs)[source]#
Define a formula for differentiating the operation with forward mode automatic differentiation.
This function is to be overridden by all subclasses. It must accept a context ctx as the first argument, followed by as many inputs as the forward() got (None will be passed in for non-tensor inputs of the forward function), and it should return as many tensors as there were outputs to forward(). Each argument is the gradient w.r.t. the given input, and each returned value should be the gradient w.r.t. the corresponding output. If an output is not a Tensor or the function is not differentiable with respect to that output, you can just pass None as a gradient for that output.
You can use the ctx object to pass any value from the forward to this function.
- Return type
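A minimal sketch (not part of the original reference) of a Function with a jvp() rule evaluated under forward-mode AD; the name ScaleBy3 is illustrative only:
>>> import torch
>>> import torch.autograd.forward_ad as fwAD
>>> from torch.autograd import Function
>>>
>>> class ScaleBy3(Function):
>>>     @staticmethod
>>>     def forward(ctx, x):
>>>         return 3 * x
>>>
>>>     @staticmethod
>>>     def jvp(ctx, x_t):
>>>         # One tangent per forward() input, one returned tangent per output
>>>         return 3 * x_t
>>>
>>>     @staticmethod
>>>     def backward(ctx, grad_output):
>>>         return 3 * grad_output
>>>
>>> with fwAD.dual_level():
>>>     x = fwAD.make_dual(torch.tensor(2.), torch.tensor(1.))
>>>     y = ScaleBy3.apply(x)
>>>     fwAD.unpack_dual(y).tangent  # tensor(3.)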
- mark_dirty(*args)[source]#
Mark given tensors as modified in an in-place operation.
This should be called at most once, in either the setup_context() or forward() methods, and all arguments should be inputs.
Every tensor that's been modified in-place in a call to forward() should be given to this function, to ensure correctness of our checks. It doesn't matter whether the function is called before or after modification.
- Example:
>>> class Inplace(Function):
>>>     @staticmethod
>>>     def forward(ctx, x):
>>>         x_npy = x.numpy()  # x_npy shares storage with x
>>>         x_npy += 1
>>>         ctx.mark_dirty(x)
>>>         return x
>>>
>>>     @staticmethod
>>>     @once_differentiable
>>>     def backward(ctx, grad_output):
>>>         return grad_output
>>>
>>> a = torch.tensor(1., requires_grad=True, dtype=torch.double).clone()
>>> b = a * a
>>> Inplace.apply(a)  # This would lead to wrong gradients!
>>>                   # but the engine would not know unless we mark_dirty
>>> b.backward()  # RuntimeError: one of the variables needed for gradient
>>>               # computation has been modified by an inplace operation
- mark_non_differentiable(*args)[source]#
Mark outputs as non-differentiable.
This should be called at most once, in either the setup_context() or forward() methods, and all arguments should be tensor outputs.
This will mark outputs as not requiring gradients, increasing the efficiency of backward computation. You still need to accept a gradient for each output in backward(), but it's always going to be a zero tensor with the same shape as the shape of the corresponding output.
- This is used e.g. for indices returned from a sort. See example:
>>> class Func(Function):
>>>     @staticmethod
>>>     def forward(ctx, x):
>>>         sorted, idx = x.sort()
>>>         ctx.mark_non_differentiable(idx)
>>>         ctx.save_for_backward(x, idx)
>>>         return sorted, idx
>>>
>>>     @staticmethod
>>>     @once_differentiable
>>>     def backward(ctx, g1, g2):  # still need to accept g2
>>>         x, idx = ctx.saved_tensors
>>>         grad_input = torch.zeros_like(x)
>>>         grad_input.index_add_(0, idx, g1)
>>>         return grad_input
- save_for_backward(*tensors)[source]#
Save given tensors for a future call to backward().
save_for_backward should be called at most once, in either the setup_context() or forward() methods, and only with tensors.
All tensors intended to be used in the backward pass should be saved with save_for_backward (as opposed to directly on ctx) to prevent incorrect gradients and memory leaks, and to enable the application of saved tensor hooks. See torch.autograd.graph.saved_tensors_hooks for more details. See Extending torch.autograd.
Note that if intermediary tensors (tensors that are neither inputs nor outputs of forward()) are saved, your custom Function may not support double backward. Custom Functions that do not support double backward should decorate their backward() method with @once_differentiable so that performing double backward raises an error. If you'd like to support double backward, you can either recompute intermediaries based on the inputs during backward or return the intermediaries as outputs of the custom Function. See the double backward tutorial for more details.
In backward(), saved tensors can be accessed through the saved_tensors attribute. Before returning them to the user, a check is made to ensure they weren't used in any in-place operation that modified their content.
Arguments can also be None. This is a no-op.
See Extending torch.autograd for more details on how to use this.
Example
>>> class Func(Function):
>>>     @staticmethod
>>>     def forward(ctx, x: torch.Tensor, y: torch.Tensor, z: int):
>>>         w = x * z
>>>         out = x * y + y * z + w * y
>>>         ctx.save_for_backward(x, y, w, out)
>>>         ctx.z = z  # z is not a tensor
>>>         return out
>>>
>>>     @staticmethod
>>>     @once_differentiable
>>>     def backward(ctx, grad_out):
>>>         x, y, w, out = ctx.saved_tensors
>>>         z = ctx.z
>>>         gx = grad_out * (y + y * z)
>>>         gy = grad_out * (x + z + w)
>>>         gz = None
>>>         return gx, gy, gz
>>>
>>> a = torch.tensor(1., requires_grad=True, dtype=torch.double)
>>> b = torch.tensor(2., requires_grad=True, dtype=torch.double)
>>> c = 4
>>> d = Func.apply(a, b, c)
- save_for_forward(*tensors)[source]#
Save given tensors for a future call to jvp().
save_for_forward should be called at most once, in either the setup_context() or forward() methods, and all arguments should be tensors.
In jvp(), saved objects can be accessed through the saved_tensors attribute.
Arguments can also be None. This is a no-op.
See Extending torch.autograd for more details on how to use this.
Example
>>> class Func(torch.autograd.Function):
>>>     @staticmethod
>>>     def forward(ctx, x: torch.Tensor, y: torch.Tensor, z: int):
>>>         ctx.save_for_backward(x, y)
>>>         ctx.save_for_forward(x, y)
>>>         ctx.z = z
>>>         return x * y * z
>>>
>>>     @staticmethod
>>>     def jvp(ctx, x_t, y_t, _):
>>>         x, y = ctx.saved_tensors
>>>         z = ctx.z
>>>         return z * (y * x_t + x * y_t)
>>>
>>>     @staticmethod
>>>     def vjp(ctx, grad_out):
>>>         x, y = ctx.saved_tensors
>>>         z = ctx.z
>>>         return z * grad_out * y, z * grad_out * x, None
>>>
>>> a = torch.tensor(1., requires_grad=True, dtype=torch.double)
>>> t = torch.tensor(1., dtype=torch.double)
>>> b = torch.tensor(2., requires_grad=True, dtype=torch.double)
>>> c = 4
>>>
>>> with fwAD.dual_level():
>>>     a_dual = fwAD.make_dual(a, t)
>>>     d = Func.apply(a_dual, b, c)
- set_materialize_grads(value)[source]#
Set whether to materialize grad tensors. Default is True.
This should only be called from either the setup_context() or forward() methods.
If True, undefined grad tensors will be expanded to tensors full of zeros prior to calling the backward() and jvp() methods.
Example
>>> class SimpleFunc(Function):
>>>     @staticmethod
>>>     def forward(ctx, x):
>>>         return x.clone(), x.clone()
>>>
>>>     @staticmethod
>>>     @once_differentiable
>>>     def backward(ctx, g1, g2):
>>>         return g1 + g2  # No check for None necessary
>>>
>>> # We modify SimpleFunc to handle non-materialized grad outputs
>>> class Func(Function):
>>>     @staticmethod
>>>     def forward(ctx, x):
>>>         ctx.set_materialize_grads(False)
>>>         ctx.save_for_backward(x)
>>>         return x.clone(), x.clone()
>>>
>>>     @staticmethod
>>>     @once_differentiable
>>>     def backward(ctx, g1, g2):
>>>         x, = ctx.saved_tensors
>>>         grad_input = torch.zeros_like(x)
>>>         if g1 is not None:  # We must check for None now
>>>             grad_input += g1
>>>         if g2 is not None:
>>>             grad_input += g2
>>>         return grad_input
>>>
>>> a = torch.tensor(1., requires_grad=True)
>>> b, _ = Func.apply(a)  # induces g2 to be undefined
- static setup_context(ctx, inputs, output)[source]#
There are two ways to define the forward pass of an autograd.Function.
Either:
- Override forward with the signature forward(ctx, *args, **kwargs) and do not override setup_context. Setting up the ctx for backward happens inside the forward.
- Override forward with the signature forward(*args, **kwargs) and override setup_context. Setting up the ctx for backward happens inside setup_context (as opposed to inside the forward).
See torch.autograd.Function.forward() and Extending torch.autograd for more details.
- Return type
- static vjp(ctx, *grad_outputs)[source]#
Define a formula for differentiating the operation with backward mode automatic differentiation.
This function is to be overridden by all subclasses. (Defining this function is equivalent to defining the backward function.)
It must accept a context ctx as the first argument, followed by as many outputs as the forward() returned (None will be passed in for non-tensor outputs of the forward function), and it should return as many tensors as there were inputs to forward(). Each argument is the gradient w.r.t. the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor or is a Tensor not requiring grads, you can just pass None as a gradient for that input.
The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.
- Return type
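A minimal sketch (not part of the original reference) showing the reverse-mode formula supplied via vjp() instead of backward(); the Square Function is illustrative only:
>>> import torch
>>> from torch.autograd import Function
>>>
>>> class Square(Function):
>>>     @staticmethod
>>>     def forward(ctx, x):
>>>         ctx.save_for_backward(x)
>>>         return x ** 2
>>>
>>>     @staticmethod
>>>     def vjp(ctx, grad_output):  # equivalent to defining backward()
>>>         x, = ctx.saved_tensors
>>>         return 2 * x * grad_output
>>>
>>> x = torch.tensor(3., requires_grad=True)
>>> Square.apply(x).backward()
>>> x.grad  # tensor(6.)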
- static vmap(info, in_dims, *args)[source]#
Define the behavior for this autograd.Function underneath torch.vmap().
For a torch.autograd.Function() to support torch.vmap(), you must either override this static method or set generate_vmap_rule to True (you may not do both).
If you choose to override this static method, it must accept:
- an info object as the first argument. info.batch_size specifies the size of the dimension being vmapped over, while info.randomness is the randomness option passed to torch.vmap().
- an in_dims tuple as the second argument. For each arg in args, in_dims has a corresponding Optional[int]. It is None if the arg is not a Tensor or if the arg is not being vmapped over; otherwise, it is an integer specifying what dimension of the Tensor is being vmapped over.
- *args, which is the same as the args to forward().
The return of the vmap static method is a tuple of (output, out_dims). Similar to in_dims, out_dims should be of the same structure as output and contain one out_dim per output that specifies whether the output has a vmapped dimension and what index it is in.
See Extending torch.func with autograd.Function for more details.
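A minimal sketch (not part of the original reference) of a vmap static method for an element-wise Function; the name MyExp is illustrative only, and the Function uses the separate forward()/setup_context() style expected by torch.func transforms:
>>> import torch
>>> from torch.autograd import Function
>>>
>>> class MyExp(Function):
>>>     @staticmethod
>>>     def forward(x):
>>>         return x.exp()
>>>
>>>     @staticmethod
>>>     def setup_context(ctx, inputs, output):
>>>         ctx.save_for_backward(output)
>>>
>>>     @staticmethod
>>>     def backward(ctx, grad_output):
>>>         out, = ctx.saved_tensors
>>>         return grad_output * out
>>>
>>>     @staticmethod
>>>     def vmap(info, in_dims, x):
>>>         # exp is element-wise, so the batched input can be passed through
>>>         # unchanged; the output keeps its batch dim at the same position
>>>         return MyExp.apply(x), in_dims[0]
>>>
>>> x = torch.randn(3, 4)
>>> y = torch.vmap(MyExp.apply)(x)  # equivalent to x.exp(), batched over dim 0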