towhee.models.drl.until_module.AllGather
- class towhee.models.drl.until_module.AllGather(*args, **kwargs)
Bases: torch.autograd.Function
An autograd function that performs an all-gather on a tensor.
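This page does not reproduce the towhee source. The following is a minimal sketch of how an all-gather autograd function with the signature forward(ctx, tensor, args) is commonly written. It rests on several assumptions. The tensor is gathered along dim 0 with torch.distributed.all_gather. args is a non-tensor namespace carrying world_size and rank, and those attribute names are hypothetical. Every rank contributes the same batch size. The class name AllGatherSketch is also hypothetical.

```python
import torch
import torch.distributed as dist
from torch.autograd import Function


class AllGatherSketch(Function):
    """Hypothetical all-gather along dim 0; assumes args.world_size and args.rank exist."""

    @staticmethod
    def forward(ctx, tensor, args):
        # One receive buffer per rank; every rank must send a tensor of the same shape.
        output = [torch.empty_like(tensor) for _ in range(args.world_size)]
        dist.all_gather(output, tensor)
        # Non-tensor values needed by backward() are stored as plain ctx attributes.
        ctx.rank = args.rank
        ctx.batch_size = tensor.shape[0]
        return torch.cat(output, dim=0)

    @staticmethod
    def backward(ctx, grad_output):
        # Keep only the rows this rank contributed to the concatenated output.
        start = ctx.batch_size * ctx.rank
        # forward() had two inputs (tensor, args); args is not a tensor, so its gradient is None.
        return grad_output[start:start + ctx.batch_size], None
```

In training, this would be called as AllGatherSketch.apply(features, args) after the default process group has been initialized on every rank. This version gives each rank only the rows of grad_output that match its own input. It does not combine gradient contributions computed on other ranks. Some implementations add an extra collective in backward() for that purpose.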
Methods
apply
backward(ctx, grad_output): Defines a formula for differentiating the operation with backward mode automatic differentiation (alias: vjp).
forward(ctx, tensor, args): Performs the operation.
jvp(ctx, *grad_inputs): Defines a formula for differentiating the operation with forward mode automatic differentiation.
mark_dirty(*args): Marks given tensors as modified in an in-place operation.
mark_non_differentiable(*args): Marks outputs as non-differentiable.
mark_shared_storage
name
register_hook
save_for_backward(*tensors): Saves given tensors for a future call to backward().
save_for_forward(*tensors): Saves given tensors for a future call to jvp().
set_materialize_grads(value): Sets whether to materialize output grad tensors.
vjp(ctx, *grad_outputs): Defines a formula for differentiating the operation with backward mode automatic differentiation (alias of backward).
Attributes
dirty_tensors
is_traceable
materialize_grads
metadata
needs_input_grad
next_functions
non_differentiable
requires_grad
saved_for_forward
saved_tensors
saved_variables
to_save
- __call__(*args, **kwargs)
Call self as a function.
- __init__(*args, **kwargs)
- static backward(ctx, grad_output)
Defines a formula for differentiating the operation with backward mode automatic differentiation (alias to the vjp function).
This function is to be overridden by all subclasses.
It must accept a context ctx as the first argument, followed by as many outputs as forward() returned (None will be passed in for non-tensor outputs of the forward function). It should return as many tensors as there were inputs to forward(). Each argument is the gradient w.r.t. the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor, or is a Tensor not requiring grads, you can return None as the gradient for that input.
The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad, a tuple of booleans indicating whether each input needs a gradient. For example, backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs a gradient computed w.r.t. the output.
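The return-value contract above can be shown with a small function whose forward() takes one tensor and one Python float. This is loosely parallel to how AllGather.forward(ctx, tensor, args) pairs a tensor with args, which is assumed here to be a non-tensor. The class name Scale is hypothetical and used only for illustration.

```python
import torch
from torch.autograd import Function


class Scale(Function):
    """Hypothetical example: multiplies a tensor by a Python float."""

    @staticmethod
    def forward(ctx, x, factor):
        ctx.factor = factor  # a plain Python float, so storing it on ctx is fine
        return x * factor

    @staticmethod
    def backward(ctx, grad_output):
        # Return one value per forward() input, in order: (x, factor).
        grad_x = None
        if ctx.needs_input_grad[0]:  # skip the work when x does not need a gradient
            grad_x = grad_output * ctx.factor
        return grad_x, None  # factor is not a tensor, so its gradient is None


x = torch.tensor([1.0, 2.0], requires_grad=True)
Scale.apply(x, 3.0).sum().backward()
print(x.grad)  # tensor([3., 3.])
```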
- static forward(ctx, tensor, args)
Performs the operation.
This function is to be overridden by all subclasses.
It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types).
The context can be used to store arbitrary data that can then be retrieved during the backward pass. Tensors should not be stored directly on ctx (though this is not currently enforced for backward compatibility). Instead, save tensors with ctx.save_for_backward() if they are intended to be used in backward (equivalently, vjp), or with ctx.save_for_forward() if they are intended to be used in jvp.
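The following sketch shows the storage rule in practice. The forward result is a tensor that backward() needs, so it goes through save_for_backward() and is not attached to ctx directly. The Exp function is the usual textbook illustration, not part of AllGather. torch.autograd.gradcheck then compares the hand-written gradient against finite differences. It needs double-precision inputs.

```python
import torch
from torch.autograd import Function


class Exp(Function):
    """Elementwise exp whose backward() reuses the forward result."""

    @staticmethod
    def forward(ctx, x):
        result = x.exp()
        # Tensors needed later go through save_for_backward, not ctx attributes.
        ctx.save_for_backward(result)
        return result

    @staticmethod
    def backward(ctx, grad_output):
        (result,) = ctx.saved_tensors
        return grad_output * result  # d/dx exp(x) = exp(x)


x = torch.tensor([0.0, 0.5, -1.0], dtype=torch.double, requires_grad=True)
ok = torch.autograd.gradcheck(Exp.apply, (x,))
print(ok)  # True
```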
- static jvp(ctx: Any, *grad_inputs: Any) -> Any
Defines a formula for differentiating the operation with forward mode automatic differentiation. This function is to be overridden by all subclasses. It must accept a context ctx as the first argument, followed by as many inputs as forward() received (None will be passed in for non-tensor inputs of the forward function). It should return as many tensors as there were outputs of forward(). Each argument is the gradient w.r.t. the given input, and each returned value should be the gradient w.r.t. the corresponding output. If an output is not a Tensor, or the function is not differentiable with respect to that output, you can return None as the gradient for that output.
You can use the ctx object to pass any value from the forward to this function.
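Here is a minimal forward-mode sketch. It uses a hypothetical Square function that defines only jvp(). The input is saved with save_for_forward() so that jvp() can read it back through ctx.saved_tensors. The tangent is then recovered with torch.autograd.forward_ad.unpack_dual. No input requires grad, so backward() is never needed here.

```python
import torch
import torch.autograd.forward_ad as fwAD
from torch.autograd import Function


class Square(Function):
    """Hypothetical elementwise x**2 with a forward-mode (jvp) rule only."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_forward(x)  # make x available inside jvp()
        return x * x

    @staticmethod
    def jvp(ctx, x_t):
        # One tangent per forward() input in, one tangent per output out.
        (x,) = ctx.saved_tensors
        return 2 * x * x_t


x = torch.tensor([1.0, 2.0, 3.0])
with fwAD.dual_level():
    out = Square.apply(fwAD.make_dual(x, torch.ones(3)))
    _, tangent = fwAD.unpack_dual(out)
    tangent_values = tangent.tolist()
print(tangent_values)  # [2.0, 4.0, 6.0]
```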
- mark_dirty(*args: Tensor)
Marks given tensors as modified in an in-place operation.
This should be called at most once, only from inside the forward() method, and all arguments should be inputs.
Every tensor that’s been modified in-place in a call to forward() should be given to this function, to ensure correctness of our checks. It doesn’t matter whether the function is called before or after modification.
- Examples::
>>> import torch
>>> from torch.autograd import Function
>>> from torch.autograd.function import once_differentiable
>>>
>>> class Inplace(Function):
>>>     @staticmethod
>>>     def forward(ctx, x):
>>>         x_npy = x.numpy()  # x_npy shares storage with x
>>>         x_npy += 1
>>>         ctx.mark_dirty(x)
>>>         return x
>>>
>>>     @staticmethod
>>>     @once_differentiable
>>>     def backward(ctx, grad_output):
>>>         return grad_output
>>>
>>> a = torch.tensor(1., requires_grad=True, dtype=torch.double).clone()
>>> b = a * a
>>> Inplace.apply(a)  # This would lead to wrong gradients!
>>>                   # but the engine would not know unless we mark_dirty
>>> b.backward()  # RuntimeError: one of the variables needed for gradient
>>>               # computation has been modified by an inplace operation
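The example above shows the failure case. The following sketch shows the intended use. A hypothetical AddOneInplace function modifies its input in place, declares it with mark_dirty(), and returns it, so autograd can rebase the history correctly. The function works on a clone because the engine refuses in-place changes to a leaf that requires grad.

```python
import torch
from torch.autograd import Function


class AddOneInplace(Function):
    """Hypothetical example: adds 1 to its input in place and reports it via mark_dirty()."""

    @staticmethod
    def forward(ctx, x):
        x.add_(1)          # in-place modification of an input...
        ctx.mark_dirty(x)  # ...so it must be declared dirty
        return x           # a dirty input must also be returned as an output

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # d(x + 1)/dx = 1


a = torch.tensor([1.0, 2.0], requires_grad=True)
b = a.clone()  # work on a non-leaf copy instead of the leaf a
c = AddOneInplace.apply(b)
c.sum().backward()
print(c.detach().tolist(), a.grad.tolist())  # [2.0, 3.0] [1.0, 1.0]
```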
- mark_non_differentiable(*args: Tensor)
Marks outputs as non-differentiable.
This should be called at most once, only from inside the forward() method, and all arguments should be tensor outputs.
This will mark outputs as not requiring gradients, increasing the efficiency of backward computation. You still need to accept a gradient for each output in backward(), but it’s always going to be a zero tensor with the same shape as the corresponding output.
- This is used e.g. for indices returned from a sort. See example::
>>> import torch
>>> from torch.autograd import Function
>>> from torch.autograd.function import once_differentiable
>>>
>>> class Func(Function):
>>>     @staticmethod
>>>     def forward(ctx, x):
>>>         sorted_x, idx = x.sort()
>>>         ctx.mark_non_differentiable(idx)
>>>         ctx.save_for_backward(x, idx)
>>>         return sorted_x, idx
>>>
>>>     @staticmethod
>>>     @once_differentiable
>>>     def backward(ctx, g1, g2):  # still need to accept g2
>>>         x, idx = ctx.saved_tensors
>>>         grad_input = torch.zeros_like(x)
>>>         grad_input.index_add_(0, idx, g1)
>>>         return grad_input
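Integer outputs such as sort indices can never require grad anyway. Marking matters most for a floating-point output that would otherwise be tracked. The following sketch uses a hypothetical ReluWithMask that also returns its 0/1 mask as a float tensor. After marking, the mask does not require grad, but backward() must still accept a gradient slot for it.

```python
import torch
from torch.autograd import Function


class ReluWithMask(Function):
    """Hypothetical ReLU that also returns its float 0/1 mask as a second output."""

    @staticmethod
    def forward(ctx, x):
        mask = (x > 0).to(x.dtype)  # floating point, so it could otherwise require grad
        ctx.mark_non_differentiable(mask)
        ctx.save_for_backward(x)
        return x * mask, mask

    @staticmethod
    def backward(ctx, grad_out, grad_mask):
        # grad_mask must still be accepted, but it carries no information and is ignored.
        (x,) = ctx.saved_tensors
        return grad_out * (x > 0).to(grad_out.dtype)


x = torch.tensor([-1.0, 2.0], requires_grad=True)
y, mask = ReluWithMask.apply(x)
y.sum().backward()
print(y.requires_grad, mask.requires_grad, x.grad.tolist())  # True False [0.0, 1.0]
```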
- save_for_backward(*tensors: Tensor)
Saves given tensors for a future call to backward().
save_for_backward should be called at most once, only from inside the forward() method, and only with tensors.
All tensors intended to be used in the backward pass should be saved with save_for_backward (as opposed to directly on ctx). This prevents incorrect gradients and memory leaks, and enables the application of saved tensor hooks. See torch.autograd.graph.saved_tensors_hooks.
In backward(), saved tensors can be accessed through the saved_tensors attribute. Before returning them to the user, a check is made to ensure they weren’t used in any in-place operation that modified their content.
Arguments can also be None. This is a no-op.
See Extending torch.autograd for more details on how to use this method.
- Example::
>>> import torch
>>> from torch.autograd import Function
>>>
>>> class Func(Function):
>>>     @staticmethod
>>>     def forward(ctx, x: torch.Tensor, y: torch.Tensor, z: int):
>>>         w = x * y * z
>>>         out = x * y + y * z + w
>>>         ctx.save_for_backward(x, y, w, out)
>>>         ctx.z = z  # z is not a tensor
>>>         return out
>>>
>>>     @staticmethod
>>>     def backward(ctx, grad_out):
>>>         x, y, w, out = ctx.saved_tensors
>>>         z = ctx.z
>>>         gx = grad_out * (y + y * z)
>>>         gy = grad_out * (x + z + x * z)
>>>         gz = None
>>>         return gx, gy, gz
>>>
>>> a = torch.tensor(1., requires_grad=True, dtype=torch.double)
>>> b = torch.tensor(2., requires_grad=True, dtype=torch.double)
>>> c = 4
>>> d = Func.apply(a, b, c)
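Because saving goes through save_for_backward, saved tensor hooks apply to it. The following sketch registers a pair of pass-through hooks with torch.autograd.graph.saved_tensors_hooks and records which tensors a hypothetical Mul function saves. In real use, pack_hook might offload or compress the tensor and unpack_hook would restore it. The identity hooks here only observe.

```python
import torch
from torch.autograd import Function


class Mul(Function):
    """Hypothetical elementwise product that saves both inputs for backward()."""

    @staticmethod
    def forward(ctx, x, y):
        ctx.save_for_backward(x, y)  # each saved tensor passes through pack_hook
        return x * y

    @staticmethod
    def backward(ctx, grad_out):
        x, y = ctx.saved_tensors  # each saved tensor passes through unpack_hook
        return grad_out * y, grad_out * x


packed_shapes = []


def pack_hook(t):
    packed_shapes.append(tuple(t.shape))  # record every tensor that gets saved
    return t


def unpack_hook(t):
    return t


a = torch.tensor([1.0, 2.0], requires_grad=True)
b = torch.tensor([3.0, 4.0], requires_grad=True)
with torch.autograd.graph.saved_tensors_hooks(pack_hook, unpack_hook):
    out = Mul.apply(a, b)
out.sum().backward()
print(packed_shapes, a.grad.tolist(), b.grad.tolist())  # [(2,), (2,)] [3.0, 4.0] [1.0, 2.0]
```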
- save_for_forward(*tensors: Tensor)
Saves given tensors for a future call to jvp().
save_for_forward should be called at most once, only from inside the forward() method, and only with tensors.
In jvp(), saved objects can be accessed through the saved_tensors attribute.
Arguments can also be None. This is a no-op.
See Extending torch.autograd for more details on how to use this method.
- Example::
>>> import torch
>>> import torch.autograd.forward_ad as fwAD
>>>
>>> class Func(torch.autograd.Function):
>>>     @staticmethod
>>>     def forward(ctx, x: torch.Tensor, y: torch.Tensor, z: int):
>>>         ctx.save_for_backward(x, y)
>>>         ctx.save_for_forward(x, y)
>>>         ctx.z = z
>>>         return x * y * z
>>>
>>>     @staticmethod
>>>     def jvp(ctx, x_t, y_t, _):
>>>         x, y = ctx.saved_tensors
>>>         z = ctx.z
>>>         return z * (y * x_t + x * y_t)
>>>
>>>     @staticmethod
>>>     def vjp(ctx, grad_out):
>>>         x, y = ctx.saved_tensors
>>>         z = ctx.z
>>>         return z * grad_out * y, z * grad_out * x, None
>>>
>>> a = torch.tensor(1., requires_grad=True, dtype=torch.double)
>>> t = torch.tensor(1., dtype=torch.double)
>>> b = torch.tensor(2., requires_grad=True, dtype=torch.double)
>>> c = 4
>>>
>>> with fwAD.dual_level():
>>>     a_dual = fwAD.make_dual(a, t)
>>>     d = Func.apply(a_dual, b, c)
- set_materialize_grads(value: bool)
Sets whether to materialize output grad tensors. Default is True.
This should be called only from inside the forward() method.
If True, undefined output grad tensors will be expanded to tensors full of zeros prior to calling the backward() method. If False, they are passed to backward() as None.
- Example::
>>> import torch
>>> from torch.autograd import Function
>>> from torch.autograd.function import once_differentiable
>>>
>>> class SimpleFunc(Function):
>>>     @staticmethod
>>>     def forward(ctx, x):
>>>         return x.clone(), x.clone()
>>>
>>>     @staticmethod
>>>     @once_differentiable
>>>     def backward(ctx, g1, g2):
>>>         return g1 + g2  # No check for None necessary
>>>
>>> # We modify SimpleFunc to handle non-materialized grad outputs
>>> class Func(Function):
>>>     @staticmethod
>>>     def forward(ctx, x):
>>>         ctx.set_materialize_grads(False)
>>>         ctx.save_for_backward(x)
>>>         return x.clone(), x.clone()
>>>
>>>     @staticmethod
>>>     @once_differentiable
>>>     def backward(ctx, g1, g2):
>>>         x, = ctx.saved_tensors
>>>         grad_input = torch.zeros_like(x)
>>>         if g1 is not None:  # We must check for None now
>>>             grad_input += g1
>>>         if g2 is not None:
>>>             grad_input += g2
>>>         return grad_input
>>>
>>> a = torch.tensor(1., requires_grad=True)
>>> b, _ = Func.apply(a)  # induces g2 to be undefined
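To observe the difference directly, the following sketch uses a hypothetical TwoCopies function that takes the setting as a forward() argument. Its backward() records whether each incoming gradient is None. Only the first output feeds the loss, so the second gradient is undefined. It arrives as zeros when materialization is on and as None when it is off.

```python
import torch
from torch.autograd import Function


class TwoCopies(Function):
    """Hypothetical example: returns two copies of x and records which grads arrive as None."""

    calls = []  # one (g1 is None, g2 is None) pair per backward() call

    @staticmethod
    def forward(ctx, x, materialize):
        ctx.set_materialize_grads(materialize)
        return x.clone(), x.clone()

    @staticmethod
    def backward(ctx, g1, g2):
        TwoCopies.calls.append((g1 is None, g2 is None))
        grad = None
        for g in (g1, g2):  # sum only the gradients that are actually defined
            if g is not None:
                grad = g if grad is None else grad + g
        return grad, None  # materialize is a bool, so its gradient is None


for materialize in (True, False):
    a = torch.tensor([1.0, 2.0], requires_grad=True)
    first, _ = TwoCopies.apply(a, materialize)
    first.sum().backward()  # the second output is unused, so its grad is undefined

print(TwoCopies.calls)  # [(False, False), (False, True)]
```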
- static vjp(ctx: Any, *grad_outputs: Any) -> Any
Defines a formula for differentiating the operation with backward mode automatic differentiation (vjp is an alias of backward).
This function is to be overridden by all subclasses.
It must accept a context ctx as the first argument, followed by as many outputs as forward() returned (None will be passed in for non-tensor outputs of the forward function). It should return as many tensors as there were inputs to forward(). Each argument is the gradient w.r.t. the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor, or is a Tensor not requiring grads, you can return None as the gradient for that input.
The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad, a tuple of booleans indicating whether each input needs a gradient. For example, backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs a gradient computed w.r.t. the output.
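Because vjp is an alias of backward, a subclass can implement its reverse-mode rule under either name, but overriding both is not allowed. The following sketch defines only vjp on a hypothetical Cube function and still runs through the ordinary .backward() call.

```python
import torch
from torch.autograd import Function


class Cube(Function):
    """Hypothetical elementwise x**3 whose reverse-mode rule is written as vjp()."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 3

    @staticmethod
    def vjp(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return 3 * x ** 2 * grad_output  # d/dx x**3 = 3 x**2


x = torch.tensor([1.0, 2.0], requires_grad=True)
Cube.apply(x).sum().backward()
print(x.grad.tolist())  # [3.0, 12.0]
```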