functorch.grad

functorch.grad(func, argnums=0, has_aux=False)

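The grad operator computes the gradient of func with respect to the input(s) specified by argnums. The operator can be nested to compute higher-order gradients.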

Parameters

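func (Callable) – A Python function that takes one or more arguments. It must return a single-element Tensor. If has_aux is True, the function may instead return a tuple of a single-element Tensor and other auxiliary objects: (output, aux).
argnums (int or Tuple[int]) – Specifies the arguments to compute gradients with respect to. argnums can be a single integer or a tuple of integers. Default: 0.
has_aux (bool) – Flag indicating that func returns a tensor and other auxiliary objects: (output, aux). Default: False.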

Returns

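A function that computes gradients with respect to its inputs. By default, the output of the function is the gradient tensor with respect to the first argument. If has_aux is True, a tuple of the gradients and the auxiliary output objects is returned. If argnums is a tuple of integers, a tuple of output gradients with respect to each argnums value is returned.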
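
To make the effect of argnums on the return value concrete, here is a minimal sketch. The function f and the tensors x and y are illustrative names, not part of the API, and the import from torch.func assumes PyTorch 2.0 or later.

```python
import torch
from torch.func import grad

def f(x, y):
    # Scalar-valued (0-dim) function of two tensor arguments
    return (x * y).sum()

x = torch.tensor([1.0, 2.0])
y = torch.tensor([3.0, 4.0])

# Default argnums=0: a single tensor, the gradient with respect to x
gx = grad(f)(x, y)

# argnums=1: a single tensor, the gradient with respect to y
gy = grad(f, argnums=1)(x, y)

# argnums=(0, 1): a tuple with one gradient per listed argument
gx_t, gy_t = grad(f, argnums=(0, 1))(x, y)
```

For this particular f, the gradient with respect to x equals y and the gradient with respect to y equals x, which makes the three call forms easy to compare.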

Example of using grad:

>>> # xdoctest: +SKIP
>>> import torch
>>> from torch.func import grad
>>> x = torch.randn([])
>>> cos_x = grad(lambda x: torch.sin(x))(x)
>>> assert torch.allclose(cos_x, x.cos())
>>>
>>> # Second-order gradients
>>> neg_sin_x = grad(grad(lambda x: torch.sin(x)))(x)
>>> assert torch.allclose(neg_sin_x, -x.sin())

When composed with vmap, grad can be used to compute per-sample gradients:

>>> # xdoctest: +SKIP
>>> import torch
>>> from torch.func import grad, vmap
>>> batch_size, feature_size = 3, 5
>>>
>>> def model(weights, feature_vec):
>>>     # Very simple linear model with activation
>>>     assert feature_vec.dim() == 1
>>>     return feature_vec.dot(weights).relu()
>>>
>>> def compute_loss(weights, example, target):
>>>     y = model(weights, example)
>>>     return ((y - target) ** 2).mean()  # MSELoss
>>>
>>> weights = torch.randn(feature_size, requires_grad=True)
>>> examples = torch.randn(batch_size, feature_size)
>>> targets = torch.randn(batch_size)
>>> inputs = (weights, examples, targets)
>>> grad_weight_per_example = vmap(grad(compute_loss), in_dims=(None, 0, 0))(*inputs)
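
As a sanity check on the per-sample pattern above, the following sketch compares the vmap result with a plain Python loop that calls grad once per example. The model is inlined into compute_loss, and the fixed seed and the names per_example and looped are assumptions made for illustration.

```python
import torch
from torch.func import grad, vmap

def compute_loss(weights, example, target):
    # Same linear model with ReLU as above, inlined for brevity
    y = example.dot(weights).relu()
    return ((y - target) ** 2).mean()

torch.manual_seed(0)
weights = torch.randn(5)
examples = torch.randn(3, 5)
targets = torch.randn(3)

# Vectorized: weights are shared (None), examples and targets are batched on dim 0
per_example = vmap(grad(compute_loss), in_dims=(None, 0, 0))(weights, examples, targets)

# Reference: one grad call per example, stacked into a (batch, feature) tensor
looped = torch.stack([
    grad(compute_loss)(weights, examples[i], targets[i])
    for i in range(examples.shape[0])
])
```

Both results have shape (batch_size, feature_size), with row i holding the gradient of example i's loss with respect to weights.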

Example of using grad with has_aux and argnums:

>>> # xdoctest: +SKIP
>>> import torch
>>> from torch.func import grad
>>> def my_loss_func(y, y_pred):
>>>     loss_per_sample = (0.5 * y_pred - y) ** 2
>>>     loss = loss_per_sample.mean()
>>>     return loss, (y_pred, loss_per_sample)
>>>
>>> fn = grad(my_loss_func, argnums=(0, 1), has_aux=True)
>>> y_true = torch.rand(4)
>>> y_preds = torch.rand(4, requires_grad=True)
>>> out = fn(y_true, y_preds)
>>> # > output is ((grads w.r.t y_true, grads w.r.t y_preds), (y_pred, loss_per_sample))
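
The nested return value can be unpacked directly. The sketch below repeats the loss function and compares the gradients against hand-derived expressions; the names residual, expected_grad_y and expected_grad_y_pred are introduced here for illustration only.

```python
import torch
from torch.func import grad

def my_loss_func(y, y_pred):
    loss_per_sample = (0.5 * y_pred - y) ** 2
    loss = loss_per_sample.mean()
    return loss, (y_pred, loss_per_sample)

y_true = torch.rand(4)
y_preds = torch.rand(4)

# Outer tuple: (gradients, aux); inner tuples follow argnums and the aux structure
(grad_y, grad_y_pred), (aux_pred, aux_loss) = grad(
    my_loss_func, argnums=(0, 1), has_aux=True
)(y_true, y_preds)

# Analytic gradients of mean((0.5 * y_pred - y) ** 2), for comparison
residual = 0.5 * y_preds - y_true
n = residual.numel()
expected_grad_y = -2.0 * residual / n
expected_grad_y_pred = residual / n
```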

Note

Using PyTorch torch.no_grad together with grad.

Case 1: Using torch.no_grad inside a function:

>>> # xdoctest: +SKIP
>>> def f(x):
>>>     with torch.no_grad():
>>>         c = x ** 2
>>>     return x - c

In this case, grad(f)(x) will respect the inner torch.no_grad.
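
A small sketch of what respecting the inner torch.no_grad means in practice: the value computed inside the block is treated as a constant during differentiation. The comparison function g and the input value are assumptions added for contrast.

```python
import torch
from torch.func import grad

def f(x):
    with torch.no_grad():
        c = x ** 2  # computed without gradient tracking
    return x - c

def g(x):
    return x - x ** 2  # same formula, fully tracked

x = torch.tensor(3.0)
df = grad(f)(x)  # only the bare x term contributes
dg = grad(g)(x)  # derivative of x - x**2 is 1 - 2 * x
```

Here df comes out as 1, because c does not contribute to the gradient, while dg follows the full derivative 1 - 2 * x.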

Case 2: Using grad inside a torch.no_grad context manager:

>>> # xdoctest: +SKIP
>>> with torch.no_grad():
>>>     grad(f)(x)

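In this case, grad will respect the inner torch.no_grad, but not the outer one. This is because grad is a "function transform": its result should not depend on the result of a context manager outside of f.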
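
As a minimal illustration of the outer case, the sketch below calls grad inside a torch.no_grad block and still obtains the usual derivative. Using torch.sin as the function, and the input value, are assumptions chosen for simplicity.

```python
import torch
from torch.func import grad

x = torch.tensor(0.5)

with torch.no_grad():
    # The surrounding no_grad does not switch off the transform
    d_sin = grad(torch.sin)(x)
```

The result matches x.cos(), the same value grad would return outside the context manager.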

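Warning

We've integrated functorch into PyTorch. As the final step of the integration, functorch.grad is deprecated as of PyTorch 2.0 and will be deleted in a future version of PyTorch >= 2.3. Please use torch.func.grad instead; see the PyTorch 2.0 release notes and/or the torch.func migration guide for more details: https://pytorch.org/docs/master/func.migrating.html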
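
For code that still imports from functorch, the change is typically limited to the import line. A minimal sketch, assuming PyTorch 2.0 or later and using an illustrative cubic function:

```python
import torch
from torch.func import grad  # previously: from functorch import grad

x = torch.tensor(2.0)
dx = grad(lambda t: t ** 3)(x)  # derivative of t**3 is 3 * t**2
```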