Differentiate the model input using torch.autograd.grad.
使用torch可以自动实现对网络参数的梯度计算,然而有时需要用到网络对输入变量的求导。该过程可通过torch.autograd.grad()
方法实现。
import torch.autograd as autograd
autograd.grad(outputs,
inputs,
grad_outputs=None,
retain_graph=None,
create_graph=False,
only_inputs=True,
allow_unused=False)
outputs
:求导的因变量(需要求导的函数)。inputs
:求导的自变量。grad_outputs
:形状与outputs
一致;若outputs
是标量,则为None
;若outputs
是向量,则必须给定。
不妨假设自变量\(\textbf{x} = (x_1,x_2,\cdots,x_n) \in \Bbb{R}^{n}\),因变量\(\textbf{y} = (y_1,y_2,\cdots,y_m) \in \Bbb{R}^{m}\),则求导后的结果为Jacobian矩阵:
\[J = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & \vdots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n} \end{pmatrix}\]grad_outputs
为形状与outputs
一致的向量:\(\text{grad_outputs} = (a_1,a_2,\cdots,a_m)^T \in \Bbb{R}^{m}\)。给定grad_outputs
后,真正返回的梯度为:
在实践中,网络的输入和输出均为一批数据,即多维张量。此时需要显式地指定grad_outputs
,若对输入求导,则可构造全1向量:
grad_outputs = torch.ones_like(outputs, requires_grad=False)
retain_graph
:是否保留计算图create_graph
:若要计算高阶导数,需要指定True
allow_unused
:允许输入变量不进入计算
⚪ 例1:标量函数对输入的导数
import torch
import torch.autograd as autograd
x = torch.rand(3, 4).requires_grad_(True)
print(x)
"""
tensor([[0.5875, 0.6347, 0.8646, 0.0988],
[0.9347, 0.1997, 0.6708, 0.3222],
[0.2878, 0.4751, 0.3830, 0.1323]], requires_grad=True)
"""
y = torch.sum(x)
grads = autograd.grad(outputs=y, inputs=x)[0]
print(grads)
"""
tensor([[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]]),
"""
⚪ 例2:向量函数对输入的导数
x = torch.rand(3, 4).requires_grad_(True)
print(x)
"""
tensor([[0.8124, 0.4225, 0.1788, 0.3574],
[0.8980, 0.8503, 0.9356, 0.9869],
[0.2243, 0.5823, 0.2081, 0.9394]], requires_grad=True)
"""
y = x[:,0] + x[:, 1]
grads = autograd.grad(outputs=y, inputs=x,
grad_outputs=torch.ones_like(y))[0]
print(grads)
"""
tensor([[1., 1., 0., 0.],
[1., 1., 0., 0.],
[1., 1., 0., 0.]])
"""
⚪ 例3:计算输入的二阶导数
x = torch.rand(3, 4).requires_grad_(True)
print(x)
"""
tensor([[0.7372, 0.3823, 0.2201, 0.6887],
[0.9190, 0.9403, 0.8159, 0.2409],
[0.4549, 0.5058, 0.6941, 0.9284]], requires_grad=True)
"""
y = x**2
grads = autograd.grad(outputs=y, inputs=x,
grad_outputs=torch.ones_like(y),
create_graph=True)[0]
grads2 = autograd.grad(outputs=grads, inputs=x,
grad_outputs=torch.ones_like(grads))[0]
print(grads2)
"""
tensor([[2., 2., 2., 2.],
[2., 2., 2., 2.],
[2., 2., 2., 2.]])
"""