Differentiating a model's output with respect to its input using torch.autograd.grad.

PyTorch automatically computes gradients with respect to the network parameters, but sometimes we also need the derivative of the network's output with respect to its input. This can be done with torch.autograd.grad():

import torch.autograd as autograd

autograd.grad(outputs,
              inputs,
              grad_outputs=None,
              retain_graph=None,
              create_graph=False,
              only_inputs=True,
              allow_unused=False)

Suppose the independent variable is \(\textbf{x} = (x_1,x_2,\cdots,x_n) \in \Bbb{R}^{n}\) and the dependent variable is \(\textbf{y} = (y_1,y_2,\cdots,y_m) \in \Bbb{R}^{m}\). The derivative is then the Jacobian matrix:

\[J = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n} \end{pmatrix}\]

grad_outputs is a vector with the same shape as outputs: \(\text{grad_outputs} = (a_1,a_2,\cdots,a_m)^T \in \Bbb{R}^{m}\). Given grad_outputs, what is actually returned is the vector-Jacobian product:

\[J^T \cdot \text{grad_outputs} = \begin{pmatrix} a_1\frac{\partial y_1}{\partial x_1} +a_2\frac{\partial y_2}{\partial x_1} + \cdots + a_m \frac{\partial y_m}{\partial x_1} \\ a_1\frac{\partial y_1}{\partial x_2} +a_2\frac{\partial y_2}{\partial x_2} + \cdots + a_m \frac{\partial y_m}{\partial x_2} \\ \vdots \\ a_1\frac{\partial y_1}{\partial x_n} +a_2\frac{\partial y_2}{\partial x_n} + \cdots + a_m \frac{\partial y_m}{\partial x_n} \end{pmatrix} \in \Bbb{R}^{n}\]
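As a quick check of this vector-Jacobian product, here is a minimal sketch (the function f and the weighting vector a are chosen arbitrarily for illustration) that compares autograd.grad against the full Jacobian computed by torch.autograd.functional.jacobian:

import torch
import torch.autograd as autograd
from torch.autograd.functional import jacobian

def f(x):
    # arbitrary R^5 -> R^3 function, used only for this check
    return torch.stack([x.sum(), (x**2).sum(), x.prod()])

x = torch.rand(5).requires_grad_(True)
a = torch.tensor([1., 2., 3.])                       # grad_outputs

vjp = autograd.grad(outputs=f(x), inputs=x, grad_outputs=a)[0]
J = jacobian(f, x)                                   # shape [3, 5]
print(torch.allclose(vjp, J.t() @ a))                # True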

In practice a network's inputs and outputs are batches of data, i.e. multi-dimensional tensors, so grad_outputs must be specified explicitly (it may only be omitted when the output is a scalar, in which case it defaults to 1). To differentiate with respect to the input, an all-ones tensor is typically used:

grad_outputs = torch.ones_like(outputs, requires_grad=False)

⚪ Example 1: derivative of a scalar function with respect to the input

import torch
import torch.autograd as autograd

x = torch.rand(3, 4).requires_grad_(True)
print(x)
"""
tensor([[0.5875, 0.6347, 0.8646, 0.0988],
        [0.9347, 0.1997, 0.6708, 0.3222],
        [0.2878, 0.4751, 0.3830, 0.1323]], requires_grad=True)
"""

y = torch.sum(x)
grads = autograd.grad(outputs=y, inputs=x)[0]
print(grads)
"""
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
"""

⚪ Example 2: derivative of a vector-valued function with respect to the input

x = torch.rand(3, 4).requires_grad_(True)
print(x)
"""
tensor([[0.8124, 0.4225, 0.1788, 0.3574],
        [0.8980, 0.8503, 0.9356, 0.9869],
        [0.2243, 0.5823, 0.2081, 0.9394]], requires_grad=True)
"""

y = x[:, 0] + x[:, 1]
grads = autograd.grad(outputs=y, inputs=x,
                      grad_outputs=torch.ones_like(y))[0]
print(grads)
"""
tensor([[1., 1., 0., 0.],
        [1., 1., 0., 0.],
        [1., 1., 0., 0.]])
"""

⚪ Example 3: second derivative with respect to the input

x = torch.rand(3, 4).requires_grad_(True)
print(x)
"""
tensor([[0.7372, 0.3823, 0.2201, 0.6887],
        [0.9190, 0.9403, 0.8159, 0.2409],
        [0.4549, 0.5058, 0.6941, 0.9284]], requires_grad=True)
"""

y = x**2
grads = autograd.grad(outputs=y, inputs=x,
                      grad_outputs=torch.ones_like(y),
                      create_graph=True)[0]
grads2 = autograd.grad(outputs=grads, inputs=x,
                       grad_outputs=torch.ones_like(grads))[0]
print(grads2)
"""
tensor([[2., 2., 2., 2.],
        [2., 2., 2., 2.],
        [2., 2., 2., 2.]])
"""