Notes about PyTorch.

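⚪ PyTorch Syntax Notes

Comparing model.eval() and with torch.no_grad()
Modifying the network structure of a pretrained model
Reading a test dataset with ImageFolder
register_buffer and register_parameter
Bundling the parameters of several models in one optimizer with itertools.chain
Building one-hot vectors with tensor.scatter
Copying tensors with tensor.contiguous()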

1. Comparing model.eval() and with torch.no_grad()

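model.eval() puts the model into evaluation mode (correspondingly, model.train() puts it into training mode).

Some deep learning layers (such as batch norm and dropout) behave differently in training and evaluation modes:

For batch norm, the statistics (mean and variance) used during training are computed from each batch; during evaluation, the layer uses running averages of those statistics accumulated during training.
For dropout, each neuron is dropped with some probability during training and the layer output is rescaled to compensate; during evaluation the layer does nothing.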

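torch.no_grad() turns off autograd. Code inside the block uses less memory and runs faster, but cannot be backpropagated through. If autograd is left on, each forward pass still records the computation graph and keeps the intermediate results needed for backward, even if backward() is never called. The two are independent: model.eval() does not stop gradient tracking, and torch.no_grad() does not change how batch norm or dropout behave, so inference code usually uses both.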
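The difference can be checked on a small model. The sketch below is only an illustration: the two-layer model and the input are made up for this example and do not come from any particular project.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy model (an assumption for illustration): a linear layer followed by dropout
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(2, 4)

# eval() only switches layer behavior: dropout becomes the identity,
# but autograd still records the computation graph
model.eval()
out_eval = model(x)
eval_tracks_grad = out_eval.requires_grad

# no_grad() only switches off graph recording; layer behavior is unchanged
with torch.no_grad():
    out_inference = model(x)
no_grad_tracks_grad = out_inference.requires_grad

# In eval mode dropout is deterministic, so both forward passes agree
same_output = torch.allclose(out_eval, out_inference)
```

Here eval_tracks_grad is True and no_grad_tracks_grad is False, which is why evaluation loops commonly call model.eval() and also wrap the forward pass in torch.no_grad().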

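2. Modifying the network structure of a pretrained model

Models that inherit from torch.nn.Module have a method called children(), which returns the model's top-level submodules. The structure can be modified by selecting and recombining these submodules.

For example, to remove the final fully connected layer of ResNet50: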

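import torch
from torchvision import models

class Net(torch.nn.Module):
    def __init__(self, model):
        super(Net, self).__init__()
        # Keep every top-level child of the model except the last one (the fc layer)
        self.resnet_layer = torch.nn.Sequential(*list(model.children())[:-1])

    def forward(self, x):
        # Output shape is (N, 2048, 1, 1): the average pooling layer is kept
        return self.resnet_layer(x)

# Newer torchvision versions replace pretrained=True with the weights argument
resnet = models.resnet50(pretrained=True)
model = Net(resnet)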
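The same slicing idea also works for swapping in a new head instead of only dropping the old one. The sketch below uses a small made-up backbone (an assumption, not ResNet50) so that it runs without downloading weights; the 10-class head is likewise only an example.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained backbone (hypothetical, for illustration only)
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 1000),  # original classification head
)

layers = list(backbone.children())
# Keep everything except the last layer and attach a new 10-class head
new_model = nn.Sequential(*layers[:-1], nn.Linear(8, 10))

out = new_model(torch.randn(2, 3, 16, 16))
out_shape = tuple(out.shape)
```

The kept layers are the same module objects as in the original model, so their weights are reused rather than copied.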

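3. Reading test data with ImageFolder

After from torchvision.datasets import ImageFolder, calling dataset = ImageFolder('data/test') reads the image files under that folder.

ImageFolder lists the files in lexicographic order of their paths, not in numeric order: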

['test/1.jpg',
 'test/10.jpg',
 'test/100.jpg',
 'test/1000.jpg',
 'test/10000.jpg',
 'test/10001.jpg',
 'test/10002.jpg',
 'test/10003.jpg',
 ......

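A mapping between the test files and the predictions therefore has to be built:

dataset.imgs returns, as a list, the file paths and their class indices in the order they were read,

fname = dataset.imgs[i][0] returns the path of the i-th file read,

index = int(fname[fname.rfind('\\')+1:fname.rfind('.')]) extracts the file name of the i-th file, i.e. its sequence number (this form assumes the Windows path separator '\\'),

where the .rfind method returns the position of the last occurrence of a substring in the string.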
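A separator-independent way to do the same extraction is os.path, sketched below. The imgs list and the predictions are placeholders standing in for dataset.imgs and real model outputs, and file_index is a helper name introduced here for illustration.

```python
import os

def file_index(path):
    """Return the integer file name of a path, e.g. 'test/10.jpg' -> 10."""
    stem, _ext = os.path.splitext(os.path.basename(path))
    return int(stem)

# Placeholder for dataset.imgs: (path, class_index) pairs in lexicographic order
imgs = [("test/1.jpg", 0), ("test/10.jpg", 0), ("test/100.jpg", 0), ("test/2.jpg", 0)]
# Placeholder predictions, one per file, in the same order as imgs
predictions = ["a", "b", "c", "d"]

# Map each prediction to its numeric file index, then read them back in numeric order
by_index = {file_index(path): pred for (path, _), pred in zip(imgs, predictions)}
ordered = [by_index[i] for i in sorted(by_index)]
```

os.path.basename splits on the separators of the operating system the script runs on, so it handles the paths that ImageFolder produces on that same system.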

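4. `register_buffer` and `register_parameter`

torch.nn.Module.register_buffer registers a buffer that should not be treated as a model parameter. For example, BatchNorm's running_mean is not a parameter but part of the module's state: the optimizer never updates it, and its value changes only through explicit assignment in code (BatchNorm does this itself in its forward pass during training). When the model is saved, however, buffers are stored in the state_dict alongside the parameters.

Parameters defined with torch.nn.Module.register_parameter, in contrast, take part in gradient updates.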

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # (1) Register a buffer with register_buffer()
        self.register_buffer('param_buf', torch.randn(1, 2))
        # (2) Register a parameter with the similar-looking register_parameter();
        #     the value must be an nn.Parameter, a plain tensor raises TypeError
        self.register_parameter('param_reg', nn.Parameter(torch.randn(1, 2)))

    def forward(self, x):
        return x

net = Model()
net.state_dict()
"""
OrderedDict([('param_reg', tensor([[-0.0617, -0.8984]])),
             ('param_buf', tensor([[-1.0517,  0.7663]]))])
"""
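To see how the two kinds of state are treated differently, the following sketch lists them separately and runs one backward pass. The Scaler module and its attribute names are invented for this example.

```python
import torch
import torch.nn as nn

class Scaler(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        self.register_parameter("scale", nn.Parameter(torch.ones(2)))
        self.register_buffer("offset", torch.zeros(2))

    def forward(self, x):
        return x * self.scale + self.offset

net = Scaler()
param_names = [name for name, _ in net.named_parameters()]
buffer_names = [name for name, _ in net.named_buffers()]
state_keys = list(net.state_dict().keys())

# Only the parameter receives a gradient; the buffer does not track gradients
net(torch.ones(2)).sum().backward()
scale_has_grad = net.scale.grad is not None
offset_requires_grad = net.offset.requires_grad
```

An optimizer built from net.parameters() would therefore only ever see scale, while both scale and offset are written to a checkpoint.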

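5. Bundling the parameters of several models in one optimizer with `itertools.chain`

Sometimes several models contribute to the same loss function and have to be trained together when the parameters are updated. In that case itertools.chain can bundle the parameters of the models: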

import itertools
optimizer_G = torch.optim.Adam(
    itertools.chain(E.parameters(), G.parameters()),
    lr=opt.lr,
    betas=(opt.b1, opt.b2),
)
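A self-contained version of this pattern might look as follows. The two linear layers named E and G, the reconstruction loss, and the learning rate are stand-ins chosen for this sketch, not the models from the snippet above.

```python
import itertools
import torch
import torch.nn as nn

torch.manual_seed(0)
E = nn.Linear(4, 3)  # toy "encoder" (assumption)
G = nn.Linear(3, 4)  # toy "generator" (assumption)

# One optimizer over the parameters of both models
optimizer = torch.optim.Adam(
    itertools.chain(E.parameters(), G.parameters()), lr=1e-2
)

x = torch.randn(8, 4)
e_before = E.weight.detach().clone()
g_before = G.weight.detach().clone()

# A single loss that depends on both models
loss = ((G(E(x)) - x) ** 2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

e_changed = not torch.equal(e_before, E.weight.detach())
g_changed = not torch.equal(g_before, G.weight.detach())
num_tensors = len(optimizer.param_groups[0]["params"])
```

A single optimizer.step() updates both models, since all four tensors (two weights, two biases) sit in the same parameter group.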

6. Building one-hot vectors with `tensor.scatter`

tensor.scatter(dim, index, src) → Tensor

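The tensor.scatter method writes the data in src (Tensor) into a copy of tensor (the in-place variant scatter_ writes into tensor itself); dim (int) is the dimension along which to index; index (LongTensor) gives the positions to write to. For a 3-D tensor: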

output[index[i][j][k]][j][k] = src[i][j][k]  # if dim == 0
output[i][index[i][j][k]][k] = src[i][j][k]  # if dim == 1
output[i][j][index[i][j][k]] = src[i][j][k]  # if dim == 2

The scatter method can be used to convert integer labels into one-hot vectors:

# y_true: LongTensor of class indices with shape (N,) or (N, 1)
labels = torch.zeros(y_true.shape[0], num_classes)
onehot = labels.scatter_(1, y_true.view(-1, 1), 1)
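A runnable version with concrete values is sketched below; the labels and the number of classes are made up for the example. torch.nn.functional.one_hot is included only as a cross-check of the result.

```python
import torch
import torch.nn.functional as F

y_true = torch.tensor([2, 0, 1])  # example integer class labels
num_classes = 3

# scatter_ needs an index with the same number of dimensions as the target
index = y_true.view(-1, 1)  # shape (N, 1), dtype int64
onehot = torch.zeros(y_true.shape[0], num_classes).scatter_(1, index, 1.0)

# Built-in equivalent, used here as a reference
reference = F.one_hot(y_true, num_classes=num_classes).float()
```

Row i of onehot has a single 1 in column y_true[i], which follows from the dim == 1 rule: output[i][index[i][j]] = src[i][j].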

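7. Copying tensors with `tensor.contiguous()`

Some operations in torch do not allocate new memory: they return a view of the original tensor, so modifying the new tensor also changes the contents of the original. Common operations of this kind include:

Tensor.permute(), Tensor.transpose(), Tensor.view(),
Tensor.narrow(), Tensor.expand()

Calling contiguous() on a non-contiguous result (for example after permute() or transpose()) allocates new memory for it, so further modifications do not affect the original tensor. If the tensor is already contiguous, contiguous() returns it without copying; clone() always makes a copy.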
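The memory sharing can be observed directly through data_ptr(). The tensor below is an arbitrary example chosen for this sketch.

```python
import torch

a = torch.arange(6.0).reshape(2, 3)

# A transpose is a view: it points at the same storage as a
t = a.t()
shares_memory = t.data_ptr() == a.data_ptr()

# t is non-contiguous, so contiguous() copies it into new memory
c = t.contiguous()
c[0, 0] = 100.0
a_unchanged = a[0, 0].item() == 0.0

# a is already contiguous, so contiguous() does not copy
no_copy = a.contiguous().data_ptr() == a.data_ptr()

# clone() always allocates new memory
d = a.clone()
d[0, 0] = -1.0
clone_is_separate = d.data_ptr() != a.data_ptr()
```

When an independent copy is needed regardless of the memory layout, clone() is the safer choice.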

⚪ PyTorch Error Notes

RuntimeError: model.pth is a zip archive (did you mean to use torch.jit.load()?)
RuntimeError: Error(s) in loading state_dict for Net
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
BrokenPipeError: [Errno 32] Broken pipe

1. RuntimeError: model.pth is a zip archive (did you mean to use torch.jit.load()?)

Loading a saved pretrained model raises the following error:

RuntimeError: model.pth is a zip archive (did you mean to use torch.jit.load()?)

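torch.save changed in torch 1.6.

From torch 1.6 on, torch.save uses a new zipfile-based file format. torch.load in these versions can still load files in the old format, but versions before 1.6 cannot read the new format, which is a common cause of this error when a checkpoint saved by a newer version is loaded by an older one.

To make torch.save write the old file format, pass the keyword argument _use_new_zipfile_serialization=False: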

torch.save(
    model.state_dict(), model_cp,
    _use_new_zipfile_serialization=False
    )

2. RuntimeError: Error(s) in loading state_dict for Net

Loading a saved pretrained model in PyTorch raises the following error:

RuntimeError: Error(s) in loading state_dict for Net:
	Missing key(s) in state_dict: "feat1.conv.weight". 
	Unexpected key(s) in state_dict: "module.feat1.conv.weight". 

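The error occurs because the model was wrapped in nn.DataParallel for multi-GPU training when the checkpoint was saved, so every key carries a module. prefix. The model therefore has to be wrapped the same way before the parameters are loaded: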

model = nn.DataParallel(model)
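An alternative, when the model should stay unwrapped (for example for single-device inference), is to remove the prefix from the checkpoint keys before loading. This is a sketch of that idea; strip_module_prefix is a helper name introduced here, and the small linear layer stands in for the real network.

```python
import torch
import torch.nn as nn

def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from each key."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }

# Simulate a checkpoint saved from a DataParallel-wrapped model
wrapped = nn.DataParallel(nn.Linear(3, 2))
saved = wrapped.state_dict()
saved_keys = list(saved.keys())

# Load it into a plain, unwrapped model
plain = nn.Linear(3, 2)
result = plain.load_state_dict(strip_module_prefix(saved))
loaded_ok = torch.equal(plain.weight.detach().cpu(), wrapped.module.weight.detach().cpu())
```

Saving model.module.state_dict() instead of model.state_dict() during training avoids the prefix in the first place.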

3. RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

Building a network model raises the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation.

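The cause is an in-place operation used when building the model. An in-place operation modifies the contents of the original tensor directly instead of creating a new tensor. In-place operations save memory, but they can break gradient computation or lead to unpredictable behavior.

Examples of in-place operations in PyTorch:

In-place operations marked by a trailing underscore _, such as the arithmetic operations add_(), sub_(), mul_(), div_(), the shape operations t_(), squeeze_(), unsqueeze_(), and the activation functions relu_(), sigmoid_()
In-place operations enabled by an inplace=True argument: some functions provide both in-place and out-of-place versions, selected by the inplace argument. For example: torch.nn.functional.relu(input, inplace=True)
Other in-place operations, such as the increment a += 1 and the slice assignment x[..., 0] = x[..., 0]*2

PyTorch's automatic differentiation has limited support for in-place operations, so avoiding them is generally recommended. When the error occurs, torch.autograd.set_detect_anomaly can help locate the offending statement or variable: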

import torch
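# Enable anomaly detection globally before the forward pass,
# so that the forward operation behind a failing backward step is recorded
torch.autograd.set_detect_anomaly(True)

# Or enable it only within a block; put the forward pass inside the block as well
with torch.autograd.detect_anomaly():
    loss.backward()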
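A minimal way to reproduce the error and its fix is sketched below. The example relies on exp() keeping its output for the backward pass; the function names are invented for this illustration.

```python
import torch

def backward_with_inplace():
    x = torch.ones(3, requires_grad=True)
    y = x.exp()   # exp keeps its output for the backward pass
    y.add_(1)     # the in-place edit invalidates that saved output
    y.sum().backward()

def backward_out_of_place():
    x = torch.ones(3, requires_grad=True)
    y = x.exp()
    y = y + 1     # creates a new tensor, the saved output stays intact
    y.sum().backward()
    return x.grad

try:
    backward_with_inplace()
    inplace_error = None
except RuntimeError as err:
    inplace_error = str(err)

grad = backward_out_of_place()
```

Replacing y.add_(1) with y = y + 1 is the typical fix: the result is the same, but the tensor that autograd saved is left untouched.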

4. BrokenPipeError: [Errno 32] Broken pipe

When data is loaded with multiple worker processes (num_workers > 0), the program raises:

BrokenPipeError: [Errno 32] Broken pipe

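Solution: put the code that is run under the following guard, so that worker processes which re-import the script (as happens on Windows) do not execute it again: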

if __name__ == '__main__':
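A complete script following this pattern could look like the sketch below. The dataset, batch size, and the count_samples function are assumptions made for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def count_samples(num_workers):
    """Iterate once over a small dataset and return the number of samples seen."""
    dataset = TensorDataset(torch.arange(10.0).unsqueeze(1))
    loader = DataLoader(dataset, batch_size=4, num_workers=num_workers)
    return sum(batch[0].shape[0] for batch in loader)

if __name__ == "__main__":
    # Worker processes are only started from inside the guard
    print(count_samples(num_workers=2))
```

Keeping the data loading and training code inside functions, with only definitions at module level, means a re-import of the script by a worker process has nothing to execute.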