FCN: Fully Convolutional Networks for Semantic Segmentation
FCN (Fully Convolutional Networks) is the seminal work in semantic segmentation. In contrast to earlier convolution-plus-fully-connected architectures such as AlexNet and VGG, FCN replaces the fully connected layers with convolutional layers to handle the segmentation task, achieving a mean IoU of $62.2\%$ on the PASCAL VOC 2012 dataset.
FCN performs feature extraction and downsampling with a fully convolutional network, performs upsampling with bilinear interpolation or learnable transposed convolutions, and builds a directed acyclic graph (DAG) for feature fusion:
- First, five rounds of downsampling produce a feature map whose spatial size is $\frac{1}{32}$ of the input image;
- Upsampling this feature map by 32× yields the first output feature map, FCN-32s;
- Combining the feature maps from the 4th and 5th downsampling stages and upsampling by 16× yields the second output feature map, FCN-16s;
- Combining the feature maps from the 3rd, 4th, and 5th downsampling stages and upsampling by 8× yields the third output feature map, FCN-8s.
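The halving at each pooling stage can be checked with a quick size calculation; the 224×224 input here is just an illustrative choice (any size divisible by 32 works):

```python
# Each of the 5 max-pooling stages halves the spatial resolution.
input_size = 224  # illustrative input size, assumed divisible by 32
sizes = [input_size // (2 ** k) for k in range(1, 6)]
print(sizes)  # [112, 56, 28, 14, 7] -> the final map is 1/32 of the input
```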
The construction of these feature maps introduces skip connections, which let the model draw on features at different scales during upsampling; fusing more features while preserving more detail helps the model reconstruct image information more precisely. Compared with FCN-32s and FCN-16s, the FCN-8s feature map carries both rich semantic information and rich spatial information, and gives the best segmentation results.
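The original FCN initializes its transposed-convolution upsampling layers with bilinear-interpolation weights, so that before training the layers behave like plain bilinear resizing. A minimal sketch of building such a kernel (the helper name `bilinear_kernel` is ours, not from the paper):

```python
import torch
import torch.nn as nn

def bilinear_kernel(channels, kernel_size):
    """Build a (channels, channels, k, k) weight tensor that performs
    per-channel bilinear upsampling when used in a ConvTranspose2d."""
    factor = (kernel_size + 1) // 2
    center = factor - 1 if kernel_size % 2 == 1 else factor - 0.5
    og = torch.arange(kernel_size).float()
    filt_1d = 1 - torch.abs(og - center) / factor
    filt = filt_1d[:, None] * filt_1d[None, :]  # separable 2-D bilinear kernel
    weight = torch.zeros(channels, channels, kernel_size, kernel_size)
    for c in range(channels):
        weight[c, c] = filt  # no cross-channel mixing
    return weight

# Initialize a 2x upsampling layer (kernel 4, stride 2, padding 1) with it
up = nn.ConvTranspose2d(3, 3, kernel_size=4, stride=2, padding=1, bias=False)
with torch.no_grad():
    up.weight.copy_(bilinear_kernel(3, 4))
```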
Implementing an FCN-8s network in PyTorch:
import torch
import torch.nn as nn

class FCN8(nn.Module):
    def __init__(self, num_classes):
        super(FCN8, self).__init__()
        self.stage1 = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=96, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=96),
            nn.MaxPool2d(kernel_size=2, padding=0)
        )
        self.stage2 = nn.Sequential(
            nn.Conv2d(in_channels=96, out_channels=256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=256),
            nn.MaxPool2d(kernel_size=2, padding=0)
        )
        self.stage3 = nn.Sequential(
            nn.Conv2d(in_channels=256, out_channels=384, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=384),
            nn.Conv2d(in_channels=384, out_channels=384, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=384),
            nn.Conv2d(in_channels=384, out_channels=256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=256),
            nn.MaxPool2d(kernel_size=2, padding=0)
        )
        self.stage4 = nn.Sequential(
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=512),
            nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=512),
            nn.MaxPool2d(kernel_size=2, padding=0)
        )
        self.stage5 = nn.Sequential(
            nn.Conv2d(in_channels=512, out_channels=num_classes, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=num_classes),
            nn.MaxPool2d(kernel_size=2, padding=0)
        )
        # k-fold upsampling via transposed convolutions
        self.upsample_2 = nn.ConvTranspose2d(in_channels=512, out_channels=512, kernel_size=4, padding=1, stride=2)
        self.upsample_4 = nn.ConvTranspose2d(in_channels=num_classes, out_channels=num_classes, kernel_size=4, padding=0, stride=4)
        self.upsample_81 = nn.ConvTranspose2d(in_channels=512 + num_classes + 256, out_channels=512 + num_classes + 256, kernel_size=4, padding=0, stride=4)
        self.upsample_82 = nn.ConvTranspose2d(in_channels=512 + num_classes + 256, out_channels=512 + num_classes + 256, kernel_size=4, padding=1, stride=2)
        # final prediction head
        self.final = nn.Sequential(
            nn.Conv2d(512 + num_classes + 256, num_classes, kernel_size=7, padding=3),
        )

    def forward(self, x):
        x = x.float()
        # conv1 -> pool1: 1/2 resolution
        x = self.stage1(x)
        # conv2 -> pool2: 1/4 resolution
        x = self.stage2(x)
        # conv3 -> pool3: 1/8 resolution; stash it for the skip connection
        x = self.stage3(x)
        pool3 = x
        # conv4 -> pool4: 1/16 resolution; upsample 2x back to 1/8 and stash it
        x = self.stage4(x)
        pool4 = self.upsample_2(x)
        # conv5 -> pool5: 1/32 resolution; upsample 4x back to 1/8
        x = self.stage5(x)
        conv7 = self.upsample_4(x)
        # concatenate all upsampled feature maps along the channel dimension
        x = torch.cat([pool3, pool4, conv7], dim=1)
        # 8x upsampling back to the input size, done as one 4x step and one
        # 2x step, followed by the classification head
        output = self.upsample_81(x)
        output = self.upsample_82(output)
        output = self.final(output)
        return output
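The upsampling layers above rely on the `ConvTranspose2d` output-size rule, out = (in − 1)·stride − 2·padding + kernel_size: the setting (k=4, s=2, p=1) exactly doubles the spatial size, while (k=4, s=4, p=0) exactly quadruples it. A standalone shape check (the channel count 8 and input size 28 are arbitrary):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 28, 28)  # arbitrary feature map

up2 = nn.ConvTranspose2d(8, 8, kernel_size=4, stride=2, padding=1)
up4 = nn.ConvTranspose2d(8, 8, kernel_size=4, stride=4, padding=0)

y2 = up2(x)  # (28 - 1) * 2 - 2 * 1 + 4 = 56  -> exact 2x
y4 = up4(x)  # (28 - 1) * 4 - 0     + 4 = 112 -> exact 4x
print(y2.shape, y4.shape)
```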