FCN: Fully Convolutional Networks for Semantic Segmentation

FCN (Fully Convolutional Networks) is the seminal work of semantic segmentation. In contrast to earlier architectures such as AlexNet and VGG, which stack convolutional layers followed by fully connected layers, FCN replaces the fully connected layers with convolutions to tackle semantic segmentation, reaching $62.2\%$ mIoU on the PASCAL VOC 2012 dataset.
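This "convolutionalization" idea can be illustrated with a minimal PyTorch sketch (the 512-channel, 7x7 feature-map sizes below are VGG-like values chosen for illustration, not code from the paper): a fully connected layer over a fixed-size feature map computes the same function as a convolution whose kernel spans the whole map, and the convolutional form additionally accepts larger inputs, producing a spatial grid of scores instead of a single vector.

import torch
import torch.nn as nn

# A Linear layer over a flattened 512 x 7 x 7 feature map ...
fc = nn.Linear(512 * 7 * 7, 4096)
# ... is equivalent to a 7x7 convolution over the same map.
conv = nn.Conv2d(512, 4096, kernel_size=7)
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(4096, 512, 7, 7))
    conv.bias.copy_(fc.bias)

x = torch.randn(1, 512, 7, 7)
out_fc = fc(x.flatten(1))                                      # shape (1, 4096)
out_conv = conv(x)                                             # shape (1, 4096, 1, 1)
print(torch.allclose(out_fc, out_conv.flatten(1), atol=1e-4))  # expected: True

# Unlike the Linear layer, the convolution also handles larger inputs and
# returns a spatial map of scores, which is what makes dense prediction possible.
out_map = conv(torch.randn(1, 512, 14, 14))                    # shape (1, 4096, 8, 8)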

FCN extracts features and downsamples with a fully convolutional network, upsamples with bilinear interpolation or learnable transposed convolutions (both options are sketched after the list below), and builds a directed acyclic graph (DAG) for feature fusion:

  1. Five downsampling stages first produce a feature map whose spatial size is $\frac{1}{32}$ of the input image;
  2. Upsampling this map by a factor of 32 gives the first output map, FCN-32s;
  3. Fusing the feature maps from the 4th and 5th downsampling stages and upsampling by 16 gives the second output map, FCN-16s;
  4. Fusing the feature maps from the 3rd, 4th, and 5th downsampling stages and upsampling by 8 gives the third output map, FCN-8s.
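Both upsampling options mentioned above can be sketched in a few lines (illustrative code, not taken from the paper): fixed bilinear interpolation has no parameters, while a transposed convolution with matching kernel size and stride produces the same enlargement factor but is learnable; in the paper the learnable upsampling layers are initialized to perform bilinear interpolation and then fine-tuned.

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 21, 28, 28)  # e.g. a 1/8-scale map of per-class scores

# Option 1: fixed bilinear interpolation (no learnable parameters).
up_fixed = F.interpolate(x, scale_factor=8, mode="bilinear", align_corners=False)

# Option 2: a learnable transposed convolution with the same 8x factor.
# Output size: (in - 1) * 8 - 2 * 4 + 16 = 8 * in, i.e. exactly 8x.
up_learned = nn.ConvTranspose2d(21, 21, kernel_size=16, stride=8, padding=4, bias=False)
y = up_learned(x)

print(up_fixed.shape, y.shape)  # both: torch.Size([1, 21, 224, 224])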

Building these output maps introduces skip connections, which let the model draw on features at different scales during upsampling; fusing more features while retaining more detail helps the model reconstruct the image more precisely. Compared with FCN-32s and FCN-16s, FCN-8s carries both rich semantic information and rich spatial information, and therefore gives the best segmentation results.
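For intuition, here is a minimal sketch of a single skip-fusion step in the FCN-16s style. The tensor shapes are illustrative; note that the original paper fuses per-class score maps by element-wise addition, whereas the implementation below concatenates feature maps along the channel dimension instead.

import torch
import torch.nn as nn

num_classes = 21
# Hypothetical score maps predicted from two depths of the backbone
# (e.g. for a 448x448 input: 1/16 scale is 28x28, 1/32 scale is 14x14).
score_pool4 = torch.randn(1, num_classes, 28, 28)
score_conv7 = torch.randn(1, num_classes, 14, 14)

# Upsample the coarser prediction 2x, then fuse it with the finer one.
up2 = nn.ConvTranspose2d(num_classes, num_classes, kernel_size=4, stride=2, padding=1, bias=False)
fused = score_pool4 + up2(score_conv7)  # element-wise sum of score maps
print(fused.shape)                      # torch.Size([1, 21, 28, 28])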

An FCN-8s network implemented with PyTorch:

import torch
import torch.nn as nn

class FCN8(nn.Module):
    def __init__(self, num_classes):
        super(FCN8, self).__init__()  
        self.stage1 = nn.Sequential(
            nn.Conv2d(in_channels=3,out_channels=96,kernel_size=3,padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=96),
            nn.MaxPool2d(kernel_size=2,padding=0)
        )
        self.stage2 = nn.Sequential(
            nn.Conv2d(in_channels=96,out_channels=256,kernel_size=3,padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=256),
            nn.MaxPool2d(kernel_size=2,padding=0) 
        )
        
        self.stage3 = nn.Sequential(
            nn.Conv2d(in_channels=256,out_channels=384,kernel_size=3,padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=384),
            
            nn.Conv2d(in_channels=384,out_channels=384,kernel_size=3,padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=384),   
            
            nn.Conv2d(in_channels=384,out_channels=256,kernel_size=3,padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=256),    
            
            nn.MaxPool2d(kernel_size=2,padding=0) 
        )
        
        self.stage4 = nn.Sequential(
            nn.Conv2d(in_channels=256,out_channels=512,kernel_size=3,padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=512),
            
            nn.Conv2d(in_channels=512,out_channels=512,kernel_size=3,padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=512),   
  
            nn.MaxPool2d(kernel_size=2,padding=0) 
        )
        
        self.stage5 = nn.Sequential(
            nn.Conv2d(in_channels=512,out_channels=num_classes,kernel_size=3,padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=num_classes),
            
            nn.MaxPool2d(kernel_size=2,padding=0) 
        )
        
        # k-times upsampling via learnable transposed convolutions
        self.upsample_2 = nn.ConvTranspose2d(in_channels=512, out_channels=512, kernel_size=4, padding= 1,stride=2)
        self.upsample_4 = nn.ConvTranspose2d(in_channels=num_classes, out_channels=num_classes, kernel_size=4, padding= 0,stride=4)
        self.upsample_81 = nn.ConvTranspose2d(in_channels=512+num_classes+256, out_channels=512+num_classes+256, kernel_size=4, padding= 0,stride=4)
        self.upsample_82 = nn.ConvTranspose2d(in_channels=512+num_classes+256, out_channels=512+num_classes+256, kernel_size=4, padding= 1,stride=2)
        # final prediction head
        self.final = nn.Sequential(
            nn.Conv2d(512+num_classes+256, num_classes, kernel_size=7, padding=3),
        )
        
    def forward(self, x):
        x = x.float()
        # conv1 -> pool1 -> 1/2-scale output
        x = self.stage1(x)
        # conv2 -> pool2 -> 1/4-scale output
        x = self.stage2(x)
        # conv3 -> pool3; keep the 1/8-scale feature map in pool3 for later fusion
        x = self.stage3(x)
        pool3 = x
        # conv4 -> pool4; upsample the 1/16-scale feature map 2x and keep it in pool4
        x = self.stage4(x)
        pool4 = self.upsample_2(x)
 
        # conv5 -> pool5; the 1/32-scale map has num_classes channels and is upsampled 4x
        x = self.stage5(x)
        conv7 = self.upsample_4(x)
 
        # concatenate all feature maps (now at 1/8 scale) along the channel dimension
        x = torch.cat([pool3, pool4, conv7], dim = 1)
        # upsample back to the input resolution (4x followed by 2x, i.e. 8x overall) and predict per-pixel classes
        output = self.upsample_81(x)
        output = self.upsample_82(output)
        output = self.final(output)
 
        return output
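A quick sanity check (assuming an input whose height and width are multiples of 32, so that all the upsampling factors line up) confirms that the output keeps the input's spatial size and has one channel per class:

model = FCN8(num_classes=21)            # 21 classes as in PASCAL VOC (20 + background)
dummy = torch.randn(1, 3, 224, 224)     # one 224x224 RGB image
with torch.no_grad():
    scores = model(dummy)
print(scores.shape)                     # torch.Size([1, 21, 224, 224])
pred = scores.argmax(dim=1)             # per-pixel class labels, shape (1, 224, 224)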