DeepLab v2: semantic image segmentation via pyramid pooling with atrous convolution.

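Building on DeepLab, the biggest improvement in DeepLab v2 is Atrous Spatial Pyramid Pooling (ASPP): pyramid pooling built from atrous (dilated) convolutions with different dilation rates. The main purpose of this design is to extract multi-scale features from the image.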
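
To make the effect of the dilation rate concrete, the sketch below computes how far a dilated 3x3 kernel reaches along one axis, using the standard relation k + (k - 1)(r - 1). The helper name `effective_kernel_size` is illustrative and not part of any library; the rates 6, 12, 18 and 24 are the defaults used by the ASPP code in this article.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Dilation rates taken from the ASPP implementation in this article.
rates = [6, 12, 18, 24]
extents = {r: effective_kernel_size(3, r) for r in rates}

for r, e in extents.items():
    print(f"dilation={r:2d} -> 3x3 kernel spans {e}x{e} pixels")
```

Each branch still has only nine weights per channel pair, yet the larger rates see a much wider context. That is how parallel branches can respond to objects of different sizes.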

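The multi-scale problem refers to poor segmentation quality when the target objects in an image appear at different sizes. For example, the same object looks large when photographed up close and small when photographed from far away. The goal is for the network to segment the object well regardless of whether it is large or small. DeepLab v2 addresses the multi-scale problem with ASPP.

The ASPP module is implemented as follows: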

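import torch
import torch.nn as nn

# ASPP module used by DeepLab v2
class ASPP_module(nn.ModuleList):
    def __init__(self, in_channels, out_channels, dilation_list=(6, 12, 18, 24)):
        super(ASPP_module, self).__init__()
        self.dilation_list = dilation_list
        for dia_rate in self.dilation_list:
            # One branch per dilation rate; padding=dia_rate keeps the
            # spatial size unchanged for a 3x3 kernel.
            self.append(
                nn.Sequential(
                    nn.Conv2d(in_channels, out_channels, kernel_size=3, dilation=dia_rate, padding=dia_rate),
                    nn.Conv2d(out_channels, out_channels, kernel_size=1),
                    nn.Conv2d(out_channels, out_channels, kernel_size=1),
                )
            )

    def forward(self, x):
        outputs = []
        for aspp_module in self:
            outputs.append(aspp_module(x))
        # Concatenate the branch outputs along the channel dimension, giving
        # len(dilation_list) * out_channels channels.
        return torch.cat(outputs, dim=1)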
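
The choice of `padding=dia_rate` in each branch can be checked with plain arithmetic. The sketch below applies the output-size formula documented for `nn.Conv2d` to one spatial axis. The helper `conv_out_size` and the 41-pixel feature map are assumptions made for illustration only.

```python
def conv_out_size(size: int, kernel_size: int, dilation: int = 1,
                  padding: int = 0, stride: int = 1) -> int:
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


feature_size = 41            # hypothetical feature-map height/width
rates = [6, 12, 18, 24]      # dilation rates used by the ASPP branches

# With padding equal to the dilation rate, a 3x3 branch preserves the size,
# so the outputs of all branches can be combined channel-wise.
same_sizes = [conv_out_size(feature_size, 3, dilation=r, padding=r) for r in rates]
print(same_sizes)

# Without padding, the same branch would shrink the feature map.
print(conv_out_size(feature_size, 3, dilation=6, padding=0))
```

Because every branch returns a map of identical height and width, the branch outputs can be combined without any cropping or resizing.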

In addition, DeepLab v2 replaces the DeepLab v1 backbone (VGG-16) with ResNet and adopts an improved learning-rate policy (the "poly" schedule).

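class DeepLabV2(nn.Module):
    def __init__(self, num_classes):
        super(DeepLabV2, self).__init__()
        self.num_classes = num_classes
        self.ASPP_module = ASPP_module(512, 256)
        # ResNet is assumed to be defined elsewhere; its last feature map is
        # expected to have 512 channels at 1/8 of the input resolution.
        self.backbone = ResNet()
        # final expects the four ASPP branch outputs concatenated along the
        # channel dimension, hence 256 * 4 input channels.
        self.final = nn.Sequential(
            nn.Conv2d(256 * 4, 256, kernel_size=3, padding=1),
            nn.Conv2d(256, self.num_classes, kernel_size=1),
        )

    def forward(self, x):
        x = self.backbone(x)[-1]  # use the deepest feature map
        x = self.ASPP_module(x)
        # Upsample by 8x to return to the input resolution.
        x = nn.functional.interpolate(x, scale_factor=8, mode='bilinear', align_corners=True)
        x = self.final(x)
        return x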
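
The learning-rate change mentioned above is not shown in the model code. As a minimal sketch, the "poly" policy described in the DeepLab v2 paper scales the base rate by (1 - iter / max_iter) ^ power, with power = 0.9. The function name `poly_lr`, the base rate, and the iteration count below are illustrative assumptions, not values taken from this article.

```python
def poly_lr(base_lr: float, iteration: int, max_iter: int, power: float = 0.9) -> float:
    """'Poly' learning-rate decay: base_lr * (1 - iteration / max_iter) ** power."""
    return base_lr * (1 - iteration / max_iter) ** power


base_lr = 0.001      # illustrative base learning rate
max_iter = 20000     # illustrative training length

# The rate decays smoothly from base_lr at the start to 0 at max_iter.
schedule = [poly_lr(base_lr, it, max_iter) for it in (0, 5000, 10000, 15000, 20000)]
for it, lr in zip((0, 5000, 10000, 15000, 20000), schedule):
    print(f"iter={it:5d}  lr={lr:.6f}")
```

In a training loop, the value returned for the current iteration would be written into the optimizer's parameter groups before each step.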