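DeepLab v2: semantic image segmentation via pyramid pooling with atrous convolution.
The biggest improvement of DeepLab v2 over DeepLab is Atrous Spatial Pyramid Pooling (ASPP): pyramid pooling built from atrous convolutions with different dilation rates. The main purpose of this design is to extract multi-scale features from the image.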
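The multi-scale problem is that segmentation quality drops when the objects in an image appear at different sizes. The same object, for example, looks large when photographed up close and small when photographed from far away. Solving the multi-scale problem means the network segments an object well whether it appears large or small. DeepLab v2 handles this with ASPP: parallel atrous convolutions with different dilation rates look at the same feature map through several fields of view.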
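To make the different fields of view concrete, the sketch below computes how many input pixels a 3x3 atrous kernel spans along one axis for several dilation rates. The helper name `effective_kernel_size` is hypothetical and introduced only for illustration; the formula k + (k - 1)(r - 1) is the standard span of a kernel of size k dilated at rate r.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in input pixels along one axis, covered by a dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Rate 1 is an ordinary convolution; the other rates are the ASPP defaults
# used in this article.
for rate in (1, 6, 12, 18, 24):
    print(rate, effective_kernel_size(3, rate))
```

The number of weights stays at 3x3 for every rate; only the spacing between the sampled pixels grows, which is why larger rates see more context at no extra parameter cost.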
The ASPP module can be implemented as follows:
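import torch
import torch.nn as nn


# ASPP module used by DeepLab v2
class ASPP_module(nn.ModuleList):
    def __init__(self, in_channels, out_channels, dilation_list=(6, 12, 18, 24)):
        super(ASPP_module, self).__init__()
        self.dilation_list = dilation_list
        for dia_rate in self.dilation_list:
            # One branch per dilation rate. For a 3x3 kernel, padding equal to
            # the dilation rate keeps the spatial size unchanged.
            self.append(
                nn.Sequential(
                    nn.Conv2d(in_channels, out_channels, kernel_size=3, dilation=dia_rate, padding=dia_rate),
                    nn.Conv2d(out_channels, out_channels, kernel_size=1),
                    nn.Conv2d(out_channels, out_channels, kernel_size=1),
                )
            )

    def forward(self, x):
        outputs = []
        for aspp_module in self:
            outputs.append(aspp_module(x))
        # Concatenate the branch outputs along the channel axis, giving
        # out_channels * len(dilation_list) channels for the classifier head.
        return torch.cat(outputs, dim=1)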
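As a quick shape check on the idea behind the module, the sketch below runs four parallel 3x3 atrous convolutions over one feature map and concatenates the results. It is a simplified stand-in, not the module above: only the first convolution of each branch is kept, and the 1x512x41x41 input is an assumed size chosen for illustration.

```python
import torch
import torch.nn as nn

# Stand-in backbone feature map (batch 1, 512 channels, 41x41); the size is an assumption.
features = torch.randn(1, 512, 41, 41)

# One 3x3 atrous convolution per dilation rate, with padding equal to the rate.
branches = nn.ModuleList(
    [nn.Conv2d(512, 256, kernel_size=3, dilation=rate, padding=rate) for rate in (6, 12, 18, 24)]
)

with torch.no_grad():
    outputs = [branch(features) for branch in branches]
    fused = torch.cat(outputs, dim=1)  # stack the branches along the channel axis

branch_shapes = [tuple(out.shape) for out in outputs]
fused_shape = tuple(fused.shape)
print(branch_shapes[0], fused_shape)
```

Every branch keeps the 41x41 spatial size, so the four 256-channel outputs can be concatenated into a single 1024-channel tensor.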
In addition, DeepLab v2 replaces the backbone network of DeepLab v1 with ResNet and improves the learning rate policy.
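The learning rate change referred to here is, in the DeepLab v2 paper, the "poly" policy: the base learning rate is multiplied by (1 - iter/max_iter)^power, with power = 0.9. A minimal sketch follows; the helper name `poly_lr`, the base rate, and the iteration count are hypothetical values chosen for illustration.

```python
def poly_lr(base_lr: float, iteration: int, max_iter: int, power: float = 0.9) -> float:
    """Learning rate at `iteration` under the poly decay schedule."""
    return base_lr * (1.0 - iteration / max_iter) ** power


base_lr, max_iter = 0.001, 20000  # illustrative values, not taken from the article
schedule = [poly_lr(base_lr, it, max_iter) for it in (0, 5000, 10000, 15000, 20000)]
print(schedule)
```

In PyTorch the same multiplier could be passed as the `lr_lambda` argument of `torch.optim.lr_scheduler.LambdaLR`, assuming the scheduler is stepped once per iteration.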
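# DeepLab v2: ResNet backbone, ASPP, and a classifier head
class DeepLabV2(nn.Module):
    def __init__(self, num_classes):
        super(DeepLabV2, self).__init__()
        self.num_classes = num_classes
        self.ASPP_module = ASPP_module(512, 256)
        # ResNet is assumed to be defined elsewhere and to return a list of
        # feature maps whose last entry has 512 channels at 1/8 resolution.
        self.backbone = ResNet()
        self.final = nn.Sequential(
            # 256 * 4 input channels: four ASPP branches of 256 channels each
            nn.Conv2d(256 * 4, 256, kernel_size=3, padding=1),
            nn.Conv2d(256, self.num_classes, kernel_size=1),
        )

    def forward(self, x):
        x = self.backbone(x)[-1]
        x = self.ASPP_module(x)
        # Upsample by a factor of 8 to return to roughly the input resolution
        x = nn.functional.interpolate(x, scale_factor=8, mode='bilinear', align_corners=True)
        x = self.final(x)
        return x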
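The upsampling step deserves a closer look, because a fixed `scale_factor=8` does not always reproduce the input size. The sketch below assumes a 321x321 input that the backbone reduces to a 41x41 map (typical for a stride-8 ResNet, but an assumption here) and uses a 21-channel stand-in tensor for illustration. It compares the fixed scale factor with an explicit target `size`.

```python
import torch
import torch.nn.functional as F

# Stand-in for a stride-8 feature map; the shape is an assumption.
scores = torch.randn(1, 21, 41, 41)

# Fixed scale factor, as in the forward pass above: 41 * 8 = 328 pixels per side.
up_scale = F.interpolate(scores, scale_factor=8, mode='bilinear', align_corners=True)

# Explicit target size: matches an assumed 321x321 input exactly.
up_size = F.interpolate(scores, size=(321, 321), mode='bilinear', align_corners=True)

print(tuple(up_scale.shape), tuple(up_size.shape))
```

With `align_corners=True` the corner pixels of the input and output grids coincide, so the corner values of the low-resolution map are carried over unchanged. Passing the input's own height and width as `size` is one way to keep the prediction aligned with the label map.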