FCOS:一种简单强力的Anchor-free目标检测器.

FCOS是一种Anchor-free的检测模型,先预测下采样S倍的特征图上的各点类别,再预测各点的$l,r,t,b$四个值来确定bbox的大小位置,同时提出center-ness分支,用于帮助NMS抑制低质量框,进一步提高网络的性能表现。

FCOS的整体结构非常简单,其沿用了FPN的结构,在FPN的基础上增加了两个更小的特征图P6P7P6通过P5卷积得到,P7通过P6卷积得到. 最终P3-P7都将被用做检测,并且共享一个检测头。检测头采用decouple的形式,分为Classification分支和位置预测分支,其中位置预测分支包括Regression分支和Center-ness分支。

class FPN(nn.Module):
    def __init__(self,features=256):
        super(FPN,self).__init__()
        self.prj_3 = nn.Conv2d(512, features, kernel_size=1)
        self.prj_4 = nn.Conv2d(1024, features, kernel_size=1)
        self.prj_5 = nn.Conv2d(2048, features, kernel_size=1)

        self.conv_3 =nn.Conv2d(features, features, kernel_size=3, padding=1)
        self.conv_4 =nn.Conv2d(features, features, kernel_size=3, padding=1)
        self.conv_5 =nn.Conv2d(features, features, kernel_size=3, padding=1)

        self.conv_out6 = nn.Conv2d(features, features, kernel_size=3, padding=1, stride=2)
        self.conv_out7 = nn.Conv2d(features, features, kernel_size=3, padding=1, stride=2)
        
    def upsamplelike(self,inputs):
        src, target = inputs
        return F.interpolate(src, size=(target.shape[2], target.shape[3]),mode='nearest')

    def forward(self,x):
        C3, C4, C5 = x
        #-------------------------------------#
        #   80, 80, 512 -> 80, 80, 256
        #   40, 40, 1024 -> 40, 40, 256
        #   20, 20, 2048 -> 20, 20, 256
        #-------------------------------------#
        P3 = self.prj_3(C3)
        P4 = self.prj_4(C4)
        P5 = self.prj_5(C5)
            
        #------------------------------------------------#
        #   20, 20, 256 -> 40, 40, 256 -> 40, 40, 256
        #------------------------------------------------#
        P4 = P4 + self.upsamplelike([P5, C4])
        #------------------------------------------------#
        #   40, 40, 256 -> 80, 80, 256 -> 80, 80, 256
        #------------------------------------------------#
        P3 = P3 + self.upsamplelike([P4, C3])

        # 80, 80, 256
        P3 = self.conv_3(P3)
        # 40, 40, 256
        P4 = self.conv_4(P4)
        # 20, 20, 256
        P5 = self.conv_5(P5)

        # 10, 10, 256
        P6 = self.conv_out6(P5)
        # 5, 5, 256
        P7 = self.conv_out7(F.relu(P6))
        return [P3,P4,P5,P6,P7]

class ScaleExp(nn.Module):
    def __init__(self, init_value=1.0):
        super(ScaleExp,self).__init__()
        self.scale = nn.Parameter(torch.tensor([init_value], dtype=torch.float32))
    def forward(self,x):
        return torch.exp(x*self.scale)

class Fcos_Head(nn.Module):
    def __init__(self, in_channel ,num_classes):
        super(Fcos_Head,self).__init__()
        self.num_classes=num_classes
        cls_branch=[]
        reg_branch=[]

        for _ in range(4):
            cls_branch.append(nn.Conv2d(in_channel, in_channel, kernel_size=3, padding=1, bias=True))
            cls_branch.append(nn.GroupNorm(32, in_channel)),
            cls_branch.append(nn.ReLU(True))

            reg_branch.append(nn.Conv2d(in_channel, in_channel, kernel_size=3, padding=1, bias=True))
            reg_branch.append(nn.GroupNorm(32, in_channel)),
            reg_branch.append(nn.ReLU(True))

        self.cls_conv=nn.Sequential(*cls_branch)
        self.reg_conv=nn.Sequential(*reg_branch)

        self.cls_logits = nn.Conv2d(in_channel, num_classes, kernel_size=3, padding=1)
        self.cnt_logits = nn.Conv2d(in_channel, 1, kernel_size=3, padding=1)
        self.reg_pred   = nn.Conv2d(in_channel, 4, kernel_size=3, padding=1)
        
        prior = 0.01
        nn.init.constant_(self.cls_logits.bias,-math.log((1 - prior) / prior))
        self.scale_exp = nn.ModuleList([ScaleExp(1) for _ in range(5)])
    
    def forward(self,inputs):
        cls_logits  = []
        cnt_logits  = []
        reg_preds   = []
        for index, P in enumerate(inputs):
            cls_conv_out=self.cls_conv(P)
            reg_conv_out=self.reg_conv(P)

            cls_logits.append(self.cls_logits(cls_conv_out))
            cnt_logits.append(self.cnt_logits(reg_conv_out))
            reg_preds.append(self.scale_exp[index](self.reg_pred(reg_conv_out)))
        return cls_logits, cnt_logits, reg_preds


class FCOS(nn.Module):
    def __init__(self, num_classes, fpn_out_channels=256, pretrained=False):
        super().__init__()
        self.backbone   = resnet50(pretrained = pretrained)
        self.fpn        = FPN(fpn_out_channels)
        self.head       = Fcos_Head(fpn_out_channels, num_classes)

    def forward(self,x):
        #-------------------------------------#
        #   80, 80, 512
        #   40, 40, 1024
        #   20, 20, 2048
        #-------------------------------------#
        C3, C4, C5          = self.backbone(x)
        
        #-------------------------------------#
        #   80, 80, 256
        #   40, 40, 256
        #   20, 20, 256
        #   10, 10, 256
        #   5, 5, 256
        #-------------------------------------#
        P3, P4, P5, P6, P7  = self.fpn.forward([C3, C4, C5])
        
        cls_logits, cnt_logits, reg_preds = self.head.forward([P3, P4, P5, P6, P7])
        return [cls_logits, cnt_logits, reg_preds]

对大小为HxW的特征图上各点来说,如果其落入了GT的中心区域,则视作正样本. 例如,某个GT的中心点为$(c_x,c_y)$,则其中心区域为$(c_x-rs,c_y-rs,c_x+rs,c_y+rs)$,只有落入该中心区域内的点才是该GT的正样本并赋予该GT的类别属性,否则为该GT的负样本、赋予背景类别属性。其中$s$为特征图对于原图的下采样率,$r$为自定义的超参数(对于COCO数据集设为1.5)。

假设特征图上某点的回归分支四个值的预测目标为$l,r,t,b$,则该点的center-ness目标定义为:

\[o_{x,y} = \sqrt{\frac{\min(l,r)}{\max(l,r)}\times \frac{\min(t,b)}{\max(t,b)}}\]

center-ness的可视化图如下所示,距离GT中心点越近,center-ness值越大,反之越小:

在推理时,center-ness也被用作类别权重。在NMS时,其排序所用的置信度$s$由center-ness$o$和类别概率$p$得到:

\[s_{x,y} = \sqrt{p_{x,y} \times o_{x,y}}\]

因此,距离GT中心越远的点,其center-ness越小,所得到的置信度就越小,因此NMS时会倾向于抑制该点. 此外,如下图所示,在对分类概率应用了center-ness后,具有低IoU但高置信度的检测框也有效的减少了:

如果特征图上某点同时落在了多个GT的中心区域内,则被称为ambiguous sampleFCOS简单地把这些重叠GT中面积最小的作为该点的回归目标。