Lightweight Convolutional Neural Networks.

Convolutional neural networks (CNNs) are widely used in vision tasks such as image classification and object detection, and have achieved great success. However, CNNs usually require substantial computation and memory, which limits their use in resource-constrained environments such as embedded devices; network compression is therefore needed.

Lightweight network design is one approach to network compression: it aims to design network structures with lower computational complexity. From a structural perspective, the features extracted by convolutional layers are redundant, so specially designed convolution operations can reduce this redundancy and thereby the amount of computation. From a computational perspective, inference involves a large number of multiplications, and multiplication (compared with addition) is unfriendly to current hardware, so replacing or optimizing the multiplications can also reduce the computational cost.

Contents:

  1. Designing special convolutions
  2. Replacing multiplications

1. Designing special convolutions

A standard $3\times 3$ convolutional layer can be written as follows:

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class VanillaConv(nn.Module):
    """(convolution => [BN] => [ReLU])"""
    def __init__(
            self, in_channels, out_channels, 
            kernel_size=3, stride=1, padding=1, groups=1,
            bn=True, relu=True
            ):
        super().__init__()
        self.vanilla_conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 
                      kernel_size=kernel_size, stride=stride, padding=padding, groups=groups),
        )
        if bn:
            self.vanilla_conv.add_module('batchnorm', nn.BatchNorm2d(out_channels))
        if relu:
            self.vanilla_conv.add_module('relu', nn.ReLU(inplace=True))

    def forward(self, x):
        return self.vanilla_conv(x)

The following convolutional networks with specially designed structures are introduced below:

| Lightweight network | Convolution type | Special structure |
| --- | --- | --- |
| SqueezeNet | standard convolution | Fire module |
| SqueezeNext | standard convolution | separable convolution ($3\times 1+1\times 3$) |
| MobileNet | depthwise separable convolution | depth-wise convolution, point-wise convolution |
| MobileNetV2 | depthwise separable convolution | linear bottleneck, inverted residual |
| MobileNetV3 | depthwise separable convolution | channel attention (SENet), neural architecture search (NAS) |
| ShuffleNet | group convolution + depthwise convolution | channel shuffle |
| ShuffleNet V2 | standard convolution + depthwise convolution | channel split, channel shuffle |
| IGCNet | group convolution | interleaved group convolution |
| IGCV2 | group convolution | interleaved structured sparse convolution |
| ChannelNet | depthwise conv + group conv + channel-wise conv | group channel-wise convolution, depthwise separable channel-wise convolution, convolutional classification layer |
| EfficientNet | MBConv (the MobileNetV3 block) | compound scaling |
| EfficientNetV2 | Fused-MBConv | progressive training |
| GhostNet | Ghost module | Ghost bottleneck |
| MicroNet | micro-factorized convolution | micro-factorized depth-wise and point-wise convolution |
| CompConv | divide-and-conquer convolution | - |

SqueezeNet: replacing standard convolution with the Fire module

class Fire(nn.Module):
    """
    (1x1 convolution => [BN] => ReLU 
    => 1x1+3x3 convolution => [BN] => ReLU)
    """
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.e1x1 = out_channels//2
        self.e3x3 = out_channels-self.e1x1
        self.s1x1 = out_channels//4
        self.squeeze = VanillaConv(in_channels, self.s1x1, kernel_size=1, padding=0)
        self.expand1x1 = nn.Conv2d(self.s1x1, self.e1x1, kernel_size=1, padding=0)
        self.expand3x3 = nn.Conv2d(self.s1x1, self.e3x3, kernel_size=3, padding=1)
        self.tail = nn.Sequential(nn.BatchNorm2d(out_channels), nn.ReLU(inplace=True))

    def forward(self, x):
        s = self.squeeze(x)
        e1 = self.expand1x1(s)
        e2 = self.expand3x3(s)
        e = torch.cat([e1,e2],1)
        return self.tail(e)

SqueezeNext: building the convolution block with separable convolutions ($3\times 1$ and $1\times 3$)

class SqNxt(nn.Module):
    """
    (1x1 convolution => [BN] => ReLU 
    => 1x1 convolution => [BN] => ReLU 
    => 3x1 convolution => [BN] => ReLU 
    => 1x3 convolution => [BN] => ReLU 
    => 1x1 convolution => [BN] => ReLU)
    """
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.sqnxt = nn.Sequential(
            VanillaConv(in_channels, in_channels//2, kernel_size=1, padding=0),
            VanillaConv(in_channels//2, in_channels//4, kernel_size=1, padding=0),
            VanillaConv(in_channels//4, in_channels//2, kernel_size=(3,1), padding=(1,0)),
            VanillaConv(in_channels//2, in_channels//2, kernel_size=(1,3), padding=(0,1)),
            VanillaConv(in_channels//2, out_channels, kernel_size=1, padding=0)
        )
        if in_channels != out_channels:
            self.shortcut = nn.Sequential(nn.Conv2d(in_channels, out_channels, kernel_size=1, padding=0))
        else:
            self.shortcut = nn.Sequential()

    def forward(self, x):
        return self.sqnxt(x)+self.shortcut(x)

MobileNet: replacing standard convolution with depthwise separable convolution (Depthwise Separable Conv)

class DSConv(nn.Module):
    """
    (depthwise convolution => [BN] => ReLU6
        => 1x1 convolution => [BN] => ReLU6)
    """
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.depthwise_separable_conv = nn.Sequential(
            VanillaConv(in_channels, in_channels, kernel_size=3, padding=1, groups=in_channels), # the activation here should be nn.ReLU6(inplace=True)
            VanillaConv(in_channels, out_channels, kernel_size=1, padding=0), # the activation here should be nn.ReLU6(inplace=True)
        )

    def forward(self, x):
        return self.depthwise_separable_conv(x)

MobileNetV2: introducing the linear bottleneck and the inverted residual structure into MobileNet

class DSConvv2(nn.Module):
    """
    (1x1 convolution => [BN] => ReLU6
    => depthwise convolution => [BN] => ReLU6 
    => 1x1 convolution => [BN] => Linear)
        """
    def __init__(self, in_channels, out_channels, t=6):
        super().__init__()
        self.inverted_residual = nn.Sequential(
            VanillaConv(in_channels, t*in_channels, kernel_size=1, padding=0), # the activation here should be nn.ReLU6(inplace=True)
            VanillaConv(t*in_channels, t*in_channels, kernel_size=3, padding=1, groups=t*in_channels), # the activation here should be nn.ReLU6(inplace=True)
            VanillaConv(t*in_channels, out_channels, kernel_size=1, padding=0, relu=False)
        )
        if in_channels != out_channels:
            self.shortcut = nn.Sequential(nn.Conv2d(in_channels, out_channels, kernel_size=1, padding=0))
        else:
            self.shortcut = nn.Sequential()

    def forward(self, x):
        return self.inverted_residual(x)+self.shortcut(x)

MobileNetV3: introducing channel attention (Channel Attention) and searching the architecture with NAS

class SELayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel//reduction,bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel//reduction,channel, bias=False),
            nn.Sigmoid()
        )
    
    def forward(self, x):
        b,c,h,w = x.size()
        y = self.avgpool(x).view(b,c)
        y = self.fc(y).view(b,c,1,1)
        return x * y.expand_as(x)

class DSConvv3(nn.Module):
    """
    (1x1 convolution => [BN] => Hardswish 
    => depthwise convolution => [BN] => Hardswish 
    => SELayer 
    => 1x1 convolution => [BN] => Linear)
    """
    def __init__(self, in_channels, out_channels, t=6):
        super().__init__()
        self.block = nn.Sequential(
            VanillaConv(in_channels, t*in_channels, kernel_size=1, padding=0), # the activation here should be nn.Hardswish(inplace=True)
            VanillaConv(t*in_channels, t*in_channels, kernel_size=3, padding=1, groups=t*in_channels), # the activation here should be nn.Hardswish(inplace=True)
            SELayer(t*in_channels),
            VanillaConv(t*in_channels, out_channels, kernel_size=1, padding=0, relu=False) # linear bottleneck: no activation
        )
        if in_channels != out_channels:
            self.shortcut = nn.Sequential(nn.Conv2d(in_channels, out_channels, kernel_size=1, padding=0))
        else:
            self.shortcut = nn.Sequential()

    def forward(self, x):
        return self.block(x) + self.shortcut(x)

ShuffleNet: replacing standard convolution with group convolution (Group Conv) and channel shuffle (Channel Shuffle)

class ShuffleBlock(nn.Module):
    def __init__(self, groups):
        super(ShuffleBlock, self).__init__()
        self.groups = groups

    def forward(self, x):
        '''Channel shuffle: [N,C,H,W] -> [N,g,C/g,H,W] -> [N,C/g,g,H,W] -> [N,C,H,W]'''
        N,C,H,W = x.size()
        g = self.groups
        # after the permute, .contiguous() is required so the tensor is contiguous in memory before calling view
        return x.view(N,g,C//g,H,W).permute(0,2,1,3,4).contiguous().view(N,C,H,W)

class ShuffleNet(nn.Module):
    """
    (1x1 group convolution => [BN] => ReLU => ChannelShuffle
    => depthwise convolution => [BN] 
    => 1x1 group convolution => [BN] => Linear)"""
    def __init__(self, in_channels, out_channels, groups=4):
        super().__init__()
        mid_channels = int(0.25*in_channels)
        # if the input has too few channels, it cannot be grouped
        g = 1 if in_channels<groups**2 else groups
        self.shuffle_block = nn.Sequential(
            VanillaConv(in_channels, mid_channels, kernel_size=1, padding=0, groups=g),
            ShuffleBlock(groups=g),
            VanillaConv(mid_channels, mid_channels, kernel_size=3, padding=1, groups=mid_channels, relu=False),
            VanillaConv(mid_channels, out_channels, kernel_size=1, padding=0, groups=groups, relu=False)
        )
        if in_channels != out_channels:
            self.shortcut = nn.Sequential(nn.Conv2d(in_channels, out_channels, kernel_size=1, padding=0))
        else:
            self.shortcut = nn.Sequential()

    def forward(self, x):
        return F.relu(self.shuffle_block(x)+self.shortcut(x))

ShuffleNet V2: introducing channel split (Channel Split) into ShuffleNet

# ShuffleBlock is defined in the ShuffleNet section above
class ShuffleNetv2(nn.Module):
    """
    (ChannelSplit 
    => 1x1 convolution => [BN] => ReLU 
    => depthwise convolution => [BN] 
    => 1x1 convolution => [BN] => ReLU 
    => ChannelShuffle)
    """
    def __init__(self, in_channels, out_channels):
        super().__init__()
        # handle the case where the input and output channel counts differ
        self.cin = in_channels//2
        self.cout = out_channels//2
        self.block = nn.Sequential(
            VanillaConv(self.cin, self.cin, kernel_size=1, padding=0),
            VanillaConv(self.cin, self.cin, kernel_size=3, padding=1, groups=self.cin, relu=False),
            VanillaConv(self.cin, self.cout, kernel_size=1, padding=0)
        )
        if in_channels != out_channels:
            self.shortcut = nn.Sequential(nn.Conv2d(self.cin, self.cout, kernel_size=1, padding=0))
        else:
            self.shortcut = nn.Sequential()
        self.shuffle = ShuffleBlock(groups=2)

    def forward(self, x):
        # chunk splits the tensor along the channel dimension; the trailing chunk may have fewer channels
        x1, x2 = x.chunk(2, dim=1)
        y = torch.cat([self.shortcut(x1), self.block(x2)], dim=1)
        return self.shuffle(y)

IGCNet: replacing standard convolution with interleaved group convolution

# ShuffleBlock is defined in the ShuffleNet section above
class IGCNet(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.M = 2
        self.L = out_channels//self.M
        self.igconv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, groups=self.L),
            nn.BatchNorm2d(out_channels),
            ShuffleBlock(groups=self.L),
            nn.Conv2d(out_channels, out_channels, kernel_size=1, padding=0, groups=self.M),
            nn.BatchNorm2d(out_channels),
            ShuffleBlock(groups=self.M),
            nn.ReLU(inplace=False)
        )

    def forward(self, x):
        return self.igconv(x)

IGCV2: replacing standard convolution with interleaved structured sparse convolution

# ShuffleBlock is defined in the ShuffleNet section above
class IGCV2(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.K = 8
        self.L = math.ceil(math.log(out_channels)/math.log(self.K))+1
        self.igcv2 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, groups=out_channels//self.K),
            nn.BatchNorm2d(out_channels),
        )
        for l in range(self.L-1):
            self.igcv2.add_module('shuffle'+str(l+2),
                                  ShuffleBlock(groups=out_channels//self.K))
            self.igcv2.add_module('groupconv'+str(l+2), 
                                  nn.Conv2d(out_channels, out_channels, 
                                            kernel_size=1, padding=0, groups=out_channels//self.K))
            self.igcv2.add_module('batchnorm'+str(l+2), 
                                  nn.BatchNorm2d(out_channels))

    def forward(self, x):
        return F.relu(self.igcv2(x))

ChannelNet: replacing standard convolution with channel-wise convolution

class ChannelConv(nn.Module):
    def __init__(self, group, kernel_size, padding):
        super().__init__()
        self.conv = nn.Conv3d(1, group, kernel_size, 
                              stride=(group, 1, 1), 
                              padding=(padding, 0, 0), 
                              bias=False)

    def forward(self, x):
        x = x.unsqueeze(1)
        x = self.conv(x)
        x = x.view(x.size(0), -1, x.size(3), x.size(4))
        return x

# assuming g groups, channel-kernel length f, m input channels, n output classes, and spatial size df x df:
GCWConv = ChannelConv(g, (f, 1, 1), (f-g)//2)     # group channel-wise convolution
DWSCWConv = ChannelConv(1, (f, 1, 1), (f-1)//2)   # depth-wise separable channel-wise convolution
CCL = ChannelConv(1, (m-n+1, df, df), 0)          # convolutional classification layer

EfficientNet: compound scaling of network depth, width, and resolution

The basic building block is the same as in MobileNetV3; the authors call it MBConv.

Compound scaling jointly scales the network depth $d$, width $w$, and input resolution $r$ with a single compound coefficient $\phi$: $d=\alpha^{\phi}, w=\beta^{\phi}, r=\gamma^{\phi}$, where the constants $\alpha, \beta, \gamma$ are found by a small grid search under the constraint $\alpha \cdot \beta^2 \cdot \gamma^2 \approx 2$.
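
A minimal numerical sketch of compound scaling, assuming the coefficients $\alpha=1.2$, $\beta=1.1$, $\gamma=1.15$ reported for EfficientNet-B0 and a simplified rounding rule (the official implementation additionally rounds channel counts to multiples of 8):

ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15   # depth / width / resolution coefficients

def compound_scale(base_depth, base_width, base_resolution, phi):
    # scale each dimension by its coefficient raised to the compound coefficient phi
    depth = round(base_depth * ALPHA ** phi)             # layers per stage
    width = round(base_width * BETA ** phi)              # channels per layer
    resolution = round(base_resolution * GAMMA ** phi)   # input image size
    return depth, width, resolution

# e.g. scaling one stage of the base network (3 layers, 40 channels, 224x224 input);
# phi=1 gives a model roughly one "step" larger than the base network
print(compound_scale(3, 40, 224, phi=1))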

EfficientNetV2: compound scaling with progressive training

The basic building blocks are MBConv and an improved Fused-MBConv. MBConv is the same as the MobileNetV3 block, while Fused-MBConv replaces its depthwise separable convolution with a standard convolution.

# SELayer is defined in the MobileNetV3 section above
class FusedMBConv(nn.Module):
    """
    (3x3 convolution => [BN] => ReLU
    => SELayer 
    => 1x1 convolution => [BN] => Linear)
    """
    def __init__(self, in_channels, out_channels, t=4):
        super().__init__()
        self.block = nn.Sequential(
            VanillaConv(in_channels, t*in_channels, kernel_size=3, padding=1, relu=True), 
            SELayer(t*in_channels),
            VanillaConv(t*in_channels, out_channels, kernel_size=1, padding=0, relu=False) # linear bottleneck: no activation
        )
        if in_channels != out_channels:
            self.shortcut = nn.Sequential(nn.Conv2d(in_channels, out_channels, kernel_size=1, padding=0))
        else:
            self.shortcut = nn.Sequential()

    def forward(self, x):
        return self.block(x) + self.shortcut(x)

GhostNet: replacing standard convolution with the Ghost module

class GhostModule(nn.Module):
    def __init__(self, in_channels, out_channels, s=2, d=3, relu=True):
        super().__init__()
        self.s = s
        self.d = d
        self.mid = out_channels//self.s
        self.primary_conv = VanillaConv(in_channels, self.mid,
                                        kernel_size=1, padding=0, relu=relu)
        self.cheap_operation = VanillaConv(self.mid, out_channels-self.mid,
                                           kernel_size=self.d, padding=self.d//2,
                                           groups=self.mid, relu=relu)

    def forward(self, x):
        x1 = self.primary_conv(x)
        x2 = self.cheap_operation(x1)
        out = torch.cat([x1,x2], dim=1)
        return out

MicroNet: replacing standard convolution with micro-factorized convolution
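
The sketch below only illustrates the idea of micro-factorized convolution under simplifying assumptions: the $k\times k$ depth-wise convolution is factorized into $k\times 1$ and $1\times k$ depth-wise convolutions, and the point-wise convolution is factorized into two group convolutions joined by a channel shuffle. The group count and channel-reduction ratio here are arbitrary placeholders, not the values derived in the MicroNet paper.

# VanillaConv and ShuffleBlock are defined in the sections above
class MicroFactorizedConv(nn.Module):
    """
    Illustrative sketch of micro-factorized convolution:
    depth-wise kxk conv -> kx1 depth-wise conv + 1xk depth-wise conv;
    point-wise conv -> group conv => channel shuffle => group conv (low-rank factorization).
    """
    def __init__(self, in_channels, out_channels, kernel_size=3, groups=2, reduction=2):
        super().__init__()
        mid = in_channels // reduction
        k, p = kernel_size, kernel_size // 2
        self.block = nn.Sequential(
            # micro-factorized depth-wise convolution: kx1 then 1xk
            VanillaConv(in_channels, in_channels, kernel_size=(k, 1), padding=(p, 0), groups=in_channels),
            VanillaConv(in_channels, in_channels, kernel_size=(1, k), padding=(0, p), groups=in_channels),
            # micro-factorized point-wise convolution: group conv => shuffle => group conv
            VanillaConv(in_channels, mid, kernel_size=1, padding=0, groups=groups),
            ShuffleBlock(groups=groups),
            VanillaConv(mid, out_channels, kernel_size=1, padding=0, groups=groups, relu=False),
        )

    def forward(self, x):
        return self.block(x)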

CompConv: replacing standard convolution with divide-and-conquer convolution

2. Replacing multiplications

AdderNet: replacing the multiplications in convolution with the L1 distance

The computation of a convolutional layer can be written as a similarity measure $S(\cdot,\cdot)$ between a filter $F \in \Bbb{R}^{d \times d \times c_{in} \times c_{out}}$ and the input feature $X \in \Bbb{R}^{H \times W \times c_{in}}$; in a standard convolution, $S$ is multiplication:

\[Y(m,n,t)=\sum_{i=0}^{d} {\sum_{j=0}^{d} {\sum_{k=0}^{c_{in}} {S(X(m+i,n+j,k),F(i,j,k,t))}}}\]

AdderNet replaces the multiplications in the convolution with the negative L1 distance:

\[Y(m,n,t)=-\sum_{i=0}^{d} {\sum_{j=0}^{d} {\sum_{k=0}^{c_{in}} {| X(m+i,n+j,k)-F(i,j,k,t) | }}}\]
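
The forward pass of such an adder layer can be sketched with unfold and an L1 distance. This is only an illustration of the formula above, assuming the imports from the VanillaConv block; the official AdderNet additionally uses a modified backward pass and adaptive learning-rate scaling, which are omitted here.

class AdderConv2d(nn.Module):
    """Forward-only sketch of an AdderNet layer: Y = -sum |X_patch - F|."""
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(out_channels, in_channels, kernel_size, kernel_size))
        self.kernel_size, self.stride, self.padding = kernel_size, stride, padding

    def forward(self, x):
        N, C, H, W = x.shape
        # extract sliding local patches: [N, C*k*k, L]
        patches = F.unfold(x, self.kernel_size, padding=self.padding, stride=self.stride)
        w = self.weight.view(self.weight.size(0), -1)              # [C_out, C*k*k]
        # L1 distance between every patch and every filter: [N, L, C_out]
        dist = torch.cdist(patches.transpose(1, 2), w.unsqueeze(0).expand(N, -1, -1), p=1)
        H_out = (H + 2*self.padding - self.kernel_size)//self.stride + 1
        W_out = (W + 2*self.padding - self.kernel_size)//self.stride + 1
        return (-dist).transpose(1, 2).reshape(N, -1, H_out, W_out)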

Mitchell's approximation: replacing the multiplications in convolution with Mitchell's approximate logarithm

In binary arithmetic, a multiplication can be converted into an addition through logarithm and exponential transforms:

\[pq=2^s, \quad s=\log_2p+\log_2q\]

Therefore, to compute the product of $p$ and $q$, one first computes the approximate logarithms $\log_2 p$ and $\log_2 q$ with Mitchell's approximation, adds them to obtain $s$, and then computes the approximate exponential $2^s$, again with Mitchell's approximation.
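
A minimal numerical sketch of this procedure (floating-point arithmetic for clarity; hardware implementations operate directly on the bit representation, and the function names here are illustrative):

import math

def mitchell_log2(x):
    # Mitchell's approximation: write x = 2^k * (1 + m) with 0 <= m < 1,
    # then log2(x) ~= k + m (the mantissa is used directly as the fractional part)
    k = math.floor(math.log2(x))   # in hardware: position of the leading 1 bit
    m = x / 2**k - 1               # in hardware: the remaining mantissa bits
    return k + m

def mitchell_exp2(s):
    # inverse approximation: 2^s ~= 2^k * (1 + f) with k = floor(s), f = s - k
    k = math.floor(s)
    return 2**k * (1 + s - k)

def mitchell_mul(p, q):
    # approximate p*q by adding the approximate logarithms and exponentiating
    return mitchell_exp2(mitchell_log2(p) + mitchell_log2(q))

print(mitchell_mul(13, 9))  # ~112 instead of the exact 117 (relative error is bounded by about 11%)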

⚪ References