Lightweight Convolutional Neural Networks
Convolutional neural networks (CNNs) are widely used in vision tasks such as image classification and object detection, where they have been highly successful. However, CNNs typically require substantial computation and memory, which limits their deployment in resource-constrained environments such as embedded devices; this motivates network compression.
Lightweight network design is one approach to network compression: it aims to design architectures with lower computational complexity. From a structural point of view, the features extracted by convolutional layers are redundant, so specially designed convolution operations can reduce this redundancy and thereby the amount of computation. From a computational point of view, inference involves a large number of multiplications, and multiplication (compared with addition) is expensive on current hardware, so replacing or optimizing multiplications also reduces the cost.
Contents:
- Designing special convolutions
- Replacing multiplications
1. Designing Special Convolutions
A standard $3\times 3$ convolutional layer is implemented as follows:
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class VanillaConv(nn.Module):
    """(convolution => [BN] => [ReLU])"""
    def __init__(
        self, in_channels, out_channels,
        kernel_size=3, stride=1, padding=1, groups=1,
        bn=True, relu=True
    ):
        super().__init__()
        self.vanilla_conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels,
                      kernel_size=kernel_size, stride=stride, padding=padding, groups=groups),
        )
        if bn:
            self.vanilla_conv.add_module('batchnorm', nn.BatchNorm2d(out_channels))
        if relu:
            self.vanilla_conv.add_module('relu', nn.ReLU(inplace=True))

    def forward(self, x):
        return self.vanilla_conv(x)
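As a quick shape check (an illustrative snippet, not from the original post), the block maps a feature map to the requested number of channels while the default `kernel_size=3, padding=1` preserves the spatial size:

x = torch.randn(1, 16, 32, 32)
print(VanillaConv(16, 32)(x).shape)  # torch.Size([1, 32, 32, 32])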
Some specially designed convolutional networks are summarized below:
Lightweight Network | Convolution | Special Structure |
---|---|---|
SqueezeNet | standard convolution | Fire module |
SqueezeNext | standard convolution | separable convolution ($3\times 1+1\times 3$) |
MobileNet | depthwise separable convolution | depth-wise convolution, point-wise convolution |
MobileNetV2 | depthwise separable convolution | linear bottleneck, inverted residual |
MobileNetV3 | depthwise separable convolution | channel attention (SENet), neural architecture search (NAS) |
ShuffleNet | group conv + depthwise conv | channel shuffle |
ShuffleNet V2 | standard conv + depthwise conv | channel split, channel shuffle |
IGCNet | group convolution | interleaved group convolution |
IGCV2 | group convolution | interleaved structured sparse convolution |
ChannelNet | depthwise conv + group conv + channel-wise conv | group channel-wise conv, depthwise separable channel-wise conv, convolutional classification layer |
EfficientNet | MBConv (same as MobileNetV3) | compound scaling |
EfficientNetV2 | Fused-MBConv | progressive training |
GhostNet | Ghost module | Ghost bottleneck |
MicroNet | micro-factorized convolution | micro-factorized depthwise and pointwise convolution |
CompConv | divide-and-conquer convolution | - |
⚪ SqueezeNet: replace the standard convolution with the Fire module
class Fire(nn.Module):
    """
    (1x1 squeeze convolution => [BN] => ReLU
     => 1x1 + 3x3 expand convolutions => [BN] => ReLU)
    """
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.e1x1 = out_channels//2          # expand 1x1 channels
        self.e3x3 = out_channels-self.e1x1   # expand 3x3 channels
        self.s1x1 = out_channels//4          # squeeze channels
        self.squeeze = VanillaConv(in_channels, self.s1x1, kernel_size=1, padding=0)
        self.expand1x1 = nn.Conv2d(self.s1x1, self.e1x1, kernel_size=1, padding=0)
        self.expand3x3 = nn.Conv2d(self.s1x1, self.e3x3, kernel_size=3, padding=1)
        self.tail = nn.Sequential(nn.BatchNorm2d(out_channels), nn.ReLU(inplace=True))

    def forward(self, x):
        s = self.squeeze(x)
        e1 = self.expand1x1(s)
        e2 = self.expand3x3(s)
        e = torch.cat([e1, e2], 1)  # concatenate the two expand branches
        return self.tail(e)
⚪ SqueezeNext: build the convolution block from separable ($3\times 1$ + $1\times 3$) convolutions
class SqNxt(nn.Module):
    """
    (1x1 convolution => [BN] => ReLU
     => 1x1 convolution => [BN] => ReLU
     => 3x1 convolution => [BN] => ReLU
     => 1x3 convolution => [BN] => ReLU
     => 1x1 convolution => [BN] => ReLU)
    """
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.sqnxt = nn.Sequential(
            VanillaConv(in_channels, in_channels//2, kernel_size=1, padding=0),
            VanillaConv(in_channels//2, in_channels//4, kernel_size=1, padding=0),
            # the 3x3 convolution is separated into 3x1 + 1x3 convolutions
            VanillaConv(in_channels//4, in_channels//2, kernel_size=(3,1), padding=(1,0)),
            VanillaConv(in_channels//2, in_channels//2, kernel_size=(1,3), padding=(0,1)),
            VanillaConv(in_channels//2, out_channels, kernel_size=1, padding=0)
        )
        if in_channels != out_channels:
            self.shortcut = nn.Sequential(nn.Conv2d(in_channels, out_channels, kernel_size=1, padding=0))
        else:
            self.shortcut = nn.Sequential()

    def forward(self, x):
        return self.sqnxt(x) + self.shortcut(x)
⚪ MobileNet: replace the standard convolution with a depthwise separable convolution (Depthwise Separable Conv)
class DSConv(nn.Module):
    """
    (depthwise convolution => [BN] => ReLU6
     => 1x1 convolution => [BN] => ReLU6)
    """
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.depthwise_separable_conv = nn.Sequential(
            # the activation here should be nn.ReLU6(inplace=True)
            VanillaConv(in_channels, in_channels, kernel_size=3, padding=1, groups=in_channels),
            # the activation here should be nn.ReLU6(inplace=True)
            VanillaConv(in_channels, out_channels, kernel_size=1, padding=0),
        )

    def forward(self, x):
        return self.depthwise_separable_conv(x)
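The saving is easy to quantify: a standard $d\times d$ convolution costs $d^2 c_{in} c_{out} HW$ multiply-accumulates, whereas the depthwise plus pointwise pair costs $d^2 c_{in} HW + c_{in} c_{out} HW$, giving the ratio

\[\frac{d^2 c_{in} HW + c_{in} c_{out} HW}{d^2 c_{in} c_{out} HW} = \frac{1}{c_{out}} + \frac{1}{d^2}\]

which for $d=3$ is roughly an $8\sim 9\times$ reduction.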
⚪ MobileNetV2: add a linear bottleneck to MobileNet and design the inverted residual structure
class DSConvv2(nn.Module):
    """
    (1x1 convolution => [BN] => ReLU6
     => depthwise convolution => [BN] => ReLU6
     => 1x1 convolution => [BN] => Linear)
    """
    def __init__(self, in_channels, out_channels, t=6):
        super().__init__()
        self.inverted_residual = nn.Sequential(
            # the activation here should be nn.ReLU6(inplace=True)
            VanillaConv(in_channels, t*in_channels, kernel_size=1, padding=0),
            # the activation here should be nn.ReLU6(inplace=True)
            VanillaConv(t*in_channels, t*in_channels, kernel_size=3, padding=1, groups=t*in_channels),
            # linear bottleneck: no activation after the last 1x1 convolution
            VanillaConv(t*in_channels, out_channels, kernel_size=1, padding=0, relu=False)
        )
        if in_channels != out_channels:
            self.shortcut = nn.Sequential(nn.Conv2d(in_channels, out_channels, kernel_size=1, padding=0))
        else:
            self.shortcut = nn.Sequential()

    def forward(self, x):
        return self.inverted_residual(x) + self.shortcut(x)
⚪ MobileNetV3: introduce channel attention (Channel Attention) and search the architecture with NAS
class SELayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel//reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel//reduction, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, h, w = x.size()
        y = self.avgpool(x).view(b, c)   # squeeze: global average pooling
        y = self.fc(y).view(b, c, 1, 1)  # excitation: per-channel weights
        return x * y.expand_as(x)
class DSConvv3(nn.Module):
    """
    (1x1 convolution => [BN] => Hardswish
     => depthwise convolution => [BN] => Hardswish
     => SELayer
     => 1x1 convolution => [BN] => Linear)
    """
    def __init__(self, in_channels, out_channels, t=6):
        super().__init__()
        self.block = nn.Sequential(
            # the activation here should be nn.Hardswish(inplace=True)
            VanillaConv(in_channels, t*in_channels, kernel_size=1, padding=0),
            # the activation here should be nn.Hardswish(inplace=True)
            VanillaConv(t*in_channels, t*in_channels, kernel_size=3, padding=1, groups=t*in_channels),
            SELayer(t*in_channels),
            # linear bottleneck: no activation after the last 1x1 convolution
            VanillaConv(t*in_channels, out_channels, kernel_size=1, padding=0, relu=False)
        )
        if in_channels != out_channels:
            self.shortcut = nn.Sequential(nn.Conv2d(in_channels, out_channels, kernel_size=1, padding=0))
        else:
            self.shortcut = nn.Sequential()

    def forward(self, x):
        return self.block(x) + self.shortcut(x)
⚪ ShuffleNet: replace the standard convolution with group convolution (Group Conv) and channel shuffle (Channel Shuffle)
class ShuffleBlock(nn.Module):
    def __init__(self, groups):
        super(ShuffleBlock, self).__init__()
        self.groups = groups

    def forward(self, x):
        '''Channel shuffle: [N,C,H,W] -> [N,g,C/g,H,W] -> [N,C/g,g,H,W] -> [N,C,H,W]'''
        N, C, H, W = x.size()
        g = self.groups
        # after permute, the tensor must be made contiguous before view can be called
        return x.view(N, g, C//g, H, W).permute(0, 2, 1, 3, 4).contiguous().view(N, C, H, W)
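As a quick sanity check (a toy example, not from the original paper), shuffling 6 channels in 2 groups interleaves the two halves:

x = torch.arange(6).float().view(1, 6, 1, 1)
print(ShuffleBlock(groups=2)(x).flatten().tolist())
# [0.0, 3.0, 1.0, 4.0, 2.0, 5.0]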
class ShuffleNet(nn.Module):
    """
    (1x1 group convolution => [BN] => ReLU => ChannelShuffle
     => depthwise convolution => [BN]
     => 1x1 group convolution => [BN] => Linear)
    """
    def __init__(self, in_channels, out_channels, groups=4):
        super().__init__()
        mid_channels = int(0.25*in_channels)
        # if there are too few input channels, skip the grouping
        g = 1 if in_channels < groups**2 else groups
        self.shuffle_block = nn.Sequential(
            VanillaConv(in_channels, mid_channels, kernel_size=1, padding=0, groups=g),
            ShuffleBlock(groups=g),
            VanillaConv(mid_channels, mid_channels, kernel_size=3, padding=1, groups=mid_channels, relu=False),
            VanillaConv(mid_channels, out_channels, kernel_size=1, padding=0, groups=groups, relu=False)
        )
        if in_channels != out_channels:
            self.shortcut = nn.Sequential(nn.Conv2d(in_channels, out_channels, kernel_size=1, padding=0))
        else:
            self.shortcut = nn.Sequential()

    def forward(self, x):
        return F.relu(self.shuffle_block(x) + self.shortcut(x))
⚪ ShuffleNet V2: add channel split (Channel Split) to ShuffleNet
# ShuffleBlock is defined above (see ShuffleNet)
class ShuffleNetv2(nn.Module):
    """
    (ChannelSplit
     => 1x1 convolution => [BN] => ReLU
     => depthwise convolution => [BN]
     => 1x1 convolution => [BN] => ReLU
     => ChannelShuffle)
    """
    def __init__(self, in_channels, out_channels):
        super().__init__()
        # handle the case where the input and output channel counts differ
        self.cin = in_channels//2
        self.cout = out_channels//2
        self.block = nn.Sequential(
            VanillaConv(self.cin, self.cin, kernel_size=1, padding=0),
            VanillaConv(self.cin, self.cin, kernel_size=3, padding=1, groups=self.cin, relu=False),
            VanillaConv(self.cin, self.cout, kernel_size=1, padding=0)
        )
        if in_channels != out_channels:
            self.shortcut = nn.Sequential(nn.Conv2d(self.cin, self.cout, kernel_size=1, padding=0))
        else:
            self.shortcut = nn.Sequential()
        self.shuffle = ShuffleBlock(groups=2)

    def forward(self, x):
        # chunk splits the tensor along the channel dimension;
        # the trailing chunk may have fewer channels
        x1, x2 = x.chunk(2, dim=1)
        y = torch.cat([self.shortcut(x1), self.block(x2)], dim=1)
        return self.shuffle(y)
⚪ IGCNet: replace the standard convolution with interleaved group convolutions
# ShuffleBlock is defined above (see ShuffleNet)
class IGCNet(nn.Module):
    """
    (3x3 primary group convolution (L groups) => [BN] => ChannelShuffle
     => 1x1 secondary group convolution (M groups) => [BN] => ChannelShuffle => ReLU)
    """
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.M = 2                     # channels in each primary partition
        self.L = out_channels//self.M  # number of primary partitions
        self.igconv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, groups=self.L),
            nn.BatchNorm2d(out_channels),
            ShuffleBlock(groups=self.L),
            nn.Conv2d(out_channels, out_channels, kernel_size=1, padding=0, groups=self.M),
            nn.BatchNorm2d(out_channels),
            ShuffleBlock(groups=self.M),
            nn.ReLU(inplace=False)
        )

    def forward(self, x):
        return self.igconv(x)
⚪ IGCV2: replace the standard convolution with interleaved structured sparse convolutions
# ShuffleBlock is defined above (see ShuffleNet)
class IGCV2(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.K = 8  # channels per group
        # number of group convolutions needed so the composed kernel is dense
        self.L = math.ceil(math.log(out_channels)/math.log(self.K))+1
        self.igcv2 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, groups=out_channels//self.K),
            nn.BatchNorm2d(out_channels),
        )
        for l in range(self.L-1):
            self.igcv2.add_module('shuffle'+str(l+2),
                                  ShuffleBlock(groups=out_channels//self.K))
            self.igcv2.add_module('groupconv'+str(l+2),
                                  nn.Conv2d(out_channels, out_channels,
                                            kernel_size=1, padding=0, groups=out_channels//self.K))
            self.igcv2.add_module('batchnorm'+str(l+2),
                                  nn.BatchNorm2d(out_channels))

    def forward(self, x):
        return F.relu(self.igcv2(x))
⚪ ChannelNet: replace the standard convolution with channel-wise convolutions
class ChannelConv(nn.Module):
    """Channel-wise convolution, implemented as a 3d convolution
    that slides along the channel dimension."""
    def __init__(self, group, kernel_size, padding):
        super().__init__()
        self.conv = nn.Conv3d(1, group, kernel_size,
                              stride=(group, 1, 1),
                              padding=(padding, 0, 0),
                              bias=False)

    def forward(self, x):
        x = x.unsqueeze(1)  # [N,C,H,W] -> [N,1,C,H,W]
        x = self.conv(x)
        # fold the 3d output back into the channel dimension
        x = x.view(x.size(0), -1, x.size(3), x.size(4))
        return x

# g: number of groups; f: kernel size along the channel dimension;
# m/n: input/output channels of the classification layer; df: spatial feature size
GCWConv = ChannelConv(g, (f, 1, 1), (f-g)//2)    # group channel-wise convolution
DWSCWConv = ChannelConv(1, (f, 1, 1), (f-1)//2)  # depthwise separable channel-wise convolution
CCL = ChannelConv(1, (m-n+1, df, df), 0)         # convolutional classification layer
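As a hypothetical shape check (the values `g=8, f=16` are illustrative, not from the paper), a group channel-wise convolution maps a 64-channel feature map back to 64 channels:

g, f = 8, 16
x = torch.randn(2, 64, 14, 14)
gcw_conv = ChannelConv(group=g, kernel_size=(f, 1, 1), padding=(f - g) // 2)
print(gcw_conv(x).shape)  # torch.Size([2, 64, 14, 14])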
⚪ EfficientNet: compound scaling of network depth, width, and resolution
The basic block is the same as in MobileNetV3, which the authors call MBConv. Instead of scaling one dimension at a time, the network depth, width, and input resolution are scaled jointly by a compound coefficient.
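A minimal sketch of compound scaling, using the EfficientNet-B0 coefficients $\alpha=1.2$, $\beta=1.1$, $\gamma=1.15$ found by grid search under the constraint $\alpha\cdot\beta^2\cdot\gamma^2\approx 2$:

def compound_scaling(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Scale depth/width/resolution multipliers by a compound coefficient phi;
    total FLOPs then grow roughly by 2**phi since alpha * beta**2 * gamma**2 ~= 2."""
    depth = alpha ** phi       # multiplier for the number of layers
    width = beta ** phi        # multiplier for the number of channels
    resolution = gamma ** phi  # multiplier for the input resolution
    return depth, width, resolution

print(compound_scaling(phi=1))  # (1.2, 1.1, 1.15)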
⚪ EfficientNetV2: compound scaling of the structure, with progressive training
The basic blocks are MBConv and an improved Fused-MBConv. MBConv is the same as in MobileNetV3; Fused-MBConv replaces the depthwise separable convolution with a standard convolution.
# SELayer is defined above (see MobileNetV3)
class FusedMBConv(nn.Module):
    """
    (3x3 convolution => [BN] => ReLU
     => SELayer
     => 1x1 convolution => [BN] => Linear)
    """
    def __init__(self, in_channels, out_channels, t=4):
        super().__init__()
        self.block = nn.Sequential(
            VanillaConv(in_channels, t*in_channels, kernel_size=3, padding=1, relu=True),
            SELayer(t*in_channels),
            # linear bottleneck: no activation after the last 1x1 convolution
            VanillaConv(t*in_channels, out_channels, kernel_size=1, padding=0, relu=False)
        )
        if in_channels != out_channels:
            self.shortcut = nn.Sequential(nn.Conv2d(in_channels, out_channels, kernel_size=1, padding=0))
        else:
            self.shortcut = nn.Sequential()

    def forward(self, x):
        return self.block(x) + self.shortcut(x)
⚪ GhostNet: replace the standard convolution with the Ghost module
class GhostModule(nn.Module):
    """
    (primary 1x1 convolution
     => cheap depthwise convolution
     => concatenate both outputs)
    """
    def __init__(self, in_channels, out_channels, s=2, d=3, relu=True):
        super().__init__()
        self.s = s  # 1/s of the output channels come from the primary convolution
        self.d = d  # kernel size of the cheap (depthwise) operation
        self.mid = out_channels//self.s
        self.primary_conv = VanillaConv(in_channels, self.mid,
                                        kernel_size=1, padding=0, relu=relu)
        # cheap operation: a depthwise convolution generating "ghost" features
        self.cheap_operation = VanillaConv(self.mid, out_channels-self.mid,
                                           kernel_size=self.d, padding=self.d//2,
                                           groups=self.mid, relu=relu)

    def forward(self, x):
        x1 = self.primary_conv(x)
        x2 = self.cheap_operation(x1)
        return torch.cat([x1, x2], dim=1)
⚪ MicroNet: replace the standard convolution with micro-factorized convolution (a sketch follows below)
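The original post gives no code for MicroNet; below is a minimal, hypothetical sketch under the assumption that a $k\times k$ depthwise convolution is factorized into $k\times 1$ and $1\times k$ depthwise convolutions, and a pointwise convolution is factorized (low-rank) into two group convolutions joined by a channel shuffle. The group count `groups=4` is illustrative, not the paper's choice:

# ShuffleBlock is defined above (see ShuffleNet)
class MicroFactorizedConv(nn.Module):
    """Hypothetical sketch of micro-factorized depthwise + pointwise convolution."""
    def __init__(self, in_channels, out_channels, k=3, groups=4):
        super().__init__()
        # micro-factorized depthwise: k x k -> (k x 1) + (1 x k)
        self.depthwise = nn.Sequential(
            VanillaConv(in_channels, in_channels, kernel_size=(k, 1),
                        padding=(k//2, 0), groups=in_channels),
            VanillaConv(in_channels, in_channels, kernel_size=(1, k),
                        padding=(0, k//2), groups=in_channels),
        )
        # micro-factorized pointwise: two group convolutions with a shuffle in between
        self.pointwise = nn.Sequential(
            VanillaConv(in_channels, out_channels, kernel_size=1, padding=0, groups=groups),
            ShuffleBlock(groups=groups),
            VanillaConv(out_channels, out_channels, kernel_size=1, padding=0, groups=groups),
        )

    def forward(self, x):
        return self.pointwise(self.depthwise(x))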
⚪ CompConv: replace the standard convolution with a divide-and-conquer convolution
2. Replacing Multiplications
⚪ AdderNet: replace the multiplications in convolution with the L1 distance
The computation of a convolutional layer can be written as a similarity $S(\cdot,\cdot)$ between a filter $F \in \Bbb{R}^{d \times d \times c_{in} \times c_{out}}$ and the input feature $X \in \Bbb{R}^{H \times W \times c_{in}}$; for a standard convolution, $S(x,y)=x \times y$:

\[Y(m,n,t)=\sum_{i=0}^{d-1} {\sum_{j=0}^{d-1} {\sum_{k=0}^{c_{in}-1} {S(X(m+i,n+j,k),F(i,j,k,t))}}}\]

AdderNet replaces the multiplications in the convolution with the L1 distance:

\[Y(m,n,t)=-\sum_{i=0}^{d-1} {\sum_{j=0}^{d-1} {\sum_{k=0}^{c_{in}-1} {| X(m+i,n+j,k)-F(i,j,k,t) | }}}\]
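A minimal sketch of such an adder layer (my own illustration via `F.unfold`, not the official implementation; bias and the paper's special gradient and learning-rate scaling are omitted):

class AdderConv2d(nn.Module):
    """Convolution-like layer whose response is the negative L1 distance
    between each filter and each input patch."""
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.k, self.stride, self.padding = kernel_size, stride, padding
        self.weight = nn.Parameter(
            torch.randn(out_channels, in_channels*kernel_size*kernel_size))

    def forward(self, x):
        N, C, H, W = x.shape
        H_out = (H + 2*self.padding - self.k)//self.stride + 1
        W_out = (W + 2*self.padding - self.k)//self.stride + 1
        # extract sliding patches: [N, C*k*k, L] with L = H_out*W_out
        patches = F.unfold(x, self.k, padding=self.padding, stride=self.stride)
        # |patch - filter| summed over the patch dimension: only additions, no multiplications
        diff = patches.unsqueeze(1) - self.weight.unsqueeze(0).unsqueeze(-1)
        out = -diff.abs().sum(dim=2)  # [N, out_channels, L]
        return out.view(N, -1, H_out, W_out)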
⚪ Mitchell's approximation: replace the multiplications in convolution with Mitchell's approximate logarithm
In binary, multiplication can be turned into addition via logarithm and exponential transforms:

\[pq=2^s, \quad s=\log_2 p+\log_2 q\]

Therefore, to compute the product of $p$ and $q$, one first computes the fast logarithms $\log_2 p$ and $\log_2 q$ with Mitchell's approximation, adds them to obtain $s$, and then computes the fast exponential $2^s$, again with Mitchell's approximation.
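For reference, Mitchell's approximation reads the exponent from the leading one bit and treats the remaining mantissa linearly: $\log_2(2^e(1+m)) \approx e+m$ for $m\in[0,1)$. A small illustration (plain Python, with floating point standing in for the hardware bit tricks):

def mitchell_log2(p):
    """log2(2**e * (1+m)) ~= e + m; in hardware, e is the leading-one bit position."""
    e = math.floor(math.log2(p))
    m = p / 2**e - 1  # mantissa in [0, 1)
    return e + m

def mitchell_exp2(s):
    """Inverse approximation: 2**(e+m) ~= 2**e * (1+m)."""
    e = math.floor(s)
    return 2**e * (1 + s - e)

# multiplication becomes an addition in the approximate log domain
print(mitchell_exp2(mitchell_log2(13) + mitchell_log2(9)))  # 112.0, exact is 117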
⚪ References
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size: (arXiv1602) SqueezeNet, a lightweight model with AlexNet-level accuracy.
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications: (arXiv1704) MobileNet, building lightweight networks with depthwise separable convolutions.
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices: (arXiv1707) ShuffleNet, building efficient networks with group convolution and channel shuffle.
- Interleaved Group Convolutions for Deep Neural Networks: (arXiv1707) IGCNet, interleaved group convolutions.
- MobileNetV2: Inverted Residuals and Linear Bottlenecks: (arXiv1801) MobileNetV2, inverted residuals and linear bottlenecks.
- SqueezeNext: Hardware-Aware Neural Network Design: (arXiv1803) SqueezeNext, hardware-aware neural network design.
- IGCV2: Interleaved Structured Sparse Convolutional Neural Networks: (arXiv1804) IGCV2, interleaved structured sparse convolutions.
- ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design: (arXiv1807) ShuffleNet V2, practical guidelines for efficient CNN architecture design.
- ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions: (arXiv1809) ChannelNets, building compact and efficient CNNs with channel-wise convolutions.
- Searching for MobileNetV3: (arXiv1905) MobileNetV3, found with neural architecture search.
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks: (arXiv1905) EfficientNet, rethinking model scaling for CNNs.
- GhostNet: More Features from Cheap Operations: (arXiv1911) GhostNet, generating more features from cheap operations.
- AdderNet: Do We Really Need Multiplications in Deep Learning?: (arXiv1912) AdderNet, convolutional networks using only additions.
- MicroNet: Towards Image Recognition with Extremely Low FLOPs: (arXiv2011) MicroNet, image recognition with extremely low FLOPs.
- Deep Neural Network Training without Multiplications: (arXiv2012) Additive neural networks built with Mitchell's approximation.
- EfficientNetV2: Smaller Models and Faster Training: (arXiv2104) EfficientNetV2, smaller models and faster training.
- CompConv: A Compact Convolution Module for Efficient Feature Learning: (arXiv2106) CompConv, a compact convolution module using divide and conquer.