Pytorch之MobileNetV3图像分类

MobileNet v1是在VGG网络的基础上引入了深度可分离卷积（Depthwise Separable Convolutions），从而使得卷积参数大大减少，达到轻量化的效果。

MobileNet v2在v1的基础上引入了倒残差结构（Bottleneck Residual Block），并加入了shortcut连接，以进一步提高网络的性能。

MobileNet v3使用了 NAS 和 NetAdapt 算法搜索最优的模型结构，同时对模型一些结构进行了改进，例如在 MobileNet v2 的 Bottleneck Residual Block 基础上引入 Squeeze-and-Excitation

一、MobileNet v1

Mobilenet v1是一种轻量级的神经网络结构，用于图像分类和目标检测任务。它的设计目标是在保持较高准确率的同时，减少了模型的计算复杂度和参数量。

Mobilenet v1使用了深度可分离卷积（depthwise separable convolution）来减少计算量。深度可分离卷积将标准卷积分为深度卷积和逐点卷积两个步骤，从而降低了计算量和参数量。此外，Mobilenet v1还使用了全局平均池化来减少全连接层的数量，进一步减小了模型的规模。

介绍MobileNet v1可以使用一句话概括，MobileNet v1只要将把VGG中的标准卷积层换成深度可分离卷积即可。

1.深度可分离卷积

MobileNet模型是基于深度可分离卷积，这是一种分解卷积的形式。这种形式将标准卷积分解为一个深度卷积和一个称为逐点卷积的1x1的卷积。

所以深度可分离卷积应该由两个部分组成，即深度卷积和逐点卷积。用下图直观进行表示：

先了解标准卷积的计算过程：对所有输入通道用相同的卷积核得到不同通道特征，将不同通道特征组合起来使得输出特征包含每个输入的特征。

特点：

输入特征矩阵channel = 卷积核channel
卷积核个数 = 输出特征矩阵channel

(1)深度卷积(Depthwise convolution)

深度卷积对每一个输入通道应用一个单独的卷积核（与标准卷积不同，标准卷积对每一个输入通道应用同一个卷积核）得到特征图，此时，每张特征图仅与一个输入通道和其对应卷积核相关，各个通道之间特征图不关联。

DW卷积中的每一个卷积核，只会和输入特征矩阵的一个channel进行卷积计算，所以输出的特征矩阵就等于输入的特征矩阵。

特点：

卷积核channel=1
输入特征矩阵channel=卷积核个数=输出特征矩阵channel

(2)逐点卷积(Pointwise convolution)

点卷积（1x1卷积）将深度卷积的输出特征图进行线性组合后再输出，使得最后的输出特征都包含每个输入特征，即将深度卷积输出的不关联的特征图关联起来。

点卷积将标准卷积层中卷积核大小换成1x1即可得到，其主要作用就是对特征图进行升维和降维。

(3)计算量对比

图为标准卷积与深度可分离卷积对应的计算量

其中 $D_{F}$ 为输入Feature Map的宽度和高度（假设输入特征图与使用的卷积核均为正方形），

M为输入通道数 / 输入Feature Map深度），N为输出通道数， $D_{K}$ 为卷积核大小。

然后引入计算量的定义：

参数数量（params）：关系到模型大小，单位通常为M，通常参数用 float32 表示，所以模型大小是参数数量的 4 倍。

FLOPS(floating point operations per second)：注意全大写，指每秒浮点运算次数，理解为计算速度，是一个衡量硬件性能的指标。

FLOPs(floating point operations)：注意s小写（s表复数），指浮点运算数，理解为计算量。

可以用来衡量算法/模型的复杂度。这关系到算法速度，大模型的单位通常为 G，小模型单位通常为 M。

注意在深度学习中，我们用FLOPs，也就是计算量，用来衡量算法/模型的复杂度。

不同神经网络层的参数数量和计算量估计方法：

Conv2d标准卷积层：
Input: H ∗ W ∗ N
Output:H ∗ W ∗ M
Filters: K ∗ K
==>
Params: K ∗ K ∗ M ∗ N
FLOPs: H ∗ W ∗ K ∗ K ∗ M ∗ N

FC全连接层：
Input: N
Output: M
==>
Params: M ∗ N
FLOPs: M ∗ N

Depthwise conv2d深度卷积:
Input: H ∗ W ∗ M
Output: H ∗ W ∗ M
Filters: K ∗ K
==>
Params: K ∗ K ∗ M
FLOPs: H ∗ W ∗ K ∗ K ∗ M

Pointwise conv2d点卷积：
将标准卷积层中卷积核大小换成1x1即可得到
Params: M ∗ N
FLOPs: H ∗ W ∗ M ∗ N

从上面我们可以得到，使用深度可分离卷积后得到的总计算量为DF*DF*DK*DK + DF*DF*M*N。

将深度可分离卷积的总计算量与标准卷积相比较，可以得到下面式子，

深度可分离卷积的计算量是标准卷积 的 1/N +1/( Dk * Dk)倍。通常情况下，N (输出通道）远大于卷积核尺寸，故1/N +1/( Dk * Dk) 近似等于 1/( Dk * Dk)。

当卷积核的大小为Dk =3 时，深度可分离卷积的计算量约为标准卷积的1/9倍。

2.MobileNet v1网络结构

(1)核心层（深度可分离卷积层）

上图左边是标准卷积层，右边是V1的卷积层。V1的卷积层，首先使用3×3的深度卷积提取特征，接着是BN层、ReLU层，这里的第一个ReLU是指的ReLU6，在之后是逐点卷积，BN和ReLU。这也很符合深度可分离卷积，将左边的标准卷积拆分成右边的一个深度卷积和一个逐点卷积。

Relu6（抑制其最大值）：

公式：当 x > 6时，其导数也为0。

主要是为了在移动端float16的低精度的时候，也能有很好的数值分辨率，如果对ReLu的输出值不加限制，那么输出范围就是0到正无穷，而低精度的float16无法精确描述其数值，带来精度损失。

ReLU和ReLU6图表对比：

(2)宽度因子和分辨率因子

MobileNetV1还使用宽度因子和分辨率因子进一步减小模型参数数量和计算量，当然，在一定程度上会降低模型精确度。

宽度因子Width Multiplier(α):
α ∈ ( 0 , 1 ]作用于通道数量上，通常取1, 0.75, 0.5和0.25

由上式可得，模型计算量和参数量大约降低 $a^{2}$

分辨率因子Resolution Multiplier(ρ)
ρ ∈ ( 0 , 1 ] 作用于输入图像上，通常取为224,192,160和128（常隐式表示），作用后模型大小，参数量不变，计算量降低为 $\rho ^{2}$

(3)网络结构

MobileNetV1 网络结构如下图所示，其中 Conv 表示普通卷积，Conv dw 表示 DW 卷积操作。

整个MobileNet v1网络除了平均池化层和softmax输出层外，共28层，其中深度卷积层有13层。

第1层为3x3的标准卷积，s2进行下采样。接下来26层为核心层结构（深度可分离卷积层），并且其中的部分深度卷积会利用s2进行下采样。最后采用平均池化层将feature变成1x1，根据预测类别大小全连接层加softmax层输出

除全连接层不使用激活函数，而使用softmax进行分类之外，其他所有层都使用BatchNorm和ReLU。

MobileNet v1网络有一个缺点：DW 卷积核很容易废掉，即卷积核参数大部分为 0。

3.创新点

(1)MobileNetV1提出了深度可分离卷积的概念，替代了标准卷积操作，大大减少了参数数量和计算量。

(2)提出两个超参数宽度因子和分辨率因子,可根据现实情况需求，调整输入输出通道数量和输入图像尺寸，可自由权衡精度与参数数量，计算量。

(3)使用BN加快模型收敛速度，提高模型精度和泛化能力。

二、MobileNet v2

在MobileNet v1的实际训练过程中，深度卷积时卷积核特别容易废掉，即训练完成后卷积核参数是大部分为0。

在 2018 年 googleNet 提出了 v2 版本的 mobileNet，v2认为是Relu函数造成这样的原因：在输入维度是2，3时，输出和输入相比丢失了较多信息；但是在输入维度是15到30时，输出则保留了输入的较多信息。如下图所示，

所以在使用Relu函数时，当输入的维度较低时，会丢失较多信息，因此我们这里可以想到两种思路，一是把Relu激活函数替换成别的，二是通过升维将输入的维度变高。

1.Linear Bottlenecks

从v1可知，深度可分离卷积层的构成：首先是一个3x3的深度卷积，其次是BN、Relu层，接下来是1x1的逐点卷积，最后又是BN和Relu层。

既然是ReLU导致的信息损耗，将ReLU替换成线性激活函数。但是需要注意的是，并不是将所有的Relu激活都换成了线性激活，而是将最后一个ReLU6变成线性激活函数，如下图。变换后的模块称为Linear Bottlenecks。

线性瓶颈(Linear Bottlenecks) 在高维空间上，如 ReLU 这种激活函数能有效增加特征的非线性表达，但是仅限于高维空间中，如果降低维度，再使用 ReLU 则会破坏特征。

因此在 mobileNets V2 中提出了 Linear Bottlenecks 结构，在执行了降维的卷积层后面，不再加入类似 ReLU 等的激活函数进行非线性转化，这样做的目的是尽可能的不造成信息的丢失。

2.Inverted Residuals

Inverted Residuals(倒残差结构)，通过下图可以看出，左侧为ResNet中的残差结构，其结构为1x1卷积降维--->3x3卷积--->1x1卷积升维；右侧为MobileNetV2中的倒残差结构，其结构为1x1卷积升维--->3x3DW卷积--->1x1卷积降维。V2先使用1x1卷积进行升维的原因是高维信息通过ReLU激活函数后丢失的信息更少。

倒残差结构(Inverted residual) 在 ResNet 为了构建更深的网络，提出了 ResNet 的另一种形式，一个 bottleneck 由一个 1 x 1 卷积(降维)，3 x 3 卷积和 1 x 1 卷积(升维)构成。

在 MobileNet 中，DW 卷积的层数是输入通道数，本身就比较少，如果跟残差网络中 bottleneck 一样，先压缩，后卷积提取，可得到特征就太少了。采取了一种逆向的方法—先升维，卷积，再降维。

ResNet网络：残差结构是先用1*1卷积降维，3x3卷积，1x1卷积升维，两头大中间小。

MobileNet v2：残差结构是先用1*1卷积升维，dw卷积，1x1卷积降维，两头小中间大。

所以MobileNet v1和v2的block如下：

左边是v1的block，没有Shortcut并且带最后的ReLU6。

右边是v2的加入了1×1升维，引入Shortcut并且将最后的ReLU改为Linear。步长为1时，先进行1×1卷积升维，再进行深度卷积提取特征，再通过Linear的逐点卷积降维。将input与output相加，形成残差结构。

步长为2时，因为input与output的尺寸不符，因此不添加shortcut结构，其余均一致。当stride=1且输入特征矩阵与输出特征矩阵shape相同时才有shortcut链接。

3.MobileNet v2网络结构

MobileNet v2网络结构如下图所示，其中 conv2d表示普通卷积，bottleneck表示右边的block。

上图是MobileNet v2网络的结构表，其中t代表的是扩展因子（倒残差结构中第一个1x1卷积的扩展因子），c代表输出特征矩阵的channel，n代表倒残差结构重复的次数，s代表步距（注意：当有多个bottleneck, s只针对第一个bottleneck这里的步距只是针对重复n次的第一层倒残差结构，后面的都默认为1）。

三、MobileNet v3

MobileNetV3，是谷歌在2019年3月21日提出的网络架构。首先，引入眼帘的是这篇文章的标题"Searching for MobileNetV3"，“searching”一词就把V3的论文的核心观点展示了出来——用神经结构搜索（NAS）来完成V3参数的设计。

想较于之前的网络，不管是VGG、ResNet、MobileNetV1、MobileNetV2，网络结构都是我们自己手动去设计的，如网络的层数、卷积核大小、步长等等参数都需要自己设置。而NAS通过计算机来实现最优的参数设定，通过比较不同参数的网络模型效果，从而选择最优的参数设置。但是这也对计算机的性能要求也特别的高。

主要特点：

①论文推出两个版本：Large 和 Small，分别适用于不同的场景。

②使用NetAdapt算法获得卷积核和通道的最佳数量。

③继承V1的深度可分离卷积。

④继承V2的具有线性瓶颈的残差结构。

⑤引入SE通道注意力结构。

⑥使用了一种新的激活函数h-swish(x)代替Relu6，h的意思表示hard。

⑦使用了Relu6(x + 3)/6来近似SE模块中的sigmoid。

⑧修改了MobileNetV2后端输出head。

1.SE模块

引入SE模块，主要为了利用结合特征通道的关系来加强网络的学习能力。它首先是将特征图的每个通道都进行平均池化，然后进行两个全连接层得到一个输出结果，这个结果会和原始的特征图进行相乘，得到新的特征图。

网络的左半部分是一个传统的卷积变换，忽略掉这一部分并不会影响我们的SE模块的理解。直接看一下后半部分，其中 U 是一个 W×H×C 的Feature Map， (W,H) 是图像的尺寸， C是图像的通道数。

首先是Fsq(⋅) （Squeeze操作），顺着空间维度来进行特征压缩，将每个二维的特征通道变成一个 1×1×C 的特征向量，特征向量的值由U确定。这个实数某种程度上具有全局的感受野，并且输出的维度和输入的特征通道数相匹配。它表征着在特征通道上响应的全局分布，而且使得靠近输入的层也可以获得全局的感受野，这一点在很多任务中都是非常有用的。

Squeeze部分的作用是获得Feature Map U的每个通道的全局信息嵌入（特征向量）。在SE block中，这一步通过VGG中引入的Global Average Pooling（GAP）实现的，即通过求每个通道C Feature Map的平均值。

其次是 Excitation 操作，它是一个类似于循环神经网络中门的机制。通过参数 w 来为每个特征通道生成权重，其中参数 w 被学习用来显式地建模特征通道间的相关性。

最后是一个 Reweight 的操作，将 Excitation 的输出的权重看做是经过特征选择后的每个特征通道的重要性，然后通过乘法逐通道加权到先前的特征上，完成在通道维度上的对原始特征的重标定。

以在Mobilenetv3中的应用为例进行理解，

首先下图左上角表示为四个通道的特质图，经平均池化后得到左下角的图；再次经过两次全连接层后，转化成了右下角的图，最后用右下角的0.5、0.3，0.1，0.2（Conv1每个channel的权值系数）分别乘原始的特质图，则得到最终的右上角的图。

2.耗时层结构

在MobileNet v3中，作者重新设计耗时层结构，首先减少第一个卷积层卷积核个数，从原来的32个变为16个，使用ReLU或者swich，其准确率几乎相同，并且节省了2ms和1000万madds。

其次，精简了Last Stage。Original Last Stage是通过NAS算出来的，但最后实际测试发现Efficient Last Stage结构可以在不损失精度情况下去除一些多余的层，如下图所示，

移除之前的瓶颈层连接，进一步降低网络参数。可以有效降低11%的推理耗时，而性能几乎没有损失。

3.ReLu6和h-swish激活函数

在v1和v2版本中用到的Relu激活函数是Relu6激活函数，ReLu6激活函数如下图所示，在ReLU的基础上加了最大值6进行限制。

v3版本使用的激活函数为h-swish,其图像和表达式如下图所示：图中包括了一些其他相关的一些函数表达式及图像。

采用h-swish(hardswish)，计算速度相对较快，有利于量化 。

h-swish是基于swish的改进，swish最早是在谷歌大脑2017的论文Searching for Activation functions所提出。

swish论文的作者认为，Swish具备无上界有下界、平滑、非单调的特性。并且Swish在深层模型上的效果优于ReLU。仅仅使用Swish单元替换ReLU就能把MobileNet,NASNetA在 ImageNet上的top-1分类准确率提高0.9%，Inception-ResNet-v的分类准确率提高0.6%。

v3也利用swish当作为ReLU的替代时，它可以显著提高神经网络的精度。但是呢，作者认为这种非线性激活函数虽然提高了精度，但在嵌入式环境中，是有不少的成本的。原因就是在移动设备上计算sigmoid函数是非常明智的选择。所以提出了h-swish。

可以用一个近似函数来逼近swish，作者选择的是基于ReLU6，作者认为几乎所有的软件和硬件框架上都可以使用ReLU6的优化实现。其次，它能在特定模式下消除了由于近似sigmoid的不同实现而带来的潜在的数值精度损失。

作者认为随着网络的深入，应用非线性激活函数的成本会降低，能够更好的减少参数量。作者发现swish的大多数好处都是通过在更深的层中使用它们实现的。因此，在V3的架构中，只在模型的后半部分使用h-swish(HS)。

4.网络总体结构

MobileNet v3特有的block结构如下，

当stride=1且输入特征矩阵与输出特征矩阵shape相同时才有shortcut链接。

作者针对不同需求，通过NAS得到两种结构，一个是MobilenetV3-Large，结构如下图：

①Input表示输入尺寸
②Operator中的NBN表示不使用BN，最后的conv2d 1x1相当于全连接层的作用
③exp size表示bottleneck中的第一层1x1卷积升维，维度升到多少(第一个bottleneck没有1x1卷积升维操作)
④out表示bottleneck输出的channel个数
⑤SE表示是否使用SE模块

⑥NL表示使用何种激活函数，HS表示HardSwish，RE表示ReLu
⑦s表示步长(s=2，长宽变为原来一半)

四、MobileNet网络实现

1.构建MobileNet网络

MobileNet网络Inverted Residuals(倒残差结构)如下所示：


# 倒残差结构
class InvertedResidual(nn.Module):
    def __init__(self, in_channel, out_channel, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        # expand_ratio：扩展因子
        hidden_channel = in_channel * expand_ratio  # output: h x w x （tk=hidden_channel)
        # 是否使用shortcut :当stride=1而且输入特征矩阵和输出特征矩阵shape相同时才有shortcut
        self.use_shortcut = (stride == 1 and in_channel == out_channel)
 
        layers = []
        if expand_ratio != 1:
            # 1x1 pointwise conv
            layers.append(ConvBNReLU(in_channel, hidden_channel, kernel_size=1))
        layers.extend([
            # 3x3 depthwise conv
            ConvBNReLU(hidden_channel, hidden_channel, stride=stride, groups=hidden_channel),
            # 1x1 pointwise conv(linear) 这里激活函数为linear(y=x)，所以不做任何处理
            nn.Conv2d(hidden_channel, out_channel, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channel),
        ])
 
        self.conv = nn.Sequential(*layers)
 
    def forward(self, x):
        if self.use_shortcut:
            return x + self.conv(x)
        else:
            return self.conv(x)

依据论文中表格网络配置MobileNetV2


# 将ch调整为divisor最近的的整数倍
def _make_divisible(ch, divisor=8, min_ch=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    """
    if min_ch is None:
        min_ch = divisor
    new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_ch < 0.9 * ch:
        new_ch += divisor
    return new_ch
 
 
# 执行流程：Conv-->BN-->ReLU6
class ConvBNReLU(nn.Sequential):
    # groups=1:普通卷积；groups = in_channel：DW卷积
    def __init__(self, in_channel, out_channel, kernel_size=3, stride=1, groups=1):
        # kernel_size=3时padding=1，kernel_size=1时padding=0
        padding = (kernel_size - 1) // 2
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_channel, out_channel, kernel_size, stride, padding, groups=groups, bias=False),
            nn.BatchNorm2d(out_channel),
            nn.ReLU6(inplace=True)
        )
 
 
 
class MobileNetV2(nn.Module):
    # alpha:宽度因子，卷积核个数的倍率，也就是输出通道数
    def __init__(self, num_classes=1000, alpha=1.0, round_nearest=8):
        super(MobileNetV2, self).__init__()
        block = InvertedResidual
        # 第一层所使用卷积核的个数，_make_divisible：将卷积核个数调整为_make_divisible的整数倍
        input_channel = _make_divisible(32 * alpha, round_nearest)
        last_channel = _make_divisible(1280 * alpha, round_nearest)
 
        # 查表可得下面的配置
        inverted_residual_setting = [
            # t(扩展因子), c(输出特征矩阵深度channel), n(bottleneck的重复次数), s(步矩，针对第一层，其他为1)
            [1, 16, 1, 1],   # bottleneck1
            [6, 24, 2, 2],   # bottleneck2
            [6, 32, 3, 2],   # bottleneck3
            [6, 64, 4, 2],   # bottleneck4
            [6, 96, 3, 1],   # bottleneck5
            [6, 160, 3, 2],  # bottleneck6
            [6, 320, 1, 1],  # bottleneck7
        ]
 
        features = []
        # conv1 layer
        features.append(ConvBNReLU(3, input_channel, stride=2))
        # building inverted residual residual blockes
        for t, c, n, s in inverted_residual_setting:
            # 调整输出后的channel
            output_channel = _make_divisible(c * alpha, round_nearest)
            # 重复n次bottleneck
            for i in range(n):
                # i=0，为第一层，则strider=s，其余stride=1
                stride = s if i == 0 else 1
                features.append(block(input_channel, output_channel, stride, expand_ratio=t))
                input_channel = output_channel  # 更新，作为下一层的输入
        # building last several layers
        features.append(ConvBNReLU(input_channel, last_channel, 1))
        # combine feature layers
        self.features = nn.Sequential(*features)
 
        # building classifier 分类器
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # 自适应平均池化下采样
        self.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(last_channel, num_classes)
        )
 
        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)
 
    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

MobileNetV3的InvertedResidual结构如下：

注意力机制SE模块配置：


# SE模块即注意力机制模块
class SqueezeExcitation(nn.Module):
    def __init__(self, input_c: int, squeeze_factor: int = 4):  # squeeze_factor 第一个全连接层是输入特征矩阵的1/4
        super(SqueezeExcitation, self).__init__()
        # 将squeeze_c调整为离8最近的整数倍
        squeeze_c = _make_divisible(input_c // squeeze_factor, 8)
        self.fc1 = nn.Conv2d(input_c, squeeze_c, 1)  # 使用卷积层代替全连接层
        self.fc2 = nn.Conv2d(squeeze_c, input_c, 1)
 
    def forward(self, x: Tensor) -> Tensor:
        # 自适应平均池化操作：将特征矩阵调整为1x1的大小
        scale = F.adaptive_avg_pool2d(x, output_size=(1, 1))
        scale = self.fc1(scale)
        scale = F.relu(scale, inplace=True)
        scale = self.fc2(scale)
        scale = F.hardsigmoid(scale, inplace=True)
        return scale * x  # 将输出的权重，通过乘法逐通道加权到先前的特征上，完成在通道维度上的对原始特征的重标定。

InvertedResidual结构：


# 将ch调整为divisor最近的的整数倍
def _make_divisible(ch, divisor=8, min_ch=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    """
    if min_ch is None:
        min_ch = divisor
    new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_ch < 0.9 * ch:
        new_ch += divisor
    return new_ch
 
 
# 处理流程Conv-->BN-->activation
class ConvBNActivation(nn.Sequential):
    def __init__(self,
                 in_planes: int,
                 out_planes: int,
                 kernel_size: int = 3,
                 stride: int = 1,
                 groups: int = 1,
                 # Optional[Callable[..., nn.Module]]类型注释表示一个可选的参数，
                 # 该参数可以是一个可调用的对象，这个可调用的对象可以接受任意数量和类型的参数，并且它的返回值应该是一个 nn.Module 类型的对象
                 # 通常用于描述神经网络模型构建函数的参数，以允许用户选择是否提供一个自定义的函数或方法来创建神经网络的一部分
                 norm_layer: Optional[Callable[..., nn.Module]] = None,
                 activation_layer: Optional[Callable[..., nn.Module]] = None):
        padding = (kernel_size - 1) // 2
        # 无norm_layer使用BatchNorm2d
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        # 无activation_layer使用ReLU6
        if activation_layer is None:
            activation_layer = nn.ReLU6
        super(ConvBNActivation, self).__init__(nn.Conv2d(in_channels=in_planes,
                                                         out_channels=out_planes,
                                                         kernel_size=kernel_size,
                                                         stride=stride,
                                                         padding=padding,
                                                         groups=groups,
                                                         bias=False),
                                               norm_layer(out_planes),
                                               activation_layer(inplace=True))
 
# MobilenetV3 InvertedResidual的结构
class InvertedResidual(nn.Module):
    def __init__(self,
                 cnf: InvertedResidualConfig,
                 norm_layer: Callable[..., nn.Module]):
        super(InvertedResidual, self).__init__()
 
        # stride参数检测
        if cnf.stride not in [1, 2]:
            raise ValueError("illegal stride value.")
 
        # shortcut分支
        self.use_res_connect = (cnf.stride == 1 and cnf.input_c == cnf.out_c)
 
        layers: List[nn.Module] = []
        activation_layer = nn.Hardswish if cnf.use_hs else nn.ReLU
 
        # expand：除了第一个bottleneck没有1x1卷积升维操作
        if cnf.expanded_c != cnf.input_c:
            layers.append(ConvBNActivation(cnf.input_c,
                                           cnf.expanded_c,
                                           kernel_size=1,
                                           norm_layer=norm_layer,
                                           activation_layer=activation_layer))
 
        # depthwise
        layers.append(ConvBNActivation(cnf.expanded_c,
                                       cnf.expanded_c,
                                       kernel_size=cnf.kernel,
                                       stride=cnf.stride,
                                       groups=cnf.expanded_c,  # 卷积核组数：DW卷积
                                       norm_layer=norm_layer,
                                       activation_layer=activation_layer))
        # SE模块
        if cnf.use_se:
            layers.append(SqueezeExcitation(cnf.expanded_c))
 
        # project：降维
        layers.append(ConvBNActivation(cnf.expanded_c,
                                       cnf.out_c,
                                       kernel_size=1,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Identity))  # Identity 线性激活y=x
 
        self.block = nn.Sequential(*layers)
        self.out_channels = cnf.out_c
        self.is_strided = cnf.stride > 1
 
    def forward(self, x: Tensor) -> Tensor:
        result = self.block(x)
        if self.use_res_connect:   # shortcut
            result += x
 
        return result

MobileNetV3网络配置：


# MobileNetV3网络模型结构
class MobileNetV3(nn.Module):
    def __init__(self,
                 inverted_residual_setting: List[InvertedResidualConfig],
                 last_channel: int,  # 倒数第二个全连接层
                 num_classes: int = 1000,
                 block: Optional[Callable[..., nn.Module]] = None,
                 norm_layer: Optional[Callable[..., nn.Module]] = None):
        super(MobileNetV3, self).__init__()
 
        # inverted_residual_setting参数检测：列表
        if not inverted_residual_setting:
            raise ValueError("The inverted_residual_setting should not be empty.")
        elif not (isinstance(inverted_residual_setting, List) and
                  all([isinstance(s, InvertedResidualConfig) for s in inverted_residual_setting])):
            raise TypeError("The inverted_residual_setting should be List[InvertedResidualConfig]")
 
        if block is None:
            block = InvertedResidual
 
        if norm_layer is None:
            norm_layer = partial(nn.BatchNorm2d, eps=0.001, momentum=0.01)
 
        layers: List[nn.Module] = []
 
        # building first layer第一个卷积层
        firstconv_output_c = inverted_residual_setting[0].input_c
        layers.append(ConvBNActivation(3,
                                       firstconv_output_c,
                                       kernel_size=3,
                                       stride=2,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Hardswish))
        # building inverted residual blocks
        for cnf in inverted_residual_setting:
            layers.append(block(cnf, norm_layer))
 
        # building last several layers最后几层：卷积+全连接
        lastconv_input_c = inverted_residual_setting[-1].out_c
        lastconv_output_c = 6 * lastconv_input_c  # 6倍关系
        layers.append(ConvBNActivation(lastconv_input_c,
                                       lastconv_output_c,
                                       kernel_size=1,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Hardswish))
        self.features = nn.Sequential(*layers)
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Sequential(nn.Linear(lastconv_output_c, last_channel),  # conv2d 1x1即全连接层
                                        nn.Hardswish(inplace=True),
                                        nn.Dropout(p=0.2, inplace=True),
                                        nn.Linear(last_channel, num_classes))
 
        # initial weights
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode="fan_out")
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)
 
    def _forward_impl(self, x: Tensor) -> Tensor:
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
 
        return x
 
    def forward(self, x: Tensor) -> Tensor:
        return self._forward_impl(x)

实例化MobileNetV3，并根据网络配置MobileNetV3-Large表格设计网络


# MobilenetV3-Large
def mobilenet_v3_large(num_classes: int = 1000,
                       reduced_tail: bool = False) -> MobileNetV3:
    """
    Constructs a large MobileNetV3 architecture from
    "Searching for MobileNetV3" .
    weights_link:
    https://download.pytorch.org/models/mobilenet_v3_large-8738ca79.pth
    Args:
        num_classes (int): number of classes
        reduced_tail (bool): If True, reduces the channel counts of all feature layers
            between C4 and C5 by 2. It is used to reduce the channel redundancy in the
            backbone for Detection and Segmentation.
    """
    width_multi = 1.0  # 阿尔法超参数
    bneck_conf = partial(InvertedResidualConfig, width_multi=width_multi)
    adjust_channels = partial(InvertedResidualConfig.adjust_channels, width_multi=width_multi)
 
    # 修改表格中的配置，为1和原论文中的一致
    reduce_divider = 2 if reduced_tail else 1
 
    # MobilenetV3-Large表格配置，在论文表格中有：每一个bneck结构配置
    inverted_residual_setting = [
        # input_c, kernel, expanded_c, out_c, use_se, activation, stride
        bneck_conf(16, 3, 16, 16, False, "RE", 1),
        bneck_conf(16, 3, 64, 24, False, "RE", 2),  # C1
        bneck_conf(24, 3, 72, 24, False, "RE", 1),
        bneck_conf(24, 5, 72, 40, True, "RE", 2),  # C2
        bneck_conf(40, 5, 120, 40, True, "RE", 1),
        bneck_conf(40, 5, 120, 40, True, "RE", 1),
        bneck_conf(40, 3, 240, 80, False, "HS", 2),  # C3
        bneck_conf(80, 3, 200, 80, False, "HS", 1),
        bneck_conf(80, 3, 184, 80, False, "HS", 1),
        bneck_conf(80, 3, 184, 80, False, "HS", 1),
        bneck_conf(80, 3, 480, 112, True, "HS", 1),
        bneck_conf(112, 3, 672, 112, True, "HS", 1),
        bneck_conf(112, 5, 672, 160 // reduce_divider, True, "HS", 2),  # C4
        bneck_conf(160 // reduce_divider, 5, 960 // reduce_divider, 160 // reduce_divider, True, "HS", 1),
        bneck_conf(160 // reduce_divider, 5, 960 // reduce_divider, 160 // reduce_divider, True, "HS", 1),
    ]
    last_channel = adjust_channels(1280 // reduce_divider)  # C5
 
    return MobileNetV3(inverted_residual_setting=inverted_residual_setting,
                       last_channel=last_channel,
                       num_classes=num_classes)
 
 
# mobilenet_v3_small
def mobilenet_v3_small(num_classes: int = 1000,
                       reduced_tail: bool = False) -> MobileNetV3:
    """
    Constructs a large MobileNetV3 architecture from
    "Searching for MobileNetV3" .
    weights_link:
    https://download.pytorch.org/models/mobilenet_v3_small-047dcff4.pth
    Args:
        num_classes (int): number of classes
        reduced_tail (bool): If True, reduces the channel counts of all feature layers
            between C4 and C5 by 2. It is used to reduce the channel redundancy in the
            backbone for Detection and Segmentation.
    """
    width_multi = 1.0
    bneck_conf = partial(InvertedResidualConfig, width_multi=width_multi)
    adjust_channels = partial(InvertedResidualConfig.adjust_channels, width_multi=width_multi)
 
    reduce_divider = 2 if reduced_tail else 1
 
    inverted_residual_setting = [
        # input_c, kernel, expanded_c, out_c, use_se, activation, stride
        bneck_conf(16, 3, 16, 16, True, "RE", 2),  # C1
        bneck_conf(16, 3, 72, 24, False, "RE", 2),  # C2
        bneck_conf(24, 3, 88, 24, False, "RE", 1),
        bneck_conf(24, 5, 96, 40, True, "HS", 2),  # C3
        bneck_conf(40, 5, 240, 40, True, "HS", 1),
        bneck_conf(40, 5, 240, 40, True, "HS", 1),
        bneck_conf(40, 5, 120, 48, True, "HS", 1),
        bneck_conf(48, 5, 144, 48, True, "HS", 1),
        bneck_conf(48, 5, 288, 96 // reduce_divider, True, "HS", 2),  # C4
        bneck_conf(96 // reduce_divider, 5, 576 // reduce_divider, 96 // reduce_divider, True, "HS", 1),
        bneck_conf(96 // reduce_divider, 5, 576 // reduce_divider, 96 // reduce_divider, True, "HS", 1)
    ]
    last_channel = adjust_channels(1024 // reduce_divider)  # C5
 
    return MobileNetV3(inverted_residual_setting=inverted_residual_setting,
                       last_channel=last_channel,
                       num_classes=num_classes)

2.加载数据集

这里使用花朵数据集，数据集制造和数据集使用的脚本的参考：Pytorch之AlexNet花朵分类_风间琉璃•的博客-CSDN博客

加载数据集和测试集，并进行相应的预处理操作。


    batch_size = 16
    epochs = 100
 
    data_transform = {
        "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.ToTensor(),
                                     transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]),
        "val": transforms.Compose([transforms.Resize(256),
                                   transforms.CenterCrop(224),
                                   transforms.ToTensor(),
                                   transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])}
 
    # 数据集根目录
    data_root = os.path.abspath(os.getcwd())
    print(os.getcwd())
    # 图片目录
    image_path = os.path.join(data_root, "data_set", "flower_data")
    print(image_path)
    assert os.path.exists(image_path), "{} path does not exit.".format(image_path)
 
    # 准备数据集
    train_dataset = datasets.ImageFolder(root=os.path.join(image_path, "train"),
                                         transform=data_transform["train"])
    train_num = len(train_dataset)
 
    validate_dataset = datasets.ImageFolder(root=os.path.join(image_path, "val"),
                                            transform=data_transform["val"])
    val_num = len(validate_dataset)
 
    # 定义一个包含花卉类别到索引的字典:雏菊,蒲公英,玫瑰,向日葵,郁金香
    # {'daisy':0, 'dandelion':1, 'roses':2, 'sunflower':3, 'tulips':4}
    # 获取包含训练数据集类别名称到索引的字典，这通常用于数据加载器或数据集对象中。
    flower_list = train_dataset.class_to_idx
    # 创建一个反向字典，将索引映射回类别名称
    cla_dict = dict((val, key) for key, val in flower_list.items())
    # 将字典转换为格式化的JSON字符串，每行缩进4个空格
    json_str = json.dumps(cla_dict, indent=4)
    # 打开名为 'class_indices.json' 的JSON文件，并将JSON字符串写入其中
    with open('class_indices.json', 'w') as json_file:
        json_file.write(json_str)
 
    batch_size = 32
    nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])  # number of workers
    print("using {} dataloader workers every process".format(nw))
 
    # 加载数据集
    train_loader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=batch_size, shuffle=True,
                                               num_workers=nw)
 
    validate_loader = torch.utils.data.DataLoader(validate_dataset,
                                                  batch_size=4, shuffle=False,
                                                  num_workers=nw)
 
    print("using {} images for training, {} images for validation.".format(train_num, val_num))

3.训练和测试模型

数据集预处理完成后，就可以进行网络模型的训练和验证。


    # create model
    net = MobileNetV2(num_classes=5)
 
    # load pretrain weights
    # download url: https://download.pytorch.org/models/mobilenet_v2-b0353104.pth
    model_weight_path = "./mobilenet_v2-pre.pth"
    assert os.path.exists(model_weight_path), "file {} dose not exist.".format(model_weight_path)
    pre_weights = torch.load(model_weight_path, map_location='cpu')
 
    # delete classifier weights
    # 遍历预训练权重字典（pre_weights），只保留那些与当前模型（net）中同名参数具有相同尺寸的键-值对，并将它们保存在pre_dict中
    pre_dict = {k: v for k, v in pre_weights.items() if net.state_dict()[k].numel() == v.numel()}
    # 将上一步筛选出的pre_dict中的权重加载到模型net中，strict=False表示允许加载不完全匹配的权重，可能会有一些不匹配的权重被忽略
    missing_keys, unexpected_keys = net.load_state_dict(pre_dict, strict=False)
 
    # 冻结了网络中的特征提取器（features）的权重，使其在训练过程中不再更新
    for param in net.features.parameters():
        param.requires_grad = False
 
    net.to(device)
 
    # define loss function
    loss_function = nn.CrossEntropyLoss()
 
    # construct an optimizer
    params = [p for p in net.parameters() if p.requires_grad]
    optimizer = optim.Adam(params, lr=0.0001)
 
    best_acc = 0.0
    save_path = './MobileNetV2.pth'
    train_steps = len(train_loader)
    for epoch in range(epochs):
        # train
        net.train()
        running_loss = 0.0
        train_bar = tqdm(train_loader, file=sys.stdout)
        for step, data in enumerate(train_bar):
            images, labels = data
            optimizer.zero_grad()
            logits = net(images.to(device))
            loss = loss_function(logits, labels.to(device))
            loss.backward()
            optimizer.step()
 
            # print statistics
            running_loss += loss.item()
 
            train_bar.desc = "train epoch[{}/{}] loss:{:.3f}".format(epoch + 1,
                                                                     epochs,
                                                                     loss)
 
        # validate
        net.eval()
        acc = 0.0  # accumulate accurate number / epoch
        with torch.no_grad():
            val_bar = tqdm(validate_loader, file=sys.stdout)
            for val_data in val_bar:
                val_images, val_labels = val_data
                outputs = net(val_images.to(device))
                # loss = loss_function(outputs, test_labels)
                predict_y = torch.max(outputs, dim=1)[1]
                acc += torch.eq(predict_y, val_labels.to(device)).sum().item()
 
                val_bar.desc = "valid epoch[{}/{}]".format(epoch + 1,
                                                           epochs)
        val_accurate = acc / val_num
        print('[epoch %d] train_loss: %.3f  val_accuracy: %.3f' %
              (epoch + 1, running_loss / train_steps, val_accurate))
 
        if val_accurate > best_acc:
            best_acc = val_accurate
            torch.save(net.state_dict(), save_path)
 
    print('Finished Training')

这里使用了官方的预训练权重，在其基础上训练自己的数据集。训练100epoch的准确率能到达90%左右。

五、实现图像分类

利用上述训练好的网络模型进行测试，验证是否能完成分类任务。


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
 
    data_transform = transforms.Compose(
        [transforms.Resize(256),
         transforms.CenterCrop(224),
         transforms.ToTensor(),
         transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
 
    # 加载图片
    img_path = 'rose.jpg'
    assert os.path.exists(img_path), "file: '{}' does not exist.".format(img_path)
    image = Image.open(img_path)
 
    # image.show()
    # [N, C, H, W]
    img = data_transform(image)
    # 扩展维度
    img = torch.unsqueeze(img, dim=0)
 
    # 获取标签
    json_path = 'class_indices.json'
    assert os.path.exists(json_path), "file: '{}' does not exist.".format(json_path)
    with open(json_path, 'r') as f:
        # 使用json.load()函数加载JSON文件的内容并将其存储在一个Python字典中
        class_indict = json.load(f)
 
    # 加载网络
    model = MobileNetV2(num_classes=5).to(device)
 
    # 加载模型文件
    weights_path = "./MobileNetV2.pth"
    assert os.path.exists(weights_path), "file: '{}' dose not exist.".format(weights_path)
    model.load_state_dict(torch.load(weights_path, map_location=device))
 
    model.eval()
    with torch.no_grad():
        # 对输入图像进行预测
        output = torch.squeeze(model(img.to(device))).cpu()
        # 对模型的输出进行 softmax 操作，将输出转换为类别概率
        predict = torch.softmax(output, dim=0)
        # 得到高概率的类别的索引
        predict_cla = torch.argmax(predict).numpy()
 
    res = "class: {}   prob: {:.3}".format(class_indict[str(predict_cla)], predict[predict_cla].numpy())
    draw = ImageDraw.Draw(image)
    # 文本的左上角位置
    position = (10, 10)
    # fill 指定文本颜色
    draw.text(position, res, fill='red')
    image.show()
    for i in range(len(predict)):
        print("class: {:10}   prob: {:.3}".format(class_indict[str(i)], predict[i].numpy()))

测试结果：

结束语

感谢阅读吾之文章，今已至此次旅程之终站 🛬。

吾望斯文献能供尔以宝贵之信息与知识也 🎉。

学习者之途，若藏于天际之星辰🍥，吾等皆当努力熠熠生辉，持续前行。

然而，如若斯文献有益于尔，何不以三连为礼？点赞、留言、收藏 - 此等皆以证尔对作者之支持与鼓励也 💞。

相关阅读:
Flink 读写 Ceph S3入门学习总结
 30.3.2 使用GRANT语句为用户授权
 【AGC】质量服务2-性能管理示例
 文字转图片，文字水印图片，合成图片，教你 Python 生成网站原创配图！
三相智能电表逆相序的原因及解决方法
 Linux笔记--文件内容的查阅与统计
 【ChatGPT】ChatGPT的自定义指令
 深度学习系列1——Pytorch 图像分类(LeNet)
PHP如何对二维数组(多维数组)进排序
 软件测试面试题--自用
原文地址：https://blog.csdn.net/qq_53144843/article/details/133420764