• Spatial Pyramid Pooling Improvements: SPP / SPPF / ASPP / RFB / SPPCSPC



    Changelog: receptive-field annotations were added to the figures (before 9:33 AM, August 16, 2022) 🍀


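    1 Principles

    1.1 SPP (Spatial Pyramid Pooling)

    The SPP module was proposed by Kaiming He et al. in the 2015 paper "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition".

    SPP stands for Spatial Pyramid Pooling. It was designed mainly to solve two problems:

    1. It avoids the image distortion caused by cropping or scaling image regions to a fixed input size;
    2. It removes the repeated feature extraction that a convolutional network would otherwise perform over overlapping image regions: features are computed once for the whole image, which greatly speeds up candidate-box generation and reduces computational cost.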
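
    The first point can be illustrated with a minimal sketch of fixed-length pyramid pooling, assuming PyTorch. The class name `FixedLengthSPP` and the pyramid levels (1, 2, 4) are hypothetical choices for illustration, not values from this article or the YOLOv5 code. It shows that feature maps of different spatial sizes are pooled to vectors of the same length, so no cropping or scaling of the input is needed:

```python
import torch
import torch.nn as nn


class FixedLengthSPP(nn.Module):
    """Pool a feature map of any spatial size into a fixed-length vector.

    Hypothetical helper for illustration; the pyramid levels (1, 2, 4)
    are an assumption, not values taken from this article.
    """

    def __init__(self, levels=(1, 2, 4)):
        super().__init__()
        # One adaptive pooling layer per pyramid level: an n x n grid of bins.
        self.pools = nn.ModuleList([nn.AdaptiveMaxPool2d(n) for n in levels])

    def forward(self, x):
        # Each level yields (N, C, n, n); flatten and concatenate all levels.
        return torch.cat([p(x).flatten(1) for p in self.pools], dim=1)


spp = FixedLengthSPP()
small = spp(torch.randn(1, 8, 13, 17))
large = spp(torch.randn(1, 8, 40, 25))
print(small.shape, large.shape)  # both are (1, 8 * (1 + 4 + 16)) = (1, 168)
```

    Note that the YOLOv5-style SPP layer in this section is a different variant: it uses stride-1 max pooling, keeps the spatial size, and concatenates along the channel dimension.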


    import warnings

    import torch
    import torch.nn as nn

    # Conv is YOLOv5's Conv2d + BatchNorm2d + activation block from models/common.py


    class SPP(nn.Module):
        # Spatial Pyramid Pooling (SPP) layer https://arxiv.org/abs/1406.4729
        def __init__(self, c1, c2, k=(5, 9, 13)):
            super().__init__()
            c_ = c1 // 2  # hidden channels
            self.cv1 = Conv(c1, c_, 1, 1)
            self.cv2 = Conv(c_ * (len(k) + 1), c2, 1, 1)
            # stride 1 with padding k // 2 keeps the spatial size, so the outputs can be concatenated
            self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])
    
        def forward(self, x):
            x = self.cv1(x)
            with warnings.catch_warnings():
                warnings.simplefilter('ignore')  # suppress torch 1.9.0 max_pool2d() warning
                return self.cv2(torch.cat([x] + [m(x) for m in self.m], 1))
    

    1.2 SPPF (Spatial Pyramid Pooling - Fast)

    SPPF was proposed by the YOLOv5 author Glenn Jocher as a variant of SPP. It runs considerably faster than SPP, hence the name SPP-Fast.


    class SPPF(nn.Module):
        # Spatial Pyramid Pooling - Fast (SPPF) layer for YOLOv5 by Glenn Jocher
        def __init__(self, c1, c2, k=5):  # equivalent to SPP(k=(5, 9, 13))
            super().__init__()
            c_ = c1 // 2  # hidden channels
            self.cv1 = Conv(c1, c_, 1, 1)
            self.cv2 = Conv(c_ * 4, c2, 1, 1)
            self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
    
        def forward(self, x):
            x = self.cv1(x)
            with warnings.catch_warnings():
                warnings.simplefilter('ignore')  # suppress torch 1.9.0 max_pool2d() warning
                y1 = self.m(x)
                y2 = self.m(y1)
                return self.cv2(torch.cat((x, y1, y2, self.m(y2)), 1))
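
    The code comment above says SPPF with k=5 is equivalent to SPP with k=(5, 9, 13). A minimal check of that claim, assuming PyTorch and leaving out the Conv layers (the helper `pool` and the tensor size are illustrative): applying a 5x5 stride-1 max pool one, two, and three times in series gives the same result as single 5x5, 9x9, and 13x13 pools, which is why SPPF can reuse one small kernel instead of running three large ones in parallel.

```python
import torch
import torch.nn as nn


def pool(k):
    # Stride-1 max pooling with "same" padding, as used in SPP and SPPF.
    return nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)


x = torch.randn(2, 4, 20, 20)

# SPPF path: one 5x5 pool applied serially.
m5 = pool(5)
y1 = m5(x)
y2 = m5(y1)
y3 = m5(y2)

# SPP path: three parallel pools with kernels 5, 9, 13.
print(torch.equal(y1, pool(5)(x)))   # True
print(torch.equal(y2, pool(9)(x)))   # True
print(torch.equal(y3, pool(13)(x)))  # True
```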
    

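    1.3 ASPP (Atrous Spatial Pyramid Pooling)

    Inspired by SPP, the semantic segmentation model DeepLabv2 introduced the ASPP module (Atrous Spatial Pyramid Pooling), which uses several parallel atrous (dilated) convolution layers with different sampling rates. The features extracted at each sampling rate are processed further in separate branches and then fused to produce the final result. By using different dilation rates, the module builds convolution kernels with different receptive fields and so captures multi-scale object information. The structure is fairly simple:

    [Figure: ASPP structure]

    ASPP was introduced in DeepLab and refined in later DeepLab versions, for example by adding BN layers and depthwise separable convolutions, but the basic idea is unchanged.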
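
    To make the link between dilation rate and receptive field concrete, here is a small sketch in plain Python. The helper names `effective_kernel` and `conv_out_size` are hypothetical; the output-size formula is the standard one for a 2D convolution. For a 3x3 kernel, dilation rates 6, 12, and 18 cover 13, 25, and 37 pixels per side, and setting the padding equal to the dilation keeps the spatial size unchanged, which is what lets the branches be concatenated:

```python
def effective_kernel(k, d):
    # Side length covered by a k x k kernel with dilation d.
    return k + (k - 1) * (d - 1)


def conv_out_size(n, k, stride=1, padding=0, dilation=1):
    # Standard output-size formula for a convolution along one dimension.
    return (n + 2 * padding - dilation * (k - 1) - 1) // stride + 1


for d in (6, 12, 18):
    # 3x3 kernel, padding equal to the dilation rate, 20-pixel input side.
    print(d, effective_kernel(3, d), conv_out_size(20, 3, padding=d, dilation=d))
# prints:
# 6 13 20
# 12 25 20
# 18 37 20
```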

    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    # without BN version
    class ASPP(nn.Module):
        def __init__(self, in_channel=512, out_channel=256):
            super(ASPP, self).__init__()
            self.mean = nn.AdaptiveAvgPool2d((1, 1))  # (1, 1) is the output size (global average pooling)
            self.conv = nn.Conv2d(in_channel, out_channel, 1, 1)
            self.atrous_block1 = nn.Conv2d(in_channel, out_channel, 1, 1)
            # padding == dilation keeps the spatial size for a 3x3 kernel
            self.atrous_block6 = nn.Conv2d(in_channel, out_channel, 3, 1, padding=6, dilation=6)
            self.atrous_block12 = nn.Conv2d(in_channel, out_channel, 3, 1, padding=12, dilation=12)
            self.atrous_block18 = nn.Conv2d(in_channel, out_channel, 3, 1, padding=18, dilation=18)
            self.conv_1x1_output = nn.Conv2d(out_channel * 5, out_channel, 1, 1)
    
        def forward(self, x):
            size = x.shape[2:]
    
            # image-level branch: global average pool, 1x1 conv, upsample back to the input size
            image_features = self.mean(x)
            image_features = self.conv(image_features)
            # F.upsample is deprecated; F.interpolate is its replacement
            image_features = F.interpolate(image_features, size=size, mode='bilinear', align_corners=False)
    
            atrous_block1 = self.atrous_block1(x)
            atrous_block6 = self.atrous_block6(x)
            atrous_block12 = self.atrous_block12(x)
            atrous_block18 = self.atrous_block18(x)
    
            net = self.conv_1x1_output(torch.cat([image_features, atrous_block1, atrous_block6,
                                                  atrous_block12, atrous_block18], dim=1))
            return net
    

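    1.4 RFB (Receptive Field Block)

    The RFB module was proposed in "Receptive Field Block Net for Accurate and Fast Object Detection" (ECCV 2018). The paper's starting point is to simulate the receptive fields of human vision in order to strengthen the network's feature extraction. Structurally, RFB borrows from Inception and mainly adds dilated convolutions on top of it, which effectively enlarges the receptive field.

    [Figure: the architectures of RFB and RFB-s. RFB-s is used to mimic the smaller pRFs (population receptive fields) found in shallow human retinotopic maps, using more branches with smaller kernels.]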

    class BasicConv(nn.Module):
    
        def __init__(self, in_planes, out_planes, kernel_size, stride=1, padding=0, dilation=1, groups=1, relu=True, bn=True):
            super(BasicConv, self).__init__()
            self.out_channels = out_planes
            if bn:
                self.conv = nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=padding, dilation=dilation, groups=groups, bias=False)
                self.bn = nn.BatchNorm2d(out_planes, eps=1e-5, momentum=0.01, affine=True)
                self.relu = nn.ReLU(inplace=True) if relu else None
            else:
                self.conv = nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=padding, dilation=dilation, groups=groups, bias=True)
                self.bn = None
                self.relu = nn.ReLU(inplace=True) if relu else None
    
        def forward(self, x):
            x = self.conv(x)
            if self.bn is not None:
                x = self.bn(x)
            if self.relu is not None:
                x = self.relu(x)
            return x
    
    
    class BasicRFB(nn.Module):
    
        def __init__(self, in_planes, out_planes, stride=1, scale=0.1, map_reduce=8, vision=1, groups=1):
            super(BasicRFB, self).__init__()
            self.scale = scale
            self.out_channels = out_planes
            inter_planes = in_planes // map_reduce
    
            self.branch0 = nn.Sequential(
                BasicConv(in_planes, inter_planes, kernel_size=1, stride=1, groups=groups, relu=False),
                BasicConv(inter_planes, 2 * inter_planes, kernel_size=(3, 3), stride=stride, padding=(1, 1), groups=groups),
                BasicConv(2 * inter_planes, 2 * inter_planes, kernel_size=3, stride=1, padding=vision + 1, dilation=vision + 1, relu=False, groups=groups)
            )
            self.branch1 = nn.Sequential(
                BasicConv(in_planes, inter_planes, kernel_size=1, stride=1, groups=groups, relu=False),
                BasicConv(inter_planes, 2 * inter_planes, kernel_size=(3, 3), stride=stride, padding=(1, 1), groups=groups),
                BasicConv(2 * inter_planes, 2 * inter_planes, kernel_size=3, stride=1, padding=vision + 2, dilation=vision + 2, relu=False, groups=groups)
            )
            self.branch2 = nn.Sequential(
                BasicConv(in_planes, inter_planes, kernel_size=1, stride=1, groups=groups, relu=False),
                BasicConv(inter_planes, (inter_planes // 2) * 3, kernel_size=3, stride=1, padding=1, groups=groups),
                BasicConv((inter_planes // 2) * 3, 2 * inter_planes, kernel_size=3, stride=stride, padding=1, groups=groups),
                BasicConv(2 * inter_planes, 2 * inter_planes, kernel_size=3, stride=1, padding=vision + 4, dilation=vision + 4, relu=False, groups=groups)
            )
    
            self.ConvLinear = BasicConv(6 * inter_planes, out_planes, kernel_size=1, stride=1, relu=False)
            self.shortcut = BasicConv(in_planes, out_planes, kernel_size=1, stride=stride, relu=False)
            self.relu = nn.ReLU(inplace=False)
    
        def forward(self, x):
            x0 = self.branch0(x)
            x1 = self.branch1(x)
            x2 = self.branch2(x)
    
            out = torch.cat((x0, x1, x2), 1)
            out = self.ConvLinear(out)
            short = self.shortcut(x)
            out = out * self.scale + short
            out = self.relu(out)
    
            return out
    
    
    

    1.5 SPPCSPC

    This module is the SPP structure used in YOLOv7. On the COCO dataset it outperforms SPPF (this does not necessarily hold on other datasets).


    class SPPCSPC(nn.Module):
        # CSP https://github.com/WongKinYiu/CrossStagePartialNetworks
        def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5, k=(5, 9, 13)):
            super(SPPCSPC, self).__init__()
            c_ = int(2 * c2 * e)  # hidden channels
            self.cv1 = Conv(c1, c_, 1, 1)
            self.cv2 = Conv(c1, c_, 1, 1)
            self.cv3 = Conv(c_, c_, 3, 1)
            self.cv4 = Conv(c_, c_, 1, 1)
            self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])
            self.cv5 = Conv(4 * c_, c_, 1, 1)
            self.cv6 = Conv(c_, c_, 3, 1)
            self.cv7 = Conv(2 * c_, c2, 1, 1)
    
        def forward(self, x):
            x1 = self.cv4(self.cv3(self.cv1(x)))
            y1 = self.cv6(self.cv5(torch.cat([x1] + [m(x1) for m in self.m], 1)))
            y2 = self.cv2(x)
            return self.cv7(torch.cat((y1, y2), dim=1))
    
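    # Grouped SPPCSPC: after grouping, the parameter count and computation are close to the original; the effect on accuracy has not been tested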
    class SPPCSPC_group(nn.Module):
        def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5, k=(5, 9, 13)):
            super(SPPCSPC_group, self).__init__()
            c_ = int(2 * c2 * e)  # hidden channels
            self.cv1 = Conv(c1, c_, 1, 1, g=4)
            self.cv2 = Conv(c1, c_, 1, 1, g=4)
            self.cv3 = Conv(c_, c_, 3, 1, g=4)
            self.cv4 = Conv(c_, c_, 1, 1, g=4)
            self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])
            self.cv5 = Conv(4 * c_, c_, 1, 1, g=4)
            self.cv6 = Conv(c_, c_, 3, 1, g=4)
            self.cv7 = Conv(2 * c_, c2, 1, 1, g=4)
    
        def forward(self, x):
            x1 = self.cv4(self.cv3(self.cv1(x)))
            y1 = self.cv6(self.cv5(torch.cat([x1] + [m(x1) for m in self.m], 1)))
            y2 = self.cv2(x)
            return self.cv7(torch.cat((y1, y2), dim=1))
    

    2 Parameter Comparison

    Here I replaced the SPP module in yolov5s.yaml with each of the modules above:

    Model             Parameters   GFLOPs
    SPP               7225885      16.5
    SPPF              7235389      16.5
    ASPP              15485725     23.1
    BasicRFB          7895421      17.1
    SPPCSPC           13663549     21.7
    Grouped SPPCSPC   8355133      17.4
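
    The gap between SPPCSPC and its grouped variant comes from how grouping shrinks the convolution weights. A rough sketch in plain Python, assuming a bias-free convolution and ignoring BatchNorm parameters; the helper `conv_params` and the 512-channel example are illustrative and do not reproduce the totals in the table:

```python
def conv_params(c_in, c_out, k, groups=1):
    # Weight count of a bias-free 2D convolution: each output channel
    # only sees c_in / groups input channels.
    return (c_in // groups) * c_out * k * k


dense = conv_params(512, 512, 3)              # 3x3 conv, groups=1
grouped = conv_params(512, 512, 3, groups=4)  # same conv with g=4
print(dense, grouped)  # 2359296 589824
```

    With g=4, every grouped convolution keeps a quarter of its weights, which is consistent with the grouped model landing much closer to SPP and SPPF in parameter count.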

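    3 How to Apply the Changes

    Step 1: add the code for each module to common.py.
    Step 2: add the new class names in yolo.py.
    Step 3: modify the configuration file.
    The YOLOv5 configuration file is as follows: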

    # YOLOv5 🚀 by Ultralytics, GPL-3.0 license
    
    # YOLOv5 v6.0 backbone
    backbone:
      # [from, number, module, args]
      [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
       [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
       [-1, 3, C3, [128]],
       [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
       [-1, 6, C3, [256]],
       [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
       [-1, 9, C3, [512]],
       [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
       [-1, 3, C3, [1024]],
       [-1, 1, SPPF, [1024, 5]],  # 9
       #[-1, 1, ASPP, [1024]],  # 9
       #[-1, 1, SPP, [1024]],
       #[-1, 1, BasicRFB, [1024]],
       #[-1, 1, SPPCSPC, [1024]],
      ]
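
    Step 2 matters because the model parser in yolo.py only fills in the input channels and scales the output channels for module classes it knows about. The following is a simplified stand-in for that logic, not the actual yolo.py code: the function `resolve_args`, the `registered` set, and the width multiple of 0.5 are assumptions for illustration, and the real parser handles more cases.

```python
import math


def make_divisible(x, divisor=8):
    # Round up to the nearest multiple of divisor.
    return math.ceil(x / divisor) * divisor


def resolve_args(module_name, c_in, args, width_multiple, registered):
    """Hypothetical, simplified version of the channel handling in a YOLOv5-style parser."""
    if module_name in registered:
        # Known module: prepend the input channels and scale the output channels.
        c_out = make_divisible(args[0] * width_multiple, 8)
        return [c_in, c_out, *args[1:]]
    # Unknown module: the args from the YAML file are passed through unchanged.
    return list(args)


registered = {"Conv", "C3", "SPP", "SPPF", "SPPCSPC"}
print(resolve_args("SPPCSPC", 512, [1024], 0.5, registered))                       # [512, 512]
print(resolve_args("SPPCSPC", 512, [1024], 0.5, {"Conv", "C3", "SPP", "SPPF"}))  # [1024]
```

    In the second call the class name is not registered, so the module would be built with a single argument of 1024 instead of the expected (c1, c2) pair.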
    
    



    Reference: "Enlarging the Receptive Field: SPP, ASPP, RFB, PPM"

  • Original article: https://blog.csdn.net/weixin_43694096/article/details/126354660