《动手学深度学习 PyTorch版》 (Dive into Deep Learning, PyTorch Edition) 7.7 Densely Connected Networks (DenseNet)


    7.7.1 From ResNet to DenseNet

    DenseNet can be viewed as a logical extension of ResNet.

    ResNet decomposes the function as $f(\boldsymbol{x}) = \boldsymbol{x} + g(\boldsymbol{x})$, i.e., a simple linear term plus a more complex nonlinear term.

    If we want to expand $f$ into more than two terms, DenseNet is one scheme for doing so. This is the main difference between DenseNet and ResNet.


    The name DenseNet comes from the "dense connections" between variables. The network is mainly composed of two parts:

    • Dense blocks: define how inputs and outputs are concatenated.

    • Transition layers: control the number of channels so that the model does not grow too complex.

    What does "dense connection" mean? The last layer is densely connected to all of the layers before it; the DenseNet output is the concatenation, i.e., it performs the mapping from $\boldsymbol{x}$ to its expansion:

    $$\boldsymbol{x} \to \left[\boldsymbol{x},\; f_1(\boldsymbol{x}),\; f_2([\boldsymbol{x}, f_1(\boldsymbol{x})]),\; f_3([\boldsymbol{x}, f_1(\boldsymbol{x}), f_2([\boldsymbol{x}, f_1(\boldsymbol{x})])]),\; \ldots\right]$$


    7.7.2 Dense Blocks

    import torch
    from torch import nn
    from d2l import torch as d2l
    
    def conv_block(input_channels, num_channels):
        return nn.Sequential(
            nn.BatchNorm2d(input_channels), nn.ReLU(),
            nn.Conv2d(input_channels, num_channels, kernel_size=3, padding=1))
    
    class DenseBlock(nn.Module):
        def __init__(self, num_convs, input_channels, num_channels):
            super(DenseBlock, self).__init__()
            layer = []
            for i in range(num_convs):
                layer.append(conv_block(  # input channels grow with each layer due to the dense connections
                    num_channels * i + input_channels, num_channels))
            self.net = nn.Sequential(*layer)
    
        def forward(self, X):
            for blk in self.net:
                Y = blk(X)
                X = torch.cat((X, Y), dim=1)  # concatenate each block's input and output along the channel dimension
            return X
    
    blk = DenseBlock(2, 3, 10)  # the output has 3 + 2*10 = 23 channels
    X = torch.randn(4, 3, 8, 8)
    Y = blk(X)
    Y.shape
    
    torch.Size([4, 23, 8, 8])
    

    7.7.3 Transition Layers

    Since every dense block increases the number of channels, the model would otherwise grow too complex. A transition layer is used to control model complexity: it reduces the number of channels with a $1\times 1$ convolution and halves the height and width with an average pooling layer of stride 2.

    def transition_block(input_channels, num_channels):
        return nn.Sequential(
            nn.BatchNorm2d(input_channels), nn.ReLU(),
            nn.Conv2d(input_channels, num_channels, kernel_size=1),
            nn.AvgPool2d(kernel_size=2, stride=2))
    
    blk = transition_block(23, 10)  # reduce the number of channels to 10
    blk(Y).shape
    
    torch.Size([4, 10, 4, 4])
    

    7.7.4 The DenseNet Model

    b1 = nn.Sequential(  # b1 is the same stem block as in the previous networks
        nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
        nn.BatchNorm2d(64), nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
    
    num_convs_in_dense_blocks = [4, 4, 4, 4]  # 4 dense blocks, each with 4 convolutional layers
    num_channels, growth_rate = 64, 32  # with growth rate 32, each dense block adds 4*32 = 128 channels
    blks = []
    for i, num_convs in enumerate(num_convs_in_dense_blocks):
        blks.append(DenseBlock(num_convs, num_channels, growth_rate))
        num_channels += num_convs * growth_rate  # the previous dense block's output channels become the next block's input channels
        if i != len(num_convs_in_dense_blocks) - 1:  # add a transition layer between dense blocks to halve the number of channels
            blks.append(transition_block(num_channels, num_channels // 2))
            num_channels = num_channels // 2
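    # with these settings the channel count evolves 64 -> 192 -> 96 -> 224 -> 112 -> 240 -> 120 -> 248,
    # which matches the per-layer shapes printed in the exercises below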
    
    net = nn.Sequential(
        b1, *blks,
        nn.BatchNorm2d(num_channels), nn.ReLU(),
        nn.AdaptiveAvgPool2d((1, 1)),  # finish with global average pooling and a fully connected layer
        nn.Flatten(),
        nn.Linear(num_channels, 10))
    

    7.7.5 Training the Model

    lr, num_epochs, batch_size = 0.1, 10, 256
    train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=96)
    d2l.train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())  # takes roughly fifteen minutes, run with care
    
    loss 0.140, train acc 0.948, test acc 0.914
    865.0 examples/sec on cuda:0
    


    Exercises

    (1) Why do we use average pooling rather than max pooling in the transition layer?

    Average pooling takes all features into account, whereas max pooling keeps only the most prominent one.

    If the transition layer kept only the most prominent features, some of the concatenated features could be lost.

    In practice, though, the difference appears to be small.
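    As a quick sanity check, one can swap the pooling operation in the transition layer and compare. Below is a minimal sketch (the helper name `transition_block_max` is ours, not from the book); to actually compare accuracy, rebuild `net` with it in place of `transition_block` and rerun `d2l.train_ch6` as in Section 7.7.5.

    def transition_block_max(input_channels, num_channels):
        # identical to transition_block except that max pooling replaces average pooling
        return nn.Sequential(
            nn.BatchNorm2d(input_channels), nn.ReLU(),
            nn.Conv2d(input_channels, num_channels, kernel_size=1),
            nn.MaxPool2d(kernel_size=2, stride=2))

    # same output shape as transition_block; only the pooling rule differs
    print(transition_block_max(23, 10)(torch.randn(4, 23, 8, 8)).shape)  # torch.Size([4, 10, 4, 4])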


    (2) One advantage of DenseNet is that it has fewer model parameters than ResNet. Why is that?

    X = torch.rand(size=(1, 1, 224, 224), device=d2l.try_gpu())
    for layer in net:
        X = layer(X)
        print(layer.__class__.__name__,'output shape:\t', X.shape)
    
    Sequential output shape:	 torch.Size([1, 64, 56, 56])
    DenseBlock output shape:	 torch.Size([1, 192, 56, 56])
    Sequential output shape:	 torch.Size([1, 96, 28, 28])
    DenseBlock output shape:	 torch.Size([1, 224, 28, 28])
    Sequential output shape:	 torch.Size([1, 112, 14, 14])
    DenseBlock output shape:	 torch.Size([1, 240, 14, 14])
    Sequential output shape:	 torch.Size([1, 120, 7, 7])
    DenseBlock output shape:	 torch.Size([1, 248, 7, 7])
    BatchNorm2d output shape:	 torch.Size([1, 248, 7, 7])
    ReLU output shape:	 torch.Size([1, 248, 7, 7])
    AdaptiveAvgPool2d output shape:	 torch.Size([1, 248, 1, 1])
    Flatten output shape:	 torch.Size([1, 248])
    Linear output shape:	 torch.Size([1, 10])
    

    We can see that the transition layers do a good job of keeping the number of output channels in check: with the same number of convolutional layers, DenseNet's channel count never exceeds 256, so each convolution has far fewer parameters than the wide layers in a ResNet.
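    To make the comparison concrete, the parameters can be counted directly. A minimal sketch, assuming the `net` defined in Section 7.7.4 above (the same loop can be applied to the ResNet-18 built in Section 7.6):

    # total number of learnable parameters in the DenseNet defined above
    densenet_params = sum(p.numel() for p in net.parameters())
    print(f'DenseNet parameters: {densenet_params:,}')
    # counting the Section 7.6 ResNet-18 the same way should give a noticeably larger number:
    # its later stages use wide (256- and 512-channel) 3x3 convolutions, while every convolution
    # here only has to produce growth_rate (= 32) channels from a bounded number of input channels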


    (3) One criticism of DenseNet is that it consumes a lot of memory or GPU memory.

    a. Is this really the case? Try changing the input shape to $224\times 224$ and look at the actual GPU memory consumption.
    
    b. Are there other ways to reduce the memory consumption? Would the framework need to change?
    
    net2 = nn.Sequential(
        b1, *blks,
        nn.BatchNorm2d(num_channels), nn.ReLU(),
        nn.AdaptiveAvgPool2d((1, 1)),  # finish with global average pooling and a fully connected layer
        nn.Flatten(),
        nn.Linear(num_channels, 10))
    
    lr, num_epochs, batch_size = 0.1, 10, 256
    train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=224)
    # d2l.train_ch6(net2, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())
    # do not run this: it immediately exhausts GPU memory
    # CUDA out of memory. Tried to allocate 294.00 MiB (GPU 0; 4.00 GiB total capacity; 2.48 GiB already allocated; 109.80 MiB free; 2.61 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
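    For (3a), instead of launching a full training run, one can measure the peak GPU memory of a single forward/backward pass at 224×224. A minimal sketch, assuming a CUDA device is available; the batch size of 32 and the variable names are illustrative:

    device = d2l.try_gpu()
    net2.to(device)
    torch.cuda.reset_peak_memory_stats(device)
    X_mem = torch.randn(32, 1, 224, 224, device=device)
    y_mem = torch.randint(0, 10, (32,), device=device)
    loss = nn.CrossEntropyLoss()(net2(X_mem), y_mem)
    loss.backward()
    # peak memory held by tensors during this forward/backward pass
    print(f'{torch.cuda.max_memory_allocated(device) / 2**20:.0f} MiB')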
    
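    For (3b): much of the memory goes to the intermediate activations stored for backpropagation, so one option that stays within PyTorch is gradient checkpointing, which recomputes activations during the backward pass instead of keeping them. A minimal sketch using `torch.utils.checkpoint` (the training loop is only hinted at; reducing the batch size or using mixed precision via `torch.cuda.amp` are other simple levers):

    from torch.utils.checkpoint import checkpoint_sequential

    def forward_with_checkpointing(model, X, segments=4):
        # split the nn.Sequential model into `segments` chunks; activations inside each chunk are
        # recomputed during backward instead of being stored, trading compute for memory
        X.requires_grad_()  # checkpointing needs an input that requires grad
        return checkpoint_sequential(model, segments, X)

    # usage sketch inside a training step (X_batch, y_batch drawn from train_iter):
    # logits = forward_with_checkpointing(net2, X_batch.to(device))
    # loss = nn.CrossEntropyLoss()(logits, y_batch.to(device))
    # loss.backward()
    # note: with recomputation, BatchNorm running statistics may be updated twice per step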

    (4) Implement the various DenseNet versions listed in Table 1 of the DenseNet paper.

    [Table 1 of the DenseNet paper: DenseNet architectures]

    def conv_block_121(input_channels, num_channels):
        return nn.Sequential(
            nn.BatchNorm2d(input_channels), nn.ReLU(),
            nn.Conv2d(input_channels, 4 * input_channels, kernel_size=1),  # 1x1 bottleneck following the paper (note: the paper's bottleneck width is 4 * growth_rate, i.e. 4 * num_channels here)
            nn.BatchNorm2d(4 * input_channels), nn.ReLU(),
            nn.Conv2d(4 * input_channels, num_channels, kernel_size=3, padding=1))
    
    class DenseBlock_121(nn.Module):
        def __init__(self, num_convs, input_channels, num_channels):
            super(DenseBlock_121, self).__init__()
            layer = []
            for i in range(num_convs):
                layer.append(conv_block_121(
                    num_channels * i + input_channels, num_channels))
            self.net = nn.Sequential(*layer)
    
        def forward(self, X):
            for blk in self.net:
                Y = blk(X)
                X = torch.cat((X, Y), dim=1)
            return X
    
    b1 = nn.Sequential(
        nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
        nn.BatchNorm2d(64), nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
    
    num_convs_in_dense_blocks_121 = [6, 12, 23, 16]
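    # note: the paper's DenseNet-121 uses [6, 12, 24, 16] layers per block; the 23 here is what produces the shapes printed below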
    num_channels, growth_rate = 64, 32
    blks_121 = []
    for i, num_convs in enumerate(num_convs_in_dense_blocks_121):
        blks_121.append(DenseBlock_121(num_convs, num_channels, growth_rate))
        num_channels += num_convs * growth_rate
        if i != len(num_convs_in_dense_blocks_121) - 1:
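            # note: conv_block_121 (which has no pooling) is used here in place of the paper's
            # transition layer (1x1 conv + 2x2 average pooling), so the spatial size never shrinks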
            blks_121.append(conv_block_121(num_channels, num_channels // 2))
            num_channels = num_channels // 2
    
    net3 = nn.Sequential(
        b1, *blks_121,
        nn.BatchNorm2d(num_channels), nn.ReLU(),
        nn.AdaptiveAvgPool2d((1, 1)),
        nn.Flatten(),
        nn.Linear(num_channels, 10))
    
    lr, num_epochs, batch_size = 0.1, 10, 64
    train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=96)
    # d2l.train_ch6(net3, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())
    # cannot run this at all: even with batch_size lowered to 64 it still runs out of GPU memory, so we only inspect the shapes
    # CUDA out of memory. Tried to allocate 90.00 MiB (GPU 0; 4.00 GiB total capacity; 2.49 GiB already allocated; 19.80 MiB free; 2.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
    
    X = torch.rand(size=(1, 1, 224, 224))  # even just checking the shapes takes about 6.5 seconds
    for layer in net3:
        X = layer(X)
        print(layer.__class__.__name__,'output shape:\t', X.shape)
    
    Sequential output shape:	 torch.Size([1, 64, 56, 56])
    DenseBlock_121 output shape:	 torch.Size([1, 256, 56, 56])
    Sequential output shape:	 torch.Size([1, 128, 56, 56])
    DenseBlock_121 output shape:	 torch.Size([1, 512, 56, 56])
    Sequential output shape:	 torch.Size([1, 256, 56, 56])
    DenseBlock_121 output shape:	 torch.Size([1, 992, 56, 56])
    Sequential output shape:	 torch.Size([1, 496, 56, 56])
    DenseBlock_121 output shape:	 torch.Size([1, 1008, 56, 56])
    BatchNorm2d output shape:	 torch.Size([1, 1008, 56, 56])
    ReLU output shape:	 torch.Size([1, 1008, 56, 56])
    AdaptiveAvgPool2d output shape:	 torch.Size([1, 1008, 1, 1])
    Flatten output shape:	 torch.Size([1, 1008])
    Linear output shape:	 torch.Size([1, 10])
    