• Deep Learning | PyTorch Deep Learning Practice (Chapters 10 & 11: CNN)


     10. CNN (Convolutional Neural Network): Basics


    First, an introduction:

    • 2D convolution: the convolution layer preserves the spatial structure of the input
    • Key point: work out the input and output dimensions at every layer
    • Feature extraction: convolution layers and downsampling
    • Classifier: fully connected layers

            

            


    Introductory example: RGB images (raster images)

    • First, the instructor introduced the CCD camera model: a sensor built from photoresistors, whose resistance varies with light intensity, so that pixels can be captured at different brightness levels; a three-color image is captured with photoresistors of different sensitivities.
    • He also covered vector images (images described by primitives such as centers, edges, and fill information, as in PowerPoint, rather than sampled images).
    • Red, green, and blue channels
    • Take a patch of the image and convolve it: the channel count, height, and width may all change. The whole image is traversed, and each patch is convolved in turn (see the shape sketch below).
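
    A minimal shape sketch (the sizes are illustrative, not from the lecture): a 3-channel input goes through a 2D convolution, and the channel count, height, and width of the output all differ from the input.

    import torch

    # Illustrative input: batch = 1, C = 3 (RGB), 32 x 32 pixels
    x = torch.randn(1, 3, 32, 32)
    # 5x5 kernel, no padding, stride 1: H_out = (H_in + 2*padding - kernel_size) / stride + 1
    conv = torch.nn.Conv2d(in_channels=3, out_channels=10, kernel_size=5)
    y = conv(x)
    print(x.shape)   # torch.Size([1, 3, 32, 32])
    print(y.shape)   # torch.Size([1, 10, 28, 28])  -> C, H, W all changed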

            


    Another example:

    深度学习 | CNN卷积核与通道 - CSDN blog (a post on CNN kernels and channels)


     Implementation: A Simple Convolutional Neural Network

             

    • A single pooling layer instance is enough, because it has no weights; layers that do have weights need a separate instance for each use.
    • ReLU: nonlinear activation
    • Cross-entropy loss: do not apply an activation to the last layer! (See the sketch below.)
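
    A small sketch of why the last layer stays linear: torch.nn.CrossEntropyLoss already combines log-softmax and negative log-likelihood, so it expects raw logits (the values below are made up for illustration).

    import torch

    logits = torch.randn(4, 10)             # 4 samples, 10 classes: raw scores, no softmax applied
    target = torch.tensor([3, 0, 9, 1])     # ground-truth class indices
    criterion = torch.nn.CrossEntropyLoss() # applies log-softmax + NLL internally
    loss = criterion(logits, target)
    print(loss.item())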

                    

    Code implementation:

    import torch
    from torchvision import transforms
    from torchvision import datasets
    from torch.utils.data import DataLoader
    import torch.nn.functional as F
    import torch.optim as optim

    # 1. Data preparation
    batch_size = 64
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])
    train_dataset = datasets.MNIST(root='../dataset/mnist', train=True, download=True, transform=transform)
    train_loader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size)
    test_dataset = datasets.MNIST(root='../dataset/mnist', train=False, download=True, transform=transform)
    test_loader = DataLoader(test_dataset, shuffle=False, batch_size=batch_size)

    # 2. Build the model
    class Net(torch.nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.conv1 = torch.nn.Conv2d(1, 10, kernel_size=5)
            self.conv2 = torch.nn.Conv2d(10, 20, kernel_size=5)
            self.pooling = torch.nn.MaxPool2d(2)
            self.fc = torch.nn.Linear(320, 10)

        def forward(self, x):
            # x starts as (n, 1, 28, 28); after the conv/pool layers it is flattened to (n, 320)
            batch_size = x.size(0)
            x = self.pooling(F.relu(self.conv1(x)))
            x = self.pooling(F.relu(self.conv2(x)))
            x = x.view(batch_size, -1)  # flatten
            x = self.fc(x)
            return x

    model = Net()

    # 3. Loss function and optimizer
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

    # 4. Training and testing
    def train(epoch):
        running_loss = 0.0
        for batch_idx, data in enumerate(train_loader, 0):
            inputs, target = data
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, target)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
            if batch_idx % 300 == 299:  # print once every 300 iterations
                print('[%d , %5d] loss: %.3f' % (epoch + 1, batch_idx + 1, running_loss / 300))
                running_loss = 0.0

    def test():
        correct = 0
        total = 0
        with torch.no_grad():
            for data in test_loader:
                images, labels = data
                outputs = model(images)  # output is a matrix; take the index of the max in each row (the predicted class)
                _, predicted = torch.max(outputs.data, dim=1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()
        print('Accuracy on test set: %d %%' % (100 * correct / total))

    if __name__ == '__main__':
        for epoch in range(10):
            train(epoch)
            test()

    Experimental results:

            



    11. CNN (Convolutional Neural Network): Advanced

    The model designed in the basics chapter is similar to LeNet-5.

            


    Now let's look at some more complex architectures:

    11.1 GoogLeNet

    GoogLeNet is a complex network built by stacking blocks in sequence. To implement such a complex network while reducing code redundancy and avoiding rewriting the same functionality many times, a procedural language uses functions, and an object-oriented language such as Python uses classes.

    In a CNN, a Module (block) is used to wrap reusable code into a building brick that can be assembled wherever it is needed.

    GoogLeNet names the Module that is reused throughout its architecture Inception, which is also the English title of the movie 盗梦空间, meaning a dream within a dream, i.e. nesting.

            


    One way to construct an Inception Module:

             

    • Why build it this way?
      • When constructing a neural network, some hyperparameters are hard to decide in advance, for example the kernel size. Since you don't know which one will work best, use all of them, and let training find the best combination of convolutions.
      • GoogLeNet's design idea: put every candidate form into the block, and let the network learn how to weight each branch during training.
    • Concatenate: the tensors computed by the four branches are concatenated together.
    • GoogLeNet designs four parallel branches and requires them to keep the image width and height (W, H) identical; only the channel count C may differ. After each branch has gone through its convolutions and pooling, the W x H planes serve as the "gluing surface" and the outputs are concatenated along the C direction.
    • Average Pooling: mean pooling
    • A 1x1 convolution fuses information across channels; this is also called "network in network".
      • The 1x1 kernel: I used to think, superficially, that a kernel the size of a single pixel only serves to adjust the number of input versus output channels; an example from the instructor gave me a new view of it:
      • It speeds up computation. Its effect really is acceleration, and the reason is that an image passed through a 1x1 kernel has fewer channels, which cuts the input channel count, and hence the cost, of the following convolution layer (see the worked example after this list).
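
    A worked sketch of the saving (the channel counts and spatial size below are assumptions for illustration, in the spirit of the lecture's example, not exact figures from it):

    # Multiplication count for one convolution layer: k * k * H * W * C_in * C_out
    # Assumed input feature map: 192 channels, 28 x 28; target output: 32 channels.
    direct  = 5 * 5 * 28 * 28 * 192 * 32                                 # 120,422,400
    # With a 1x1 "bottleneck" down to 16 channels before the 5x5 convolution:
    reduced = 1 * 1 * 28 * 28 * 192 * 16 + 5 * 5 * 28 * 28 * 16 * 32     # 12,443,648
    print(direct, reduced, round(direct / reduced, 1))                   # roughly a 10x reduction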

    Inception block code implementation:

            

    Then concatenate them along the channel dimension:

            

    The four branch outputs can be put into a list and then joined with torch.cat along dim=1.

    Because our tensor layout is (batch, channel, width, height), the channel dimension is dim=1: indexing starts at zero, so C sits at position 1. A small sketch follows.
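
    The shapes below are illustrative; the channel counts match the InceptionA branches used later.

    import torch

    # Four branch outputs that share batch size, width, and height but not channels
    b1 = torch.randn(8, 16, 12, 12)
    b2 = torch.randn(8, 24, 12, 12)
    b3 = torch.randn(8, 24, 12, 12)
    b4 = torch.randn(8, 24, 12, 12)
    out = torch.cat([b1, b2, b3, b4], dim=1)  # concatenate along the channel dimension
    print(out.shape)                          # torch.Size([8, 88, 12, 12]) -> 16+24+24+24 = 88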

             


    MNIST dataset code implementation:

    The initial number of input channels is not hard-coded; it is a parameter of the constructor, so the input channel count can be specified when the module is instantiated.

    First comes a convolution layer (conv, max pooling, ReLU), then an InceptionA module (its output has 16+24+24+24 = 88 channels), then another convolution layer (conv, mp, ReLU), then another InceptionA module, and finally a fully connected layer (fc).

    The value 1408 can be obtained by inspecting x.shape after x = x.view(in_size, -1).


    It can also be read off from the network structure:

    The input size of the final linear layer, 1408, is derived from the output shape of the last InceptionA module (the second-to-last layer in the summary below). Its output shape is [-1, 88, 4, 4], where -1 stands for the batch size. Flattening this feature map therefore gives a one-dimensional vector of shape [-1, 88 * 4 * 4] = [-1, 1408].

    So the linear layer's input size is 1408: it takes the flattened feature vector as input and maps it to a vector over the 10 output classes.
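
    A quick way to verify the 1408 by hand, as a sketch (it assumes the Net and InceptionA classes from the listing below have already been defined): push a dummy MNIST-sized input through the layers before fc and print the shape.

    import torch
    import torch.nn.functional as F

    net = Net()                                # Net as defined in the listing below
    x = torch.randn(1, 1, 28, 28)              # dummy MNIST-sized input
    x = F.relu(net.mp(net.conv1(x)))
    x = net.incep1(x)
    x = F.relu(net.mp(net.conv2(x)))
    x = net.incep2(x)
    print(x.shape)                             # torch.Size([1, 88, 4, 4])
    print(x.view(1, -1).shape)                 # torch.Size([1, 1408])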

    import torch
    from torchvision import transforms
    from torchvision import datasets
    from torch.utils.data import DataLoader
    import torch.nn.functional as F
    import torch.optim as optim
    from torchvision import models
    from torchsummary import summary

    # 1. Data preparation
    batch_size = 64
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])
    train_dataset = datasets.MNIST(root='../dataset/mnist', train=True, download=True, transform=transform)
    train_loader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size)
    test_dataset = datasets.MNIST(root='../dataset/mnist', train=False, download=True, transform=transform)
    test_loader = DataLoader(test_dataset, shuffle=False, batch_size=batch_size)

    # 2. Build the model
    class InceptionA(torch.nn.Module):
        def __init__(self, in_channels):
            super(InceptionA, self).__init__()
            self.branch1x1 = torch.nn.Conv2d(in_channels, 16, kernel_size=1)

            self.branch5x5_1 = torch.nn.Conv2d(in_channels, 16, kernel_size=1)
            self.branch5x5_2 = torch.nn.Conv2d(16, 24, kernel_size=5, padding=2)

            self.branch3x3_1 = torch.nn.Conv2d(in_channels, 16, kernel_size=1)
            self.branch3x3_2 = torch.nn.Conv2d(16, 24, kernel_size=3, padding=1)
            self.branch3x3_3 = torch.nn.Conv2d(24, 24, kernel_size=3, padding=1)

            self.branch_pool = torch.nn.Conv2d(in_channels, 24, kernel_size=1)

        def forward(self, x):
            branch1x1 = self.branch1x1(x)

            branch5x5 = self.branch5x5_1(x)
            branch5x5 = self.branch5x5_2(branch5x5)

            branch3x3 = self.branch3x3_1(x)
            branch3x3 = self.branch3x3_2(branch3x3)
            branch3x3 = self.branch3x3_3(branch3x3)

            branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
            branch_pool = self.branch_pool(branch_pool)

            outputs = [branch1x1, branch5x5, branch3x3, branch_pool]
            return torch.cat(outputs, dim=1)  # 16 + 24 + 24 + 24 = 88 output channels

    class Net(torch.nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.conv1 = torch.nn.Conv2d(1, 10, kernel_size=5)
            self.conv2 = torch.nn.Conv2d(88, 20, kernel_size=5)
            self.incep1 = InceptionA(in_channels=10)
            self.incep2 = InceptionA(in_channels=20)
            self.mp = torch.nn.MaxPool2d(2)
            self.fc = torch.nn.Linear(1408, 10)

        def forward(self, x):
            in_size = x.size(0)
            x = F.relu(self.mp(self.conv1(x)))
            x = self.incep1(x)
            x = F.relu(self.mp(self.conv2(x)))
            x = self.incep2(x)
            x = x.view(in_size, -1)
            x = self.fc(x)
            return x

    model = Net()
    # summary(model, (1, 28, 28), device='cpu')

    # 3. Loss function and optimizer
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

    # 4. Training and testing
    def train(epoch):
        running_loss = 0.0
        for batch_idx, data in enumerate(train_loader, 0):
            inputs, target = data
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, target)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
            if batch_idx % 300 == 299:  # print once every 300 iterations
                print('[%d , %5d] loss: %.3f' % (epoch + 1, batch_idx + 1, running_loss / 300))
                running_loss = 0.0

    def test():
        correct = 0
        total = 0
        with torch.no_grad():
            for data in test_loader:
                images, labels = data
                outputs = model(images)  # output is a matrix; take the index of the max in each row (the predicted class)
                _, predicted = torch.max(outputs.data, dim=1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()
        print('Accuracy on test set: %d %%' % (100 * correct / total))

    if __name__ == '__main__':
        for epoch in range(10):
            train(epoch)
            test()

            Layer (type)               Output Shape         Param #
    ================================================================
                Conv2d-1           [-1, 10, 24, 24]             260
             MaxPool2d-2           [-1, 10, 12, 12]               0
                Conv2d-3           [-1, 16, 12, 12]             176
                Conv2d-4           [-1, 16, 12, 12]             176
                Conv2d-5           [-1, 24, 12, 12]           9,624
                Conv2d-6           [-1, 16, 12, 12]             176
                Conv2d-7           [-1, 24, 12, 12]           3,480
                Conv2d-8           [-1, 24, 12, 12]           5,208
                Conv2d-9           [-1, 24, 12, 12]             264
           InceptionA-10           [-1, 88, 12, 12]               0
               Conv2d-11             [-1, 20, 8, 8]          44,020
            MaxPool2d-12             [-1, 20, 4, 4]               0
               Conv2d-13             [-1, 16, 4, 4]             336
               Conv2d-14             [-1, 16, 4, 4]             336
               Conv2d-15             [-1, 24, 4, 4]           9,624
               Conv2d-16             [-1, 16, 4, 4]             336
               Conv2d-17             [-1, 24, 4, 4]           3,480
               Conv2d-18             [-1, 24, 4, 4]           5,208
               Conv2d-19             [-1, 24, 4, 4]             504
           InceptionA-20             [-1, 88, 4, 4]               0
               Linear-21                   [-1, 10]          14,090 

             

             


    11.2 ResNet

    GoogLeNet left an open question: experiments showed that the number of layers affects model accuracy, but at the time the vanishing-gradient problem was not yet recognized,

    so GoogLeNet's conclusion was "We Need To Go Deeper".

    It was not until Kaiming He's ResNet appeared that the point was made that more layers do not necessarily give a better model,

    and the ResNet architecture was proposed as the solution to this problem.

            

    Residual Net proposes this kind of block: a skip connection.

                    

    Earlier network models took this Plain Net form:

    The input x passes through a weight layer (which can be a convolution layer, a pooling layer, or a linear layer), then through an activation function that introduces nonlinearity, and finally the result H(x) is output.

    In this setup, the partial derivative of H(x) with respect to x typically lies in (0, 1). During backpropagation, where the partial derivatives of the composed functions are multiplied together layer by layer, this inevitably drives the partial derivative of the loss L with respect to x toward 0. The deeper the network, the more pronounced the effect, so the earliest layers (those closest to the input) never receive effective weight updates and the model may even stop working.

    This is the vanishing-gradient problem: if every local gradient is smaller than 1, then under backpropagation the accumulated gradient tends to 0 and the weights stop being updated, since w = w - σg (σ is the learning rate, g the gradient); in other words, the blocks close to the input cannot be trained adequately.

    One workaround is layer-by-layer training, but that becomes hard when there are many layers.

    ResNet solves the problem of the derivative of H(x) with respect to x lying in (0, 1) in a very elegant way:

    A skip is added to the earlier framework: the input x is added on top of the network's original output F(x), giving H(x) = F(x) + x. Now, when the final output H(x) is differentiated with respect to the input x, the result lies in (1, 2), so the product of gradients accumulated during backpropagation no longer shrinks toward 0 and the vanishing-gradient problem is avoided.
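
    A small numerical sketch of this effect (my own illustration, not code from the lecture): a toy scalar "layer" whose local derivative is below 1 is stacked 30 times, once without and once with a skip connection, and the gradient of the output with respect to the input is compared.

    import torch

    def layer(x, w):
        return torch.sigmoid(w * x)        # toy weight layer + activation, local derivative < 1

    w = torch.tensor(0.5)

    # Plain net: y = f(f(...f(x)...))
    x = torch.tensor(1.0, requires_grad=True)
    y = x
    for _ in range(30):
        y = layer(y, w)
    y.backward()
    print('plain    dy/dx =', x.grad.item())   # tiny value: the gradient has vanished

    # Residual net: each step computes f(y) + y, i.e. H(x) = F(x) + x
    x = torch.tensor(1.0, requires_grad=True)
    y = x
    for _ in range(30):
        y = layer(y, w) + y                    # skip connection keeps each local factor near (1, 2)
    y.backward()
    print('residual dy/dx =', x.grad.item())   # stays well away from zero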

    Like GoogLeNet, ResNet's Residual Block keeps one constructor parameter, the channel count. A Residual Block requires the input and output to have the same C, W, and H (B is identical in any case), which means an image processed by a Residual Block keeps its original spatial size and channel count. (TBD)
     

    Note, however, that because the result is added to x, the output of the two layers in the figure must have exactly the same tensor dimensions as the input x: the same channels, height, and width.

    If the output and input dimensions differ, a skip connection is still possible: for example, x can be passed through a max pooling layer so that it matches the new size, as in the figure below.
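
    A hedged sketch of that idea (my own illustration, since the figure is not reproduced here; the text above mentions pooling, and a strided 1x1 convolution is another common choice): the main path halves the spatial size, so x is downsampled before the addition.

    import torch
    import torch.nn.functional as F

    class DownsampleResidualBlock(torch.nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.conv1 = torch.nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1)
            self.conv2 = torch.nn.Conv2d(channels, channels, kernel_size=3, padding=1)

        def forward(self, x):
            y = F.relu(self.conv1(x))                  # main path: spatial size halved by stride=2
            y = self.conv2(y)
            identity = F.max_pool2d(x, kernel_size=2)  # shrink x so the shapes match again
            return F.relu(y + identity)

    block = DownsampleResidualBlock(16)
    print(block(torch.randn(1, 16, 28, 28)).shape)     # torch.Size([1, 16, 14, 14])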

             


    A network built from residual blocks:

             

    First, the code implementation of the residual block:

    To keep the input and output sizes unchanged, padding is set to 1 for the 3x3 kernels, and the input and output channel counts both match those of x.

    Note that after the second convolution, the sum with x is computed first and only then is the activation applied; see the sketch below.
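
    A minimal sketch of the block described above (it matches the ResidualBlock class in the full listing that follows):

    import torch
    import torch.nn.functional as F

    class ResidualBlock(torch.nn.Module):
        def __init__(self, channels):
            super(ResidualBlock, self).__init__()
            self.conv1 = torch.nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.conv2 = torch.nn.Conv2d(channels, channels, kernel_size=3, padding=1)

        def forward(self, x):
            y = F.relu(self.conv1(x))
            y = self.conv2(y)
            return F.relu(x + y)   # sum first, then activate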

            


            

    MNIST dataset code implementation:

    import torch
    from torchvision import transforms
    from torchvision import datasets
    from torch.utils.data import DataLoader
    import torch.nn.functional as F
    import torch.optim as optim
    from torchvision import models
    from torchsummary import summary
    from torchviz import make_dot

    # 1. Data preparation
    batch_size = 64
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])
    train_dataset = datasets.MNIST(root='../dataset/mnist', train=True, download=True, transform=transform)
    train_loader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size)
    test_dataset = datasets.MNIST(root='../dataset/mnist', train=False, download=True, transform=transform)
    test_loader = DataLoader(test_dataset, shuffle=False, batch_size=batch_size)

    # 2. Build the model
    class InceptionA(torch.nn.Module):   # kept from the previous example; not used by the Net below
        def __init__(self, in_channels):
            super(InceptionA, self).__init__()
            self.branch1x1 = torch.nn.Conv2d(in_channels, 16, kernel_size=1)

            self.branch5x5_1 = torch.nn.Conv2d(in_channels, 16, kernel_size=1)
            self.branch5x5_2 = torch.nn.Conv2d(16, 24, kernel_size=5, padding=2)

            self.branch3x3_1 = torch.nn.Conv2d(in_channels, 16, kernel_size=1)
            self.branch3x3_2 = torch.nn.Conv2d(16, 24, kernel_size=3, padding=1)
            self.branch3x3_3 = torch.nn.Conv2d(24, 24, kernel_size=3, padding=1)

            self.branch_pool = torch.nn.Conv2d(in_channels, 24, kernel_size=1)

        def forward(self, x):
            branch1x1 = self.branch1x1(x)

            branch5x5 = self.branch5x5_1(x)
            branch5x5 = self.branch5x5_2(branch5x5)

            branch3x3 = self.branch3x3_1(x)
            branch3x3 = self.branch3x3_2(branch3x3)
            branch3x3 = self.branch3x3_3(branch3x3)

            branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
            branch_pool = self.branch_pool(branch_pool)

            outputs = [branch1x1, branch5x5, branch3x3, branch_pool]
            return torch.cat(outputs, dim=1)

    class ResidualBlock(torch.nn.Module):
        def __init__(self, channels):
            super(ResidualBlock, self).__init__()
            self.channels = channels
            self.conv1 = torch.nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.conv2 = torch.nn.Conv2d(channels, channels, kernel_size=3, padding=1)

        def forward(self, x):
            y = F.relu(self.conv1(x))
            y = self.conv2(y)
            return F.relu(x + y)   # sum first, then activate

    class Net(torch.nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.conv1 = torch.nn.Conv2d(1, 16, kernel_size=5)
            self.conv2 = torch.nn.Conv2d(16, 32, kernel_size=5)
            self.rblock1 = ResidualBlock(16)
            self.rblock2 = ResidualBlock(32)
            self.mp = torch.nn.MaxPool2d(2)
            self.fc = torch.nn.Linear(512, 10)   # 32 * 4 * 4 = 512

        def forward(self, x):
            in_size = x.size(0)
            x = self.mp(F.relu(self.conv1(x)))
            x = self.rblock1(x)
            x = self.mp(F.relu(self.conv2(x)))
            x = self.rblock2(x)
            x = x.view(in_size, -1)
            x = self.fc(x)
            return x

    model = Net()
    # x = torch.randn(1, 1, 28, 28)
    # y = model(x)
    # vise = make_dot(y, params=dict(model.named_parameters()))
    # vise.view()
    # print(model)
    # summary(model, (1, 28, 28), device='cpu')

    # 3. Loss function and optimizer
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

    # 4. Training and testing
    def train(epoch):
        running_loss = 0.0
        for batch_idx, data in enumerate(train_loader, 0):
            inputs, target = data
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, target)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
            if batch_idx % 300 == 299:  # print once every 300 iterations
                print('[%d , %5d] loss: %.3f' % (epoch + 1, batch_idx + 1, running_loss / 300))
                running_loss = 0.0

    def test():
        correct = 0
        total = 0
        with torch.no_grad():
            for data in test_loader:
                images, labels = data
                outputs = model(images)  # output is a matrix; take the index of the max in each row (the predicted class)
                _, predicted = torch.max(outputs.data, dim=1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()
        print('Accuracy on test set: %d %%' % (100 * correct / total))

    if __name__ == '__main__':
        for epoch in range(10):
            train(epoch)
            test()

     Experimental results:

             


    At the end of the lecture, the instructor recommended two papers:

        Identity Mappings in Deep Residual Networks:

        He K, Zhang X, Ren S, et al. Identity Mappings in Deep Residual Networks[C]

        It presents many different variants of the Residual Block construction.

             


        Densely Connected Convolutional Networks:

    Huang G, Liu Z, van der Maaten L, et al. Densely Connected Convolutional Networks[J]. 2016: 2261-2269.

             

        This is the famous DenseNet. Building on ResNet's idea of skip connections, it creates a structure with many skips. Many later networks that extract multi-scale, multi-level features use the same idea: an encoder progressively extracts semantic features at different levels and passes them across to the corresponding levels of a decoder, with the goal of fusing features across levels and mining as much of the image's information as possible.


    Study methods

            

    The material and part of the text in this article come from:

    【Pytorch深度学习实践】B站up刘二大人之BasicCNN & Advanced CNN -代码理解与实现(9/9)_b站讲神经网络的up土堆-CSDN博客

    11.卷积神经网络(高级篇)_哔哩哔哩_bilibili


  • Original article: https://blog.csdn.net/weixin_47187147/article/details/133935346