• Deep Learning Basics: Residual Neural Networks (ResNet)


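    1. Definition

    A residual neural network (ResNet) is a deep neural network architecture proposed in 2015 by Kaiming He and colleagues at Microsoft Research. It introduces residual blocks to address the degradation problem of deep networks, which makes it practical to train much deeper models. ResNet took first place in the ImageNet image recognition challenge (ILSVRC 2015) and has since been applied successfully in many other areas.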

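    2. How It Is Computed

    The core idea of ResNet is the residual connection. A conventional neural network extracts features layer by layer through a stack of layers, but as depth grows, training becomes harder: the network tends to run into vanishing and exploding gradients. ResNet adds a skip connection from a block's input to its output, so the stacked layers only need to learn a residual, which alleviates these problems.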
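    To make the gradient argument concrete, here is a toy sketch in plain Python. It assumes every layer is a scalar linear map with the same weight `w`, which is a deliberate simplification and not how the real network behaves; the function names are hypothetical. In a plain stack the gradient is multiplied by `w` at every layer, while a skip connection turns that factor into `1 + w`.

```python
def plain_chain_gradient(w, depth):
    """d(output)/d(input) for `depth` stacked scalar layers y = w * x."""
    grad = 1.0
    for _ in range(depth):
        grad *= w            # each plain layer scales the gradient by w
    return grad


def residual_chain_gradient(w, depth):
    """Same stack, but each layer is y = x + w * x (branch F(x) = w * x plus a skip)."""
    grad = 1.0
    for _ in range(depth):
        grad *= (1.0 + w)    # the skip connection contributes the constant 1
    return grad


if __name__ == "__main__":
    for depth in (5, 20, 50):
        print(depth,
              plain_chain_gradient(0.01, depth),
              residual_chain_gradient(0.01, depth))
```

    With a small weight, the plain gradient shrinks toward zero as depth grows, whereas the residual gradient stays close to 1. Real layers are nonlinear and have different weights, so this only illustrates the trend.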

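    The computation of a residual block can be written as:

    $$\text{Output} = \mathcal{F}(\text{Input}) + \text{Input}$$

    Here $\mathcal{F}(\cdot)$ is the mapping learned by the residual branch. The block learns the residual $\mathcal{F}(\text{Input})$ rather than the output directly. This makes the identity mapping easy to represent (the branch only has to drive $\mathcal{F}$ toward zero), which helps avoid vanishing and exploding gradients.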
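    The formula can be checked with a minimal sketch in plain Python. The helper names (`residual_block`, `zero_branch`, `small_branch`) are hypothetical, vectors are plain lists of floats, and the ReLU that the PyTorch block applies after the addition is left out to keep the identity case visible.

```python
def residual_block(x, branch):
    """Return branch(x) + x, elementwise, for a list of floats."""
    fx = branch(x)
    if len(fx) != len(x):
        raise ValueError("branch must preserve the input length")
    return [f + v for f, v in zip(fx, x)]


def zero_branch(x):
    """A branch whose weights are all zero: F(x) = 0."""
    return [0.0 for _ in x]


def small_branch(x):
    """A hypothetical branch that learned a correction: F(x) = 0.5 * x."""
    return [0.5 * v for v in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_branch)    # equals x: the block is an identity map
corrected_out = residual_block(x, small_branch)  # x plus the learned residual
print(identity_out)
print(corrected_out)
```

    When the branch outputs zero, the block passes its input through unchanged, which is why stacking extra residual blocks should not, in principle, make the representable function worse.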

    3. Python Implementation (with Result Visualization)

    Below is example code for a simple ResNet model implemented in PyTorch, trained and tested on the CIFAR-10 dataset:

    import torch
    import torch.nn as nn
    import torch.optim as optim
    import torchvision
    import torchvision.transforms as transforms
    
    # Residual block: two 3x3 convolutions plus a shortcut connection
    class ResidualBlock(nn.Module):
        def __init__(self, in_channels, out_channels, stride=1):
            super(ResidualBlock, self).__init__()
            self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(out_channels)
            self.relu = nn.ReLU(inplace=True)
            self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(out_channels)
            self.downsample = None
            if stride != 1 or in_channels != out_channels:
                self.downsample = nn.Sequential(
                    nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                    nn.BatchNorm2d(out_channels)
                )
    
        def forward(self, x):
            residual = x
            out = self.conv1(x)
            out = self.bn1(out)
            out = self.relu(out)
            out = self.conv2(out)
            out = self.bn2(out)
            if self.downsample is not None:
                residual = self.downsample(residual)
            out += residual
            out = self.relu(out)
            return out
    
    # ResNet: a stem convolution followed by four stages of residual blocks
    class ResNet(nn.Module):
        def __init__(self, block, num_blocks, num_classes=10):
            super(ResNet, self).__init__()
            self.in_channels = 64
            self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(64)
            self.relu = nn.ReLU(inplace=True)
            self.layer1 = self.make_layer(block, 64, num_blocks[0], stride=1)
            self.layer2 = self.make_layer(block, 128, num_blocks[1], stride=2)
            self.layer3 = self.make_layer(block, 256, num_blocks[2], stride=2)
            self.layer4 = self.make_layer(block, 512, num_blocks[3], stride=2)
            self.avg_pool = nn.AdaptiveAvgPool2d((1, 1))
            self.fc = nn.Linear(512, num_classes)
    
        def make_layer(self, block, out_channels, num_blocks, stride):
            layers = []
            layers.append(block(self.in_channels, out_channels, stride))
            self.in_channels = out_channels
            for _ in range(1, num_blocks):
                layers.append(block(out_channels, out_channels, stride=1))
            return nn.Sequential(*layers)
    
        def forward(self, x):
            out = self.conv1(x)
            out = self.bn1(out)
            out = self.relu(out)
            out = self.layer1(out)
            out = self.layer2(out)
            out = self.layer3(out)
            out = self.layer4(out)
            out = self.avg_pool(out)
            out = out.view(out.size(0), -1)
            out = self.fc(out)
            return out
    
    # Data preprocessing: convert to tensors and normalize each channel to [-1, 1]
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])
    
    # Load the CIFAR-10 dataset
    trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)
    
    testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
    testloader = torch.utils.data.DataLoader(testset, batch_size=128, shuffle=False, num_workers=2)
    
    classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
    
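    # Select the device (GPU if available, otherwise CPU)
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    
    # Build the model; [2, 2, 2, 2] means two residual blocks per stage
    net = ResNet(ResidualBlock, [2, 2, 2, 2]).to(device)
    
    # Loss function and optimizer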
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
    
    # Train the network
    num_epochs = 50
    for epoch in range(num_epochs):
        net.train()
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            inputs, labels = data[0].to(device), data[1].to(device)
            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        print('[%d, %5d] loss: %.3f' % (epoch + 1, i + 1, running_loss / len(trainloader)))
    
    print('Finished Training')
    
    # Evaluate on the test set
    net.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            images, labels = data[0].to(device), data[1].to(device)
            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    
    print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))
    
    # Save the model weights
    torch.save(net.state_dict(), 'resnet_model.pth')
    
    # Load the model weights back onto the current device
    net.load_state_dict(torch.load('resnet_model.pth', map_location=device))
    
    # Visualize the results
    import matplotlib.pyplot as plt
    import numpy as np
    
    # Helper that displays an image tensor
    def imshow(img):
        img = img / 2 + 0.5     # undo the Normalize transform
        npimg = img.numpy()
        plt.imshow(np.transpose(npimg, (1, 2, 0)))
        plt.show()
    
    # Fetch the first batch of test data (the test loader is not shuffled)
    dataiter = iter(testloader)
    images, labels = next(dataiter)
    
    # Show the first four images and their ground-truth labels
    imshow(torchvision.utils.make_grid(images[:4]))
    print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))
    
    # Predict classes for the batch (no gradients needed at inference time)
    with torch.no_grad():
        outputs = net(images.to(device))
    _, predicted = torch.max(outputs, 1)
    
    print('Predicted: ', ' '.join('%5s' % classes[predicted[j]] for j in range(4)))
    
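    This example shows how to implement a ResNet model in PyTorch and how to train and test it on CIFAR-10. At the end, it displays a few test images together with their ground-truth labels and the model's predictions.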
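    As a sanity check on the architecture in the example, the feature-map sizes can be traced by hand with the standard convolution output-size formula, floor((n + 2p - k) / s) + 1. The sketch below is plain Python with hypothetical helper names; it assumes the 32x32 CIFAR-10 input and the stage strides (1, 2, 2, 2) used in the code above.

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_resnet_shapes(input_size=32,
                        stage_channels=(64, 128, 256, 512),
                        stage_strides=(1, 2, 2, 2)):
    """Return (name, channels, height/width) for the stem and each stage."""
    # Stem: 3x3 convolution, stride 1, padding 1, 64 output channels.
    size = conv_out_size(input_size, kernel=3, stride=1, padding=1)
    shapes = [("conv1", 64, size)]
    for idx, (channels, stride) in enumerate(zip(stage_channels, stage_strides), 1):
        # Only the first block of a stage applies the stage stride; the other
        # 3x3, stride-1, padding-1 convolutions leave the size unchanged.
        size = conv_out_size(size, kernel=3, stride=stride, padding=1)
        shapes.append((f"layer{idx}", channels, size))
    return shapes


if __name__ == "__main__":
    for name, channels, size in trace_resnet_shapes():
        print(f"{name}: {channels} x {size} x {size}")
```

    The 1x1 shortcut convolution with stride 2 and no padding halves the size in the same way, which is why the two tensors can be added inside a downsampling block. After the last stage, the adaptive average pool reduces each 512-channel map to 1x1 before the fully connected layer.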

  • Original article: https://blog.csdn.net/weixin_39753819/article/details/137834587