• A CNN handwritten-digit recognition project for beginners: Digit Recognizer with CNN for beginner


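    Preparation

    Introduction to the MNIST dataset

    MNIST ("Modified National Institute of Standards and Technology") is the de facto "hello world" dataset of computer vision. Since its release in 1999, this classic dataset of handwritten images has served as the basis for benchmarking classification algorithms. Even as new machine learning techniques emerge, MNIST remains a reliable resource for researchers and learners alike. Our goal is to correctly identify digits from a dataset of tens of thousands of handwritten images.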

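    The data files train.csv and test.csv contain grayscale images of hand-drawn digits, from zero through nine.

    Each image is 28 pixels high and 28 pixels wide, for a total of 784 pixels. Each pixel has a single pixel value associated with it that indicates how light or dark the pixel is, with higher numbers meaning darker. This pixel value is an integer between 0 and 255, inclusive.

    The training dataset (train.csv) has 785 columns. The first column, called "label", is the digit that was drawn by the user. The remaining columns contain the pixel values of the associated image.

    Each pixel column in the training set has a name like pixelx, where x is an integer between 0 and 783, inclusive. To locate this pixel on the image, suppose we have decomposed x as x = i * 28 + j, where i and j are integers between 0 and 27, inclusive. Then pixelx is located on row i and column j of a 28 x 28 matrix (indexing from zero).

    For example, pixel31 is the pixel in the fourth column from the left and the second row from the top, as in the ASCII diagram below.

    Visually, if we omit the "pixel" prefix, the pixels make up the image like this: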

    000 001 002 003 ... 026 027
    028 029 030 031 ... 054 055
    056 057 058 059 ... 082 083
     |   |   |   |  ...  |   |
    728 729 730 731 ... 754 755
    756 757 758 759 ... 782 783 
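The index arithmetic described above can be checked with a short sketch. The helper name `pixel_position` is an illustrative assumption, not something from the dataset or the original notebook; it uses Python's built-in `divmod` to split x into the zero-indexed row i and column j.

```python
WIDTH = 28  # images are 28 x 28 pixels

def pixel_position(x: int) -> tuple:
    """Return the zero-indexed (row, column) of pixelx, where x = i * 28 + j."""
    if not 0 <= x < WIDTH * WIDTH:
        raise ValueError("x must be between 0 and 783, inclusive")
    return divmod(x, WIDTH)

row, col = pixel_position(31)
print(row, col)  # 1 3: second row from the top, fourth column from the left
```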
    

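    The test dataset (test.csv) is the same as the training set, except that it does not contain the "label" column.

    Your submission file should be in the following format: for each of the 28000 images in the test set, output a single line containing the ImageId and the digit you predict. For example, if you predict that the first image is a 3, the second image is a 7, and the third image is an 8, then your submission file would look like this: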

    ImageId,Label
    1,3
    2,7
    3,8 
    (27997 more lines)
    

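    This competition is evaluated on classification accuracy, that is, the proportion of test images that are classified correctly. For example, a classification accuracy of 0.97 means that you have correctly classified all but 3% of the images.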
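As a rough illustration of the metric, accuracy is simply the mean of an element-wise comparison between true and predicted labels. The function name `classification_accuracy` and the sample labels below are made up for this sketch; the test-set labels themselves are not available locally.

```python
import numpy as np

def classification_accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that match the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean(y_true == y_pred))

# Made-up labels for illustration only: 3 of the 4 predictions are correct.
print(classification_accuracy([3, 7, 8, 1], [3, 7, 8, 2]))  # 0.75
```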

    Dataset download: https://wwp.lanzoub.com/iIUFY08t575a

    Import packages

    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    %matplotlib inline
    

    Load the dataset

    train = pd.read_csv('../input/digit-recognizer/train.csv')
    
    test = pd.read_csv('../input/digit-recognizer/test.csv')
    

    Inspect the data

    train.head()
    
    label pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
    0 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
    1 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
    2 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
    3 4 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
    4 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

    5 rows × 785 columns

    train.info()
    


    RangeIndex: 42000 entries, 0 to 41999
    Columns: 785 entries, label to pixel783
    dtypes: int64(785)
    memory usage: 251.5 MB
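The reported memory usage can be reproduced by hand, which is a quick sanity check on the shape and dtype. This sketch assumes the figure is expressed in units of 2**20 bytes and ignores the small index overhead; each int64 value takes 8 bytes. The helper name `frame_megabytes` is hypothetical.

```python
def frame_megabytes(n_rows: int, n_cols: int, bytes_per_value: int = 8) -> float:
    """Approximate size of a dense numeric table in units of 2**20 bytes."""
    return n_rows * n_cols * bytes_per_value / 2**20

# 42000 rows x 785 int64 columns, as reported by train.info()
print(round(frame_megabytes(42000, 785), 1))  # 251.5
```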

    train.isnull().sum()
    

    label 0
    pixel0 0
    pixel1 0
    pixel2 0
    pixel3 0
    ..
    pixel779 0
    pixel780 0
    pixel781 0
    pixel782 0
    pixel783 0
    Length: 785, dtype: int64

    sum(train.isnull().sum())
    

    0

    Preprocess the training set | test set

    #y_train holds the digit labels
    y_train = train['label'].copy()
    
    #X_train holds the pixel brightness values
    X_train = train.drop('label',axis=1)
    
    y_train.value_counts()
    

    1 4684
    7 4401
    3 4351
    9 4188
    2 4177
    6 4137
    0 4132
    4 4072
    8 4063
    5 3795
    Name: label, dtype: int64

    y_train = pd.get_dummies(y_train,prefix='Num')
    
    y_train.head()
    
    Num_0 Num_1 Num_2 Num_3 Num_4 Num_5 Num_6 Num_7 Num_8 Num_9
    0 0 1 0 0 0 0 0 0 0 0
    1 1 0 0 0 0 0 0 0 0 0
    2 0 1 0 0 0 0 0 0 0 0
    3 0 0 0 0 1 0 0 0 0 0
    4 1 0 0 0 0 0 0 0 0 0
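As a small illustration of what `pd.get_dummies` does to the labels, the sketch below encodes a few made-up labels and then recovers the digits with `argmax`. The sample values are invented for the example, and `dtype=int` is passed explicitly because newer pandas versions return boolean columns by default, whereas the output above shows 0/1 integers.

```python
import numpy as np
import pandas as pd

# Invented labels; every digit 0-9 appears so that all ten columns are created.
labels = pd.Series([1, 0, 4, 7, 3, 5, 8, 9, 2, 6], name="label")

one_hot = pd.get_dummies(labels, prefix="Num", dtype=int)
print(one_hot.shape)               # (10, 10)
print(list(one_hot.columns[:3]))   # ['Num_0', 'Num_1', 'Num_2']

# argmax over each row gives back the original digit.
recovered = np.argmax(one_hot.to_numpy(), axis=1)
print(recovered.tolist() == labels.tolist())  # True
```

Note that `get_dummies` only creates a column for each value it actually sees, so a label set missing one digit would produce fewer than ten columns.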

    #28×28 gives 784 pixels in total; each value is a brightness in [0,255]
    X_train.describe()
    
    pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 pixel9 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
    count 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 ... 42000.000000 42000.000000 42000.000000 42000.00000 42000.000000 42000.000000 42000.0 42000.0 42000.0 42000.0
    mean 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.219286 0.117095 0.059024 0.02019 0.017238 0.002857 0.0 0.0 0.0 0.0
    std 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 6.312890 4.633819 3.274488 1.75987 1.894498 0.414264 0.0 0.0 0.0 0.0
    min 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.0 0.0 0.0 0.0
    25% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.0 0.0 0.0 0.0
    50% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.0 0.0 0.0 0.0
    75% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.0 0.0 0.0 0.0
    max 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 254.000000 254.000000 253.000000 253.00000 254.000000 62.000000 0.0 0.0 0.0 0.0

    8 rows × 784 columns

    #from sklearn.preprocessing import Normalizer
    
    X_train = X_train/255
    
    X_train.head()
    
    pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 pixel9 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
    0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

    5 rows × 784 columns

    X_train.describe()
    
    pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 pixel9 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
    count 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 ... 42000.000000 42000.000000 42000.000000 42000.000000 42000.000000 42000.000000 42000.0 42000.0 42000.0 42000.0
    mean 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000860 0.000459 0.000231 0.000079 0.000068 0.000011 0.0 0.0 0.0 0.0
    std 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.024756 0.018172 0.012841 0.006901 0.007429 0.001625 0.0 0.0 0.0 0.0
    min 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0
    25% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0
    50% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0
    75% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0
    max 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.996078 0.996078 0.992157 0.992157 0.996078 0.243137 0.0 0.0 0.0 0.0

    8 rows × 784 columns

    X_train = X_train.values.reshape(-1,28,28,1)
    
    X_train
    

    array([[[[0.],
    [0.],
    [0.],
    ...,
    [0.],
    [0.],
    [0.]],
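The scaling and reshaping steps can be tried on a small stand-in array to confirm the resulting layout. The random data below is an assumption used only for illustration; the last axis of size 1 is the single grayscale channel that `Conv2D` expects.

```python
import numpy as np

# Random stand-in for pixel data: 5 fake images, 784 integer values in [0, 255].
rng = np.random.default_rng(0)
fake_pixels = rng.integers(0, 256, size=(5, 784))

scaled = fake_pixels / 255                # floats in [0, 1]
images = scaled.reshape(-1, 28, 28, 1)    # (samples, height, width, channels)

print(images.shape)  # (5, 28, 28, 1)
# Row-major reshape: flat index 31 lands at row 1, column 3.
print(images[0, 1, 3, 0] == scaled[0, 31])  # True
```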

    test.info()
    


    RangeIndex: 28000 entries, 0 to 27999
    Columns: 784 entries, pixel0 to pixel783
    dtypes: int64(784)
    memory usage: 167.5 MB

    test.isnull().sum()
    

    pixel0 0
    pixel1 0
    pixel2 0
    pixel3 0
    pixel4 0
    ..
    pixel779 0
    pixel780 0
    pixel781 0
    pixel782 0
    pixel783 0
    Length: 784, dtype: int64

    sum(test.isnull().sum())
    

    0

    test = test/255
    
    test.head()
    
    pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 pixel9 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
    0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

    5 rows × 784 columns

    test.describe()
    
    pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 pixel9 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
    count 28000.0 28000.0 28000.0 28000.0 28000.0 28000.0 28000.0 28000.0 28000.0 28000.0 ... 28000.000000 28000.000000 28000.000000 28000.000000 28000.000000 28000.0 28000.0 28000.0 28000.0 28000.0
    mean 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000646 0.000287 0.000110 0.000044 0.000026 0.0 0.0 0.0 0.0 0.0
    std 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.021464 0.014184 0.007112 0.004726 0.003167 0.0 0.0 0.0 0.0 0.0
    min 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0
    25% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0
    50% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0
    75% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0
    max 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.992157 0.996078 0.756863 0.733333 0.466667 0.0 0.0 0.0 0.0 0.0

    8 rows × 784 columns

    test = test.values.reshape(-1,28,28,1)
    
    test
    

    array([[[[0.],
    [0.],
    [0.],
    ...,
    [0.],
    [0.],
    [0.]],

    Train the CNN model

    import tensorflow as tf
    
    tf.__version__
    

    '2.6.4'

    cnn = tf.keras.models.Sequential()
    

    2022-08-01 05:41:16.816392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 15403 MB memory: -> device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0

    #Convolution
    cnn.add(tf.keras.layers.Conv2D(filters=256,kernel_size=(5,5),activation='relu',input_shape=(28,28,1)))
    
    #Max Pooling
    cnn.add(tf.keras.layers.MaxPool2D(pool_size=(3,3),strides=3))
    
    cnn.add(tf.keras.layers.BatchNormalization())
    
    cnn.add(tf.keras.layers.Conv2D(filters=128,kernel_size=(4,4),activation='relu'))
    
    cnn.add(tf.keras.layers.MaxPool2D(pool_size=(2,2),strides=2))
    
    #Flattening
    cnn.add(tf.keras.layers.Flatten())
    
    #Full connection 
    cnn.add(tf.keras.layers.Dense(units=256,activation='relu'))
    
    #Output Layer
    cnn.add(tf.keras.layers.Dense(units=10,activation='softmax'))
    
    #Compile cnn
    cnn.compile(optimizer='adam',loss='categorical_crossentropy')
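To see how the feature map shrinks through the layers defined above, the sizes can be worked out by hand. The sketch assumes the Keras defaults that the code relies on (`padding='valid'` and a convolution stride of 1); the helper `out_size` is a hypothetical name used only here. BatchNormalization does not change the shape, so it is omitted.

```python
def out_size(size: int, window: int, stride: int = 1) -> int:
    """Output length of a 'valid' (no padding) convolution or pooling window."""
    return (size - window) // stride + 1

s = 28
s = out_size(s, 5)             # Conv2D 5x5              -> 24
s = out_size(s, 3, stride=3)   # MaxPool2D 3x3, stride 3 -> 8
s = out_size(s, 4)             # Conv2D 4x4              -> 5
s = out_size(s, 2, stride=2)   # MaxPool2D 2x2, stride 2 -> 2
flat = s * s * 128             # Flatten over the 128 filters of the second Conv2D
print(s, flat)  # 2 512
```

Under these assumptions the `Dense(units=256)` layer receives 512 inputs. Calling `cnn.summary()` prints the actual output shapes and parameter counts and can be used to confirm the numbers.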
    
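    # Epoch:
    # One epoch is one complete pass of the entire dataset through the neural network, forward and back (that is, every training sample has gone through one forward pass and one backward pass).
    # Put more simply, an epoch is the process of training on all the training samples once.
    # However, when the samples in one epoch (that is, all the training samples) are too many for the computer to process at once, they are split into smaller chunks, that is, into multiple batches, for training.
    
    # Batch:
    # The full set of training samples is split into several batches.
    
    # Batch_Size:
    # The number of samples in each batch.
    
    # Iteration:
    # Training on one batch is one iteration (the idea is loosely similar to one step of iteration in a programming language).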
    
    cnn.fit(X_train,y_train,batch_size=32,epochs=50)
    

    2022-08-01 05:41:18.154328: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)

    Epoch 1/50

    2022-08-01 05:41:19.541340: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8005

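    1313/1313 [==============================] - 13s 5ms/step - loss: 0.1159
    Epoch 2/50
    1313/1313 [==============================] - 5s 4ms/step - loss: 0.0496
    Epoch 3/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0367
    Epoch 4/50
    1313/1313 [==============================] - 5s 4ms/step - loss: 0.0289
    Epoch 5/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0256
    Epoch 6/50
    1313/1313 [==============================] - 5s 4ms/step - loss: 0.0220
    Epoch 7/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0192
    Epoch 8/50
    1313/1313 [==============================] - 5s 4ms/step - loss: 0.0167
    Epoch 9/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0146
    Epoch 10/50
    1313/1313 [==============================] - 5s 4ms/step - loss: 0.0121
    Epoch 11/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0133
    Epoch 12/50
    1313/1313 [==============================] - 5s 4ms/step - loss: 0.0142
    Epoch 13/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0119
    Epoch 14/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0125
    Epoch 15/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0103
    Epoch 16/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0103
    Epoch 17/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0130
    Epoch 18/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0118
    Epoch 19/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0093
    Epoch 20/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0075
    Epoch 21/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0075
    Epoch 22/50
    1313/1313 [==============================] - 6s 5ms/step - loss: 0.0129
    Epoch 23/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0105
    Epoch 24/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0087
    Epoch 25/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0097
    Epoch 26/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0117
    Epoch 27/50
    1313/1313 [==============================] - 5s 4ms/step - loss: 0.0051
    Epoch 28/50
    1313/1313 [==============================] - 6s 5ms/step - loss: 0.0086
    Epoch 29/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0100
    Epoch 30/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0087
    Epoch 31/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0096
    Epoch 32/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0065
    Epoch 33/50
    1313/1313 [==============================] - 5s 4ms/step - loss: 0.0082
    Epoch 34/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0110
    Epoch 35/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0063
    Epoch 36/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0107
    Epoch 37/50
    1313/1313 [==============================] - 5s 4ms/step - loss: 0.0048
    Epoch 38/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0076
    Epoch 39/50
    1313/1313 [==============================] - 5s 4ms/step - loss: 0.0154
    Epoch 40/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0095
    Epoch 41/50
    1313/1313 [==============================] - 5s 4ms/step - loss: 0.0052
    Epoch 42/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0057
    Epoch 43/50
    1313/1313 [==============================] - 5s 4ms/step - loss: 0.0080
    Epoch 44/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0085
    Epoch 45/50
    1313/1313 [==============================] - 5s 4ms/step - loss: 0.0108
    Epoch 46/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0062
    Epoch 47/50
    1313/1313 [==============================] - 5s 4ms/step - loss: 0.0118
    Epoch 48/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0078
    Epoch 49/50
    1313/1313 [==============================] - 5s 4ms/step - loss: 0.0083
    Epoch 50/50
    1313/1313 [==============================] - 6s 4ms/step - loss: 0.0044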
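The step count of 1313 per epoch in the log follows directly from the epoch, batch, and iteration definitions given in the comments before `cnn.fit`. A minimal worked calculation, using the 42000 training rows and the `batch_size=32` and `epochs=50` arguments from this walkthrough:

```python
import math

n_samples = 42000   # rows in train.csv
batch_size = 32     # value passed to cnn.fit
epochs = 50

# The last batch of each epoch is smaller, so round up.
iterations_per_epoch = math.ceil(n_samples / batch_size)
last_batch = n_samples - (iterations_per_epoch - 1) * batch_size

print(iterations_per_epoch)           # 1313
print(last_batch)                     # 16
print(iterations_per_epoch * epochs)  # 65650
```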

    pred = cnn.predict(test)
    
    pred = np.argmax(pred,axis=1)
    
    pred
    

    array([2, 0, 9, ..., 3, 9, 2])
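The `np.argmax(pred, axis=1)` step turns each row of ten softmax probabilities into a single digit by picking the column with the largest value. The probabilities below are invented for illustration and are not output from the trained model.

```python
import numpy as np

# Invented softmax outputs for two images (each row sums to 1), 10 classes each.
probs = np.array([
    [0.01, 0.01, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.02, 0.01],
    [0.85, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.01, 0.01, 0.01],
])

digits = np.argmax(probs, axis=1)  # column index of the largest value per row
print(digits.tolist())  # [2, 0]
```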

    pred = pd.DataFrame(pred,columns=['Label'])
    
    test_id = list(range(1,len(test)+1,1))
    
    test_id = pd.DataFrame(test_id,columns=['ImageId'])
    
    submission = pd.concat([test_id,pred],axis=1)
    
    submission.describe()
    
    ImageId Label
    count 28000.000000 28000.000000
    mean 14000.500000 4.453036
    std 8083.048105 2.896665
    min 1.000000 0.000000
    25% 7000.750000 2.000000
    50% 14000.500000 4.000000
    75% 21000.250000 7.000000
    max 28000.000000 9.000000
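The walkthrough stops at `submission.describe()`. A file in the ImageId,Label format described earlier could then be written with `DataFrame.to_csv`; the sketch below uses three invented predictions and an in-memory buffer, and the file name in the final comment is an assumption.

```python
import io
import pandas as pd

# Invented predictions standing in for the real `pred` array.
preds = [3, 7, 8]
demo = pd.DataFrame({"ImageId": range(1, len(preds) + 1), "Label": preds})

buffer = io.StringIO()
demo.to_csv(buffer, index=False)  # index=False keeps the row index out of the file
csv_text = buffer.getvalue()
print(csv_text)

# For the real data (hypothetical file name):
# submission.to_csv("submission.csv", index=False)
```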

    Original author: 孤飞 - 博客园
    Original URL: https://www.cnblogs.com/ranxi169/p/16540166.html

    View | download the code as a Jupyter notebook: https://www.kaggle.com/code/ranxi169/digit-recognizer-with-cnn-for-beginner/notebook
