• Deep Learning in Practice (2): Pneumonia Prediction | Dataset and Source Code Included


    Table of Contents

    Preface
    Project Overview
    Results
    Importing Third-Party Libraries
    Defining Preprocessing Functions
    Loading the Dataset Paths
    Loading Training Images and Labels
    Displaying Sample Images
    Loading Test Images
    Saving Processed Images with Pickle
    Label Preprocessing
    Reshaping X to (None, 200, 200, 1)
    Data Augmentation
    Building the CNN
    Displaying Model Details
    Model.compile
    Training Process
    Plotting Results
    Analysis
    Model Evaluation


     

    Preface

    * This article is for deep-learning study purposes only, not for commercial use

    * OS: macOS / Windows

    * Python version: Python 3

    * Editor: Visual Studio Code

    🤯


     

    Project Overview

    Given the dataset, we train a CNN model that predicts whether a chest X-ray shows pneumonia and, if it does, whether the cause is bacterial or viral. The dataset has three subfolders, train / test / val, whose roles are exactly what the names say, so no further explanation is needed.

    Results

    Accuracy approaches 80%, and the training and validation curves track each other fairly well. This is not the best achievable result: increasing the number of epochs or changing the optimizer could likely train the model further. Only 30 epochs are used here (the pain of training on a CPU; if you have a GPU, feel free to raise the epoch count).

    Importing Third-Party Libraries

    The first thing to do in any deep-learning project is to load all the required modules and the image data itself.

    Here, the tqdm module is used to display a progress bar.

    import os
    import cv2
    import pickle
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    from tqdm import tqdm
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.metrics import confusion_matrix
    from keras.models import Model, load_model
    from keras.layers import Dense, Input, Conv2D, MaxPool2D, Flatten
    from keras.preprocessing.image import ImageDataGenerator
    np.random.seed(22)
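
    A note for newer environments (an assumption about your setup, not part of the original tutorial): on recent TensorFlow 2.x installations where the standalone keras imports fail, the same classes are available under tensorflow.keras as a drop-in alternative:

    # Alternative imports for TensorFlow 2.x environments
    from tensorflow.keras.models import Model, load_model
    from tensorflow.keras.layers import Dense, Input, Conv2D, MaxPool2D, Flatten
    from tensorflow.keras.preprocessing.image import ImageDataGenerator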

    Defining Preprocessing Functions

    The two functions below load the image data from each folder.

    First, every image is resized to 200 x 200 pixels. This is important because the images in the folders come in different dimensions, while a neural network can only accept data with a fixed array size. Next, images are read with 3 color channels by default, but the images in this dataset are grayscale, so all of them have to be converted to grayscale. (A very important point!)

     

    One detail in the code: the .DS_Store filter. On a Mac you need it, because Finder drops a hidden .DS_Store metadata file into every folder; on Windows you can delete the filter, although keeping it does no harm. Filtering the file listing itself (rather than skipping inside the loop) also keeps the label arrays exactly as long as the image arrays.

    def load_normal(norm_path):
        # Filter out macOS metadata files so labels and images stay aligned
        norm_files = np.array([f for f in os.listdir(norm_path) if f != '.DS_Store'])
        norm_labels = np.array(['normal'] * len(norm_files))
        norm_images = []
        for image in tqdm(norm_files):
            # Read image
            image = cv2.imread(norm_path + image)
            # Resize image to 200x200 px
            image = cv2.resize(image, dsize=(200,200))
            # Convert to grayscale
            image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
            norm_images.append(image)
        norm_images = np.array(norm_images)
        return norm_images, norm_labels

    def load_pneumonia(pneu_path):
        pneu_files = np.array([f for f in os.listdir(pneu_path) if f != '.DS_Store'])
        # The cause is encoded in the file name, e.g. person1_bacteria_1.jpeg
        pneu_labels = np.array([pneu_file.split('_')[1] for pneu_file in pneu_files])
        pneu_images = []
        for image in tqdm(pneu_files):
            # Read image
            image = cv2.imread(pneu_path + image)
            # Resize image to 200x200 px
            image = cv2.resize(image, dsize=(200,200))
            # Convert to grayscale
            image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
            pneu_images.append(image)
        pneu_images = np.array(pneu_images)
        return pneu_images, pneu_labels

    Loading the Dataset Paths

    norm_images, norm_labels = load_normal('/Users/liqun/Desktop/KS/MyPython/DataSet/chest_xray/train/NORMAL/')
    pneu_images, pneu_labels = load_pneumonia('/Users/liqun/Desktop/KS/MyPython/DataSet/chest_xray/train/PNEUMONIA/')
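
    Keep the trailing slash on these paths, because the loading functions concatenate path + filename directly. On your own machine, replace the absolute paths with wherever you unpacked the chest_xray dataset.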

    Loading Training Images and Labels

    X_train = np.append(norm_images, pneu_images, axis=0)
    y_train = np.append(norm_labels, pneu_labels)

    In a Jupyter Notebook you can run the two statements below to check whether the data were imported successfully.
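
    These are the same checks that appear in the full listing at the end; the example shapes assume the standard chest_xray training split:

    print(X_train.shape)  # e.g. (5216, 200, 200): samples, height, width
    print(y_train.shape)  # e.g. (5216,)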

    Similarly, you can check how many images each class contains.
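
    The per-class counts come from np.unique, again as in the full listing; the counts in the comment assume the standard split:

    print(np.unique(y_train, return_counts=True))
    # e.g. (array(['bacteria', 'normal', 'virus'], ...), array([2530, 1341, 1345]))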

    Displaying Sample Images

    This step is not required; it is just a convenient way to check that the color channels, dimensions, and so on have been converted as expected.

    fig, axes = plt.subplots(ncols=7, nrows=2, figsize=(16, 4))
    indices = np.random.choice(len(X_train), 14)
    counter = 0
    for i in range(2):
        for j in range(7):
            axes[i,j].set_title(y_train[indices[counter]])
            axes[i,j].imshow(X_train[indices[counter]], cmap='gray')
            axes[i,j].get_xaxis().set_visible(False)
            axes[i,j].get_yaxis().set_visible(False)
            counter += 1
    plt.show()

    Loading Test Images

    The process is the same as loading the training images above, using the same functions. The test set contains 624 images in total.

    print('Loading test images')
    # Do the exact same thing as what we have done on train data
    norm_images_test, norm_labels_test = load_normal('/Users/liqun/Desktop/KS/MyPython/DataSet/chest_xray/test/NORMAL/')
    pneu_images_test, pneu_labels_test = load_pneumonia('/Users/liqun/Desktop/KS/MyPython/DataSet/chest_xray/test/PNEUMONIA/')
    X_test = np.append(norm_images_test, pneu_images_test, axis=0)
    y_test = np.append(norm_labels_test, pneu_labels_test)

    Saving Processed Images with Pickle

    Preprocessing the images takes a considerable amount of time, so we can use the pickle library to store the converted arrays; the next time they are needed, none of the steps above have to be repeated.

    # Save the preprocessed data for future use
    with open('pneumonia_data.pickle', 'wb') as f:
        pickle.dump((X_train, X_test, y_train, y_test), f)

    # Here's how to load it back
    with open('pneumonia_data.pickle', 'rb') as f:
        (X_train, X_test, y_train, y_test) = pickle.load(f)

    Label Preprocessing

    At this point both y variables consist of the strings normal, bacteria, or virus. A neural network cannot accept labels in that form, so we need to convert them to one-hot format.

    y_train = y_train[:, np.newaxis]
    y_test = y_test[:, np.newaxis]
    one_hot_encoder = OneHotEncoder(sparse=False)
    y_train_one_hot = one_hot_encoder.fit_transform(y_train)
    y_test_one_hot = one_hot_encoder.transform(y_test)
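
    To make the one-hot format concrete, here is a minimal sketch (not from the original post). OneHotEncoder sorts categories alphabetically, so each of the three labels becomes one row of the 3 x 3 identity matrix. Note that on scikit-learn >= 1.2 the keyword is sparse_output=False instead of sparse=False:

    import numpy as np
    from sklearn.preprocessing import OneHotEncoder

    labels = np.array(['bacteria', 'normal', 'virus'])[:, np.newaxis]
    enc = OneHotEncoder(sparse=False)  # sparse_output=False on scikit-learn >= 1.2
    print(enc.fit_transform(labels))
    # [[1. 0. 0.]
    #  [0. 1. 0.]
    #  [0. 0. 1.]]
    print(enc.categories_)  # categories are sorted alphabetically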

    Reshaping X to (None, 200, 200, 1)

    The network expects an explicit channel axis, so we reshape the data into (number of samples, height, width, 1), where the 1 is the single grayscale channel:

    X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], X_train.shape[2], 1)
    X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], X_test.shape[2], 1)
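
    Equivalently, a one-line NumPy idiom appends the channel axis (an alternative sketch; run one or the other, not both):

    # Same effect as the reshape above
    X_train = X_train[..., np.newaxis]
    X_test = X_test[..., np.newaxis]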

    Data Augmentation

    Data augmentation means enlarging the training data by creating new samples from each existing one with some randomness, such as shifts, rotations, zooms, shears, and flips. This technique helps the neural network classifier reduce overfitting.

    datagen = ImageDataGenerator(
        rotation_range = 10,
        zoom_range = 0.1,
        width_shift_range = 0.1,
        height_shift_range = 0.1)
    datagen.fit(X_train)
    train_gen = datagen.flow(X_train, y_train_one_hot, batch_size=32)

    If needed, you can read up on the full set of ImageDataGenerator parameters in the Keras documentation.
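
    To see what the generator actually produces, here is a small optional sanity check (a sketch, not part of the original walkthrough); train_gen yields batches of (images, labels):

    # Preview eight images from one augmented batch
    images, labels = next(train_gen)
    fig, axes = plt.subplots(ncols=8, figsize=(16, 2))
    for ax, img in zip(axes, images[:8]):
        ax.imshow(img.squeeze(), cmap='gray')
        ax.axis('off')
    plt.show()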

    Building the CNN

    We need to make sure the first layer accepts exactly the shape of our images. Note that what we define is only (width, height, channels), not (samples, width, height, channels).

    The input layer is followed by several convolution-pooling pairs, after which the feature maps are flattened and connected to Dense layers. All hidden layers use the ReLU activation function; ReLU is cheap to compute, which keeps training time short. The final output layer consists of 3 neurons with softmax activation; softmax is used because we want the outputs to be probability values, one per class.

    input1 = Input(shape=(X_train.shape[1], X_train.shape[2], 1))
    cnn = Conv2D(16, (3, 3), activation='relu', strides=(1, 1),
                 padding='same')(input1)
    cnn = Conv2D(32, (3, 3), activation='relu', strides=(1, 1),
                 padding='same')(cnn)
    cnn = MaxPool2D((2, 2))(cnn)
    cnn = Conv2D(16, (2, 2), activation='relu', strides=(1, 1),
                 padding='same')(cnn)
    cnn = Conv2D(32, (2, 2), activation='relu', strides=(1, 1),
                 padding='same')(cnn)
    cnn = MaxPool2D((2, 2))(cnn)
    cnn = Flatten()(cnn)
    cnn = Dense(100, activation='relu')(cnn)
    cnn = Dense(50, activation='relu')(cnn)
    output1 = Dense(3, activation='softmax')(cnn)
    model = Model(inputs=input1, outputs=output1)
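
    As a quick sanity check on the architecture (simple arithmetic, not from the original write-up): the 'same'-padded, stride-1 convolutions keep the 200 x 200 spatial size, and each of the two pooling layers halves it, so the final feature maps are 50 x 50 with 32 channels. Flatten therefore produces 50 * 50 * 32 = 80,000 features, which Dense(100) maps down to 100.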

    Displaying Model Details
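    The layer-by-layer details shown in this section come from Keras's built-in summary, which you can reproduce with:

    model.summary()  # prints each layer's output shape and parameter count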

    Model.compile

    Having built the model, we compile the network with the categorical_crossentropy loss function and the Adam optimizer. This loss function is used because it is the standard choice for multi-class classification tasks, and Adam is one of the most dependable optimizers for minimizing the loss in most neural-network tasks.

    model.compile(loss='categorical_crossentropy',
                  optimizer='adam', metrics=['acc'])
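
    The training call itself appears in the full listing at the end; it is reproduced here for readability. It draws data from the augmentation generator, so the randomness changes on every epoch. Note that fit_generator is deprecated in newer Keras releases, where model.fit accepts generators directly:

    # Train from the augmented generator for 30 epochs
    history = model.fit_generator(train_gen, epochs=30,
                                  validation_data=(X_test, y_test_one_hot))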

    Training Process

    [Screenshot: training log. Training ran for 30 epochs in total; only the last four epochs were shown.]

    Plotting Results

    plt.figure(figsize=(8,6))
    plt.title('Accuracy scores')
    plt.plot(history.history['acc'])
    plt.plot(history.history['val_acc'])
    plt.legend(['acc', 'val_acc'])
    plt.show()

    plt.figure(figsize=(8,6))
    plt.title('Loss value')
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.legend(['loss', 'val_loss'])
    plt.show()

    [Figures: accuracy and loss curves for the training and validation sets.]

    Analysis

    Because of the data augmentation applied earlier, the training results show no sign of overfitting. The accuracy still fluctuates from epoch to epoch, but the overall trend is normal and acceptable.

    Model Evaluation

    predictions = model.predict(X_test)
    predictions = one_hot_encoder.inverse_transform(predictions)
    cm = confusion_matrix(y_test, predictions)
    classnames = ['bacteria', 'normal', 'virus']
    plt.figure(figsize=(8,8))
    plt.title('Confusion matrix')
    sns.heatmap(cm, cbar=False, xticklabels=classnames, yticklabels=classnames, fmt='d', annot=True, cmap=plt.cm.Blues)
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.show()

    [Figure: confusion matrix heatmap.]

    From the confusion matrix above we can see that 22 viral X-ray images were predicted as bacterial, probably because the two pneumonia types are hard to distinguish. At least the model predicts bacteria-caused pneumonia quite well: 233 of the 242 samples were classified correctly.
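
    Beyond the confusion matrix, per-class precision and recall give a fuller picture. A small optional sketch with scikit-learn (not part of the original post):

    from sklearn.metrics import classification_report

    # Both arrays hold the string labels 'bacteria', 'normal', 'virus'
    print(classification_report(y_test.ravel(), predictions.ravel()))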

    🫥 That concludes the full walkthrough of this deep-learning exercise. The complete code follows; help yourself if you need it:

    import os
    import cv2
    import pickle # Used to save variables
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    from tqdm import tqdm # Used to display progress bar
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.metrics import confusion_matrix
    from keras.models import Model, load_model
    from keras.layers import Dense, Input, Conv2D, MaxPool2D, Flatten
    from keras.preprocessing.image import ImageDataGenerator # Used to generate images

    np.random.seed(22)

    # Do not forget to include the last slash
    def load_normal(norm_path):
        # Filter out macOS metadata files so labels and images stay aligned
        norm_files = np.array([f for f in os.listdir(norm_path) if f != '.DS_Store'])
        norm_labels = np.array(['normal'] * len(norm_files))
        norm_images = []
        for image in tqdm(norm_files):
            # Read image
            image = cv2.imread(norm_path + image)
            # Resize image to 200x200 px
            image = cv2.resize(image, dsize=(200,200))
            # Convert to grayscale
            image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
            norm_images.append(image)
        norm_images = np.array(norm_images)
        return norm_images, norm_labels

    def load_pneumonia(pneu_path):
        pneu_files = np.array([f for f in os.listdir(pneu_path) if f != '.DS_Store'])
        pneu_labels = np.array([pneu_file.split('_')[1] for pneu_file in pneu_files])
        pneu_images = []
        for image in tqdm(pneu_files):
            # Read image
            image = cv2.imread(pneu_path + image)
            # Resize image to 200x200 px
            image = cv2.resize(image, dsize=(200,200))
            # Convert to grayscale
            image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
            pneu_images.append(image)
        pneu_images = np.array(pneu_images)
        return pneu_images, pneu_labels

    print('Loading images')
    # All images are stored in _images, all labels are in _labels
    norm_images, norm_labels = load_normal('/Users/liqun/Desktop/KS/MyPython/DataSet/chest_xray/train/NORMAL/')
    pneu_images, pneu_labels = load_pneumonia('/Users/liqun/Desktop/KS/MyPython/DataSet/chest_xray/train/PNEUMONIA/')

    # Put all train images to X_train
    X_train = np.append(norm_images, pneu_images, axis=0)
    # Put all train labels to y_train
    y_train = np.append(norm_labels, pneu_labels)
    print(X_train.shape)
    print(y_train.shape)
    # Finding out the number of samples of each class
    print(np.unique(y_train, return_counts=True))

    # print('Display several images')
    fig, axes = plt.subplots(ncols=7, nrows=2, figsize=(16, 4))
    indices = np.random.choice(len(X_train), 14)
    counter = 0
    for i in range(2):
        for j in range(7):
            axes[i,j].set_title(y_train[indices[counter]])
            axes[i,j].imshow(X_train[indices[counter]], cmap='gray')
            axes[i,j].get_xaxis().set_visible(False)
            axes[i,j].get_yaxis().set_visible(False)
            counter += 1
    # plt.show()

    print('Loading test images')
    # Do the exact same thing as what we have done on train data
    norm_images_test, norm_labels_test = load_normal('/Users/liqun/Desktop/KS/MyPython/DataSet/chest_xray/test/NORMAL/')
    pneu_images_test, pneu_labels_test = load_pneumonia('/Users/liqun/Desktop/KS/MyPython/DataSet/chest_xray/test/PNEUMONIA/')
    X_test = np.append(norm_images_test, pneu_images_test, axis=0)
    y_test = np.append(norm_labels_test, pneu_labels_test)

    # Save the loaded images to pickle file for future use
    with open('pneumonia_data.pickle', 'wb') as f:
        pickle.dump((X_train, X_test, y_train, y_test), f)

    # Here's how to load it
    with open('pneumonia_data.pickle', 'rb') as f:
        (X_train, X_test, y_train, y_test) = pickle.load(f)

    print('Label preprocessing')
    # Create new axis on all y data
    y_train = y_train[:, np.newaxis]
    y_test = y_test[:, np.newaxis]
    # Initialize OneHotEncoder object
    one_hot_encoder = OneHotEncoder(sparse=False)
    # Convert all labels to one-hot
    y_train_one_hot = one_hot_encoder.fit_transform(y_train)
    y_test_one_hot = one_hot_encoder.transform(y_test)

    print('Reshaping X data')
    # Reshape the data into (no of samples, height, width, 1), where 1 represents a single color channel
    X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], X_train.shape[2], 1)
    X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], X_test.shape[2], 1)

    print('Data augmentation')
    # Generate new images with some randomness
    datagen = ImageDataGenerator(
        rotation_range = 10,
        zoom_range = 0.1,
        width_shift_range = 0.1,
        height_shift_range = 0.1)
    datagen.fit(X_train)
    train_gen = datagen.flow(X_train, y_train_one_hot, batch_size = 32)

    print('CNN')
    # Define the input shape of the neural network
    input_shape = (X_train.shape[1], X_train.shape[2], 1)
    print(input_shape)

    input1 = Input(shape=input_shape)
    cnn = Conv2D(16, (3, 3), activation='relu', strides=(1, 1),
                 padding='same')(input1)
    cnn = Conv2D(32, (3, 3), activation='relu', strides=(1, 1),
                 padding='same')(cnn)
    cnn = MaxPool2D((2, 2))(cnn)
    cnn = Conv2D(16, (2, 2), activation='relu', strides=(1, 1),
                 padding='same')(cnn)
    cnn = Conv2D(32, (2, 2), activation='relu', strides=(1, 1),
                 padding='same')(cnn)
    cnn = MaxPool2D((2, 2))(cnn)
    cnn = Flatten()(cnn)
    cnn = Dense(100, activation='relu')(cnn)
    cnn = Dense(50, activation='relu')(cnn)
    output1 = Dense(3, activation='softmax')(cnn)
    model = Model(inputs=input1, outputs=output1)

    model.compile(loss='categorical_crossentropy',
                  optimizer='adam', metrics=['acc'])

    # Using fit_generator() instead of fit() because we are going to use data
    # taken from the generator. Note that the randomness is changing
    # on each epoch
    history = model.fit_generator(train_gen, epochs=30,
                                  validation_data=(X_test, y_test_one_hot))

    # Saving model
    model.save('pneumonia_cnn.h5')

    print('Displaying accuracy')
    plt.figure(figsize=(8,6))
    plt.title('Accuracy scores')
    plt.plot(history.history['acc'])
    plt.plot(history.history['val_acc'])
    plt.legend(['acc', 'val_acc'])
    plt.show()

    print('Displaying loss')
    plt.figure(figsize=(8,6))
    plt.title('Loss value')
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.legend(['loss', 'val_loss'])
    plt.show()

    # Predicting test data
    predictions = model.predict(X_test)
    print(predictions)
    predictions = one_hot_encoder.inverse_transform(predictions)

    print('Model evaluation')
    print(one_hot_encoder.categories_)
    classnames = ['bacteria', 'normal', 'virus']

    # Display confusion matrix
    cm = confusion_matrix(y_test, predictions)
    plt.figure(figsize=(8,8))
    plt.title('Confusion matrix')
    sns.heatmap(cm, cbar=False, xticklabels=classnames, yticklabels=classnames, fmt='d', annot=True, cmap=plt.cm.Blues)
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.show()

    🫥 The dataset used in this experiment has also been packaged for you; download it here:

    Link: https://pan.baidu.com/s/1h4Ve-YiXw0FyJDXCFlU1eA?pwd=qak4  Extraction code: qak4

    ———————————————————————————————————————————

    Writing this up took real effort.

    If this article helped you, please leave a like 😋

    Thanks for reading!

     

     

     

     

  • Original post: https://blog.csdn.net/m0_54689021/article/details/126495422