- 🍨 本文为🔗365天深度学习训练营 中的学习记录博客
- 🍦 参考文章:365天深度学习训练营-第5周:运动鞋品牌识别(训练营内部成员可读)
- 🍖 原作者:K同学啊
本期深度学习,我们依旧是学习有关CNN的相关知识,比起以往,我们这一期博客将引入两个新的概念——学习率和早停。
其中很多步骤都和之前的项目类似,重点就在于后续的学习率和早停的设置。
好啦,我们开始今天的深度学习!
导入依赖项:
from tensorflow import keras
from tensorflow.keras import layers,models
import os, PIL, pathlib
import matplotlib.pyplot as plt
import tensorflow as tf
和之前一样,如果你GPU很好就只使用GPU进行训练,如果GPU不行就推荐使用CPU训练加GPU加速。
只使用GPU:
if gpus:
gpu0 = gpus[0] #如果有多个GPU,仅使用第0个GPU
tf.config.experimental.set_memory_growth(gpu0, True) #设置GPU显存用量按需使用
tf.config.set_visible_devices([gpu0],"GPU")
使用CPU+GPU:
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
data_dir = "E:\深度学习\data\Day14"
data_dir = pathlib.Path(data_dir)
查看数据集内有多少张图片:
image_count = len(list(data_dir.glob('*/*/*.jpg')))
print("图片总数为:",image_count)
运行的结果是:
图片总数为: 578
从数据集内返回一张图片查看一下:
roses = list(data_dir.glob('train/nike/*.jpg'))
PIL.Image.open(str(roses[0]))
我们使用image_dataset_from_directory方法将我们本地的数据加载到tf.data.Dataset
中,并设置训练图片模型参数:
batch_size = 32
img_height = 224
img_width = 224
在上一期深度学习博客中我们讲诉了训练集、验证集和测试集的含义以及三者的关系,如果不了解的话,大家可以移步去看看,或者上网自己查阅一下:
接下来加载数据:
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
"E:/深度学习/data/Day14/train/",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
"E:/深度学习/data/Day14/test/",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size)
Found 502 files belonging to 2 classes.
Found 76 files belonging to 2 classes.
这里也可以看到我们两个数据分别有多少文件。
然后我们再利用class_name输出我们本地数据集的标签,标签也就是对应数据所在的文件目录名:
['adidas', 'nike']
在可视化数据前,我们来检查一下我们的数据信息是否是正确的:
for image_batch, labels_batch in train_ds:
print(image_batch.shape)
print(labels_batch.shape)
break
(32, 224, 224, 3)
(32,)
可以看出这是一批180×180×3的32张图片,然后我们将数据进行可视化看看:
plt.figure(figsize=(20, 10))
for images, labels in train_ds.take(1):
for i in range(20):
ax = plt.subplot(5, 10, i + 1)
plt.imshow(images[i].numpy().astype("uint8"))
plt.title(class_names[labels[i]])
plt.axis("off")
AUTOTUNE = tf.data.experimental.AUTOTUNE
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
num_classes = 2
model = models.Sequential([
layers.experimental.preprocessing.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
layers.Conv2D(16, (3, 3), activation='relu', input_shape=(img_height, img_width, 3)), # 卷积层1,卷积核3*3
layers.AveragePooling2D((2, 2)), # 池化层1,2*2采样
layers.Conv2D(32, (3, 3), activation='relu'), # 卷积层2,卷积核3*3
layers.AveragePooling2D((2, 2)), # 池化层2,2*2采样
layers.Dropout(0.3),
layers.Conv2D(64, (3, 3), activation='relu'), # 卷积层3,卷积核3*3
layers.Dropout(0.3),
layers.Flatten(), # Flatten层,连接卷积层与全连接层
layers.Dense(128, activation='relu'), # 全连接层,特征进一步提取
layers.Dense(num_classes) # 输出层,输出预期结果
])
model.summary() # 打印网络结构
打印出来的结果是:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
rescaling (Rescaling) (None, 224, 224, 3) 0
_________________________________________________________________
conv2d (Conv2D) (None, 222, 222, 16) 448
_________________________________________________________________
average_pooling2d (AveragePo (None, 111, 111, 16) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 109, 109, 32) 4640
_________________________________________________________________
average_pooling2d_1 (Average (None, 54, 54, 32) 0
_________________________________________________________________
dropout (Dropout) (None, 54, 54, 32) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 52, 52, 64) 18496
_________________________________________________________________
dropout_1 (Dropout) (None, 52, 52, 64) 0
_________________________________________________________________
flatten (Flatten) (None, 173056) 0
_________________________________________________________________
dense (Dense) (None, 128) 22151296
_________________________________________________________________
dense_1 (Dense) (None, 2) 258
=================================================================
Total params: 22,175,138
Trainable params: 22,175,138
Non-trainable params: 0
_________________________________________________________________
到这里,前面的步骤跟我们前面的几篇博客都是类似操作,在这里我就不再做过多说明。
前面的都类似,接下来我们将学习新的内容,在上一期博客中有一个拔高选项那就是设置动态学习率,本期我们就来了解学习一下什么是动态学习率以及如何设置动态学习率。
学习率是一个超参数,它控制每次更新模型权重时响应估计误差改变模型的程度。选择学习率具有挑战性,因为值太小可能会导致训练过程过长,可能会卡住,而值太大可能会导致学习一组次优的权重太快或训练过程不稳定。
学习率是配置神经网络时最重要的超参数。因此,了解如何研究学习率对模型性能的影响以及建立关于学习率对模型行为的动态的直觉至关重要。
这里我们给出一张表来对比一下学习率大小的影响:
学习率大 | 学习率小 | |
---|---|---|
学习速度 | 快 | 慢 |
使用时间点 | 刚开始训练时 | 一定轮数过后 |
优点 | 加快学习速率,有助于跳出局部最优值 | 有助于模型收敛、模型细化,提高模型精度 |
缺点 | 导致模型训练不收敛,单单使用大学习率容易导致模型不精确 | 很难跳出局部最优值,收敛缓慢 |
在训练过程中,一般根据训练轮数设置动态变化的学习率。
注意:这里设置的动态学习率为:指数衰减型(ExponentialDecay)。在每一个epoch开始前,学习率(learning_rate)都将会重置为初始学习率(initial_learning_rate),然后再重新开始衰减。计算公式如下:
learning_rate = initial_learning_rate * decay ^ (step / decay_steps)
网上有很多有关动态学习率的介绍,大家可以去看看。
# 设置初始学习率
initial_learning_rate = 0.1
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
initial_learning_rate,
decay_steps=10, # 敲黑板!!!这里是指 steps,不是指epochs
decay_rate=0.92, # lr经过一次衰减就会变成 decay_rate*lr
staircase=True)
# 将指数衰减学习率送入优化器
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
model.compile(optimizer=optimizer,
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
另外我们再了解学习一下有关早停的相关知识。
这里我们来了解一下其中的早停策略之一EarlyStopping
EarlyStopping是用于提前停止训练的callbacks,callbacks用于指定在每个epoch开始和结束的时候进行哪种特定操作。简而言之,就是可以达到当测试集上的loss不再减小(即减小的程度小于某个阈值)的时候停止继续训练。
EarlyStopping相关原理就是将数据分为训练集和测试集,每个epoch结束后(或每N个epoch后): 在测试集上获取测试结果,随着epoch的增加,如果在测试集上发现测试误差上升,则停止训练,将停止之后的权重作为网络的最终参数。
它的函数模型是:
tf.keras.callbacks.EarlyStopping(
monitor="acc",
min_delta=0,
patience=0,
verbose=0,
mode="max",
baseline=None,
restore_best_weights=False,
)
其中的参数说明:
参数 | 说明 |
---|---|
monitor | 监控的数据接口,有’acc’,’val_acc’,’loss’,’val_loss’等等。正常情况下如果有验证集,就用’val_acc’或者’val_loss’ |
min_delta | 就’auto’, ‘min’, ‘,max’三个可能。如果知道是要上升还是下降,建议设置一下。被监测数量的最小变化有资格作为改进,即小于 min_delta 的绝对变化,将被视为没有改进。 |
patience | 训练停止后没有改善的 epoch 数。 |
verbose | 详细模式,0 或 1。模式 0 是静默的,模式 1 在回调执行操作时显示消息。 |
mode | 其中之一{"auto", "min", "max"} 。模式下,当min 监测的数量停止减少时,训练将停止;在"max" 模式下,当监控的数量停止增加时,它会停止;在"auto" mode 下,方向会自动从监控量的名称中推断出来。 |
baseline | 监控数量的基线值。如果模型没有显示出对基线的改进,则训练将停止。patience的大小和learning rate直接相关。在learning rate设定的情况下,前期先训练几次观察抖动的epoch number,patience设置的值应当稍大于epoch number。在learning rate变化的情况下,建议要略小于最大的抖动epoch number。 |
restore_best_weights | 是否从监测量的最佳值的epoch恢复模型权重。如果为 False,则使用在训练的最后一步获得的模型权重。无论相对于baseline . 如果 epoch 没有改进baseline ,训练将运行patience epoch 并从该集合中的最佳 epoch 恢复权重。 |
现在我们来设置一下EarlyStopping()函数:
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
epochs = 50
# 保存最佳模型参数
checkpointer = ModelCheckpoint('best_model.h5',
monitor='val_accuracy',
verbose=1,
save_best_only=True,
save_weights_only=True)
# 设置早停
earlystopper = EarlyStopping(monitor='val_accuracy',
min_delta=0.001,
patience=20,
verbose=1)
history = model.fit(train_ds,
validation_data=val_ds,
epochs=epochs,
callbacks=[checkpointer, earlystopper])
训练的结果是:
Epoch 1/50
16/16 [==============================] - ETA: 0s - loss: 215608.0156 - accuracy: 0.4920
Epoch 00001: val_accuracy improved from -inf to 0.50000, saving model to best_model.h5
16/16 [==============================] - 7s 409ms/step - loss: 215608.0156 - accuracy: 0.4920 - val_loss: 0.7101 - val_accuracy: 0.5000
Epoch 2/50
16/16 [==============================] - ETA: 0s - loss: 0.7026 - accuracy: 0.4920
Epoch 00002: val_accuracy did not improve from 0.50000
16/16 [==============================] - 6s 382ms/step - loss: 0.7026 - accuracy: 0.4920 - val_loss: 0.6959 - val_accuracy: 0.5000
Epoch 3/50
16/16 [==============================] - ETA: 0s - loss: 0.6976 - accuracy: 0.4781
Epoch 00003: val_accuracy did not improve from 0.50000
16/16 [==============================] - 6s 380ms/step - loss: 0.6976 - accuracy: 0.4781 - val_loss: 0.6999 - val_accuracy: 0.5000
Epoch 4/50
16/16 [==============================] - ETA: 0s - loss: 0.6957 - accuracy: 0.5000
Epoch 00004: val_accuracy did not improve from 0.50000
16/16 [==============================] - 6s 386ms/step - loss: 0.6957 - accuracy: 0.5000 - val_loss: 0.6931 - val_accuracy: 0.5000
Epoch 5/50
16/16 [==============================] - ETA: 0s - loss: 0.6962 - accuracy: 0.4801
Epoch 00005: val_accuracy did not improve from 0.50000
16/16 [==============================] - 6s 382ms/step - loss: 0.6962 - accuracy: 0.4801 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 6/50
16/16 [==============================] - ETA: 0s - loss: 0.6942 - accuracy: 0.5000
Epoch 00006: val_accuracy did not improve from 0.50000
16/16 [==============================] - 6s 389ms/step - loss: 0.6942 - accuracy: 0.5000 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 7/50
16/16 [==============================] - ETA: 0s - loss: 0.6962 - accuracy: 0.5000
Epoch 00007: val_accuracy did not improve from 0.50000
16/16 [==============================] - 6s 384ms/step - loss: 0.6962 - accuracy: 0.5000 - val_loss: 0.6936 - val_accuracy: 0.5000
Epoch 8/50
16/16 [==============================] - ETA: 0s - loss: 0.6943 - accuracy: 0.5120
Epoch 00008: val_accuracy did not improve from 0.50000
16/16 [==============================] - 6s 379ms/step - loss: 0.6943 - accuracy: 0.5120 - val_loss: 0.6943 - val_accuracy: 0.5000
Epoch 9/50
16/16 [==============================] - ETA: 0s - loss: 0.6947 - accuracy: 0.4681
Epoch 00009: val_accuracy did not improve from 0.50000
16/16 [==============================] - 6s 385ms/step - loss: 0.6947 - accuracy: 0.4681 - val_loss: 0.6933 - val_accuracy: 0.5000
Epoch 10/50
16/16 [==============================] - ETA: 0s - loss: 0.6940 - accuracy: 0.4641
Epoch 00010: val_accuracy did not improve from 0.50000
16/16 [==============================] - 6s 381ms/step - loss: 0.6940 - accuracy: 0.4641 - val_loss: 0.6931 - val_accuracy: 0.5000
Epoch 11/50
16/16 [==============================] - ETA: 0s - loss: 0.6936 - accuracy: 0.4721
Epoch 00011: val_accuracy did not improve from 0.50000
16/16 [==============================] - 7s 422ms/step - loss: 0.6936 - accuracy: 0.4721 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 12/50
16/16 [==============================] - ETA: 0s - loss: 0.6934 - accuracy: 0.4801
Epoch 00012: val_accuracy did not improve from 0.50000
16/16 [==============================] - 6s 403ms/step - loss: 0.6934 - accuracy: 0.4801 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 13/50
16/16 [==============================] - ETA: 0s - loss: 0.6938 - accuracy: 0.4880
Epoch 00013: val_accuracy did not improve from 0.50000
16/16 [==============================] - 6s 396ms/step - loss: 0.6938 - accuracy: 0.4880 - val_loss: 0.6933 - val_accuracy: 0.5000
Epoch 14/50
16/16 [==============================] - ETA: 0s - loss: 0.6922 - accuracy: 0.5000
Epoch 00014: val_accuracy did not improve from 0.50000
16/16 [==============================] - 6s 381ms/step - loss: 0.6922 - accuracy: 0.5000 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 15/50
16/16 [==============================] - ETA: 0s - loss: 0.6936 - accuracy: 0.4920
Epoch 00015: val_accuracy did not improve from 0.50000
16/16 [==============================] - 6s 405ms/step - loss: 0.6936 - accuracy: 0.4920 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 16/50
16/16 [==============================] - ETA: 0s - loss: 0.7512 - accuracy: 0.5000
Epoch 00016: val_accuracy did not improve from 0.50000
16/16 [==============================] - 7s 420ms/step - loss: 0.7512 - accuracy: 0.5000 - val_loss: 0.6933 - val_accuracy: 0.5000
Epoch 17/50
16/16 [==============================] - ETA: 0s - loss: 0.6930 - accuracy: 0.5040
Epoch 00017: val_accuracy did not improve from 0.50000
16/16 [==============================] - 6s 402ms/step - loss: 0.6930 - accuracy: 0.5040 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 18/50
16/16 [==============================] - ETA: 0s - loss: 0.6933 - accuracy: 0.5000
Epoch 00018: val_accuracy did not improve from 0.50000
16/16 [==============================] - 6s 373ms/step - loss: 0.6933 - accuracy: 0.5000 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 19/50
16/16 [==============================] - ETA: 0s - loss: 0.6936 - accuracy: 0.4482
Epoch 00019: val_accuracy did not improve from 0.50000
16/16 [==============================] - 6s 380ms/step - loss: 0.6936 - accuracy: 0.4482 - val_loss: 0.6931 - val_accuracy: 0.5000
Epoch 20/50
16/16 [==============================] - ETA: 0s - loss: 0.6935 - accuracy: 0.4880
Epoch 00020: val_accuracy did not improve from 0.50000
16/16 [==============================] - 6s 369ms/step - loss: 0.6935 - accuracy: 0.4880 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 21/50
16/16 [==============================] - ETA: 0s - loss: 0.6933 - accuracy: 0.5000
Epoch 00021: val_accuracy did not improve from 0.50000
16/16 [==============================] - 6s 382ms/step - loss: 0.6933 - accuracy: 0.5000 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 00021: early stopping
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(len(loss))
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
这里我们把epochs调到100再试试:
# 加载效果最好的模型权重
model.load_weights('best_model.h5')
from PIL import Image
import numpy as np
img = Image.open("E:/深度学习/data/Day14/test/adidas\9.jpg") #这里选择你需要预测的图片
image = tf.image.resize(img, [img_height, img_width])
img_array = tf.expand_dims(image, 0) #/255.0 # 记得做归一化处理(与训练集处理方式保持一致)
predictions = model.predict(img_array) # 这里选用你已经训练好的模型
print("预测结果为:",class_names[np.argmax(predictions)])
预测的结果是:
预测结果为:adidas
val_accuracy: 0.6842
可以发现准确率提高了,K导设置的学习率偏高,应该设置低一点的学习率。
val_accuracy: 0.7368
可以发现准确率提高了一点。
val_accuracy: 0.8421
可以发现这次准确率比前几次提高了不少。
通过后续的参数调整,我们不难发现影响本次模型准确率的关键因素就在于学习率、衰变率和衰变步数的设置,加上后续引入的早停,可以达到当测试集上的loss不再减小的时候停止继续训练,大大节省了模型训练的时间。
学习率和早停的设置在今后的深度学习中会经常遇见,我们需要去了解并掌握相关使用方法,学会调整参数并优化自己的模型非常重要。