• [AI] Neural network optimization: complexity, learning rate, activation functions, loss functions, mitigating overfitting, optimizers...


    Prerequisites

    1. import tensorflow as tf
    2. import numpy as np
    3. import matplotlib.pyplot as plt
    4. import random
    5. import pandas as pd
    6. plt.rcParams['font.sans-serif']=['SimHei'] # display Chinese labels correctly
    7. plt.rcParams['axes.unicode_minus']=False # display the minus sign correctly

    tf.where()

    tf.where(condition, value_if_true, value_if_false)

    1. a = tf.constant([1,2,3,1,1])
    2. b = tf.constant([0,1,3,4,5])
    3. c = tf.where(tf.greater(a,b),a,b) # where a > b, take the element from a, otherwise take the element from b
    4. print(c)
    tf.Tensor([1 2 3 4 5], shape=(5,), dtype=int32)

    np.random.RandomState.rand()

    Returns random numbers in [0, 1)
    np.random.RandomState.rand(dimensions)

    If the dimension argument is left empty, a scalar in [0, 1) is returned

    1. rdm = np.random.RandomState(seed = 1) # a fixed seed makes the random numbers the same on every run
    2. a = rdm.rand() # returns a random scalar
    3. b = rdm.rand(2,3) # returns a 2x3 matrix of random numbers (2 elements in the first dimension, 3 in the second)
    4. print("a:",a)
    5. print("b:",b)
    1. a: 0.417022004702574
    2. b: [[7.20324493e-01 1.14374817e-04 3.02332573e-01]
    3. [1.46755891e-01 9.23385948e-02 1.86260211e-01]]

    np.vstack()

    Stacks two arrays vertically

    np.vstack((array1, array2))

    1. a = np.array([1,2,3])
    2. b = np.array([3,3,3])
    3. c = np.vstack((a,b))
    4. print(c)
    1. [[1 2 3]
    2. [3 3 3]]

    np.mgrid[] .ravel() np.c_[]

    Used together, these three generate grid coordinate points

    np.mgrid[]

    np.mgrid[start:stop:step, start:stop:step, ...]

    x.ravel()

    Flattens x into a one-dimensional array

    np.c_[]

    np.c_[array1, array2, ...]
    Pairs up the evenly spaced points returned above

    1. # generate evenly spaced points
    2. x, y = np.mgrid[1:3:1, 2:4:0.5]
    3. # flatten x and y, then pair them column-wise into a 2-D tensor of coordinate points
    4. grid = np.c_[x.ravel(), y.ravel()]
    5. print("x:\n", x)
    6. print("y:\n", y)
    7. print("x.ravel():\n", x.ravel())
    8. print("y.ravel():\n", y.ravel())
    9. print('grid:\n', grid)
    1. x:
    2. [[1. 1. 1. 1.]
    3. [2. 2. 2. 2.]]
    4. y:
    5. [[2. 2.5 3. 3.5]
    6. [2. 2.5 3. 3.5]]
    7. x.ravel():
    8. [1. 1. 1. 1. 2. 2. 2. 2.]
    9. y.ravel():
    10. [2. 2.5 3. 3.5 2. 2.5 3. 3.5]
    11. grid:
    12. [[1. 2. ]
    13. [1. 2.5]
    14. [1. 3. ]
    15. [1. 3.5]
    16. [2. 2. ]
    17. [2. 2.5]
    18. [2. 3. ]
    19. [2. 3.5]]

    Neural network (NN) complexity

    NN complexity is usually measured by the number of layers and the number of parameters

    • Space complexity
      • layers = number of hidden layers + 1 output layer
      • total parameters = all w + all b
    • Time complexity
      • number of multiply-accumulate operations

    [Figure: a two-layer network whose first layer has a 3x4 weight matrix and whose second layer has a 4x2 weight matrix]

    In the figure above, the first layer has a 3x4 weight matrix w plus 4 biases b, and the second layer has a 4x2 weight matrix w plus 2 biases b: 26 parameters in total (12 + 4 + 8 + 2).
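
    As a quick cross-check (not in the original post), a minimal Keras sketch with the same layer sizes reproduces the 26-parameter count:

    import tensorflow as tf
    # 3 inputs -> 4 hidden units -> 2 outputs, matching the figure
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(4, input_shape=(3,)),  # 3*4 weights + 4 biases = 16
        tf.keras.layers.Dense(2),                    # 4*2 weights + 2 biases = 10
    ])
    print(model.count_params())  # 26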

    Learning rate

    imags

    With learning rate lr = 0.001 the parameter w is updated too slowly; with lr = 0.999 the parameter w does not converge. So what learning rate is appropriate?

    In practice, you can start with a relatively large learning rate to reach a good region quickly, then gradually shrink it so that the model settles near the optimum and stays stable late in training. This is the idea behind an exponentially decaying learning rate.

    Exponentially decaying learning rate

    decayed learning rate = initial learning rate * decay rate ^ (current epoch / decay steps)

    For example, with an initial rate of 0.2 and a decay rate of 0.99 applied every epoch, the rate after 40 epochs is 0.2 * 0.99^40 ≈ 0.134.

    1. w = tf.Variable(tf.constant(5, dtype=tf.float32))
    2. # lr = 0.2
    3. epoch = 40
    4. LR_BASE = 0.2
    5. LR_DECAY = 0.99
    6. LR_STEP = 1 # 决定更新频率
    7. epoch_all=[]
    8. lr_all = []
    9. w_numpy_all=[]
    10. loss_all=[]
    11. for epoch in range(epoch): # for epoch 定义顶层循环,表示对数据集循环epoch次,此例数据集数据仅有1个w,初始化时候constant赋值为5,循环40次迭代。
    12. lr = LR_BASE * LR_DECAY **(epoch/LR_STEP) # 根据当前迭代次数,动态改变学习率的值
    13. lr_all.append(lr)
    14. with tf.GradientTape() as tape: # with结构到grads框起了梯度的计算过程。
    15. loss = tf.square(w + 1)
    16. grads = tape.gradient(loss, w) # .gradient函数告知谁对谁求导
    17. w.assign_sub(lr * grads) # .assign_sub 对变量做自减 即:w -= lr*grads 即 w = w - lr*grads
    18. print("After %s epoch,w is %f,loss is %f" % (epoch, w.numpy(), loss))
    19. epoch_all.append(epoch)
    20. w_numpy_all.append(w.numpy())
    21. loss_all.append(loss)
    22. fig,axes = plt.subplots(nrows=1,ncols=3,figsize=(10,5),dpi=300)
    23. axes[0].plot(epoch_all,lr_all,color="g",linestyle="-",label="学习率") # 绘画
    24. axes[0].plot(epoch_all,w_numpy_all,color="k",linestyle="-.",label="参数") # 绘画
    25. axes[0].plot(epoch_all,loss_all,color="b",linestyle="--",label="损失率") # 绘画
    26. axes[0].set_title("指数衰减学习率")
    27. axes[0].set_xlabel("epoch")
    28. axes[0].set_ylabel("data")
    29. axes[0].legend(loc="upper right")# 显示图例必须在绘制时设置好
    30. axes[1].plot(epoch_all,lr_all,color="g",linestyle="-",label="学习率") # 绘画
    31. axes[1].plot(epoch_all,w_numpy_all,color="k",linestyle="-.",label="参数") # 绘画
    32. axes[1].set_title("指数衰减学习率")
    33. axes[1].set_xlabel("epoch")
    34. axes[1].set_ylabel("data")
    35. axes[1].legend(loc="upper right")# 显示图例必须在绘制时设置好
    36. axes[2].plot(epoch_all,lr_all,color="g",linestyle="-",label="学习率") # 绘画
    37. axes[2].set_title("指数衰减学习率")
    38. axes[2].set_xlabel("epoch")
    39. axes[2].set_ylabel("data")
    40. axes[2].legend(loc="upper right")# 显示图例必须在绘制时设置好
    41. plt.show()
    42. # lr初始值:0.2 请自改学习率 0.001 0.999 看收敛过程
    43. # 最终目的:找到 loss 最小 即 w = -1 的最优参数w
    1. After 0 epoch,w is 2.600000,loss is 36.000000
    2. After 1 epoch,w is 1.174400,loss is 12.959999
    3. After 2 epoch,w is 0.321948,loss is 4.728015
    4. After 3 epoch,w is -0.191126,loss is 1.747547
    5. After 4 epoch,w is -0.501926,loss is 0.654277
    6. After 5 epoch,w is -0.691392,loss is 0.248077
    7. After 6 epoch,w is -0.807611,loss is 0.095239
    8. After 7 epoch,w is -0.879339,loss is 0.037014
    9. After 8 epoch,w is -0.923874,loss is 0.014559
    10. After 9 epoch,w is -0.951691,loss is 0.005795
    11. After 10 epoch,w is -0.969167,loss is 0.002334
    12. After 11 epoch,w is -0.980209,loss is 0.000951
    13. After 12 epoch,w is -0.987226,loss is 0.000392
    14. After 13 epoch,w is -0.991710,loss is 0.000163
    15. After 14 epoch,w is -0.994591,loss is 0.000069
    16. After 15 epoch,w is -0.996452,loss is 0.000029
    17. After 16 epoch,w is -0.997660,loss is 0.000013
    18. After 17 epoch,w is -0.998449,loss is 0.000005
    19. After 18 epoch,w is -0.998967,loss is 0.000002
    20. After 19 epoch,w is -0.999308,loss is 0.000001
    21. After 20 epoch,w is -0.999535,loss is 0.000000
    22. After 21 epoch,w is -0.999685,loss is 0.000000
    23. After 22 epoch,w is -0.999786,loss is 0.000000
    24. After 23 epoch,w is -0.999854,loss is 0.000000
    25. After 24 epoch,w is -0.999900,loss is 0.000000
    26. After 25 epoch,w is -0.999931,loss is 0.000000
    27. After 26 epoch,w is -0.999952,loss is 0.000000
    28. After 27 epoch,w is -0.999967,loss is 0.000000
    29. After 28 epoch,w is -0.999977,loss is 0.000000
    30. After 29 epoch,w is -0.999984,loss is 0.000000
    31. After 30 epoch,w is -0.999989,loss is 0.000000
    32. After 31 epoch,w is -0.999992,loss is 0.000000
    33. After 32 epoch,w is -0.999994,loss is 0.000000
    34. After 33 epoch,w is -0.999996,loss is 0.000000
    35. After 34 epoch,w is -0.999997,loss is 0.000000
    36. After 35 epoch,w is -0.999998,loss is 0.000000
    37. After 36 epoch,w is -0.999999,loss is 0.000000
    38. After 37 epoch,w is -0.999999,loss is 0.000000
    39. After 38 epoch,w is -0.999999,loss is 0.000000
    40. After 39 epoch,w is -0.999999,loss is 0.000000

    png

    Activation functions

    imags

    • Linear function
      • y = x * w + b
    • Nonlinear function (MP model)
      • y = f(x * w + b)

      where f is the activation function

    A good activation function has the following properties

    • Nonlinearity: with a nonlinear activation, a multi-layer network can approximate almost any function
    • Differentiability: most optimizers update parameters by gradient descent (if the activation is not differentiable, the parameters cannot be updated)
    • Monotonicity: a monotonic activation guarantees that the loss of a single-layer network is convex
    • Approximate identity: f(x) ≈ x, so the network is more stable when parameters are initialized to small random values

    Output range of the activation function

    • If the output is bounded, gradient-based optimization is more stable
    • If the output is unbounded, it is advisable to use a smaller learning rate

    Common activation functions: the Sigmoid function

    tf.nn.sigmoid(x)

    imags

    Characteristics:

    (1) Prone to vanishing gradients

    (2) The output is not zero-mean, which slows convergence

    (3) The power (exponential) operations are expensive, so training takes longer
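
    The vanishing-gradient point can be verified with a minimal sketch (not from the original post): the derivative of sigmoid is sigmoid(x) * (1 - sigmoid(x)), which never exceeds 0.25, and backpropagation multiplies one such factor per layer, so the gradients of early layers shrink toward zero.

    import tensorflow as tf
    x = tf.Variable([-5.0, 0.0, 5.0])
    with tf.GradientTape() as tape:
        y = tf.nn.sigmoid(x)
    print(tape.gradient(y, x))  # ~[0.0066, 0.25, 0.0066], never above 0.25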

    Common activation functions: the Tanh function

    imags
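
    For reference, tanh is available in TensorFlow as tf.math.tanh(x). Its output is zero-centered, which helps convergence compared with sigmoid, but it still saturates at both ends (so gradients can vanish) and it also relies on exponential operations.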

    Common activation functions: the ReLU function

    imags

    Note: avoid feeding too many negative features into relu, otherwise neurons may die (their output and gradient stay at 0)

    • Improve the random initialization
    • Use a smaller learning rate, so the parameter distribution does not change too drastically

    Common activation functions: the Leaky ReLU function

    Leaky ReLU was designed to fix the dead-neuron problem caused by relu being exactly 0 on the negative axis

    imags
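
    In TensorFlow, Leaky ReLU is available as tf.nn.leaky_relu(x, alpha=0.2), where alpha is the slope applied to negative inputs (0.2 is the library default; the 0.01 below is only an illustration):

    import tensorflow as tf
    print(tf.nn.leaky_relu(tf.constant([-2.0, 3.0]), alpha=0.01))  # [-0.02  3.  ]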

    Loss functions

    The loss function measures the gap between the prediction y and the ground-truth label y_

    Three common ways to compute the loss

    • [Mean squared error] mse (Mean Squared Error)
      • the mean of the squared differences between the forward-pass output y and the ground truth y_
      • MSE(y_, y) = (1/n) * Σ (y - y_)^2
    • Custom loss
    • [Cross entropy] ce (Cross Entropy)

    Predicting daily yogurt sales with mean squared error

    Suppose the daily yogurt sales are y, and x1, x2 are the features that influence the sales.

    Before modelling, the data to collect are

    • x1 and x2 for each day
    • the sales y_

    Fabricate a dataset X, Y_ with y_ = x1 + x2 plus noise in -0.05 ~ +0.05, and fit a function that predicts the sales

    1. SEED = 23455
    2. rdm = np.random.RandomState(seed=SEED) # 生成[0,1)之间的随机数 此时为了方便调试因此写固定值,实际应用不写seed的
    3. x = rdm.rand(32, 2)
    4. y_ = [[x1 + x2 + (rdm.rand() / 10.0 - 0.05)] for (x1, x2) in x] # 生成噪声[0,1)/10=[0,0.1); [0,0.1)-0.05=[-0.05,0.05)
    5. x = tf.cast(x, dtype=tf.float32)
    6. w1 = tf.Variable(tf.random.normal([2, 1], stddev=1, seed=1)) # 两行一列
    7. epoch = 15000 # 迭代次数
    8. lr = 0.002 # 学习率
    9. epoch_all=[]
    10. w0_numpy_all = []
    11. w1_numpy_all = []
    12. for epoch in range(epoch):
    13. with tf.GradientTape() as tape:
    14. y = tf.matmul(x, w1)
    15. loss_mse = tf.reduce_mean(tf.square(y_ - y))
    16. grads = tape.gradient(loss_mse, w1) # 损失函数对待训练参数w1求偏导
    17. w1.assign_sub(lr * grads)
    18. if epoch % 500 == 0: # 每迭代500次记录一次数据
    19. # print("After %d training steps,w1 is " % (epoch))
    20. epoch_all.append(epoch)
    21. w0_numpy_all.append(w1.numpy()[0])
    22. w1_numpy_all.append(w1.numpy()[1])
    23. # print(w1.numpy(), "\n")
    24. plt.figure(figsize=(10,5),dpi=360)
    25. plt.plot(epoch_all,w0_numpy_all,color="g",linestyle="-",label="x1 标准答案") # 绘画
    26. plt.plot(epoch_all,w1_numpy_all,color="k",linestyle="-.",label="x2 预测答案") # 绘画
    27. plt.title("预测酸奶日销量")
    28. plt.xlabel("epoch")
    29. plt.ylabel("data")
    30. plt.legend(loc="upper right")# 显示图例必须在绘制时设置好
    31. plt.show()
    32. # print("Final w1 is: ", w1.numpy())
    33. # 最后得到的结果是趋近于1的

    png

    In the figure above the result y = 1.00 * x1 + 1.00 * x2 matches the data-generating formula y = 1 * x1 + 1 * x2, so the fitted sales predictor is correct

    With this loss, over-predicting wastes cost and under-predicting loses profit.

    If profit ≠ cost, a loss based on mse cannot maximize profit

    To maximize profit, we introduce a custom loss function

    Custom loss function

    loss(y_, y) = Σ f(y_, y), where f(y_, y) = COST * (y - y_) if y > y_, and PROFIT * (y_ - y) otherwise

    In the formula above, PROFIT is the profit per unit and COST is the cost per unit

    If the prediction y > the ground truth y_, the model predicted too much and loses cost, so the penalty should be ( y - y_ ) * COST
    If the prediction y < the ground truth y_, the model predicted too little and loses profit, so the penalty should be ( y_ - y ) * PROFIT

    Predicting sales with the custom loss function

    loss_zdy = tf.reduce_sum(tf.where( tf.greater(y,y_) , COST*(y-y_) , PROFIT*(y_-y) ))
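
    A quick numeric check of why this loss pushes predictions upward, using the COST and PROFIT values from the code below (the sales value 10.0 is only an illustration):

    y_true, COST, PROFIT = 10.0, 1.0, 99.0
    print((11.0 - y_true) * COST)    # 1.0  -> penalty for over-predicting by one unit
    print((y_true - 9.0) * PROFIT)   # 99.0 -> penalty for under-predicting by one unit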

    1. import tensorflow as tf
    2. import numpy as np
    3. SEED = 23455
    4. COST = 1 # cost per unit
    5. PROFIT = 99 # profit per unit
    6. rdm = np.random.RandomState(SEED)
    7. x = rdm.rand(32, 2)
    8. y_ = [[x1 + x2 + (rdm.rand() / 10.0 - 0.05)] for (x1, x2) in x] # 生成噪声[0,1)/10=[0,0.1); [0,0.1)-0.05=[-0.05,0.05)
    9. x = tf.cast(x, dtype=tf.float32)
    10. w1 = tf.Variable(tf.random.normal([2, 1], stddev=1, seed=1))
    11. epoch = 10000
    12. lr = 0.002
    13. w0_numpy_all = []
    14. w1_numpy_all = []
    15. epoch_all = []
    16. for epoch in range(epoch):
    17. with tf.GradientTape() as tape:
    18. y = tf.matmul(x, w1)
    19. loss = tf.reduce_sum(tf.where(tf.greater(y, y_), (y - y_) * COST, (y_ - y) * PROFIT))
    20. grads = tape.gradient(loss, w1)
    21. w1.assign_sub(lr * grads)
    22. if epoch % 500 == 0:
    23. # print("After %d training steps,w1 is " % (epoch))
    24. # print(w1.numpy()[1], "\n")
    25. epoch_all.append(epoch)
    26. w0_numpy_all.append(w1.numpy()[0])
    27. w1_numpy_all.append(w1.numpy()[1])
    28. # print("Final w1 is: ", w1.numpy())
    29. plt.figure(figsize=(10,5),dpi=360)
    30. plt.plot(epoch_all,w0_numpy_all,color="g",linestyle="-",label="x1 标准答案") # 绘画
    31. plt.plot(epoch_all,w1_numpy_all,color="k",linestyle="-.",label="x2 预测答案") # 绘画
    32. plt.title("预测酸奶日销量")
    33. plt.xlabel("epoch")
    34. plt.ylabel("data")
    35. plt.legend(loc="upper right")# 显示图例必须在绘制时设置好
    36. plt.show()
    37. # custom loss function
    38. # yogurt cost 1 yuan, yogurt profit 99 yuan
    39. # cost is low and profit is high, so we prefer to over-predict; the learned coefficients end up above 1

    png

    The figure shows that with the custom loss the learned coefficients are larger than with mse: the model deliberately predicts on the high side (over-predicting only loses cost, under-predicting loses profit)

    Next, change the cost to 99 and the profit to 1, as follows:

    1. import tensorflow as tf
    2. import numpy as np
    3. SEED = 23455
    4. COST = 99 # cost per unit
    5. PROFIT = 1 # profit per unit
    6. rdm = np.random.RandomState(SEED)
    7. x = rdm.rand(32, 2)
    8. y_ = [[x1 + x2 + (rdm.rand() / 10.0 - 0.05)] for (x1, x2) in x] # 生成噪声[0,1)/10=[0,0.1); [0,0.1)-0.05=[-0.05,0.05)
    9. x = tf.cast(x, dtype=tf.float32)
    10. w1 = tf.Variable(tf.random.normal([2, 1], stddev=1, seed=1))
    11. epoch = 10000
    12. lr = 0.002
    13. w0_numpy_all = []
    14. w1_numpy_all = []
    15. epoch_all = []
    16. for epoch in range(epoch):
    17. with tf.GradientTape() as tape:
    18. y = tf.matmul(x, w1)
    19. loss = tf.reduce_sum(tf.where(tf.greater(y, y_), (y - y_) * COST, (y_ - y) * PROFIT))
    20. grads = tape.gradient(loss, w1)
    21. w1.assign_sub(lr * grads)
    22. if epoch % 500 == 0:
    23. # print("After %d training steps,w1 is " % (epoch))
    24. # print(w1.numpy()[1], "\n")
    25. epoch_all.append(epoch)
    26. w0_numpy_all.append(w1.numpy()[0])
    27. w1_numpy_all.append(w1.numpy()[1])
    28. # print("Final w1 is: ", w1.numpy())
    29. plt.figure(figsize=(10,5),dpi=360)
    30. plt.plot(epoch_all,w0_numpy_all,color="g",linestyle="-",label="x1 标准答案") # 绘画
    31. plt.plot(epoch_all,w1_numpy_all,color="k",linestyle="-.",label="x2 预测答案") # 绘画
    32. plt.title("预测酸奶日销量")
    33. plt.xlabel("epoch")
    34. plt.ylabel("data")
    35. plt.legend(loc="upper right")# 显示图例必须在绘制时设置好
    36. plt.show()
    37. # custom loss function
    38. # yogurt cost 99 yuan, yogurt profit 1 yuan
    39. # cost is high and profit is low, so the model prefers to predict less; the learned coefficients end up below 1

    png

    The figure shows that the model now tends to predict on the low side

    Cross-entropy loss (Cross Entropy)

    H(y_, y) = -Σ y_ * ln(y). Cross entropy measures the distance between two probability distributions; the smaller it is, the closer the predicted distribution y is to the ground-truth distribution y_.

    In TensorFlow the cross entropy is computed with tf.losses.categorical_crossentropy(y_, y), where y_ is the ground-truth label and y the predicted distribution

    1. loss_ce1 = tf.losses.categorical_crossentropy([1, 0], [0.6, 0.4])
    2. loss_ce2 = tf.losses.categorical_crossentropy([1, 0], [0.8, 0.2])
    3. print("loss_ce1:", loss_ce1)
    4. print("loss_ce2:", loss_ce2)
    5. # 交叉熵损失函数
    1. loss_ce1: tf.Tensor(0.5108256, shape=(), dtype=float32)
    2. loss_ce2: tf.Tensor(0.22314353, shape=(), dtype=float32)
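
    As a sanity check: for the one-hot label (1, 0) the categorical cross entropy reduces to -ln of the probability assigned to the first class, so loss_ce1 = -ln 0.6 ≈ 0.511 and loss_ce2 = -ln 0.8 ≈ 0.223; the prediction (0.8, 0.2) is closer to the label and therefore gets the smaller loss.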

    A function that applies softmax and computes the cross entropy in one step

    tf.nn.softmax_cross_entropy_with_logits(y_, y)

    1. y_ = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]])
    2. y = np.array([[12, 3, 2], [3, 10, 1], [1, 2, 5], [4, 6.5, 1.2], [3, 6, 1]])
    3. ## 分步完成
    4. y_pro = tf.nn.softmax(y)
    5. loss_ce1 = tf.losses.categorical_crossentropy(y_,y_pro)
    6. ## 一步完成
    7. loss_ce2 = tf.nn.softmax_cross_entropy_with_logits(y_, y)
    8. print('分步计算的结果:\n', loss_ce1)
    9. print('结合计算的结果:\n', loss_ce2)
    1. 分步计算的结果:
    2. tf.Tensor(
    3. [1.68795487e-04 1.03475622e-03 6.58839038e-02 2.58349207e+00
    4. 5.49852354e-02], shape=(5,), dtype=float64)
    5. 结合计算的结果:
    6. tf.Tensor(
    7. [1.68795487e-04 1.03475622e-03 6.58839038e-02 2.58349207e+00
    8. 5.49852354e-02], shape=(5,), dtype=float64)

    Mitigating overfitting

    imags

    Underfitting and overfitting

    • Remedies for underfitting

      • add more input features
      • add more network parameters
      • reduce the regularization coefficient
    • Remedies for overfitting

      • clean the data
      • enlarge the training set
      • apply regularization
      • increase the regularization coefficient

    Mitigating overfitting with regularization

    Regularization adds a model-complexity term to the loss function: it puts a penalty on the weights W to weaken the influence of noise in the training data (the biases b are usually not regularized)

    loss = loss(y与y_)+ REGULARIZER * loss(w)

    L1 regularization: loss_L1(w) = Σ |w_i|        L2 regularization: loss_L2(w) = Σ w_i^2

    Choosing a regularizer

    • L1 regularization tends to drive many parameters to exactly 0, so it sparsifies the parameters, lowering complexity by reducing the number of effective parameters.
    • L2 regularization keeps parameters small but nonzero, so it lowers complexity by shrinking the magnitude of the parameters

    The regularization example below uses the following data, saved as dot.csv in the same directory as the code.

    1. x1,x2,y_c
    2. -0.416757847,-0.056266827,1
    3. -2.136196096,1.640270808,0
    4. -1.793435585,-0.841747366,0
    5. 0.502881417,-1.245288087,1
    6. -1.057952219,-0.909007615,1
    7. 0.551454045,2.292208013,0
    8. 0.041539393,-1.117925445,1
    9. 0.539058321,-0.5961597,1
    10. -0.019130497,1.17500122,1
    11. -0.747870949,0.009025251,1
    12. -0.878107893,-0.15643417,1
    13. 0.256570452,-0.988779049,1
    14. -0.338821966,-0.236184031,1
    15. -0.637655012,-1.187612286,1
    16. -1.421217227,-0.153495196,0
    17. -0.26905696,2.231366789,0
    18. -2.434767577,0.112726505,0
    19. 0.370444537,1.359633863,1
    20. 0.501857207,-0.844213704,1
    21. 9.76E-06,0.542352572,1
    22. -0.313508197,0.771011738,1
    23. -1.868090655,1.731184666,0
    24. 1.467678011,-0.335677339,0
    25. 0.61134078,0.047970592,1
    26. -0.829135289,0.087710218,1
    27. 1.000365887,-0.381092518,1
    28. -0.375669423,-0.074470763,1
    29. 0.43349633,1.27837923,1
    30. -0.634679305,0.508396243,1
    31. 0.216116006,-1.858612386,0
    32. -0.419316482,-0.132328898,1
    33. -0.03957024,0.326003433,1
    34. -2.040323049,0.046255523,0
    35. -0.677675577,-1.439439027,0
    36. 0.52429643,0.735279576,1
    37. -0.653250268,0.842456282,1
    38. -0.381516482,0.066489009,1
    39. -1.098738947,1.584487056,0
    40. -2.659449456,-0.091452623,0
    41. 0.695119605,-2.033466546,0
    42. -0.189469265,-0.077218665,1
    43. 0.824703005,1.248212921,0
    44. -0.403892269,-1.384518667,0
    45. 1.367235424,1.217885633,0
    46. -0.462005348,0.350888494,1
    47. 0.381866234,0.566275441,1
    48. 0.204207979,1.406696242,0
    49. -1.737959504,1.040823953,0
    50. 0.38047197,-0.217135269,1
    51. 1.173531498,-2.343603191,0
    52. 1.161521491,0.386078048,1
    53. -1.133133274,0.433092555,1
    54. -0.304086439,2.585294868,0
    55. 1.835332723,0.440689872,0
    56. -0.719253841,-0.583414595,1
    57. -0.325049628,-0.560234506,1
    58. -0.902246068,-0.590972275,1
    59. -0.276179492,-0.516883894,1
    60. -0.69858995,-0.928891925,1
    61. 2.550438236,-1.473173248,0
    62. -1.021414731,0.432395701,1
    63. -0.32358007,0.423824708,1
    64. 0.799179995,1.262613663,0
    65. 0.751964849,-0.993760983,1
    66. 1.109143281,-1.764917728,0
    67. -0.114421297,-0.498174194,1
    68. -1.060799036,0.591666521,1
    69. -0.183256574,1.019854729,1
    70. -1.482465478,0.846311892,0
    71. 0.497940148,0.126504175,1
    72. -1.418810551,-0.251774118,0
    73. -1.546674611,-2.082651936,0
    74. 3.279745401,0.97086132,0
    75. 1.792592852,-0.429013319,0
    76. 0.69619798,0.697416272,1
    77. 0.601515814,0.003659491,1
    78. -0.228247558,-2.069612263,0
    79. 0.610144086,0.4234969,1
    80. 1.117886733,-0.274242089,1
    81. 1.741812188,-0.447500876,0
    82. -1.255427218,0.938163671,0
    83. -0.46834626,-1.254720307,1
    84. 0.124823646,0.756502143,1
    85. 0.241439629,0.497425649,1
    86. 4.108692624,0.821120877,0
    87. 1.531760316,-1.985845774,0
    88. 0.365053516,0.774082033,1
    89. -0.364479092,-0.875979478,1
    90. 0.396520159,-0.314617436,1
    91. -0.593755583,1.149500568,1
    92. 1.335566168,0.302629336,1
    93. -0.454227855,0.514370717,1
    94. 0.829458431,0.630621967,1
    95. -1.45336435,-0.338017777,0
    96. 0.359133332,0.622220414,1
    97. 0.960781945,0.758370347,1
    98. -1.134318483,-0.707420888,1
    99. -1.221429165,1.804476642,0
    100. 0.180409807,0.553164274,1
    101. 1.033029066,-0.329002435,1
    102. -1.151002944,-0.426522471,1
    103. -0.148147191,1.501436915,0
    104. 0.869598198,-1.087090575,1
    105. 0.664221413,0.734884668,1
    106. -1.061365744,-0.108516824,1
    107. -1.850403974,0.330488064,0
    108. -0.31569321,-1.350002103,1
    109. -0.698170998,0.239951198,1
    110. -0.55294944,0.299526813,1
    111. 0.552663696,-0.840443012,1
    112. -0.31227067,2.144678089,0
    113. 0.121105582,-0.846828752,1
    114. 0.060462449,-1.33858888,1
    115. 1.132746076,0.370304843,1
    116. 1.085806404,0.902179395,1
    117. 0.39029645,0.975509412,1
    118. 0.191573647,-0.662209012,1
    119. -1.023514985,-0.448174823,1
    120. -2.505458132,1.825994457,0
    121. -1.714067411,-0.076639564,0
    122. -1.31756727,-2.025593592,0
    123. -0.082245375,-0.304666585,1
    124. -0.15972413,0.54894656,1
    125. -0.618375485,0.378794466,1
    126. 0.513251444,-0.334844125,1
    127. -0.283519516,0.538424263,1
    128. 0.057250947,0.159088487,1
    129. -2.374402684,0.058519935,0
    130. 0.376545911,-0.135479764,1
    131. 0.335908395,1.904375909,0
    132. 0.085364433,0.665334278,1
    133. -0.849995503,-0.852341797,1
    134. -0.479985112,-1.019649099,1
    135. -0.007601138,-0.933830661,1
    136. -0.174996844,-1.437143432,0
    137. -1.652200291,-0.675661789,0
    138. -1.067067124,-0.652931145,1
    139. -0.61209475,-0.351262461,1
    140. 1.045477988,1.369016024,0
    141. 0.725353259,-0.359474459,1
    142. 1.49695179,-1.531111108,0
    143. -2.023363939,0.267972576,0
    144. -0.002206445,-0.139291883,1
    145. 0.032565469,-1.640560225,0
    146. -1.156699171,1.234034681,0
    147. 1.028184899,-0.721879726,1
    148. 1.933156966,-1.070796326,0
    149. -0.571381608,0.292432067,1
    150. -1.194999895,-0.487930544,1
    151. -0.173071165,-0.395346401,1
    152. 0.870840765,0.592806797,1
    153. -1.099297309,-0.681530644,1
    154. 0.180066685,-0.066931044,1
    155. -0.78774954,0.424753672,1
    156. 0.819885117,-0.631118683,1
    157. 0.789059649,-1.621673803,0
    158. -1.610499259,0.499939764,0
    159. -0.834515207,-0.996959687,1
    160. -0.263388077,-0.677360492,1
    161. 0.327067038,-1.455359445,0
    162. -0.371519124,3.16096597,0
    163. 0.109951013,-1.913523218,0
    164. 0.599820429,0.549384465,1
    165. 1.383781035,0.148349243,1
    166. -0.653541444,1.408833984,0
    167. 0.712061227,-1.800716041,0
    168. 0.747598942,-0.232897001,1
    169. 1.11064528,-0.373338813,1
    170. 0.78614607,0.194168696,1
    171. 0.586204098,-0.020387292,1
    172. -0.414408598,0.067313412,1
    173. 0.631798924,0.417592731,1
    174. 1.615176269,0.425606211,0
    175. 0.635363758,2.102229267,0
    176. 0.066126417,0.535558351,1
    177. -0.603140792,0.041957629,1
    178. 1.641914637,0.311697707,0
    179. 1.4511699,-1.06492788,0
    180. -1.400845455,0.307525527,0
    181. -1.369638673,2.670337245,0
    182. 1.248450298,-1.245726553,0
    183. -0.167168774,-0.57661093,1
    184. 0.416021749,-0.057847263,1
    185. 0.931887358,1.468332133,0
    186. -0.221320943,-1.173155621,1
    187. 0.562669078,-0.164515057,1
    188. 1.144855376,-0.152117687,1
    189. 0.829789046,0.336065952,1
    190. -0.189044051,-0.449328601,1
    191. 0.713524448,2.529734874,0
    192. 0.837615794,-0.131682403,1
    193. 0.707592866,0.114053878,1
    194. -1.280895178,0.309846277,1
    195. 1.548290694,-0.315828043,0
    196. -1.125903781,0.488496666,1
    197. 1.830946657,0.940175993,0
    198. 1.018717047,2.302378289,0
    199. 1.621092978,0.712683273,0
    200. -0.208703629,0.137617991,1
    201. -0.103352168,0.848350567,1
    202. -0.883125561,1.545386826,0
    203. 0.145840073,-0.400106056,1
    204. 0.815206041,-2.074922365,0
    205. -0.834437391,-0.657718447,1
    206. 0.820564332,-0.489157001,1
    207. 1.424967034,-0.446857897,0
    208. 0.521109431,-0.70819438,1
    209. 1.15553059,-0.254530459,1
    210. 0.518924924,-0.492994911,1
    211. -1.086548153,-0.230917497,1
    212. 1.098010039,-1.01787805,0
    213. -1.529391355,-0.307987737,0
    214. 0.780754356,-1.055839639,1
    215. -0.543883381,0.184301739,1
    216. -0.330675843,0.287208202,1
    217. 1.189528137,0.021201548,1
    218. -0.06540968,0.766115904,1
    219. -0.061635085,-0.952897152,1
    220. -1.014463064,-1.115263963,0
    221. 1.912600678,-0.045263203,0
    222. 0.576909718,0.717805695,1
    223. -0.938998998,0.628775807,1
    224. -0.564493432,-2.087807462,0
    225. -0.215050132,-1.075028564,1
    226. -0.337972149,0.343212732,1
    227. 2.28253964,-0.495778848,0
    228. -0.163962832,0.371622161,1
    229. 0.18652152,-0.158429224,1
    230. -1.082929557,-0.95662552,0
    231. -0.183376735,-1.159806896,1
    232. -0.657768362,-1.251448406,1
    233. 1.124482861,-1.497839806,0
    234. 1.902017223,-0.580383038,0
    235. -1.054915674,-1.182757204,0
    236. 0.779480054,1.026597951,1
    237. -0.848666001,0.331539648,1
    238. -0.149591353,-0.2424406,1
    239. 0.151197175,0.765069481,1
    240. -1.916630519,-2.227341292,0
    241. 0.206689897,-0.070876356,1
    242. 0.684759969,-1.707539051,0
    243. -0.986569665,1.543536339,0
    244. -1.310270529,0.363433972,1
    245. -0.794872445,-0.405286267,1
    246. -1.377757931,1.186048676,0
    247. -1.903821143,-1.198140378,0
    248. -0.910065643,1.176454193,0
    249. 0.29921067,0.679267178,1
    250. -0.01766068,0.236040923,1
    251. 0.494035871,1.546277646,0
    252. 0.246857508,-1.468775799,0
    253. 1.147099942,0.095556985,1
    254. -1.107438726,-0.176286141,1
    255. -0.982755667,2.086682727,0
    256. -0.344623671,-2.002079233,0
    257. 0.303234433,-0.829874845,1
    258. 1.288769407,0.134925462,1
    259. -1.778600641,-0.50079149,0
    260. -1.088161569,-0.757855553,1
    261. -0.6437449,-2.008784527,0
    262. 0.196262894,-0.87589637,1
    263. -0.893609209,0.751902355,1
    264. 1.896932244,-0.629079151,0
    265. 1.812085527,-2.056265741,0
    266. 0.562704887,-0.582070757,1
    267. -0.074002975,-0.986496364,1
    268. -0.594722499,-0.314811843,1
    269. -0.346940532,0.411443516,1
    270. 2.326390901,-0.634053128,0
    271. -0.154409962,-1.749288804,0
    272. -2.519579296,1.391162427,0
    273. -1.329346443,-0.745596414,0
    274. 0.02126085,0.910917515,1
    275. 0.315276082,1.866208205,0
    276. -0.182497623,-1.82826634,0
    277. 0.138955717,0.119450165,1
    278. -0.8188992,-0.332639265,1
    279. -0.586387955,1.734516344,0
    280. -0.612751558,-1.393442017,0
    281. 0.279433757,-1.822231268,0
    282. 0.427017458,0.406987749,1
    283. -0.844308241,-0.559820113,1
    284. -0.600520405,1.614873237,0
    285. 0.39495322,-1.203813469,1
    286. -1.247472432,-0.07754625,1
    287. -0.013339751,-0.76832325,1
    288. 0.29123401,-0.197330948,1
    289. 1.07682965,0.437410232,1
    290. -0.093197866,0.135631416,1
    291. -0.882708822,0.884744194,1
    292. 0.383204463,-0.416994149,1
    293. 0.11779655,-0.536685309,1
    294. 2.487184575,-0.451361054,0
    295. 0.518836127,0.364448005,1
    296. -0.798348729,0.005657797,1
    297. -0.320934708,0.24951355,1
    298. 0.256308392,0.767625083,1
    299. 0.783020087,-0.407063047,1
    300. -0.524891667,-0.589808683,1
    301. -0.862531086,-1.742872904,0
    1. # 读入数据/标签 生成x_train y_train
    2. df = pd.read_csv('dot.csv')
    3. x_data = np.array(df[['x1', 'x2']])
    4. y_data = np.array(df['y_c'])
    5. x_train = np.vstack(x_data).reshape(-1, 2)
    6. y_train = np.vstack(y_data).reshape(-1, 1)
    7. Y_c = [['red' if y else 'blue'] for y in y_train]
    8. # 转换x的数据类型,否则后面矩阵相乘时会因数据类型问题报错
    9. x_train = tf.cast(x_train, tf.float32)
    10. y_train = tf.cast(y_train, tf.float32)
    11. # from_tensor_slices函数切分传入的张量的第一个维度,生成相应的数据集,使输入特征和标签值一一对应
    12. train_db = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32) # 打包成数据集
    13. # 生成神经网络的参数,输入层为2个神经元,隐藏层为11个神经元,1层隐藏层,输出层为1个神经元
    14. # 用tf.Variable()保证参数可训练
    15. w1 = tf.Variable(tf.random.normal([2, 11]), dtype=tf.float32)
    16. b1 = tf.Variable(tf.constant(0.01, shape=[11]))
    17. # 第二层的输入特征个数就是第一层的输出个数,因此11保持一致。为什么是11?随便选的神经元个数,可以更改以改进网络效果
    18. w2 = tf.Variable(tf.random.normal([11, 1]), dtype=tf.float32)
    19. b2 = tf.Variable(tf.constant(0.01, shape=[1]))
    20. lr = 0.005 # 学习率
    21. epoch = 800 # 循环轮数
    22. # 训练部分
    23. for epoch in range(epoch):
    24. for step, (x_train, y_train) in enumerate(train_db):
    25. with tf.GradientTape() as tape: # 记录梯度信息
    26. h1 = tf.matmul(x_train, w1) + b1 # 记录神经网络乘加运算
    27. h1 = tf.nn.relu(h1)
    28. y = tf.matmul(h1, w2) + b2
    29. # 采用均方误差损失函数mse = mean(sum(y-out)^2)
    30. loss = tf.reduce_mean(tf.square(y_train - y))
    31. # 计算loss对各个参数的梯度
    32. variables = [w1, b1, w2, b2]
    33. grads = tape.gradient(loss, variables)
    34. # 实现梯度更新
    35. # w1 = w1 - lr * w1_grad tape.gradient是自动求导结果与[w1, b1, w2, b2] 索引为0,1,2,3
    36. w1.assign_sub(lr * grads[0])
    37. b1.assign_sub(lr * grads[1])
    38. w2.assign_sub(lr * grads[2])
    39. b2.assign_sub(lr * grads[3])
    40. # 每20个epoch,打印loss信息
    41. if epoch % 20 == 0:
    42. print('epoch:', epoch, 'loss:', float(loss))
    43. # 预测部分
    44. print("*******predict*******")
    45. # xx ranges from -3 to 3 with step 0.1, yy ranges from -3 to 3 with step 0.1, generating evenly spaced grid points
    46. xx, yy = np.mgrid[-3:3:.1, -3:3:.1]
    47. # 将xx , yy拉直,并合并配对为二维张量,生成二维坐标点
    48. grid = np.c_[xx.ravel(), yy.ravel()]
    49. grid = tf.cast(grid, tf.float32)
    50. # 将网格坐标点喂入神经网络,进行预测,probs为输出
    51. probs = []
    52. for x_test in grid:
    53. # 使用训练好的参数进行预测
    54. h1 = tf.matmul([x_test], w1) + b1
    55. h1 = tf.nn.relu(h1)
    56. y = tf.matmul(h1, w2) + b2 # y为预测结果
    57. probs.append(y)
    58. # 取第0列给x1,取第1列给x2
    59. x1 = x_data[:, 0]
    60. x2 = x_data[:, 1]
    61. # probs的shape调整成xx的样子
    62. probs = np.array(probs).reshape(xx.shape)
    63. plt.scatter(x1, x2, color=np.squeeze(Y_c)) # squeeze drops dimensions of size 1, turning [['red'],['blue']] into ['red','blue']
    64. # 把坐标xx yy和对应的值probs放入contour函数,给probs值为0.5的所有点上色 plt.show()后 显示的是红蓝点的分界线
    65. plt.contour(xx, yy, probs, levels=[.5])
    66. plt.show()
    67. # read the red/blue points and draw the decision boundary; this version does NOT include regularization (a sketch of adding it follows the figure below)
    68. # 不清楚的数据,建议print出来查看
    1. epoch: 0 loss: 3.3588955402374268
    2. epoch: 20 loss: 0.0404302217066288
    3. epoch: 40 loss: 0.04070901498198509
    4. epoch: 60 loss: 0.03821190819144249
    5. epoch: 80 loss: 0.036114610731601715
    6. epoch: 100 loss: 0.03467976301908493
    7. epoch: 120 loss: 0.03373105823993683
    8. epoch: 140 loss: 0.03225767984986305
    9. epoch: 160 loss: 0.03019583784043789
    10. epoch: 180 loss: 0.028336063027381897
    11. epoch: 200 loss: 0.026807844638824463
    12. epoch: 220 loss: 0.025512710213661194
    13. epoch: 240 loss: 0.024442538619041443
    14. epoch: 260 loss: 0.02359318919479847
    15. epoch: 280 loss: 0.022960280999541283
    16. epoch: 300 loss: 0.02251446805894375
    17. epoch: 320 loss: 0.02214951254427433
    18. epoch: 340 loss: 0.02189861238002777
    19. epoch: 360 loss: 0.021764680743217468
    20. epoch: 380 loss: 0.021699346601963043
    21. epoch: 400 loss: 0.021605348214507103
    22. epoch: 420 loss: 0.021516701206564903
    23. epoch: 440 loss: 0.021409569308161736
    24. epoch: 460 loss: 0.02125600166618824
    25. epoch: 480 loss: 0.021161897107958794
    26. epoch: 500 loss: 0.021076681092381477
    27. epoch: 520 loss: 0.02100759744644165
    28. epoch: 540 loss: 0.020953044295310974
    29. epoch: 560 loss: 0.020902445539832115
    30. epoch: 580 loss: 0.020858166739344597
    31. epoch: 600 loss: 0.02082127146422863
    32. epoch: 620 loss: 0.020805032923817635
    33. epoch: 640 loss: 0.020836150273680687
    34. epoch: 660 loss: 0.020815739408135414
    35. epoch: 680 loss: 0.02078959532082081
    36. epoch: 700 loss: 0.020798802375793457
    37. epoch: 720 loss: 0.020799407735466957
    38. epoch: 740 loss: 0.020800543949007988
    39. epoch: 760 loss: 0.020800618454813957
    40. epoch: 780 loss: 0.020786955952644348
    41. *******predict*******

    png
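
    The example above intentionally leaves regularization out. A minimal sketch of how an L2 penalty could be added to the loss inside the GradientTape of the training loop above, assuming a regularization coefficient of 0.03 (an illustrative value, not taken from the original post):

    with tf.GradientTape() as tape:
        h1 = tf.nn.relu(tf.matmul(x_train, w1) + b1)
        y = tf.matmul(h1, w2) + b2
        loss_mse = tf.reduce_mean(tf.square(y_train - y))
        # L2 penalty of the weight matrices only (biases are usually not regularized)
        loss_regularization = tf.reduce_sum([tf.nn.l2_loss(w1), tf.nn.l2_loss(w2)])
        loss = loss_mse + 0.03 * loss_regularization  # total loss = data loss + REGULARIZER * weight penalty
    grads = tape.gradient(loss, [w1, b1, w2, b2])

    A larger coefficient shrinks the weights harder and typically yields a smoother decision boundary.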

    Neural network parameter optimizers

    Notation: w is the parameter to optimize; loss is the loss function; lr is the learning rate; each iteration processes one batch; t is the index of the current batch

    [Figure: general optimizer framework: compute the gradient g_t, compute the first-order momentum m_t and the second-order momentum V_t, take the step η_t = lr * m_t / sqrt(V_t), and update w_{t+1} = w_t - η_t]

    First-order momentum m_t: a function of the gradients

    Second-order momentum V_t: a function of the squared gradients

    Optimizers

    SGD

    imags

    Without momentum: plain SGD

    m_t = g_t : the first-order momentum is defined as the gradient itself

    V_t = 1 : the second-order momentum is constant 1

    η_t = lr * m_t / sqrt(V_t) = lr * g_t

    The boxed update rule to remember is w_{t+1} = w_t - η_t = w_t - lr * ∂loss/∂w_t

    For a single-layer network:

    1. w1.assign_sub(lr * grads[0]) # update parameter w1 in place
    2. b1.assign_sub(lr * grads[1]) # update parameter b in place
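
    The manual assign_sub calls above make the formula explicit; the built-in optimizer tf.keras.optimizers.SGD(learning_rate=lr) performs the same w = w - lr * grad update when its apply_gradients method is called.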
    1. from sklearn import datasets
    2. import time ##1##
    3. # 导入数据,分别为输入特征和标签
    4. x_data = datasets.load_iris().data
    5. y_data = datasets.load_iris().target
    6. # 随机打乱数据(因为原始数据是顺序的,顺序不打乱会影响准确率)
    7. # seed: 随机数种子,是一个整数,当设置之后,每次生成的随机数都一样(为方便教学,以保每位同学结果一致)
    8. np.random.seed(116) # 使用相同的seed,保证输入特征和标签一一对应
    9. np.random.shuffle(x_data)
    10. np.random.seed(116)
    11. np.random.shuffle(y_data)
    12. tf.random.set_seed(116)
    13. # 将打乱后的数据集分割为训练集和测试集,训练集为前120行,测试集为后30行
    14. x_train = x_data[:-30]
    15. y_train = y_data[:-30]
    16. x_test = x_data[-30:]
    17. y_test = y_data[-30:]
    18. # 转换x的数据类型,否则后面矩阵相乘时会因数据类型不一致报错
    19. x_train = tf.cast(x_train, tf.float32)
    20. x_test = tf.cast(x_test, tf.float32)
    21. # from_tensor_slices函数使输入特征和标签值一一对应。(把数据集分批次,每个批次batch组数据)
    22. train_db = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32)
    23. test_db = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(32)
    24. # 生成神经网络的参数,4个输入特征故,输入层为4个输入节点;因为3分类,故输出层为3个神经元
    25. # 用tf.Variable()标记参数可训练
    26. # 使用seed使每次生成的随机数相同(方便教学,使大家结果都一致,在现实使用时不写seed)
    27. w1 = tf.Variable(tf.random.truncated_normal([4, 3], stddev=0.1, seed=1))
    28. b1 = tf.Variable(tf.random.truncated_normal([3], stddev=0.1, seed=1))
    29. lr = 0.1 # 学习率为0.1
    30. train_loss_results = [] # 将每轮的loss记录在此列表中,为后续画loss曲线提供数据
    31. test_acc = [] # 将每轮的acc记录在此列表中,为后续画acc曲线提供数据
    32. epoch = 500 # 循环500轮
    33. loss_all = 0 # 每轮分4个step,loss_all记录四个step生成的4个loss的和
    34. # 训练部分
    35. now_time = time.time() ##2##
    36. for epoch in range(epoch): # 数据集级别的循环,每个epoch循环一次数据集
    37. for step, (x_train, y_train) in enumerate(train_db): # batch级别的循环 ,每个step循环一个batch
    38. with tf.GradientTape() as tape: # with结构记录梯度信息
    39. y = tf.matmul(x_train, w1) + b1 # 神经网络乘加运算
    40. y = tf.nn.softmax(y) # 使输出y符合概率分布(此操作后与独热码同量级,可相减求loss)
    41. y_ = tf.one_hot(y_train, depth=3) # 将标签值转换为独热码格式,方便计算loss和accuracy
    42. loss = tf.reduce_mean(tf.square(y_ - y)) # 采用均方误差损失函数mse = mean(sum(y-out)^2)
    43. loss_all += loss.numpy() # 将每个step计算出的loss累加,为后续求loss平均值提供数据,这样计算的loss更准确
    44. # 计算loss对各个参数的梯度
    45. grads = tape.gradient(loss, [w1, b1])
    46. # 实现梯度更新 w1 = w1 - lr * w1_grad b = b - lr * b_grad
    47. w1.assign_sub(lr * grads[0]) # 参数w1自更新
    48. b1.assign_sub(lr * grads[1]) # 参数b自更新
    49. # 每个epoch,打印loss信息
    50. # print("Epoch {}, loss: {}".format(epoch, loss_all / 4))
    51. train_loss_results.append(loss_all / 4) # 将4个step的loss求平均记录在此变量中
    52. loss_all = 0 # loss_all归零,为记录下一个epoch的loss做准备
    53. # 测试部分
    54. # total_correct为预测对的样本个数, total_number为测试的总样本数,将这两个变量都初始化为0
    55. total_correct, total_number = 0, 0
    56. for x_test, y_test in test_db:
    57. # 使用更新后的参数进行预测
    58. y = tf.matmul(x_test, w1) + b1
    59. y = tf.nn.softmax(y)
    60. pred = tf.argmax(y, axis=1) # 返回y中最大值的索引,即预测的分类
    61. # 将pred转换为y_test的数据类型
    62. pred = tf.cast(pred, dtype=y_test.dtype)
    63. # 若分类正确,则correct=1,否则为0,将bool型的结果转换为int型
    64. correct = tf.cast(tf.equal(pred, y_test), dtype=tf.int32)
    65. # 将每个batch的correct数加起来
    66. correct = tf.reduce_sum(correct)
    67. # 将所有batch中的correct数加起来
    68. total_correct += int(correct)
    69. # total_number为测试的总样本数,也就是x_test的行数,shape[0]返回变量的行数
    70. total_number += x_test.shape[0]
    71. # 总的准确率等于total_correct/total_number
    72. acc = total_correct / total_number
    73. test_acc.append(acc)
    74. # print("Test_acc:", acc)
    75. # print("--------------------------")
    76. total_time = time.time() - now_time ##3##
    77. print("total_time", total_time) ##4##
    78. # 绘制 loss 曲线
    79. plt.title('Loss Function Curve') # 图片标题
    80. plt.xlabel('Epoch') # x轴变量名称
    81. plt.ylabel('Loss') # y轴变量名称
    82. plt.plot(train_loss_results, label="$Loss$") # 逐点画出trian_loss_results值并连线,连线图标是Loss
    83. plt.legend() # 画出曲线图标
    84. plt.show() # 画出图像
    85. # 绘制 Accuracy 曲线
    86. plt.title('Acc Curve') # 图片标题
    87. plt.xlabel('Epoch') # x轴变量名称
    88. plt.ylabel('Acc') # y轴变量名称
    89. plt.plot(test_acc, label="$Accuracy$") # 逐点画出test_acc值并连线,连线图标是Accuracy
    90. plt.legend()
    91. plt.show()
    92. # 本文件较 class1\p45_iris.py 仅添加四处时间记录 用 ##n## 标识
    93. # 请将loss曲线、ACC曲线、total_time记录到 class2\优化器对比.docx 对比各优化器收敛情况
    total_time 3.9469685554504395

    png

    png

    SGDM

    imags

    m_t = β * m_{t-1} + (1 - β) * g_t is the exponential moving average of the gradient directions over time

    m_{t-1} is the first-order momentum of the previous step

    β is a hyperparameter close to 1, so the β * m_{t-1} term dominates the formula; V_t is still 1, so the update is w_{t+1} = w_t - lr * m_t

    1. m_w,m_b = 0,0 # the momentum at step 1 depends on the momentum at step 0, which is initialized to 0
    2. beta = 0.9
    3. # sgd-momentum
    4. m_w = beta * m_w + (1 - beta) * grads[0]
    5. m_b = beta * m_b + (1 - beta) * grads[1]
    6. w1.assign_sub(lr * m_w)
    7. b1.assign_sub(lr * m_b)
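
    Keras ships a momentum optimizer as tf.keras.optimizers.SGD(learning_rate=lr, momentum=0.9); note that its internal accumulator uses velocity = momentum * velocity - lr * grad rather than the (1 - beta) weighting written out above, so the two formulations are close but not numerically identical.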
    1. # 导入数据,分别为输入特征和标签
    2. x_data = datasets.load_iris().data
    3. y_data = datasets.load_iris().target
    4. # 随机打乱数据(因为原始数据是顺序的,顺序不打乱会影响准确率)
    5. # seed: 随机数种子,是一个整数,当设置之后,每次生成的随机数都一样(为方便教学,以保每位同学结果一致)
    6. np.random.seed(116) # 使用相同的seed,保证输入特征和标签一一对应
    7. np.random.shuffle(x_data)
    8. np.random.seed(116)
    9. np.random.shuffle(y_data)
    10. tf.random.set_seed(116)
    11. # 将打乱后的数据集分割为训练集和测试集,训练集为前120行,测试集为后30行
    12. x_train = x_data[:-30]
    13. y_train = y_data[:-30]
    14. x_test = x_data[-30:]
    15. y_test = y_data[-30:]
    16. # 转换x的数据类型,否则后面矩阵相乘时会因数据类型不一致报错
    17. x_train = tf.cast(x_train, tf.float32)
    18. x_test = tf.cast(x_test, tf.float32)
    19. # from_tensor_slices函数使输入特征和标签值一一对应。(把数据集分批次,每个批次batch组数据)
    20. train_db = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32)
    21. test_db = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(32)
    22. # 生成神经网络的参数,4个输入特征故,输入层为4个输入节点;因为3分类,故输出层为3个神经元
    23. # 用tf.Variable()标记参数可训练
    24. # 使用seed使每次生成的随机数相同(方便教学,使大家结果都一致,在现实使用时不写seed)
    25. w1 = tf.Variable(tf.random.truncated_normal([4, 3], stddev=0.1, seed=1))
    26. b1 = tf.Variable(tf.random.truncated_normal([3], stddev=0.1, seed=1))
    27. lr = 0.1 # 学习率为0.1
    28. train_loss_results = [] # 将每轮的loss记录在此列表中,为后续画loss曲线提供数据
    29. test_acc = [] # 将每轮的acc记录在此列表中,为后续画acc曲线提供数据
    30. epoch = 500 # 循环500轮
    31. loss_all = 0 # 每轮分4个step,loss_all记录四个step生成的4个loss的和
    32. ##########################################################################
    33. m_w, m_b = 0, 0
    34. beta = 0.9
    35. ##########################################################################
    36. # 训练部分
    37. now_time = time.time() ##2##
    38. for epoch in range(epoch): # 数据集级别的循环,每个epoch循环一次数据集
    39. for step, (x_train, y_train) in enumerate(train_db): # batch级别的循环 ,每个step循环一个batch
    40. with tf.GradientTape() as tape: # with结构记录梯度信息
    41. y = tf.matmul(x_train, w1) + b1 # 神经网络乘加运算
    42. y = tf.nn.softmax(y) # 使输出y符合概率分布(此操作后与独热码同量级,可相减求loss)
    43. y_ = tf.one_hot(y_train, depth=3) # 将标签值转换为独热码格式,方便计算loss和accuracy
    44. loss = tf.reduce_mean(tf.square(y_ - y)) # 采用均方误差损失函数mse = mean(sum(y-out)^2)
    45. loss_all += loss.numpy() # 将每个step计算出的loss累加,为后续求loss平均值提供数据,这样计算的loss更准确
    46. # 计算loss对各个参数的梯度
    47. grads = tape.gradient(loss, [w1, b1])
    48. ##########################################################################
    49. # sgd-momentun
    50. m_w = beta * m_w + (1 - beta) * grads[0]
    51. m_b = beta * m_b + (1 - beta) * grads[1]
    52. w1.assign_sub(lr * m_w)
    53. b1.assign_sub(lr * m_b)
    54. ##########################################################################
    55. # 每个epoch,打印loss信息
    56. # print("Epoch {}, loss: {}".format(epoch, loss_all / 4))
    57. train_loss_results.append(loss_all / 4) # 将4个step的loss求平均记录在此变量中
    58. loss_all = 0 # loss_all归零,为记录下一个epoch的loss做准备
    59. # 测试部分
    60. # total_correct为预测对的样本个数, total_number为测试的总样本数,将这两个变量都初始化为0
    61. total_correct, total_number = 0, 0
    62. for x_test, y_test in test_db:
    63. # 使用更新后的参数进行预测
    64. y = tf.matmul(x_test, w1) + b1
    65. y = tf.nn.softmax(y)
    66. pred = tf.argmax(y, axis=1) # 返回y中最大值的索引,即预测的分类
    67. # 将pred转换为y_test的数据类型
    68. pred = tf.cast(pred, dtype=y_test.dtype)
    69. # 若分类正确,则correct=1,否则为0,将bool型的结果转换为int型
    70. correct = tf.cast(tf.equal(pred, y_test), dtype=tf.int32)
    71. # 将每个batch的correct数加起来
    72. correct = tf.reduce_sum(correct)
    73. # 将所有batch中的correct数加起来
    74. total_correct += int(correct)
    75. # total_number为测试的总样本数,也就是x_test的行数,shape[0]返回变量的行数
    76. total_number += x_test.shape[0]
    77. # 总的准确率等于total_correct/total_number
    78. acc = total_correct / total_number
    79. test_acc.append(acc)
    80. # print("Test_acc:", acc)
    81. # print("--------------------------")
    82. total_time = time.time() - now_time ##3##
    83. print("total_time", total_time) ##4##
    84. # 绘制 loss 曲线
    85. plt.title('Loss Function Curve') # 图片标题
    86. plt.xlabel('Epoch') # x轴变量名称
    87. plt.ylabel('Loss') # y轴变量名称
    88. plt.plot(train_loss_results, label="$Loss$") # 逐点画出trian_loss_results值并连线,连线图标是Loss
    89. plt.legend() # 画出曲线图标
    90. plt.show() # 画出图像
    91. # 绘制 Accuracy 曲线
    92. plt.title('Acc Curve') # 图片标题
    93. plt.xlabel('Epoch') # x轴变量名称
    94. plt.ylabel('Acc') # y轴变量名称
    95. plt.plot(test_acc, label="$Accuracy$") # 逐点画出test_acc值并连线,连线图标是Accuracy
    96. plt.legend()
    97. plt.show()
    98. # 请将loss曲线、ACC曲线、total_time记录到 class2\优化器对比.docx 对比各优化器收敛情况
    total_time 4.3680760860443115

    png

    png

    Adagrad

    Adagrad: m_t = g_t, and the second-order momentum accumulates the squares of all past gradients, V_t = Σ_{τ=1..t} g_τ^2; the update is w_{t+1} = w_t - lr * g_t / sqrt(V_t)

    1. v_w += tf.square(grads[0])
    2. v_b += tf.square(grads[1])
    3. w1.assign_sub(lr * grads[0] / tf.sqrt(v_w))
    4. b1.assign_sub(lr * grads[1] / tf.sqrt(v_b))
    1. # 导入数据,分别为输入特征和标签
    2. x_data = datasets.load_iris().data
    3. y_data = datasets.load_iris().target
    4. # 随机打乱数据(因为原始数据是顺序的,顺序不打乱会影响准确率)
    5. # seed: 随机数种子,是一个整数,当设置之后,每次生成的随机数都一样(为方便教学,以保每位同学结果一致)
    6. np.random.seed(116) # 使用相同的seed,保证输入特征和标签一一对应
    7. np.random.shuffle(x_data)
    8. np.random.seed(116)
    9. np.random.shuffle(y_data)
    10. tf.random.set_seed(116)
    11. # 将打乱后的数据集分割为训练集和测试集,训练集为前120行,测试集为后30行
    12. x_train = x_data[:-30]
    13. y_train = y_data[:-30]
    14. x_test = x_data[-30:]
    15. y_test = y_data[-30:]
    16. # 转换x的数据类型,否则后面矩阵相乘时会因数据类型不一致报错
    17. x_train = tf.cast(x_train, tf.float32)
    18. x_test = tf.cast(x_test, tf.float32)
    19. # from_tensor_slices函数使输入特征和标签值一一对应。(把数据集分批次,每个批次batch组数据)
    20. train_db = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32)
    21. test_db = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(32)
    22. # 生成神经网络的参数,4个输入特征故,输入层为4个输入节点;因为3分类,故输出层为3个神经元
    23. # 用tf.Variable()标记参数可训练
    24. # 使用seed使每次生成的随机数相同(方便教学,使大家结果都一致,在现实使用时不写seed)
    25. w1 = tf.Variable(tf.random.truncated_normal([4, 3], stddev=0.1, seed=1))
    26. b1 = tf.Variable(tf.random.truncated_normal([3], stddev=0.1, seed=1))
    27. lr = 0.1 # 学习率为0.1
    28. train_loss_results = [] # 将每轮的loss记录在此列表中,为后续画loss曲线提供数据
    29. test_acc = [] # 将每轮的acc记录在此列表中,为后续画acc曲线提供数据
    30. epoch = 500 # 循环500轮
    31. loss_all = 0 # 每轮分4个step,loss_all记录四个step生成的4个loss的和
    32. ##########################################################################
    33. v_w, v_b = 0, 0
    34. ##########################################################################
    35. # 训练部分
    36. now_time = time.time() ##2##
    37. for epoch in range(epoch): # 数据集级别的循环,每个epoch循环一次数据集
    38. for step, (x_train, y_train) in enumerate(train_db): # batch级别的循环 ,每个step循环一个batch
    39. with tf.GradientTape() as tape: # with结构记录梯度信息
    40. y = tf.matmul(x_train, w1) + b1 # 神经网络乘加运算
    41. y = tf.nn.softmax(y) # 使输出y符合概率分布(此操作后与独热码同量级,可相减求loss)
    42. y_ = tf.one_hot(y_train, depth=3) # 将标签值转换为独热码格式,方便计算loss和accuracy
    43. loss = tf.reduce_mean(tf.square(y_ - y)) # 采用均方误差损失函数mse = mean(sum(y-out)^2)
    44. loss_all += loss.numpy() # 将每个step计算出的loss累加,为后续求loss平均值提供数据,这样计算的loss更准确
    45. # 计算loss对各个参数的梯度
    46. grads = tape.gradient(loss, [w1, b1])
    47. ##########################################################################
    48. # adagrad
    49. v_w += tf.square(grads[0])
    50. v_b += tf.square(grads[1])
    51. w1.assign_sub(lr * grads[0] / tf.sqrt(v_w))
    52. b1.assign_sub(lr * grads[1] / tf.sqrt(v_b))
    53. ##########################################################################
    54. # 每个epoch,打印loss信息
    55. # print("Epoch {}, loss: {}".format(epoch, loss_all / 4))
    56. train_loss_results.append(loss_all / 4) # 将4个step的loss求平均记录在此变量中
    57. loss_all = 0 # loss_all归零,为记录下一个epoch的loss做准备
    58. # 测试部分
    59. # total_correct为预测对的样本个数, total_number为测试的总样本数,将这两个变量都初始化为0
    60. total_correct, total_number = 0, 0
    61. for x_test, y_test in test_db:
    62. # 使用更新后的参数进行预测
    63. y = tf.matmul(x_test, w1) + b1
    64. y = tf.nn.softmax(y)
    65. pred = tf.argmax(y, axis=1) # 返回y中最大值的索引,即预测的分类
    66. # 将pred转换为y_test的数据类型
    67. pred = tf.cast(pred, dtype=y_test.dtype)
    68. # 若分类正确,则correct=1,否则为0,将bool型的结果转换为int型
    69. correct = tf.cast(tf.equal(pred, y_test), dtype=tf.int32)
    70. # 将每个batch的correct数加起来
    71. correct = tf.reduce_sum(correct)
    72. # 将所有batch中的correct数加起来
    73. total_correct += int(correct)
    74. # total_number为测试的总样本数,也就是x_test的行数,shape[0]返回变量的行数
    75. total_number += x_test.shape[0]
    76. # 总的准确率等于total_correct/total_number
    77. acc = total_correct / total_number
    78. test_acc.append(acc)
    79. # print("Test_acc:", acc)
    80. # print("--------------------------")
    81. total_time = time.time() - now_time ##3##
    82. print("total_time", total_time) ##4##
    83. # 绘制 loss 曲线
    84. plt.title('Loss Function Curve') # 图片标题
    85. plt.xlabel('Epoch') # x轴变量名称
    86. plt.ylabel('Loss') # y轴变量名称
    87. plt.plot(train_loss_results, label="$Loss$") # 逐点画出trian_loss_results值并连线,连线图标是Loss
    88. plt.legend() # 画出曲线图标
    89. plt.show() # 画出图像
    90. # 绘制 Accuracy 曲线
    91. plt.title('Acc Curve') # 图片标题
    92. plt.xlabel('Epoch') # x轴变量名称
    93. plt.ylabel('Acc') # y轴变量名称
    94. plt.plot(test_acc, label="$Accuracy$") # 逐点画出test_acc值并连线,连线图标是Accuracy
    95. plt.legend()
    96. plt.show()
    97. # 请将loss曲线、ACC曲线、total_time记录到 class2\优化器对比.docx 对比各优化器收敛情况
    total_time 4.468905210494995

    png

    png

    RMSProp

    imags

    The second-order momentum is computed as an exponential moving average, V_t = β * V_{t-1} + (1 - β) * g_t^2, which represents the average gradient magnitude over a recent window of time

    (Representation, 表征: the way information is presented or recorded in the mind; a formal system, together with rules for how it operates, that can express some entity or class of information clearly. In other words, a representation is a symbol or signal that can stand in for something, representing the thing when the thing itself is absent. @Baidu Baike, entry "表征")

    1. v_w ,v_b = 0,0
    2. beta = 0.9
    3. v_w = beta * v_w + (1 - beta) * tf.square(grads[0])
    4. v_b = beta * v_b + (1 - beta) * tf.square(grads[1])
    5. w1.assign_sub(lr * grads[0] / tf.sqrt(v_w))
    6. b1.assign_sub(lr * grads[1] / tf.sqrt(v_b))
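
    One caveat for this hand-rolled version: v_w and v_b start at 0, so if a gradient component is exactly 0 on the first step the update becomes 0/0 = nan. In practice a small epsilon is added to the denominator (the built-in tf.keras.optimizers.RMSprop exposes this as its epsilon argument), for example:

    w1.assign_sub(lr * grads[0] / (tf.sqrt(v_w) + 1e-7))  # epsilon keeps the denominator away from zero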
    1. # 导入数据,分别为输入特征和标签
    2. x_data = datasets.load_iris().data
    3. y_data = datasets.load_iris().target
    4. # 随机打乱数据(因为原始数据是顺序的,顺序不打乱会影响准确率)
    5. # seed: 随机数种子,是一个整数,当设置之后,每次生成的随机数都一样(为方便教学,以保每位同学结果一致)
    6. np.random.seed(116) # 使用相同的seed,保证输入特征和标签一一对应
    7. np.random.shuffle(x_data)
    8. np.random.seed(116)
    9. np.random.shuffle(y_data)
    10. tf.random.set_seed(116)
    11. # 将打乱后的数据集分割为训练集和测试集,训练集为前120行,测试集为后30行
    12. x_train = x_data[:-30]
    13. y_train = y_data[:-30]
    14. x_test = x_data[-30:]
    15. y_test = y_data[-30:]
    16. # 转换x的数据类型,否则后面矩阵相乘时会因数据类型不一致报错
    17. x_train = tf.cast(x_train, tf.float32)
    18. x_test = tf.cast(x_test, tf.float32)
    19. # from_tensor_slices函数使输入特征和标签值一一对应。(把数据集分批次,每个批次batch组数据)
    20. train_db = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32)
    21. test_db = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(32)
    22. # 生成神经网络的参数,4个输入特征故,输入层为4个输入节点;因为3分类,故输出层为3个神经元
    23. # 用tf.Variable()标记参数可训练
    24. # 使用seed使每次生成的随机数相同(方便教学,使大家结果都一致,在现实使用时不写seed)
    25. w1 = tf.Variable(tf.random.truncated_normal([4, 3], stddev=0.1, seed=1))
    26. b1 = tf.Variable(tf.random.truncated_normal([3], stddev=0.1, seed=1))
    27. lr = 0.1 # 学习率为0.1
    28. train_loss_results = [] # 将每轮的loss记录在此列表中,为后续画loss曲线提供数据
    29. test_acc = [] # 将每轮的acc记录在此列表中,为后续画acc曲线提供数据
    30. epoch = 500 # 循环500轮
    31. loss_all = 0 # 每轮分4个step,loss_all记录四个step生成的4个loss的和
    32. ##########################################################################
    33. v_w, v_b = 0, 0
    34. beta = 0.9
    35. ##########################################################################
    36. # 训练部分
    37. now_time = time.time() ##2##
    38. for epoch in range(epoch): # 数据集级别的循环,每个epoch循环一次数据集
    39. for step, (x_train, y_train) in enumerate(train_db): # batch级别的循环 ,每个step循环一个batch
    40. with tf.GradientTape() as tape: # with结构记录梯度信息
    41. y = tf.matmul(x_train, w1) + b1 # 神经网络乘加运算
    42. y = tf.nn.softmax(y) # 使输出y符合概率分布(此操作后与独热码同量级,可相减求loss)
    43. y_ = tf.one_hot(y_train, depth=3) # 将标签值转换为独热码格式,方便计算loss和accuracy
    44. loss = tf.reduce_mean(tf.square(y_ - y)) # 采用均方误差损失函数mse = mean(sum(y-out)^2)
    45. loss_all += loss.numpy() # 将每个step计算出的loss累加,为后续求loss平均值提供数据,这样计算的loss更准确
    46. # 计算loss对各个参数的梯度
    47. grads = tape.gradient(loss, [w1, b1])
    48. ##########################################################################
    49. # rmsprop
    50. v_w = beta * v_w + (1 - beta) * tf.square(grads[0])
    51. v_b = beta * v_b + (1 - beta) * tf.square(grads[1])
    52. w1.assign_sub(lr * grads[0] / tf.sqrt(v_w))
    53. b1.assign_sub(lr * grads[1] / tf.sqrt(v_b))
    54. ##########################################################################
    55. # 每个epoch,打印loss信息
    56. # print("Epoch {}, loss: {}".format(epoch, loss_all / 4))
    57. train_loss_results.append(loss_all / 4) # 将4个step的loss求平均记录在此变量中
    58. loss_all = 0 # loss_all归零,为记录下一个epoch的loss做准备
    59. # 测试部分
    60. # total_correct为预测对的样本个数, total_number为测试的总样本数,将这两个变量都初始化为0
    61. total_correct, total_number = 0, 0
    62. for x_test, y_test in test_db:
    63. # 使用更新后的参数进行预测
    64. y = tf.matmul(x_test, w1) + b1
    65. y = tf.nn.softmax(y)
    66. pred = tf.argmax(y, axis=1) # 返回y中最大值的索引,即预测的分类
    67. # 将pred转换为y_test的数据类型
    68. pred = tf.cast(pred, dtype=y_test.dtype)
    69. # 若分类正确,则correct=1,否则为0,将bool型的结果转换为int型
    70. correct = tf.cast(tf.equal(pred, y_test), dtype=tf.int32)
    71. # 将每个batch的correct数加起来
    72. correct = tf.reduce_sum(correct)
    73. # 将所有batch中的correct数加起来
    74. total_correct += int(correct)
    75. # total_number为测试的总样本数,也就是x_test的行数,shape[0]返回变量的行数
    76. total_number += x_test.shape[0]
    77. # 总的准确率等于total_correct/total_number
    78. acc = total_correct / total_number
    79. test_acc.append(acc)
    80. # print("Test_acc:", acc)
    81. # print("--------------------------")
    82. total_time = time.time() - now_time ##3##
    83. print("total_time", total_time) ##4##
    84. # 绘制 loss 曲线
    85. plt.title('Loss Function Curve') # 图片标题
    86. plt.xlabel('Epoch') # x轴变量名称
    87. plt.ylabel('Loss') # y轴变量名称
    88. plt.plot(train_loss_results, label="$Loss$") # 逐点画出trian_loss_results值并连线,连线图标是Loss
    89. plt.legend() # 画出曲线图标
    90. plt.show() # 画出图像
    91. # 绘制 Accuracy 曲线
    92. plt.title('Acc Curve') # 图片标题
    93. plt.xlabel('Epoch') # x轴变量名称
    94. plt.ylabel('Acc') # y轴变量名称
    95. plt.plot(test_acc, label="$Accuracy$") # 逐点画出test_acc值并连线,连线图标是Accuracy
    96. plt.legend()
    97. plt.show()
    98. # 请将loss曲线、ACC曲线、total_time记录到 class2\优化器对比.docx 对比各优化器收敛情况
    total_time 4.61161208152771

    png

    png

    Adam

    Adam combines both momenta: m_t = β1 * m_{t-1} + (1 - β1) * g_t and V_t = β2 * V_{t-1} + (1 - β2) * g_t^2, with bias corrections m̂_t = m_t / (1 - β1^t) and V̂_t = V_t / (1 - β2^t); the update is w_{t+1} = w_t - lr * m̂_t / sqrt(V̂_t)

    1. m_w, m_b = 0, 0
    2. v_w, v_b = 0, 0
    3. beta1, beta2 = 0.9, 0.999
    4. delta_w, delta_b = 0, 0
    5. global_step = 0 # incremented by 1 at the start of every batch (as in the full code below); without the increment the correction denominators 1 - beta^0 would be 0
    6. m_w = beta1 * m_w + (1 - beta1) * grads[0]
    7. m_b = beta1 * m_b + (1 - beta1) * grads[1]
    8. v_w = beta2 * v_w + (1 - beta2) * tf.square(grads[0])
    9. v_b = beta2 * v_b + (1 - beta2) * tf.square(grads[1])
    10. m_w_correction = m_w / (1 - tf.pow(beta1, int(global_step)))
    11. m_b_correction = m_b / (1 - tf.pow(beta1, int(global_step)))
    12. v_w_correction = v_w / (1 - tf.pow(beta2, int(global_step)))
    13. v_b_correction = v_b / (1 - tf.pow(beta2, int(global_step)))
    14. w1.assign_sub(lr * m_w_correction / tf.sqrt(v_w_correction))
    15. b1.assign_sub(lr * m_b_correction / tf.sqrt(v_b_correction))
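
    This is the update that tf.keras.optimizers.Adam(learning_rate=lr) implements internally (with defaults beta_1=0.9, beta_2=0.999 and an epsilon added to the denominator), so the manual version above is mainly useful for seeing the bias-corrected momenta spelled out.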
    1. # 导入数据,分别为输入特征和标签
    2. x_data = datasets.load_iris().data
    3. y_data = datasets.load_iris().target
    4. # 随机打乱数据(因为原始数据是顺序的,顺序不打乱会影响准确率)
    5. # seed: 随机数种子,是一个整数,当设置之后,每次生成的随机数都一样(为方便教学,以保每位同学结果一致)
    6. np.random.seed(116) # 使用相同的seed,保证输入特征和标签一一对应
    7. np.random.shuffle(x_data)
    8. np.random.seed(116)
    9. np.random.shuffle(y_data)
    10. tf.random.set_seed(116)
    11. # 将打乱后的数据集分割为训练集和测试集,训练集为前120行,测试集为后30行
    12. x_train = x_data[:-30]
    13. y_train = y_data[:-30]
    14. x_test = x_data[-30:]
    15. y_test = y_data[-30:]
    16. # 转换x的数据类型,否则后面矩阵相乘时会因数据类型不一致报错
    17. x_train = tf.cast(x_train, tf.float32)
    18. x_test = tf.cast(x_test, tf.float32)
    19. # from_tensor_slices函数使输入特征和标签值一一对应。(把数据集分批次,每个批次batch组数据)
    20. train_db = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32)
    21. test_db = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(32)
    22. # 生成神经网络的参数,4个输入特征故,输入层为4个输入节点;因为3分类,故输出层为3个神经元
    23. # 用tf.Variable()标记参数可训练
    24. # 使用seed使每次生成的随机数相同(方便教学,使大家结果都一致,在现实使用时不写seed)
    25. w1 = tf.Variable(tf.random.truncated_normal([4, 3], stddev=0.1, seed=1))
    26. b1 = tf.Variable(tf.random.truncated_normal([3], stddev=0.1, seed=1))
    27. lr = 0.1 # 学习率为0.1
    28. train_loss_results = [] # 将每轮的loss记录在此列表中,为后续画loss曲线提供数据
    29. test_acc = [] # 将每轮的acc记录在此列表中,为后续画acc曲线提供数据
    30. epoch = 500 # 循环500轮
    31. loss_all = 0 # 每轮分4个step,loss_all记录四个step生成的4个loss的和
    32. ##########################################################################
    33. m_w, m_b = 0, 0
    34. v_w, v_b = 0, 0
    35. beta1, beta2 = 0.9, 0.999
    36. delta_w, delta_b = 0, 0
    37. global_step = 0
    38. ##########################################################################
    39. # 训练部分
    40. now_time = time.time() ##2##
    41. for epoch in range(epoch): # 数据集级别的循环,每个epoch循环一次数据集
    42. for step, (x_train, y_train) in enumerate(train_db): # batch级别的循环 ,每个step循环一个batch
    43. ##########################################################################
    44. global_step += 1
    45. ##########################################################################
    46. with tf.GradientTape() as tape: # with结构记录梯度信息
    47. y = tf.matmul(x_train, w1) + b1 # 神经网络乘加运算
    48. y = tf.nn.softmax(y) # 使输出y符合概率分布(此操作后与独热码同量级,可相减求loss)
    49. y_ = tf.one_hot(y_train, depth=3) # 将标签值转换为独热码格式,方便计算loss和accuracy
    50. loss = tf.reduce_mean(tf.square(y_ - y)) # 采用均方误差损失函数mse = mean(sum(y-out)^2)
    51. loss_all += loss.numpy() # 将每个step计算出的loss累加,为后续求loss平均值提供数据,这样计算的loss更准确
    52. # 计算loss对各个参数的梯度
    53. grads = tape.gradient(loss, [w1, b1])
    54. ##########################################################################
    55. # adam
    56. m_w = beta1 * m_w + (1 - beta1) * grads[0]
    57. m_b = beta1 * m_b + (1 - beta1) * grads[1]
    58. v_w = beta2 * v_w + (1 - beta2) * tf.square(grads[0])
    59. v_b = beta2 * v_b + (1 - beta2) * tf.square(grads[1])
    60. m_w_correction = m_w / (1 - tf.pow(beta1, int(global_step)))
    61. m_b_correction = m_b / (1 - tf.pow(beta1, int(global_step)))
    62. v_w_correction = v_w / (1 - tf.pow(beta2, int(global_step)))
    63. v_b_correction = v_b / (1 - tf.pow(beta2, int(global_step)))
    64. w1.assign_sub(lr * m_w_correction / tf.sqrt(v_w_correction))
    65. b1.assign_sub(lr * m_b_correction / tf.sqrt(v_b_correction))
    66. ##########################################################################
    67. # 每个epoch,打印loss信息
    68. # print("Epoch {}, loss: {}".format(epoch, loss_all / 4))
    69. train_loss_results.append(loss_all / 4) # 将4个step的loss求平均记录在此变量中
    70. loss_all = 0 # loss_all归零,为记录下一个epoch的loss做准备
    71. # 测试部分
    72. # total_correct为预测对的样本个数, total_number为测试的总样本数,将这两个变量都初始化为0
    73. total_correct, total_number = 0, 0
    74. for x_test, y_test in test_db:
    75. # 使用更新后的参数进行预测
    76. y = tf.matmul(x_test, w1) + b1
    77. y = tf.nn.softmax(y)
    78. pred = tf.argmax(y, axis=1) # 返回y中最大值的索引,即预测的分类
    79. # 将pred转换为y_test的数据类型
    80. pred = tf.cast(pred, dtype=y_test.dtype)
    81. # 若分类正确,则correct=1,否则为0,将bool型的结果转换为int型
    82. correct = tf.cast(tf.equal(pred, y_test), dtype=tf.int32)
    83. # 将每个batch的correct数加起来
    84. correct = tf.reduce_sum(correct)
    85. # 将所有batch中的correct数加起来
    86. total_correct += int(correct)
    87. # total_number为测试的总样本数,也就是x_test的行数,shape[0]返回变量的行数
    88. total_number += x_test.shape[0]
    89. # 总的准确率等于total_correct/total_number
    90. acc = total_correct / total_number
    91. test_acc.append(acc)
    92. # print("Test_acc:", acc)
    93. # print("--------------------------")
    94. total_time = time.time() - now_time ##3##
    95. print("total_time", total_time) ##4##
    96. # 绘制 loss 曲线
    97. plt.title('Loss Function Curve') # 图片标题
    98. plt.xlabel('Epoch') # x轴变量名称
    99. plt.ylabel('Loss') # y轴变量名称
    100. plt.plot(train_loss_results, label="$Loss$") # 逐点画出trian_loss_results值并连线,连线图标是Loss
    101. plt.legend() # 画出曲线图标
    102. plt.show() # 画出图像
    103. # 绘制 Accuracy 曲线
    104. plt.title('Acc Curve') # 图片标题
    105. plt.xlabel('Epoch') # x轴变量名称
    106. plt.ylabel('Acc') # y轴变量名称
    107. plt.plot(test_acc, label="$Accuracy$") # 逐点画出test_acc值并连线,连线图标是Accuracy
    108. plt.legend()
    109. plt.show()
    110. # 请将loss曲线、ACC曲线、total_time记录到 class2\优化器对比.docx 对比各优化器收敛情况
    total_time 5.473050594329834

    png

    png

  • Original article: https://blog.csdn.net/ks2686/article/details/126453579