• Week 1: Improving Deep Neural Networks - Initialization, Regularization, Gradient Checking (1 & 2 & 3)


    Course 2 - Improving Deep Neural Networks

    Week 1: Initialization, Regularization, Gradient Checking (1 & 2 & 3)

    cd D:\software\OneDrive\桌面\吴恩达深度学习课后作业\第二部分 改善深层神经网络\第一周 初始化、正则化、梯度校验(1&2&3)
    

    D:\software\OneDrive\桌面\吴恩达深度学习课后作业\第二部分 改善深层神经网络\第一周 初始化、正则化、梯度校验(1&2&3)

    import numpy as np
    import matplotlib.pyplot as plt
    import sklearn
    import sklearn.datasets
    import scipy.io
    from testCases import *
    from init_utils import sigmoid, relu, compute_loss, forward_propagation, backward_propagation
    from init_utils import update_parameters, predict, load_dataset, plot_decision_boundary, predict_dec
    from reg_utils import sigmoid, relu, plot_decision_boundary, initialize_parameters, load_2D_dataset, predict_dec
    from reg_utils import compute_cost, predict, forward_propagation, backward_propagation, update_parameters
    from gc_utils import sigmoid, relu, dictionary_to_vector, vector_to_dictionary, gradients_to_vector
    
    %matplotlib inline
    plt.rcParams['figure.figsize'] = (7.0, 4.0) # set default size of plots
    plt.rcParams['image.interpolation'] = 'nearest'
    plt.rcParams['image.cmap'] = 'gray'
    
    # load image dataset: blue/red dots in circles
    # 1. Initialization
    train_X, train_Y, test_X, test_Y = load_dataset()
    

    [Figure output_2_0.png: scatter plot of the training dataset (blue/red dots in circles)]

    I. Initialization

    Initializing the weights

    A well-chosen initialization can:
    1. Speed up the convergence of gradient descent
    2. Reduce the odds of gradient descent converging with a high training (and generalization) error

    (1) Neural network model

    Zeros initialization: set initialization = "zeros" in the input argument.
    Random initialization: set initialization = "random" in the input argument; this initializes the weights to large random values.
    He initialization: set initialization = "he" in the input argument; this initializes the weights to scaled random values following He et al. (2015).

    The model() below is called with each of these three initialization methods.

    def model(X, Y, learning_rate = 0.01, num_iterations = 15000, print_cost = True, initialization = "he"):
        
        m = X.shape[1]
        grads = {}
        costs = []
        layers_dims = [X.shape[0], 10, 5, 1]
        
        if initialization == "zeros":
            parameters  = initialize_parameters_zeros(layers_dims)
        elif initialization == "he":
            parameters = initialize_parameters_he(layers_dims)
        elif initialization == "random":
            parameters = initialize_parameters_random(layers_dims)
            
        for i in range(0,num_iterations):
            a3,cache = forward_propagation(X,parameters)
            cost = compute_loss(a3,Y)
            grads = backward_propagation(X,Y,cache)
            parameters = update_parameters(parameters,grads,learning_rate)
            
            if print_cost and i%1000==0:
                print("Cost after iteration {}: {}".format(i, cost))
                costs.append(cost)
        
        plt.plot(costs)
        plt.ylabel("cost")
        plt.xlabel("iterations (per hundreds)")
        plt.title("learning_rate="+str(learning_rate))
        plt.show()
        
        return parameters
    

    (2) Zero initialization

    Exercise: implement the following function to initialize all parameters to zeros. You will see shortly that this works poorly because it fails to "break symmetry".
    Try it anyway and see what happens. Make sure to use np.zeros((..., ...)) with the correct shapes.

    def initialize_parameters_zeros(layers_dims):
        
        parameters = {}
        L = len(layers_dims)
        
        for i in range(1,L):
            parameters["W"+str(i)] = np.zeros((layers_dims[i],layers_dims[i-1]))
            parameters["b"+str(i)] = np.zeros((layers_dims[i],1))
            
        return parameters
    
    parameters = initialize_parameters_zeros([3,2,1])
    print("W1 = " + str(parameters["W1"]))
    print("b1 = " + str(parameters["b1"]))
    print("W2 = " + str(parameters["W2"]))
    print("b2 = " + str(parameters["b2"]))
    

    W1 = [[0. 0. 0.]
    [0. 0. 0.]]
    b1 = [[0.]
    [0.]]
    W2 = [[0. 0.]]
    b2 = [[0.]]

    parameters = model(train_X, train_Y, initialization = "zeros")
    print ("On the train set:")
    predictions_train = predict(train_X, train_Y, parameters)
    print ("On the test set:")
    predictions_test = predict(test_X, test_Y, parameters)
    

    Cost after iteration 0: 0.6931471805599453
    Cost after iteration 1000: 0.6931471805599453
    Cost after iteration 2000: 0.6931471805599453
    Cost after iteration 3000: 0.6931471805599453
    Cost after iteration 4000: 0.6931471805599453
    Cost after iteration 5000: 0.6931471805599453
    Cost after iteration 6000: 0.6931471805599453
    Cost after iteration 7000: 0.6931471805599453
    Cost after iteration 8000: 0.6931471805599453
    Cost after iteration 9000: 0.6931471805599453
    Cost after iteration 10000: 0.6931471805599455
    Cost after iteration 11000: 0.6931471805599453
    Cost after iteration 12000: 0.6931471805599453
    Cost after iteration 13000: 0.6931471805599453
    Cost after iteration 14000: 0.6931471805599453

    [Figure output_9_1.png: cost curve for the model with zeros initialization (flat at ~0.693)]

    On the train set:
    Accuracy: 0.5
    On the test set:
    Accuracy: 0.5

    print ("predictions_train = " + str(predictions_train))
    print ("predictions_test = " + str(predictions_test))
    
    predictions_train = [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0]]
    predictions_test = [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
    
    # Change c=y to c=np.squeeze(y) in plot_decision_boundary in init_utils.py
    plt.title("Model with Zeros initialization")
    axes = plt.gca() # get the current axes object
    axes.set_xlim([-1.5, 1.5]) # set the x-axis view limits
    axes.set_ylim([-1.5, 1.5]) # set the y-axis view limits
    plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)
    
    

    [Figure: decision boundary of the model with zeros initialization]

    (3) Random initialization

    Exercise: implement the following function to initialize the weights to large random values (scaled by *10) and the biases to zeros. Use np.random.randn(..., ...) * 10 for the weights and np.zeros((..., ...)) for the biases.
    We use a fixed np.random.seed(...) to make sure your "random" weights match ours,
    so don't be puzzled if the initial parameters come out identical every time you run the code.

    def initialize_parameters_random(layers_dims):
        
        np.random.seed(3)
        L = len(layers_dims)
        parameters = {}
        
        for i in range(1,L):
            parameters["W"+str(i)] = np.random.randn(layers_dims[i],layers_dims[i-1])*10
            parameters["b"+str(i)] = np.zeros((layers_dims[i],1))
        
        return parameters
    
    parameters = initialize_parameters_random([3, 2, 1])
    print("W1 = " + str(parameters["W1"]))
    print("b1 = " + str(parameters["b1"]))
    print("W2 = " + str(parameters["W2"]))
    print("b2 = " + str(parameters["b2"]))
    

    W1 = [[ 17.88628473 4.36509851 0.96497468]
    [-18.63492703 -2.77388203 -3.54758979]]
    b1 = [[0.]
    [0.]]
    W2 = [[-0.82741481 -6.27000677]]
    b2 = [[0.]]

    parameters = model(train_X, train_Y, initialization = "random")
    print ("On the train set:")
    predictions_train = predict(train_X, train_Y, parameters)
    print ("On the test set:")
    predictions_test = predict(test_X, test_Y, parameters)
    

    D:\software\OneDrive\桌面\吴恩达深度学习课后作业\第二部分 改善深层神经网络\第一周 初始化、正则化、梯度校验(1&2&3)\init_utils.py:50: RuntimeWarning: divide by zero encountered in log
    logprobs = np.multiply(-np.log(a3),Y) + np.multiply(-np.log(1 - a3), 1 - Y)
    D:\software\OneDrive\桌面\吴恩达深度学习课后作业\第二部分 改善深层神经网络\第一周 初始化、正则化、梯度校验(1&2&3)\init_utils.py:50: RuntimeWarning: invalid value encountered in multiply
    logprobs = np.multiply(-np.log(a3),Y) + np.multiply(-np.log(1 - a3), 1 - Y)

    Cost after iteration 0: inf
    Cost after iteration 1000: 0.6250982793959966
    Cost after iteration 2000: 0.5981216596703697
    Cost after iteration 3000: 0.5638417572298645
    Cost after iteration 4000: 0.5501703049199763
    Cost after iteration 5000: 0.5444632909664456
    Cost after iteration 6000: 0.5374513807000807
    Cost after iteration 7000: 0.4764042074074983
    Cost after iteration 8000: 0.39781492295092263
    Cost after iteration 9000: 0.3934764028765484
    Cost after iteration 10000: 0.3920295461882659
    Cost after iteration 11000: 0.38924598135108
    Cost after iteration 12000: 0.3861547485712325
    Cost after iteration 13000: 0.384984728909703
    Cost after iteration 14000: 0.3827828308349524
    

    [Figure output_15_2.png: cost curve for the model with large random initialization]

    On the train set:
    Accuracy: 0.83
    On the test set:
    Accuracy: 0.86

    print (predictions_train)
    print (predictions_test)
    
    [[1 0 1 1 0 0 1 1 1 1 1 0 1 0 0 1 0 1 1 0 0 0 1 0 1 1 1 1 1 1 0 1 1 0 0 1
      1 1 1 1 1 1 1 0 1 1 1 1 0 1 0 1 1 1 1 0 0 1 1 1 1 0 1 1 0 1 0 1 1 1 1 0
      0 0 0 0 1 0 1 0 1 1 1 0 0 1 1 1 1 1 1 0 0 1 1 1 0 1 1 0 1 0 1 1 0 1 1 0
      1 0 1 1 0 0 1 0 0 1 1 0 1 1 1 0 1 0 0 1 0 1 1 1 1 1 1 1 0 1 1 0 0 1 1 0
      0 0 1 0 1 0 1 0 1 1 1 0 0 1 1 1 1 0 1 1 0 1 0 1 1 0 1 0 1 1 1 1 0 1 1 1
      1 0 1 0 1 0 1 1 1 1 0 1 1 0 1 1 0 1 1 0 1 0 1 1 1 0 1 1 1 0 1 0 1 0 0 1
      0 1 1 0 1 1 0 1 1 0 1 1 1 0 1 1 1 1 0 1 0 0 1 1 0 1 1 1 0 0 0 1 1 0 1 1
      1 1 0 1 1 0 1 1 1 0 0 1 0 0 0 1 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 1 1 1
      1 1 1 1 0 0 0 1 1 1 1 0]]
    [[1 1 1 1 0 1 0 1 1 0 1 1 1 0 0 0 0 1 0 1 0 0 1 0 1 0 1 1 1 1 1 0 0 0 0 1
      0 1 1 0 0 1 1 1 1 1 0 1 1 1 0 1 0 1 1 0 1 0 1 0 1 1 1 1 1 1 1 1 1 0 1 0
      1 1 1 1 1 0 1 0 0 1 0 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0]]
    
    # Initializing the weights to very large random values does not work well.
    plt.title("Model with large random initialization")
    axes = plt.gca()
    axes.set_xlim([-1.5,1.5])
    axes.set_ylim([-1.5,1.5])
    plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)
    

    [Figure: decision boundary of the model with large random initialization]

    (4) He initialization

    Exercise: implement the following function to initialize the parameters with He initialization.
    He initialization is recommended for layers with ReLU activations.
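
    For reference, He initialization draws the weights of layer l from a zero-mean Gaussian whose variance is scaled by the fan-in of the layer, and sets the biases to zero (this matches the scaling used in the code below):

    $$W^{[l]} \sim \mathcal{N}\left(0,\ \frac{2}{n^{[l-1]}}\right), \qquad b^{[l]} = 0$$

    In NumPy this corresponds to np.random.randn(layers_dims[l], layers_dims[l-1]) * np.sqrt(2. / layers_dims[l-1]), as in initialize_parameters_he() below.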

    def initialize_parameters_he(layers_dims):
        np.random.seed(3)
        L = len(layers_dims)
        parameters = {}
        
        for i in range(1,L):
            parameters["W"+str(i)] = np.random.randn(layers_dims[i],layers_dims[i-1])*np.sqrt(2./layers_dims[i-1])
            parameters["b"+str(i)] = np.zeros((layers_dims[i],1))
        
        return parameters
    
    parameters = initialize_parameters_he([2, 4, 1])
    print("W1 = " + str(parameters["W1"]))
    print("b1 = " + str(parameters["b1"]))
    print("W2 = " + str(parameters["W2"]))
    print("b2 = " + str(parameters["b2"]))
    

    W1 = [[ 1.78862847 0.43650985]
    [ 0.09649747 -1.8634927 ]
    [-0.2773882 -0.35475898]
    [-0.08274148 -0.62700068]]
    b1 = [[0.]
    [0.]
    [0.]
    [0.]]
    W2 = [[-0.03098412 -0.33744411 -0.92904268 0.62552248]]
    b2 = [[0.]]

    parameters = model(train_X, train_Y, initialization = "he")
    print ("On the train set:")
    predictions_train = predict(train_X, train_Y, parameters)
    print ("On the test set:")
    predictions_test = predict(test_X, test_Y, parameters)
    

    Cost after iteration 0: 0.8830537463419761
    Cost after iteration 1000: 0.6879825919728063
    Cost after iteration 2000: 0.6751286264523371
    Cost after iteration 3000: 0.6526117768893807
    Cost after iteration 4000: 0.6082958970572937
    Cost after iteration 5000: 0.5304944491717495
    Cost after iteration 6000: 0.4138645817071793
    Cost after iteration 7000: 0.3117803464844441
    Cost after iteration 8000: 0.23696215330322556
    Cost after iteration 9000: 0.18597287209206828
    Cost after iteration 10000: 0.15015556280371808
    Cost after iteration 11000: 0.12325079292273548
    Cost after iteration 12000: 0.09917746546525937
    Cost after iteration 13000: 0.08457055954024274
    Cost after iteration 14000: 0.07357895962677366

    [Figure: cost curve for the model with He initialization]

    On the train set:
    Accuracy: 0.9933333333333333
    On the test set:
    Accuracy: 0.96

    # The model with He initialization separates the blue and red dots well within a small number of iterations.
    plt.title("Model with He initialization")
    axes = plt.gca()
    axes.set_xlim([-1.5,1.5])
    axes.set_ylim([-1.5,1.5])
    plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)
    

    [Figure output_22_0.png: decision boundary of the model with He initialization]

    What you should remember from this assignment:
    1. Different initializations lead to different results.
    2. Random initialization is used to break symmetry and make sure different hidden units can learn different things.
    3. Don't initialize the weights to values that are too large.
    4. He initialization works well for networks with ReLU activations.

    II. Regularization

    You will use the following neural network (already implemented for you). The model can be used as follows:

    In regularization mode, set the input lambd to a non-zero value. We use the name lambd instead of lambda because lambda is a reserved keyword in Python.
    In dropout mode, set keep_prob to a value smaller than 1.
    First, you will try the model without any regularization. Then you will implement:

    L2 regularization, with the functions compute_cost_with_regularization() and backward_propagation_with_regularization()
    Dropout, with the functions forward_propagation_with_dropout() and backward_propagation_with_dropout()
    In each part, you will run this model with the proper inputs so that it calls the functions you have implemented. Take a look at the code below to get familiar with the model.

    (1) Non-regularized model

    train_X, train_Y, test_X, test_Y = load_2D_dataset()
    

    [Figure: scatter plot of the 2D dataset (blue/red dots)]

    def model(X, Y, learning_rate = 0.3, num_iterations = 30000, print_cost = True, lambd = 0, keep_prob = 1):
        
        grads = {}
        costs = []
        m = X.shape[1]
        layers_dims = [X.shape[0], 20, 3, 1]
        
        parameters = initialize_parameters(layers_dims)
        
        for i in range(num_iterations):
            
            # 1. Forward propagation
            if keep_prob==1:
                a3,cache = forward_propagation(X,parameters)
            elif keep_prob<1:
                a3,cache =forward_propagation_with_dropout(X,parameters,keep_prob)
            
            # 2. Cost function
            if lambd == 0:
                cost = compute_cost(a3,Y)
            else:
                cost =compute_cost_with_regularization(a3,Y,parameters,lambd)
            
            # It is possible to use both L2 regularization and dropout, but this assignment explores only one at a time
            assert(keep_prob==1 or lambd == 0)

            # 3. Backward propagation
            if lambd == 0 and keep_prob==1:
                grads = backward_propagation(X,Y,cache)
            elif keep_prob<1:
                grads = backward_propagation_with_dropout(X,Y,cache,keep_prob)
            elif lambd!=0:
                grads = backward_propagation_with_regularization(X,Y,cache,lambd)
            
            parameters = update_parameters(parameters,grads,learning_rate)
            
            if print_cost and i%10000==0:
                print("Cost after iteration {}:{}".format(i,cost))
            if print_cost and i%1000==0:
                costs.append(cost)
        
        # plot the cost
        plt.plot(costs)
        plt.ylabel('cost')
        plt.xlabel('iterations (x1,000)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()
        
        return parameters
    
    parameters = model(train_X,train_Y)
    print ("On the training set:")
    predictions_train = predict(train_X, train_Y, parameters)
    print ("On the test set:")
    predictions_test = predict(test_X, test_Y, parameters)
    

    Cost after iteration 0:0.6557412523481002
    Cost after iteration 10000:0.16329987525724196
    Cost after iteration 20000:0.13851642423253843

    [Figure: cost curve for the model without regularization]

    On the training set:
    Accuracy: 0.9478672985781991
    On the test set:
    Accuracy: 0.915

    plt.title("Model without regularization")
    axes = plt.gca()
    axes.set_xlim([-0.75,0.40])
    axes.set_ylim([-0.75,0.65])
    plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)
    

    [Figure: decision boundary of the model without regularization]

    (2) L2 regularization

    Exercise: implement compute_cost_with_regularization(), which computes the cost given by formula (2).

    np.square computes the element-wise square.

    How L2 regularization works:

    L2 regularization relies on the assumption that a model with small weights is simpler than a model with large weights. By penalizing the squared values of the weights in the cost function, all the weights are driven towards smaller values; weights that are too large make the cost too high. This yields a smoother model in which the output changes more slowly as the input changes.
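
    Concretely, the cost referred to as formula (2) adds an L2 penalty on the weights to the usual cross-entropy cost; this is what compute_cost_with_regularization() below computes:

    $$J_{regularized} = \underbrace{-\frac{1}{m}\sum_{i=1}^{m}\Big(y^{(i)}\log a^{[L](i)} + (1-y^{(i)})\log\big(1-a^{[L](i)}\big)\Big)}_{\text{cross-entropy cost}} + \underbrace{\frac{\lambda}{2m}\sum_{l}\sum_{k}\sum_{j}\big(W^{[l]}_{k,j}\big)^{2}}_{\text{L2 regularization cost}}$$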

    What you should remember about the effects of L2 regularization:

    Cost computation:
    - a regularization term is added to the cost.
    Backpropagation function:
    - there are extra terms in the gradients with respect to the weight matrices.
    Weights end up smaller ("weight decay"):
    - the weights are pushed towards smaller values.
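
    In the backward pass, the only change is the derivative of the regularization term that is added to each weight gradient; for layer l:

    $$dW^{[l]} = \big(\text{usual backpropagation term}\big) + \frac{\lambda}{m} W^{[l]}$$

    This is exactly the change implemented in backward_propagation_with_regularization() below.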

    def compute_cost_with_regularization(A3, Y, parameters, lambd):
        
        m = Y.shape[1]
        W1 = parameters["W1"]
        W2 = parameters["W2"]
        W3 = parameters["W3"]
        
        cross_entropy_cost = compute_cost(A3,Y)
        # sum of the squares of all the weights
        L2_regularization_cost  = (1./m*lambd/2)*(np.sum(np.square(W1)) + np.sum(np.square(W2)) + np.sum(np.square(W3)))
        
        cost = cross_entropy_cost + L2_regularization_cost
        return cost
    
    A3, Y_assess, parameters = compute_cost_with_regularization_test_case()
    cost = compute_cost_with_regularization(A3, Y_assess, parameters, lambd = 0.1)
    print("cost="+str(cost))
    

    cost=1.7864859451590758

    Exercise: implement the backward propagation with regularization. The changes only concern dW1, dW2 and dW3.

    def backward_propagation_with_regularization(X, Y, cache, lambd):
        m = X.shape[1]
        (Z1,A1,W1,b1,Z2,A2,W2,b2,Z3,A3,W3,b3) = cache
        
        dZ3 = A3-Y
        
        dW3 = 1/m*np.dot(dZ3,A2.T) + lambd/m*W3
        db3 = 1/m*np.sum(dZ3,axis=1,keepdims=True)
        
        dA2 = np.dot(W3.T,dZ3)
        # derivative of ReLU
        dZ2 = np.multiply(dA2, np.int64(A2 > 0))
        dW2 = 1/m*np.dot(dZ2,A1.T) + lambd/m*W2
        db2 = 1/m*np.sum(dZ2,axis=1,keepdims=True)
        
        dA1 = np.dot(W2.T,dZ2)
        dZ1 = np.multiply(dA1, np.int64(A1 > 0))
        dW1 = 1/m*np.dot(dZ1,X.T)+lambd/m*W1
        db1 = 1/m*np.sum(dZ1,axis=1,keepdims=True)
        
        gradients = {"dZ3": dZ3, "dW3": dW3, "db3": db3,"dA2": dA2,
                     "dZ2": dZ2, "dW2": dW2, "db2": db2, "dA1": dA1, 
                     "dZ1": dZ1, "dW1": dW1, "db1": db1}
    
        return gradients
    
    X_assess, Y_assess, cache = backward_propagation_with_regularization_test_case()
    grads = backward_propagation_with_regularization(X_assess, Y_assess, cache, lambd=0.7)
    print ("dW1 = "+ str(grads["dW1"]))
    print ("dW2 = "+ str(grads["dW2"]))
    print ("dW3 = "+ str(grads["dW3"]))
    

    dW1 = [[-0.25604646 0.12298827 -0.28297129]
    [-0.17706303 0.34536094 -0.4410571 ]]
    dW2 = [[ 0.79276486 0.85133918]
    [-0.0957219 -0.01720463]
    [-0.13100772 -0.03750433]]
    dW3 = [[-1.77691347 -0.11832879 -0.09397446]]

    parameters = model(train_X, train_Y, lambd = 0.7)
    print ("On the train set:")
    predictions_train = predict(train_X, train_Y, parameters)
    print ("On the test set:")
    predictions_test = predict(test_X, test_Y, parameters)
    

    Cost after iteration 0:0.6974484493131264
    Cost after iteration 10000:0.2684918873282239
    Cost after iteration 20000:0.2680916337127301

    [Figure: cost curve for the model with L2 regularization]

    On the train set:
    Accuracy: 0.9383886255924171
    On the test set:
    Accuracy: 0.93
    
    plt.title("Model with L2-regularization")
    axes = plt.gca()
    axes.set_xlim([-0.75,0.40])
    axes.set_ylim([-0.75,0.65])
    plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)
    

    [Figure: decision boundary of the model with L2 regularization]

    (3) Dropout

    1) Forward propagation with dropout

    def forward_propagation_with_dropout(X, parameters, keep_prob = 0.5):
        
        np.random.seed(1)
        
        W1 = parameters["W1"]
        b1 = parameters["b1"]
        W2 = parameters["W2"]
        b2 = parameters["b2"]
        W3 = parameters["W3"]
        b3 = parameters["b3"]
        
        Z1 = np.dot(W1, X) + b1
        A1 = relu(Z1)
        
        D1 = np.random.rand(A1.shape[0],A1.shape[1])    # Step 1: random matrix with the same shape as A1
        D1 = D1 < keep_prob                             # Step 2: threshold at keep_prob to get a 0/1 mask
        A1 = A1 * D1                                    # Step 3: shut down the masked neurons of A1
        A1 = A1 / keep_prob                             # Step 4: scale so the expected value of the activations stays the same (inverted dropout)
        
        Z2 = np.dot(W2, A1) + b2
        A2 = relu(Z2)
    
        D2 = np.random.rand(A2.shape[0],A2.shape[1])                
        D2 = D2 < keep_prob                                   
        A2 = A2 * D2                                 
        A2 = A2 / keep_prob                                 
    
        Z3 = np.dot(W3, A2) + b3
        A3 = sigmoid(Z3)
    
        cache = (Z1, D1, A1, W1, b1, Z2, D2, A2, W2, b2, Z3, A3, W3, b3)
    
        return A3, cache
    
    X_assess, parameters = forward_propagation_with_dropout_test_case()
    
    A3, cache = forward_propagation_with_dropout(X_assess, parameters, keep_prob = 0.7)
    print ("A3 = " + str(A3))
    

    A3 = [[0.36974721 0.00305176 0.04565099 0.49683389 0.36974721]]

    2) Backward propagation with dropout

    Backward propagation with dropout is quite easy to implement. You have to carry out 2 steps:
    1. You previously shut down some neurons during forward propagation by applying the mask D to A1. In backward propagation, you must shut down the same neurons by reapplying the same mask D to dA1.
    2. During forward propagation you divided A1 by keep_prob. Therefore, in backward propagation you must divide dA1 by keep_prob again (the calculus interpretation is that if A is scaled by keep_prob, then its derivative dA is scaled by the same keep_prob).

    What you should remember about dropout:

    Dropout is a regularization technique.
    Only use dropout during training; do not use it at test time.
    Apply dropout during both forward and backward propagation.
    During training, divide each dropout layer by keep_prob to keep the expected value of the activations the same. For example, if keep_prob is 0.5, then on average we shut down half of the nodes, so the output would be scaled by 0.5 since only the remaining half contribute to the solution. Dividing by 0.5 is equivalent to multiplying by 2, so the output now has the same expected value. You can check that this works even when keep_prob is not 0.5; see the quick numerical check below.
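
    As a quick sanity check (a minimal sketch, not part of the assignment), the following snippet verifies numerically that dividing by keep_prob keeps the mean activation roughly unchanged:

    import numpy as np

    np.random.seed(0)
    keep_prob = 0.8
    A = np.random.rand(1000, 1000)             # some activations, mean ~0.5
    D = np.random.rand(*A.shape) < keep_prob   # dropout mask: ~80% ones
    A_drop = (A * D) / keep_prob               # inverted dropout scaling

    print(A.mean(), A_drop.mean())             # both values are close to 0.5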
    
    def backward_propagation_with_dropout(X, Y, cache, keep_prob):
        
        m = X.shape[1]
        (Z1, D1, A1, W1, b1, Z2, D2, A2, W2, b2, Z3, A3, W3, b3) = cache
        
        dZ3 = A3 - Y
        dW3 = 1./m * np.dot(dZ3, A2.T)
        db3 = 1./m * np.sum(dZ3, axis=1, keepdims = True)
        dA2 = np.dot(W3.T, dZ3)
    
        dA2 = dA2 * D2               # Step 1: apply the same mask D2 used during forward propagation
        dA2 = dA2 / keep_prob        # Step 2: scale by keep_prob, consistent with the forward pass
    
        dZ2 = np.multiply(dA2, np.int64(A2 > 0))
        dW2 = 1./m * np.dot(dZ2, A1.T)
        db2 = 1./m * np.sum(dZ2, axis=1, keepdims = True)
    
        dA1 = np.dot(W2.T, dZ2)
    
        dA1 = dA1 * D1            
        dA1 = dA1 / keep_prob         
    
        dZ1 = np.multiply(dA1, np.int64(A1 > 0))
        dW1 = 1./m * np.dot(dZ1, X.T)
        db1 = 1./m * np.sum(dZ1, axis=1, keepdims = True)
    
        gradients = {"dZ3": dZ3, "dW3": dW3, "db3": db3,"dA2": dA2,
                     "dZ2": dZ2, "dW2": dW2, "db2": db2, "dA1": dA1, 
                     "dZ1": dZ1, "dW1": dW1, "db1": db1}
        
        return gradients
    
    X_assess, Y_assess, cache = backward_propagation_with_dropout_test_case()
    
    gradients = backward_propagation_with_dropout(X_assess, Y_assess, cache, keep_prob = 0.8)
    
    print ("dA1 = " + str(gradients["dA1"]))
    print ("dA2 = " + str(gradients["dA2"]))
    

    dA1 = [[ 0.36544439 0. -0.00188233 0. -0.17408748]
    [ 0.65515713 0. -0.00337459 0. -0. ]]
    dA2 = [[ 0.58180856 0. -0.00299679 0. -0.27715731]
    [ 0. 0.53159854 -0. 0.53159854 -0.34089673]
    [ 0. 0. -0.00292733 0. -0. ]]

    parameters = model(train_X, train_Y, keep_prob = 0.86, learning_rate = 0.3)
    
    print ("On the train set:")
    predictions_train = predict(train_X, train_Y, parameters)
    print ("On the test set:")
    predictions_test = predict(test_X, test_Y, parameters)
    

    Cost after iteration 0:0.6543912405149825

    D:\software\OneDrive\桌面\吴恩达深度学习课后作业\第二部分 改善深层神经网络\第一周 初始化、正则化、梯度校验(1&2&3)\reg_utils.py:121: RuntimeWarning: divide by zero encountered in log
    logprobs = np.multiply(-np.log(a3),Y) + np.multiply(-np.log(1 - a3), 1 - Y)
    D:\software\OneDrive\桌面\吴恩达深度学习课后作业\第二部分 改善深层神经网络\第一周 初始化、正则化、梯度校验(1&2&3)\reg_utils.py:121: RuntimeWarning: invalid value encountered in multiply
    logprobs = np.multiply(-np.log(a3),Y) + np.multiply(-np.log(1 - a3), 1 - Y)

    Cost after iteration 10000:0.061016986574905605
    Cost after iteration 20000:0.060582435798513114

    [Figure: cost curve for the model with dropout]

    On the train set:
    Accuracy: 0.9289099526066351
    On the test set:
    Accuracy: 0.95

    plt.title("Model with dropout")
    axes = plt.gca()
    axes.set_xlim([-0.75,0.40])
    axes.set_ylim([-0.75,0.65])
    plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)
    

    [Figure: decision boundary of the model with dropout]

    III. Gradient checking

    (1) How gradient checking works
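
    Gradient checking compares the gradients computed by backpropagation with a numerical estimate obtained directly from the cost function, using the two-sided (centered) difference:

    $$\frac{\partial J}{\partial \theta} \approx \frac{J(\theta + \varepsilon) - J(\theta - \varepsilon)}{2\varepsilon}$$

    With a small epsilon (e.g. 1e-7), a correct backward propagation produces gradients that agree with this estimate up to numerical precision.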

    (2) 1-dimensional gradient checking
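
    The agreement between the backpropagation gradient and the numerical approximation is measured by their relative difference; this is the formula evaluated in gradient_check() below:

    $$difference = \frac{\lVert grad - gradapprox \rVert_2}{\lVert grad \rVert_2 + \lVert gradapprox \rVert_2}$$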

    You need 3 steps to compute this formula (it is evaluated in gradient_check() below, after the forward and backward propagation helpers):

    1. Compute the numerator using np.linalg.norm(...)
    2. Compute the denominator, calling np.linalg.norm(...) twice
    3. Divide them
    def forward_propagation(x, theta):
        J = theta * x
        return J
    
    x, theta = 2, 4
    J = forward_propagation(x, theta)
    print ("J = " + str(J))
    
    J = 8
    
    def backward_propagation(x, theta):
         dtheta = x
         return dtheta
    
    x, theta = 2, 4
    dtheta = backward_propagation(x, theta)
    print ("dtheta = " + str(dtheta))
    
    dtheta = 2
    
    def gradient_check(x, theta, epsilon = 1e-7):
        thetaplus = theta + epsilon                               # Step 1
        thetaminus = theta - epsilon                              # Step 2
        J_plus = forward_propagation(x, thetaplus)                                  # Step 3
        J_minus = forward_propagation(x, thetaminus)                                 # Step 4
        gradapprox = (J_plus - J_minus) / (2 * epsilon)                              # Step 5
        
        grad = backward_propagation(x, theta)
        
        numerator = np.linalg.norm(grad - gradapprox)                               # Step 1': np.linalg.norm computes the Euclidean (L2) norm
        denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)                            # Step 2'
        difference = numerator / denominator                              # Step 3'
        
        if difference < 1e-7:
            print ("The gradient is correct!")
        else:
            print ("The gradient is wrong!")
        
        return difference
    
    x, theta = 2, 4
    difference = gradient_check(x, theta)
    print("difference = " + str(difference))
    
    The gradient is correct!
    difference = 2.919335883291695e-10
    

    (3) N-dimensional gradient checking
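
    In the N-dimensional case, all parameters are flattened into a single vector theta with dictionary_to_vector(), and the same two-sided estimate is applied to one component at a time; this is what gradient_check_n() below does:

    $$gradapprox[i] = \frac{J(\theta_1,\ldots,\theta_i+\varepsilon,\ldots) - J(\theta_1,\ldots,\theta_i-\varepsilon,\ldots)}{2\varepsilon}$$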

    def forward_propagation_n(X, Y, parameters):
        
        m = X.shape[1]
        W1 = parameters["W1"]
        b1 = parameters["b1"]
        W2 = parameters["W2"]
        b2 = parameters["b2"]
        W3 = parameters["W3"]
        b3 = parameters["b3"]
        
        Z1 = np.dot(W1, X) + b1
        A1 = relu(Z1)
        Z2 = np.dot(W2, A1) + b2
        A2 = relu(Z2)
        Z3 = np.dot(W3, A2) + b3
        A3 = sigmoid(Z3)
        
        logprobs = np.multiply(-np.log(A3),Y) + np.multiply(-np.log(1 - A3), 1 - Y)
        cost = 1./m * np.sum(logprobs)
        
        cache = (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3)
        
        return cost, cache
    
    def backward_propagation_n(X, Y, cache):
        
        m = X.shape[1]
        (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3) = cache
        
        dZ3 = A3 - Y
        dW3 = 1./m * np.dot(dZ3, A2.T)
        db3 = 1./m * np.sum(dZ3, axis=1, keepdims = True)
        
        dA2 = np.dot(W3.T, dZ3)
        dZ2 = np.multiply(dA2, np.int64(A2 > 0))
        dW2 = 1./m * np.dot(dZ2, A1.T) * 2   # bug: the extra *2 is deliberate, so gradient checking can catch it
        db2 = 1./m * np.sum(dZ2, axis=1, keepdims = True)
        
        dA1 = np.dot(W2.T, dZ2)
        dZ1 = np.multiply(dA1, np.int64(A1 > 0))
        dW1 = 1./m * np.dot(dZ1, X.T)
        db1 = 4./m * np.sum(dZ1, axis=1, keepdims = True)  # deliberately wrong (4./m instead of 1./m) to test the difference
        
        gradients = {"dZ3": dZ3, "dW3": dW3, "db3": db3,
                     "dA2": dA2, "dZ2": dZ2, "dW2": dW2, "db2": db2,
                     "dA1": dA1, "dZ1": dZ1, "dW1": dW1, "db1": db1}
        
        return gradients
    
    def gradient_check_n(parameters, gradients, X, Y, epsilon = 1e-7):
        
        parameters_values, _ = dictionary_to_vector(parameters)
        grad = gradients_to_vector(gradients)
        num_parameters = parameters_values.shape[0]
        J_plus = np.zeros((num_parameters, 1))
        J_minus = np.zeros((num_parameters, 1))
        gradapprox = np.zeros((num_parameters, 1))
        
        for i in range(num_parameters):
            
            thetaplus = np.copy(parameters_values)                                      # Step 1: copy the parameter vector, then add epsilon to the i-th entry only
            thetaplus[i][0] = thetaplus[i][0] + epsilon                                # Step 2
            J_plus[i], _ = forward_propagation_n(X, Y, vector_to_dictionary(thetaplus))                                  # Step 3

            thetaminus = np.copy(parameters_values)                                     # Step 1: copy the parameter vector, then subtract epsilon from the i-th entry only
            thetaminus[i][0] = thetaminus[i][0] - epsilon                            # Step 2
            J_minus[i], _ = forward_propagation_n(X, Y, vector_to_dictionary(thetaminus))                                  # Step 3
    
            gradapprox[i] = (J_plus[i] - J_minus[i]) / (2.* epsilon)
    
        numerator = np.linalg.norm(grad - gradapprox)                                           # Step 1'
        denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)                                         # Step 2'
        difference = numerator / denominator                                          # Step 3'
    
        if difference > 1e-7:
            print ("\033[93m" + "There is a mistake in the backward propagation! difference = " + str(difference) + "\033[0m")
        else:
            print ("\033[92m" + "Your backward propagation works perfectly fine! difference = " + str(difference) + "\033[0m")
        
        return difference
    
    X, Y, parameters = gradient_check_n_test_case()
    
    cost, cache = forward_propagation_n(X, Y, parameters)
    gradients = backward_propagation_n(X, Y, cache)
    difference = gradient_check_n(parameters, gradients, X, Y)
    

    There is a mistake in the backward propagation! difference = 0.2850931566540251
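
    As a rough guide from the course, with epsilon = 1e-7 a difference below about 1e-7 means the gradient is very likely correct, while a value above about 1e-3 (as here, 0.285) strongly suggests a bug; this is expected, since backward_propagation_n() above contains two deliberately wrong lines (dW2 and db1).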

  • Original article: https://blog.csdn.net/woailiqi12134/article/details/126383583