August 20 Computer Vision Theory Study Notes — Neural Networks and the BP Algorithm



    Preface

    These are my computer vision theory study notes for August 20 — neural networks and the BP (backpropagation) algorithm — organized into three chapters:

    • the Delta learning rule;
    • gradient descent;
    • implementing backpropagation with NumPy.

    I. The Delta Learning Rule

    A supervised learning rule: the connection weights are adjusted according to the difference between a neuron's actual output and its desired output:

    $$\Delta w_{ij} = a \cdot (d_i - y_i)\, x_j(t)$$

    where $\Delta w_{ij}$ is the weight increment, $d_i$ is the desired output of neuron $i$, $y_i$ is its actual output, and $a$ is the learning rate.

    • Objective function:
      $$J(w) = \frac{1}{2}\,\|\mathbf{t} - \mathbf{z}\|^2 = \frac{1}{2}\sum_{k=1}^{c}(t_k - z_k)^2$$
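
    As a quick illustration, here is a minimal NumPy sketch of one Delta-rule update (the variable names and values below are hypothetical, chosen only for the example):

    import numpy as np

    # One Delta-rule update for a single linear neuron:
    # delta_w_ij = a * (d_i - y_i) * x_j(t)
    a = 0.1                          # learning rate
    x = np.array([1.0, 0.5, -0.3])   # input vector x_j
    w = np.zeros(3)                  # connection weights w_ij
    d = 1.0                          # desired output d_i

    y = w @ x                        # actual output y_i
    w += a * (d - y) * x             # Delta-rule weight update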

    II. Gradient Descent

    $$w(m+1) = w(m) + \Delta w(m) = w(m) - \eta \frac{\partial J}{\partial w}$$
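
    As a toy example of this rule, here is a minimal sketch (assuming the one-dimensional quadratic objective $J(w) = \frac{1}{2}(w-3)^2$, chosen purely for illustration):

    # Gradient descent on J(w) = 0.5 * (w - 3)^2, whose gradient is (w - 3)
    w = 0.0
    eta = 0.1
    for m in range(100):
        grad = w - 3.0          # dJ/dw
        w = w - eta * grad      # w(m+1) = w(m) - eta * dJ/dw
    print(w)                    # approaches the minimum at w = 3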

    1. Weight update for the output layer


    $$J(w) = \frac{1}{2}\sum_{k=1}^{c}(t_k - z_k)^2, \qquad \frac{\partial J}{\partial w_{kj}} = \frac{\partial J}{\partial net_k}\,\frac{\partial net_k}{\partial w_{kj}}$$
    where the total input to output unit $k$ is $net_k = \sum_{i=1}^{n_H} w_{ki}\, y_i$, so $\frac{\partial net_k}{\partial w_{kj}} = y_j$.

    $$\frac{\partial J}{\partial net_k} = \frac{\partial J}{\partial z_k}\,\frac{\partial z_k}{\partial net_k} = -(t_k - z_k)\, f'(net_k)$$

    Let $\delta_k = (t_k - z_k)\, f'(net_k)$; then:
    $$\frac{\partial J}{\partial w_{kj}} = -(t_k - z_k)\, f'(net_k)\, y_j = -\delta_k\, y_j$$
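
    To make this step concrete, here is a small NumPy sketch (a sketch only: the sigmoid is used as the activation $f$, and the names x, t, W_hid, W_out and the layer sizes are hypothetical choices for the example):

    import numpy as np

    def f(x):                                # sigmoid activation
        return 1.0 / (1.0 + np.exp(-x))

    def f_prime(x):                          # derivative of the sigmoid
        s = f(x)
        return s * (1.0 - s)

    x = np.array([0.5, -1.0])                # network inputs x_i (d = 2)
    t = np.array([1.0, 0.0])                 # targets t_k (c = 2)
    W_hid = np.random.randn(3, 2) * 0.1      # hidden weights w_ji (n_H = 3)
    W_out = np.random.randn(2, 3) * 0.1      # output weights w_kj

    net_j = W_hid @ x                        # net_j = sum_m w_jm * x_m
    y = f(net_j)                             # hidden outputs y_j
    net_k = W_out @ y                        # net_k = sum_j w_kj * y_j
    z = f(net_k)                             # network outputs z_k

    delta_k = (t - z) * f_prime(net_k)       # delta_k = (t_k - z_k) f'(net_k)
    dJ_dW_out = -np.outer(delta_k, y)        # dJ/dw_kj = -delta_k * y_j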


    2. Weight update for the hidden layer

    $$\frac{\partial J}{\partial w_{ji}} = \frac{\partial J}{\partial y_j}\,\frac{\partial y_j}{\partial net_j}\,\frac{\partial net_j}{\partial w_{ji}}$$

    With
    $$net_j = \sum_{m=1}^{d} w_{jm}\, x_m,$$
    we have:
    $$\frac{\partial y_j}{\partial net_j} = f'(net_j), \qquad \frac{\partial net_j}{\partial w_{ji}} = x_i$$
    $$\frac{\partial J}{\partial y_j} = \frac{\partial}{\partial y_j}\left[\frac{1}{2}\sum_{k=1}^{c}(t_k - z_k)^2\right] = -\sum_{k=1}^{c}(t_k - z_k)\, f'(net_k)\, w_{kj}$$
    $$\frac{\partial J}{\partial w_{ji}} = -\left[\sum_{k=1}^{c}(t_k - z_k)\, f'(net_k)\, w_{kj}\right] f'(net_j)\, x_i$$

    Let $\delta_j = f'(net_j)\sum_{k=1}^{c}\delta_k\, w_{kj}$; then:
    $$\frac{\partial J}{\partial w_{ji}} = -\delta_j\, x_i$$
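
    Continuing the same sketch, the hidden-layer residual and gradient follow directly (this reuses x, net_j, W_out, delta_k and f_prime from the snippet above):

    # delta_j = f'(net_j) * sum_k delta_k * w_kj
    delta_j = f_prime(net_j) * (W_out.T @ delta_k)
    dJ_dW_hid = -np.outer(delta_j, x)        # dJ/dw_ji = -delta_j * x_i

    # A gradient-descent step would then be, for some step size eta:
    # W_out -= eta * dJ_dW_out;  W_hid -= eta * dJ_dW_hid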

    To summarize:

    • weight increment = −1 × learning rate × partial derivative of the objective with respect to the weight;
    • partial derivative of the objective with respect to the weight = −1 × residual × input to the current layer;
    • residual = derivative of the current layer's activation × error propagated back from the layer above;
    • error propagated back from the layer above = weighted sum of the residuals of the layer above.

    The code is as follows (a TensorFlow 1.x example that tries to fit XOR with a single-layer network):

    import numpy as np
    import tensorflow as tf

    tf.set_random_seed(777)
    learning_rate = 0.1
    
    x_data = [[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]]
    
    y_data = [[0],
              [1],
              [1],
              [0]]
    
    x_data = np.array(x_data, dtype=np.float32)
    y_data = np.array(y_data, dtype=np.float32)
    
    X = tf.placeholder(tf.float32, [None, 2])
    Y = tf.placeholder(tf.float32, [None, 1])
    
    W = tf.Variable(tf.random_normal([2, 1]), name='weight')
    b = tf.Variable(tf.random_normal([1]), name='bias')
    
    # hypothesis (predicted probability)
    hypothesis = tf.sigmoid(tf.matmul(X, W) + b)
    
    # loss function (binary cross-entropy)
    loss = -tf.reduce_mean(Y * tf.log(hypothesis) + (1 - Y) *
                           tf.log(1 - hypothesis))
    
    # training op
    train = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(loss)
    # Accuracy computation
    # True if hypothesis > 0.5 else False
    pred = tf.cast(hypothesis>0.5, dtype=tf.float32)
    acc = tf.reduce_mean(tf.cast(tf.equal(pred, Y), dtype=tf.float32))
    
    # Launch graph
    with tf.Session() as sess:
        # initialize variables
        sess.run(tf.global_variables_initializer())
        
        for step in range(10001):
            sess.run(train, feed_dict={X: x_data, Y: y_data})
            if step % 100 == 0:
                print(step, sess.run(loss, feed_dict={
                      X: x_data, Y: y_data}), sess.run(W))
                
        # report the hypothesis, predictions, and accuracy
        h, c, a = sess.run([hypothesis, pred, acc],
                           feed_dict={X: x_data, Y: y_data})
        print("\nHypothesis: ", h, "\nCorrect: ", c, "\nAccuracy: ", a)
    
    >>> Hypothesis:  [[0.5]
         [0.5]
         [0.5]
         [0.5]]
        Correct:  [[0.]
         [0.]
         [0.]
         [0.]]
        Accuracy:  0.5

    The accuracy stays at 0.5 because a single-layer network cannot represent XOR; the model collapses to predicting 0.5 for every input.
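
    One way to make this example actually learn XOR is to insert a hidden layer. Here is a minimal sketch under the same TensorFlow 1.x setup (the hidden size of 2 and the names W1/b1/W2/b2 are illustrative choices, not from the original post):

    # Same data, placeholders, and training loop as above; only the model changes.
    W1 = tf.Variable(tf.random_normal([2, 2]), name='weight1')
    b1 = tf.Variable(tf.random_normal([2]), name='bias1')
    layer1 = tf.sigmoid(tf.matmul(X, W1) + b1)

    W2 = tf.Variable(tf.random_normal([2, 1]), name='weight2')
    b2 = tf.Variable(tf.random_normal([1]), name='bias2')
    hypothesis = tf.sigmoid(tf.matmul(layer1, W2) + b2)

    loss = -tf.reduce_mean(Y * tf.log(hypothesis) +
                           (1 - Y) * tf.log(1 - hypothesis))
    train = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)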
    

    3. Stochastic gradient descent (SGD)

    Instead of using the full training set for every update, each iteration uses only a subset (mini-batch) of the samples, as sketched below.
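
    A minimal sketch of a mini-batch SGD loop (a generic illustration; the function grad(W, X_batch, y_batch), which returns the gradient of the objective on one batch, is a hypothetical placeholder):

    import numpy as np

    def sgd(W, X, y, grad, eta=0.1, batch_size=32, epochs=10):
        """Update W using gradients computed on random mini-batches."""
        n = X.shape[0]
        for _ in range(epochs):
            idx = np.random.permutation(n)       # reshuffle the samples each epoch
            for start in range(0, n, batch_size):
                batch = idx[start:start + batch_size]
                W = W - eta * grad(W, X[batch], y[batch])  # step on this mini-batch only
        return W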


    III. Implementing Backpropagation with NumPy

    import numpy as np

    # Define the activation functions (tanh and logistic) and their derivatives
    def tanh(x):
        return np.tanh(x)
    
    def tanh_deriv(x):
        return 1. - np.tanh(x)**2
    
    def logistic(x):
        return 1 / (1 + np.exp(-x))
    
    def logistic_derivative(x):
        return logistic(x) * (1 - logistic(x))
    
    # Define the neural network
    class NeuralNetwork:
        # Initialization. `layers` is a list, e.g. [10, 10, 3] means 10 neurons in the
        # first layer, 10 in the second, and 3 in the third.
        
        def __init__(self, layers, activation='tanh'):
            '''
            layers: a list with at least two values;
            activation: 'tanh' or 'logistic'
            '''
            
            if activation == 'logistic':
                self.activation = logistic
                self.activation_deriv = logistic_derivative
                
            elif activation == 'tanh':
                self.activation = tanh
                self.activation_deriv = tanh_deriv
                
            self.weights = []
            # The loop starts at 1, i.e. weights are initialized taking the second layer as the reference
            for i in range(1, len(layers) - 1):
                # weights from the previous layer (plus a bias unit) into the current layer
                self.weights.append((2*np.random.random((layers[i - 1] + 1, layers[i] + 1))-1)*0.25)
                
                # weights from the current layer (plus bias) into the next layer
                self.weights.append((2*np.random.random((layers[i] + 1, layers[i + 1]))-1)*0.25)
                
        # Training function: X is a matrix with one sample per row; y holds the label of each sample.
        def fit(self, X, y, learning_rate=0.1, epochs=100):
            X = np.atleast_2d(X)  # ensure X is at least 2-D
            temp = np.ones([X.shape[0], X.shape[1] + 1])  # X extended with a bias column of ones
            temp[:, 0:-1] = X
            
            X = temp
            y = np.array(y)
            
            for k in range(epochs):
                # pick one sample at random and update the network with it
                i = np.random.randint(X.shape[0])
                a = [X[i]]
                
                # forward pass through all layers
                for l in range(len(self.weights)):  
                    a.append(self.activation(np.dot(a[l], self.weights[l])))
                    
                error = y[i] - a[-1]  
                deltas = [error * self.activation_deriv(a[-1])]
                if k % 1000 == 0:
                    print(k, '...', error * error * 100)
                    
                # back-propagate the error and update the weights
                for l in range(len(a) - 2, 0, -1):  # start from the second-to-last layer
                    deltas.append(deltas[-1].dot(self.weights[l].T) * self.activation_deriv(a[l]))
                    
                deltas.reverse()
                for i in range(len(self.weights)):
                    layer = np.atleast_2d(a[i])
                    delta = np.atleast_2d(deltas[i])
                    self.weights[i] += learning_rate * layer.T.dot(delta)
                    
        # Prediction function
        def predict(self, x):
            x = np.array(x)
            temp = np.ones(x.shape[0] + 1)
            temp[0:-1] = x
            a = temp
            for l in range(0, len(self.weights)):
                a = self.activation(np.dot(a, self.weights[l]))
            return a
    
    nn = NeuralNetwork([2,2,1], 'tanh')  
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  
    y = np.array([0, 1, 1, 0])  
    nn.fit(X, y, epochs=10000)
    for i in [[0, 0], [0, 1], [1, 0], [1,1]]:  
        print(i,nn.predict(i))
    
