• 牛顿法与拟牛顿法摘记


    1 牛顿法

    \qquad 除了可以采用最速下降法求解“无约束最优化问题”,另一种常用的方法就是牛顿法。设 f ( x ) f(\boldsymbol{x}) f(x) 为二次可微的实函数,在第 k k k 次迭代点 x ( k ) \boldsymbol{x}^{(k)} x(k) 附近用二阶泰勒级数展开式 ϕ ( x ) \phi(\boldsymbol{x}) ϕ(x) 来近似,也就是

    f ( x ) ≈ ϕ ( x ) = f ( x ( k ) ) + ∇ f ( x ( k ) ) T ( x − x ( k ) ) + 1 2 ( x − x ( k ) ) T ∇ 2 f ( x ( k ) ) ( x − x ( k ) ) 2 \qquad\qquad f(\boldsymbol{x})\approx\phi(\boldsymbol{x})=f(\boldsymbol{x}^{(k)})+\nabla f(\boldsymbol{x}^{(k)})^T(\boldsymbol{x}-\boldsymbol{x}^{(k)})+\dfrac{1}{2}(\boldsymbol{x}-\boldsymbol{x}^{(k)})^T\nabla^2f(\boldsymbol{x}^{(k)})(\boldsymbol{x}-\boldsymbol{x}^{(k)})^2 f(x)ϕ(x)=f(x(k))+f(x(k))T(xx(k))+21(xx(k))T2f(x(k))(xx(k))2

    \qquad 根据一阶必要条件,令 ϕ ′ ( x ) = 0 \phi^{\prime}(\boldsymbol{x})=0 ϕ(x)=0,可得到

    ϕ ′ ( x ) = ∇ f ( x ( k ) ) + ∇ 2 f ( x ( k ) ) ( x − x ( k ) ) = 0 \qquad\qquad\phi^{\prime}(\boldsymbol{x})=\nabla f(\boldsymbol{x}^{(k)})+\nabla^2f(\boldsymbol{x}^{(k)})(\boldsymbol{x}-\boldsymbol{x}^{(k)})=0 ϕ(x)=f(x(k))+2f(x(k))(xx(k))=0

    \qquad ϕ ( x ) \phi(\boldsymbol{x}) ϕ(x) 的驻点作为下一个迭代点 x ( k + 1 ) \boldsymbol{x}^{(k+1)} x(k+1) 就得到牛顿法的迭代公式

    x ( k + 1 ) = x ( k ) − ∇ 2 f ( x ( k ) ) − 1 ∇ f ( x ( k ) ) \qquad\qquad\boldsymbol{x}^{(k+1)}=\boldsymbol{x}^{(k)}-\nabla^2f(\boldsymbol{x}^{(k)})^{-1}\nabla f(\boldsymbol{x}^{(k)}) x(k+1)=x(k)2f(x(k))1f(x(k))

    \qquad 通常,也将牛顿方向定义为: − ∇ 2 f ( x ( k ) ) − 1 ∇ f ( x ( k ) ) -\nabla^2f(\boldsymbol{x}^{(k)})^{-1}\nabla f(\boldsymbol{x}^{(k)}) 2f(x(k))1f(x(k))
    \qquad
    ∙ \bullet  实现代码

    import numpy as np
    import matplotlib.pyplot as plt
    import time
    def diffMat(f,x,h=0.001):	
        dx = np.array([h,0])
        dy = np.array([0,h])
        gx = (f(x+dx)-f(x-dx))/(2*h)
        gy = (f(x+dy)-f(x-dy))/(2*h)
        grad = np.asmatrix(np.array([gx,gy])).T       
        g2x = (f(x+dx) + f(x-dx) - 2*f(x))/(h*h)
        g2y = (f(x+dy) + f(x-dy) - 2*f(x))/(h*h)    
        gx2 = (f(x+dy+dx)-f(x+dy-dx))/(2*h)
        gx1 = (f(x-dy+dx)-f(x-dy-dx))/(2*h)    
        gy2 = (f(x+dx+dy)-f(x+dx-dy))/(2*h)
        gy1 = (f(x-dx+dy)-f(x-dx-dy))/(2*h)   
        gxy1 = (gx2 - gx1)/(2*h)
        gxy2 = (gy2 - gy1)/(2*h)
        hesse = np.matrix([[g2x,gxy1],[gxy2,g2y]])
        return grad,hesse
    def newton(f,x0):
        k=1
        xk = x0
        while True:
            print('#',k)
            k = k+1        
            grad, hesse = diffMat(f,xk)
            d = np.asarray(hesse.I*grad).flatten()
            print('grad:\n',np.round(grad,4),'\nhesse:\n',np.round(hesse,4))
            xk = xk - d
            print('d:',d/np.linalg.norm(d))
            print('x[{}]:{}\n'.format(k,np.round(xk,4)))
            if np.linalg.norm(grad)<0.00001:
                break
        return xk
    if __name__ == "__main__":       
        f = lambda x: (x[0]-1)**4 + (x[1]-0)**2 
        x0 = np.array([0,1],dtype='float')
        time0 = time.process_time()
        minval = newton(f,x0)
        time1 = time.process_time()
        print('x:',np.round(minval,4),'minval:',np.round(f(minval),4))
        print('time: %fs'%(time1-time0))
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41
    • 42

    运行结果:

    # 1
    grad:
     [[-4.]
     [ 2.]] 
    hesse:
     [[12.  0.]
     [ 0.  2.]]
    d: [-0.316228    0.94868322]
    x[2]:[0.3333 0.    ]
    # 2
    grad:
     [[-1.1852]
     [ 0.    ]] 
    hesse:
     [[5.3333 0.    ]
     [0.     2.    ]]
    d: [-1.00000000e+00  6.28995937e-10]
    x[3]:[0.5556 0.    ]
    (略)
    # 12
    grad:
     [[-0.]
     [ 0.]] 
    hesse:
     [[1.6e-03 0.0e+00]
     [0.0e+00 2.0e+00]]
    d: [-1.  0.]
    x[13]:[0.9923 0.    ]
    
    x: [0.9923 0.    ] minval: 0.0
    time: 0.000000s
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31

    阻尼牛顿法

    \qquad 牛顿法只是通过在 x ( k ) \boldsymbol{x}^{(k)} x(k) 附近用二阶泰勒级数来近似目标函数,牛顿方向 − ∇ 2 f ( x ( k ) ) − 1 ∇ f ( x ( k ) ) -\nabla^2f(\boldsymbol{x}^{(k)})^{-1}\nabla f(\boldsymbol{x}^{(k)}) 2f(x(k))1f(x(k)) 并不一定是下降方向,可能会使目标函数值增大。阻尼牛顿法,也称修正牛顿法,为了使目标函数值下降,增加了在牛顿方向上的一维搜索过程,其算法步骤为:
    \qquad 步骤 ( 1 ) : (1): (1): 给定初始点 x ( 1 ) ∈ R n \boldsymbol x^{(1)}\in R^n x(1)Rn,设置允许误差 ε > 0 \varepsilon>0 ε>0,令 k = 1 k=1 k=1
    \qquad 步骤 ( 2 ) : (2): (2): 计算搜索方向 d ( k ) = − ∇ 2 f ( x ( k ) ) − 1 ∇ f ( x ( k ) ) \boldsymbol d^{(k)}=-\nabla^2f(\boldsymbol{x}^{(k)})^{-1}\nabla f(\boldsymbol{x}^{(k)}) d(k)=2f(x(k))1f(x(k))
    \qquad 步骤 ( 3 ) : (3): (3): ∥ ∇ f ( x ( k ) ) ∥ ≤ ε \Vert\boldsymbol \nabla f(\boldsymbol{x}^{(k)})\Vert\le\varepsilon f(x(k))ε,则停止计算;
    \qquad\qquad\qquad 否则,沿着 d ( k ) \boldsymbol d^{(k)} d(k) 进行一维搜索求出 λ k \lambda_k λk,使 f ( x ( k ) + λ k d ( k ) ) = min ⁡ λ > 0 f ( x ( k ) + λ d ( k ) ) f(\boldsymbol x^{(k)}+\lambda_k\boldsymbol d^{(k)})=\displaystyle\min_{\lambda>0} f(\boldsymbol x^{(k)}+\lambda\boldsymbol d^{(k)}) f(x(k)+λkd(k))=λ>0minf(x(k)+λd(k))
    \qquad 步骤 ( 4 ) : (4): (4): x ( k + 1 ) = x ( k ) + λ k d ( k ) \boldsymbol x^{(k+1)}=\boldsymbol x^{(k)}+\lambda_k\boldsymbol d^{(k)} x(k+1)=x(k)+λkd(k),且 k = k + 1 k=k+1 k=k+1,转步骤 ( 2 ) (2) (2)
    \qquad
    ∙ \bullet  实现代码

    import numpy as np
    import matplotlib.pyplot as plt
    import time
    def diffMat(f,x,h=0.001):
    (略)
    def newton1d(f,xk,d,h=0.001):
        dif1 = (f(xk+h*d) - f(xk-h*d))/(2*h)
        dif2 = (f(xk+h*d) + f(xk-h*d) - 2*f(xk))/(h*h)
        deltax = dif1/dif2
        xk1 = xk - deltax * d
        return xk1
    def dampednewton(f,x0):
        k=0
        xk = x0
        while True:
            k = k+1
            print('#',k)                
            grad, hesse = diffMat(f,xk)
            d = -np.asarray(hesse.I*grad).flatten()
            if np.linalg.norm(grad)<0.00001:
                break
            print('grad:\n',np.round(grad,4),'\nhesse:\n',np.round(hesse,4))
            print('d:',d/np.linalg.norm(d))
            xk = newton1d(f,xk,d)
            print('x[{}]:{}\n'.format(k,np.round(xk,4)))
        return xk
    if __name__ == "__main__":       
        f = lambda x: (x[0]-1)**4 + (x[1]-0)**2 
        x0 = np.array([0,1],dtype='float')
        time0 = time.process_time()
        minval = dampednewton(f,x0)
        time1 = time.process_time()
        print('x:',np.round(minval,4),'minval:',np.round(f(minval),4))
        print('time: %fs'%(time1-time0))
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34

    运行结果:

    # 1
    grad:
     [[-4.]
     [ 2.]] 
    hesse:
     [[12.  0.]
     [ 0.  2.]]
    d: [ 0.316228   -0.94868322]
    x[2]:[0.3333 0.    ]
    
    # 2
    grad:
     [[-1.1852]
     [ 0.    ]] 
    hesse:
     [[5.3333 0.    ]
     [0.     2.    ]]
    d: [ 1.00000000e+00 -1.33281943e-06]
    x[3]:[0.5556 0.    ]
    
    # 3
    grad:
     [[-0.3512]
     [ 0.    ]] 
    hesse:
     [[2.3704 0.    ]
     [0.     2.    ]]
    d: [ 1.00000e+00 -3.53622e-12]
    x[4]:[0.7037 0.    ]
    
    # 4
    grad:
     [[-0.1041]
     [ 0.    ]] 
    hesse:
     [[ 1.0535 -0.    ]
     [-0.      2.    ]]
    d: [1.00000000e+00 1.06224728e-13]
    x[5]:[0.8025 0.    ]
    (略)
    # 11
    grad:
     [[-0.]
     [ 0.]] 
    hesse:
     [[0.0036 0.    ]
     [0.     2.    ]]
    d: [ 1. -0.]
    x[12]:[0.9884 0.    ]
    
    # 12
    x: [0.9884 0.    ] minval: 0.0
    time: 0.000000s
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41
    • 42
    • 43
    • 44
    • 45
    • 46
    • 47
    • 48
    • 49
    • 50
    • 51
    • 52
    • 53

    \qquad

    2 拟牛顿法

    \qquad 牛顿法的优点是收敛很快,但是牛顿法的每次迭代过程中都需要计算二阶偏导数、求 Hesse \text{Hesse} Hesse 矩阵的逆矩阵,而目标函数的 Hesse \text{Hesse} Hesse 矩阵可能是非正定的。

    \qquad 拟牛顿法 (Quasi-Newton Method) \text{(Quasi-Newton\ Method)} (Quasi-Newton Method)是为了克服牛顿法的缺点而提出了“拟牛顿条件”,其基本思想是用“不包含二阶导数的矩阵”来近似牛顿法中 Hesse \text{Hesse} Hesse 矩阵的逆矩阵。
    \qquad

    2.1 拟牛顿条件

    \qquad 拟牛顿法是构造近似矩阵 H k H_k Hk 替代 “ Hesse \text{Hesse} Hesse 矩阵的逆矩阵”件,即: H k ≈ ∇ 2 f ( x ( k ) ) − 1 H_k\approx\nabla^2f(\boldsymbol x^{(k)})^{-1} Hk2f(x(k))1拟牛顿条件是构造 H k H_k Hk 替代 ∇ 2 f ( x ( k ) ) − 1 \nabla^2f(\boldsymbol x^{(k)})^{-1} 2f(x(k))1 执行牛顿法迭代运算时所需要满足的条件。

    \qquad 使用一维搜索时,牛顿法的迭代公式为:

    x ( k + 1 ) = x ( k ) − λ k ∇ 2 f ( x ( k ) ) − 1 ∇ f ( x ( k ) ) \qquad\qquad\boldsymbol x^{(k+1)}=\boldsymbol x^{(k)}-\lambda_k\nabla^2f(\boldsymbol x^{(k)})^{-1}\nabla f(\boldsymbol x^{(k)}) x(k+1)=x(k)λk2f(x(k))1f(x(k))

    \qquad 在第 k k k 次迭代后得到了点 x ( k + 1 ) \boldsymbol x^{(k+1)} x(k+1),将目标函数在点 x ( k + 1 ) \boldsymbol x^{(k+1)} x(k+1) 进行二阶泰勒级数展开:

    f ( x ) ≈ f ( x ( k + 1 ) ) + ∇ f ( x ( k + 1 ) ) T ( x − x ( k + 1 ) ) + 1 2 ( x − x ( k + 1 ) ) T ∇ 2 f ( x ( k + 1 ) ) ( x − x ( k + 1 ) ) \qquad\qquad\textcolor{brown}{f(\boldsymbol x)\approx f(\boldsymbol x^{(k+1)})+\nabla f(\boldsymbol x^{(k+1)})^T(\boldsymbol x-\boldsymbol x^{(k+1)})+\dfrac{1}{2}(\boldsymbol x-\boldsymbol x^{(k+1)})^T\nabla^2f(\boldsymbol x^{(k+1)})(\boldsymbol x-\boldsymbol x^{(k+1)})} f(x)f(x(k+1))+f(x(k+1))T(xx(k+1))+21(xx(k+1))T2f(x(k+1))(xx(k+1))

    \qquad 对上式两端取梯度,就得到:

    ∇ f ( x ) ≈ ∇ f ( x ( k + 1 ) ) + ∇ 2 f ( x ( k + 1 ) ) ( x − x ( k + 1 ) ) \qquad\qquad\textcolor{royalblue}{\nabla f(\boldsymbol x)\approx\nabla f(\boldsymbol x^{(k+1)})+\nabla^2f(\boldsymbol x^{(k+1)})(\boldsymbol x-\boldsymbol x^{(k+1)})} f(x)f(x(k+1))+2f(x(k+1))(xx(k+1))

    \qquad x = x ( k ) \boldsymbol x = \boldsymbol x^{(k)} x=x(k) 带入上式:

    ∇ f ( x ( k ) ) ≈ ∇ f ( x ( k + 1 ) ) + ∇ 2 f ( x ( k + 1 ) ) ( x ( k ) − x ( k + 1 ) ) \qquad\qquad\textcolor{crimson}{\nabla f(\boldsymbol x^{(k)})}\approx\textcolor{royalblue}{\nabla f(\boldsymbol x^{(k+1)})}+\nabla^2f(\boldsymbol x^{(k+1)})(\textcolor{crimson}{\boldsymbol x^{(k)}}-\textcolor{royalblue}{\boldsymbol x^{(k+1)}}) f(x(k))f(x(k+1))+2f(x(k+1))(x(k)x(k+1))

    \qquad
    \qquad 在上式中,令 p ( k ) = x ( k + 1 ) − x ( k ) \textcolor{royalblue}{\boldsymbol p^{(k)}=\boldsymbol x^{(k+1)}-\boldsymbol x^{(k)}} p(k)=x(k+1)x(k) q ( k ) = ∇ f ( x ( k + 1 ) ) − ∇ f ( x ( k ) ) \textcolor{crimson}{\boldsymbol q^{(k)}=\nabla f(\boldsymbol x^{(k+1)})-\nabla f(\boldsymbol x^{(k)})} q(k)=f(x(k+1))f(x(k)),那么:

    q ( k ) ≈ ∇ 2 f ( x ( k + 1 ) ) p ( k ) \qquad\qquad\textcolor{crimson}{\boldsymbol q^{(k)}}\approx\nabla^2f(\boldsymbol x^{(k+1)})\textcolor{royalblue}{\boldsymbol p^{(k)}} q(k)2f(x(k+1))p(k)

    \qquad
    \qquad 若矩阵 ∇ 2 f ( x ( k + 1 ) ) \nabla^2f(\boldsymbol x^{(k+1)}) 2f(x(k+1)) 可逆,则: p ( k ) ≈ ∇ 2 f ( x ( k + 1 ) ) − 1 q ( k ) \textcolor{royalblue}{\boldsymbol p^{(k)}}\approx\nabla^2f(\boldsymbol x^{(k+1)})^{-1}\textcolor{crimson}{\boldsymbol q^{(k)}} p(k)2f(x(k+1))1q(k)

    \qquad 若记 H k + 1 = ∇ 2 f ( x ( k + 1 ) ) − 1 H_{k+1}=\nabla^2f(\boldsymbol x^{(k+1)})^{-1} Hk+1=2f(x(k+1))1,那么

    p ( k ) = H k + 1 q ( k ) \qquad\qquad\textcolor{royalblue}{\boldsymbol p^{(k)}}= H_{k+1}\textcolor{crimson}{\boldsymbol q^{(k)}} p(k)=Hk+1q(k)拟牛顿条件

    \qquad 拟牛顿法就是为了构造拟牛顿法中的矩阵 H k + 1 H_{k+1} Hk+1 而提出的一类算法。
    \qquad

    2.2 秩1校正

    \qquad ∇ 2 f ( x ( k + 1 ) ) − 1 \nabla^2f(\boldsymbol x^{(k+1)})^{-1} 2f(x(k+1))1 n n n 阶对称正定矩阵时,满足拟牛顿条件的近似矩阵 H k + 1 H_{k+1} Hk+1 也是 n n n 阶对称正定矩阵。构造这样的近似矩阵,可以通过不断修正 H k H_{k} Hk 来构造下一个 H k + 1 H_{k+1} Hk+1,通常令 H 1 = I H_{1}=\bold I H1=I,那么 H k + 1 H_{k+1} Hk+1 的修正过程为

    H k + 1 = H k + Δ H k \qquad\qquad H_{k+1}=H_k+\textcolor{crimson}{\Delta H_k} Hk+1=Hk+ΔHk, 其中 Δ H k \textcolor{crimson}{\Delta H_k} ΔHk校正矩阵

    \qquad
    \qquad 确定 Δ H k \textcolor{crimson}{\Delta H_k} ΔHk 的方法之一是,令 Δ H k = α k z ( k ) ( z ( k ) ) T \textcolor{SeaGreen}{\Delta H_k= \alpha_k \boldsymbol z^{(k)} (\boldsymbol z^{(k)})^T} ΔHk=αkz(k)(z(k))T,其中 α k \alpha_k αk 是一个常数, z ( k ) ∈ R n \boldsymbol z^{(k)}\in R^n z(k)Rn n n n 维列向量。显然,矩阵 Δ H k \textcolor{crimson}{\Delta H_k} ΔHk 的秩为 1 1 1,且 z ( k ) \boldsymbol z^{(k)} z(k) 应该满足拟牛顿条件 p ( k ) = H k + 1 q ( k ) \textcolor{royalblue}{\boldsymbol p^{(k)}}=H_{k+1}\textcolor{crimson}{\boldsymbol q^{(k)}} p(k)=Hk+1q(k),那么

    p ( k ) = H k + 1 q ( k ) = H k q ( k ) + Δ H k q ( k ) = H k q ( k ) + α k z ( k ) ( z ( k ) ) T q ( k ) \qquad\qquad p(k)=Hk+1q(k)=Hkq(k)+ΔHkq(k)=Hkq(k)+αkz(k)(z(k))Tq(k)

    \qquad 由于 ( z ( k ) ) T q ( k ) \textcolor{SeaGreen}{(\boldsymbol z^{(k)})^T\boldsymbol q^{(k)}} (z(k))Tq(k) 是两个 n n n 维列向量的内积,其值为常数,可得到:  z ( k ) = p ( k ) − H k q ( k ) α k ( z ( k ) ) T q ( k ) \boldsymbol z^{(k)} =\dfrac{\boldsymbol p^{(k)}-H_k\boldsymbol q^{(k)}}{\alpha_k \textcolor{crimson}{(\boldsymbol z^{(k)})^T\boldsymbol q^{(k)}}} z(k)=αk(z(k))Tq(k)p(k)Hkq(k)

    \qquad 两边都左乘 ( q ( k ) ) T (\boldsymbol q^{(k)})^T (q(k))T,可得到:

    ( q ( k ) ) T z ( k ) = ( z ( k ) ) T q ( k ) = ( q ( k ) ) T ( p ( k ) − H k q ( k ) ) α k ( z ( k ) ) T q ( k ) \qquad\qquad (\boldsymbol q^{(k)})^T\boldsymbol z^{(k)}=\textcolor{SeaGreen}{(\boldsymbol z^{(k)})^T\boldsymbol q^{(k)}} =\dfrac{(\boldsymbol q^{(k)})^T(\boldsymbol p^{(k)}-H_k\boldsymbol q^{(k)})}{\alpha_k \textcolor{SeaGreen}{(\boldsymbol z^{(k)})^T\boldsymbol q^{(k)}}} (q(k))Tz(k)=(z(k))Tq(k)=αk(z(k))Tq(k)(q(k))T(p(k)Hkq(k))

    ⟹ ( q ( k ) ) T ( p ( k ) − H k q ( k ) ) = α k ( ( z ( k ) ) T q ( k ) ) 2 \qquad\qquad \Longrightarrow\quad\textcolor{blue}{(\boldsymbol q^{(k)})^T(\boldsymbol p^{(k)}-H_k\boldsymbol q^{(k)})}=\alpha_k (\textcolor{crimson}{(\boldsymbol z^{(k)})^T\boldsymbol q^{(k)}})^2 (q(k))T(p(k)Hkq(k))=αk((z(k))Tq(k))2

    x T y = y T x ,   ∀ x , y ∈ R n ⟹ ( z ( k ) ) T q ( k ) = ( q ( k ) ) T z ( k ) \boldsymbol x^T\boldsymbol y=\boldsymbol y^T\boldsymbol x,\ \forall\boldsymbol x,\boldsymbol y\in R^n\quad\Longrightarrow\quad(\boldsymbol z^{(k)})^T\boldsymbol q^{(k)}=(\boldsymbol q^{(k)})^T\boldsymbol z^{(k)} xTy=yTx, x,yRn(z(k))Tq(k)=(q(k))Tz(k)

    \qquad 可得到:

    Δ H k = α k z ( k ) ( z ( k ) ) T = α k p ( k ) − H k q ( k ) α k ( z ( k ) ) T q ( k ) ( p ( k ) − H k q ( k ) α k ( z ( k ) ) T q ( k ) ) T , ( z ( k ) ) T q ( k ) 为常量 = α k ( p ( k ) − H k q ( k ) ) ( p ( k ) − H k q ( k ) ) T α k ( ( z ( k ) ) T q ( k ) ) 2 = α k ( p ( k ) − H k q ( k ) ) ( p ( k ) − H k q ( k ) ) T ( q ( k ) ) T ( p ( k ) − H k q ( k ) ) \qquad\qquad ΔHk=αkz(k)(z(k))T=αkαk(z(k))Tq(k)p(k)Hkq(k)(αk(z(k))Tq(k)p(k)Hkq(k))T,(z(k))Tq(k)为常量=αkαk((z(k))Tq(k))2(p(k)Hkq(k))(p(k)Hkq(k))T=αk(q(k))T(p(k)Hkq(k))(p(k)Hkq(k))(p(k)Hkq(k))T

    \qquad
    \qquad 因此, H k + 1 H_{k+1} Hk+1 的修正过程称为秩1校正公式,也就是:

    H k + 1 = H k + Δ H k = H k + α k ( p ( k ) − H k q ( k ) ) ( p ( k ) − H k q ( k ) ) T ( q ( k ) ) T ( p ( k ) − H k q ( k ) ) \qquad\qquad H_{k+1}=H_k+\textcolor{DeepPink}{\Delta H_k}=H_k+\alpha_k\dfrac{\left(\boldsymbol p^{(k)}-H_k\boldsymbol q^{(k)}\right)\left(\boldsymbol p^{(k)}-H_k\boldsymbol q^{(k)}\right)^T}{\textcolor{blue}{(\boldsymbol q^{(k)})^T(\boldsymbol p^{(k)}-H_k\boldsymbol q^{(k)})}} Hk+1=Hk+ΔHk=Hk+αk(q(k))T(p(k)Hkq(k))(p(k)Hkq(k))(p(k)Hkq(k))T

    \qquad
    \qquad 基于秩1校正公式拟牛顿法计算步骤为:

    ( 1 ) \qquad(1) (1) 搜索方向为 d ( k ) = − H k ∇ f ( x ( k ) ) , H 1 = I \boldsymbol d^{(k)}=-H_k\nabla f(\boldsymbol x^{(k)}),\quad H_1=\bold I d(k)=Hkf(x(k)),H1=I

    ( 2 ) \qquad(2) (2) 沿 d ( k ) \boldsymbol d^{(k)} d(k) 进行一维搜索求出 x ( k + 1 ) \boldsymbol x^{(k+1)} x(k+1)

    ( 3 ) \qquad(3) (3) 求出 p ( k ) = x ( k + 1 ) − x ( k ) \boldsymbol p^{(k)}=\boldsymbol x^{(k+1)}-\boldsymbol x^{(k)} p(k)=x(k+1)x(k) 以及 q ( k ) = ∇ f ( x ( k + 1 ) ) − ∇ f ( x ( k ) ) \boldsymbol q^{(k)}=\nabla f(\boldsymbol x^{(k+1)})-\nabla f(\boldsymbol x^{(k)}) q(k)=f(x(k+1))f(x(k)),计算近似矩阵

    H k + 1 = H k + α k ( p ( k ) − H k q ( k ) ) ( p ( k ) − H k q ( k ) ) T ( q ( k ) ) T ( p ( k ) − H k q ( k ) ) \qquad\qquad H_{k+1}=H_k+\alpha_k\dfrac{\left(\boldsymbol p^{(k)}-H_k\boldsymbol q^{(k)}\right)\left(\boldsymbol p^{(k)}-H_k\boldsymbol q^{(k)}\right)^T}{\textcolor{blue}{(\boldsymbol q^{(k)})^T(\boldsymbol p^{(k)}-H_k\boldsymbol q^{(k)})}} Hk+1=Hk+αk(q(k))T(p(k)Hkq(k))(p(k)Hkq(k))(p(k)Hkq(k))T

    \qquad\quad 重新回到第 ( 1 ) (1) (1) 步开始下一轮的计算,直到 ∥ ∇ f ( x ( k ) ) ∥ < ε \Vert \nabla f(\boldsymbol x^{(k)})\Vert<\varepsilon ∥∇f(x(k))<ε
    \qquad
    ∙ \bullet  实现代码

    import numpy as np
    import matplotlib.pyplot as plt
    import time
    def diff1(f,x,h=0.001):
        dx = np.array([h,0])
        dy = np.array([0,h])
        gx = (f(x+dx)-f(x-dx))/(2*h)
        gy = (f(x+dy)-f(x-dy))/(2*h)
        return np.array([gx,gy])
    def newton1d(f,xk,d,h=0.001):
        dif1 = (f(xk+h*d) - f(xk-h*d))/(2*h)
        dif2 = (f(xk+h*d) + f(xk-h*d) - 2*f(xk))/(h*h)
        deltax = dif1/dif2
        xk1 = xk - deltax * d
        return xk1
    def newton(f,x0):
        k=1
        xk = x0
        Hk = np.eye(x0.shape[0])
        while True:
            gk = diff1(f,xk)        
            dk = -np.dot(Hk,gk)
            xk1 = newton1d(f,xk,dk)	
            gk1 = diff1(f,xk1)
            if np.linalg.norm(gk)<0.0001:
                break
            pk = xk1 - xk
            qk = gk1 - gk
            t1 = (pk-Hk.dot(qk)).reshape(x0.shape[0],-1)
            deltHk = t1.dot(t1.T)/qk.dot(pk-Hk.dot(qk))
            Hk = Hk + deltHk
            xk = xk1
            print('#{} min:{}'.format(k,np.round(xk,4)))
            k = k + 1
        return xk
    if __name__ == "__main__":       
        f = lambda x: (x[0]-1)**4 + (x[1]-0)**2 
        x0 = np.array([0,1],dtype='float')
        time0 = time.process_time()
        minval = newton(f,x0)
        time1 = time.process_time()
        print('x:',np.round(minval,4),'minval:',np.round(f(minval),4))
        print('time: %fs'%(time1-time0))
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41
    • 42
    • 43

    运行结果:

    #1 min:[0.4 0.8]
    #2 min:[ 0.4084 -0.0043]
    #3 min:[6.056e-01 1.000e-04]
    #4 min:[ 0.7371 -0.    ]
    #5 min:[0.8247 0.    ]
    #6 min:[ 0.8831 -0.    ]
    #7 min:[0.9221 0.    ]
    #8 min:[ 0.9481 -0.    ]
    #9 min:[0.9654 0.    ]
    #10 min:[ 0.9769 -0.    ]
    x: [ 0.9769 -0.    ] minval: 0.0
    time: 0.000000s
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12

    \quad

    2.3 DFP算法(变尺度法)

    \qquad 变尺度法,也就是 DFP \text{DFP} DFP法,其校正矩阵为:

    Δ H k = p ( k ) ( p ( k ) ) T ( p ( k ) ) T q ( k ) − H k q ( k ) ( q ( k ) ) T H k ( q ( k ) ) T H k q ( k ) \qquad\qquad\Delta H_k=\dfrac{\boldsymbol p^{(k)}(\boldsymbol p^{(k)})^T}{(\boldsymbol p^{(k)})^T\boldsymbol q^{(k)}}-\dfrac{H_k\boldsymbol q^{(k)}(\boldsymbol q^{(k)})^TH_k}{(\boldsymbol q^{(k)})^TH_k\boldsymbol q^{(k)}} ΔHk=(p(k))Tq(k)p(k)(p(k))T(q(k))THkq(k)Hkq(k)(q(k))THk

    \qquad 近似矩阵 H k + 1 H_{k+1} Hk+1 的修正过程为:

    H k + 1 = H k + Δ H k = H k + p ( k ) ( p ( k ) ) T ( p ( k ) ) T q ( k ) − H k q ( k ) ( q ( k ) ) T H k ( q ( k ) ) T H k q ( k ) \qquad\qquad H_{k+1}=H_k+\textcolor{DeepPink}{\Delta H_k}=H_k+\dfrac{\boldsymbol p^{(k)}(\boldsymbol p^{(k)})^T}{(\boldsymbol p^{(k)})^T\boldsymbol q^{(k)}}-\dfrac{H_k\boldsymbol q^{(k)}(\boldsymbol q^{(k)})^TH_k}{(\boldsymbol q^{(k)})^TH_k\boldsymbol q^{(k)}} Hk+1=Hk+ΔHk=Hk+(p(k))Tq(k)p(k)(p(k))T(q(k))THkq(k)Hkq(k)(q(k))THk

    DFP \qquad\text{DFP} DFP法具有二次终止性,对于正定二次函数的目标函数,经过有限次搜索可以达到极小点。

    \qquad
    DFP \qquad\text{DFP} DFP法的计算步骤为:

    ( 1 ) \qquad(1) (1) 搜索方向为 d ( k ) = − H k ∇ f ( x ( k ) ) , H 1 = I \boldsymbol d^{(k)}=-H_k\nabla f(\boldsymbol x^{(k)}),\quad H_1=\bold I d(k)=Hkf(x(k)),H1=I

    ( 2 ) \qquad(2) (2) 沿 d ( k ) \boldsymbol d^{(k)} d(k) 进行一维搜索求出 x ( k + 1 ) \boldsymbol x^{(k+1)} x(k+1)

    ( 3 ) \qquad(3) (3) 求出 p ( k ) = x ( k + 1 ) − x ( k ) \boldsymbol p^{(k)}=\boldsymbol x^{(k+1)}-\boldsymbol x^{(k)} p(k)=x(k+1)x(k) 以及 q ( k ) = ∇ f ( x ( k + 1 ) ) − ∇ f ( x ( k ) ) \boldsymbol q^{(k)}=\nabla f(\boldsymbol x^{(k+1)})-\nabla f(\boldsymbol x^{(k)}) q(k)=f(x(k+1))f(x(k)),计算近似矩阵

    H k + 1 = H k + p ( k ) ( p ( k ) ) T ( p ( k ) ) T q ( k ) − H k q ( k ) ( q ( k ) ) T H k ( q ( k ) ) T H k q ( k ) \qquad\qquad H_{k+1}=H_k+\dfrac{\boldsymbol p^{(k)}(\boldsymbol p^{(k)})^T}{(\boldsymbol p^{(k)})^T\boldsymbol q^{(k)}}-\dfrac{H_k\boldsymbol q^{(k)}(\boldsymbol q^{(k)})^TH_k}{(\boldsymbol q^{(k)})^TH_k\boldsymbol q^{(k)}} Hk+1=Hk+(p(k))Tq(k)p(k)(p(k))T(q(k))THkq(k)Hkq(k)(q(k))THk

    \qquad\qquad 重新回到第 ( 1 ) (1) (1) 步开始下一轮的计算,直到 ∥ ∇ f ( x ( k ) ) ∥ < ε \Vert \nabla f(\boldsymbol x^{(k)})\Vert<\varepsilon ∥∇f(x(k))<ε
    \qquad
    ∙ \bullet  实现代码

    import numpy as np
    import matplotlib.pyplot as plt
    import time
    def diff1(f,x,h=0.001):
        dx = np.array([h,0])
        dy = np.array([0,h])
        gx = (f(x+dx)-f(x-dx))/(2*h)
        gy = (f(x+dy)-f(x-dy))/(2*h)
        return np.array([gx,gy])
    def newton1d(f,xk,d,h=0.001):
        dif1 = (f(xk+h*d) - f(xk-h*d))/(2*h)
        dif2 = (f(xk+h*d) + f(xk-h*d) - 2*f(xk))/(h*h)
        deltax = dif1/dif2
        xk1 = xk - deltax * d
        return xk1
    def newton(f,x0):
        k=1
        xk = x0
        Hk = np.eye(x0.shape[0])
        while True:
            gk = diff1(f,xk)        
            dk = -np.dot(Hk,gk)
            xk1 = newton1d(f,xk,dk)	
            gk1 = diff1(f,xk1)
            if np.linalg.norm(gk)<0.0001:
                break
            pk = xk1 - xk
            qk = gk1 - gk
            pkm = (xk1 - xk).reshape(x0.shape[0],-1)
            qkm = (gk1 - gk).reshape(x0.shape[0],-1)
            deltHk = pkm.dot(pkm.T)/pk.dot(qk) - Hk.dot(qkm.dot(qkm.T)).dot(Hk)/qk.dot(Hk.dot(qk))
            Hk = Hk + deltHk
            xk = xk1
            print('#{} min:{}'.format(k,np.round(xk,4)))
            k = k + 1
        return xk
    if __name__ == "__main__":       
        f = lambda x: (x[0]-1)**4 + (x[1]-0)**2 
        # f = lambda x: 2*(x[0]-0)**2 + (x[1]-0)**2 - 4*x[0] + 2
        x0 = np.array([0,1],dtype='float')
        time0 = time.process_time()
        minval = newton(f,x0)
        time1 = time.process_time()
        print('x:',np.round(minval,4),'minval:',np.round(f(minval),4))
        print('time: %fs'%(time1-time0))
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41
    • 42
    • 43
    • 44
    • 45

    运行结果:

    #1 min:[0.4 0.8]
    #2 min:[ 0.4064 -0.0033]
    #3 min:[0.6043 0.    ]
    #4 min:[ 0.7362 -0.    ]
    #5 min:[0.8241 0.    ]
    #6 min:[ 0.8828 -0.    ]
    #7 min:[0.9218 0.    ]
    #8 min:[ 0.9479 -0.    ]
    #9 min:[0.9653 0.    ]
    #10 min:[ 0.9768 -0.    ]
    x: [ 0.9768 -0.    ] minval: 0.0
    time: 0.000000s
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12

    \qquad

    2.4 BFGS公式

    \qquad 在讨论拟牛顿条件时,首先构造出 q ( k ) ≈ ∇ 2 f ( x ( k + 1 ) ) p ( k ) \textcolor{crimson}{\boldsymbol q^{(k)}}\approx\nabla^2f(\boldsymbol x^{(k+1)})\textcolor{royalblue}{\boldsymbol p^{(k)}} q(k)2f(x(k+1))p(k) 的关系,然后记 H k + 1 = ∇ 2 f ( x ( k + 1 ) ) − 1 H_{k+1}=\nabla^2f(\boldsymbol x^{(k+1)})^{-1} Hk+1=2f(x(k+1))1,得到了 p ( k ) = H k + 1 q ( k ) \textcolor{royalblue}{\boldsymbol p^{(k)}}= H_{k+1}\textcolor{crimson}{\boldsymbol q^{(k)}} p(k)=Hk+1q(k)拟牛顿条件,前文中的秩 1 1 1校正和 DFP \text{DFP} DFP公式都是基于 p ( k ) = H k + 1 q ( k ) \textcolor{royalblue}{\boldsymbol p^{(k)}}= H_{k+1}\textcolor{crimson}{\boldsymbol q^{(k)}} p(k)=Hk+1q(k)拟牛顿条件
    \qquad
    \qquad BFGS \text{BFGS} BFGS公式直接基于 q ( k ) ≈ ∇ 2 f ( x ( k + 1 ) ) p ( k ) \textcolor{crimson}{\boldsymbol q^{(k)}}\approx\nabla^2f(\boldsymbol x^{(k+1)})\textcolor{royalblue}{\boldsymbol p^{(k)}} q(k)2f(x(k+1))p(k) 的关系,用不含二阶导数的矩阵 B k + 1 B_{k+1} Bk+1 近似 ∇ 2 f ( x ( k + 1 ) ) \nabla^2f(\boldsymbol x^{(k+1)}) 2f(x(k+1)),从而有 q ( k ) = B k + 1 p ( k ) \textcolor{crimson}{\boldsymbol q^{(k)}}=B_{k+1}\textcolor{royalblue}{\boldsymbol p^{(k)}} q(k)=Bk+1p(k),关于 B k B_{k} Bk 的修正公式为:

    B k + 1 = B k + q ( k ) ( q ( k ) ) T ( q ( k ) ) T p ( k ) − B k p ( k ) ( p ( k ) ) T B k ( p ( k ) ) T B k p ( k ) \qquad\qquad B_{k+1}=B_k+\dfrac{\boldsymbol q^{(k)}(\boldsymbol q^{(k)})^T}{(\boldsymbol q^{(k)})^T\boldsymbol p^{(k)}}-\dfrac{B_k\boldsymbol p^{(k)}(\boldsymbol p^{(k)})^TB_k}{(\boldsymbol p^{(k)})^TB_k\boldsymbol p^{(k)}} Bk+1=Bk+(q(k))Tp(k)q(k)(q(k))T(p(k))TBkp(k)Bkp(k)(p(k))TBk

    \qquad 上式即为关于矩阵 B \text{B} B BFGS \text{BFGS} BFGS 修正公式。
     
    \qquad 由于 q ( k ) = B k + 1 p ( k ) \textcolor{crimson}{\boldsymbol q^{(k)}}=B_{k+1}\textcolor{royalblue}{\boldsymbol p^{(k)}} q(k)=Bk+1p(k),因此 p ( k ) = B k + 1 − 1 q ( k ) \textcolor{royalblue}{\boldsymbol p^{(k)}}= B_{k+1}^{-1}\textcolor{crimson}{\boldsymbol q^{(k)}} p(k)=Bk+11q(k),也就相当于令 H k + 1 = B k + 1 − 1 H_{k+1}=B_{k+1}^{-1} Hk+1=Bk+11 时的拟牛顿法。
     
    \qquad 利用 Sherman-Morrison \text{Sherman-Morrison} Sherman-Morrison公式:

    ( M + u v T ) − 1 = M − 1 − M − 1 u v T M − 1 1 + v T M − 1 u \qquad\qquad(\boldsymbol{M}+\boldsymbol{u}\boldsymbol{v}^T)^{-1}=\boldsymbol{M}^{-1}-\dfrac{\boldsymbol{M}^{-1} \boldsymbol{u}\boldsymbol{v}^T \boldsymbol{M}^{-1}}{1+\boldsymbol{v}^T \boldsymbol{M}^{-1}\boldsymbol{u}} (M+uvT)1=M11+vTM1uM1uvTM1

    \qquad 可以求出 B k + 1 − 1 B_{k+1}^{-1} Bk+11,从而得到关于 H \boldsymbol{H} H BFGS \text{BFGS} BFGS公式:

    H k + 1 BFGS = H k + ( 1 + ( q ( k ) ) T H k q ( k ) ( p ( k ) ) T q ( k ) ) p ( k ) ( p ( k ) ) T ( p ( k ) ) T q ( k ) − p ( k ) ( q ( k ) ) T H k + H k q ( k ) ( p ( k ) ) T ( p ( k ) ) T q ( k ) \qquad\qquad H_{k+1}^{\text{BFGS}}=H_k+\left(1+\dfrac{(\boldsymbol q^{(k)})^TH_k \boldsymbol q^{(k)} }{(\boldsymbol p^{(k)})^T\boldsymbol q^{(k)}} \right)\dfrac{\boldsymbol p^{(k)}(\boldsymbol p^{(k)})^T}{(\boldsymbol p^{(k)})^T\boldsymbol q^{(k)}}-\dfrac{\boldsymbol p^{(k)}(\boldsymbol q^{(k)})^TH_k + H_k\boldsymbol q^{(k)}(\boldsymbol p^{(k)})^T }{(\boldsymbol p^{(k)})^T\boldsymbol q^{(k)}} Hk+1BFGS=Hk+(1+(p(k))Tq(k)(q(k))THkq(k))(p(k))Tq(k)p(k)(p(k))T(p(k))Tq(k)p(k)(q(k))THk+Hkq(k)(p(k))T

    \qquad
    DFP \qquad\text{DFP} DFP公式和 BFGS \text{BFGS} BFGS公式都是由 p ( k ) \boldsymbol p^{(k)} p(k) H k q ( k ) H_k\boldsymbol q^{(k)} Hkq(k) 构成的对称秩 2 2 2校正,因此两个公式的加权组合具有相同的形式,所有这类修正公式的集合可以表示为:

    H k + 1 ϕ = ( 1 − ϕ ) H k + 1 DFP + ϕ H k + 1 BFGS \qquad\qquad H_{k+1}^{\phi}=(1-\phi)H_{k+1}^{\text{DFP}}+\phi H_{k+1}^{\text{BFGS}} Hk+1ϕ=(1ϕ)Hk+1DFP+ϕHk+1BFGS

    \qquad 这类修正公式的全体,称为 Broyden \text{Broyden} Broyden族。当 ϕ = 0 \phi=0 ϕ=0 时为 DFP \text{DFP} DFP公式,当 ϕ = 1 \phi=1 ϕ=1 时为 BFGS \text{BFGS} BFGS公式。

  • 相关阅读:
    基于Springboot实现社区维修平台管理系统演示【项目源码+论文说明】
    微信小程序实现连续签到七天
    卡尔曼滤波器(二):Simulink卡尔曼滤波器模块使用
    HTML表格合并行和列
    数据结构与算法之LeetCode-513. 找树左下角的值 - 力扣(DFS,BFS)
    深夜学习:有关Inner、Outer等相关词汇的理解
    JAVA-----注释、字面量、关键字、制表符
    栈和队列基础
    前端开发如何做新手引导
    pip 时报错 no such option: --bulid-dir 的解决办法
  • 原文地址:https://blog.csdn.net/xfijun/article/details/121624504