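Machine Learning Notes, Unconstrained Optimization: (Interim Wrap-up) Python Examples of the Conjugate Direction Method and Wolfe-Condition Line Search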

Introduction

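This section uses $\text{Python}$ to demonstrate the conjugate gradient method with both exact and inexact line search.

My math and coding skills are limited, so discussion is welcome. Related articles:

Unconstrained optimization: the conjugate gradient method
Line search methods (step-size perspective; inexact search; $\text{Wolfe Condition}$)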

Interlude: plotting the contour lines of a non-standard quadratic form

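A non-standard quadratic form means that the positive definite matrix $\mathcal Q$ in the function $f(x) = x^T \mathcal Q x$ is not diagonal. This section takes the contour lines of the convex quadratic function

$x^T \mathcal Q x \quad \mathcal Q = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}; x \in \mathbb R^2$

as the example and uses $\text{Python}$ to show how the conjugate gradient method behaves with the $\text{Wolfe Condition}$ and with exact line search. The complete code is at the bottom of the article and is not repeated later. Written as a function of two variables, it takes the following form:

$f(x_1, x_2) = \begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 + x_2 & x_1 + 2 x_2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_1^2 + 2 x_1 x_2 + 2 x_2^2$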
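As a quick sanity check of the expansion above, the sketch below compares the matrix form with the expanded polynomial at one point and confirms that $\mathcal Q$ is positive definite. The helper names `quad_form` and `expanded` are made up for this illustration and do not appear in the original code.

```python
import numpy as np

Q = np.array([[1.0, 1.0], [1.0, 2.0]])

def quad_form(x):
    """Matrix form x^T Q x."""
    return float(x @ Q @ x)

def expanded(x1, x2):
    """Expanded polynomial x1^2 + 2*x1*x2 + 2*x2^2."""
    return x1 ** 2 + 2 * x1 * x2 + 2 * x2 ** 2

# Q is symmetric, so eigvalsh applies; positive definiteness means
# that every eigenvalue is strictly positive.
eigenvalues = np.linalg.eigvalsh(Q)

point = np.array([3.0, 2.0])
print(quad_form(point), expanded(3.0, 2.0))  # both forms agree
print(eigenvalues)                           # both entries are positive
```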

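Clearly, the function contains a cross term in addition to the squared terms, so $x_1$ and $x_2$ cannot be treated independently. The contour line is written as follows:
$x_1^2 + 2 x_1 x_2 + 2x_2^2 = \mathcal C$
Different values of $\mathcal C>0$ give contour lines of correspondingly different sizes.
The approach used here: for a given contour level $\mathcal C$, first determine the range of $x_1$. To decide whether a value of $x_1$ lies on the edge of that range, treat $x_1$ and $\mathcal C$ as constants. The equation above then becomes a quadratic equation in $x_2$, and the range is found by setting the discriminant to zero: $\Delta = b^2 - 4ac \triangleq 0$.

The reason is that $\Delta \triangleq 0$ means the quadratic equation has a single (repeated) root, so the vertical line through that value of $x_1$ is tangent to the contour, as the example figure illustrates.
Here $a = 2, b = 2x_1, c = x_1^2 - \mathcal C$; substituting gives:
$0 \triangleq \Delta = b^2 - 4ac = 4 x_1^2 - 8 (x_1^2 - \mathcal C) \Rightarrow x_1 = \pm \sqrt{2 \mathcal C} \quad \mathcal C > 0$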
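The discriminant argument can be checked numerically. The following is a minimal sketch under the coefficients stated above ($a = 2$, $b = 2x_1$, $c = x_1^2 - \mathcal C$); the function names `x1_limits` and `discriminant` are hypothetical helpers introduced only for this check.

```python
import math

def x1_limits(C):
    """Range of x1 on the contour x1^2 + 2*x1*x2 + 2*x2^2 = C (C > 0)."""
    edge = math.sqrt(2 * C)
    return -edge, edge

def discriminant(x1, C):
    """Discriminant of 2*x2^2 + 2*x1*x2 + (x1^2 - C) = 0, a quadratic in x2."""
    a, b, c = 2.0, 2.0 * x1, x1 ** 2 - C
    return b ** 2 - 4 * a * c

C = 2.0
lo, hi = x1_limits(C)
print(lo, hi)                         # the two edge values of x1
print(discriminant(hi, C))            # approximately 0 at the edge
print(discriminant(0.0, C) > 0)       # inside the range: two distinct roots
print(discriminant(hi + 0.1, C) < 0)  # outside the range: no real root
```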

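Within this range, the edge values $x_1 = \pm \sqrt{2 \mathcal C}$ each give a single solution for $x_2$, while every other value of $x_1$ inside the range gives two distinct solutions. Using the quadratic formula:

$\frac{- b \pm \sqrt{b^2 - 4 a c}}{2 a}$

the larger and smaller roots are plotted separately, drawing half of the ellipse at a time.
The corresponding code is: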

import math
import numpy as np
import matplotlib.pyplot as plt

def CalxLimits(C):
    # Range of x1 from the discriminant condition: x1 = +/- sqrt(2C)
    return math.sqrt(2 * C), -1 * math.sqrt(2 * C)

# def f(x,y):
    # Not used; it only documents the non-standard quadratic form.
    # return 2 * (y ** 2) + 2 * x * y + (x ** 2)

def GetAnalytical(x, C):
    # Quadratic formula for x2. The discriminant is clamped at 0 to guard
    # against tiny negative values caused by rounding at the edges of the range.
    Delta = max(8 * C - 4 * (x ** 2), 0.0)
    return 0.25 * (-2 * x + math.sqrt(Delta)), 0.25 * (-2 * x - math.sqrt(Delta))

def DrawContourPicture(CList):
    for C in CList:
        UpperPointList, LowerPointList = list(), list()
        Upper, Lower = CalxLimits(C)
        xList = list(np.linspace(Lower, Upper, 10000))
        for x in xList:
            Upperx, Lowerx = GetAnalytical(x, C)
            UpperPointList.append(Upperx)
            LowerPointList.append(Lowerx)

        # Upper and lower halves of the ellipse for this level C
        plt.plot(xList, UpperPointList, '--', c="tab:blue")
        plt.plot(xList, LowerPointList, '--', c="tab:blue")
    plt.show()

if __name__ == '__main__':
    CList = [0.01, 0.1, 0.5, 2, 4.5]
    DrawContourPicture(CList)

The resulting plot is shown below:
Figure: contour lines of the non-standard quadratic form
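As an alternative to solving for $x_2$ by hand, the same level sets can be drawn by evaluating the function on a grid and handing it to matplotlib's `contour`. This is only a sketch, not the approach used in this article; `quadratic_grid`, `draw_contours`, and the grid limits are assumptions chosen for illustration.

```python
import numpy as np

def quadratic_grid(limit=3.5, num=200):
    """Evaluate f(x1, x2) = x1^2 + 2*x1*x2 + 2*x2^2 on a square grid."""
    axis = np.linspace(-limit, limit, num)
    X1, X2 = np.meshgrid(axis, axis)
    Z = X1 ** 2 + 2 * X1 * X2 + 2 * X2 ** 2
    return X1, X2, Z

def draw_contours(levels=(0.01, 0.1, 0.5, 2, 4.5)):
    """Draw the level sets with matplotlib's contour routine."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    X1, X2, Z = quadratic_grid()
    plt.contour(X1, X2, Z, levels=list(levels), colors="tab:blue", linestyles="--")
    plt.show()

X1, X2, Z = quadratic_grid()
print(Z.shape)  # one function value per grid node
```

Calling `draw_contours()` opens the plot. The grid approach trades exactness for convenience: very small levels such as 0.01 need a fine enough grid to be resolved.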

Showing the algorithms on the plot

Conjugate gradient method with exact line search

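Because the objective function $f(\cdot)$ is a convex quadratic function (with $\mathcal Q$ positive definite), it is guaranteed to have a global optimum, so the conjugate gradient method with exact line search can be used to find it. Recalling the article on the conjugate gradient method for unconstrained optimization, the algorithm proceeds as follows.
Initialization:

Given the initial point $x_0 = (3 \quad 2)^T$, set $d_0 = - \nabla f(x_0)$; set the threshold $\epsilon = 0.05$ and $k=0$.

Algorithm:

1. Check whether $\|\nabla f(x_k)\| \leq \epsilon$ holds. If it does, the algorithm terminates.
The norm is used here as a measure of the size of the gradient vector $\nabla f(x_k)$: $\|\nabla f(x_k)\| \to 0$ means that $\nabla f(x_k)$ tends to the zero vector.
2. Compute the optimal step size $\alpha_k$ for the current iteration:
Note that the linked article defines the objective function as $f(x) = \frac{1}{2} x^T \mathcal Q x + C^T x$, where the coefficient $\frac{1}{2}$ is only there to simplify differentiation. In this section $x^T \mathcal Q x$ has no such coefficient, so the simplified expression for $\alpha_k$ needs an extra factor of $\frac{1}{2}$.
$\alpha_k = - \frac{[\nabla f(x_k)]^T d_k}{2(d_k)^T \mathcal Q d_k}$
3. Compute the new point $x_{k+1} = x_k + \alpha_k \cdot d_k$ and the conjugate direction $d_{k+1}$:
$d_{k+1} = - \nabla f(x_{k+1}) + \beta_k \cdot d_k,\beta_k = \frac{[\nabla f(x_{k+1})]^T \mathcal Q d_k}{(d_k)^T \mathcal Q d_k}$
4. Set $k = k + 1$ and return to step $1$.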
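For readers who want to see where the extra factor of $\frac{1}{2}$ in $\alpha_k$ comes from, here is a short derivation sketch. It assumes only that $\mathcal Q$ is symmetric, so that $\nabla f(x) = 2 \mathcal Q x$ for $f(x) = x^T \mathcal Q x$.

```latex
\begin{aligned}
\phi(\alpha) &= f(x_k + \alpha d_k) = (x_k + \alpha d_k)^T \mathcal Q (x_k + \alpha d_k) \\
\phi'(\alpha) &= 2 (x_k)^T \mathcal Q d_k + 2 \alpha (d_k)^T \mathcal Q d_k
              = [\nabla f(x_k)]^T d_k + 2 \alpha (d_k)^T \mathcal Q d_k \\
\phi'(\alpha_k) = 0 &\Rightarrow \alpha_k = - \frac{[\nabla f(x_k)]^T d_k}{2 (d_k)^T \mathcal Q d_k}
\end{aligned}
```

With the $\frac{1}{2} x^T \mathcal Q x$ convention the gradient is $\mathcal Q x$ and the $2$ in the denominator disappears, which matches the formula in the linked article.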

The corresponding code is:

import numpy as np

def ConjugateGradient():

    def f(PointInput, Q):
        # Quadratic form x^T Q x
        return np.dot(np.dot(PointInput, Q), PointInput)

    def Nablaf(PointInput, Q):
        # Gradient of the quadratic form: 2 Q x
        return 2 * np.dot(Q, PointInput)

    def GetNorm(ArrayInput):
        return np.linalg.norm(ArrayInput)

    Epsilon = 0.05
    InitPoint = np.array([3., 2.])
    Q = np.array([[1., 1.], [1., 2.]])
    PointList = list()
    PointList.append(InitPoint)

    # Initial direction d_0 = -grad f(x_0)
    ConjugateStart = -1 * Nablaf(InitPoint, Q)

    while True:
        # Stop once the gradient norm falls below the threshold
        if GetNorm(Nablaf(InitPoint, Q)) <= Epsilon:
            break
        else:
            # Exact step size; the factor 0.5 comes from f = x^T Q x (no 1/2 coefficient)
            alpha = -0.5 * (np.dot(Nablaf(InitPoint, Q), ConjugateStart) / np.dot(np.dot(ConjugateStart, Q), ConjugateStart))

            NewPoint = InitPoint + alpha * ConjugateStart
            # beta_k and the next conjugate direction d_{k+1}
            Beta = np.dot(np.dot(Nablaf(NewPoint, Q), Q), ConjugateStart) / np.dot(np.dot(ConjugateStart, Q), ConjugateStart)
            NewConjugate = -1 * Nablaf(NewPoint, Q) + Beta * ConjugateStart
            PointList.append(NewPoint)
            print("[info] Iterations: ", len(PointList))

            InitPoint = NewPoint
            ConjugateStart = NewConjugate
    return PointList

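The resulting plot is shown below:
Figure: conjugate gradient method with exact line search
The plot shows that, with exact line search, each iteration needs only a single step-size computation, and $n = 2$ iterations are guaranteed to reach the optimum. Because the descent directions are the conjugate directions $d_k$ and $\mathcal Q$ is not diagonal, the directions produced by the iterations above are not perpendicular to each other.
It just looks a little confusing at first.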
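The difference between "conjugate" and "perpendicular" can be checked directly. The sketch below reruns two exact-search iterations from $x_0 = (3 \quad 2)^T$ and prints both inner products; `grad` and `exact_cg_directions` are names invented for this illustration rather than functions from the article's code.

```python
import numpy as np

Q = np.array([[1.0, 1.0], [1.0, 2.0]])

def grad(x):
    """Gradient of f(x) = x^T Q x for symmetric Q."""
    return 2.0 * Q @ x

def exact_cg_directions(x0, steps=2):
    """Run `steps` exact-line-search CG iterations; return points and directions."""
    x, d = x0.copy(), -grad(x0)
    points, directions = [x.copy()], [d.copy()]
    for _ in range(steps):
        alpha = -(grad(x) @ d) / (2.0 * d @ Q @ d)  # exact step for f = x^T Q x
        x = x + alpha * d
        beta = (grad(x) @ Q @ d) / (d @ Q @ d)
        d = -grad(x) + beta * d
        points.append(x.copy())
        directions.append(d.copy())
    return points, directions

points, directions = exact_cg_directions(np.array([3.0, 2.0]))
d0, d1 = directions[0], directions[1]
print("d0^T Q d1 =", d0 @ Q @ d1)  # conjugate: approximately 0
print("d0^T d1   =", d0 @ d1)      # clearly nonzero: not perpendicular
print("x2        =", points[2])    # approximately the minimizer (0, 0)
```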

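Of course, if the factor $\frac{1}{2}$ is left out when computing $\alpha_k$, the result looks like this:
The loop never stops; it just keeps cycling through the same few points forever. Don't ask me how I know.
Figure: example of the loop that never terminates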
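One way to see why the doubled step makes no progress: along a fixed direction, $\phi(\alpha)$ is a parabola symmetric about the exact step $\alpha_k$, so stepping $2\alpha_k$ lands back on the same function value as $\alpha = 0$. The sketch below checks this for the first iteration only; the variable names `alpha_exact` and `alpha_wrong` are introduced here for illustration.

```python
import numpy as np

Q = np.array([[1.0, 1.0], [1.0, 2.0]])

def f(x):
    return float(x @ Q @ x)

def grad(x):
    return 2.0 * Q @ x

x0 = np.array([3.0, 2.0])
d0 = -grad(x0)

alpha_exact = -(grad(x0) @ d0) / (2.0 * d0 @ Q @ d0)  # with the factor 1/2
alpha_wrong = -(grad(x0) @ d0) / (d0 @ Q @ d0)        # factor 1/2 forgotten

print(f(x0))                     # starting value
print(f(x0 + alpha_exact * d0))  # strictly smaller than f(x0)
print(f(x0 + alpha_wrong * d0))  # equal to f(x0) up to rounding
```

Since the function value never drops, the iterate stays on the original contour and the gradient norm cannot fall below $\epsilon$.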

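Conjugate gradient method with the Wolfe Condition

This method was introduced in the section on inexact line search with the $\text{Wolfe Condition}$. It uses the parameter $\mathcal C_1$ to bound the chosen $\alpha$ from above, and the parameter $\mathcal C_2$ to specify the slope condition that $\alpha$ must satisfy below that upper bound:
where $\phi(\alpha) = f(x_{k+1}) = f(x_k + \alpha \cdot d_k)$
$\begin{cases} \phi(\alpha) \leq f(x_k) + \mathcal C_1 [\nabla f(x_k)]^T d_k \cdot \alpha \\ \phi'(\alpha) \geq \mathcal C_2 \cdot [\nabla f(x_k)]^T d_k \\ \mathcal C_1 \in (0,1) \quad \mathcal C_2 \in (\mathcal C_1,1) \end{cases}$

During line search, each iteration picks just one good step size that satisfies the conditions above (not necessarily the optimal one) and continues with it, which yields an approximately optimal solution. For the convergence proof of the $\text{Wolfe Condition}$, see the linked article. The corresponding code is: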

    # Nested inside ConjugateGradient(); relies on f, Nablaf, np and random
    # from the enclosing code (see the complete listing at the end).
    def WolfeConditionOperation(C1,C2,PointInput,Conjugate,Q,UpperLimits):

        def phi(alpha,PointInput,Conjugate,Q):
            # phi(alpha) = f(x_k + alpha * d_k)
            return np.dot(np.dot(PointInput + alpha * Conjugate,Q),PointInput + alpha * Conjugate)

        def Derphi(alpha,PointInput,Conjugate,Q):
            # Derivative of phi() with respect to alpha
            return np.dot(np.dot(Conjugate,Q),PointInput) + np.dot(np.dot(PointInput,Q),Conjugate) \
                   + 2 * alpha * np.dot(np.dot(Conjugate,Q),Conjugate)

        assert 0 < C1 < 1 and C1 < C2 < 1
        while True:
            # Sample a candidate step and keep it only if both Wolfe conditions hold
            alpha = random.uniform(0.0,UpperLimits)
            if phi(alpha,PointInput,Conjugate,Q) <= f(PointInput,Q) + C1 * alpha * np.dot(Nablaf(PointInput,Q),Conjugate):
                if Derphi(alpha,PointInput,Conjugate,Q) >= C2 * np.dot(Nablaf(PointInput,Q),Conjugate):
                    if not alpha:
                        continue
                    else:
                        # Note: this only rebinds the local variable, so the
                        # caller's UpperLimits is not changed by this line.
                        UpperLimits /= 1.2
                        break
        return alpha
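To make the two conditions concrete, the following sketch evaluates them for a few fixed step sizes at the starting point, using the same $\mathcal C_1 = 0.5$ and $\mathcal C_2 = 0.8$ as the complete code. `satisfies_wolfe` is a hypothetical helper written for this illustration; it is deterministic, unlike the random sampling above.

```python
import numpy as np

Q = np.array([[1.0, 1.0], [1.0, 2.0]])

def f(x):
    return float(x @ Q @ x)

def grad(x):
    return 2.0 * Q @ x

def satisfies_wolfe(alpha, x, d, c1=0.5, c2=0.8):
    """Return (sufficient_decrease_ok, curvature_ok) for step alpha along d."""
    slope0 = float(grad(x) @ d)            # phi'(0); negative for a descent direction
    phi = f(x + alpha * d)                 # phi(alpha)
    dphi = float(grad(x + alpha * d) @ d)  # phi'(alpha)
    decrease_ok = phi <= f(x) + c1 * alpha * slope0
    curvature_ok = dphi >= c2 * slope0
    return decrease_ok, curvature_ok

x0 = np.array([3.0, 2.0])
d0 = -grad(x0)
for alpha in (0.01, 0.1, 0.3):
    print(alpha, satisfies_wolfe(alpha, x0, d0))
```

For this first direction, a step of 0.01 is too short (it fails the curvature condition), 0.3 is too long (it fails the sufficient-decrease condition), and 0.1 passes both; the accepted steps lie roughly between 0.038 and 0.192.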

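The convergence behavior of the conjugate gradient method with the $\text{Wolfe Condition}$ is shown below:
Note that the $\alpha$ produced in each iteration is random, so the plot differs from run to run. Occasionally the selected $\alpha$ values fail to satisfy the iteration conditions and the program can even hang; if that happens, run it a few more times.
Figure: conjugate gradient method with the Wolfe Condition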
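One likely cause of the hang: with an inexact step, the next direction $d_{k+1}$ is not guaranteed to be a descent direction, and when $[\nabla f(x_k)]^T d_k \geq 0$ no positive $\alpha$ can satisfy the first condition, so the sampling loop never exits. The sketch below is one possible safeguard, not the method used in this article: it caps the number of samples and restarts from the negative gradient when sampling fails. `sample_wolfe_step`, `wolfe_cg`, `max_tries`, and the fixed `seed` are all assumptions made for this illustration.

```python
import random
import numpy as np

Q = np.array([[1.0, 1.0], [1.0, 2.0]])

def f(x):
    return float(x @ Q @ x)

def grad(x):
    return 2.0 * Q @ x

def sample_wolfe_step(x, d, c1=0.5, c2=0.8, upper=2.0, max_tries=10000, rng=random):
    """Sample alpha in (0, upper) until both Wolfe conditions hold; None on failure."""
    slope0 = float(grad(x) @ d)
    if slope0 >= 0.0:  # not a descent direction: no alpha > 0 can pass
        return None
    for _ in range(max_tries):
        alpha = rng.uniform(0.0, upper)
        if alpha == 0.0:
            continue
        decrease_ok = f(x + alpha * d) <= f(x) + c1 * alpha * slope0
        curvature_ok = float(grad(x + alpha * d) @ d) >= c2 * slope0
        if decrease_ok and curvature_ok:
            return alpha
    return None

def wolfe_cg(x0, eps=0.05, max_iter=1000, seed=0):
    """CG with sampled Wolfe steps; restarts along -grad when sampling fails."""
    rng = random.Random(seed)
    x, d = x0.copy(), -grad(x0)
    points = [x.copy()]
    for _ in range(max_iter):
        if np.linalg.norm(grad(x)) <= eps:
            break
        alpha = sample_wolfe_step(x, d, rng=rng)
        if alpha is None:  # restart along the negative gradient
            d = -grad(x)
            alpha = sample_wolfe_step(x, d, rng=rng)
            if alpha is None:
                break
        x_new = x + alpha * d
        beta = (grad(x_new) @ Q @ d) / (d @ Q @ d)
        d = -grad(x_new) + beta * d
        x = x_new
        points.append(x.copy())
    return points

points = wolfe_cg(np.array([3.0, 2.0]))
print(len(points), points[-1], np.linalg.norm(grad(points[-1])))
```

The returned list has the same shape as `PointList` in the complete code, so it could be passed to `DrawPicture` unchanged. The fixed seed makes a run repeatable; removing it restores the run-to-run variation described above.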

Appendix: complete code for the conjugate gradient method

import math
import random
import numpy as np
import matplotlib.pyplot as plt

def CalxLimits(C):
    return math.sqrt(2 * C),-1 * math.sqrt(2 * C)

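# def f(x,y):
    # Not used; it only documents the non-standard quadratic form.
    # return 2 * (y ** 2) + 2 * x * y + (x ** 2)

def GetAnalytical(x,C):
    # Quadratic formula for x2; the discriminant is clamped at 0 to guard
    # against tiny negative values caused by rounding at the edges of the range.
    Delta = max(8 * C - 4 * (x ** 2), 0.0)
    return 0.25 * (-2 * x + math.sqrt(Delta)),0.25 * (-2 * x - math.sqrt(Delta))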

def DrawContourPicture(CList):

    for C in CList:
        UpperPointList,LowerPointList = list(),list()
        Upper,Lower = CalxLimits(C)
        xList = list(np.linspace(Lower,Upper,10000))
        for x in xList:
            Upperx,Lowerx = GetAnalytical(x,C)
            UpperPointList.append(Upperx)
            LowerPointList.append(Lowerx)

        plt.plot(xList, UpperPointList,'--',c="tab:blue")
        plt.plot(xList, LowerPointList,'--',c="tab:blue")
    # plt.show()

def ConjugateGradient(mode="WolfeCondition"):
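    # Conjugate gradient iteration; mode selects exact line search
    # (optimal step size) or random search under the Wolfe Condition.

    def f(PointInput,Q):
        # Quadratic form x^T Q x
        return np.dot(np.dot(PointInput,Q),PointInput)

    def Nablaf(PointInput,Q):
        # Gradient of the quadratic form: 2 Q x
        return 2 * np.dot(Q,PointInput)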

    def GetNorm(ArrayInput):
        return np.linalg.norm(ArrayInput)

    def WolfeConditionOperation(C1,C2,PointInput,Conjugate,Q,UpperLimits):

        def phi(alpha,PointInput,Conjugate,Q):
            return np.dot(np.dot(PointInput + alpha * Conjugate,Q),PointInput + alpha * Conjugate)

        def Derphi(alpha,PointInput,Conjugate,Q):
            # Derivative of phi() with respect to alpha
            return np.dot(np.dot(Conjugate,Q),PointInput) + np.dot(np.dot(PointInput,Q),Conjugate) \
                   + 2 * alpha * np.dot(np.dot(Conjugate,Q),Conjugate)

        assert 0 < C1 < 1 and C1 < C2 < 1

        while True:
            alpha = random.uniform(0.0,UpperLimits)
            if phi(alpha,PointInput,Conjugate,Q) <= f(PointInput,Q) + C1 * alpha * np.dot(Nablaf(PointInput,Q),Conjugate):
                if Derphi(alpha,PointInput,Conjugate,Q) >= C2 * np.dot(Nablaf(PointInput,Q),Conjugate):
                    if not alpha:
                        continue
                    else:
                        UpperLimits /= 1.2
                        break
        return alpha

    assert mode in ["Exact","WolfeCondition"]
    Epsilon = 0.05
    InitPoint = np.array([3.,2.])
    Q = np.array([[1.,1.],[1.,2.]])
    PointList = list()
    PointList.append(InitPoint)
    UpperLimits = 2.0
    C1 = 0.5
    C2 = 0.8
    ConjugateStart = -1 * Nablaf(InitPoint,Q)

    while True:
        if GetNorm(Nablaf(InitPoint,Q)) <= Epsilon:
            break
        else:
            if mode == "Exact":
                alpha = -0.5 * (np.dot(Nablaf(InitPoint,Q),ConjugateStart) / np.dot(np.dot(ConjugateStart,Q),ConjugateStart))
            else:
                alpha = WolfeConditionOperation(C1,C2,InitPoint,ConjugateStart,Q,UpperLimits)

            NewPoint = InitPoint + alpha * ConjugateStart
            Beta = np.dot(np.dot(Nablaf(NewPoint,Q),Q),ConjugateStart) / np.dot(np.dot(ConjugateStart,Q),ConjugateStart)
            NewConjugate = -1 * Nablaf(NewPoint,Q) + Beta * ConjugateStart
            PointList.append(NewPoint)

            InitPoint = NewPoint
            ConjugateStart = NewConjugate
            
    return PointList

def DrawPicture(PointList):
    CList = [0.01,0.1,0.5,2,4.5]
    DrawContourPicture(CList)
    
    plotList = list()
    for (x,y) in PointList:
        plotList.append((x,y))
        plt.scatter(x, y, s=40, facecolor="none", edgecolors="tab:red", marker='o')
        if len(plotList) < 2:
            continue
        else:
            plt.plot([plotList[0][0], plotList[1][0]], [plotList[0][1], plotList[1][1]], c="tab:red")
            plotList.pop(0)
    plt.show()

if __name__ == '__main__':
    PointList = ConjugateGradient()
    # PointList = ConjugateGradient(mode="Exact")
    DrawPicture(PointList)


Original article: https://blog.csdn.net/qq_34758157/article/details/132908237