The previous section introduced the strategy behind Linear Discriminant Analysis (LDA) and how that strategy is expressed in mathematical notation. Building on that strategy, this section walks through how the model parameters of linear discriminant analysis are solved.
At heart, linear discriminant analysis still partitions the sample space with a line (hyperplane); its distinguishing feature is that it gives the linear model's parameter $\mathcal W$ a concrete meaning: $\mathcal W$ is the reference direction along which the $p$-dimensional sample space is projected onto one dimension.
It follows that finding the optimal reference direction $\hat {\mathcal W}$ and building the model $\hat {\mathcal W}^{T}x^{(i)} + b$ are essentially the same task.
How do we find the optimal reference direction $\hat {\mathcal W}$? In other words, what is the standard for judging whether a reference direction $\mathcal W$ is good or bad? Following LDA's idea of high cohesion within classes and low coupling between classes, the criterion is built from two angles: within-class and between-class.
Fusing the within-class and between-class angles, the strategy (loss function) must satisfy the requirements of both. Based on the scenario described in the previous section, the resulting loss function $\mathcal J(\mathcal W)$ is:
$$\mathcal J(\mathcal W) = \frac{(\bar {\mathcal Z_1} - \bar {\mathcal Z_2})^{2}}{\mathcal S_1 + \mathcal S_2}$$
where $\bar {\mathcal Z_j}\;(j=1,2)$ denotes the mean of the projected samples of class $j$:
$$\bar {\mathcal Z_j} = \frac{1}{N_j}\sum_{x^{(i)} \in \mathcal X_{C_j}}\mathcal W^{T}x^{(i)}$$
and $\mathcal S_j\;(j=1,2)$ denotes the variance of the projected samples of class $j$:
$$\mathcal S_j = \frac{1}{N_j}\sum_{x^{(i)} \in \mathcal X_{C_j}}(\mathcal W^{T}x^{(i)} - \bar {\mathcal Z_j})(\mathcal W^{T}x^{(i)} - \bar {\mathcal Z_j})^{T}$$
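The projected means, projected variances, and the loss above can be computed directly from data. A minimal NumPy sketch (the two-class toy data, class sizes, and direction $\mathcal W$ are all illustrative assumptions, not from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data: N_j samples of dimension p per class (assumed for illustration).
p = 2
X_C1 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(20, p))
X_C2 = rng.normal(loc=[3.0, 1.0], scale=0.5, size=(30, p))

W = np.array([1.0, 0.5])  # some candidate reference direction, shape (p,)

# Projected samples: Z^(i) = W^T x^(i)
Z1 = X_C1 @ W
Z2 = X_C2 @ W

# Projected class means: Z_j_bar = (1/N_j) * sum of W^T x^(i)
Z1_bar, Z2_bar = Z1.mean(), Z2.mean()

# Projected class variances: S_j = (1/N_j) * sum of (W^T x^(i) - Z_j_bar)^2
S1 = ((Z1 - Z1_bar) ** 2).mean()
S2 = ((Z2 - Z2_bar) ** 2).mean()

# Loss: squared gap between projected means over total projected variance.
J = (Z1_bar - Z2_bar) ** 2 / (S1 + S2)
```

A larger $J$ means the projected class means are farther apart relative to how spread out each projected class is, which is exactly the high-cohesion/low-coupling criterion.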
After simplification, the strategy expressed in terms of the reference direction $\mathcal W$ is:
$$\mathcal J(\mathcal W) = \frac{\mathcal W^{T}(\bar {\mathcal X_{C_1}} - \bar {\mathcal X_{C_2}})(\bar {\mathcal X_{C_1}} - \bar {\mathcal X_{C_2}})^{T}\mathcal W}{\mathcal W^{T}(\mathcal S_{C_1} + \mathcal S_{C_2})\mathcal W}$$
where $\bar {\mathcal X_{C_j}}\;(j=1,2)$ denotes the mean of the original samples of class $j$:
$$\bar {\mathcal X_{C_j}} = \frac{1}{N_j}\sum_{x^{(i)} \in \mathcal X_{C_j}} x^{(i)}$$
and $\mathcal S_{C_j}\;(j=1,2)$ denotes the variance (scatter) of the original samples of class $j$:
$$\mathcal S_{C_j} = \frac{1}{N_j} \sum_{x^{(i)} \in \mathcal X_{C_j}}(x^{(i)} - \bar {\mathcal X_{C_j}})(x^{(i)} - \bar {\mathcal X_{C_j}})^{T}$$
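The simplification step above rests on the identity $\mathcal S_j = \mathcal W^{T}\mathcal S_{C_j}\mathcal W$: the variance of the projected samples equals the quadratic form of $\mathcal W$ with the original-sample scatter. A sketch verifying this numerically (toy data and names are assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
p = 3
X_C1 = rng.normal(size=(25, p))   # one class of toy samples (assumed)
W = rng.normal(size=(p,))         # arbitrary reference direction

# Original-sample class mean and scatter: S_C1 = (1/N_1) * sum (x - mean)(x - mean)^T
X_C1_bar = X_C1.mean(axis=0)
diff = X_C1 - X_C1_bar
S_C1 = diff.T @ diff / len(X_C1)  # (p, p) matrix

# Projected variance computed two ways:
Z1 = X_C1 @ W
S1_direct = np.var(Z1)            # (1/N_1) * sum (z - z_bar)^2
S1_via_scatter = W @ S_C1 @ W     # W^T S_C1 W
```

Both values agree, which is why the scalar loss can be rewritten entirely in terms of the $p \times p$ scatter matrices sandwiched between $\mathcal W^{T}$ and $\mathcal W$.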
Re-examining $\mathcal J(\mathcal W)$, define the middle factor of the numerator as the between-class variance, denoted $\mathcal S_{bet}$ ($\mathcal S_{bet}$ is a $p \times p$ matrix):
$$\mathcal S_{bet} = (\bar {\mathcal X_{C_1}} - \bar {\mathcal X_{C_2}})(\bar {\mathcal X_{C_1}} - \bar {\mathcal X_{C_2}})^{T}$$
Define the middle factor of the denominator as the within-class variance, denoted $\mathcal S_{with}$ ($\mathcal S_{with}$ is also a $p \times p$ matrix):
$$\mathcal S_{with} = \mathcal S_{C_1} + \mathcal S_{C_2}$$
The strategy $\mathcal J(\mathcal W)$ then simplifies to:
$$\mathcal J(\mathcal W) = \frac{\mathcal W^{T}\mathcal S_{bet}\mathcal W}{\mathcal W^{T}\mathcal S_{with}\mathcal W} = \mathcal W^{T}\mathcal S_{bet}\mathcal W \left(\mathcal W^{T}\mathcal S_{with}\mathcal W\right)^{-1}$$
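One useful property of this ratio of quadratic forms is that it depends only on the direction of $\mathcal W$, not its length, which is what later justifies dropping scalar coefficients. A sketch checking this scale invariance with synthetic scatter matrices (all names and the random construction are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 4
mu1, mu2 = rng.normal(size=p), rng.normal(size=p)  # toy class means

# Synthetic within-class scatters: A A^T is symmetric positive semidefinite.
A1, A2 = rng.normal(size=(p, p)), rng.normal(size=(p, p))
S_C1, S_C2 = A1 @ A1.T, A2 @ A2.T

S_bet = np.outer(mu1 - mu2, mu1 - mu2)  # between-class scatter, (p, p)
S_with = S_C1 + S_C2                    # within-class scatter, (p, p)

def J(W):
    # J(W) = (W^T S_bet W) / (W^T S_with W)
    return (W @ S_bet @ W) / (W @ S_with @ W)

W = rng.normal(size=p)
```

Scaling $\mathcal W$ by any nonzero constant $c$ multiplies numerator and denominator by $c^2$, so $\mathcal J(c\mathcal W) = \mathcal J(\mathcal W)$.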
Differentiate $\mathcal J(\mathcal W)$ with respect to $\mathcal W$ directly. (This step uses the matrix-calculus rule for quadratic forms: for a symmetric matrix $\mathcal A$, $\frac{\partial}{\partial \mathcal W}(\mathcal W^{T}\mathcal A\mathcal W) = 2\mathcal A\mathcal W$.)
$$\frac{\partial (\mathcal W^{T}\mathcal S_{bet} \mathcal W)}{\partial \mathcal W} = 2 \mathcal S_{bet} \mathcal W$$
$$\begin{aligned}\frac{\partial \mathcal J(\mathcal W)}{\partial \mathcal W} &= 2 \mathcal S_{bet} \mathcal W \left(\mathcal W^{T}\mathcal S_{with}\mathcal W\right)^{-1} + \mathcal W^{T}\mathcal S_{bet}\mathcal W \times (-1)\left(\mathcal W^{T}\mathcal S_{with}\mathcal W\right)^{-2} \times 2 \mathcal S_{with}\mathcal W \\ &= 2\left[\mathcal S_{bet}\mathcal W\left(\mathcal W^{T}\mathcal S_{with}\mathcal W\right)^{-1} - \mathcal W^{T}\mathcal S_{bet}\mathcal W\left(\mathcal W^{T}\mathcal S_{with}\mathcal W\right)^{-2}\mathcal S_{with}\mathcal W\right]\end{aligned}$$
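An analytic derivative like this is easy to sanity-check against central finite differences. A sketch under synthetic scatters (the construction is assumed, not from the post):

```python
import numpy as np

rng = np.random.default_rng(3)
p = 3
A = rng.normal(size=(p, p))
S_with = A @ A.T + np.eye(p)   # symmetric positive definite within-class scatter
d = rng.normal(size=p)
S_bet = np.outer(d, d)         # rank-1 between-class scatter

def J(W):
    return (W @ S_bet @ W) / (W @ S_with @ W)

def grad_J(W):
    # 2 S_bet W (W^T S_with W)^-1 - 2 (W^T S_bet W)(W^T S_with W)^-2 S_with W
    a = W @ S_with @ W
    b = W @ S_bet @ W
    return 2 * S_bet @ W / a - 2 * b / a**2 * S_with @ W

W = rng.normal(size=p)
eps = 1e-6
# Central-difference approximation of the gradient, one coordinate at a time.
num_grad = np.array([
    (J(W + eps * e) - J(W - eps * e)) / (2 * eps)
    for e in np.eye(p)
])
```

The analytic and numerical gradients match to within finite-difference error.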
Setting $\frac{\partial \mathcal J(\mathcal W)}{\partial \mathcal W} \triangleq 0$ and multiplying both sides by $\frac{1}{2}(\mathcal W^{T}\mathcal S_{with} \mathcal W)^{2}$ gives:
$$\mathcal S_{bet} \mathcal W (\mathcal W^{T} \mathcal S_{with} \mathcal W) - (\mathcal W^{T} \mathcal S_{bet} \mathcal W) \mathcal S_{with} \mathcal W = 0$$
$$\mathcal S_{bet}\mathcal W(\mathcal W^{T} \mathcal S_{with} \mathcal W) = (\mathcal W^{T}\mathcal S_{bet}\mathcal W)\mathcal S_{with}\mathcal W$$
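At a stationary point this identity holds exactly. Plugging in the closed-form direction that the derivation arrives at further below, it can be spot-checked numerically (synthetic scatters assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
p = 3
A = rng.normal(size=(p, p))
S_with = A @ A.T + np.eye(p)   # symmetric positive definite within-class scatter
d = rng.normal(size=p)         # stands in for X_C1_bar - X_C2_bar
S_bet = np.outer(d, d)         # between-class scatter

# A stationary direction: W = S_with^{-1} d
W = np.linalg.solve(S_with, d)

# Both sides of: S_bet W (W^T S_with W) = (W^T S_bet W) S_with W
lhs = S_bet @ W * (W @ S_with @ W)
rhs = (W @ S_bet @ W) * (S_with @ W)
```

The two $p$-dimensional vectors agree to floating-point precision, confirming the stationarity condition.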
Observe that $\mathcal W^{T}\mathcal S_{with}\mathcal W$ and $\mathcal W^{T}\mathcal S_{bet}\mathcal W$ are both scalars, i.e. constants: $\mathcal W^{T}$ has dimension $1 \times p$, $\mathcal S_{with}$ and $\mathcal S_{bet}$ are both $p \times p$, and $\mathcal W$ is $p \times 1$, so each quadratic form is $1 \times 1$. The optimal parameter $\hat {\mathcal W}$ can therefore be written in the following form:
$$\hat {\mathcal W} = \frac{\mathcal W^{T}\mathcal S_{with} \mathcal W}{\mathcal W^{T}\mathcal S_{bet} \mathcal W} \mathcal S_{with}^{-1} \mathcal S_{bet} \mathcal W$$
Look first at the fraction: its numerator and denominator are both scalars, so the fraction itself is a constant. Since $\hat {\mathcal W}$ is a vector and we care about its direction rather than its magnitude, the constant (coefficient) is usually ignored, and $\hat {\mathcal W}$ can be expressed as:
$$\hat {\mathcal W} \propto \mathcal S_{with}^{-1} \mathcal S_{bet} \mathcal W$$
Substituting the between-class variance $\mathcal S_{bet}$ into the expression above and expanding:
$$\hat {\mathcal W} \propto \mathcal S_{with}^{-1}(\bar {\mathcal X_{C_1}} - \bar {\mathcal X_{C_2}})(\bar {\mathcal X_{C_1}} - \bar {\mathcal X_{C_2}})^{T}\mathcal W$$
Look at the last two factors, $(\bar {\mathcal X_{C_1}} - \bar {\mathcal X_{C_2}})^{T}\mathcal W$: as before, $(\bar {\mathcal X_{C_1}} - \bar {\mathcal X_{C_2}})^{T}$ is a $1 \times p$ row vector and $\mathcal W$ is a $p \times 1$ vector, so their product is again a scalar, a constant. If one insists on a concrete interpretation, it is "the gap between the class means (the between-class gap) projected onto the reference direction $\mathcal W$".
A scalar coefficient still does not affect the vector's direction, so the expression simplifies further. (To be precise, "direction" here really means "the orientation of the line the vector lies on": the coefficients $\frac{\mathcal W^{T}\mathcal S_{with}\mathcal W}{\mathcal W^{T}\mathcal S_{bet}\mathcal W}$ and $(\bar {\mathcal X_{C_1}} - \bar {\mathcal X_{C_2}})^{T}\mathcal W$ can flip the vector's sign, but whatever their signs, the line the scaled vector lies on does not change.)
$$\hat {\mathcal W} \propto \mathcal S_{with}^{-1}(\bar {\mathcal X_{C_1}} - \bar {\mathcal X_{C_2}})$$
In other words, the direction of the optimal reference direction $\hat {\mathcal W}$ depends only on the direction of the vector $\mathcal S_{with}^{-1}(\bar {\mathcal X_{C_1}} - \bar {\mathcal X_{C_2}})$. The expression above is therefore the solution for the optimal reference direction (the linear model's optimal parameters) $\hat {\mathcal W}$ in two-class linear discriminant analysis.
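As an end-to-end check, the whole derivation can be exercised on synthetic two-class data: compute $\hat {\mathcal W} = \mathcal S_{with}^{-1}(\bar {\mathcal X_{C_1}} - \bar {\mathcal X_{C_2}})$ and confirm it scores at least as high under $\mathcal J$ as random directions (all names and the synthetic data below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
p = 2
X_C1 = rng.normal(loc=[0, 0], scale=[1.0, 0.3], size=(200, p))
X_C2 = rng.normal(loc=[2, 1], scale=[1.0, 0.3], size=(200, p))

def class_stats(X):
    # Per-class mean and (1/N)-normalized scatter matrix.
    mu = X.mean(axis=0)
    D = X - mu
    return mu, D.T @ D / len(X)

mu1, S_C1 = class_stats(X_C1)
mu2, S_C2 = class_stats(X_C2)
S_with = S_C1 + S_C2
S_bet = np.outer(mu1 - mu2, mu1 - mu2)

# Closed-form optimal direction: W_hat ∝ S_with^{-1} (mu1 - mu2)
W_hat = np.linalg.solve(S_with, mu1 - mu2)

def J(W):
    return (W @ S_bet @ W) / (W @ S_with @ W)

# W_hat should score at least as well as any randomly drawn direction.
best_random = max(J(rng.normal(size=p)) for _ in range(1000))
```

Because $\mathcal J$ is a generalized Rayleigh quotient with a rank-1 numerator, $\hat {\mathcal W}$ is its global maximizer, and its value works out to $(\bar {\mathcal X_{C_1}} - \bar {\mathcal X_{C_2}})^{T}\hat {\mathcal W}$.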
Related reference:
Machine Learning - Linear Classification 4 - Linear Discriminant Analysis (Model Solving)