Set up: D = { ( x i , y i ) } i = 1 n \mathbf{D}=\{(\mathbf{x}_i,y_i) \}_{i=1}^n D={(xi,yi)}i=1n, 其中 y i = 1 , 2 y_i=1,2 yi=1,2(或 ± 1 \pm 1 ±1 等), D 1 = { x i ∣ y i = 1 } \mathbf{D}_1=\{\mathbf{x}_i|y_i=1 \} D1={xi∣yi=1}, D 2 = { x i ∣ y i = 2 } \mathbf{D}_2=\{\mathbf{x}_i|y_i=2 \} D2={xi∣yi=2}
Goal:寻找向量 w ∈ R d \mathbf{w}\in \mathbb{R}^d w∈Rd (代表直线方向)使得 D 1 , D 2 \mathbf{D}_1,\mathbf{D}_2 D1,D2 的“平均值”距离最大且“总方差”最小。
设 w ∈ R d , w T w = 1 \mathbf{w} \in \mathbb{R}^d,\mathbf{w}^T\mathbf{w}=1 w∈Rd,wTw=1,则 x i \mathbf{x}_i xi 在 w \mathbf{w} w 方向上的投影为 x i ′ = ( w T x i w T u ) w = a i w , a i = w T x i \mathbf{x}_{i}^{\prime}=\left(\frac{\mathbf{w}^{T} \mathbf{x}_{i}}{\mathbf{w}^{T} \mathbf{u}}\right) \mathbf{w}=a_{i} \mathbf{w},a_{i}=\mathbf{w}^{T} \mathbf{x}_{i} xi′=(wTuwTxi)w=aiw,ai=wTxi
则
D
1
\mathbf{D}_1
D1 中数据在
w
\mathbf{w}
w 上的投影平均值为:(
∣
D
1
∣
=
n
1
|\mathbf{D}_1|=n_1
∣D1∣=n1)
m
1
:
=
1
n
1
∑
x
i
∈
D
1
a
i
=
μ
1
T
w
m_1:=\frac{1}{n_1}\sum\limits_{\mathbf{x}_i\in \mathbf{D}_1}a_i=\boldsymbol{\mu}_1^T\mathbf{w}
m1:=n11xi∈D1∑ai=μ1Tw
投影平均值等于平均值的投影。
类似地:
D
2
\mathbf{D}_2
D2 中数据在
w
\mathbf{w}
w 上的投影平均值为:
m
2
:
=
1
n
2
∑
x
i
∈
D
2
a
i
=
μ
2
T
w
m_2:=\frac{1}{n_2}\sum\limits_{\mathbf{x}_i\in \mathbf{D}_2}a_i=\boldsymbol{\mu}_2^T\mathbf{w}
m2:=n21xi∈D2∑ai=μ2Tw
目标之一:寻找
w
\mathbf{w}
w 使得
(
m
1
−
m
2
)
2
(m_1-m_2)^2
(m1−m2)2 最大。
对于
D
i
\mathbf{D}_i
Di,定义:
s
i
2
=
∑
x
k
∈
D
i
(
a
k
−
m
i
)
2
s_i^2=\sum\limits_{\mathbf{x}_k\in \mathbf{D}_i}(a_k-m_i)^2
si2=xk∈Di∑(ak−mi)2
注意:
s
i
2
=
n
i
σ
i
2
(
∣
D
i
∣
=
n
i
)
s_i^2=n_i\sigma^2_i\ (|D_i|=n_i)
si2=niσi2 (∣Di∣=ni)
Goal:Fisher LDA目标函数:
max
w
J
(
w
)
=
(
m
1
−
m
2
)
2
s
1
2
+
s
2
2
\max\limits_{\mathbf{w}}J(\mathbf{w})=\frac{(m_1-m_2)^2}{s_1^2+s_2^2}
wmaxJ(w)=s12+s22(m1−m2)2
注意:
J
(
w
)
=
J
(
w
1
,
w
2
,
⋯
,
w
d
)
J(\mathbf{w})=J(w_1,w_2,\cdots,w_d)
J(w)=J(w1,w2,⋯,wd)
(
m
1
−
m
2
)
2
=
(
w
T
(
μ
1
−
μ
2
)
)
2
=
w
T
(
(
μ
1
−
μ
2
)
(
μ
1
−
μ
2
)
T
)
w
=
w
T
B
w
B \mathbf{B} B 被称为类间扩散矩阵
s
1
2
=
∑
x
i
∈
D
1
(
a
i
−
m
1
)
2
=
∑
x
i
∈
D
1
(
w
T
x
i
−
w
T
μ
1
)
2
=
∑
x
i
∈
D
1
(
w
T
(
x
i
−
μ
1
)
)
2
=
w
T
(
∑
x
i
∈
D
1
(
x
i
−
μ
1
)
(
x
i
−
μ
1
)
T
)
w
=
w
T
S
1
w
S 1 \mathbf{S}_{1} S1 被称为 D 1 \mathbf{D}_1 D1 的扩散矩阵 S 1 = n 1 Σ 1 \mathbf{S}_{1}=n_1\Sigma_1 S1=n1Σ1
类似地, s 2 2 = w T S 2 w s_{2}^{2}=\mathbf{w}^{T} \mathbf{S}_{2} \mathbf{w} s22=wTS2w
令
S
=
S
1
+
S
2
\mathbf{S}=\mathbf{S}_{1}+\mathbf{S}_{2}
S=S1+S2,则
max
w
J
(
w
)
=
(
m
1
−
m
2
)
2
s
1
2
+
s
2
2
=
w
T
B
w
w
T
S
w
\max\limits_{\mathbf{w}}J(\mathbf{w})=\frac{(m_1-m_2)^2}{s_1^2+s_2^2}=\frac{\mathbf{w}^{T} \mathbf{B} \mathbf{w}}{\mathbf{w}^{T} \mathbf{S} \mathbf{w}}
wmaxJ(w)=s12+s22(m1−m2)2=wTSwwTBw
注意:
d
d
w
J
(
w
)
=
2
B
w
(
w
T
S
w
)
−
2
S
w
(
w
T
B
w
)
(
w
T
S
w
)
2
=
0
\frac{d}{d\mathbf{w}}J(\mathbf{w})=\frac{2\mathbf{B}\mathbf{w}(\mathbf{w}^T\mathbf{S}\mathbf{w})-2\mathbf{S}\mathbf{w}(\mathbf{w}^T\mathbf{B}\mathbf{w})}{(\mathbf{w}^T\mathbf{S}\mathbf{w})^2}=\mathbf{0}
dwdJ(w)=(wTSw)22Bw(wTSw)−2Sw(wTBw)=0
即有:
B
w
(
w
T
S
w
)
=
S
w
(
w
T
B
w
)
B
w
=
S
w
⋅
w
T
B
w
w
T
S
w
B
w
=
J
(
w
)
⋅
S
w
(
∗
)
若
S
−
1
\mathbf{S}^{-1}
S−1 存在,则
S
−
1
B
w
=
J
(
w
)
⋅
w
\mathbf{S}^{-1}\mathbf{B}\mathbf{w}=J(\mathbf{w})\cdot\mathbf{w}
S−1Bw=J(w)⋅w
故要求最大
J
(
w
)
J(\mathbf{w})
J(w) ,只需
S
−
1
B
\mathbf{S}^{-1}\mathbf{B}
S−1B 的最大特征值,
w
\mathbf{w}
w 为其特征向量。
☆ 不求特征向量求出 w \mathbf{w} w 的方法
将
B
=
(
μ
1
−
μ
2
)
(
μ
1
−
μ
2
)
T
\mathbf{B}=(\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{2})(\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{2})^{T}
B=(μ1−μ2)(μ1−μ2)T 代入
(
∗
)
(*)
(∗) 得
(
μ
1
−
μ
2
)
(
μ
1
−
μ
2
)
T
w
=
J
(
w
)
⋅
S
w
S
−
1
(
μ
1
−
μ
2
)
[
(
μ
1
−
μ
2
)
T
w
J
(
w
)
]
=
w
故只需计算
S
−
1
(
μ
1
−
μ
2
)
\mathbf{S}^{-1}(\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{2})
S−1(μ1−μ2),再单位化。
事实1:如果 ( S ϕ − 1 B ϕ ) w = λ w \left(\mathbf{S}_{\phi}^{-1} \mathbf{B}_{\phi}\right) \mathbf{w}=\lambda \mathbf{w} (Sϕ−1Bϕ)w=λw,那么 w = ∑ j = 1 n a j ϕ ( x j ) \mathbf{w}=\sum\limits_{j=1}^na_j\phi(\mathbf{x}_j) w=j=1∑najϕ(xj),证明见讲稿最后两页。
令 a = ( a 1 , ⋯ , a n ) T \mathbf{a}=(a_1,\cdots,a_n)^T a=(a1,⋯,an)T 是“事实1”中的向量。
下面将 max w J ( w ) = ( m 1 − m 2 ) 2 s 1 2 + s 2 2 = w T B ϕ w w T S ϕ w \max\limits_{\mathbf{w}}J(\mathbf{w})=\frac{(m_1-m_2)^2}{s_1^2+s_2^2}=\frac{\mathbf{w}^{T} \mathbf{B}_{\phi} \mathbf{w}}{\mathbf{w}^{T} \mathbf{S}_{\phi} \mathbf{w}} wmaxJ(w)=s12+s22(m1−m2)2=wTSϕwwTBϕw 的问题转化为 max G ( a ) \max G(\mathbf{a}) maxG(a) s.t. 使用 K \mathbf{K} K 能求解。
注意到:
m
i
=
w
T
μ
i
ϕ
=
(
∑
j
=
1
n
a
j
ϕ
(
x
j
)
)
T
(
1
n
i
∑
x
i
∈
D
i
ϕ
(
x
k
)
)
=
1
n
i
∑
j
=
1
n
∑
x
k
∈
D
i
a
j
ϕ
(
x
j
)
T
ϕ
(
x
k
)
=
1
n
i
∑
j
=
1
n
∑
x
k
∈
D
i
a
j
K
(
x
j
,
x
k
)
=
a
T
m
i
其中,
m
i
=
1
n
i
(
∑
x
k
∈
D
i
K
(
x
1
,
x
k
)
∑
x
k
∈
D
i
K
(
x
2
,
x
k
)
⋮
∑
x
k
∈
D
i
K
(
x
n
,
x
k
)
)
n
×
1
\mathbf{m}_{i}=\frac{1}{n_{i}}\left(
故
(
m
1
−
m
2
)
2
=
(
w
T
μ
1
ϕ
−
w
T
μ
2
ϕ
)
2
=
(
a
T
m
1
−
a
T
m
2
)
2
=
a
T
(
m
1
−
m
2
)
(
m
1
−
m
2
)
T
a
=
a
T
M
a
(
M
\mathbf{M}
M 被称为核类间扩散矩阵)
s
1
2
=
∑
x
i
∈
D
1
∥
w
T
ϕ
(
x
i
)
−
w
T
μ
1
ϕ
∥
2
=
∑
x
i
∈
D
1
∥
w
T
ϕ
(
x
i
)
∥
2
−
2
∑
x
i
∈
D
1
w
T
ϕ
(
x
i
)
⋅
w
T
μ
1
ϕ
+
∑
x
i
∈
D
1
∥
w
T
μ
1
ϕ
∥
2
=
(
∑
x
i
∈
D
1
∥
∑
j
=
1
n
a
j
ϕ
(
x
j
)
T
ϕ
(
x
i
)
∥
2
)
−
2
⋅
n
1
⋅
∥
w
T
μ
1
ϕ
∥
2
+
n
1
⋅
∥
w
T
μ
1
ϕ
∥
2
=
(
∑
x
i
∈
D
1
a
T
K
i
K
i
T
a
)
−
n
1
⋅
a
T
m
1
m
1
T
a
=
a
T
(
(
∑
x
i
∈
D
1
K
i
K
i
T
)
−
n
1
m
1
m
1
T
)
a
=
a
T
N
1
a
类似地,令
N
2
=
(
∑
x
i
∈
D
2
K
i
K
i
T
−
n
2
m
2
m
2
T
)
\mathbf{N}_2=\left(\sum\limits_{\mathbf{x}_{i} \in \mathbf{D}_{2}} \mathbf{K}_{i} \mathbf{K}_{i}^{T}-n_{2} \mathbf{m}_{2} \mathbf{m}_{2}^{T}\right)
N2=(xi∈D2∑KiKiT−n2m2m2T)
则 s 1 2 + s 2 2 = a T ( N 1 + N 2 ) a = a T N a s_1^2+s_2^2=\mathbf{a}^{T} (\mathbf{N}_{1}+\mathbf{N}_{2}) \mathbf{a}=\mathbf{a}^{T}\mathbf{N} \mathbf{a} s12+s22=aT(N1+N2)a=aTNa
故: J ( w ) = a T M a a T N a : = G ( a ) J(\mathbf{w})=\frac{\mathbf{a}^{T}\mathbf{M} \mathbf{a}}{\mathbf{a}^{T}\mathbf{N} \mathbf{a}}:=G(\mathbf{a}) J(w)=aTNaaTMa:=G(a)
类似 20.1, M a = λ N a \mathbf{M} \mathbf{a}=\lambda\mathbf{N} \mathbf{a} Ma=λNa
若 N − 1 \mathbf{N} ^{-1} N−1 存在, N − 1 M a = λ a \mathbf{N}^{-1} \mathbf{M} \mathbf{a}=\lambda \mathbf{a} N−1Ma=λa, λ \lambda λ 取 N − 1 M \mathbf{N}^{-1} \mathbf{M} N−1M 的最大特征值, a \mathbf{a} a 是相应的特征向量。
若 N − 1 \mathbf{N} ^{-1} N−1 不存在,MATLAB 求广义逆
最后考查 w T w = 1 \mathbf{w}^T\mathbf{w}=1 wTw=1,即
(
∑
j
=
1
n
a
j
ϕ
(
x
j
)
)
T
(
∑
i
=
1
n
a
i
ϕ
(
x
i
)
)
=
1
∑
j
=
1
n
∑
i
=
1
n
a
j
a
i
ϕ
(
x
j
)
T
ϕ
(
x
i
)
=
1
∑
j
=
1
n
∑
i
=
1
n
a
j
a
i
K
(
x
i
,
x
j
)
=
1
a
T
K
a
=
1
求出
N
−
1
M
\mathbf{N}^{-1} \mathbf{M}
N−1M 的特征向量
a
\mathbf{a}
a 后,
a
←
a
a
T
K
a
\mathbf{a}\leftarrow \frac{\mathbf{a}}{\sqrt{\mathbf{a}^T\mathbf{K}\mathbf{a}}}
a←aTKaa 以保证
w
T
w
=
1
\mathbf{w}^T\mathbf{w}=1
wTw=1