上一节介绍了使用EM算法求解高斯混合模型参数的E步操作,本节将继续介绍后续的M步操作。
表示样本
x
(
i
)
x^{(i)}
x(i)属于离散常量
z
j
z_j
zj对应概率分布
N
(
μ
j
,
Σ
j
)
\mathcal N(\mu_j,\Sigma_j)
N(μj,Σj)的概率结果。
表示
x
(
i
)
x^{(i)}
x(i)属于离散常量
z
j
z_j
zj对应概率分布
N
(
μ
j
,
Σ
j
)
\mathcal N(\mu_j,\Sigma_j)
N(μj,Σj)的期望信息。
表示
x
(
i
)
x^{(i)}
x(i)属于离散常量
z
j
z_j
zj对应概率分布
N
(
μ
j
,
Σ
j
)
\mathcal N(\mu_j,\Sigma_j)
N(μj,Σj)的协方差信息。
重新观察E步结果:
L
(
θ
,
θ
(
t
)
)
=
∑
i
=
1
N
∑
z
(
i
)
[
log
p
z
(
i
)
⋅
N
(
x
(
i
)
∣
μ
z
(
i
)
,
Σ
z
(
i
)
)
]
⋅
p
z
(
i
)
⋅
N
(
x
(
i
)
∣
μ
z
(
i
)
,
Σ
z
(
i
)
)
∑
k
=
1
K
p
k
⋅
N
(
x
(
i
)
∣
μ
k
,
Σ
k
)
\mathcal L(\theta,\theta^{(t)}) = \sum_{i=1}^N \sum_{z^{(i)}} \left[\log p_{z^{(i)}} \cdot\mathcal N(x^{(i)} \mid \mu_{z^{(i)}},\Sigma_{z^{(i)}})\right] \cdot \frac{p_{z^{(i)}} \cdot \mathcal N(x^{(i)} \mid \mu_{z^{(i)}},\Sigma_{z^{(i)}})}{\sum_{k=1}^{\mathcal K} p_k \cdot \mathcal N(x^{(i)} \mid \mu_k,\Sigma_k)}
L(θ,θ(t))=i=1∑Nz(i)∑[logpz(i)⋅N(x(i)∣μz(i),Σz(i))]⋅∑k=1Kpk⋅N(x(i)∣μk,Σk)pz(i)⋅N(x(i)∣μz(i),Σz(i))
其中
P
(
X
,
Z
∣
θ
)
P(\mathcal X,\mathcal Z \mid \theta)
P(X,Z∣θ)部分表示如下:
log
P
(
x
(
i
)
,
z
(
i
)
∣
θ
)
=
log
[
p
z
(
i
)
⋅
N
(
x
(
i
)
∣
μ
z
(
i
)
,
Σ
z
(
i
)
)
]
\log P(x^{(i)},z^{(i)} \mid \theta) = \log \left[p_{z^{(i)}} \cdot \mathcal N(x^{(i)} \mid \mu_{z^{(i)}},\Sigma_{z^{(i)}})\right]
logP(x(i),z(i)∣θ)=log[pz(i)⋅N(x(i)∣μz(i),Σz(i))]
θ
\theta
θ具体指(共三项):
p
z
(
i
)
,
μ
z
(
i
)
,
Σ
z
(
i
)
(
i
=
1
,
2
,
⋯
,
N
)
p_{z^{(i)}},\mu_{z^{(i)}},\Sigma_{z^{(i)}} \quad (i=1,2,\cdots,N)
pz(i),μz(i),Σz(i)(i=1,2,⋯,N)
P
(
Z
∣
X
,
θ
(
t
)
)
P(\mathcal Z \mid \mathcal X,\theta^{(t)})
P(Z∣X,θ(t))部分表示如下:
P
(
z
(
i
)
∣
z
(
i
)
,
θ
(
t
)
)
=
p
z
(
i
)
⋅
N
(
x
(
i
)
∣
μ
z
(
i
)
,
Σ
z
(
i
)
)
∑
k
=
1
K
p
k
⋅
N
(
x
(
i
)
∣
μ
k
,
Σ
k
)
P(\mathcal z^{(i)} \mid z^{(i)},\theta^{(t)}) = \frac{p_{z^{(i)}} \cdot \mathcal N(x^{(i)} \mid \mu_{z^{(i)}},\Sigma_{z^{(i)}})}{\sum_{k=1}^{\mathcal K} p_k \cdot \mathcal N(x^{(i)} \mid \mu_k,\Sigma_k)}
P(z(i)∣z(i),θ(t))=∑k=1Kpk⋅N(x(i)∣μk,Σk)pz(i)⋅N(x(i)∣μz(i),Σz(i))
θ
(
t
)
\theta^{(t)}
θ(t)具体指(共6项):
p
z
(
i
)
,
μ
z
(
i
)
,
Σ
z
(
i
)
(
i
=
1
,
2
,
⋯
,
N
)
p
k
,
μ
k
,
Σ
k
(
k
=
1
,
2
,
⋯
,
K
)
p_{z^{(i)}},\mu_{z^{(i)}},\Sigma_{z^{(i)}} \quad (i=1,2,\cdots,N) \\ p_k,\mu_k,\Sigma_k \quad (k=1,2,\cdots,\mathcal K)
pz(i),μz(i),Σz(i)(i=1,2,⋯,N)pk,μk,Σk(k=1,2,⋯,K)
由于
θ
(
t
)
\theta^{(t)}
θ(t)是上一次迭代得到的参数结果,是已知量;因此将
L
(
θ
,
θ
(
t
)
)
\mathcal L(\theta,\theta^{(t)})
L(θ,θ(t))中的
θ
(
t
)
\theta^{(t)}
θ(t)项修正过来:
例如
p
z
(
i
)
(
t
)
p_{z^{(i)}}^{(t)}
pz(i)(t)是
θ
(
t
)
\theta^{(t)}
θ(t)的一个解,区别于对应
θ
\theta
θ的解
p
z
(
i
)
p_{z^{(i)}}
pz(i)。
L
(
θ
,
θ
(
t
)
)
=
∑
z
(
i
)
∑
i
=
1
N
[
log
p
z
(
i
)
⋅
N
(
x
(
i
)
∣
μ
z
(
i
)
,
Σ
z
(
i
)
)
]
⋅
p
z
(
i
)
(
t
)
⋅
N
(
x
(
i
)
∣
μ
z
(
i
)
(
t
)
,
Σ
z
(
i
)
(
t
)
)
∑
k
=
1
K
p
k
(
t
)
⋅
N
(
x
(
i
)
∣
μ
k
(
t
)
,
Σ
k
(
t
)
)
\mathcal L(\theta,\theta^{(t)}) = \sum_{z^{(i)}} \sum_{i=1}^N\left[\log p_{z^{(i)}} \cdot\mathcal N(x^{(i)} \mid \mu_{z^{(i)}},\Sigma_{z^{(i)}})\right] \cdot \frac{p_{z^{(i)}}^{(t)} \cdot \mathcal N(x^{(i)} \mid \mu_{z^{(i)}}^{(t)},\Sigma_{z^{(i)}}^{(t)})}{\sum_{k=1}^{\mathcal K} p_k^{(t)} \cdot \mathcal N(x^{(i)} \mid \mu_k^{(t)},\Sigma_k^{(t)})}
L(θ,θ(t))=z(i)∑i=1∑N[logpz(i)⋅N(x(i)∣μz(i),Σz(i))]⋅∑k=1Kpk(t)⋅N(x(i)∣μk(t),Σk(t))pz(i)(t)⋅N(x(i)∣μz(i)(t),Σz(i)(t))
实际上
p
z
(
i
)
(
t
)
⋅
N
(
x
(
i
)
∣
μ
z
(
i
)
(
t
)
,
Σ
z
(
i
)
(
t
)
)
∑
k
=
1
K
p
k
(
t
)
⋅
N
(
x
(
i
)
∣
μ
k
(
t
)
,
Σ
k
(
t
)
)
\frac{p_{z^{(i)}}^{(t)} \cdot \mathcal N(x^{(i)} \mid \mu_{z^{(i)}}^{(t)},\Sigma_{z^{(i)}}^{(t)})}{\sum_{k=1}^{\mathcal K} p_k^{(t)} \cdot \mathcal N(x^{(i)} \mid \mu_k^{(t)},\Sigma_k^{(t)})}
∑k=1Kpk(t)⋅N(x(i)∣μk(t),Σk(t))pz(i)(t)⋅N(x(i)∣μz(i)(t),Σz(i)(t))是由
θ
(
t
)
\theta^{(t)}
θ(t)构成的量,它的结果不会对当前迭代步骤
θ
\theta
θ的最优值产生影响。因此,在这里将其缩写成
P
(
z
(
i
)
∣
x
(
i
)
,
θ
(
t
)
)
P(z^{(i)} \mid x^{(i)},\theta^{(t)})
P(z(i)∣x(i),θ(t))。
由于
z
i
z_i
zi本质上是样本
x
(
i
)
x^{(i)}
x(i)所有可能属于的高斯分布组成的向量,即:
z
(
i
)
=
(
z
1
(
i
)
,
z
2
(
i
)
,
⋯
,
z
K
(
i
)
)
T
z^{(i)} = (z_1^{(i)},z_2^{(i)},\cdots,z_{\mathcal K}^{(i)})^{T}
z(i)=(z1(i),z2(i),⋯,zK(i))T
并且对应的
p
z
(
i
)
,
μ
z
(
i
)
,
Σ
z
(
i
)
p_{z^{(i)}},\mu_{z^{(i)}},\Sigma_{z^{(i)}}
pz(i),μz(i),Σz(i)分别表示如下:
查看详情移步至
传送门
p
z
(
i
)
=
(
p
1
(
i
)
,
p
2
(
i
)
,
⋯
,
p
K
(
i
)
)
T
μ
z
(
i
)
=
(
μ
1
(
i
)
,
μ
2
(
i
)
,
⋯
,
μ
K
(
i
)
)
T
Σ
z
(
i
)
=
(
Σ
1
(
i
)
,
Σ
2
(
i
)
,
⋯
,
Σ
K
(
i
)
)
T
p_{z^{(i)}} = (p_1^{(i)},p_2^{(i)},\cdots,p_{\mathcal K}^{(i)})^{T} \\ \mu_{z^{(i)}} = (\mu_1^{(i)},\mu_2^{(i)},\cdots,\mu_{\mathcal K}^{(i)})^{T} \\ \Sigma_{z^{(i)}} = (\Sigma_1^{(i)},\Sigma_2^{(i)},\cdots,\Sigma_{\mathcal K}^{(i)})^{T}
pz(i)=(p1(i),p2(i),⋯,pK(i))Tμz(i)=(μ1(i),μ2(i),⋯,μK(i))TΣz(i)=(Σ1(i),Σ2(i),⋯,ΣK(i))T
因此,
∑
z
(
i
)
p
z
(
i
)
\sum_{z^{(i)}} p_{z^{(i)}}
∑z(i)pz(i)分别表示如下:
∑
z
(
i
)
p
z
(
i
)
=
∑
k
=
1
K
p
k
(
i
)
∑
z
(
i
)
μ
z
(
i
)
=
∑
k
=
1
K
μ
k
(
i
)
∑
z
(
i
)
Σ
z
(
i
)
=
∑
k
=
1
K
Σ
k
(
i
)
\sum_{z^{(i)}} p_{z^{(i)}} = \sum_{k=1}^{\mathcal K} p_k^{(i)} \quad \sum_{z^{(i)}} \mu_{z^{(i)}} = \sum_{k=1}^{\mathcal K} \mu_{k}^{(i)} \quad \sum_{z^{(i)}} \Sigma_{z^{(i)}} = \sum_{k=1}^{\mathcal K} \Sigma_{k}^{(i)}
z(i)∑pz(i)=k=1∑Kpk(i)z(i)∑μz(i)=k=1∑Kμk(i)z(i)∑Σz(i)=k=1∑KΣk(i)
基于上述公式,对
L
(
θ
,
θ
(
t
)
)
\mathcal L(\theta,\theta^{(t)})
L(θ,θ(t))进行变换:
L
(
θ
,
θ
(
t
)
)
=
∑
k
=
1
K
∑
i
=
1
N
log
[
p
k
(
i
)
⋅
N
(
x
(
i
)
∣
μ
k
(
i
)
,
Σ
k
(
i
)
)
]
P
(
z
(
i
)
=
z
k
∣
x
(
i
)
,
θ
(
t
)
)
=
∑
k
=
1
K
∑
i
=
1
N
[
log
p
k
(
i
)
+
log
N
(
x
(
i
)
∣
μ
k
(
i
)
,
Σ
k
(
i
)
)
]
P
(
z
(
i
)
=
z
k
∣
x
(
i
)
,
θ
(
t
)
)
L(θ,θ(t))=K∑k=1N∑i=1log[p(i)k⋅N(x(i)∣μ(i)k,Σ(i)k)]P(z(i)=zk∣x(i),θ(t))=K∑k=1N∑i=1[logp(i)k+logN(x(i)∣μ(i)k,Σ(i)k)]P(z(i)=zk∣x(i),θ(t))
我们以求解
p
k
(
i
)
p_k^{(i)}
pk(i)在当前时刻的最优解
[
p
k
(
i
)
]
(
t
+
1
)
[p_k^{(i)}]^{(t+1)}
[pk(i)](t+1)为例。上述式子中只有方括号内第一项包含
p
k
(
i
)
p_k^{(i)}
pk(i),因此则有:
[
p
k
(
i
)
]
(
t
+
1
)
=
arg
max
p
k
(
i
)
∑
k
=
1
K
∑
i
=
1
N
log
p
k
(
i
)
P
(
z
(
i
)
=
z
k
∣
x
(
i
)
,
θ
(
t
)
)
[p_k^{(i)}]^{(t+1)} = \mathop{\arg\max}\limits_{p_k^{(i)}}\sum_{k=1}^{\mathcal K} \sum_{i=1}^{N} \log p_k^{(i)} P(z^{(i)} = z_k \mid x^{(i)},\theta^{(t)})
[pk(i)](t+1)=pk(i)argmaxk=1∑Ki=1∑Nlogpk(i)P(z(i)=zk∣x(i),θ(t))
并且
p
k
(
i
)
p_k^{(i)}
pk(i)是概率结果,因此
p
k
(
i
)
p_k^{(i)}
pk(i)需要满足约束条件:
∑
k
=
1
K
p
k
(
i
)
=
1
\sum_{k=1}^{\mathcal K} p_k^{(i)} = 1
k=1∑Kpk(i)=1
至此,求解
p
z
(
i
)
p_{z^{(i)}}
pz(i)的最优解
p
z
(
i
)
(
t
+
1
)
p_{z^{(i)}}^{(t+1)}
pz(i)(t+1)即:
p
z
(
i
)
(
t
+
1
)
=
(
[
p
1
(
i
)
]
(
t
+
1
)
,
[
p
2
(
i
)
]
(
t
+
1
)
,
⋯
,
[
p
K
(
i
)
]
(
t
+
1
)
)
T
p_{z^{(i)}}^{(t+1)} = ([p_1^{(i)}]^{(t+1)},[p_2^{(i)}]^{(t+1)},\cdots,[p_{\mathcal K}^{(i)}]^{(t+1)})^{T}
pz(i)(t+1)=([p1(i)](t+1),[p2(i)](t+1),⋯,[pK(i)](t+1))T
其中任意一项
[
p
k
(
i
)
]
(
t
+
1
)
(
k
=
1
,
2
,
⋯
,
K
)
[p_k^{(i)}]^{(t+1)}(k=1,2,\cdots,\mathcal K)
[pk(i)](t+1)(k=1,2,⋯,K)使用如下优化函数进行表示:
{
arg
max
p
k
(
i
)
∑
k
=
1
K
∑
i
=
1
N
log
p
k
(
i
)
⋅
P
(
z
(
i
)
=
z
k
∣
x
(
i
)
,
θ
(
t
)
)
s
.
t
.
∑
k
=
1
K
p
k
(
i
)
=
1
{argmaxp(i)k∑Kk=1∑Ni=1logp(i)k⋅P(z(i)=zk∣x(i),θ(t))s.t.∑Kk=1p(i)k=1
使用拉格朗日乘数法进行求解:
观察第一个连加号
∑
k
=
1
K
\sum_{k=1}^{\mathcal K}
∑k=1K,只有唯一一个
k
k
k和
p
k
(
i
)
p_k^{(i)}
pk(i)相关,其余均为常数;
条件概率密度积分~约束条件~
同理,可求出其他隐变量 z ( 1 ) , z ( 2 ) , ⋯ , z ( N ) z^{(1)},z^{(2)},\cdots,z^{(N)} z(1),z(2),⋯,z(N)的离散参数的迭代概率结果。
p z ( 1 ) ( t + 1 ) , p z ( 2 ) ( t + 1 ) , ⋯ , p z ( N ) ( t + 1 ) p_{z^{(1)}}^{(t+1)},p_{z^{(2)}}^{(t+1)},\cdots,p_{z^{(N)}}^{(t+1)} pz(1)(t+1),pz(2)(t+1),⋯,pz(N)(t+1)
至此,高斯混合模型部分介绍结束。下一节将介绍隐马尔可夫模型
推导过程本身不是重点,只需要知道‘高斯混合模型’可以使用‘狭义EM’直接求解即可。高斯混合模型作为最经典的‘概率生成模型’,它的‘生成模型思想’需要留意。