The previous section tried to estimate the parameters of a Gaussian mixture model by maximum likelihood, but no closed-form solution exists. This section solves for those parameters with the EM algorithm.
Let $\mathcal X$ denote the observed data, consisting of $N$ samples that are assumed to be independent and identically distributed:

$$\mathcal X = \{x^{(1)},x^{(2)},\cdots, x^{(N)}\} \qquad x^{(i)} \overset{\text{i.i.d.}}{\sim} x^{(j)} \quad (x^{(i)},x^{(j)} \in \mathcal X;\; i\neq j)$$
Each sample $x^{(i)}$ corresponds to a latent variable $z^{(i)}$. Viewed sample by sample, the latent-variable collection $\mathcal Z$ is:

$$\mathcal Z = \{z^{(1)},z^{(2)},\cdots,z^{(N)}\}$$
The pair $(\mathcal X,\mathcal Z)$ is called the complete data; sample by sample:

$$(\mathcal X,\mathcal Z) = \{(x^{(1)},z^{(1)}),\cdots,(x^{(N)},z^{(N)})\}$$
Viewed from the distribution side, the latent variable $\mathcal Z$ follows a discrete distribution over $\mathcal K$ values, with the following probabilities:

| $\mathcal Z$ | $z_1$ | $z_2$ | $\cdots$ | $z_{\mathcal K}$ |
|---|---|---|---|---|
| $P(\mathcal Z)$ | $p_1$ | $p_2$ | $\cdots$ | $p_{\mathcal K}$ |
which satisfy:

$$\sum_{k=1}^{\mathcal K} p_k = 1$$
Each $z_j \in \mathcal Z$ corresponds to exactly one Gaussian distribution. In other words, conditioned on the latent label $z_j \in \mathcal Z$, the samples $x$ under that label follow a Gaussian distribution. There are therefore $\mathcal K$ Gaussian components in total:

| $\mathcal Z$ | $z_1$ | $z_2$ | $\cdots$ | $z_{\mathcal K}$ |
|---|---|---|---|---|
| $P(\mathcal X \mid \mathcal Z)$ | $\mathcal N(\mu_1,\Sigma_1)$ | $\mathcal N(\mu_2,\Sigma_2)$ | $\cdots$ | $\mathcal N(\mu_{\mathcal K},\Sigma_{\mathcal K})$ |
In symbols:

$$P(\mathcal X\mid \mathcal Z = z_k) = \mathcal N(\mathcal X \mid \mu_{k},\Sigma_k) \quad (k=1,2,\cdots,\mathcal K)$$
Hence the probability model $P(\mathcal X)$ of the Gaussian mixture model is:

$$P(\mathcal X)=\sum_{k=1}^{\mathcal K} P(\mathcal X\mid \mathcal Z=z_k)\,P(\mathcal Z=z_k)=\sum_{k=1}^{\mathcal K} \mathcal N(\mathcal X\mid \mu_k,\Sigma_k)\cdot p_k$$
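As a quick numerical companion, the mixture density above can be sketched in a few lines. This is a one-dimensional illustration only (so $\Sigma_k$ becomes a scalar variance), and `weights`, `mus`, `variances` are illustrative names standing in for $p_k,\mu_k,\Sigma_k$:

```python
import numpy as np

def normal_pdf(x, mu, var):
    # Density of a 1-D Gaussian N(x | mu, var).
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def gmm_pdf(x, weights, mus, variances):
    # P(x) = sum_k p_k * N(x | mu_k, var_k), with sum_k p_k = 1.
    return sum(p * normal_pdf(x, m, v)
               for p, m, v in zip(weights, mus, variances))
```

Because the weights sum to 1 and each component integrates to 1, the mixture integrates to 1 as well.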
The model parameters $\theta$ of this probability model are:

$$\theta = \{p_1,\cdots,p_{\mathcal K},\mu_1,\cdots,\mu_{\mathcal K},\Sigma_1,\cdots,\Sigma_{\mathcal K}\}$$
The EM algorithm is one method for estimating the parameters of a probability model $P(\mathcal X \mid \theta)$; it is grounded in maximum likelihood estimation, and its iterative update is:

$$\theta^{(t+1)}=\mathop{\arg\max}_{\theta} \int_{\mathcal Z} \log P(\mathcal X,\mathcal Z\mid\theta)\cdot P(\mathcal Z\mid\mathcal X,\theta^{(t)})\,d\mathcal Z=\mathop{\arg\max}_{\theta}\,\mathbb E_{\mathcal Z\mid\mathcal X,\theta^{(t)}}\left[\log P(\mathcal X,\mathcal Z\mid\theta)\right]$$
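As a concrete illustration of this update rule, here is a minimal one-dimensional sketch of the full iteration. It is not the derivation itself: the closed-form M-step updates it uses are only derived in the next section, and all names (`em_gmm_1d`, `gamma`, etc.) are illustrative:

```python
import numpy as np

def em_gmm_1d(x, K, iters=50):
    # Fit a 1-D Gaussian mixture by EM: alternate the E-step
    # (posterior responsibilities under theta^(t)) with the M-step
    # (maximizing the expected complete-data log-likelihood).
    p = np.full(K, 1.0 / K)                        # mixing weights p_k
    mu = np.quantile(x, np.linspace(0.0, 1.0, K))  # spread the initial means
    var = np.full(K, np.var(x))                    # component variances
    for _ in range(iters):
        # E-step: gamma[i, k] = P(z^(i) = k | x^(i), theta^(t))
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        gamma = p * dens
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: re-estimate theta from the responsibilities
        # (closed-form updates, derived in the next section)
        Nk = gamma.sum(axis=0)
        p = Nk / len(x)
        mu = (gamma * x[:, None]).sum(axis=0) / Nk
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    return p, mu, var
```

On data drawn from two well-separated Gaussians, the recovered means land near the true ones.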
Based on this formula, the EM algorithm splits into two steps: the E-step and the M-step alternate, and the iteration converges to an optimum (at least a local one).

The correspondence between the notation of the EM algorithm and that of the Gaussian mixture model is as follows (left of each equals sign: EM-algorithm notation; right: Gaussian-mixture notation):
$$P(\mathcal X,\mathcal Z\mid\theta)=P(\mathcal Z=z_j)\cdot P(\mathcal X\mid\mathcal Z=z_j)=p_{\mathcal Z}\cdot\mathcal N(\mathcal X\mid\mu_{\mathcal Z},\Sigma_{\mathcal Z})=\prod_{i=1}^N p_{z^{(i)}}\cdot\mathcal N(x^{(i)}\mid\mu_{z^{(i)}},\Sigma_{z^{(i)}})$$

$$P(\mathcal Z\mid\mathcal X,\theta)=\frac{P(\mathcal X,\mathcal Z)}{P(\mathcal X)}=\frac{\prod_{i=1}^N p_{z^{(i)}}\cdot\mathcal N(x^{(i)}\mid\mu_{z^{(i)}},\Sigma_{z^{(i)}})}{\prod_{i=1}^N\sum_{k=1}^{\mathcal K} p_k\cdot\mathcal N(x^{(i)}\mid\mu_k,\Sigma_k)}$$
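Because the posterior factorizes over samples, the quantity computed in practice is the per-sample table $P(z^{(i)}=k\mid x^{(i)},\theta)$. A minimal sketch (one-dimensional, with illustrative names; `posterior_z` is not from the text):

```python
import numpy as np

def posterior_z(x, weights, mus, variances):
    # Row i, column k holds P(z^(i) = k | x^(i), theta):
    # p_k * N(x_i | mu_k, var_k), normalized over k. The full posterior
    # P(Z | X, theta) is the product of one entry picked from each row.
    x = np.asarray(x, dtype=float)[:, None]
    mus = np.asarray(mus, dtype=float)
    variances = np.asarray(variances, dtype=float)
    dens = np.exp(-(x - mus) ** 2 / (2.0 * variances)) \
           / np.sqrt(2.0 * np.pi * variances)
    joint = np.asarray(weights) * dens          # p_k * N(x_i | mu_k, var_k)
    return joint / joint.sum(axis=1, keepdims=True)
```

Each row sums to 1, which is exactly the property the derivation below relies on.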
The function $\mathcal L(\theta,\theta^{(t)})$ is defined as:

$$\mathcal L(\theta,\theta^{(t)}) = \int_{\mathcal Z} \log P(\mathcal X,\mathcal Z \mid \theta) \cdot P(\mathcal Z \mid \mathcal X,\theta^{(t)})\, d\mathcal Z$$
Substitute $P(\mathcal X,\mathcal Z \mid \theta)$ and $P(\mathcal Z \mid \mathcal X,\theta)$ into this expression. Since the latent variable $\mathcal Z$ of the Gaussian mixture model is discrete, the integral $\int$ becomes a sum $\sum$, and the samples are i.i.d.:
$$\sum_{\mathcal Z} \log \prod_{i=1}^N P(x^{(i)},z^{(i)} \mid \theta) \cdot \prod_{i=1}^N P(z^{(i)} \mid x^{(i)},\theta^{(t)})$$
Rewrite $\log \prod_{i=1}^N P(x^{(i)},z^{(i)} \mid \theta)$ as a sum of logarithms and expand $\sum_{\mathcal Z}$:
$$\sum_{z^{(1)},z^{(2)},\cdots,z^{(N)}} \sum_{i=1}^N \log P(x^{(i)},z^{(i)} \mid \theta) \cdot \prod_{i=1}^N P(z^{(i)} \mid x^{(i)},\theta^{(t)})$$
The notation $\sum_{z^{(1)},z^{(2)},\cdots,z^{(N)}}$ deserves a comment. So far we have not discussed what $z^{(i)}\;(i=1,2,\cdots,N)$ actually is; we only know that each sample $x^{(i)}$ has a latent variable $z^{(i)}$. $z^{(i)}$ is not a single number but a vector: it lists the Gaussian components that sample $x^{(i)}$ may belong to. For example:
Assuming as before that the sample space contains $\mathcal K$ Gaussian components, the latent variable $z^{(1)}$ of sample $x^{(1)}$ is:

$$z^{(1)} = (z_1^{(1)},z_2^{(1)},\cdots,z_{\mathcal K}^{(1)})^{T}$$
Here $z_{k}^{(1)}\;(k=1,2,\cdots,\mathcal K)$ denotes the possibility that sample $x^{(1)}$ belongs to the Gaussian component numbered $k$. Note that $z_k^{(1)}$ only labels the component (a discrete value); it does not itself denote a probability. Its probabilities are given by the following table:

| $z^{(1)}$ | $z_1^{(1)}$ | $z_2^{(1)}$ | $\cdots$ | $z_{\mathcal K}^{(1)}$ |
|---|---|---|---|---|
| $P(z^{(1)})$ | $p_1^{(1)}$ | $p_2^{(1)}$ | $\cdots$ | $p_{\mathcal K}^{(1)}$ |
$p_j^{(i)}$ is the probability that sample $x^{(i)}$ belongs to the Gaussian $\mathcal N(\mu_j,\Sigma_j)$ associated with label $z_j$, and $P(z^{(i)})$ denotes the vector of all $\mathcal K$ such probabilities. In symbols:

$$p_j^{(i)} = P(x^{(i)} \to z_j) = P(x^{(i)} \in \mathcal N(\mu_j,\Sigma_j)) \qquad P(z^{(i)}) = (p_1^{(i)},p_2^{(i)},\cdots,p_{\mathcal K}^{(i)})^{T}$$
The same indexing applies not only to the probabilities but also to the means and covariances: $\mu_{z^{(i)}}$ and $\Sigma_{z^{(i)}}$ denote the mean and covariance of the component selected by $z^{(i)}$.

Because the latent variables in $\sum_{i=1}^N \log P(x^{(i)},z^{(i)}\mid \theta)$ appear as $z^{(i)}\;(i=1,2,\cdots,N)$ rather than $z_j\;(j=1,2,\cdots,\mathcal K)$, the expansion of $\sum_{\mathcal Z}$ is $\sum_{z^{(1)},z^{(2)},\cdots,z^{(N)}}$, not $\sum_{z_1,z_2,\cdots,z_{\mathcal K}}$.
Continuing, expand $\sum_{i=1}^N \log P(x^{(i)},z^{(i)}\mid \theta)$:
$$\begin{aligned}\mathcal L(\theta,\theta^{(t)}) &= \sum_{z^{(1)},\cdots,z^{(N)}}\left[\log P(x^{(1)},z^{(1)}\mid\theta)+\log P(x^{(2)},z^{(2)}\mid\theta)+\cdots+\log P(x^{(N)},z^{(N)}\mid\theta)\right]\cdot\prod_{i=1}^N P(z^{(i)}\mid x^{(i)},\theta^{(t)})\\&=\left[\sum_{z^{(1)},\cdots,z^{(N)}}\log P(x^{(1)},z^{(1)}\mid\theta)\prod_{i=1}^N P(z^{(i)}\mid x^{(i)},\theta^{(t)})\right]+\cdots+\left[\sum_{z^{(1)},\cdots,z^{(N)}}\log P(x^{(N)},z^{(N)}\mid\theta)\prod_{i=1}^N P(z^{(i)}\mid x^{(i)},\theta^{(t)})\right]\end{aligned}$$
Based on this result, consider the first term alone:
$$\sum_{z^{(1)},z^{(2)},\cdots,z^{(N)}} \log P(x^{(1)},z^{(1)} \mid \theta) \prod_{i=1}^N P(z^{(i)} \mid x^{(i)},\theta^{(t)})$$
In the product $\prod_{i=1}^N P(z^{(i)} \mid x^{(i)},\theta^{(t)})$, only the first factor $P(z^{(1)} \mid x^{(1)},\theta^{(t)})$ depends on $z^{(1)}$; the expression can therefore be rewritten as:
$$\begin{aligned}&\sum_{z^{(1)},\cdots,z^{(N)}}\log P(x^{(1)},z^{(1)}\mid\theta)\cdot P(z^{(1)}\mid x^{(1)},\theta^{(t)})\prod_{i=2}^N P(z^{(i)}\mid x^{(i)},\theta^{(t)})\\&=\sum_{z^{(1)}}\left[\log P(x^{(1)},z^{(1)}\mid\theta)\cdot P(z^{(1)}\mid x^{(1)},\theta^{(t)})\right]\cdot\sum_{z^{(2)},\cdots,z^{(N)}}\left[\prod_{i=2}^N P(z^{(i)}\mid x^{(i)},\theta^{(t)})\right]\end{aligned}$$
The factor $\sum_{z^{(2)},\cdots,z^{(N)}} \left[\prod_{i=2}^N P(z^{(i)} \mid x^{(i)},\theta^{(t)})\right]$ expands as:
$$\begin{aligned}\sum_{z^{(2)},\cdots,z^{(N)}}\left[\prod_{i=2}^N P(z^{(i)}\mid x^{(i)},\theta^{(t)})\right]&=\sum_{z^{(2)},\cdots,z^{(N)}}\left[P(z^{(2)}\mid x^{(2)},\theta^{(t)})\times\cdots\times P(z^{(N)}\mid x^{(N)},\theta^{(t)})\right]\\&=\sum_{z^{(2)}}P(z^{(2)}\mid x^{(2)},\theta^{(t)})\times\cdots\times\sum_{z^{(N)}}P(z^{(N)}\mid x^{(N)},\theta^{(t)})\end{aligned}$$
Each factor $\sum_{z^{(j)}} P(z^{(j)} \mid x^{(j)},\theta^{(t)})\;(j=2,\cdots,N)$ sums a discrete probability distribution over all of its values, so:
$$\sum_{z^{(2)}} P(z^{(2)} \mid x^{(2)},\theta^{(t)}) = \cdots =\sum_{z^{(N)}} P(z^{(N)} \mid x^{(N)},\theta^{(t)}) = 1$$

$$\sum_{z^{(2)},\cdots,z^{(N)}} \left[\prod_{i=2}^N P(z^{(i)} \mid x^{(i)},\theta^{(t)})\right] = 1 \times \cdots \times 1 = 1$$
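This cancellation can be checked numerically: summing the product $\prod_i P(z^{(i)}\mid x^{(i)},\theta^{(t)})$ over all joint assignments of $(z^{(1)},\cdots,z^{(N)})$ gives 1. The posterior table below is made up purely for illustration:

```python
import itertools
import numpy as np

# Toy posterior table: post[i, k] = P(z^(i) = k | x^(i), theta^(t)).
# Each row sums to 1; the numbers themselves are arbitrary.
post = np.array([[0.2, 0.5, 0.3],
                 [0.7, 0.1, 0.2],
                 [0.4, 0.4, 0.2]])
N, K = post.shape

# Sum over all K^N joint assignments (z^(1), ..., z^(N)) of the
# product prod_i P(z^(i) | x^(i), theta^(t)).
total = sum(np.prod([post[i, z[i]] for i in range(N)])
            for z in itertools.product(range(K), repeat=N))
print(total)  # approximately 1.0
```

The brute-force sum of $K^N$ products equals the product of the $N$ row sums, each of which is 1.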
The first term therefore reduces to:
$$\begin{aligned}\sum_{z^{(1)},\cdots,z^{(N)}}\log P(x^{(1)},z^{(1)}\mid\theta)\prod_{i=1}^N P(z^{(i)}\mid x^{(i)},\theta^{(t)})&=\sum_{z^{(1)}}\left[\log P(x^{(1)},z^{(1)}\mid\theta)\cdot P(z^{(1)}\mid x^{(1)},\theta^{(t)})\right]\cdot 1\\&=\sum_{z^{(1)}}\left[\log P(x^{(1)},z^{(1)}\mid\theta)\cdot P(z^{(1)}\mid x^{(1)},\theta^{(t)})\right]\end{aligned}$$
Applying the same reduction to every term, $\mathcal L(\theta,\theta^{(t)})$ becomes:
$$\begin{aligned}\mathcal L(\theta,\theta^{(t)})&=\sum_{z^{(1)}}\left[\log P(x^{(1)},z^{(1)}\mid\theta)\cdot P(z^{(1)}\mid x^{(1)},\theta^{(t)})\right]+\cdots+\sum_{z^{(N)}}\left[\log P(x^{(N)},z^{(N)}\mid\theta)\cdot P(z^{(N)}\mid x^{(N)},\theta^{(t)})\right]\\&=\sum_{i=1}^N\sum_{z^{(i)}}\left[\log P(x^{(i)},z^{(i)}\mid\theta)\cdot P(z^{(i)}\mid x^{(i)},\theta^{(t)})\right]\end{aligned}$$
Substituting the Gaussian-mixture expressions from the setup above (see the earlier note for the meaning of $P(z^{(i)}),\mu_{z^{(i)}},\Sigma_{z^{(i)}}$) gives:
$$\mathcal L(\theta,\theta^{(t)}) = \sum_{i=1}^N \sum_{z^{(i)}} \log\left[ P(z^{(i)}) \cdot \mathcal N(x^{(i)} \mid \mu_{z^{(i)}},\Sigma_{z^{(i)}})\right] \cdot \frac{P(z^{(i)}) \cdot \mathcal N(x^{(i)} \mid \mu_{z^{(i)}},\Sigma_{z^{(i)}})}{\sum_{k=1}^{\mathcal K} p_k \cdot \mathcal N(x^{(i)} \mid \mu_k,\Sigma_k)}$$
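The final expression can also be evaluated directly in code. A one-dimensional sketch under illustrative names (`L_theta` is not from the text; the `*_t` arguments play the role of $\theta^{(t)}$):

```python
import numpy as np

def normal_pdf(x, mu, var):
    # 1-D Gaussian density N(x | mu, var).
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def L_theta(x, p, mu, var, p_t, mu_t, var_t):
    # L(theta, theta^(t)) = sum_i sum_k gamma_ik * log(p_k * N(x_i | mu_k, var_k)),
    # where gamma_ik = P(z^(i) = k | x^(i), theta^(t)) is computed
    # under the previous-iteration parameters theta^(t).
    x = np.asarray(x, dtype=float)[:, None]
    dens_t = normal_pdf(x, mu_t, var_t) * p_t
    gamma = dens_t / dens_t.sum(axis=1, keepdims=True)   # E-step posteriors
    return float((gamma * np.log(p * normal_pdf(x, mu, var))).sum())
```

For a single component ($\mathcal K=1$, $p_1=1$) the responsibilities are all 1 and this reduces to the ordinary log-likelihood, which is a handy sanity check.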
This completes the E-step of solving the Gaussian mixture model with the EM algorithm; the next section covers the M-step.

P.S. The notation in this part is genuinely dense; it repays careful thought.