
Blue circles denote state variables; green circles denote observation variables.
Model parameters and notation:
State set: $Q = \{q_1, \ldots, q_N\}$
Observation set: $V = \{v_1, \ldots, v_M\}$
State sequence: $I = \{i_1, \ldots, i_T\}, \quad i_t \in Q$
Observation sequence: $O = \{o_1, \ldots, o_T\}, \quad o_t \in V$
We can then construct the state transition matrix. Its rows are indexed by the current state $q_i$ and its columns by the next state $q_j$, with entries
$$a_{ij} = P(i_{t+1} = q_j \mid i_t = q_i)$$
That is,
$$A_{N \times N} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \cdots & \cdots & \cdots & \cdots \\ a_{N1} & a_{N2} & \cdots & a_{NN} \end{bmatrix}$$
We then construct the observation probability matrix. Its rows are indexed by the state $q_j$ and its columns by the observation $v_k$, with entries
$$b_j(k) = P(o_t = v_k \mid i_t = q_j)$$
That is,
$$B_{N \times M} = \begin{bmatrix} b_1(1) & b_1(2) & \cdots & b_1(M) \\ b_2(1) & b_2(2) & \cdots & b_2(M) \\ \cdots & \cdots & \cdots & \cdots \\ b_N(1) & b_N(2) & \cdots & b_N(M) \end{bmatrix}$$
We define the initial state probability vector as
$$\pi = \begin{bmatrix} \pi_1 & \pi_2 & \cdots & \pi_N \end{bmatrix} = \begin{bmatrix} P(i_1 = q_1) & P(i_1 = q_2) & \cdots & P(i_1 = q_N) \end{bmatrix}$$
The model parameters are
$$\lambda = \left(\pi_{N \times 1}, A_{N \times N}, B_{N \times M}\right)$$
The total number of parameters is $N + N \times N + N \times M$. The number of free parameters is $(N-1) + (N \times N - N) + (N \times M - N)$, since $\pi$ and each row of $A$ and $B$ must sum to 1.
In a hidden Markov model we make two basic assumptions: the homogeneous Markov assumption (the state at time $t$ depends only on the state at time $t-1$) and the observation independence assumption (the observation at time $t$ depends only on the state at time $t$).
We also face three basic problems: evaluation (computing $P(O \mid \lambda)$), learning (estimating $\lambda$), and decoding (finding the most likely state sequence). For evaluation, direct computation gives:
$$P(O \mid \lambda) = \sum_{I} P(O \mid I, \lambda) P(I \mid \lambda) = \sum_{i_1, i_2, \ldots, i_T} \pi_{i_1} b_{i_1}(o_1) a_{i_1 i_2} b_{i_2}(o_2) \cdots a_{i_{T-1} i_T} b_{i_T}(o_T)$$
However, this direct summation has computational complexity $O(TN^T)$, since it enumerates all $N^T$ state sequences.
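To make that cost concrete, the direct summation can be sketched as follows. This is a minimal NumPy sketch under my own conventions (states and observations as integer indices, $\lambda$ as arrays); the function name is hypothetical:

```python
import itertools
import numpy as np

def brute_force_likelihood(pi, A, B, obs):
    """Compute P(O | lambda) by summing over every one of the N^T state paths."""
    N, T = len(pi), len(obs)
    total = 0.0
    for path in itertools.product(range(N), repeat=T):
        p = pi[path[0]] * B[path[0], obs[0]]              # pi_{i_1} * b_{i_1}(o_1)
        for t in range(1, T):
            # a_{i_{t-1} i_t} * b_{i_t}(o_t)
            p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
        total += p
    return total
```

The loop over `itertools.product` is exactly the exponential enumeration the complexity bound refers to; it is exact but unusable for long sequences.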
The forward algorithm reduces the complexity to $O(TN^2)$.
We introduce a new variable, the forward probability: $\alpha_t(i) = P(o_1, \cdots, o_t, i_t = q_i \mid \lambda), \quad i = 1, \ldots, N$

Input: hidden Markov model $\lambda$, observation sequence $O$
Output: observation sequence probability $P(O \mid \lambda)$
Steps:
(1) Initialization:
$$\alpha_1(i) = \pi_i b_i(o_1), \quad i = 1, 2, \ldots, N$$
(2) Recursion: for $t = 1, 2, \ldots, T-1$,
$$\alpha_{t+1}(i) = P(o_1, \ldots, o_t, o_{t+1}, i_{t+1} = q_i) = \sum_{j=1}^{N} P(o_1, \ldots, o_t, i_t = q_j) P(o_{t+1} \mid i_{t+1} = q_i) P(i_{t+1} = q_i \mid i_t = q_j) = \left[\sum_{j=1}^{N} \alpha_t(j) a_{ji}\right] b_i(o_{t+1})$$
(3) Termination:
$$P(O \mid \lambda) = P(o_1, \ldots, o_T \mid \lambda) = \sum_{i=1}^{N} P(o_1, \ldots, o_T, i_T = q_i \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$
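The three steps above can be sketched in a few lines of NumPy. This is a minimal sketch assuming $\lambda$ is given as arrays and observations are integer indices into $V$; the function name is my own:

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm: P(O | lambda) in O(T N^2) time."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                 # (1) alpha_1(i) = pi_i b_i(o_1)
    for t in range(1, T):
        # (2) alpha_{t+1}(i) = [sum_j alpha_t(j) a_{ji}] b_i(o_{t+1})
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha[-1].sum()                       # (3) sum_i alpha_T(i)
```

Each recursion step is a single matrix-vector product, which is where the $O(TN^2)$ bound comes from.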
We introduce a new variable, the backward probability:
$$\beta_t(i) = P(o_{t+1}, \cdots, o_T \mid i_t = q_i, \lambda), \quad i = 1, \ldots, N$$

The computational complexity is again $O(TN^2)$.
Input: hidden Markov model $\lambda$, observation sequence $O$
Output: observation sequence probability $P(O \mid \lambda)$
Steps:
(1) Initialization:
$$\beta_T(i) = 1, \quad i = 1, 2, \ldots, N$$
(2) Recursion: for $t = T-1, T-2, \ldots, 1$,
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij} b_j(o_{t+1}) \beta_{t+1}(j), \quad i = 1, 2, \ldots, N$$
(3) Termination:
$$P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i b_i(o_1) \beta_1(i)$$
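A minimal NumPy sketch of the backward pass, under the same array conventions as before (function name hypothetical):

```python
import numpy as np

def backward(pi, A, B, obs):
    """Backward algorithm: P(O | lambda), also O(T N^2)."""
    T, N = len(obs), len(pi)
    beta = np.ones((T, N))                       # (1) beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # (2) beta_t(i) = sum_j a_{ij} b_j(o_{t+1}) beta_{t+1}(j)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    # (3) termination: sum_i pi_i b_i(o_1) beta_1(i)
    return float(np.sum(pi * B[:, obs[0]] * beta[0]))
```

Forward and backward yield the same $P(O \mid \lambda)$, which is a handy sanity check on an implementation.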
Suppose the training data consist of $S$ observation sequences of equal length and their corresponding state sequences $\{(O_1, I_1), (O_2, I_2), \ldots, (O_S, I_S)\}$.
(1) Estimation of the transition probabilities $a_{ij}$: let $A_{ij}$ be the number of transitions from state $q_i$ to state $q_j$ in the training data; then
$$\hat{a}_{ij} = \frac{A_{ij}}{\sum_{j=1}^{N} A_{ij}}, \quad i = 1, 2, \ldots, N; \; j = 1, 2, \ldots, N$$
(2) Estimation of the observation probabilities $b_j(k)$: let $B_{jk}$ be the number of times state $q_j$ emits observation $v_k$; then
$$\hat{b}_j(k) = \frac{B_{jk}}{\sum_{k=1}^{M} B_{jk}}, \quad j = 1, 2, \ldots, N; \; k = 1, 2, \ldots, M$$
(3) Estimation of the initial state probabilities $\pi_i$: $\hat{\pi}_i$ is the frequency with which the initial state is $q_i$ among the $S$ samples.
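The three counting estimates can be sketched together. A minimal sketch assuming state and observation sequences are given as lists of integer indices; the function name is my own:

```python
import numpy as np

def supervised_fit(state_seqs, obs_seqs, N, M):
    """Maximum-likelihood estimates of (pi, A, B) from paired sequences."""
    pi = np.zeros(N)
    A = np.zeros((N, N))
    B = np.zeros((N, M))
    for states, obs in zip(state_seqs, obs_seqs):
        pi[states[0]] += 1                       # count initial states
        for t in range(len(states) - 1):
            A[states[t], states[t + 1]] += 1     # count q_i -> q_j transitions
        for s, o in zip(states, obs):
            B[s, o] += 1                         # count state/observation pairs
    # normalize each count vector into a probability distribution
    return (pi / pi.sum(),
            A / A.sum(axis=1, keepdims=True),
            B / B.sum(axis=1, keepdims=True))
```

The sketch assumes every state occurs at least once in the data; real code would smooth zero counts to avoid division by zero.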
Input: observation data $O = (o_1, o_2, \ldots, o_T)$
Output: hidden Markov model parameters
(1) Initialization: for $n = 0$, choose $a_{ij}^{(0)}, b_j(k)^{(0)}, \pi_i^{(0)}$ to obtain the model $\lambda^{(0)} = (A^{(0)}, B^{(0)}, \pi^{(0)})$.
(2) Recursion: for $n = 1, 2, \cdots$,
$$a_{ij}^{(n+1)} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)} \qquad b_j(k)^{(n+1)} = \frac{\sum_{t=1, o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)} \qquad \pi_i^{(n+1)} = \gamma_1(i)$$
where
$$\gamma_t(i) = \frac{\alpha_t(i) \beta_t(i)}{P(O \mid \lambda)} = \frac{\alpha_t(i) \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j) \beta_t(j)} \qquad \xi_t(i, j) = \frac{\alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)}$$
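One Baum-Welch iteration, combining the $\gamma$/$\xi$ definitions with the update formulas above, can be sketched for a single observation sequence. The function name and the vectorized broadcasting are my own; a production implementation would work in log space to avoid underflow:

```python
import numpy as np

def baum_welch_step(pi, A, B, obs):
    """One EM update of (pi, A, B) from a single observation sequence."""
    T, N = len(obs), len(pi)
    obs = np.asarray(obs)
    # forward and backward variables
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    evidence = alpha[-1].sum()                   # P(O | lambda)
    gamma = alpha * beta / evidence              # gamma_t(i), shape (T, N)
    # xi_t(i, j) = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O | lambda)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / evidence
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    # numerator of b_j(k): sum of gamma_t(j) over steps where o_t = v_k
    new_B = np.array([gamma[obs == k].sum(axis=0) for k in range(B.shape[1])]).T
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B
```

Iterating this step until the likelihood stops improving gives the full Baum-Welch (EM) algorithm.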
Goal: compute $\arg\max P(I \mid O, \lambda)$.
The approximation algorithm picks the individually most likely state at each time step:
$$i_t^{*} = \arg\max_{1 \leq i \leq N} \left[\gamma_t(i)\right], \quad t = 1, 2, \ldots, T$$
where $\gamma_t(i) = P(i_t = q_i \mid O, \lambda)$.
For $t = 1$:
$$i_1^{*} = \arg\max_{1 \leq j \leq N} P(i_1 = q_j \mid O, \lambda) = \arg\max_{1 \leq j \leq N} \left\{P(i_1 = q_1 \mid O, \lambda), \cdots, P(i_1 = q_N \mid O, \lambda)\right\}$$
For $t = 2$:
$$i_2^{*} = \arg\max_{1 \leq j \leq N} P(i_2 = q_j \mid O, \lambda)$$
Finally,
$$I^{*} = (i_1^{*}, \ldots, i_T^{*})$$
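The pointwise approximation can be sketched with the forward and backward variables, since $\gamma_t(i) \propto \alpha_t(i)\beta_t(i)$ and the normalizer does not change the argmax (function name my own):

```python
import numpy as np

def approximate_decode(pi, A, B, obs):
    """Pick the individually most likely state at each step: argmax_i gamma_t(i)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    # gamma_t(i) is proportional to alpha_t(i) * beta_t(i)
    return (alpha * beta).argmax(axis=1)
```

A known drawback of this approximation is that the resulting sequence may contain transitions of zero probability, i.e. it need not be a feasible path; the Viterbi algorithm below avoids this.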
Input: model $\lambda = (A, B, \pi)$, observation data $O = (o_1, o_2, \cdots, o_T)$
Output: optimal path $I^{*} = (i_1^{*}, \ldots, i_T^{*})$
(1) Initialization:
$$\delta_1(i) = \pi_i b_i(o_1), \quad i = 1, 2, \ldots, N$$
$$\Psi_1(i) = 0, \quad i = 1, 2, \ldots, N$$
(2) Recursion: for $t = 2, 3, \cdots, T$,
$$\delta_t(i) = \max_{1 \leq j \leq N} \left[\delta_{t-1}(j) a_{ji}\right] b_i(o_t), \quad i = 1, 2, \ldots, N$$
$$\Psi_t(i) = \arg\max_{1 \leq j \leq N} \left[\delta_{t-1}(j) a_{ji}\right], \quad i = 1, 2, \ldots, N$$
(3) Termination:
$$P^{*} = \max_{1 \leq i \leq N} \delta_T(i)$$
$$i_T^{*} = \arg\max_{1 \leq i \leq N} \left[\delta_T(i)\right]$$
(4) Backtracking the optimal path: for $t = T-1, T-2, \cdots, 1$,
$$i_t^{*} = \Psi_{t+1}\left(i_{t+1}^{*}\right)$$
This yields the optimal path
$$I^{*} = (i_1^{*}, i_2^{*}, \ldots, i_T^{*})$$
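The four steps can be sketched directly in NumPy (a minimal sketch under the same array conventions as earlier; real implementations usually work with log probabilities to avoid underflow on long sequences):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Viterbi algorithm: most probable state path and its probability P*."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                 # (1) delta_1(i) = pi_i b_i(o_1)
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A       # scores[j, i] = delta_{t-1}(j) a_{ji}
        psi[t] = scores.argmax(axis=0)           # (2) best predecessor of state i
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    best_prob = delta[-1].max()                  # (3) P* = max_i delta_T(i)
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):                # (4) backtrack via psi
        path.append(int(psi[t, path[-1]]))
    return best_prob, path[::-1]
```

Unlike the pointwise approximation, the returned path is always feasible, because each step extends an actual best predecessor.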
As shown in the previous section, the initialization introduced a new variable:
$$\delta_t(j) = \max_{i_1, i_2, \ldots, i_{t-1}} P\left(i_1, \ldots, i_{t-1}, i_t = j, o_t, \ldots, o_1 \mid \lambda\right), \quad j = 1, 2, \ldots, N$$
Derivation:
$$\delta_{t+1}(i) = \max_{i_1, i_2, \ldots, i_t} P\left(i_1, \ldots, i_t, i_{t+1} = i, o_{t+1}, \ldots, o_1 \mid \lambda\right) = \max_{1 \leq j \leq N} \delta_t(j) P(o_{t+1} \mid i_{t+1} = q_i) P(i_{t+1} = q_i \mid i_t = q_j) = \max_{1 \leq j \leq N} \left[\delta_t(j) a_{ji}\right] b_i(o_{t+1})$$
for $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, T-1$.
In essence, the Viterbi algorithm uses dynamic programming to solve the prediction (decoding) problem of hidden Markov models.