Input gate:
$$i_t = \sigma(W_{ix}x_t + W_{ih}h_{t-1} + b_i)$$
Forget gate:
$$f_t = \sigma(W_{fx}x_t + W_{fh}h_{t-1} + b_f)$$
Cell state update (candidate cell state):
$$\tilde{C}_t = \tanh(W_{cx}x_t + W_{ch}h_{t-1} + b_c)$$
Cell state:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
Output gate:
$$o_t = \sigma(W_{ox}x_t + W_{oh}h_{t-1} + b_o)$$
Hidden state:
$$h_t = o_t \odot \tanh(C_t)$$
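The six forward equations above can be traced in a few lines of NumPy. This is a minimal sketch rather than a reference implementation: the parameter dictionary `p`, its key names such as `W_ix`, and the helper `init_params` are assumptions made for illustration, and the small random weights with zero biases are an arbitrary choice.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def init_params(input_dim, hidden_dim, seed=0):
    """Hypothetical helper: small random weights and zero biases for the four gates."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in "ifco":  # input, forget, candidate cell, output
        p[f"W_{g}x"] = rng.normal(0.0, 0.1, (hidden_dim, input_dim))
        p[f"W_{g}h"] = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        p[f"b_{g}"] = np.zeros(hidden_dim)
    return p


def lstm_step(x_t, h_prev, c_prev, p):
    """One forward LSTM step: maps (x_t, h_{t-1}, C_{t-1}) to (h_t, C_t)."""
    i_t = sigmoid(p["W_ix"] @ x_t + p["W_ih"] @ h_prev + p["b_i"])      # input gate
    f_t = sigmoid(p["W_fx"] @ x_t + p["W_fh"] @ h_prev + p["b_f"])      # forget gate
    c_tilde = np.tanh(p["W_cx"] @ x_t + p["W_ch"] @ h_prev + p["b_c"])  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde                                   # cell state
    o_t = sigmoid(p["W_ox"] @ x_t + p["W_oh"] @ h_prev + p["b_o"])      # output gate
    h_t = o_t * np.tanh(c_t)                                             # hidden state
    return h_t, c_t
```

Here `*` on NumPy arrays is the element-wise product written as $\odot$ in the equations, and `@` is the matrix-vector product.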
Backward LSTM:
Input gate:
$$i'_t = \sigma(W'_{ix}x_t + W'_{ih}h'_{t+1} + b'_i)$$
Forget gate:
$$f'_t = \sigma(W'_{fx}x_t + W'_{fh}h'_{t+1} + b'_f)$$
Cell state update (candidate cell state):
$$\tilde{C}'_t = \tanh(W'_{cx}x_t + W'_{ch}h'_{t+1} + b'_c)$$
Cell state:
$$C'_t = f'_t \odot C'_{t+1} + i'_t \odot \tilde{C}'_t$$
Output gate:
$$o'_t = \sigma(W'_{ox}x_t + W'_{oh}h'_{t+1} + b'_o)$$
Hidden state:
$$h'_t = o'_t \odot \tanh(C'_t)$$
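Here, $x_t$ is the vector representation of the input sequence at time step $t$; $h_t$ and $C_t$ are the hidden state and cell state of the forward LSTM at time step $t$; $h'_{t+1}$ and $C'_{t+1}$ are the hidden state and cell state of the backward LSTM at time step $t+1$, which the backward LSTM processes immediately before step $t$ because it reads the sequence from the end. $W$ and $b$ are the model parameters (the primed versions belong to the backward LSTM), $\sigma$ is the sigmoid function, and $\odot$ denotes element-wise multiplication.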
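The backward recursion differs from the forward one only in the direction of the scan: step $t$ consumes $h'_{t+1}$ and $C'_{t+1}$, so the loop runs from the last time step down to the first. The sketch below illustrates this under some assumptions that are not in the text above: the initial states $h'_{T+1}$ and $C'_{T+1}$ are taken to be zero vectors, and the names `backward_lstm`, `init_params`, and the keys of the parameter dictionary `p` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def init_params(input_dim, hidden_dim, seed=0):
    """Hypothetical helper: small random weights W' and zero biases b'."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in "ifco":  # input, forget, candidate cell, output
        p[f"W_{g}x"] = rng.normal(0.0, 0.1, (hidden_dim, input_dim))
        p[f"W_{g}h"] = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        p[f"b_{g}"] = np.zeros(hidden_dim)
    return p


def lstm_step(x_t, h_next, c_next, p):
    """One backward step: maps (x_t, h'_{t+1}, C'_{t+1}) to (h'_t, C'_t)."""
    i_t = sigmoid(p["W_ix"] @ x_t + p["W_ih"] @ h_next + p["b_i"])
    f_t = sigmoid(p["W_fx"] @ x_t + p["W_fh"] @ h_next + p["b_f"])
    c_tilde = np.tanh(p["W_cx"] @ x_t + p["W_ch"] @ h_next + p["b_c"])
    c_t = f_t * c_next + i_t * c_tilde
    o_t = sigmoid(p["W_ox"] @ x_t + p["W_oh"] @ h_next + p["b_o"])
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t


def backward_lstm(xs, p, hidden_dim):
    """Run the backward LSTM over xs of shape (T, input_dim).

    Returns an array of shape (T, hidden_dim) whose row t holds h'_t.
    """
    T = xs.shape[0]
    h = np.zeros(hidden_dim)  # assumed initial state h'_{T+1} = 0
    c = np.zeros(hidden_dim)  # assumed initial state C'_{T+1} = 0
    hs = np.zeros((T, hidden_dim))
    for t in range(T - 1, -1, -1):  # last time step first
        h, c = lstm_step(xs[t], h, c, p)
        hs[t] = h
    return hs
```

Because information flows from later steps to earlier ones, row `t` of the result depends only on `xs[t:]`: changing the first input vector alters `hs[0]` but leaves every later row unchanged.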