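The following is taken from the Zhihu article 《机器学习中的数学理论1:三步搞定矩阵求导》 ("Mathematical Theory in Machine Learning 1: Matrix Derivatives in Three Steps").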
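In machine learning and control theory, we keep running into problems that can only be solved by differentiating with respect to a vector or a matrix (for example, gradient descent). These derivatives are central to analyzing, deriving and applying the underlying theory of machine learning methods.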
Notation: $x$: scalar; $\mathbf x$: vector; $X$: matrix.
Rules for matrix differentials ($\odot$ denotes the elementwise, or Hadamard, product):

- $d(X\pm Y)=dX\pm dY$
- $d(XY)=X\,dY+(dX)\,Y$
- $d(X^T)=(dX)^T$
- $d\,\mathrm{tr}(X)=\mathrm{tr}(dX)$
- $d(X\odot Y)=dX\odot Y+X\odot dY$
- $d(X^{-1})=-X^{-1}\,dX\,X^{-1}$
- $d|X|=|X|\,\mathrm{tr}(X^{-1}dX)$ (for invertible $X$)
- $d\sigma(X)=\sigma'(X)\odot dX$ (for a function $\sigma$ applied elementwise)
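
A quick numerical sanity check makes the first-order rules concrete. The sketch below assumes NumPy. The random matrix, the diagonal shift and the perturbation size are arbitrary illustrative choices. It perturbs an invertible $X$ by a small $dX$, then compares the actual changes in $X^{-1}$ and $|X|$ with the predictions of the rules for $d(X^{-1})$ and $d|X|$:

```python
import numpy as np

# Illustrative setup (assumed): a random 4x4 matrix shifted to be safely invertible.
rng = np.random.default_rng(0)
n = 4
X = rng.standard_normal((n, n)) + n * np.eye(n)
dX = 1e-6 * rng.standard_normal((n, n))  # small perturbation playing the role of dX
Xinv = np.linalg.inv(X)

# Rule: d(X^{-1}) = -X^{-1} dX X^{-1}
actual_dinv = np.linalg.inv(X + dX) - Xinv
predicted_dinv = -Xinv @ dX @ Xinv

# Rule: d|X| = |X| tr(X^{-1} dX)
actual_ddet = np.linalg.det(X + dX) - np.linalg.det(X)
predicted_ddet = np.linalg.det(X) * np.trace(Xinv @ dX)

# First-order predictions should match the true changes up to O(||dX||^2).
err_inv = np.max(np.abs(actual_dinv - predicted_dinv))
err_det = abs(actual_ddet - predicted_ddet)
print(err_inv, err_det)
```

The residual errors are of second order in $dX$, far smaller than the changes themselves.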

Properties of the Kronecker product $\otimes$:

- $A\otimes B\neq B\otimes A$ in general
- $(A_1+A_2)\otimes B=A_1\otimes B+A_2\otimes B$
- $(A\otimes B)\otimes C=A\otimes(B\otimes C)$
- If $A_1, A_2$ can be multiplied and $B_1, B_2$ can be multiplied (mixed-product property): $(A_1\otimes B_1)(A_2\otimes B_2)=(A_1A_2)\otimes(B_1B_2)$
- If $A$ and $B$ are invertible: $(A\otimes B)^{-1}=A^{-1}\otimes B^{-1}$
- If they are not invertible, the Moore-Penrose pseudoinverse still satisfies: $(A\otimes B)^{+}=A^{+}\otimes B^{+}$
- $(A\otimes B)^H=A^H\otimes B^H$
- $\det(A\otimes B)=(\det A)^n(\det B)^m$, for $A\in\mathbb C^{m\times m}$, $B\in\mathbb C^{n\times n}$
- $\mathrm{tr}(A\otimes B)=\mathrm{tr}(A)\,\mathrm{tr}(B)$
- $\mathrm{rank}(A\otimes B)=\mathrm{rank}(A)\,\mathrm{rank}(B)$
- $e^{I\otimes A}=I\otimes e^A$, $e^{A\otimes I}=e^A\otimes I$
- $e^{A\otimes I_n+I_m\otimes B}=e^A\otimes e^B$, for $A\in\mathbb C^{m\times m}$, $B\in\mathbb C^{n\times n}$
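
The Kronecker identities can be spot-checked with `numpy.kron`. This sketch rests on stated assumptions. Random real matrices stand in for general complex ones. `expm_taylor` is a hypothetical helper that avoids a SciPy dependency: it computes a truncated Taylor series, which is adequate only for matrices of small norm.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 2
A1, A2 = rng.standard_normal((m, m)), rng.standard_normal((m, m))
B1, B2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))

# Mixed-product property: (A1 ⊗ B1)(A2 ⊗ B2) = (A1 A2) ⊗ (B1 B2)
mixed_ok = np.allclose(np.kron(A1, B1) @ np.kron(A2, B2), np.kron(A1 @ A2, B1 @ B2))

# Inverse: (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}; diagonal shifts keep A and B invertible
A = A1 + m * np.eye(m)
B = B1 + n * np.eye(n)
inv_ok = np.allclose(np.linalg.inv(np.kron(A, B)),
                     np.kron(np.linalg.inv(A), np.linalg.inv(B)))

# det(A ⊗ B) = det(A)^n det(B)^m for A: m x m, B: n x n
det_ok = np.isclose(np.linalg.det(np.kron(A, B)),
                    np.linalg.det(A) ** n * np.linalg.det(B) ** m)

# tr(A ⊗ B) = tr(A) tr(B)
tr_ok = np.isclose(np.trace(np.kron(A, B)), np.trace(A) * np.trace(B))

# rank(A ⊗ B) = rank(A) rank(B), using a rank-1 factor to make it non-trivial
R = np.outer(rng.standard_normal(m), rng.standard_normal(m))  # rank 1
S = np.eye(n)                                                  # rank n
rank_ok = (np.linalg.matrix_rank(np.kron(R, S))
           == np.linalg.matrix_rank(R) * np.linalg.matrix_rank(S))

def expm_taylor(M, terms=30):
    """Hypothetical helper: e^M via a truncated Taylor series (small ||M|| only)."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

# e^{A ⊗ I_n + I_m ⊗ B} = e^A ⊗ e^B, with matrices scaled down so the series converges fast
As, Bs = 0.1 * A1, 0.1 * B1
lhs = expm_taylor(np.kron(As, np.eye(n)) + np.kron(np.eye(m), Bs))
rhs = np.kron(expm_taylor(As), expm_taylor(Bs))
exp_ok = np.allclose(lhs, rhs)

print(mixed_ok, inv_ok, det_ok, tr_ok, rank_ok, exp_ok)
```

Note the pairing in the mixed-product check. The first factors of the two Kronecker products multiply together, and so do the second factors.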
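
$\mathbf{Input}$: $X$ and a scalar function $f$ of $X$
$\mathbf{Output}$: $\frac{\partial f}{\partial X}$
$\mathbf{Algorithm}$:
- Use the rules above to find the differential $df$ from $f$.
- Wrap both sides of $df$ in a trace: since $df$ is a scalar, $\mathrm{tr}(df)=df$. Then use $\mathrm{tr}(A^T)=\mathrm{tr}(A)$, $\mathrm{tr}(AB)=\mathrm{tr}(BA)$ and $\mathrm{tr}(A^T(B\odot C))=\mathrm{tr}((A\odot B)^T C)$ (for $A, B, C$ of the same shape) to move $dX$ to the far right.
- Match the result against $df=\mathrm{tr}\left(\left(\frac{\partial f}{\partial X}\right)^T dX\right)$ to read off $\frac{\partial f}{\partial X}$.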
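
The derivation below runs the three steps on a smaller case. It uses $f=a^T X b$ with constant vectors $a$ and $b$, an illustrative choice that does not come from the original article:

```latex
\begin{aligned}
df &= a^T\,dX\,b
  && \text{step 1: } a,\ b \text{ are constant}\\
df &= \mathrm{tr}(df) = \mathrm{tr}(a^T\,dX\,b) = \mathrm{tr}(b\,a^T\,dX)
  && \text{step 2: } \mathrm{tr}(AB)=\mathrm{tr}(BA)\\
   &= \mathrm{tr}\big((a\,b^T)^T dX\big)
  \;\Rightarrow\; \frac{\partial f}{\partial X}=a\,b^T
  && \text{step 3}
\end{aligned}
```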

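Solution to example 2, where $f=a^T\exp(Xb)$ with $\exp$ applied elementwise ($a$: $m\times 1$, $X$: $m\times n$, $b$: $n\times 1$). First take the differential of both sides of $f$, letting $u=Xb$ (so $du=dX\,b$, since $b$ is constant):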
1. $df=a^T d(\exp(u))=a^T(\exp(u)\odot du)=a^T(\exp(Xb)\odot(dX\,b))$
2. $df=\mathrm{tr}(df)=\mathrm{tr}\big(a^T(\exp(Xb)\odot(dX\,b))\big)=\mathrm{tr}\big((a\odot\exp(Xb))^T dX\,b\big)=\mathrm{tr}\big(b\,(a\odot\exp(Xb))^T dX\big)=\mathrm{tr}\big(((a\odot\exp(Xb))\,b^T)^T dX\big)$
3. From $df=\mathrm{tr}\left(\left(\frac{\partial f}{\partial X}\right)^T dX\right)$, we get $\frac{\partial f}{\partial X}=(a\odot\exp(Xb))\,b^T$.
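
To double-check the result, we can compare the closed form $(a\odot\exp(Xb))\,b^T$ with a central finite-difference gradient. Here is a minimal NumPy sketch. The sizes $m=3$, $n=4$, the random data and the step `h` are arbitrary assumptions:

```python
import numpy as np

def f(X, a, b):
    """f(X) = a^T exp(X b), with exp applied elementwise."""
    return a @ np.exp(X @ b)

def grad_closed_form(X, a, b):
    """Gradient derived above: (a ⊙ exp(Xb)) b^T, an m x n matrix."""
    return np.outer(a * np.exp(X @ b), b)

def grad_finite_diff(X, a, b, h=1e-6):
    """Central differences, perturbing one entry of X at a time."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (f(X + E, a, b) - f(X - E, a, b)) / (2 * h)
    return G

rng = np.random.default_rng(2)
m, n = 3, 4
X = rng.standard_normal((m, n))
a = rng.standard_normal(m)
b = rng.standard_normal(n)

G_exact = grad_closed_form(X, a, b)
G_num = grad_finite_diff(X, a, b)
print(np.max(np.abs(G_exact - G_num)))
```

The same finite-difference check works for any result obtained with the three-step method; only `f` and the closed form change.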

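Suppose column vectors depend on one another in a chain, for example $\mathbf x\rightarrow\mathbf y\rightarrow\mathbf z$. Then, with each $\frac{\partial\,\cdot}{\partial\,\cdot}$ being a Jacobian that has one row per entry of the numerator and one column per entry of the denominator:
$$\frac{\partial \mathbf z}{\partial \mathbf x}=\frac{\partial \mathbf z}{\partial \mathbf y}\frac{\partial \mathbf y}{\partial \mathbf x}$$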
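
Here is a small numerical illustration of the Jacobian chain rule. It assumes NumPy and a hypothetical chain $\mathbf y=A\mathbf x$, $\mathbf z=\tanh(\mathbf y)$ chosen only for the example. The product of the two Jacobians is compared against a finite-difference Jacobian of the composed map:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 4                          # x has m entries; y and z have n entries
A = rng.standard_normal((n, m))      # hypothetical linear step

def y_of_x(x):
    return A @ x                     # y = A x, so dy/dx = A (n x m)

def z_of_y(y):
    return np.tanh(y)                # elementwise, so dz/dy = diag(1 - tanh(y)^2)

def jacobian_fd(func, x, h=1e-6):
    """Central-difference Jacobian; column j holds d func / d x_j."""
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        cols.append((func(x + e) - func(x - e)) / (2 * h))
    return np.stack(cols, axis=1)

x = rng.standard_normal(m)
y = y_of_x(x)
J_zy = np.diag(1 - np.tanh(y) ** 2)  # dz/dy, n x n
J_yx = A                              # dy/dx, n x m
J_chain = J_zy @ J_yx                 # dz/dx = dz/dy · dy/dx, n x m
J_direct = jacobian_fd(lambda v: z_of_y(y_of_x(v)), x)
print(np.max(np.abs(J_chain - J_direct)))
```
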
Now suppose the chain is $\mathbf x\rightarrow\mathbf y\rightarrow z$, with $\mathbf x\in\mathbb R^m$ and $\mathbf y\in\mathbb R^n$, and the quantity being differentiated is a scalar $z$. Then $\frac{\partial z}{\partial \mathbf y}$ is $n\times 1$, $\frac{\partial z}{\partial \mathbf x}$ is $m\times 1$ and $\frac{\partial \mathbf y}{\partial \mathbf x}$ is $n\times m$, so:
$$\frac{\partial z}{\partial \mathbf x}=\left(\frac{\partial \mathbf y}{\partial \mathbf x}\right)^T\frac{\partial z}{\partial \mathbf y}$$
For a longer chain:
$$\mathbf y_1\rightarrow\mathbf y_2\rightarrow\cdots\rightarrow\mathbf y_n\rightarrow z$$
the chain rule is:
$$\frac{\partial z}{\partial \mathbf y_1}=\left(\frac{\partial \mathbf y_n}{\partial \mathbf y_{n-1}}\frac{\partial \mathbf y_{n-1}}{\partial \mathbf y_{n-2}}\cdots\frac{\partial \mathbf y_2}{\partial \mathbf y_1}\right)^T\frac{\partial z}{\partial \mathbf y_n}$$
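
The sketch below evaluates this formula in two ways and checks both against finite differences. It assumes NumPy. The vector sizes, the steps $\mathbf y_{k+1}=\tanh(W_k\mathbf y_k)$ and the choice $z=\tfrac12\|\mathbf y_n\|^2$ are hypothetical. The first way takes the formula literally: it forms the Jacobian product and transposes it. The second applies one transposed Jacobian at a time, starting from $\partial z/\partial\mathbf y_n$.

```python
import numpy as np

rng = np.random.default_rng(4)
dims = [4, 3, 5, 2]   # sizes of y_1, y_2, y_3, y_4 (hypothetical, so n = 4 here)
Ws = [rng.standard_normal((dims[k + 1], dims[k])) for k in range(len(dims) - 1)]

def forward(y1):
    """Hypothetical chain y_{k+1} = tanh(W_k y_k); returns [y_1, ..., y_n]."""
    ys = [y1]
    for W in Ws:
        ys.append(np.tanh(W @ ys[-1]))
    return ys

def z_of(y1):
    return 0.5 * np.sum(forward(y1)[-1] ** 2)  # scalar z = ||y_n||^2 / 2

y1 = rng.standard_normal(dims[0])
ys = forward(y1)

# Jacobian of each step: d y_{k+1} / d y_k = diag(1 - y_{k+1}^2) W_k
jacobians = [np.diag(1 - ys[k + 1] ** 2) @ Ws[k] for k in range(len(Ws))]
dz_dyn = ys[-1]                        # dz/dy_n for z = ||y_n||^2 / 2

# Formula from the text: (dy_n/dy_{n-1} ... dy_2/dy_1)^T dz/dy_n
J_total = np.eye(dims[0])
for J in jacobians:
    J_total = J @ J_total              # left-multiply to build J_{n-1} ... J_1
grad_formula = J_total.T @ dz_dyn

# Same quantity, one transposed Jacobian at a time (backpropagation order)
g = dz_dyn
for J in reversed(jacobians):
    g = J.T @ g
grad_backprop = g

# Central-difference reference
h = 1e-6
grad_fd = np.array([(z_of(y1 + h * e) - z_of(y1 - h * e)) / (2 * h)
                    for e in np.eye(dims[0])])
print(np.max(np.abs(grad_formula - grad_fd)), np.max(np.abs(grad_backprop - grad_fd)))
```

The second loop never forms the full Jacobian product. It only multiplies a vector by transposed Jacobians, which is the order of evaluation used by backpropagation.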


