Previous post: https://blog.csdn.net/m0_37567738/article/details/133444201?spm=1001.2014.3001.5502
Reference: https://zhuanlan.zhihu.com/p/262751195
Derivations of machine-learning algorithms usually follow the conventions below:
$$tr(AB) = tr(BA)$$
$$tr(A^TB) = \sum_{i,j}A_{ij}B_{ij} \tag{2.1}$$
That is, $tr(A^TB)$ is the inner product of the matrices $A$ and $B$: the sum of their elementwise products.
$$df = \sum_{i=1}^{n}\frac{\partial f}{\partial x_{i}}dx_{i} = \left(\frac{\partial f}{\partial x}\right)^T dx \tag{2.2}$$
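The two trace identities above can be spot-checked numerically. The following is a minimal sketch, assuming NumPy is available; the matrix shapes and the random seed are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the check is reproducible
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 3))
C = rng.standard_normal((3, 4))  # same shape as A, used for identity (2.1)

# Cyclic property: tr(AB) = tr(BA), even though AB is 3x3 and BA is 4x4.
lhs_cyclic = np.trace(A @ B)
rhs_cyclic = np.trace(B @ A)

# Identity (2.1): tr(A^T C) equals the sum of elementwise products of A and C.
lhs_inner = np.trace(A.T @ C)
rhs_inner = np.sum(A * C)

print(np.isclose(lhs_cyclic, rhs_cyclic), np.isclose(lhs_inner, rhs_inner))
```

Note that the cyclic property holds even when $AB$ and $BA$ have different sizes, which is why the example uses non-square matrices.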
Extending (2.2) from a vector to a matrix argument and using (2.1) gives:
$$df = \sum_{i=1}^{m}\sum_{j=1}^{n}\frac{\partial f}{\partial X_{ij}}dX_{ij} = tr\left(\left(\frac{\partial f}{\partial X}\right)^T dX\right)$$
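This differential method is the basic idea behind differentiating with respect to a matrix: write $df$ in the form $tr(G^T dX)$ and read off $\frac{\partial f}{\partial X} = G$.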
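To make the matrix differential concrete, here is a small numerical sketch, assuming NumPy. The test function `f` (the squared Frobenius norm, whose gradient is $2X$) is a hypothetical choice that does not appear in the original post; any smooth scalar function with a known gradient would serve.

```python
import numpy as np

def f(X):
    # Hypothetical test function: squared Frobenius norm, f(X) = sum_ij X_ij^2.
    return np.sum(X ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 2))
dX = 1e-6 * rng.standard_normal((3, 2))  # small perturbation standing in for dX

G = 2 * X                       # known gradient of f, same shape as X
actual = f(X + dX) - f(X)       # true change in f
predicted = np.trace(G.T @ dX)  # first-order prediction tr(G^T dX)

print(actual, predicted)
```

The two printed values should agree up to a second-order term in `dX`, which is what the differential notation expresses.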
Example 1:
Let $y = a^TXb$, where $y$ is a scalar, $a$ is an $m$-dimensional vector, $b$ is an $n$-dimensional vector, and $X$ is an $m \times n$ matrix. Find $\frac{\partial y}{\partial X}$.
Solution 1:
Using denominator layout:
$$\frac{\partial y}{\partial X} = ab^T$$
Solution 2:
Since $a$ and $b$ are constant, $da = 0$ and $db = 0$, so
$$dy = da^T Xb + a^T dX\, b + a^T X\, db = a^T dX\, b$$
A scalar equals its own trace, and the trace is invariant under cyclic permutation, so
$$dy = tr(a^T dX\, b) = tr(b\,a^T dX) = tr((ab^T)^T dX)$$
Comparing with $dy = tr\left(\left(\frac{\partial y}{\partial X}\right)^T dX\right)$ gives
$$\frac{\partial y}{\partial X} = ab^T$$
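The result of Example 1 can be compared against a finite-difference gradient. This is a sketch, assuming NumPy; the helper `numerical_gradient` is a hypothetical name introduced here for illustration, and the dimensions are arbitrary.

```python
import numpy as np

def numerical_gradient(f, X, eps=1e-6):
    # Central finite differences, perturbing one entry of X at a time.
    grad = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        grad[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
m, n = 3, 4
a = rng.standard_normal(m)
b = rng.standard_normal(n)
X = rng.standard_normal((m, n))

y = lambda X: a @ X @ b       # scalar y = a^T X b
analytic = np.outer(a, b)     # a b^T, an m x n matrix like X
numeric = numerical_gradient(y, X)

print(np.allclose(analytic, numeric, atol=1e-6))
```

In denominator layout the gradient has the same shape as $X$, which is why `np.outer(a, b)` is $m \times n$.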
Example 2:
Find $d(A^{-1})$.
Solution:
$$\begin{aligned}
A^{-1}A = I &\Rightarrow d(A^{-1}A) = d(A^{-1})A + A^{-1}dA = dI = 0 \\
&\Rightarrow d(A^{-1})A = -A^{-1}dA \\
&\Rightarrow d(A^{-1}) = -A^{-1}\,dA\,A^{-1}
\end{aligned}$$
Alternatively:
$$\begin{aligned}
AA^{-1} = I &\Rightarrow d(AA^{-1}) = dA\,A^{-1} + A\,d(A^{-1}) = dI = 0 \\
&\Rightarrow A\,d(A^{-1}) = -dA\,A^{-1} \\
&\Rightarrow d(A^{-1}) = -A^{-1}\,dA\,A^{-1}
\end{aligned}$$
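The formula for the differential of the inverse can be checked with a small perturbation. This is a minimal sketch, assuming NumPy; the matrix `A` and the perturbation `dA` are arbitrary illustrative values, with `A` chosen diagonally dominant so that it is safely invertible.

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])          # diagonally dominant, hence invertible
dA = 1e-6 * np.array([[1.0, -2.0, 0.5],
                      [0.0,  1.0, 3.0],
                      [2.0, -1.0, 1.0]])  # small perturbation standing in for dA

A_inv = np.linalg.inv(A)
actual = np.linalg.inv(A + dA) - A_inv   # true change in the inverse
predicted = -A_inv @ dA @ A_inv          # first-order formula -A^{-1} dA A^{-1}

print(np.max(np.abs(actual - predicted)))
```

The printed discrepancy is second order in `dA`, so it should be far smaller than the entries of `predicted` themselves.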
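Example 3: Find $d(\det X)$, written $d|X|$ below.
Solution:
$$d|X| = |X|\,tr(X^{-1}dX) = tr(X^*dX)$$
Here $X^*$ denotes the adjugate matrix of $X$; the first equality requires $X$ to be invertible.
Derivation of the determinant derivative formula: https://www.cnblogs.com/analysis101/p/14677671.html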
https://jingyan.baidu.com/article/a501d80cb6ef00ac620f5e21.html
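Both forms of the determinant differential in Example 3 can be verified numerically. This is a sketch, assuming NumPy; the matrix and perturbation are arbitrary illustrative values, and the adjugate is computed as $|X|X^{-1}$, which is valid only for invertible $X$.

```python
import numpy as np

X = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
dX = 1e-6 * np.array([[1.0, -2.0, 0.5],
                      [0.0,  1.0, 3.0],
                      [2.0, -1.0, 1.0]])

det_X = np.linalg.det(X)
adj_X = det_X * np.linalg.inv(X)   # adjugate X^* = |X| X^{-1} for invertible X

actual = np.linalg.det(X + dX) - det_X                  # true change in |X|
via_inverse = det_X * np.trace(np.linalg.inv(X) @ dX)   # |X| tr(X^{-1} dX)
via_adjugate = np.trace(adj_X @ dX)                     # tr(X^* dX)

print(actual, via_inverse, via_adjugate)
```

All three printed numbers should agree to first order in `dX`.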
Example 4:
Let
$$f(x) = \begin{vmatrix} x+a & a & a \\ a & x+a & a \\ a & a & x+a \end{vmatrix}$$
Find $f'(x)$.
Solution:
Apply the formula from Example 3; its proof is given at the links listed there. Here $dX = I\,dx$, so $f'(x) = tr(X^*)$, the sum of the three diagonal cofactors, which are all equal:
$$f'(x) = 3\begin{vmatrix} x+a & a \\ a & x+a \end{vmatrix} = 3x^2 + 6ax$$
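The closed form in Example 4 can be compared with a central finite difference and with the trace formula from Example 3. This is a sketch, assuming NumPy; the sample values of `x`, `a`, and the step `h` are arbitrary choices for illustration.

```python
import numpy as np

def f(x, a):
    # Determinant of the 3x3 matrix with x + a on the diagonal and a elsewhere.
    return np.linalg.det(x * np.eye(3) + a * np.ones((3, 3)))

x, a, h = 2.0, 0.5, 1e-5   # sample values chosen for illustration

numeric = (f(x + h, a) - f(x - h, a)) / (2 * h)   # central difference
closed_form = 3 * x**2 + 6 * a * x                # result derived in Example 4

# Example 3 route: dX = I dx, so f'(x) = |X| tr(X^{-1}) = tr(X^*).
X = x * np.eye(3) + a * np.ones((3, 3))
via_trace = np.linalg.det(X) * np.trace(np.linalg.inv(X))

print(numeric, closed_form, via_trace)
```

The three values should coincide up to finite-difference and rounding error.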
Example 5:
Verify that $d(AB) = dA\,B + A\,dB$.
Solution:
Let $A = \begin{Bmatrix} f_1 & f_2 \\ f_3 & f_4 \end{Bmatrix}$ and $B = \begin{Bmatrix} g_1 & g_2 \\ g_3 & g_4 \end{Bmatrix}$, where every entry is a differentiable function.
Then $A' = \begin{Bmatrix} f'_1 & f'_2 \\ f'_3 & f'_4 \end{Bmatrix}$ and $B' = \begin{Bmatrix} g'_1 & g'_2 \\ g'_3 & g'_4 \end{Bmatrix}$, where a prime denotes the entrywise differential, so that $dA = A'$ and $dB = B'$.
$$\begin{aligned}
d(AB) &= d\begin{Bmatrix} f_1g_1+f_2g_3 & f_1g_2+f_2g_4 \\ f_3g_1+f_4g_3 & f_3g_2+f_4g_4 \end{Bmatrix} \\
&= \begin{Bmatrix} f'_1g_1+f'_2g_3 & f'_1g_2+f'_2g_4 \\ f'_3g_1+f'_4g_3 & f'_3g_2+f'_4g_4 \end{Bmatrix} + \begin{Bmatrix} f_1g'_1+f_2g'_3 & f_1g'_2+f_2g'_4 \\ f_3g'_1+f_4g'_3 & f_3g'_2+f_4g'_4 \end{Bmatrix} \\
&= dA\,B + A\,dB
\end{aligned}$$
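The product rule of Example 5 can also be checked numerically for matrices whose entries depend on a scalar parameter. This is a sketch, assuming NumPy; the entry functions in `A`, `B` and their hand-written derivatives `dA`, `dB` are hypothetical choices made only for this check.

```python
import numpy as np

def A(t):
    return np.array([[np.sin(t), t**2],
                     [1.0,       np.exp(t)]])

def dA(t):
    # Entrywise derivative of A(t).
    return np.array([[np.cos(t), 2 * t],
                     [0.0,       np.exp(t)]])

def B(t):
    return np.array([[t,    np.cos(t)],
                     [t**3, 2.0]])

def dB(t):
    # Entrywise derivative of B(t).
    return np.array([[1.0,      -np.sin(t)],
                     [3 * t**2, 0.0]])

t, h = 0.7, 1e-5
# Central difference of the product A(t) B(t).
numeric = (A(t + h) @ B(t + h) - A(t - h) @ B(t - h)) / (2 * h)
# Product rule, keeping the factor order: dA B + A dB.
product_rule = dA(t) @ B(t) + A(t) @ dB(t)

print(np.allclose(numeric, product_rule, atol=1e-6))
```

Because matrix multiplication does not commute, the order of the factors in each term must be preserved.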
Example 6:
Find the derivative of $tr(AB)$ with respect to $A$.
Solution:
$$tr(AB) = \sum_{i,j}A_{ij}B_{ji}$$
$$d\,tr(AB) = tr\left(\left(\frac{\partial \sum_{i,j}A_{ij}B_{ji}}{\partial A}\right)^T dA\right) = tr((B^T)^T dA)$$
So the derivative of $tr(AB)$ with respect to $A$ is $B^T$.
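Since $tr(AB)$ is linear in $A$, each partial derivative can be obtained exactly by perturbing a single entry of $A$ by one. The following is a minimal sketch, assuming NumPy; the shapes and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 2))   # shaped so that AB is square

grad = np.zeros_like(A)
for i, j in np.ndindex(A.shape):
    E = np.zeros_like(A)
    E[i, j] = 1.0                 # unit perturbation of the single entry A_ij
    # tr(AB) is linear in A, so this difference is the exact partial derivative.
    grad[i, j] = np.trace((A + E) @ B) - np.trace(A @ B)

print(np.allclose(grad, B.T))
```

The gradient has the shape of $A$, which matches $B^T$ rather than $B$.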
Example 7: Find $\frac{\partial tr(A)}{\partial A}$.
Solution:
$$\frac{\partial tr(A)}{\partial A} = I$$
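Note:
This is a typical example of differentiating a scalar with respect to a matrix, and it follows the approach described at the start. There is no trick involved; simply remember the result.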
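For reference, the result of Example 7 also follows in one line from the differential method, or as the special case $B = I$ of Example 6. Here $A$ is assumed to be square so that the trace is defined:

$$d\,tr(A) = tr(dA) = tr(I^T dA) \;\Rightarrow\; \frac{\partial tr(A)}{\partial A} = I$$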