• Matrix Derivatives, Part 2


    Previous post: https://blog.csdn.net/m0_37567738/article/details/133444201?spm=1001.2014.3001.5502

    Reference: https://zhuanlan.zhihu.com/p/262751195

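    Derivations of machine-learning algorithms usually follow these layout conventions:

    1. When a vector or matrix is differentiated with respect to a scalar, use numerator layout.
    2. When a scalar is differentiated with respect to a vector or matrix, use denominator layout.
    3. When a vector is differentiated with respect to a vector, the numerator-layout Jacobian is the usual choice, so the result is a matrix: for $y \in \mathbb{R}^m$ and $x \in \mathbb{R}^n$ it is $m \times n$ with entries $\partial y_i / \partial x_j$.
    4. Results in numerator layout and in denominator layout differ by a transpose.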
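    As a quick sanity check of conventions 3 and 4, the sketch below (assuming NumPy; the helper `jacobian_numerator` and the test sizes are illustrative, not from the original post) builds the numerator-layout Jacobian of the linear map $y = Ax$ by central differences. It has shape $m \times n$ and equals $A$; its transpose is the denominator-layout result $A^T$.

```python
import numpy as np


def jacobian_numerator(f, x, eps=1e-6):
    """Numerator-layout Jacobian J[i, j] = d f_i / d x_j (shape m x n), via central differences."""
    m = f(x).size
    J = np.zeros((m, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        J[:, j] = (f(x + e) - f(x - e)) / (2 * eps)
    return J


rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))  # m = 2 outputs, n = 3 inputs
x = rng.standard_normal(3)

J_num = jacobian_numerator(lambda v: A @ v, x)  # numerator layout: 2 x 3, equals A
J_den = J_num.T                                 # denominator layout: 3 x 2, equals A^T

print(J_num.shape, J_den.shape)  # (2, 3) (3, 2)
print(np.allclose(J_num, A), np.allclose(J_den, A.T))  # True True
```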

    $$tr(AB) = tr(BA)$$

    $$tr(A^TB) = \sum_{i,j}A_{ij}B_{ij} \tag{2.1}$$
    That is, $tr(A^TB)$ is the inner product of matrices $A$ and $B$ of the same shape (the Frobenius inner product: the sum of their element-wise products).
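    A minimal numerical check of $tr(AB) = tr(BA)$ and of (2.1), assuming NumPy and random test matrices. The names `M` and `N` are illustrative; note that $AB$ and $BA$ may even have different sizes while their traces agree.

```python
import numpy as np

rng = np.random.default_rng(1)

# tr(AB) = tr(BA): here AB is 3 x 3 and BA is 4 x 4, yet the traces agree
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 3))
lhs_cyclic = np.trace(A @ B)
rhs_cyclic = np.trace(B @ A)

# (2.1): tr(M^T N) = sum_ij M_ij N_ij for two matrices of the same shape
M = rng.standard_normal((3, 4))
N = rng.standard_normal((3, 4))
lhs_inner = np.trace(M.T @ N)
rhs_inner = np.sum(M * N)  # element-wise product, then sum

print(np.isclose(lhs_cyclic, rhs_cyclic), np.isclose(lhs_inner, rhs_inner))  # True True
```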

    $$df = \sum_{i=1}^{n}\frac{\partial f}{\partial x_{i}}dx_{i} = \left(\frac{\partial f}{\partial x}\right)^T dx \tag{2.2}$$

    Generalizing the vector in (2.2) to an $m \times n$ matrix $X$, (2.1) and (2.2) give:
    $$df = \sum_{i=1}^{m}\sum_{j=1}^{n}\frac{\partial f}{\partial X_{ij}}dX_{ij} = tr\left(\left(\frac{\partial f}{\partial X}\right)^T dX\right)$$

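    This differential method is the basic idea behind matrix derivatives: rewrite $df$ in the form $tr(G^T dX)$, and then $\frac{\partial f}{\partial X} = G$.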
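    The matrix differential rule can be checked numerically: perturb $X$ by a small $dX$ and compare the actual change in $f$ with $tr\left((\partial f/\partial X)^T dX\right)$. This sketch assumes NumPy and uses $f(X) = tr(X^TX) = \sum_{i,j}X_{ij}^2$ as an illustrative test function (not from the original post), whose gradient is $2X$; the two values should differ only by the second-order term $tr(dX^T dX)$.

```python
import numpy as np


def f(X):
    # Illustrative test function (assumption): f(X) = tr(X^T X) = sum_ij X_ij^2
    return np.trace(X.T @ X)


rng = np.random.default_rng(2)
X = rng.standard_normal((3, 2))
G = 2 * X  # gradient df/dX of the test function, same shape as X

dX = 1e-6 * rng.standard_normal(X.shape)  # small perturbation
df_actual = f(X + dX) - f(X)              # true change in f
df_linear = np.trace(G.T @ dX)            # first-order prediction tr(G^T dX)

# The gap is the second-order term tr(dX^T dX), roughly 1e-12 here
print(df_actual, df_linear)
print(np.isclose(df_actual, df_linear, rtol=1e-4, atol=1e-10))  # True
```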

    Example 1:
    Let $y = a^TXb$, where $y$ is a scalar, $a$ is an $m$-dimensional vector, $b$ is an $n$-dimensional vector, and $X$ is an $m \times n$ matrix. Find $\frac{\partial y}{\partial X}$.

    Solution 1:
    Since $y = \sum_{i,j}a_iX_{ij}b_j$, we have $\frac{\partial y}{\partial X_{ij}} = a_ib_j$, so in denominator layout:
    $$\frac{\partial y}{\partial X} = ab^T$$

    Solution 2:
    Since $a$ and $b$ are constants, $da = 0$ and $db = 0$:
    $$dy = (da)^T Xb + a^T dX\, b + a^T X\, db = a^T dX\, b$$

    A scalar equals its own trace, so by the cyclic property of the trace:
    $$dy = tr(a^T dX\, b) = tr(ba^T dX) = tr\left((ab^T)^T dX\right)$$

    Comparing with $df = tr\left(\left(\frac{\partial f}{\partial X}\right)^T dX\right)$:
    $$\frac{\partial y}{\partial X} = ab^T$$
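    Both solutions can be confirmed with a central-difference gradient. The sketch assumes NumPy; `numerical_grad` is a hypothetical helper that returns a matrix of the same shape as $X$, i.e. the denominator-layout gradient, and the sizes and random inputs are arbitrary.

```python
import numpy as np


def numerical_grad(f, X, eps=1e-6):
    """Central-difference gradient of a scalar function f w.r.t. X (same shape as X: denominator layout)."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G


rng = np.random.default_rng(3)
m, n = 3, 4
a = rng.standard_normal(m)
b = rng.standard_normal(n)
X = rng.standard_normal((m, n))

G_num = numerical_grad(lambda M: a @ M @ b, X)  # y = a^T X b
G_formula = np.outer(a, b)                      # a b^T, an m x n matrix

print(np.allclose(G_num, G_formula))  # True
```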

    Example 2:
    Find $d(A^{-1})$.

    Solution:
    $$\begin{aligned} A^{-1}A = I &\Rightarrow d(A^{-1}A) = d(A^{-1})A + A^{-1}dA = dI = 0 \\ &\Rightarrow d(A^{-1})A = -A^{-1}dA \\ &\Rightarrow d(A^{-1}) = -A^{-1}dA\,A^{-1} \end{aligned}$$
    Here $dI = 0$ because $I$ is constant, and the last step multiplies both sides on the right by $A^{-1}$.

    Alternatively:

    $$\begin{aligned} AA^{-1} = I &\Rightarrow d(AA^{-1}) = dA\,A^{-1} + A\,d(A^{-1}) = dI = 0 \\ &\Rightarrow A\,d(A^{-1}) = -dA\,A^{-1} \\ &\Rightarrow d(A^{-1}) = -A^{-1}dA\,A^{-1} \end{aligned}$$
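    A first-order check of $d(A^{-1}) = -A^{-1}dA\,A^{-1}$, assuming NumPy: for a small random $dA$, $(A + dA)^{-1} - A^{-1}$ should match the formula up to a second-order remainder. The test matrix is kept close to $3I$ only so that it is safely invertible; that choice is an assumption of the sketch.

```python
import numpy as np

rng = np.random.default_rng(4)
# Test matrix kept close to 3I so that it is safely invertible (illustrative choice)
A = 3 * np.eye(3) + 0.5 * rng.standard_normal((3, 3))
dA = 1e-6 * rng.standard_normal((3, 3))  # small perturbation

A_inv = np.linalg.inv(A)
actual = np.linalg.inv(A + dA) - A_inv   # true change in A^{-1}
predicted = -A_inv @ dA @ A_inv          # first-order formula -A^{-1} dA A^{-1}

rel_err = np.linalg.norm(actual - predicted) / np.linalg.norm(predicted)
print(f"relative error: {rel_err:.1e}")  # expected to be tiny (second order in dA)
```

    Swapping the order, e.g. `-A_inv @ A_inv @ dA`, would generally fail this check, because matrix products do not commute.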

    Example 3: Find $d|X|$, the differential of the determinant $\det X$.

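    Solution:

    $$d|X| = |X|\,tr(X^{-1}dX) = tr(X^*dX)$$
    where $X^*$ is the adjugate of $X$. The first form requires $X$ to be invertible; the second (Jacobi's formula) holds for any square $X$, and the two agree because $X^* = |X|X^{-1}$ when $X$ is invertible.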
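    The two forms of the determinant differential can be compared against a central-difference directional derivative of $|X|$ along a random direction $V$, which plays the role of $dX$. This sketch assumes NumPy and an invertible test matrix, so that the adjugate can be computed as $X^* = |X|X^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(5)
X = 2 * np.eye(3) + 0.5 * rng.standard_normal((3, 3))  # illustrative invertible matrix
V = rng.standard_normal((3, 3))                        # direction playing the role of dX
t = 1e-6

# Directional derivative of |X| along V by central differences
d_det_num = (np.linalg.det(X + t * V) - np.linalg.det(X - t * V)) / (2 * t)

det_X = np.linalg.det(X)
X_inv = np.linalg.inv(X)
adj_X = det_X * X_inv                     # adjugate X^* = |X| X^{-1} (X invertible)

d_det_inv = det_X * np.trace(X_inv @ V)   # |X| tr(X^{-1} dX)
d_det_adj = np.trace(adj_X @ V)           # tr(X^* dX)

print(d_det_num, d_det_inv, d_det_adj)
print(np.isclose(d_det_num, d_det_inv), np.isclose(d_det_inv, d_det_adj))  # True True
```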

    Derivation of the determinant differential formula: https://www.cnblogs.com/analysis101/p/14677671.html

    https://jingyan.baidu.com/article/a501d80cb6ef00ac620f5e21.html

    Example 4:
    $$f(x) = \begin{vmatrix} x+a & a & a \\ a & x+a & a \\ a & a & x+a \end{vmatrix}$$
    Find $f'(x)$.

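    Solution:
    Differentiate with the formula from Example 3; its proof (verification) is at the links given there. Let $X$ denote the matrix above. Only the diagonal depends on $x$, so $dX = I\,dx$ and $df = tr(X^*I)\,dx$, i.e. $f'(x) = tr(X^*)$, the sum of the three diagonal cofactors. By symmetry, each diagonal cofactor equals the same $2 \times 2$ determinant: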

    $$f'(x) = 3\begin{vmatrix} x+a & a \\ a & x+a \end{vmatrix} = 3\left[(x+a)^2 - a^2\right] = 3x^2 + 6ax$$
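    A numerical check of the result, assuming NumPy and arbitrary sample values of $x$ and $a$: the central-difference derivative of $f$, the closed form $3x^2 + 6ax$, and the trace of the adjugate $tr(X^*)$ (Example 3's formula with $dX = I\,dx$) should all agree. The helper `matrix_of` is hypothetical.

```python
import numpy as np


def matrix_of(x, a):
    # 3 x 3 matrix with x + a on the diagonal and a elsewhere
    return np.full((3, 3), a) + x * np.eye(3)


x, a, h = 1.3, 0.7, 1e-5  # arbitrary sample values and step size


def f(s):
    return np.linalg.det(matrix_of(s, a))


f_prime_num = (f(x + h) - f(x - h)) / (2 * h)  # central difference
f_prime_formula = 3 * x**2 + 6 * a * x

# Example 3 with dX = I dx gives f'(x) = tr(X^*), the sum of the diagonal cofactors
X = matrix_of(x, a)
trace_adj = np.trace(np.linalg.det(X) * np.linalg.inv(X))

print(f_prime_num, f_prime_formula, trace_adj)
print(np.isclose(f_prime_num, f_prime_formula), np.isclose(trace_adj, f_prime_formula))  # True True
```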

    Example 5:

    Verify $d(AB) = dA\,B + A\,dB$.

    Solution:

    Let $A = \begin{bmatrix} f_1 & f_2 \\ f_3 & f_4 \end{bmatrix}$ and $B = \begin{bmatrix} g_1 & g_2 \\ g_3 & g_4 \end{bmatrix}$, where every entry is a differentiable function of a scalar variable $t$.
    Then $A' = \begin{bmatrix} f'_1 & f'_2 \\ f'_3 & f'_4 \end{bmatrix}$ and $B' = \begin{bmatrix} g'_1 & g'_2 \\ g'_3 & g'_4 \end{bmatrix}$.

    $$\begin{aligned} (AB)' &= \begin{bmatrix} f_1g_1+f_2g_3 & f_1g_2+f_2g_4 \\ f_3g_1+f_4g_3 & f_3g_2+f_4g_4 \end{bmatrix}' \\ &= \begin{bmatrix} f'_1g_1+f'_2g_3 & f'_1g_2+f'_2g_4 \\ f'_3g_1+f'_4g_3 & f'_3g_2+f'_4g_4 \end{bmatrix} + \begin{bmatrix} f_1g'_1+f_2g'_3 & f_1g'_2+f_2g'_4 \\ f_3g'_1+f_4g'_3 & f_3g'_2+f_4g'_4 \end{bmatrix} \\ &= A'B + AB' \end{aligned}$$
    Multiplying both sides by $dt$ gives $d(AB) = dA\,B + A\,dB$. The order of the factors must be kept, since matrix multiplication is not commutative.
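    The product rule can also be checked numerically. The sketch assumes NumPy and picks hypothetical entry functions for $A(t)$ and $B(t)$ with hand-computed derivatives; the central difference of $A(t)B(t)$ should match $A'B + AB'$, while the swapped version $A'B + B'A$ generally does not.

```python
import numpy as np


# Hypothetical entry functions for the 2 x 2 matrices A(t), B(t) (illustration only)
def A(t):
    return np.array([[t, t**2], [np.sin(t), 1.0]])


def dA(t):  # entry-wise derivative A'(t)
    return np.array([[1.0, 2 * t], [np.cos(t), 0.0]])


def B(t):
    return np.array([[np.cos(t), t], [np.exp(t), t**3]])


def dB(t):  # entry-wise derivative B'(t)
    return np.array([[-np.sin(t), 1.0], [np.exp(t), 3 * t**2]])


t, h = 0.8, 1e-5
lhs = (A(t + h) @ B(t + h) - A(t - h) @ B(t - h)) / (2 * h)  # (AB)' by central differences
rhs = dA(t) @ B(t) + A(t) @ dB(t)                            # A'B + AB'
wrong_order = dA(t) @ B(t) + dB(t) @ A(t)                    # factors swapped in the second term

print(np.allclose(lhs, rhs, atol=1e-6), np.allclose(lhs, wrong_order, atol=1e-6))  # True False
```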

    Example 6:

    Find the derivative of $tr(AB)$ with respect to $A$.

    Solution:

    $$tr(AB) = \sum_{i,j}A_{ij}B_{ji}$$

    $$d\,tr(AB) = \sum_{i,j}B_{ji}\,dA_{ij} = tr(B\,dA) = tr\left((B^T)^T dA\right)$$

    That is, $\frac{\partial\, tr(AB)}{\partial A} = B^T$.
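    A central-difference check that the gradient of $tr(AB)$ with respect to $A$ is $B^T$, assuming NumPy; `numerical_grad` is a hypothetical helper that differentiates each entry of $A$ in turn, and $A$ is deliberately non-square so that the shape of the result is visible.

```python
import numpy as np


def numerical_grad(f, X, eps=1e-6):
    """Central-difference gradient of a scalar f w.r.t. X, same shape as X (denominator layout)."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G


rng = np.random.default_rng(6)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 3))

G_num = numerical_grad(lambda M: np.trace(M @ B), A)
print(G_num.shape, np.allclose(G_num, B.T))  # (3, 4) True
```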

    Example 7: Find $\frac{\partial\, tr(A)}{\partial A}$.

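    Solution:
    $$\frac{\partial\, tr(A)}{\partial A} = I$$

    Note:
    This is a typical scalar-by-matrix derivative and is computed with the method from the beginning of this post: $d\,tr(A) = tr(dA) = tr(I^T dA)$, so the derivative is $I$. There is no trick involved; the result is worth memorizing.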
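    This is also the special case $B = I$ of Example 6. A numerical check, assuming NumPy; `numerical_grad` is a hypothetical central-difference helper and the $4 \times 4$ test matrix is arbitrary.

```python
import numpy as np


def numerical_grad(f, X, eps=1e-6):
    """Central-difference gradient of a scalar f w.r.t. X, same shape as X (denominator layout)."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G


rng = np.random.default_rng(7)
A = rng.standard_normal((4, 4))

G_num = numerical_grad(np.trace, A)  # gradient of tr(A) with respect to A
print(np.allclose(G_num, np.eye(4)))  # True
```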

  • Original post: https://blog.csdn.net/m0_37567738/article/details/133872924