• Matrix Derivatives Explained


    Basics

    The essence of matrix differentiation: $\frac{dA}{dB}$ differentiates every element of matrix $A$ with respect to every element of matrix $B$.

    If $A$ is $1\times 1$ and $B$ is $1\times n$, then $\frac{dA}{dB}$ contains $1\times n$ partial derivatives.
    If $A$ is $q\times p$ and $B$ is $m\times n$, then $\frac{dA}{dB}$ contains $q\times p\times m\times n$ partial derivatives.

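    Scalars stay unchanged; vectors are stretched.
    The front term (the function being differentiated) is stretched horizontally; the back term (the variable of differentiation) is stretched vertically.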

    Examples

    Example 1. Find $\frac{df(x)}{dx}$, where $f(x)$ is a scalar function, $f(x)=f(x_1,x_2,x_3,\dots,x_n)$, and $x$ is a vector.
    Solution:

    $$\frac{df(x)}{dx}=\begin{bmatrix}\frac{\partial f(x)}{\partial x_1}\\ \frac{\partial f(x)}{\partial x_2}\\ \vdots\\ \frac{\partial f(x)}{\partial x_n}\end{bmatrix}$$

    The scalar $f(x)$ stays unchanged; the vector $x$ is stretched vertically.
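    The result of Example 1 can be checked numerically. The following is a minimal sketch, not part of the original post: the helper `numerical_gradient` and the test function `f` are hypothetical names chosen for illustration, and central differences stand in for the exact partial derivatives.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of df/dx for a scalar f and a list x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad  # one entry per component of x (the "vertical stretch")


def f(x):
    # Hypothetical scalar function: f = x1^2 + 3*x2 + x1*x3
    return x[0] ** 2 + 3 * x[1] + x[0] * x[2]


x0 = [1.0, 2.0, 3.0]
grad = numerical_gradient(f, x0)

# Analytic partial derivatives of the same function, for comparison
exact = [2 * x0[0] + x0[2], 3.0, x0[0]]
print(grad, exact)
```

    The output has one entry per component of $x$, which is the column of partial derivatives written above.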

    Example 2. Find $\frac{df(x)}{dx}$, where $f(x)$ is a vector function, $f(x)=\begin{bmatrix}f_1(x)\\ f_2(x)\\ \vdots\\ f_n(x)\end{bmatrix}$, and $x$ is a scalar.

    Solution:

    $$\frac{df(x)}{dx}=\begin{bmatrix}\frac{\partial f_1(x)}{\partial x} & \frac{\partial f_2(x)}{\partial x} & \cdots & \frac{\partial f_n(x)}{\partial x}\end{bmatrix}$$

    The vector function is stretched horizontally; the scalar $x$ stays unchanged.
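    As a small illustration of Example 2, the sketch below differentiates a vector function with respect to a scalar and returns the result as a flat list, read as a row. The function `f` and the helper `derivative_row` are hypothetical and only serve as an example.

```python
import math


def f(x):
    # Hypothetical vector function of a scalar: [x^2, sin(x), exp(x)]
    return [x ** 2, math.sin(x), math.exp(x)]


def derivative_row(f, x, h=1e-6):
    """Central-difference estimate of [df_1/dx, df_2/dx, ..., df_n/dx]."""
    f_plus = f(x + h)
    f_minus = f(x - h)
    return [(p - m) / (2 * h) for p, m in zip(f_plus, f_minus)]


x0 = 0.5
row = derivative_row(f, x0)

# Analytic derivatives of each component, for comparison
exact = [2 * x0, math.cos(x0), math.exp(x0)]
print(row, exact)
```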

    Example 3. Find $\frac{df(x)}{dx}$, where $f(x)$ is a vector function, $f(x)=\begin{bmatrix}f_1(x)\\ f_2(x)\\ \vdots\\ f_n(x)\end{bmatrix}$, and $x$ is a vector, $x=\begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{bmatrix}$.

    Solution: first stretch $x$ vertically, then stretch $f(x)$ horizontally within each row:

    $$\frac{df(x)}{dx}=\begin{bmatrix}\frac{\partial f(x)}{\partial x_1}\\ \frac{\partial f(x)}{\partial x_2}\\ \vdots\\ \frac{\partial f(x)}{\partial x_n}\end{bmatrix}=\begin{bmatrix}\frac{\partial f_1(x)}{\partial x_1} & \frac{\partial f_2(x)}{\partial x_1} & \cdots & \frac{\partial f_n(x)}{\partial x_1}\\ \frac{\partial f_1(x)}{\partial x_2} & \frac{\partial f_2(x)}{\partial x_2} & \cdots & \frac{\partial f_n(x)}{\partial x_2}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial f_1(x)}{\partial x_n} & \frac{\partial f_2(x)}{\partial x_n} & \cdots & \frac{\partial f_n(x)}{\partial x_n}\end{bmatrix}$$
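    The layout of Example 3 puts $\frac{\partial f_j(x)}{\partial x_i}$ in row $i$, column $j$, which is the transpose of the Jacobian as it is usually defined. The sketch below builds that matrix numerically; `derivative_matrix` and the two-component function `f` are hypothetical names used only for illustration.

```python
def derivative_matrix(f, x, h=1e-6):
    """Entry [i][j] approximates d f_j / d x_i (row per x_i, column per f_j)."""
    rows = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        f_plus = f(x_plus)
        f_minus = f(x_minus)
        rows.append([(p - m) / (2 * h) for p, m in zip(f_plus, f_minus)])
    return rows


def f(x):
    # Hypothetical vector function: f1 = x1*x2, f2 = x1 + x2^2
    return [x[0] * x[1], x[0] + x[1] ** 2]


x0 = [2.0, 3.0]
D = derivative_matrix(f, x0)

# Analytic values in the same layout:
# row 1 holds derivatives w.r.t. x1, row 2 holds derivatives w.r.t. x2
exact = [[x0[1], 1.0],
         [x0[0], 2 * x0[1]]]
print(D, exact)
```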

    Common matrix derivative formulas

    Example 1: Find $\frac{df(x)}{dx}$, where $f(x)=A^TX$, $A=\begin{bmatrix}a_1\\ a_2\\ \vdots\\ a_n\end{bmatrix}_{n\times 1}$, and $X=\begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{bmatrix}_{n\times 1}$ (here $X$ and $x$ denote the same vector).

    Solution: $f(x)=A^TX=\sum^n_{i=1}a_ix_i$

    $$\frac{df(x)}{dx}=\begin{bmatrix}\frac{\partial f(x)}{\partial x_1}\\ \frac{\partial f(x)}{\partial x_2}\\ \vdots\\ \frac{\partial f(x)}{\partial x_n}\end{bmatrix}=\begin{bmatrix}a_1\\ a_2\\ \vdots\\ a_n\end{bmatrix}=A$$

    Because the transpose of a scalar is the scalar itself, $f(x)=A^TX=X^TA$, and therefore $\frac{dA^TX}{dx}=\frac{dX^TA}{dx}=A$.
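    The identity $\frac{dA^TX}{dx}=A$ can be spot-checked with finite differences. This is a minimal sketch under assumed values: the vectors `a` and `x0` are arbitrary, and `dot` and `numerical_gradient` are hypothetical helpers, not part of the original post.

```python
def dot(a, x):
    """Inner product A^T X for two equal-length lists."""
    return sum(ai * xi for ai, xi in zip(a, x))


def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of df/dx for a scalar f and a list x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


a = [2.0, -1.0, 0.5]    # arbitrary coefficient vector A
x0 = [1.0, 4.0, -3.0]   # arbitrary evaluation point

grad = numerical_gradient(lambda x: dot(a, x), x0)
print(grad)  # should agree with a up to rounding error
```

    Because $f$ is linear in $x$, the gradient does not depend on the evaluation point `x0`.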

    Example 2: Find $\frac{df(x)}{dx}$, where $f(x)=X^TAX$, $x=\begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{bmatrix}_{n\times 1}$, and $A=\begin{bmatrix}a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn}\end{bmatrix}$.

    Solution: $f(x)=X^T_{1\times n}A_{n\times n}X_{n\times 1}$ is a scalar.

    $$f(x)=\begin{bmatrix}x_1 & x_2 & \cdots & x_n\end{bmatrix}\begin{bmatrix}a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn}\end{bmatrix}\begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{bmatrix}=\sum^n_{i=1}\sum^n_{j=1}a_{ij}x_ix_j$$

    $$
    \begin{aligned}
    \frac{df(x)}{dx}
    &=\begin{bmatrix}\frac{\partial f(x)}{\partial x_1}\\ \frac{\partial f(x)}{\partial x_2}\\ \vdots\\ \frac{\partial f(x)}{\partial x_n}\end{bmatrix}
    =\begin{bmatrix}\sum^n_{j=1}a_{1j}x_j+\sum^n_{i=1}a_{i1}x_i\\ \sum^n_{j=1}a_{2j}x_j+\sum^n_{i=1}a_{i2}x_i\\ \vdots\\ \sum^n_{j=1}a_{nj}x_j+\sum^n_{i=1}a_{in}x_i\end{bmatrix}\\
    &=\begin{bmatrix}\sum^n_{j=1}a_{1j}x_j\\ \sum^n_{j=1}a_{2j}x_j\\ \vdots\\ \sum^n_{j=1}a_{nj}x_j\end{bmatrix}
    +\begin{bmatrix}\sum^n_{i=1}a_{i1}x_i\\ \sum^n_{i=1}a_{i2}x_i\\ \vdots\\ \sum^n_{i=1}a_{in}x_i\end{bmatrix}\\
    &=\begin{bmatrix}a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn}\end{bmatrix}\begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{bmatrix}
    +\begin{bmatrix}a_{11} & a_{21} & \cdots & a_{n1}\\ a_{12} & a_{22} & \cdots & a_{n2}\\ \vdots & \vdots & \ddots & \vdots\\ a_{1n} & a_{2n} & \cdots & a_{nn}\end{bmatrix}\begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{bmatrix}\\
    &=AX+A^TX
    \end{aligned}
    $$
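    The result $AX+A^TX$ can be compared against a finite-difference gradient of the quadratic form. The sketch below uses a deliberately non-symmetric $2\times 2$ matrix so that both terms matter; the helper names (`quad_form`, `analytic_gradient`, `numerical_gradient`) and the sample values are assumptions made for illustration.

```python
def quad_form(A, x):
    """f(x) = X^T A X = sum_i sum_j a_ij * x_i * x_j."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def analytic_gradient(A, x):
    """A X + A^T X, computed component by component."""
    n = len(x)
    return [
        sum(A[k][j] * x[j] for j in range(n))    # k-th entry of A X
        + sum(A[i][k] * x[i] for i in range(n))  # k-th entry of A^T X
        for k in range(n)
    ]


def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of df/dx for a scalar f and a list x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


A = [[1.0, 2.0],
     [3.0, 4.0]]   # non-symmetric on purpose
x0 = [1.0, -2.0]

analytic = analytic_gradient(A, x0)
numeric = numerical_gradient(lambda x: quad_form(A, x), x0)
print(analytic, numeric)
```

    When $A$ is symmetric, $A^T=A$ and the formula reduces to $2AX$.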

    References

    https://www.bilibili.com/video/BV1xk4y1B7RQ?p=4

  • Original article: https://blog.csdn.net/weixin_45626706/article/details/126356011