• [Li Mu Deep Learning Notes] Matrix Calculus (4)


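    Course link and notes

    Linear algebra implementation, p4
    This series collects my study notes on Li Mu's deep learning course; in places it supplements material he did not cover in the lectures.
    This is the fourth post for this section; CSDN's length limit forced me to split it up.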

    Matrix calculus

    Matrix derivatives

    Basic rules for differentiating a vector with respect to a vector

    Let $\overrightarrow y=\overrightarrow {f}(\overrightarrow x)$ be a vector-valued function of the vector $\overrightarrow x=\begin{bmatrix} x_{1}\\ x_{2}\\ \vdots \\ x_{m} \end{bmatrix}_{m\times 1}$. Then:

    • When $\overrightarrow y=\overrightarrow a$, where $\overrightarrow a$ is not a function of $\overrightarrow x$ (that is, no component of $\overrightarrow a$ depends on $\overrightarrow x$):
      $$\frac{\partial \overrightarrow y}{\partial\overrightarrow x}= \begin{bmatrix} \frac{\partial \overrightarrow{f}(\overrightarrow x)}{\partial x_{1}}\\ \frac{\partial \overrightarrow{f}(\overrightarrow x)}{\partial x_{2}}\\ \vdots \\ \frac{\partial \overrightarrow{f}(\overrightarrow x)}{\partial x_{m}} \end{bmatrix}=\begin{bmatrix} 0\\ 0\\ \vdots \\ 0 \end{bmatrix}=\bm{0}$$
      Each entry $\frac{\partial \overrightarrow{f}(\overrightarrow x)}{\partial x_{i}}$ is itself a row of zeros, one per component of $\overrightarrow a$, so the result is a zero matrix whenever $\overrightarrow a$ has more than one component.
    • When $\overrightarrow y=\overrightarrow x$, that is, $\overrightarrow y=\begin{bmatrix} f_{1}(\overrightarrow x) \\ f_{2}(\overrightarrow x) \\ \vdots \\ f_{m}(\overrightarrow x) \end{bmatrix}=\begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{m} \end{bmatrix}$:
      $$\frac{\partial \overrightarrow y}{\partial\overrightarrow x}= \begin{bmatrix} \frac{\partial \overrightarrow{f}(\overrightarrow x)}{\partial x_{1}}\\ \frac{\partial \overrightarrow{f}(\overrightarrow x)}{\partial x_{2}}\\ \vdots \\ \frac{\partial \overrightarrow{f}(\overrightarrow x)}{\partial x_{m}} \end{bmatrix}=\begin{bmatrix} \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{1}}& \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{1}} & \dots &\frac{\partial f_{m}(\overrightarrow x)}{\partial x_{1}} \\ \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{2}}& \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{2}} & \dots &\frac{\partial f_{m}(\overrightarrow x)}{\partial x_{2}} \\ \vdots & \vdots & \ddots &\vdots \\ \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{m}}& \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{m}} & \dots &\frac{\partial f_{m}(\overrightarrow x)}{\partial x_{m}} \end{bmatrix}_{m\times m}=\begin{bmatrix} 1& 0&\dots &0 \\ 0& 1& \dots &0 \\ \vdots & \vdots & \ddots &\vdots \\ 0 & 0& \dots &1 \end{bmatrix}=\bm{I}$$
      The identity matrix is also written $\bm{E}$; the two symbols mean the same thing. For a general $\overrightarrow y$ with $n$ components the matrix is $m\times n$; here $n=m$ because $\overrightarrow y=\overrightarrow x$.
    • When $\overrightarrow y=\bm{A}\overrightarrow {x}$ with $\bm{A}=\begin{bmatrix} a_{11}&a_{12} & \cdots & a_{1m}\\ a_{21}&a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & \ddots &\vdots \\ a_{m1}&a_{m2} & \cdots & a_{mm} \end{bmatrix}$:
      $$\frac{\partial \overrightarrow y}{\partial\overrightarrow x}=\frac{\partial \bm{A}\overrightarrow x}{\partial \overrightarrow x} =\bm{A}^{T}\quad\text{(denominator layout)}$$
      $$\frac{\partial \overrightarrow y}{\partial\overrightarrow x}=\frac{\partial \bm{A}\overrightarrow x}{\partial \overrightarrow x} =\bm{A}\quad\text{(numerator layout)}$$
      (See the third post for this section for the proof.)
    • When $\overrightarrow y=\overrightarrow {x}^{T}\bm{A}$ with $\bm{A}=\begin{bmatrix} a_{11}&a_{12} & \cdots & a_{1m}\\ a_{21}&a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & \ddots &\vdots \\ a_{m1}&a_{m2} & \cdots & a_{mm} \end{bmatrix}$:
      $$\overrightarrow y=\overrightarrow {x}^{T}\bm{A}=\begin{bmatrix} x_{1}, & x_{2} ,& \dots ,& x_{m} \end{bmatrix}\cdot \begin{bmatrix} a_{11}&a_{12} & \cdots & a_{1m}\\ a_{21}&a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & \ddots &\vdots \\ a_{m1}&a_{m2} & \cdots & a_{mm} \end{bmatrix}=\begin{bmatrix} a_{11}x_{1}+a_{21}x_{2}+\dots +a_{m1}x_{m}, & a_{12}x_{1}+a_{22}x_{2}+\dots +a_{m2}x_{m} ,& \dots ,& a_{1m}x_{1}+a_{2m}x_{2}+\dots +a_{mm}x_{m} \end{bmatrix}$$
      Matching components one to one, $\overrightarrow y$ has to be read as a column vector (row and column vectors are unavoidably mixed here): $\overrightarrow y=\begin{bmatrix} f_{1}(\overrightarrow x) \\ f_{2}(\overrightarrow x) \\ \vdots \\ f_{m}(\overrightarrow x) \end{bmatrix}=\begin{bmatrix} a_{11}x_{1}+a_{21}x_{2}+\dots +a_{m1}x_{m}\\ a_{12}x_{1}+a_{22}x_{2}+\dots +a_{m2}x_{m}\\ \vdots \\ a_{1m}x_{1}+a_{2m}x_{2}+\dots +a_{mm}x_{m} \end{bmatrix}$, so $f_{j}(\overrightarrow x)=\sum_{i=1}^{m}a_{ij}x_{i}$ and $\frac{\partial f_{j}(\overrightarrow x)}{\partial x_{i}}=a_{ij}$. Then:
      $$\frac{\partial \overrightarrow y}{\partial\overrightarrow x}=\frac{\partial \overrightarrow {x}^{T}\bm{A}}{\partial \overrightarrow x} =\begin{bmatrix} \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{1}}& \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{1}} & \dots &\frac{\partial f_{m}(\overrightarrow x)}{\partial x_{1}} \\ \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{2}}& \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{2}} & \dots &\frac{\partial f_{m}(\overrightarrow x)}{\partial x_{2}} \\ \vdots & \vdots & \ddots &\vdots \\ \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{m}}& \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{m}} & \dots &\frac{\partial f_{m}(\overrightarrow x)}{\partial x_{m}} \end{bmatrix}=\begin{bmatrix} a_{11}& a_{12}&\dots &a_{1m} \\ a_{21}& a_{22}& \dots &a_{2m} \\ \vdots & \vdots & \ddots &\vdots \\ a_{m1}& a_{m2}& \dots &a_{mm} \end{bmatrix}=\bm{A}\quad\text{(denominator layout)}$$
      In numerator layout the result is $\bm{A}^{T}$. This agrees with the rule for $\bm{A}\overrightarrow x$, since $\overrightarrow {x}^{T}\bm{A}=(\bm{A}^{T}\overrightarrow x)^{T}$ and the denominator-layout derivative of $\bm{A}^{T}\overrightarrow x$ is $(\bm{A}^{T})^{T}=\bm{A}$.
    • When $\overrightarrow y=a\overrightarrow u$, where $a$ is an arbitrary constant and $\overrightarrow u=\overrightarrow {u}(\overrightarrow x)$:
      $$\frac{\partial \overrightarrow y}{\partial\overrightarrow x}=a\frac{\partial \overrightarrow u}{\partial\overrightarrow x}$$
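    • When $\overrightarrow y=\bm{A}\overrightarrow u$, where $\overrightarrow u=\overrightarrow {u}(\overrightarrow x)$ and the entries of $\bm{A}$ do not depend on the entries of $\overrightarrow x$:
      $$\frac{\partial \overrightarrow y}{\partial\overrightarrow x}=\bm{A}\frac{\partial \overrightarrow u}{\partial\overrightarrow x}\quad\text{(numerator layout)}$$
      $$\frac{\partial \overrightarrow y}{\partial\overrightarrow x}=\frac{\partial \overrightarrow u}{\partial\overrightarrow x}\bm{A}^{T}\quad\text{(denominator layout)}$$
      Setting $\overrightarrow u=\overrightarrow x$ recovers $\bm{A}$ and $\bm{A}^{T}$ respectively, matching the rule for $\bm{A}\overrightarrow x$.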
    • When $\overrightarrow y=\overrightarrow u+\overrightarrow v$, where $\overrightarrow u = \overrightarrow {u}(\overrightarrow x)$ and $\overrightarrow v = \overrightarrow {v}(\overrightarrow x)$:
      $$\frac{\partial \overrightarrow y}{\partial\overrightarrow x}=\frac{\partial \overrightarrow u}{\partial\overrightarrow x}+\frac{\partial \overrightarrow v}{\partial\overrightarrow x}$$
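
    Some of the rules above can be checked numerically. The following is only a rough sketch in plain Python, not part of the course material: it approximates the derivative by central differences and compares it with the closed forms for the $\bm{A}\overrightarrow x$ rule, the constant-multiple rule, and the sum rule. The helper names (`matvec`, `transpose`, `derivative_denominator`, `close`), the sample matrix, the test point, and the choice $\overrightarrow u(\overrightarrow x)=(x_1^2,\dots,x_m^2)$ are all assumptions made for illustration.

```python
# Finite-difference check of some rules above (plain Python, no libraries).
# Denominator layout: entry (i, j) of the result is d f_j / d x_i.
# All names and sample values here are illustrative assumptions.

def matvec(A, x):
    """Return the product A x as a list."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def transpose(M):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*M)]


def derivative_denominator(f, x, eps=1e-6):
    """Approximate d f / d x in denominator layout by central differences."""
    rows = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        rows.append([(p - m) / (2 * eps) for p, m in zip(f_plus, f_minus)])
    return rows


def close(P, Q, tol=1e-6):
    """True if two matrices agree entrywise within tol."""
    return all(abs(p - q) < tol for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 10.0]]   # deliberately not symmetric
x0 = [0.5, -1.0, 2.0]
a = 2.5


def u(x):
    return [xi * xi for xi in x]   # a simple nonlinear u(x)


def v(x):
    return matvec(A, x)            # v(x) = A x


D_u = derivative_denominator(u, x0)
D_v = derivative_denominator(v, x0)
D_au = derivative_denominator(lambda x: [a * ui for ui in u(x)], x0)
D_sum = derivative_denominator(
    lambda x: [ui + vi for ui, vi in zip(u(x), v(x))], x0)

checks = {
    "d(Ax)/dx = A^T (denominator layout)": close(D_v, transpose(A)),
    "its transpose = A (numerator layout)": close(transpose(D_v), A),
    "d(a u)/dx = a du/dx": close(D_au, [[a * e for e in row] for row in D_u]),
    "d(u+v)/dx = du/dx + dv/dx": close(
        D_sum, [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(D_u, D_v)]),
}
for name, ok in checks.items():
    print(name, ok)
```

    A non-symmetric $\bm{A}$ is used on purpose: with a symmetric matrix, $\bm{A}$ and $\bm{A}^{T}$ coincide and the check could not tell the two layouts apart.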

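    Extending to matrices

    This just raises the dimension: the derivative of a matrix with respect to a matrix carries four indices, so it lives in a four-dimensional array, and a matrix plays the role there that a vector played above. It is fairly hard to follow, so I only skimmed it for fun.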
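
    To make the four-dimensional structure a little more concrete, here is a small sketch in plain Python, again an illustration under my own assumptions rather than anything from the course. It takes $\bm{Y}=\bm{A}\bm{X}$ and stores $\partial Y_{ab}/\partial X_{cd}$ in a nested list `D[a][b][c][d]`. The index ordering (output indices first, then input indices), the helper names, and the sample matrices are all hypothetical choices; other layouts order the four axes differently.

```python
# Matrix-by-matrix derivative as a 4-index nested list (plain Python).
# Assumed index order: D[a][b][c][d] ~ d Y[a][b] / d X[c][d].

def matmul(A, X):
    """Return the matrix product A X as a list of lists."""
    return [[sum(A[a][c] * X[c][b] for c in range(len(X)))
             for b in range(len(X[0]))]
            for a in range(len(A))]


def matrix_by_matrix_derivative(f, X, eps=1e-6):
    """Approximate d f(X) / d X entry by entry with central differences."""
    Y = f(X)
    rows_y, cols_y = len(Y), len(Y[0])
    rows_x, cols_x = len(X), len(X[0])
    D = [[[[0.0] * cols_x for _ in range(rows_x)]
          for _ in range(cols_y)]
         for _ in range(rows_y)]
    for c in range(rows_x):
        for d in range(cols_x):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[c][d] += eps
            X_minus[c][d] -= eps
            Y_plus, Y_minus = f(X_plus), f(X_minus)
            for a in range(rows_y):
                for b in range(cols_y):
                    D[a][b][c][d] = (Y_plus[a][b] - Y_minus[a][b]) / (2 * eps)
    return D


A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]      # 2 x 3
X = [[1.0, 0.0],
     [2.0, -1.0],
     [0.5, 3.0]]           # 3 x 2, so Y = A X is 2 x 2

D = matrix_by_matrix_derivative(lambda M: matmul(A, M), X)
shape = (len(D), len(D[0]), len(D[0][0]), len(D[0][0][0]))
print(shape)  # (2, 2, 3, 2): two indices from Y, two from X
```

    For this particular $\bm{Y}=\bm{A}\bm{X}$, the entry `D[a][b][c][d]` should be close to `A[a][c]` when `b == d` and close to zero otherwise, which is the four-index analogue of the $\bm{A}\overrightarrow x$ rule for vectors.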

  • Original article: https://blog.csdn.net/qq_30204431/article/details/133183823