• Derivative formulas for deep metric learning (Deep Metric Learning)


    Derivatives of normalized vectors

    Derivative of the vector norm: let $\bm{x}\in \mathbb{R}^{d\times 1}$ be a column vector, let $||\bm{x}||$ denote its norm, and let $\hat{\bm{x}}$ denote the L2-normalized vector, whose length is 1.

    $$\frac{\partial}{\partial \bm{x}}\big(||\bm{x}||\big)=\frac{\partial}{\partial \bm{x}}\sqrt{\bm{x}^T \bm{x}}=\frac{1}{2}\,\frac{2\bm{x}^T}{\sqrt{\bm{x}^T \bm{x}}}=\frac{\bm{x}^T}{||\bm{x}||}=\hat{\bm{x}}^T$$
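As a quick sanity check (a minimal sketch, not part of the original derivation), the identity $\partial ||\bm{x}|| / \partial \bm{x} = \hat{\bm{x}}^T$ can be verified with PyTorch autograd on an arbitrary example vector:

```python
import torch

# hypothetical example vector; any nonzero x works
x = torch.tensor([3.0, 4.0], requires_grad=True)
norm = torch.linalg.norm(x)   # ||x|| = 5
norm.backward()

# the gradient of ||x|| w.r.t. x should equal the unit vector x / ||x||
print(x.grad)                 # tensor([0.6000, 0.8000])
assert torch.allclose(x.grad, x.detach() / norm.detach())
```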

    Derivative of the normalized vector:
    $$\frac{\partial}{\partial \bm{x}}(\hat{\bm{x}})=\frac{\partial}{\partial \bm{x}}\Big(\frac{\bm{x}}{||\bm{x}||}\Big)=\frac{1}{||\bm{x}||}\frac{\partial \bm{x}}{\partial \bm{x}}+\bm{x}\,\frac{\partial}{\partial \bm{x}}\Big(\frac{1}{||\bm{x}||}\Big)=\frac{I}{||\bm{x}||}-\bm{x}\frac{\hat{\bm{x}}^T}{||\bm{x}||^2}=\frac{||\bm{x}||I-\bm{x}\hat{\bm{x}}^T}{||\bm{x}||^2}=\frac{I-\hat{\bm{x}}\hat{\bm{x}}^T}{||\bm{x}||}$$
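The closed-form Jacobian $(I-\hat{\bm{x}}\hat{\bm{x}}^T)/||\bm{x}||$ can likewise be checked against autograd; a minimal sketch with an arbitrary example vector:

```python
import torch
from torch.autograd.functional import jacobian

x = torch.tensor([3.0, 4.0, 12.0])  # hypothetical example vector, ||x|| = 13

# Jacobian of x / ||x|| computed by autograd
J_auto = jacobian(lambda v: v / torch.linalg.norm(v), x)

# closed form: (I - x_hat x_hat^T) / ||x||
x_hat = x / torch.linalg.norm(x)
J_closed = (torch.eye(3) - torch.outer(x_hat, x_hat)) / torch.linalg.norm(x)

assert torch.allclose(J_auto, J_closed, atol=1e-6)
```

Note that the Jacobian is symmetric and has $\hat{\bm{x}}$ in its null space, as expected: moving along $\bm{x}$ itself does not change the normalized vector.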

    Derivative of the contrastive loss:

    For an anchor vector $q$, let the contrast vectors be $\{k_i\}_{i=0}^K$, where $k_0$ is the positive-sample vector and $\{k_i\}_{i=1}^K$ are the $K$ negative-sample vectors. The cross-entropy-based loss is:
    $$L=-\log\frac{\exp(q^Tk_0/\tau)}{\sum_{i=0}^K\exp(q^Tk_i/\tau)}$$
    Here the logits can be taken as $z=[q^Tk_0/\tau, q^Tk_1/\tau, \cdots, q^Tk_K/\tau]$, where $\tau$ is a temperature coefficient that can either be a learnable variable or be fixed to a constant.
    Following the gradient rule for the cross-entropy loss (see the reference linked in the original post), we get:
    $$\frac{\partial L}{\partial z_i}=p_i-y_i$$
    where $p_i=\frac{\exp(q^Tk_i/\tau)}{\sum_{j=0}^K\exp(q^Tk_j/\tau)}$ is the probability after softmax normalization. The gradient with respect to the anchor sample $q$ is:
    $$\frac{\partial L}{\partial q}=\sum_{i=0}^{K}\frac{\partial L}{\partial z_i}\frac{\partial z_i}{\partial q}=\sum_{i=0}^{K}(p_i-y_i)k_i/\tau$$

    The gradient for the contrast vectors is:
    $$\frac{\partial L}{\partial k_i}=\frac{\partial L}{\partial z_i}\frac{\partial z_i}{\partial k_i}=(p_i-y_i)q/\tau$$

    The gradient for the temperature coefficient is:
    $$\frac{\partial L}{\partial \tau}=\sum_{i=0}^{K}\frac{\partial L}{\partial z_i}\frac{\partial z_i}{\partial \tau}=\sum_{i=0}^K(p_i-y_i)q^Tk_i(-1/\tau^2)=\sum_{i=0}^K(y_i-p_i)z_i/\tau$$
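The three closed-form gradients above can be sanity-checked directly against autograd. A minimal sketch (the toy values are taken from the post's example; `q` plays the role of a batch of two anchor vectors, and the loss is averaged over the batch, so each formula picks up a factor $1/B$):

```python
import torch
import torch.nn.functional as F

q = torch.tensor([[0.1, 0.2, 0.3],
                  [0.5, 0.6, 0.8]], requires_grad=True)
k = torch.tensor([[0.4, 0.5, 0.7],
                  [0.6, 0.8, 0.6],
                  [0.6, 0.8, 0.6]], requires_grad=True)
tau = torch.tensor(0.01, requires_grad=True)
targets = torch.tensor([0, 0])

logits = q @ k.T / tau
loss = F.cross_entropy(logits, targets)
loss.backward()

# closed-form gradients from the formulas above, averaged over the batch of size B
B = targets.size(0)
with torch.no_grad():
    z = q @ k.T / tau                              # logits, shape (B, K+1)
    p_soft = F.softmax(z, dim=1)                   # p_i
    y = F.one_hot(targets, num_classes=k.size(0)).float()
    grad_q = (p_soft - y) @ k / tau / B            # sum_i (p_i - y_i) k_i / tau
    grad_k = (p_soft - y).t() @ q / tau / B        # (p_i - y_i) q / tau, summed over the batch
    grad_tau = ((y - p_soft) * z).sum(1).mean() / tau  # sum_i (y_i - p_i) z_i / tau

assert torch.allclose(q.grad, grad_q, rtol=1e-4, atol=1e-3)
assert torch.allclose(k.grad, grad_k, rtol=1e-4, atol=1e-3)
assert torch.allclose(tau.grad, grad_tau, rtol=1e-4, atol=1e-3)
```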

    PyTorch implementation:

    import torch
    import torch.nn.functional as F

    # Toy example: a batch of 2 anchor vectors p, K+1 = 3 contrast vectors k,
    # and a temperature tau; targets mark index 0 as the positive sample.
    p = torch.tensor([[0.1, 0.2, 0.3],
                      [0.5, 0.6, 0.8]], requires_grad=True)
    k = torch.tensor([[0.4, 0.5, 0.7],
                      [0.6, 0.8, 0.6],
                      [0.6, 0.8, 0.6]], requires_grad=True)
    tau = torch.tensor(0.01, requires_grad=True)
    targets = torch.tensor([0, 0])

    class CrossEntropyLoss(torch.autograd.Function):
        @staticmethod
        def forward(ctx, p, k, tau, targets):
            logits = p @ k.T / tau                    # z = q^T k / tau
            targets = F.one_hot(targets, num_classes=logits.size(1)).float()
            prob = F.softmax(logits, 1)               # p_i
            ctx.save_for_backward(logits, prob, targets, p, k, tau)
            log_prob = F.log_softmax(logits, 1)
            loss = -(targets * log_prob).sum(1).mean()
            return loss

        @staticmethod
        def backward(ctx, grad_output):
            logits, prob, targets, p, k, tau = ctx.saved_tensors
            batch = targets.size(0)
            # dL/dq = sum_i (p_i - y_i) k_i / tau, averaged over the batch
            grad_p = grad_output * (prob - targets) @ k / tau / batch
            # dL/dk_i = (p_i - y_i) q / tau, accumulated over the batch and averaged
            grad_k = grad_output * (prob - targets).t() @ p / tau / batch
            # dL/dtau = sum_i (y_i - p_i) z_i / tau, averaged over the batch
            tau_grad = grad_output * torch.sum((targets - prob) * logits / tau, dim=1).mean()
            grad_targets = None  # integer labels need no gradient
            return grad_p, grad_k, tau_grad, grad_targets

    loss = CrossEntropyLoss.apply(p, k, tau, targets)
    loss.backward()

    print(p.grad)
    print(k.grad)
    print(tau.grad)
    
    

    Output:

    tensor([[ 9.9664, 14.9496, -4.9832],
            [10.0000, 15.0000, -5.0000]])
    tensor([[-29.9832, -39.9664, -54.9496],
            [ 14.9916,  19.9832,  27.4748],
            [ 14.9916,  19.9832,  27.4748]])
    tensor(-1249.1605)
    

    Verification with PyTorch's built-in function:

    import torch
    import torch.nn.functional as F
    p = torch.tensor([[0.1, 0.2, 0.3],
                      [0.5, 0.6, 0.8]], requires_grad=True)
    k = torch.tensor([[0.4, 0.5, 0.7],
                      [0.6, 0.8, 0.6],
                      [0.6, 0.8, 0.6]], requires_grad=True)
    tau = torch.tensor(0.01, requires_grad=True)

    logits = p @ k.T / tau
    targets = torch.tensor([0, 0])

    # built-in cross_entropy applies log_softmax and averages over the batch
    loss = F.cross_entropy(logits, targets)

    loss.backward()

    print(p.grad)
    print(k.grad)
    print(tau.grad)
    
    

    Output:

    tensor([[ 9.9664, 14.9496, -4.9832],
            [10.0000, 15.0000, -5.0000]])
    tensor([[-29.9832, -39.9664, -54.9496],
            [ 14.9916,  19.9832,  27.4748],
            [ 14.9916,  19.9832,  27.4748]])
    tensor(-1249.1606)
    
  • Original post: https://blog.csdn.net/winycg/article/details/127837693