• PGD (Projected Gradient Descent) Source Code Walkthrough


    Paper: https://arxiv.org/abs/1706.06083
    Source code: https://github.com/Harry24k/adversarial-attacks-pytorch/tree/master


    PGD Linf Source Code

    import torch
    import torch.nn as nn
    
    from ..attack import Attack
    
    
    class PGD(Attack):
        r"""
        PGD in the paper 'Towards Deep Learning Models Resistant to Adversarial Attacks'
        [https://arxiv.org/abs/1706.06083]
    
        Distance Measure : Linf
    
        Arguments:
            model (nn.Module): model to attack.
            eps (float): maximum perturbation. (Default: 8/255)
            alpha (float): step size. (Default: 2/255)
            steps (int): number of steps. (Default: 10)
            random_start (bool): using random initialization of delta. (Default: True)
    
        Shape:
            - images: :math:`(N, C, H, W)` where `N = number of batches`, `C = number of channels`,        `H = height` and `W = width`. It must have a range [0, 1].
            - labels: :math:`(N)` where each value :math:`y_i` is :math:`0 \leq y_i \leq` `number of labels`.
            - output: :math:`(N, C, H, W)`.
    
        Examples::
            >>> attack = torchattacks.PGD(model, eps=8/255, alpha=1/255, steps=10, random_start=True)
            >>> adv_images = attack(images, labels)
    
        """
        def __init__(self, model, eps=8/255,
                     alpha=2/255, steps=10, random_start=True):
            super().__init__("PGD", model)
            self.eps = eps
            self.alpha = alpha
            self.steps = steps
            self.random_start = random_start
            self.supported_mode = ['default', 'targeted']
    
        def forward(self, images, labels):
            r"""
            Overridden.
            """
            self._check_inputs(images)
    
            images = images.clone().detach().to(self.device)
            labels = labels.clone().detach().to(self.device)
    
            if self.targeted:
                target_labels = self.get_target_label(images, labels)
    
            loss = nn.CrossEntropyLoss()
    
            adv_images = images.clone().detach()
    
            if self.random_start:
                # Starting at a uniformly random point
                adv_images = adv_images + torch.empty_like(adv_images).uniform_(-self.eps, self.eps)
                adv_images = torch.clamp(adv_images, min=0, max=1).detach()
    
            for _ in range(self.steps):
                adv_images.requires_grad = True
                outputs = self.get_logits(adv_images)
    
                # Calculate loss
                if self.targeted:
                    cost = -loss(outputs, target_labels)
                else:
                    cost = loss(outputs, labels)
    
                # Update adversarial images
                grad = torch.autograd.grad(cost, adv_images,
                                           retain_graph=False, create_graph=False)[0]
    
                adv_images = adv_images.detach() + self.alpha*grad.sign()
                delta = torch.clamp(adv_images - images, min=-self.eps, max=self.eps)
                adv_images = torch.clamp(images + delta, min=0, max=1).detach()
    
            return adv_images
    
    

    Analysis

    The PGD algorithm (projected gradient descent) is a small improvement on the BIM algorithm, and the two are very similar. The source-code walkthrough of BIM is in the previous post; I recommend reading it first to understand how BIM works.

    Specifically, PGD adds a perturbation to the image before the BIM iterations even begin (uniformly distributed within the $\epsilon$-neighborhood). In other words, the iteration starts from a random point rather than from the original image as in BIM. The paper does this to study the relationship between the different local maxima of the loss that can be reached when the perturbation iteration starts from random points.

    The PGD update rule is as follows: $X^{adv}_0 = X + \eta$, $X^{adv}_{N+1} = Clip_{X,\epsilon}\{X^{adv}_N + \alpha \, sign(\nabla_X J(X^{adv}_N, y_{true}))\}$, where $\eta$ is a random perturbation uniformly distributed within the $\epsilon$-neighborhood.
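    To make the update rule concrete, here is a minimal, self-contained sketch of a single untargeted Linf PGD step on a toy linear model (the model, shapes, and values are made up for illustration):

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Linear(4, 3)   # toy stand-in classifier
    loss_fn = nn.CrossEntropyLoss()

    x = torch.rand(2, 4)      # stand-in "images" with values in [0, 1]
    y = torch.tensor([0, 2])
    eps, alpha = 8/255, 2/255

    # Random start: uniform noise inside the eps-ball around x
    adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)

    adv.requires_grad_(True)
    cost = loss_fn(model(adv), y)             # untargeted: maximize the loss
    grad = torch.autograd.grad(cost, adv)[0]

    adv = adv.detach() + alpha * grad.sign()  # ascend along the gradient sign
    delta = (adv - x).clamp(-eps, eps)        # project back into the eps-ball
    adv = (x + delta).clamp(0, 1)             # stay in the valid image range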

    eps: $\epsilon$, the maximum perturbation.
    alpha: $\alpha$, the step size, i.e. how much the perturbation grows (or shrinks) in each iteration.
    steps: the number of iterations.
    random_start: whether the iteration starts from a random point, i.e. whether the random perturbation $\eta$ is added. If False, the algorithm is identical to BIM.
    `images = images.clone().detach().to(self.device)`: `clone()` copies the image into a fresh block of memory (plain assignment in PyTorch would make both names share the same storage); `detach()` cuts the copy out of the current computation graph so it becomes a leaf tensor whose gradient can later be computed; `to()` moves it onto the device. A tiny sketch of the difference is shown below.
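    A tiny sketch of the difference (values are arbitrary, for illustration only):

    import torch

    x = torch.rand(3, requires_grad=True)
    y = x.clone()            # new memory, but still connected to x's computation graph
    z = x.clone().detach()   # new memory AND detached from the graph: a leaf tensor
    z = z.to('cpu')          # .to() moves the tensor to the target device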
    `target_labels = self.get_target_label(images, labels)`: in the targeted case, fetch the target labels. They can be chosen in several ways, e.g. the label that differs most from the true label, or a random label other than the true one; a sketch of one option follows below.
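    For illustration, a hypothetical way to pick the least-likely label, i.e. the class to which the model assigns the smallest logit (the logits here are made up):

    import torch

    logits = torch.tensor([[2.0, -1.0, 0.5],    # two samples, three classes
                           [0.1,  3.0, -2.0]])
    target_labels = logits.argmin(dim=1)        # least-likely class per sample
    print(target_labels)                        # tensor([1, 2])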
    `loss = nn.CrossEntropyLoss()`: use cross-entropy as the loss function.

    adv_images = adv_images + torch.empty_like(adv_images).uniform_(-self.eps, self.eps)
    adv_images = torch.clamp(adv_images, min=0, max=1).detach()
    

    These two lines add the random perturbation: `torch.empty_like(adv_images)` returns an uninitialized tensor with the same shape as `adv_images`, and `uniform_(-self.eps, self.eps)` fills it in place with values drawn uniformly from $[-\epsilon, \epsilon]$. `torch.clamp(adv_images, min=0, max=1)` sets values greater than 1 to 1 and values less than 0 to 0, so the image does not leave the valid range.
    `adv_images.requires_grad = True`: with `requires_grad` set to True, PyTorch builds the computation graph as the images pass through the model, which is what the backward gradient computation needs.
    `outputs = self.get_logits(adv_images)`: get the model's outputs (logits) for the adversarial images.
    `cost = -loss(outputs, target_labels)`: compute the loss in the targeted case (negated, so that ascending the gradient moves the prediction toward the target label).
    `cost = loss(outputs, labels)`: compute the loss in the untargeted case.
    `grad = torch.autograd.grad(cost, adv_images, retain_graph=False, create_graph=False)[0]`: differentiate `cost` with respect to `adv_images` to obtain the gradient `grad`.
    `adv_images = adv_images.detach() + self.alpha*grad.sign()`: following the formula, push the image along the gradient-ascent direction with step size $\alpha$.

    delta = torch.clamp(adv_images - images, min=-self.eps, max=self.eps)  # the perturbation, clipped to [-eps, eps]
    adv_images = torch.clamp(images + delta, min=0, max=1).detach()  # keep the image within the valid range
    

    These two lines are the clipping step, the same as the $Clip$ operation in BIM: the perturbation is first pulled back into the $\epsilon$-ball (e.g. with $\epsilon = 8/255$, a pixel that has drifted $12/255$ from its original value is pulled back to $8/255$ away), and the result is then clamped so the image stays within the $[0, 1]$ range.


    PGD L2 Source Code

    import torch
    import torch.nn as nn
    
    from ..attack import Attack
    
    
    class PGDL2(Attack):
        r"""
        PGD in the paper 'Towards Deep Learning Models Resistant to Adversarial Attacks'
        [https://arxiv.org/abs/1706.06083]
    
        Distance Measure : L2
    
        Arguments:
            model (nn.Module): model to attack.
            eps (float): maximum perturbation. (Default: 1.0)
            alpha (float): step size. (Default: 0.2)
            steps (int): number of steps. (Default: 10)
            random_start (bool): using random initialization of delta. (Default: True)
    
        Shape:
            - images: :math:`(N, C, H, W)` where `N = number of batches`, `C = number of channels`,        `H = height` and `W = width`. It must have a range [0, 1].
            - labels: :math:`(N)` where each value :math:`y_i` is :math:`0 \leq y_i \leq` `number of labels`.
            - output: :math:`(N, C, H, W)`.
    
        Examples::
            >>> attack = torchattacks.PGDL2(model, eps=1.0, alpha=0.2, steps=10, random_start=True)
            >>> adv_images = attack(images, labels)
    
        """
    
        def __init__(self, model, eps=1.0, alpha=0.2, steps=10,
                     random_start=True, eps_for_division=1e-10):
            super().__init__("PGDL2", model)
            self.eps = eps
            self.alpha = alpha
            self.steps = steps
            self.random_start = random_start
            self.eps_for_division = eps_for_division
            self.supported_mode = ['default', 'targeted']
    
        def forward(self, images, labels):
            r"""
            Overridden.
            """
            self._check_inputs(images)
    
            images = images.clone().detach().to(self.device)
            labels = labels.clone().detach().to(self.device)
    
            if self.targeted:
                target_labels = self.get_target_label(images, labels)
    
            loss = nn.CrossEntropyLoss()
    
            adv_images = images.clone().detach()
            batch_size = len(images)
    
            if self.random_start:
                # Starting at a uniformly random point
                delta = torch.empty_like(adv_images).normal_()
                d_flat = delta.view(adv_images.size(0), -1)  # flatten each image so the norm is easy to compute
                n = d_flat.norm(p=2, dim=1).view(adv_images.size(0), 1, 1, 1)  # L2 norm of each sample's noise
                r = torch.zeros_like(n).uniform_(0, 1)  # uniform random value in [0, 1]
                delta *= r/n*self.eps  # rescale delta so its length is a random value in [0, eps]
                adv_images = torch.clamp(adv_images + delta, min=0, max=1).detach()
    
            for _ in range(self.steps):
                adv_images.requires_grad = True
                outputs = self.get_logits(adv_images)
    
                # Calculate loss
                if self.targeted:
                    cost = -loss(outputs, target_labels)
                else:
                    cost = loss(outputs, labels)
    
                # Update adversarial images
                grad = torch.autograd.grad(cost, adv_images,
                                           retain_graph=False, create_graph=False)[0]
                grad_norms = torch.norm(grad.view(batch_size, -1), p=2, dim=1) + self.eps_for_division  # eps_for_division guards against division by zero below
                grad = grad / grad_norms.view(batch_size, 1, 1, 1)  # normalize the gradient to a unit vector
                adv_images = adv_images.detach() + self.alpha * grad
    			
                # Project so that the L2 distance between the adversarial and original images stays within eps
                delta = adv_images - images
                delta_norms = torch.norm(delta.view(batch_size, -1), p=2, dim=1)  # L2 norm of each perturbation
                factor = self.eps / delta_norms
                # If eps / delta_norms < 1, the perturbation's L2 norm exceeds eps;
                # capping the factor at 1 below rescales such deltas to norm exactly eps
                factor = torch.min(factor, torch.ones_like(delta_norms))
                delta = delta * factor.view(-1, 1, 1, 1)
    
                adv_images = torch.clamp(images + delta, min=0, max=1).detach()
    
            return adv_images
    

    Analysis

    PGDL2 differs from PGD Linf only in the norm used to measure the distance between samples. For a sample $X=(x_1,x_2,x_3,\ldots,x_n)$, the L2 norm is $\|X\|_2=\sqrt{x_1^2+x_2^2+x_3^2+\ldots+x_n^2}$, and the Linf norm is $\|X\|_\infty=\lim_{p\to\infty}\sqrt[p]{|x_1|^p+|x_2|^p+\ldots+|x_n|^p}=\max_i|x_i|$. Put simply, the L2 norm is the length of the vector, and the Linf norm is the largest absolute value among its elements.
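    A quick check with torch (values chosen so the norms are easy to verify by hand):

    import torch

    x = torch.tensor([3.0, -4.0, 0.0])
    print(torch.norm(x, p=2))              # tensor(5.)  -> sqrt(3^2 + 4^2)
    print(torch.norm(x, p=float('inf')))   # tensor(4.)  -> max(|x_i|)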

    The differences between the two in the source code are explained in the comments in the code above. The key extra step in PGDL2 is the projection back onto the L2 ball; a standalone sketch follows below.
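    To see that projection in isolation, here is a minimal sketch (the perturbation tensors are made up) of the rescaling that keeps each sample's perturbation within an L2 norm of eps:

    import torch

    torch.manual_seed(0)
    batch_size, eps = 2, 1.0
    delta = torch.randn(batch_size, 3, 8, 8)            # made-up perturbations
    norms = delta.view(batch_size, -1).norm(p=2, dim=1)
    factor = torch.min(eps / norms, torch.ones_like(norms))
    delta = delta * factor.view(-1, 1, 1, 1)            # now ||delta||_2 <= eps
    print(delta.view(batch_size, -1).norm(p=2, dim=1))  # each value at most eps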

  • Original post: https://blog.csdn.net/Sankkl1/article/details/134215790