Paper: https://arxiv.org/abs/1511.07528v1
Source code: https://github.com/Harry24k/adversarial-attacks-pytorch/tree/master
Adversarial examples generated by FGSM, PGD, and similar algorithms perturb the input along the gradient of the loss function (see my earlier posts). The adversarial examples in this paper are instead perturbed along the gradient of the predicted score of the target class, which the authors call the forward derivative. The forward derivative is defined as the Jacobian matrix of the function $F$ that the neural network learns during training:

$$\nabla F(X)=\dfrac{\partial F(X)}{\partial X}=\left[\dfrac{\partial F_j(X)}{\partial X_i}\right]$$
This forward derivative tells us how strongly each pixel influences the model's classification, so we can use it to update a clean sample $X$ until the resulting adversarial example is classified as the chosen target class.
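To make the definition concrete, here is a minimal sketch (my own toy model and shapes, not the paper's code) of computing one row of the forward derivative, i.e. the gradient of a single class score with respect to every input pixel:

```python
import torch
import torch.nn as nn

# A toy classifier, purely for illustration
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

x = torch.rand(1, 1, 28, 28, requires_grad=True)  # input in [0, 1]
scores = model(x)                                 # F(X), shape (1, 10)

t = 3  # assumed target class
# Row t of the Jacobian: the gradient of F_t(X) wrt every input pixel
grad_t = torch.autograd.grad(scores[0, t], x)[0]  # same shape as x
```

Note that the gradient is taken of a class score, not of a loss, which is what distinguishes the forward derivative from the FGSM/PGD perturbation direction.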
The authors introduce the concept of a saliency map, borrowed from computer vision, which expresses how much each input feature influences the classification result. If certain features are found to correspond to a particular classifier output, strengthening or weakening those features in the input sample can drive the classifier toward that output.
For each class confidence in the model's output layer, compute its partial derivative with respect to the input $X$; the value of this derivative measures how much the pixel at each position affects the classification result. This is exactly the Jacobian from above:

$$\nabla F(X)=\dfrac{\partial F(X)}{\partial X}=\left[\dfrac{\partial F_j(X)}{\partial X_i}\right]$$
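Building on the sketch above, the whole matrix can be obtained at once with `torch.autograd.functional.jacobian`, and the target row and the summed non-target rows (both used by the attack below) fall out directly. Again a sketch under the same toy assumptions:

```python
import torch
import torch.nn as nn
from torch.autograd.functional import jacobian

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.rand(1, 1, 28, 28)

# J has shape (1, 10, 1, 1, 28, 28): output dims followed by input dims
J = jacobian(model, x)
J = J.reshape(10, -1)                     # rows: classes j, cols: pixels i

t = 3                                     # assumed target class
grad_target = J[t]                        # dF_t(X) / dX_i
grad_other = J.sum(dim=0) - grad_target   # sum over j != t of dF_j(X) / dX_i
```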
For a positive perturbation, i.e. an added perturbation $\theta>0$:

$$S(X,t)[i]=\begin{cases}0 & \text{if } \dfrac{\partial F_t(X)}{\partial X_i}<0 \ \text{or} \ \sum\limits_{j\ne t}\dfrac{\partial F_j(X)}{\partial X_i}>0 \\[2ex] \left(\dfrac{\partial F_t(X)}{\partial X_i}\right)\left|\sum\limits_{j\ne t}\dfrac{\partial F_j(X)}{\partial X_i}\right| & \text{otherwise}\end{cases}$$
For a negative perturbation, i.e. an added perturbation $\theta<0$:

$$S(X,t)[i]=\begin{cases}0 & \text{if } \dfrac{\partial F_t(X)}{\partial X_i}>0 \ \text{or} \ \sum\limits_{j\ne t}\dfrac{\partial F_j(X)}{\partial X_i}<0 \\[2ex] \left(\dfrac{\partial F_t(X)}{\partial X_i}\right)\left|\sum\limits_{j\ne t}\dfrac{\partial F_j(X)}{\partial X_i}\right| & \text{otherwise}\end{cases}$$
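For intuition, here is a minimal sketch of this single-pixel saliency map; the names are mine, with `grad_target[i]` standing for $\partial F_t(X)/\partial X_i$ and `grad_other[i]` for $\sum_{j\ne t}\partial F_j(X)/\partial X_i$. The score is written as $-\alpha\beta$, which is positive for every pixel that passes the sign test and matches the scoring used in the batched implementation further below:

```python
import torch

def saliency_map_single(grad_target, grad_other, theta):
    # Sign test from the two formulas above
    if theta > 0:
        mask = (grad_target > 0) & (grad_other < 0)
    else:
        mask = (grad_target < 0) & (grad_other > 0)
    # -grad_target * grad_other is positive wherever mask holds, so argmax
    # picks the pixel with the largest |dF_t/dX_i| * |sum_{j!=t} dF_j/dX_i|
    return mask.float() * (-grad_target * grad_other)

# Illustration with random gradients for a flattened 28x28 image
g_t, g_o = torch.randn(784), torch.randn(784)
best_pixel = torch.argmax(saliency_map_single(g_t, g_o, theta=1.0))
```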
Once $S(X,t)$ has been computed, we know which pixel positions, when changed, have the greatest influence on the target class $t$.
In practice, the authors found it difficult to find a single feature that satisfies these conditions, so they propose an alternative: use the saliency map to search for the pair of input features with the greatest influence on the desired classifier output, i.e. compute two features at a time. In short, each iteration modifies just one pixel pair, chosen to satisfy:

$$\arg\max_{(p_1,p_2)}\left(\sum\limits_{i=p_1,p_2}\dfrac{\partial F_t(X)}{\partial X_i}\right)\times\left|\sum\limits_{i=p_1,p_2}\sum\limits_{j\ne t}\dfrac{\partial F_j(X)}{\partial X_i}\right|$$
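A brute-force version of this pair search (for the $\theta>0$ case) can be written with two loops; the implementation below vectorizes exactly this computation through broadcasting. All names in this sketch are my own:

```python
import torch

def best_pixel_pair(grad_target, grad_other):
    # grad_target[i] = dF_t/dX_i, grad_other[i] = sum_{j != t} dF_j/dX_i
    n = grad_target.numel()
    best_score, best_pair = 0.0, None
    for p1 in range(n):
        for p2 in range(p1 + 1, n):
            alpha = grad_target[p1] + grad_target[p2]
            beta = grad_other[p1] + grad_other[p2]
            if alpha > 0 and beta < 0:  # sign test for theta > 0
                score = (alpha * beta.abs()).item()
                if score > best_score:
                    best_score, best_pair = score, (p1, p2)
    return best_pair
```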
My analysis of the relevant code is written in the comments below.
import torch
import torch.nn as nn
from ..attack import Attack
class JSMA(Attack):
r"""
Jacobian Saliency Map Attack in the paper 'The Limitations of Deep Learning in Adversarial Settings'
[https://arxiv.org/abs/1511.07528v1]
This includes Algorithm 1 and 3 in v1
Code is from
[https://github.com/BorealisAI/advertorch/blob/master/advertorch/attacks/jsma.py]
Distance Measure : Linf
Arguments:
model (nn.Module): model to attack.
        num_classes: number of classes.
        gamma: highest percentage of pixels that can be modified.
        theta: perturbation length; the range is either [theta, 0] or [0, theta].
Shape:
- images: :math:`(N, C, H, W)` where `N = number of batches`, `C = number of channels`, `H = height` and `W = width`. It must have a range [0, 1].
- labels: :math:`(N)` where each value :math:`y_i` is :math:`0 \leq y_i \leq` `number of labels`.
- output: :math:`(N, C, H, W)`.
Examples::
>>> attack = torchattacks.JSMA(model, num_classes=10, gamma=1.0, theta=1.0)
>>> adv_images = attack(images, labels)
"""
def __init__(self, model, num_classes=10, gamma=1.0, theta=1.0):
super().__init__("JSMA", model)
        self.num_classes = num_classes  # number of classes
        self.gamma = gamma  # highest percentage of pixels that can be modified
        self.theta = theta  # perturbation length; the range is either [theta, 0] or [0, theta]
        self.supported_mode = ['default']  # the attack is targeted by default
    # Compute the gradient wrt x of the confidence of class output_class in the model's output
def jacobian(self, model, x, output_class):
r"""
Compute the output_class'th row of a Jacobian matrix. In other words,
compute the gradient wrt to the output_class.
Return output_class'th row of the Jacobian matrix wrt x.
Arguments:
model: forward pass function.
x: input tensor.
output_class: the output class we want to compute the gradients.
"""
xvar = x.detach().clone().requires_grad_()
scores = model(xvar)
# compute gradients for the class output_class wrt the input x
# using backpropagation
torch.sum(scores[:, output_class]).backward()
return xvar.grad
    # Compute the forward derivative
    def compute_forward_derivative(self, adv_images, labels):
        # Compute the gradient of every class score wrt adv_images
        # jacobians has shape (num_classes, batch size, channels, height, width)
        jacobians = torch.stack([self.jacobian(
            self.model, adv_images, adv_labels) for adv_labels in range(self.num_classes)])
        # In grads, dim 0 indexes the class and dim 1 indexes the images in the batch
        grads = jacobians.view((jacobians.shape[0], jacobians.shape[1], -1))
        # Gradient of the target label's score wrt adv_images
        # grads_target has shape (len(adv_images), number of pixels per image)
        grads_target = grads[labels, range(len(labels)), :]
        # Summed gradient of all the other labels' scores wrt adv_images (PyTorch broadcasting)
        # grads_other has shape (len(adv_images), number of pixels per image)
        grads_other = grads.sum(dim=0) - grads_target
return grads_target, grads_other
    # Sum gradients over pixel pairs, again using PyTorch broadcasting:
    # the last two dims of grads are reshaped to (dim_x, 1) and (1, dim_x), so the
    # sum has shape (dim_x, dim_x) and entry (i, j) is the summed gradient of pair (i, j)
    def sum_pair(self, grads, dim_x):
        return grads.view(-1, dim_x, 1) + grads.view(-1, 1, dim_x)
    # The same broadcasting trick with a logical AND instead of a sum:
    # entry (i, j) is True only if both pixel i and pixel j satisfy cond
    def and_pair(self, cond, dim_x):
        return cond.view(-1, dim_x, 1) & cond.view(-1, 1, dim_x)
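    # Illustrative example (mine, not part of the original code): with
    # grads = [[a, b]] and dim_x = 2, sum_pair returns
    #     [[[a + a, a + b],
    #       [b + a, b + b]]],
    # i.e. entry (i, j) holds alpha (or beta) for the pixel pair (i, j).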
    # Compute the saliency map
    def saliency_map(self, search_space, grads_target, grads_other):
        dim_x = search_space.shape[1]  # total number of pixels per image
# alpha in Algorithm 3 line 2
gradsum_target = self.sum_pair(grads_target, dim_x)
# beta in Algorithm 3 line 3
gradsum_other = self.sum_pair(grads_other, dim_x)
if self.theta > 0:
            # Pixel pairs whose summed gradient is positive for the target class
            # and negative for the other classes
            scores_mask = (torch.gt(gradsum_target, 0) &
                           torch.lt(gradsum_other, 0))
        else:
            # Pixel pairs whose summed gradient is negative for the target class
            # and positive for the other classes
            scores_mask = (torch.lt(gradsum_target, 0) &
                           torch.gt(gradsum_other, 0))
        # Remove pixels that have already been used
        scores_mask &= self.and_pair(search_space.ne(0), dim_x)
        # Remove pairs made of the same pixel twice, i.e. the diagonal
        scores_mask[:, range(dim_x), range(dim_x)] = 0
        # valid indicates whether each adv_image still contains a qualifying pixel pair
        valid = torch.any(scores_mask.view(-1, dim_x * dim_x), dim=1)
        # Score of every pixel pair, i.e. alpha * beta in the paper
        scores = scores_mask.float() * (-gradsum_target * gradsum_other)
        # Flat index of the highest-scoring pixel pair
        best = torch.max(scores.view(-1, dim_x * dim_x), 1)[1]
        # The remainder of the flat index modulo dim_x gives p1
        p1 = torch.remainder(best, dim_x)
        # Integer division of the flat index by dim_x gives p2
        p2 = torch.div(best, dim_x, rounding_mode='floor')
        return p1, p2, valid
    # Update the image
    def modify_adv_images(self, adv_images, batch_size, cond, p1, p2):
        ori_shape = adv_images.shape
        adv_images = adv_images.view(batch_size, -1)
        # Update the chosen pixels, i.e. add theta to them
        for idx in range(batch_size):
            if cond[idx] != 0:
                adv_images[idx, p1[idx]] += self.theta
                adv_images[idx, p2[idx]] += self.theta
        # Clip the image so it stays within [0, 1]
        adv_images = torch.clamp(adv_images, min=0, max=1)
        adv_images = adv_images.view(ori_shape)
        return adv_images
    # Update the search space, i.e. set the chosen pixel positions to 0
    def update_search_space(self, search_space, p1, p2, cond):
        for idx in range(len(cond)):
            if cond[idx] != 0:
                search_space[idx, p1[idx]] -= 1
                search_space[idx, p2[idx]] -= 1
    # images is a batch of images; labels holds the target labels of the attack
def forward(self, images, labels):
r"""
Overridden.
"""
self._check_inputs(images)
images = images.clone().detach().to(self.device)
labels = labels.clone().detach().to(self.device)
        adv_images = images
        batch_size = images.shape[0]
        dim_x = int(torch.prod(torch.tensor(images.shape[1:])))  # total number of pixels per image
        max_iters = int(dim_x * self.gamma / 2)  # maximum number of iterations
        search_space = images.new_ones(batch_size, dim_x, dtype=int)  # search space of size batch_size x dim_x, initialized to all ones
        current_step = 0  # current iteration
        adv_pred = torch.argmax(self.get_logits(adv_images), 1)  # predicted label of each image
# Algorithm 1
        while (torch.any(labels != adv_pred) and current_step < max_iters):
            # Compute the forward derivative
            grads_target, grads_other = self.compute_forward_derivative(adv_images, labels)
            # Algorithm 3
            # Get the pixel pair (p1, p2); valid indicates whether each image in the
            # batch still has a qualifying pair
            p1, p2, valid = self.saliency_map(search_space, grads_target, grads_other)
            # Only samples for which cond is True get their pixels updated
            cond = (labels != adv_pred) & valid
            self.update_search_space(search_space, p1, p2, cond)  # update the search space
            # Update the images
            adv_images = self.modify_adv_images(adv_images, batch_size, cond, p1, p2)
            adv_pred = torch.argmax(self.get_logits(adv_images), 1)
            current_step += 1
        # Clip the images so they stay within [0, 1]
        adv_images = torch.clamp(adv_images, min=0, max=1)
return adv_images
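Finally, a minimal usage sketch; the model, the data loader, and the way target labels are chosen are my own assumptions, not prescribed by the paper. Since `forward` treats `labels` as the target labels, you pass the class you want each image to be misclassified as:

```python
import torch
import torchattacks

# Assumed setup: `model` is a trained 10-class classifier with inputs in [0, 1]
# and `test_loader` is a DataLoader over (images, labels) pairs
attack = torchattacks.JSMA(model, num_classes=10, gamma=1.0, theta=1.0)

images, labels = next(iter(test_loader))
target_labels = (labels + 1) % 10  # arbitrary illustrative choice of targets
adv_images = attack(images, target_labels)
```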