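In 2017, adversarial attack and defense research in AI gained its first open-source algorithm platform: CleverHans ("Clever Hans").
CleverHans: a Python library to benchmark machine learning systems' vulnerability to adversarial examples
Developed and open-sourced by Ian Goodfellow and his team, the CleverHans framework modularizes attack and defense algorithms, so researchers around the world can quickly develop new adversarial example generation and defense algorithms on top of it.
GitHub link: https://github.com/cleverhans-lab/cleverhans
Here is the directory structure: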
cleverhans/: attack implementations
cleverhans_v3.1.0/: the 3.1.0 release
tutorials/: scripts demonstrating the features of CleverHans (demos and tutorials)
defenses/: defense implementations
Each folder provides implementations for three frameworks: PyTorch, TensorFlow, and JAX.
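Installation:
pip install cleverhans  # the latest version at the time of writing is 4.0.0
We start with an example from the official **tutorials/** folder.
The code is located at /tutorials/torch/cifar10_tutorial.py
1. Import packages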
from absl import app, flags
from easydict import EasyDict
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from cleverhans.torch.attacks.fast_gradient_method import fast_gradient_method
from cleverhans.torch.attacks.projected_gradient_descent import (
    projected_gradient_descent,
)
FLAGS = flags.FLAGS
FGSM and PGD are already implemented in the library, so they can be called directly.
2. Define a CNN model
class CNN(torch.nn.Module):
    """Basic CNN architecture."""

    def __init__(self, in_channels=1):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, 64, 8, 1)
        self.conv2 = nn.Conv2d(64, 128, 6, 2)
        self.conv3 = nn.Conv2d(128, 128, 5, 2)
        self.fc = nn.Linear(128 * 3 * 3, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        x = x.view(-1, 128 * 3 * 3)
        x = self.fc(x)
        return x
The model is straightforward: a convolutional layer followed by a ReLU activation, repeated three times, and then a final linear layer.
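The `128 * 3 * 3` input size of the linear layer follows from the convolution arithmetic. The sketch below recomputes it, assuming 32x32 CIFAR-10 inputs, no padding, and no dilation (the `nn.Conv2d` defaults). The helper `conv_out` is a hypothetical name introduced only for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a square convolution without dilation."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel_size, stride) of conv1, conv2, conv3 as defined in the CNN above
layers = [(8, 1), (6, 2), (5, 2)]

size = 32  # CIFAR-10 images are 32x32
sizes = [size]
for kernel, stride in layers:
    size = conv_out(size, kernel, stride)
    sizes.append(size)

# conv3 has 128 output channels, so the flattened feature count is:
flat_features = 128 * size * size
print(sizes, flat_features)
```

By the same formula a 28x28 input would shrink to 2x2 after the third convolution, so the size of `self.fc` would presumably need to change for other image sizes.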
3. Load the data
def ld_cifar10():
    """Load training and test data."""
    train_transforms = torchvision.transforms.Compose(
        [torchvision.transforms.ToTensor()]
    )
    test_transforms = torchvision.transforms.Compose(
        [torchvision.transforms.ToTensor()]
    )
    train_dataset = torchvision.datasets.CIFAR10(
        root="/tmp/data", train=True, transform=train_transforms, download=True
    )
    test_dataset = torchvision.datasets.CIFAR10(
        root="/tmp/data", train=False, transform=test_transforms, download=True
    )
    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=128, shuffle=True, num_workers=2
    )
    test_loader = torch.utils.data.DataLoader(
        test_dataset, batch_size=128, shuffle=False, num_workers=2
    )
    return EasyDict(train=train_loader, test=test_loader)
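The only transform applied is ToTensor. The function downloads the CIFAR-10 dataset and wraps it in DataLoaders with batch_size set to 128. It returns the training and test loaders in an EasyDict, which lets you access the entries of a dict as if they were attributes.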
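To make the attribute-style access concrete, here is a minimal sketch of the idea. `AttrDict` is a hypothetical stand-in written for this illustration, not the EasyDict implementation; the real library does more, such as handling nested dicts.

```python
class AttrDict(dict):
    """Hypothetical stand-in for EasyDict: dict keys readable as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails,
        # so regular dict methods such as .keys() keep working.
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc

    def __setattr__(self, name, value):
        # Store attribute assignments as dict entries.
        self[name] = value


# Same usage pattern as the accuracy report later in the tutorial
report = AttrDict(nb_test=0, correct=0)
report.nb_test += 128
report.correct += 100
accuracy = report.correct / report.nb_test * 100.0
print(report, accuracy)
```

The tutorial uses this pattern twice: `data.train` / `data.test` for the loaders, and the `report` counters during evaluation.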
4. Main function (running the attacks)
4.1 Load the data, then define the loss function and the optimizer
def main():
    # Load training and test data
    data = ld_cifar10()

    # Instantiate model, loss, and optimizer for training
    net = CNN(in_channels=3)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    if device == "cuda":
        net = net.cuda()
    loss_fn = torch.nn.CrossEntropyLoss(reduction="mean")
    optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

    # Train vanilla model
    net.train()
4.2 Train the model and print the loss
    for epoch in range(1, FLAGS.nb_epochs + 1):
        train_loss = 0.0
        for x, y in data.train:
            x, y = x.to(device), y.to(device)
            if FLAGS.adv_train:
                # Replace clean example with adversarial example for adversarial training
                x = projected_gradient_descent(net, x, FLAGS.eps, 0.01, 40, np.inf)
            optimizer.zero_grad()
            loss = loss_fn(net(x), y)
            loss.backward()
            optimizer.step()
            train_loss += loss.item()
        print(
            "epoch: {}/{}, train loss: {:.3f}".format(
                epoch, FLAGS.nb_epochs, train_loss
            )
        )
4.3 Generate adversarial examples with FGSM and PGD, and compare the accuracy against clean examples
    # Evaluate on clean and adversarial data
    net.eval()
    report = EasyDict(nb_test=0, correct=0, correct_fgm=0, correct_pgd=0)
    for x, y in data.test:
        x, y = x.to(device), y.to(device)
        x_fgm = fast_gradient_method(net, x, FLAGS.eps, np.inf)
        x_pgd = projected_gradient_descent(net, x, FLAGS.eps, 0.01, 40, np.inf)
        # model prediction on clean examples
        _, y_pred = net(x).max(1)
        # model prediction on FGM adversarial examples
        _, y_pred_fgm = net(x_fgm).max(1)
        # model prediction on PGD adversarial examples
        _, y_pred_pgd = net(x_pgd).max(1)
        report.nb_test += y.size(0)
        report.correct += y_pred.eq(y).sum().item()
        report.correct_fgm += y_pred_fgm.eq(y).sum().item()
        report.correct_pgd += y_pred_pgd.eq(y).sum().item()
    print(
        "test acc on clean examples (%): {:.3f}".format(
            report.correct / report.nb_test * 100.0
        )
    )
    print(
        "test acc on FGM adversarial examples (%): {:.3f}".format(
            report.correct_fgm / report.nb_test * 100.0
        )
    )
    print(
        "test acc on PGD adversarial examples (%): {:.3f}".format(
            report.correct_pgd / report.nb_test * 100.0
        )
    )
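In the `projected_gradient_descent(net, x, FLAGS.eps, 0.01, 40, np.inf)` calls above, the positional arguments after `x` are the total budget `eps`, the per-step size `eps_iter` (0.01), the number of iterations `nb_iter` (40), and the norm. The sketch below illustrates the general L-infinity PGD idea in plain Python on a toy logistic model. It is not the CleverHans implementation: the weights `W`, `B` and the helpers `loss_grad` and `pgd_linf` are hypothetical, and the random start and input clipping are left out to keep it short and deterministic.

```python
import math

# Hypothetical toy logistic-regression model with fixed weights
W = [2.0, -3.0, 1.0]
B = 0.5


def loss_grad(x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    z = sum(w * xi for w, xi in zip(W, x)) + B
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * w for w in W]


def sign(v):
    return (v > 0) - (v < 0)


def pgd_linf(x, y, eps, eps_iter, nb_iter):
    """Iterated signed-gradient steps, projected back into the eps-ball around x."""
    x_adv = list(x)
    for _ in range(nb_iter):
        grad = loss_grad(x_adv, y)
        # Step of size eps_iter in the direction that increases the loss
        x_adv = [xa + eps_iter * sign(g) for xa, g in zip(x_adv, grad)]
        # Projection: keep every coordinate within eps of the original input
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv


x = [0.2, 0.4, 0.6]
x_adv = pgd_linf(x, y=1.0, eps=0.3, eps_iter=0.01, nb_iter=40)
print(x_adv)
```

With the tutorial's values, `eps_iter * nb_iter` is 0.4, which is larger than the default `eps` of 0.3, so the projection step is what keeps the final perturbation inside the budget.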
Now look at the attack parameters. Taking FGSM as an example, the fast_gradient_method function provided by CleverHans is defined as follows:
def fast_gradient_method(
    model_fn,
    x,
    eps,
    norm,
    clip_min=None,
    clip_max=None,
    y=None,
    targeted=False,
    sanity_checks=False,
):
    ...  # body omitted; see the CleverHans source
The parameters are documented in the source code.
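As a rough illustration of what `eps`, `clip_min`, `clip_max`, `y`, and `targeted` control, here is a minimal L-infinity FGSM sketch in plain Python on a toy logistic model. It is an assumption-laden sketch rather than the library's code: `W`, `B`, `predict`, `loss_grad`, and `fgsm_linf` are hypothetical names, only the infinity norm is handled, and `y` is required here, whereas the library allows `y=None`.

```python
import math

# Hypothetical toy logistic-regression model with fixed weights
W = [2.0, -3.0, 1.0]
B = 0.5


def predict(x):
    """Probability of class 1 under the toy model."""
    z = sum(w * xi for w, xi in zip(W, x)) + B
    return 1.0 / (1.0 + math.exp(-z))


def loss_grad(x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    return [(predict(x) - y) * w for w in W]


def fgsm_linf(x, eps, y, clip_min=None, clip_max=None, targeted=False):
    """Single signed-gradient step of size eps, optionally clipped to a valid range."""
    grad = loss_grad(x, y)
    # Untargeted: increase the loss on y. Targeted: decrease the loss on target y.
    direction = -1.0 if targeted else 1.0
    x_adv = [xi + direction * eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]
    if clip_min is not None:
        x_adv = [max(v, clip_min) for v in x_adv]
    if clip_max is not None:
        x_adv = [min(v, clip_max) for v in x_adv]
    return x_adv


x = [0.2, 0.4, 0.6]
x_adv = fgsm_linf(x, eps=0.3, y=1.0, clip_min=0.0, clip_max=1.0)
print(predict(x), predict(x_adv))
```

The tutorial calls `fast_gradient_method` without `clip_min` and `clip_max`, so the adversarial images are not forced back into the [0, 1] pixel range that ToTensor produces; passing both bounds may be worth considering when valid images are required.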
5. Program entry point
if __name__ == "__main__":
    flags.DEFINE_integer("nb_epochs", 8, "Number of epochs.")
    flags.DEFINE_float("eps", 0.3, "Total epsilon for FGM and PGD attacks.")
    flags.DEFINE_bool(
        "adv_train", False, "Use adversarial training (on PGD adversarial examples)."
    )
    app.run(main)
6. Experimental results