Generating MNIST handwritten digits from random noise is a simple GAN task that any of today's mainstream deep learning frameworks can implement with little effort; on the MindSpore platform we treat it as a simple GAN example.
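For orientation, here is a minimal sketch of the two networks involved. The layer sizes and the names gen_network / dis_network are illustrative assumptions, not the exact model of this example:

```python
# A minimal sketch, assuming a fully connected GAN over flattened 28x28
# MNIST images; layer widths are illustrative assumptions.
import mindspore.nn as nn

class Generator(nn.Cell):
    def __init__(self, latent_dim=100):
        super(Generator, self).__init__()
        self.net = nn.SequentialCell([
            nn.Dense(latent_dim, 256), nn.ReLU(),
            nn.Dense(256, 28 * 28), nn.Tanh()])

    def construct(self, z):
        return self.net(z)

class Discriminator(nn.Cell):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.net = nn.SequentialCell([
            nn.Dense(28 * 28, 256), nn.LeakyReLU(),
            nn.Dense(256, 1), nn.Sigmoid()])

    def construct(self, img):
        return self.net(img)

gen_network = Generator()
dis_network = Discriminator()
```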
Key detail 1: define two optimizers, one to update the generator and one to update the discriminator.
```python
# Optimizers: one per sub-network, each owning only that network's parameters
from mindspore import Tensor
from mindspore.nn import Momentum

gen_opt = Momentum(params=gen_network.trainable_params(),
                   learning_rate=Tensor(lr_schedule),
                   momentum=args.momentum,
                   weight_decay=0,
                   loss_scale=args.loss_scale,
                   decay_filter=default_wd_filter)

dis_opt = Momentum(params=dis_network.trainable_params(),
                   learning_rate=Tensor(lr_schedule),
                   momentum=args.momentum,
                   weight_decay=0,
                   loss_scale=args.loss_scale,
                   decay_filter=default_wd_filter)
```
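The snippet references several names defined elsewhere in the training script (lr_schedule, args, default_wd_filter). A sketch of plausible definitions, all hypothetical stand-ins:

```python
# Hypothetical stand-ins for names the optimizer snippet assumes; the real
# script builds these from its own arguments and schedule logic.
import argparse
import numpy as np

parser = argparse.ArgumentParser()
parser.add_argument('--momentum', type=float, default=0.9)
parser.add_argument('--loss_scale', type=float, default=1.0)
args = parser.parse_args()

lr_schedule = np.full(10000, 0.001, dtype=np.float32)  # one lr per step

def default_wd_filter(param):
    # apply weight decay only to non-bias, non-norm parameters
    return 'bias' not in param.name and 'beta' not in param.name \
        and 'gamma' not in param.name
```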
Key detail 2: define two TrainOneStepCells to compute the gradients.
The discriminator's TrainOneStepCell is straightforward: during training, simply feed in (generated image, 0) and (real image, 1).
```python
# Import paths follow the older MindSpore layout this example targets
from mindspore import ParameterTuple
from mindspore.nn import Cell
from mindspore.nn.wrap.grad_reducer import DistributedGradReducer
from mindspore.ops import composite as C
from mindspore.ops import functional as F
from mindspore.ops import operations as P
from mindspore.train.parallel_utils import ParallelMode
from mindspore.parallel._utils import (_get_parallel_mode, _get_mirror_mean,
                                       _get_device_num)


class TrainOneStepCellDIS(Cell):
    """One training step for the discriminator: loss -> grads -> update."""
    def __init__(self, network, optimizer, sens=1.0):
        # auto_prefix=False keeps parameter names unprefixed, so the
        # discriminator weights stay shared with TrainOneStepCellGEN
        super(TrainOneStepCellDIS, self).__init__(auto_prefix=False)
        self.network = network
        self.weights = ParameterTuple(network.trainable_params())
        self.optimizer = optimizer
        self.grad = C.GradOperation('grad', get_by_list=True, sens_param=True)
        self.sens = sens
        self.reducer_flag = False
        self.grad_reducer = None
        parallel_mode = _get_parallel_mode()
        if parallel_mode in (ParallelMode.DATA_PARALLEL, ParallelMode.HYBRID_PARALLEL):
            self.reducer_flag = True
        if self.reducer_flag:
            mean = _get_mirror_mean()
            degree = _get_device_num()
            self.grad_reducer = DistributedGradReducer(optimizer.parameters, mean, degree)

    def construct(self, loss, img, label):
        # a tensor shaped like the loss, filled with sens, seeds the backward pass
        sens = P.Fill()(P.DType()(loss), P.Shape()(loss), self.sens)
        grads = self.grad(self.network, self.weights)(img, label, sens)

        if self.reducer_flag:
            # apply grad reducer on grads
            grads = self.grad_reducer(grads)

        return F.depend(loss, self.optimizer(grads))
```
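A sketch of how this step cell might be driven. The loss-cell wiring, the BCELoss choice, and the batch names are assumptions; nn.WithLossCell wraps the discriminator so that self.network(img, label) returns a scalar loss:

```python
# Usage sketch (names and loss choice are assumptions, not the exact script)
import numpy as np
import mindspore.nn as nn
from mindspore import Tensor

batch = 32
z = Tensor(np.random.randn(batch, 100).astype(np.float32))
real_img = Tensor(np.random.rand(batch, 28 * 28).astype(np.float32))
real_label = Tensor(np.ones((batch, 1)).astype(np.float32))
fake_label = Tensor(np.zeros((batch, 1)).astype(np.float32))

loss_fn = nn.BCELoss(reduction='mean')
dis_with_loss = nn.WithLossCell(dis_network, loss_fn)
dis_train_step = TrainOneStepCellDIS(dis_with_loss, dis_opt)

real_loss = dis_with_loss(real_img, real_label)   # (real image, 1)
dis_train_step(real_loss, real_img, real_label)

fake_img = gen_network(z)
fake_loss = dis_with_loss(fake_img, fake_label)   # (generated image, 0)
dis_train_step(fake_loss, fake_img, fake_label)
```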
Training the generator is a bit more involved. The generator cannot produce a loss by itself; its output must be passed through the discriminator to produce a loss, yet during the update only the generator's parameters should change. When building the generator's TrainOneStepCell, the discriminator network is therefore passed in as well, and we compute the discriminator's gradient with respect to its input. That input is exactly the generator's output, so this gradient serves as the sensitivity for the generator's own backward pass, which updates the generator's parameters. Note that during this step we feed in (generated image, 1).
```python
class TrainOneStepCellGEN(Cell):
    """One training step for the generator, back-propagating through the
    discriminator (postnetwork) into the generator (network)."""
    def __init__(self, network, optimizer, postnetwork, sens=3.0):
        super(TrainOneStepCellGEN, self).__init__(auto_prefix=False)
        self.network = network          # generator
        self.postnetwork = postnetwork  # discriminator (with loss)
        self.weights = ParameterTuple(network.trainable_params())
        self.postweights = ParameterTuple(postnetwork.trainable_params())
        self.optimizer = optimizer
        self.grad = C.GradOperation('grad', get_by_list=True, sens_param=True)
        # get_all=True also returns the gradients w.r.t. the inputs
        self.postgrad = C.GradOperation('grad', get_all=True, get_by_list=True, sens_param=True)
        self.sens = sens
        self.reducer_flag = False
        self.grad_reducer = None
        parallel_mode = _get_parallel_mode()

        if parallel_mode in (ParallelMode.DATA_PARALLEL, ParallelMode.HYBRID_PARALLEL):
            self.reducer_flag = True

        if self.reducer_flag:
            mean = _get_mirror_mean()
            degree = _get_device_num()
            self.grad_reducer = DistributedGradReducer(optimizer.parameters, mean, degree)

        self.cast = P.Cast()
        self.print = P.Print()

    def construct(self, loss, z, fake_img, inverse_fake_label):
        sens_d = P.Fill()(P.DType()(loss), P.Shape()(loss), self.sens)
        # grads_d[0] holds the input gradients; grads_d[0][0] is the gradient
        # w.r.t. fake_img, i.e. w.r.t. the generator's output
        grads_d = self.postgrad(self.postnetwork, self.postweights)(fake_img, inverse_fake_label, sens_d)
        sens_g = grads_d[0][0]
        # propagate that gradient through the generator's parameters only
        grads_g = self.grad(self.network, self.weights)(z, sens_g)

        if self.reducer_flag:
            # apply grad reducer on grads
            grads_g = self.grad_reducer(grads_g)

        return F.depend(loss, self.optimizer(grads_g))
```
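A matching usage sketch for the generator step, reusing the assumed names from the discriminator sketch above. Note the inverted label: the generator is trained against (generated image, 1) so that the gradient pushes the discriminator's output toward "real":

```python
# Usage sketch (names are assumptions): the generator step reuses the
# discriminator-with-loss cell from above as its postnetwork.
gen_train_step = TrainOneStepCellGEN(gen_network, gen_opt, dis_with_loss)

fake_img = gen_network(z)
inverse_fake_label = real_label        # all-ones labels: (generated image, 1)
gen_loss = dis_with_loss(fake_img, inverse_fake_label)
gen_train_step(gen_loss, z, fake_img, inverse_fake_label)
```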
Note that in a static graph, variables with the same name are the same variable. To guarantee that the discriminator parameters inside TrainOneStepCellDIS and TrainOneStepCellGEN are identical, auto_prefix must be set to False. Then the TrainOneStepCell's namespace adds no new prefix to the parameter names, and weight sharing is achieved.
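One way to see the effect is to print the parameter names of a wrapped cell (a sketch; the exact names depend on how the sub-cells are declared):

```python
# Sketch: with auto_prefix=True, a wrapper cell prepends its own scope to
# child parameter names, so the "same" discriminator weight would appear
# under two different names in the two step cells and stop being shared.
for p in dis_train_step.trainable_params():
    print(p.name)  # with auto_prefix=False these names carry no extra prefix
```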
Sometimes a network needs several layers to share their weights: the forward pass uses the same weight in each place, and the backward pass updates it only once.
Example: the network uses a Conv+ReLU block in several places, and we want these blocks to share one set of weights.
1. PyTorch implementation
```python
# Passing the same Conv2d instance into each Sequential shares its parameters
conv1x1 = nn.Conv2d(16, 16, 1, bias=True)
self.predict_conv_relu = nn.Sequential(conv1x1, nn.ReLU())
self.predict_conv_relu2 = nn.Sequential(conv1x1, nn.ReLU())
self.predict_conv_relu3 = nn.Sequential(conv1x1, nn.ReLU())
```
In PyTorch, if the modules passed into Sequential are the same Module instance, their parameters are shared.
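A quick check of this behavior (assuming the three blocks above live on a module instance named model, a hypothetical name):

```python
# Quick check: the three Sequential blocks expose only two distinct tensors,
# the shared weight and the shared bias of conv1x1.
params = {id(p) for block in (model.predict_conv_relu,
                              model.predict_conv_relu2,
                              model.predict_conv_relu3)
          for p in block.parameters()}
print(len(params))  # 2
```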
2. MindSpore implementation: first initialize the weights of predict_conv_relu, then assign them to the other layers that should share them.
```python
# Each branch is its own SequentialCell; assign the first branch's Parameters
self.predict_conv_relu2[0].weight = self.predict_conv_relu[0].weight
self.predict_conv_relu2[0].bias = self.predict_conv_relu[0].bias
self.predict_conv_relu3[0].weight = self.predict_conv_relu[0].weight
self.predict_conv_relu3[0].bias = self.predict_conv_relu[0].bias
```
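The sharing can be verified the same way in MindSpore (again assuming a hypothetical module instance named model):

```python
# Quick check: after the assignments, both branches hold the same Parameter.
w1 = model.predict_conv_relu[0].weight
w2 = model.predict_conv_relu2[0].weight
print(w1 is w2)  # True
```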
Sometimes the forward pass of training needs to modify the convolution weights, for example to apply a Mask operation.
Take the Conv's weight out in __init__, then modify it in construct:
```python
import numpy as np
import mindspore.nn as nn
from mindspore import Tensor, ParameterTuple
from mindspore.ops import operations as P


class MaskedConv2d(nn.Cell):
    def __init__(self, in_channels, out_channels, kernel_size):
        super(MaskedConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, weight_init='ones')
        self.conv_p = self.conv.conv2d  # the raw Conv2D primitive inside nn.Conv2d
        self.p = ParameterTuple((self.conv.weight,))
        # build a causal mask: zero the center row from its center onward,
        # and zero every row below the center
        self.mask = np.ones_like(self.conv.weight.data.asnumpy())
        self.mask[:, :, kernel_size // 2, kernel_size // 2:] = 0
        self.mask[:, :, kernel_size // 2 + 1:] = 0
        self.mask = Tensor(self.mask)
        self.mul = P.Mul()

    def construct(self, x):
        # mask the weight, write it back, then convolve with the updated weight
        masked_weight = self.mul(self.p[0], self.mask)
        P.Assign()(self.p[0], masked_weight)
        update_weight = self.p[0] * 1  # multiply by 1 to read back the assigned value
        return self.conv_p(x, update_weight)


class Context(nn.Cell):
    def __init__(self, N=3):
        super(Context, self).__init__()
        self.mask_conv = MaskedConv2d(N, N * 2, kernel_size=5)

    def construct(self, x):
        x = self.mask_conv(x)
        return x
```
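A usage sketch on a dummy input; with Conv2d's default pad_mode='same' the spatial size is preserved, and weight_init='ones' makes the masking easy to inspect:

```python
# Usage sketch: run the masked conv and inspect the written-back weight.
import numpy as np
from mindspore import Tensor

net = Context(N=3)
x = Tensor(np.random.randn(1, 3, 16, 16).astype(np.float32))
y = net(x)
print(y.shape)                                   # (1, 6, 16, 16)
print(net.mask_conv.conv.weight.data.asnumpy())  # masked positions are now 0
```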