• 【MindSpore易点通】Network Construction Experience Summary, Part 2


    Stopping gradient backpropagation, and backpropagating gradients without updating weights, in MindSpore

    Background

    In training it is common to need a layer's gradient not to backpropagate (e.g., mutual learning), or to backpropagate gradients without updating certain weights (e.g., fine-tuning).

    Summary

    1. To stop gradients from backpropagating, use the stop_gradient interface. Code example:
    import mindspore.nn as nn
    from mindspore.ops import operations as P
    from mindspore.ops import functional as F
    from mindspore.nn.loss.loss import _Loss
    from mindspore import Tensor, Parameter
    from mindspore.common import dtype as mstype
    from mindspore.ops.functional import stop_gradient

    class Contrastive(_Loss):
        def __init__(self, args):
            super(Contrastive, self).__init__()
            self.args = args
            self.stride_slice = P.StridedSlice()
            self.pow = P.Pow()
            self.sum = P.CumSum()
            self.dist_weight = Tensor(4, dtype=mstype.float32)
            emb_list = list(range(args.per_batch_size))
            emb1_list = emb_list[0::2]
            emb2_list = emb_list[1::2]
            self.emb1_param = Tensor(emb1_list, dtype=mstype.int32)
            self.emb2_param = Tensor(emb2_list, dtype=mstype.int32)
            self.add = P.TensorAdd()
            self.div = P.RealDiv()
            self.cast = P.Cast()
            self.gatherv2 = P.GatherV2()

        def construct(self, nembeddings):
            nembeddings_shape = F.shape(nembeddings)
            emb1 = self.gatherv2(nembeddings, self.emb1_param, 0)
            emb2 = self.gatherv2(nembeddings, self.emb2_param, 0)
            emb2_detach = stop_gradient(emb2)  # stop gradient backpropagation through emb2
            emb3 = emb1 - emb2_detach
            pow_emb3 = emb3 * emb3
            dist = self.sum(pow_emb3, 1)
            return self.div(dist * self.dist_weight,
                            self.cast(F.scalar_to_array(nembeddings_shape[0]), mstype.float32))
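    A minimal toy sketch (hypothetical, only to illustrate the behavior) that checks the gradients with GradOperation; the gradient with respect to the detached input comes back as zero:

    import numpy as np
    import mindspore.nn as nn
    from mindspore import Tensor
    from mindspore.ops import composite as C
    from mindspore.ops.functional import stop_gradient

    class Toy(nn.Cell):
        def construct(self, x, y):
            y = stop_gradient(y)   # gradients stop here; y behaves like a constant
            return x * y

    grad_all = C.GradOperation(get_all=True)
    x = Tensor(np.array([2.0], np.float32))
    y = Tensor(np.array([3.0], np.float32))
    dx, dy = grad_all(Toy())(x, y)
    print(dx)  # [3.] -- d(x*y)/dx = y
    print(dy)  # [0.] -- blocked by stop_gradient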
    2. To backpropagate gradients without updating weights, set requires_grad=False. Code example (assuming the weights of the layer named conv1 are to be frozen):
    for param in net.trainable_params():
        if 'conv1' in param.name:
            param.requires_grad = False
        else:
            param.requires_grad = True
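    Note that trainable_params() only returns parameters whose requires_grad is True, so run this loop before constructing the optimizer; the frozen parameters are then excluded from updates while gradients still flow through the layer. A short follow-up sketch (assuming net and the loop above):

    import mindspore.nn as nn

    # After the loop, the conv1 parameters have requires_grad=False and are
    # no longer returned by trainable_params(), so they receive no updates.
    opt = nn.Momentum(params=net.trainable_params(),
                      learning_rate=0.01,
                      momentum=0.9)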

    Configuring the sens parameter when using Loss Scale in MindSpore (Feed mode)

    Background

    Convolution on the D chip only supports FP16 precision, so training on the D chip necessarily runs in mixed precision. To avoid gradient underflow, Loss Scale must be used.

    Summary

    In the Feed-mode flow, the optimizer's loss_scale and TrainOneStepCell's sens must be manually set to the same value:

    opt = nn.Momentum(params=train_net.trainable_params(),
                      learning_rate=lr_iter,
                      momentum=0.9,
                      weight_decay=0.0001,
                      loss_scale=1000.0)
    train_net = TrainOneStepCell(train_net, opt, sens=1000.0)
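    The two values must match because TrainOneStepCell feeds sens in as the initial gradient of backpropagation (scaling every gradient up by that factor), while the optimizer's loss_scale divides the gradients back down by the same factor before applying the update. A minimal runnable sketch, assuming a toy nn.Dense backbone (the network, data, and hyperparameters here are placeholders, not from the original snippet):

    import numpy as np
    import mindspore.nn as nn
    from mindspore import Tensor

    net = nn.Dense(16, 10)                     # placeholder backbone
    loss_fn = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
    loss_net = nn.WithLossCell(net, loss_fn)
    opt = nn.Momentum(params=net.trainable_params(),
                      learning_rate=0.01,
                      momentum=0.9,
                      loss_scale=1000.0)       # divides gradients by 1000.0
    train_net = nn.TrainOneStepCell(loss_net, opt, sens=1000.0)  # scales gradients by 1000.0
    train_net.set_train()

    data = Tensor(np.random.randn(4, 16).astype(np.float32))
    label = Tensor(np.array([1, 0, 3, 2]).astype(np.int32))
    print(train_net(data, label))              # loss value for one training step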

    The input to SequentialCell in MindSpore must be a list of nn.Cell objects

    Background

    PyTorch network definitions often use torch.nn.Sequential to build a sequence of operators; in MindSpore, mindspore.nn.SequentialCell provides this functionality.

    Summary

    The input of mindspore.nn.SequentialCell differs from PyTorch's Sequential: it must be a list of Cells, otherwise unexpected errors occur. Usage example:

    import mindspore.nn as nn

    class MyNet(nn.Cell):
        def __init__(self):
            super(MyNet, self).__init__()
            self.conv = nn.Conv2d(16, 64, 3, pad_mode='pad', padding=0, dilation=2)
            self.bn = nn.BatchNorm2d(64)
            self.relu = nn.ReLU()
            # the nn.Cell objects must be wrapped in a list as the input to SequentialCell
            self.seq = nn.SequentialCell([self.conv, self.bn, self.relu])

        def construct(self, x):
            x = self.seq(x)
            return x
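    A quick smoke test of MyNet (a hypothetical input shape, assuming NCHW layout with 16 input channels):

    import numpy as np
    from mindspore import Tensor
    from mindspore.common import dtype as mstype

    net = MyNet()
    x = Tensor(np.ones((1, 16, 32, 32)), mstype.float32)
    print(net(x).shape)  # (1, 64, 28, 28): 3x3 kernel with dilation 2 and no padding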

    A simple MindSpore implementation of the Transformer's positional encoding

    Background

    The positional encoding method from "Attention Is All You Need" is commonly used in Transformers. The formula is as follows:
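    $PE_{(pos,\,2i)} = \sin\left(pos / 10000^{2i/d_{model}}\right)$
    $PE_{(pos,\,2i+1)} = \cos\left(pos / 10000^{2i/d_{model}}\right)$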

    Summary

    To handle dynamic-shape inputs, and because numpy operations are inconvenient inside mindspore.nn.Cell.construct, the approach is to pre-generate a sufficiently long positional encoding table and then slice it according to the input length.

    import math
    import numpy as np
    import mindspore.nn as nn
    import mindspore.ops.operations as P
    from mindspore import Tensor
    from mindspore.common import dtype as mstype

    class PositionalEncoding(nn.Cell):
        """Positional encoding as in Sec 3.5 https://arxiv.org/pdf/1706.03762.pdf

        :param int dim: dimension of input
        :param int maxlen: upper limit of sequence length
        :param float dropout_rate: dropout rate
        """
        def __init__(self, dim, maxlen=10000, dropout_rate=0.1):
            """Construct a PositionalEncoding object."""
            super(PositionalEncoding, self).__init__()
            xscale = math.sqrt(dim)
            self.dropout = nn.Dropout(1 - dropout_rate)
            self.mul = P.Mul()
            self.add = P.TensorAdd()
            self.shape = P.Shape()
            self.pe = self.position_encoding_table(maxlen, dim)
            self.te = Tensor([xscale, ], mstype.float32)

        def construct(self, x):
            """
            Add positional encoding.

            :param mindspore.Tensor x: batches of inputs (B, len, dim)
            :return: Encoded x (B, len, dim)
            """
            (_, l, _) = self.shape(x)
            pos = self.pe[:, :l, :]
            x = self.mul(x, self.te)
            x = self.add(x, pos)
            x = self.dropout(x)
            return x

        def position_encoding_table(self, max_length, dims):
            """Pre-generate the sinusoidal encoding table of shape (1, max_length, dims)."""
            pe = np.zeros((max_length, dims))
            position = np.arange(0, max_length).reshape((max_length, 1))
            div_term = np.exp(np.arange(0, dims, 2) * (-(math.log(10000.0) / dims)))
            div_term = div_term.reshape((1, div_term.shape[0]))
            pe[:, 0::2] = np.sin(np.matmul(position, div_term))
            pe[:, 1::2] = np.cos(np.matmul(position, div_term))
            pe = pe.reshape((1, max_length, dims))
            pe = Tensor(pe, mstype.float32)
            return pe
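    A short usage sketch (hypothetical sizes, assuming the class above; dropout is disabled to make the check deterministic):

    import numpy as np
    from mindspore import Tensor

    pos_enc = PositionalEncoding(dim=256, maxlen=5000, dropout_rate=0.1)
    pos_enc.set_train(False)   # disable dropout
    x = Tensor(np.random.randn(2, 100, 256).astype(np.float32))  # (B, len, dim)
    y = pos_enc(x)             # x * sqrt(dim) + pe[:, :100, :]
    print(y.shape)             # (2, 100, 256)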

     

  • Original article: https://blog.csdn.net/Kenji_Shinji/article/details/127582971