• Technical Deep Dive | MindSpore NLP Model Migration: The LUKE Model for Reading Comprehension


The LUKE authors train their model with a new pretraining task built on top of BERT's masked language modeling (MLM): predicting randomly masked words and entities in a large entity-annotated corpus retrieved from Wikipedia. They also propose an entity-aware self-attention mechanism, an extension of the Transformer's self-attention that takes the type of each token (word or entity) into account when computing attention scores. The main contributions are:

1. A new contextualized representation method designed specifically for entity-related tasks: LUKE (Language Understanding with Knowledge-based Embeddings). LUKE is trained on a large entity-annotated corpus obtained from Wikipedia to predict randomly masked words and entities.

2. An entity-aware self-attention mechanism, an effective extension of the Transformer's original attention mechanism that considers the type of each token (word or entity) when computing attention scores.

3. LUKE uses RoBERTa as its base pretrained model and is pretrained by jointly optimizing the MLM objective and the proposed task, achieving state-of-the-art results on five popular datasets.

LUKE official source code [1]:

    https://github.com/studio-ousia/luke

LUKE paper (EMNLP 2020) [2]:

    https://aclanthology.org/2020.emnlp-main.523.pdf

Preface

Environment used in this article:

OS: Ubuntu 18

GPU: RTX 3090

MindSpore version: 1.3

Dataset: SQuAD 1.1 (reading comprehension)

Definition of the reading comprehension task:

Machine reading comprehension is an emerging area of question answering (QA): the user provides unstructured text together with a question, and the machine, after reading and comprehending the text, finds the answer within the text to respond to the user.

     01 

Data Processing

Following the original source code, we first process the data into the MindRecord file required by the model.

As in the original code, the RoBERTa tokenizer converts the text into tokens, producing fields such as word_ids and word_segment_ids. When padding, word_ids is padded with 1, entity_position_ids (the entity position information) is padded with -1, and all remaining fields are padded with 0. Since the original code pads sequences to length 256, we do the same here. The padding code is as follows:

import os
import json
import numpy as np
from mindspore.mindrecord import FileWriter

SQUAD_MINDRECORD_FILE = "./data/dev_features.mindrecord"

# pad with 0 / 1 / -1 depending on the field
pad = lambda a, i: a[0:i] if len(a) > i else a + [0] * (i - len(a))
pad1 = lambda a, i: a[0:i] if len(a) > i else a + [1] * (i - len(a))
pad_entity = lambda a, i: a[0:i] if len(a) > i else np.append(a, [-1] * (i - len(a)))

# list_dict is the list of tokenized examples, each containing fields such as
# word_ids, word_segment_ids, entity_ids, ...
# padding
for slist in list_dict:
    slist["entity_attention_mask"] = pad(slist["entity_attention_mask"], 24)
    slist["entity_ids"] = pad(slist["entity_ids"], 24)
    slist["entity_segment_ids"] = pad(slist["entity_segment_ids"], 24)
    slist["word_ids"] = pad1(slist["word_ids"], 256)
    slist["word_segment_ids"] = pad(slist["word_segment_ids"], 256)
    slist["word_attention_mask"] = pad(slist["word_attention_mask"], 256)
    # entity_position padding: pad each entity's position list with -1 into a 24 x 24 matrix
    entity_size = len(slist["entity_position_ids"])
    slist["entity_position_ids"] = np.array(slist["entity_position_ids"])
    temp = [[-1] * 24 for _ in range(24)]
    for i in range(24):
        if i < entity_size - 1:
            temp[i] = pad_entity(slist["entity_position_ids"][i], 24)
    slist["entity_position_ids"] = temp

if os.path.exists(SQUAD_MINDRECORD_FILE):
    os.remove(SQUAD_MINDRECORD_FILE)
    os.remove(SQUAD_MINDRECORD_FILE + ".db")

writer = FileWriter(file_name=SQUAD_MINDRECORD_FILE, shard_num=1)
data_schema = {
    "unique_id": {"type": "int32", "shape": [-1]},
    "word_ids": {"type": "int32", "shape": [-1]},
    "word_segment_ids": {"type": "int32", "shape": [-1]},
    "word_attention_mask": {"type": "int32", "shape": [-1]},
    "entity_ids": {"type": "int32", "shape": [-1]},
    "entity_position_ids": {"type": "int32", "shape": [24, 24]},
    "entity_segment_ids": {"type": "int32", "shape": [-1]},
    "entity_attention_mask": {"type": "int32", "shape": [-1]},
    # "start_positions": {"type": "int32", "shape": [-1]},
    # "end_positions": {"type": "int32", "shape": [-1]}
}
writer.add_schema(data_schema, "it is a preprocessed squad dataset")

data = []
i = 0
for item in list_dict:
    i += 1
    sample = {
        "unique_id": np.array(item["unique_id"], dtype=np.int32),
        "word_ids": np.array(item["word_ids"], dtype=np.int32),
        "word_segment_ids": np.array(item["word_segment_ids"], dtype=np.int32),
        "word_attention_mask": np.array(item["word_attention_mask"], dtype=np.int32),
        "entity_ids": np.array(item["entity_ids"], dtype=np.int32),
        "entity_position_ids": np.array(item["entity_position_ids"], dtype=np.int32),
        "entity_segment_ids": np.array(item["entity_segment_ids"], dtype=np.int32),
        "entity_attention_mask": np.array(item["entity_attention_mask"], dtype=np.int32),
        # "start_positions": np.array(item["start_positions"], dtype=np.int32),
        # "end_positions": np.array(item["end_positions"], dtype=np.int32),
    }
    data.append(sample)
    # write to the MindRecord file in batches of 10 samples
    if i % 10 == 0:
        writer.write_raw_data(data)
        data = []
if data:
    writer.write_raw_data(data)
writer.commit()

A sample of the processed data can be inspected by reading records back from the generated MindRecord file.
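Below is a minimal sketch of such a check (not from the original article), using mindspore.dataset.MindDataset; SQUAD_MINDRECORD_FILE is the file written above and the field names follow the schema:

import mindspore.dataset as ds

# Load the MindRecord file and print the fields of the first record.
dataset = ds.MindDataset(SQUAD_MINDRECORD_FILE, shuffle=False)
for record in dataset.create_dict_iterator(num_epochs=1, output_numpy=True):
    for name, value in record.items():
        print(name, value.shape, value.dtype)
    break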

     02 

LUKE Model Migration

The rewrite mainly follows the PyTorch-to-MindSpore API mapping document on the official website.

Link: https://www.mindspore.cn/docs/migration_guide/zh-CN/r1.5/api_mapping/pytorch_api_mapping.html
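For reference, the mappings actually used in this article are summarized below (compiled from the code in the following subsections; the Dropout note applies to MindSpore 1.x):

# PyTorch API                           ->  MindSpore API (as used in this migration)
# torch.nn.Module / forward()           ->  mindspore.nn.Cell / construct()
# torch.nn.Linear(in, out, bias=False)  ->  mindspore.nn.Dense(in, out, has_bias=False)
# torch.nn.LayerNorm(h, eps=e)          ->  mindspore.nn.LayerNorm([h], epsilon=e)
# torch.nn.Dropout(p)                   ->  mindspore.nn.Dropout(1 - p)   # argument is keep_prob, not drop prob
# torch.Tensor.clamp(min=m)             ->  no direct equivalent used here; a small clamp() helper is written by hand
# torch.unsqueeze / torch.sum           ->  ops.ExpandDims() / ops.reduce_sum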

2.1 Entity embeddings

Take the entity embedding module as an example. The PyTorch source code before rewriting:

class EntityEmbeddings(nn.Module):
    def __init__(self, config: LukeConfig):
        super(EntityEmbeddings, self).__init__()
        self.config = config

        self.entity_embeddings = nn.Embedding(config.entity_vocab_size, config.entity_emb_size, padding_idx=0)
        if config.entity_emb_size != config.hidden_size:
            self.entity_embedding_dense = nn.Linear(config.entity_emb_size, config.hidden_size, bias=False)
        self.position_embeddings = nn.Embedding(config.max_position_embeddings, config.hidden_size)
        self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.hidden_size)

        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)

    def forward(
        self, entity_ids: torch.LongTensor, position_ids: torch.LongTensor, token_type_ids: torch.LongTensor = None
    ):
        if token_type_ids is None:
            token_type_ids = torch.zeros_like(entity_ids)

        entity_embeddings = self.entity_embeddings(entity_ids)
        if self.config.entity_emb_size != self.config.hidden_size:
            entity_embeddings = self.entity_embedding_dense(entity_embeddings)

        position_embeddings = self.position_embeddings(position_ids.clamp(min=0))
        position_embedding_mask = (position_ids != -1).type_as(position_embeddings).unsqueeze(-1)
        position_embeddings = position_embeddings * position_embedding_mask
        position_embeddings = torch.sum(position_embeddings, dim=-2)
        position_embeddings = position_embeddings / position_embedding_mask.sum(dim=-2).clamp(min=1e-7)

        token_type_embeddings = self.token_type_embeddings(token_type_ids)

        embeddings = entity_embeddings + position_embeddings + token_type_embeddings
        embeddings = self.LayerNorm(embeddings)
        embeddings = self.dropout(embeddings)
        return embeddings

After rewriting:

import mindspore.nn as nn
import mindspore.ops as ops
from mindspore import dtype as mstype


class EntityEmbeddings(nn.Cell):
    """entity embeddings for luke model"""

    def __init__(self, config):
        super(EntityEmbeddings, self).__init__()
        self.entity_emb_size = config.entity_emb_size
        self.hidden_size = config.hidden_size
        self.entity_embeddings = nn.Embedding(config.entity_vocab_size, config.entity_emb_size, padding_idx=0)
        if config.entity_emb_size != config.hidden_size:
            self.entity_embedding_dense = nn.Dense(config.entity_emb_size, config.hidden_size, has_bias=False)
        self.position_embeddings = nn.Embedding(config.max_position_embeddings, config.hidden_size)
        self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.hidden_size)
        self.layer_norm = nn.LayerNorm([config.hidden_size], epsilon=config.layer_norm_eps)
        # nn.Dropout takes keep_prob in MindSpore 1.x, so pass 1 - hidden_dropout_prob
        # (the PyTorch original passes the drop probability; see the note in section 2.2)
        self.dropout = nn.Dropout(1 - config.hidden_dropout_prob)
        self.unsqueezee = ops.ExpandDims()

    def construct(self, entity_ids, position_ids, token_type_ids=None):
        """EntityEmbeddings for luke"""
        if token_type_ids is None:
            token_type_ids = ops.zeros_like(entity_ids)
        entity_embeddings = self.entity_embeddings(entity_ids)
        if self.entity_emb_size != self.hidden_size:
            entity_embeddings = self.entity_embedding_dense(entity_embeddings)
        entity_position_ids_int = clamp(position_ids)
        position_embeddings = self.position_embeddings(entity_position_ids_int)
        position_ids = position_ids.astype(mstype.int32)
        position_embedding_mask = 1.0 * self.unsqueezee((position_ids != -1), -1)
        position_embeddings = position_embeddings * position_embedding_mask
        position_embeddings = ops.reduce_sum(position_embeddings, -2)
        position_embeddings = position_embeddings / clamp(ops.reduce_sum(position_embedding_mask, -2), minimum=1e-7)
        token_type_embeddings = self.token_type_embeddings(token_type_ids)
        embeddings = entity_embeddings + position_embeddings
        embeddings += token_type_embeddings
        embeddings = self.layer_norm(embeddings)
        embeddings = self.dropout(embeddings)
        return embeddings


def clamp(x, minimum=0):
    """Hand-written replacement for torch's clamp(min=minimum)."""
    mask = x > minimum
    x = x * mask + minimum
    return x
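As a quick smoke test (not part of the original article), the migrated cell can be run on dummy inputs with a tiny made-up configuration; the real values come from the LukeConfig printed just below:

from types import SimpleNamespace
import numpy as np
import mindspore as ms

# Toy configuration, made up for this smoke test only.
cfg = SimpleNamespace(entity_vocab_size=100, entity_emb_size=32, hidden_size=64,
                      max_position_embeddings=514, type_vocab_size=1,
                      layer_norm_eps=1e-5, hidden_dropout_prob=0.1)

embeddings = EntityEmbeddings(cfg)
embeddings.set_train(False)  # inference mode, so dropout is a no-op

entity_ids = ms.Tensor(np.array([[1, 2, 0]], dtype=np.int32))   # 1 sample, 3 entity slots
positions = np.full((1, 3, 24), -1, dtype=np.int32)             # -1 marks padded positions
positions[0, 0, :2] = [3, 4]                                    # first entity spans word positions 3 and 4
position_ids = ms.Tensor(positions)
token_type_ids = ms.Tensor(np.zeros((1, 3), dtype=np.int32))

out = embeddings(entity_ids, position_ids, token_type_ids)
print(out.shape)  # expected: (1, 3, 64)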

The model parameters in the original source code are:

LukeConfig {
    "architectures": null,
    "attention_probs_dropout_prob": 0.1,
    "bert_model_name": "roberta-large",
    "bos_token_id": 0,
    "do_sample": false,
    "entity_emb_size": 256,
    "entity_vocab_size": 500000,
    "eos_token_ids": 0,
    "finetuning_task": null,
    "hidden_act": "gelu",
    "hidden_dropout_prob": 0.1,
    "hidden_size": 1024,
    "id2label": {
        "0": "LABEL_0",
        "1": "LABEL_1"
    },
    "initializer_range": 0.02,
    "intermediate_size": 4096,
    "is_decoder": false,
    "label2id": {
        "LABEL_0": 0,
        "LABEL_1": 1
    },
    "layer_norm_eps": 1e-05,
    "length_penalty": 1.0,
    "max_length": 20,
    "max_position_embeddings": 514,
    "model_type": "bert",
    "num_attention_heads": 16,
    "num_beams": 1,
    "num_hidden_layers": 24,
    "num_labels": 2,
    "num_return_sequences": 1,
    "output_attentions": false,
    "output_hidden_states": false,
    "output_past": true,
    "pad_token_id": 0,
    "pruned_heads": {},
    "repetition_penalty": 1.0,
    "temperature": 1.0,
    "top_k": 50,
    "top_p": 1.0,
    "torchscript": false,
    "type_vocab_size": 1,
    "use_bfloat16": false,
    "vocab_size": 50265
}

2.2 Entity-aware self-attention

The input is a sequence of vectors $x_1, x_2, \ldots, x_k$ and the output is a sequence $y_1, y_2, \ldots, y_k$, where each output vector is computed as a weighted sum over the linearly transformed input vectors. In this model every input and output vector is associated with a token, and a token can be either a word or an entity. For a sequence containing $m$ words and $n$ entities, let $k = m + n$; the $i$-th output vector is computed as

$$y_i = \sum_{j=1}^{k} \alpha_{ij} V x_j, \qquad \alpha_{ij} = \mathrm{softmax}_j(e_{ij}), \qquad e_{ij} = \frac{(K x_j)^{\top} Q x_i}{\sqrt{L}},$$

where $Q, K, V \in \mathbb{R}^{L \times D}$ are the query, key, and value matrices, $L$ is the dimension of each attention head, and $D$ is the hidden size; in this experiment $L = 64$ and $D = 1024$ (16 heads of size 64).

Because LUKE handles two types of tokens, the paper assumes that using the types of the tokens involved is beneficial when computing the attention score $e_{ij}$. Following this idea, the authors strengthen the extraction of type information by introducing an entity-aware query mechanism that uses a different query matrix for each possible pair of token types. The final attention score $e_{ij}$ is computed as

$$e_{ij} = \begin{cases} (K x_j)^{\top} Q x_i & \text{if } x_i \text{ and } x_j \text{ are both words} \\ (K x_j)^{\top} Q_{w2e}\, x_i & \text{if } x_i \text{ is a word and } x_j \text{ is an entity} \\ (K x_j)^{\top} Q_{e2w}\, x_i & \text{if } x_i \text{ is an entity and } x_j \text{ is a word} \\ (K x_j)^{\top} Q_{e2e}\, x_i & \text{if } x_i \text{ and } x_j \text{ are both entities,} \end{cases}$$

with the same $1/\sqrt{L}$ scaling applied before the softmax, as in the standard mechanism.

The concrete MindSpore implementation is as follows:

import numpy as np
import mindspore.nn as nn
import mindspore.ops as ops


class EntityAwareSelfAttention(nn.Cell):
    """EntityAwareSelfAttention"""

    def __init__(self, config):
        super(EntityAwareSelfAttention, self).__init__()
        self.num_attention_heads = config.num_attention_heads
        self.attention_head_size = int(config.hidden_size / config.num_attention_heads)
        self.all_head_size = self.num_attention_heads * self.attention_head_size
        self.query = nn.Dense(config.hidden_size, self.all_head_size)
        self.key = nn.Dense(config.hidden_size, self.all_head_size)
        self.value = nn.Dense(config.hidden_size, self.all_head_size)
        self.w2e_query = nn.Dense(config.hidden_size, self.all_head_size)
        self.e2w_query = nn.Dense(config.hidden_size, self.all_head_size)
        self.e2e_query = nn.Dense(config.hidden_size, self.all_head_size)
        # nn.Dropout takes keep_prob, hence (1 - config.attention_probs_dropout_prob)
        self.dropout = nn.Dropout(1 - config.attention_probs_dropout_prob)
        self.concat = ops.Concat(1)
        self.concat2 = ops.Concat(2)
        self.concat3 = ops.Concat(3)
        self.shape = ops.Shape()
        self.reshape = ops.Reshape()
        self.transpose = ops.Transpose()
        self.softmax = ops.Softmax(axis=-1)

    def transpose_for_scores(self, x):
        # reshape to (batch, seq, heads, head_size) and transpose to (batch, heads, seq, head_size)
        new_x_shape = ops.shape(x)[:-1] + (self.num_attention_heads, self.attention_head_size)
        out = self.reshape(x, new_x_shape)
        out = self.transpose(out, (0, 2, 1, 3))
        return out

    def construct(self, word_hidden_states, entity_hidden_states, attention_mask):
        """EntityAwareSelfAttention construct"""
        word_size = self.shape(word_hidden_states)[1]
        # four query projections, one per token-type pair
        w2w_query_layer = self.transpose_for_scores(self.query(word_hidden_states))
        w2e_query_layer = self.transpose_for_scores(self.w2e_query(word_hidden_states))
        e2w_query_layer = self.transpose_for_scores(self.e2w_query(entity_hidden_states))
        e2e_query_layer = self.transpose_for_scores(self.e2e_query(entity_hidden_states))
        # shared key over the concatenated word + entity sequence, split by token type
        key_layer = self.transpose_for_scores(self.key(self.concat([word_hidden_states, entity_hidden_states])))
        w2w_key_layer = key_layer[:, :, :word_size, :]
        e2w_key_layer = key_layer[:, :, :word_size, :]
        w2e_key_layer = key_layer[:, :, word_size:, :]
        e2e_key_layer = key_layer[:, :, word_size:, :]
        w2w_attention_scores = ops.matmul(w2w_query_layer, ops.transpose(w2w_key_layer, (0, 1, 3, 2)))
        w2e_attention_scores = ops.matmul(w2e_query_layer, ops.transpose(w2e_key_layer, (0, 1, 3, 2)))
        e2w_attention_scores = ops.matmul(e2w_query_layer, ops.transpose(e2w_key_layer, (0, 1, 3, 2)))
        e2e_attention_scores = ops.matmul(e2e_query_layer, ops.transpose(e2e_key_layer, (0, 1, 3, 2)))
        word_attention_scores = self.concat3([w2w_attention_scores, w2e_attention_scores])
        entity_attention_scores = self.concat3([e2w_attention_scores, e2e_attention_scores])
        attention_scores = self.concat2([word_attention_scores, entity_attention_scores])
        attention_scores = attention_scores / np.sqrt(self.attention_head_size)
        attention_scores = attention_scores + attention_mask
        attention_probs = self.softmax(attention_scores)
        attention_probs = self.dropout(attention_probs)
        value_layer = self.transpose_for_scores(
            self.value(self.concat([word_hidden_states, entity_hidden_states]))
        )
        context_layer = ops.matmul(attention_probs, value_layer)
        context_layer = ops.transpose(context_layer, (0, 2, 1, 3))
        new_context_layer_shape = ops.shape(context_layer)[:-2] + (self.all_head_size,)
        context_layer = self.reshape(context_layer, new_context_layer_shape)
        return context_layer[:, :word_size, :], context_layer[:, word_size:, :]

Note that MindSpore's dropout differs from PyTorch's: MindSpore 1.x nn.Dropout takes the keep probability rather than the drop probability.

For example, if the PyTorch dropout argument is 0.1, the corresponding MindSpore argument should be 0.9.
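A minimal illustration of the mapping (MindSpore 1.x; later MindSpore versions changed nn.Dropout to take the drop probability p directly):

import torch
import mindspore

torch_dropout = torch.nn.Dropout(p=0.1)           # PyTorch: p is the probability of dropping an element
ms_dropout = mindspore.nn.Dropout(keep_prob=0.9)  # MindSpore 1.x: keep_prob is the probability of keeping it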

     03 

Weight Migration: PyTorch -> MindSpore

Since the official repository already provides fine-tuned weights, we try converting them directly and running prediction with the converted model.

First, we need to know the parameter names and shapes of the weights, so that the PyTorch and MindSpore parameters can be matched one to one.

Then print the parameters of the MindSpore model in the same way:
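A minimal sketch of this inspection step (not from the original article); "luke_squad.bin" is a placeholder name for the downloaded fine-tuned PyTorch checkpoint and net stands for the migrated MindSpore network:

import torch

# PyTorch side: parameter names and shapes of the fine-tuned checkpoint
torch_model = torch.load("luke_squad.bin", map_location="cpu")
for name, tensor in torch_model.items():
    print(name, tuple(tensor.shape))

# MindSpore side: parameters_dict() maps parameter name -> Parameter
ms_model = net.parameters_dict()
for name, param in ms_model.items():
    print(name, param.shape)

The torch_model and ms_model objects obtained here are exactly what the extract_and_convert function below consumes.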

Finally, write the weight conversion function:

## torch2ms
import os
import collections

from mindspore import log as logger
from mindspore.common.tensor import Tensor
from mindspore.common.initializer import initializer
from mindspore import save_checkpoint
from mindspore import Parameter


def build_params_map(layer_num=24):
    """
    Build the parameter name map from torch's LUKE to mindspore's LUKE.
    map => key: torch weight name, value: mindspore weight name
    :return:
    """
    weight_map = collections.OrderedDict({
        'embeddings.word_embeddings.weight': "luke.embeddings.word_embeddings.embedding_table",
        'embeddings.position_embeddings.weight': "luke.embeddings.position_embeddings.embedding_table",
        'embeddings.token_type_embeddings.weight': "luke.embeddings.token_type_embeddings.embedding_table",
        'embeddings.LayerNorm.weight': 'luke.embeddings.layer_norm.gamma',
        'embeddings.LayerNorm.bias': 'luke.embeddings.layer_norm.beta',
        'entity_embeddings.entity_embeddings.weight': 'luke.entity_embeddings.entity_embeddings.embedding_table',
        'entity_embeddings.entity_embedding_dense.weight': 'luke.entity_embeddings.entity_embedding_dense.weight',
        'entity_embeddings.position_embeddings.weight': 'luke.entity_embeddings.position_embeddings.embedding_table',
        'entity_embeddings.token_type_embeddings.weight': 'luke.entity_embeddings.token_type_embeddings.embedding_table',
        'entity_embeddings.LayerNorm.weight': 'luke.entity_embeddings.layer_norm.gamma',
        'entity_embeddings.LayerNorm.bias': 'luke.entity_embeddings.layer_norm.beta',
        'qa_outputs.weight': 'qa_outputs.weight',
        'qa_outputs.bias': 'qa_outputs.bias',
        # 'pooler.dense.weight': 'pooler.weight',
        # 'pooler.dense.bias': 'pooler.bias'
    })
    # add attention layers
    for i in range(layer_num):
        weight_map[f'encoder.layer.{i}.attention.self.query.weight'] = \
            f'luke.encoder.layer.{i}.attention.self_attention.query.weight'
        weight_map[f'encoder.layer.{i}.attention.self.query.bias'] = \
            f'luke.encoder.layer.{i}.attention.self_attention.query.bias'
        weight_map[f'encoder.layer.{i}.attention.self.key.weight'] = \
            f'luke.encoder.layer.{i}.attention.self_attention.key.weight'
        weight_map[f'encoder.layer.{i}.attention.self.key.bias'] = \
            f'luke.encoder.layer.{i}.attention.self_attention.key.bias'
        weight_map[f'encoder.layer.{i}.attention.self.value.weight'] = \
            f'luke.encoder.layer.{i}.attention.self_attention.value.weight'
        weight_map[f'encoder.layer.{i}.attention.self.value.bias'] = \
            f'luke.encoder.layer.{i}.attention.self_attention.value.bias'
        weight_map[f'encoder.layer.{i}.attention.self.w2e_query.weight'] = \
            f'luke.encoder.layer.{i}.attention.self_attention.w2e_query.weight'
        weight_map[f'encoder.layer.{i}.attention.self.w2e_query.bias'] = \
            f'luke.encoder.layer.{i}.attention.self_attention.w2e_query.bias'
        weight_map[f'encoder.layer.{i}.attention.self.e2w_query.weight'] = \
            f'luke.encoder.layer.{i}.attention.self_attention.e2w_query.weight'
        weight_map[f'encoder.layer.{i}.attention.self.e2w_query.bias'] = \
            f'luke.encoder.layer.{i}.attention.self_attention.e2w_query.bias'
        weight_map[f'encoder.layer.{i}.attention.self.e2e_query.weight'] = \
            f'luke.encoder.layer.{i}.attention.self_attention.e2e_query.weight'
        weight_map[f'encoder.layer.{i}.attention.self.e2e_query.bias'] = \
            f'luke.encoder.layer.{i}.attention.self_attention.e2e_query.bias'
        weight_map[f'encoder.layer.{i}.attention.output.dense.weight'] = \
            f'luke.encoder.layer.{i}.attention.output.dense.weight'
        weight_map[f'encoder.layer.{i}.attention.output.dense.bias'] = \
            f'luke.encoder.layer.{i}.attention.output.dense.bias'
        weight_map[f'encoder.layer.{i}.attention.output.LayerNorm.weight'] = \
            f'luke.encoder.layer.{i}.attention.output.layernorm.gamma'
        weight_map[f'encoder.layer.{i}.attention.output.LayerNorm.bias'] = \
            f'luke.encoder.layer.{i}.attention.output.layernorm.beta'
        weight_map[f'encoder.layer.{i}.intermediate.dense.weight'] = \
            f'luke.encoder.layer.{i}.intermediate.weight'
        weight_map[f'encoder.layer.{i}.intermediate.dense.bias'] = \
            f'luke.encoder.layer.{i}.intermediate.bias'
        weight_map[f'encoder.layer.{i}.output.dense.weight'] = \
            f'luke.encoder.layer.{i}.output.dense.weight'
        weight_map[f'encoder.layer.{i}.output.dense.bias'] = \
            f'luke.encoder.layer.{i}.output.dense.bias'
        weight_map[f'encoder.layer.{i}.output.LayerNorm.weight'] = \
            f'luke.encoder.layer.{i}.output.layernorm.gamma'
        weight_map[f'encoder.layer.{i}.output.LayerNorm.bias'] = \
            f'luke.encoder.layer.{i}.output.layernorm.beta'
    # add pooler
    # weight_map.update(
    #     {
    #         'pooled_fc.w_0': 'ernie.ernie.dense.weight',
    #         'pooled_fc.b_0': 'ernie.ernie.dense.bias',
    #         'cls_out_w': 'ernie.dense_1.weight',
    #         'cls_out_b': 'ernie.dense_1.bias'
    #     }
    # )
    return weight_map


def _update_param(param, new_param):
    """Updates param's data from new_param's data."""
    if isinstance(param.data, Tensor) and isinstance(new_param.data, Tensor):
        if param.data.dtype != new_param.data.dtype:
            logger.error("Failed to combine the net and the parameters for param %s.", param.name)
            msg = ("Net parameters {} type({}) different from parameter_dict's({})"
                   .format(param.name, param.data.dtype, new_param.data.dtype))
            raise RuntimeError(msg)
        if param.data.shape != new_param.data.shape:
            if not _special_process_par(param, new_param):
                logger.error("Failed to combine the net and the parameters for param %s.", param.name)
                msg = ("Net parameters {} shape({}) different from parameter_dict's({})"
                       .format(param.name, param.data.shape, new_param.data.shape))
                raise RuntimeError(msg)
            return
        param.set_data(new_param.data)
        return
    if isinstance(param.data, Tensor) and not isinstance(new_param.data, Tensor):
        if param.data.shape != (1,) and param.data.shape != ():
            logger.error("Failed to combine the net and the parameters for param %s.", param.name)
            msg = ("Net parameters {} shape({}) is not (1,), inconsistent with parameter_dict's(scalar)."
                   .format(param.name, param.data.shape))
            raise RuntimeError(msg)
        param.set_data(initializer(new_param.data, param.data.shape, param.data.dtype))
    elif isinstance(new_param.data, Tensor) and not isinstance(param.data, Tensor):
        logger.error("Failed to combine the net and the parameters for param %s.", param.name)
        msg = ("Net parameters {} type({}) different from parameter_dict's({})"
               .format(param.name, type(param.data), type(new_param.data)))
        raise RuntimeError(msg)
    else:
        param.set_data(type(param.data)(new_param.data))


def _special_process_par(par, new_par):
    """
    Processes the special condition.
    Like (12,2048,1,1)->(12,2048), this case is caused by GE 4 dimensions tensor.
    """
    par_shape_len = len(par.data.shape)
    new_par_shape_len = len(new_par.data.shape)
    delta_len = new_par_shape_len - par_shape_len
    delta_i = 0
    for delta_i in range(delta_len):
        if new_par.data.shape[par_shape_len + delta_i] != 1:
            break
    if delta_i == delta_len - 1:
        new_val = new_par.data.asnumpy()
        new_val = new_val.reshape(par.data.shape)
        par.set_data(Tensor(new_val, par.data.dtype))
        return True
    return False


def extract_and_convert(torch_model, ms_model):
    """extract weights and convert to mindspore"""
    print('=' * 20 + 'extract weights' + '=' * 20)
    state_dict = []
    weight_map = build_params_map(layer_num=24)
    for weight_name, weight_value in torch_model.items():
        if weight_name not in weight_map.keys():
            continue
        state_dict.append({'name': weight_map[weight_name], 'data': Tensor(weight_value.numpy())})
        value = Parameter(Tensor(weight_value.numpy()), name=weight_map[weight_name])
        key = ms_model[weight_map[weight_name]]
        _update_param(key, value)
        print(weight_name, '->', weight_map[weight_name], weight_value.shape)
    # state_dict is a list of {'name', 'data'} dicts, which save_checkpoint accepts directly
    save_checkpoint(state_dict, os.path.join("./luke-large-qa.ckpt"))
    print('=' * 20 + 'extract weights finished' + '=' * 20)


extract_and_convert(torch_model, ms_model)

The names here must correspond exactly one to one. If the model is modified later, this conversion function also needs to be re-checked to make sure the mapping still holds.
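Afterwards, the converted checkpoint can be loaded back into the migrated network for prediction; a minimal sketch (net again stands for the MindSpore LUKE QA model):

from mindspore import load_checkpoint, load_param_into_net

# load the converted weights and copy them into the MindSpore network
param_dict = load_checkpoint("./luke-large-qa.ckpt")
param_not_load = load_param_into_net(net, param_dict)
print("parameters not loaded:", param_not_load)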

     04 

Evaluation

We post-process the output probabilities to obtain the final predicted answers.

Finally, the predictions are evaluated against the gold answers to obtain the final F1 score.
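For reference, the standard SQuAD token-overlap F1 between a prediction and a gold answer can be computed as follows (a minimal sketch of the usual metric, not the article's evaluation script; answer normalization such as lower-casing and punctuation stripping is omitted):

from collections import Counter

def squad_f1(prediction, ground_truth):
    """Token-overlap F1 between a predicted answer string and a gold answer string."""
    pred_tokens = prediction.split()
    gold_tokens = ground_truth.split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(squad_f1("in the 10th century", "10th century"))  # ~0.67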

(Training-related content will be covered in a follow-up update.)

References:

[1] LUKE source code:

    https://github.com/studio-ousia/luke

[2] LUKE paper:

    https://aclanthology.org/2020.emnlp-main.523.pdf

[3] Reference article:

    https://zhuanlan.zhihu.com/p/381626609

Official MindSpore resources

    GitHub : https://github.com/mindspore-ai/mindspore

Gitee: https://gitee.com/mindspore/mindspore

Official QQ group: 871543426

• Original article: https://blog.csdn.net/Kenji_Shinji/article/details/125513710