The Magic of Large Models

Weight decay in neural networks

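The ultimate purpose of weight decay is to prevent overfitting. In the loss function, weight decay is the coefficient placed in front of the regularization term. The regularization term generally reflects the complexity of the model, so weight decay controls how strongly model complexity affects the loss: if weight decay is large, a complex model also ends up with a large loss value.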
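To make the role of the coefficient concrete, here is a minimal sketch that assumes the regularization term is an L2 penalty (the sum of squared weights). The two weight lists and the shared data loss of 0.5 are made-up values for illustration only.

```python
def l2_penalty(weights):
    """Sum of squared weights, used here as a simple measure of model complexity."""
    return sum(w * w for w in weights)


def regularized_loss(data_loss, weights, weight_decay):
    """Total loss = data loss + weight_decay * L2 penalty."""
    return data_loss + weight_decay * l2_penalty(weights)


# Hypothetical models: assume both fit the data equally well (same data loss).
simple_model = [0.1, -0.2, 0.1]
complex_model = [3.0, -4.0, 2.0]
data_loss = 0.5

for weight_decay in (0.0, 0.01, 0.1):
    print(
        weight_decay,
        regularized_loss(data_loss, simple_model, weight_decay),
        regularized_loss(data_loss, complex_model, weight_decay),
    )
```

With weight_decay set to 0 the two models have the same loss; as the coefficient grows, the model with larger weights is penalized more and more.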

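Momentum is a commonly used acceleration technique in gradient descent. For plain SGD, the update is $x \leftarrow x-\alpha \ast dx$, so x moves along the negative gradient direction. SGD with a momentum term is commonly written in the following form:

$v \leftarrow \beta \ast v-\alpha \ast dx$
$x \leftarrow x+v$

Here $\beta$ is the momentum coefficient. Intuitively, if the previous momentum (that is, v) points in the same direction as the current negative gradient, the step taken this time becomes larger, which is how momentum speeds up convergence.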
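The effect can be checked on a toy problem. The sketch below assumes the common momentum form $v \leftarrow \beta v-\alpha \ast dx$, $x \leftarrow x+v$ and minimizes the hypothetical objective f(x) = 0.5 * x^2, whose gradient is simply x. The learning rate, momentum coefficient, and step count are arbitrary choices.

```python
def grad(x):
    """Gradient of the toy objective f(x) = 0.5 * x**2."""
    return x


def sgd(x, alpha, steps):
    """Plain gradient descent: x <- x - alpha * dx."""
    for _ in range(steps):
        x = x - alpha * grad(x)
    return x


def sgd_momentum(x, alpha, beta, steps):
    """Gradient descent with momentum: v <- beta * v - alpha * dx, x <- x + v."""
    v = 0.0
    for _ in range(steps):
        v = beta * v - alpha * grad(x)
        x = x + v
    return x


x_plain = sgd(10.0, alpha=0.01, steps=50)
x_mom = sgd_momentum(10.0, alpha=0.01, beta=0.9, steps=50)
print("plain SGD:", x_plain)
print("with momentum:", x_mom)
```

With the same small learning rate, the momentum run should end much closer to the minimum at 0, because successive steps in the same direction accumulate in v.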

Building a Template

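{"placeholder": } - seems to be the slot where the input sentence goes (e.g., text_a)
{"meta": } - seems to hold specific entities, such as entity, title, and so on
{"soft": , "duplicate": } - soft tokens, i.e., the parameters to be optimized. If a word is given, the soft token is initialized from that token's embedding (as I understand it); otherwise it is initialized randomly. The duplicate parameter sets the number of soft tokens, e.g., 50 soft tokens.
{"mask"} - marks the position of the output to be predicted
Official documentation: https://thunlp.github.io/OpenPrompt/notes/template.html?highlight=duplicate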
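As a rough illustration of how these keys fit together, the sketch below splits a template string into its slots and adds up the soft tokens declared through "duplicate". The helpers parse_template and count_soft_tokens are hypothetical and are not OpenPrompt's own parser; the template string is the one used later in this post.

```python
import ast
import re


def parse_template(text):
    """Split a template string into a list of dicts, one per {...} slot."""
    slots = []
    for chunk in re.findall(r"\{[^{}]*\}", text):
        value = ast.literal_eval(chunk)
        if isinstance(value, set):
            # A bare key such as {"mask"} is read as a set literal.
            value = {key: None for key in value}
        slots.append(value)
    return slots


def count_soft_tokens(slots):
    """Add up the soft tokens that are declared with an explicit "duplicate"."""
    return sum(s["duplicate"] for s in slots if "soft" in s and "duplicate" in s)


template_text = (
    '{"placeholder":"text_a"} {"soft": "question", "duplicate": 50} '
    '{"placeholder":"text_b"} {"soft": "yes", "duplicate": 16} '
    '{"soft": "no", "duplicate": 16} {"soft": "maybe", "duplicate": 16} {"mask"}.'
)

slots = parse_template(template_text)
n_soft = count_soft_tokens(slots)
print(len(slots), "slots,", n_soft, "soft tokens")
```

Counting like this is a quick sanity check on how many trainable soft tokens a template introduces before building the real MixedTemplate.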

Template parameters

Inspect the template parameters: [n for n, p in prompt_model.template.named_parameters()]
Here n is the parameter name and p is the parameter tensor.
Inspect the LM parameters: [n for n, p in prompt_model.plm.named_parameters()]

Inspecting the wrapped template output

from openprompt.prompts import MixedTemplate

# To define a large number of soft tokens (e.g., 10000), use the "duplicate" key.
template_text = '{"placeholder":"text_a"} {"soft": "question", "duplicate": 50} {"placeholder":"text_b"} {"soft": "yes", "duplicate": 16} {"soft": "no", "duplicate": 16} {"soft": "maybe", "duplicate": 16} {"mask"}.'
mytemplate = MixedTemplate(model=plm, tokenizer=tokenizer, text=template_text)

# To better understand how does the template wrap the example, we visualize one instance.
wrapped_example = mytemplate.wrap_one_example(dataset['train'][0])
wrapped_example

Here, dataset['train'][0] has the following format:

{
  "guid": 0,
  "label": 0,
  "meta": {},
  "text_a": "It was a complex language. Not written down but handed down. One might say it was peeled down.",
  "text_b": "the language was peeled down",
  "tgt_text": null
}

The full training process

1 Load the data and the LM

The data is loaded into a dictionary:

model_inputs = {}
for split in ['train', 'validation', 'test']:
    model_inputs[split] = []
    for sample in dataset[split]:
        tokenized_example = wrapped_t5tokenizer.tokenize_one_example(mytemplate.wrap_one_example(sample), teacher_forcing=False)
        model_inputs[split].append(tokenized_example)


from openprompt import PromptDataLoader

train_dataloader = PromptDataLoader(dataset=dataset["train"], template=mytemplate, tokenizer=tokenizer,
    tokenizer_wrapper_class=WrapperClass, max_seq_length=256, decoder_max_length=3,
    batch_size=4,shuffle=True, teacher_forcing=False, predict_eos_token=False,
    truncate_method="head")
# Output such as "tokenizing: 250it [00:00, 624.06it/s]" shows the number of training examples (250 here).

2 Define the template

3 Train

Define which parameters to update, e.g., which part of the LM parameters and which part of the template parameters in the model.

import torch
from openprompt import PromptForClassification

use_cuda = torch.cuda.is_available()
print("GPU enabled? {}".format(use_cuda))
# freeze_plm=False means the LM weights are updated together with the template.
prompt_model = PromptForClassification(plm=plm, template=mytemplate, verbalizer=myverbalizer, freeze_plm=False)
if use_cuda:
    prompt_model = prompt_model.cuda()


# Note: AdamW in transformers is deprecated in newer releases; torch.optim.AdamW is the suggested replacement.
from transformers import AdamW, get_linear_schedule_with_warmup
loss_func = torch.nn.CrossEntropyLoss()
no_decay = ['bias', 'LayerNorm.weight']
# It is good practice to apply no weight decay to bias and LayerNorm parameters.
optimizer_grouped_parameters = [
    # weight_decay: the weight decay coefficient, a setting that helps prevent overfitting
    {'params': [p for n, p in prompt_model.named_parameters() if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01},
    {'params': [p for n, p in prompt_model.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
]

optimizer = AdamW(optimizer_grouped_parameters, lr=1e-4)

for epoch in range(5):
    tot_loss = 0
    for step, inputs in enumerate(train_dataloader):
        if use_cuda:
            inputs = inputs.cuda()
        logits = prompt_model(inputs)
        labels = inputs['label']
        loss = loss_func(logits, labels)
        loss.backward()
        tot_loss += loss.item()
        optimizer.step()
        optimizer.zero_grad()
        if step %100 ==1:
            print("Epoch {}, average loss: {}".format(epoch, tot_loss/(step+1)), flush=True)
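The parameter grouping above relies on plain substring matching against parameter names. The sketch below applies the same test to a handful of hypothetical names (real names depend on the model and should be printed with named_parameters()) to show which group each one would land in.

```python
no_decay = ['bias', 'LayerNorm.weight']

# Hypothetical parameter names, in the style returned by named_parameters().
names = [
    'plm.encoder.layer.0.attention.self.query.weight',
    'plm.encoder.layer.0.attention.self.query.bias',
    'plm.encoder.layer.0.attention.output.LayerNorm.weight',
    'plm.encoder.layer.0.attention.output.LayerNorm.bias',
    'plm.decoder.final_layer_norm.weight',
    'template.soft_embedding.weight',
]


def split_by_decay(names, no_decay):
    """Return (names that get weight decay, names that are exempt)."""
    decay, exempt = [], []
    for n in names:
        if any(nd in n for nd in no_decay):
            exempt.append(n)
        else:
            decay.append(n)
    return decay, exempt


decay_names, no_decay_names = split_by_decay(names, no_decay)
print("weight decay 0.01:", decay_names)
print("weight decay 0.0:", no_decay_names)
```

The match is case-sensitive, so a name such as 'final_layer_norm.weight' does not match 'LayerNorm.weight' and would still receive weight decay. It is worth checking the actual names of the model in use before relying on the list.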


4 Test

Building a Verbalizer

Built by hand with ManualVerbalizer: the labels are made up of words, e.g., [["great", "wonderful"], ["bad"]] or {"World": "politics", "Tech": "technology"}
SoftVerbalizer
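The core idea of a hand-built verbalizer can be sketched in a few lines: each class owns a list of label words, and the class score is derived from the scores the LM gives those words at the {"mask"} position. The function class_scores, the averaging rule, and the word scores below are assumptions for illustration; OpenPrompt's ManualVerbalizer works on token ids and normalized probabilities and does more than this.

```python
def class_scores(word_scores, label_words):
    """Average the mask-position scores of each class's label words."""
    scores = []
    for words in label_words:
        scores.append(sum(word_scores[w] for w in words) / len(words))
    return scores


# Class 0 is described by two words, class 1 by one word.
label_words = [["great", "wonderful"], ["bad"]]

# Hypothetical scores the LM assigns to each word at the {"mask"} position.
word_scores = {"great": 2.0, "wonderful": 1.0, "bad": 0.5}

scores = class_scores(word_scores, label_words)
predicted = max(range(len(scores)), key=scores.__getitem__)
print(scores, predicted)
```

A SoftVerbalizer, by contrast, learns the mapping from the mask position to classes instead of relying on a fixed word list.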

Verbalizer parameters

OpenPrompt
Prompt demo link: https://colab.research.google.com/drive/10syott1zXaQkjnlxOiSXKDFGy68SWR0y?usp=sharing#scrollTo=MHZc0szQ8tkY

opendelta
Delta demo link:
https://colab.research.google.com/drive/1uAhgAdc8Qr42UKYDlgUv0f7W1-gAFwGo?usp=sharing

Original article: https://blog.csdn.net/Hekena/article/details/125602811