Fine-Tuning ChatGLM-6B with DeepSpeed and P-Tuning v2

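An earlier article covered parameter-efficient fine-tuning of ChatGLM-6B with LoRA. This article shows how to fine-tune ChatGLM-6B with DeepSpeed and with P-Tuning v2. The related code is available on GitHub: llm-action.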

Introduction to ChatGLM-6B

For an introduction to ChatGLM-6B, see the earlier article; it is not repeated here.

Introduction to P-Tuning v2

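P-Tuning v2 is a parameter-efficient fine-tuning method. Instead of updating all model weights, it freezes the pretrained model and trains only a small set of continuous prompt (prefix) vectors, which reduces the number of fine-tuned parameters to roughly 0.1% of the original (the exact share depends on the prefix length). It builds on P-Tuning v1; the main improvement is that the trainable prompts are added at every Transformer layer rather than only at the input embedding layer.

Concretely, P-Tuning v2 prepends trainable prefix vectors to the attention keys and values of each layer. During fine-tuning only these prefix parameters receive gradient updates, while the weights of the large language model itself stay unchanged. Because the prompts act at every layer, they influence the model's predictions more directly than prompts that exist only at the input.

Overall, the core idea of P-Tuning v2 is to make fine-tuning lighter and cheaper while keeping task performance as close as possible to full fine-tuning. Training needs less GPU memory because gradients and optimizer states are kept only for the prefix parameters, and each downstream task only has to store its own small set of prefix weights, which makes the approach practical in many real-world scenarios.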
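To get a feel for how small the trainable part is, the sketch below counts the prefix parameters under one assumption: that the prefix encoder, without projection, is a single embedding table holding one key vector and one value vector per layer for each prefix position. The values `num_layers=28` and `hidden_size=4096` appear in the model config printed in the training log; the prefix length of 128 is a hypothetical choice, not a setting taken from this article.

```python
def prefix_param_count(pre_seq_len: int, num_layers: int, hidden_size: int) -> int:
    """Trainable parameters of a prefix table without projection (assumed layout)."""
    # Each prefix position stores one key and one value vector for every layer.
    return pre_seq_len * num_layers * hidden_size * 2


if __name__ == "__main__":
    # num_layers and hidden_size come from the logged config; pre_seq_len is hypothetical.
    count = prefix_param_count(pre_seq_len=128, num_layers=28, hidden_size=4096)
    print(f"trainable prefix parameters: {count:,}")
```

With these numbers the prefix holds about 29 million parameters, well under 1% of the roughly 6 billion weights in the base model; a shorter prefix shrinks the share further.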

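Environment Setup

The base environment is configured as follows:
- OS: Ubuntu 18.04
- CPUs: a single node with Intel CPUs and 1 TB of memory; 64 physical CPUs with 16 cores each
- GPUs: 8 × A800 80GB
- Python: 3.10 (OpenSSL must first be upgraded to 1.1.1t before compiling and installing Python)
- NVIDIA driver: 515.65.01 (choose the driver that matches your GPU model)
- CUDA Toolkit: 11.7
- NCCL: nccl_2.14.3-1+cuda11.7
- cuDNN: 8.8.1.3_cuda11
Installing the NVIDIA driver, CUDA, Python, and the other tools listed above is not covered in detail here.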

Create and activate the virtual environment chatglm-ptuningv2-venv-py310-cu117:
cd /home/guodong.li/virtual-venv
virtualenv -p /usr/bin/python3.10 chatglm-ptuningv2-venv-py310-cu117
source /home/guodong.li/virtual-venv/chatglm-ptuningv2-venv-py310-cu117/bin/activate
Install PyTorch offline: download the torch and torchvision wheels that match your CUDA version, then install them.
pip install torch-1.13.1+cu117-cp310-cp310-linux_x86_64.whl
pip install torchvision-0.14.1+cu117-cp310-cp310-linux_x86_64.whl
Install the remaining dependencies:
pip install -r requirements.txt
The contents of requirements.txt are as follows:
protobuf
transformers==4.28.0
cpm_kernels
gradio
mdtex2html
sentencepiece
rouge_chinese
nltk
jieba
datasets
deepspeed
accelerate
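Note:
The official documentation specifies transformers 4.27.1. When ChatGLM loads the model, it calls the get_class_in_module method in transformers/dynamic_module_utils.py, and under concurrent loading this method can fail to find the file. Upgrading transformers to 4.28.0 avoids the problem.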
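Before launching a multi-process job it can help to confirm which transformers version is actually installed in the active environment. The following is a small sketch using only the standard library; the helper names are made up for illustration, and the article pins exactly 4.28.0, so whether later releases behave the same with this commit is not verified here.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn a string such as '4.28.0' into (4, 28, 0) using its first three numbers."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:3])


def meets_minimum(installed: str, minimum: str = "4.28.0") -> bool:
    """True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("transformers")
    if found is None:
        print("transformers is not installed")
    elif meets_minimum(found):
        print(f"transformers {found} is new enough")
    else:
        print(f"transformers {found} is older than 4.28.0; consider upgrading")
```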

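Data Preparation

The ADGEN (advertisement generation) dataset is used below to walk through fine-tuning.

The ADGEN task is to generate a piece of advertising copy (summary) from the input (content). Each record has the following format: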
{
    "content": "类型#上衣*版型#宽松*版型#显瘦*图案#线条*衣样式#衬衫*衣袖型#泡泡袖*衣款式#抽绳",
    "summary": "这件衬衫的款式非常的宽松，利落的线条可以很好的隐藏身材上的小缺点，穿在身上有着很好的显瘦效果。领口装饰了一个可爱的抽绳，漂亮的绳结展现出了十足的个性，配合时尚的泡泡袖型，尽显女性甜美可爱的气息。"
}
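The content field packs attribute/value pairs into one string. Judging from the sample record, pairs appear to be separated by `*` and each key is separated from its value by `#`; that reading, and the assumption that each file stores one JSON object per line, are inferred from the sample and the line counts rather than from a format specification. A minimal sketch for inspecting a record (the sample line is a shortened version of the record above, and `parse_content` is a hypothetical helper):

```python
import json
from typing import List, Tuple


def parse_content(content: str) -> List[Tuple[str, str]]:
    """Split 'key#value*key#value' into a list of (key, value) pairs."""
    pairs = []
    for item in content.split("*"):
        key, _, value = item.partition("#")
        pairs.append((key, value))
    return pairs


if __name__ == "__main__":
    # Shortened sample record; real records carry more attributes and longer summaries.
    line = '{"content": "类型#上衣*版型#宽松*衣样式#衬衫", "summary": "这件衬衫的款式非常的宽松。"}'
    record = json.loads(line)
    for key, value in parse_content(record["content"]):
        print(key, "->", value)
    print("summary:", record["summary"])
```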
Download the ADGEN dataset from the official site or through this link, and extract it into the AdvertiseGen directory:
tar -zxvf AdvertiseGen.tar.gz
Check the size of the dataset:
> wc -l AdvertiseGen/*
> 1070 AdvertiseGen/dev.json
> 114599 AdvertiseGen/train.json
> 115669 total
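Full-Parameter Fine-Tuning of ChatGLM-6B with DeepSpeed DP + ZeRO

We start by using DeepSpeed to fine-tune all parameters of ChatGLM-6B.

Download the source code and, to keep the code consistent, check out the corresponding commit ID: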
git clone https://github.com/THUDM/ChatGLM-6B.git
cd ChatGLM-6B
git checkout 8633db1
cd ptuning
Modify the ds_train_finetune.sh script to run full-parameter fine-tuning with DeepSpeed:
LR=1e-4

MASTER_PORT=$(shuf -n 1 -i 10000-65535)

deepspeed --num_gpus=8 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
    --test_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-ft-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 24 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --predict_with_generate \
    --num_train_epochs 2 \
    --logging_steps 10 \
    --save_steps 300 \
    --learning_rate $LR \
    --fp16
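The batch-related flags determine how many steps the progress bar will show. The sketch below reproduces that arithmetic under two assumptions about the trainer: that the distributed sampler pads every rank to the same number of samples, and that the number of update steps per epoch is the per-rank batch count floor-divided by the gradient accumulation steps. The function name is hypothetical; the inputs are the 114599 training samples and the flag values from the script.

```python
import math


def trainer_total_steps(num_samples: int, world_size: int, per_device_batch: int,
                        grad_accum: int, epochs: int) -> int:
    """Estimate the total number of steps shown on the training progress bar."""
    # Each rank receives an equal share of the samples, rounded up.
    samples_per_rank = math.ceil(num_samples / world_size)
    # The last partial batch on a rank is kept rather than dropped.
    batches_per_rank = math.ceil(samples_per_rank / per_device_batch)
    # One update step consumes grad_accum batches; a trailing remainder is not counted.
    updates_per_epoch = max(batches_per_rank // grad_accum, 1)
    return math.ceil(epochs * updates_per_epoch)


if __name__ == "__main__":
    total = trainer_total_steps(num_samples=114599, world_size=8,
                                per_device_batch=24, grad_accum=2, epochs=2)
    print(total)  # 596
```

The result agrees with the 596 total steps on the progress bar in the run log.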

Run log:
```
> sh ds_train_finetune.sh

[2023-04-14 18:01:33,206] [WARNING] [runner.py:190:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.

[2023-04-14 18:01:33,417] [INFO] [runner.py:540:main] cmd = /home/guodong.li/virtual-venv/chatglm-ptuningv2-venv-py310-cu117/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=44148 --enable_each_rank_log=None main.py --deepspeed deepspeed.json --do_train --train_file /data/nfs/llm/data/AdvertiseGen/train.json --test_file /data/nfs/llm/data/AdvertiseGen/dev.json --prompt_column content --response_column summary --overwrite_cache --model_name_or_path /data/nfs/llm/model/chatglm-6b --output_dir /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4 --overwrite_output_dir --max_source_length 64 --max_target_length 64 --per_device_train_batch_size 24 --per_device_eval_batch_size 1 --gradient_accumulation_steps 2 --predict_with_generate --num_train_epochs 2 --logging_steps 10 --save_steps 300 --learning_rate 1e-4 --fp16

[2023-04-14 18:01:35,945] [INFO] [launch.py:222:main] 0 NCCL_SOCKET_IFNAME=bond0

[2023-04-14 18:01:35,945] [INFO] [launch.py:222:main] 0 NCCL_IB_DISABLE=1

[2023-04-14 18:01:35,945] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}

[2023-04-14 18:01:35,945] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=8, node_rank=0

[2023-04-14 18:01:35,945] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})

[2023-04-14 18:01:35,945] [INFO] [launch.py:247:main] dist_world_size=8

[2023-04-14 18:01:35,945] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

[2023-04-14 18:01:40,133] [INFO] [comm.py:586:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl

04/14/2023 18:01:41 - WARNING - main - Process rank: 2, device: cuda:2, n_gpu: 1distributed training: True, 16-bits training: True

…

04/14/2023 18:01:41 - WARNING - main - Process rank: 5, device: cuda:5, n_gpu: 1distributed training: True, 16-bits training: True

04/14/2023 18:01:41 - INFO - main - Training/evaluation parameters Seq2SeqTrainingArguments(

_n_gpu=1,

adafactor=False,

adam_beta1=0.9,

adam_beta2=0.999,

adam_epsilon=1e-08,

auto_find_batch_size=False,

bf16=False,

bf16_full_eval=False,

data_seed=None,

dataloader_drop_last=False,

dataloader_num_workers=0,

dataloader_pin_memory=True,

ddp_bucket_cap_mb=None,

ddp_find_unused_parameters=None,

ddp_timeout=1800,

debug=[],

deepspeed=deepspeed.json,

disable_tqdm=False,

do_eval=False,

do_predict=False,

do_train=True,

eval_accumulation_steps=None,

eval_delay=0,

eval_steps=None,

evaluation_strategy=no,

fp16=True,

fp16_backend=auto,

fp16_full_eval=False,

fp16_opt_level=O1,

fsdp=[],

fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},

fsdp_min_num_params=0,

fsdp_transformer_layer_cls_to_wrap=None,

full_determinism=False,

generation_config=None,

generation_max_length=None,

generation_num_beams=None,

gradient_accumulation_steps=2,

gradient_checkpointing=False,

greater_is_better=None,

group_by_length=False,

half_precision_backend=auto,

hub_model_id=None,

hub_private_repo=False,

hub_strategy=every_save,

hub_token=,

ignore_data_skip=False,

include_inputs_for_metrics=False,

jit_mode_eval=False,

label_names=None,

label_smoothing_factor=0.0,

learning_rate=0.0001,

length_column_name=length,

load_best_model_at_end=False,

local_rank=0,

log_level=passive,

log_level_replica=warning,

log_on_each_node=True,

logging_dir=/home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/runs/Apr14_18-01-40_ai-app-2-46,

logging_first_step=False,

logging_nan_inf_filter=True,

logging_steps=10,

logging_strategy=steps,

lr_scheduler_type=linear,

max_grad_norm=1.0,

max_steps=-1,

metric_for_best_model=None,

mp_parameters=,

no_cuda=False,

num_train_epochs=2.0,

optim=adamw_hf,

optim_args=None,

output_dir=/home/guodong.li/output/adgen-chatglm-6b-ft-1e-4,

overwrite_output_dir=True,

past_index=-1,

per_device_eval_batch_size=1,

per_device_train_batch_size=24,

predict_with_generate=True,

prediction_loss_only=False,

push_to_hub=False,

push_to_hub_model_id=None,

push_to_hub_organization=None,

push_to_hub_token=,

ray_scope=last,

remove_unused_columns=True,

report_to=[],

resume_from_checkpoint=None,

run_name=/home/guodong.li/output/adgen-chatglm-6b-ft-1e-4,

save_on_each_node=False,

save_safetensors=False,

save_steps=300,

save_strategy=steps,

save_total_limit=None,

seed=42,

sharded_ddp=[],

skip_memory_metrics=True,

sortish_sampler=False,

tf32=None,

torch_compile=False,

torch_compile_backend=None,

torch_compile_mode=None,

torchdynamo=None,

tpu_metrics_debug=False,

tpu_num_cores=None,

use_ipex=False,

use_legacy_prediction_loop=False,

use_mps_device=False,

warmup_ratio=0.0,

warmup_steps=0,

weight_decay=0.0,

xpu_backend=None,

)

04/14/2023 18:03:01 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-386448e4f2983a9a/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 184.03it/s]

04/14/2023 18:03:01 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-386448e4f2983a9a/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)

[WARNING|configuration_auto.py:925] 2023-04-14 18:03:01,664 >> Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.

04/14/2023 18:03:01 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-386448e4f2983a9a/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)

0%|                                                                                                                                                                                   | 0/2 [00:00> Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 240.57it/s]

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 197.48it/s]

[INFO|configuration_utils.py:666] 2023-04-14 18:03:01,678 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json

[WARNING|configuration_auto.py:925] 2023-04-14 18:03:01,678 >> Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.

[WARNING|configuration_auto.py:925] 2023-04-14 18:03:01,679 >> Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.

[INFO|configuration_utils.py:666] 2023-04-14 18:03:01,685 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json

04/14/2023 18:03:01 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-386448e4f2983a9a/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)

[INFO|configuration_utils.py:720] 2023-04-14 18:03:01,687 >> Model config ChatGLMConfig {
  "_name_or_path": "/data/nfs/llm/model/chatglm-6b",
  "architectures": [
    "ChatGLMModel"
  ],
  "auto_map": {
    "AutoConfig": "configuration_chatglm.ChatGLMConfig",
    "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"
  },
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "gmask_token_id": 130001,
  "hidden_size": 4096,
  "inner_hidden_size": 16384,
  "layernorm_epsilon": 1e-05,
  "mask_token_id": 130000,
  "max_sequence_length": 2048,
  "model_type": "chatglm",
  "num_attention_heads": 32,
  "num_layers": 28,
  "pad_token_id": 3,
  "position_encoding_2d": true,
  "pre_seq_len": null,
  "prefix_projection": false,
  "quantization_bit": 0,
  "torch_dtype": "float16",
  "transformers_version": "4.28.0",
  "use_cache": true,
  "vocab_size": 130528
}
```
0%| | 0/2 [00:00> Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[WARNING|tokenization_auto.py:675] 2023-04-14 18:03:01,689 >> Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[INFO|tokenization_utils_base.py:1807] 2023-04-14 18:03:01,694 >> loading file ice_text.model
[INFO|tokenization_utils_base.py:1807] 2023-04-14 18:03:01,694 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1807] 2023-04-14 18:03:01,694 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1807] 2023-04-14 18:03:01,694 >> loading file tokenizer_config.json
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 285.37it/s]
[INFO|modeling_utils.py:2531] 2023-04-14 18:03:01,992 >> loading weights file /data/nfs/llm/model/chatglm-6b/pytorch_model.bin.index.json
[INFO|configuration_utils.py:575] 2023-04-14 18:03:01,993 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

Loading checkpoint shards: 0%| | 0/8 [00:00> Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[WARNING|auto_factory.py:456] 2023-04-14 18:03:02,109 >> Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:13<00:00, 1.70s/it]
[INFO|modeling_utils.py:3190] 2023-04-14 18:03:15,622 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.

[INFO|modeling_utils.py:3198] 2023-04-14 18:03:15,622 >> All the weights of ChatGLMForConditionalGeneration were initialized from the model checkpoint at /data/nfs/llm/model/chatglm-6b.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMForConditionalGeneration for predictions without further training.
Loading checkpoint shards: 25%|████████████████████████████████████ | 2/8 [00:13<00:40, 6.73s/it][INFO|modeling_utils.py:2839] 2023-04-14 18:03:15,703 >> Generation config file not found, using a generation config created from the model config.
…
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:34<00:00, 4.32s/it]
input_ids [5, 65421, 61, 67329, 32, 98339, 61, 72043, 32, 65347, 61, 70872, 32, 69768, 61, 68944, 32, 67329, 64103, 61, 96914, 130001, 130004, 5, 87052, 96914, 81471, 64562, 65759, 64493, 64988, 6, 65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65544, 6, 71964, 70533, 64417, 63862, 89978, 63991, 63823, 77284, 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 63893, 130005, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
inputs 类型#裤版型#宽松风格#性感图案#线条裤型#阔腿裤宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自然不拘束,面料亲肤舒适贴身体验感棒棒哒。系带部分增加设计看点,还
…
label_ids [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 130004, 5, 87052, 96914, 81471, 64562, 65759, 64493, 64988, 6, 65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65544, 6, 71964, 70533, 64417, 63862, 89978, 63991, 63823, 77284, 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 63893, 130005, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100]
labels 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自然不拘束,面料亲肤舒适贴身体验感棒棒哒。系带部分增加设计看点,还
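The input_ids and label_ids printed above show how the loss is restricted to the response: positions that belong to the prompt or to padding carry the label -100, which PyTorch's cross-entropy loss ignores by default. Below is a minimal sketch of that masking. It uses the special token ids from the logged config (gMASK 130001, BOS 130004, EOS 130005, pad 3) with a made-up short sequence, and `build_example` is a hypothetical helper, not the preprocessing function from the repository.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # label value skipped by the loss


def build_example(prompt_ids: List[int], answer_ids: List[int],
                  max_len: int, pad_id: int) -> Tuple[List[int], List[int]]:
    """Concatenate prompt and answer, then mask prompt and padding in the labels."""
    input_ids = prompt_ids + answer_ids
    if len(input_ids) > max_len:
        raise ValueError("sequence is longer than max_len")
    # Only answer tokens contribute to the loss.
    labels = [IGNORE_INDEX] * len(prompt_ids) + answer_ids
    pad = max_len - len(input_ids)
    return input_ids + [pad_id] * pad, labels + [IGNORE_INDEX] * pad


if __name__ == "__main__":
    prompt = [5, 65421, 130001]        # made-up prompt ending with gMASK
    answer = [130004, 87052, 130005]   # BOS, one made-up token, EOS
    ids, labels = build_example(prompt, answer, max_len=8, pad_id=3)
    print(ids)
    print(labels)
```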
[2023-04-14 18:06:30,469] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-04-14 18:06:30,470] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer
[2023-04-14 18:06:30,470] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2023-04-14 18:06:30,483] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW
[2023-04-14 18:06:30,484] [INFO] [utils.py:51:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=
[2023-04-14 18:06:30,484] [WARNING] [engine.py:1118:_do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution *****
[2023-04-14 18:06:30,484] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer
[2023-04-14 18:06:30,484] [INFO] [stage_1_and_2.py:133:init] Reduce bucket size 500000000
[2023-04-14 18:06:30,484] [INFO] [stage_1_and_2.py:134:init] Allgather bucket size 500000000
[2023-04-14 18:06:30,484] [INFO] [stage_1_and_2.py:135:init] CPU Offload: False
[2023-04-14 18:06:30,484] [INFO] [stage_1_and_2.py:136:init] Round robin gradient partitioning: False
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
Emitting ninja build file /home/guodong.li/.cache/torch_extensions/py310_cu117/utils/build.ninja…
Building extension module utils…
Allowing ninja to set a default number of workers… (overridable by setting the environment variable MAX_JOBS=N)
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
ninja: no work to do.
Loading extension module utils…
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
Time to load utils op: 0.10171675682067871 seconds
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
Emitting ninja build file /home/guodong.li/.cache/torch_extensions/py310_cu117/utils/build.ninja…
Building extension module utils…
Allowing ninja to set a default number of workers… (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module utils…
Time to load utils op: 0.18768668174743652 seconds
…
Loading extension module utils…
Time to load utils op: 0.3021426200866699 seconds
Rank: 2 partition count [8, 8] and sizes[(771473408, False), (187392, False)]
…
Rank: 4 partition count [8, 8] and sizes[(771473408, False), (187392, False)]
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
No modifications detected for re-loaded extension module utils, skipping build step…
Loading extension module utils…
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
Time to load utils op: 0.0005774497985839844 seconds
…
No modifications detected for re-loaded extension module utils, skipping build step…
Loading extension module utils…
Time to load utils op: 0.0011382102966308594 seconds
[2023-04-14 18:06:48,321] [INFO] [utils.py:785:see_memory_usage] Before initializing optimizer states
[2023-04-14 18:06:48,321] [INFO] [utils.py:786:see_memory_usage] MA 14.37 GB Max_MA 14.37 GB CA 14.39 GB Max_CA 14 GB
[2023-04-14 18:06:48,322] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 50.56 GB, percent = 5.0%
04/14/2023 18:06:48 - WARNING - transformers_modules.chatglm-6b.modeling_chatglm - use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False…
…
04/14/2023 18:06:48 - WARNING - transformers_modules.chatglm-6b.modeling_chatglm - use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False…
[2023-04-14 18:06:48,431] [INFO] [utils.py:785:see_memory_usage] After initializing optimizer states
[2023-04-14 18:06:48,434] [INFO] [utils.py:786:see_memory_usage] MA 20.12 GB Max_MA 25.87 GB CA 25.9 GB Max_CA 26 GB
[2023-04-14 18:06:48,435] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 50.84 GB, percent = 5.0%
[2023-04-14 18:06:48,435] [INFO] [stage_1_and_2.py:489:init] optimizer state initialized
[2023-04-14 18:06:48,512] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer
[2023-04-14 18:06:48,513] [INFO] [utils.py:786:see_memory_usage] MA 20.12 GB Max_MA 20.12 GB CA 25.9 GB Max_CA 26 GB
[2023-04-14 18:06:48,513] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 51.29 GB, percent = 5.1%
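The "MA" (memory allocated) values in these lines can be reproduced with a simple ZeRO stage 2 accounting. The sketch assumes that every rank keeps the full model in fp16, an fp32 master copy of only its own parameter partition, and two fp32 Adam states for that partition, and that "GB" in the log means 2^30 bytes. This is one interpretation that happens to fit the numbers, not an official breakdown; the partition sizes are taken from the "partition count" lines earlier in the log.

```python
GIB = 1024 ** 3


def zero2_memory_gib(partition_sizes, world_size):
    """Return (total parameters, GiB before optimizer states, GiB after)."""
    params_per_rank = sum(partition_sizes)
    total_params = params_per_rank * world_size
    fp16_weights = total_params * 2        # full fp16 model on every rank
    fp32_master = params_per_rank * 4      # fp32 copy of this rank's partition
    adam_states = params_per_rank * 4 * 2  # fp32 momentum and variance for the partition
    before = (fp16_weights + fp32_master) / GIB
    after = before + adam_states / GIB
    return total_params, before, after


if __name__ == "__main__":
    total, before, after = zero2_memory_gib((771473408, 187392), world_size=8)
    print(f"parameters: {total:,}")
    print(f"before optimizer states: {before:.2f} GiB")
    print(f"after optimizer states:  {after:.2f} GiB")
```

Under these assumptions the script reports about 6.17 billion parameters, 14.37 GiB before and 20.12 GiB after the optimizer states are created, in line with the logged values.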
[2023-04-14 18:06:48,515] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = AdamW
[2023-04-14 18:06:48,515] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2023-04-14 18:06:48,515] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2023-04-14 18:06:48,515] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0001, 0.0001], mom=[(0.9, 0.999), (0.9, 0.999)]
[2023-04-14 18:06:48,515] [INFO] [config.py:953:print] DeepSpeedEngine configuration:
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] activation_checkpointing_config {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": false
}
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] aio_config … {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] amp_enabled … False
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] amp_params … False
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] autotuning_config … {
    "enabled": false,
    "start_step": null,
    "end_step": null,
    "metric_path": null,
    "arg_mappings": null,
    "metric": "throughput",
    "model_info": null,
    "results_dir": "autotuning_results",
    "exps_dir": "autotuning_exps",
    "overwrite": true,
    "fast": true,
    "start_profile_step": 3,
    "end_profile_step": 5,
    "tuner_type": "gridsearch",
    "tuner_early_stopping": 5,
    "tuner_num_trials": 50,
    "model_info_path": null,
    "mp_size": 1,
    "max_train_batch_size": null,
    "min_train_batch_size": 1,
    "max_train_micro_batch_size_per_gpu": 1.024000e+03,
    "min_train_micro_batch_size_per_gpu": 1,
    "num_tuning_micro_batch_sizes": 3
}
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] bfloat16_enabled … False
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] checkpoint_parallel_write_pipeline False
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] checkpoint_tag_validation_enabled True
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] checkpoint_tag_validation_fail False
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] comms_config …
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] communication_data_type … None
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] compression_config … {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] curriculum_enabled_legacy … False
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] curriculum_params_legacy … False
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] data_efficiency_config … {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] data_efficiency_enabled … False
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] dataloader_drop_last … False
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] disable_allgather … False
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] dump_state … False
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] dynamic_loss_scale_args … {'init_scale': 65536, 'scale_window': 1000, 'delayed_shift': 2, 'min_scale': 1}
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] eigenvalue_enabled … False
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] eigenvalue_gas_boundary_resolution 1
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] eigenvalue_layer_name … bert.encoder.layer
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] eigenvalue_layer_num … 0
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] eigenvalue_max_iter … 100
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] eigenvalue_stability … 1e-06
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] eigenvalue_tol … 0.01
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] eigenvalue_verbose … False
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] elasticity_enabled … False
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] flops_profiler_config … {
    "enabled": false,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] fp16_auto_cast … False
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] fp16_enabled … True
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] fp16_master_weights_and_gradients False
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] global_rank … 0
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] grad_accum_dtype … None
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] gradient_accumulation_steps … 1
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] gradient_clipping … 0.0
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] gradient_predivide_factor … 1.0
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] hybrid_engine … enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] initial_dynamic_scale … 65536
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] load_universal_checkpoint … False
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] loss_scale … 0
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] memory_breakdown … False
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] monitor_config … tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] nebula_config … {
    "enabled": false,
    "persistent_storage_path": null,
    "persistent_time_interval": 100,
    "num_of_version_in_retention": 2,
    "enable_nebula_load": true,
    "load_path": null
}
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] optimizer_legacy_fusion … False
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] optimizer_name … None
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] optimizer_params … None
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] pipeline … {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] pld_enabled … False
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] pld_params … False
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] prescale_gradients … False
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] scheduler_name … None
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] scheduler_params … None
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] sparse_attention … None
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] sparse_gradients_enabled … False
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] steps_per_print … 10
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] train_batch_size … 192
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] train_micro_batch_size_per_gpu 24
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] use_node_local_storage … False
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] wall_clock_breakdown … False
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] world_size … 8
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] zero_allow_untested_optimizer True
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] zero_config … stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False memory_efficient_linear=True
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] zero_enabled … True
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] zero_force_ds_cpu_optimizer … True
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] zero_optimization_stage … 2
[2023-04-14 18:06:48,518] [INFO] [config.py:943:print_user_config] json = {
    "train_micro_batch_size_per_gpu": 24,
    "zero_allow_untested_optimizer": true,
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "initial_scale_power": 16,
        "loss_scale_window": 1000,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": true,
        "allgather_bucket_size": 5.000000e+08,
        "overlap_comm": false,
        "reduce_scatter": true,
        "reduce_bucket_size": 5.000000e+08,
        "contiguous_gradients": true
    }
}
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.00031948089599609375 seconds
0%| | 0/596 [00:00<?, ?it/s]use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False...
[2023-04-14 18:06:53,718] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1
[2023-04-14 18:06:55,883] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768
0%|▎ | 1/596 [00:07<1:13:02, 7.37s/it][2023-04-14 18:06:57,948] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384
[2023-04-14 18:07:00,007] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, reducing to 8192
0%|▌ | 2/596 [00:11<54:01, 5.46s/it][2023-04-14 18:07:06,332] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192, reducing to 4096
1%|▊ | 3/596 [00:17<57:51, 5.85s/it][2023-04-14 18:07:08,383] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, reducing to 2048
1%|█▏ | 4/596 [00:24<59:20, 6.01s/it][2023-04-14 18:07:18,876] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048, reducing to 1024
[2023-04-14 18:07:18,876] [INFO] [logging.py:96:log_dist] [Rank 0] step=10, skipped=7, lr=[9.949664429530202e-05, 9.949664429530202e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
[2023-04-14 18:07:18,877] [INFO] [timer.py:199:stop] epoch=0/micro_step=10/global_step=10, RunningAvgSamplesPerSec=66.98818896434254, CurrSamplesPerSec=93.79590019766518, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
1%|█▍ | 5/596 [00:30<1:00:11, 6.11s/it]
…
[2023-04-14 18:47:55,207] [INFO] [logging.py:96:log_dist] [Rank 0] step=590, skipped=12, lr=[3.02013422818792e-06, 3.02013422818792e-06], mom=[(0.9, 0.999), (0.9, 0.999)]
[2023-04-14 18:47:57,392] [INFO] [timer.py:199:stop] epoch=0/micro_step=590/global_step=590, RunningAvgSamplesPerSec=45.931193758598916, CurrSamplesPerSec=45.63412532914195, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
50%|███████████████████████████████████████████████████████████████████████████████████▊ | 299/596 [41:42<41:37, 8.41s/it][2023-04-14 18:48:37,273] [INFO] [logging.py:96:log_dist] [Rank 0] step=600, skipped=12, lr=[1.3422818791946309e-06, 1.3422818791946309e-06], mom=[(0.9, 0.999), (0.9, 0.999)]
[2023-04-14 18:48:39,453] [INFO] [timer.py:199:stop] epoch=0/micro_step=600/global_step=600, RunningAvgSamplesPerSec=45.92850276413307, CurrSamplesPerSec=45.66031263997641, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
{'loss': 13.3487, 'learning_rate': 1.3422818791946309e-06, 'epoch': 1.01}
50%|████████████████████████████████████████████████████████████████████████████████████ | 300/596 [41:50<41:30, 8.41s/it]Saving the whole model
[INFO|configuration_utils.py:457] 2023-04-14 18:48:39,458 >> Configuration saved in /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/config.json
[INFO|configuration_utils.py:362] 2023-04-14 18:48:39,459 >> Configuration saved in /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/generation_config.json
[INFO|modeling_utils.py:1855] 2023-04-14 18:49:03,951 >> The model is bigger than the maximum size per checkpoint (10GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/pytorch_model.bin.index.json.
[INFO|tokenization_utils_base.py:2171] 2023-04-14 18:49:03,953 >> tokenizer config file saved in /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/tokenizer_config.json
[INFO|tokenization_utils_base.py:2178] 2023-04-14 18:49:03,953 >> Special tokens file saved in /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/special_tokens_map.json
[2023-04-14 18:49:03,983] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step600 is about to be saved!
[2023-04-14 18:49:03,988] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/mp_rank_00_model_states.pt
[2023-04-14 18:49:03,988] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/mp_rank_00_model_states.pt…
[2023-04-14 18:49:15,934] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/mp_rank_00_model_states.pt.
[2023-04-14 18:49:15,937] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/zero_pp_rank_0_mp_rank_00_optim_states.pt…
[2023-04-14 18:49:28,049] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2023-04-14 18:49:28,049] [INFO] [engine.py:3125:_save_zero_checkpoint] zero checkpoint saved /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2023-04-14 18:49:28,049] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step600 is ready now!
51%|████████████████████████████████████████████████████████████████████████████████████▏ | 304/596 [43:14<1:05:51, 13.53s/it][2023-04-14 18:50:09,137] [INFO] [logging.py:96:log_dist] [Rank 0] step=610, skipped=12, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2023-04-14 18:50:11,316] [INFO] [timer.py:199:stop] epoch=0/micro_step=610/global_step=610, RunningAvgSamplesPerSec=45.926876625767875, CurrSamplesPerSec=45.66709917655267, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
52%|██████████████████████████████████████████████████████████████████████████████████████▌ | 309/596 [43:56<44:16, 9.26s/it][2023-04-14 18:50:51,114] [INFO] [logging.py:96:log_dist] [Rank 0] step=620, skipped=12, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2023-04-14 18:50:53,302] [INFO] [timer.py:199:stop] epoch=0/micro_step=620/global_step=620, RunningAvgSamplesPerSec=45.92462533252217, CurrSamplesPerSec=45.55552426651123, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
{'loss': 13.3202, 'learning_rate': 0.0, 'epoch': 1.04}
…
99%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 589/596 [1:23:07<00:58, 8.41s/it][2023-04-14 19:30:02,654] [INFO] [logging.py:96:log_dist] [Rank 0] step=1180, skipped=12, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2023-04-14 19:30:04,820] [INFO] [timer.py:199:stop] epoch=0/micro_step=1180/global_step=1180, RunningAvgSamplesPerSec=45.85904109663022, CurrSamplesPerSec=45.73521852038509, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
{'loss': 13.3537, 'learning_rate': 0.0, 'epoch': 1.98}
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍| 594/596 [1:23:49<00:16, 8.41s/it][2023-04-14 19:30:44,847] [INFO] [logging.py:96:log_dist] [Rank 0] step=1190, skipped=12, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2023-04-14 19:30:47,022] [INFO] [timer.py:199:stop] epoch=0/micro_step=1190/global_step=1190, RunningAvgSamplesPerSec=45.856487437478386, CurrSamplesPerSec=45.579988341622055, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
{'train_runtime': 5046.8863, 'train_samples_per_second': 45.414, 'train_steps_per_second': 0.118, 'train_loss': 13.905431555421561, 'epoch': 2.0}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 596/596 [1:24:06<00:00, 8.47s/it]
***** train metrics *****
epoch = 2.0
train_loss = 13.9054
train_runtime = 1:24:06.88
train_samples = 114599
train_samples_per_second = 45.414
train_steps_per_second = 0.118
[2023-04-14 19:30:58,560] [INFO] [launch.py:460:main] Process 35198 exits successfully.
[2023-04-14 19:30:58,561] [INFO] [launch.py:460:main] Process 35192 exits successfully.
[2023-04-14 19:30:58,561] [INFO] [launch.py:460:main] Process 35193 exits successfully.
[2023-04-14 19:30:58,561] [INFO] [launch.py:460:main] Process 35195 exits successfully.
[2023-04-14 19:30:58,561] [INFO] [launch.py:460:main] Process 35191 exits successfully.
[2023-04-14 19:30:59,562] [INFO] [launch.py:460:main] Process 35194 exits successfully.
[2023-04-14 19:30:59,563] [INFO] [launch.py:460:main] Process 35197 exits successfully.
[2023-04-14 19:31:00,564] [INFO] [launch.py:460:main] Process 35196 exits successfully.
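
The OVERFLOW messages near the start of the log above come from dynamic loss scaling in fp16 training. The following is a minimal sketch of that idea, not DeepSpeed's actual implementation: the class name `DynamicLossScale` and the shrink/grow factor of 2 are assumptions made for illustration, while the initial scale (2^16), hysteresis (2), window (1000), and minimum scale (1) are taken from the fp16 section of the config printed above.

```python
from dataclasses import dataclass


@dataclass
class DynamicLossScale:
    """Simplified dynamic loss scaling (illustrative, not DeepSpeed's code)."""

    scale: float = 2.0 ** 16  # initial_scale_power = 16
    hysteresis: int = 2       # overflows tolerated before the scale shrinks
    min_scale: float = 1.0    # min_loss_scale
    window: int = 1000        # loss_scale_window
    factor: float = 2.0       # assumed shrink/grow factor
    good_steps: int = 0       # overflow-free steps since the last overflow
    skipped: int = 0          # optimizer steps skipped because of overflow

    def update(self, overflow: bool) -> None:
        if overflow:
            # The step is skipped; either use up hysteresis or halve the scale.
            self.skipped += 1
            self.good_steps = 0
            if self.hysteresis > 1:
                self.hysteresis -= 1
            else:
                self.scale = max(self.scale / self.factor, self.min_scale)
            return
        # After a full window without overflow, try a larger scale again.
        self.good_steps += 1
        if self.good_steps % self.window == 0:
            self.scale *= self.factor


scaler = DynamicLossScale()
for _ in range(7):  # seven consecutive overflows, as in the log above
    scaler.update(overflow=True)
print(scaler.scale, scaler.skipped)  # prints: 1024.0 7
```

Under these assumptions, the first overflow only consumes the hysteresis, and the next six halve the scale from 65536 down to 1024, which is consistent with `skipped=7` at step 10 in the log.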

GPU memory usage:
```
Fri Apr 14 18:27:45 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A800 80G...  Off  | 00000000:34:00.0 Off |                    0 |
| N/A   59C    P0    92W / 300W |  36539MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A800 80G...  Off  | 00000000:35:00.0 Off |                    0 |
| N/A   61C    P0    96W / 300W |  38395MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A800 80G...  Off  | 00000000:36:00.0 Off |                    0 |
| N/A   63C    P0    93W / 300W |  38395MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A800 80G...  Off  | 00000000:37:00.0 Off |                    0 |
| N/A   65C    P0   102W / 300W |  38347MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA A800 80G...  Off  | 00000000:9B:00.0 Off |                    0 |
| N/A   64C    P0   108W / 300W |  38347MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA A800 80G...  Off  | 00000000:9C:00.0 Off |                    0 |
| N/A   64C    P0   105W / 300W |  38395MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA A800 80G...  Off  | 00000000:9D:00.0 Off |                    0 |
| N/A   58C    P0    97W / 300W |  36433MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA A800 80G...  Off  | 00000000:9E:00.0 Off |                    0 |
| N/A   59C    P0    92W / 300W |  38347MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     35191      C   ...nv-py310-cu117/bin/python    36537MiB |
|    1   N/A  N/A     35192      C   ...nv-py310-cu117/bin/python    38393MiB |
|    2   N/A  N/A     35193      C   ...nv-py310-cu117/bin/python    38393MiB |
|    3   N/A  N/A     35194      C   ...nv-py310-cu117/bin/python    38345MiB |
|    4   N/A  N/A     35195      C   ...nv-py310-cu117/bin/python    38345MiB |
|    5   N/A  N/A     35196      C   ...nv-py310-cu117/bin/python    38393MiB |
|    6   N/A  N/A     35197      C   ...nv-py310-cu117/bin/python    36431MiB |
|    7   N/A  N/A     35198      C   ...nv-py310-cu117/bin/python    38345MiB |
+-----------------------------------------------------------------------------+
```

Output files:
```
tree /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4
/home/guodong.li/output/adgen-chatglm-6b-ft-1e-4
├── all_results.json
├── checkpoint-300
│   ├── config.json
│   ├── configuration_chatglm.py
│   ├── generation_config.json
│   ├── global_step600
│   │   ├── mp_rank_00_model_states.pt
│   │   ├── zero_pp_rank_0_mp_rank_00_optim_states.pt
│   │   ├── zero_pp_rank_1_mp_rank_00_optim_states.pt
│   │   ├── zero_pp_rank_2_mp_rank_00_optim_states.pt
│   │   ├── zero_pp_rank_3_mp_rank_00_optim_states.pt
│   │   ├── zero_pp_rank_4_mp_rank_00_optim_states.pt
│   │   ├── zero_pp_rank_5_mp_rank_00_optim_states.pt
│   │   ├── zero_pp_rank_6_mp_rank_00_optim_states.pt
│   │   └── zero_pp_rank_7_mp_rank_00_optim_states.pt
│   ├── ice_text.model
│   ├── latest
│   ├── modeling_chatglm.py
│   ├── pytorch_model-00001-of-00002.bin
│   ├── pytorch_model-00002-of-00002.bin
│   ├── pytorch_model.bin.index.json
│   ├── quantization.py
│   ├── rng_state_0.pth
│   ├── rng_state_1.pth
│   ├── rng_state_2.pth
│   ├── rng_state_3.pth
│   ├── rng_state_4.pth
│   ├── rng_state_5.pth
│   ├── rng_state_6.pth
│   ├── rng_state_7.pth
│   ├── special_tokens_map.json
│   ├── tokenization_chatglm.py
│   ├── tokenizer_config.json
│   ├── trainer_state.json
│   ├── training_args.bin
│   └── zero_to_fp32.py
├── trainer_state.json
└── train_results.json

2 directories, 36 files
```

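When training finished, the final model weights were not saved; only the checkpoint written during training was kept. To save the final weights, add a call to trainer.save_model() in the code.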
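
A minimal sketch of that change is shown here. The helper name `train_and_save` is hypothetical and is not part of the original main.py; it only assumes that `trainer` behaves like a Hugging Face `Trainer` (which provides `train()`, `save_model()`, and `save_state()`).

```python
def train_and_save(trainer, output_dir):
    """Run training, then persist the final weights and trainer state.

    `trainer` is expected to behave like a Hugging Face Trainer,
    for example the trainer object built in main.py (an assumption).
    """
    result = trainer.train()
    trainer.save_model(output_dir)  # write the final model weights
    trainer.save_state()            # write trainer_state.json
    return result
```

In the training script this would replace a bare `trainer.train()` call, with `output_dir` set to the same directory passed via `--output_dir`.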

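Full fine-tuning with DeepSpeed needs a lot of GPU memory and trains slowly. The next step is therefore to try the P-Tuning v2 approach provided by the official repository for parameter-efficient fine-tuning.

Parameter-efficient fine-tuning of ChatGLM-6B with P-Tuning v2

Here ChatGLM-6B is fine-tuned with P-Tuning v2. This reduces the number of parameters that need to be tuned to about 0.1% of the original. Combined with model quantization, gradient checkpointing, and similar techniques, it can run with as little as 7 GB of GPU memory.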
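
To get a feel for the size of the trainable part, here is a rough back-of-the-envelope sketch. It assumes the prefix encoder is a single embedding table of shape (pre_seq_len, num_layers × hidden_size × 2), with no projection MLP, which is how the `transformer.prefix_encoder.embedding.weight` entry and `"prefix_projection": false` in the training log later in this article can be read. The total model size of 6.2 billion parameters is also an approximation, and the function name is made up for illustration.

```python
def prefix_encoder_params(pre_seq_len: int, num_layers: int, hidden_size: int) -> int:
    # One key vector and one value vector per layer for every prefix position
    # (assumes no prefix projection MLP).
    return pre_seq_len * num_layers * hidden_size * 2


# pre_seq_len comes from the scripts in this article; num_layers and
# hidden_size come from the ChatGLMConfig dump in the training log.
trainable = prefix_encoder_params(pre_seq_len=128, num_layers=28, hidden_size=4096)
total = 6.2e9  # assumed approximate parameter count of ChatGLM-6B

print(f"trainable parameters: {trainable:,}")
print(f"fraction of the full model: {trainable / total:.2%}")
```

Under these assumptions the trainable fraction is a few tenths of a percent with PRE_SEQ_LEN=128, the same order of magnitude as the figure quoted above; the exact value scales linearly with pre_seq_len.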

First, edit the train.sh script, mainly the train_file, validation_file, model_name_or_path, and output_dir parameters:
```
PRE_SEQ_LEN=128
LR=2e-2

# Single-GPU P-Tuning v2 run with 4-bit quantization
CUDA_VISIBLE_DEVICES=0 python3 main.py \
    --do_train \
    --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
    --validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --max_steps 3000 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4
```

Training run:
```
  0%|                  | 0/3000 [00:00<?, ?it/s]
...
{'loss': 4.2962, 'learning_rate': 0.0196, 'epoch': 0.01}
{'loss': 4.3112, 'learning_rate': 0.019533333333333333, 'epoch': 0.01}
  2%|███▊             | 70/3000 [03:20<2:17:06,  2.81s/it]
```
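
The learning rates in this log are consistent with a linear decay from LR=2e-2 to zero over max_steps=3000 without warmup. The sketch below reproduces the two logged values; it assumes they were logged at steps 60 and 70, and that the schedule is the linear one with `warmup_steps=0`, as the argument dump later in this article reports. The function name is made up for illustration.

```python
def linear_decay_lr(base_lr: float, step: int, total_steps: int, warmup_steps: int = 0) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (60, 70):
    # Values should match the logged 0.0196 and 0.0195333... up to rounding.
    print(step, linear_decay_lr(2e-2, step, total_steps=3000))
```

With a peak learning rate this high, the schedule length matters: changing max_steps or the number of epochs changes the learning rate seen at any given step.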
GPU memory usage:
```
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A800 80G...  Off  | 00000000:34:00.0 Off |                    0 |
| N/A   71C    P0   300W / 300W |   6291MiB / 81920MiB |     74%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
```
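GPU memory usage is indeed low, but even with P-Tuning v2 for parameter-efficient fine-tuning, training is still slow.

Next, edit train.sh to increase the batch size and try again.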
```
PRE_SEQ_LEN=128
LR=2e-2

# Same run as before, with a larger batch size and one full epoch
CUDA_VISIBLE_DEVICES=0 python3 main.py \
    --do_train \
    --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
    --validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 128 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --num_train_epochs 1 \
    --logging_steps 10 \
    --save_steps 100 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4
```

Training run:
```
sh train.sh

04/14/2023 19:46:38 - WARNING - main - Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: Fals

04/14/2023 19:46:38 - INFO - main - Training/evaluation parameters Seq2SeqTrainingArguments(

_n_gpu=1,

adafactor=False,

adam_beta1=0.9,

adam_beta2=0.999,

adam_epsilon=1e-08,

auto_find_batch_size=False,

bf16=False,

bf16_full_eval=False,

data_seed=None,

dataloader_drop_last=False,

dataloader_num_workers=0,

dataloader_pin_memory=True,

ddp_bucket_cap_mb=None,

ddp_find_unused_parameters=None,

ddp_timeout=1800,

debug=[],

deepspeed=None,

disable_tqdm=False,

do_eval=False,

do_predict=False,

do_train=True,

eval_accumulation_steps=None,

eval_delay=0,

eval_steps=None,

evaluation_strategy=no,

fp16=False,

fp16_backend=auto,

fp16_full_eval=False,

fp16_opt_level=O1,

fsdp=[],

fsdp_config={‘fsdp_min_num_params’: 0, ‘xla’: False, ‘xla_fsdp_grad_ckpt’: False},

fsdp_min_num_params=0,

fsdp_transformer_layer_cls_to_wrap=None,

full_determinism=False,

generation_config=None,

generation_max_length=None,

generation_num_beams=None,

gradient_accumulation_steps=16,

gradient_checkpointing=False,

greater_is_better=None,

group_by_length=False,

half_precision_backend=auto,

hub_model_id=None,

hub_private_repo=False,

hub_strategy=every_save,

hub_token=,

ignore_data_skip=False,

include_inputs_for_metrics=False,

jit_mode_eval=False,

label_names=None,

label_smoothing_factor=0.0,

learning_rate=0.02,

length_column_name=length,

load_best_model_at_end=False,

local_rank=-1,

log_level=passive,

log_level_replica=warning,

log_on_each_node=True,

logging_dir=/home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2/runs/Apr14_19-46-38_ai-app-2-46,

logging_first_step=False,

logging_nan_inf_filter=True,

logging_steps=10,

logging_strategy=steps,

lr_scheduler_type=linear,

max_grad_norm=1.0,

max_steps=-1,

metric_for_best_model=None,

mp_parameters=,

no_cuda=False,

num_train_epochs=1.0,

optim=adamw_hf,

optim_args=None,

output_dir=/home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2,

overwrite_output_dir=True,

past_index=-1,

per_device_eval_batch_size=8,

per_device_train_batch_size=128,

predict_with_generate=True,

prediction_loss_only=False,

push_to_hub=False,

push_to_hub_model_id=None,

push_to_hub_organization=None,

push_to_hub_token=,

ray_scope=last,

remove_unused_columns=True,

report_to=[],

resume_from_checkpoint=None,

run_name=/home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2,

save_on_each_node=False,

save_safetensors=False,

save_steps=100,

save_strategy=steps,

save_total_limit=None,

seed=42,

sharded_ddp=[],

skip_memory_metrics=True,

sortish_sampler=False,

tf32=None,

torch_compile=False,

torch_compile_backend=None,

torch_compile_mode=None,

torchdynamo=None,

tpu_metrics_debug=False,

tpu_num_cores=None,

use_ipex=False,

use_legacy_prediction_loop=False,

use_mps_device=False,

warmup_ratio=0.0,

warmup_steps=0,

weight_decay=0.0,

xpu_backend=None,

)

04/14/2023 19:47:58 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-1cf934bed8e233e6e)

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████

[INFO|configuration_utils.py:666] 2023-04-14 19:47:58,671 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json

[WARNING|configuration_auto.py:925] 2023-04-14 19:47:58,671 >> Explicitly passing a revision is encouraged when loading a configuratio a newer revision.

[INFO|configuration_utils.py:666] 2023-04-14 19:47:58,679 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json

[INFO|configuration_utils.py:720] 2023-04-14 19:47:58,681 >> Model config ChatGLMConfig {
  "_name_or_path": "/data/nfs/llm/model/chatglm-6b",
  "architectures": [
    "ChatGLMModel"
  ],
  "auto_map": {
    "AutoConfig": "configuration_chatglm.ChatGLMConfig",
    "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"
  },
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "gmask_token_id": 130001,
  "hidden_size": 4096,
  "inner_hidden_size": 16384,
  "layernorm_epsilon": 1e-05,
  "mask_token_id": 130000,
  "max_sequence_length": 2048,
  "model_type": "chatglm",
  "num_attention_heads": 32,
  "num_layers": 28,
  "pad_token_id": 3,
  "position_encoding_2d": true,
  "pre_seq_len": null,
  "prefix_projection": false,
  "quantization_bit": 0,
  "torch_dtype": "float16",
  "transformers_version": "4.28.0",
  "use_cache": true,
  "vocab_size": 130528
}
[WARNING|tokenization_auto.py:675] 2023-04-14 19:47:58,683 >> Explicitly passing a revision is encouraged when loading a model with curevision.
[INFO|tokenization_utils_base.py:1807] 2023-04-14 19:47:58,692 >> loading file ice_text.model
[INFO|tokenization_utils_base.py:1807] 2023-04-14 19:47:58,692 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1807] 2023-04-14 19:47:58,692 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1807] 2023-04-14 19:47:58,692 >> loading file tokenizer_config.json
[WARNING|auto_factory.py:456] 2023-04-14 19:47:59,089 >> Explicitly passing a revision is encouraged when loading a model with custom ion.
[INFO|modeling_utils.py:2531] 2023-04-14 19:47:59,115 >> loading weights file /data/nfs/llm/model/chatglm-6b/pytorch_model.bin.index.jso
[INFO|configuration_utils.py:575] 2023-04-14 19:47:59,117 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████
[INFO|modeling_utils.py:3190] 2023-04-14 19:48:08,508 >> All model checkpoint weights were used when initializing ChatGLMForConditionalG

[WARNING|modeling_utils.py:3192] 2023-04-14 19:48:08,508 >> Some weights of ChatGLMForConditionalGeneration were not initialized from thtialized: ['transformer.prefix_encoder.embedding.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[INFO|modeling_utils.py:2839] 2023-04-14 19:48:08,548 >> Generation config file not found, using a generation config created from the mo
Quantized to 4 bit
input_ids [5, 65421, 61, 67329, 32, 98339, 61, 72043, 32, 65347, 61, 70872, 32, 69768, 61, 68944, 32, 67329, 64103, 61, 96914, 130001, 15388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65564219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 6 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
inputs 类型#裤版型#宽松风格#性感图案#线条裤型#阔腿裤宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长适贴身体验感棒棒哒。系带部分增加设计看点,还
label_ids [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 741-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100
labels 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自
/home/guodong.li/virtual-venv/chatglm-ptuningv2-venv-py310-cu117/lib/python3.10/site-packages/transformers/optimization.py:391: FutureWain a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warn
warnings.warn(
0%| 04/14/2023 19:51:19 - WARNING - transformers_modules.chatglm-6b.modeling_chatglm - use_cache=True is incompatible with gradient checkp
{'loss': 6.0246, 'learning_rate': 0.016428571428571428, 'epoch': 0.18}
{'loss': 7.8721, 'learning_rate': 0.012857142857142859, 'epoch': 0.36}
{'loss': 8.2653, 'learning_rate': 0.009285714285714286, 'epoch': 0.54}
{'loss': 8.6636, 'learning_rate': 0.005714285714285714, 'epoch': 0.71}
{'loss': 8.5985, 'learning_rate': 0.002142857142857143, 'epoch': 0.89}
{'train_runtime': 4868.4062, 'train_samples_per_second': 23.539, 'train_steps_per_second': 0.012, 'train_loss': 7.956800188337054, 'epoc
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████
***** train metrics *****
epoch = 1.0
train_loss = 7.9568
train_runtime = 1:21:08.40
train_samples = 114599
train_samples_per_second = 23.539
train_steps_per_second = 0.012
```

GPU memory usage:
```
Sun Apr 16 19:53:00 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A800 80G...  Off  | 00000000:34:00.0 Off |                    0 |
| N/A   71C    P0   281W / 300W |  63275MiB / 81920MiB |     92%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     20126      C   python3                         63273MiB |
+-----------------------------------------------------------------------------+
```

Output files:
```
> ls -al  /home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2

total 12
drwxrwxr-x 2 guodong.li guodong.li   98 Apr 14 21:12 .
drwxrwxr-x 8 guodong.li guodong.li  177 Apr 14 17:12 ..
-rw-rw-r-- 1 guodong.li guodong.li  195 Apr 14 21:12 all_results.json
-rw-rw-r-- 1 guodong.li guodong.li 1185 Apr 14 21:12 trainer_state.json
-rw-rw-r-- 1 guodong.li guodong.li  195 Apr 14 21:12 train_results.json
```
As you can see, after increasing the batch size, both GPU memory usage and GPU utilization went up.
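
Note that the output directory listed above contains no checkpoint-* directory. One possible explanation, sketched below, is that the run has fewer optimizer steps than save_steps. The step calculation is a simplified model of how a Hugging Face style trainer counts steps with gradient accumulation (an assumption, and the function names are made up); the sample count and batch settings come from the log and the script above.

```python
import math


def optimizer_steps(num_samples, per_device_batch, grad_accum, epochs, num_devices=1):
    """Approximate number of optimizer updates for a full run."""
    batches_per_epoch = math.ceil(num_samples / (per_device_batch * num_devices))
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return math.ceil(epochs * updates_per_epoch)


def checkpoint_steps(total_steps, save_steps):
    """Steps at which a periodic checkpoint would be written."""
    return list(range(save_steps, total_steps + 1, save_steps))


total = optimizer_steps(114599, per_device_batch=128, grad_accum=16, epochs=1)
print(total, checkpoint_steps(total, save_steps=100))  # prints: 56 []
```

Under these assumptions the run has only 56 optimizer steps, so `--save_steps 100` is never reached. Choosing a smaller save_steps, training for more epochs, or saving explicitly at the end of training would leave weights on disk for evaluation.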

To use DeepSpeed for data parallelism, the following command can serve as a reference:
```
PRE_SEQ_LEN=128
LR=2e-2

# Data-parallel P-Tuning v2 run on GPUs 1, 2 and 3 of the local node
deepspeed --include localhost:1,2,3 --master_port 29001 main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
    --validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-pt \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 128 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --num_train_epochs 10 \
    --logging_steps 10 \
    --save_steps 100 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN
```
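
With data parallelism, the number of samples consumed per optimizer step grows with the number of GPUs. The small sketch below works this out for the command above, using the usual relation global batch = per-GPU micro batch × gradient accumulation steps × number of GPUs; the function name is made up for illustration.

```python
def global_batch_size(micro_batch_per_gpu: int, grad_accum: int, world_size: int) -> int:
    """Samples consumed per optimizer step across all data-parallel workers."""
    return micro_batch_per_gpu * grad_accum * world_size


# "--include localhost:1,2,3" selects three GPUs on the local node.
gpu_ids = "localhost:1,2,3".split(":")[1].split(",")
print(global_batch_size(128, 16, len(gpu_ids)))  # prints: 6144
```

Since a larger global batch means fewer optimizer steps per epoch, step-based settings such as logging_steps and save_steps may need to be adjusted when the GPU count changes.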

Model evaluation

Edit evaluate.sh and set model_name_or_path (the base model path), ptuning_checkpoint (the path to the weights produced by P-Tuning v2 fine-tuning), and the other parameters:
```
PRE_SEQ_LEN=128
# CHECKPOINT and STEP are defined here, but the checkpoint paths below are hard-coded.
CHECKPOINT=adgen-chatglm-6b-pt-128-2e-2
STEP=3000

CUDA_VISIBLE_DEVICES=1 python3 main.py \
    --do_predict \
    --validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --test_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --overwrite_cache \
    --prompt_column content \
    --response_column summary \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --ptuning_checkpoint /home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2/checkpoint-500 \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2/checkpoint-500 \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_eval_batch_size 1 \
    --predict_with_generate \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4
```

Evaluation run:
```
sh evaluate.sh

04/16/2023 20:18:01 - WARNING - __main__ - Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False
04/16/2023 20:18:01 - INFO - __main__ - Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
…
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_config=None,
…
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
Downloading and preparing dataset json/default to /home/guodong.li/.cache/huggingface/datasets/json/default-df42438b5ccb0b44/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e…
Downloading data files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 3419.73it/s]
Extracting data files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 196.48it/s]
Dataset json downloaded and prepared to /home/guodong.li/.cache/huggingface/datasets/json/default-df42438b5ccb0b44/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e. Subsequent calls will reuse this data.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 326.85it/s]
[INFO|configuration_utils.py:666] 2023-04-16 20:19:21,784 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json
[WARNING|configuration_auto.py:925] 2023-04-16 20:19:21,785 >> Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
[INFO|configuration_utils.py:666] 2023-04-16 20:19:21,792 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json
[INFO|configuration_utils.py:720] 2023-04-16 20:19:21,795 >> Model config ChatGLMConfig {
  "_name_or_path": "/data/nfs/llm/model/chatglm-6b",
  "architectures": [
    "ChatGLMModel"
  ],
  "auto_map": {
    "AutoConfig": "configuration_chatglm.ChatGLMConfig",
    "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"
  },
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "gmask_token_id": 130001,
  "hidden_size": 4096,
  "inner_hidden_size": 16384,
  "layernorm_epsilon": 1e-05,
  "mask_token_id": 130000,
  "max_sequence_length": 2048,
  "model_type": "chatglm",
  "num_attention_heads": 32,
  "num_layers": 28,
  "pad_token_id": 3,
  "position_encoding_2d": true,
  "pre_seq_len": null,
  "prefix_projection": false,
  "quantization_bit": 0,
  "torch_dtype": "float16",
  "transformers_version": "4.28.0",
  "use_cache": true,
  "vocab_size": 130528
}
[WARNING|tokenization_auto.py:675] 2023-04-16 20:19:21,797 >> Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[INFO|tokenization_utils_base.py:1807] 2023-04-16 20:19:21,805 >> loading file ice_text.model
[INFO|tokenization_utils_base.py:1807] 2023-04-16 20:19:21,805 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1807] 2023-04-16 20:19:21,805 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1807] 2023-04-16 20:19:21,805 >> loading file tokenizer_config.json
[WARNING|auto_factory.py:456] 2023-04-16 20:19:22,186 >> Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[INFO|modeling_utils.py:2531] 2023-04-16 20:19:22,222 >> loading weights file /data/nfs/llm/model/chatglm-6b/pytorch_model.bin.index.json
[INFO|configuration_utils.py:575] 2023-04-16 20:19:22,224 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:08<00:00, 1.04s/it]
[INFO|modeling_utils.py:3190] 2023-04-16 20:19:30,912 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.

[WARNING|modeling_utils.py:3192] 2023-04-16 20:19:30,912 >> Some weights of ChatGLMForConditionalGeneration were not initialized from the model checkpoint at /data/nfs/llm/model/chatglm-6b and are newly initialized: ['transformer.prefix_encoder.embedding.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[INFO|modeling_utils.py:2839] 2023-04-16 20:19:30,967 >> Generation config file not found, using a generation config created from the model config.
Quantized to 4 bit
input_ids [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 5, 65421, 61, 75898, 32, 68554, 61, 77257, 64555, 32, 65107, 61, 66268, 32, 65347, 61, 71689, 32, 69768, 61, 85428, 32, 65173, 73942, 61, 70984, 32, 65173, 70936, 61, 64703, 65509, 130001, 130004]
inputs 类型#上衣材质#牛仔布颜色#白色风格#简约图案#刺绣衣样式#外套衣款式#破洞
label_ids [5, 71689, 66561, 67061, 77257, 70984, 6, 72194, 65173, 64290, 64622, 81549, 63823, 65173, 64290, 83343, 63832, 63912, 65209, 64703, 65509, 64051, 6, 69418, 78598, 87019, 6, 64257, 71319, 66069, 74197, 63823, 65173, 72265, 64880, 64131, 63832, 73416, 85428, 66261, 6, 65594, 87834, 6, 73412, 105145, 65388, 63823, 130001, 130004]
labels 简约而不简单的牛仔外套,白色的衣身十分百搭。衣身多处有做旧破洞设计,打破单调乏味,增加一丝造型看点。衣身后背处有趣味刺绣装饰,丰富层次感,彰显别样时尚。
04/16/2023 20:21:30 - INFO - __main__ - *** Predict ***
[INFO|configuration_utils.py:575] 2023-04-16 20:21:30,090 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

0%| | 0/1070 [00:00> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

0%|▎ | 2/1070 [00:02<25:39, 1.44s/it][INFO|configuration_utils.py:575] 2023-04-16 20:21:37,311 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

0%|▍ | 3/1070
…
1%|█▎ | 8/1070 [00:20<50:13, 2.84s/it][INFO|configuration_utils.py:575] 2023-04-16 20:21:55,233 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

1%|█▍ | 9/1070 [00:23<50:24, 2.85s/it][INFO|configuration_utils.py:575] 2023-04-16 20:21:58,112 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

1%|█▌ | 10/1070 [00:26<50:30, 2.86s/it][INFO|configuration_utils.py:575] 2023-04-16 20:22:00,990 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

1%|█▋ | 11/1070 [00:29<50:37, 2.87s/it][INFO|configuration_utils.py:575] 2023-04-16 20:22:03,880 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

1%|█▊ | 12/1070 [00:32<50:38, 2.87s/it][INFO|configuration_utils.py:575] 2023-04-16 20:22:06,761 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}
…
[INFO|configuration_utils.py:575] 2023-04-16 21:13:16,240 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊| 1069/1070 [51:44<00:02, 2.92s/it][INFO|configuration_utils.py:575] 2023-04-16 21:13:19,107 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1070/1070 [51:47<00:00, 2.90s/it]Building prefix dict from the default dictionary …
04/16/2023 21:13:22 - DEBUG - jieba - Building prefix dict from the default dictionary …
Dumping model to file cache /tmp/jieba.cache
04/16/2023 21:13:22 - DEBUG - jieba - Dumping model to file cache /tmp/jieba.cache
Loading model cost 0.634 seconds.
04/16/2023 21:13:22 - DEBUG - jieba - Loading model cost 0.634 seconds.
Prefix dict has been built successfully.
04/16/2023 21:13:22 - DEBUG - jieba - Prefix dict has been built successfully.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1070/1070 [51:53<00:00, 2.91s/it]
***** predict metrics *****
predict_bleu-4 = 0.7846
predict_rouge-1 = 8.8941
predict_rouge-2 = 1.3703
predict_rouge-l = 16.4982
predict_runtime = 0:51:57.77
predict_samples = 1070
predict_samples_per_second = 0.343
predict_steps_per_second = 0.343
```
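
The throughput figures in these metrics follow directly from the runtime and the sample count. The check below uses only numbers that appear in the log and in evaluate.sh; `parse_runtime` is a helper name made up for this sketch, and the steps-per-second line assumes a single GPU, as the log reports (`n_gpu: 1`).

```python
def parse_runtime(text: str) -> float:
    """Convert an 'H:MM:SS.ss' runtime string into seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


predict_runtime = parse_runtime("0:51:57.77")  # predict_runtime from the metrics
predict_samples = 1070                         # predict_samples from the metrics
per_device_eval_batch_size = 1                 # value passed in evaluate.sh

samples_per_second = predict_samples / predict_runtime
steps_per_second = samples_per_second / per_device_eval_batch_size

print(round(samples_per_second, 3))  # 0.343
print(round(steps_per_second, 3))    # 0.343
```

With a batch size of 1, every step processes one sample, which is why the two rates coincide.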

Model inference

Create a new file named inference.py:
```
import os

import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer

MODEL_PATH = "/data/nfs/llm/model/chatglm-6b"
CHECKPOINT_PATH = "/home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2/checkpoint-500"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

# Load the base model with a prefix encoder of length 128 (pre_seq_len)
config = AutoConfig.from_pretrained(MODEL_PATH, trust_remote_code=True, pre_seq_len=128)
model = AutoModel.from_pretrained(MODEL_PATH, config=config, trust_remote_code=True).cuda()

# Load the P-Tuning v2 checkpoint and keep only the prefix-encoder weights,
# stripping the "transformer.prefix_encoder." part from each key
prefix_state_dict = torch.load(os.path.join(CHECKPOINT_PATH, "pytorch_model.bin"))
new_prefix_state_dict = {}
for k, v in prefix_state_dict.items():
    if k.startswith("transformer.prefix_encoder."):
        new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)

# Quantize the base model to 4 bits; the prefix encoder is kept in float32
print("Quantized to 4 bit")
model = model.quantize(4)
model = model.half().cuda()
model.transformer.prefix_encoder.float()
model = model.eval()

# First turn with a fixed greeting
print("User: 你好\n")
response, history = model.chat(tokenizer, "你好", history=[])
print("ChatGLM-6B:\n", response)
print("\n------------------------------------------------\nUser:")

# Keep chatting until an empty line is entered
line = input()
while line:
    response, history = model.chat(tokenizer, line, history=history)
    print("ChatGLM-6B:\n", response)
    print("\n------------------------------------------------\nUser:")
    line = input()
```
Run it with:
```
CUDA_VISIBLE_DEVICES=0 python3 inference.py
```
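Conclusion

The sections above used DeepSpeed data parallelism (DP) with ZeRO to fine-tune all parameters of ChatGLM-6B. When GPU resources are too limited for that, P-Tuning v2 can be used for parameter-efficient fine-tuning instead.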
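
To put "parameter-efficient" into numbers, the sketch below counts the trainable prefix parameters for the settings seen in this article (`pre_seq_len=128`, `num_layers=28`, `hidden_size=4096`, `prefix_projection` off). It assumes the prefix encoder is a single embedding table holding one key vector and one value vector per layer for each prefix position, and that the base model has about 6.2 billion parameters. Both are assumptions of this sketch, not values read from the logs, and `prefix_param_count` is a hypothetical helper.

```python
def prefix_param_count(pre_seq_len: int, num_layers: int, hidden_size: int) -> int:
    """Size of an unprojected prefix table: a key and a value per layer per position."""
    return pre_seq_len * num_layers * hidden_size * 2


trainable = prefix_param_count(pre_seq_len=128, num_layers=28, hidden_size=4096)
total = 6.2e9  # assumption: approximate ChatGLM-6B parameter count

print(f"{trainable:,} trainable prefix parameters")          # 29,360,128
print(f"{100 * trainable / total:.2f}% of the base model")   # 0.47%
```

Under these assumptions, only about half a percent of the model's parameters receive gradients and optimizer state, which is where most of the memory saving over full fine-tuning comes from.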

References:
- P-Tuning v2
Original article: https://blog.csdn.net/luoganttcc/article/details/133691494
