• Fine-tuning ChatGLM-6B with DeepSpeed and P-Tuning v2



    I previously tried parameter-efficient fine-tuning of ChatGLM-6B with LoRA. This post shows how to fine-tune ChatGLM-6B with DeepSpeed and with P-Tuning v2. The code is available on GitHub: llm-action

    Introduction to ChatGLM-6B

    For an introduction to ChatGLM-6B, see the earlier post; it is not repeated here.

    Introduction to P-Tuning v2

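    P-Tuning v2 is a parameter-efficient fine-tuning method. Instead of updating the weights of the pretrained model, it freezes them and trains only a small set of continuous prompt vectors, which can cut the number of trainable parameters to around 0.1% of the full model, depending on the prefix length. It builds on P-Tuning v1: where v1 adds trainable prompt embeddings only at the input layer, v2 adds them to every Transformer layer (deep prompt tuning).

    Concretely, P-Tuning v2 prepends a learned prefix of length pre_seq_len to the attention keys and values of each layer. During fine-tuning only these prefix parameters receive gradients, so the optimizer state and the saved checkpoint cover the prefix alone, and the frozen base model can be shared across tasks.

    Overall, the core idea is to adapt a large model to a downstream task at a fraction of the cost of full fine-tuning while keeping performance close to it. This lowers GPU memory use and checkpoint size during training. The base model itself does not become smaller, and inference carries a small overhead from the extra prefix positions.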
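    As a rough illustration of why so few parameters are trained, the sketch below counts the parameters of a per-layer key/value prefix. It assumes the simplest variant (no projection MLP on the prefix). The values 28 layers and hidden size 4096 match ChatGLM-6B's config, while pre_seq_len=128 and the round total of 6.2e9 parameters are assumptions chosen for the example.

```python
def prefix_param_count(pre_seq_len: int, num_layers: int, hidden_size: int) -> int:
    """Trainable parameters of a per-layer key/value prefix (no projection MLP).

    Each of the pre_seq_len prefix positions stores one key vector and one
    value vector (hence the factor 2) of size hidden_size for every layer.
    """
    return pre_seq_len * num_layers * 2 * hidden_size


# Illustrative values: pre_seq_len and the total parameter count are assumptions.
trainable = prefix_param_count(pre_seq_len=128, num_layers=28, hidden_size=4096)
total = 6_200_000_000
print(f"trainable prefix parameters: {trainable:,}")
print(f"share of full model: {trainable / total:.2%}")
```

    The share scales linearly with pre_seq_len, so a shorter prefix trains proportionally fewer parameters.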

    Environment setup

    The base environment is as follows:

    • OS: Ubuntu 18.04
    • CPUs: Intel CPUs on a single node with 1 TB of memory; 64 physical CPUs with 16 cores each
    • GPUs: 8 × A800 80GB
    • Python: 3.10 (upgrade OpenSSL to 1.1.1t first, then build and install Python from source)
    • NVIDIA driver: 515.65.01 (choose the driver that matches your GPU model)
    • CUDA toolkit: 11.7
    • NCCL: nccl_2.14.3-1+cuda11.7
    • cuDNN: 8.8.1.3_cuda11

    Installing the NVIDIA driver, CUDA, Python, and the other tools listed above is not covered in detail here.

    Create and activate the virtual environment chatglm-ptuningv2-venv-py310-cu117:

    cd /home/guodong.li/virtual-venv
    virtualenv -p /usr/bin/python3.10 chatglm-ptuningv2-venv-py310-cu117
    source /home/guodong.li/virtual-venv/chatglm-ptuningv2-venv-py310-cu117/bin/activate
    

    Install PyTorch offline: download the torch and torchvision wheels that match your CUDA version.

    pip install torch-1.13.1+cu117-cp310-cp310-linux_x86_64.whl
    pip install torchvision-0.14.1+cu117-cp310-cp310-linux_x86_64.whl
    

    Install the remaining dependencies.

    pip install -r requirements.txt
    

    The contents of requirements.txt are:

    protobuf
    transformers==4.28.0
    cpm_kernels
    gradio
    mdtex2html
    sentencepiece
    rouge_chinese
    nltk
    jieba
    datasets
    deepspeed
    accelerate
    
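    Note
    The official documentation pins transformers to 4.27.1. When ChatGLM loads the model, it calls get_class_in_module in transformers/dynamic_module_utils.py, which can fail to find the module file when several processes load the model concurrently. Upgrading transformers to 4.28.0 avoids this problem.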
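    A quick way to confirm that the environment picked up the newer release is to compare the installed version against 4.28.0 before launching a multi-GPU job. This is a minimal sketch; the helper names are made up for this example, and the comparison only looks at the leading numeric parts of the version string.

```python
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Turn '4.28.0' into (4, 28, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "4.28.0") -> bool:
    """True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_transformers() -> None:
    """Print whether the installed transformers package is new enough."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        print("transformers is not installed")
        return
    status = "OK" if meets_minimum(installed) else "too old, upgrade to 4.28.0"
    print(f"transformers {installed}: {status}")


check_transformers()
```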

    Data preparation

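    The ADGEN (advertisement generation) dataset is used below to walk through fine-tuning.

    The ADGEN task is to generate an advertising text (summary) from the input (content). Each record has the following format: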

    {
        "content": "类型#上衣*版型#宽松*版型#显瘦*图案#线条*衣样式#衬衫*衣袖型#泡泡袖*衣款式#抽绳",
        "summary": "这件衬衫的款式非常的宽松,利落的线条可以很好的隐藏身材上的小缺点,穿在身上有着很好的显瘦效果。领口装饰了一个可爱的抽绳,漂亮的绳结展现出了十足的个性,配合时尚的泡泡袖型,尽显女性甜美可爱的气息。"
    }
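    The content field packs attributes as key#value pairs joined by "*". The sketch below splits such a string and reads a data file, assuming one JSON object per line (which is what the line counts reported by wc -l suggest). The function names are illustrative and are not part of the ChatGLM-6B code.

```python
import json


def parse_content(content: str) -> list:
    """Split 'key#value*key#value' into a list of (key, value) pairs.

    A list is used instead of a dict because keys such as 版型 can repeat.
    """
    pairs = []
    for item in content.split("*"):
        key, _, value = item.partition("#")
        pairs.append((key, value))
    return pairs


def load_records(path: str) -> list:
    """Read a file with one JSON object per line (assumed file layout)."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


sample_content = "类型#上衣*版型#宽松*版型#显瘦"
for key, value in parse_content(sample_content):
    print(key, value)
```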
    

    Download the ADGEN dataset from the official site or via this link, and extract it into the AdvertiseGen directory.

    tar -zxvf AdvertiseGen.tar.gz
    

    Check the dataset size:

    > wc -l AdvertiseGen/*
    > 1070 AdvertiseGen/dev.json
    > 114599 AdvertiseGen/train.json
    > 115669 total
    

    Full-parameter fine-tuning of ChatGLM-6B with DeepSpeed DP + ZeRO

    We start by using DeepSpeed to fine-tune all parameters of ChatGLM-6B.
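    To see why ZeRO matters for a model of this size, the sketch below applies the usual mixed-precision Adam accounting (2 bytes per parameter for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer states) and assumes ZeRO stage 2, which partitions gradients and optimizer states across the data-parallel GPUs. The parameter count is a round assumed figure, and activations and temporary buffers are ignored, so real usage will be higher.

```python
def zero2_model_state_bytes(num_params: int, num_gpus: int) -> float:
    """Per-GPU model-state memory under ZeRO stage 2 with fp16 + Adam.

    fp16 weights (2 bytes/param) stay replicated on every GPU; fp16 gradients
    (2 bytes/param) and fp32 optimizer states (12 bytes/param: master weights,
    momentum, variance) are partitioned across the data-parallel GPUs.
    """
    replicated = 2 * num_params
    partitioned = (2 + 12) * num_params / num_gpus
    return replicated + partitioned


def plain_dp_model_state_bytes(num_params: int) -> float:
    """Same accounting without ZeRO: everything is replicated (16 bytes/param)."""
    return 16 * num_params


params = 6_200_000_000  # assumed round figure for ChatGLM-6B
for label, value in [
    ("plain data parallel", plain_dp_model_state_bytes(params)),
    ("ZeRO stage 2, 8 GPUs", zero2_model_state_bytes(params, 8)),
]:
    print(f"{label}: {value / 1e9:.1f} GB per GPU")
```

    Under this accounting, plain data parallelism would not fit the model states on a single 80 GB card, while partitioning across 8 GPUs does.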

    First, download the source code and check out the commit below so that the code matches this walkthrough:

    git clone https://github.com/THUDM/ChatGLM-6B.git
    cd ChatGLM-6B
    git checkout 8633db1
    cd ptuning
    

    Modify the ds_train_finetune.sh script to run full-parameter fine-tuning with DeepSpeed:

    LR=1e-4

    MASTER_PORT=$(shuf -n 1 -i 10000-65535)

    deepspeed --num_gpus=8 --master_port $MASTER_PORT main.py \
        --deepspeed deepspeed.json \
        --do_train \
        --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
        --test_file /data/nfs/llm/data/AdvertiseGen/dev.json \
        --prompt_column content \
        --response_column summary \
        --overwrite_cache \
        --model_name_or_path /data/nfs/llm/model/chatglm-6b \
        --output_dir /home/guodong.li/output/adgen-chatglm-6b-ft-$LR \
        --overwrite_output_dir \
        --max_source_length 64 \
        --max_target_length 64 \
        --per_device_train_batch_size 24 \
        --per_device_eval_batch_size 1 \
        --gradient_accumulation_steps 2 \
        --predict_with_generate \
        --num_train_epochs 2 \
        --logging_steps 10 \
        --save_steps 300 \
        --learning_rate $LR \
        --fp16
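    The batch-related flags combine into an effective global batch size. The sketch below works it out for the settings in the script and the 114599 training examples counted earlier, assuming gradient accumulation is applied exactly as passed on the command line. The step count is only approximate, since the trainer's own rounding may differ by a few steps.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int) -> int:
    """Examples consumed per optimizer step across all GPUs."""
    return per_device * grad_accum * num_gpus


def approx_optimizer_steps(num_examples: int, batch: int, epochs: int) -> int:
    """Rough count; the trainer's own rounding may differ by a few steps."""
    return math.ceil(num_examples / batch) * epochs


batch = effective_batch_size(per_device=24, grad_accum=2, num_gpus=8)
steps = approx_optimizer_steps(num_examples=114599, batch=batch, epochs=2)
print(f"effective batch size: {batch}")
print(f"approximate optimizer steps: {steps}")
```

    With --save_steps 300, a run of this length writes only one or two intermediate checkpoints.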

    Run output:

    > sh ds_train_finetune.sh
    [2023-04-14 18:01:33,206] [WARNING] [runner.py:190:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
    [2023-04-14 18:01:33,417] [INFO] [runner.py:540:main] cmd = /home/guodong.li/virtual-venv/chatglm-ptuningv2-venv-py310-cu117/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=44148 --enable_each_rank_log=None main.py --deepspeed deepspeed.json --do_train --train_file /data/nfs/llm/data/AdvertiseGen/train.json --test_file /data/nfs/llm/data/AdvertiseGen/dev.json --prompt_column content --response_column summary --overwrite_cache --model_name_or_path /data/nfs/llm/model/chatglm-6b --output_dir /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4 --overwrite_output_dir --max_source_length 64 --max_target_length 64 --per_device_train_batch_size 24 --per_device_eval_batch_size 1 --gradient_accumulation_steps 2 --predict_with_generate --num_train_epochs 2 --logging_steps 10 --save_steps 300 --learning_rate 1e-4 --fp16
    [2023-04-14 18:01:35,945] [INFO] [launch.py:222:main] 0 NCCL_SOCKET_IFNAME=bond0
    [2023-04-14 18:01:35,945] [INFO] [launch.py:222:main] 0 NCCL_IB_DISABLE=1
    [2023-04-14 18:01:35,945] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
    [2023-04-14 18:01:35,945] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=8, node_rank=0
    [2023-04-14 18:01:35,945] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
    [2023-04-14 18:01:35,945] [INFO] [launch.py:247:main] dist_world_size=8
    [2023-04-14 18:01:35,945] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
    [2023-04-14 18:01:40,133] [INFO] [comm.py:586:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
    04/14/2023 18:01:41 - WARNING - main - Process rank: 2, device: cuda:2, n_gpu: 1distributed training: True, 16-bits training: True

    04/14/2023 18:01:41 - WARNING - main - Process rank: 5, device: cuda:5, n_gpu: 1distributed training: True, 16-bits training: True
    04/14/2023 18:01:41 - INFO - main - Training/evaluation parameters Seq2SeqTrainingArguments(
    _n_gpu=1,
    adafactor=False,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    auto_find_batch_size=False,
    bf16=False,
    bf16_full_eval=False,
    data_seed=None,
    dataloader_drop_last=False,
    dataloader_num_workers=0,
    dataloader_pin_memory=True,
    ddp_bucket_cap_mb=None,
    ddp_find_unused_parameters=None,
    ddp_timeout=1800,
    debug=[],
    deepspeed=deepspeed.json,
    disable_tqdm=False,
    do_eval=False,
    do_predict=False,
    do_train=True,
    eval_accumulation_steps=None,
    eval_delay=0,
    eval_steps=None,
    evaluation_strategy=no,
    fp16=True,
    fp16_backend=auto,
    fp16_full_eval=False,
    fp16_opt_level=O1,
    fsdp=[],
    fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
    fsdp_min_num_params=0,
    fsdp_transformer_layer_cls_to_wrap=None,
    full_determinism=False,
    generation_config=None,
    generation_max_length=None,
    generation_num_beams=None,
    gradient_accumulation_steps=2,
    gradient_checkpointing=False,
    greater_is_better=None,
    group_by_length=False,
    half_precision_backend=auto,
    hub_model_id=None,
    hub_private_repo=False,
    hub_strategy=every_save,
    hub_token=,
    ignore_data_skip=False,
    include_inputs_for_metrics=False,
    jit_mode_eval=False,
    label_names=None,
    label_smoothing_factor=0.0,
    learning_rate=0.0001,
    length_column_name=length,
    load_best_model_at_end=False,
    local_rank=0,
    log_level=passive,
    log_level_replica=warning,
    log_on_each_node=True,
    logging_dir=/home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/runs/Apr14_18-01-40_ai-app-2-46,
    logging_first_step=False,
    logging_nan_inf_filter=True,
    logging_steps=10,
    logging_strategy=steps,
    lr_scheduler_type=linear,
    max_grad_norm=1.0,
    max_steps=-1,
    metric_for_best_model=None,
    mp_parameters=,
    no_cuda=False,
    num_train_epochs=2.0,
    optim=adamw_hf,
    optim_args=None,
    output_dir=/home/guodong.li/output/adgen-chatglm-6b-ft-1e-4,
    overwrite_output_dir=True,
    past_index=-1,
    per_device_eval_batch_size=1,
    per_device_train_batch_size=24,
    predict_with_generate=True,
    prediction_loss_only=False,
    push_to_hub=False,
    push_to_hub_model_id=None,
    push_to_hub_organization=None,
    push_to_hub_token=,
    ray_scope=last,
    remove_unused_columns=True,
    report_to=[],
    resume_from_checkpoint=None,
    run_name=/home/guodong.li/output/adgen-chatglm-6b-ft-1e-4,
    save_on_each_node=False,
    save_safetensors=False,
    save_steps=300,
    save_strategy=steps,
    save_total_limit=None,
    seed=42,
    sharded_ddp=[],
    skip_memory_metrics=True,
    sortish_sampler=False,
    tf32=None,
    torch_compile=False,
    torch_compile_backend=None,
    torch_compile_mode=None,
    torchdynamo=None,
    tpu_metrics_debug=False,
    tpu_num_cores=None,
    use_ipex=False,
    use_legacy_prediction_loop=False,
    use_mps_device=False,
    warmup_ratio=0.0,
    warmup_steps=0,
    weight_decay=0.0,
    xpu_backend=None,
    )
    04/14/2023 18:03:01 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-386448e4f2983a9a/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
    100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 184.03it/s]
    04/14/2023 18:03:01 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-386448e4f2983a9a/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
    [WARNING|configuration_auto.py:925] 2023-04-14 18:03:01,664 >> Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
    04/14/2023 18:03:01 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-386448e4f2983a9a/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
    0%| | 0/2 [00:00> Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
    100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 240.57it/s]
    100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 197.48it/s]
    [INFO|configuration_utils.py:666] 2023-04-14 18:03:01,678 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json
    [WARNING|configuration_auto.py:925] 2023-04-14 18:03:01,678 >> Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
    [WARNING|configuration_auto.py:925] 2023-04-14 18:03:01,679 >> Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
    [INFO|configuration_utils.py:666] 2023-04-14 18:03:01,685 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json
    04/14/2023 18:03:01 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-386448e4f2983a9a/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
    [INFO|configuration_utils.py:720] 2023-04-14 18:03:01,687 >> Model config ChatGLMConfig {
      "_name_or_path": "/data/nfs/llm/model/chatglm-6b",
      "architectures": [
        "ChatGLMModel"
      ],
      "auto_map": {
        "AutoConfig": "configuration_chatglm.ChatGLMConfig",
        "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
        "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"
      },
      "bos_token_id": 130004,
      "eos_token_id": 130005,
      "gmask_token_id": 130001,
      "hidden_size": 4096,
      "inner_hidden_size": 16384,
      "layernorm_epsilon": 1e-05,
      "mask_token_id": 130000,
      "max_sequence_length": 2048,
      "model_type": "chatglm",
      "num_attention_heads": 32,
      "num_layers": 28,
      "pad_token_id": 3,
      "position_encoding_2d": true,
      "pre_seq_len": null,
      "prefix_projection": false,
      "quantization_bit": 0,
      "torch_dtype": "float16",
      "transformers_version": "4.28.0",
      "use_cache": true,
      "vocab_size": 130528
    }

            0%| | 0/2 [00:00> Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
            [WARNING|tokenization_auto.py:675] 2023-04-14 18:03:01,689 >> Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
            [INFO|tokenization_utils_base.py:1807] 2023-04-14 18:03:01,694 >> loading file ice_text.model
            [INFO|tokenization_utils_base.py:1807] 2023-04-14 18:03:01,694 >> loading file added_tokens.json
            [INFO|tokenization_utils_base.py:1807] 2023-04-14 18:03:01,694 >> loading file special_tokens_map.json
            [INFO|tokenization_utils_base.py:1807] 2023-04-14 18:03:01,694 >> loading file tokenizer_config.json
            100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 285.37it/s]
            [INFO|modeling_utils.py:2531] 2023-04-14 18:03:01,992 >> loading weights file /data/nfs/llm/model/chatglm-6b/pytorch_model.bin.index.json
            [INFO|configuration_utils.py:575] 2023-04-14 18:03:01,993 >> Generate config GenerationConfig {
              "_from_model_config": true,
              "bos_token_id": 130004,
              "eos_token_id": 130005,
              "pad_token_id": 3,
              "transformers_version": "4.28.0"
            }

            Loading checkpoint shards: 0%| | 0/8 [00:00> Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
            [WARNING|auto_factory.py:456] 2023-04-14 18:03:02,109 >> Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
            Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:13<00:00, 1.70s/it]
            [INFO|modeling_utils.py:3190] 2023-04-14 18:03:15,622 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.

            [INFO|modeling_utils.py:3198] 2023-04-14 18:03:15,622 >> All the weights of ChatGLMForConditionalGeneration were initialized from the model checkpoint at /data/nfs/llm/model/chatglm-6b.
            If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMForConditionalGeneration for predictions without further training.
            Loading checkpoint shards: 25%|████████████████████████████████████ | 2/8 [00:13<00:40, 6.73s/it][INFO|modeling_utils.py:2839] 2023-04-14 18:03:15,703 >> Generation config file not found, using a generation config created from the model config.

            Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:34<00:00, 4.32s/it]
            input_ids [5, 65421, 61, 67329, 32, 98339, 61, 72043, 32, 65347, 61, 70872, 32, 69768, 61, 68944, 32, 67329, 64103, 61, 96914, 130001, 130004, 5, 87052, 96914, 81471, 64562, 65759, 64493, 64988, 6, 65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65544, 6, 71964, 70533, 64417, 63862, 89978, 63991, 63823, 77284, 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 63893, 130005, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
            inputs 类型#裤版型#宽松风格#性感图案#线条裤型#阔腿裤 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自然不拘束,面料亲肤舒适贴身体验感棒棒哒。系带部分增加设计看点,还

            label_ids [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 130004, 5, 87052, 96914, 81471, 64562, 65759, 64493, 64988, 6, 65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65544, 6, 71964, 70533, 64417, 63862, 89978, 63991, 63823, 77284, 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 63893, 130005, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100]
            labels 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自然不拘束,面料亲肤舒适贴身体验感棒棒哒。系带部分增加设计看点,还
            [2023-04-14 18:06:30,469] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
            [2023-04-14 18:06:30,470] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer
            [2023-04-14 18:06:30,470] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
            [2023-04-14 18:06:30,483] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW
            [2023-04-14 18:06:30,484] [INFO] [utils.py:51:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=
            [2023-04-14 18:06:30,484] [WARNING] [engine.py:1118:_do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution *****
            [2023-04-14 18:06:30,484] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer
            [2023-04-14 18:06:30,484] [INFO] [stage_1_and_2.py:133:init] Reduce bucket size 500000000
            [2023-04-14 18:06:30,484] [INFO] [stage_1_and_2.py:134:init] Allgather bucket size 500000000
            [2023-04-14 18:06:30,484] [INFO] [stage_1_and_2.py:135:init] CPU Offload: False
            [2023-04-14 18:06:30,484] [INFO] [stage_1_and_2.py:136:init] Round robin gradient partitioning: False
            Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
            Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
            Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
            Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
            Emitting ninja build file /home/guodong.li/.cache/torch_extensions/py310_cu117/utils/build.ninja…
            Building extension module utils…
            Allowing ninja to set a default number of workers… (overridable by setting the environment variable MAX_JOBS=N)
            Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
            Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
            ninja: no work to do.
            Loading extension module utils…
            Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
            Time to load utils op: 0.10171675682067871 seconds
            Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
            Emitting ninja build file /home/guodong.li/.cache/torch_extensions/py310_cu117/utils/build.ninja…
            Building extension module utils…
            Allowing ninja to set a default number of workers… (overridable by setting the environment variable MAX_JOBS=N)
            ninja: no work to do.
            Loading extension module utils…
            Time to load utils op: 0.18768668174743652 seconds

            Loading extension module utils…
            Time to load utils op: 0.3021426200866699 seconds
            Rank: 2 partition count [8, 8] and sizes[(771473408, False), (187392, False)]

            Rank: 4 partition count [8, 8] and sizes[(771473408, False), (187392, False)]
            Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
            No modifications detected for re-loaded extension module utils, skipping build step…
            Loading extension module utils…
            Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
            Time to load utils op: 0.0005774497985839844 seconds

            No modifications detected for re-loaded extension module utils, skipping build step…
            Loading extension module utils…
            Time to load utils op: 0.0011382102966308594 seconds
            [2023-04-14 18:06:48,321] [INFO] [utils.py:785:see_memory_usage] Before initializing optimizer states
            [2023-04-14 18:06:48,321] [INFO] [utils.py:786:see_memory_usage] MA 14.37 GB Max_MA 14.37 GB CA 14.39 GB Max_CA 14 GB
            [2023-04-14 18:06:48,322] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 50.56 GB, percent = 5.0%
            04/14/2023 18:06:48 - WARNING - transformers_modules.chatglm-6b.modeling_chatglm - use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False

            04/14/2023 18:06:48 - WARNING - transformers_modules.chatglm-6b.modeling_chatglm - use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False
            [2023-04-14 18:06:48,431] [INFO] [utils.py:785:see_memory_usage] After initializing optimizer states
            [2023-04-14 18:06:48,434] [INFO] [utils.py:786:see_memory_usage] MA 20.12 GB Max_MA 25.87 GB CA 25.9 GB Max_CA 26 GB
            [2023-04-14 18:06:48,435] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 50.84 GB, percent = 5.0%
            [2023-04-14 18:06:48,435] [INFO] [stage_1_and_2.py:489:init] optimizer state initialized
            [2023-04-14 18:06:48,512] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer
            [2023-04-14 18:06:48,513] [INFO] [utils.py:786:see_memory_usage] MA 20.12 GB Max_MA 20.12 GB CA 25.9 GB Max_CA 26 GB
            [2023-04-14 18:06:48,513] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 51.29 GB, percent = 5.1%
            [2023-04-14 18:06:48,515] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = AdamW
            [2023-04-14 18:06:48,515] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
            [2023-04-14 18:06:48,515] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler =
            [2023-04-14 18:06:48,515] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0001, 0.0001], mom=[(0.9, 0.999), (0.9, 0.999)]
            [2023-04-14 18:06:48,515] [INFO] [config.py:953:print] DeepSpeedEngine configuration:
            [2023-04-14 18:06:48,516] [INFO] [config.py:957:print] activation_checkpointing_config {
                "partition_activations": false,
                "contiguous_memory_optimization": false,
                "cpu_checkpointing": false,
                "number_checkpoints": null,
                "synchronize_checkpoint_boundary": false,
                "profile": false
            }
            [2023-04-14 18:06:48,516] [INFO] [config.py:957:print] aio_config … {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
            [2023-04-14 18:06:48,516] [INFO] [config.py:957:print] amp_enabled … False
            [2023-04-14 18:06:48,516] [INFO] [config.py:957:print] amp_params … False
            [2023-04-14 18:06:48,516] [INFO] [config.py:957:print] autotuning_config … {
                "enabled": false,
                "start_step": null,
                "end_step": null,
                "metric_path": null,
                "arg_mappings": null,
                "metric": "throughput",
                "model_info": null,
                "results_dir": "autotuning_results",
                "exps_dir": "autotuning_exps",
                "overwrite": true,
                "fast": true,
                "start_profile_step": 3,
                "end_profile_step": 5,
                "tuner_type": "gridsearch",
                "tuner_early_stopping": 5,
                "tuner_num_trials": 50,
                "model_info_path": null,
                "mp_size": 1,
                "max_train_batch_size": null,
                "min_train_batch_size": 1,
                "max_train_micro_batch_size_per_gpu": 1.024000e+03,
                "min_train_micro_batch_size_per_gpu": 1,
                "num_tuning_micro_batch_sizes": 3
            }
            [2023-04-14 18:06:48,516] [INFO] [config.py:957:print] bfloat16_enabled … False
            [2023-04-14 18:06:48,516] [INFO] [config.py:957:print] checkpoint_parallel_write_pipeline False
            [2023-04-14 18:06:48,516] [INFO] [config.py:957:print] checkpoint_tag_validation_enabled True
            [2023-04-14 18:06:48,516] [INFO] [config.py:957:print] checkpoint_tag_validation_fail False
            [2023-04-14 18:06:48,516] [INFO] [config.py:957:print] comms_config …
            [2023-04-14 18:06:48,516] [INFO] [config.py:957:print] communication_data_type … None
            [2023-04-14 18:06:48,516] [INFO] [config.py:957:print] compression_config … {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
            [2023-04-14 18:06:48,516] [INFO] [config.py:957:print] curriculum_enabled_legacy … False
            [2023-04-14 18:06:48,516] [INFO] [config.py:957:print] curriculum_params_legacy … False
            [2023-04-14 18:06:48,516] [INFO] [config.py:957:print] data_efficiency_config … {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
            [2023-04-14 18:06:48,516] [INFO] [config.py:957:print] data_efficiency_enabled … False
            [2023-04-14 18:06:48,516] [INFO] [config.py:957:print] dataloader_drop_last … False
            [2023-04-14 18:06:48,516] [INFO] [config.py:957:print] disable_allgather … False
            [2023-04-14 18:06:48,516] [INFO] [config.py:957:print] dump_state … False
            [2023-04-14 18:06:48,516] [INFO] [config.py:957:print] dynamic_loss_scale_args … {'init_scale': 65536, 'scale_window': 1000, 'delayed_shift': 2, 'min_scale': 1}
            [2023-04-14 18:06:48,516] [INFO] [config.py:957:print] eigenvalue_enabled … False
            [2023-04-14 18:06:48,516] [INFO] [config.py:957:print] eigenvalue_gas_boundary_resolution 1
            [2023-04-14 18:06:48,516] [INFO] [config.py:957:print] eigenvalue_layer_name … bert.encoder.layer
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] eigenvalue_layer_num … 0
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] eigenvalue_max_iter … 100
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] eigenvalue_stability … 1e-06
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] eigenvalue_tol … 0.01
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] eigenvalue_verbose … False
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] elasticity_enabled … False
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] flops_profiler_config … {
                "enabled": false,
                "profile_step": 1,
                "module_depth": -1,
                "top_modules": 1,
                "detailed": true,
                "output_file": null
            }
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] fp16_auto_cast … False
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] fp16_enabled … True
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] fp16_master_weights_and_gradients False
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] global_rank … 0
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] grad_accum_dtype … None
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] gradient_accumulation_steps … 1
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] gradient_clipping … 0.0
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] gradient_predivide_factor … 1.0
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] hybrid_engine … enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] initial_dynamic_scale … 65536
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] load_universal_checkpoint … False
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] loss_scale … 0
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] memory_breakdown … False
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] monitor_config … tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] nebula_config … {
                "enabled": false,
                "persistent_storage_path": null,
                "persistent_time_interval": 100,
                "num_of_version_in_retention": 2,
                "enable_nebula_load": true,
                "load_path": null
            }
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] optimizer_legacy_fusion … False
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] optimizer_name … None
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] optimizer_params … None
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] pipeline … {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] pld_enabled … False
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] pld_params … False
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] prescale_gradients … False
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] scheduler_name … None
            [2023-04-14 18:06:48,517] [INFO] [config.py:957:print] scheduler_params … None
            [2023-04-14 18:06:48,518] [INFO] [config.py:957:print] sparse_attention … None
            [2023-04-14 18:06:48,518] [INFO] [config.py:957:print] sparse_gradients_enabled … False
            [2023-04-14 18:06:48,518] [INFO] [config.py:957:print] steps_per_print … 10
            [2023-04-14 18:06:48,518] [INFO] [config.py:957:print] train_batch_size … 192
            [2023-04-14 18:06:48,518] [INFO] [config.py:957:print] train_micro_batch_size_per_gpu 24
            [2023-04-14 18:06:48,518] [INFO] [config.py:957:print] use_node_local_storage … False
            [2023-04-14 18:06:48,518] [INFO] [config.py:957:print] wall_clock_breakdown … False
            [2023-04-14 18:06:48,518] [INFO] [config.py:957:print] world_size … 8
            [2023-04-14 18:06:48,518] [INFO] [config.py:957:print] zero_allow_untested_optimizer True
            [2023-04-14 18:06:48,518] [INFO] [config.py:957:print] zero_config … stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False memory_efficient_linear=True
            [2023-04-14 18:06:48,518] [INFO] [config.py:957:print] zero_enabled … True
            [2023-04-14 18:06:48,518] [INFO] [config.py:957:print] zero_force_ds_cpu_optimizer … True
            [2023-04-14 18:06:48,518] [INFO] [config.py:957:print] zero_optimization_stage … 2
            [2023-04-14 18:06:48,518] [INFO] [config.py:943:print_user_config] json = {
                "train_micro_batch_size_per_gpu": 24,
                "zero_allow_untested_optimizer": true,
                "fp16": {
                    "enabled": true,
                    "loss_scale": 0,
                    "initial_scale_power": 16,
                    "loss_scale_window": 1000,
                    "hysteresis": 2,
                    "min_loss_scale": 1
                },
                "zero_optimization": {
                    "stage": 2,
                    "allgather_partitions": true,
                    "allgather_bucket_size": 5.000000e+08,
                    "overlap_comm": false,
                    "reduce_scatter": true,
                    "reduce_bucket_size": 5.000000e+08,
                    "contiguous_gradients": true
                }
            }
            Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
            No modifications detected for re-loaded extension module utils, skipping build step…
            Loading extension module utils…
            Time to load utils op: 0.00031948089599609375 seconds
            0%| | 0/596 [00:00
            use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False
            [2023-04-14 18:06:53,718] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1
            [2023-04-14 18:06:55,883] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768
            0%|▎ | 1/596 [00:07<1:13:02, 7.37s/it][2023-04-14 18:06:57,948] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384
            [2023-04-14 18:07:00,007] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, reducing to 8192
            0%|▌ | 2/596 [00:11<54:01, 5.46s/it][2023-04-14 18:07:06,332] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192, reducing to 4096
            1%|▊ | 3/596 [00:17<57:51, 5.85s/it][2023-04-14 18:07:08,383] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, reducing to 2048
            1%|█▏ | 4/596 [00:24<59:20, 6.01s/it][2023-04-14 18:07:18,876] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048, reducing to 1024
            [2023-04-14 18:07:18,876] [INFO] [logging.py:96:log_dist] [Rank 0] step=10, skipped=7, lr=[9.949664429530202e-05, 9.949664429530202e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
            [2023-04-14 18:07:18,877] [INFO] [timer.py:199:stop] epoch=0/micro_step=10/global_step=10, RunningAvgSamplesPerSec=66.98818896434254, CurrSamplesPerSec=93.79590019766518, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
            1%|█▍ | 5/596 [00:30<1:00:11, 6.11s/it]

            [2023-04-14 18:47:55,207] [INFO] [logging.py:96:log_dist] [Rank 0] step=590, skipped=12, lr=[3.02013422818792e-06, 3.02013422818792e-06], mom=[(0.9, 0.999), (0.9, 0.999)]
            [2023-04-14 18:47:57,392] [INFO] [timer.py:199:stop] epoch=0/micro_step=590/global_step=590, RunningAvgSamplesPerSec=45.931193758598916, CurrSamplesPerSec=45.63412532914195, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
            50%|███████████████████████████████████████████████████████████████████████████████████▊ | 299/596 [41:42<41:37, 8.41s/it][2023-04-14 18:48:37,273] [INFO] [logging.py:96:log_dist] [Rank 0] step=600, skipped=12, lr=[1.3422818791946309e-06, 1.3422818791946309e-06], mom=[(0.9, 0.999), (0.9, 0.999)]
            [2023-04-14 18:48:39,453] [INFO] [timer.py:199:stop] epoch=0/micro_step=600/global_step=600, RunningAvgSamplesPerSec=45.92850276413307, CurrSamplesPerSec=45.66031263997641, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
            {'loss': 13.3487, 'learning_rate': 1.3422818791946309e-06, 'epoch': 1.01}
            50%|████████████████████████████████████████████████████████████████████████████████████ | 300/596 [41:50<41:30, 8.41s/it]Saving the whole model
            [INFO|configuration_utils.py:457] 2023-04-14 18:48:39,458 >> Configuration saved in /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/config.json
            [INFO|configuration_utils.py:362] 2023-04-14 18:48:39,459 >> Configuration saved in /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/generation_config.json
            [INFO|modeling_utils.py:1855] 2023-04-14 18:49:03,951 >> The model is bigger than the maximum size per checkpoint (10GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/pytorch_model.bin.index.json.
            [INFO|tokenization_utils_base.py:2171] 2023-04-14 18:49:03,953 >> tokenizer config file saved in /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/tokenizer_config.json
            [INFO|tokenization_utils_base.py:2178] 2023-04-14 18:49:03,953 >> Special tokens file saved in /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/special_tokens_map.json
            [2023-04-14 18:49:03,983] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step600 is about to be saved!
            [2023-04-14 18:49:03,988] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/mp_rank_00_model_states.pt
            [2023-04-14 18:49:03,988] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/mp_rank_00_model_states.pt…
            [2023-04-14 18:49:15,934] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/mp_rank_00_model_states.pt.
            [2023-04-14 18:49:15,937] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/zero_pp_rank_0_mp_rank_00_optim_states.pt…
            [2023-04-14 18:49:28,049] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/zero_pp_rank_0_mp_rank_00_optim_states.pt.
            [2023-04-14 18:49:28,049] [INFO] [engine.py:3125:_save_zero_checkpoint] zero checkpoint saved /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/zero_pp_rank_0_mp_rank_00_optim_states.pt
            [2023-04-14 18:49:28,049] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step600 is ready now!
            51%|████████████████████████████████████████████████████████████████████████████████████▏ | 304/596 [43:14<1:05:51, 13.53s/it][2023-04-14 18:50:09,137] [INFO] [logging.py:96:log_dist] [Rank 0] step=610, skipped=12, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
            [2023-04-14 18:50:11,316] [INFO] [timer.py:199:stop] epoch=0/micro_step=610/global_step=610, RunningAvgSamplesPerSec=45.926876625767875, CurrSamplesPerSec=45.66709917655267, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
            52%|██████████████████████████████████████████████████████████████████████████████████████▌ | 309/596 [43:56<44:16, 9.26s/it][2023-04-14 18:50:51,114] [INFO] [logging.py:96:log_dist] [Rank 0] step=620, skipped=12, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
            [2023-04-14 18:50:53,302] [INFO] [timer.py:199:stop] epoch=0/micro_step=620/global_step=620, RunningAvgSamplesPerSec=45.92462533252217, CurrSamplesPerSec=45.55552426651123, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
            {'loss': 13.3202, 'learning_rate': 0.0, 'epoch': 1.04}

            99%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 589/596 [1:23:07<00:58, 8.41s/it][2023-04-14 19:30:02,654] [INFO] [logging.py:96:log_dist] [Rank 0] step=1180, skipped=12, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
            [2023-04-14 19:30:04,820] [INFO] [timer.py:199:stop] epoch=0/micro_step=1180/global_step=1180, RunningAvgSamplesPerSec=45.85904109663022, CurrSamplesPerSec=45.73521852038509, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
            {'loss': 13.3537, 'learning_rate': 0.0, 'epoch': 1.98}
            100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍| 594/596 [1:23:49<00:16, 8.41s/it][2023-04-14 19:30:44,847] [INFO] [logging.py:96:log_dist] [Rank 0] step=1190, skipped=12, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
            [2023-04-14 19:30:47,022] [INFO] [timer.py:199:stop] epoch=0/micro_step=1190/global_step=1190, RunningAvgSamplesPerSec=45.856487437478386, CurrSamplesPerSec=45.579988341622055, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
            {'train_runtime': 5046.8863, 'train_samples_per_second': 45.414, 'train_steps_per_second': 0.118, 'train_loss': 13.905431555421561, 'epoch': 2.0}
            100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 596/596 [1:24:06<00:00, 8.47s/it]
            ***** train metrics *****
            epoch = 2.0
            train_loss = 13.9054
            train_runtime = 1:24:06.88
            train_samples = 114599
            train_samples_per_second = 45.414
            train_steps_per_second = 0.118
            [2023-04-14 19:30:58,560] [INFO] [launch.py:460:main] Process 35198 exits successfully.
            [2023-04-14 19:30:58,561] [INFO] [launch.py:460:main] Process 35192 exits successfully.
            [2023-04-14 19:30:58,561] [INFO] [launch.py:460:main] Process 35193 exits successfully.
            [2023-04-14 19:30:58,561] [INFO] [launch.py:460:main] Process 35195 exits successfully.
            [2023-04-14 19:30:58,561] [INFO] [launch.py:460:main] Process 35191 exits successfully.
            [2023-04-14 19:30:59,562] [INFO] [launch.py:460:main] Process 35194 exits successfully.
            [2023-04-14 19:30:59,563] [INFO] [launch.py:460:main] Process 35197 exits successfully.
            [2023-04-14 19:31:00,564] [INFO] [launch.py:460:main] Process 35196 exits successfully.
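
            The OVERFLOW lines near the start of the log above come from dynamic fp16 loss scaling. Below is a minimal sketch of that behaviour, not DeepSpeed's actual implementation: the class name ToyLossScaler and its methods are hypothetical, and only the values initial_scale_power=16, hysteresis=2, loss_scale_window=1000 and min_loss_scale=1 are taken from the config printed in the log.

```python
class ToyLossScaler:
    """Simplified model of dynamic loss scaling (hypothetical, for illustration only)."""

    def __init__(self, initial_scale_power=16, hysteresis=2,
                 scale_window=1000, min_scale=1, scale_factor=2):
        self.scale = 2 ** initial_scale_power   # 2**16 = 65536
        self.hysteresis = hysteresis            # overflows tolerated before halving
        self.scale_window = scale_window        # clean steps needed before growing
        self.min_scale = min_scale
        self.scale_factor = scale_factor
        self.good_steps = 0                     # consecutive steps without overflow
        self.skipped = 0                        # optimizer steps skipped so far

    def update(self, overflow):
        """Record one step and return the loss scale to use for the next one."""
        if overflow:
            self.skipped += 1
            self.good_steps = 0
            if self.hysteresis > 1:
                # First overflow: only use up the hysteresis budget.
                self.hysteresis -= 1
            else:
                # Later overflows: halve the scale, but never go below min_scale.
                self.scale = max(self.scale // self.scale_factor, self.min_scale)
        else:
            self.good_steps += 1
            if self.good_steps % self.scale_window == 0:
                self.scale *= self.scale_factor
        return self.scale


scaler = ToyLossScaler()
# Seven overflowing steps in a row, as at the beginning of the log.
history = [scaler.update(overflow=True) for _ in range(7)]
print(history, "skipped =", scaler.skipped)
```

            With seven consecutive overflows, the sketch keeps the scale at 65536 once and then halves it six times down to 1024, which matches the sequence of "Attempted loss scale" messages and the skipped=7 counter reported at step 10.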

            GPU memory usage:

            Fri Apr 14 18:27:45 2023
            ±----------------------------------------------------------------------------+
            | NVIDIA-SMI 515.105.01 Driver Version: 515.105.01 CUDA Version: 11.7 |
            |-------------------------------±---------------------±---------------------+
            | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
            | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
            | | | MIG M. |
            |=++==============|
            | 0 NVIDIA A800 80G… Off | 00000000:34:00.0 Off | 0 |
            | N/A 59C P0 92W / 300W | 36539MiB / 81920MiB | 100% Default |
            | | | Disabled |
            ±------------------------------±---------------------±---------------------+
            | 1 NVIDIA A800 80G… Off | 00000000:35:00.0 Off | 0 |
            | N/A 61C P0 96W / 300W | 38395MiB / 81920MiB | 100% Default |
            | | | Disabled |
            ±------------------------------±---------------------±---------------------+
            | 2 NVIDIA A800 80G… Off | 00000000:36:00.0 Off | 0 |
            | N/A 63C P0 93W / 300W | 38395MiB / 81920MiB | 100% Default |
            | | | Disabled |
            ±------------------------------±---------------------±---------------------+
            | 3 NVIDIA A800 80G… Off | 00000000:37:00.0 Off | 0 |
            | N/A 65C P0 102W / 300W | 38347MiB / 81920MiB | 100% Default |
            | | | Disabled |
            ±------------------------------±---------------------±---------------------+
            | 4 NVIDIA A800 80G… Off | 00000000:9B:00.0 Off | 0 |
            | N/A 64C P0 108W / 300W | 38347MiB / 81920MiB | 100% Default |
            | | | Disabled |
            ±------------------------------±---------------------±---------------------+
            | 5 NVIDIA A800 80G… Off | 00000000:9C:00.0 Off | 0 |
            | N/A 64C P0 105W / 300W | 38395MiB / 81920MiB | 100% Default |
            | | | Disabled |
            ±------------------------------±---------------------±---------------------+
            | 6 NVIDIA A800 80G… Off | 00000000:9D:00.0 Off | 0 |
            | N/A 58C P0 97W / 300W | 36433MiB / 81920MiB | 100% Default |
            | | | Disabled |
            ±------------------------------±---------------------±---------------------+
            | 7 NVIDIA A800 80G… Off | 00000000:9E:00.0 Off | 0 |
            | N/A 59C P0 92W / 300W | 38347MiB / 81920MiB | 100% Default |
            | | | Disabled |
            ±------------------------------±---------------------±---------------------+

            ±----------------------------------------------------------------------------+
            | Processes: |
            | GPU GI CI PID Type Process name GPU Memory |
            | ID ID Usage |
            |=============================================================================|
            | 0 N/A N/A 35191 C …nv-py310-cu117/bin/python 36537MiB |
            | 1 N/A N/A 35192 C …nv-py310-cu117/bin/python 38393MiB |
            | 2 N/A N/A 35193 C …nv-py310-cu117/bin/python 38393MiB |
            | 3 N/A N/A 35194 C …nv-py310-cu117/bin/python 38345MiB |
            | 4 N/A N/A 35195 C …nv-py310-cu117/bin/python 38345MiB |
            | 5 N/A N/A 35196 C …nv-py310-cu117/bin/python 38393MiB |
            | 6 N/A N/A 35197 C …nv-py310-cu117/bin/python 36431MiB |
            | 7 N/A N/A 35198 C …nv-py310-cu117/bin/python 38345MiB |
            ±----------------------------------------------------------------------------+

            Output files:

             tree /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4
            /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4
            ├── all_results.json
            ├── checkpoint-300
            │   ├── config.json
            │   ├── configuration_chatglm.py
            │   ├── generation_config.json
            │   ├── global_step600
            │   │   ├── mp_rank_00_model_states.pt
            │   │   ├── zero_pp_rank_0_mp_rank_00_optim_states.pt
            │   │   ├── zero_pp_rank_1_mp_rank_00_optim_states.pt
            │   │   ├── zero_pp_rank_2_mp_rank_00_optim_states.pt
            │   │   ├── zero_pp_rank_3_mp_rank_00_optim_states.pt
            │   │   ├── zero_pp_rank_4_mp_rank_00_optim_states.pt
            │   │   ├── zero_pp_rank_5_mp_rank_00_optim_states.pt
            │   │   ├── zero_pp_rank_6_mp_rank_00_optim_states.pt
            │   │   └── zero_pp_rank_7_mp_rank_00_optim_states.pt
            │   ├── ice_text.model
            │   ├── latest
            │   ├── modeling_chatglm.py
            │   ├── pytorch_model-00001-of-00002.bin
            │   ├── pytorch_model-00002-of-00002.bin
            │   ├── pytorch_model.bin.index.json
            │   ├── quantization.py
            │   ├── rng_state_0.pth
            │   ├── rng_state_1.pth
            │   ├── rng_state_2.pth
            │   ├── rng_state_3.pth
            │   ├── rng_state_4.pth
            │   ├── rng_state_5.pth
            │   ├── rng_state_6.pth
            │   ├── rng_state_7.pth
            │   ├── special_tokens_map.json
            │   ├── tokenization_chatglm.py
            │   ├── tokenizer_config.json
            │   ├── trainer_state.json
            │   ├── training_args.bin
            │   └── zero_to_fp32.py
            ├── trainer_state.json
            └── train_results.json

            2 directories, 36 files

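            No final model weights are saved when training ends; only the checkpoints written during training are kept. To save the final model, add a call to trainer.save_model() in the training code.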
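
            A minimal sketch of where such a call could go, assuming the training script holds a Hugging Face Trainer (or Seq2SeqTrainer) instance named trainer. The helper name save_final_model is hypothetical and is not part of the original script; save_model() and save_state() are existing Trainer methods.

```python
def save_final_model(trainer, output_dir=None):
    """Persist the final model once trainer.train() has returned.

    `trainer` is assumed to be a transformers Trainer / Seq2SeqTrainer.
    With output_dir=None the Trainer falls back to its configured output_dir.
    """
    # Write the model so it can be reloaded later with from_pretrained().
    trainer.save_model(output_dir)
    # Write trainer_state.json next to it (log history, global step, ...).
    trainer.save_state()


# Intended use inside the training script (not executed here):
#     trainer.train()
#     save_final_model(trainer)
```

            Keeping the two calls in one helper makes it harder to forget the final save when the script is edited for other runs.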

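            Full fine-tuning with DeepSpeed needs a lot of GPU memory and trains slowly. The next section therefore tries the P-Tuning v2 approach provided in the official repository for parameter-efficient fine-tuning.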

            Parameter-efficient fine-tuning of ChatGLM-6B with P-Tuning v2

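            This section fine-tunes ChatGLM-6B with P-Tuning v2. P-Tuning v2 reduces the number of parameters that need to be tuned to 0.1% of the original; combined with model quantization, gradient checkpointing, and similar techniques, it can run with as little as 7 GB of GPU memory.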
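
            To get a feel for why quantization matters for the memory footprint, here is a back-of-the-envelope sketch. It only counts the bytes needed to store the weights and assumes a nominal 6.2 billion parameters, all stored at the given bit width; both are simplifying assumptions, and activations, the prefix parameters, optimizer state, and any layers left unquantized are ignored, so real usage will be higher.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Memory needed just to hold the weights, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


NUM_PARAMS = 6.2e9  # assumption: nominal parameter count of ChatGLM-6B

for bits in (16, 8, 4):
    gib = weight_memory_gib(NUM_PARAMS, bits)
    print(f"{bits:>2}-bit weights: {gib:.1f} GiB")
```

            The estimate is a lower bound on the weights alone, which is why the observed usage during training sits above the 4-bit figure.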

            First, edit the train.sh script, mainly the train_file, validation_file, model_name_or_path, and output_dir parameters:

            PRE_SEQ_LEN=128
            LR=2e-2

            CUDA_VISIBLE_DEVICES=0 python3 main.py \
                --do_train \
                --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
                --validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
                --prompt_column content \
                --response_column summary \
                --overwrite_cache \
                --model_name_or_path /data/nfs/llm/model/chatglm-6b \
                --output_dir /home/guodong.li/output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR \
                --overwrite_output_dir \
                --max_source_length 64 \
                --max_target_length 64 \
                --per_device_train_batch_size 1 \
                --per_device_eval_batch_size 1 \
                --gradient_accumulation_steps 16 \
                --predict_with_generate \
                --max_steps 3000 \
                --logging_steps 10 \
                --save_steps 1000 \
                --learning_rate $LR \
                --pre_seq_len $PRE_SEQ_LEN \
                --quantization_bit 4
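
            The batch-related flags in this script interact, so a quick calculation helps to see how much data the run actually covers. The sketch below is plain arithmetic; the function name is made up for illustration, and the sample count 114599 is the train_samples value reported in the DeepSpeed run log earlier in this article.

```python
def samples_per_optimizer_step(per_device_batch, grad_accum_steps, num_devices=1):
    """Number of training samples consumed by one optimizer update."""
    return per_device_batch * grad_accum_steps * num_devices


# Values from train.sh: batch size 1, 16 accumulation steps, one visible GPU.
per_step = samples_per_optimizer_step(1, 16, 1)
total_samples = per_step * 3000          # --max_steps 3000
TRAIN_SAMPLES = 114599                   # train_samples from the earlier run log

print(per_step, total_samples, round(total_samples / TRAIN_SAMPLES, 2))
```

            With these settings each update sees 16 samples, so 3000 steps cover 48000 samples, a bit under half an epoch of the training set.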

            Run output:

              0%|                  | 0/3000 [00:00
            {'loss': 4.2962, 'learning_rate': 0.0196, 'epoch': 0.01}
            {'loss': 4.3112, 'learning_rate': 0.019533333333333333, 'epoch': 0.01}
            2%|███▊ | 70/3000 [03:20<2:17:06, 2.81s/it]
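
            The learning_rate values in this log are consistent with a linear decay from LR=2e-2 to zero over max_steps=3000 with no warmup. The sketch below reproduces that schedule by hand; the function name linear_lr is hypothetical, and it assumes the two log entries were written at steps 60 and 70 (logging_steps is 10 and the progress bar stands at 70).

```python
def linear_lr(step, base_lr=2e-2, max_steps=3000, warmup_steps=0):
    """Learning rate after `step` optimizer updates under linear warmup/decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at max_steps.
    remaining = max(0, max_steps - step)
    return base_lr * remaining / max(1, max_steps - warmup_steps)


for step in (60, 70):
    print(step, linear_lr(step))
```

            Up to floating-point rounding, these come out to 0.0196 and about 0.0195333, the two values shown in the log.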

            GPU memory usage:

            |-------------------------------+----------------------+----------------------+
            | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
            | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
            |                               |                      |               MIG M. |
            |===============================+======================+======================|
            |   0  NVIDIA A800 80G…    Off  | 00000000:34:00.0 Off |                    0 |
            | N/A   71C    P0   300W / 300W |   6291MiB / 81920MiB |     74%      Default |
            |                               |                      |             Disabled |
            +-------------------------------+----------------------+----------------------+

            GPU memory usage is indeed low, but even with P-Tuning v2 for parameter-efficient fine-tuning, training is still slow.

            Next, edit train.sh to increase the batch size and try again.

            PRE_SEQ_LEN=128
            LR=2e-2

            CUDA_VISIBLE_DEVICES=0 python3 main.py \
                --do_train \
                --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
                --validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
                --prompt_column content \
                --response_column summary \
                --overwrite_cache \
                --model_name_or_path /data/nfs/llm/model/chatglm-6b \
                --output_dir /home/guodong.li/output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR \
                --overwrite_output_dir \
                --max_source_length 64 \
                --max_target_length 64 \
                --per_device_train_batch_size 128 \
                --per_device_eval_batch_size 8 \
                --gradient_accumulation_steps 16 \
                --predict_with_generate \
                --num_train_epochs 1 \
                --logging_steps 10 \
                --save_steps 100 \
                --learning_rate $LR \
                --pre_seq_len $PRE_SEQ_LEN \
                --quantization_bit 4

            Run output:

            sh train.sh
            04/14/2023 19:46:38 - WARNING - main - Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: Fals
            04/14/2023 19:46:38 - INFO - main - Training/evaluation parameters Seq2SeqTrainingArguments(
            _n_gpu=1,
            adafactor=False,
            adam_beta1=0.9,
            adam_beta2=0.999,
            adam_epsilon=1e-08,
            auto_find_batch_size=False,
            bf16=False,
            bf16_full_eval=False,
            data_seed=None,
            dataloader_drop_last=False,
            dataloader_num_workers=0,
            dataloader_pin_memory=True,
            ddp_bucket_cap_mb=None,
            ddp_find_unused_parameters=None,
            ddp_timeout=1800,
            debug=[],
            deepspeed=None,
            disable_tqdm=False,
            do_eval=False,
            do_predict=False,
            do_train=True,
            eval_accumulation_steps=None,
            eval_delay=0,
            eval_steps=None,
            evaluation_strategy=no,
            fp16=False,
            fp16_backend=auto,
            fp16_full_eval=False,
            fp16_opt_level=O1,
            fsdp=[],
            fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
            fsdp_min_num_params=0,
            fsdp_transformer_layer_cls_to_wrap=None,
            full_determinism=False,
            generation_config=None,
            generation_max_length=None,
            generation_num_beams=None,
            gradient_accumulation_steps=16,
            gradient_checkpointing=False,
            greater_is_better=None,
            group_by_length=False,
            half_precision_backend=auto,
            hub_model_id=None,
            hub_private_repo=False,
            hub_strategy=every_save,
            hub_token=,
            ignore_data_skip=False,
            include_inputs_for_metrics=False,
            jit_mode_eval=False,
            label_names=None,
            label_smoothing_factor=0.0,
            learning_rate=0.02,
            length_column_name=length,
            load_best_model_at_end=False,
            local_rank=-1,
            log_level=passive,
            log_level_replica=warning,
            log_on_each_node=True,
            logging_dir=/home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2/runs/Apr14_19-46-38_ai-app-2-46,
            logging_first_step=False,
            logging_nan_inf_filter=True,
            logging_steps=10,
            logging_strategy=steps,
            lr_scheduler_type=linear,
            max_grad_norm=1.0,
            max_steps=-1,
            metric_for_best_model=None,
            mp_parameters=,
            no_cuda=False,
            num_train_epochs=1.0,
            optim=adamw_hf,
            optim_args=None,
            output_dir=/home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2,
            overwrite_output_dir=True,
            past_index=-1,
            per_device_eval_batch_size=8,
            per_device_train_batch_size=128,
            predict_with_generate=True,
            prediction_loss_only=False,
            push_to_hub=False,
            push_to_hub_model_id=None,
            push_to_hub_organization=None,
            push_to_hub_token=,
            ray_scope=last,
            remove_unused_columns=True,
            report_to=[],
            resume_from_checkpoint=None,
            run_name=/home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2,
            save_on_each_node=False,
            save_safetensors=False,
            save_steps=100,
            save_strategy=steps,
            save_total_limit=None,
            seed=42,
            sharded_ddp=[],
            skip_memory_metrics=True,
            sortish_sampler=False,
            tf32=None,
            torch_compile=False,
            torch_compile_backend=None,
            torch_compile_mode=None,
            torchdynamo=None,
            tpu_metrics_debug=False,
            tpu_num_cores=None,
            use_ipex=False,
            use_legacy_prediction_loop=False,
            use_mps_device=False,
            warmup_ratio=0.0,
            warmup_steps=0,
            weight_decay=0.0,
            xpu_backend=None,
            )
            04/14/2023 19:47:58 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-1cf934bed8e233e6e)
            100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████
            [INFO|configuration_utils.py:666] 2023-04-14 19:47:58,671 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json
            [WARNING|configuration_auto.py:925] 2023-04-14 19:47:58,671 >> Explicitly passing a revision is encouraged when loading a configuratio a newer revision.
            [INFO|configuration_utils.py:666] 2023-04-14 19:47:58,679 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json
            [INFO|configuration_utils.py:720] 2023-04-14 19:47:58,681 >> Model config ChatGLMConfig {
              "_name_or_path": "/data/nfs/llm/model/chatglm-6b",
              "architectures": [
                "ChatGLMModel"
              ],
              "auto_map": {
                "AutoConfig": "configuration_chatglm.ChatGLMConfig",
                "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
                "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"
              },
              "bos_token_id": 130004,
              "eos_token_id": 130005,
              "gmask_token_id": 130001,
              "hidden_size": 4096,
              "inner_hidden_size": 16384,
              "layernorm_epsilon": 1e-05,
              "mask_token_id": 130000,
              "max_sequence_length": 2048,
              "model_type": "chatglm",
              "num_attention_heads": 32,
              "num_layers": 28,
              "pad_token_id": 3,
              "position_encoding_2d": true,
              "pre_seq_len": null,
              "prefix_projection": false,
              "quantization_bit": 0,
              "torch_dtype": "float16",
              "transformers_version": "4.28.0",
              "use_cache": true,
              "vocab_size": 130528
            }

              [WARNING|tokenization_auto.py:675] 2023-04-14 19:47:58,683 >> Explicitly passing a revision is encouraged when loading a model with curevision.
              [INFO|tokenization_utils_base.py:1807] 2023-04-14 19:47:58,692 >> loading file ice_text.model
              [INFO|tokenization_utils_base.py:1807] 2023-04-14 19:47:58,692 >> loading file added_tokens.json
              [INFO|tokenization_utils_base.py:1807] 2023-04-14 19:47:58,692 >> loading file special_tokens_map.json
              [INFO|tokenization_utils_base.py:1807] 2023-04-14 19:47:58,692 >> loading file tokenizer_config.json
              [WARNING|auto_factory.py:456] 2023-04-14 19:47:59,089 >> Explicitly passing a revision is encouraged when loading a model with custom ion.
              [INFO|modeling_utils.py:2531] 2023-04-14 19:47:59,115 >> loading weights file /data/nfs/llm/model/chatglm-6b/pytorch_model.bin.index.jso
              [INFO|configuration_utils.py:575] 2023-04-14 19:47:59,117 >> Generate config GenerationConfig {
                "_from_model_config": true,
                "bos_token_id": 130004,
                "eos_token_id": 130005,
                "pad_token_id": 3,
                "transformers_version": "4.28.0"
              }

              Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████
              [INFO|modeling_utils.py:3190] 2023-04-14 19:48:08,508 >> All model checkpoint weights were used when initializing ChatGLMForConditionalG

              [WARNING|modeling_utils.py:3192] 2023-04-14 19:48:08,508 >> Some weights of ChatGLMForConditionalGeneration were not initialized from thtialized: ['transformer.prefix_encoder.embedding.weight']
              You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
              [INFO|modeling_utils.py:2839] 2023-04-14 19:48:08,548 >> Generation config file not found, using a generation config created from the mo
              Quantized to 4 bit
              input_ids [5, 65421, 61, 67329, 32, 98339, 61, 72043, 32, 65347, 61, 70872, 32, 69768, 61, 68944, 32, 67329, 64103, 61, 96914, 130001, 15388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65564219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 6 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
              inputs 类型#裤版型#宽松风格#性感图案#线条裤型#阔腿裤 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长适贴身体验感棒棒哒。系带部分增加设计看点,还
              label_ids [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 741-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100
              labels 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自
              /home/guodong.li/virtual-venv/chatglm-ptuningv2-venv-py310-cu117/lib/python3.10/site-packages/transformers/optimization.py:391: FutureWain a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warn
              warnings.warn(
              0%| 04/14/2023 19:51:19 - WARNING - transformers_modules.chatglm-6b.modeling_chatglm - use_cache=True is incompatible with gradient checkp
              {'loss': 6.0246, 'learning_rate': 0.016428571428571428, 'epoch': 0.18}
              {'loss': 7.8721, 'learning_rate': 0.012857142857142859, 'epoch': 0.36}
              {'loss': 8.2653, 'learning_rate': 0.009285714285714286, 'epoch': 0.54}
              {'loss': 8.6636, 'learning_rate': 0.005714285714285714, 'epoch': 0.71}
              {'loss': 8.5985, 'learning_rate': 0.002142857142857143, 'epoch': 0.89}
              {'train_runtime': 4868.4062, 'train_samples_per_second': 23.539, 'train_steps_per_second': 0.012, 'train_loss': 7.956800188337054, 'epoc
              100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████
              ***** train metrics *****
              epoch = 1.0
              train_loss = 7.9568
              train_runtime = 1:21:08.40
              train_samples = 114599
              train_samples_per_second = 23.539
              train_steps_per_second = 0.012
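The learning rates and step counts in this log are consistent with a linear decay schedule and no warmup. The sketch below reproduces them under three assumptions that this excerpt does not show directly: the run used a single GPU, `gradient_accumulation_steps` was 16 (the value that appears in the DeepSpeed command later in this article), and the scheduler was the Trainer's default linear one. The helper name `linear_lr` is made up for illustration.

```python
import math


def linear_lr(step: int, total_steps: int, base_lr: float) -> float:
    """Linear decay from base_lr to zero, with no warmup (warmup_steps=0 in the log)."""
    return base_lr * max(0, total_steps - step) / total_steps


train_samples = 114599    # "train_samples" in the metrics above
per_device_batch = 128    # per_device_train_batch_size in the argument dump
grad_accum = 16           # assumption: not visible in this part of the log
base_lr = 2e-2            # assumption: taken from the "2e-2" suffix of output_dir

# Batches per epoch on one GPU, then optimizer updates per epoch.
batches_per_epoch = math.ceil(train_samples / per_device_batch)
total_steps = batches_per_epoch // grad_accum

print(total_steps)  # 56
for step in range(10, total_steps, 10):  # logging_steps is assumed to be 10
    print(step, linear_lr(step, total_steps, base_lr))
```

With 56 optimizer steps in about 4868 seconds, the reported `train_steps_per_second` of 0.012 also follows.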

              GPU memory usage:

              Sun Apr 16 19:53:00 2023
              +-----------------------------------------------------------------------------+
              | NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
              |-------------------------------+----------------------+----------------------+
              | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
              | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
              |                               |                      |               MIG M. |
              |===============================+======================+======================|
              |   0  NVIDIA A800 80G...  Off  | 00000000:34:00.0 Off |                    0 |
              | N/A   71C    P0   281W / 300W |  63275MiB / 81920MiB |     92%      Default |
              |                               |                      |             Disabled |
              +-------------------------------+----------------------+----------------------+

              +-----------------------------------------------------------------------------+
              | Processes:                                                                  |
              |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
              |        ID   ID                                                   Usage      |
              |=============================================================================|
              |    0   N/A  N/A     20126      C   python3                         63273MiB |
              +-----------------------------------------------------------------------------+

              Output files:

              > ls -al  /home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2
              total 12
              drwxrwxr-x 2 guodong.li guodong.li 98 Apr 14 21:12 .
              drwxrwxr-x 8 guodong.li guodong.li 177 Apr 14 17:12 ..
              -rw-rw-r-- 1 guodong.li guodong.li 195 Apr 14 21:12 all_results.json
              -rw-rw-r-- 1 guodong.li guodong.li 1185 Apr 14 21:12 trainer_state.json
              -rw-rw-r-- 1 guodong.li guodong.li 195 Apr 14 21:12 train_results.json

              As you can see, adjusting batch_size raised both GPU memory usage and GPU utilization.

              To use DeepSpeed for data parallelism, refer to the following command:

              PRE_SEQ_LEN=128
              LR=2e-2

              deepspeed --include localhost:1,2,3 --master_port 29001 main.py \
                  --deepspeed deepspeed.json \
                  --do_train \
                  --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
                  --validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
                  --prompt_column content \
                  --response_column summary \
                  --overwrite_cache \
                  --model_name_or_path /data/nfs/llm/model/chatglm-6b \
                  --output_dir /home/guodong.li/output/adgen-chatglm-6b-pt \
                  --overwrite_output_dir \
                  --max_source_length 64 \
                  --max_target_length 64 \
                  --per_device_train_batch_size 128 \
                  --per_device_eval_batch_size 8 \
                  --gradient_accumulation_steps 16 \
                  --predict_with_generate \
                  --num_train_epochs 10 \
                  --logging_steps 10 \
                  --save_steps 100 \
                  --learning_rate $LR \
                  --pre_seq_len $PRE_SEQ_LEN
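To get a feel for what this command implies, here is a rough sketch of the global batch size and the number of optimizer updates. It assumes the three GPUs selected by `--include localhost:1,2,3` act as plain data-parallel replicas, that the training set is the same 114599 samples reported in the earlier log, and that update counting follows the usual "batches per epoch divided by accumulation steps" rule. The function name `total_updates` is hypothetical.

```python
import math


def total_updates(num_samples: int, num_gpus: int, per_device_batch: int,
                  grad_accum: int, epochs: int) -> int:
    """Approximate number of optimizer updates for a data-parallel run."""
    # Each replica sees roughly an equal share of the dataset per epoch.
    samples_per_rank = math.ceil(num_samples / num_gpus)
    batches_per_epoch = math.ceil(samples_per_rank / per_device_batch)
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * epochs


num_gpus = 3            # localhost:1,2,3
per_device_batch = 128  # --per_device_train_batch_size
grad_accum = 16         # --gradient_accumulation_steps

global_batch = num_gpus * per_device_batch * grad_accum
updates = total_updates(114599, num_gpus, per_device_batch, grad_accum, epochs=10)

print(global_batch)  # 6144 samples per optimizer update
print(updates)       # 180 updates over 10 epochs
```

Under these assumptions, a run of 180 updates with `--save_steps 100` would cross a checkpoint boundary only once, at step 100, so it may be worth lowering `save_steps` if more checkpoints are wanted.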

              Model evaluation

              Edit evaluate.sh and set model_name_or_path (the model path), ptuning_checkpoint (the path to the weights produced by P-Tuning v2 fine-tuning), and the other parameters:

              PRE_SEQ_LEN=128
              CHECKPOINT=adgen-chatglm-6b-pt-128-2e-2
              STEP=3000

              CUDA_VISIBLE_DEVICES=1 python3 main.py \
                  --do_predict \
                  --validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
                  --test_file /data/nfs/llm/data/AdvertiseGen/dev.json \
                  --overwrite_cache \
                  --prompt_column content \
                  --response_column summary \
                  --model_name_or_path /data/nfs/llm/model/chatglm-6b \
                  --ptuning_checkpoint /home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2/checkpoint-500 \
                  --output_dir /home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2/checkpoint-500 \
                  --overwrite_output_dir \
                  --max_source_length 64 \
                  --max_target_length 64 \
                  --per_device_eval_batch_size 1 \
                  --predict_with_generate \
                  --pre_seq_len $PRE_SEQ_LEN \
                  --quantization_bit 4

              Run log:

              sh evaluate.sh
              04/16/2023 20:18:01 - WARNING - __main__ - Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False
              04/16/2023 20:18:01 - INFO - __main__ - Training/evaluation parameters Seq2SeqTrainingArguments(
              _n_gpu=1,
              adafactor=False,
              adam_beta1=0.9,
              adam_beta2=0.999,
              adam_epsilon=1e-08,
              auto_find_batch_size=False,

              fp16=False,
              fp16_backend=auto,
              fp16_full_eval=False,
              fp16_opt_level=O1,
              fsdp=[],
              fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
              fsdp_min_num_params=0,
              fsdp_transformer_layer_cls_to_wrap=None,
              full_determinism=False,
              generation_config=None,

              warmup_ratio=0.0,
              warmup_steps=0,
              weight_decay=0.0,
              xpu_backend=None,
              )
              Downloading and preparing dataset json/default to /home/guodong.li/.cache/huggingface/datasets/json/default-df42438b5ccb0b44/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e…
              Downloading data files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 3419.73it/s]
              Extracting data files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 196.48it/s]
              Dataset json downloaded and prepared to /home/guodong.li/.cache/huggingface/datasets/json/default-df42438b5ccb0b44/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e. Subsequent calls will reuse this data.
              100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 326.85it/s]
              [INFO|configuration_utils.py:666] 2023-04-16 20:19:21,784 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json
              [WARNING|configuration_auto.py:925] 2023-04-16 20:19:21,785 >> Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
              [INFO|configuration_utils.py:666] 2023-04-16 20:19:21,792 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json
              [INFO|configuration_utils.py:720] 2023-04-16 20:19:21,795 >> Model config ChatGLMConfig {
                "_name_or_path": "/data/nfs/llm/model/chatglm-6b",
                "architectures": [
                  "ChatGLMModel"
                ],
                "auto_map": {
                  "AutoConfig": "configuration_chatglm.ChatGLMConfig",
                  "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
                  "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"
                },
                "bos_token_id": 130004,
                "eos_token_id": 130005,
                "gmask_token_id": 130001,
                "hidden_size": 4096,
                "inner_hidden_size": 16384,
                "layernorm_epsilon": 1e-05,
                "mask_token_id": 130000,
                "max_sequence_length": 2048,
                "model_type": "chatglm",
                "num_attention_heads": 32,
                "num_layers": 28,
                "pad_token_id": 3,
                "position_encoding_2d": true,
                "pre_seq_len": null,
                "prefix_projection": false,
                "quantization_bit": 0,
                "torch_dtype": "float16",
                "transformers_version": "4.28.0",
                "use_cache": true,
                "vocab_size": 130528
              }

                [WARNING|tokenization_auto.py:675] 2023-04-16 20:19:21,797 >> Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
                [INFO|tokenization_utils_base.py:1807] 2023-04-16 20:19:21,805 >> loading file ice_text.model
                [INFO|tokenization_utils_base.py:1807] 2023-04-16 20:19:21,805 >> loading file added_tokens.json
                [INFO|tokenization_utils_base.py:1807] 2023-04-16 20:19:21,805 >> loading file special_tokens_map.json
                [INFO|tokenization_utils_base.py:1807] 2023-04-16 20:19:21,805 >> loading file tokenizer_config.json
                [WARNING|auto_factory.py:456] 2023-04-16 20:19:22,186 >> Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
                [INFO|modeling_utils.py:2531] 2023-04-16 20:19:22,222 >> loading weights file /data/nfs/llm/model/chatglm-6b/pytorch_model.bin.index.json
                [INFO|configuration_utils.py:575] 2023-04-16 20:19:22,224 >> Generate config GenerationConfig {
                  "_from_model_config": true,
                  "bos_token_id": 130004,
                  "eos_token_id": 130005,
                  "pad_token_id": 3,
                  "transformers_version": "4.28.0"
                }

                Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:08<00:00, 1.04s/it]
                [INFO|modeling_utils.py:3190] 2023-04-16 20:19:30,912 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.

                [WARNING|modeling_utils.py:3192] 2023-04-16 20:19:30,912 >> Some weights of ChatGLMForConditionalGeneration were not initialized from the model checkpoint at /data/nfs/llm/model/chatglm-6b and are newly initialized: ['transformer.prefix_encoder.embedding.weight']
                You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
                [INFO|modeling_utils.py:2839] 2023-04-16 20:19:30,967 >> Generation config file not found, using a generation config created from the model config.
                Quantized to 4 bit
                input_ids [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 5, 65421, 61, 75898, 32, 68554, 61, 77257, 64555, 32, 65107, 61, 66268, 32, 65347, 61, 71689, 32, 69768, 61, 85428, 32, 65173, 73942, 61, 70984, 32, 65173, 70936, 61, 64703, 65509, 130001, 130004]
                inputs 类型#上衣材质#牛仔布颜色#白色风格#简约图案#刺绣衣样式#外套衣款式#破洞
                label_ids [5, 71689, 66561, 67061, 77257, 70984, 6, 72194, 65173, 64290, 64622, 81549, 63823, 65173, 64290, 83343, 63832, 63912, 65209, 64703, 65509, 64051, 6, 69418, 78598, 87019, 6, 64257, 71319, 66069, 74197, 63823, 65173, 72265, 64880, 64131, 63832, 73416, 85428, 66261, 6, 65594, 87834, 6, 73412, 105145, 65388, 63823, 130001, 130004]
                labels 简约而不简单的牛仔外套,白色的衣身十分百搭。衣身多处有做旧破洞设计,打破单调乏味,增加一丝造型看点。衣身后背处有趣味刺绣装饰,丰富层次感,彰显别样时尚。
                04/16/2023 20:21:30 - INFO - __main__ - *** Predict ***
                [INFO|configuration_utils.py:575] 2023-04-16 20:21:30,090 >> Generate config GenerationConfig {
                  "_from_model_config": true,
                  "bos_token_id": 130004,
                  "eos_token_id": 130005,
                  "pad_token_id": 3,
                  "transformers_version": "4.28.0"
                }

                0%| | 0/1070 [00:00> Generate config GenerationConfig {
                  "_from_model_config": true,
                  "bos_token_id": 130004,
                  "eos_token_id": 130005,
                  "pad_token_id": 3,
                  "transformers_version": "4.28.0"
                }

                0%|▎ | 2/1070 [00:02<25:39, 1.44s/it][INFO|configuration_utils.py:575] 2023-04-16 20:21:37,311 >> Generate config GenerationConfig {
                  "_from_model_config": true,
                  "bos_token_id": 130004,
                  "eos_token_id": 130005,
                  "pad_token_id": 3,
                  "transformers_version": "4.28.0"
                }

                0%|▍ | 3/1070

                1%|█▎ | 8/1070 [00:20<50:13, 2.84s/it][INFO|configuration_utils.py:575] 2023-04-16 20:21:55,233 >> Generate config GenerationConfig {
                  "_from_model_config": true,
                  "bos_token_id": 130004,
                  "eos_token_id": 130005,
                  "pad_token_id": 3,
                  "transformers_version": "4.28.0"
                }

                1%|█▍ | 9/1070 [00:23<50:24, 2.85s/it][INFO|configuration_utils.py:575] 2023-04-16 20:21:58,112 >> Generate config GenerationConfig {
                  "_from_model_config": true,
                  "bos_token_id": 130004,
                  "eos_token_id": 130005,
                  "pad_token_id": 3,
                  "transformers_version": "4.28.0"
                }

                1%|█▌ | 10/1070 [00:26<50:30, 2.86s/it][INFO|configuration_utils.py:575] 2023-04-16 20:22:00,990 >> Generate config GenerationConfig {
                  "_from_model_config": true,
                  "bos_token_id": 130004,
                  "eos_token_id": 130005,
                  "pad_token_id": 3,
                  "transformers_version": "4.28.0"
                }

                1%|█▋ | 11/1070 [00:29<50:37, 2.87s/it][INFO|configuration_utils.py:575] 2023-04-16 20:22:03,880 >> Generate config GenerationConfig {
                  "_from_model_config": true,
                  "bos_token_id": 130004,
                  "eos_token_id": 130005,
                  "pad_token_id": 3,
                  "transformers_version": "4.28.0"
                }

                1%|█▊ | 12/1070 [00:32<50:38, 2.87s/it][INFO|configuration_utils.py:575] 2023-04-16 20:22:06,761 >> Generate config GenerationConfig {
                  "_from_model_config": true,
                  "bos_token_id": 130004,
                  "eos_token_id": 130005,
                  "pad_token_id": 3,
                  "transformers_version": "4.28.0"
                }

                [INFO|configuration_utils.py:575] 2023-04-16 21:13:16,240 >> Generate config GenerationConfig {
                  "_from_model_config": true,
                  "bos_token_id": 130004,
                  "eos_token_id": 130005,
                  "pad_token_id": 3,
                  "transformers_version": "4.28.0"
                }

                100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊| 1069/1070 [51:44<00:02, 2.92s/it][INFO|configuration_utils.py:575] 2023-04-16 21:13:19,107 >> Generate config GenerationConfig {
                  "_from_model_config": true,
                  "bos_token_id": 130004,
                  "eos_token_id": 130005,
                  "pad_token_id": 3,
                  "transformers_version": "4.28.0"
                }

                100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1070/1070 [51:47<00:00, 2.90s/it]Building prefix dict from the default dictionary …
                04/16/2023 21:13:22 - DEBUG - jieba - Building prefix dict from the default dictionary …
                Dumping model to file cache /tmp/jieba.cache
                04/16/2023 21:13:22 - DEBUG - jieba - Dumping model to file cache /tmp/jieba.cache
                Loading model cost 0.634 seconds.
                04/16/2023 21:13:22 - DEBUG - jieba - Loading model cost 0.634 seconds.
                Prefix dict has been built successfully.
                04/16/2023 21:13:22 - DEBUG - jieba - Prefix dict has been built successfully.
                100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1070/1070 [51:53<00:00, 2.91s/it]
                ***** predict metrics *****
                predict_bleu-4 = 0.7846
                predict_rouge-1 = 8.8941
                predict_rouge-2 = 1.3703
                predict_rouge-l = 16.4982
                predict_runtime = 0:51:57.77
                predict_samples = 1070
                predict_samples_per_second = 0.343
                predict_steps_per_second = 0.343
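For readers unfamiliar with the ROUGE-L number above, here is a simplified, pure-Python sketch of a sentence-level ROUGE-L F1 score based on the longest common subsequence. It is not the evaluation script's own metric code, which is not shown in this excerpt and presumably relies on the jieba, rouge_chinese and nltk packages listed in requirements.txt. The sketch uses character-level tokens as a stand-in for word segmentation and assumes scores are reported on a 0-100 scale. The function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(hypothesis, reference):
    """Sentence-level ROUGE-L F1 (equal weight on precision and recall)."""
    if not hypothesis or not reference:
        return 0.0
    lcs = lcs_length(hypothesis, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(hypothesis)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


# Toy example with character-level tokens (an assumption; the real script
# would segment Chinese text into words first).
hyp = list("白色牛仔外套")
ref = list("简约的白色牛仔外套")
print(round(rouge_l_f1(hyp, ref) * 100, 4))  # 80.0
```

Here the whole hypothesis appears in order inside the reference, so precision is 1, recall is 6/9, and the F1 score is 0.8.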

                Model inference

                Create a new file, inference.py:

                import os
                import torch
                from transformers import AutoConfig, AutoModel, AutoTokenizer

                MODEL_PATH = "/data/nfs/llm/model/chatglm-6b"
                CHECKPOINT_PATH = "/home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2/checkpoint-500"

                # Load the tokenizer
                tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

                # Load the base model with the prefix encoder enabled
                # (pre_seq_len should match the value used for training)
                config = AutoConfig.from_pretrained(MODEL_PATH, trust_remote_code=True, pre_seq_len=128)
                model = AutoModel.from_pretrained(MODEL_PATH, config=config, trust_remote_code=True).cuda()

                # Load the P-Tuning v2 checkpoint and keep only the prefix-encoder weights
                prefix_state_dict = torch.load(os.path.join(CHECKPOINT_PATH, "pytorch_model.bin"))
                new_prefix_state_dict = {}

                for k, v in prefix_state_dict.items():
                    if k.startswith("transformer.prefix_encoder."):
                        new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
                model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)

                # Quantize the backbone to 4 bit; keep the prefix encoder in float32
                print("Quantized to 4 bit")
                model = model.quantize(4)
                model = model.half().cuda()
                model.transformer.prefix_encoder.float()
                model = model.eval()

                print("用户:你好\n")
                response, history = model.chat(tokenizer, "你好", history=[])
                print("ChatGLM-6B:\n", response)
                print("\n------------------------------------------------\n用户:")

                # Simple chat loop; an empty line ends the session
                line = input()
                while line:
                    response, history = model.chat(tokenizer, line, history=history)
                    print("ChatGLM-6B:\n", response)
                    print("\n------------------------------------------------\n用户:")
                    line = input()
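The key-renaming step in inference.py is easy to get wrong, so here it is in isolation. This is a minimal sketch on a toy dictionary rather than a real checkpoint: the key `transformer.prefix_encoder.embedding.weight` appears in the logs above, while `lm_head.weight`, the string values, and the helper name `strip_prefix` are made up for illustration.

```python
def strip_prefix(state_dict, prefix):
    """Keep only entries whose key starts with `prefix`, and drop that prefix."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Toy stand-in for the checkpoint; in the real script the values are tensors.
checkpoint = {
    "transformer.prefix_encoder.embedding.weight": "prefix-weights",
    "lm_head.weight": "unrelated",  # hypothetical extra key
}

prefix_only = strip_prefix(checkpoint, "transformer.prefix_encoder.")
print(prefix_only)  # {'embedding.weight': 'prefix-weights'}
```

The renamed keys are relative to the prefix encoder module, which is why the script calls `load_state_dict` on `model.transformer.prefix_encoder` rather than on the whole model.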

                Run command:

                CUDA_VISIBLE_DEVICES=0 python3 inference.py

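                Conclusion

                This article used DeepSpeed DP + ZeRO for full-parameter fine-tuning of ChatGLM-6B. When GPU resources are limited, P-Tuning v2 can be used instead for parameter-efficient fine-tuning.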

              • Original article: https://blog.csdn.net/luoganttcc/article/details/133691494