ChatGLM2-6B Fine-Tuning in Practice


    Environment Setup

    Request an Alibaba Cloud GPU server

    • CentOS 7.6 64-bit
    • Anaconda3-2023.07-1-Linux-x86_64
    • Python 3.11.5
    • GPU: NVIDIA A10 (24 GB VRAM, 1 card)
    • CPU: 8 vCores / 30 GB RAM


    Installation and Deployment

    1. Install Anaconda
    wget https://repo.anaconda.com/archive/Anaconda3-2023.07-1-Linux-x86_64.sh
    sh Anaconda3-2023.07-1-Linux-x86_64.sh
    

    Follow the interactive prompts to complete the installation.

    2. Install CUDA
    wget https://developer.download.nvidia.com/compute/cuda/11.2.0/local_installers/cuda_11.2.0_460.27.04_linux.run
    sh cuda_11.2.0_460.27.04_linux.run
    

    Follow the prompts to complete the installation.
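
    To sanity-check the result, confirm that the driver and toolkit are visible (a minimal sketch, assuming the runfile installed the toolkit to the default /usr/local/cuda prefix):

    # Check that the NVIDIA driver is loaded and the A10 is visible
    nvidia-smi
    # Check the CUDA compiler version (default install prefix assumed)
    /usr/local/cuda/bin/nvcc --version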

    3. Install PyTorch
    conda install pytorch torchvision pytorch-cuda=11.8 -c pytorch -c nvidia
    

    If the conda command is not found, you need to configure the Anaconda environment variables, as sketched below.
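
    A minimal sketch, assuming Anaconda was installed to the default ~/anaconda3 prefix:

    # Put conda (and the base environment's python) on PATH
    echo 'export PATH="$HOME/anaconda3/bin:$PATH"' >> ~/.bashrc
    source ~/.bashrc

    You can then verify that PyTorch sees the GPU (the pytorch-cuda=11.8 conda package bundles its own CUDA runtime, so it does not need to match the system toolkit version exactly):

    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"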

    4. Install ChatGLM2-6B
    mkdir ChatGLM
    cd ChatGLM
    git clone https://github.com/THUDM/ChatGLM2-6B.git
    cd ChatGLM2-6B
    pip install -r requirements.txt
    

    Loading the model requires downloading its 7 shard files, which total a bit over 10 GB; you can download them in advance.

    Model download page: https://huggingface.co/THUDM/chatglm2-6b/tree/main
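
    One way to fetch the weights in advance is Git LFS. A sketch, assuming git-lfs is available from your package manager; the target directory here is chosen to match the --model_name_or_path used in train.sh below, so adjust both to your setup:

    # Git LFS is required to pull the large shard files
    yum install -y git-lfs && git lfs install
    # Clone the weights into the path that train.sh will reference
    git clone https://huggingface.co/THUDM/chatglm2-6b /root/ChatGLM/ChatGLM2-6B-main/zhbr/chatglm2-6b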

    Fine-Tuning in Practice

    1. Prepare the dataset

    Prepare your own dataset: generate a training file and a validation file, and place them under the directory ChatGLM2-6B/ptuning/myDataset/.

    Training set: train.json
    Validation set: dev.json
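
    Both files use one JSON object per line, with field names matching the --prompt_column and --response_column arguments passed to main.py below (content and summary). A minimal sketch of train.json with made-up placeholder samples (dev.json uses the same format):

    {"content": "Who are you?", "summary": "I am an assistant fine-tuned from ChatGLM2-6B."}
    {"content": "What can you do?", "summary": "I can answer domain-specific questions from my training data."}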

    2. Install Python dependencies

    The fine-tuning run that follows depends on a few extra Python modules; install them in advance:

    pip install rouge_chinese nltk jieba datasets
    
    3. Fine-tune and train a new model

    Edit the train.sh script to suit your environment. The modified configuration is:

    PRE_SEQ_LEN=128
    LR=2e-2
    NUM_GPUS=1
     
    torchrun --standalone --nnodes=1 --nproc-per-node=$NUM_GPUS main.py \
        --do_train \
        --train_file myDataset/train.json \
        --validation_file myDataset/dev.json \
        --preprocessing_num_workers 6 \
        --prompt_column content \
        --response_column summary \
        --overwrite_cache \
        --model_name_or_path /root/ChatGLM/ChatGLM2-6B-main/zhbr/chatglm2-6b \
        --output_dir output/zhbr-chatglm2-6b-checkpoint \
        --overwrite_output_dir \
        --max_source_length 64 \
        --max_target_length 128 \
        --per_device_train_batch_size 6 \
        --per_device_eval_batch_size 6 \
        --gradient_accumulation_steps 16 \
        --predict_with_generate \
        --max_steps 20 \
        --logging_steps 5 \
        --save_steps 5 \
        --learning_rate $LR \
        --pre_seq_len $PRE_SEQ_LEN \
        --quantization_bit 4
    

    Note that the effective batch size is per_device_train_batch_size × gradient_accumulation_steps = 6 × 16 = 96, and --quantization_bit 4 quantizes the frozen base model to 4 bits so that training fits in the A10's 24 GB of memory. Once edited, start the fine-tuning:

    cd /root/ChatGLM/ChatGLM2-6B/ptuning/
    sh train.sh
    
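
    Optionally, you can watch GPU memory usage from a second shell while training runs:

    # Refresh nvidia-smi every 2 seconds
    watch -n 2 nvidia-smi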

    The output of the run is as follows:

    (base) [root@iZbp178u8rw9n9ko94ubbyZ ptuning]# sh train.sh 
    [2023-10-08 13:09:12,312] torch.distributed.run: [WARNING] master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
    10/08/2023 13:09:15 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: False
    10/08/2023 13:09:15 - INFO - __main__ - Training/evaluation parameters Seq2SeqTrainingArguments(
    _n_gpu=1,
    adafactor=False,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    auto_find_batch_size=False,
    bf16=False,
    bf16_full_eval=False,
    data_seed=None,
    dataloader_drop_last=False,
    dataloader_num_workers=0,
    dataloader_pin_memory=True,
    ddp_backend=None,
    ddp_broadcast_buffers=None,
    ddp_bucket_cap_mb=None,
    ddp_find_unused_parameters=None,
    ddp_timeout=1800,
    debug=[],
    deepspeed=None,
    disable_tqdm=False,
    dispatch_batches=None,
    do_eval=False,
    do_predict=False,
    do_train=True,
    eval_accumulation_steps=None,
    eval_delay=0,
    eval_steps=None,
    evaluation_strategy=IntervalStrategy.NO,
    fp16=False,
    fp16_backend=auto,
    fp16_full_eval=False,
    fp16_opt_level=O1,
    fsdp=[],
    fsdp_config={
       'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
    fsdp_min_num_params=0,
    fsdp_transformer_layer_cls_to_wrap=None,
    full_determinism=False,
    generation_config=None,
    generation_max_length=None,
    generation_num_beams=None,
    gradient_accumulation_steps=16,
    gradient_checkpointing=False,
    greater_is_better=None,
    group_by_length=False,
    half_precision_backend=auto,
    hub_always_push=False,
    hub_model_id=None,
    hub_private_repo=False,
    hub_strategy=HubStrategy.EVERY_SAVE,
    hub_token=<HUB_TOKEN>,
    ignore_data_skip=False,
    include_inputs_for_metrics=False,
    jit_mode_eval=False,
    label_names=None,
    label_smoothing_factor=0.0,
    learning_rate=0.02,
    length_column_name=length,
    load_best_model_at_end=False,
    local_rank=0,
    log_level=passive,
    log_level_replica=warning,
    log_on_each_node=True,
    logging_dir=output/zhbr-chatglm2-6b-checkpoint/runs/Oct08_13-09-15_iZbp178u8rw9n9ko94ubbyZ,
    logging_first_step=False,
    logging_nan_inf_filter=True,
    logging_steps=5,
    logging_strategy=IntervalStrategy.STEPS,
    lr_scheduler_type=SchedulerType.LINEAR,
    max_grad_norm=1.0,
    max_steps=20,
    metric_for_best_model=None,
    mp_parameters=,
    no_cuda=False,
    num_train_epochs=3.0,
    optim=OptimizerNames.ADAMW_TORCH,
    optim_args=None,
    output_dir=output/zhbr-chatglm2-6b-checkpoint,
    overwrite_output_dir=True,
    past_index=-1,
    per_device_eval_batch_size=6,
    per_device_train_batch_size=6,
    predict_with_generate=True,
    prediction_loss_only=False,
    push_to_hub=False,
    push_to_hub_model_id=None,
    push_to_hub_organization=None,
    push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
    ray_scope=last,
    remove_unused_columns=True,
    report_to=[],
    resume_from_checkpoint=None,
    run_name=output/zhbr-chatglm2-6b-checkpoint,
    save_on_each_node=False,
    save_safetensors=False,
    save_steps=5,
    save_strategy=IntervalStrategy.STEPS,
    save_total_limit=None,
    seed=42,
    sharded_ddp=[],
    skip_memory_metrics=True,
    sortish_sampler=False,
    tf32=None,
    torch_compile=False,
    torch_compile_backend=None,
    torch_compile_mode=None,
    torchdynamo=None,
    tpu_metrics_debug=False,
    tpu_num_cores=None,
    use_cpu=False,
    use_ipex=False,
    use_legacy_prediction_loop=False,
    use_mps_device=False,
    warmup_ratio=0.0,
    warmup_steps=0,
    weight_decay=0.0,
    )
    10/08/2023 13:09:16 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-8e52c57dfec9ef61/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4)
    100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 1379.71it/s]
    [INFO|configuration_utils.py:713] 2023-10-08 13:09:16,749 >> loading configuration file /root/ChatGLM/ChatGLM2-6B-main/zhbr/chatglm2-6b/config.json
    [INFO|configuration_utils.py:713] 2023-10-08 13:09:16,751 >> loading configuration file /root/ChatGLM/ChatGLM2-6B-main/zhbr/chatglm2-6b/config.json
    [INFO|configuration_utils.py:775] 2023-10-08 13:09:16,751 >> Model config ChatGLMConfig {
      "_name_or_path": "/root/ChatGLM/ChatGLM2-6B-main/zhbr/chatglm2-6b",
      "add_bias_linear": false,
      "add_qkv_bias": true,
      "apply_query_key_layer_scaling": true,
      "apply_residual_connection_post_layernorm": false,
      "architectures": [
        "ChatGLMModel"
      ],
      "attention_dropout": 0.0,
      "attention_softmax_in_fp32": true,
      "auto_map": {
       
        "AutoConfig": "configuration_chatglm.ChatGLMConfig",
        "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
        "AutoModelForCausalLM": "modeling_chatglm.ChatGLMForConditionalGeneration",
        "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration",
        "AutoModelForSequenceClassification": "modeling_chatglm.ChatGLMForSequenceClassification"
      },
      "bias_dropout_fusion": true,
      "classifier_dropout": null,
      "eos_token_id": 2,
      "ffn_hidden_size": 13696,
      "fp32_residual_connection": false,
      "hidden_dropout": 0.0,
      "hidden_size": 4096,
      "kv_channels": 128,
      "layernorm_epsilon": 1e-05,
      "model_type": "chatglm",
      "multi_query_attention": true,
      "multi_query_group_num": 2,
      "num_attention_heads": 32,
      "num_layers": 28,
      "original_rope": true,
      "pad_token_id": 0,
      "padded_vocab_size": 65024,
      "post_layer_norm": true,
      "pre_seq_len": null,
      "prefix_projection": false,
      "quantization_bit": 0,
      
  • Original article: https://blog.csdn.net/weixin_44455388/article/details/133670380