• Large Model Deployment Notes (3): Tongyi Qianwen (通义千问) + Windows GPU


    1. Introduction

    Organization: Alibaba

    Code repo: GitHub - QwenLM/Qwen: The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

    Model: Qwen/Qwen-7B-Chat-Int4

    Download: http://huggingface.co/Qwen/Qwen-7B-Chat-Int4

    ModelScope download: https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int4/summary

    Hardware: 暗影精灵7Plus (HP OMEN) laptop

    Windows version: Windows 11 Home (Chinese edition), Insider Preview 22H2

    Memory: 32 GB

    GPU: NVIDIA GeForce RTX 3080 Laptop (16 GB)

    There are two ways to install Alibaba's Tongyi Qianwen model: via modelscope, or via transformers (Hugging Face).
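    For reference, here is a minimal sketch of what the transformers (Hugging Face) path looks like; this article follows the modelscope path instead. The model id matches the model card above; loading requires transformers plus the auto-gptq/optimum packages installed below, and the multi-GB checkpoint downloads on first use, so the call itself is shown only as a comment.

```python
def load_qwen_hf(model_id: str = "Qwen/Qwen-7B-Chat-Int4"):
    # Deferred import so the sketch can be read without transformers installed.
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",          # place layers on the GPU automatically
        trust_remote_code=True,     # Qwen ships custom modeling code
    ).eval()
    return model, tokenizer

# Usage (downloads the checkpoint on first run):
# model, tokenizer = load_qwen_hf()
# response, history = model.chat(tokenizer, "你好", history=None)
```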

    References:

    1. Playing with Alibaba's 14B-parameter Qwen! Qwen+Win11+3060 https://zhuanlan.zhihu.com/p/659000534

    2. Playing with the open-source Tongyi Qianwen (Qwen): a local install log on Win11 with an RTX 3060! https://zhuanlan.zhihu.com/p/648368704

    2. Downloading the Code and Model

    Clone the code repo:

    d:

    git clone https://github.com/QwenLM/Qwen.git

    For the model download, see Part 4: running python Qwen-7B-Chat-Int4.py downloads the model automatically.

    3. Installing Dependencies

    Open an Anaconda Powershell Prompt and create a conda environment:

    conda create -n model310 python=3.10

    conda activate model310

    Install the modelscope base library:

    pip install modelscope

    While installing modelscope, pip automatically pulls in pytorch 2.0.1 (as we will discover later, this torch build is not the right one).

    Open the ModelScope community site: http://modelscope.cn

    Register an account:

    Open the Qwen-7B Int4 quantized model page: https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int4/summary

    Install the quantization dependencies:

    pip install auto-gptq optimum

    Install the quantization package:

    pip install bitsandbytes --prefer-binary --extra-index-url=https://jllllll.github.io/bitsandbytes-windows-webui

    Install the other dependencies:

    pip install transformers_stream_generator

    pip install tiktoken

    pip install deepspeed

    At present, deepspeed does not install cleanly on Windows, so we skip it for now.

    Install the flash-attention library:

    git clone -b v1.0.8 https://github.com/Dao-AILab/flash-attention

    cd flash-attention

    pip install .

    # Below are optional. Installing them might be slow.

    # pip install csrc/layer_norm

    # pip install csrc/rotary

    Judging from the log, the installed torch is probably not a CUDA build.

    Verify:

    Indeed, it is not.

    Let's reinstall the CUDA build of pytorch 2.0 with conda:

    conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

    To be safe, verify again:

    python

    import torch

    # pytorch version
    torch.__version__

    # is CUDA available?
    torch.cuda.is_available()

    # CUDA version
    print(torch.version.cuda)

    # cuDNN version
    print(torch.backends.cudnn.version())

    # GPU compute capability (note: this API returns the compute capability, not the memory size)
    torch.cuda.get_device_capability(device=0)
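    The interactive checks above can be bundled into one small diagnostic script. This is a sketch; it degrades gracefully (reporting None) when torch or CUDA is absent, so it is safe to run on any machine:

```python
def cuda_report():
    # Collect the same environment checks as above into a single dict;
    # anything that cannot be determined stays at its default value.
    report = {"torch": None, "cuda_available": False,
              "cuda": None, "cudnn": None, "capability": None}
    try:
        import torch
    except ImportError:
        return report  # torch not installed at all
    report["torch"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["cuda"] = torch.version.cuda
        report["cudnn"] = torch.backends.cudnn.version()
        report["capability"] = torch.cuda.get_device_capability(device=0)
    return report

if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```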

    Then retry the flash-attention install:

    pip install .

    4. Deployment Verification

    Create the file d:\Qwen\Qwen-7B-Chat-Int4.py with the following content:

    from modelscope import AutoTokenizer, AutoModelForCausalLM, snapshot_download

    model_dir = snapshot_download("qwen/Qwen-7B-Chat-Int4", revision='v1.1.3')
    # Note: The default behavior now has injection attack prevention off.
    tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_dir,
        device_map="auto",
        trust_remote_code=True
    ).eval()
    response, history = model.chat(tokenizer, "你好", history=None)
    print(response)
    # 你好!很高兴为你提供帮助。

    Run the file:

    cd d:\Qwen

    python Qwen-7B-Chat-Int4.py

    The first run stops on a missing package; install chardet:

    pip install chardet

    Run again:

    python Qwen-7B-Chat-Int4.py

    Wait patiently for the model download to finish...

    The model is downloaded to this directory: C:\Users\<username>\.cache\modelscope\hub\qwen\Qwen-7B-Chat-Int4

    Note that no download speed is shown while the download is in progress; it is only displayed after it completes...

    Check carefully which package is still missing:

    pip install cchardet

    Run again:

    python Qwen-7B-Chat-Int4.py

    This time it runs successfully.

    Copy all files from the download directory C:\Users\<username>\.cache\modelscope\hub\qwen\Qwen-7B-Chat-Int4 into the Qwen\Qwen-7B-Chat-Int4 directory under the current directory:
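    That copy step can also be done from Python. A small helper sketch (the two example paths below are from this article; replace <username> with your Windows user name):

```python
import shutil
from pathlib import Path

def copy_model_files(src_dir: str, dst_dir: str) -> int:
    """Copy every file from the modelscope cache into a local checkpoint
    directory, creating it if needed; returns the number of files copied."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = 0
    for item in src.iterdir():
        if item.is_file():
            shutil.copy2(item, dst / item.name)  # copy2 preserves timestamps
            copied += 1
    return copied

# Example (paths from this article):
# copy_model_files(r"C:\Users\<username>\.cache\modelscope\hub\qwen\Qwen-7B-Chat-Int4",
#                  r".\Qwen\Qwen-7B-Chat-Int4")
```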

    Edit cli_demo.py.

    Change the following line:

    DEFAULT_CKPT_PATH = './Qwen/Qwen-7B-Chat-Int4'

    Run: python cli_demo.py

    The prompt appears almost immediately:

    Try some interaction:

    However, the demo clears the screen on every turn, which is uncomfortable.

    Remove all the clear_screen calls from the code (except where an explicit clear command is received):

    Press CTRL-C to exit, then run again: python cli_demo.py

    Something is still off: the code seems to redraw the whole screen on every turn before handling the next input line.

    After several attempts, the following modification works:

    # Copyright (c) Alibaba Cloud.
    #
    # This source code is licensed under the license found in the
    # LICENSE file in the root directory of this source tree.

    """A simple command-line interactive chat demo."""

    import argparse
    import os
    import platform
    import shutil
    from copy import deepcopy

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from transformers.generation import GenerationConfig
    from transformers.trainer_utils import set_seed

    DEFAULT_CKPT_PATH = './Qwen/Qwen-7B-Chat-Int4'

    _WELCOME_MSG = '''\
    Welcome to use Qwen-Chat model, type text to start chat, type :h to show command help.
    (欢迎使用 Qwen-Chat 模型,输入内容即可进行对话,:h 显示命令帮助。)

    Note: This demo is governed by the original license of Qwen.
    We strongly advise users not to knowingly generate or allow others to knowingly generate harmful content, including hate speech, violence, pornography, deception, etc.
    (注:本演示受Qwen的许可协议限制。我们强烈建议,用户不应传播及不应允许他人传播以下内容,包括但不限于仇恨言论、暴力、色情、欺诈相关的有害信息。)
    '''

    _HELP_MSG = '''\
    Commands:
        :help / :h              Show this help message              显示帮助信息
        :exit / :quit / :q      Exit the demo                       退出Demo
        :clear / :cl            Clear screen                        清屏
        :clear-his / :clh       Clear history                       清除对话历史
        :history / :his         Show history                        显示对话历史
        :seed                   Show current random seed            显示当前随机种子
        :seed <N>               Set random seed to <N>              设置随机种子
        :conf                   Show current generation config      显示生成配置
        :conf <key>=<value>     Change generation config            修改生成配置
        :reset-conf             Reset generation config             重置生成配置
    '''


    def _load_model_tokenizer(args):
        tokenizer = AutoTokenizer.from_pretrained(
            args.checkpoint_path, trust_remote_code=True, resume_download=True,
        )

        if args.cpu_only:
            device_map = "cpu"
        else:
            device_map = "auto"

        model = AutoModelForCausalLM.from_pretrained(
            args.checkpoint_path,
            device_map=device_map,
            trust_remote_code=True,
            resume_download=True,
        ).eval()

        config = GenerationConfig.from_pretrained(
            args.checkpoint_path, trust_remote_code=True, resume_download=True,
        )

        return model, tokenizer, config


    def _clear_screen():
        if platform.system() == "Windows":
            os.system("cls")
        else:
            os.system("clear")


    def _print_history(history):
        terminal_width = shutil.get_terminal_size()[0]
        print(f'History ({len(history)})'.center(terminal_width, '='))
        for index, (query, response) in enumerate(history):
            print(f'User[{index}]: {query}')
            print(f'QWen[{index}]: {response}')
        print('=' * terminal_width)


    def _get_input() -> str:
        while True:
            try:
                message = input('User> ').strip()
            except UnicodeDecodeError:
                print('[ERROR] Encoding error in input')
                continue
            except KeyboardInterrupt:
                exit(1)
            if message:
                return message
            print('[ERROR] Query is empty')


    def main():
        parser = argparse.ArgumentParser(
            description='QWen-Chat command-line interactive chat demo.')
        parser.add_argument("-c", "--checkpoint-path", type=str, default=DEFAULT_CKPT_PATH,
                            help="Checkpoint name or path, default to %(default)r")
        parser.add_argument("-s", "--seed", type=int, default=1234, help="Random seed")
        parser.add_argument("--cpu-only", action="store_true", help="Run demo with CPU only")
        args = parser.parse_args()

        history, response = [], ''

        model, tokenizer, config = _load_model_tokenizer(args)
        orig_gen_config = deepcopy(model.generation_config)

        # _clear_screen()
        print(_WELCOME_MSG)

        seed = args.seed

        while True:
            query = _get_input()

            # Process commands.
            if query.startswith(':'):
                command_words = query[1:].strip().split()
                if not command_words:
                    command = ''
                else:
                    command = command_words[0]

                if command in ['exit', 'quit', 'q']:
                    break
                elif command in ['clear', 'cl']:
                    _clear_screen()
                    print(_WELCOME_MSG)
                    continue
                elif command in ['clear-history', 'clh']:
                    print(f'[INFO] All {len(history)} history cleared')
                    history.clear()
                    continue
                elif command in ['help', 'h']:
                    print(_HELP_MSG)
                    continue
                elif command in ['history', 'his']:
                    _print_history(history)
                    continue
                elif command in ['seed']:
                    if len(command_words) == 1:
                        print(f'[INFO] Current random seed: {seed}')
                        continue
                    else:
                        new_seed_s = command_words[1]
                        try:
                            new_seed = int(new_seed_s)
                        except ValueError:
                            print(f'[WARNING] Fail to change random seed: {new_seed_s!r} is not a valid number')
                        else:
                            print(f'[INFO] Random seed changed to {new_seed}')
                            seed = new_seed
                        continue
                elif command in ['conf']:
                    if len(command_words) == 1:
                        print(model.generation_config)
                    else:
                        for key_value_pairs_str in command_words[1:]:
                            eq_idx = key_value_pairs_str.find('=')
                            if eq_idx == -1:
                                print('[WARNING] format: <key>=<value>')
                                continue
                            conf_key, conf_value_str = key_value_pairs_str[:eq_idx], key_value_pairs_str[eq_idx + 1:]
                            try:
                                conf_value = eval(conf_value_str)
                            except Exception as e:
                                print(e)
                                continue
                            else:
                                print(f'[INFO] Change config: model.generation_config.{conf_key} = {conf_value}')
                                setattr(model.generation_config, conf_key, conf_value)
                    continue
                elif command in ['reset-conf']:
                    print('[INFO] Reset generation config')
                    model.generation_config = deepcopy(orig_gen_config)
                    print(model.generation_config)
                    continue
                else:
                    # As normal query.
                    pass

            # Run chat.
            set_seed(seed)
            try:
                for response in model.chat_stream(tokenizer, query, history=history, generation_config=config):
                    pass
                # _clear_screen()
                # print(f"\nUser: {query}")
                print(f"\nQwen-Chat: {response}")
            except KeyboardInterrupt:
                print('[WARNING] Generation interrupted')
                continue

            history.append((query, response))


    if __name__ == "__main__":
        main()

    Note where the print call sits.
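    Why the print goes after the loop: chat_stream yields progressively longer partial replies, so simply draining the generator leaves the final, complete reply in response, and one print after the loop shows it without any clear-and-redraw. A toy sketch of the same pattern (fake_chat_stream is a stand-in for model.chat_stream):

```python
def fake_chat_stream():
    # Stand-in for model.chat_stream: each yield is the reply so far.
    partial = ""
    for token in ["Hello", ", ", "world", "!"]:
        partial += token
        yield partial

response = ""
for response in fake_chat_stream():
    pass  # no clear/redraw per chunk; just keep the latest partial
print(f"Qwen-Chat: {response}")  # prints the complete reply once
```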

    python cli_demo.py

    (The end. Thanks for reading.)

  • Original article: https://blog.csdn.net/snmper/article/details/133578360