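To make it easier for customers to use YOLOv7 on the BM1684 platform, this post adapts the official, unmodified YOLOv7 model.
- Official repository: https://github.com/WongKinYiu/yolov7
- Model weights: https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7x.pt
The BM1684 environment follows the official BMNNSDK2 development manual. For the setup itself, see my other write-up, "BMNNSDK2实战记录" (BMNNSDK2 hands-on notes). Everything below runs directly in that environment, which uses the official Docker image and is configured roughly as follows:
- Docker image: bmnnsdk2-bm1684-ubuntu-docker-py37
- BMNNSDK2 package: bmnnsdk2_bm1684_v2.7.0_20220531patched.zip
Create the following directory structure: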
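- # Directory structure
- YOLOv7_object/
- `-- model
- |-- download_yolov7_model.sh # download the original model
- |-- gen_bmodel.sh # generate the fp32 bmodel
- |-- gen_umodel_int8bmodel.sh # generate the int8 bmodel
- |-- out # output directory for bmodels and other artifacts
- | `-- YOLOv7
- `-- yolov7.pt # original model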
3.1.1.1 Script implementation
The conversion uses the bmnetp tool. The script is as follows:
- #!/bin/bash
-
- model_dir=$(dirname $(readlink -f "$0"))
- echo "model path: ${model_dir}"
- top_dir=$model_dir/../../..
- sdk_dir=$top_dir
-
- export LD_LIBRARY_PATH=${sdk_dir}/lib/bmcompiler:${sdk_dir}/lib/bmlang:${sdk_dir}/lib/thirdparty/x86:${sdk_dir}/lib/bmnn/cmodel
- export PATH=$PATH:${sdk_dir}/bmnet/bmnetp
-
- # generate the output directory (same path as --outdir below)
- mkdir -p output/YOLOv7
-
- # python
- echo "start model transform......"
- python3 -m bmnetp \
- --net_name=yolov7 \
- --target=BM1684 \
- --opt=1 \
- --cmp=true \
- --shapes="[1,3,640,640]" \
- --model="${model_dir}/yolov7.pt" \
- --outdir=output/YOLOv7 \
- --dyn=false
- if [ $? -eq 0 ]; then
- echo "Congratulation! Everything is OK!"
- else
- echo "Something is wrong, pleae have a check!"
- exit 1
- fi
Running the script fails with the following error:
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# ./gen_bmodel.sh
- model path: /workspace/examples/YOLOv7_object/model
- start model transform......
- Namespace(cmp=True, desc=None, descs=None, dyn=False, enable_profile=False, input_structure=None, log_dir='', log_prefix=True, mode='compile', model='/workspace/examples/YOLOv7_object/model/yolov7.pt', net_name='yolov7', op_list=False, opt=1, outdir='output/YOLOv7', seed=42, shapes=[[1, 3, 640, 640]], target='BM1684', v=3)
- python3 -m bmnetp --model=/workspace/examples/YOLOv7_object/model/yolov7.pt --net_name=yolov7 --target=BM1684 --outdir=output/YOLOv7 --shapes="[1,3,640,640]" --opt=1 --cmp=true --dyn=false --enable_profile=false --mode=compile --seed=42
- /root/.local/lib/python3.7/site-packages/bmnetp/bin/bmnetp.bin --model=/workspace/examples/YOLOv7_object/model/yolov7.pt --net_name=yolov7 --target=BM1684 --outdir=output/YOLOv7 --shapes="[1,3,640,640]" --opt=1 --cmp=true --dyn=false --enable_profile=false --mode=0 --seed=42
- terminate called after throwing an instance of 'c10::Error'
- what(): [enforce fail at inline_container.cc:222] . file not found: archive/constants.pkl
- frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x47 (0x7f990e0c80e7 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libc10.so)
- frame #1: caffe2::serialize::PyTorchStreamReader::getRecordID(std::string const&) + 0xed (0x7f98fe93accd in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libtorch_cpu.so)
- frame #2: caffe2::serialize::PyTorchStreamReader::getRecord(std::string const&) + 0x21 (0x7f98fe93ad41 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libtorch_cpu.so)
- frame #3: torch::jit::readArchiveAndTensors(std::string const&, c10::optional<std::function<c10::StrongTypePtr (c10::QualifiedName const&)> >, c10::optional<std::function<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > (c10::StrongTypePtr, c10::IValue)> >, c10::optional<c10::Device>, caffe2::serialize::PyTorchStreamReader&) + 0x62 (0x7f98ffe322f2 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libtorch_cpu.so)
- frame #4: <unknown function> + 0x348b8b4 (0x7f98ffe328b4 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libtorch_cpu.so)
- frame #5: <unknown function> + 0x348d193 (0x7f98ffe34193 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libtorch_cpu.so)
- frame #6: torch::jit::load(std::shared_ptr<caffe2::serialize::ReadAdapterInterface>, c10::optional<c10::Device>, std::unordered_map<std::string, std::string, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::string> > >&) + 0x193 (0x7f98ffe352f3 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libtorch_cpu.so)
- frame #7: torch::jit::load(std::string const&, c10::optional<c10::Device>, std::unordered_map<std::string, std::string, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::string> > >&) + 0xad (0x7f98ffe376bd in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libtorch_cpu.so)
- frame #8: torch::jit::load(std::string const&, c10::optional<c10::Device>) + 0x54 (0x7f98ffe37794 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libtorch_cpu.so)
- frame #9: <unknown function> + 0x177ed2 (0x7f9911793ed2 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libbmnetp.so)
- frame #10: bm::check(std::string const&) + 0x2f (0x7f9911797df7 in /root/.local/lib/python3.7/site-packages/bmnetp/lib/libbmnetp.so)
- frame #11: main + 0xdc (0x445195 in /root/.local/lib/python3.7/site-packages/bmnetp/bin/bmnetp.bin)
- frame #12: __libc_start_main + 0xf0 (0x7f98fb57a840 in /lib/x86_64-linux-gnu/libc.so.6)
- frame #13: _start + 0x2a (0x4418aa in /root/.local/lib/python3.7/site-packages/bmnetp/bin/bmnetp.bin)
-
- Aborted (core dumped)
- compile failed, exit code=134
- <class 'SystemExit'> 134 <traceback object at 0x7fa01fcd1448>
- Something is wrong, pleae have a check!
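My first suspicion was a torch version problem. According to the official YOLOv7 requirements, YOLOv7 needs torch>=1.7.0,!=1.12.0 and torchvision>=0.8.1,!=0.13.0, whereas the packages installed in the current Docker container do not meet these constraints: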
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# pip list | grep torch
- torch 1.5.0+cpu
- torchvision 0.6.0+cpu
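The mismatch can also be checked mechanically. The following is a minimal sketch rather than part of the original workflow: the helper names `parse_version` and `satisfies` are made up for illustration, and it only handles the simple ">= minimum, != excluded" form quoted above.

```python
import re


def parse_version(text):
    """Turn '1.5.0+cpu' into (1, 5, 0); the local '+cpu' suffix is ignored."""
    core = text.split("+", 1)[0]
    return tuple(int(part) for part in re.findall(r"\d+", core))


def satisfies(installed, minimum, excluded):
    """True if installed >= minimum and installed != excluded."""
    version = parse_version(installed)
    return version >= parse_version(minimum) and version != parse_version(excluded)


if __name__ == "__main__":
    # Versions reported by `pip list` in the container above.
    print("torch ok:", satisfies("1.5.0+cpu", "1.7.0", "1.12.0"))
    print("torchvision ok:", satisfies("0.6.0+cpu", "0.8.1", "0.13.0"))
```

Both checks report a failure for the versions shipped in the container, which agrees with the comparison above.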
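After asking a colleague, I learned that the original model cannot be migrated directly: it must first be converted to TorchScript. This matches the stack trace above, where torch::jit::load fails because the file contains no archive/constants.pkl. Since the export has to run with the YOLOv7 code, whose torch requirement the stock container does not meet, the Docker environment needs to be reconfigured.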
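A quick way to tell the two file types apart before running bmnetp is to look inside the file. The sketch below is a heuristic based only on the error message above (the loader looks for a `constants.pkl` entry in the archive). The function name `looks_like_torchscript` is hypothetical, and the check does not prove that the model will compile.

```python
import sys
import zipfile


def looks_like_torchscript(path):
    """Heuristic: a TorchScript archive contains a 'constants.pkl' entry,
    which is the file the loader in the stack trace failed to find."""
    if not zipfile.is_zipfile(path):
        return False  # not a zip archive at all
    with zipfile.ZipFile(path) as archive:
        return any(
            name == "constants.pkl" or name.endswith("/constants.pkl")
            for name in archive.namelist()
        )


if __name__ == "__main__":
    # Usage: python check_pt.py yolov7.pt yolov7.torchscript.pt
    for model_path in sys.argv[1:]:
        verdict = "TorchScript" if looks_like_torchscript(model_path) else "not TorchScript"
        print(f"{model_path}: {verdict}")
```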
3.1.1.2 Docker environment configuration
Conda is used to manage the Python environments inside the container. First, install conda in the container described above:
- [2022-07-14 20:02:54] root@bitmain-SYS-4028GR-TR2:/workspace# mkdir miniconda
- [2022-07-14 20:03:29] root@bitmain-SYS-4028GR-TR2:/workspace# cd miniconda/
- [2022-07-14 20:03:31] root@bitmain-SYS-4028GR-TR2:/workspace/miniconda# wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-latest-Linux-x86_64.sh
- ......
- [2022-07-14 20:04:07] 2022-07-14 20:04:15 (3.83 MB/s) - 'Miniconda3-latest-Linux-x86_64.sh' saved [76607678/76607678]
- [2022-07-14 20:04:37] root@bitmain-SYS-4028GR-TR2:/workspace/miniconda# bash Miniconda3-latest-Linux-x86_64.sh
- ......
- [2022-07-14 20:05:34] conda config --set auto_activate_base false
- [2022-07-14 20:05:34]
- [2022-07-14 20:05:34] Thank you for installing Miniconda3!
After exiting the container and entering it again, conda is activated by default. The (base) prefix on the last line of the log below shows that conda is active:
- [2022-07-14 20:05:53] root@bitmain-SYS-4028GR-TR2:/workspace/miniconda# exit
- [2022-07-14 20:06:09] exit
- [2022-07-14 20:06:09] (base) ningbo.wang@bitmain-SYS-4028GR-TR2:~$docker exec -it ubuntu16.0-py37-wnb bash
- [2022-07-14 20:06:12] (base) root@bitmain-SYS-4028GR-TR2:/workspace# conda
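For convenience, configure conda not to activate the base environment automatically on login, and switch the package sources to a domestic mirror (the Tsinghua mirror here) so that packages download faster: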
- [2022-07-14 20:06:18] (base) root@bitmain-SYS-4028GR-TR2:/workspace# conda config --set auto_activate_base false
- [2022-07-14 20:08:11] (base) root@bitmain-SYS-4028GR-TR2:/workspace# conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
- [2022-07-14 20:08:34] (base) root@bitmain-SYS-4028GR-TR2:/workspace# conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
- [2022-07-14 20:08:45] (base) root@bitmain-SYS-4028GR-TR2:/workspace# conda config --set show_channel_urls yes
- [2022-07-14 20:09:02] (base) root@bitmain-SYS-4028GR-TR2:/workspace# pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
Once this is done, exit the container, enter it again, and then create and activate a yolov7 conda environment:
- [2022-07-14 20:14:09] root@bitmain-SYS-4028GR-TR2:/workspace# conda create -n yolov7 python=3.7
- ......
- [2022-07-14 20:16:16] # To activate this environment, use
- [2022-07-14 20:16:16] #
- [2022-07-14 20:16:16] # $ conda activate yolov7
- [2022-07-14 20:16:16] #
- [2022-07-14 20:16:16] # To deactivate an active environment, use
- [2022-07-14 20:16:16] #
- [2022-07-14 20:16:16] # $ conda deactivate
- [2022-07-14 20:16:16] root@bitmain-SYS-4028GR-TR2:/workspace# conda activate yolov7
- [2022-07-14 20:17:01] (yolov7) root@bitmain-SYS-4028GR-TR2:/workspace#
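3.1.1.3 Preparing the YOLOv7 model
Clone the official repository, download the original model, and then convert it to a TorchScript model. The export command and the resulting directory layout are as follows: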
- [2022-07-14 21:00:58] (yolov7) root@bitmain-SYS-4028GR-TR2:/workspace/code/yolov7# python models/export.py --weights yolov7.pt
- ......
- [2022-07-14 21:01:24] Export complete (10.26s). Visualize with https://github.com/lutzroeder/netron.
-
- root@bitmain-SYS-4028GR-TR2:/workspace/code/yolov7# tree -L 1
- .
- |-- LICENSE.md
- |-- README.md
- |-- cfg
- |-- data
- |-- detect.py
- |-- figure
- |-- hubconf.py
- |-- inference
- |-- models
- |-- requirements.txt
- |-- scripts
- |-- test.py
- |-- tools
- |-- train.py
- |-- train_aux.py
- |-- utils
- |-- yolov7.onnx
- |-- yolov7.pt
- `-- yolov7.torchscript.pt
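3.1.1.4 Model conversion
Next, migrate the model starting from yolov7.torchscript.pt. This requires stopping the container and switching back to the unmodified official Docker environment, i.e. the one used in [3.1.1.1]. Presumably --model in gen_bmodel.sh now points at yolov7.torchscript.pt; the updated script is not shown. The commands are as follows: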
- root@bitmain-SYS-4028GR-TR2:/workspace/code/yolov7# cp yolov7.torchscript.pt ../../examples/YOLOv7_object/model/
- root@bitmain-SYS-4028GR-TR2:/workspace/code/yolov7# cd ../../examples/YOLOv7_object/model
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# ./gen_bmodel.sh
- model path: /workspace/examples/YOLOv7_object/model
- start model transform......
- ......
- BMLIB Send Quit Message
- Compiling succeeded.
- Congratulation! Everything is OK!
-
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# tree output/YOLOv7/
- output/YOLOv7/
- |-- compilation.bmodel
- |-- input_ref_data.dat
- |-- io_info.dat
- `-- output_ref_data.dat
-
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# bm_model.bin --info ./output/YOLOv7/compilation.bmodel
- bmodel version: B.2.2
- chip: BM1684
- create time: Fri Jul 15 10:40:10 2022
-
- ==========================================
- net 0: [yolov7] static
- ------------
- stage 0:
- input: x.1, [1, 3, 640, 640], float32, scale: 1
- output: 756, [1, 3, 80, 80, 85], float32, scale: 1
- output: 757, [1, 3, 40, 40, 85], float32, scale: 1
- output: 758, [1, 3, 20, 20, 85], float32, scale: 1
-
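The three output shapes reported by bm_model.bin can be sanity-checked against the input size. The sketch below assumes the usual YOLOv7 settings, none of which are stated in the text above: detection strides of 8, 16 and 32, three anchors per scale, and 80 COCO classes (so 5 + 80 = 85 values per box). The function name `yolo_head_shapes` is illustrative.

```python
def yolo_head_shapes(height, width, strides=(8, 16, 32),
                     anchors_per_scale=3, num_classes=80):
    """Expected raw head shapes [batch, anchors, grid_h, grid_w, 5 + classes]
    for batch size 1, under the assumptions listed in the text."""
    values_per_box = 5 + num_classes  # x, y, w, h, objectness + class scores
    return [
        (1, anchors_per_scale, height // stride, width // stride, values_per_box)
        for stride in strides
    ]


if __name__ == "__main__":
    for shape in yolo_head_shapes(640, 640):
        print(list(shape))
```

For a 640x640 input this reproduces the shapes of outputs 756, 757 and 758 listed above.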
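3.1.1.5 Accuracy regression
Next, use the official bmrt_test tool to run an accuracy regression on the converted model. It runs the bmodel on the reference input saved at compile time and compares the result with the saved reference output. The comparison succeeds, as expected: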
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# bmrt_test --context_dir=./output/YOLOv7/
- [BMRT][deal_with_options:1412] INFO:Loop num: 1
- bmcpu init: skip cpu_user_defined
- ......
- [BMRT][bmrt_test:1038] INFO:==>comparing #0 output ...
- [BMRT][bmrt_test:1043] INFO:+++ The network[yolov7] stage[0] cmp success +++
- [BMRT][bmrt_test:1063] INFO:load input time(s): 0.004891
- [BMRT][bmrt_test:1064] INFO:calculate time(s): 0.084028
- [BMRT][bmrt_test:1065] INFO:get output time(s): 0.007568
- [BMRT][bmrt_test:1066] INFO:compare time(s): 0.027697
-
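At this point, the fp32 bmodel has been generated.
Generating the int8 quantized model is somewhat more involved than the fp32 one. The rough steps are covered in the subsections below.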
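3.1.2.1 Preparing the quantization dataset
The calibration data is built from the coco128 dataset. The preprocessing must stay consistent with YOLOv7's own preprocessing, which mainly means an aspect-ratio-preserving resize with padding (letterbox) followed by normalization. The dataset is then written out as an LMDB file: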
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/data# python3 convert_imageset.py --imageset_rootfolder ./coco128/images/train2017/ --imageset_lmdbfolder ./ --image_size 640 --bgr2rgb True --gray False
- remove original lmdb file /workspace/examples/YOLOv7_object/data/data.mdb
- remove original lmdb file /workspace/examples/YOLOv7_object/data/data.mdb Ok!
-
- reading image /workspace/examples/YOLOv7_object/data/coco128/images/train2017/000000000472.jpg
- original shape: (226, 640, 3)
- save test.jpg done
- ......
-
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/data# tree -L 1
- .
- |-- coco128
- |-- convert_imageset.py
- |-- data.mdb
- `-- download_coco128.sh
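The letterbox step mentioned above comes down to a little arithmetic. The following sketch shows only that arithmetic and is not the code of convert_imageset.py, which is not reproduced here. The function names are illustrative, and dividing by 255 is an assumed form of the normalization.

```python
def letterbox_params(height, width, target=640):
    """Return (new_h, new_w, top, bottom, left, right) for an aspect-preserving
    resize into a target x target canvas, with the remainder split as padding."""
    scale = min(target / height, target / width)
    new_h, new_w = round(height * scale), round(width * scale)
    pad_h, pad_w = target - new_h, target - new_w
    top, left = pad_h // 2, pad_w // 2
    return new_h, new_w, top, pad_h - top, left, pad_w - left


def normalize(pixel):
    """Map an 8-bit pixel value to [0, 1] (assumed normalization)."""
    return pixel / 255.0


if __name__ == "__main__":
    # The image in the log above has original shape (226, 640, 3).
    print(letterbox_params(226, 640))
```

For the 226x640 image in the log, the image keeps its size and receives 207 rows of padding above and below.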
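To make it easy to verify the preprocessing, the letterboxed image is also saved to disk (see "save test.jpg done" in the log above) and compared with the original by eye. The comparison shows that the preprocessing is correct.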
3.1.2.2 Generating the fp32umodel
Use the ufw.tools.pt_to_umodel tool to generate the fp32umodel:
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# ./gen_fp32umodel.sh
- /workspace/examples/YOLOv7_object/model
- /usr/local/lib/python3.7/runpy.py:125: RuntimeWarning: 'ufw.tools.pt_to_umodel' found in sys.modules after import of package 'ufw.tools', but prior to execution of 'ufw.tools.pt_to_umodel'; this may result in unpredictable behaviour
- warn(RuntimeWarning(msg))
- python3 -m bmnetp --model=./yolov7.torchscript.pt --net_name=yolov7.torchscript --target=BM1684 --outdir=compilation_fp32umodel --shapes="[1,3,640,640]" --opt=2 --cmp=true --dyn=false --enable_profile=false --mode=GenUmodel
- /root/.local/lib/python3.7/site-packages/bmnetp/bin/bmnetp.bin --model=./yolov7.torchscript.pt --net_name=yolov7.torchscript --target=BM1684 --outdir=compilation_fp32umodel --shapes="[1,3,640,640]" --opt=2 --cmp=true --dyn=false --enable_profile=false --mode=1
- All ops supported.
- ......
- Compiling succeeded.
- ####################################
- Converting Process Done Sucessfully
- ####################################
- fp32umodel done
-
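3.1.2.3 Generating the int8umodel
Next, using the dataset and the fp32umodel produced in the previous sections, convert to an int8umodel. This mainly involves two parts:
Graph optimization of the input floating-point network. This is already included in [3.1.2.2], but it can also be done at this stage.
Quantization of the floating-point network, which produces the int8 network and weight files.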
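To make the second part more concrete, here is a toy sketch of symmetric int8 quantization with a single scale factor. It is only an illustration of the idea. The official calibration tool derives its thresholds from activation statistics over the calibration images, and its exact algorithm is not described here. The max-abs rule, the multiply-by-scale convention and all function names below are assumptions.

```python
def max_abs_scale(values):
    """Scale that maps the largest magnitude in `values` to 127."""
    return 127.0 / max(abs(v) for v in values)


def quantize(values, scale):
    """float -> int8: multiply by scale, round, clamp to [-128, 127]."""
    return [max(-128, min(127, round(v * scale))) for v in values]


def dequantize(quantized, scale):
    """int8 -> float: divide by the same scale."""
    return [q / scale for q in quantized]


if __name__ == "__main__":
    activations = [1.0, -0.25, 0.1]
    scale = max_abs_scale(activations)
    q = quantize(activations, scale)
    print(scale, q, dequantize(q, scale))
```

Values outside the calibrated range saturate at -128 or 127, which is why a representative calibration dataset matters.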
Here only the int8 quantization is performed, without graph optimization, using 200 iterations:
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# ./gen_int8umodel.sh
- /workspace/examples/YOLOv7_object/model
- I0718 10:21:30.706017 6933 common.cpp:62] ufw version with commit id:bc3faf38c90b7216f95796e9edaa8cecd9227d8d
- I0718 10:21:30.706442 6933 calibration_use_pb.cpp:171] calibration-tools version with commit id:bc3faf38c90b7216f95796e9edaa8cecd9227d8d
- ......
- /usr/bin/dot
- I0718 10:46:52.879577 6933 cali_core.cpp:1474] used time=0 hour:25 min:22 sec
- I0718 10:46:52.879654 6933 cali_core.cpp:1476] int8 calibration done.
- Congratulation! Everything is OK!
-
- # Directory structure
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# tree compilation_fp32umodel/
- compilation_fp32umodel/
- |-- io_info.dat
- |-- yolov7.fp32umodel -> yolov7.torchscript_bmnetp.fp32umodel
- |-- yolov7.int8umodel
- |-- yolov7.prototxt -> yolov7.torchscript_bmnetp_test_fp32.prototxt
- |-- yolov7.torchscript_bmnetp.fp32umodel
- |-- yolov7.torchscript_bmnetp_test_fp32.prototxt
- |-- yolov7_deploy_fp32_unique_top.prototxt
- `-- yolov7_deploy_int8_unique_top.prototxt
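Note that the official tool has some issues: its actual behavior differs from what the manual describes.
Setting -winograd to either false or true produces no error, and the tool exits with status 0. Experiments show that the mere presence of -winograd enables the option (true), and only omitting the flag leaves it false. Both the official manual and the tool's own help text are wrong on this point and need to be updated. Trying -save_test_proto and -graph_transform shows the same behavior.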
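The observed behavior is what one would expect if these options are treated as presence-only switches. The sketch below is a toy model of that observation and not the tool's real argument parser, which I have not inspected. The function name `parse_presence_flags` is hypothetical.

```python
def parse_presence_flags(argv, bool_flags):
    """Toy model of presence-style boolean flags: '-name' alone enables the
    flag, and a following bare 'true'/'false' token is not used as its value."""
    enabled = {name: False for name in bool_flags}
    leftovers = []
    for token in argv:
        name = token.lstrip("-")
        if token.startswith("-") and name in enabled:
            enabled[name] = True
        else:
            leftovers.append(token)  # stray tokens such as 'false' end up here
    return enabled, leftovers


if __name__ == "__main__":
    flags = ["winograd", "save_test_proto", "graph_transform"]
    print(parse_presence_flags(["-winograd", "false"], flags))
    print(parse_presence_flags([], flags))
```

Under this model, the only way to keep an option disabled is to leave it off the command line, which matches the experiments described above.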
3.1.2.4 Generating the int8 bmodel
Next, generate the bmodel used for on-board deployment. The script is roughly as follows:
- #!/bin/bash
- #1batch bmodel
- mkdir int8model
- bmnetu \
- -model compilation_fp32umodel/yolov7_deploy_int8_unique_top.prototxt \
- -weight compilation_fp32umodel/yolov7.int8umodel \
- -outdir=./int8model \
- -cmp true
-
- if [ $? -eq 0 ]; then
- cp ./int8model/compilation.bmodel ./int8model/yolov7_int8_1b.bmodel
- echo "Congratulation! Everything is OK!"
- else
- echo "Something is wrong, pleae have a check!"
- exit 1
- fi
Run the script. The command output and the location of the final artifact are as follows:
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# ./gen_int8bmodel.sh
- /workspace/examples/YOLOv7_object/model
- mkdir: cannot create directory 'int8model': File exists
- ......
- ============================================================
- *** Store bmodel of BMCompiler...
- ============================================================
- BMLIB Send Quit Message
- Congratulation! Everything is OK!
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# tree -L 1 int8model/
- int8model/
- |-- compilation.bmodel
- |-- input_ref_data.dat
- |-- io_info.dat
- |-- output_ref_data.dat
- `-- yolov7_int8_1b.bmodel # final 1-batch artifact
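3.1.2.5 Accuracy regression
Next, use the official bmrt_test tool to run an accuracy regression on the converted int8 model. It compares the bmodel's outputs with the reference data saved at compile time. The comparison succeeds, as expected: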
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/YOLOv7_object/model# bmrt_test --context_dir=./int8model/
- [BMRT][deal_with_options:1412] INFO:Loop num: 1
- bmcpu init: skip cpu_user_defined
- open usercpu.so, init user_cpu_init
- [BMRT][load_bmodel:1018] INFO:Loading bmodel from [./int8model//compilation.bmodel]. Thanks for your patience...
- [BMRT][load_bmodel:982] INFO:pre net num: 0, load net num: 1
- [BMRT][show_net_info:1336] INFO: ########################
- [BMRT][show_net_info:1337] INFO: NetName: yolov7, Index=0
- [BMRT][show_net_info:1339] INFO: ---- stage 0 ----
- [BMRT][show_net_info:1347] INFO: Input 0) 'x.1' shape=[ 1 3 640 480 ] dtype=INT8 scale=127.031
- [BMRT][show_net_info:1356] INFO: Output 0) '756' shape=[ 1 3 80 60 85 ] dtype=INT8 scale=0.198189
- [BMRT][show_net_info:1356] INFO: Output 1) '757' shape=[ 1 3 40 30 85 ] dtype=INT8 scale=0.202178
- [BMRT][show_net_info:1356] INFO: Output 2) '758' shape=[ 1 3 20 15 85 ] dtype=INT8 scale=0.169756
- [BMRT][show_net_info:1359] INFO: ########################
- [BMRT][bmrt_test:770] INFO:==> running network #0, name: yolov7, loop: 0
- [BMRT][bmrt_test:834] INFO:reading input #0, bytesize=921600
- [BMRT][bmrt_test:987] INFO:reading output #0, bytesize=1224000
- [BMRT][bmrt_test:987] INFO:reading output #1, bytesize=306000
- [BMRT][bmrt_test:987] INFO:reading output #2, bytesize=76500
- [BMRT][bmrt_test:1019] INFO:net[yolov7] stage[0], launch total time is 32659 us (npu 32530 us, cpu 129 us)
- [BMRT][bmrt_test:1022] INFO:+++ The network[yolov7] stage[0] output_data +++
- [BMRT][bmrt_test:1038] INFO:==>comparing #0 output ...
- [BMRT][bmrt_test:1043] INFO:+++ The network[yolov7] stage[0] cmp success +++
- [BMRT][bmrt_test:1063] INFO:load input time(s): 0.000951
- [BMRT][bmrt_test:1064] INFO:calculate time(s): 0.032664
- [BMRT][bmrt_test:1065] INFO:get output time(s): 0.001572
- [BMRT][bmrt_test:1066] INFO:compare time(s): 0.005961
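The byte sizes in the log above follow directly from the tensor shapes and the INT8 data type, which makes for a quick consistency check when reading bmrt_test output. The sketch below is a small illustration; the helper name `tensor_bytes` and the dtype table are my own.

```python
DTYPE_BYTES = {"INT8": 1, "FLOAT32": 4}


def tensor_bytes(shape, dtype="INT8"):
    """Number of bytes needed for a dense tensor of the given shape and dtype."""
    count = 1
    for dim in shape:
        count *= dim
    return count * DTYPE_BYTES[dtype]


if __name__ == "__main__":
    # Shapes copied from the show_net_info lines in the log above.
    shapes = {
        "input x.1": (1, 3, 640, 480),
        "output 756": (1, 3, 80, 60, 85),
        "output 757": (1, 3, 40, 30, 85),
        "output 758": (1, 3, 20, 15, 85),
    }
    for name, shape in shapes.items():
        print(name, tensor_bytes(shape, "INT8"))
```

The results match the bytesize values printed by bmrt_test (921600, 1224000, 306000 and 76500).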