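For work I needed to use Sophgo's BM1684 chip. This post records my hands-on walkthrough based on the official manual.
Official reference: BMNNSDK2 Getting Started Guide
Link: https://sophgo-doc.gitbook.io/bmnnsdk2-bm1684
Reproducing the SDK examples requires a development environment. Here I use a shared company server accessed over ssh. The steps are:
Apply for an account on the designated server.
Log in to the server with an ssh client. Since I cannot install software on my PC, I used the cmd terminal that ships with Win10 (intranet access is required). Common tools such as MobaXterm or SecureCRT work as well.
Download the Docker image and the SDK package:
- # Docker image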
- wget https://sophon-file.sophon.cn/sophon-prod-s3/drive/22/03/19/13/bmnnsdk2-bm1684-ubuntu-docker-py37.zip
- # SDK
- wget https://sophon-file.sophon.cn/sophon-prod-s3/drive/22/05/31/11/bmnnsdk2_bm1684_v2.7.0_20220531patched.zip
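Both archives should now be present in the home directory.
With the required artifacts downloaded, first unzip the SDK package. After unzipping, compare the MD5 checksums against the bundled bmnnsdk2.MD5 manifest to make sure the files were not corrupted or tampered with, which avoids unnecessary trouble later. The commands are: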
- (base) xxx@bitmain-SYS-4028GR-TR2:~$unzip bmnnsdk2_bm1684_v2.7.0_20220531patched.zip
- Archive: bmnnsdk2_bm1684_v2.7.0_20220531patched.zip
- creating: bmnnsdk2_bm1684_v2.7.0_20220531patched/
- inflating: bmnnsdk2_bm1684_v2.7.0_20220531patched/bmnnsdk2.MD5
- inflating: __MACOSX/bmnnsdk2_bm1684_v2.7.0_20220531patched/._bmnnsdk2.MD5
- inflating: bmnnsdk2_bm1684_v2.7.0_20220531patched/release_version.txt
- inflating: __MACOSX/bmnnsdk2_bm1684_v2.7.0_20220531patched/._release_version.txt
- inflating: bmnnsdk2_bm1684_v2.7.0_20220531patched/bmnnsdk2-bm1684_v2.7.0.tar.gz
- inflating: __MACOSX/bmnnsdk2_bm1684_v2.7.0_20220531patched/._bmnnsdk2-bm1684_v2.7.0.tar.gz
- (base) xxx@bitmain-SYS-4028GR-TR2:~/bmnnsdk2_bm1684_v2.7.0_20220531patched$cat bmnnsdk2.MD5
- 6ae7d9b5a8564eb66f4f820319c2d39f ./bmnnsdk2-bm1684_v2.7.0.tar.gz
- bf2c860701575909e43b964011694c8f ./release_version.txt
- (base) xxx@bitmain-SYS-4028GR-TR2:~/bmnnsdk2_bm1684_v2.7.0_20220531patched$md5sum ./*
- 6ae7d9b5a8564eb66f4f820319c2d39f ./bmnnsdk2-bm1684_v2.7.0.tar.gz
- 7719bf8cd5d5de8388ebcddda6f2c4be ./bmnnsdk2.MD5
- bf2c860701575909e43b964011694c8f ./release_version.txt
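The manual comparison above can also be automated. The following is a minimal sketch, assuming the manifest uses the `<md5> <relative path>` layout shown in bmnnsdk2.MD5; the function names `md5_of_file` and `verify_manifest` are hypothetical helpers, not part of the SDK.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path):
    """Check every '<md5> <relative path>' line of a manifest file.

    Paths are resolved relative to the manifest's directory.
    Returns a dict mapping each listed path to True (match) or False.
    """
    manifest_path = Path(manifest_path)
    base_dir = manifest_path.parent
    results = {}
    for line in manifest_path.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.strip()
        target = base_dir / name
        results[name] = target.is_file() and md5_of_file(target) == expected.lower()
    return results
```

Calling `verify_manifest("bmnnsdk2.MD5")` inside the unzipped directory should report True for both listed files when the download is intact. If the manifest follows the standard md5sum layout, `md5sum -c bmnnsdk2.MD5` gives the same check from the shell.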
Next, extract the actual SDK tarball:
- (base) xxx@bitmain-SYS-4028GR-TR2:~/bmnnsdk2_bm1684_v2.7.0_20220531patched$tar -zxvf bmnnsdk2-bm1684_v2.7.0.tar.gz
- bmnnsdk2-bm1684_v2.7.0/
- bmnnsdk2-bm1684_v2.7.0/release_version.txt
- ......
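At this point the SDK package is ready.
With the steps above, we are logged in to the server and have downloaded the required artifacts. To reproduce the SDK examples quickly, I use the official Docker image directly instead of building my own.
The image used is ubuntu-docker-py37. First unzip the Docker archive, then verify the MD5 checksums in the same way to catch corrupted or tampered files. The commands are: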
- (base) xxx@bitmain-SYS-4028GR-TR2:~$unzip bmnnsdk2-bm1684-ubuntu-docker-py37.zip
- Archive: bmnnsdk2-bm1684-ubuntu-docker-py37.zip
- creating: bmnnsdk2-bm1684-ubuntu-docker-py37/
- extracting: bmnnsdk2-bm1684-ubuntu-docker-py37/bmnnsdk2-bm1684-ubuntu.docker
- extracting: bmnnsdk2-bm1684-ubuntu-docker-py37/bmnnsdk2.MD5
- extracting: bmnnsdk2-bm1684-ubuntu-docker-py37/Dockerfile.bm1684
- extracting: bmnnsdk2-bm1684-ubuntu-docker-py37/release_version.txt
- (base) xxx@bitmain-SYS-4028GR-TR2:~/bmnnsdk2-bm1684-ubuntu-docker-py37$cat bmnnsdk2.MD5
- cf91eb0ff60f28e368bba1c357d2e7e5 ./Dockerfile.bm1684
- c181ce60245b4fe07596d8a360944903 ./release_version.txt
- 105a4d5d13a41d97353fd2dab88b4802 ./bmnnsdk2-bm1684-ubuntu.docker
- (base) xxx@bitmain-SYS-4028GR-TR2:~/bmnnsdk2-bm1684-ubuntu-docker-py37$md5sum ./*
- 105a4d5d13a41d97353fd2dab88b4802 ./bmnnsdk2-bm1684-ubuntu.docker
- 7b1fdecee114e6d2d82c21286e9b1a39 ./bmnnsdk2.MD5
- cf91eb0ff60f28e368bba1c357d2e7e5 ./Dockerfile.bm1684
- c181ce60245b4fe07596d8a360944903 ./release_version.txt
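According to the official instructions, the SDK package ships a script for running Docker, docker_run_bmnnsdk.sh. On a shared server, however, this script has most likely been run many times already, and containers have been created from it repeatedly. To make my own container easy to identify, I modified the script to give the container a distinct name through the --name option. The modified part of the script is: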
- if [ -c "/dev/bm-sophon0" ]; then
- for dev in $(ls /dev/bm-sophon*);
- do
- mount_options+="--device="$dev:$dev" "
- done
- CMD="docker run \
- --name ubuntu16.0-py37-wnb \
- --network=host \
- --workdir=/workspace \
- --privileged=true \
- ${mount_options} \
- --device=/dev/bmdev-ctl:/dev/bmdev-ctl \
- -v /dev/shm --tmpfs /dev/shm:exec \
- -v $WORKSPACE:/workspace \
- -v /dev:/dev \
- -v /etc/localtime:/etc/localtime \
- -e LOCAL_USER_ID=`id -u` \
- -it $REPO/$IMAGE:$TAG \
- bash
- "
- else
- CMD="docker run \
- --name ubuntu16.0-py37-wnb \
- --network=host \
- --workdir=/workspace \
- --privileged=true \
- -v $WORKSPACE:/workspace \
- -v /dev/shm --tmpfs /dev/shm:exec \
- -v /etc/localtime:/etc/localtime \
- -e LOCAL_USER_ID=`id -u` \
- -it $REPO/$IMAGE:$TAG \
- bash
- "
- fi
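To make the logic of the script easier to follow, here is a rough Python sketch of how the command is assembled: every /dev/bm-sophon* node is passed through with --device, and the container gets an explicit name. The function `build_docker_run` is a hypothetical illustration that mirrors the options in the script above; it is not part of the SDK and only builds the argument list without running anything.

```python
import glob
import os


def build_docker_run(name, workspace, image, devices=None):
    """Build a docker run argument list similar to the script above.

    devices: device nodes to pass through. When None, they are
    discovered with a glob on /dev/bm-sophon*.
    """
    if devices is None:
        devices = sorted(glob.glob("/dev/bm-sophon*"))
    cmd = ["docker", "run", "--name", name, "--network=host",
           "--workdir=/workspace", "--privileged=true"]
    if devices:
        # PCIE cards present: expose each device node plus the control node
        for dev in devices:
            cmd.append(f"--device={dev}:{dev}")
        cmd.append("--device=/dev/bmdev-ctl:/dev/bmdev-ctl")
        cmd += ["-v", "/dev:/dev"]
    cmd += ["-v", "/dev/shm", "--tmpfs", "/dev/shm:exec",
            "-v", f"{workspace}:/workspace",
            "-v", "/etc/localtime:/etc/localtime",
            "-e", f"LOCAL_USER_ID={os.getuid()}",
            "-it", image, "bash"]
    return cmd
```

Printing `" ".join(build_docker_run("ubuntu16.0-py37-wnb", os.getcwd(), "bmnnsdk2-bm1684/dev:ubuntu16.04"))` gives a command line of the same shape as the one the script echoes when it runs.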
Next, create the container with the official script. Once the container is created, the script drops you into it by default:
- (base) xxx@bitmain-SYS-4028GR-TR2:~/bmnnsdk2_bm1684_v2.7.0_20220531patched/bmnnsdk2-bm1684_v2.7.0$./docker_run_bmnnsdk.sh
- /mnt/sdb2/xxx/bmnnsdk2_bm1684_v2.7.0_20220531patched/bmnnsdk2-bm1684_v2.7.0
- /mnt/sdb2/xxx/bmnnsdk2_bm1684_v2.7.0_20220531patched/bmnnsdk2-bm1684_v2.7.0
- bmnnsdk2-bm1684/dev:ubuntu16.04
- docker run --name ubuntu16.0-py37-wnb --network=host --workdir=/workspace --privileged=true --device=/dev/bm-sophon0:/dev/bm-sophon0 --device=/dev/bm-sophon1:/dev/bm-sophon1 --device=/dev/bm-sophon2:/dev/bm-sophon2 --device=/dev/bm-sophon3:/dev/bm-sophon3 --device=/dev/bm-sophon4:/dev/bm-sophon4 --device=/dev/bm-sophon5:/dev/bm-sophon5 --device=/dev/bm-sophon6:/dev/bm-sophon6 --device=/dev/bm-sophon7:/dev/bm-sophon7 --device=/dev/bm-sophon8:/dev/bm-sophon8 --device=/dev/bmdev-ctl:/dev/bmdev-ctl -v /dev/shm --tmpfs /dev/shm:exec -v /mnt/sdb2/xxx/bmnnsdk2_bm1684_v2.7.0_20220531patched/bmnnsdk2-bm1684_v2.7.0:/workspace -v /dev:/dev -v /etc/localtime:/etc/localtime -e LOCAL_USER_ID=1032 -it bmnnsdk2-bm1684/dev:ubuntu16.04 bash
- root@bitmain-SYS-4028GR-TR2:/workspace#
Note:
A container started this way stops as soon as you exit its shell. To reuse it, start it again and attach to it with the following commands:
- (base) xxx@bitmain-SYS-4028GR-TR2:~/bmnnsdk2_bm1684_v2.7.0_20220531patched/bmnnsdk2-bm1684_v2.7.0$docker start ubuntu16.0-py37-wnb
- ubuntu16.0-py37-wnb
- (base) xxx@bitmain-SYS-4028GR-TR2:~/bmnnsdk2_bm1684_v2.7.0_20220531patched/bmnnsdk2-bm1684_v2.7.0$docker exec -it ubuntu16.0-py37-wnb bash
- root@bitmain-SYS-4028GR-TR2:/workspace#
At this point the basic environment is set up.
With this environment in place, the next step is to reproduce the examples in the SDK. The directory structure is:
- # examples directory structure
- .
- |-- Resnet_classify
- |-- RetinaFace
- |-- SSD_object
- |-- YOLOX_object
- |-- YOLOv3_object
- |-- YOLOv5_object
- |-- calibration
- |-- centernet
- |-- multimedia
- |-- nntc
- |-- okkernel
- `-- sail
-
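Before reproducing the examples, the libraries required by the SDK must be installed inside the Docker container and the environment variables set: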
- root@bitmain-SYS-4028GR-TR2:/workspace/scripts# ./install_lib.sh nntc
- linux is Ubuntu16.04.5LTS\n\l
- bmnetc and bmlang USING_CXX11_ABI=1
- Install lib done !
- root@bitmain-SYS-4028GR-TR2:/workspace/scripts# source envsetup_pcie.sh
- /workspace/scripts /workspace/scripts
- ......
- Successfully installed Flask-2.1.2 brotli-1.0.9 click-8.1.3 dash-2.5.1 dash-bootstrap-components-1.2.0 dash-core-components-2.0.0 dash-cytoscape-0.3.0 dash-draggable-0.1.2 dash-html-components-2.0.0 dash-split-pane-1.0.0 dash-table-5.0.0 flask-compress-1.12 ipykernel-5.3.4 itsdangerous-2.1.2 jsonschema-3.2.0 ufw-1.0.0 ufwio-0.9.0
- root@bitmain-SYS-4028GR-TR2:/workspace/scripts# source envsetup_cmodel.sh
- /workspace/scripts /workspace/scripts
- ......
- Installing collected packages: ufw
- Successfully installed ufw-1.0.0
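First, download the original Caffe model and create the symbolic links using the script in the model directory, then compile the model into fp32 and int8 bmodels with the other scripts in the same directory: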
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/SSD_object/model# ./download_ssd_model.sh
- Downloading models_VGGNet_VOC0712_SSD_300x300.tar.gz...
- ......
- All done!
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/SSD_object/model# ./gen_bmodel.sh
- /workspace/examples/SSD_object/model
- ......
- Success: combined to [out/fp32_ssd300.bmodel].
- # Generated model files
- ./out/
- |-- fp32_ssd300.bmodel
- |-- ssd300
- `-- ssd300_4batch
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/SSD_object/model# ./gen_umodel_int8bmodel.sh
- /workspace/examples/SSD_object/model
- /workspace/examples/SSD_object/model /workspace/examples/SSD_object/model
- ......
- Success: combined to [out/int8_ssd300.bmodel].
- combine bmodel ok
- /workspace/examples/SSD_object/model
The model directory now contains a new out directory with the following structure:
- .
- |-- fp32_ssd300.bmodel
- |-- int8_ssd300.bmodel
- |-- ssd300
- `-- ssd300_4batch
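In the previous section the original Caffe model was compiled into fp32 and int8 bmodels. Here the SDK's built-in accuracy check tool is used to run a regression check on the model.
The check relies on the input and output reference data generated during model conversion: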
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/SSD_object/model# bmrt_test --context_dir=./out/ssd300
- [BMRT][deal_with_options:1412] INFO:Loop num: 1
- ......
- [BMRT][bmrt_test:1043] INFO:+++ The network[VGG_VOC0712_SSD_300x300_deploy] stage[0] cmp success +++
- [BMRT][bmrt_test:1063] INFO:load input time(s): 0.031876
- [BMRT][bmrt_test:1064] INFO:calculate time(s): 0.037262
- [BMRT][bmrt_test:1065] INFO:get output time(s): 0.000046
- [BMRT][bmrt_test:1066] INFO:compare time(s): 0.006667
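Conceptually, the "cmp success" line means the outputs computed on the device agree with the reference outputs saved at conversion time. The sketch below illustrates that kind of element-wise comparison. The tolerances are illustrative assumptions, not the thresholds bmrt_test actually uses, and `outputs_match` is a hypothetical helper.

```python
import math


def outputs_match(actual, reference, rel_tol=1e-3, abs_tol=1e-5):
    """Return True when both sequences have equal length and every
    pair of values is close within the given tolerances."""
    if len(actual) != len(reference):
        return False
    return all(
        math.isclose(a, r, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, r in zip(actual, reference)
    )
```

A tolerance is needed because the device and the reference implementation may not produce bit-identical floating-point results.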
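The main work in this part is implementing the model's pre- and post-processing logic with the software interfaces provided by the SDK. Here I compile and test the C++ code in the example, which has already been ported. An excerpt of the source is shown below. The key change is that format conversion, resizing and similar operations are replaced with their SDK implementations: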
- // resize && split by bmcv
- for (size_t i = 0; i < input.size(); i++) {
-   LOG_TS(ts_, "ssd pre-process-vpp")
-   bmcv_image_vpp_convert (bm_handle_, 1, input[i], &resize_bmcv_[i], &crop_rect_);
-   LOG_TS(ts_, "ssd pre-process-vpp")
- }
-
- // do linear transform
- LOG_TS(ts_, "ssd pre-process-linear_tranform")
- bmcv_image_convert_to (bm_handle_, input.size(), linear_trans_param_, resize_bmcv_, linear_trans_bmcv_);
- LOG_TS(ts_, "ssd pre-process-linear_tranform")
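The "linear transform" step applies a per-channel transform of the form y = alpha * x + beta to every pixel, which covers the usual mean subtraction and scaling. Below is a plain-Python sketch of the arithmetic only; it does not call bmcv, and the mean values in the usage line are illustrative assumptions rather than values taken from the example source.

```python
def linear_transform(pixels, alphas, betas):
    """Apply y = alpha_c * x + beta_c to channel c of every pixel.

    pixels: list of (c0, c1, c2) tuples
    alphas, betas: one value per channel
    """
    return [
        tuple(a * value + b for value, a, b in zip(pixel, alphas, betas))
        for pixel in pixels
    ]


# Mean subtraction with scale 1.0 (hypothetical per-channel means)
normalized = linear_transform(
    [(104, 117, 123), (114, 117, 0)],
    alphas=(1.0, 1.0, 1.0),
    betas=(-104.0, -117.0, -123.0),
)
```

On the device, the same computation runs over whole image batches in a single call, which is why the example routes it through the SDK instead of looping on the host CPU.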
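Next, build the source. The dependencies and toolchains needed for compilation were already configured during environment setup, so the code can be built directly. After the build, the pcie and arm executables are generated in the current directory: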
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/SSD_object/cpp_cv_bmcv_bmrt# make -f Makefile.pcie
-
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/SSD_object/cpp_cv_bmcv_bmrt# make -f Makefile.arm
-
- # Build artifacts
- |-- ssd300_cv_bmcv_bmrt.arm
- `-- ssd300_cv_bmcv_bmrt.pcie
Because the BM1684 is attached over PCIE in this Docker environment (which can be confirmed with lspci), ssd300_cv_bmcv_bmrt.pcie can be run directly. The first run failed with the following error:
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/SSD_object/cpp_cv_bmcv_bmrt# ./ssd300_cv_bmcv_bmrt.pcie image /workspace/res/image/vehicle_1.jpg ../model/out/fp32_ssd300.bmodel 1 0
- ./ssd300_cv_bmcv_bmrt.pcie: error while loading shared libraries: libavcodec.so.58: cannot open shared object file: No such file or directory
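A loader error like this usually means that the directory holding the library is not on the dynamic linker's search path. The sketch below checks which directories in a colon-separated path list contain a given library. It assumes the SDK setup scripts expose their libraries through LD_LIBRARY_PATH, which is an assumption on my part, and it ignores the other places the loader looks (system default directories, the ld.so cache, rpath entries). `find_shared_library` is a hypothetical helper.

```python
import os


def find_shared_library(name, search_path=None):
    """Return the directories in a path list that contain the file `name`.

    search_path: colon-separated directory list. Defaults to the
    current value of LD_LIBRARY_PATH.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in search_path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, name)):
            hits.append(directory)
    return hits
```

An empty result from `find_shared_library("libavcodec.so.58")` would be consistent with the error above and points at the environment setup rather than at the build.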
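After some investigation, the cause turned out to be in the environment setup step: the environment must be configured for PCIE or SOC mode according to the actual platform. After reconfiguring for PCIE mode (envsetup_pcie.sh from the environment setup step) and running again, the demo executes normally: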
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/SSD_object/cpp_cv_bmcv_bmrt# ./ssd300_cv_bmcv_bmrt.pcie image /workspace/res/image/vehicle_1.jpg ../model/out/fp32_ssd300.bmodel 1 0
- [/home/jenkins/workspace/all_in_one_sa5/daily_build/bmetc/sa5/middleware-soc/bm_opencv/modules/core/src/cv_bmcpu.cpp:49->InternalBMCpuRegister]total 9 devices need to enable on-chip CPU. It may need serveral minutes for loading, please be patient....
- ......
- [ ssd overall] loops: 1 avg: 679449 us
- [ read image] loops: 1 avg: 391943 us
- [ attach input] loops: 1 avg: 2291 us
- [ detection] loops: 1 avg: 86327 us
- [ ssd pre-process] loops: 1 avg: 48232 us
- [ ssd pre-process-vpp] loops: 1 avg: 1300 us
- [ssd pre-process-linear_tranform] loops: 1 avg: 46928 us
- [ ssd inference] loops: 1 avg: 37930 us
- [ ssd post-process] loops: 1 avg: 161 us
-
- [/home/jenkins/workspace/all_in_one_sa5/daily_build/bmetc/sa5/middleware-soc/bm_opencv/modules/core/src/cv_bmcpu.cpp:113->~InternalBMCpuRegister]deconstructor function is called
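The per-stage timing lines in this output are easy to post-process. A minimal sketch, assuming the `[stage] loops: N avg: T us` layout printed above; `parse_stage_timings` is a hypothetical helper, not an SDK tool.

```python
import re

# Matches lines such as "[ ssd inference] loops: 1 avg: 37930 us"
STAGE_LINE = re.compile(r"\[\s*(.+?)\]\s+loops:\s+(\d+)\s+avg:\s+(\d+)\s+us")


def parse_stage_timings(text):
    """Map each stage name to its average time in milliseconds."""
    return {
        match.group(1).strip(): int(match.group(3)) / 1000.0
        for match in STAGE_LINE.finditer(text)
    }
```

Applied to the run above, this shows that reading the image (about 392 ms) accounts for roughly 58% of the 679 ms overall time, and that almost all of the 48 ms pre-processing time is spent in the linear transform rather than in the vpp resize.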
Following the examples/nntc/bmnett example in the official SDK (bmnett is the TensorFlow front end), run the model conversion script directly:
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/nntc/bmnett# ./bmnett_build_bmodel.sh
- Namespace(check_ops=True, cmp=True, const_names=None, descs=None, dyn=False, enable_profile=False, input_folder='', input_names=('P
- ......
- BMLIB Send Quit Message
- Compiling succeeded.
- # Output directory
- ./output/
- `-- vqvae
-     |-- compilation.bmodel
-     |-- input_ref_data.dat
-     |-- io_info.dat
-     `-- output_ref_data.dat
Following the examples/nntc/bmnetm example in the official SDK (bmnetm is the MXNet front end), run the model conversion script directly:
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/nntc/bmnetm# ./bmnetm_build_bmodel.sh
- args: Namespace(cmp=None, debug=0, dyn=False, enable_profile=False, input_data='', input_names='data', list_ops=False, log_dir='',
- ......
- I0712 11:56:00.312815 1480 bmcompiler_bmodel.cpp:154] [BMCompiler:I] save_tensor output name [softmax_output]
- BMLIB Send Quit Message
-
- # Output directory
- ./output/
- `-- lenet
-     |-- compilation.bmodel
-     |-- input_ref_data.dat
-     |-- io_info.dat
-     `-- output_ref_data.dat
Following the examples/nntc/bmnetp example in the official SDK (bmnetp is the PyTorch front end), run the model conversion script directly:
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/nntc/bmnetp# ./bmnetp_build_bmodel.sh
- Namespace(cmp=True, desc=None, descs=None, dyn=False, enable_profile=False, input_structure=None, log_dir
- ......
- BMLIB Send Quit Message
- Compiling succeeded.
-
- # Output directory
- ./output/
- `-- anchors
-     |-- compilation.bmodel
-     |-- input_ref_data.dat
-     |-- io_info.dat
-     `-- output_ref_data.dat
Following the examples/nntc/bmnetd example in the official SDK (bmnetd is the Darknet front end), run the model conversion script directly:
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/nntc/bmnetd# ./bmnetd_build_bmodel.sh
- ......
- *** Store bmodel of BMCompiler...
- ============================================================
- BMLIB Send Quit Message
- # Output directory
- ./output/
- `-- anchors
-     |-- compilation.bmodel
-     |-- input_ref_data.dat
-     |-- io_info.dat
-     `-- output_ref_data.dat
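All four conversion scripts leave the same four files in their output directory, so a quick sanity check after each run is straightforward. A minimal sketch, assuming the file names shown in the listings above; `missing_outputs` is a hypothetical helper.

```python
from pathlib import Path

# File names taken from the output listings above
EXPECTED_FILES = (
    "compilation.bmodel",
    "input_ref_data.dat",
    "io_info.dat",
    "output_ref_data.dat",
)


def missing_outputs(model_dir):
    """Return the expected file names that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (model_dir / name).is_file()]
```

For example, `missing_outputs("./output/lenet")` returning an empty list indicates that the bmodel and its reference data were all written.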
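Models from other deep learning frameworks can all be converted to ONNX format. The official examples do not include a concrete demonstration of this.
To reduce computation and improve model performance, a model usually needs to be converted to INT8. The steps are as follows:
Following examples/calibration/create_lmdb_demo in the official SDK, first download the dataset. coco128 is used here. If the script cannot be executed, add execute permission with chmod, since the official package does not set it: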
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/calibration/create_lmdb_demo# chmod +x download_coco128.sh
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/calibration/create_lmdb_demo# ./download_coco128.sh
- ......
- inflating: coco128/README.txt
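Then build the lmdb database file, which the later calibration step requires in this format. Set the paths according to where the images actually are, because the path arguments given in the official example are incorrect. The command is: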
- root@bitmain-SYS-4028GR-TR2:/workspace/examples/calibration/create_lmdb_demo# python3 convert_imageset.py --imageset_rootfolder=./coco128/images/train2017 --imageset_lmdbfolder=./coco128 --resize_height=256 --resize_width=256 --shuffle=True --bgr2rgb=False --gray=False
-
- reading image /workspace/examples/calibration/create_lmdb_demo/coco128/images/train2017/000000000634.jpg
- ......
- reading image /workspace/examples/calibration/create_lmdb_demo/coco128/images/train2017/000000000359.jpg
- original shape: (332, 500, 3)
- cv_imge after resize (256, 256, 3)
-
- # Directory structure
- coco128/
- |-- LICENSE
- |-- README.txt
- |-- data.mdb // the generated database file
- |-- images
- `-- labels
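Since the image path has to be corrected by hand, it helps to confirm that the folder passed to --imageset_rootfolder actually contains images before running the conversion. A minimal sketch, where `list_images` and the suffix set are my own assumptions rather than anything from convert_imageset.py:

```python
from pathlib import Path

# Suffixes treated as images (an illustrative choice)
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def list_images(root):
    """Return the sorted image paths found recursively under root."""
    root = Path(root)
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )
```

For the dataset used here, `len(list_images("./coco128/images/train2017"))` should be 128 if the download is complete. A result of 0 means the path argument is still wrong.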