• BM1684 Hands-On Notes


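    I. Environment Setup

    My work required the Sophgo BM1684 chip, so these notes record the hands-on process, following the official manual.

    Official manual: BMNNSDK2 Getting Started Guide

    Link: https://sophgo-doc.gitbook.io/bmnnsdk2-bm1684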

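    1.1 Server environment

    Reproducing the SDK workflow needs a development environment. Here it is a shared company server accessed over ssh, set up as follows:

    1. Request an account on the designated server.

    2. Log in to the server with an ssh client. Because software could not be installed on my PC, I used the cmd terminal built into Windows 10 (intranet access is required); common tools such as MobaXterm or SecureCRT also work.

    3. Download the required artifacts, namely the latest SDK package and the docker image, for example with wget:
        # docker image
        wget https://sophon-file.sophon.cn/sophon-prod-s3/drive/22/03/19/13/bmnnsdk2-bm1684-ubuntu-docker-py37.zip
        # SDK
        wget https://sophon-file.sophon.cn/sophon-prod-s3/drive/22/05/31/11/bmnnsdk2_bm1684_v2.7.0_20220531patched.zip

    After the downloads finish, both zip archives are in the current directory.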

    1.2 SDK environment

    With the artifacts downloaded, first unzip the SDK bundle. After unzipping, verify the MD5 checksums to make sure the files were not corrupted or tampered with, which avoids unnecessary trouble later:

        (base) xxx@bitmain-SYS-4028GR-TR2:~$ unzip bmnnsdk2_bm1684_v2.7.0_20220531patched.zip
        Archive: bmnnsdk2_bm1684_v2.7.0_20220531patched.zip
        creating: bmnnsdk2_bm1684_v2.7.0_20220531patched/
        inflating: bmnnsdk2_bm1684_v2.7.0_20220531patched/bmnnsdk2.MD5
        inflating: __MACOSX/bmnnsdk2_bm1684_v2.7.0_20220531patched/._bmnnsdk2.MD5
        inflating: bmnnsdk2_bm1684_v2.7.0_20220531patched/release_version.txt
        inflating: __MACOSX/bmnnsdk2_bm1684_v2.7.0_20220531patched/._release_version.txt
        inflating: bmnnsdk2_bm1684_v2.7.0_20220531patched/bmnnsdk2-bm1684_v2.7.0.tar.gz
        inflating: __MACOSX/bmnnsdk2_bm1684_v2.7.0_20220531patched/._bmnnsdk2-bm1684_v2.7.0.tar.gz
        (base) xxx@bitmain-SYS-4028GR-TR2:~/bmnnsdk2_bm1684_v2.7.0_20220531patched$ cat bmnnsdk2.MD5
        6ae7d9b5a8564eb66f4f820319c2d39f ./bmnnsdk2-bm1684_v2.7.0.tar.gz
        bf2c860701575909e43b964011694c8f ./release_version.txt
        (base) xxx@bitmain-SYS-4028GR-TR2:~/bmnnsdk2_bm1684_v2.7.0_20220531patched$ md5sum ./*
        6ae7d9b5a8564eb66f4f820319c2d39f ./bmnnsdk2-bm1684_v2.7.0.tar.gz
        7719bf8cd5d5de8388ebcddda6f2c4be ./bmnnsdk2.MD5
        bf2c860701575909e43b964011694c8f ./release_version.txt

    The digests computed for the .tar.gz and release_version.txt match the entries in bmnnsdk2.MD5, so the package is intact.
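
    Comparing digests by eye works for two files but is easy to get wrong. The manual check can be automated roughly as follows. This is a minimal sketch, not part of the SDK: the helper names `md5_of` and `verify_manifest` are made up for illustration, and the manifest is assumed to use the `<md5> <relative path>` layout shown in the `cat bmnnsdk2.MD5` output.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path):
    """Check every '<md5> <relative path>' line of a manifest file.

    Paths are resolved relative to the manifest's own directory.
    Returns a dict mapping each listed path to True (digest matches)
    or False (file missing or digest differs).
    """
    manifest = Path(manifest_path)
    results = {}
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(None, 1)
        # md5sum may prefix the name with '*' in binary mode.
        name = name.strip().lstrip("*")
        target = manifest.parent / name
        results[name] = target.is_file() and md5_of(target) == expected.lower()
    return results
```

    For example, `verify_manifest("bmnnsdk2.MD5")` run from the unzipped directory should report True for both listed files. On the command line, `md5sum -c bmnnsdk2.MD5` performs the same check.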

    Next, extract the actual SDK archive:

        (base) xxx@bitmain-SYS-4028GR-TR2:~/bmnnsdk2_bm1684_v2.7.0_20220531patched$ tar -zxvf bmnnsdk2-bm1684_v2.7.0.tar.gz
        bmnnsdk2-bm1684_v2.7.0/
        bmnnsdk2-bm1684_v2.7.0/release_version.txt
        ......

    The SDK package is now ready.

    1.3 Docker environment

    At this point we are logged in to the server and have downloaded the artifacts. To reproduce the SDK workflow with minimal effort, we use the official docker image directly rather than building our own.

    The image used is ubuntu-docker-py37. First unzip the docker archive, then verify the MD5 checksums as before to make sure the files were not corrupted or tampered with:

        (base) xxx@bitmain-SYS-4028GR-TR2:~$ unzip bmnnsdk2-bm1684-ubuntu-docker-py37.zip
        Archive: bmnnsdk2-bm1684-ubuntu-docker-py37.zip
        creating: bmnnsdk2-bm1684-ubuntu-docker-py37/
        extracting: bmnnsdk2-bm1684-ubuntu-docker-py37/bmnnsdk2-bm1684-ubuntu.docker
        extracting: bmnnsdk2-bm1684-ubuntu-docker-py37/bmnnsdk2.MD5
        extracting: bmnnsdk2-bm1684-ubuntu-docker-py37/Dockerfile.bm1684
        extracting: bmnnsdk2-bm1684-ubuntu-docker-py37/release_version.txt
        (base) xxx@bitmain-SYS-4028GR-TR2:~/bmnnsdk2-bm1684-ubuntu-docker-py37$ cat bmnnsdk2.MD5
        cf91eb0ff60f28e368bba1c357d2e7e5 ./Dockerfile.bm1684
        c181ce60245b4fe07596d8a360944903 ./release_version.txt
        105a4d5d13a41d97353fd2dab88b4802 ./bmnnsdk2-bm1684-ubuntu.docker
        (base) xxx@bitmain-SYS-4028GR-TR2:~/bmnnsdk2-bm1684-ubuntu-docker-py37$ md5sum ./*
        105a4d5d13a41d97353fd2dab88b4802 ./bmnnsdk2-bm1684-ubuntu.docker
        7b1fdecee114e6d2d82c21286e9b1a39 ./bmnnsdk2.MD5
        cf91eb0ff60f28e368bba1c357d2e7e5 ./Dockerfile.bm1684
        c181ce60245b4fe07596d8a360944903 ./release_version.txt

    According to the official instructions, the SDK package ships a script for running docker, docker_run_bmnnsdk.sh. On a shared server, however, this script has most likely been run many times already, so containers created by it probably exist. To make our own container easy to identify, edit the script and rename the container (the --name lines below):

        if [ -c "/dev/bm-sophon0" ]; then
            for dev in $(ls /dev/bm-sophon*);
            do
                mount_options+="--device="$dev:$dev" "
            done
            CMD="docker run \
                --name ubuntu16.0-py37-wnb \
                --network=host \
                --workdir=/workspace \
                --privileged=true \
                ${mount_options} \
                --device=/dev/bmdev-ctl:/dev/bmdev-ctl \
                -v /dev/shm --tmpfs /dev/shm:exec \
                -v $WORKSPACE:/workspace \
                -v /dev:/dev \
                -v /etc/localtime:/etc/localtime \
                -e LOCAL_USER_ID=`id -u` \
                -it $REPO/$IMAGE:$TAG \
                bash
            "
        else
            CMD="docker run \
                --name ubuntu16.0-py37-wnb \
                --network=host \
                --workdir=/workspace \
                --privileged=true \
                -v $WORKSPACE:/workspace \
                -v /dev/shm --tmpfs /dev/shm:exec \
                -v /etc/localtime:/etc/localtime \
                -e LOCAL_USER_ID=`id -u` \
                -it $REPO/$IMAGE:$TAG \
                bash
            "
        fi
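
    On a shared machine it helps to confirm that the chosen name is actually free before editing the script, since `docker run --name` fails if the name is taken. The following is a rough sketch of that check. The function names are hypothetical; the only Docker feature relied on is `docker ps -a --format '{{.Names}}'`, which lists the names of all containers.

```python
import subprocess


def existing_container_names():
    """Return the names of all containers known to the local Docker daemon."""
    out = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}"],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    ).stdout
    return set(out.split())


def pick_container_name(base, user, existing):
    """Return '<base>-<user>', adding a numeric suffix if that name is taken.

    `existing` is any collection of names already in use.
    """
    candidate = "{}-{}".format(base, user)
    suffix = 2
    while candidate in existing:
        candidate = "{}-{}-{}".format(base, user, suffix)
        suffix += 1
    return candidate
```

    Calling `pick_container_name("ubuntu16.0-py37", "wnb", existing_container_names())` on the server would return `ubuntu16.0-py37-wnb` when no container with that name exists, which is the name used in the script above.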

    Next, create the container with the official script. Once the container is created, the script drops you into it by default:

        (base) xxx@bitmain-SYS-4028GR-TR2:~/bmnnsdk2_bm1684_v2.7.0_20220531patched/bmnnsdk2-bm1684_v2.7.0$ ./docker_run_bmnnsdk.sh
        /mnt/sdb2/xxx/bmnnsdk2_bm1684_v2.7.0_20220531patched/bmnnsdk2-bm1684_v2.7.0
        /mnt/sdb2/xxx/bmnnsdk2_bm1684_v2.7.0_20220531patched/bmnnsdk2-bm1684_v2.7.0
        bmnnsdk2-bm1684/dev:ubuntu16.04
        docker run --name ubuntu16.0-py37-wnb --network=host --workdir=/workspace --privileged=true --device=/dev/bm-sophon0:/dev/bm-sophon0 --device=/dev/bm-sophon1:/dev/bm-sophon1 --device=/dev/bm-sophon2:/dev/bm-sophon2 --device=/dev/bm-sophon3:/dev/bm-sophon3 --device=/dev/bm-sophon4:/dev/bm-sophon4 --device=/dev/bm-sophon5:/dev/bm-sophon5 --device=/dev/bm-sophon6:/dev/bm-sophon6 --device=/dev/bm-sophon7:/dev/bm-sophon7 --device=/dev/bm-sophon8:/dev/bm-sophon8 --device=/dev/bmdev-ctl:/dev/bmdev-ctl -v /dev/shm --tmpfs /dev/shm:exec -v /mnt/sdb2/xxx/bmnnsdk2_bm1684_v2.7.0_20220531patched/bmnnsdk2-bm1684_v2.7.0:/workspace -v /dev:/dev -v /etc/localtime:/etc/localtime -e LOCAL_USER_ID=1032 -it bmnnsdk2-bm1684/dev:ubuntu16.04 bash
        root@bitmain-SYS-4028GR-TR2:/workspace#

    Note:

    A container started this way stops as soon as you exit its shell. To reuse it, start it again and attach a new shell:

        (base) xxx@bitmain-SYS-4028GR-TR2:~/bmnnsdk2_bm1684_v2.7.0_20220531patched/bmnnsdk2-bm1684_v2.7.0$ docker start ubuntu16.0-py37-wnb
        ubuntu16.0-py37-wnb
        (base) xxx@bitmain-SYS-4028GR-TR2:~/bmnnsdk2_bm1684_v2.7.0_20220531patched/bmnnsdk2-bm1684_v2.7.0$ docker exec -it ubuntu16.0-py37-wnb bash
        root@bitmain-SYS-4028GR-TR2:/workspace#
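
    The re-entry logic can be summarised as a small decision helper. This is only a sketch: `reentry_commands` is a hypothetical name, and the status string is assumed to come from something like `docker inspect -f '{{.State.Status}}' <name>` (or to be None when the container does not exist). The function only returns the commands; it does not run them.

```python
def reentry_commands(name, status):
    """Return the commands needed to get a shell in container `name`.

    status: Docker's state string for the container ('running',
    'exited', 'created', ...), or None if the container does not exist.
    """
    exec_cmd = ["docker", "exec", "-it", name, "bash"]
    if status == "running":
        # Already up: just attach a new shell.
        return [exec_cmd]
    if status in ("exited", "created"):
        # Stopped: start it first, then attach.
        return [["docker", "start", name], exec_cmd]
    # Not created yet: fall back to the SDK script, which creates it.
    return [["./docker_run_bmnnsdk.sh"]]
```

    For a stopped container, `reentry_commands("ubuntu16.0-py37-wnb", "exited")` yields exactly the `docker start` and `docker exec` pair shown above.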

    The basic environment is now set up.

    II. Reproducing the Examples

    With the environment in place, we now reproduce the examples shipped in the SDK. The examples directory is laid out as follows:

        # examples directory structure
        .
        |-- Resnet_classify
        |-- RetinaFace
        |-- SSD_object
        |-- YOLOX_object
        |-- YOLOv3_object
        |-- YOLOv5_object
        |-- calibration
        |-- centernet
        |-- multimedia
        |-- nntc
        |-- okkernel
        `-- sail
    Before reproducing the examples, install the libraries the SDK requires inside docker and set up the environment variables:

        root@bitmain-SYS-4028GR-TR2:/workspace/scripts# ./install_lib.sh nntc
        linux is Ubuntu16.04.5LTS\n\l
        bmnetc and bmlang USING_CXX11_ABI=1
        Install lib done !
        root@bitmain-SYS-4028GR-TR2:/workspace/scripts# source envsetup_pcie.sh
        /workspace/scripts /workspace/scripts
        ......
        Successfully installed Flask-2.1.2 brotli-1.0.9 click-8.1.3 dash-2.5.1 dash-bootstrap-components-1.2.0 dash-core-components-2.0.0 dash-cytoscape-0.3.0 dash-draggable-0.1.2 dash-html-components-2.0.0 dash-split-pane-1.0.0 dash-table-5.0.0 flask-compress-1.12 ipykernel-5.3.4 itsdangerous-2.1.2 jsonschema-3.2.0 ufw-1.0.0 ufwio-0.9.0
        root@bitmain-SYS-4028GR-TR2:/workspace/scripts# source envsetup_cmodel.sh
        /workspace/scripts /workspace/scripts
        ......
        Installing collected packages: ufw
        Successfully installed ufw-1.0.0

    2.1 SSD_object (Caffe)

    2.1.1 Model migration

    First, download the original Caffe model and create the symbolic links, using the script in the model directory:

        root@bitmain-SYS-4028GR-TR2:/workspace/examples/SSD_object/model# ./download_ssd_model.sh
        Downloading models_VGGNet_VOC0712_SSD_300x300.tar.gz...
        ......
        All done!

    • bmodel (fp32) generation: convert the original model into an fp32 bmodel suitable for the Sophgo TPU:
        root@bitmain-SYS-4028GR-TR2:/workspace/examples/SSD_object/model# ./gen_bmodel.sh
        /workspace/examples/SSD_object/model
        ......
        Success: combined to [out/fp32_ssd300.bmodel].
        # generated model files
        ./out/
        |-- fp32_ssd300.bmodel
        |-- ssd300
        `-- ssd300_4batch

    • bmodel (int8) generation: convert the original model into an int8 bmodel. The script first converts the model into an fp32 umodel (the model format used by UFramework), then uses that intermediate model to produce an int8 umodel, and finally generates the int8 bmodel:
        root@bitmain-SYS-4028GR-TR2:/workspace/examples/SSD_object/model# ./gen_umodel_int8bmodel.sh
        /workspace/examples/SSD_object/model
        /workspace/examples/SSD_object/model /workspace/examples/SSD_object/model
        ......
        Success: combined to [out/int8_ssd300.bmodel].
        combine bmodel ok
        /workspace/examples/SSD_object/model

    The out directory under the model directory now contains both models:

        .
        |-- fp32_ssd300.bmodel
        |-- int8_ssd300.bmodel
        |-- ssd300
        `-- ssd300_4batch
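
    To make the fp32-to-int8 step more concrete, here is what quantizing a tensor to int8 means in its simplest form. This is a generic symmetric quantization sketch for illustration only; it is not the calibration algorithm used by the Sophgo tools, and the function names and the `threshold` parameter are assumptions of this example.

```python
def quantize_int8(values, threshold):
    """Map floats in [-threshold, threshold] onto integers in [-127, 127].

    Values outside the range are clipped (saturated). Returns the
    quantized integers and the scale that was applied.
    """
    scale = 127.0 / threshold
    quantized = []
    for v in values:
        q = int(round(v * scale))
        quantized.append(max(-127, min(127, q)))
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from int8 values."""
    return [q / scale for q in quantized]
```

    The choice of `threshold` trades clipping error against rounding error, which is why int8 conversion needs representative input data to pick it, whereas the fp32 conversion does not.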

    2.1.2 Accuracy regression

    In the previous section we compiled the original Caffe model into fp32 and int8 bmodels. Here we run an accuracy regression with the bundled verification tool.

    The regression uses the input and output reference data generated during model migration:

        root@bitmain-SYS-4028GR-TR2:/workspace/examples/SSD_object/model# bmrt_test --context_dir=./out/ssd300
        [BMRT][deal_with_options:1412] INFO:Loop num: 1
        ......
        [BMRT][bmrt_test:1043] INFO:+++ The network[VGG_VOC0712_SSD_300x300_deploy] stage[0] cmp success +++
        [BMRT][bmrt_test:1063] INFO:load input time(s): 0.031876
        [BMRT][bmrt_test:1064] INFO:calculate time(s): 0.037262
        [BMRT][bmrt_test:1065] INFO:get output time(s): 0.000046
        [BMRT][bmrt_test:1066] INFO:compare time(s): 0.006667
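
    Conceptually, "cmp success" means the outputs computed on the device agree with the stored reference outputs. A comparison of that kind can be sketched as below. This is an illustration only: the tolerances are arbitrary assumptions, not the thresholds bmrt_test actually applies, and the function name is hypothetical.

```python
import math


def compare_outputs(actual, reference, rel_tol=1e-3, abs_tol=1e-5):
    """Compare two equally long sequences of floats element by element.

    Returns (ok, worst), where ok is True if every pair is within
    tolerance and worst is the largest absolute difference observed.
    """
    if len(actual) != len(reference):
        raise ValueError("output and reference have different lengths")
    ok = True
    worst = 0.0
    for a, r in zip(actual, reference):
        worst = max(worst, abs(a - r))
        if not math.isclose(a, r, rel_tol=rel_tol, abs_tol=abs_tol):
            ok = False
    return ok, worst
```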

    2.1.3 Algorithm migration

    This part mainly uses the software interfaces provided by the SDK to implement the model's pre- and post-processing. Here we compile and test the C++ source in the example, which has already been ported. An excerpt is shown below; format conversion, scaling, and similar operations have been replaced with their SDK implementations:

        // resize && split by bmcv
        for (size_t i = 0; i < input.size(); i++) {
            LOG_TS(ts_, "ssd pre-process-vpp")
            bmcv_image_vpp_convert (bm_handle_, 1, input[i], &resize_bmcv_[i], &crop_rect_);
            LOG_TS(ts_, "ssd pre-process-vpp")
        }
        // do linear transform
        LOG_TS(ts_, "ssd pre-process-linear_tranform")
        bmcv_image_convert_to (bm_handle_, input.size(), linear_trans_param_, resize_bmcv_, linear_trans_bmcv_);
        LOG_TS(ts_, "ssd pre-process-linear_tranform")
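
    The "linear transform" step applies y = alpha * x + beta to each channel, which is how mean subtraction and scaling are expressed for the hardware. The arithmetic can be sketched in plain Python as follows. This mirrors only the math, not the bmcv API; the function name is hypothetical, and the mean values in the usage note are the ones commonly used with Caffe SSD models, assumed here rather than taken from the example's source.

```python
def linear_transform(image, alphas, betas):
    """Apply y = alpha * x + beta per channel.

    image:  nested list of shape H x W x C
    alphas: one multiplier per channel
    betas:  one offset per channel
    """
    return [
        [
            [a * value + b for value, a, b in zip(pixel, alphas, betas)]
            for pixel in row
        ]
        for row in image
    ]
```

    With alphas of (1.0, 1.0, 1.0) and betas of (-104.0, -117.0, -123.0), a pixel (104, 117, 123) maps to (0.0, 0.0, 0.0), i.e. plain per-channel mean subtraction.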

    Next, compile the source. The dependencies and toolchain needed for compilation were already configured in the Environment Setup chapter, so the build can be run directly. It produces PCIE and ARM executables in the current directory:

        root@bitmain-SYS-4028GR-TR2:/workspace/examples/SSD_object/cpp_cv_bmcv_bmrt# make -f Makefile.pcie
        root@bitmain-SYS-4028GR-TR2:/workspace/examples/SSD_object/cpp_cv_bmcv_bmrt# make -f Makefile.arm
        # build outputs
        |-- ssd300_cv_bmcv_bmrt.arm
        `-- ssd300_cv_bmcv_bmrt.pcie

    Because the BM1684 is attached to the docker host over PCIE (this can be confirmed with lspci), ssd300_cv_bmcv_bmrt.pcie can be run directly. Doing so produced the following error:

        root@bitmain-SYS-4028GR-TR2:/workspace/examples/SSD_object/cpp_cv_bmcv_bmrt# ./ssd300_cv_bmcv_bmrt.pcie image /workspace/res/image/vehicle_1.jpg ../model/out/fp32_ssd300.bmodel 1 0
        ./ssd300_cv_bmcv_bmrt.pcie: error while loading shared libraries: libavcodec.so.58: cannot open shared object file: No such file or directory
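
    A "cannot open shared object file" error usually means the library's directory is not on the loader's search path. One quick way to check is to look for the file in each LD_LIBRARY_PATH entry, sketched below. The function name is hypothetical, and the assumption that the SDK's environment scripts work by setting LD_LIBRARY_PATH is mine. The sketch also ignores the loader's other sources (ld.so.cache and the default system directories).

```python
import os


def find_shared_library(lib_name, search_path=None):
    """Return the directories on the search path that contain `lib_name`.

    search_path: a colon-separated directory list; defaults to the
    current LD_LIBRARY_PATH environment variable.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in search_path.split(os.pathsep):
        if directory and os.path.exists(os.path.join(directory, lib_name)):
            hits.append(directory)
    return hits
```

    An empty result from `find_shared_library("libavcodec.so.58")` inside the container would indicate that the current environment setup does not expose the multimedia libraries the PCIE binary links against.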

    Investigation showed that the environment setup step has to be configured for either PCIE or SOC mode, matching the actual hardware. After redoing the setup in PCIE mode and running again, the demo executes normally:

        root@bitmain-SYS-4028GR-TR2:/workspace/examples/SSD_object/cpp_cv_bmcv_bmrt# ./ssd300_cv_bmcv_bmrt.pcie image /workspace/res/image/vehicle_1.jpg ../model/out/fp32_ssd300.bmodel 1 0
        [/home/jenkins/workspace/all_in_one_sa5/daily_build/bmetc/sa5/middleware-soc/bm_opencv/modules/core/src/cv_bmcpu.cpp:49->InternalBMCpuRegister]total 9 devices need to enable on-chip CPU. It may need serveral minutes for loading, please be patient....
        ......
        [ ssd overall] loops: 1 avg: 679449 us
        [ read image] loops: 1 avg: 391943 us
        [ attach input] loops: 1 avg: 2291 us
        [ detection] loops: 1 avg: 86327 us
        [ ssd pre-process] loops: 1 avg: 48232 us
        [ ssd pre-process-vpp] loops: 1 avg: 1300 us
        [ssd pre-process-linear_tranform] loops: 1 avg: 46928 us
        [ ssd inference] loops: 1 avg: 37930 us
        [ ssd post-process] loops: 1 avg: 161 us
        [/home/jenkins/workspace/all_in_one_sa5/daily_build/bmetc/sa5/middleware-soc/bm_opencv/modules/core/src/cv_bmcpu.cpp:113->~InternalBMCpuRegister]deconstructor function is called
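
    The timing lines are easier to reason about once parsed. The following is a small sketch that extracts them and computes each stage's share of the total. The function names are hypothetical, and the regular expression is written against the log format shown above only.

```python
import re

# Matches lines such as "[ ssd inference] loops: 1 avg: 37930 us".
TIMING_RE = re.compile(
    r"\[\s*(?P<stage>[^\]]+?)\s*\]\s*loops:\s*\d+\s*avg:\s*(?P<avg>\d+)\s*us"
)


def parse_timings(log_text):
    """Return {stage name: average time in microseconds}."""
    return {
        m.group("stage"): int(m.group("avg"))
        for m in TIMING_RE.finditer(log_text)
    }


def share_of_total(timings, stage, total_stage="ssd overall"):
    """Return the fraction of the total time spent in `stage`."""
    return timings[stage] / timings[total_stage]
```

    Applied to the run above, reading the image (391943 us) accounts for roughly 58% of the overall 679449 us, while TPU inference (37930 us) is under 6%.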

    2.2 VQ-VAE (TensorFlow)

    Follow the examples/nntc/bmnett example in the official SDK and run the model conversion script directly:

        root@bitmain-SYS-4028GR-TR2:/workspace/examples/nntc/bmnett# ./bmnett_build_bmodel.sh
        Namespace(check_ops=True, cmp=True, const_names=None, descs=None, dyn=False, enable_profile=False, input_folder='', input_names=('P
        ......
        BMLIB Send Quit Message
        Compiling succeeded.
        # output directory
        ./output/
        `-- vqvae
            |-- compilation.bmodel
            |-- input_ref_data.dat
            |-- io_info.dat
            `-- output_ref_data.dat
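
    A successful conversion leaves four files in the model's output directory. A quick completeness check can be sketched as follows. The helper name is hypothetical, and the file list is simply taken from the directory listing above.

```python
from pathlib import Path

# File names as they appear in the output listing above.
EXPECTED_FILES = (
    "compilation.bmodel",
    "input_ref_data.dat",
    "io_info.dat",
    "output_ref_data.dat",
)


def missing_outputs(model_dir):
    """Return the expected output files that are absent from `model_dir`."""
    model_dir = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (model_dir / name).is_file()]
```

    `missing_outputs("./output/vqvae")` returning an empty list means all four files were produced.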

    2.3 LeNet (MXNet)

    Follow the examples/nntc/bmnetm example in the official SDK and run the model conversion script directly:

        root@bitmain-SYS-4028GR-TR2:/workspace/examples/nntc/bmnetm# ./bmnetm_build_bmodel.sh
        args: Namespace(cmp=None, debug=0, dyn=False, enable_profile=False, input_data='', input_names='data', list_ops=False, log_dir='',
        ......
        I0712 11:56:00.312815 1480 bmcompiler_bmodel.cpp:154] [BMCompiler:I] save_tensor output name [softmax_output]
        BMLIB Send Quit Message
        # output directory
        ./output/
        `-- lenet
            |-- compilation.bmodel
            |-- input_ref_data.dat
            |-- io_info.dat
            `-- output_ref_data.dat

    2.4 Anchors (PyTorch)

    Follow the examples/nntc/bmnetp example in the official SDK and run the model conversion script directly:

        root@bitmain-SYS-4028GR-TR2:/workspace/examples/nntc/bmnetp# ./bmnetp_build_bmodel.sh
        Namespace(cmp=True, desc=None, descs=None, dyn=False, enable_profile=False, input_structure=None, log_dir
        ......
        BMLIB Send Quit Message
        Compiling succeeded.
        # output directory
        ./output/
        `-- anchors
            |-- compilation.bmodel
            |-- input_ref_data.dat
            |-- io_info.dat
            `-- output_ref_data.dat

    2.5 Yolov3-tiny (Darknet)

    Follow the examples/nntc/bmnetd example in the official SDK and run the model conversion script directly:

        root@bitmain-SYS-4028GR-TR2:/workspace/examples/nntc/bmnetd# ./bmnetd_build_bmodel.sh
        ......
        *** Store bmodel of BMCompiler...
        ============================================================
        BMLIB Send Quit Message
        # output directory
        ./output/
        `-- anchors
            |-- compilation.bmodel
            |-- input_ref_data.dat
            |-- io_info.dat
            `-- output_ref_data.dat

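    2.6 ONNX & Paddle

    Models from other deep learning frameworks can all be converted to ONNX format; the official examples do not include a concrete demonstration of this.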

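    III. Hands-On Practice

    3.1 Model migration

    To reduce computation and improve inference performance, a model is usually converted to INT8. The steps are as follows.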

    3.1.1 Preparing the calibration dataset

    Following examples/calibration/create_lmdb_demo in the official SDK, first download the dataset; coco128 is used here. If the script cannot be executed, add execute permission with chmod, since the official package ships without it:

        root@bitmain-SYS-4028GR-TR2:/workspace/examples/calibration/create_lmdb_demo# chmod +x download_coco128.sh
        root@bitmain-SYS-4028GR-TR2:/workspace/examples/calibration/create_lmdb_demo# ./download_coco128.sh
        ......
        inflating: coco128/README.txt

    Next, build the lmdb database, the format the later calibration step expects. Set the paths according to where the images actually are; the path arguments given in the official example are incorrect:

        root@bitmain-SYS-4028GR-TR2:/workspace/examples/calibration/create_lmdb_demo# python3 convert_imageset.py --imageset_rootfolder=./coco128/images/train2017 --imageset_lmdbfolder=./coco128 --resize_height=256 --resize_width=256 --shuffle=True --bgr2rgb=False --gray=False
        reading image /workspace/examples/calibration/create_lmdb_demo/coco128/images/train2017/000000000634.jpg
        ......
        reading image /workspace/examples/calibration/create_lmdb_demo/coco128/images/train2017/000000000359.jpg
        original shape: (332, 500, 3)
        cv_imge after resize (256, 256, 3)
        # directory structure
        coco128/
        |-- LICENSE
        |-- README.txt
        |-- data.mdb    // the generated database file
        |-- images
        `-- labels
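
    Since a wrong image path is the main pitfall here, it is worth confirming that the folder really contains images before running the conversion. A minimal sketch of such a check follows; the helper name and the set of accepted extensions are assumptions of this example, not something convert_imageset.py defines.

```python
from pathlib import Path

# Extensions treated as images in this sketch (an assumption).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def list_images(root):
    """Return the image files directly inside `root`, sorted by path."""
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError("image folder not found: {}".format(root))
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

    For coco128, `len(list_images("./coco128/images/train2017"))` should come out at 128; a FileNotFoundError or a count of zero points to a wrong `--imageset_rootfolder` value.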

  • Original article: https://blog.csdn.net/captain_wangnb/article/details/125900780