[HPMicro HPM6750 Series] Porting TinyMaix, a Lightweight AI Inference Framework

Contents
- 1. What is TinyMaix?
- 2. Porting TinyMaix
  - 2.1 Setting up the development environment
  - 2.2 Porting steps
    - 2.2.1 Directory layout
    - 2.2.2 Source changes
    - 2.2.3 Build and run
- 3. Benchmarks
  - 3.1 Scenario 1: TM_MDL_INT8 + TM_OPT0
  - 3.2 Scenario 2: TM_MDL_INT8 + TM_OPT1
  - 3.3 Scenario 3: TM_MDL_FP32 + TM_OPT0
  - 3.4 Notes
- 4. Code repositories
- 5. Reference links

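This article shows how to port TinyMaix, a lightweight AI inference framework, to the HPM6750, billed as the most powerful Chinese-made RISC-V microcontroller, and benchmarks TinyMaix on it in several configurations.

1. What is TinyMaix?

TinyMaix is a lightweight AI inference framework developed by the Sipeed team in China. The official description reads:

"TinyMaix is an ultra-lightweight neural network inference library for microcontrollers, that is, a TinyML inference library. It lets you run lightweight deep learning models on any microcontroller."

"Even an Arduino ATmega328 (32KB Flash, 2KB RAM) can run handwritten digit recognition with TinyMaix."

The TinyMaix project site has a detailed introduction; the link is in the reference links at the end of this article.

2. Porting TinyMaix

This section describes how to port TinyMaix to the HPM6750.

2.1 Setting up the development environment

HPMicro officially supports two development environments: the HPM SDK and RT-Thread. Setup instructions for both are in the official board user guides (HPM6750EVKMINI USER GUIDE.pdf or HPM6750EVK USER GUIDE.pdf). My earlier posts also cover them; see the reference links at the end of this article.

TinyMaix's benchmarks on the MCUs it already supports were all run on bare metal, so this port uses the HPM SDK environment. A bare-metal port generally also runs under an RTOS, so bare metal is the better starting point when porting a compute-oriented open-source project such as TinyMaix to an MCU.

The HPM SDK version used is 0.14.0. The SEGGER Embedded Studio version information is: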
```
SEGGER Embedded Studio for RISC-V
Release 6.40 Build 2022102501.51567
Windows x64

© 2014-2022 SEGGER Microcontroller GmbH
© 1997-2022 Rowley Associates Ltd.

segger-cc: version 15.0.0
segger-ld: version 4.36.0
segger-rtl: version 4.20.0

GCC/BINUTILS: built using the GNU RISC-V Toolchain version GCC 12.20/Binutils 2.39 source distribution

Clang/LLVM: built using the version 15.0.0 source distribution
```
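2.2 Porting steps

TinyMaix has only a few source files, so the port is fairly simple.

It takes three steps:
1. Plan the directory layout;
2. Modify the source;
3. Build and run.
The details of each step follow.

2.2.1 Directory layout

Both TinyMaix and hpm_sdk build with CMake. To avoid intrusive changes to TinyMaix, I add an intermediate layer: the CMakeLists.txt for the HPM6750 platform sits one level above the TinyMaix source directory, as shown below: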
```
hpm_sdk/app/
├── CMakeLists.txt     # CMakeLists.txt for the HPM6750 platform
├── src
│   └── benchmark.c
└── TinyMaix/          # TinyMaix source directory
```
2.2.2 Source changes

The contents of src/benchmark.c are as follows:
```
#include <stdio.h>
#include "board.h"

#define MODEL_MNIST 1
#define MODEL_CIFAR10 2
#define MODEL_VWW 3
#define MODEL_MBNET 4

#define CONFIG_MODEL MODEL_CIFAR10 // change this line to switch the benchmark program

#define main benchmark_main
#if (CONFIG_MODEL == MODEL_MNIST)
#include "mnist/main.c"
#elif (CONFIG_MODEL == MODEL_CIFAR10)
#include "cifar10/main.c"
#elif (CONFIG_MODEL == MODEL_VWW)
#include "vww/main.c"
#elif (CONFIG_MODEL == MODEL_MBNET)
#include "mbnet/label.c"
#include "mbnet/main.c"
#endif
#undef main

int main(void)
{
    board_init();

    printf("benchmark start...\n");
    benchmark_main(0, NULL);

    __asm__("wfi");
    return 0;
}

```
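To avoid copying the benchmark code and to keep the structure simple, this file uses two less common techniques:
- it #includes a .c file directly;
- it defines the macro #define main benchmark_main before #include "xxx/main.c" and undefines it afterwards.
This makes TinyMaix's existing test code part of benchmark.c without clashing with the main function defined here.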
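
The macro-renaming trick can be tried in isolation. The sketch below is a minimal stand-in, not the code from the port: instead of including a real xxx/main.c it inlines a tiny main function at the point where the #include would sit. The function body and the run_benchmark wrapper are hypothetical names made up for this illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* From here on, the preprocessor rewrites every `main` token to `benchmark_main`. */
#define main benchmark_main

/* In benchmark.c this is where `#include "xxx/main.c"` sits.
 * A hypothetical stand-in body is inlined here so the sketch is self-contained. */
int main(int argc, char **argv)
{
    (void)argv;
    printf("third-party entry point, argc=%d\n", argc);
    return 0;
}

/* Stop the renaming so that a later `main` is the real program entry point. */
#undef main

/* Hypothetical wrapper: the call the real main() makes after board_init(). */
int run_benchmark(void)
{
    return benchmark_main(0, NULL);
}
```

Because the renaming is purely textual, it only works when the included file defines main with external or static linkage in the same translation unit, which is why the .c file is included rather than compiled separately.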

PS: For simplicity, TinyMaix was not placed in hpm_sdk's middleware directory. In a real project it is better to put it there.

The tm_port.h file also needs to be modified:
```
diff --git a/include/tm_port.h b/include/tm_port.h
index 357fc6b..5d1768c 100644
--- a/include/tm_port.h
+++ b/include/tm_port.h
@@ -31,7 +31,7 @@ limitations under the License.
 #define TM_OPT_LEVEL    TM_OPT0
 #define TM_MDL_TYPE     TM_MDL_INT8
 #define TM_FASTSCALE    (0)         //enable if your chip don't have FPU, may speed up 1/3, but decrease accuracy
-#define TM_LOCAL_MATH   (0)         //use local math func (like exp()) to avoid libm
+#define TM_LOCAL_MATH   (1)         //use local math func (like exp()) to avoid libm
 #define TM_ENABLE_STAT  (1)         //enable mdl stat functions
 #define TM_MAX_CSIZE    (1000)      //max channel num //used if INT8 mdl  //cost TM_MAX_CSIZE*4 Byte
 #define TM_MAX_KSIZE    (5*5)       //max kernel_size   //cost TM_MAX_KSIZE*4 Byte
@@ -49,9 +49,10 @@ limitations under the License.
 #define TM_DBGL()      TM_PRINTF("###L%d\n",__LINE__);

 /******************************* DBG TIME CONFIG  ************************************/
-#include <sys/time.h>
-#include <time.h>
-#define  TM_GET_US()       ((uint32_t)((uint64_t)clock()*1000000/CLOCKS_PER_SEC))
+#include "board.h"
+#define  TM_GET_US()       (uint32_t)(HPM_MCHTMR->MTIME * 1000000uLL / clock_get_frequency(clock_mchtmr0))

 #define TM_DBGT_INIT()     uint32_t _start,_finish;float _time;_start=TM_GET_US();
 #define TM_DBGT_START()    _start=TM_GET_US();
```
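
The new TM_GET_US() converts a free-running tick counter to microseconds. The arithmetic can be sketched on its own, without any HPM headers. The helper names ticks_to_us and elapsed_us are hypothetical, and any frequency passed in (for example 24 MHz) is only an illustrative value, not a figure taken from the SDK.

```c
#include <stdint.h>

/* Convert a tick count to microseconds.
 * The multiplication is carried out in 64 bits, like the 1000000uLL in the patch,
 * so it does not overflow for tick counts that fit comfortably in 64 bits. */
uint32_t ticks_to_us(uint64_t ticks, uint32_t freq_hz)
{
    return (uint32_t)(ticks * 1000000ULL / freq_hz);
}

/* Elapsed time between two 32-bit microsecond stamps.
 * Unsigned subtraction is modulo 2^32, so the result is still correct
 * if the counter wrapped once between start and finish. */
uint32_t elapsed_us(uint32_t start, uint32_t finish)
{
    return finish - start;
}
```

Since the result is truncated to uint32_t, the microsecond value wraps roughly every 71.6 minutes (2^32 microseconds); this is harmless for the short intervals timed by TM_DBGT_START() as long as the difference is taken with unsigned arithmetic.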
2.2.3 Build and run

Command to generate the HPM6750 project:
```
generate_project -b hpm6750evkmini -t flash_xip -f
```
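For how to build and run an HPM6750 project, see the development environment setup article linked at the end of this article.

Running the handwritten digit recognition demo (mnist model) prints its result to the serial port.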

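3. Benchmarks

The tables below give results for TinyMaix's four commonly used benchmark models. Each row is a compiler optimization level in SEGGER Embedded Studio, and each value is the time of one inference in milliseconds, as printed by TinyMaix's timing macros. The four models are:
- mnist: handwritten digit recognition, input 28x28x1
- cifar: 10-class classification, input 32x32x3
- vww: binary person-detection model, input 96x96x3, output is person / no person
- mbnet: 1000-class classification, input 128x128x3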
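
The input shapes above translate directly into buffer sizes, which matters when sizing RAM. The sketch below works out the arithmetic. The struct, the table, and the helper names are hypothetical, and it assumes one byte per element for an INT8 input and four bytes per element for an FP32 input.

```c
#include <stddef.h>

/* Hypothetical description of a model input tensor (height x width x channels). */
typedef struct {
    const char *name;
    size_t h, w, c;
} model_input_t;

/* Shapes taken from the model list above. */
static const model_input_t k_models[] = {
    {"mnist", 28, 28, 1},
    {"cifar", 32, 32, 3},
    {"vww", 96, 96, 3},
    {"mbnet", 128, 128, 3},
};

/* Number of elements in the input tensor. */
size_t input_elems(const model_input_t *m)
{
    return m->h * m->w * m->c;
}

/* Bytes needed for the input, given the element size (assumed 1 for INT8, 4 for FP32). */
size_t input_bytes(const model_input_t *m, size_t elem_size)
{
    return input_elems(m) * elem_size;
}
```

For example, input_bytes(&k_models[3], 4) gives the size of an FP32 input for mbnet, 128 * 128 * 3 * 4 bytes. This covers only the input tensor, not the intermediate buffers a model needs at run time.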
3.1 Scenario 1: TM_MDL_INT8 + TM_OPT0

| Optimization Level | mnist | cifar | vww96 | mbnet128 | Note |
|---|---|---|---|---|---|
| None | 1.111 | 90.570 | 281.900 | 515.106 | |
| Level 0 | 1.111 | 90.639 | 281.902 | 515.108 | * |
| Level 1 | 0.526 | 33.362 | 119.447 | 216.624 | * |
| Level 2 for speed | 0.461 | 29.313 | 105.515 | 191.370 | * |

3.2 Scenario 2: TM_MDL_INT8 + TM_OPT1

| Optimization Level | mnist | cifar | vww96 | mbnet128 | Note |
|---|---|---|---|---|---|
| None | 1.590 | 127.960 | 398.240 | 667.937 | |
| Level 0 | 1.591 | 128.167 | 398.328 | 667.919 | * |
| Level 1 | 0.524 | 37.174 | 128.468 | 195.390 | * |
| Level 2 for speed | 0.446 | 32.818 | 111.636 | 173.945 | * |

3.3 Scenario 3: TM_MDL_FP32 + TM_OPT0

| Optimization Level | mnist | cifar | vww96 | mbnet128 | Note |
|---|---|---|---|---|---|
| None | 1.408 | 251.955 | 644.835 | 1163.900 | |
| Level 0 | 1.408 | 252.067 | 644.661 | 1165.107 | * |
| Level 1 | 0.518 | 195.807 | 416.859 | 765.025 | * |
| Level 2 for speed | 0.433 | 190.541 | 384.924 | 706.922 | * |
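
One way to read these tables is as speedup ratios between compiler optimization levels. The helper below is a minimal sketch with a hypothetical name; it simply divides a baseline timing by an optimized timing.

```c
/* Speedup of an optimized build over a baseline build.
 * Both arguments are timings in the same unit; a result above 1.0 means faster. */
double speedup(double baseline, double optimized)
{
    return baseline / optimized;
}
```

Using figures from the tables, speedup(515.106, 191.370) for mbnet128 in scenario 1 comes out at about 2.7, while speedup(251.955, 190.541) for cifar in scenario 3 is only about 1.3.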

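3.4 Notes
- In SEGGER Embedded Studio, the optimization level is changed under Project 'xxx' Options -> Code -> Code Generation -> Optimization Level;
- In SEGGER Embedded Studio, the default heap size is 16384 bytes (16KB), which is not enough to run the vww96 and mbnet128 models. It can be changed under Code -> Runtime Memory Area -> Heap Size, for example to 524288 (512KB);
- For the FP32 models, change the RISC-V ISA setting from the default rv32imac to rv32gc (Code -> Code Generation -> RISC-V ISA) so that the compiler can emit floating-point instructions.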
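
The last two notes can be sanity-checked from code before running a large model. The sketch below is an illustration under stated assumptions, not part of the port. heap_can_fit and riscv_fpu_enabled are hypothetical helper names. __riscv_flen is a macro that GCC and Clang predefine when the target ISA includes a floating-point extension; whether a given toolchain configuration defines it should be confirmed in its documentation.

```c
#include <stdlib.h>

/* Probe whether the heap can currently serve one allocation of `bytes`.
 * Returns 1 on success, 0 on failure; the probe buffer is freed immediately. */
int heap_can_fit(size_t bytes)
{
    void *p = malloc(bytes);
    if (p == NULL) {
        return 0;
    }
    free(p);
    return 1;
}

/* Report whether the compiler was told the target has hardware floating point.
 * Assumption: __riscv_flen is defined only when an F/D extension is enabled,
 * for example with rv32gc but not with rv32imac. */
int riscv_fpu_enabled(void)
{
#if defined(__riscv_flen)
    return 1;
#else
    return 0;
#endif
}
```

A successful probe only shows that one block of that size is available at that moment, so it is a quick check on the Heap Size setting rather than a guarantee that a model will load.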
4. Code repositories

Port repository (contains all modifications):
https://github.com/xusiwei/HPM6750_TinyMaix

Benchmark repository (contains all benchmark code, including CMakeLists.txt):
https://github.com/xusiwei/HPM6750_TinyMaix_Benchmark

5. Reference links
Original article: https://blog.csdn.net/xusiwei1236/article/details/127718958
