yolov5 应用整理

适用于0基础人员与有一定yolov5了解的人员食用.

关于yolov5参考:

平台	库
x86/arm cpu	ncnn
x86	libtorch/pytorch
BM1684	算能标准库(需要进行模型转换)
昇腾	cann(ascend api)

https://gitee.com/Tencent/ncnn
https://github.com/alibaba/MNN

图像解码的延申

在所有嵌入式CPU平台进行图像的解码后都会有宽和高的对其,一般16|64位对其处理.
所有的硬件解码后的格式几乎都是NV12格式.
参考: ffmpeg的功能:

#include 
#define FFALIGN(x, a) (((x)+(a)-1)&~((a)-1))

FFALIGN(1080, 16)  #进行16位对其
1
2
3
4

笔者经历过的设备:

瑞芯微RK3399: mpp编解码时宽和高的对其均为stride=16, 使用时需要硬件的rga功能进行缩放,截图,转换格式处理, 即: MppFrame,
https://github.com/rockchip-linux/mpp
笔者在算能BM1684中,依据API只需要进行宽度64对齐.

前处理

batch:数量
channel:通道
height, width: 高宽
常见输入格式 BCHW, BHWC
而数据类型有: int8, float, float16,每个数据类型有不同的内存占用情况,libtorch时需要进行转换

#rgb颜色改变, FP32数据
cv::cvtColor(img_input, img_input, cv::COLOR_BGR2RGB);  // BGR -> RGB
img_input.convertTo(img_input, CV_32FC3, 1.0f / 255.0f);  // normalization 1/255

#BHWC转BCHW格式转换
tensor_img = tensor_img.permute({ 0, 3, 1, 2 }).contiguous();  // BHWC -> BCHW (Batch, Channel, Height, Width)
1
2
3
4
5
6

后处理数据

可以用 netron 去查看模型信息

如得到yolov5模型的输出为: float32[1,25200,85]
一个yolov5 onnx模型输出
如何得到目标的坐标呢? 可以理解为一个三维数组 out[1][25200][85]
25200: 有25200个目标,需要进行过滤处理.
85: [0] [1]为中心点xy坐标;
[2] [3] 为wh;
[4]为置信度,
后面的80个为80个分类的置信度,取最大值的位置为分类目标, 值*[4]=最终置信度.
然后进行nms处理, 过滤掉目标多余的坐标框.

BM1684

参考: 算能demon https://github.com/sophgo/sophon-demo
官方的Demon对图片转换有些不足,你需要进行对其处理的完善.

算能官方将ffmpeg与opencv两个库进行了改编,增加了自己的编解码,入门难度简单很多.
对于ffmpeg只需要进行AVFrame与bm_image的相互转换即可.
目前bm1684官方说明为64对其.

	AVFrame *dst = av_frame_alloc();
	dst->width = src->width;
	dst->height = src->height;
	dst->format = AV_PIX_FMT_YUV420P;
	av_frame_get_buffer(dst, 64);
1
2
3
4
5

昇腾

见昇腾社区:
https://www.hiascend.com/document/detail/zh/canncommercial/63RC2/inferapplicationdev/aclcppdevg/aclcppdevg_0000.html

目前已经研究出昇腾310 上的yolov5的转换, 昇腾在yolov5的前处理进行AIPP功能增加,使其能进行yuv数据的输入.

目前经过研究:在鲲鹏atlas800-3000中,纯使用cpu进行ffmpeg拉流解码可以达到80路视频

atc转换om模型说明

需要先将pt模型转换成onnx模型

# 将yolov5s.onnx模型转换为om模型
# 使用aipp时,输入格式为多种,这里是使用int8数据的NV12图片640大小输入
atc --model=yolov5s.onnx --framework=5 --output=yolov5s_aipp --input_format=NCHW --input_shape="input_image:1,3,640,640" --log=info --soc_version=Ascend310B1 --insert_op_conf=./aipp_YOLOv5.config 
1
2
3

aipp_YOLOv5.config参考昇腾CANN => 色域转换配置说明

aipp_op {
    aipp_mode:static
    input_format : YUV420SP_U8
    src_image_size_w : 640
    src_image_size_h : 640
    crop: false
    load_start_pos_h : 0
    load_start_pos_w : 0
    crop_size_w : 640
    crop_size_h: 640
    csc_switch : true
    rbuv_swap_switch : false
    # 色域转换
    matrix_r0c0: 256
    matrix_r0c1: 0
    matrix_r0c2: 359
    matrix_r1c0: 256
    matrix_r1c1: -88
    matrix_r1c2: -183
    matrix_r2c0: 256
    matrix_r2c1: 454
    matrix_r2c2: 0
    input_bias_0: 0
    input_bias_1: 128
    input_bias_2: 128
    # 均值归一化
    min_chn_0 : 0
    min_chn_1 : 0
    min_chn_2 : 0
    var_reci_chn_0: 0.003921568627451
    var_reci_chn_1: 0.003921568627451
    var_reci_chn_2: 0.003921568627451
    }
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33

通过其API来获取模型的信息:

#使用RGB输入格式时(可以用opencv读取图片/视频缩放后直接输入)
input[0] size:4915200 fmt:0(ACL_FORMAT_NCHW) dtype:0(ACL_FLOAT) dims(4):input_image= 1,3,640,640
#其中输入数据长度计算方式: 数据大小见 `sizeof(FLOAT)=4`
#1*640*640*3*4=4,915,200

#使用AIPP的NV12格式时(需要dvpp进行解码成NV12并进行缩放)
input num:1
input[0] size:614400 fmt:1(ACL_FORMAT_NHWC) dtype:4(ACL_UINT8) dims(4):input_image
1,640,640,3,
out num:1
output[0] fmt:2(ACL_FORMAT_ND) dtype:0(ACL_FLOAT) dims(3):Concat_309:0:output0
1,25200,85,
# 其中: 1*640*640*1.5=614400 
1
2
3
4
5
6
7
8
9
10
11
12
13

一个调用模型并检测视频的例子

import cv2
import torch
import numpy as np
import time
import datetime

from models.experimental import attempt_load
from utils.general import non_max_suppression, scale_coords, xyxy2xywh, check_img_size, xywh2xyxy
from utils.torch_utils import select_device
from utils.augmentations import letterbox

video_path = "Y:\\220411150420220411_150441.mp4"

if __name__ == '__main__':

    # 指定你的Yolov5权重文件的路径
    model_weights = "../yolov5s.pt"
    model_weights = "E:\\src\\yolov5\\lxmodels\\202204\\best.pt"

    device = select_device('cuda:0')  # 使用GPU，如果没有GPU，可以使用'cpu'

    # 加载Yolov5模型
    model = attempt_load(model_weights, map_location=device)

    # 设置模型为评估模式
    # model.eval()

    stride = int(model.stride.max())  # model stride
    names = model.module.names if hasattr(model, 'module') else model.names  # get class names
    imgsz = check_img_size([640, 640], s=stride)  # check image size
    model(torch.zeros(1, 3, *imgsz).to(device).type_as(next(model.parameters())))  # run once

    print(names)

    # 打开视频文件
    cap = cv2.VideoCapture(video_path)
    print(f"打开视频{video_path}")
    frame_index = 0

    cv2.namedWindow("Yolov5 Object Detection", cv2.WINDOW_NORMAL)

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        frame_index = frame_index + 1

        #frame = cv2.imread("test.jpg")

        bkimage = cv2.copyTo(frame, None)

        # 使用Yolov5进行目标检测
        img = letterbox(frame)[0]

        # Convert
        img = img.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
        img = np.ascontiguousarray(img)

        img = torch.from_numpy(img).to(device)
        img = img.float()  # uint8 to fp16/32
        img = img / 255.0  # 0 - 255 to 0.0 - 1.0
        if len(img.shape) == 3:
            img = img[None]  # expand for batch dim

        pred = model(img)[0]
        pred = non_max_suppression(pred, conf_thres=0.3, iou_thres=0.5)

        for i, det in enumerate(pred):  # per image
            gn = torch.tensor(frame.shape)[[1, 0, 1, 0]]  # normalization gain whwh
            # s = '%gx%g ' % img.shape[2:]  # print string

            if len(det):
                det[:, :4] = scale_coords(img.shape[2:], det[:, :4], frame.shape).round()

                # Print results
                for c in det[:, -1].unique():
                    n = (det[:, -1] == c).sum()  # detections per class
                    # s += f"{n} {names[int(c)]}{'s' * (n > 1)}, "  # add to string

                # Write results
                for *xyxy, conf, cls in reversed(det):
                    if conf < 0.88:
                        continue
                    # [x1, y1, x2, y2]
                    # xywh = xyxy2xywh(torch.tensor(xyxy).view(1, 4)).view(-1).tolist()

                    # label format
                    # line = (cls, *xywh, conf)
                    label = f'{names[int(cls)]} {conf:.2f}'
                    # 截图
                    # try:
                    # >>> 814-306 854-331 conf:0.8818694949150085
                    # 834.0 318.5 40.0 25.0
                    # print(f">>> {int(xyxy[0])}-{int(xyxy[1])} {int(xyxy[2])}-{int(xyxy[3])} conf:{conf}")
                    # print(f"{xywh[0]} {xywh[1]} {xywh[2]} {xywh[3]}")

                    # crop = frame[int(xyxy[1]):int(xyxy[3]), int(xyxy[0]):int(xyxy[2])]
                    # date_time = datetime.datetime.fromtimestamp(int(time.time() * 1000) / 1000.0).strftime(
                    #     "%Y%m%d%H%M%S-%f")
                    # cv2.imwrite(f"crop/{date_time}-{conf}.jpg", crop)
                    # # except Exception as e:
                    # #    print(f"异常: {int(xyxy[0])}-{int(xyxy[1])} {int(xyxy[2])}-{int(xyxy[3])} conf:{conf}")
                    #
                    # y2 = int(xyxy[3])
                    # if y2 > (frame.shape[0] - 100):
                    #     continue

                    # 绘图
                    frame = cv2.rectangle(frame, (int(xyxy[0]), int(xyxy[1])), (int(xyxy[2]), int(xyxy[3])),
                                          (0, 255, 0), 2)
                    frame = cv2.putText(frame, label, (int(xyxy[0]), int(xyxy[1]) - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
                                        (0, 255, 0), 2)

            cv2.imshow('Yolov5 Object Detection', frame)
            key = cv2.waitKey(10) & 0xFF
            #key = 0
            print(f"frame_index={frame_index}")

            if key == ord('w'):
                # 获取当前时间戳（到毫秒级）
                timestamp = int(time.time() * 1000)

                # 将时间戳转换为易于阅读的日期和时间格式，包括毫秒
                date_time = datetime.datetime.fromtimestamp(timestamp / 1000.0).strftime("%Y-%m-%d_%H-%M-%S-%f")
                output_filename = f"snap/{date_time}.jpg"
                cv2.imwrite(output_filename, bkimage)
                # 保存图片
                print(f"保存图片{output_filename}")

            if key == ord('q'):
                break

    # cap.release()
    # cv2.destroyAllWindows()

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135

相关阅读:
华为云认证的售前工程师是什么？
uni-app入门:页面布局之window和tabbar
SEO业务适合什么代理IP？2023海外代理IP推荐排名
 torch.cuda.is_available() 解决方案
 【UCIe】UCIe Data to Clock
通过Go语言创建CA与签发证书
 面试系列分布式事务：谈谈2PC的理解
 LINUX SHELL 脚本配置阿里yum仓库
 Vatee万腾平台：数字经济时代的智能金融解决方案
 解决 Spring Boot 与 springfox 的 NullPointerException 问题
原文地址：https://blog.csdn.net/cjqqschoolqq/article/details/132641970