YOLOv10 Notes


I. Introduction

YOLOv10 was designed by a team from Tsinghua University. From the paper:

In this work, we further optimize the performance-latency trade-off of the YOLO series from two directions: post-processing and model architecture. We first introduce consistent dual assignments for NMS-free end-to-end training of YOLOs, which greatly reduces inference latency while preserving accuracy. We then apply efficiency- and accuracy-driven design strategies to each component of YOLO, which substantially cuts computational redundancy and enhances model capability.

The result is a new real-time end-to-end object detector, YOLOv10. Extensive experiments show that YOLOv10 achieves a state-of-the-art performance-efficiency trade-off across model scales. For example, YOLOv10-S is 1.8x faster than RT-DETR-R18 at similar AP on COCO, with 2.8x fewer parameters and FLOPs. Compared with YOLOv9-C, YOLOv10-B has 46% lower latency and 25% fewer parameters at the same performance.

II. Innovations

1. Consistent Dual Assignments

Unlike one-to-many assignment, one-to-one matching assigns exactly one prediction to each object, which removes the need for NMS post-processing. However, it provides weaker supervision, leading to suboptimal accuracy and slower convergence [19]. A one-to-many strategy, by contrast, introduces more positive samples and richer supervision, effectively compensating for that weakness [23]. We therefore introduce dual label assignments for YOLOs, as illustrated below, to combine the strengths of both strategies. (The bracketed numbers follow the citation numbering of the YOLOv10 paper.)
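During training, both heads rank candidate anchors with the same matching metric m = s * p^alpha * IoU^beta (s: spatial prior, p: classification score): the one-to-many head takes the top-k anchors per object, the one-to-one head only the top-1. Below is a minimal sketch of this consistent matching; matching_metric is a hypothetical helper with illustrative shapes and the TAL-style defaults alpha=0.5, beta=6.0, not the repo's API.

    import torch

    def matching_metric(cls_prob, iou, alpha=0.5, beta=6.0, inside_gt=None):
        """m = s * p**alpha * IoU**beta for every (anchor, object) pair.

        cls_prob:  predicted probability of each object's class, shape [A, G]
        iou:       IoU between predicted and gt boxes,           shape [A, G]
        inside_gt: 0/1 spatial prior (anchor center inside gt),  shape [A, G]
        """
        m = cls_prob.pow(alpha) * iou.pow(beta)
        return m * inside_gt if inside_gt is not None else m

    A, G = 8400, 4                       # anchors, ground-truth objects
    m = matching_metric(torch.rand(A, G), torch.rand(A, G))

    o2m_pos = m.topk(10, dim=0).indices  # one-to-many: top-10 anchors per object
    o2o_pos = m.argmax(dim=0)            # one-to-one: single best anchor per object
    # Because both branches rank anchors with the same metric, the one-to-one
    # head's choice agrees with the one-to-many head's strongest positive, which
    # is the "consistency" that makes NMS-free inference work.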

2. Efficiency-Accuracy Driven Model Design

Efficiency-driven model design

(1) Lightweight classification head

(2) Spatial-channel decoupled downsampling (see the SCDown sketch after this list)

(3) Rank-guided block design
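As a concrete example of the efficiency-driven changes, here is a sketch of the spatial-channel decoupled downsampling module (the SCDown entries in the YAML below), following the THU-MIG implementation. Conv is the Ultralytics conv-BN-SiLU block; the import path may differ between versions.

    import torch.nn as nn
    from ultralytics.nn.modules import Conv  # conv + BN + SiLU (path may differ)

    class SCDown(nn.Module):
        """Spatial-channel decoupled downsampling (sketch).

        Instead of one dense 3x3 stride-2 conv doing channel mixing and spatial
        reduction at once, a 1x1 pointwise conv changes the channel count and a
        stride-2 depthwise conv halves the resolution, cutting the cost.
        """

        def __init__(self, c1, c2, k=3, s=2):
            super().__init__()
            self.cv1 = Conv(c1, c2, 1, 1)                   # pointwise: c1 -> c2
            self.cv2 = Conv(c2, c2, k, s, g=c2, act=False)  # depthwise: H,W -> H/s,W/s

        def forward(self, x):
            return self.cv2(self.cv1(x))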

Accuracy-driven model design

(1) Large-kernel convolution

(2) Partial self-attention (PSA), sketched below
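PSA keeps attention cheap by applying it to only half of the channels after a 1x1 split, and only on the lowest-resolution P5 stage. A rough, self-contained sketch follows; nn.MultiheadAttention stands in for the repo's own Attention block, so treat it as illustrative rather than the exact implementation.

    import torch
    import torch.nn as nn
    from ultralytics.nn.modules import Conv  # conv + BN + SiLU (path may differ)

    class PSASketch(nn.Module):
        """Partial self-attention: attend over half the channels, bypass the rest."""

        def __init__(self, c, e=0.5, num_heads=4):
            super().__init__()
            self.c = int(c * e)
            self.cv1 = Conv(c, 2 * self.c, 1)   # split into bypass + attended halves
            self.cv2 = Conv(2 * self.c, c, 1)   # fuse back to c channels
            # Stand-in for the repo's Attention module.
            self.attn = nn.MultiheadAttention(self.c, num_heads, batch_first=True)
            self.ffn = nn.Sequential(Conv(self.c, 2 * self.c, 1),
                                     Conv(2 * self.c, self.c, 1, act=False))

        def forward(self, x):
            a, b = self.cv1(x).split((self.c, self.c), dim=1)
            n, c, h, w = b.shape
            t = b.flatten(2).transpose(1, 2)             # [N, HW, C] token sequence
            t, _ = self.attn(t, t, t)                    # global self-attention
            b = b + t.transpose(1, 2).view(n, c, h, w)   # residual connection
            b = b + self.ffn(b)                          # FFN with residual
            return self.cv2(torch.cat((a, b), 1))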

III. Code Walkthrough

1. Network structure

Take the yolov10s model as an example.

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  s: [0.33, 0.50, 1024]

backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 3, C2f, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 6, C2f, [256, True]]
  - [-1, 1, SCDown, [512, 3, 2]] # 5-P4/16
  - [-1, 6, C2f, [512, True]]
  - [-1, 1, SCDown, [1024, 3, 2]] # 7-P5/32
  - [-1, 3, C2fCIB, [1024, True, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 1, PSA, [1024]] # 10

# YOLOv8.0n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, C2f, [512]] # 13
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 3, C2f, [256]] # 16 (P3/8-small)
  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 3, C2f, [512]] # 19 (P4/16-medium)
  - [-1, 1, SCDown, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 3, C2fCIB, [1024, True, True]] # 22 (P5/32-large)
  - [[16, 19, 22], 1, v10Detect, [nc]] # Detect(P3, P4, P5)
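To see how this YAML becomes a model, a quick sanity check, assuming the THU-MIG fork of ultralytics, which exposes a YOLOv10 wrapper class (API names may differ in other versions):

    from ultralytics import YOLOv10  # THU-MIG fork of ultralytics

    model = YOLOv10("yolov10s.yaml")  # builds the backbone/head above at scale 's'
    model.info()                      # per-layer summary: params and FLOPs
    results = model("bus.jpg")        # end-to-end prediction, no NMS step needed

Note that with width multiple 0.50, the listed channel counts are halved at build time, e.g. the P5 feature ends up with 512 channels rather than 1024, matching the 1*512*20*20 shape in the trace later in this section.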

The complete model diagram is too long to include, so only the head-processing portion was shown.

2. Code analysis

class v10Detect(Detect):
    max_det = -1  # must be set (e.g. 300) before export

    def __init__(self, nc=80, ch=()):
        super().__init__(nc, ch)
        c3 = max(ch[0], min(self.nc, 100))  # classification-head channels
        # Lightweight classification head: two depthwise-separable 3x3 convs
        # followed by a 1x1 projection to nc class logits.
        self.cv3 = nn.ModuleList(
            nn.Sequential(
                nn.Sequential(Conv(x, x, 3, g=x), Conv(x, c3, 1)),
                nn.Sequential(Conv(c3, c3, 3, g=c3), Conv(c3, c3, 1)),
                nn.Conv2d(c3, self.nc, 1),
            )
            for i, x in enumerate(ch)
        )
        # Duplicate the regression (cv2) and classification (cv3) branches to
        # build the one-to-one head used at inference time.
        self.one2one_cv2 = copy.deepcopy(self.cv2)
        self.one2one_cv3 = copy.deepcopy(self.cv3)

    def forward(self, x):
        # The one-to-one head sees detached features, so only the one-to-many
        # branch backpropagates into the backbone/neck.
        one2one = self.forward_feat([xi.detach() for xi in x], self.one2one_cv2, self.one2one_cv3)
        if not self.export:
            one2many = super().forward(x)

        if not self.training:
            one2one = self.inference(one2one)
            if not self.export:
                return {"one2many": one2many, "one2one": one2one}
            else:
                assert self.max_det != -1
                boxes, scores, labels = ops.v10postprocess(one2one.permute(0, 2, 1), self.max_det, self.nc)
                return torch.cat([boxes, scores.unsqueeze(-1), labels.unsqueeze(-1)], dim=-1)
        else:
            return {"one2many": one2many, "one2one": one2one}
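A hypothetical smoke test of the head's training-mode output (the import path is assumed from the THU-MIG repo; 144 = 64 DFL box channels + 80 class channels, matching the shape trace below):

    import torch
    from ultralytics.nn.modules.head import v10Detect  # path assumed (THU-MIG fork)

    # P3/P4/P5 features for a 640x640 input at the 's' scale: 128/256/512 channels
    feats = [torch.rand(1, c, r, r) for c, r in [(128, 80), (256, 40), (512, 20)]]
    head = v10Detect(nc=80, ch=(128, 256, 512)).train()

    out = head(feats)
    print(out["one2many"][0].shape)  # torch.Size([1, 144, 80, 80])
    print(out["one2one"][0].shape)   # torch.Size([1, 144, 80, 80])

In the export branch, the decoded predictions are reduced to the final detections by ops.v10postprocess: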
def v10postprocess(preds, max_det, nc=80):
    # preds: shape [1, 8400, 84]
    assert 4 + nc == preds.shape[-1]
    boxes, scores = preds.split([4, nc], dim=-1)  # [1,8400,4], [1,8400,80]
    max_scores = scores.amax(dim=-1)  # [1,8400]
    max_scores, index = torch.topk(max_scores, max_det, axis=-1)  # [1,300]
    index = index.unsqueeze(-1)  # [1,300,1]
    boxes = torch.gather(boxes, dim=1, index=index.repeat(1, 1, boxes.shape[-1]))  # [1,300,4]
    scores = torch.gather(scores, dim=1, index=index.repeat(1, 1, scores.shape[-1]))  # [1,300,80]
    scores, index = torch.topk(scores.flatten(1), max_det, axis=-1)  # [1,300]
    labels = index % nc  # [1,300]
    index = index // nc  # [1,300]
    boxes = boxes.gather(dim=1, index=index.unsqueeze(-1).repeat(1, 1, boxes.shape[-1]))  # [1,300,4]
    return boxes, scores, labels
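The function applies top-k twice: first over each anchor's best class score to keep 300 candidate anchors, then over the flattened 300x80 score matrix, so a single box can be returned more than once with different class labels. A quick check on dummy data:

    import torch

    preds = torch.rand(1, 8400, 84)  # [batch, anchors, 4 box coords + 80 class scores]
    boxes, scores, labels = v10postprocess(preds, max_det=300, nc=80)
    print(boxes.shape, scores.shape, labels.shape)  # [1,300,4] [1,300] [1,300]

The tensor fed into v10postprocess is produced by the head's inference method: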
def inference(self, x):
    # Inference path
    shape = x[0].shape  # BCHW
    x_cat = torch.cat([xi.view(shape[0], self.no, -1) for xi in x], 2)
    if self.dynamic or self.shape != shape:
        self.anchors, self.strides = (x.transpose(0, 1) for x in make_anchors(x, self.stride, 0.5))
        self.shape = shape

    if self.export and self.format in ("saved_model", "pb", "tflite", "edgetpu", "tfjs"):  # avoid TF FlexSplitV ops
        box = x_cat[:, : self.reg_max * 4]
        cls = x_cat[:, self.reg_max * 4 :]
    else:
        box, cls = x_cat.split((self.reg_max * 4, self.nc), 1)

    if self.export and self.format in ("tflite", "edgetpu"):
        # Precompute normalization factor to increase numerical stability
        # See https://github.com/ultralytics/ultralytics/issues/7371
        grid_h = shape[2]
        grid_w = shape[3]
        grid_size = torch.tensor([grid_w, grid_h, grid_w, grid_h], device=box.device).reshape(1, 4, 1)
        norm = self.strides / (self.stride[0] * grid_size)
        dbox = self.decode_bboxes(self.dfl(box) * norm, self.anchors.unsqueeze(0) * norm[:, :2])
    else:
        dbox = self.decode_bboxes(self.dfl(box), self.anchors.unsqueeze(0)) * self.strides

    y = torch.cat((dbox, cls.sigmoid()), 1)
    return y if self.export else (y, x)
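The self.dfl(box) step converts the 64 regression channels (4 box sides x reg_max=16 distance bins) into 4 distances per anchor by taking a softmax expectation over the bins. A minimal equivalent of that decoding (Ultralytics implements it as a frozen conv, but the math is the same):

    import torch

    def dfl_decode(box, reg_max=16):
        """box: [B, 4*reg_max, A] raw logits -> [B, 4, A] expected distances."""
        b, _, a = box.shape
        prob = box.view(b, 4, reg_max, a).softmax(2)        # distribution over bins per side
        bins = torch.arange(reg_max, dtype=box.dtype)       # bin centers 0..reg_max-1
        return (prob * bins.view(1, 1, reg_max, 1)).sum(2)  # expectation = predicted distance

decode_bboxes then turns the four per-side distances into xyxy boxes around each anchor point, and multiplying by self.strides maps them back to input-image coordinates.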

The shapes returned at each stage (batch size 1, 640x640 input):

*** one2one = self.forward_feat([xi.detach() for xi in x], self.one2one_cv2, self.one2one_cv3) ===>
    1*128*80*80 ==> 1*64*80*80 + 1*80*80*80 = 1*144*80*80
    1*256*40*40 ==> 1*64*40*40 + 1*80*40*40 = 1*144*40*40
    1*512*20*20 ==> 1*64*20*20 + 1*80*20*20 = 1*144*20*20
*** one2one = self.inference(one2one) ===>
    1*144*6400 + 1*144*1600 + 1*144*400 = 1*144*8400
    box: 1*64*8400 ==> dbox: 1*4*8400
    cls: 1*80*8400
    output: 1*4*8400 + 1*80*8400 = 1*84*8400
*** boxes, scores, labels = ops.v10postprocess(one2one.permute(0, 2, 1), 300, 80) ===>
    1*84*8400 ==> boxes: 1*300*4  scores: 1*300  labels: 1*300
*** torch.cat([boxes, scores.unsqueeze(-1), labels.unsqueeze(-1)], dim=-1) ===>
    boxes: 1*300*4 + scores: 1*300*1 + labels: 1*300*1 ==> 1*300*6

IV. References

1. Blog: "YOLOv10: Real-Time End-to-End Object Detection" (qq.com)

2. Paper: YOLOv10: Real-Time End-to-End Object Detection, arXiv:2405.14458 (arxiv.org)

3. Code: THU-MIG/yolov10: YOLOv10: Real-Time End-to-End Object Detection (github.com)
