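💡💡💡 What this post contributes: a step-by-step tutorial on a common problem, namely that adding an attention mechanism to YOLOv8 fails to improve accuracy on your own dataset. Five ways of integrating the module are provided to address it.
ContextAggregation | Tested on a blood cell detection project, where it improved accuracy. Of the five integration variants, the best raises mAP@0.5 from the baseline 0.895 to 0.916.
The data comes from a medical dataset built for blood cell detection. The task is to detect every red blood cell (RBC), white blood cell (WBC), and platelet (Platelets) in each microscopic image, three classes in total.
Significance: this dataset was chosen because the densities of RBCs, WBCs, and platelets in the blood carry a great deal of information about the immune system and hemoglobin. That information gives a first indication of whether a person is healthy, and if any abnormality shows up in the blood, follow-up diagnosis can begin quickly. Inspecting samples manually under a microscope is tedious, however, and this is where deep learning models can help: YOLOv8 can classify and detect blood cells in microscopic images with high accuracy.
Dataset size: 364 images
Detection challenges: 1) class imbalance; 2) occlusion both within a class and between classes; 3) large variation in object aspect ratio; and so on.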
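To make the class-imbalance point concrete, here is a small sketch that computes per-class instance shares. The counts are the validation-set instance counts reported in the result tables later in this post (87 images, 1138 instances); the helper name `class_shares` is made up for illustration.

```python
# Validation-set instance counts taken from the result tables in this post.
val_instances = {"WBC": 87, "RBC": 968, "Platelets": 83}


def class_shares(counts):
    """Return each class's fraction of all labelled instances."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


shares = class_shares(val_instances)
for name, share in shares.items():
    print(f"{name:<10s} {share:6.1%}")

# Ratio between the most and the least frequent class.
imbalance_ratio = max(val_instances.values()) / min(val_instances.values())
print(f"max/min instance ratio: {imbalance_ratio:.1f}")
```

RBC accounts for the large majority of instances, which is why its per-class scores dominate the overall numbers.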
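Paper: https://arxiv.org/abs/2106.01401
Abstract
Convolutional neural networks (CNNs) are ubiquitous in computer vision, with a myriad of effective and efficient variations. Recently, Transformers, originally introduced in natural language processing, have been increasingly adopted in computer vision. While early adopters continue to employ CNN backbones, the latest networks are end-to-end CNN-free Transformer solutions. A recent surprising finding shows that a simple MLP-based solution, without any traditional convolutional or Transformer components, can produce effective visual representations. While CNNs, Transformers, and MLP-Mixers may be considered completely disparate architectures, we provide a unified view showing that they are in fact special cases of a more general method to aggregate spatial context in a neural network stack. We present Container (CONText AggregatIon NEtwoRK), a general-purpose building block for multi-head context aggregation that can exploit long-range interactions as Transformers do, while still exploiting the inductive bias of the local convolution operation, leading to the faster convergence speeds often seen in CNNs. Our Container architecture achieves 82.7% Top-1 accuracy on ImageNet using 22M parameters, a +2.8 improvement over DeiT-Small, and can converge to 79.9% Top-1 accuracy in just 200 epochs. In contrast to Transformer-based methods, which do not scale well to downstream tasks that rely on larger input image resolutions, our efficient network, named CONTAINER-LIGHT, can be employed in object detection and instance segmentation networks such as DETR, RetinaNet, and Mask-RCNN to obtain an impressive detection mAP of 38.9, 43.8, and 45.1 and a mask mAP of 41.3, providing large improvements of 6.6, 7.3, 6.9, and 6.6 points respectively over a ResNet-50 backbone with comparable compute and parameter size. Our method also achieves promising results on self-supervised learning compared to DeiT on the DINO framework.
In short: with only 22M parameters, CONTAINER reaches 82.7% top-1 accuracy on ImageNet, 2.8 points above DeiT-Small, and it reaches 79.9% top-1 accuracy in just 200 epochs. Unlike Transformer-based approaches, which are hard to extend to downstream tasks because those tasks need higher input resolution, CONTAINER-LIGHT can be plugged into architectures such as DETR, RetinaNet, and Mask-RCNN for object detection and instance segmentation, where it improves detection mAP by 6.6, 7.3, and 6.9 points respectively.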
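The paper's contributions can be summarized as follows. It provides a unified view showing that CNNs, Transformers, and MLP-Mixers are all special cases of a more general scheme for aggregating spatial context in a neural network. On that basis it proposes CONTAINER (CONText AggregatIon NEtwoRK), a general-purpose building block for multi-head context aggregation.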
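The unified view can be written compactly. The notation below is a sketch based on my reading of the linked paper and should be checked against it; the symbols ($X$, $V$, $A$, $W_1$, $W_2$, $\alpha$, $\beta$) are introduced here for illustration. The idea is that every architecture mixes spatial positions through some affinity matrix, and they differ only in how that matrix is obtained.

```latex
% X: input features (N positions x C channels), V = X W_v: projected values,
% A: N x N affinity matrix saying how much each position draws from every other.
Y = (A V)\, W_1 + X

% Self-attention: dynamic affinity computed from the input itself
A(X) = \mathrm{Softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right)

% Depthwise convolution / MLP-Mixer: static affinity A, learned but input-independent
% (local and sparse for convolution, dense for the mixer).

% Container-style mixing of both, with learnable scalars \alpha and \beta
Y = \big( (\alpha\, A(X) + \beta\, A)\, V \big)\, W_2 + X
```

Setting $\beta = 0$ leaves only the dynamic, attention-like term, and setting $\alpha = 0$ leaves only the static, convolution-like term.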
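For the code, see the CSDN blog post "Yolov8 accuracy booster: ContextAggregation, a context-enhancement and feature-refinement network for tiny object detection, boosting small-object detection".
Results analysis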
- # Ultralytics YOLO 🚀, GPL-3.0 license
- # YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
-
- # Parameters
- nc: 3 # number of classes (WBC, RBC, Platelets)
- scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
- # [depth, width, max_channels]
- n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs
- s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients, 28.8 GFLOPs
- m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients, 79.3 GFLOPs
- l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
- x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs
-
- # YOLOv8.0n backbone
- backbone:
- # [from, repeats, module, args]
- - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
- - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
- - [-1, 3, C2f, [128, True]]
- - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
- - [-1, 6, C2f, [256, True]]
- - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
- - [-1, 6, C2f, [512, True]]
- - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
- - [-1, 3, C2f, [1024, True]]
- - [-1, 1, SPPF, [1024, 5]] # 9
-
- # YOLOv8.0n head
- head:
- - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- - [[-1, 6], 1, Concat, [1]] # cat backbone P4
- - [-1, 3, C2f, [512]] # 12
- - [-1, 1, ContextAggregation, [512]] # 13
-
- - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- - [[-1, 4], 1, Concat, [1]] # cat backbone P3
- - [-1, 3, C2f, [256]] # 16 (P3/8-small)
- - [-1, 1, ContextAggregation, [256]] # 17 (P3/8-small)
-
- - [-1, 1, Conv, [256, 3, 2]]
- - [[-1, 13], 1, Concat, [1]] # cat head P4
- - [-1, 3, C2f, [512]] # 20 (P4/16-medium)
- - [-1, 1, ContextAggregation, [512]] # 21 (P4/16-medium)
-
- - [-1, 1, Conv, [512, 3, 2]]
- - [[-1, 9], 1, Concat, [1]] # cat head P5
- - [-1, 3, C2f, [1024]] # 24 (P5/32-large)
- - [-1, 1, ContextAggregation, [1024]] # 25 (P5/32-large)
-
- - [[17, 21, 25], 1, Detect, [nc]] # Detect(P3, P4, P5)
With this configuration, mAP@0.5 improves from the baseline 0.895 to 0.897.
- YOLOv8_ContextAggregation1 summary (fused): 204 layers, 3009125 parameters, 0 gradients, 8.1 GFLOPs
- Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 6/6 [00:04<00:00, 1.36it/s]
- all 87 1138 0.816 0.893 0.897 0.602
- WBC 87 87 0.971 0.989 0.985 0.771
- RBC 87 968 0.699 0.836 0.841 0.584
- Platelets 87 83 0.777 0.855 0.865 0.452
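The `scales` entry in the configuration decides how the `repeats` and channel arguments of each layer are resolved for a given model size. The sketch below mirrors, as far as I understand it, the scaling that Ultralytics applies to standard modules such as `Conv` and `C2f` when it parses a model YAML. The function names are illustrative and not the library's own, and whether a custom module like `ContextAggregation` receives the same channel scaling depends on how it is registered in the parser, which this post does not show.

```python
import math

# [depth, width, max_channels] for scale 'n', copied from the YAML above.
DEPTH, WIDTH, MAX_CHANNELS = 0.33, 0.25, 1024


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scaled_channels(c_out):
    """Output channels after width scaling (illustrative helper)."""
    return make_divisible(min(c_out, MAX_CHANNELS) * WIDTH, 8)


def scaled_repeats(n):
    """Number of block repeats after depth scaling (illustrative helper)."""
    return max(round(n * DEPTH), 1) if n > 1 else n


for c in (64, 128, 256, 512, 1024):
    print(f"channels {c:>4d} -> {scaled_channels(c)}")
for n in (1, 3, 6):
    print(f"repeats  {n} -> {scaled_repeats(n)}")
```

Under these assumptions, a layer written as `[-1, 3, C2f, [512]]` becomes a single C2f block with 128 output channels at scale `n`.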
- # Ultralytics YOLO 🚀, GPL-3.0 license
- # YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
-
- # Parameters
- nc: 3 # number of classes (WBC, RBC, Platelets)
- scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
- # [depth, width, max_channels]
- n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs
- s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients, 28.8 GFLOPs
- m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients, 79.3 GFLOPs
- l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
- x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs
-
- # YOLOv8.0n backbone
- backbone:
- # [from, repeats, module, args]
- - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
- - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
- - [-1, 3, C2f, [128, True]]
- - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
- - [-1, 6, C2f, [256, True]]
- - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
- - [-1, 6, C2f, [512, True]]
- - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
- - [-1, 3, C2f, [1024, True]]
- - [-1, 1, SPPF, [1024, 5]] # 9
-
- # YOLOv8.0n head
- head:
- - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- - [[-1, 6], 1, Concat, [1]] # cat backbone P4
- - [-1, 3, C2f, [512]] # 12
-
- - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- - [[-1, 4], 1, Concat, [1]] # cat backbone P3
- - [-1, 3, C2f, [256]] # 15 (P3/8-small)
- - [-1, 1, ContextAggregation, [256]] # 16 (P3/8-small)
-
- - [-1, 1, Conv, [256, 3, 2]]
- - [[-1, 12], 1, Concat, [1]] # cat head P4
- - [-1, 3, C2f, [512]] # 19 (P4/16-medium)
- - [-1, 1, ContextAggregation, [512]] # 20 (P4/16-medium)
-
- - [-1, 1, Conv, [512, 3, 2]]
- - [[-1, 9], 1, Concat, [1]] # cat head P5
- - [-1, 3, C2f, [1024]] # 23 (P5/32-large)
- - [-1, 1, ContextAggregation, [1024]] # 24 (P5/32-large)
-
- - [[16, 20, 24], 1, Detect, [nc]] # Detect(P3, P4, P5)
With this configuration, mAP@0.5 improves from the baseline 0.895 to 0.907.
- YOLOv8_ContextAggregation2 summary (fused): 195 layers, 3008482 parameters, 0 gradients, 8.1 GFLOPs
- Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 2/2 [00:03<00:00, 1.59s/it]
- all 87 1138 0.824 0.892 0.907 0.613
- WBC 87 87 0.984 1 0.988 0.785
- RBC 87 968 0.727 0.836 0.851 0.596
- Platelets 87 83 0.76 0.84 0.881 0.457
- # Ultralytics YOLO 🚀, GPL-3.0 license
- # YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
-
- # Parameters
- nc: 3 # number of classes (WBC, RBC, Platelets)
- scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
- # [depth, width, max_channels]
- n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs
- s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients, 28.8 GFLOPs
- m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients, 79.3 GFLOPs
- l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
- x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs
-
- # YOLOv8.0n backbone
- backbone:
- # [from, repeats, module, args]
- - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
- - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
- - [-1, 3, C2f, [128, True]]
- - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
- - [-1, 6, C2f, [256, True]]
- - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
- - [-1, 6, C2f, [512, True]]
- - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
- - [-1, 3, C2f, [1024, True]]
- - [-1, 1, SPPF, [1024, 5]] # 9
- - [-1, 1, ContextAggregation, [1024]] # 10
-
- # YOLOv8.0n head
- head:
- - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- - [[-1, 6], 1, Concat, [1]] # cat backbone P4
- - [-1, 3, C2f, [512]] # 13
-
- - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- - [[-1, 4], 1, Concat, [1]] # cat backbone P3
- - [-1, 3, C2f, [256]] # 16 (P3/8-small)
-
- - [-1, 1, Conv, [256, 3, 2]]
- - [[-1, 13], 1, Concat, [1]] # cat head P4
- - [-1, 3, C2f, [512]] # 19 (P4/16-medium)
-
- - [-1, 1, Conv, [512, 3, 2]]
- - [[-1, 10], 1, Concat, [1]] # cat head P5
- - [-1, 3, C2f, [1024]] # 22 (P5/32-large)
-
- - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)
With this configuration, mAP@0.5 improves from the baseline 0.895 to 0.904.
- YOLOv8_ContextAggregation3 summary (fused): 177 layers, 3007516 parameters, 0 gradients, 8.1 GFLOPs
- Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 2/2 [00:03<00:00, 1.91s/it]
- all 87 1138 0.835 0.874 0.904 0.61
- WBC 87 87 0.979 0.989 0.993 0.779
- RBC 87 968 0.722 0.841 0.86 0.597
- Platelets 87 83 0.804 0.792 0.86 0.453
- # Ultralytics YOLO 🚀, GPL-3.0 license
- # YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
-
- # Parameters
- nc: 3 # number of classes (WBC, RBC, Platelets)
- scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
- # [depth, width, max_channels]
- n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs
- s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients, 28.8 GFLOPs
- m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients, 79.3 GFLOPs
- l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
- x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs
-
- # YOLOv8.0n backbone
- backbone:
- # [from, repeats, module, args]
- - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
- - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
- - [-1, 3, C2f, [128, True]]
- - [-1, 1, ContextAggregation, [128]] # 3
- - [-1, 1, Conv, [256, 3, 2]] # 4-P3/8
- - [-1, 6, C2f, [256, True]]
- - [-1, 1, ContextAggregation, [256]] # 6
- - [-1, 1, Conv, [512, 3, 2]] # 7-P4/16
- - [-1, 6, C2f, [512, True]]
- - [-1, 1, ContextAggregation, [512]] # 9
- - [-1, 1, Conv, [1024, 3, 2]] # 10-P5/32
- - [-1, 3, C2f, [1024, True]]
- - [-1, 1, ContextAggregation, [1024]] # 12
- - [-1, 1, SPPF, [1024, 5]] # 13
-
- # YOLOv8.0n head
- head:
- - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- - [[-1, 9], 1, Concat, [1]] # cat backbone P4
- - [-1, 3, C2f, [512]] # 16
-
- - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- - [[-1, 5], 1, Concat, [1]] # cat backbone P3
- - [-1, 3, C2f, [256]] # 19 (P3/8-small)
-
- - [-1, 1, Conv, [256, 3, 2]]
- - [[-1, 16], 1, Concat, [1]] # cat head P4
- - [-1, 3, C2f, [512]] # 22 (P4/16-medium)
-
- - [-1, 1, Conv, [512, 3, 2]]
- - [[-1, 13], 1, Concat, [1]] # cat head P5
- - [-1, 3, C2f, [1024]] # 25 (P5/32-large)
-
- - [[19, 22, 25], 1, Detect, [nc]] # Detect(P3, P4, P5)
With this configuration, mAP@0.5 improves from the baseline 0.895 to 0.896.
- YOLOv8_ContextAggregation4 summary (fused): 204 layers, 3008645 parameters, 0 gradients, 8.1 GFLOPs
- Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 6/6 [00:04<00:00, 1.37it/s]
- all 87 1138 0.829 0.884 0.896 0.609
- WBC 87 87 0.988 1 0.99 0.796
- RBC 87 968 0.741 0.796 0.843 0.581
- Platelets 87 83 0.759 0.855 0.854 0.451
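Inserting modules into the backbone shifts every later layer index, so the `from` fields of `Concat` and `Detect` have to be renumbered, as in the YOLOv8_ContextAggregation4 configuration above. The following sketch numbers the layers of that configuration and checks that each reference points to an earlier layer. The layer list is transcribed from the YAML with the arguments omitted, and `resolve` and `check_refs` are hypothetical helpers, not part of Ultralytics.

```python
# (from, module) pairs transcribed from the YAML above; args omitted.
layers = [
    (-1, "Conv"), (-1, "Conv"), (-1, "C2f"), (-1, "ContextAggregation"),
    (-1, "Conv"), (-1, "C2f"), (-1, "ContextAggregation"),
    (-1, "Conv"), (-1, "C2f"), (-1, "ContextAggregation"),
    (-1, "Conv"), (-1, "C2f"), (-1, "ContextAggregation"), (-1, "SPPF"),
    (-1, "nn.Upsample"), ([-1, 9], "Concat"), (-1, "C2f"),
    (-1, "nn.Upsample"), ([-1, 5], "Concat"), (-1, "C2f"),
    (-1, "Conv"), ([-1, 16], "Concat"), (-1, "C2f"),
    (-1, "Conv"), ([-1, 13], "Concat"), (-1, "C2f"),
    ([19, 22, 25], "Detect"),
]


def resolve(index, src):
    """Turn a relative source such as -1 into an absolute layer index."""
    return index + src if src < 0 else src


def check_refs(layers):
    """Map each multi-input layer to its (source index, source module) pairs."""
    report = {}
    for i, (src, module) in enumerate(layers):
        sources = src if isinstance(src, list) else [src]
        absolute = [resolve(i, s) for s in sources]
        if i > 0:  # layer 0 reads the input image, so it has no source layer
            assert all(0 <= a < i for a in absolute), f"bad reference at layer {i}"
        if isinstance(src, list):
            report[i] = [(a, layers[a][1]) for a in absolute]
    return report


for idx, sources in check_refs(layers).items():
    print(idx, layers[idx][1], "<-", sources)
```

The output shows that the P4 skip connection reads the ContextAggregation output (layer 9), while the P3 skip reads the C2f output at layer 5, that is, before the ContextAggregation module at layer 6.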
- # Ultralytics YOLO 🚀, GPL-3.0 license
- # YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
-
- # Parameters
- nc: 3 # number of classes (WBC, RBC, Platelets)
- scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
- # [depth, width, max_channels]
- n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs
- s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients, 28.8 GFLOPs
- m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients, 79.3 GFLOPs
- l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
- x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs
-
- # YOLOv8.0n backbone
- backbone:
- # [from, repeats, module, args]
- - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
- - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
- - [-1, 3, C2f, [128, True]]
- - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
- - [-1, 6, C2f, [256, True]]
- - [-1, 1, ContextAggregation, [256]] # 5
- - [-1, 1, Conv, [512, 3, 2]] # 6-P4/16
- - [-1, 6, C2f, [512, True]]
- - [-1, 1, ContextAggregation, [512]] # 8
- - [-1, 1, Conv, [1024, 3, 2]] # 9-P5/32
- - [-1, 3, C2f, [1024, True]]
- - [-1, 1, ContextAggregation, [1024]] # 11
- - [-1, 1, SPPF, [1024, 5]] # 12
-
- # YOLOv8.0n head
- head:
- - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- - [[-1, 8], 1, Concat, [1]] # cat backbone P4
- - [-1, 3, C2f, [512]] # 15
-
- - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- - [[-1, 4], 1, Concat, [1]] # cat backbone P3
- - [-1, 3, C2f, [256]] # 18 (P3/8-small)
-
- - [-1, 1, Conv, [256, 3, 2]]
- - [[-1, 15], 1, Concat, [1]] # cat head P4
- - [-1, 3, C2f, [512]] # 21 (P4/16-medium)
-
- - [-1, 1, Conv, [512, 3, 2]]
- - [[-1, 12], 1, Concat, [1]] # cat head P5
- - [-1, 3, C2f, [1024]] # 24 (P5/32-large)
-
- - [[18, 21, 24], 1, Detect, [nc]] # Detect(P3, P4, P5)
With this configuration, mAP@0.5 improves from the baseline 0.895 to 0.916.
- YOLOv8_ContextAggregation5 summary (fused): 195 layers, 3008482 parameters, 0 gradients, 8.1 GFLOPs
- Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 2/2 [00:03<00:00, 1.74s/it]
- all 87 1138 0.837 0.912 0.916 0.622
- WBC 87 87 0.971 1 0.99 0.791
- RBC 87 968 0.737 0.851 0.862 0.607
- Platelets 87 83 0.803 0.887 0.897 0.469
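To compare the five variants at a glance, the snippet below collects the overall (`all` row) mAP50 and mAP50-95 values from the result tables above and prints the gain over the 0.895 baseline. Only mAP50 has a baseline figure in this post, so no gain is computed for mAP50-95.

```python
BASELINE_MAP50 = 0.895  # baseline YOLOv8 mAP@0.5 quoted in this post

# variant name -> (mAP50, mAP50-95), taken from the "all" row of each result table
results = {
    "YOLOv8_ContextAggregation1": (0.897, 0.602),
    "YOLOv8_ContextAggregation2": (0.907, 0.613),
    "YOLOv8_ContextAggregation3": (0.904, 0.610),
    "YOLOv8_ContextAggregation4": (0.896, 0.609),
    "YOLOv8_ContextAggregation5": (0.916, 0.622),
}

# Absolute mAP50 gain of each variant over the baseline.
gains = {name: round(m50 - BASELINE_MAP50, 3) for name, (m50, _) in results.items()}
best = max(results, key=lambda name: results[name][0])

for name, (m50, m5095) in results.items():
    print(f"{name}: mAP50={m50:.3f} ({gains[name]:+.3f}), mAP50-95={m5095:.3f}")
print("best mAP50:", best)
```

On a 87-image validation set, differences of one or two thousandths (variants 1 and 4) are small enough that they may fall within run-to-run variation, while variant 5 shows the largest gain on both metrics.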