• [Object Detection] LLA: Loss-aware Label Assignment for Dense Pedestrian Detection [Label Assignment]


    Summary

    This paper proposes a label assignment strategy for pedestrian detection. Concretely, it works in the following steps.

    1. Build the cost matrix. A forward pass of the network yields the classification and regression outputs, from which a per-(ground truth, anchor) classification cost $C^{cls}$ and regression cost $C^{reg}$ are computed; the cost matrix is $C = C^{cls} + \lambda \cdot C^{reg}$.
    2. For each ground-truth box, take the TOP K candidates with the smallest cost (i.e., the smallest loss) in the cost matrix as positive samples; all other anchors are negatives.
    3. To speed up convergence, positive candidates are forced to lie inside the gt box (in the code below this is done by adding a large penalty of 1e3 to anchor points outside the box).

    The paper shares its author with YOLOX, whose label assignment strategy can be viewed as a slight modification of the scheme described here. A minimal code sketch of the assignment procedure follows.
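
    To make the procedure concrete, here is a minimal, self-contained sketch of steps 1–3 on toy tensors. It is not the authors' implementation (the reference code is reproduced in the Code section below); the function name lla_assign, the tensor layout (per-gt × per-anchor cost matrices) and the default λ and top-k values are illustrative assumptions, while the 1e3 out-of-box penalty mirrors the reference code.

    import torch

    # Sketch only. cls_cost, reg_cost, is_in_boxes are [num_gt, num_anchors] tensors;
    # lam is the lambda weight in C = C_cls + lam * C_reg.
    def lla_assign(cls_cost, reg_cost, is_in_boxes, lam=1.0, topk=4):
        # Steps 1 and 3: cost matrix, with a large penalty for anchor points outside the gt box
        cost = cls_cost + lam * reg_cost + 1e3 * (1. - is_in_boxes.float())

        num_gt, num_anchors = cost.shape
        matching = torch.zeros_like(cost)

        # Step 2: for each gt, the top-k anchors with the smallest cost become positives
        _, idx = torch.topk(cost, k=topk, dim=1, largest=False)
        matching[torch.arange(num_gt).unsqueeze(1).expand(-1, topk), idx] = 1.

        # If an anchor is claimed by several gts, keep only the lowest-cost gt
        multi = matching.sum(0) > 1
        if multi.any():
            best_gt = cost[:, multi].argmin(dim=0)
            matching[:, multi] = 0.
            matching[best_gt, multi] = 1.

        fg_mask = matching.sum(0) > 0      # positive anchors
        assigned_gt = matching.argmax(0)   # matched gt index (only meaningful where fg_mask)
        return fg_mask, assigned_gt

    # Toy usage: 2 gt boxes, 6 anchor points
    cls_cost = torch.rand(2, 6)
    reg_cost = torch.rand(2, 6)
    is_in_boxes = torch.ones(2, 6, dtype=torch.bool)
    fg_mask, assigned_gt = lla_assign(cls_cost, reg_cost, is_in_boxes, lam=1.5, topk=2)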

    More details

    1. Sensitivity to the TOP K hyperparameter
      The authors' experiments show that results are insensitive to the choice of TOP K within a certain range. [Figure: TOP K sensitivity experiment; image not preserved]
    2. Ablation study on the components of the cost matrix
      [Figure: ablation results on the cost terms; image not preserved]
    3. Visualization of results
      [Figure: qualitative detection results; image not preserved]

    Code

    Reference link

    # Method of the detector class in the reference implementation (cvpods-style); it relies on
    # torch, F (torch.nn.functional), dist (torch.distributed), and helpers such as
    # permute_to_N_HWA_K, sigmoid_focal_loss_jit, iou_loss and get_ious from that code base.
    def get_lla_assignments_and_losses(self, shifts, targets, box_cls, box_delta, box_iou):
    
    	gt_classes = []
    
    	# Reshape each FPN level's predictions to (N, H*W*A, K) before concatenating levels
    	box_cls = [permute_to_N_HWA_K(x, self.num_classes) for x in box_cls]
    	box_delta = [permute_to_N_HWA_K(x, 4) for x in box_delta]
    	box_iou = [permute_to_N_HWA_K(x, 1) for x in box_iou]
    
    	box_cls = torch.cat(box_cls, dim=1)
    	box_delta = torch.cat(box_delta, dim=1)
    	box_iou = torch.cat(box_iou, dim=1)
    
    	losses_cls = []
    	losses_box_reg = []
    	losses_iou = []
    
    	num_fg = 0
    
    	for shifts_per_image, targets_per_image, box_cls_per_image, \
    			box_delta_per_image, box_iou_per_image in zip(
    			shifts, targets, box_cls, box_delta, box_iou):
    
    		shifts_over_all = torch.cat(shifts_per_image, dim=0)
    
    		gt_boxes = targets_per_image.gt_boxes
    		gt_classes = targets_per_image.gt_classes
    
    		# (l, t, r, b) offsets from every anchor point (shift) to every gt box;
    		# all four being positive means the point lies inside the gt box
    		deltas = self.shift2box_transform.get_deltas(
    			shifts_over_all, gt_boxes.tensor.unsqueeze(1))
    		is_in_boxes = deltas.min(dim=-1).values > 0.01
    
    		shape = (len(targets_per_image), len(shifts_over_all), -1)
    		box_cls_per_image_unexpanded = box_cls_per_image
    		box_delta_per_image_unexpanded = box_delta_per_image
    
    		box_cls_per_image = box_cls_per_image.unsqueeze(0).expand(shape)
    		gt_cls_per_image = F.one_hot(
    			torch.max(gt_classes, torch.zeros_like(gt_classes)), self.num_classes
    		).float().unsqueeze(1).expand(shape)
    
    		with torch.no_grad():
    			loss_cls = sigmoid_focal_loss_jit(
    				box_cls_per_image,
    				gt_cls_per_image,
    				alpha=self.focal_loss_alpha,
    				gamma=self.focal_loss_gamma).sum(dim=-1)
    			loss_cls_bg = sigmoid_focal_loss_jit(
    				box_cls_per_image_unexpanded,
    				torch.zeros_like(box_cls_per_image_unexpanded),
    				alpha=self.focal_loss_alpha,
    				gamma=self.focal_loss_gamma).sum(dim=-1)
    			box_delta_per_image = box_delta_per_image.unsqueeze(0).expand(shape)
    			gt_delta_per_image = self.shift2box_transform.get_deltas(
    				shifts_over_all, gt_boxes.tensor.unsqueeze(1))
    			loss_delta = iou_loss(
    				box_delta_per_image,
    				gt_delta_per_image,
    				box_mode="ltrb",
    				loss_type='iou')
    
    			ious = get_ious(
    				box_delta_per_image,
    				gt_delta_per_image,
    				box_mode="ltrb",
    				loss_type='iou')
    
    			# Cost matrix: classification loss + weighted regression loss, plus a large
    			# penalty for anchor points outside the gt box; an extra background row
    			# (classification loss against all-zero targets) is appended at the end
    			loss = loss_cls + self.reg_cost * loss_delta + 1e3 * (1 - is_in_boxes.float())
    			loss = torch.cat([loss, loss_cls_bg.unsqueeze(0)], dim=0)
    
    			num_gt = loss.shape[0] - 1
    			num_anchor = loss.shape[1]
    
    			# Top-k: for each gt, the k anchors with the smallest cost become positives
    			matching_matrix = torch.zeros_like(loss)
    			_, topk_idx = torch.topk(loss[:-1], k=self.topk, dim=1, largest=False)
    			matching_matrix[torch.arange(num_gt).unsqueeze(1).repeat(1,
    			   self.topk).view(-1), topk_idx.view(-1)] = 1.
    
    			# make sure each anchor is matched to at most one gt (keep the lowest-cost one)
    			anchor_matched_gt = matching_matrix.sum(0)
    			if (anchor_matched_gt > 1).sum() > 0:
    				loss_min, loss_argmin = torch.min(loss[:-1, anchor_matched_gt > 1], dim=0)
    				matching_matrix[:, anchor_matched_gt > 1] *= 0.
    				matching_matrix[loss_argmin, anchor_matched_gt > 1] = 1.
    				anchor_matched_gt = matching_matrix.sum(0)
    			num_fg += matching_matrix.sum()
    			matching_matrix[-1] = 1. - anchor_matched_gt  # assignment for Background
    			assigned_gt_inds = torch.argmax(matching_matrix, dim=0)
    
    			gt_cls_per_image_bg = gt_cls_per_image.new_zeros(
    				(gt_cls_per_image.size(1), gt_cls_per_image.size(2))).unsqueeze(0)
    			gt_cls_per_image_with_bg = torch.cat(
    				[gt_cls_per_image, gt_cls_per_image_bg], dim=0)
    			cls_target_per_image = gt_cls_per_image_with_bg[
    				assigned_gt_inds, torch.arange(num_anchor)]
    
    			# Dealing with Crowdhuman ignore label
    			gt_classes_ = torch.cat([gt_classes, gt_classes.new_zeros(1)])
    			anchor_cls_labels = gt_classes_[assigned_gt_inds]
    			valid_flag = anchor_cls_labels >= 0
    
    			pos_mask = assigned_gt_inds != len(targets_per_image)  # get foreground mask
    			valid_fg = pos_mask & valid_flag
    			assigned_fg_inds = assigned_gt_inds[valid_fg]
    			range_fg = torch.arange(num_anchor)[valid_fg]
    			ious_fg = ious[assigned_fg_inds, range_fg]
    
    		anchor_loss_cls = sigmoid_focal_loss_jit(
    			box_cls_per_image_unexpanded[valid_flag],
    			cls_target_per_image[valid_flag],
    			alpha=self.focal_loss_alpha,
    			gamma=self.focal_loss_gamma).sum(dim=-1)
    
    		delta_target = gt_delta_per_image[assigned_fg_inds, range_fg]
    		anchor_loss_delta = 2. * iou_loss(
    			box_delta_per_image_unexpanded[valid_fg],
    			delta_target,
    			box_mode="ltrb",
    			loss_type=self.iou_loss_type)
    
    		anchor_loss_iou = 0.5 * F.binary_cross_entropy_with_logits(
    			box_iou_per_image.squeeze(1)[valid_fg],
    			ious_fg,
    			reduction='none')
    
    		losses_cls.append(anchor_loss_cls.sum())
    		losses_box_reg.append(anchor_loss_delta.sum())
    		losses_iou.append(anchor_loss_iou.sum())
    
    	if self.norm_sync:
    		dist.all_reduce(num_fg)
    		num_fg = num_fg.float() / dist.get_world_size()
    
    	return {
    		'loss_cls': torch.stack(losses_cls).sum() / num_fg,
    		'loss_box_reg': torch.stack(losses_box_reg).sum() / num_fg,
    		'loss_iou': torch.stack(losses_iou).sum() / num_fg
    	}
    
  • Original article: https://blog.csdn.net/wxd1233/article/details/128064634