- import torch
- from scipy.optimize import linear_sum_assignment
- from torch import nn
-
- from util.box_ops import box_cxcywh_to_xyxy, generalized_box_iou
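This code imports modules and functions from PyTorch and SciPy, plus two helpers from a project-local module.
1. `import torch`: imports the PyTorch library, which provides the tensor operations used throughout.
2. `from scipy.optimize import linear_sum_assignment`: imports SciPy's `linear_sum_assignment` function, which solves the linear sum assignment problem (minimum-cost bipartite matching). This is the problem the Hungarian algorithm is known for, which is where the matcher gets its name.
3. `from torch import nn`: imports PyTorch's neural network module. `nn` contains the classes and functions for building network layers, including the `nn.Module` base class.
4. `from util.box_ops import box_cxcywh_to_xyxy, generalized_box_iou`: imports two bounding-box helpers from the project-local module `util.box_ops`. Judging by their names and by how they are called below, `box_cxcywh_to_xyxy` converts boxes from (center x, center y, width, height) to corner coordinates, and `generalized_box_iou` computes the pairwise generalized IoU (GIoU) between two sets of boxes.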
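The source of `util.box_ops` is not shown here, so the following is only a minimal sketch of what the two helpers plausibly do, assuming boxes are `[N, 4]` tensors. The names `cxcywh_to_xyxy` and `pairwise_giou` are hypothetical stand-ins, not the project's own functions.

```python
import torch


def cxcywh_to_xyxy(boxes: torch.Tensor) -> torch.Tensor:
    """Convert [N, 4] boxes from (cx, cy, w, h) to (x0, y0, x1, y1)."""
    cx, cy, w, h = boxes.unbind(-1)
    return torch.stack(
        [cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h], dim=-1
    )


def pairwise_giou(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Pairwise GIoU between [N, 4] and [M, 4] corner-format boxes -> [N, M]."""
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    # Intersection rectangle of every (a_i, b_j) pair via broadcasting.
    lt = torch.max(a[:, None, :2], b[None, :, :2])
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])
    inter = (rb - lt).clamp(min=0).prod(-1)
    union = area_a[:, None] + area_b[None, :] - inter
    iou = inter / union
    # Smallest box that encloses both a_i and b_j.
    lt_c = torch.min(a[:, None, :2], b[None, :, :2])
    rb_c = torch.max(a[:, None, 2:], b[None, :, 2:])
    area_c = (rb_c - lt_c).clamp(min=0).prod(-1)
    return iou - (area_c - union) / area_c


# Illustrative boxes in (cx, cy, w, h) format.
box_a = torch.tensor([[0.5, 0.5, 1.0, 1.0]])
box_b = torch.tensor([[0.5, 0.5, 1.0, 1.0], [2.5, 0.5, 1.0, 1.0]])

xyxy_a = cxcywh_to_xyxy(box_a)
giou = pairwise_giou(xyxy_a, cxcywh_to_xyxy(box_b))
print(xyxy_a)
print(giou)
```

Unlike plain IoU, GIoU keeps decreasing as two non-overlapping boxes move further apart, which is why it still gives a useful signal for disjoint pairs: the identical pair scores 1, while the disjoint pair scores below 0.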
HungarianMatcher module
- class HungarianMatcher(nn.Module):
- """This class computes an assignment between the targets and the predictions of the network
- For efficiency reasons, the targets don't include the no_object. Because of this, in general,
- there are more predictions than targets. In this case, we do a 1-to-1 matching of the best predictions,
- while the others are un-matched (and thus treated as non-objects).
- """
-
- def __init__(self, cost_class: float = 1, cost_bbox: float = 1, cost_giou: float = 1):
- """Creates the matcher
- Params:
- cost_class: This is the relative weight of the classification error in the matching cost
- cost_bbox: This is the relative weight of the L1 error of the bounding box coordinates in the matching cost
- cost_giou: This is the relative weight of the giou loss of the bounding box in the matching cost
- """
- super().__init__()
- self.cost_class = cost_class
- self.cost_bbox = cost_bbox
- self.cost_giou = cost_giou
- assert cost_class != 0 or cost_bbox != 0 or cost_giou != 0, "all costs cant be 0"
-
- @torch.no_grad()
- def forward(self, outputs, targets):
- """ Performs the matching
- Params:
- outputs: This is a dict that contains at least these entries:
- "pred_logits": Tensor of dim [batch_size, num_queries, num_classes] with the classification logits
- "pred_boxes": Tensor of dim [batch_size, num_queries, 4] with the predicted box coordinates
- targets: This is a list of targets (len(targets) = batch_size), where each target is a dict containing:
- "labels": Tensor of dim [num_target_boxes] (where num_target_boxes is the number of ground-truth
- objects in the target) containing the class labels
- "boxes": Tensor of dim [num_target_boxes, 4] containing the target box coordinates
- Returns:
- A list of size batch_size, containing tuples of (index_i, index_j) where:
- - index_i is the indices of the selected predictions (in order)
- - index_j is the indices of the corresponding selected targets (in order)
- For each batch element, it holds:
- len(index_i) = len(index_j) = min(num_queries, num_target_boxes)
- """
- bs, num_queries = outputs["pred_logits"].shape[:2]
-
- # We flatten to compute the cost matrices in a batch
- out_prob = outputs["pred_logits"].flatten(0, 1).softmax(-1) # [batch_size * num_queries, num_classes]
- out_bbox = outputs["pred_boxes"].flatten(0, 1) # [batch_size * num_queries, 4]
-
- # Also concat the target labels and boxes
- tgt_ids = torch.cat([v["labels"] for v in targets])
- tgt_bbox = torch.cat([v["boxes"] for v in targets])
-
- # Compute the classification cost. Contrary to the loss, we don't use the NLL,
- # but approximate it in 1 - proba[target class].
- # The 1 is a constant that doesn't change the matching, so it can be omitted.
- cost_class = -out_prob[:, tgt_ids]
-
- # Compute the L1 cost between boxes
- cost_bbox = torch.cdist(out_bbox, tgt_bbox, p=1)
-
- # Compute the giou cost between boxes
- cost_giou = -generalized_box_iou(box_cxcywh_to_xyxy(out_bbox), box_cxcywh_to_xyxy(tgt_bbox))
-
- # Final cost matrix
- C = self.cost_bbox * cost_bbox + self.cost_class * cost_class + self.cost_giou * cost_giou
- C = C.view(bs, num_queries, -1).cpu()
-
- sizes = [len(v["boxes"]) for v in targets]
- indices = [linear_sum_assignment(c[i]) for i, c in enumerate(C.split(sizes, -1))]
- return [(torch.as_tensor(i, dtype=torch.int64), torch.as_tensor(j, dtype=torch.int64)) for i, j in indices]
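This code defines a PyTorch module named HungarianMatcher that computes an assignment (matching) between the network outputs and the targets.
It is used in object detection, where each ground-truth target has to be paired with one of the network's predictions.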
- class HungarianMatcher(nn.Module):
- """This class computes an assignment between the targets and the predictions of the network
- For efficiency reasons, the targets don't include the no_object. Because of this, in general,
- there are more predictions than targets. In this case, we do a 1-to-1 matching of the best predictions,
- while the others are un-matched (and thus treated as non-objects).
- """
-
- def __init__(self, cost_class: float = 1, cost_bbox: float = 1, cost_giou: float = 1):
- """Creates the matcher
- Params:
- cost_class: This is the relative weight of the classification error in the matching cost
- cost_bbox: This is the relative weight of the L1 error of the bounding box coordinates in the matching cost
- cost_giou: This is the relative weight of the giou loss of the bounding box in the matching cost
- """
- super().__init__()
- self.cost_class = cost_class
- self.cost_bbox = cost_bbox
- self.cost_giou = cost_giou
- assert cost_class != 0 or cost_bbox != 0 or cost_giou != 0, "all costs cant be 0"
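This part of the code declares the HungarianMatcher module and its constructor. In detail:
`class HungarianMatcher(nn.Module):` defines a Python class that inherits from `nn.Module` and represents the Hungarian matcher.
Docstring: the class comment, which describes what the class is for. Because the targets do not include a "no object" entry, there are usually more predictions than targets; the best predictions are matched one-to-one and the rest stay unmatched.
`def __init__(self, cost_class: float = 1, cost_bbox: float = 1, cost_giou: float = 1):` is the initializer that creates a matcher instance. It accepts three optional parameters:
- `cost_class`: the relative weight of the classification error in the matching cost, default 1.
- `cost_bbox`: the relative weight of the L1 error of the bounding box coordinates in the matching cost, default 1.
- `cost_giou`: the relative weight of the GIoU term in the matching cost, default 1.
`super().__init__()`: calls the parent class constructor so that the module is initialized correctly.
`self.cost_class`, `self.cost_bbox`, `self.cost_giou`: store the three arguments as instance attributes for use in later computation.
`assert cost_class != 0 or cost_bbox != 0 or cost_giou != 0, "all costs cant be 0"`: an assertion that at least one of the three weights is non-zero. If all three are zero, an AssertionError is raised, because the cost matrix would then be all zeros and the matching meaningless.
The main purpose of this class is to find the optimal matching between the network's predictions and the ground-truth targets so that the loss can be computed and the detector trained. The cost weights set the relative importance of classification error, box coordinate error and GIoU in the matching cost. When there are more predictions than targets, which is the usual case, each target is paired with exactly one prediction, and the predictions left unmatched are treated as non-objects.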
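To make the role of the weights concrete, here is a small sketch using NumPy and SciPy. The cost values and the helper name `match` are invented for illustration, and only two of the three cost terms are used. With three predictions and two targets, one prediction always stays unmatched, and changing the weights changes which pairing is optimal.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical per-term costs: rows are 3 predictions, columns are 2 targets.
# The class term favours (pred 0 -> target 0, pred 1 -> target 1) ...
cost_class = np.array([[-0.9, -0.1],
                       [-0.2, -0.8],
                       [-0.3, -0.3]])
# ... while the box term favours (pred 0 -> target 1, pred 1 -> target 0).
cost_bbox = np.array([[0.6, 0.1],
                      [0.1, 0.6],
                      [0.5, 0.5]])


def match(w_class: float, w_bbox: float):
    """Return (prediction indices, target indices, unmatched predictions)."""
    C = w_class * cost_class + w_bbox * cost_bbox
    rows, cols = linear_sum_assignment(C)
    unmatched = sorted(set(range(C.shape[0])) - set(rows.tolist()))
    return rows.tolist(), cols.tolist(), unmatched


equal_weights = match(1.0, 1.0)
box_heavy = match(1.0, 5.0)
print(equal_weights)  # ([0, 1], [0, 1], [2])
print(box_heavy)      # ([0, 1], [1, 0], [2])
```

In both runs the number of matched pairs equals the number of targets, and prediction 2 is left over; in a detector it would be treated as a non-object.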
- @torch.no_grad()
- def forward(self, outputs, targets):
- """ Performs the matching
- Params:
- outputs: This is a dict that contains at least these entries:
- "pred_logits": Tensor of dim [batch_size, num_queries, num_classes] with the classification logits
- "pred_boxes": Tensor of dim [batch_size, num_queries, 4] with the predicted box coordinates
- targets: This is a list of targets (len(targets) = batch_size), where each target is a dict containing:
- "labels": Tensor of dim [num_target_boxes] (where num_target_boxes is the number of ground-truth
- objects in the target) containing the class labels
- "boxes": Tensor of dim [num_target_boxes, 4] containing the target box coordinates
- Returns:
- A list of size batch_size, containing tuples of (index_i, index_j) where:
- - index_i is the indices of the selected predictions (in order)
- - index_j is the indices of the corresponding selected targets (in order)
- For each batch element, it holds:
- len(index_i) = len(index_j) = min(num_queries, num_target_boxes)
- """
- bs, num_queries = outputs["pred_logits"].shape[:2]
-
- # We flatten to compute the cost matrices in a batch
- out_prob = outputs["pred_logits"].flatten(0, 1).softmax(-1) # [batch_size * num_queries, num_classes]
- out_bbox = outputs["pred_boxes"].flatten(0, 1) # [batch_size * num_queries, 4]
-
- # Also concat the target labels and boxes
- tgt_ids = torch.cat([v["labels"] for v in targets])
- tgt_bbox = torch.cat([v["boxes"] for v in targets])
-
- # Compute the classification cost. Contrary to the loss, we don't use the NLL,
- # but approximate it in 1 - proba[target class].
- # The 1 is a constant that doesn't change the matching, so it can be omitted.
- cost_class = -out_prob[:, tgt_ids]
-
- # Compute the L1 cost between boxes
- cost_bbox = torch.cdist(out_bbox, tgt_bbox, p=1)
-
- # Compute the giou cost between boxes
- cost_giou = -generalized_box_iou(box_cxcywh_to_xyxy(out_bbox), box_cxcywh_to_xyxy(tgt_bbox))
-
- # Final cost matrix
- C = self.cost_bbox * cost_bbox + self.cost_class * cost_class + self.cost_giou * cost_giou
- C = C.view(bs, num_queries, -1).cpu()
-
- sizes = [len(v["boxes"]) for v in targets]
- indices = [linear_sum_assignment(c[i]) for i, c in enumerate(C.split(sizes, -1))]
- return [(torch.as_tensor(i, dtype=torch.int64), torch.as_tensor(j, dtype=torch.int64)) for i, j in indices]
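The `forward` method is the core of the `HungarianMatcher` class: it runs the matching and pairs predictions with targets. Line by line:
1. `@torch.no_grad()`: a decorator that disables gradient tracking inside the method. The matching only selects index pairs, so no gradients need to flow through it.
2. `def forward(self, outputs, targets):`: the forward method that performs the matching. It takes two arguments:
   - `outputs`: a dict containing at least these two entries:
     - "pred_logits": a tensor of shape [batch_size, num_queries, num_classes] with the classification logits.
     - "pred_boxes": a tensor of shape [batch_size, num_queries, 4] with the predicted box coordinates.
   - `targets`: a list of targets of length batch_size, where each target is a dict containing:
     - "labels": a tensor of shape [num_target_boxes] with the target class labels.
     - "boxes": a tensor of shape [num_target_boxes, 4] with the target box coordinates.
3. `bs, num_queries = outputs["pred_logits"].shape[:2]`: reads the batch size and the number of queries.
4. `out_prob = outputs["pred_logits"].flatten(0, 1).softmax(-1)`: flattens the batch and query dimensions of the logits and applies softmax to get class probabilities. The result has shape [batch_size * num_queries, num_classes].
5. `out_bbox = outputs["pred_boxes"].flatten(0, 1)`: flattens the predicted boxes the same way, giving shape [batch_size * num_queries, 4].
6. `tgt_ids = torch.cat([v["labels"] for v in targets])`: concatenates the class labels of all targets in the batch into one tensor of shape [total number of target boxes].
7. `tgt_bbox = torch.cat([v["boxes"] for v in targets])`: concatenates the boxes of all targets into one tensor of shape [total number of target boxes, 4].
8. `cost_class = -out_prob[:, tgt_ids]`: computes the classification cost. Indexing the columns with `tgt_ids` gives a [batch_size * num_queries, total number of target boxes] matrix whose entry (i, j) is the probability that prediction i assigns to the class of target j. Instead of the negative log-likelihood used in the loss, the cost approximates it by 1 - probability; the constant 1 does not change the matching, so only the negated probability is kept.
9. `cost_bbox = torch.cdist(out_bbox, tgt_bbox, p=1)`: computes the box coordinate cost, the pairwise L1 distance between predicted and target box coordinates.
10. `cost_giou = -generalized_box_iou(box_cxcywh_to_xyxy(out_bbox), box_cxcywh_to_xyxy(tgt_bbox))`: computes the GIoU (Generalized IoU) cost. Both sets of boxes are converted to corner format and the pairwise GIoU is negated, so that better-overlapping pairs get a lower cost.
11. `C = self.cost_bbox * cost_bbox + self.cost_class * cost_class + self.cost_giou * cost_giou`: combines the classification, box coordinate and GIoU costs with their weights into the final cost matrix C.
12. `C = C.view(bs, num_queries, -1).cpu()`: reshapes C to [batch_size, num_queries, total number of target boxes] and moves it to the CPU, where SciPy's `linear_sum_assignment` runs.
13. `sizes = [len(v["boxes"]) for v in targets]`: collects the number of boxes in each target into a list.
14. `indices = [linear_sum_assignment(c[i]) for i, c in enumerate(C.split(sizes, -1))]`: `C.split(sizes, -1)` splits the last dimension into one chunk per image, and `c[i]` selects image i's rows from chunk i, so each image's predictions are matched only against that image's own targets. `linear_sum_assignment` then finds the minimum-cost assignment for each [num_queries, num_target_boxes] block.
15. `return [(torch.as_tensor(i, dtype=torch.int64), torch.as_tensor(j, dtype=torch.int64)) for i, j in indices]`: returns a list with one entry per image in the batch. Each entry is a tuple of two int64 tensors: the indices of the selected predictions and the indices of the corresponding targets. For each image, the number of matches is min(num_queries, num_target_boxes).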
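The flatten, `view`, `split` and `c[i]` steps are the least obvious part of the method, so here is a small sketch of them on a toy batch. It is a simplification, not the full method: only the L1 box cost is used (the class and GIoU terms are left out), and all box values are invented.

```python
import torch
from scipy.optimize import linear_sum_assignment

bs, num_queries = 2, 3

# Hypothetical predictions in (cx, cy, w, h) format: [bs, num_queries, 4].
pred_boxes = torch.tensor([
    [[0.2, 0.2, 0.1, 0.1], [0.8, 0.8, 0.1, 0.1], [0.5, 0.5, 0.1, 0.1]],
    [[0.3, 0.3, 0.2, 0.2], [0.6, 0.6, 0.2, 0.2], [0.9, 0.9, 0.2, 0.2]],
])
# Image 0 has two targets, image 1 has one.
targets = [
    {"boxes": torch.tensor([[0.8, 0.8, 0.1, 0.1], [0.2, 0.2, 0.1, 0.1]])},
    {"boxes": torch.tensor([[0.6, 0.6, 0.2, 0.2]])},
]

out_bbox = pred_boxes.flatten(0, 1)                   # [6, 4]
tgt_bbox = torch.cat([t["boxes"] for t in targets])   # [3, 4]

# One cost matrix for the whole batch: every prediction vs. every target.
C = torch.cdist(out_bbox, tgt_bbox, p=1)              # [6, 3]
C = C.view(bs, num_queries, -1)                       # [2, 3, 3]

sizes = [len(t["boxes"]) for t in targets]            # [2, 1]
indices = []
for i, c in enumerate(C.split(sizes, -1)):
    # c is [bs, num_queries, sizes[i]]: all images vs. image i's targets.
    # c[i] keeps only image i's own predictions: [num_queries, sizes[i]].
    rows, cols = linear_sum_assignment(c[i].numpy())
    indices.append((rows.tolist(), cols.tolist()))

print(indices)  # [([0, 1], [1, 0]), ([1], [0])]
```

Computing one large cost matrix and slicing it afterwards wastes the cross-image entries, but it lets the pairwise costs be computed in a single batched call rather than in a Python loop over images.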
build_matcher() function
- def build_matcher(args):
- return HungarianMatcher(cost_class=args.set_cost_class, cost_bbox=args.set_cost_bbox, cost_giou=args.set_cost_giou)
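The build_matcher function builds a HungarianMatcher instance and configures its cost weights from the arguments passed in. In detail:
`def build_matcher(args):` defines a function named build_matcher that takes a single argument, args, which carries the cost settings.
`return HungarianMatcher(cost_class=args.set_cost_class, cost_bbox=args.set_cost_bbox, cost_giou=args.set_cost_giou)`: creates and returns a HungarianMatcher instance, taking the cost weights from args:
- `cost_class`: the weight of the classification cost, read from `args.set_cost_class`.
- `cost_bbox`: the weight of the box coordinate (L1) cost, read from `args.set_cost_bbox`.
- `cost_giou`: the weight of the GIoU cost, read from `args.set_cost_giou`.
In this way, build_matcher creates a matcher configured from the given arguments and returns it for later use.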
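The text does not show where `args` comes from. A common pattern is an `argparse` namespace, sketched below under that assumption. The command-line flag names and the values passed are hypothetical, chosen only to match the attribute names `build_matcher` reads, and the `HungarianMatcher` here is a stand-in that stores the weights without doing any matching.

```python
import argparse


class HungarianMatcher:
    """Stand-in for the real module: it only stores the three weights."""

    def __init__(self, cost_class: float = 1, cost_bbox: float = 1, cost_giou: float = 1):
        assert cost_class != 0 or cost_bbox != 0 or cost_giou != 0, "all costs cant be 0"
        self.cost_class = cost_class
        self.cost_bbox = cost_bbox
        self.cost_giou = cost_giou


def build_matcher(args):
    return HungarianMatcher(
        cost_class=args.set_cost_class,
        cost_bbox=args.set_cost_bbox,
        cost_giou=args.set_cost_giou,
    )


# Hypothetical flags; argparse turns "--set_cost_bbox" into args.set_cost_bbox.
parser = argparse.ArgumentParser()
parser.add_argument("--set_cost_class", type=float, default=1.0)
parser.add_argument("--set_cost_bbox", type=float, default=1.0)
parser.add_argument("--set_cost_giou", type=float, default=1.0)

# Parse an explicit list instead of sys.argv so the sketch runs anywhere.
args = parser.parse_args(["--set_cost_bbox", "5", "--set_cost_giou", "2"])
matcher = build_matcher(args)
print(matcher.cost_class, matcher.cost_bbox, matcher.cost_giou)  # 1.0 5.0 2.0
```

Keeping the construction in a small factory function means the rest of the training code only needs to pass the parsed arguments along, without knowing which weights the matcher uses.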