• [Instance Segmentation] Paper Walkthrough: YOLACT: Real-time Instance Segmentation


    🏆Paper download: paper

    🏆Code download: code


    Contents

    🏆Paper download: paper

    🏆Code download: code

    1.🌷🌷Innovations

    2.🌷🌷Network Architecture

    2.1🍀🍀Backbone

    2.2🍀🍀Protonet

    2.3🍀🍀Prediction Head

    2.4🍀🍀Masks Assembly

    3.🌷🌷Results

    3.1🍀🍀Prototype behavior

    3.2🍀🍀COCO results

    4.🌷🌷Code



    YOLACT is a classic one-stage instance segmentation method. It belongs to the anchor-based family, i.e., it relies on anchor computation.

    YOLACT: You Only Look At CoefficienTs

    Abstract: We present a simple, fully convolutional model for real-time instance segmentation that achieves 29.8 mAP on MS COCO at 33.5 fps on a single Titan Xp, which is significantly faster than any previous competitive approach. Moreover, we obtain this result after training on only one GPU. We accomplish this by breaking instance segmentation into two parallel subtasks: (1) generating a set of prototype masks and (2) predicting per-instance mask coefficients. We then produce instance masks by linearly combining the prototypes with the mask coefficients. Because this process does not depend on repooling, the approach produces very high-quality masks and exhibits temporal stability for free. Furthermore, we analyze the emergent behavior of the prototypes and show that, despite being fully convolutional, they learn to localize instances on their own in a translation-variant manner. Finally, we also propose Fast NMS, a drop-in replacement for standard NMS that is 12 ms faster with only a marginal performance penalty.

    1.🌷🌷Innovations

    • Proposes a simple fully convolutional model for the instance segmentation task;

    • Runs in real time: 29.8 mAP on COCO at 33.5 fps on a single Titan Xp;

    • Splits the task into two parallel subtasks: generating prototype masks and predicting mask coefficients;

      • prototype masks
        Convolutional layers: effective at extracting spatially coherent information.

      • mask coefficients
        Fully connected layers: effective at producing semantic vectors.

    • Accelerates NMS by replacing the sequential suppression loop with one batched matrix computation (Fast NMS); a sketch follows this list.
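
    The Fast NMS idea: compute the full pairwise IoU matrix once, keep only its upper triangle, and drop any box whose maximum IoU with a higher-scoring box exceeds the threshold. Below is a minimal per-class sketch using torchvision.ops.box_iou; the repo's actual implementation lives in the Detect class and batches this over all classes at once.

    import torch
    from torchvision.ops import box_iou  # pairwise IoU helper

    def fast_nms(boxes, scores, iou_threshold=0.5, top_k=200):
        """Per-class Fast NMS sketch. boxes: [n, 4] as (x1, y1, x2, y2);
        scores: [n]. Returns the indices of the kept boxes."""
        scores, idx = scores.sort(descending=True)
        idx = idx[:top_k]
        boxes = boxes[idx]

        # Upper triangle only: each box is compared just against
        # higher-scoring boxes.
        iou = box_iou(boxes, boxes).triu(diagonal=1)

        # Max overlap with any better-scoring box, computed in one shot
        # instead of a sequential suppression loop.
        iou_max, _ = iou.max(dim=0)

        # Unlike standard NMS, an already-suppressed box can still
        # suppress others here; that is the source of the small mAP drop
        # the paper reports.
        return idx[iou_max <= iou_threshold]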

    2.🌷🌷Network Architecture

    [Architecture figure] Left (prototypes): blue = low values, yellow = high values. Middle (NMS / Crop / Threshold): gray = functions that are not trained.

    Overall structure: ResNet-101 + FPN, following RetinaNet.

    2.1🍀🍀Backbone

            A convolutional neural network (e.g., ResNet-101) extracts features, which are then fed into a feature pyramid network to produce features at several levels. The structure of ResNet is shown in the figure: it has five convolutional stages, conv1 and conv2_x through conv5_x, corresponding to C1-C5 of the YOLACT model in Figure 1; the extracted features are then passed into the feature pyramid network. YOLACT uses multi-scale feature maps so that objects of different sizes can be detected: small objects on large feature maps and large objects on small feature maps.

    * The feature pyramid module mainly produces deeper feature maps at multiple scales; a sketch of its top-down pathway follows.
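
    For a concrete picture, here is a minimal FPN sketch. Channel counts are illustrative assumptions; YOLACT's actual FPN uses 256 features and adds two extra downsampled levels (P6/P7) on top.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MiniFPN(nn.Module):
        """Minimal FPN sketch: 1x1 lateral convs + top-down upsampling."""
        def __init__(self, in_channels=(512, 1024, 2048), num_features=256):
            super().__init__()
            self.lateral = nn.ModuleList(nn.Conv2d(c, num_features, 1) for c in in_channels)
            self.smooth  = nn.ModuleList(nn.Conv2d(num_features, num_features, 3, padding=1)
                                         for _ in in_channels)

        def forward(self, feats):  # feats: [C3, C4, C5], high-res -> low-res
            laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
            # Top-down pathway: upsample the coarser map and add it to the finer one.
            for i in range(len(laterals) - 2, -1, -1):
                laterals[i] = laterals[i] + F.interpolate(
                    laterals[i + 1], size=laterals[i].shape[2:],
                    mode='bilinear', align_corners=False)
            return [sm(lat) for sm, lat in zip(self.smooth, laterals)]  # [P3, P4, P5]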

    2.2🍀🍀Protonet

            The Protonet module produces the prototypes through convolutions and upsampling. The prototypes are a set of masks; bright (high-value) regions in each mask correspond to object regions. The final mask of each instance is obtained by linearly combining these generated masks. A sketch of this branch follows.
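
    A minimal sketch mirroring the repo's default config (three 3x3 convs on P3, one 2x upsample, then a projection down to k = 32 prototype channels); treat the exact layer list as an assumption rather than the verbatim implementation.

    import torch.nn as nn

    class MiniProtonet(nn.Module):
        """Protonet sketch: fully convolutional, so prototype quality
        scales naturally with input resolution."""
        def __init__(self, in_channels=256, num_prototypes=32):
            super().__init__()
            layers = []
            for _ in range(3):
                layers += [nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(inplace=True)]
                in_channels = 256
            layers += [nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
                       nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),
                       nn.Conv2d(256, num_prototypes, 1), nn.ReLU(inplace=True)]
            self.net = nn.Sequential(*layers)

        def forward(self, p3):   # p3: [B, 256, H/8, W/8] from the FPN
            return self.net(p3)  # prototypes: [B, 32, H/4, W/4]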

    2.3🍀🍀Prediction Head

            The prediction head extends RetinaNet's head by additionally regressing a set of mask coefficients, so it outputs the bounding box (Bbox), the class confidence (conf), and the mask coefficients; the coefficients are later linearly combined with the Protonet masks. YOLACT generates 3 anchor boxes at every location of each of the 5 feature-pyramid maps, and applies NMS (non-maximum suppression) to prune the resulting redundant boxes. A sketch of the head follows.
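
    A minimal sketch of one such head (the real PredictionModule is shared across the five FPN levels and reads its anchors from the config; the class below is a simplified stand-in):

    import torch
    import torch.nn as nn

    class MiniPredictionHead(nn.Module):
        """One shared 3x3 conv, then three parallel 3x3 convs that regress
        boxes, class confidences, and k mask coefficients for each of the
        3 anchors at every location. tanh lets a coefficient subtract a
        prototype as well as add it."""
        def __init__(self, in_channels=256, num_classes=81, num_anchors=3, k=32):
            super().__init__()
            self.num_classes, self.k = num_classes, k
            self.upfeature  = nn.Sequential(
                nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(inplace=True))
            self.bbox_layer = nn.Conv2d(256, num_anchors * 4, 3, padding=1)
            self.conf_layer = nn.Conv2d(256, num_anchors * num_classes, 3, padding=1)
            self.mask_layer = nn.Conv2d(256, num_anchors * k, 3, padding=1)

        def forward(self, x):
            x = self.upfeature(x)
            b = x.size(0)
            # [B, A*C, H, W] -> [B, H*W*A, C] so outputs from all FPN
            # levels can be concatenated along the anchor dimension.
            loc   = self.bbox_layer(x).permute(0, 2, 3, 1).reshape(b, -1, 4)
            conf  = self.conf_layer(x).permute(0, 2, 3, 1).reshape(b, -1, self.num_classes)
            coeff = torch.tanh(self.mask_layer(x).permute(0, 2, 3, 1).reshape(b, -1, self.k))
            return loc, conf, coeff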

    2.4🍀🍀Masks Assembly

            The mask coefficients from the Prediction Head module are linearly combined with the masks from Protonet, yielding one mask per object.

    Loss function: L_mask = BCE(M, M_gt), where M = σ(PCᵀ) is the assembled mask (P: prototypes, C: mask coefficients, σ: sigmoid).
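
    The assembly step is a single matrix multiply plus a sigmoid; at evaluation time the result is additionally cropped with the predicted box and thresholded. A minimal sketch:

    import torch

    def assemble_masks(protos, coeffs):
        """protos: [H, W, k] prototypes for one image (channels moved last);
        coeffs: [n, k] tanh'd coefficients for the n detections kept after
        NMS. Returns [H, W, n] soft masks in (0, 1)."""
        return torch.sigmoid(protos @ coeffs.t())  # M = sigmoid(P C^T)

    # Training supervises M with per-pixel BCE against the ground-truth
    # masks; the paper additionally divides L_mask by the GT box area so
    # small objects are not drowned out.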

    3.🌷🌷Results

    3.1🍀🍀Prototype behavior

    The paper's visualizations show that the prototypes specialize: some respond to objects on one side of the image, some to object boundaries, and some encode position, which is how a fully convolutional network achieves translation-variant localization.

    3.2🍀🍀COCO results

    On COCO, YOLACT-550 with a ResNet-101 backbone reports 29.8 mask mAP at 33.5 fps on a single Titan Xp.

    4.🌷🌷Code

            Definition of the Yolact class:

    class Yolact(nn.Module):
        """
        You can set the arguments by changing them in the backbone config object in config.py.

        Parameters (in cfg.backbone):
            - selected_layers: The indices of the conv layers to use for prediction.
            - pred_scales: A list with len(selected_layers) containing tuples of scales (see PredictionModule)
            - pred_aspect_ratios: A list of lists of aspect ratios with len(selected_layers) (see PredictionModule)
        """

        def __init__(self):
            super().__init__()

            self.backbone = construct_backbone(cfg.backbone)

            if cfg.freeze_bn:
                self.freeze_bn()

            # Compute mask_dim here and add it back to the config. Make sure Yolact's constructor is called early!
            if cfg.mask_type == mask_type.direct:
                cfg.mask_dim = cfg.mask_size**2
            elif cfg.mask_type == mask_type.lincomb:
                if cfg.mask_proto_use_grid:
                    self.grid = torch.Tensor(np.load(cfg.mask_proto_grid_file))
                    self.num_grids = self.grid.size(0)
                else:
                    self.num_grids = 0

                self.proto_src = cfg.mask_proto_src

                if self.proto_src is None: in_channels = 3
                elif cfg.fpn is not None: in_channels = cfg.fpn.num_features
                else: in_channels = self.backbone.channels[self.proto_src]
                in_channels += self.num_grids

                # The include_last_relu=False here is because we might want to change it to another function
                self.proto_net, cfg.mask_dim = make_net(in_channels, cfg.mask_proto_net, include_last_relu=False)

                if cfg.mask_proto_bias:
                    cfg.mask_dim += 1

            self.selected_layers = cfg.backbone.selected_layers
            src_channels = self.backbone.channels

            if cfg.use_maskiou:
                self.maskiou_net = FastMaskIoUNet()

            if cfg.fpn is not None:
                # Some hacky rewiring to accommodate the FPN
                self.fpn = FPN([src_channels[i] for i in self.selected_layers])
                self.selected_layers = list(range(len(self.selected_layers) + cfg.fpn.num_downsample))
                src_channels = [cfg.fpn.num_features] * len(self.selected_layers)

            self.prediction_layers = nn.ModuleList()
            cfg.num_heads = len(self.selected_layers)

            for idx, layer_idx in enumerate(self.selected_layers):
                # If we're sharing prediction module weights, have every module's parent be the first one
                parent = None
                if cfg.share_prediction_module and idx > 0:
                    parent = self.prediction_layers[0]

                pred = PredictionModule(src_channels[layer_idx], src_channels[layer_idx],
                                        aspect_ratios=cfg.backbone.pred_aspect_ratios[idx],
                                        scales=cfg.backbone.pred_scales[idx],
                                        parent=parent,
                                        index=idx)
                self.prediction_layers.append(pred)

            # Extra parameters for the extra losses
            if cfg.use_class_existence_loss:
                # This comes from the smallest layer selected
                # Also note that cfg.num_classes includes background
                self.class_existence_fc = nn.Linear(src_channels[-1], cfg.num_classes - 1)

            if cfg.use_semantic_segmentation_loss:
                self.semantic_seg_conv = nn.Conv2d(src_channels[0], cfg.num_classes - 1, kernel_size=1)

            # For use in evaluation
            self.detect = Detect(cfg.num_classes, bkg_label=0, top_k=cfg.nms_top_k,
                                 conf_thresh=cfg.nms_conf_thresh, nms_thresh=cfg.nms_thresh)

        def save_weights(self, path):
            """ Saves the model's weights using compression because the file sizes were getting too big. """
            torch.save(self.state_dict(), path)

        def load_weights(self, path):
            """ Loads weights from a compressed save file. """
            state_dict = torch.load(path)

            # For backward compatibility, remove these (the new variable is called layers)
            for key in list(state_dict.keys()):
                if key.startswith('backbone.layer') and not key.startswith('backbone.layers'):
                    del state_dict[key]

                # Also for backward compatibility with v1.0 weights, do this check
                if key.startswith('fpn.downsample_layers.'):
                    if cfg.fpn is not None and int(key.split('.')[2]) >= cfg.fpn.num_downsample:
                        del state_dict[key]

            self.load_state_dict(state_dict)

        def init_weights(self, backbone_path):
            """ Initialize weights for training. """
            # Initialize the backbone with the pretrained weights.
            self.backbone.init_backbone(backbone_path)

            conv_constants = getattr(nn.Conv2d(1, 1, 1), '__constants__')

            # Quick lambda to test if one list contains the other
            def all_in(x, y):
                for _x in x:
                    if _x not in y:
                        return False
                return True

            # Initialize the rest of the conv layers with xavier
            for name, module in self.named_modules():
                # See issue #127 for why we need such a complicated condition if the module is a WeakScriptModuleProxy.
                # Broke in 1.3 (see issue #175): WeakScriptModuleProxy was turned into just ScriptModule.
                # Broke in 1.4 (see issue #292), where RecursiveScriptModule is the new star of the show.
                # Note that this might break with future pytorch updates, so let me know if it does
                is_script_conv = False
                if 'Script' in type(module).__name__:
                    # 1.4 workaround: now there's an original_name member so just use that
                    if hasattr(module, 'original_name'):
                        is_script_conv = 'Conv' in module.original_name
                    # 1.3 workaround: check if this has the same constants as a conv module
                    else:
                        is_script_conv = (
                            all_in(module.__dict__['_constants_set'], conv_constants)
                            and all_in(conv_constants, module.__dict__['_constants_set']))

                is_conv_layer = isinstance(module, nn.Conv2d) or is_script_conv

                if is_conv_layer and module not in self.backbone.backbone_modules:
                    nn.init.xavier_uniform_(module.weight.data)

                    if module.bias is not None:
                        if cfg.use_focal_loss and 'conf_layer' in name:
                            if not cfg.use_sigmoid_focal_loss:
                                # Initialize the last layer as in the focal loss paper.
                                # Because we use softmax and not sigmoid, I had to derive an alternate expression
                                # on a notecard. Define pi to be the probability of outputting a foreground detection.
                                # Then let z = sum(exp(x)) - exp(x_0). Finally let c be the number of foreground classes.
                                # Chugging through the math, this gives us
                                #   x_0 = log(z * (1 - pi) / pi)    where 0 is the background class
                                #   x_i = log(z / c)                for all i > 0
                                # For simplicity (and because we have a degree of freedom here), set z = 1. Then we have
                                #   x_0 = log((1 - pi) / pi)        note: don't split up the log for numerical stability
                                #   x_i = -log(c)                   for all i > 0
                                module.bias.data[0] = np.log((1 - cfg.focal_loss_init_pi) / cfg.focal_loss_init_pi)
                                module.bias.data[1:] = -np.log(module.bias.size(0) - 1)
                            else:
                                module.bias.data[0] = -np.log(cfg.focal_loss_init_pi / (1 - cfg.focal_loss_init_pi))
                                module.bias.data[1:] = -np.log((1 - cfg.focal_loss_init_pi) / cfg.focal_loss_init_pi)
                        else:
                            module.bias.data.zero_()

        def train(self, mode=True):
            super().train(mode)

            if cfg.freeze_bn:
                self.freeze_bn()

        def freeze_bn(self, enable=False):
            """ Adapted from https://discuss.pytorch.org/t/how-to-train-with-frozen-batchnorm/12106/8 """
            for module in self.modules():
                if isinstance(module, nn.BatchNorm2d):
                    module.train() if enable else module.eval()

                    module.weight.requires_grad = enable
                    module.bias.requires_grad = enable

        def forward(self, x):
            """ The input should be of size [batch_size, 3, img_h, img_w] """
            _, _, img_h, img_w = x.size()
            cfg._tmp_img_h = img_h
            cfg._tmp_img_w = img_w

            with timer.env('backbone'):
                outs = self.backbone(x)

            if cfg.fpn is not None:
                with timer.env('fpn'):
                    # Use backbone.selected_layers because we overwrote self.selected_layers
                    outs = [outs[i] for i in cfg.backbone.selected_layers]
                    outs = self.fpn(outs)

            proto_out = None
            if cfg.mask_type == mask_type.lincomb and cfg.eval_mask_branch:
                with timer.env('proto'):
                    proto_x = x if self.proto_src is None else outs[self.proto_src]

                    if self.num_grids > 0:
                        grids = self.grid.repeat(proto_x.size(0), 1, 1, 1)
                        proto_x = torch.cat([proto_x, grids], dim=1)

                    proto_out = self.proto_net(proto_x)
                    proto_out = cfg.mask_proto_prototype_activation(proto_out)

                    if cfg.mask_proto_prototypes_as_features:
                        # Clone here because we don't want to permute this, though idk if contiguous makes this unnecessary
                        proto_downsampled = proto_out.clone()

                        if cfg.mask_proto_prototypes_as_features_no_grad:
                            proto_downsampled = proto_out.detach()

                    # Move the features last so the multiplication is easy
                    proto_out = proto_out.permute(0, 2, 3, 1).contiguous()

                    if cfg.mask_proto_bias:
                        bias_shape = [s for s in proto_out.size()]
                        bias_shape[-1] = 1
                        proto_out = torch.cat([proto_out, torch.ones(*bias_shape)], -1)

            with timer.env('pred_heads'):
                pred_outs = {'loc': [], 'conf': [], 'mask': [], 'priors': []}

                if cfg.use_mask_scoring:
                    pred_outs['score'] = []

                if cfg.use_instance_coeff:
                    pred_outs['inst'] = []

                for idx, pred_layer in zip(self.selected_layers, self.prediction_layers):
                    pred_x = outs[idx]

                    if cfg.mask_type == mask_type.lincomb and cfg.mask_proto_prototypes_as_features:
                        # Scale the prototypes down to the current prediction layer's size and add them as inputs
                        proto_downsampled = F.interpolate(proto_downsampled, size=outs[idx].size()[2:],
                                                          mode='bilinear', align_corners=False)
                        pred_x = torch.cat([pred_x, proto_downsampled], dim=1)

                    # A hack for the way dataparallel works
                    if cfg.share_prediction_module and pred_layer is not self.prediction_layers[0]:
                        pred_layer.parent = [self.prediction_layers[0]]

                    p = pred_layer(pred_x)

                    for k, v in p.items():
                        pred_outs[k].append(v)

            for k, v in pred_outs.items():
                pred_outs[k] = torch.cat(v, -2)

            if proto_out is not None:
                pred_outs['proto'] = proto_out

            if self.training:
                # For the extra loss functions
                if cfg.use_class_existence_loss:
                    pred_outs['classes'] = self.class_existence_fc(outs[-1].mean(dim=(2, 3)))

                if cfg.use_semantic_segmentation_loss:
                    pred_outs['segm'] = self.semantic_seg_conv(outs[0])

                return pred_outs
            else:
                if cfg.use_mask_scoring:
                    pred_outs['score'] = torch.sigmoid(pred_outs['score'])

                if cfg.use_focal_loss:
                    if cfg.use_sigmoid_focal_loss:
                        # Note: even though conf[0] exists, this mode doesn't train it so don't use it
                        pred_outs['conf'] = torch.sigmoid(pred_outs['conf'])
                        if cfg.use_mask_scoring:
                            pred_outs['conf'] *= pred_outs['score']
                    elif cfg.use_objectness_score:
                        # See focal_loss_sigmoid in multibox_loss.py for details
                        objectness = torch.sigmoid(pred_outs['conf'][:, :, 0])
                        pred_outs['conf'][:, :, 1:] = objectness[:, :, None] * F.softmax(pred_outs['conf'][:, :, 1:], -1)
                        pred_outs['conf'][:, :, 0] = 1 - objectness
                    else:
                        pred_outs['conf'] = F.softmax(pred_outs['conf'], -1)
                else:
                    if cfg.use_objectness_score:
                        objectness = torch.sigmoid(pred_outs['conf'][:, :, 0])
                        pred_outs['conf'][:, :, 1:] = (objectness > 0.10)[..., None] \
                            * F.softmax(pred_outs['conf'][:, :, 1:], dim=-1)
                    else:
                        pred_outs['conf'] = F.softmax(pred_outs['conf'], -1)

                return self.detect(pred_outs, self)
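
    For orientation, a minimal (hypothetical) sketch of how eval.py drives this class: select a config, load released weights, and run one forward pass in eval mode. The config and weight-file names below are the repo's published ones, but treat the snippet as an assumption rather than verbatim repo code.

    import torch
    from data import set_cfg      # configs are defined in data/config.py
    from yolact import Yolact

    set_cfg('yolact_base_config')
    net = Yolact()
    net.load_weights('weights/yolact_base_54_800000.pth')  # released weights
    net.eval()

    with torch.no_grad():
        # yolact_base expects 550x550 inputs; in eval mode forward()
        # returns self.detect(...), i.e. post-NMS detections.
        preds = net(torch.zeros(1, 3, 550, 550))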


  • Original post: https://blog.csdn.net/qq_38308388/article/details/132896153