• Mosaic data augmentation


    Paper: YOLOv4: Optimal Speed and Accuracy of Object Detection

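    Mosaic data augmentation was first proposed in the YOLOv4 paper, although ultralytics-YOLOv3 had in fact already implemented it. It stitches 4 samples into a single image, which has the following advantages: (1) it enriches the information contained in one image; (2) each augmented image carries the information of four images, which reduces the dependence on a large batch_size; (3) small objects are usually detected less well than large ones, and placing four images into one effectively increases the number of small-object samples in the dataset.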

    The YOLOv4 paper gives several example images of mosaic augmentation.

    The rest of this post walks through the implementation in mmdetection.

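    In mmdet, Mosaic has to be used together with MultiImageMixDataset. The results dict originally holds the information of a single image, including img, gt_bboxes, gt_labels and so on. MultiImageMixDataset calls the get_indexes method of the Mosaic class to randomly pick the indexes of three other images. The information of these 3 images is then put into a list and added to the original results as the value of the key 'mix_results', so that results now contains the information of 4 images.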
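
    The hand-off described above can be illustrated with a minimal, standard-library-only sketch. ToyMosaic and ToyMixDataset are hypothetical stand-ins written for this illustration; they are not the mmdet classes and only mimic how get_indexes and the 'mix_results' key fit together.

```python
import copy
import random


class ToyMosaic:
    """Hypothetical stand-in for a mix transform; not the mmdet class."""

    def get_indexes(self, dataset):
        # Pick three extra sample indexes, each in [0, len(dataset) - 1].
        return [random.randrange(len(dataset)) for _ in range(3)]

    def __call__(self, results):
        # A real Mosaic would merge the four images here.
        assert 'mix_results' in results
        return results


class ToyMixDataset:
    """Hypothetical, simplified wrapper in the spirit of MultiImageMixDataset."""

    def __init__(self, dataset, transform):
        self.dataset = dataset      # list of per-image result dicts
        self.transform = transform

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        results = copy.deepcopy(self.dataset[idx])
        # Ask the transform which other samples it needs ...
        indexes = self.transform.get_indexes(self.dataset)
        # ... and attach them under the 'mix_results' key.
        results['mix_results'] = [
            copy.deepcopy(self.dataset[i]) for i in indexes
        ]
        return self.transform(results)


samples = [
    {'img': f'image_{i}', 'gt_bboxes': [], 'gt_labels': []}
    for i in range(10)
]
mixed = ToyMixDataset(samples, ToyMosaic())[0]
print(mixed['img'], len(mixed['mix_results']))  # image_0 3
```

    The point of the design is that the transform itself never touches the dataset: the wrapper fetches the extra samples, and the transform only sees one results dict with the other three attached.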

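    The mosaic itself is implemented in the function _mosaic_transform. The steps are:

    1. Create an empty image mosaic_img twice the size of img_scale, filled with pad_val (114 by default). img_scale is the model input size set in the config file.
    2. Choose the point where the four images meet, which splits the empty image into four blocks: top-left, top-right, bottom-right and bottom-left.
    3. Resize each original image to the model input size img_scale while keeping its aspect ratio.
    4. Paste the images into mosaic_img so that the four images meet at the center point; any part that falls outside mosaic_img is cropped.
    5. Adjust the gt_bboxes coordinates of each sub-image.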
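
    Steps 1 to 3 come down to a few lines of arithmetic. The sketch below uses plain Python only; mosaic_geometry is a hypothetical helper name chosen for this illustration, and the 320x1280 source image is an assumed example input.

```python
import random


def mosaic_geometry(img_scale, center_ratio_range, img_hw, rng=random):
    """Return canvas size, mosaic center, resize ratio and resized size.

    img_scale is (height, width); img_hw is the (height, width) of one
    source image.
    """
    scale_h, scale_w = img_scale
    # Step 1: the canvas is twice the model input size.
    canvas_hw = (2 * scale_h, 2 * scale_w)
    # Step 2: the four images meet at a randomly sampled center.
    center_x = int(rng.uniform(*center_ratio_range) * scale_w)
    center_y = int(rng.uniform(*center_ratio_range) * scale_h)
    # Step 3: keep-ratio resize so the image fits inside img_scale.
    h, w = img_hw
    ratio = min(scale_h / h, scale_w / w)
    resized_hw = (int(h * ratio), int(w * ratio))
    return canvas_hw, (center_x, center_y), ratio, resized_hw


canvas_hw, center_xy, ratio, resized_hw = mosaic_geometry(
    img_scale=(640, 640), center_ratio_range=(0.5, 1.5), img_hw=(320, 1280))
print(canvas_hw)   # (1280, 1280)
print(ratio)       # 0.5
print(resized_hw)  # (160, 640)
print(center_xy)   # random, each coordinate between 320 and 960
```

    With center_ratio_range=(0.5, 1.5) the center always lies in the middle half of the canvas, so every quadrant receives at least part of an image.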

    The complete code is as follows:

    # Imports assumed from mmdet 2.x's transforms.py, where this class lives;
    # the exact module paths may differ between mmdet versions.
    import copy

    import mmcv
    import numpy as np
    # numpy's random module: randint(low, high) excludes `high`.
    from numpy import random

    from mmdet.core import find_inside_bboxes
    from mmdet.utils import log_img_scale


    class Mosaic:
        """Mosaic augmentation.

        Given 4 images, mosaic transform combines them into
        one output image. The output image is composed of the parts from each
        sub-image.

        .. code:: text

                            mosaic transform
                               center_x
                    +------------------------------+
                    |       pad        |  pad      |
                    |      +-----------+           |
                    |      |           |           |
                    |      |  image1   |--------+  |
                    |      |           |        |  |
                    |      |           | image2 |  |
         center_y   |----+-------------+-----------|
                    |    |   cropped   |           |
                    |pad |   image3    |  image4   |
                    |    |             |           |
                    +----|-------------+-----------+
                         |             |
                         +-------------+

         The mosaic transform steps are as follows:

             1. Choose the mosaic center as the intersections of 4 images
             2. Get the left top image according to the index, and randomly
                sample another 3 images from the custom dataset.
             3. Sub image will be cropped if image is larger than mosaic patch

        Args:
            img_scale (Sequence[int]): Image size after mosaic pipeline of
                single image. The shape order should be (height, width).
                Default to (640, 640).
            center_ratio_range (Sequence[float]): Center ratio range of mosaic
                output. Default to (0.5, 1.5).
            min_bbox_size (int | float): The minimum pixel for filtering
                invalid bboxes after the mosaic pipeline. Default to 0.
            bbox_clip_border (bool, optional): Whether to clip the objects
                outside the border of the image. In some dataset like MOT17,
                the gt bboxes are allowed to cross the border of images.
                Therefore, we don't need to clip the gt bboxes in these cases.
                Defaults to True.
            skip_filter (bool): Whether to skip filtering rules. If it
                is True, the filter rule will not be applied, and the
                `min_bbox_size` is invalid. Default to True.
            pad_val (int): Pad value. Default to 114.
            prob (float): Probability of applying this transformation.
                Default to 1.0.
        """

        def __init__(self,
                     img_scale=(640, 640),
                     center_ratio_range=(0.5, 1.5),
                     min_bbox_size=0,
                     bbox_clip_border=True,
                     skip_filter=True,
                     pad_val=114,
                     prob=1.0):
            assert isinstance(img_scale, tuple)
            assert 0 <= prob <= 1.0, 'The probability should be in range [0,1]. '\
                f'got {prob}.'

            log_img_scale(img_scale, skip_square=True)
            self.img_scale = img_scale
            self.center_ratio_range = center_ratio_range
            self.min_bbox_size = min_bbox_size
            self.bbox_clip_border = bbox_clip_border
            self.skip_filter = skip_filter
            self.pad_val = pad_val
            self.prob = prob

        def __call__(self, results):
            """Call function to make a mosaic of image.

            Args:
                results (dict): Result dict.

            Returns:
                dict: Result dict with mosaic transformed.
            """
            # Skip the transform with probability 1 - prob.
            if random.uniform(0, 1) > self.prob:
                return results

            results = self._mosaic_transform(results)
            return results

        def get_indexes(self, dataset):
            """Call function to collect indexes.

            Args:
                dataset (:obj:`MultiImageMixDataset`): The dataset.

            Returns:
                list: indexes.
            """
            # With numpy's randint the upper bound len(dataset) is excluded.
            indexes = [random.randint(0, len(dataset)) for _ in range(3)]
            return indexes

        def _mosaic_transform(self, results):
            """Mosaic transform function.

            Args:
                results (dict): Result dict.

            Returns:
                dict: Updated result dict.
            """
            assert 'mix_results' in results
            mosaic_labels = []
            mosaic_bboxes = []
            # Step 1: empty canvas of twice img_scale, filled with pad_val.
            if len(results['img'].shape) == 3:
                mosaic_img = np.full(
                    (int(self.img_scale[0] * 2), int(self.img_scale[1] * 2), 3),
                    self.pad_val,
                    dtype=results['img'].dtype)
            else:
                mosaic_img = np.full(
                    (int(self.img_scale[0] * 2), int(self.img_scale[1] * 2)),
                    self.pad_val,
                    dtype=results['img'].dtype)

            # mosaic center x, y
            center_x = int(
                random.uniform(*self.center_ratio_range) * self.img_scale[1])
            center_y = int(
                random.uniform(*self.center_ratio_range) * self.img_scale[0])
            center_position = (center_x, center_y)

            loc_strs = ('top_left', 'top_right', 'bottom_left', 'bottom_right')
            for i, loc in enumerate(loc_strs):
                # The current sample goes top-left; the other three come from
                # 'mix_results'.
                if loc == 'top_left':
                    results_patch = copy.deepcopy(results)
                else:
                    results_patch = copy.deepcopy(results['mix_results'][i - 1])

                img_i = results_patch['img']
                h_i, w_i = img_i.shape[:2]
                # keep_ratio resize
                scale_ratio_i = min(self.img_scale[0] / h_i,
                                    self.img_scale[1] / w_i)
                img_i = mmcv.imresize(
                    img_i, (int(w_i * scale_ratio_i), int(h_i * scale_ratio_i)))

                # compute the combine parameters
                paste_coord, crop_coord = self._mosaic_combine(
                    loc, center_position, img_i.shape[:2][::-1])
                x1_p, y1_p, x2_p, y2_p = paste_coord
                x1_c, y1_c, x2_c, y2_c = crop_coord

                # crop and paste image
                mosaic_img[y1_p:y2_p, x1_p:x2_p] = img_i[y1_c:y2_c, x1_c:x2_c]

                # adjust coordinate: scale the boxes, then shift them by the
                # offset between the paste and crop origins
                gt_bboxes_i = results_patch['gt_bboxes']
                gt_labels_i = results_patch['gt_labels']

                if gt_bboxes_i.shape[0] > 0:
                    padw = x1_p - x1_c
                    padh = y1_p - y1_c
                    gt_bboxes_i[:, 0::2] = \
                        scale_ratio_i * gt_bboxes_i[:, 0::2] + padw
                    gt_bboxes_i[:, 1::2] = \
                        scale_ratio_i * gt_bboxes_i[:, 1::2] + padh

                mosaic_bboxes.append(gt_bboxes_i)
                mosaic_labels.append(gt_labels_i)

            if len(mosaic_labels) > 0:
                mosaic_bboxes = np.concatenate(mosaic_bboxes, 0)
                mosaic_labels = np.concatenate(mosaic_labels, 0)

                if self.bbox_clip_border:  # bbox_clip_border defaults to True
                    mosaic_bboxes[:, 0::2] = np.clip(mosaic_bboxes[:, 0::2], 0,
                                                     2 * self.img_scale[1])
                    mosaic_bboxes[:, 1::2] = np.clip(mosaic_bboxes[:, 1::2], 0,
                                                     2 * self.img_scale[0])

                # skip_filter defaults to True, so this is skipped by default
                if not self.skip_filter:
                    mosaic_bboxes, mosaic_labels = \
                        self._filter_box_candidates(mosaic_bboxes, mosaic_labels)

            # remove outside bboxes
            inside_inds = find_inside_bboxes(mosaic_bboxes, 2 * self.img_scale[0],
                                             2 * self.img_scale[1])
            mosaic_bboxes = mosaic_bboxes[inside_inds]
            mosaic_labels = mosaic_labels[inside_inds]

            results['img'] = mosaic_img
            results['img_shape'] = mosaic_img.shape
            results['gt_bboxes'] = mosaic_bboxes
            results['gt_labels'] = mosaic_labels

            return results

        def _mosaic_combine(self, loc, center_position_xy, img_shape_wh):
            """Calculate global coordinate of mosaic image and local coordinate of
            cropped sub-image.

            Args:
                loc (str): Index for the sub-image, loc in ('top_left',
                  'top_right', 'bottom_left', 'bottom_right').
                center_position_xy (Sequence[float]): Mixing center for 4 images,
                    (x, y).
                img_shape_wh (Sequence[int]): Width and height of sub-image

            Returns:
                tuple[tuple[float]]: Corresponding coordinate of pasting and
                    cropping
                    - paste_coord (tuple): paste corner coordinate in mosaic image.
                    - crop_coord (tuple): crop corner coordinate in the sub-image.
            """
            assert loc in ('top_left', 'top_right', 'bottom_left', 'bottom_right')
            if loc == 'top_left':
                # index0 to top left part of image
                x1, y1, x2, y2 = max(center_position_xy[0] - img_shape_wh[0], 0), \
                                 max(center_position_xy[1] - img_shape_wh[1], 0), \
                                 center_position_xy[0], \
                                 center_position_xy[1]
                crop_coord = img_shape_wh[0] - (x2 - x1), img_shape_wh[1] - (
                    y2 - y1), img_shape_wh[0], img_shape_wh[1]

            elif loc == 'top_right':
                # index1 to top right part of image
                x1, y1, x2, y2 = center_position_xy[0], \
                                 max(center_position_xy[1] - img_shape_wh[1], 0), \
                                 min(center_position_xy[0] + img_shape_wh[0],
                                     self.img_scale[1] * 2), \
                                 center_position_xy[1]
                crop_coord = 0, img_shape_wh[1] - (y2 - y1), min(
                    img_shape_wh[0], x2 - x1), img_shape_wh[1]

            elif loc == 'bottom_left':
                # index2 to bottom left part of image
                x1, y1, x2, y2 = max(center_position_xy[0] - img_shape_wh[0], 0), \
                                 center_position_xy[1], \
                                 center_position_xy[0], \
                                 min(self.img_scale[0] * 2, center_position_xy[1] +
                                     img_shape_wh[1])
                crop_coord = img_shape_wh[0] - (x2 - x1), 0, img_shape_wh[0], min(
                    y2 - y1, img_shape_wh[1])

            else:
                # index3 to bottom right part of image
                x1, y1, x2, y2 = center_position_xy[0], \
                                 center_position_xy[1], \
                                 min(center_position_xy[0] + img_shape_wh[0],
                                     self.img_scale[1] * 2), \
                                 min(self.img_scale[0] * 2, center_position_xy[1] +
                                     img_shape_wh[1])
                crop_coord = 0, 0, min(img_shape_wh[0],
                                       x2 - x1), min(y2 - y1, img_shape_wh[1])

            paste_coord = x1, y1, x2, y2
            return paste_coord, crop_coord

        def _filter_box_candidates(self, bboxes, labels):
            """Filter out bboxes too small after Mosaic."""
            bbox_w = bboxes[:, 2] - bboxes[:, 0]
            bbox_h = bboxes[:, 3] - bboxes[:, 1]
            valid_inds = (bbox_w > self.min_bbox_size) & \
                         (bbox_h > self.min_bbox_size)
            valid_inds = np.nonzero(valid_inds)[0]
            return bboxes[valid_inds], labels[valid_inds]

        def __repr__(self):
            repr_str = self.__class__.__name__
            repr_str += f'(img_scale={self.img_scale}, '
            repr_str += f'center_ratio_range={self.center_ratio_range}, '
            repr_str += f'pad_val={self.pad_val}, '
            repr_str += f'min_bbox_size={self.min_bbox_size}, '
            repr_str += f'skip_filter={self.skip_filter})'
            return repr_str
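
    To make the paste/crop arithmetic concrete, here is a small worked example for the top-left quadrant in plain Python. top_left_coords and shift_bbox are hypothetical helper names; they repeat the arithmetic of the 'top_left' branch of _mosaic_combine and of the bbox adjustment in _mosaic_transform, with assumed example numbers.

```python
def top_left_coords(center_xy, img_wh):
    """Paste/crop coordinates for the top-left quadrant."""
    cx, cy = center_xy
    w, h = img_wh
    # The image's bottom-right corner sits on the mosaic center; anything
    # that would start left of or above the canvas is cut off.
    x1, y1, x2, y2 = max(cx - w, 0), max(cy - h, 0), cx, cy
    paste = (x1, y1, x2, y2)
    crop = (w - (x2 - x1), h - (y2 - y1), w, h)
    return paste, crop


def shift_bbox(bbox, scale_ratio, paste, crop):
    """Map an (x1, y1, x2, y2) box from source-image to mosaic coordinates."""
    padw = paste[0] - crop[0]
    padh = paste[1] - crop[1]
    bx1, by1, bx2, by2 = bbox
    return (scale_ratio * bx1 + padw, scale_ratio * by1 + padh,
            scale_ratio * bx2 + padw, scale_ratio * by2 + padh)


# A 1280x320 (w x h) source image resized by 0.5 is 640 wide and 160 high;
# the mosaic center is assumed to be at (400, 500).
paste, crop = top_left_coords(center_xy=(400, 500), img_wh=(640, 160))
print(paste)  # (0, 340, 400, 500)
print(crop)   # (240, 0, 640, 160)
print(shift_bbox((1000, 100, 1200, 300), 0.5, paste, crop))
# (260.0, 390.0, 360.0, 490.0)
```

    Only the rightmost 400 pixels of the resized image fit left of the center, so the crop starts at x = 240, and the box is shifted by padw = -240 and padh = 340 to land inside the pasted region.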

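    The mosaic_img here is twice the size of img_scale because, in mmdet, Mosaic is meant to be combined with RandomAffine. RandomAffine applies rotation, scaling, translation and shear, and takes the parameter border=(-img_scale[0] // 2, -img_scale[1] // 2), which brings the output of the affine transform back to the size img_scale.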
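
    The size bookkeeping can be checked with a few lines of arithmetic. This is a sketch under one assumption: that RandomAffine sets its output size to the input size plus twice the border on each axis. That is how the parameter is understood here; it should be checked against the mmdet version in use.

```python
img_scale = (640, 640)  # (height, width), as in the config

# Mosaic outputs a canvas of twice img_scale.
mosaic_hw = (2 * img_scale[0], 2 * img_scale[1])

# A negative border shrinks the output on each side.
border = (-img_scale[0] // 2, -img_scale[1] // 2)

# Assumed rule: output = input + 2 * border on each axis.
out_h = mosaic_hw[0] + 2 * border[0]
out_w = mosaic_hw[1] + 2 * border[1]

print(border)          # (-320, -320)
print((out_h, out_w))  # (640, 640)
```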

  • Original article: https://blog.csdn.net/ooooocj/article/details/127949794