paper: mixup: Beyond Empirical Risk Minimization
One way to address this problem is Vicinal Risk Minimization (VRM): data augmentation constructs additional samples in the vicinity of the original ones. However, the augmentation relies on human knowledge to describe the vicinity of each training sample (e.g., flipping, scaling), so VRM also has two shortcomings.
To address these issues, the paper proposes mixup, a data-agnostic data augmentation method:

\[
\tilde{x} = \lambda x_i + (1 - \lambda)x_j, \qquad
\tilde{y} = \lambda y_i + (1 - \lambda)y_j
\]

where \((x_i, y_i)\) and \((x_j, y_j)\) are two samples drawn at random from the training data (\(y\) is a one-hot label), and \(\lambda \sim \mathrm{Beta}(\alpha, \alpha)\), \(\lambda \in [0, 1]\).
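The two interpolation equations can be sketched on toy data with NumPy (α = 1 here; `x_tilde` and `y_tilde` stand for the paper's \(\tilde{x}\) and \(\tilde{y}\)):

```python
import numpy as np

rng = np.random.default_rng(0)

x_i, x_j = rng.random(5), rng.random(5)   # two raw input vectors
y_i = np.array([1.0, 0.0])                # one-hot labels
y_j = np.array([0.0, 1.0])

lam = rng.beta(1.0, 1.0)                  # lambda ~ Beta(alpha, alpha), alpha = 1

x_tilde = lam * x_i + (1 - lam) * x_j     # mixed input
y_tilde = lam * y_i + (1 - lam) * y_j     # mixed (soft) label
```

Because the label mix is convex, `y_tilde` is still a valid probability distribution over classes.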
In addition, the authors report several conclusions from their ablation experiments:

- interpolating only between inputs with the same label does not give the gains of mixup;
- convex combinations of three or more samples (with weights sampled from a Dirichlet distribution) provide no further gain but increase computation cost;
- sampling mixup pairs from the same minibatch (after shuffling) works as well as using two separate data loaders.
The torchvision implementation below uses roll to shift the images in a batch backward by one position, then mixes the rolled batch with the original; that is, every image in the batch is mixed with its neighbor. See torch.roll() for details.
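The pairing effect of rolling by one can be seen on a tiny stand-in batch:

```python
import torch

x = torch.arange(4)      # stand-in batch indices: [0, 1, 2, 3]
rolled = x.roll(1, 0)    # shift by one along dim 0 -> [3, 0, 1, 2]
# after the roll, image i lines up with image i-1 (and image 0 with the last one),
# so mixing x with rolled mixes each image with its neighbor in the batch
```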
```python
from typing import Tuple

import torch
from torch import Tensor


class RandomMixup(torch.nn.Module):
    """Randomly apply Mixup to the provided batch and targets.
    The class implements the data augmentations as described in the paper
    `"mixup: Beyond Empirical Risk Minimization" <https://arxiv.org/abs/1710.09412>`_.

    Args:
        num_classes (int): number of classes used for one-hot encoding.
        p (float): probability of the batch being transformed. Default value is 0.5.
        alpha (float): hyperparameter of the Beta distribution used for mixup.
            Default value is 1.0.
        inplace (bool): boolean to make this transform inplace. Default set to False.
    """

    def __init__(self, num_classes: int, p: float = 0.5, alpha: float = 1.0, inplace: bool = False) -> None:
        super().__init__()

        if num_classes < 1:
            raise ValueError(
                f"Please provide a valid positive value for the num_classes. Got num_classes={num_classes}"
            )

        if alpha <= 0:
            raise ValueError("Alpha param can't be zero.")

        self.num_classes = num_classes
        self.p = p
        self.alpha = alpha
        self.inplace = inplace

    def forward(self, batch: Tensor, target: Tensor) -> Tuple[Tensor, Tensor]:
        """
        Args:
            batch (Tensor): Float tensor of size (B, C, H, W)
            target (Tensor): Integer tensor of size (B, )
        Returns:
            Tensor: Randomly transformed batch.
        """
        if batch.ndim != 4:
            raise ValueError(f"Batch ndim should be 4. Got {batch.ndim}")
        if target.ndim != 1:
            raise ValueError(f"Target ndim should be 1. Got {target.ndim}")
        if not batch.is_floating_point():
            raise TypeError(f"Batch dtype should be a float tensor. Got {batch.dtype}.")
        if target.dtype != torch.int64:
            raise TypeError(f"Target dtype should be torch.int64. Got {target.dtype}")

        if not self.inplace:
            batch = batch.clone()
            target = target.clone()

        if target.ndim == 1:
            target = torch.nn.functional.one_hot(target, num_classes=self.num_classes).to(dtype=batch.dtype)

        if torch.rand(1).item() >= self.p:
            return batch, target

        # It's faster to roll the batch by one instead of shuffling it to create image pairs
        batch_rolled = batch.roll(1, 0)
        target_rolled = target.roll(1, 0)

        # Implemented as on mixup paper, page 3.
        lambda_param = float(torch._sample_dirichlet(torch.tensor([self.alpha, self.alpha]))[0])
        batch_rolled.mul_(1.0 - lambda_param)
        batch.mul_(lambda_param).add_(batch_rolled)

        target_rolled.mul_(1.0 - lambda_param)
        target.mul_(lambda_param).add_(target_rolled)

        return batch, target

    def __repr__(self) -> str:
        s = (
            f"{self.__class__.__name__}("
            f"num_classes={self.num_classes}"
            f", p={self.p}"
            f", alpha={self.alpha}"
            f", inplace={self.inplace}"
            f")"
        )
        return s
```
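The same roll-based pairing can be condensed into a standalone functional sketch (a hypothetical `mixup_batch` helper, not part of torchvision; λ is drawn via `torch.distributions.Beta` rather than the private `torch._sample_dirichlet`):

```python
import torch


def mixup_batch(batch: torch.Tensor, target: torch.Tensor,
                num_classes: int, alpha: float = 1.0):
    """Mix each image/label with its neighbor in the batch (roll-by-one pairing)."""
    target = torch.nn.functional.one_hot(target, num_classes=num_classes).float()
    lam = torch.distributions.Beta(alpha, alpha).sample().item()  # lambda ~ Beta(alpha, alpha)
    mixed_batch = lam * batch + (1.0 - lam) * batch.roll(1, 0)
    mixed_target = lam * target + (1.0 - lam) * target.roll(1, 0)
    return mixed_batch, mixed_target


images = torch.rand(4, 3, 8, 8)
labels = torch.tensor([0, 1, 2, 3])
mixed, soft = mixup_batch(images, labels, num_classes=4)
```

Each row of `soft` is a convex combination of two one-hot vectors, so it still sums to 1.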
The mmclassification implementation below instead shuffles the batch with randperm and mixes it with the original batch, and its way of sampling \(\lambda\) also differs from torchvision's.
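The two \(\lambda\) samplers are different code paths but the same distribution: torchvision draws a 2-component Dirichlet and keeps the first component (which is distributed as \(\mathrm{Beta}(\alpha, \alpha)\)), while mmclassification calls NumPy's Beta sampler directly:

```python
import numpy as np
import torch

alpha = 1.0

# torchvision: first component of Dirichlet([alpha, alpha]) ~ Beta(alpha, alpha)
# (torch._sample_dirichlet is a private helper used in the code above)
lam_tv = float(torch._sample_dirichlet(torch.tensor([alpha, alpha]))[0])

# mmclassification: draw from Beta(alpha, alpha) directly
lam_mm = np.random.beta(alpha, alpha)
```

Both samples lie in \([0, 1]\) and follow \(\mathrm{Beta}(\alpha, \alpha)\).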
```python
import numpy as np
import torch

# NOTE: BaseMixupLayer and one_hot_encoding are defined elsewhere in the
# mmclassification codebase (mmcls/models/utils/).


class BatchMixupLayer(BaseMixupLayer):
    r"""Mixup layer for a batch of data.

    Mixup is a method to reduce the memorization of corrupt labels and
    increase the robustness to adversarial examples. It's proposed in
    `mixup: Beyond Empirical Risk Minimization <https://arxiv.org/abs/1710.09412>`_.

    This method simply linearly mixes pairs of data and their labels.

    Args:
        alpha (float): Parameters for Beta distribution to generate the
            mixing ratio. It should be a positive number. More details
            are in the note.
        num_classes (int): The number of classes.
        prob (float): The probability to execute mixup. It should be in
            range [0, 1]. Defaults to 1.0.

    Note:
        The :math:`\alpha` (``alpha``) determines a random distribution
        :math:`Beta(\alpha, \alpha)`. For each batch of data, we sample
        a mixing ratio (marked as :math:`\lambda`, ``lam``) from the random
        distribution.
    """

    def __init__(self, *args, **kwargs):
        super(BatchMixupLayer, self).__init__(*args, **kwargs)

    def mixup(self, img, gt_label):
        one_hot_gt_label = one_hot_encoding(gt_label, self.num_classes)
        lam = np.random.beta(self.alpha, self.alpha)
        batch_size = img.size(0)
        index = torch.randperm(batch_size)

        mixed_img = lam * img + (1 - lam) * img[index, :]
        mixed_gt_label = lam * one_hot_gt_label + (
            1 - lam) * one_hot_gt_label[index, :]

        return mixed_img, mixed_gt_label

    def __call__(self, img, gt_label):
        return self.mixup(img, gt_label)
```
In Bag of Freebies for Training Object Detection Neural Networks, mixing two images simply merges all the gt boxes from both images; the class labels are not mixed. Instead, the paper notes that "weighted loss indicates the overall loss is the summation of multiple objects with ratio 0 to 1 according to image blending ratio they belong to in the original training images", i.e., when computing the loss, each object's loss is weighted by the blending coefficient of its source image and then summed.
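That detection-style mixup can be sketched as follows (hypothetical helper names; the per-box weighting follows the quoted description rather than any specific codebase, and the Beta parameter is just an illustrative default):

```python
import torch


def mixup_detection(img_a, boxes_a, img_b, boxes_b, alpha: float = 1.5):
    """Blend two images, keep all gt boxes, and record a per-box loss weight."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    mixed_img = lam * img_a + (1.0 - lam) * img_b
    boxes = torch.cat([boxes_a, boxes_b], dim=0)  # gt boxes merged, labels NOT mixed
    # each object's loss is later multiplied by the blending ratio of its source image
    weights = torch.cat([
        torch.full((boxes_a.size(0),), lam),
        torch.full((boxes_b.size(0),), 1.0 - lam),
    ])
    return mixed_img, boxes, weights


img_a, img_b = torch.rand(3, 16, 16), torch.rand(3, 16, 16)
boxes_a = torch.tensor([[0., 0., 8., 8.]])
boxes_b = torch.tensor([[4., 4., 12., 12.], [1., 1., 5., 5.]])
img, boxes, w = mixup_detection(img_a, boxes_a, img_b, boxes_b)
```

The final detection loss would then be `sum(w[i] * per_object_loss[i])`, so objects from the fainter image contribute proportionally less.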
