Margin Based Loss

17-ICCV-Sampling Matters in Deep Embedding Learning

Preliminaries

contrastive loss

triplet loss

hard negative mining

semi-hard negative mining

Distance weighted sampling

Margin based loss

Relationship to isotonic regression

17-ICCV-Sampling Matters in Deep Embedding Learning

Preliminaries

contrastive loss

Positive pairs are pulled as close together as possible; negative pairs are pushed apart by a fixed distance α.

Drawback: because the margin α is the same for every class, visually diverse classes are embedded in the same small space as visually similar ones. The embedding space does not allow for distortions.
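A minimal sketch of the contrastive loss described above, assuming pairwise Euclidean distances are already computed. The function name, the vectorized form, and the default `alpha=1.0` are illustrative choices, not taken from the notes.

```python
import numpy as np

def contrastive_loss(dists, is_positive, alpha=1.0):
    """Per-pair contrastive loss.

    dists:       Euclidean distances D_ij between the two embeddings of each pair.
    is_positive: True where the pair shares a class label.
    alpha:       fixed margin that negatives must exceed (assumed value).
    """
    dists = np.asarray(dists, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    pos_term = dists ** 2                           # pull positives together
    neg_term = np.maximum(0.0, alpha - dists) ** 2  # push negatives beyond alpha
    return np.where(is_positive, pos_term, neg_term)

losses = contrastive_loss([0.5, 0.5, 1.5], [True, False, False], alpha=1.0)
print(losses)
```

A negative pair that is already farther apart than `alpha` contributes nothing, which is why a single fixed `alpha` has to fit every class at once.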

triplet loss

loss+sampling strategy

Allows the embedding space to be arbitrarily distorted.
Does not impose a constant margin α between all similar and dissimilar pairs.
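The triplet loss can be sketched as follows, assuming the anchor-positive and anchor-negative distances are given. The `alpha` here is only a relative margin between the two distances of one triplet, not an absolute threshold on all pairs; the function name, the default value and the `squared` switch are assumptions for illustration.

```python
import numpy as np

def triplet_loss(d_ap, d_an, alpha=0.2, squared=True):
    """Hinge on the gap between anchor-positive and anchor-negative distance.

    d_ap, d_an: Euclidean distances for each (a, p, n) triplet.
    alpha:      relative margin (assumed value).
    squared:    use squared distances, as in the classic formulation.
    """
    d_ap = np.asarray(d_ap, dtype=float)
    d_an = np.asarray(d_an, dtype=float)
    if squared:
        d_ap, d_an = d_ap ** 2, d_an ** 2
    return np.maximum(0.0, d_ap - d_an + alpha)

satisfied = triplet_loss(0.5, 1.0)   # negative is far enough: loss is zero
violated = triplet_loss(1.0, 0.5)    # negative is closer than the positive
print(satisfied, violated)
```

Only the ordering of the two distances matters, so the absolute scale of a class in the embedding space is left free.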

hard negative mining

-> model collapse

semi-hard negative mining

online selection: one triplet is sampled for every (a, p) pair

offline selection: a batch has 1/3 of its images as anchors, positives, and negatives respectively
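A hedged sketch of online semi-hard selection for a single (a, p) pair, assuming the rule "closest negative that is still farther than the positive". The function name, the `None` fallback when no such negative exists, and the toy distances are assumptions, not part of the notes.

```python
import numpy as np

def semi_hard_negative(dist_row, labels, anchor_idx, pos_idx):
    """Pick one negative for the pair (anchor_idx, pos_idx).

    dist_row: distances from the anchor to every sample in the batch.
    labels:   class label of every sample in the batch.
    Returns the index of the closest negative with D_an > D_ap, or None.
    """
    dist_row = np.asarray(dist_row, dtype=float)
    labels = np.asarray(labels)
    d_ap = dist_row[pos_idx]
    is_negative = labels != labels[anchor_idx]
    candidates = np.flatnonzero(is_negative & (dist_row > d_ap))
    if candidates.size == 0:
        return None  # assumed fallback: the caller skips this pair
    return int(candidates[np.argmin(dist_row[candidates])])

labels = [0, 0, 1, 1, 1]
dist_row = [0.0, 0.6, 0.3, 0.7, 1.2]   # distances from anchor 0
chosen = semi_hard_negative(dist_row, labels, anchor_idx=0, pos_idx=1)
print(chosen)
```

In the toy batch the negative at distance 0.3 is skipped because it is harder than the positive, and the one at 0.7 is selected.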

With the right sampling strategy, even a simple pairwise loss is effective.

Distance weighted sampling

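Distribution of pairwise distances on the n-dimensional unit sphere: fix a point a on the sphere and pick another point on the sphere at random; q(d) is the probability that the distance between this point and a equals d (cosine distance? distance along the sphere?).

In high-dimensional space, q(d) approaches a normal distribution.

If the negatives are scattered uniformly, random sampling is most likely to return samples at a distance of about √2. For thresholds below √2 these samples produce no loss, so training makes no progress.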
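The concentration of distances can be checked numerically. The sketch below assumes Euclidean (chord) distance and compares the empirical mean and standard deviation with a normal approximation N(√2, 1/(2n)), which is the limit stated in the paper as far as these notes can tell. The dimension 128, the sample count and the function name are arbitrary choices.

```python
import numpy as np

def sphere_distances_to_anchor(n_dim, n_points, seed=0):
    """Euclidean distances from one fixed point to random points on the unit sphere."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_points, n_dim))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # normalized Gaussians are uniform on the sphere
    anchor = x[0]
    return np.linalg.norm(x[1:] - anchor, axis=1)

n_dim = 128
dists = sphere_distances_to_anchor(n_dim, n_points=20000)
print("mean:", dists.mean(), "expected about", np.sqrt(2.0))
print("std: ", dists.std(), "expected about", np.sqrt(1.0 / (2 * n_dim)))
# Fraction of random negatives that would violate a threshold of 1.0
print("fraction below 1.0:", (dists < 1.0).mean())
```

Almost no randomly drawn negative falls below a threshold well under √2, which illustrates why uniform random sampling yields so few informative examples.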

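Gradient with respect to the negative example

The normalized difference between the anchor and negative embeddings, h_an / ||h_an|| with h_an = f(x_a) - f(x_n), determines the gradient direction. If ||h_an|| is very small (a hard example) and the embeddings carry noise z, the gradient direction is dominated by the noise.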
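A small simulation of this effect, assuming isotropic Gaussian noise on the anchor-negative difference. The noise level, dimension and function name are hypothetical; the point is only to compare a very close negative with one at a typical distance.

```python
import numpy as np

def mean_direction_cosine(h_norm, dim=64, noise_std=0.05, trials=1000, seed=0):
    """Average cosine between the clean direction h/||h|| and the noisy (h+z)/||h+z||."""
    rng = np.random.default_rng(seed)
    h = np.zeros(dim)
    h[0] = h_norm                                       # clean difference of length h_norm
    z = rng.normal(0.0, noise_std, size=(trials, dim))  # assumed isotropic noise
    noisy = h + z
    # The clean direction is the unit vector e_0, so the cosine is the first
    # coordinate of the normalized noisy vector.
    cos = noisy[:, 0] / np.linalg.norm(noisy, axis=1)
    return float(cos.mean())

cos_hard = mean_direction_cosine(h_norm=0.05)  # hard negative, very close to the anchor
cos_far = mean_direction_cosine(h_norm=1.4)    # negative at a typical distance
print(cos_hard, cos_far)
```

With these assumed settings the noisy direction for the close negative is nearly uncorrelated with the true one, while the distant negative keeps its direction almost intact.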

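Probability that negative n is selected when the anchor is a: Pr(n* = n | a) ∝ min(λ, q^{-1}(D_an))

Sample uniformly with respect to distance: the weight is the inverse of q(d), so the sampling probability is inversely proportional to how often a distance occurs (the more likely a sample is under random sampling, the smaller its weight has to be for the result to be uniform). [This keeps the samples from all clustering at the most probable distance.]
Clip the sampling weight with λ. [Samples that are too close or too far are rarely drawn at random, so their weights are large. To avoid noisy samples, λ caps the weights at both ends.]

Distance weighted sampling covers a wide range of distances and steadily produces informative examples while keeping the variance under control.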
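A minimal NumPy sketch of the sampling rule with the λ clip, working in the log domain. Since q(d) is only known up to a constant, the weights are rescaled here so that the most probable negative has weight 1 and λ acts as a cap on the ratio to it; that normalization, the function name and `lam=100.0` are assumptions of this sketch.

```python
import numpy as np

def distance_weighted_probs(dists, is_negative, dim, lam=100.0, eps=1e-8):
    """Sampling probabilities proportional to min(lam, 1 / q(d)) over the negatives."""
    dists = np.asarray(dists, dtype=float)
    is_negative = np.asarray(is_negative, dtype=bool)
    if not is_negative.any():
        raise ValueError("no negatives available for this anchor")
    d = np.clip(dists, eps, 2.0 - eps)  # keep both logarithms finite
    # log q(d) for the unit sphere, up to an additive constant
    log_q = (dim - 2.0) * np.log(d) + (dim - 3.0) / 2.0 * np.log(1.0 - 0.25 * d ** 2)
    log_w = -log_q                                # inverse-density weight
    log_w = log_w - log_w[is_negative].min()      # most probable negative gets weight 1
    log_w = np.minimum(log_w, np.log(lam))        # clip at lambda before exponentiating
    w = np.where(is_negative, np.exp(log_w), 0.0)
    return w / w.sum()

dists = [0.0, 0.4, 1.0, 1.4, 1.41]               # distances from the anchor
is_negative = [False, False, True, True, True]
probs = distance_weighted_probs(dists, is_negative, dim=128)
rng = np.random.default_rng(0)
picked = rng.choice(len(probs), p=probs)          # index of the sampled negative
print(probs, picked)
```

The rare close negative at distance 1.0 receives the largest probability, but the clip limits it to λ times the weight of the most common distance.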


import numpy as np
import torch

# Method of a sampler class; `self` is only needed by the commented-out cutoff below.
def inverse_sphere_distances(self, batch, anchor_to_all_dists, labels, anchor_label):
    dists = anchor_to_all_dists
    bs, dim = len(dists), batch.shape[-1]

    # Negated log-distribution of distances on the unit sphere in dimension <dim>,
    # i.e. log(1 / q(d)) up to a constant.
    log_q_d_inv = ((2.0 - float(dim)) * torch.log(dists)
                   - (float(dim - 3) / 2) * torch.log(1.0 - 0.25 * (dists.pow(2))))
    # Same-class samples (including the anchor itself, where log(0) diverges) are not negatives.
    log_q_d_inv[np.where(labels == anchor_label)[0]] = 0

    q_d_inv = torch.exp(log_q_d_inv - torch.max(log_q_d_inv))  # subtract max(log) for stability
    q_d_inv[np.where(labels == anchor_label)[0]] = 0

    ### NOTE: Cutting off values with high distances made the results slightly worse. It can also lead to
    # errors where there are no available negatives (for high samples_per_class cases).
    # q_d_inv[np.where(dists.detach().cpu().numpy()>self.upper_cutoff)[0]]    = 0

    # Normalize into a sampling distribution over the batch.
    q_d_inv = q_d_inv / q_d_inv.sum()
    return q_d_inv.detach().cpu().numpy()

[Actual implementation: the weights are computed in the log domain; leaving out the λ cutoff may work better.]

Margin based loss

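Triplet loss has no preset threshold α separating similar from dissimilar images. It can distort the space flexibly to tolerate outliers and adapt to the different levels of intra-class variance of different classes.
Triplet loss only requires positives to be closer to the anchor than negatives. Pulling all positives as close together as possible, as contrastive loss does, is unnecessary (fine-grained recognition allows intra-class differences, and image retrieval only needs relative relationships).
Hard negative mining easily leads to model collapse: hard positive pairs produce large attractive gradients, hard negative pairs produce small repulsive gradients, and all points end up at the same location (4b).
Using the true distance instead of the squared distance makes the gradient with respect to every embedding have length 1 (4c).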
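These points can be sketched in code, assuming the margin based loss has the form (α + y (D - β))_+ with y = +1 for positive pairs and y = -1 for negative pairs, where β is the boundary between them. The values `alpha=0.2` and `beta=1.2` and both function names are illustrative assumptions. The second function compares gradient lengths for the squared and the true distance.

```python
import numpy as np

def margin_loss(dists, is_positive, alpha=0.2, beta=1.2):
    """Margin based loss on true (non-squared) distances.

    Positives are penalized when D > beta - alpha,
    negatives when D < beta + alpha.
    """
    dists = np.asarray(dists, dtype=float)
    y = np.where(np.asarray(is_positive, dtype=bool), 1.0, -1.0)
    return np.maximum(0.0, alpha + y * (dists - beta))

def distance_grad_norm(a, n, squared):
    """Length of the gradient of the distance term with respect to embedding n."""
    h = np.asarray(a, dtype=float) - np.asarray(n, dtype=float)
    if squared:
        return float(np.linalg.norm(2.0 * h))             # grows with ||a - n||
    return float(np.linalg.norm(h / np.linalg.norm(h)))   # always 1

losses = margin_loss([0.9, 1.3, 1.1, 1.6], [True, True, False, False])
print(losses)

a, n = [0.0, 0.0], [0.03, 0.04]    # hard negative at distance 0.05
print(distance_grad_norm(a, n, squared=True), distance_grad_norm(a, n, squared=False))
```

For the hard negative, the squared distance gives a repulsive gradient of length 0.1, while the true distance keeps it at 1, which matches the collapse argument in the notes.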