In Kaggle competitions, whether the task is object detection or semantic segmentation, SWA (Stochastic Weight Averaging) and EMA (Exponential Moving Average) show up again and again. Let's go through both today.
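SWA (Stochastic Weight Averaging): near the end of optimization, take k checkpoints along the optimization trajectory and average their weights to obtain the final network weights. The averaged weights sit closer to the center of a flat region of the loss surface, which dampens weight oscillation and gives a smoother solution that generalizes better than the one produced by conventional training.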
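1. Fix the hyperparameters, among them the averaging interval c.
2. Then train with the standard SGD procedure, averaging the weights once every c steps.
3. Finally, run inference with the averaged weights w_SWA.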
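The three steps above amount to a running average over periodically sampled weights. The sketch below is a minimal illustration in plain Python, with a list of floats standing in for the model weights. `swa_running_average`, `trajectory`, and `start` are hypothetical names, not part of any library, and the update w_SWA ← (w_SWA · n + w) / (n + 1) is the usual incremental-mean form, assumed here because the post does not spell it out.

```python
from typing import List, Sequence


def swa_running_average(trajectory: Sequence[Sequence[float]],
                        c: int,
                        start: int = 0) -> List[float]:
    """Average every c-th weight vector of `trajectory`, beginning at step `start`.

    `trajectory[t]` stands in for the model weights after training step t.
    """
    w_swa: List[float] = []
    n_models = 0  # number of weight snapshots averaged so far
    for step, w in enumerate(trajectory):
        if step < start or (step - start) % c != 0:
            continue
        if n_models == 0:
            w_swa = list(w)
        else:
            # Incremental mean: w_swa <- (w_swa * n + w) / (n + 1)
            w_swa = [(a * n_models + b) / (n_models + 1) for a, b in zip(w_swa, w)]
        n_models += 1
    return w_swa


# Toy trajectory: late in training the weights oscillate around (1.0, -2.0).
trajectory = [
    [0.0, 0.0],    # step 0, still far from the optimum
    [0.5, -1.0],   # step 1
    [1.5, -2.5],   # step 2  <- sampled
    [0.9, -1.9],   # step 3
    [0.5, -1.5],   # step 4  <- sampled
    [1.2, -2.2],   # step 5
    [1.0, -2.0],   # step 6  <- sampled
]
w_swa = swa_running_average(trajectory, c=2, start=2)
print(w_swa)
```

With `c=2` and `start=2` the function averages the snapshots from steps 2, 4, and 6, so the oscillation around the optimum cancels out in the result.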
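```python
import torch
import torch.nn as nn


def apply_swa(model: nn.Module,
              checkpoint_list: list,
              weight_list: list,
              strict: bool = True):
    """
    Load a weighted average of several checkpoints into `model`.

    :param model: model that receives the averaged weights
    :param checkpoint_list: paths of the checkpoints to average
    :param weight_list: weight of each checkpoint (should sum to 1)
    :param strict: whether every model parameter must be present in the checkpoints
    :return: the model with averaged parameters
    """
    assert len(checkpoint_list) == len(weight_list), "need one weight per checkpoint"

    # Each checkpoint is expected to store its state dict under the 'model' key.
    checkpoint_tensor_list = [torch.load(f, map_location='cpu') for f in checkpoint_list]

    with torch.no_grad():
        for name, param in model.named_parameters():
            try:
                averaged = sum(ckpt['model'][name] * w
                               for ckpt, w in zip(checkpoint_tensor_list, weight_list))
            except KeyError:
                if strict:
                    raise KeyError(f"Can't match '{name}' from checkpoint")
                print(f"Can't match '{name}' from checkpoint")
                continue
            # Copy in place so the parameter keeps its device and dtype.
            param.copy_(averaged)

    return model
```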
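`apply_swa` computes a weighted sum, so `weight_list` has to sum to 1 for the result to be an average. A small helper along the following lines could build that list. `make_swa_weights` and its `scores` argument are hypothetical names introduced for illustration, and weighting by a per-checkpoint score is only one possible choice, not something this post prescribes.

```python
from typing import List, Optional, Sequence


def make_swa_weights(num_checkpoints: int,
                     scores: Optional[Sequence[float]] = None) -> List[float]:
    """Return per-checkpoint weights that sum to 1.

    With no `scores`, every checkpoint gets 1 / num_checkpoints (plain averaging).
    Otherwise the non-negative scores are normalised to sum to 1.
    """
    if num_checkpoints <= 0:
        raise ValueError("num_checkpoints must be positive")
    if scores is None:
        return [1.0 / num_checkpoints] * num_checkpoints
    if len(scores) != num_checkpoints:
        raise ValueError("need exactly one score per checkpoint")
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    total = float(sum(scores))
    if total == 0.0:
        raise ValueError("at least one score must be positive")
    return [s / total for s in scores]


uniform = make_swa_weights(4)               # four equal weights
weighted = make_swa_weights(3, [1, 1, 2])   # last checkpoint counts double
print(uniform, weighted)
```

The result would be passed as the `weight_list` argument of `apply_swa`. Note that `apply_swa` iterates over `named_parameters()` only, so buffers such as BatchNorm running statistics keep whatever values the model already holds; they typically need to be recomputed on training data after averaging.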
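EMA (Exponential Moving Average): the shadow weights accumulate an exponentially weighted average of the model's past weights. Every update of the shadow weights depends on their previous value, so they carry the inertia of earlier model weights; the older a set of weights is, the less it contributes, which makes the weight trajectory smoother.
The update rule is shadow_t = decay · shadow_{t-1} + (1 − decay) · w_t, where w_t denotes the current model weights. With decay close to 1, each update of the shadow weights is determined mostly by the accumulated weights and only slightly by the current weights.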
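To make the weighting concrete, the sketch below applies the update shadow ← decay · shadow + (1 − decay) · w to a short scalar sequence and compares the result with the unrolled form, in which the value seen k updates ago carries weight (1 − decay) · decay^k. Scalars stand in for weight tensors, and `ema_recursive` and `ema_unrolled` are hypothetical helper names used only for this illustration.

```python
from typing import Sequence


def ema_recursive(values: Sequence[float], decay: float) -> float:
    """Run shadow <- decay * shadow + (1 - decay) * w over `values`.

    The shadow starts as a copy of the first value.
    """
    shadow = values[0]
    for w in values[1:]:
        shadow = decay * shadow + (1.0 - decay) * w
    return shadow


def ema_unrolled(values: Sequence[float], decay: float) -> float:
    """Same quantity written as an explicit weighted sum over the history."""
    n = len(values) - 1  # number of updates applied after initialisation
    # The initial value keeps weight decay**n ...
    total = decay ** n * values[0]
    # ... and the value seen k updates ago has weight (1 - decay) * decay**k.
    for k in range(n):
        total += (1.0 - decay) * decay ** k * values[n - k]
    return total


values = [1.0, 2.0, 3.0, 4.0]
decay = 0.9
recursive = ema_recursive(values, decay)
unrolled = ema_unrolled(values, decay)
print(recursive, unrolled)
```

Both forms agree, which shows why older weights fade geometrically. With decay = 0.999 the weight of a value from k updates ago is 0.001 · 0.999^k, so the average effectively looks back over roughly 1 / (1 − decay) = 1000 updates.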
```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader


class EMA:
    def __init__(self, model: nn.Module,
                 decay: float = 0.999):
        self.model = model
        self.decay = decay
        self.shadow = {}  # EMA copy of each trainable parameter
        self.backup = {}  # original weights saved while the shadow is applied

    def register(self):
        """Create the shadow weights."""
        for name, param in self.model.named_parameters():
            if param.requires_grad:
                self.shadow[name] = param.data.clone()

    def update(self):
        """EMA smoothing step: update the shadow weights."""
        for name, param in self.model.named_parameters():
            if param.requires_grad:
                assert name in self.shadow
                new_average = (1.0 - self.decay) * param.data + self.decay * self.shadow[name]
                self.shadow[name] = new_average.clone()

    def apply_shadow(self):
        """Use the shadow weights as the model weights and back up the originals."""
        for name, param in self.model.named_parameters():
            if param.requires_grad:
                assert name in self.shadow
                self.backup[name] = param.data
                # Clone so that later in-place changes to the model cannot touch the shadow.
                param.data = self.shadow[name].clone()

    def restore(self):
        """Restore the original model weights."""
        for name, param in self.model.named_parameters():
            if param.requires_grad:
                assert name in self.backup
                param.data = self.backup[name]
        self.backup = {}
```