Stacked Hourglass Networks for Human Pose Estimation: Source Code Analysis

A source-code walkthrough of a top-down human pose estimation model

GitHub - princeton-vl/pytorch_stacked_hourglass: PyTorch implementation of "Stacked Hourglass Networks for Human Pose Estimation": https://github.com/princeton-vl/pytorch_stacked_hourglass

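Human pose estimation methods generally fall into two families: top-down and bottom-up.

Top-down methods depend on object detection: each person is detected first, and pose estimation is then run on each single-person crop.

Examples: Stacked Hourglass Networks, Cascaded Pyramid Network (CPN), MSPN, HRNet, and so on.

Bottom-up methods work the other way round: all keypoints in the image are detected first and are then grouped into individual people. Examples: OpenPose, HigherHRNet, and so on.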
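
To make the top-down flow concrete, here is a minimal sketch of the pipeline. The names top_down_pose, fake_detector and fake_pose_model are hypothetical stand-ins introduced only for illustration; they are not part of the repository or of any library.

```python
import numpy as np

def top_down_pose(image, detect_people, estimate_single_pose):
    """Top-down pipeline: detect people, then estimate one pose per crop.

    Both callables are hypothetical and supplied by the caller.
    """
    poses = []
    for (x0, y0, x1, y1) in detect_people(image):
        crop = image[y0:y1, x0:x1]
        # Keypoints come back in crop coordinates; shift them to image coordinates.
        keypoints = estimate_single_pose(crop) + np.array([x0, y0])
        poses.append(keypoints)
    return poses

# Dummy stand-ins so the sketch runs end to end.
def fake_detector(image):
    return [(10, 20, 50, 100), (60, 30, 90, 110)]  # boxes as (x0, y0, x1, y1)

def fake_pose_model(crop):
    h, w = crop.shape[:2]
    return np.array([[w // 2, h // 2]])  # a single "keypoint" at the crop centre

image = np.zeros((128, 128, 3), dtype=np.uint8)
poses = top_down_pose(image, fake_detector, fake_pose_model)
print([p.tolist() for p in poses])  # [[[30, 60]], [[75, 70]]]
```

In a real system the detector would be an object detection model and the pose model would be a network such as the one analysed in this post.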

First, let's look at the network structure:


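import torch
from torch import nn
# Conv, Residual, Pool, Hourglass, Merge and HeatmapLoss are defined elsewhere in the repository.

class PoseNet(nn.Module):
    def __init__(self, nstack, inp_dim, oup_dim, bn=False, increase=0, **kwargs):
        super(PoseNet, self).__init__()
        
        self.nstack = nstack
        # Stem: reduces the resolution by 4x and outputs inp_dim channels
        self.pre = nn.Sequential(
            Conv(3, 64, 7, 2, bn=True, relu=True),
            Residual(64, 128),
            Pool(2, 2),
            Residual(128, 128),
            Residual(128, inp_dim)
        )
        
        # One hourglass module of depth 4 per stack
        self.hgs = nn.ModuleList( [
        nn.Sequential(
            Hourglass(4, inp_dim, bn, increase),
        ) for i in range(nstack)] )
        
        # Post-processing of each hourglass output
        self.features = nn.ModuleList( [
        nn.Sequential(
            Residual(inp_dim, inp_dim),
            Conv(inp_dim, inp_dim, 1, bn=True, relu=True)
        ) for i in range(nstack)] )
        
        # 1x1 convolutions: heatmap heads, plus the layers that map features and
        # predictions back to inp_dim channels for the next stack
        self.outs = nn.ModuleList( [Conv(inp_dim, oup_dim, 1, relu=False, bn=False) for i in range(nstack)] )
        self.merge_features = nn.ModuleList( [Merge(inp_dim, inp_dim) for i in range(nstack-1)] )
        self.merge_preds = nn.ModuleList( [Merge(oup_dim, inp_dim) for i in range(nstack-1)] )
        self.nstack = nstack
        self.heatmapLoss = HeatmapLoss()

    def forward(self, imgs):
        ## our posenet
        # shape (B, H, W, C) -> (B, C, H, W)
        x = imgs.permute(0, 3, 1, 2) #x of size 1,3,inpdim,inpdim
        # The stem shrinks the image 4x: a k=7, s=2 convolution halves it, followed by a residual block;
        # a pooling layer then halves it again, and two more residual blocks bring the channels to inp_dim (256 here).
        # shape (B, 256, H // 4, W // 4)
        x = self.pre(x)
        combined_hm_preds = []
        # The number of stacked hourglasses (nstack) is configurable; here it is 8
        for i in range(self.nstack):
            # Each hourglass is a U-Net-like structure with residual skip connections.
            # It downsamples to (H // (4 * 16), W // (4 * 16)), since Hourglass(4, ...) pools four times,
            # and then upsamples back to (H // 4, W // 4).
            # shape: (B, 256, H // 4, W // 4)
            hg = self.hgs[i](x)
            # shape: (B, 256, H // 4, W // 4)
            feature = self.features[i](hg)
            # shape: (B, num_joints, H // 4, W // 4), where num_joints (oup_dim) is the number of keypoints
            preds = self.outs[i](feature)
            combined_hm_preds.append(preds)
            # Every stack, not just the last one, produces a prediction. This is the intermediate
            # supervision described in the paper, which lets the network refine its output stack by stack.
            if i < self.nstack - 1:
                # Fuse the heatmaps and the features into the input of the next hourglass
                x = x + self.merge_preds[i](preds) + self.merge_features[i](feature)
        # Return the outputs of all stacks, shape (B, nstack, num_joints, H // 4, W // 4)
        return torch.stack(combined_hm_preds, 1)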
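
To see the stacking and merging logic in isolation, the following is a minimal sketch in which plain convolutions replace the hourglass and residual modules. TinyStack and its layer choices are assumptions made for illustration; only the loop structure mirrors PoseNet.forward.

```python
import torch
from torch import nn

class TinyStack(nn.Module):
    """Stand-in for PoseNet: 3x3 convolutions replace the hourglass modules."""
    def __init__(self, nstack=2, inp_dim=8, oup_dim=4):
        super().__init__()
        self.nstack = nstack
        self.hgs = nn.ModuleList([nn.Conv2d(inp_dim, inp_dim, 3, padding=1) for _ in range(nstack)])
        self.outs = nn.ModuleList([nn.Conv2d(inp_dim, oup_dim, 1) for _ in range(nstack)])
        # Only nstack - 1 merge layers are needed: the last stack feeds nothing.
        self.merge_features = nn.ModuleList([nn.Conv2d(inp_dim, inp_dim, 1) for _ in range(nstack - 1)])
        self.merge_preds = nn.ModuleList([nn.Conv2d(oup_dim, inp_dim, 1) for _ in range(nstack - 1)])

    def forward(self, x):
        all_preds = []
        for i in range(self.nstack):
            feature = torch.relu(self.hgs[i](x))
            preds = self.outs[i](feature)          # heatmaps of stack i
            all_preds.append(preds)
            if i < self.nstack - 1:
                # The 1x1 convolutions bring predictions and features back to inp_dim channels
                x = x + self.merge_preds[i](preds) + self.merge_features[i](feature)
        return torch.stack(all_preds, 1)           # (B, nstack, oup_dim, H, W)

model = TinyStack(nstack=3, inp_dim=8, oup_dim=4)
out = model(torch.randn(2, 8, 16, 16))
print(out.shape)  # torch.Size([2, 3, 4, 16, 16])
```

The output keeps one set of heatmaps per stack along dimension 1, which is what allows a loss to be applied to every stack.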

Now that we have the network structure and its output, let's look at the loss function. It is simple: mean squared error (MSE).


    # Method of PoseNet
    def calc_loss(self, combined_hm_preds, heatmaps):
        combined_loss = []
        # Compute the loss of each stack separately
        for i in range(self.nstack):
            # The [0] index implies that the caller passes the forward() output wrapped in a list or tuple,
            # so combined_hm_preds[0] has shape (B, nstack, num_joints, H, W) and [:, i] selects stack i
            combined_loss.append(self.heatmapLoss(combined_hm_preds[0][:,i], heatmaps))
        # shape: (B, nstack)
        combined_loss = torch.stack(combined_loss, dim=1)
        return combined_loss

class HeatmapLoss(torch.nn.Module):
    """
    loss for detection heatmap
    """
    def __init__(self):
        super(HeatmapLoss, self).__init__()

    def forward(self, pred, gt):
        """
        pred: shape (B, num_joints, H, W)
        gt: shape (B, num_joints, H, W)
        """
        # Plain squared error, averaged over width, height and joints
        l = ((pred - gt)**2)
        l = l.mean(dim=3).mean(dim=2).mean(dim=1)
        return l ## l of dim bsize
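
As a quick numerical check of the shapes involved, here is a small sketch with made-up tensors. heatmap_mse is a hypothetical helper that performs the same reduction as HeatmapLoss.forward in a single call (the three chained means equal one mean over dimensions 1 to 3).

```python
import torch

def heatmap_mse(pred, gt):
    # Per-sample mean squared error over joints, height and width
    return ((pred - gt) ** 2).mean(dim=(1, 2, 3))

B, nstack, J, H, W = 2, 3, 4, 8, 8
preds = torch.zeros(B, nstack, J, H, W)   # pretend network output
gt = torch.ones(B, J, H, W)               # pretend ground truth
gt[1] = 2.0                               # make the second sample differ

# One loss value per sample and per stack, as in calc_loss
losses = torch.stack([heatmap_mse(preds[:, i], gt) for i in range(nstack)], dim=1)
print(losses.shape)   # torch.Size([2, 3])
print(losses[:, 0])   # tensor([1., 4.])
```

The result is not yet a scalar; one option for backpropagation would be losses.mean(), although how the repository reduces it is not shown in the code above.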

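Finally, let's look at how the ground-truth heatmaps are generated. It is simple: a 2D Gaussian centred on the keypoint gives each nearby pixel a value that decays with its distance from the keypoint. If the heatmap is large, evaluating every pixel is unnecessary, so only the pixels within a fixed window around the keypoint (x, y) are computed. Why not simply set the keypoint pixel to 1 and every other pixel to 0? Because that leaves a single positive sample against an overwhelming number of negatives, and the model cannot learn from it.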
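
The difference between the two kinds of target can be seen in a small sketch. The resolution, sigma and keypoint location below are assumed values chosen for illustration.

```python
import numpy as np

res, sigma = 64, 1.0
cx, cy = 20, 30   # hypothetical keypoint location (x, y)

# One-hot target: a single positive pixel
one_hot = np.zeros((res, res), dtype=np.float32)
one_hot[cy, cx] = 1.0

# Gaussian target: value decays with the squared distance to the keypoint
ys, xs = np.mgrid[0:res, 0:res]
gauss = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2)).astype(np.float32)

print(int((one_hot > 0.01).sum()))  # 1
print(int((gauss > 0.01).sum()))    # 29
print(float(gauss[cy, cx]))         # 1.0
```

With sigma = 1 the Gaussian target still peaks at exactly 1 on the keypoint, but 29 pixels carry a value above 0.01 instead of just one.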


import numpy as np

class GenerateHeatmap():
    def __init__(self, output_res, num_parts):
        self.output_res = output_res
        self.num_parts = num_parts
        # Precompute a 2D Gaussian over a fixed window
        sigma = self.output_res/64 # sigma = 1 here (output_res = 64)
        self.sigma = sigma
        size = 6*sigma + 3 # size = 9; an odd size is used so that there is a centre pixel, which corresponds to the keypoint
        x = np.arange(0, size, 1, float)
        y = x[:, np.newaxis]
        x0, y0 = 3*sigma + 1, 3*sigma + 1
        # A (size, size) 2D Gaussian: the centre value is 1 and the other values fall off with distance
        self.g = np.exp(- ((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))

    def __call__(self, keypoints):
        # shape: (num_joints, H, W)
        hms = np.zeros(shape = (self.num_parts, self.output_res, self.output_res), dtype = np.float32)
        sigma = self.sigma
        for p in keypoints:
            for idx, pt in enumerate(p):
                # Only keypoints with x > 0 are drawn
                if pt[0] > 0: 
                    x, y = int(pt[0]), int(pt[1])
                    # Skip keypoints that fall outside the heatmap
                    if x<0 or y<0 or x>=self.output_res or y>=self.output_res:
                        continue
                    # Window around the keypoint that the precomputed Gaussian will cover
                    ul = int(x - 3*sigma - 1), int(y - 3*sigma - 1)
                    br = int(x + 3*sigma + 2), int(y + 3*sigma + 2)

                    # Slice of the Gaussian patch that remains after clipping at the heatmap border
                    c,d = max(0, -ul[0]), min(br[0], self.output_res) - ul[0]
                    a,b = max(0, -ul[1]), min(br[1], self.output_res) - ul[1]

                    # Matching slice of the heatmap
                    cc,dd = max(0, ul[0]), min(br[0], self.output_res)
                    aa,bb = max(0, ul[1]), min(br[1], self.output_res)
                    # Write self.g into the heatmap, keeping the maximum where Gaussians overlap
                    hms[idx, aa:bb,cc:dd] = np.maximum(hms[idx, aa:bb,cc:dd], self.g[a:b,c:d])
        return hms
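
The clipping arithmetic is the least obvious part, so here is a self-contained sketch that pastes the patch for a keypoint close to the border and then recovers the keypoint with an argmax. gaussian_patch and paste_patch are hypothetical helper names; their bodies follow the same formulas as GenerateHeatmap for a single channel.

```python
import numpy as np

def gaussian_patch(sigma):
    size = int(6 * sigma + 3)
    x = np.arange(size, dtype=float)
    y = x[:, np.newaxis]
    c = 3 * sigma + 1
    return np.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * sigma ** 2))

def paste_patch(hm, x, y, g, sigma):
    res = hm.shape[0]
    ul = int(x - 3 * sigma - 1), int(y - 3 * sigma - 1)
    br = int(x + 3 * sigma + 2), int(y + 3 * sigma + 2)
    # Patch-side slice: the part of g that stays inside the heatmap
    c, d = max(0, -ul[0]), min(br[0], res) - ul[0]
    a, b = max(0, -ul[1]), min(br[1], res) - ul[1]
    # Heatmap-side slice of the same size
    cc, dd = max(0, ul[0]), min(br[0], res)
    aa, bb = max(0, ul[1]), min(br[1], res)
    hm[aa:bb, cc:dd] = np.maximum(hm[aa:bb, cc:dd], g[a:b, c:d])

sigma = 1.0
g = gaussian_patch(sigma)                 # shape (9, 9)
hm = np.zeros((64, 64), dtype=np.float32)
paste_patch(hm, 1, 62, g, sigma)          # keypoint at x=1, y=62, near a corner

# Decode: the peak of the heatmap is the keypoint
row, col = np.unravel_index(np.argmax(hm), hm.shape)
print(int(col), int(row))  # 1 62
```

Only a 6 x 6 part of the 9 x 9 patch fits inside the heatmap here, and the peak still lands on the keypoint.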

This is a classic top-down human pose estimation network.

Original article: https://blog.csdn.net/xiaoxu1025/article/details/127835690
