HRNet Source Code Analysis

Paper: Deep High-Resolution Representation Learning for Human Pose Estimation

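HRNet is also a top-down method for human pose estimation, broadly similar to Stacked Hourglass Network, CPN, and MSPN. A person detector must first be trained to detect every person in the image, and each detected person is then fed into a single-person pose estimation model.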
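The two-stage flow described above can be sketched as follows. This is only a minimal illustration, not code from the HRNet repository: `detect_people` and `estimate_pose` are hypothetical callables standing in for a person detector and a single-person pose model, and the image is a plain nested list.

```python
def crop(image, box):
    """Cut the (x, y, w, h) box out of a row-major 2D image."""
    x, y, w, h = box
    return [list(row[x:x + w]) for row in image[y:y + h]]


def top_down_pose(image, detect_people, estimate_pose):
    """Run the detector once, then the pose model once per detected person."""
    results = []
    for box in detect_people(image):
        keypoints = estimate_pose(crop(image, box))
        # Shift crop-relative keypoints back into full-image coordinates.
        x0, y0 = box[0], box[1]
        results.append([(kx + x0, ky + y0) for kx, ky in keypoints])
    return results


# Toy stand-ins (hypothetical): two fixed boxes, one keypoint at each crop centre.
def fake_detector(image):
    return [(0, 0, 4, 4), (4, 2, 4, 4)]


def fake_pose(patch):
    return [(len(patch[0]) / 2, len(patch) / 2)]


image = [[0] * 8 for _ in range(6)]
poses = top_down_pose(image, fake_detector, fake_pose)
```

The point of the sketch is that the pose model only ever sees one person at a time, so its accuracy depends on the quality of the detector's boxes.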

As the paper title suggests, the key idea is high resolution.

Stacked Hourglass Network, CPN, and MSPN share a similar model structure: a U-Net-like design with added residual connections, in which features are first downsampled and then upsampled.

HRNet differs slightly: it keeps a branch at the same (high) resolution throughout the network, and each Transition adds one more downsampled branch.

The architecture figure in the paper illustrates this layout.

The model code is as follows:


class PoseHighResolutionNet(nn.Module):
 
    def __init__(self, cfg, **kwargs):
        pass
 
    def forward(self, x):
        # The stem downsamples x by 4x: both convs use stride 2
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.layer1(x)
 
        x_list = []
        # Each transition adds one new downsampled branch
        for i in range(self.stage2_cfg['NUM_BRANCHES']):
            if self.transition1[i] is not None:
                x_list.append(self.transition1[i](x))
            else:
                x_list.append(x)
        y_list = self.stage2(x_list)
 
        x_list = []
        # Each transition adds one new downsampled branch
        for i in range(self.stage3_cfg['NUM_BRANCHES']):
            if self.transition2[i] is not None:
                x_list.append(self.transition2[i](y_list[-1]))
            else:
                x_list.append(y_list[i])
        y_list = self.stage3(x_list)
 
        x_list = []
        # Each transition adds one new downsampled branch
        for i in range(self.stage4_cfg['NUM_BRANCHES']):
            if self.transition3[i] is not None:
                x_list.append(self.transition3[i](y_list[-1]))
            else:
                x_list.append(y_list[i])
        y_list = self.stage4(x_list)
 
        # The final output uses only the first (highest-resolution) branch of the last stage
        x = self.final_layer(y_list[0])
 
        return x
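To make the branch layout in the forward pass above concrete, the following sketch computes the feature-map shape of every parallel branch in stages 2 to 4. It is a rough illustration under stated assumptions: the function name `hrnet_branch_shapes` is hypothetical, the channel widths 32/64/128/256 and the 256x192 input are assumed to match the W32 configuration, and each extra branch is assumed to halve the resolution and double the channels.

```python
def hrnet_branch_shapes(input_hw, base_channels=32):
    """Map stage number -> list of (channels, height, width), one per branch."""
    # Stem: two stride-2 convolutions reduce the input by 4x.
    h, w = input_hw[0] // 4, input_hw[1] // 4
    shapes = {}
    for stage, num_branches in ((2, 2), (3, 3), (4, 4)):
        # Branch i is downsampled i times relative to the first branch
        # and (by assumption) has 2**i times as many channels.
        shapes[stage] = [
            (base_channels * 2 ** i, h // 2 ** i, w // 2 ** i)
            for i in range(num_branches)
        ]
    return shapes


shapes = hrnet_branch_shapes((256, 192))
for stage, branches in shapes.items():
    print(f"stage {stage}: {branches}")
```

Under these assumptions, `final_layer` reads `shapes[4][0]`, the branch that has stayed at 1/4 of the input resolution the whole way through.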

As in other networks, the heatmap labels are generated with a 2D Gaussian function.


    def generate_target(self, joints, joints_vis):
        '''
        :param joints:  [num_joints, 3]
        :param joints_vis: [num_joints, 3]
        :return: target, target_weight(1: visible, 0: invisible)
        '''
        target_weight = np.ones((self.num_joints, 1), dtype=np.float32)
        target_weight[:, 0] = joints_vis[:, 0]
 
        assert self.target_type == 'gaussian', \
            'Only support gaussian map now!'
 
        if self.target_type == 'gaussian':
            target = np.zeros((self.num_joints,
                               self.heatmap_size[1],
                               self.heatmap_size[0]),
                              dtype=np.float32)
 
            tmp_size = self.sigma * 3
 
            for joint_id in range(self.num_joints):
                feat_stride = self.image_size / self.heatmap_size
                mu_x = int(joints[joint_id][0] / feat_stride[0] + 0.5)
                mu_y = int(joints[joint_id][1] / feat_stride[1] + 0.5)
                # Check that any part of the gaussian is in-bounds
                ul = [int(mu_x - tmp_size), int(mu_y - tmp_size)]
                br = [int(mu_x + tmp_size + 1), int(mu_y + tmp_size + 1)]
                if ul[0] >= self.heatmap_size[0] or ul[1] >= self.heatmap_size[1] \
                        or br[0] < 0 or br[1] < 0:
                    # If not, zero this joint's weight and skip it
                    target_weight[joint_id] = 0
                    continue
 
                # Generate the Gaussian patch that is copied into the target
                size = 2 * tmp_size + 1
                x = np.arange(0, size, 1, np.float32)
                y = x[:, np.newaxis]
                x0 = y0 = size // 2
                # The gaussian is not normalized, we want the center value to equal 1
                g = np.exp(- ((x - x0) ** 2 + (y - y0) ** 2) / (2 * self.sigma ** 2))
 
                # Usable gaussian range
                g_x = max(0, -ul[0]), min(br[0], self.heatmap_size[0]) - ul[0]
                g_y = max(0, -ul[1]), min(br[1], self.heatmap_size[1]) - ul[1]
                # Image range
                img_x = max(0, ul[0]), min(br[0], self.heatmap_size[0])
                img_y = max(0, ul[1]), min(br[1], self.heatmap_size[1])
 
                v = target_weight[joint_id]
                if v > 0.5:
                    target[joint_id][img_y[0]:img_y[1], img_x[0]:img_x[1]] = \
                        g[g_y[0]:g_y[1], g_x[0]:g_x[1]]
 
        if self.use_different_joints_weight:
            target_weight = np.multiply(target_weight, self.joints_weight)
 
        return target, target_weight
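The two key quantities in `generate_target` are the heatmap-space centre `mu_x`/`mu_y` and the unnormalised Gaussian patch `g`. The sketch below reproduces just those two steps in pure Python so the numbers can be checked by hand. The helper names are hypothetical, and the values used (image width 192, heatmap width 48, sigma 2) are assumptions chosen for illustration.

```python
import math


def to_heatmap_coord(pixel, feat_stride):
    """Round an image-space coordinate to the nearest heatmap cell."""
    return int(pixel / feat_stride + 0.5)


def gaussian_patch(sigma):
    """Unnormalised 2D Gaussian on a (6*sigma+1) square grid, peak value 1."""
    size = 2 * (sigma * 3) + 1
    centre = size // 2
    return [
        [math.exp(-((x - centre) ** 2 + (y - centre) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]


feat_stride = 192 / 48                     # assumed image width / heatmap width
mu_x = to_heatmap_coord(100, feat_stride)  # joint at x = 100 px in the input
patch = gaussian_patch(sigma=2)
print(mu_x, len(patch), patch[6][6])
```

With sigma 2 the patch is 13x13, so each joint only touches a small window of its heatmap; the clipping logic in `generate_target` handles the case where that window hangs over the heatmap border.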

The loss function is essentially the same across these human pose estimation networks: MSE.


import torch.nn as nn


class JointsMSELoss(nn.Module):
    def __init__(self, use_target_weight):
        super(JointsMSELoss, self).__init__()
        # Use mean squared error
        self.criterion = nn.MSELoss(reduction='mean')
        self.use_target_weight = use_target_weight
 
    def forward(self, output, target, target_weight):
        batch_size = output.size(0)
        num_joints = output.size(1)
        heatmaps_pred = output.reshape((batch_size, num_joints, -1)).split(1, 1)
        heatmaps_gt = target.reshape((batch_size, num_joints, -1)).split(1, 1)
        loss = 0
 
        for idx in range(num_joints):
            heatmap_pred = heatmaps_pred[idx].squeeze()
            heatmap_gt = heatmaps_gt[idx].squeeze()
            if self.use_target_weight:
                # Weight both prediction and target per joint, then compute the loss
                loss += 0.5 * self.criterion(
                    heatmap_pred.mul(target_weight[:, idx]),
                    heatmap_gt.mul(target_weight[:, idx])
                )
            else:
                loss += 0.5 * self.criterion(heatmap_pred, heatmap_gt)
 
        return loss / num_joints
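Written as a formula, the weighted branch of `JointsMSELoss.forward` appears to compute the following. The symbols are introduced here for illustration and are not from the original post: B is the batch size, K the number of joints, N the number of pixels per heatmap, w the target weight, and \hat{H} and H the predicted and ground-truth heatmaps.

```latex
L = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{2} \cdot \frac{1}{B N}
    \sum_{b=1}^{B} \sum_{p=1}^{N}
    \bigl( w_{b,k} \, ( \hat{H}_{b,k,p} - H_{b,k,p} ) \bigr)^{2}
```

Because the weight multiplies both the prediction and the target, a joint with weight 0 contributes nothing to the loss, which is how invisible or out-of-bounds joints are ignored.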

Judging from the code, HRNet is largely similar to other top-down networks; the main change appears to be in the network architecture.

The architecture passes features through parallel branches.

The inference code is likewise largely similar to that of other top-down networks.
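For readers unfamiliar with heatmap-based inference, the core decoding step is usually an argmax over each predicted heatmap, scaled back by the feature stride. The sketch below shows only that step; `decode_heatmap` is a hypothetical helper, and real implementations commonly add sub-pixel refinement and map the result back to the original image, both of which are omitted here.

```python
def decode_heatmap(heatmap, feat_stride):
    """Return (x, y, score) of the strongest response, in input-crop pixels."""
    best_score, best_x, best_y = float("-inf"), 0, 0
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_score:
                best_score, best_x, best_y = value, x, y
    return best_x * feat_stride, best_y * feat_stride, best_score


# Toy 3x4 heatmap for a single joint (made-up values).
heatmap = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.3, 0.9, 0.2],
    [0.0, 0.1, 0.2, 0.0],
]
x, y, score = decode_heatmap(heatmap, feat_stride=4)
print(x, y, score)
```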

If you are interested, see the source code analyses of several other top-down networks:

Rethinking on Multi-Stage Networks for Human Pose Estimation source code analysis: https://blog.csdn.net/xiaoxu1025/article/details/127840623
CPN (Cascaded Pyramid Network for Multi-Person Pose Estimation) source code analysis: https://blog.csdn.net/xiaoxu1025/article/details/127838074
Stacked Hourglass Networks for Human Pose Estimation source code analysis: https://blog.csdn.net/xiaoxu1025/article/details/127835690

This concludes the series on top-down human pose estimation network models.

If you are interested in HigherHRNet, which uses a bottom-up approach, see the post below:

HigherHRNet source code analysis (那时那月那人的博客, CSDN)

Original post: https://blog.csdn.net/xiaoxu1025/article/details/127843498