• CenterPoint 源码流程解读(二)


    接上一篇CenterPoint 源码流程解读(一)

    CenterPoint 源码流程解读(二)

    主要内容:
    二. Backbone – 特征提取
    2.1 voxelize:体素化
    2.2 点云voxel编码方式: PillarFeatureNet(PFN)
    2.3 点云中间编码方式: PointPillarsScatter
    2.4 backbone: SECOND
    三. Neck
    四. Head和loss
    4.1 CenterHead
    4.2 loss

    二. Backbone – 特征提取

    参考激光点云3D目标检测算法之PointPillars

    2.1 voxelize:体素化

    主要实现类Voxelization,将点云转为voxel表征方式。

    • voxels : 30000205, 30000个体素,每个体素20个点,每个点5维度信息
    • coors:体素坐标,30000*3
    • num_points_per_voxel: 每个体素中点的个数
        def forward(ctx,
                    points,
                    voxel_size,
                    coors_range,
                    max_points=35,
                    max_voxels=20000,
                    deterministic=True):
            """convert kitti points(N, >=3) to voxels. 
            """
            if max_points == -1 or max_voxels == -1:
                coors = points.new_zeros(size=(points.size(0), 3), dtype=torch.int)
                dynamic_voxelize(points, coors, voxel_size, coors_range, 3)
                return coors
            else:
                voxels = points.new_zeros(
                    size=(max_voxels, max_points, points.size(1))) #30000,20,5
                coors = points.new_zeros(size=(max_voxels, 3), dtype=torch.int) #30000,3
                num_points_per_voxel = points.new_zeros(
                    size=(max_voxels, ), dtype=torch.int)
                voxel_num = hard_voxelize(points, voxels, coors,
                                          num_points_per_voxel, voxel_size,
                                          coors_range, max_points, max_voxels, 3,
                                          deterministic) # cuda中体素化ops,29249
                # select the valid voxels,去掉空的voxel
                voxels_out = voxels[:voxel_num]
                coors_out = coors[:voxel_num]
                num_points_per_voxel_out = num_points_per_voxel[:voxel_num]  #每个体素中点数
                return voxels_out, coors_out, num_points_per_voxel_out
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28

    2.2 点云voxel编码方式: PillarFeatureNet(PFN)

    主要作用是将点云voxel表征方式进行编码,并建立稠密特征张量。

    将上一步中的体素化点云编码成为10维的向量D(x,y,z,r,delt_t,xc,yc,zc,xp,yp), 其中x,y,z,r,delt_t分别表示点云3个坐标、反射强度、多帧时点的时间戳差值;xc,yc,zc表示到该Pillar中所有点的算术平均值点(中心)的距离,xp,yp表示该点到该Pillar的x,y坐标中心的偏移值,得到一个(P,N,D)稠密张量。再通过多层 PFNLayer = linear线性层 + BatchNorm + ReLU + max pooling,转换得到(P,N,C),N代表每一个pillar中的点数,C代表channel数目,最终经过对每个pillar进行最大池化max_pooling得到(P,C)的张量。

        def forward(self, features, num_points, coors):
            """Forward function.
            """
            features_ls = [features]
            # Find distance of x, y, and z from cluster center,到每个pillar中心点的距离
            if self._with_cluster_center:
                points_mean = features[:, :, :3].sum(
                    dim=1, keepdim=True) / num_points.type_as(features).view(
                        -1, 1, 1)
                f_cluster = features[:, :, :3] - points_mean
                features_ls.append(f_cluster)
    
            # Find distance of x, y, and z from pillar center, 到pillar中心坐标距离
            dtype = features.dtype
            if self._with_voxel_center:
                if not self.legacy:
                    f_center = torch.zeros_like(features[:, :, :2])
                    f_center[:, :, 0] = features[:, :, 0] - (
                        coors[:, 3].to(dtype).unsqueeze(1) * self.vx +
                        self.x_offset)
                    f_center[:, :, 1] = features[:, :, 1] - (
                        coors[:, 2].to(dtype).unsqueeze(1) * self.vy +
                        self.y_offset)
                else:
                    f_center = features[:, :, :2]
                    f_center[:, :, 0] = f_center[:, :, 0] - (
                        coors[:, 3].type_as(features).unsqueeze(1) * self.vx +
                        self.x_offset)
                    f_center[:, :, 1] = f_center[:, :, 1] - (
                        coors[:, 2].type_as(features).unsqueeze(1) * self.vy +
                        self.y_offset)
                features_ls.append(f_center)
            
            #计算点到中心(0,0)距离
            if self._with_distance: 
                points_dist = torch.norm(features[:, :, :3], 2, 2, keepdim=True) 
                features_ls.append(points_dist)
    
            # Combine together feature decorations,合并
            features = torch.cat(features_ls, dim=-1)
            # The feature decorations were calculated without regard to whether
            # pillar was empty. Need to ensure that
            # empty pillars remain set to zeros.
            voxel_count = features.shape[1]
            mask = get_paddings_indicator(num_points, voxel_count, axis=0)
            mask = torch.unsqueeze(mask, -1).type_as(features)
            features *= mask
    
            for pfn in self.pfn_layers:
                features = pfn(features, num_points)
    
            return features.squeeze() #[P,C] 27059, 64
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41
    • 42
    • 43
    • 44
    • 45
    • 46
    • 47
    • 48
    • 49
    • 50
    • 51
    • 52

    2.3 点云中间编码方式: PointPillarsScatter

    作用:将学习到的稠密特征[C,P] 还原成伪图像[C,W,H]

        def forward_batch(self, voxel_features, coors, batch_size):
            """Scatter features of single sample.
            """
            # batch_canvas will be the final output.
            batch_canvas = []
            for batch_itt in range(batch_size):
                # Create the canvas for this sample
                canvas = torch.zeros(
                    self.in_channels,
                    self.nx * self.ny,
                    dtype=voxel_features.dtype,
                    device=voxel_features.device)
    
                # Only include non-empty pillars
                batch_mask = coors[:, 0] == batch_itt
                this_coors = coors[batch_mask, :]
                indices = this_coors[:, 2] * self.nx + this_coors[:, 3]
                indices = indices.type(torch.long)
                voxels = voxel_features[batch_mask, :]
                voxels = voxels.t()
    
                # Now scatter the blob back to the canvas.
                canvas[:, indices] = voxels
    
                # Append to a list for later stacking.
                batch_canvas.append(canvas)
    
            # Stack to 3-dim tensor (batch-size, in_channels, nrows*ncols)
            batch_canvas = torch.stack(batch_canvas, 0)
    
            # Undo the column stacking to final 4-dim tensor
            batch_canvas = batch_canvas.view(batch_size, self.in_channels, self.ny,
                                             self.nx)
            return batch_canvas
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34

    2.4 backbone: SECOND

    使用多层的conv+BN+Relu三件套进行特征提取,总共有[4,6,6]层三件套组成, channel维度分别对应[64, 128, 256]。

            blocks = []
            for i, layer_num in enumerate(layer_nums):
                block = [
                    build_conv_layer(
                        conv_cfg,
                        in_filters[i],
                        out_channels[i],
                        3,
                        stride=layer_strides[i],
                        padding=1),
                    build_norm_layer(norm_cfg, out_channels[i])[1],
                    nn.ReLU(inplace=True),
                ]
                for j in range(layer_num):
                    block.append(
                        build_conv_layer(
                            conv_cfg,
                            out_channels[i],
                            out_channels[i],
                            3,
                            padding=1))
                    block.append(build_norm_layer(norm_cfg, out_channels[i])[1])
                    block.append(nn.ReLU(inplace=True))
    
                block = nn.Sequential(*block)
                blocks.append(block)
    
            self.blocks = nn.ModuleList(blocks)
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28

    三. Neck

    SECONDFPN, 对Backbone得到特征进行加工和合理利用。主要还是由类似conv+BN+Relu三件套构成,进行上采样解码操作,将上一步channel[64, 128, 256]均变成128,然后合并,得到[B,C,W,H]的张量,此中C为128*3 = 384,结构如下:

      (pts_neck): SECONDFPN(
        (deblocks): ModuleList(
          (0): Sequential(
            (0): Conv2d(64, 128, kernel_size=(2, 2), stride=(2, 2), bias=False)
            (1): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
            (2): ReLU(inplace=True)
          )
          (1): Sequential(
            (0): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
            (2): ReLU(inplace=True)
          )
          (2): Sequential(
            (0): ConvTranspose2d(256, 128, kernel_size=(2, 2), stride=(2, 2), bias=False)
            (1): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
            (2): ReLU(inplace=True)
          )
        )
      )
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19

    四. Head和loss

    4.1 CenterHead

    CenterHead, 先经过一个共享卷积,将特征由[B,384,128,128]变为[B,64,128,128]。然后分别对每个任务tasks进行推理,最后得到预测结果字典。

        def forward(self, feats):
            """Forward pass.
            """
            return multi_apply(self.forward_single, feats)
            
        def forward_single(self, x):
            """Forward function for CenterPoint.
            """
            ret_dicts = []
    
            x = self.shared_conv(x) # 共享卷积,三件套
    
            for task in self.task_heads:
                ret_dicts.append(task(x))
            return ret_dicts
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15

    每个大类别,含有一个task,每一个task,对应1个SeparateHead,每个SeparateHead包含6个需要回归的head。故配置中有6个task,6个head,6*6=36个需要回归的head。其中一个SeparateHead结构如下,6个head分别为reg、height、dim、rot、vel、heatmap。最终经过CenterHead处理后,得到关于6个tasks的list。

    注意:因为不同类别,BEV视角下尺寸不同,如car和pedestrian,故将其分为不同的任务;而pedestrian与traffic_cone在BEV视角下,尺寸相近,故作为一个task进行回归

    (0): SeparateHead(
            (reg): Sequential(
              (0): ConvModule(
                (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (activate): ReLU(inplace=True)
              )
              (1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            )
            (height): Sequential(
              (0): ConvModule(
                (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (activate): ReLU(inplace=True)
              )
              (1): Conv2d(64, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            )
            (dim): Sequential(
              (0): ConvModule(
                (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (activate): ReLU(inplace=True)
              )
              (1): Conv2d(64, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            )
            (rot): Sequential(
              (0): ConvModule(
                (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (activate): ReLU(inplace=True)
              )
              (1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            )
            (vel): Sequential(
              (0): ConvModule(
                (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (activate): ReLU(inplace=True)
              )
              (1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            )
            (heatmap): Sequential(
              (0): ConvModule(
                (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (activate): ReLU(inplace=True)
              )
              (1): Conv2d(64, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            )
          )
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41
    • 42
    • 43
    • 44
    • 45
    • 46
    • 47
    • 48
    • 49
    • 50

    4.2 loss

    参考CenterHead的loss函数

    针对每一个task,利用gtbox真值和get_targets得到heatmaps, anno_boxes, inds, masks,这四个量含义如下表:

    参数heatmapanno_boxindmask
    说明中心点热图分数框的gt真值框的中心点在热力图中的位置有效box的掩码,1/0划分
    尺寸[class_num, 128, 128][500, 10][500][500]
    取值举例每个class有一张热图10维参数的含义,第1-2维表示中心点的偏移量offset_x、 offset_y,第3维表示中心点的高度z,第4-6维表示目标框的长宽高box_dim,第7-8维表示旋转角度sin(α) cos(α),第9-10维表示速度vx vyind[idx] = x*128 + ymask[idx] = 1

    主要包含两个loss,一个是针对heatmap的focal loss,另一个是针对bbox的L1 loss

        def loss(self, gt_bboxes_3d, gt_labels_3d, preds_dicts, **kwargs):
            """Loss function for CenterHead.
            """
    
            heatmaps, anno_boxes, inds, masks = self.get_targets(
                gt_bboxes_3d, gt_labels_3d)
            loss_dict = dict()
            for task_id, preds_dict in enumerate(preds_dicts):
                # loss1: heatmap focal loss 
                preds_dict[0]['heatmap'] = clip_sigmoid(preds_dict[0]['heatmap'])
                num_pos = heatmaps[task_id].eq(1).float().sum().item()
                loss_heatmap = self.loss_cls(
                    preds_dict[0]['heatmap'],
                    heatmaps[task_id],
                    avg_factor=max(num_pos, 1)) 
                target_box = anno_boxes[task_id]
                # reconstruct the anno_box from multiple reg heads
                preds_dict[0]['anno_box'] = torch.cat(
                    (preds_dict[0]['reg'], preds_dict[0]['height'],
                     preds_dict[0]['dim'], preds_dict[0]['rot'],
                     preds_dict[0]['vel']),
                    dim=1)
    
                # Regression loss for dimension, offset, height, rotation
                ind = inds[task_id]
                num = masks[task_id].float().sum()
                pred = preds_dict[0]['anno_box'].permute(0, 2, 3, 1).contiguous()
                pred = pred.view(pred.size(0), -1, pred.size(3))
                pred = self._gather_feat(pred, ind)
                mask = masks[task_id].unsqueeze(2).expand_as(target_box).float()
                isnotnan = (~torch.isnan(target_box)).float()
                mask *= isnotnan
    
                code_weights = self.train_cfg.get('code_weights', None)
                bbox_weights = mask * mask.new_tensor(code_weights)
                # loss2: bbox loss
                loss_bbox = self.loss_bbox(
                    pred, target_box, bbox_weights, avg_factor=(num + 1e-4))
                loss_dict[f'task{task_id}.loss_heatmap'] = loss_heatmap
                loss_dict[f'task{task_id}.loss_bbox'] = loss_bbox
            return loss_dict
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41
  • 相关阅读:
    Python 机器学习入门之K-Means聚类算法
    今天起将正式开始更新JAVA和PYTHON的相关知识
    In-sensor zoom功能调试笔记
    UJNOJ_1307: 数独 [for ACMer]_模拟
    杭州脚本科技公司的面试题【杭州多测师】【杭州多测师_王sir】
    如何在Linux下编写代码和执行程序
    Maven配置环境变量
    代码随想录二刷day42
    开源大数据集群部署(十五)Zookeeper集群部署
    C复习-字符串+字符+字节
  • 原文地址:https://blog.csdn.net/weixin_36354875/article/details/127757761