【OpenPCDet】Reading the Sparse Convolution SPConv v1.2 Code (4)

【Building the rulebook (continued from the previous post)】

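The kernel getSubMIndicePairsKernel3 is defined in src/spconv/indice.cu as a function template. What I did not quite understand is why the kernel sizes along each dimension, K0, K1 and K2, are also template parameters; wouldn't passing them as ordinary function arguments work just as well? Never mind these details, on to the main course!

Excerpt from src/spconv/indice.cu: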


  1 template <typename Index, typename IndexGrid, unsigned K0, unsigned K1, unsigned K2>
  2 __global__ void getSubMIndicePairsKernel3(tv::TensorView<const Index> indicesIn,
  3                                           tv::TensorView<IndexGrid> gridsOut,
  4                                           tv::TensorView<Index> indicePairs,
  5                                           tv::TensorView<Index> indiceNum,
  6                                           const tv::SimpleVector<Index, 3> outSpatialShape,
  7                                           Index spatialVolume) {
  8   auto numActIn = indicesIn.dim(0);  // e.g. torch.Size(N,4) -> N
  9
 10   Index point[3];
 11   Index index = 0;
 12   Index offset;
 13   constexpr unsigned KV = K0 * K1 * K2;   // e.g. 3x3x3
 14   constexpr unsigned center = KV / 2;
 15   *(indiceNum.data() + center) = numActIn;
 16   for (int ix : tv::KernelLoopX<int>(numActIn)) {
 17     const Index *indice_data = indicesIn.data() + ix * (3 + 1);  // (batch, d, h, w)
 18 #pragma unroll
 19     for (int i = 0; i < K0; ++i) {
 20 #pragma unroll
 21       for (int j = 0; j < K1; ++j) {
 22 #pragma unroll
 23         for (int k = 0; k < K2; ++k) {
 24           offset = i * K1 * K2 + j * K2 + k;
 25           if (offset > center){
 26             continue;
 27           }
 28           if (center == offset){
 29               // center of subm indice pairs doesn't need atomicAdd
 30               indicePairs(1, offset, ix) = ix;
 31               indicePairs(0, offset, ix) = ix;
 32           }else{
 33             point[2] = indice_data[3] - k + K2 / 2;
 34             point[1] = indice_data[2] - j + K1 / 2;
 35             point[0] = indice_data[1] - i + K0 / 2;
 36             if (point[1] >= 0 && point[1] < outSpatialShape[1] && point[2] >= 0 &&
 37                 point[2] < outSpatialShape[2] && point[0] >= 0 &&
 38                 point[0] < outSpatialShape[0]) {
 39               index = tv::ArrayIndexRowMajor<3, 3>::runPtrs(
 40                           point, outSpatialShape.data(), 0) + spatialVolume * indice_data[0];
 41               if (gridsOut[index] != -1) {
 42                 // for subm: indicePairs[0, i] = indicePairs[1, kernelVolume - i - 1]
 43                 Index oldNum = atomicAdd(indiceNum.data() + offset, Index(1));
 44                 atomicAdd(indiceNum.data() + KV - offset - 1, Index(1));
 45                 indicePairs(1, offset, oldNum) = gridsOut[index];
 46                 indicePairs(0, offset, oldNum) = ix;
 47                 indicePairs(1, KV - offset - 1, oldNum) = ix;
 48                 indicePairs(0, KV - offset - 1, oldNum) = gridsOut[index];
 49               }
 50             }
 51           }
 52         }
 53       }
 54     }
 55   }
 56 }

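Lines 13 and 14 compute the kernel volume and the offset of its center. In submanifold sparse convolution, the center element of the kernel always operates exactly once on every active input element, so line 15 writes numActIn directly into the center slot of indiceNum.

The range-based for in line 16, for (int ix : tv::KernelLoopX<int>(numActIn)), is essentially the grid-stride loop we commonly write in CUDA kernels: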


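__global__ void kernel(int num)  // placeholder signature, for illustration only
{
    int idx    = threadIdx.x + blockIdx.x * blockDim.x;  // global thread index
    int stride = blockDim.x * gridDim.x;                 // total number of threads in the grid

    for (int i = idx; i < num; i += stride) {
        // ... computation ...
    }
}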

Here it is simply written in a more elegant form; its definition can be found in include/tensorview/kernel_utils.h. With NumILP left at its default value of 1, the stride is likewise gridDim.x * blockDim.x.


template <typename T, int NumILP = 1>
__forceinline__ __device__ detail::KernelLoop<T> KernelLoopX(T count) {
  return detail::KernelLoop<T>(blockIdx.x * blockDim.x + threadIdx.x,
                               gridDim.x * blockDim.x * NumILP, count);
}
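
To see why a range-based for can stand in for the hand-written grid-stride loop, here is a minimal host-side sketch in plain C++. StrideRange and visitedIndices are hypothetical names made up for this illustration; this is not the actual detail::KernelLoop implementation, only an object with the same (begin, stride, end) shape.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side stand-in for a (begin, stride, end) loop object.
// It yields begin, begin + stride, begin + 2 * stride, ... while value < end.
template <typename T>
struct StrideRange {
  struct Iterator {
    T value;
    T stride;
    T operator*() const { return value; }
    Iterator &operator++() {
      value += stride;
      return *this;
    }
    // Compare with '<' so that stepping past the end still terminates.
    bool operator!=(const Iterator &other) const { return value < other.value; }
  };

  T first;
  T stride;
  T last;

  Iterator begin() const { return Iterator{first, stride}; }
  Iterator end() const { return Iterator{last, stride}; }
};

// Indices visited by one simulated thread. threadId plays the role of
// blockIdx.x * blockDim.x + threadIdx.x, numThreads that of
// gridDim.x * blockDim.x, and count that of numActIn.
inline std::vector<int> visitedIndices(int threadId, int numThreads, int count) {
  assert(numThreads > 0);
  std::vector<int> visited;
  for (int ix : StrideRange<int>{threadId, numThreads, count}) {
    visited.push_back(ix);
  }
  return visited;
}
```

With 4 simulated threads and 10 elements, thread 1 visits indices 1, 5 and 9, which is exactly what the manual loop with idx = 1 and stride = 4 would do.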

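The three nested for loops in lines 19, 21 and 23 iterate over the three kernel dimensions D, H and W, whose sizes are K0, K1 and K2 respectively. The #pragma unroll directive explicitly asks the compiler to unroll the loop at compile time.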
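
This also suggests one plausible answer to the earlier question of why K0, K1 and K2 are template parameters: constexpr values such as KV and center can only be formed from compile-time constants, and loops whose trip counts are constants are the easy case for full unrolling. The sketch below is plain host-side C++ with hypothetical helper names (kernelVolume, kernelCenter, allOffsets); it illustrates the language rule and is not taken from spconv.

```cpp
#include <array>

// Kernel volume and center as compile-time constants, mirroring KV and
// center in lines 13 and 14 of the listing.
template <unsigned K0, unsigned K1, unsigned K2>
constexpr unsigned kernelVolume() {
  return K0 * K1 * K2;
}

template <unsigned K0, unsigned K1, unsigned K2>
constexpr unsigned kernelCenter() {
  return kernelVolume<K0, K1, K2>() / 2;
}

// Because the volume is known at compile time it can size a std::array.
// A run-time function argument could not be used as an array bound here.
template <unsigned K0, unsigned K1, unsigned K2>
std::array<unsigned, kernelVolume<K0, K1, K2>()> allOffsets() {
  std::array<unsigned, kernelVolume<K0, K1, K2>()> offsets{};
  unsigned n = 0;
  // Every trip count below is a constant, so a compiler is free to unroll.
  for (unsigned i = 0; i < K0; ++i)
    for (unsigned j = 0; j < K1; ++j)
      for (unsigned k = 0; k < K2; ++k)
        offsets[n++] = i * K1 * K2 + j * K2 + k;
  return offsets;
}

// Checked by the compiler, not at run time.
static_assert(kernelVolume<3, 3, 3>() == 27, "a 3x3x3 kernel has 27 elements");
static_assert(kernelCenter<3, 3, 3>() == 13, "center offset of a 3x3x3 kernel");
```

If K0, K1 and K2 were ordinary arguments, the two static_assert lines and the std::array bound would not compile, and the loop bounds would only be known at run time.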

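Line 24 computes the current offset within the kernel. Taking a 3x3x3 3D kernel (K0=3, K1=3, K2=3) as an example, offset would range over 0~26, but line 25 skips the iteration with continue whenever offset > center (13), so in practice only offsets 0 through 13 are processed.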
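
The offset arithmetic can be checked on the host with a few lines of plain C++. The helper names below (flattenOffset, unflattenOffset, processedOffsets, mirroredOffset) are hypothetical and exist only for this sketch; the formulas are the ones from lines 24, 25 and 44 of the listing.

```cpp
#include <cassert>
#include <vector>

struct KernelPos {
  int i, j, k;
};

// Same flattening as line 24 of the listing: row-major over (i, j, k).
inline int flattenOffset(int i, int j, int k, int K1, int K2) {
  return i * K1 * K2 + j * K2 + k;
}

// Inverse of flattenOffset.
inline KernelPos unflattenOffset(int offset, int K1, int K2) {
  assert(offset >= 0);
  return KernelPos{offset / (K1 * K2), (offset / K2) % K1, offset % K2};
}

// Offsets that survive the `offset > center` check in line 25.
inline std::vector<int> processedOffsets(int K0, int K1, int K2) {
  const int KV = K0 * K1 * K2;
  const int center = KV / 2;
  std::vector<int> result;
  for (int i = 0; i < K0; ++i)
    for (int j = 0; j < K1; ++j)
      for (int k = 0; k < K2; ++k) {
        const int offset = flattenOffset(i, j, k, K1, K2);
        if (offset > center) continue;
        result.push_back(offset);
      }
  return result;
}

// The mirrored offset KV - offset - 1 used in lines 44, 47 and 48.
inline int mirroredOffset(int offset, int K0, int K1, int K2) {
  return K0 * K1 * K2 - offset - 1;
}
```

For a 3x3x3 kernel, processedOffsets returns the 14 values 0 through 13. The mirrored offset of (i, j, k) unflattens to (K0-1-i, K1-1-j, K2-1-k), i.e. the kernel element on the opposite side of the center; for example offset 5 = (0, 1, 2) mirrors to offset 21 = (2, 1, 0).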

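Lines 28~50 are the core of building the rulebook. The reason offset only needs to run up to center lies in a symmetry property of submanifold sparse convolution, which boils down to the following 4 lines: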


 45                 indicePairs(1, offset, oldNum) = gridsOut[index];
 46                 indicePairs(0, offset, oldNum) = ix;
 47                 indicePairs(1, KV - offset - 1, oldNum) = ix;
 48                 indicePairs(0, KV - offset - 1, oldNum) = gridsOut[index];

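The kernel element at offset acts on the input element with index ix and produces its output at gridsOut[index]. Correspondingly, the kernel element at (KV - offset - 1) acts on the input element with index gridsOut[index] and produces its output at ix. This works because in a submanifold convolution the output active sites coincide with the input active sites, so gridsOut[index] is a valid index on both the input and the output side.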
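
The symmetry can be made concrete with a small sequential sketch on the host. It is a simplification under stated assumptions, not the CUDA kernel: a single batch, a cubic kernel of odd size K, distinct in-range voxels, and push_back in place of atomicAdd. Voxel, Pair and buildSubMRulebook are hypothetical names used only here.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct Voxel {
  int d, h, w;
};
using Pair = std::pair<int, int>;  // (input index, output index)

// Returns rulebook[offset] = list of (input, output) pairs for that offset.
inline std::vector<std::vector<Pair>> buildSubMRulebook(
    const std::vector<Voxel> &active, const Voxel &shape, int K) {
  assert(K > 0 && K % 2 == 1);
  const int KV = K * K * K;
  const int center = KV / 2;
  const int numActIn = static_cast<int>(active.size());

  // Dense lookup grid: active index at each occupied site, -1 elsewhere.
  // This plays the role of gridsOut in the kernel.
  std::vector<int> grid(shape.d * shape.h * shape.w, -1);
  auto flat = [&shape](int d, int h, int w) {
    return (d * shape.h + h) * shape.w + w;  // row-major index
  };
  for (int n = 0; n < numActIn; ++n)
    grid[flat(active[n].d, active[n].h, active[n].w)] = n;

  std::vector<std::vector<Pair>> rulebook(KV);
  for (int ix = 0; ix < numActIn; ++ix) {
    for (int i = 0; i < K; ++i)
      for (int j = 0; j < K; ++j)
        for (int k = 0; k < K; ++k) {
          const int offset = (i * K + j) * K + k;
          if (offset > center) continue;  // second half is filled by mirroring
          if (offset == center) {         // center: input index == output index
            rulebook[offset].push_back({ix, ix});
            continue;
          }
          const int d = active[ix].d - i + K / 2;
          const int h = active[ix].h - j + K / 2;
          const int w = active[ix].w - k + K / 2;
          if (d < 0 || d >= shape.d || h < 0 || h >= shape.h || w < 0 ||
              w >= shape.w)
            continue;
          const int out = grid[flat(d, h, w)];
          if (out == -1) continue;  // neighbour site is not active
          rulebook[offset].push_back({ix, out});
          rulebook[KV - offset - 1].push_back({out, ix});  // mirrored pair
        }
  }
  return rulebook;
}
```

For the three active voxels (0,0,0), (0,0,1) and (1,1,1) in a 2x2x2 grid with K = 3, offset 0 holds the pair (0, 2) and its mirror, offset 26, holds (2, 0); offset 12 holds (0, 1) and offset 14 holds (1, 0); the center offset 13 holds the three identity pairs.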

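The center element of the kernel always acts on every input element, so line 28 treats it as a special case: when offset equals center, the input index and the output index are both ix. In the end, indicePairs stores a mapping like the one in the table below.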

Kernel offset (offset)   Operations at this offset (count)   Input index (v_in)   Output index (v_out)
0                        1                                   0                    5
1                        2                                   0                    4
                                                             1                    7
2                        2                                   0                    3
                                                             1                    6
3                        1                                   0                    2
4                        2                                   0                    1
                                                             1                    5
5                        1                                   1                    2
6                        1                                   1                    1
...                      ...                                 ...                  ...


Original article: https://blog.csdn.net/ChuiGeDaQiQiu/article/details/127680713