• Training MMDetection on ScanNet


    (2022.10.24) Note: the instance maps in the scannet-frames-25k subset use a different format from the instance maps of raw ScanNetV2 (i.e. the .png files under scene*_*/instance/ or scene*_*/instance-filt/); [5] mentions this. So if you want to train on the raw ScanNetV2 data, before converting the annotations to COCO object detection format with the conversion code modified in this post, you first need to merge the raw ScanNetV2

    • raw instance maps: scene*_*/instance/*.png (or scene*_*/instance-filt/*.png), and
    • raw label maps: scene*_*/label/*.png (or scene*_*/label-filt/*.png)

    into instance maps in the scannet-frames-25k format. The steps are:

    1. The class IDs in the raw label maps are ScanNet's raw class IDs; use scannetv2-labels.combined.tsv to map them to the NYU40 class IDs that scannet-frames-25k (and the benchmark) uses;
    2. Merge the instance map and the label map: instance_map = 1000 * label_map + raw_instance_map, where label_map is the converted label map from step 1

    These two conversions are implemented in convert_scannet_label_image.py and convert_scannet_instance_image.py under ScanNet/BenchmarkScripts/2d_helpers/ in [2]. (That said, if you visualize a raw instance map yourself, you will find that instance IDs are global within a scan, i.e. the same instance keeps the same ID across frames of the same scan, so instances can be tracked across frames, which may come in handy for evaluation. The instance-map conversion code provided by [2] appears to break this property; if you need it, hack your own copy that preserves the global instance IDs.)
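
    To make those two steps concrete, here is a minimal sketch of the merge (my own illustration, untested; not the official scripts from [2]). It assumes scannetv2-labels.combined.tsv has the usual id and nyu40id columns and that the raw maps are 16-bit PNGs:

    # merge_raw_maps.py -- minimal sketch of steps 1 + 2 above (not the official ScanNet scripts)
    import csv
    import numpy as np
    from PIL import Image

    def build_label_lut(tsv_path):
        """Build a lookup table mapping raw ScanNet class IDs -> NYU40 class IDs."""
        mapping = {0: 0}  # 0 = unannotated
        with open(tsv_path, newline="") as f:
            for row in csv.DictReader(f, delimiter="\t"):
                mapping[int(row["id"])] = int(row["nyu40id"] or 0)
        lut = np.zeros(max(mapping) + 1, dtype=np.int64)
        for raw_id, nyu40_id in mapping.items():
            lut[raw_id] = nyu40_id
        return lut

    def merge_frame(label_png, instance_png, lut):
        """Produce a scannet-frames-25k style instance map: class_id * 1000 + instance_id."""
        raw_label = np.array(Image.open(label_png)).astype(np.int64)
        raw_inst = np.array(Image.open(instance_png)).astype(np.int64)
        raw_label[raw_label >= len(lut)] = 0       # guard against IDs missing from the tsv
        return (1000 * lut[raw_label] + raw_inst).astype(np.uint16)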


    We need to train an object detection model on ScanNet [1,2] using MMDetection [3,4]. Steps:

    1. Download ScanNet-frames-25k (a subset of ScanNet);
    2. Split the dataset;
    3. Convert the annotations to the COCO object detection format;
    4. Set up the MMDetection environment (install from source);
    5. Modify MMDetection files for training.

    ScanNet

    ScanNet is mainly used in the 3D domain, but its data takes the form of RGB-D sequences, and the RGB sequences can be treated as videos. There are two versions, v1 and v2; the full v2 is about 1.8T. See [1-2,5-7] for more information; the download script download-scannet.py is in [8].

    [5] mentions the scannet_frames_25k subset, which this post mainly uses. Comparing against the code in [8], it is sampled from the full v2 data, roughly one frame out of every 100. The files to download are:

    • scannet_frames_25k.zip, ~5.6G, 1513 scans (i.e. RGB-D sequences, treated here simply as videos);
    • scannet_frames_test.zip, ~610M, 100 scans, the corresponding test set.

    Download by running:

    python download-scannet.py -o . --preprocessed_frames
    python download-scannet.py -o . --test_frames_2d
    

    (The download script did not work for me, so I pasted the download links into Xunlei/Thunder to download them instead.) Unzip and inspect the file structure:

    scannet_frames_25k/
    |- scene0000_00/	# one scan
    |  |- color/		# RGB sequence, i.e. the video (jpg)
    |  |- depth/		# depth sequence (png)
    |  |- instance/		# instance mask (png)
    |  |- label/
    |  |- pose/
    |  |- intrinsics_color.txt
    |  `- intrinsics_depth.txt
    |- scene0000_01/	# another scan
    ...
    
    scannet_frames_test/
    |- scene0707_00/
    |  |- color/
    |  |- depth/
    |  |- pose/
    |  |- intrinsics_color.txt
    |  `- intrinsics_depth.txt
    |- scene0708_00/
    ...
    

    As you can see, the test set lacks the label-related files.

    [2] provides the official split under ScanNet/Tasks/Benchmark/, divided into train/val/test. Comparing the v2 split files (txt) there with the two zips above shows that:

    • scannet_frames_25k.zip = train + val
    • scannet_frames_test.zip = test

    So this subset should contain as many scans as the full dataset; only the frame sequence within each scan is sub-sampled.
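
    A quick way to verify this (a small sketch; it assumes the official split files and the unzipped scannet_frames_25k/ sit at the same paths used by split-scannet.py below):

    # check-split.py -- compare the official v2 split lists against the 25k zip contents
    import os
    import os.path as osp

    SPLIT_P = osp.join(os.environ["HOME"], "codes", "ScanNet", "Tasks", "Benchmark")
    scans_25k = set(next(os.walk("/data/scannet_frames_25k"))[1])
    for subset in ("train", "val", "test"):
        with open(osp.join(SPLIT_P, "scannetv2_" + subset + ".txt")) as f:
            names = {line.strip() for line in f if line.strip()}
        print(subset, len(names), "scans in the official split,",
              len(names & scans_25k), "of them present in scannet_frames_25k")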

    Splitting

    Judging from MMDetection's configuration files, train and val should be placed in different directories, each with its own json annotation file. Since the test set lacks instance/, which the later conversion to COCO format needs, this post drops the test data and uses val instead.

    • Output path: data/scannet-frames/
    # split-scannet.py
    import os
    import os.path as osp
    
    """split ScanNet
    Only `scannet_frames_25k/` is used while `scannet_frames_test/` is ignored
        because the scenes in it have no `**/instance/` sub-folder which
        is needed by `convert2panoptic.py`.
    So I simply reuse the validation set as the test set, as in the COCO
        configuration file in MMDetection.
    These 2 subsets are then converted separately to produce separate
        annotation json files as needed by the configuration.
    """
    
    DATA_ROOT = "/data"
    # DATA_P = [osp.join(DATA_ROOT, p) for p in ("scannet_frames_25k", "scannet_frames_test")]
    DATA_P = osp.join(DATA_ROOT, "scannet_frames_25k")
    
    # check the number of scans
    dir_list = next(os.walk(DATA_P))[1]
    print("#data:", len(dir_list))  # 1513 -> ALL
    print("conclusion: contains all data, only frames are down-sampled")
    
    SPLIT_P = osp.join(os.environ["HOME"], "codes", "ScanNet", "Tasks", "Benchmark")
    # scannet_frames_25k is only available for ScanNetv2
    VER = "v2"
    DEST = "data/scannet-frames"  # 代码目录中的 data/,不是 /data/
    if not osp.exists(DEST):
        os.makedirs(DEST)
    
    # test set has NO `**/instance/*.png` needed by `convert2panoptic.py`
    for subset in ["train", "val"]:
        split_file = osp.join(SPLIT_P, "scannet"+VER+"_"+subset+".txt")
    
        # soft-link all scans of this subset to `sub_dest`
        sub_dest = osp.join(DEST, subset)
        if not osp.exists(sub_dest):
            os.makedirs(sub_dest)
    
        with open(split_file, "r") as f:
            for line in f:
                line = line.strip()
                if "" == line:
                    continue
                os.system("ln -s {} {}".format(
                    osp.join(DATA_P, line),
                    osp.join(sub_dest, line)))
        print(subset, "DONE")
    

    Convert to COCO Format

    One of the data organizations recommended by MMDetection is converting to the COCO format [9-11]. rvc_devkit [12] provides a conversion script, rconvert_scannet_coco.sh, whose core is the conversion code provided by [2]: convert2panoptic.py. However, that script is meant for the panoptic segmentation task (see item 4 of [9]), whereas here we need the object detection annotation format (item 1 of [9]).
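
    For reference, the object detection annotations of [9] boil down to a single json with three lists; a minimal illustration of its shape (the values below are made up, only the structure matters):

    # Minimal shape of a COCO object detection annotation file (see [9,10]); values are invented.
    coco_objdet = {
        "images": [
            {"id": 0, "width": 1296, "height": 968,
             "file_name": "scene0000_00/color/000000.jpg"},
        ],
        "annotations": [
            {"id": 0, "image_id": 0, "category_id": 5,   # 5 = chair in the NYU40 IDs used below
             "bbox": [100, 200, 50, 80],                  # [x, y, width, height] in pixels
             "area": 4000, "iscrowd": 0, "segmentation": []},
        ],
        "categories": [
            {"id": 5, "name": "chair", "supercategory": "furniture"},
        ],
    }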

    So, based on convert2panoptic.py, I adapted a conversion script for object detection (convert-scannet-coco-objdet.py):

    # convert-scannet-coco-objdet.py
    
    #!/usr/bin/python
    #
    # Convert to COCO-style panoptic segmentation format (http://cocodataset.org/#format-data).
    #
    
    """iTom's modified version (2022.9.13)
    This file is inherited from
        rvc_devkit/segmentation/conv_scannet/convert2panoptic.py
    which is the same as
        ScanNet/BenchmarkScripts/convert2panoptic.py
    But I modify it to fit the object detection format and be
        able to distinguish different subsets (i.e. train/val/test)
        in terms of the output json annotation files.
    
    There are several modifications:
    
    (a) Additional argument
        - an additional optional argument of `convert2panoptic`:
            subset_tag, default = None
        - an additional optional command-line argument:
            --subset-tag -> args.subsetTag, default = None
    If this argument is used, the name of output json annotation file
        will be modified accordingly.
    
    (b) Move to COCO object detection annotation format instead of the
        original panoptic format. I borrowed the functions, i.e.
            - binary_mask_to_polygon
            - close_contour
        for polygon format segmentation info calculation. But the results
        are discarded due to
            - their weirdly large volume
                (~38G for val set & ~114G for the training set !)
            - that they are not used in detection task
        If you want to reenable it, you may need to install
            - scikit-image
        (NOTE: I suspect this is buggy somewhere.)
    
    (c) Change extension to ".jpg" in `images/file_name` field.
    """
    
    # python imports
    from __future__ import print_function, absolute_import, division, unicode_literals
    from itertools import count
    import os
    import glob
    import sys
    import argparse
    import json
    import numpy as np
    
    # iTom: for polygon calculation
    from skimage import measure
    
    # Image processing
    from PIL import Image
    
    EVAL_LABELS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 24, 28, 33, 34, 36, 39]
    EVAL_LABEL_NAMES = ["wall", "floor", "cabinet", "bed", "chair", "sofa", "table", "door", "window", "bookshelf", "picture", "counter", "desk", "curtain", "refrigerator", "shower curtain", "toilet", "sink", "bathtub", "otherfurniture"]
    EVAL_LABEL_CATS = ["indoor", "indoor", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "appliance", "furniture", "furniture", "appliance", "furniture", "furniture"]
    EVAL_LABEL_COLORS = [(174, 199, 232), (152, 223, 138), (31, 119, 180), (255, 187, 120), (188, 189, 34), (140, 86, 75), (255, 152, 150), (214, 39, 40), (197, 176, 213), (148, 103, 189), (196, 156, 148), (23, 190, 207), (247, 182, 210), (219, 219, 141), (255, 127, 14), (158, 218, 229), (44, 160, 44), (112, 128, 144), (227, 119, 194), (82, 84, 163)]
    
    def splitall(path):
        allparts = []
        while 1:
            parts = os.path.split(path)
            if parts[0] == path:  # sentinel for absolute paths
                allparts.insert(0, parts[0])
                break
            elif parts[1] == path: # sentinel for relative paths
                allparts.insert(0, parts[1])
                break
            else:
                path = parts[0]
                allparts.insert(0, parts[1])
        return allparts
    
    
    def close_contour(contour):
        """iTom: helper function for binary mask -> polygon conversion
        from: https://github.com/waspinator/pycococreator/blob/master/pycococreatortools/pycococreatortools.py#L20
        """
        if not np.array_equal(contour[0], contour[-1]):
            contour = np.vstack((contour, contour[0]))
        return contour
    
    
    def binary_mask_to_polygon(binary_mask, tolerance=0):
        """iTom: Converts a binary mask to COCO polygon representation
        Args:
            binary_mask: a 2D binary numpy array where '1's represent the object
            tolerance: Maximum distance from original points of polygon to approximated
                polygonal chain. If tolerance is 0, the original coordinate array is returned.
    
        from: https://github.com/waspinator/pycococreator/blob/master/pycococreatortools/pycococreatortools.py#L35
        ref:
        - https://github.com/cocodataset/cocoapi/issues/131
        - https://stackoverflow.com/questions/68663512/image-segmentation-mask-to-polygon-for-coco-json
        - https://stackoverflow.com/questions/58884265/python-convert-binary-mask-to-polygon
        """
        polygons = []
        # pad mask to close contours of shapes which start and end at an edge
        padded_binary_mask = np.pad(binary_mask, pad_width=1, mode='constant', constant_values=0)
        contours = measure.find_contours(padded_binary_mask, 0.5)
        # contours = np.subtract(contours, 1)  # iTom: original but buggy
        for i in range(len(contours)):  # iTom: change to for-loop subtraction
            contours[i] = np.subtract(contours[i], 1)
        for contour in contours:
            contour = close_contour(contour)
            contour = measure.approximate_polygon(contour, tolerance)
            if len(contour) < 3:
                continue
            contour = np.flip(contour, axis=1)
            segmentation = contour.ravel().tolist()
            # after padding and subtracting 1 we may get -0.5 points in our segmentation
            segmentation = [0 if i < 0 else i for i in segmentation]
            polygons.append(segmentation)
    
        return polygons
    
    
    # The main method
    def convert2panoptic(scannetPath, outputFolder=None, subset_tag=None, beginAnnoId=0, beginImageId=0, thingsOnly=False):
        """iTom's modification
        subset_tag: str, an optional subset distinguishing string for
            train/val seperation. One can simply ignore it to get the
            original output json file name.
        """
    
        if outputFolder is None:
            outputFolder = scannetPath
    
        # find files
        search = os.path.join(scannetPath, "*", "instance", "*.png")
        files = glob.glob(search)
        files.sort()
        # quit if we did not find anything
        if not files:
            print(
                "Did not find any files for using matching pattern {}. Please consult the README.".format(search)
            )
            sys.exit(-1)
        # a bit verbose
        print("Converting {} annotation files.".format(len(files)))
    
        outputBaseFile = "scannet_objdet"
        if subset_tag is not None:
            outputBaseFile = outputBaseFile + "_" + subset_tag
            print("iTom: modifying json annotation file name to:", outputBaseFile)
        outFile = os.path.join(outputFolder, "{}.json".format(outputBaseFile))
        print("Json file with the annotations in panoptic format will be saved in {}".format(outFile))
        # panopticFolder = os.path.join(outputFolder, outputBaseFile)
        # if not os.path.isdir(panopticFolder):
        #     print("Creating folder {} for panoptic segmentation PNGs".format(panopticFolder))
        #     os.mkdir(panopticFolder)
        # print("Corresponding segmentations in .png format will be saved in {}".format(panopticFolder))
    
        categories = []
        cls_is_things = {}  # iTom
        for idx in range(len(EVAL_LABELS)):
            label = EVAL_LABELS[idx]
            name = EVAL_LABEL_NAMES[idx]
            cat = EVAL_LABEL_CATS[idx]
            color = EVAL_LABEL_COLORS[idx]
            isthing = label > 2
            cls_is_things[int(label)] = isthing  # iTom
            if thingsOnly and not isthing:  # iTom
                continue
            categories.append({'id': int(label),
                               'name': name,
                               'color': color,
                               'supercategory': cat,
                               'isthing': isthing})
    
        images = []
        annotations = []
        for progress, f in enumerate(files):
    
            originalFormat = np.array(Image.open(f))
    
            parts = splitall(f)
            fileName = parts[-1]
            sceneName = parts[-3]
            outputFileName = "{}__{}".format(sceneName, fileName)
            inputFileName = os.path.join(sceneName, "color", fileName)
            # imageId = os.path.splitext(outputFileName)[0]
            imageId = beginImageId
            beginImageId += 1
            # image entry, id for image is its filename without extension
            images.append({"id": imageId,
                           "width": int(originalFormat.shape[1]),
                           "height": int(originalFormat.shape[0]),
                           "file_name": inputFileName.replace(".png", ".jpg")})
                           # "file_name": inputFileName})
    
            # pan_format = np.zeros(
            #     (originalFormat.shape[0], originalFormat.shape[1], 3), dtype=np.uint8
            # )
            segmentIds = np.unique(originalFormat)
            segmInfo = []
            for i_seg, segmentId in enumerate(segmentIds):
                isCrowd = 0
                if segmentId < 1000:
                    semanticId = segmentId
                else:
                    semanticId = segmentId // 1000
                if semanticId not in EVAL_LABELS:
                    continue
                if thingsOnly and not cls_is_things[semanticId]:  # iTom
                    continue
    
                mask = originalFormat == segmentId
                color = [segmentId % 256, segmentId // 256, segmentId // 256 // 256]
                # pan_format[mask] = color
    
                area = np.sum(mask) # segment area computation
    
                # bbox computation for a segment
                hor = np.sum(mask, axis=0)
                hor_idx = np.nonzero(hor)[0]
                x = hor_idx[0]
                width = hor_idx[-1] - x + 1
                vert = np.sum(mask, axis=1)
                vert_idx = np.nonzero(vert)[0]
                y = vert_idx[0]
                height = vert_idx[-1] - y + 1
                bbox = [int(x), int(y), int(width), int(height)]
    
                segmInfo.append({"id": int(segmentId),
                                "category_id": int(semanticId),
                                "area": int(area),
                                "bbox": bbox,
                                "iscrowd": isCrowd})
    
                # COCO object detection format:
                #   - https://cocodataset.org/#format-data
                # ref:
                #   - https://zhuanlan.zhihu.com/p/29393415
                #   - https://zhuanlan.zhihu.com/p/263454360
    
                # polygon = binary_mask_to_polygon(mask)  # weirdly large, discarded
                polygon = []
    
                # # annoId = imageId + "_" + str(i_seg)  # "scene0046_00__000200_2"
                # spaceId, scanId = sceneName.split("scene")[1].split("_")
                # imgFileNum = os.path.splitext(fileName)[0]
                # annoId = int(spaceId) * 1000000 + int(scanId) * 10000 + int(imgFileNum) + i_seg
                # # print("annoId:", annoId, "<-", spaceId, scanId, imgFileNum, i_seg)
                annoId = beginAnnoId
                beginAnnoId += 1
    
                annotations.append({'id': annoId,
                                    'image_id': imageId,
                                    'category_id': int(semanticId),
                                    "segmentation": polygon,
                                    'area': int(area),
                                    'bbox': bbox,
                                    "iscrowd": isCrowd})
                # break  # debug
    
            ## iTom: original panoptic annotation, removed
            # annotations.append({'image_id': imageId,
            #                     'file_name': outputFileName,
            #                     "segments_info": segmInfo})
    
            # Image.fromarray(pan_format).save(os.path.join(panopticFolder, outputFileName))
    
            print("\rProgress: {:>3.2f} %".format((progress + 1) * 100 / len(files)), end=' ')
            sys.stdout.flush()
            # break  # debug
    
        print("\nSaving the json file {}".format(outFile))
        d = {'images': images,
            'annotations': annotations,
            'categories': categories}
        with open(outFile, 'w') as f:
        # omitting `indent` seems to make the json file much smaller
            json.dump(d, f, sort_keys=True)#, indent=4)
    
        return beginAnnoId, beginImageId
    
    
    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument("--dataset-folder",
                            dest="scannetPath",
                            help="path to the ScanNet data 'scannet_frames_25k' folder",
                            required=True,
                            type=str)
        parser.add_argument("--output-folder",
                            dest="outputFolder",
                            help="path to the output folder.",
                            default=None,
                            type=str)
        # iTom-added, optional
        parser.add_argument("--subset-tag",
                            dest="subsetTag",
                            help="(iTom, optional) distinguishing str for train/val separation",
                            default=None,
                            type=str)
        parser.add_argument("--begin-anno-id",
                            dest="beginAnnoId",
                            help="(iTom) annotation IDs will start from this number." \
                                "When convert for each subset sequentially, " \
                                "use this to ensure that there is no duplicated annotation ID",
                            default=0,
                            type=int)
        parser.add_argument("--begin-image-id",
                            dest="beginImageId",
                            help="(iTom) image IDs will start from this number." \
                                "When convert for each subset sequentially, " \
                                "use this to ensure that there is no duplicated annotation ID",
                            default=0,
                            type=int)
        parser.add_argument('--things-only',
                            dest="thingsOnly",
                            action="store_true",
                            help="keep thing classes & drop stuff classes")
        args = parser.parse_args()
    
        last_unused_anno_id, last_unused_image_id = convert2panoptic(
            args.scannetPath, args.outputFolder, args.subsetTag, args.beginAnnoId, args.beginImageId, args.thingsOnly)
        # record the last unused annotation & image ID to interact with `scripts/split-cvt2coco.sh`
        with open("last-unused-anno-id.txt", "w") as f:
            f.write(str(last_unused_anno_id))
        with open("last-unused-image-id.txt", "w") as f:
            f.write(str(last_unused_image_id))
    
    
    # call the main
    if __name__ == "__main__":
        main()
    

    The body is still largely convert2panoptic.py; the changes are:

    • Rewrite the annotations in the COCO object detection format required by [9].
      • The segmentation field uses the polygon format as described in [9,10] (the original convert2panoptic.py hard-codes isCrowd = 0, and iscrowd=0 annotations use polygons); the code that computes polygons from binary masks is copied from [13], see also [14-16].
      • I ended up discarding it, though, because the resulting json files are absurdly large (val ~38G, train ~114G; COCO has far more data than ScanNet and its whole annotation zip is only ~241M). I suspect a bug somewhere, and object detection does not seem to need it anyway.
      • (2023.1.14) Regarding the oversized json when polygons are included: 杨东泽 pointed out that spaces and newlines take up a lot of room and suggested json.dump(d, f, separators=(',', ':')); using it with Mask R-CNN still had problems, though, which remain unresolved.
      • Per [10], the length of annotations, i.e. the number of annotation entries, equals the number of bounding boxes in the whole (sub)dataset, so the annotations/id field just needs a distinct integer ID per bbox. Note: it must be a number, otherwise an error is raised complaining that it cannot be converted to a number.
      • One way to guarantee uniqueness of annotations/id is to pack the space ID, the scan ID, the numeric image file name within the scan, and the segmentation index within the image into a single integer (a small worked example follows this list). I checked this dataset: the image file name numbers are all multiples of 100 and do not exceed 100 after dividing by 100, and each image has no more than 100 segmentations; hence the annoId formula in the code (left commented out in the script as posted, which uses the running beginAnnoId counter instead).
    • Added a subset tag argument so that train/val produce different json files.
      • Simply ignore this argument to get a single json file, like the original convert2panoptic.py.
    • The images/file_name field has its .png extension changed to .jpg.
      • The original code enumerates frames via **/instance/*.png and therefore defaults to the .png extension, but the RGB frames under color/ actually have the .jpg extension.
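
    As a worked example of that (commented-out) packing scheme, take the 3rd segment (i_seg = 2) of frame scene0046_00/instance/000200.png:

    # Packed annotation ID for the commented-out scheme in convert-scannet-coco-objdet.py
    space_id, scan_id = 46, 0    # from "scene0046_00"
    img_file_num = 200           # from "000200.png"; always a multiple of 100 in this subset
    i_seg = 2                    # index of the segment within this frame (< 100)
    anno_id = space_id * 1000000 + scan_id * 10000 + img_file_num + i_seg
    print(anno_id)               # 46000202 -- unique as long as img_file_num + i_seg < 10000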

    Invocation:

    #!/bin/bash
    
    DEST=data/scannet-frames  # output of the splitting step above
    
    anno_id=0  # interact with `convert-scannet-coco-objdet.py`
    image_id=0
    for subset in train val; do
        python convert-scannet-coco-objdet.py \
            --dataset-folder $DEST/$subset \
            --output-folder $DEST \
            --things-only \
            --subset-tag $subset \
            --begin-anno-id $anno_id \
            --begin-image-id $image_id
    
        # update beginning (i.e. last unused) annotation & image ID
        anno_id=`cat last-unused-anno-id.txt`
        image_id=`cat last-unused-image-id.txt`
    done
    rm last-unused-anno-id.txt
    rm last-unused-image-id.txt
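
    After the conversion, it is worth a quick sanity check that the generated json files load correctly; a small sketch using pycocotools (which MMDetection depends on anyway):

    # check-anno.py -- quick sanity check of the converted annotation files
    from pycocotools.coco import COCO

    for subset in ("train", "val"):
        coco = COCO("data/scannet-frames/scannet_objdet_{}.json".format(subset))
        print(subset, ":", len(coco.getImgIds()), "images,", len(coco.getAnnIds()), "annotations")
        print("categories:", [c["name"] for c in coco.loadCats(coco.getCatIds())])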
    

    Environment

    Because MMDetection's files are needed later, install it from source, following the installation guide get_started.md in [4]. Installation script:

    #!/bin/bash
    # env-mmdetection.sh
    
    echo create the conda virtual environment
    CONDA_P=~/miniconda3
    ENV=openmmlab
    if [ ! -d $CONDA_P/envs/$ENV ]; then
        conda create --name $ENV python=3.8 -y
    fi
    CONDA_BIN=$CONDA_P/envs/$ENV/bin
    
    $CONDA_BIN/pip install torch==1.8.2 torchvision==0.9.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu111
    # used in mmdetection/demo/video_gpuaccel_demo.py
    $CONDA_BIN/pip install ffmpegcv scipy scikit-image
    conda install -n $ENV ffmpeg -y
    
    $CONDA_BIN/pip install -U openmim
    $CONDA_BIN/mim install mmcv-full==1.6.0 mmengine
    # avoid bug: KeyError: 'Cascade Mask R-CNN'
    #   (i.e. open-mmlab/mim issues #125)
    # https://github.com/open-mmlab/mim/issues/125
    $CONDA_BIN/mim install mmdet==2.24.0
    
    if [ ! -d mmdetection ]; then
        echo try to clone from the original github repo
        git clone https://github.com/open-mmlab/mmdetection.git
        # git submodule add https://github.com/open-mmlab/mmdetection.git
        if [ $? -ne 0 ]; then
            echo * FAILED to clone from github
            echo clone from a gitee transit repo instead
            git clone https://gitee.com/xoxleoxox/mmdetection
            # git submodule add https://gitee.com/xoxleoxox/mmdetection
        fi
    fi
    cd mmdetection
    $CONDA_BIN/pip install -v -e .
    
    echo verify the installation
    $CONDA_BIN/mim download mmdet --config yolov3_mobilenetv2_320_300e_coco --dest .
    $CONDA_BIN/python demo/image_demo.py demo/demo.jpg yolov3_mobilenetv2_320_300e_coco.py \
        yolov3_mobilenetv2_320_300e_coco_20210719_215349-d18dff72.pth --device cpu --out-file result.jpg
    

    Note: when installing from source earlier, an error occurred while verifying the installation / running the demo, see [17]; that is why the script has the line installing mmdet==2.24.0. But training with mmdet 2.24.0 raises another error, so a newer version still has to be installed from source, which overwrites the old 2.24.0.

    Training

    MMDetection code

    Since MMDetection's code is needed, the MMDetection repository is added to the project directory as a git submodule, following [18-20]. Script:

    #!/bin/bash
    # add-submodules.sh
    
    # echo rvc_devkit
    # git submodule add https://github.com/ozendelait/rvc_devkit.git
    # if [ $? -ne 0 ]; then
    #     git submodule add https://gitee.com/tyloeng/rvc_devkit.git
    # fi
    
    echo mmdetection
    git submodule add https://github.com/open-mmlab/mmdetection.git
    if [ $? -ne 0 ]; then
        git submodule add https://gitee.com/xoxleoxox/mmdetection
    fi
    
    # echo ScanNet
    # git submodule add https://github.com/ScanNet/ScanNet.git
    # if [ $? -ne 0 ]; then
    #     git submodule add https://gitee.com/gxdcode/ScanNet.git
    # fi
    
    git submodule update --init --recursive
    git submodule update --remote
    
    #CONDA_P=~/miniconda3
    #ENV=openmmlab
    #CONDA_BIN=$CONDA_P/envs/$ENV/bin
    
    #cd rvc_devkit
    #$CONDA_BIN/pip install -r requirements.txt
    #cd objdet
    #$CONDA_BIN/pip install -r requirements.txt
    

    configuration files

    To train an existing model on a new dataset with MMDetection, see the examples 2_new_data_model.md and 1_exist_data_model.md in [4]. The data is ready from the previous steps; what remains is mainly the configuration files. Mirroring the structure of [4], I created a configs/ directory in my own project and copied two configuration files from [4], renaming them:

    • mstrain_3x_scannet.py (from mmdetection/configs/common/mstrain_3x_coco.py)
    • faster_rcnn_x101_64x4d_fpn_mstrain_3x_scannet.py (from mmdetection/configs/faster_rcnn/faster_rcnn_x101_64x4d_fpn_mstrain_3x_coco.py)

    The project directory then looks like:

    my-project/
    |- convert-scannet-coco-objdet.py
    |- split-scannet.py
    |- data/
    |  `- scannet-frames/
    |     |- train/						# produced by splitting
    |     |- val/						# produced by splitting
    |     |- scannet_objdet_train/		# produced by annotation conversion
    |     |- scannet_objdet_val/		# produced by annotation conversion
    |     |- scannet_objdet_train.json	# produced by annotation conversion
    |     `- scannet_objdet_val.json	# produced by annotation conversion
    |- mmdetection/						# submodule
    |- configs/							# mirrors the structure of mmdetection/configs/
    |  |- common/
    |  |  `- mstrain_3x_scannet.py
    |  `- faster_rcnn/
    |     `- faster_rcnn_x101_64x4d_fpn_mstrain_3x_scannet.py
    `- scripts/
       |- add-submodules.sh
       |- env-mmdetection.sh
       |- find_gpu.sh
       `- train-faster-rcnn-scannet-frames.sh
    

    The two configuration files are:

    • mstrain_3x_scannet.py (change the data part to our own dataset and fix the _base_ reference path)
    • (2022.9.17) Following [24-25], switch the class set to ScanNet's: change classes, data/train/dataset/classes, data/val/classes and data/test/classes. (I have not tested this change and do not know whether anything else needs to be adjusted accordingly.)
    ## iTom Notes
    # Inherited from `mmdetection/configs/common/mstrain_3x_coco.py`,
    # this file is designed for training Faster R-CNN on converted ScanNet-frames-25k.
    import os.path as osp
    
    _base_ = '../../mmdetection/configs/_base_/default_runtime.py'
    # dataset settings
    dataset_type = 'CocoDataset'
    classes = (
        "wall", "floor", "cabinet", "bed", "chair",
        "sofa", "table", "door", "window", "bookshelf",
        "picture", "counter", "desk", "curtain", "refrigerator",
        "shower curtain", "toilet", "sink", "bathtub", "otherfurniture"
    )
    data_root = 'data/scannet-frames/'
    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
    
    # In mstrain 3x config, img_scale=[(1333, 640), (1333, 800)],
    # multiscale_mode='range'
    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations', with_bbox=True),
        dict(
            type='Resize',
            img_scale=[(1333, 640), (1333, 800)],
            multiscale_mode='range',
            keep_ratio=True),
        dict(type='RandomFlip', flip_ratio=0.5),
        dict(type='Normalize', **img_norm_cfg),
        dict(type='Pad', size_divisor=32),
        dict(type='DefaultFormatBundle'),
        dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
    ]
    test_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(
            type='MultiScaleFlipAug',
            img_scale=(1333, 800),
            flip=False,
            transforms=[
                dict(type='Resize', keep_ratio=True),
                dict(type='RandomFlip'),
                dict(type='Normalize', **img_norm_cfg),
                dict(type='Pad', size_divisor=32),
                dict(type='ImageToTensor', keys=['img']),
                dict(type='Collect', keys=['img']),
            ])
    ]
    
    # Use RepeatDataset to speed up training
    data = dict(
        samples_per_gpu=2,
        workers_per_gpu=2,
        train=dict(
            type='RepeatDataset',
            times=3,
            dataset=dict(
                type=dataset_type,
                ann_file=osp.join(data_root, 'scannet_objdet_train.json'),
                img_prefix=osp.join(data_root, 'train/'),
                pipeline=train_pipeline,
                classes=classes)),
        val=dict(
            type=dataset_type,
            ann_file=osp.join(data_root, 'scannet_objdet_val.json'),
            img_prefix=osp.join(data_root, 'val/'),
            pipeline=test_pipeline,
            classes=classes),
        test=dict(
            type=dataset_type,
            ann_file=osp.join(data_root, 'scannet_objdet_val.json'),
            img_prefix=osp.join(data_root, 'val/'),
            pipeline=test_pipeline,
            classes=classes))
    evaluation = dict(interval=1, metric='bbox')
    
    # optimizer
    optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
    optimizer_config = dict(grad_clip=None)
    
    # learning policy
    # Experiments show that using step=[9, 11] has higher performance
    lr_config = dict(
        policy='step',
        warmup='linear',
        warmup_iters=500,
        warmup_ratio=0.001,
        step=[9, 11])
    runner = dict(type='EpochBasedRunner', max_epochs=12)
    
    • faster_rcnn_x101_64x4d_fpn_mstrain_3x_scannet.py (its _base_ references the configuration file modified above; fix the reference paths)
    • (2022.9.17) Following [24-25], switch the class set to ScanNet's: change model/roi_head/bbox_head/num_classes. (I have not tested this change and do not know whether anything else needs to be adjusted accordingly.)
    ## iTom Notes
    # Inherited from `mmdetection/configs/faster_rcnn/faster_rcnn_x101_64x4d_fpn_mstrain_3x_coco.py`,
    # this file is designed for training Faster R-CNN on converted ScanNet-frames-25k.
    
    _base_ = [
        # '../common/mstrain_3x_coco.py',
        '../common/mstrain_3x_scannet.py',
        # '../_base_/models/faster_rcnn_r50_fpn.py'
        '../../mmdetection/configs/_base_/models/faster_rcnn_r50_fpn.py'
    ]
    model = dict(
        backbone=dict(
            type='ResNeXt',
            depth=101,
            groups=64,
            base_width=4,
            num_stages=4,
            out_indices=(0, 1, 2, 3),
            frozen_stages=1,
            norm_cfg=dict(type='BN', requires_grad=True),
            style='pytorch',
            init_cfg=dict(
                type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d')),
        roi_head=dict(bbox_head=dict(num_classes=20)))
    

    training

    Use the distributed training script provided by MMDetection:

    #!/bin/bash
    # train-faster-rcnn-scannet-frames.sh
    clear
    
    echo run \`conda activate openmmlab\` first
    
    config=configs/faster_rcnn/faster_rcnn_x101_64x4d_fpn_mstrain_3x_scannet.py
    
    . scripts/find_gpu.sh 4 14845
    
    PATH=/usr/local/cuda/bin:$PATH \
    PYTHONPATH=mmdetection/mmdet:$PYTHONPATH \
    CUDA_VISIBLE_DEVICES=${gpu_id} \
    MMDET_DATASETS=`pwd`/data/scannet-frames/ \
    bash mmdetection/tools/dist_train.sh \
        $config ${n_gpu_found}
    # python mmdetection/tools/train.py \
    #     $config
    

    Where:

    • find_gpu.sh comes from [21];
    • putting cuda's bin/ directory at the front of $PATH ensures that the nvcc inside the cuda directory is used instead of /usr/bin/nvcc, see [22].

    Run bash scripts/train-faster-rcnn-scannet-frames.sh to start training.
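
    After training finishes, a quick way to eyeball the result is to run inference with the trained checkpoint; a minimal sketch using the mmdet 2.x high-level API (the checkpoint path under work_dirs/ and the sample frame are assumptions, adjust them to your run):

    # infer-demo.py -- run the trained detector on one ScanNet frame (paths are assumptions)
    from mmdet.apis import init_detector, inference_detector

    config = "configs/faster_rcnn/faster_rcnn_x101_64x4d_fpn_mstrain_3x_scannet.py"
    checkpoint = "work_dirs/faster_rcnn_x101_64x4d_fpn_mstrain_3x_scannet/latest.pth"
    img = "data/scannet-frames/val/scene0011_00/color/000000.jpg"

    model = init_detector(config, checkpoint, device="cuda:0")
    result = inference_detector(model, img)
    model.show_result(img, result, score_thr=0.3, out_file="scannet_det_demo.jpg")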

    References

    1. (CVPR 2017) ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes
    2. ScanNet/ScanNet
    3. (arXiv 2019) MMDetection: Open MMLab Detection Toolbox and Benchmark
    4. open-mmlab/mmdetection
    5. ScanNet Benchmark
    6. 关于ScanNet数据集
    7. 深度学习(1)RGB-D数据集:ScanNet
    8. scannet数据集下载文件
    9. COCO | Data format
    10. COCO数据集的标注格式
    11. COCO数据集标注详解
    12. ozendelait/rvc_devkit
    13. waspinator/pycococreator
    14. convert mask binary image to polygon format #131
    15. Image segmentation mask to polygon for coco json
    16. Python - convert binary mask to polygon
    17. KeyError: ‘Cascade Mask R-CNN’ #125
    18. Git Tools - Submodules
    19. Git submodule 子模块的管理和使用
    20. Git Submodule使用完整教程
    21. shell监视gpu使用情况
    22. 装detectron2报错:nvcc fatal : No input files specified; use option --help for more information
    23. facebookresearch/detr/datasets/coco.py/convert_coco_poly_to_mask
    24. AssertionError: The num_classes (3) in Shared2FCBBoxHead of MMDataParallel does not matches the length of CLASSES 80) in CocoDataset #4828
    25. Train with customized datasets | Prepare a config
    26. ScanNet-EfficientPS/tools/scannet_train_val_to_efficientps.py
    27. panopticapi/converters/panoptic2detection_coco_format.py