• Training MMDetection on ScanNet


    (2022.10.24) Note: the instance maps in the scannet-frames-25k subset use a different format from the instance maps of raw ScanNetV2 (i.e. the .png files under scene*_*/instance/ or scene*_*/instance-filt/); [5] mentions this. So if you want to train on the raw ScanNetV2 data, before converting the annotations to COCO object detection format with the conversion code modified in this post, you first need to merge the raw ScanNetV2

    • raw instance maps: scene*_*/instance/*.png (or scene*_*/instance-filt/*.png), and
    • raw label maps: scene*_*/label/*.png (or scene*_*/label-filt/*.png)

    into instance maps in the scannet-frames-25k format. The steps are:

    1. The class IDs in the raw label maps are ScanNet's raw class IDs; use scannetv2-labels.combined.tsv to map them to the NYU40 class IDs that scannet-frames-25k (and the benchmark) uses;
    2. Merge the instance map and the label map: instance_map = 1000 * label_map + raw_instance_map, where label_map is the converted label map from step 1

    These two conversions are implemented in convert_scannet_label_image.py and convert_scannet_instance_image.py under ScanNet/BenchmarkScripts/2d_helpers/ in [2]. (That said, if you visualize a raw instance map yourself, you will find that instance IDs are global within a scan, i.e. the same instance keeps the same ID across frames of the same scan, so instances can be tracked across frames, which may come in handy for evaluation. The instance-map conversion code provided by [2] appears to break this property; if you need it, hack your own copy that preserves the global instance IDs.)
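
    To make those two steps concrete, here is a minimal sketch of the merge (my own illustration, untested; not the official scripts from [2]). It assumes scannetv2-labels.combined.tsv has the usual id and nyu40id columns and that the raw maps are 16-bit PNGs:

    # merge_raw_maps.py -- minimal sketch of steps 1 + 2 above (not the official ScanNet scripts)
    import csv
    import numpy as np
    from PIL import Image

    def build_label_lut(tsv_path):
        """Build a lookup table mapping raw ScanNet class IDs -> NYU40 class IDs."""
        mapping = {0: 0}  # 0 = unannotated
        with open(tsv_path, newline="") as f:
            for row in csv.DictReader(f, delimiter="\t"):
                mapping[int(row["id"])] = int(row["nyu40id"] or 0)
        lut = np.zeros(max(mapping) + 1, dtype=np.int64)
        for raw_id, nyu40_id in mapping.items():
            lut[raw_id] = nyu40_id
        return lut

    def merge_frame(label_png, instance_png, lut):
        """Produce a scannet-frames-25k style instance map: class_id * 1000 + instance_id."""
        raw_label = np.array(Image.open(label_png)).astype(np.int64)
        raw_inst = np.array(Image.open(instance_png)).astype(np.int64)
        raw_label[raw_label >= len(lut)] = 0       # guard against IDs missing from the tsv
        return (1000 * lut[raw_label] + raw_inst).astype(np.uint16)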


    We need to train an object detection model on ScanNet [1,2] using MMDetection [3,4]. Steps:

    1. Download ScanNet-frames-25k (a subset of ScanNet);
    2. Split the dataset;
    3. Convert the annotations to the COCO object detection format;
    4. Set up the MMDetection environment (install from source);
    5. Modify MMDetection files for training.

    ScanNet

    ScanNet is mainly used in the 3D domain, but its data takes the form of RGB-D sequences, and the RGB sequences can be treated as videos. There are two versions, v1 and v2; the full v2 is about 1.8T. See [1-2,5-7] for more information; the download script download-scannet.py is in [8].

    [5] mentions the scannet_frames_25k subset, which this post mainly uses. Comparing against the code in [8], it is sampled from the full v2 data, roughly one frame out of every 100. The files to download are:

    • scannet_frames_25k.zip, ~5.6G, 1513 scans (i.e. RGB-D sequences, treated here simply as videos);
    • scannet_frames_test.zip, ~610M, 100 scans, the corresponding test set.

    Download by running:

    python download-scannet.py -o . --preprocessed_frames
    python download-scannet.py -o . --test_frames_2d
    

    (The download script did not work for me, so I pasted the download links into Xunlei/Thunder to download them instead.) Unzip and inspect the file structure:

    scannet_frames_25k/
    |- scene0000_00/	# one scan
    |  |- color/		# RGB sequence, i.e. the video (jpg)
    |  |- depth/		# depth sequence (png)
    |  |- instance/		# instance mask (png)
    |  |- label/
    |  |- pose/
    |  |- intrinsics_color.txt
    |  `- intrinsics_depth.txt
    |- scene0000_01/	# another scan
    ...
    
    scannet_frames_test/
    |- scene0707_00/
    |  |- color/
    |  |- depth/
    |  |- pose/
    |  |- intrinsics_color.txt
    |  `- intrinsics_depth.txt
    |- scene0708_00/
    ...
    

    As you can see, the test set lacks the label-related files.

    [2] provides the official split under ScanNet/Tasks/Benchmark/, divided into train/val/test. Comparing the v2 split files (txt) there with the two zips above shows that:

    • scannet_frames_25k.zip = train + val
    • scannet_frames_test.zip = test

    So this subset should contain as many scans as the full dataset; only the frame sequence within each scan is sub-sampled.
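
    A quick way to verify this (a small sketch; it assumes the official split files and the unzipped scannet_frames_25k/ sit at the same paths used by split-scannet.py below):

    # check-split.py -- compare the official v2 split lists against the 25k zip contents
    import os
    import os.path as osp

    SPLIT_P = osp.join(os.environ["HOME"], "codes", "ScanNet", "Tasks", "Benchmark")
    scans_25k = set(next(os.walk("/data/scannet_frames_25k"))[1])
    for subset in ("train", "val", "test"):
        with open(osp.join(SPLIT_P, "scannetv2_" + subset + ".txt")) as f:
            names = {line.strip() for line in f if line.strip()}
        print(subset, len(names), "scans in the official split,",
              len(names & scans_25k), "of them present in scannet_frames_25k")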

    Splitting

    Judging from MMDetection's configuration files, train and val should be placed in different directories, each with its own json annotation file. Since the test set lacks instance/, which the later conversion to COCO format needs, this post drops the test data and uses val instead.

    • Output path: data/scannet-frames/
    # split-scannet.py
    import os
    import os.path as osp
    
    """split ScanNet
    Only `scannet_frames_25k/` is used while `scannet_frames_test/` is ignored
        because the scenes in it have no `**/instance/` sub-folder which
        is needed by `convert2panoptic.py`.
    So I simply reuse the validation set as the test set, as in the COCO
        configuration file in MMDetection.
    These 2 subsets are then converted separately to produce separate
        annotation json files as needed by the configuration.
    """
    
    DATA_ROOT = "/data"
    # DATA_P = [osp.join(DATA_ROOT, p) for p in ("scannet_frames_25k", "scannet_frames_test")]
    DATA_P = osp.join(DATA_ROOT, "scannet_frames_25k")
    
    # check the number of scans
    dir_list = next(os.walk(DATA_P))[1]
    print("#data:", len(dir_list))  # 1513 -> ALL
    print("conclusion: contains all data, only frames are down-sampled")
    
    SPLIT_P = osp.join(os.environ["HOME"], "codes", "ScanNet", "Tasks", "Benchmark")
    # scannet_frames_25k is only available for ScanNetv2
    VER = "v2"
    DEST = "data/scannet-frames"  # 代码目录中的 data/,不是 /data/
    if not osp.exists(DEST):
        os.makedirs(DEST)
    
    # test set has NO `**/instance/*.png` needed by `convert2panoptic.py`
    for subset in ["train", "val"]:
        split_file = osp.join(SPLIT_P, "scannet"+VER+"_"+subset+".txt")
    
        # soft-link all scans of this subset to `sub_dest`
        sub_dest = osp.join(DEST, subset)
        if not osp.exists(sub_dest):
            os.makedirs(sub_dest)
    
        with open(split_file, "r") as f:
            for line in f:
                line = line.strip()
                if "" == line:
                    continue
                os.system("ln -s {} {}".format(
                    osp.join(DATA_P, line),
                    osp.join(sub_dest, line)))
        print(subset, "DONE")
    

    Convert to COCO Format

    One of the data organizations recommended by MMDetection is converting to the COCO format [9-11]. rvc_devkit [12] provides a conversion script, rconvert_scannet_coco.sh, whose core is the conversion code provided by [2]: convert2panoptic.py. However, that script is meant for the panoptic segmentation task (see item 4 of [9]), whereas here we need the object detection annotation format (item 1 of [9]).
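
    For reference, the object detection annotations of [9] boil down to a single json with three lists; a minimal illustration of its shape (the values below are made up, only the structure matters):

    # Minimal shape of a COCO object detection annotation file (see [9,10]); values are invented.
    coco_objdet = {
        "images": [
            {"id": 0, "width": 1296, "height": 968,
             "file_name": "scene0000_00/color/000000.jpg"},
        ],
        "annotations": [
            {"id": 0, "image_id": 0, "category_id": 5,   # 5 = chair in the NYU40 IDs used below
             "bbox": [100, 200, 50, 80],                  # [x, y, width, height] in pixels
             "area": 4000, "iscrowd": 0, "segmentation": []},
        ],
        "categories": [
            {"id": 5, "name": "chair", "supercategory": "furniture"},
        ],
    }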

    So, based on convert2panoptic.py, I adapted a conversion script for object detection (convert-scannet-coco-objdet.py):

    # convert-scannet-coco-objdet.py
    
    #!/usr/bin/python
    #
    # Convert to COCO-style panoptic segmentation format (http://cocodataset.org/#format-data).
    #
    
    """iTom's modified version (2022.9.13)
    This file is inherited from
        rvc_devkit/segmentation/conv_scannet/convert2panoptic.py
    which is the same as
        ScanNet/BenchmarkScripts/convert2panoptic.py
    But I modify it to fit the object detection format and be
        able to distinguish different subsets (i.e. train/val/test)
        in terms of the output json annotation files.
    
    There are several modifications:
    
    (a) Additional argument
        - an additional optional argument of `convert2panoptic`:
            subset_tag, default = None
        - an additional optional command-line argument:
            --subset-tag -> args.subsetTag, default = None
    If this argument is used, the name of output json annotation file
        will be modified accordingly.
    
    (b) Move to COCO object detection annotation format instead of the
        original panoptic format. I borrowed the functions, i.e.
            - binary_mask_to_polygon
            - close_contour
        for polygon format segmentation info calculation. But the results
        are discarded due to
            - their weirdly large volume
                (~38G for val set & ~114G for the training set !)
            - that they are not used in detection task
        If you want to reenable it, you may need to install
            - scikit-image
        (NOTE: I suspect this is buggy somewhere.)
    
    (c) Change extension to ".jpg" in `images/file_name` field.
    """
    
    # python imports
    from __future__ import print_function, absolute_import, division, unicode_literals
    from itertools import count
    import os
    import glob
    import sys
    import argparse
    import json
    import numpy as np
    
    # iTom: for polygon calculation
    from skimage import measure
    
    # Image processing
    from PIL import Image
    
    EVAL_LABELS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 24, 28, 33, 34, 36, 39]
    EVAL_LABEL_NAMES = ["wall", "floor", "cabinet", "bed", "chair", "sofa", "table", "door", "window", "bookshelf", "picture", "counter", "desk", "curtain", "refrigerator", "shower curtain", "toilet", "sink", "bathtub", "otherfurniture"]
    EVAL_LABEL_CATS = ["indoor", "indoor", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "appliance", "furniture", "furniture", "appliance", "furniture", "furniture"]
    EVAL_LABEL_COLORS = [(174, 199, 232), (152, 223, 138), (31, 119, 180), (255, 187, 120), (188, 189, 34), (140, 86, 75), (255, 152, 150), (214, 39, 40), (197, 176, 213), (148, 103, 189), (196, 156, 148), (23, 190, 207), (247, 182, 210), (219, 219, 141), (255, 127, 14), (158, 218, 229), (44, 160, 44), (112, 128, 144), (227, 119, 194), (82, 84, 163)]
    
    def splitall(path):
        allparts = []
        while 1:
            parts = os.path.split(path)
            if parts[0] == path:  # sentinel for absolute paths
                allparts.insert(0, parts[0])
                break
            elif parts[1] == path: # sentinel for relative paths
                allparts.insert(0, parts[1])
                break
            else:
                path = parts[0]
                allparts.insert(0, parts[1])
        return allparts
    
    
    def close_contour(contour):
        """iTom: helper function for binary mask -> polygon conversion
        from: https://github.com/waspinator/pycococreator/blob/master/pycococreatortools/pycococreatortools.py#L20
        """
        if not np.array_equal(contour[0], contour[-1]):
            contour = np.vstack((contour, contour[0]))
        return contour
    
    
    def binary_mask_to_polygon(binary_mask, tolerance=0):
        """iTom: Converts a binary mask to COCO polygon representation
        Args:
            binary_mask: a 2D binary numpy array where '1's represent the object
            tolerance: Maximum distance from original points of polygon to approximated
                polygonal chain. If tolerance is 0, the original coordinate array is returned.
    
        from: https://github.com/waspinator/pycococreator/blob/master/pycococreatortools/pycococreatortools.py#L35
        ref:
        - https://github.com/cocodataset/cocoapi/issues/131
        - https://stackoverflow.com/questions/68663512/image-segmentation-mask-to-polygon-for-coco-json
        - https://stackoverflow.com/questions/58884265/python-convert-binary-mask-to-polygon
        """
        polygons = []
        # pad mask to close contours of shapes which start and end at an edge
        padded_binary_mask = np.pad(binary_mask, pad_width=1, mode='constant', constant_values=0)
        contours = measure.find_contours(padded_binary_mask, 0.5)
        # contours = np.subtract(contours, 1)  # iTom: original but buggy
        for i in range(len(contours)):  # iTom: change to for-loop subtraction
            contours[i] = np.subtract(contours[i], 1)
        for contour in contours:
            contour = close_contour(contour)
            contour = measure.approximate_polygon(contour, tolerance)
            if len(contour) < 3:
                continue
            contour = np.flip(contour, axis=1)
            segmentation = contour.ravel().tolist()
            # after padding and subtracting 1 we may get -0.5 points in our segmentation
            segmentation = [0 if i < 0 else i for i in segmentation]
            polygons.append(segmentation)
    
        return polygons
    
    
    # The main method
    def convert2panoptic(scannetPath, outputFolder=None, subset_tag=None, beginAnnoId=0, beginImageId=0, thingsOnly=False):
        """iTom's modification
        subset_tag: str, an optional subset distinguishing string for
            train/val seperation. One can simply ignore it to get the
            original output json file name.
        """
    
        if outputFolder is None:
            outputFolder = scannetPath
    
        # find files
        search = os.path.join(scannetPath, "*", "instance", "*.png")
        files = glob.glob(search)
        files.sort()
        # quit if we did not find anything
        if not files:
            print(
                "Did not find any files for using matching pattern {}. Please consult the README.".format(search)
            )
            sys.exit(-1)
        # a bit verbose
        print("Converting {} annotation files.".format(len(files)))
    
        outputBaseFile = "scannet_objdet"
        if subset_tag is not None:
            outputBaseFile = outputBaseFile + "_" + subset_tag
            print("iTom: modifying json annotation file name to:", outputBaseFile)
        outFile = os.path.join(outputFolder, "{}.json".format(outputBaseFile))
        print("Json file with the annotations in panoptic format will be saved in {}".format(outFile))
        # panopticFolder = os.path.join(outputFolder, outputBaseFile)
        # if not os.path.isdir(panopticFolder):
        #     print("Creating folder {} for panoptic segmentation PNGs".format(panopticFolder))
        #     os.mkdir(panopticFolder)
        # print("Corresponding segmentations in .png format will be saved in {}".format(panopticFolder))
    
        categories = []
        cls_is_things = {}  # iTom
        for idx in range(len(EVAL_LABELS)):
            label = EVAL_LABELS[idx]
            name = EVAL_LABEL_NAMES[idx]
            cat = EVAL_LABEL_CATS[idx]
            color = EVAL_LABEL_COLORS[idx]
            isthing = label > 2
            cls_is_things[int(label)] = isthing  # iTom
            if thingsOnly and not isthing:  # iTom
                continue
            categories.append({'id': int(label),
                               'name': name,
                               'color': color,
                               'supercategory': cat,
                               'isthing': isthing})
    
        images = []
        annotations = []
        for progress, f in enumerate(files):
    
            originalFormat = np.array(Image.open(f))
    
            parts = splitall(f)
            fileName = parts[-1]
            sceneName = parts[-3]
            outputFileName = "{}__{}".format(sceneName, fileName)
            inputFileName = os.path.join(sceneName, "color", fileName)
            # imageId = os.path.splitext(outputFileName)[0]
            imageId = beginImageId
            beginImageId += 1
            # image entry, id for image is its filename without extension
            images.append({"id": imageId,
                           "width": int(originalFormat.shape[1]),
                           "height": int(originalFormat.shape[0]),
                           "file_name": inputFileName.replace(".png", ".jpg")})
                           # "file_name": inputFileName})
    
            # pan_format = np.zeros(
            #     (originalFormat.shape[0], originalFormat.shape[1], 3), dtype=np.uint8
            # )
            segmentIds = np.unique(originalFormat)
            segmInfo = []
            for i_seg, segmentId in enumerate(segmentIds):
                isCrowd = 0
                if segmentId < 1000:
                    semanticId = segmentId
                else:
                    semanticId = segmentId // 1000
                if semanticId not in EVAL_LABELS:
                    continue
                if thingsOnly and not cls_is_things[semanticId]:  # iTom
                    continue
    
                mask = originalFormat == segmentId
                color = [segmentId % 256, segmentId // 256, segmentId // 256 // 256]
                # pan_format[mask] = color
    
                area = np.sum(mask) # segment area computation
    
                # bbox computation for a segment
                hor = np.sum(mask, axis=0)
                hor_idx = np.nonzero(hor)[0]
                x = hor_idx[0]
                width = hor_idx[-1] - x + 1
                vert = np.sum(mask, axis=1)
                vert_idx = np.nonzero(vert)[0]
                y = vert_idx[0]
                height = vert_idx[-1] - y + 1
                bbox = [int(x), int(y), int(width), int(height)]
    
                segmInfo.append({"id": int(segmentId),
                                "category_id": int(semanticId),
                                "area": int(area),
                                "bbox": bbox,
                                "iscrowd": isCrowd})
    
                # COCO object detection format:
                #   - https://cocodataset.org/#format-data
                # ref:
                #   - https://zhuanlan.zhihu.com/p/29393415
                #   - https://zhuanlan.zhihu.com/p/263454360
    
                # polygon = binary_mask_to_polygon(mask)  # weirdly large, discarded
                polygon = []
    
                # # annoId = imageId + "_" + str(i_seg)  # "scene0046_00__000200_2"
                # spaceId, scanId = sceneName.split("scene")[1].split("_")
                # imgFileNum = os.path.splitext(fileName)[0]
                # annoId = int(spaceId) * 1000000 + int(scanId) * 10000 + int(imgFileNum) + i_seg
                # # print("annoId:", annoId, "<-", spaceId, scanId, imgFileNum, i_seg)
                annoId = beginAnnoId
                beginAnnoId += 1
    
                annotations.append({'id': annoId,
                                    'image_id': imageId,
                                    'category_id': int(semanticId),
                                    "segmentation": polygon,
                                    'area': int(area),
                                    'bbox': bbox,
                                    "iscrowd": isCrowd})
                # break  # debug
    
            ## iTom: original panoptic annotation, removed
            # annotations.append({'image_id': imageId,
            #                     'file_name': outputFileName,
            #                     "segments_info": segmInfo})
    
            # Image.fromarray(pan_format).save(os.path.join(panopticFolder, outputFileName))
    
            print("\rProgress: {:>3.2f} %".format((progress + 1) * 100 / len(files)), end=' ')
            sys.stdout.flush()
            # break  # debug
    
        print("\nSaving the json file {}".format(outFile))
        d = {'images': images,
            'annotations': annotations,
            'categories': categories}
        with open(outFile, 'w') as f:
        # omitting `indent` seems to make the json file much smaller
            json.dump(d, f, sort_keys=True)#, indent=4)
    
        return beginAnnoId, beginImageId
    
    
    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument("--dataset-folder",
                            dest="scannetPath",
                            help="path to the ScanNet data 'scannet_frames_25k' folder",
                            required=True,
                            type=str)
        parser.add_argument("--output-folder",
                            dest="outputFolder",
                            help="path to the output folder.",
                            default=None,
                            type=str)
        # iTom-added, optional
        parser.add_argument("--subset-tag",
                            dest="subsetTag",
                            help="(iTom, optional) distinguishing str for train/val separation",
                            default=None,
                            type=str)
        parser.add_argument("--begin-anno-id",
                            dest="beginAnnoId",
                            help="(iTom) annotation IDs will start from this number." \
                                "When convert for each subset sequentially, " \
                                "use this to ensure that there is no duplicated annotation ID",
                            default=0,
                            type=int)
        parser.add_argument("--begin-image-id",
                            dest="beginImageId",
                            help="(iTom) image IDs will start from this number." \
                                "When convert for each subset sequentially, " \
                                "use this to ensure that there is no duplicated annotation ID",
                            default=0,
                            type=int)
        parser.add_argument('--things-only',
                            dest="thingsOnly",
                            action="store_true",
                            help="keep thing classes & drop stuff classes")
        args = parser.parse_args()
    
        last_unused_anno_id, last_unused_image_id = convert2panoptic(
            args.scannetPath, args.outputFolder, args.subsetTag, args.beginAnnoId, args.beginImageId, args.thingsOnly)
        # record the last unused annotation & image ID to interact with `scripts/split-cvt2coco.sh`
        with open("last-unused-anno-id.txt", "w") as f:
            f.write(str(last_unused_anno_id))
        with open("last-unused-image-id.txt", "w") as f:
            f.write(str(last_unused_image_id))
    
    
    # call the main
    if __name__ == "__main__":
        main()
    

    The body is still largely convert2panoptic.py; the changes are:

    • Rewrite the annotations in the COCO object detection format required by [9].
      • The segmentation field uses the polygon format as described in [9,10] (the original convert2panoptic.py hard-codes isCrowd = 0, and iscrowd=0 annotations use polygons); the code that computes polygons from binary masks is copied from [13], see also [14-16].
      • I ended up discarding it, though, because the resulting json files are absurdly large (val ~38G, train ~114G; COCO has far more data than ScanNet and its whole annotation zip is only ~241M). I suspect a bug somewhere, and object detection does not seem to need it anyway.
      • (2023.1.14) Regarding the oversized json when polygons are included: 杨东泽 pointed out that spaces and newlines take up a lot of room and suggested json.dump(d, f, separators=(',', ':')); using it with Mask R-CNN still had problems, though, which remain unresolved.
      • Per [10], the length of annotations, i.e. the number of annotation entries, equals the number of bounding boxes in the whole (sub)dataset, so the annotations/id field just needs a distinct integer ID per bbox. Note: it must be a number, otherwise an error is raised complaining that it cannot be converted to a number.
      • One way to guarantee uniqueness of annotations/id is to pack the space ID, the scan ID, the numeric image file name within the scan, and the segmentation index within the image into a single integer (a small worked example follows this list). I checked this dataset: the image file name numbers are all multiples of 100 and do not exceed 100 after dividing by 100, and each image has no more than 100 segmentations; hence the annoId formula in the code (left commented out in the script as posted, which uses the running beginAnnoId counter instead).
    • Added a subset tag argument so that train/val produce different json files.
      • Simply ignore this argument to get a single json file, like the original convert2panoptic.py.
    • The images/file_name field has its .png extension changed to .jpg.
      • The original code enumerates frames via **/instance/*.png and therefore defaults to the .png extension, but the RGB frames under color/ actually have the .jpg extension.
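
    As a worked example of that (commented-out) packing scheme, take the 3rd segment (i_seg = 2) of frame scene0046_00/instance/000200.png:

    # Packed annotation ID for the commented-out scheme in convert-scannet-coco-objdet.py
    space_id, scan_id = 46, 0    # from "scene0046_00"
    img_file_num = 200           # from "000200.png"; always a multiple of 100 in this subset
    i_seg = 2                    # index of the segment within this frame (< 100)
    anno_id = space_id * 1000000 + scan_id * 10000 + img_file_num + i_seg
    print(anno_id)               # 46000202 -- unique as long as img_file_num + i_seg < 10000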

    Invocation:

    #!/bin/bash
    
    DEST=data/scannet-frames  # output of the splitting step above
    
    anno_id=0  # interact with `convert-scannet-coco-objdet.py`
    image_id=0
    for subset in train val; do
        python convert-scannet-coco-objdet.py \
            --dataset-folder $DEST/$subset \
            --output-folder $DEST \
            --things-only \
            --subset-tag $subset \
            --begin-anno-id $anno_id \
            --begin-image-id $image_id
    
        # update beginning (i.e. last unused) annotation & image ID
        anno_id=`cat last-unused-anno-id.txt`
        image_id=`cat last-unused-image-id.txt`
    done
    rm last-unused-anno-id.txt
    rm last-unused-image-id.txt
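
    After the conversion, it is worth a quick sanity check that the generated json files load correctly; a small sketch using pycocotools (which MMDetection depends on anyway):

    # check-anno.py -- quick sanity check of the converted annotation files
    from pycocotools.coco import COCO

    for subset in ("train", "val"):
        coco = COCO("data/scannet-frames/scannet_objdet_{}.json".format(subset))
        print(subset, ":", len(coco.getImgIds()), "images,", len(coco.getAnnIds()), "annotations")
        print("categories:", [c["name"] for c in coco.loadCats(coco.getCatIds())])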
    

    Environment

    Because MMDetection's files are needed later, install it from source, following the installation guide get_started.md in [4]. Installation script:

    #!/bin/bash
    # env-mmdetection.sh
    
    echo create the conda virtual environment
    CONDA_P=~/miniconda3
    ENV=openmmlab
    if [ ! -d $CONDA_P/envs/$ENV ]; then
        conda create --name $ENV python=3.8 -y
    fi
    CONDA_BIN=$CONDA_P/envs/$ENV/bin
    
    $CONDA_BIN/pip install torch==1.8.2 torchvision==0.9.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu111
    # used in mmdetection/demo/video_gpuaccel_demo.py
    $CONDA_BIN/pip install ffmpegcv scipy scikit-image
    conda install -n $ENV ffmpeg -y
    
    $CONDA_BIN/pip install -U openmim
    $CONDA_BIN/mim install mmcv-full==1.6.0 mmengine
    # avoid bug: KeyError: 'Cascade Mask R-CNN'
    #   (i.e. open-mmlab/mim issues #125)
    # https://github.com/open-mmlab/mim/issues/125
    $CONDA_BIN/mim install mmdet==2.24.0
    
    if [ ! -d mmdetection ]; then
        echo try to clone from the original github repo
        git clone https://github.com/open-mmlab/mmdetection.git
        # git submodule add https://github.com/open-mmlab/mmdetection.git
        if [ $? -ne 0 ]; then
            echo * FAILED to clone from github
            echo clone from a gitee transit repo instead
            git clone https://gitee.com/xoxleoxox/mmdetection
            # git submodule add https://gitee.com/xoxleoxox/mmdetection
        fi
    fi
    cd mmdetection
    $CONDA_BIN/pip install -v -e .
    
    echo verify the installation
    $CONDA_BIN/mim download mmdet --config yolov3_mobilenetv2_320_300e_coco --dest .
    $CONDA_BIN/python demo/image_demo.py demo/demo.jpg yolov3_mobilenetv2_320_300e_coco.py \
        yolov3_mobilenetv2_320_300e_coco_20210719_215349-d18dff72.pth --device cpu --out-file result.jpg
    

    Note: when installing from source earlier, an error occurred while verifying the installation / running the demo, see [17]; that is why the script has the line installing mmdet==2.24.0. But training with mmdet 2.24.0 raises another error, so a newer version still has to be installed from source, which overwrites the old 2.24.0.

    Training

    MMDetection code

    Since MMDetection's code is needed, the MMDetection repository is added to the project directory as a git submodule, following [18-20]. Script:

    #!/bin/bash
    # add-submodules.sh
    
    # echo rvc_devkit
    # git submodule add https://github.com/ozendelait/rvc_devkit.git
    # if [ $? -ne 0 ]; then
    #     git submodule add https://gitee.com/tyloeng/rvc_devkit.git
    # fi
    
    echo mmdetection
    git submodule add https://github.com/open-mmlab/mmdetection.git
    if [ $? -ne 0 ]; then
        git submodule add https://gitee.com/xoxleoxox/mmdetection
    fi
    
    # echo ScanNet
    # git submodule add https://github.com/ScanNet/ScanNet.git
    # if [ $? -ne 0 ]; then
    #     git submodule add https://gitee.com/gxdcode/ScanNet.git
    # fi
    
    git submodule update --init --recursive
    git submodule update --remote
    
    #CONDA_P=~/miniconda3
    #ENV=openmmlab
    #CONDA_BIN=$CONDA_P/envs/$ENV/bin
    
    #cd rvc_devkit
    #$CONDA_BIN/pip install -r requirements.txt
    #cd objdet
    #$CONDA_BIN/pip install -r requirements.txt
    

    configuration files

    To train an existing model on a new dataset with MMDetection, see the examples 2_new_data_model.md and 1_exist_data_model.md in [4]. The data is ready from the previous steps; what remains is mainly the configuration files. Mirroring the structure of [4], I created a configs/ directory in my own project and copied two configuration files from [4], renaming them:

    • mstrain_3x_scannet.py (from mmdetection/configs/common/mstrain_3x_coco.py)
    • faster_rcnn_x101_64x4d_fpn_mstrain_3x_scannet.py (from mmdetection/configs/faster_rcnn/faster_rcnn_x101_64x4d_fpn_mstrain_3x_coco.py)

    The project directory then looks like:

    my-project/
    |- convert-scannet-coco-objdet.py
    |- split-scannet.py
    |- data/
    |  `- scannet-frames/
    |     |- train/						# produced by splitting
    |     |- val/						# produced by splitting
    |     |- scannet_objdet_train/		# produced by annotation conversion
    |     |- scannet_objdet_val/		# produced by annotation conversion
    |     |- scannet_objdet_train.json	# produced by annotation conversion
    |     `- scannet_objdet_val.json	# produced by annotation conversion
    |- mmdetection/						# submodule
    |- configs/							# mirrors the structure of mmdetection/configs/
    |  |- common/
    |  |  `- mstrain_3x_scannet.py
    |  `- faster_rcnn/
    |     `- faster_rcnn_x101_64x4d_fpn_mstrain_3x_scannet.py
    `- scripts/
       |- add-submodules.sh
       |- env-mmdetection.sh
       |- find_gpu.sh
       `- train-faster-rcnn-scannet-frames.sh
    

    The two configuration files are:

    • mstrain_3x_scannet.py (change the data part to our own dataset and fix the _base_ reference path)
    • (2022.9.17) Following [24-25], switch the class set to ScanNet's: change classes, data/train/dataset/classes, data/val/classes and data/test/classes. (I have not tested this change and do not know whether anything else needs to be adjusted accordingly.)
    ## iTom Notes
    # Inherited from `mmdetection/configs/common/mstrain_3x_coco.py`,
    # this file is designed for training Faster R-CNN on converted ScanNet-frames-25k.
    import os.path as osp
    
    _base_ = '../../mmdetection/configs/_base_/default_runtime.py'
    # dataset settings
    dataset_type = 'CocoDataset'
    classes = (
        "wall", "floor", "cabinet", "bed", "chair",
        "sofa", "table", "door", "window", "bookshelf",
        "picture", "counter", "desk", "curtain", "refrigerator",
        "shower curtain", "toilet", "sink", "bathtub", "otherfurniture"
    )
    data_root = 'data/scannet-frames/'
    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
    
    # In mstrain 3x config, img_scale=[(1333, 640), (1333, 800)],
    # multiscale_mode='range'
    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations', with_bbox=True),
        dict(
            type='Resize',
            img_scale=[(1333, 640), (1333, 800)],
            multiscale_mode='range',
            keep_ratio=True),
        dict(type='RandomFlip', flip_ratio=0.5),
        dict(type='Normalize', **img_norm_cfg),
        dict(type='Pad', size_divisor=32),
        dict(type='DefaultFormatBundle'),
        dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
    ]
    test_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(
            type='MultiScaleFlipAug',
            img_scale=(1333, 800),
            flip=False,
            transforms=[
                dict(type='Resize', keep_ratio=True),
                dict(type='RandomFlip'),
                dict(type='Normalize', **img_norm_cfg),
                dict(type='Pad', size_divisor=32),
                dict(type='ImageToTensor', keys=['img']),
                dict(type='Collect', keys=['img']),
            ])
    ]
    
    # Use RepeatDataset to speed up training
    data = dict(
        samples_per_gpu=2,
        workers_per_gpu=2,
        train=dict(
            type='RepeatDataset',
            times=3,
            dataset=dict(
                type=dataset_type,
                ann_file=osp.join(data_root, 'scannet_objdet_train.json'),
                img_prefix=osp.join(data_root, 'train/'),
                pipeline=train_pipeline,
                classes=classes)),
        val=dict(
            type=dataset_type,
            ann_file=osp.join(data_root, 'scannet_objdet_val.json'),
            img_prefix=osp.join(data_root, 'val/'),
            pipeline=test_pipeline,
            classes=classes),
        test=dict(
            type=dataset_type,
            ann_file=osp.join(data_root, 'scannet_objdet_val.json'),
            img_prefix=osp.join(data_root, 'val/'),
            pipeline=test_pipeline,
            classes=classes))
    evaluation = dict(interval=1, metric='bbox')
    
    # optimizer
    optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
    optimizer_config = dict(grad_clip=None)
    
    # learning policy
    # Experiments show that using step=[9, 11] has higher performance
    lr_config = dict(
        policy='step',
        warmup='linear',
        warmup_iters=500,
        warmup_ratio=0.001,
        step=[9, 11])
    runner = dict(type='EpochBasedRunner', max_epochs=12)
    
    • faster_rcnn_x101_64x4d_fpn_mstrain_3x_scannet.py (its _base_ references the configuration file modified above; fix the reference paths)
    • (2022.9.17) Following [24-25], switch the class set to ScanNet's: change model/roi_head/bbox_head/num_classes. (I have not tested this change and do not know whether anything else needs to be adjusted accordingly.)
    ## iTom Notes
    # Inherited from `mmdetection/configs/faster_rcnn/faster_rcnn_x101_64x4d_fpn_mstrain_3x_coco.py`,
    # this file is designed for training Faster R-CNN on converted ScanNet-frames-25k.
    
    _base_ = [
        # '../common/mstrain_3x_coco.py',
        '../common/mstrain_3x_scannet.py',
        # '../_base_/models/faster_rcnn_r50_fpn.py'
        '../../mmdetection/configs/_base_/models/faster_rcnn_r50_fpn.py'
    ]
    model = dict(
        backbone=dict(
            type='ResNeXt',
            depth=101,
            groups=64,
            base_width=4,
            num_stages=4,
            out_indices=(0, 1, 2, 3),
            frozen_stages=1,
            norm_cfg=dict(type='BN', requires_grad=True),
            style='pytorch',
            init_cfg=dict(
                type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d')),
        roi_head=dict(bbox_head=dict(num_classes=20)))
    

    training

    Use the distributed training script provided by MMDetection:

    #!/bin/bash
    # train-faster-rcnn-scannet-frames.sh
    clear
    
    echo run \`conda activate openmmlab\` first
    
    config=configs/faster_rcnn/faster_rcnn_x101_64x4d_fpn_mstrain_3x_scannet.py
    
    . scripts/find_gpu.sh 4 14845
    
    PATH=/usr/local/cuda/bin:$PATH \
    PYTHONPATH=mmdetection/mmdet:$PYTHONPATH \
    CUDA_VISIBLE_DEVICES=${gpu_id} \
    MMDET_DATASETS=`pwd`/data/scannet-frames/ \
    bash mmdetection/tools/dist_train.sh \
        $config ${n_gpu_found}
    # python mmdetection/tools/train.py \
    #     $config
    

    Where:

    • find_gpu.sh comes from [21];
    • putting cuda's bin/ directory at the front of $PATH ensures that the nvcc inside the cuda directory is used instead of /usr/bin/nvcc, see [22].

    Run bash scripts/train-faster-rcnn-scannet-frames.sh to start training.
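
    After training finishes, a quick way to eyeball the result is to run inference with the trained checkpoint; a minimal sketch using the mmdet 2.x high-level API (the checkpoint path under work_dirs/ and the sample frame are assumptions, adjust them to your run):

    # infer-demo.py -- run the trained detector on one ScanNet frame (paths are assumptions)
    from mmdet.apis import init_detector, inference_detector

    config = "configs/faster_rcnn/faster_rcnn_x101_64x4d_fpn_mstrain_3x_scannet.py"
    checkpoint = "work_dirs/faster_rcnn_x101_64x4d_fpn_mstrain_3x_scannet/latest.pth"
    img = "data/scannet-frames/val/scene0011_00/color/000000.jpg"

    model = init_detector(config, checkpoint, device="cuda:0")
    result = inference_detector(model, img)
    model.show_result(img, result, score_thr=0.3, out_file="scannet_det_demo.jpg")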

    References

    1. (CVPR 2017) ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes
    2. ScanNet/ScanNet
    3. (arXiv 2019) MMDetection: Open MMLab Detection Toolbox and Benchmark
    4. open-mmlab/mmdetection
    5. ScanNet Benchmark
    6. 关于ScanNet数据集
    7. 深度学习(1)RGB-D数据集:ScanNet
    8. scannet数据集下载文件
    9. COCO | Data format
    10. COCO数据集的标注格式
    11. COCO数据集标注详解
    12. ozendelait/rvc_devkit
    13. waspinator/pycococreator
    14. convert mask binary image to polygon format #131
    15. Image segmentation mask to polygon for coco json
    16. Python - convert binary mask to polygon
    17. KeyError: ‘Cascade Mask R-CNN’ #125
    18. Git Tools - Submodules
    19. Git submodule 子模块的管理和使用
    20. Git Submodule使用完整教程
    21. shell监视gpu使用情况
    22. 装detectron2报错:nvcc fatal : No input files specified; use option --help for more information
    23. facebookresearch/detr/datasets/coco.py/convert_coco_poly_to_mask
    24. AssertionError: The num_classes (3) in Shared2FCBBoxHead of MMDataParallel does not matches the length of CLASSES 80) in CocoDataset #4828
    25. Train with customized datasets | Prepare a config
    26. ScanNet-EfficientPS/tools/scannet_train_val_to_efficientps.py
    27. panopticapi/converters/panoptic2detection_coco_format.py