Image Semantic Segmentation from Scratch: A Quick FCN Reproduction Tutorial (PyTorch + CityScapes Dataset)

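Preface
I. FCN, the pioneering work in image segmentation
II. Getting the code and the dataset
- 1. Source project code
- 2. CityScapes dataset
III. Reproducing the code
Summary
Reference sites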

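Preface

After slacking off for two weeks, I decided I could not keep drifting and should put my time to better use. Last month I read several classic image segmentation papers, including FCN, UNet, and Mask R-CNN, but I had only read them and come away with a rough idea of what the segmentation task is. So today I set out to reproduce one hands-on, because only once the code runs can the follow-up work begin: reading the code, improving it, and transferring it to other problems.
This article focuses on reproducing the code. Related background is covered only briefly, and readers are expected to look it up on their own.
By the end of this article you will have a complete image segmentation project: a widely used segmentation dataset and code that runs.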

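I. FCN, the pioneering work in image segmentation

Figure from FCN: Jonathan Long, Evan Shelhamer, Trevor Darrell, CVPR 2015

Image segmentation can be roughly divided into instance segmentation and semantic segmentation. Semantic segmentation classifies every pixel in an image, deciding which class each point belongs to (background, person, car, and so on), and thereby partitions the image into regions. Semantic segmentation is now widely used in scenarios such as autonomous driving and landing-spot selection for drones.
FCN stands for Fully Convolutional Networks and was first published at CVPR 2015. The original paper is here:
FCN paper: https://arxiv.org/abs/1411.4038
As the name suggests, the idea is to replace the fully connected layers of earlier networks such as VGG with convolutional layers. The goal is to let the model accept input images of different sizes: once a fully connected layer is created, its input and output dimensions are fixed, which ultimately forces a fixed input image size. Since semantic segmentation is a pixel-level task, convolutional layers are also the more natural fit (convolution itself operates at the pixel level, but that is a topic for later).
For a more detailed walkthrough, see the Bilibili video "FCN网络结构详解(语义分割)" (FCN network architecture explained, semantic segmentation).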
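
To make the point about fixed input sizes concrete, here is a minimal NumPy sketch. It is an illustration only, not code from the FCN project: the names `fc_head` and `conv_head`, the channel count, and the use of a 1x1 kernel are all assumptions chosen to keep the example small. It shows a fully connected head failing on a feature map of a new size, while a convolutional head simply produces a larger score map.

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels, n_class = 8, 3

# Fully connected head: its weight matrix is tied to one flattened
# feature-map size, here 8 channels x 4 x 4 positions.
fc_weight = rng.normal(size=(n_class, n_channels * 4 * 4))

# Convolutional head (1x1 kernel): one weight per (class, channel),
# shared across every spatial position.
conv_weight = rng.normal(size=(n_class, n_channels))


def fc_head(feature_map):
    """Flatten a (C, H, W) map and apply the fully connected layer."""
    return fc_weight @ feature_map.reshape(-1)


def conv_head(feature_map):
    """Apply the 1x1 convolution; the output keeps the (H, W) grid."""
    return np.einsum("kc,chw->khw", conv_weight, feature_map)


for h, w in [(4, 4), (6, 10)]:
    fmap = rng.normal(size=(n_channels, h, w))
    print("conv", (h, w), "->", conv_head(fmap).shape)
    try:
        print("fc  ", (h, w), "->", fc_head(fmap).shape)
    except ValueError:
        # The flattened length no longer matches the weight matrix.
        print("fc  ", (h, w), "-> shape mismatch")
```

The convolutional head returns one score per class at every position, which is the shape a per-pixel classifier needs.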

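II. Getting the code and the dataset

1. Source project code

[Image]
Open the FCN paper link, click Code & Data, then go to Community Code to jump to the Papers with Code site.

Curiously, there are two search entries for FCN. The PyTorch project used in this article is behind the link marked with the red box.

The project with the most stars is the one this article uses. Its author also has a personal website and claims this is the simplest implementation of FCN. I can vouch for that: it is indeed the cleanest and clearest of the many implementations available.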

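2. CityScapes dataset

Official download link for the CityScapes dataset: CityScapes Download
Downloading the dataset requires a registered account, and registration requires an educational email address (presumably checked by whether the domain contains edu.cn). I registered with my university email and downloaded the dataset without problems. Readers for whom this is inconvenient can look for other sources online or buy an account on Taobao.
[Image]
Only the first three packages are needed: gtFine_trainvaltest contains the fine annotations (the most important part), gtCoarse contains the coarse annotations, and leftImg8bit_trainvaltest contains the original images. Although training only uses the gtFine annotations, the dataset still has to be preprocessed, so download all three packages; otherwise the official preprocessing script cannot run.
Reorganizing the dataset
[Image]
Unzip the three archives, create a new folder named CityScapes, and arrange the extracted contents inside it following the directory layout in the figure above, in preparation for preprocessing.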
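
Before preprocessing, it can help to check that the folder is laid out the way the loader expects. The sketch below is an assumption-laden helper, not part of the original project: `missing_dirs` and `REQUIRED_DIRS` are hypothetical names, and the layout checked is the one `torchvision.datasets.Cityscapes` uses for `mode="fine"` (images under `leftImg8bit/<split>`, annotations under `gtFine/<split>`). The demo builds a throwaway directory so it can run anywhere.

```python
import tempfile
from pathlib import Path

# Sub-directories needed for the train and val splits with fine annotations.
REQUIRED_DIRS = [
    ("leftImg8bit", "train"),
    ("leftImg8bit", "val"),
    ("gtFine", "train"),
    ("gtFine", "val"),
]


def missing_dirs(root):
    """Return the required sub-directories that do not exist under root."""
    root = Path(root)
    return [
        Path(*parts).as_posix()
        for parts in REQUIRED_DIRS
        if not root.joinpath(*parts).is_dir()
    ]


# Demo on a temporary folder that deliberately lacks gtFine/val.
with tempfile.TemporaryDirectory() as tmp:
    for parts in REQUIRED_DIRS[:3]:
        Path(tmp, *parts).mkdir(parents=True)
    demo_missing = missing_dirs(tmp)

print(demo_missing)  # ['gtFine/val']
```

On a real setup you would call `missing_dirs` with your own CityScapes folder and expect an empty list.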

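III. Reproducing the code

1. Data preprocessing

First download the official scripts: cityscapesScripts
Then modify a few places in them. The two most important files in the project are cityscapesscripts\helpers\labels.py and cityscapesscripts\preparation\createTrainIdLabelImgs.py.
[Image]
The blue box marks the original code. Comment it out and add the code in the red box, which points the script at your local dataset directory. For example, I placed CityScapes under the dataset directory on drive E.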
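
As an alternative to editing the script, the versions of createTrainIdLabelImgs.py that I am aware of first look for an environment variable named `CITYSCAPES_DATASET` and only fall back to a path relative to the script when it is unset. Treat that as an assumption and check your own copy. The sketch below imitates that lookup; `resolve_cityscapes_root` and `polygon_search_pattern` are hypothetical helpers written for illustration, not functions from cityscapesScripts.

```python
import os


def resolve_cityscapes_root(fallback, environ=None):
    """Pick the dataset root: the environment variable wins, else the fallback."""
    environ = os.environ if environ is None else environ
    return environ.get("CITYSCAPES_DATASET", fallback)


def polygon_search_pattern(root, annotation="gtFine"):
    """Glob pattern for the polygon JSON files under <root>/<annotation>/<split>/<city>/."""
    return os.path.join(root, annotation, "*", "*", "*_gt*_polygons.json")


# With the variable set, the fallback is ignored.
root = resolve_cityscapes_root(
    fallback="..",
    environ={"CITYSCAPES_DATASET": "E:/datasets/CityScapes"},
)
print(root)
print(polygon_search_pattern(root))
```

If your copy does honor the variable, setting it in the shell before running the script avoids touching the source file at all.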

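Next, in labels.py, adjust the trainId values to match what training needs; 255 marks IDs the model does not use. FCN here works with 19 classes plus background, 20 classes in total, which already matches the defaults, so nothing in labels.py needs to change.
[Image]
Finally, run createTrainIdLabelImgs.py. If it raises an error, the most likely cause is a missing library among those in the blue box in the figure above; simply comment that import out.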
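
To see what the trainId remapping does to pixel values, here is a small NumPy sketch. The id-to-trainId pairs are the 19 evaluated classes as I know them from the default labels.py (road = 7 maps to 0, through bicycle = 33 maps to 18); verify them against your copy. The lookup-table approach and the name `label_ids_to_train_ids` are my own illustration, not how the official script is implemented.

```python
import numpy as np

# labelId -> trainId for the 19 evaluated classes; every other id becomes 255.
ID_TO_TRAIN_ID = {
    7: 0, 8: 1, 11: 2, 12: 3, 13: 4, 17: 5, 19: 6, 20: 7, 21: 8, 22: 9,
    23: 10, 24: 11, 25: 12, 26: 13, 27: 14, 28: 15, 31: 16, 32: 17, 33: 18,
}

# Lookup table indexed by labelId, pre-filled with the ignore value.
lut = np.full(256, 255, dtype=np.uint8)
for label_id, train_id in ID_TO_TRAIN_ID.items():
    lut[label_id] = train_id


def label_ids_to_train_ids(label_img):
    """Remap a uint8 labelIds image to trainIds with one indexing operation."""
    return lut[label_img]


# Tiny 2x3 "image": road, road, car / unlabeled, person, bicycle.
label_ids = np.array([[7, 7, 26], [0, 24, 33]], dtype=np.uint8)
print(label_ids_to_train_ids(label_ids))
# [[  0   0  13]
#  [255  11  18]]
```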

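2. Code changes

The changes are needed because the data preprocessing in the original code is far too slow: Cityscapes_utils.py writes the trainId maps to npy files, which takes a very long time. That is also why the official cityscapesScripts were used for preprocessing above. The only purpose of preprocessing is to generate the TrainIds mask images. They are the same kind of PNG as the labelIds images, except that each pixel's class is remapped according to the label table in labels.py.
PyTorch (torchvision) does provide a loader for CityScapes, but used as-is it does not meet our needs, so it has to be modified. Combining the original project's Cityscapes_loader.py with the code of torchvision.datasets.Cityscapes gives the runnable code below. Readers only need to replace train.py with it.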

# -*- coding: utf-8 -*-
# Author: Reganzhx

from __future__ import print_function

import random
from tqdm import tqdm  # training is slow, so a progress bar makes it easier to monitor
import imageio
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torch.autograd import Variable
from torch.utils.data import DataLoader

from fcn import VGGNet, FCN32s, FCN16s, FCN8s, FCNs
# from Cityscapes_loader import CityScapesDataset
from CamVid_loader import CamVidDataset
from torchvision.datasets import Cityscapes
from matplotlib import pyplot as plt
import numpy as np
import time
import sys
import os
from PIL import Image


class CityScapesDataset(Cityscapes):
    def __init__(self, root: str,
                 split: str = "train",
                 mode: str = "fine",
                 target_type="semantic",
                 transform=None,
                 target_transform=None,
                 transforms=None):
        super(CityScapesDataset, self).__init__(root,
                                                split,
                                                mode,
                                                target_type,
                                                transform,
                                                target_transform,
                                                transforms)
        self.means = np.array([103.939, 116.779, 123.68]) / 255.
        self.n_class = 20
        self.new_h = 512  # the dataset images are large, so crop them
        self.new_w = 1024

    def __getitem__(self, index):
        img = imageio.imread(self.images[index], pilmode='RGB')
        targets = []
        for i, t in enumerate(self.target_type):
            if t == "polygon":
                target = self._load_json(self.targets[index][i])
            else:
                target = imageio.imread(self.targets[index][i])
            targets.append(target)

        target = tuple(targets) if len(targets) > 1 else targets[0]  # handles multiple target types; can be ignored here
        h, w, _ = img.shape
        top = random.randint(0, h - self.new_h)
        left = random.randint(0, w - self.new_w)
        img = img[top:top + self.new_h, left:left + self.new_w]
        label = target[top:top + self.new_h, left:left + self.new_w]

        # reduce mean
        img = img[:, :, ::-1]  # switch to BGR
        img = np.transpose(img, (2, 0, 1)) / 255.
        img[0] -= self.means[0]
        img[1] -= self.means[1]
        img[2] -= self.means[2]

        # convert to tensor
        img = torch.from_numpy(img.copy()).float()
        label = torch.from_numpy(label.copy()).long()

        # create one-hot encoding
        h, w = label.size()
        target = torch.zeros(self.n_class, h, w)
        for c in range(self.n_class):
            target[c][label == c] = 1

        sample = {'X': img, 'Y': target, 'l': label}

        return sample

    def __len__(self) -> int:
        return len(self.images)

    def _get_target_suffix(self, mode: str, target_type: str) -> str:
        if target_type == "instance":
            return f"{mode}_instanceIds.png"
        elif target_type == "semantic":  # point at the preprocessed trainId images
            return f"{mode}_labelTrainIds.png"
        elif target_type == "color":
            return f"{mode}_color.png"
        else:
            return f"{mode}_polygons.json"


n_class = 20
batch_size = 2  # in my tests each sample in the batch needs about 2 GB of GPU memory; set this to fit your hardware
epochs = 500
lr = 1e-4
momentum = 0
w_decay = 1e-5
step_size = 50
gamma = 0.5
configs = "FCNs-BCEWithLogits_batch{}_epoch{}_RMSprop_scheduler-step{}-gamma{}_lr{}_momentum{}_w_decay{}".format(
    batch_size, epochs, step_size, gamma, lr, momentum, w_decay)
print("Configs:", configs)

# create dir for model
model_dir = "models"
if not os.path.exists(model_dir):
    os.makedirs(model_dir)
model_path = os.path.join(model_dir, configs)

use_gpu = torch.cuda.is_available()
num_gpu = list(range(torch.cuda.device_count()))

# change root to your own dataset path
train_data = CityScapesDataset(root='E:/datasets/CityScapes', split='train', mode='fine',
                               target_type='semantic')

train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)

val_data = CityScapesDataset(root='E:/datasets/CityScapes', split='val', mode='fine',
                             target_type='semantic')

val_loader = DataLoader(val_data, batch_size=1)

vgg_model = VGGNet(requires_grad=True, remove_fc=True)
fcn_model = FCNs(pretrained_net=vgg_model, n_class=n_class)

if use_gpu:
    ts = time.time()
    vgg_model = vgg_model.cuda()
    fcn_model = fcn_model.cuda()
    fcn_model = nn.DataParallel(fcn_model, device_ids=num_gpu)
    print("Finish cuda loading, time elapsed {}".format(time.time() - ts))

criterion = nn.BCEWithLogitsLoss()
optimizer = optim.RMSprop(fcn_model.parameters(), lr=lr, momentum=momentum, weight_decay=w_decay)
scheduler = lr_scheduler.StepLR(optimizer, step_size=step_size,
                                gamma=gamma)  # decay LR by a factor of gamma (0.5) every step_size (50) epochs

# create dir for score
score_dir = os.path.join("scores", configs)
if not os.path.exists(score_dir):
    os.makedirs(score_dir)
IU_scores = np.zeros((epochs, n_class))
pixel_scores = np.zeros(epochs)


def train():
    for epoch in range(epochs):
        fcn_model.train()  # val() leaves the model in eval mode, so switch back before each epoch

        ts = time.time()
        for iter, batch in enumerate(tqdm(train_loader)):
            optimizer.zero_grad()

            if use_gpu:
                inputs = Variable(batch['X'].cuda())
                labels = Variable(batch['Y'].cuda())
            else:
                inputs, labels = Variable(batch['X']), Variable(batch['Y'])

            outputs = fcn_model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            if iter % 10 == 0:
                print("epoch{}, iter{}, loss: {}".format(epoch, iter, loss.item()))

        print("Finish epoch {}, time elapsed {}".format(epoch, time.time() - ts))
        scheduler.step()  # advance the LR schedule after this epoch's optimizer updates
        torch.save(fcn_model, model_path)
        val(epoch)


def val(epoch):
    fcn_model.eval()
    total_ious = []
    pixel_accs = []
    for iter, batch in enumerate(val_loader):
        if use_gpu:
            inputs = Variable(batch['X'].cuda())
        else:
            inputs = Variable(batch['X'])

        with torch.no_grad():  # evaluation needs no gradients; this saves memory
            output = fcn_model(inputs)
        output = output.data.cpu().numpy()

        N, _, h, w = output.shape
        pred = output.transpose(0, 2, 3, 1).reshape(-1, n_class).argmax(axis=1).reshape(N, h, w)

        target = batch['l'].cpu().numpy().reshape(N, h, w)
        for p, t in zip(pred, target):
            total_ious.append(iou(p, t))
            pixel_accs.append(pixel_acc(p, t))

    # Calculate average IoU
    total_ious = np.array(total_ious).T  # n_class * val_len
    ious = np.nanmean(total_ious, axis=1)
    pixel_accs = np.array(pixel_accs).mean()
    print("epoch{}, pix_acc: {}, meanIoU: {}, IoUs: {}".format(epoch, pixel_accs, np.nanmean(ious), ious))
    IU_scores[epoch] = ious
    np.save(os.path.join(score_dir, "meanIU"), IU_scores)
    pixel_scores[epoch] = pixel_accs
    np.save(os.path.join(score_dir, "meanPixel"), pixel_scores)


# borrow functions and modify it from https://github.com/Kaixhin/FCN-semantic-segmentation/blob/master/main.py
# Calculates class intersections over unions
def iou(pred, target):
    ious = []
    for cls in range(n_class):
        pred_inds = pred == cls
        target_inds = target == cls
        intersection = pred_inds[target_inds].sum()
        union = pred_inds.sum() + target_inds.sum() - intersection
        if union == 0:
            ious.append(float('nan'))  # if there is no ground truth, do not include in evaluation
        else:
            ious.append(float(intersection) / max(union, 1))
        # print("cls", cls, pred_inds.sum(), target_inds.sum(), intersection, float(intersection) / max(union, 1))
    return ious


def pixel_acc(pred, target):
    correct = (pred == target).sum()
    total = (target == target).sum()
    return correct / total


if __name__ == "__main__":
    val(0)  # show the accuracy before training
    train()
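
One detail of `__getitem__` above is worth spelling out: the one-hot loop only visits classes 0 to `n_class - 1`, so pixels carrying the ignore value 255 receive an all-zero target vector. The NumPy sketch below reproduces that loop on a tiny label map. It is a standalone illustration (the function name `one_hot` is mine), not a replacement for the loader code.

```python
import numpy as np


def one_hot(label, n_class):
    """Build an (n_class, H, W) one-hot target from an (H, W) trainId map."""
    target = np.zeros((n_class,) + label.shape, dtype=np.float32)
    for c in range(n_class):
        # Boolean mask selects the pixels of class c in channel c.
        target[c][label == c] = 1
    return target


# 2x2 label map: road (0), bicycle (18), ignored (255), sidewalk (1).
label = np.array([[0, 18], [255, 1]])
target = one_hot(label, n_class=20)

print(target.shape)          # (20, 2, 2)
print(target.sum(axis=0))    # ones everywhere except the ignored pixel
# [[1. 1.]
#  [0. 1.]]
```

With `BCEWithLogitsLoss`, an all-zero target pushes every class score down at that pixel rather than excluding the pixel from the loss.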


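3. Results

I tested on my office PC with a 1030 GPU (4 GB of memory) and on a 3060 GPU (12 GB of memory). Judging from both machines, each additional sample in the batch costs about 2 GB of GPU memory, so the 3060 tops out at a batch size of 6. One epoch takes 8 minutes on the 3060, so a full 500-epoch run takes roughly three days, which shows how resource-hungry image segmentation is. On the 1030, a single epoch took a staggering 2 h 20 min. Judged by time, the preferred device is a 3090, which is what it would take to have a chance of finishing a full 500-epoch run within a day.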
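
The time and memory estimates above are simple arithmetic, worked out here from the figures quoted in the text. The helper name `training_days` is only for illustration.

```python
def training_days(minutes_per_epoch, epochs):
    """Convert a per-epoch time in minutes into total training days."""
    return minutes_per_epoch * epochs / 60 / 24


# 3060: 8 minutes per epoch, 500 epochs.
print(round(training_days(8, 500), 2))    # 2.78

# 1030: 2 h 20 min = 140 minutes per epoch, 500 epochs.
print(round(training_days(140, 500), 2))  # 48.61

# Largest batch size on 12 GB at about 2 GB per sample.
print(12 // 2)                            # 6
```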
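[Image]
Pixel accuracy was already 75% after the first epoch and has reached 85% by epoch 25. It keeps rising as the number of epochs grows, and I hope it eventually breaks 90%. The original paper, as I understand it, reached 96% pixel accuracy.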
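
For readers unsure how the two reported metrics differ, here is a small worked example that mirrors the `iou` and `pixel_acc` functions in the training script, applied to a hand-made 2x3 prediction. The arrays are toy values chosen so the numbers can be checked by hand, not results from the model.

```python
import numpy as np


def iou_per_class(pred, target, n_class):
    """Intersection over union for each class; NaN when the class is absent."""
    ious = []
    for cls in range(n_class):
        pred_inds = pred == cls
        target_inds = target == cls
        intersection = np.logical_and(pred_inds, target_inds).sum()
        union = pred_inds.sum() + target_inds.sum() - intersection
        ious.append(float("nan") if union == 0 else intersection / union)
    return ious


def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted class equals the label."""
    return (pred == target).mean()


pred = np.array([[0, 0, 1], [1, 2, 2]])
target = np.array([[0, 1, 1], [1, 2, 0]])

ious = iou_per_class(pred, target, n_class=4)
print(ious)                          # class 3 never appears, so it is NaN
print(np.nanmean(ious))              # mean IoU over the classes that appear
print(pixel_accuracy(pred, target))  # 4 of 6 pixels are correct
```

Here pixel accuracy is 4/6 while mean IoU is 0.5 (per-class values 1/3, 2/3, and 1/2). Mean IoU weights every class equally, which is why it can sit far below pixel accuracy when large classes dominate the image.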

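The figure below shows the result of training for 150 epochs on the 3060, with a val evaluation every 5 epochs. The curves are drawn with matplotlib. Readers need to write their own extra code to collect pixel_acc and meanIoU; only the plotting code is provided here.
The highest pixel accuracy, 0.8766716842651368, was reached at epoch 135, where meanIoU was 0.3268041800950261.
[Image]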

from matplotlib import pyplot as plt

x = list(range(0, 151, 5))  # x-axis: epochs 0 to 150 in steps of 5
# My own data; every float was rounded to 7 decimal places with round()
pix_acc_list=[0.7520696,0.7918097,0.6557526,0.8310604,0.8453417,0.8509236,0.8534471,0.8378322,0.8489639,0.8563263,0.8538324,0.8572157,0.860767,0.8660216,0.8631711,0.8631837,0.8670352,0.8597714,0.8689239,0.8647407,0.8698506,0.8712046,0.8719427,0.8722804,0.8732114,0.871852,0.8714358,0.8766717,0.86854,0.8661136,0.8761132]
meanIoU_list=[0.1333057,0.185366,0.1383637,0.2432535,0.2634509,0.2799635,0.2831553,0.2642947,0.2924905,0.3027259,0.3123738,0.2976701,0.3113799,0.3239229,0.3163488,0.3170467,0.3246953,0.3236825,0.3242375,0.3262411,0.3355112,0.3285704,0.3388148,0.328427,0.3378653,0.3385619,0.3358321,0.3268042,0.3297385,0.3347885,0.3379351]
plt.figure()
plt.plot(x,pix_acc_list,color='blue',label='pixel acc')
plt.plot(x,meanIoU_list,color='red',label='meanIoU')

plt.xticks(fontsize=16)
plt.yticks(fontsize=16)

plt.xlabel('Epoch',fontsize=20)
plt.ylabel('Score',fontsize=20)
plt.legend(fontsize=16)
plt.show()
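
One possible way to collect the two curves is to read back the arrays that `val()` in the training script saves with `np.save`: `meanIU.npy` (one row of per-class IoUs per epoch) and `meanPixel.npy` (one pixel accuracy per epoch). The sketch below assumes those file names and shapes as they appear in the script. `load_curves` is a hypothetical helper, and the demo uses made-up numbers written to a temporary folder so it can run without a trained model.

```python
import os
import tempfile

import numpy as np


def load_curves(score_dir):
    """Return (pixel accuracy per epoch, mean IoU per epoch) from saved scores."""
    iu_scores = np.load(os.path.join(score_dir, "meanIU.npy"))      # (epochs, n_class)
    pixel_scores = np.load(os.path.join(score_dir, "meanPixel.npy"))  # (epochs,)
    # Average over classes, skipping classes that never appeared (NaN).
    mean_iou = np.nanmean(iu_scores, axis=1)
    return pixel_scores, mean_iou


# Demo with fabricated scores for two epochs and three classes.
with tempfile.TemporaryDirectory() as demo_dir:
    fake_iu = np.array([[0.2, 0.4, np.nan], [0.5, 0.7, np.nan]])
    fake_pix = np.array([0.75, 0.85])
    np.save(os.path.join(demo_dir, "meanIU"), fake_iu)     # np.save appends .npy
    np.save(os.path.join(demo_dir, "meanPixel"), fake_pix)
    pix, miou = load_curves(demo_dir)

best = int(np.argmax(pix))  # epoch index with the highest pixel accuracy
print(best, pix[best], miou[best])
```

Note that the script allocates both arrays for all epochs up front, so rows for epochs that have not been evaluated yet are still zero and should be sliced away before plotting.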

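Summary

I hope you found something useful here. The references used for this article are listed at the end, where you can find more detail. I will keep improving this article, so stay tuned.

Reference sites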

https://bbs.huaweicloud.com/blogs/306716
https://developer.aliyun.com/article/797607
https://www.cnblogs.com/dotman/p/cityscapes_dataset_tips.html
https://zhuanlan.zhihu.com/p/147195575
https://codeantenna.com/a/uD5sJceaS1
https://blog.csdn.net/zz2230633069/article/details/84591532
https://www.zhihu.com/question/276325769/answer/2418207657
https://blog.csdn.net/zz2230633069/article/details/84668984
https://blog.csdn.net/yumaomi/article/details/124847721

Original article: https://blog.csdn.net/weixin_43594279/article/details/127986173