Image Semantic Segmentation with PyTorch (A First Try)

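Over the past few days I have been studying image semantic segmentation and wrote a short overview of it. I had planned to start by reading a few classic papers, such as the Fully Convolutional Network (FCN), but my English is limited and working through them in translation was laborious. So I decided to try semantic segmentation hands-on first, and it turned out to be quite fun. The process is recorded below.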

Code Implementation

from torchvision import models
from PIL import Image
import matplotlib.pyplot as plt
import torch
import torchvision.transforms as T
import numpy as np


# Helper: map a 2D array of class indices to an RGB image
# using the 21-color PASCAL VOC palette below
def decode_segmap(image, nc=21):
    label_colors = np.array([(0, 0, 0),  # 0=background
                             # 1=aeroplane, 2=bicycle, 3=bird, 4=boat, 5=bottle
                             (128, 0, 0), (0, 128, 0), (128, 128, 0), (0, 0, 128), (128, 0, 128),
                             # 6=bus, 7=car, 8=cat, 9=chair, 10=cow
                             (0, 128, 128), (128, 128, 128), (64, 0, 0), (192, 0, 0), (64, 128, 0),
                             # 11=dining table, 12=dog, 13=horse, 14=motorbike, 15=person
                             (192, 128, 0), (64, 0, 128), (192, 0, 128), (64, 128, 128), (192, 128, 128),
                             # 16=potted plant, 17=sheep, 18=sofa, 19=train, 20=tv/monitor
                             (0, 64, 0), (128, 64, 0), (0, 192, 0), (128, 192, 0), (0, 64, 128)])

    r = np.zeros_like(image).astype(np.uint8)
    g = np.zeros_like(image).astype(np.uint8)
    b = np.zeros_like(image).astype(np.uint8)

    for l in range(0, nc):
        idx = image == l
        r[idx] = label_colors[l, 0]
        g[idx] = label_colors[l, 1]
        b[idx] = label_colors[l, 2]

    rgb = np.stack([r, g, b], axis=2)
    return rgb


def segment(net, path):
    # Force 3 channels so that grayscale or RGBA files also pass through Normalize
    img = Image.open(path).convert('RGB')
    plt.imshow(img)
    plt.axis('off')
    plt.show()
    # Comment out the Resize and CenterCrop lines for better inference results
    trf = T.Compose([T.Resize(256),
                     T.CenterCrop(224),
                     T.ToTensor(),
                     T.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])])
    inp = trf(img).unsqueeze(0)
    with torch.no_grad():  # inference only, so skip gradient tracking
        out = net(inp)['out']
    om = torch.argmax(out.squeeze(), dim=0).detach().cpu().numpy()
    rgb = decode_segmap(om)
    plt.imshow(rgb)
    plt.axis('off')
    plt.show()


# Note: torchvision >= 0.13 deprecates pretrained=True in favor of the weights= argument
fcn = models.segmentation.fcn_resnet101(pretrained=True).eval()
# dlb = models.segmentation.deeplabv3_resnet101(pretrained=True).eval()

girl = '../img/girl_dog.jpg'
segment(fcn, girl)
# segment(dlb, girl)
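
The `argmax` step inside `segment` is easier to follow with the tensor shapes written out. The sketch below uses NumPy and random numbers as a stand-in for `net(inp)['out']`; the array name `logits` and the 4x6 spatial size are assumptions for illustration only, and no real model is involved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for net(inp)['out']: (batch, classes, height, width).
# 21 matches the number of entries in label_colors; 4x6 is an arbitrary toy size.
logits = rng.normal(size=(1, 21, 4, 6))

# Drop the batch axis, then pick the highest-scoring class for every pixel.
class_map = np.argmax(logits.squeeze(0), axis=0)

print(logits.shape)     # (1, 21, 4, 6)
print(class_map.shape)  # (4, 6)
```

Each entry of `class_map` is an integer from 0 to 20, which is exactly what `decode_segmap` expects as input.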

Reference: https://learnopencv.com/pytorch-for-beginners-semantic-segmentation-using-torchvision/
The code is fairly easy to follow overall, and the reference link explains the details clearly, so I will not repeat them here.
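
As a side note on `decode_segmap`: the per-class loop can also be written as a single NumPy lookup, because indexing a palette array with an array of class indices returns one color per pixel. The following is a minimal sketch, not the method from the reference article; `PALETTE` holds only the first four colors from `label_colors`, and the function name `decode_segmap_vectorized` is made up for this example.

```python
import numpy as np

# First four entries of label_colors: background, aeroplane, bicycle, bird.
PALETTE = np.array([(0, 0, 0), (128, 0, 0), (0, 128, 0), (128, 128, 0)],
                   dtype=np.uint8)


def decode_segmap_vectorized(class_map, palette=PALETTE):
    """Map an (H, W) array of class indices to an (H, W, 3) RGB image."""
    # Integer-array indexing picks one palette row per pixel.
    return palette[class_map]


class_map = np.array([[0, 1],
                      [2, 3]])
rgb = decode_segmap_vectorized(class_map)
print(rgb.shape)  # (2, 2, 3)
```

With the full 21-row palette in place of `PALETTE`, this should give the same image as the loop version, provided every index in the class map is below the palette length.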

Test Results

Some of the results from running the code are shown below.

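The segmentation quality may not be great, but overall the code does perform semantic segmentation. Feel free to test it on images of your own (note that the images should contain objects from the classes listed in label_colors), and leave a comment if you have questions about the code.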
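
To check whether a test image actually contains any of the supported classes, one option is to list which class indices appear in the predicted map. This is a rough sketch under stated assumptions: the helper `classes_present` and the toy array are hypothetical, and the class names are copied from the comments in `decode_segmap`.

```python
import numpy as np

# Class names in index order, taken from the comments in decode_segmap.
VOC_CLASSES = ['background', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle',
               'bus', 'car', 'cat', 'chair', 'cow', 'dining table', 'dog',
               'horse', 'motorbike', 'person', 'potted plant', 'sheep',
               'sofa', 'train', 'tv/monitor']


def classes_present(class_map):
    """Return {class name: fraction of pixels} for every class in the map."""
    ids, counts = np.unique(class_map, return_counts=True)
    total = class_map.size
    return {VOC_CLASSES[int(i)]: float(c) / total for i, c in zip(ids, counts)}


# Toy stand-in for the `om` array computed inside segment().
toy_map = np.array([[0, 0, 12],
                    [15, 15, 12]])
result = classes_present(toy_map)
print(result)
```

If the result contains only `background`, the image most likely shows nothing the model was trained to label.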


Original article: https://blog.csdn.net/weixin_53065229/article/details/132920121