Getting Started with U-Net for Semantic Segmentation



    1. Introduction

    This walkthrough uses U-Net, a network architecture that is well known in the image segmentation field.
    U-Net is a deep learning network built as an improvement on the FCN (fully convolutional network).
    It consists of a downsampling stage (an encoder for feature extraction) and an upsampling stage (a decoder for restoring resolution), and it is named U-Net because the model structure resembles the letter U.

    [Figure: U-Net architecture diagram]


    2. Source Code

    The code is adapted from the documentation examples in the PaddlePaddle open-source framework, with the following changes:

    1. Consolidated and reorganized the project files
    2. Adjusted the dataset paths
    3. Adjusted the network structure

    The project can be downloaded through the following channels:


    3. Dataset


    3.1. Open-Source Dataset

    This example uses the Oxford-IIIT Pet dataset from one of the examples in the original documentation.
    It contains pet photos and the corresponding label data:
    the pet images are under /images,
    and the label data (trimaps) are under /annotations/trimaps.
    For details, refer to the official PaddlePaddle documentation.
    [Figure: Oxford-IIIT Pet dataset sample]
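
    To get a sense of the label data, a trimap PNG can be opened like an ordinary image and its pixel values inspected. The sketch below assumes the dataset archives have been extracted so that ./annotations/trimaps exists; the file name used is just one example from the dataset.

    import numpy as np
    from PIL import Image

    # Each trimap is a single-channel PNG whose pixel values are the class IDs.
    trimap = np.array(Image.open("./annotations/trimaps/Abyssinian_1.png"))
    print(trimap.shape)       # (height, width)
    print(np.unique(trimap))  # the small set of class IDs used by the labels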


    3.2. Setting Up the Dataset Folders

    Create the folder /resources/Oxford-IIIT Pet/images in the project and place all of the original images in it.
    Create the folder /resources/Oxford-IIIT Pet/masks in the project and place all of the label (mask) images in it; a helper-script sketch follows the figure below.

    [Figure: project folder structure]
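
    As a convenience, the copy step described above can be scripted. The sketch below is one possible way to do it; it assumes the downloaded archives were extracted to ./images and ./annotations/trimaps (the layout described in section 3.1) and simply copies everything into the project folders.

    import os
    import shutil

    # Destination folders inside the project (see section 3.2)
    dst_images = "./resources/Oxford-IIIT Pet/images"
    dst_masks = "./resources/Oxford-IIIT Pet/masks"
    os.makedirs(dst_images, exist_ok=True)
    os.makedirs(dst_masks, exist_ok=True)

    # Copy the original photos into the project images folder
    for name in os.listdir("./images"):
        if name.endswith(".jpg"):
            shutil.copy(os.path.join("./images", name), dst_images)

    # Copy the trimap labels into the project masks folder
    for name in os.listdir("./annotations/trimaps"):
        if name.endswith(".png"):
            shutil.copy(os.path.join("./annotations/trimaps", name), dst_masks)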


    3.3. Unifying the Data Format

    The pet images in the dataset are JPG files; tool_jpg2png.py is used to convert them all to PNG.

    import os
    from PIL import Image
    
    # Paths to the original images and the label images
    resources_path = "./resources/Oxford-IIIT Pet"
    origin_images_path = resources_path + "/images"
    img_name_list = os.listdir(origin_images_path)
    
    for img_name in img_name_list:
        if img_name[-3:] == "jpg":
            tp = Image.open(origin_images_path + '/' + img_name)
            tp.save(origin_images_path + '/' + img_name[:-3] + 'png')
            os.remove(origin_images_path + '/' + img_name)
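
    If the script is saved as tool_jpg2png.py at the project root, it can be run once before training (the exact invocation may differ in your setup):

    $ python tool_jpg2png.py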
    

    4. Network Structure

    The network structure is defined in model.py.
    It is set up to be similar to the structure shown in the U-Net diagram; the layer summary is as follows:

    -----------------------------------------------------------------------------
      Layer (type)        Input Shape          Output Shape         Param #
    =============================================================================
        Conv2D-1       [[1, 3, 160, 160]]   [1, 16, 160, 160]         448
      BatchNorm2D-1   [[1, 16, 160, 160]]   [1, 16, 160, 160]         64
         ReLU-1       [[1, 16, 160, 160]]   [1, 16, 160, 160]          0
        Conv2D-2      [[1, 16, 160, 160]]   [1, 16, 160, 160]        2,320
      BatchNorm2D-2   [[1, 16, 160, 160]]   [1, 16, 160, 160]         64
         ReLU-2       [[1, 16, 160, 160]]   [1, 16, 160, 160]          0
       MaxPool2D-1    [[1, 16, 160, 160]]    [1, 16, 80, 80]           0
        Conv2D-3       [[1, 16, 80, 80]]     [1, 32, 80, 80]         4,640
      BatchNorm2D-3    [[1, 32, 80, 80]]     [1, 32, 80, 80]          128
         ReLU-3        [[1, 32, 80, 80]]     [1, 32, 80, 80]           0
        Conv2D-4       [[1, 32, 80, 80]]     [1, 32, 80, 80]         9,248
      BatchNorm2D-4    [[1, 32, 80, 80]]     [1, 32, 80, 80]          128
         ReLU-4        [[1, 32, 80, 80]]     [1, 32, 80, 80]           0
       MaxPool2D-2     [[1, 32, 80, 80]]     [1, 32, 40, 40]           0
        Conv2D-5       [[1, 32, 40, 40]]     [1, 64, 40, 40]        18,496
      BatchNorm2D-5    [[1, 64, 40, 40]]     [1, 64, 40, 40]          256
         ReLU-5        [[1, 64, 40, 40]]     [1, 64, 40, 40]           0
        Conv2D-6       [[1, 64, 40, 40]]     [1, 64, 40, 40]        36,928
      BatchNorm2D-6    [[1, 64, 40, 40]]     [1, 64, 40, 40]          256
         ReLU-6        [[1, 64, 40, 40]]     [1, 64, 40, 40]           0
       MaxPool2D-3     [[1, 64, 40, 40]]     [1, 64, 20, 20]           0
        Conv2D-7       [[1, 64, 20, 20]]     [1, 128, 20, 20]       73,856
      BatchNorm2D-7    [[1, 128, 20, 20]]    [1, 128, 20, 20]         512
         ReLU-7        [[1, 128, 20, 20]]    [1, 128, 20, 20]          0
        Conv2D-8       [[1, 128, 20, 20]]    [1, 128, 20, 20]       147,584
      BatchNorm2D-8    [[1, 128, 20, 20]]    [1, 128, 20, 20]         512
         ReLU-8        [[1, 128, 20, 20]]    [1, 128, 20, 20]          0
       MaxPool2D-4     [[1, 128, 20, 20]]    [1, 128, 10, 10]          0
        Conv2D-9       [[1, 128, 10, 10]]    [1, 256, 10, 10]       295,168
      BatchNorm2D-9    [[1, 256, 10, 10]]    [1, 256, 10, 10]        1,024
         ReLU-9        [[1, 256, 10, 10]]    [1, 256, 10, 10]          0
        Conv2D-10      [[1, 256, 10, 10]]    [1, 256, 10, 10]       590,080
     BatchNorm2D-10    [[1, 256, 10, 10]]    [1, 256, 10, 10]        1,024
         ReLU-10       [[1, 256, 10, 10]]    [1, 256, 10, 10]          0
       Upsample-1      [[1, 256, 10, 10]]    [1, 256, 20, 20]          0
        Conv2D-11      [[1, 256, 20, 20]]    [1, 128, 20, 20]       32,896
    Conv2DTranspose-1  [[1, 128, 20, 20]]    [1, 128, 20, 20]       147,584
     BatchNorm2D-11    [[1, 128, 20, 20]]    [1, 128, 20, 20]         512
         ReLU-11       [[1, 128, 20, 20]]    [1, 128, 20, 20]          0
    Conv2DTranspose-2  [[1, 128, 20, 20]]    [1, 128, 20, 20]       147,584
     BatchNorm2D-12    [[1, 128, 20, 20]]    [1, 128, 20, 20]         512
         ReLU-12       [[1, 128, 20, 20]]    [1, 128, 20, 20]          0
       Upsample-2      [[1, 128, 20, 20]]    [1, 128, 40, 40]          0
        Conv2D-12      [[1, 128, 40, 40]]    [1, 64, 40, 40]         8,256
    Conv2DTranspose-3  [[1, 64, 40, 40]]     [1, 64, 40, 40]        36,928
     BatchNorm2D-13    [[1, 64, 40, 40]]     [1, 64, 40, 40]          256
         ReLU-13       [[1, 64, 40, 40]]     [1, 64, 40, 40]           0
    Conv2DTranspose-4  [[1, 64, 40, 40]]     [1, 64, 40, 40]        36,928
     BatchNorm2D-14    [[1, 64, 40, 40]]     [1, 64, 40, 40]          256
         ReLU-14       [[1, 64, 40, 40]]     [1, 64, 40, 40]           0
       Upsample-3      [[1, 64, 40, 40]]     [1, 64, 80, 80]           0
        Conv2D-13      [[1, 64, 80, 80]]     [1, 32, 80, 80]         2,080
    Conv2DTranspose-5  [[1, 32, 80, 80]]     [1, 32, 80, 80]         9,248
     BatchNorm2D-15    [[1, 32, 80, 80]]     [1, 32, 80, 80]          128
         ReLU-15       [[1, 32, 80, 80]]     [1, 32, 80, 80]           0
    Conv2DTranspose-6  [[1, 32, 80, 80]]     [1, 32, 80, 80]         9,248
     BatchNorm2D-16    [[1, 32, 80, 80]]     [1, 32, 80, 80]          128
         ReLU-16       [[1, 32, 80, 80]]     [1, 32, 80, 80]           0
       Upsample-4      [[1, 32, 80, 80]]    [1, 32, 160, 160]          0
        Conv2D-14     [[1, 32, 160, 160]]   [1, 16, 160, 160]         528
    Conv2DTranspose-7 [[1, 16, 160, 160]]   [1, 16, 160, 160]        2,320
     BatchNorm2D-17   [[1, 16, 160, 160]]   [1, 16, 160, 160]         64
         ReLU-17      [[1, 16, 160, 160]]   [1, 16, 160, 160]          0
    Conv2DTranspose-8 [[1, 16, 160, 160]]   [1, 16, 160, 160]        2,320
     BatchNorm2D-18   [[1, 16, 160, 160]]   [1, 16, 160, 160]         64
         ReLU-18      [[1, 16, 160, 160]]   [1, 16, 160, 160]          0
        Conv2D-15     [[1, 16, 160, 160]]    [1, 4, 160, 160]         68
    =============================================================================
    Total params: 1,620,644
    Trainable params: 1,614,756
    Non-trainable params: 5,888
    -----------------------------------------------------------------------------
    Input size (MB): 0.29
    Forward/backward pass size (MB): 91.31
    Params size (MB): 6.18
    Estimated Total Size (MB): 97.78
    -----------------------------------------------------------------------------
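
    As a reference for reading the summary, the repeated pattern in the encoder half (Conv2D -> BatchNorm2D -> ReLU, twice, followed by MaxPool2D) can be written in Paddle roughly as below. This is a minimal sketch rather than the exact contents of model.py; the channel widths (16, 32, 64, 128, 256), the 160x160 input size, and the 4 output classes are taken from the table above, while the class names are assumptions.

    import paddle
    import paddle.nn as nn

    class EncoderBlock(nn.Layer):
        """Two 3x3 conv + BN + ReLU stages, then halve the spatial size."""
        def __init__(self, in_channels, out_channels):
            super().__init__()
            self.convs = nn.Sequential(
                nn.Conv2D(in_channels, out_channels, kernel_size=3, padding=1),
                nn.BatchNorm2D(out_channels),
                nn.ReLU(),
                nn.Conv2D(out_channels, out_channels, kernel_size=3, padding=1),
                nn.BatchNorm2D(out_channels),
                nn.ReLU(),
            )
            self.pool = nn.MaxPool2D(kernel_size=2, stride=2)

        def forward(self, x):
            return self.pool(self.convs(x))

    # The table itself can be reproduced with paddle.summary, e.g.
    # (assuming model.py defines a class named UNet):
    #
    #   from model import UNet
    #   paddle.summary(UNet(num_classes=4), input_size=(1, 3, 160, 160))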
    

    5. Training

    Adjust the training parameters to suit your hardware, then run train.py.
    The trained model is saved in the output folder.

    $ python train.py
    
    # Epoch 1/15
    # step  30/416 [=>............................] - loss: 0.9846 - ETA: 5:49 - 907ms/step
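
    The progress output above is what Paddle's high-level Model API prints during model.fit. A minimal sketch of a train.py built on that API is shown below; the class names (UNet, PetDataset), the optimizer, and the loss are assumptions made for illustration, so substitute whatever the actual project files define.

    import paddle
    from model import UNet          # assumed class name in model.py
    from dataset import PetDataset  # assumed dataset wrapper

    train_dataset = PetDataset(mode="train")
    val_dataset = PetDataset(mode="test")

    network = UNet(num_classes=4)
    model = paddle.Model(network)

    # Optimizer and loss are illustrative; epochs=15 matches the log above,
    # and the batch size should be tuned to your hardware.
    optimizer = paddle.optimizer.RMSProp(learning_rate=0.001,
                                         parameters=network.parameters())
    model.prepare(optimizer, paddle.nn.CrossEntropyLoss(axis=1))
    model.fit(train_dataset, val_dataset, epochs=15, batch_size=16, verbose=1)

    # Save the trained weights to the output folder
    model.save("output/unet")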
    

    6. Prediction

    Run predict.py to predict the first two samples in the test set.

    [Figure: prediction results]
    The results look decent.
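
    For reference, the prediction step boils down to reducing the network's 4-channel output to a per-pixel class label with an argmax over the channel axis. The sketch below is illustrative rather than the actual predict.py; it assumes logits is a NumPy array of shape (N, 4, 160, 160) produced by the trained network.

    import numpy as np
    from PIL import Image

    def logits_to_masks(logits):
        """Pick the most likely class per pixel and scale it for display."""
        labels = np.argmax(logits, axis=1).astype(np.uint8)  # (N, 160, 160)
        # Spread the 4 class IDs over 0-255 so the mask is visible as an image
        return [Image.fromarray(m * 85) for m in labels]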


    Thank you for reading.

  Original article: https://blog.csdn.net/qq_32618327/article/details/125791300