• YOLOv5 Classification Model: Dataset Loading 1



    flyfish

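    Dataset loading implemented in pure Python, without the torch library.

    Goal: build a list of samples in which each entry holds an image file path followed by its label index.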

    samples: [('/media/a/flyfish/test/n01440764/ILSVRC2012_val_00000293.JPEG', 0),
              ('/media/a/flyfish/test/n01440764/ILSVRC2012_val_00002138.JPEG', 0),
              ('/media/a/flyfish/test/n01440764/ILSVRC2012_val_00003014.JPEG', 0),
              ('/media/a/flyfish/test/n01440764/ILSVRC2012_val_00006697.JPEG', 0)]
    

    Simplified implementation

    import os
    from typing import Dict, List, Optional, Tuple, Union
    
    
    class DatasetFolder:
    
        def __init__(self, root: str) -> None:
            self.root = root
            classes, class_to_idx = self.find_classes(self.root)
            samples = self.make_dataset(self.root, class_to_idx)
            
            self.classes = classes
            self.class_to_idx = class_to_idx
            self.samples = samples
            self.targets = [s[1] for s in samples]
    
    
        @staticmethod
        def make_dataset(
            directory: str,
            class_to_idx: Optional[Dict[str, int]] = None,
        ) -> List[Tuple[str, int]]:
            """Collect (file path, class index) pairs from the class sub-folders."""
            directory = os.path.expanduser(directory)

            if class_to_idx is None:
                # A static method has no `self`, so call through the class.
                _, class_to_idx = DatasetFolder.find_classes(directory)
            elif not class_to_idx:
                raise ValueError("'class_to_index' must have at least one entry to collect any samples.")

            instances = []
            available_classes = set()
            for target_class in sorted(class_to_idx.keys()):
                class_index = class_to_idx[target_class]
                target_dir = os.path.join(directory, target_class)
                if not os.path.isdir(target_dir):
                    continue
                for root, _, fnames in sorted(os.walk(target_dir, followlinks=True)):
                    for fname in sorted(fnames):
                        path = os.path.join(root, fname)
                        if True:  # placeholder: every file is accepted; a validity check could go here
                            instances.append((path, class_index))
                            available_classes.add(target_class)

            # Report classes whose folder produced no samples.
            empty_classes = set(class_to_idx.keys()) - available_classes
            if empty_classes:
                msg = f"Found no valid file for the classes {', '.join(sorted(empty_classes))}."
                raise FileNotFoundError(msg)

            return instances

        @staticmethod
        def find_classes(directory: str) -> Tuple[List[str], Dict[str, int]]:
            """Use the sorted sub-folder names as classes and number them from 0."""
            classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
            if not classes:
                raise FileNotFoundError(f"Couldn't find any class folder in {directory}.")

            class_to_idx = {cls_name: i for i, cls_name in enumerate(classes)}
            return classes, class_to_idx
    
    
    
    dataset = DatasetFolder(root="/media/a/flyfish/test")

    print(dataset)
    print("dataset.targets:", dataset.targets)
    print("dataset.classes:", dataset.classes)
    print("samples:", dataset.samples)
    

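    find_classes maps each class name to a label index.

    0, 1, 2 are the label indices.
    'n01440764', 'n01443537', 'n01484850' are the class names, which are also the folder names.
    The class names are sorted in ascending order, and the indices are assigned in that order.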

    dataset.targets: [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
    dataset.classes: ['n01440764', 'n01443537', 'n01484850']
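
    To make the mapping from sorted folder names to indices concrete, here is a minimal sketch that builds a throwaway directory tree with `tempfile` and derives the same mapping. The helper name `find_classes_sketch` and the temporary folders are assumptions made for illustration; they are not part of the original code.

```python
import os
import tempfile
from typing import Dict, List, Tuple


def find_classes_sketch(directory: str) -> Tuple[List[str], Dict[str, int]]:
    # Sub-folder names are the class names; sorting makes the indices deterministic.
    classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    return classes, class_to_idx


with tempfile.TemporaryDirectory() as root:
    # Create the class folders in a deliberately unsorted order.
    for name in ("n01484850", "n01440764", "n01443537"):
        os.mkdir(os.path.join(root, name))
    classes, class_to_idx = find_classes_sketch(root)

print(classes)
print(class_to_idx)
```

    Because the names are sorted before numbering, the index of a class does not depend on the order in which the folders were created.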
    

    Each sample is a tuple: the first element is the absolute path of the image file, and the second is the label index.

    samples: [('/media/a/flyfish/test/n01440764/ILSVRC2012_val_00000293.JPEG', 0),
              ('/media/a/flyfish/test/n01440764/ILSVRC2012_val_00002138.JPEG', 0),
              ('/media/a/flyfish/test/n01440764/ILSVRC2012_val_00003014.JPEG', 0),
              ('/media/a/flyfish/test/n01440764/ILSVRC2012_val_00006697.JPEG', 0),
              ('/media/a/flyfish/test/n01443537/ILSVRC2012_val_00000236.JPEG', 1),
              ('/media/a/flyfish/test/n01443537/ILSVRC2012_val_00000262.JPEG', 1),
              ('/media/a/flyfish/test/n01443537/ILSVRC2012_val_00000307.JPEG', 1),
              ('/media/a/flyfish/test/n01443537/ILSVRC2012_val_00000994.JPEG', 1),
              ('/media/a/flyfish/test/n01484850/ILSVRC2012_val_00002338.JPEG', 2),
              ('/media/a/flyfish/test/n01484850/ILSVRC2012_val_00002752.JPEG', 2),
              ('/media/a/flyfish/test/n01484850/ILSVRC2012_val_00004311.JPEG', 2),
              ('/media/a/flyfish/test/n01484850/ILSVRC2012_val_00004329.JPEG', 2)]
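
    As a rough illustration of how such a list might be consumed, the sketch below counts samples per class and groups paths by class name. It uses a shortened, hard-coded copy of the list above, so no files need to exist on disk; the variable names `counts` and `paths_by_class` are illustrative assumptions.

```python
from collections import Counter, defaultdict

# A shortened, hard-coded copy of the samples list (the paths are only strings here).
samples = [
    ("/media/a/flyfish/test/n01440764/ILSVRC2012_val_00000293.JPEG", 0),
    ("/media/a/flyfish/test/n01440764/ILSVRC2012_val_00002138.JPEG", 0),
    ("/media/a/flyfish/test/n01443537/ILSVRC2012_val_00000236.JPEG", 1),
    ("/media/a/flyfish/test/n01484850/ILSVRC2012_val_00002338.JPEG", 2),
]
classes = ["n01440764", "n01443537", "n01484850"]

# The label index is the second element of each tuple.
targets = [label for _, label in samples]
counts = Counter(targets)

# Group the file paths under their class name.
paths_by_class = defaultdict(list)
for path, label in samples:
    paths_by_class[classes[label]].append(path)

for label, n in sorted(counts.items()):
    print(classes[label], n)
```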
    

    The loader can be extended, for example by checking whether a file's extension is a supported image format.

    IMG_EXTENSIONS = (".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp")
    
    def has_file_allowed_extension(filename: str, extensions: Union[str, Tuple[str, ...]]) -> bool:
        """Check whether a file has an allowed extension (case-insensitive)."""
        return filename.lower().endswith(extensions if isinstance(extensions, str) else tuple(extensions))
    
    def is_image_file(filename: str) -> bool:
        return has_file_allowed_extension(filename, IMG_EXTENSIONS)
    

    Test

    r = is_image_file("/media/a/flyfish/data/imagewoof/val/n02086240/1.jpeg")
    print(r)  # True

    r = is_image_file("/media/a/flyfish/data/imagewoof/val/n02086240/1.txt")
    print(r)  # False
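
    One way the extension check might be wired into the sample collection is sketched below. The helper `collect_images` and the temporary files are assumptions made for illustration; the original post only defines the check itself and does not show how it is combined with the directory walk.

```python
import os
import tempfile
from typing import List, Tuple

IMG_EXTENSIONS = (".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp")


def is_image_file(filename: str) -> bool:
    # Lower-casing first makes the check case-insensitive (".JPEG" matches ".jpeg").
    return filename.lower().endswith(IMG_EXTENSIONS)


def collect_images(directory: str, class_index: int) -> List[Tuple[str, int]]:
    # Hypothetical helper: walk one class folder and keep only image files.
    instances = []
    for root, _, fnames in sorted(os.walk(directory, followlinks=True)):
        for fname in sorted(fnames):
            if is_image_file(fname):
                instances.append((os.path.join(root, fname), class_index))
    return instances


with tempfile.TemporaryDirectory() as class_dir:
    # Two image-like names and one text file; only the names matter here.
    for fname in ("a.JPEG", "b.png", "notes.txt"):
        open(os.path.join(class_dir, fname), "w").close()
    found = [os.path.basename(p) for p, _ in collect_images(class_dir, 0)]

print(found)
```

    The check looks only at the file name, so a non-image file renamed to `.png` would still pass; opening the file would be needed to verify its contents.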
    
  • Original article: https://blog.csdn.net/flyfish1986/article/details/134422106