YOLOV1 Explained: PyTorch Version


    1 YOLOV1

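    1. The YOLOV1 pipeline. YOLOV1 downsamples a 448x448x3 image 6 times (448/2^6 = 7) to obtain a 7x7x30 feature map. Each grid cell of the feature map predicts two boxes, a confidence for each box that it contains an object, and scores for 20 object classes: 30 = B x (loc + conf) + cls = 2 x (4 + 1) + 20. Here B = 2 is the number of boxes predicted per cell; loc = 4 is the offset of the box center plus the box width and height; conf = 1 is the confidence of each box; cls = 20 is the number of object classes. If the center of a ground-truth box falls inside a grid cell, that cell is responsible for the object, and of its two predicted boxes, the one with the higher IoU with the ground truth is used to predict it.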
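    2. Pros and cons of YOLOV1. YOLOV1 is an end-to-end model with a simple structure. It is fast: its accuracy trails Faster R-CNN, but its inference speed is much higher. Experiments on different datasets show that it generalizes well. It needs no sliding windows or region proposals to obtain candidate boxes, and its accuracy is higher than that of other real-time detectors of its time.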

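    On the other hand, YOLOV1 sets no prior (anchor) boxes, which hurts its localization accuracy and makes it adapt poorly to objects with unusual aspect ratios. It also predicts from a single feature map, so it struggles with small objects, especially small objects that appear in groups. These weaknesses also leave plenty of room for improvement.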
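    3. A computer vision model consists mainly of three parts: data processing, training, and prediction. Each part contains many details. Breaking the model into these parts and working through them one at a time not only deepens your understanding of the model but also builds a solid foundation for learning other models: most models follow the same recipe and differ only in details such as data augmentation, the backbone, and the loss function.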
    4. Reference code: https://github.com/abeardear/pytorch-YOLO-v1
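    The output-size arithmetic from item 1 can be checked in a few lines. This is a small sketch; the helper name yolo_output_shape is hypothetical, and it only reproduces the numbers stated above (a 448 input, 6 halvings, B = 2, C = 20). Flattened, 7 x 7 x 30 gives 1470 values per image, the same number as the num_classes=1470 default in the ResNet constructor later in this post.

```python
def yolo_output_shape(img_size=448, num_downsamples=6, B=2, C=20):
    """Return (S, S, depth) for a YOLOv1-style output map.

    S: grid size after halving img_size num_downsamples times.
    depth: B boxes x (4 location values + 1 confidence) + C class scores.
    """
    S = img_size // (2 ** num_downsamples)
    depth = B * (4 + 1) + C
    return S, S, depth

S, _, depth = yolo_output_shape()
print(S, depth, S * S * depth)  # 7 30 1470
```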

    1 Data Processing

    Data processing consists of three parts: splitting the dataset, reading the XML files, and data augmentation.

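    First, split the data into training, validation, and test sets. Each split stores only image names without the extension: for example, the images 2002001.jpg, 2002002.jpg, and 2002003.jpg are stored as 2002001, 2002002, and 2002003. Second, read each image's path and ground-truth box information for every split. Third, augment the images and boxes and encode them into labels.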

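    The code for the three steps lives in data_split_11.py, data_jpgxlm_12.py, and data_label_13.py; running each file produces the corresponding output. Then run train.py to train the model, and with the trained weights run predict.py to make predictions.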

    1.1 Splitting the Dataset

    The dataset split is implemented in data_split_11.py.

    Inputs: the XML annotation of every image, stored in VOCdevkit/VOC2007/Annotations, and the train/val/test ratios.
    Output: the image names belonging to each split.

    Dataset split procedure:

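    1. Collect the names of all files ending in '.xml'.
    2. From the train/val/test ratios and the number of samples found in step 1, compute how many samples each split gets and randomly sample the indices of each split.
    3. Write each split to a file according to those indices.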
    '''
    dataSplit
    '''
    import os
    import random
    
    xml_path = r'/Users/ls/PycharmProjects/YOLOV1_LS/VOCdevkit/VOC2007/Annotations'
    base_path = r'/Users/ls/PycharmProjects/YOLOV1_LS/VOCdevkit/VOC2007/ImageSets/Main'
    
    # 1 Sample names: every annotation file name without its '.xml' suffix
    tmp = []
    img_names = os.listdir(xml_path)
    for i in img_names:
        if i.endswith('.xml'):
            tmp.append(i[:-4])
    
    # 2 Split: 90% trainval / 10% test, then 90% of trainval is train and the rest is val
    trainval_ratio = 0.9
    train_ratio = 0.9
    N = len(tmp)
    trainval_num = int(trainval_ratio*N)
    train_num = int(train_ratio*trainval_num)
    trainval_list = random.sample(range(N),trainval_num)
    train_list = random.sample(trainval_list,train_num)
    trainval_idx, train_idx = set(trainval_list), set(train_list)   # sets make the membership tests below O(1)
    
    # 3 Write the names; 'with' flushes and closes every file when done
    with open(os.path.join(base_path,'LS_trainval.txt'),'w') as ftrainval, \
         open(os.path.join(base_path,'LS_train.txt'),'w') as ftrain, \
         open(os.path.join(base_path,'LS_val.txt'),'w') as fval, \
         open(os.path.join(base_path,'LS_test.txt'),'w') as ftest:
        for i in range(N):
            name = tmp[i]+'\n'
            if i in trainval_idx:
                ftrainval.write(name)
                if i in train_idx:
                    ftrain.write(name)
                else:
                    fval.write(name)
            else:
                ftest.write(name)
    
    
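    The same two-level split can be wrapped in a reusable, reproducible function. This is a sketch, not the author's script: the function name split_names and the seeded random.Random(seed) generator are additions for illustration, so that the same seed always yields the same split. It keeps the original ratios, so 1000 names give 810 train, 90 val, and 100 test.

```python
import random

def split_names(names, trainval_ratio=0.9, train_ratio=0.9, seed=0):
    """Split names into (train, val, test) using the two-level ratios of data_split_11.py."""
    rng = random.Random(seed)                      # seeded so the split is reproducible
    n = len(names)
    trainval_num = int(trainval_ratio * n)
    train_num = int(train_ratio * trainval_num)
    trainval_idx = rng.sample(range(n), trainval_num)
    train_idx = set(rng.sample(trainval_idx, train_num))
    trainval_set = set(trainval_idx)
    train = [names[i] for i in range(n) if i in train_idx]
    val = [names[i] for i in range(n) if i in trainval_set and i not in train_idx]
    test = [names[i] for i in range(n) if i not in trainval_set]
    return train, val, test

names = ['%06d' % k for k in range(1000)]
train, val, test = split_names(names)
print(len(train), len(val), len(test))  # 810 90 100
```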

    1.2 Reading the XML Files

    data_jpgxlm_12.py combines each image path with the information from its XML annotation.

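    1. Open each split list, create the output file, iterate over the images in the split, and write each image path together with its parsed XML information.
    2. For each XML file, use the 'difficult' and 'name' fields to decide whether to keep an object; map the object name to its class index and read its box.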
    import xml.etree.ElementTree as ET
    
    # Each tuple is (year, split): used to open the split lists created earlier and to name
    # the parsed output files (image path plus boxes and classes)
    sets =[('2007','train'),('2007','val'),('2007','test')]
    classes = ["aeroplane", "bicycle", "bird", "boat", "bottle",
               "bus", "car", "cat", "chair", "cow", "diningtable",
               "dog", "horse", "motorbike", "person", "pottedplant",
               "sheep", "sofa", "train", "tvmonitor"]
    
    # 1 Parse one XML file: write the image path, then each kept box and its class ID
    def convert_annotation(year,img_id,list_file):
        list_file.write("/Users/ls/PycharmProjects/YOLOV1_LS/VOCdevkit/VOC%s/JPEGImages/%s.jpg"%(year,img_id))  # image path
        with open('/Users/ls/PycharmProjects/YOLOV1_LS/VOCdevkit/VOC%s/Annotations/%s.xml'%(year,img_id)) as in_file:  # XML file of this image
            root = ET.parse(in_file).getroot()
    
        for obj in root.iter('object'):
            difficult = obj.find('difficult').text
            cls = obj.find('name').text
            if cls not in classes or int(difficult)==1:   # skip unknown classes and objects marked difficult
                continue
            xml_box = obj.find('bndbox')
            b = (int(xml_box.find('xmin').text),
                 int(xml_box.find('ymin').text),
                 int(xml_box.find('xmax').text),
                 int(xml_box.find('ymax').text))  # box corners
            cls_id = classes.index(cls)  # class index
            list_file.write(' '+','.join([str(i) for i in b])+','+str(cls_id))  # append " x1,y1,x2,y2,cls"
    
    # 2 For every split, write one line per image to <year>_<split>LS.txt
    for year,img_set in sets:
        with open('/Users/ls/PycharmProjects/YOLOV1_LS/VOCdevkit/VOC%s/ImageSets/Main/%s.txt'%(year,img_set)) as f:
            img_ids = f.read().strip().split()   # image IDs in this split
        with open('/Users/ls/PycharmProjects/YOLOV1_LS/%s_%sLS.txt'%(year,img_set),'w') as list_file:  # output file
            for img_id in img_ids:  # each image
                convert_annotation(year,img_id,list_file)
                list_file.write('\n')  # one image per line
    
    
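    To see the output line format without a VOC checkout on disk, the same filtering can be run on an XML string. This is a sketch under assumptions: SAMPLE_XML is a made-up, trimmed-down annotation, the image path is hypothetical, and annotation_to_line mirrors convert_annotation but returns a string instead of writing to a file. The 'person' object is marked difficult, so only the 'dog' box (class index 11) survives.

```python
import xml.etree.ElementTree as ET

CLASSES = ["aeroplane", "bicycle", "bird", "boat", "bottle",
           "bus", "car", "cat", "chair", "cow", "diningtable",
           "dog", "horse", "motorbike", "person", "pottedplant",
           "sheep", "sofa", "train", "tvmonitor"]

# Hypothetical, trimmed-down VOC annotation used only for illustration.
SAMPLE_XML = """
<annotation>
  <object>
    <name>dog</name><difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name><difficult>1</difficult>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>
"""

def annotation_to_line(img_path, xml_text):
    """Build one 'path x1,y1,x2,y2,cls ...' line, skipping difficult or unknown objects."""
    root = ET.fromstring(xml_text.strip())
    parts = [img_path]
    for obj in root.iter('object'):
        cls = obj.find('name').text
        if cls not in CLASSES or int(obj.find('difficult').text) == 1:
            continue
        bb = obj.find('bndbox')
        box = [int(float(bb.find(k).text)) for k in ('xmin', 'ymin', 'xmax', 'ymax')]
        parts.append(','.join(str(v) for v in box) + ',' + str(CLASSES.index(cls)))
    return ' '.join(parts)

print(annotation_to_line('JPEGImages/000001.jpg', SAMPLE_XML))
# JPEGImages/000001.jpg 48,240,195,371,11
```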

    1.3 Data Augmentation

    data_label_13.py augments the images to enlarge the set of training samples and improve the model's generalization.

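    1. First, store the image paths, boxes, and class labels in separate lists.
    2. Build a dataset object and augment each image on the fly: flipping, scaling, blurring, random changes to brightness, hue, and saturation, random shifting, and random cropping.
    3. Encoding. For each augmented image, divide the box coordinates by the image width and height to get relative positions, subtract the mean, resize the image to a fixed size, and encode the labels. The key to encoding is locating the ground-truth box on the feature map. First convert the box's top-left and bottom-right corners to center, width, and height. Multiply the center coordinates by the feature-map size, round up, and subtract 1 to get the cell ij that contains the box center. The scaled center minus ij is the offset within that cell. Finally write the offset, width and height, confidence, and class into cell ij.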

    (1) Data augmentation code

    
    ''' Data augmentation '''
    
    import os
    import sys
    import cv2
    import torch
    import random
    import os.path
    import numpy as np
    import matplotlib.pyplot as plt
    import torch.utils.data as data
    import torchvision.transforms as transforms
    
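    class yoloDataset(data.Dataset):
        image_size = 448
        def __init__(self,root,list_file,train,transform):
            print('data init')
            self.root = root    # image root directory (unused when list_file stores absolute paths)
            self.train = train  # bool: if True (training mode), apply data augmentation
            self.transform = transform  # list of transforms applied to the image at the end, e.g. [ToTensor()]
            self.fnames = []   # image paths, e.g. .../00014.jpg
            self.boxes = []    # ground-truth boxes of each image
            self.labels = []   # class labels of each image
            self.mean = (123,117,104)   # per-channel mean (RGB order), subtracted to center the data and speed up training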
    
    
            with open(list_file) as f:
                lines = f.readlines()   # read all annotation lines
    
            # line: /Users/ls/PycharmProjects/YOLOV1_LS/VOCdevkit/VOC2007/JPEGImages/000022.jpg 68,103,368,283,12 186,44,255,230,14
            for line in lines:   
                splited = line.strip().split()
                self.fnames.append(splited[0])  # image path
                num_boxes = (len(splited)-1)    # number of ground-truth boxes in this image
                box = []     # boxes of this image
                label = []   # labels of this image
                for i in range(1,num_boxes+1): # each box
                    tmp = [float(j) for j in splited[i].split(',')] # split "x1,y1,x2,y2,cls" on ',' and convert to float
                    box.append(tmp[:4]) 
                    label.append(int(tmp[4])+1)   # stored as class index + 1 (1..20); the encoder adds 9 to get the one-hot index
                self.boxes.append(torch.Tensor(box))
                self.labels.append(torch.LongTensor(label))
            self.num_samples = len(self.boxes)
    
        def __getitem__(self,idx):
            fname = self.fnames[idx] # path of the idx-th image
            img = cv2.imread(fname)  # read as BGR; with relative paths use cv2.imread(os.path.join(self.root,fname))
            boxes = self.boxes[idx].clone()
            labels = self.labels[idx].clone()
    
            if self.train:
                img,boxes = self.random_flip(img,boxes)
                img,boxes = self.randomScale(img,boxes)
                img = self.randomBlur(img)
                img = self.RandomBrightness(img)
                img = self.RandomHue(img)
                img = self.RandomSaturation(img)
                img,boxes,labels = self.randomShift(img,boxes,labels)
                img,boxes,labels = self.randomCrop(img,boxes,labels)
    
            h,w,_ = img.shape
            boxes /= torch.Tensor([w,h,w,h]).expand_as(boxes)
            img = self.BGR2RGB(img)
            img = self.subMean(img,self.mean)
            img = cv2.resize(img,(self.image_size,self.image_size))
            target = self.encoder(boxes,labels)
            for t in self.transform:
                img = t(img)
            return  img,target
    
        def __len__(self):
            return self.num_samples
    
        def encoder(self,boxes,labels):
            grid_num = 7
            target = torch.zeros((grid_num,grid_num,30))
            wh = boxes[:,2:] - boxes[:,:2]
            cxcy = (boxes[:,2:] + boxes[:,:2])/2
            for i in range(cxcy.size()[0]):
                cxcy_sample = cxcy[i]
                ij = (cxcy_sample*grid_num).ceil()-1
                dxy = cxcy_sample*grid_num-ij
                target[int(ij[1]),int(ij[0]),:2] = target[int(ij[1]),int(ij[0]),5:7] = dxy
                target[int(ij[1]),int(ij[0]),2:4] = target[int(ij[1]),int(ij[0]),7:9] = wh[i]
                target[int(ij[1]),int(ij[0]),4] = target[int(ij[1]),int(ij[0]),9] = 1
                target[int(ij[1]),int(ij[0]),int(labels[i])+9] = 1
            return target
    
        def BGR2RGB(self,img):
            return cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
    
        def BGR2HSV(self,img):
            return cv2.cvtColor(img,cv2.COLOR_BGR2HSV)
    
        def HVS2BGR(self,img):
            return cv2.cvtColor(img,cv2.COLOR_HSV2BGR)
    
        def RandomBrightness(self,bgr):
            if random.random()<0.5:
                hsv = self.BGR2HSV(bgr)
                h,s,v = cv2.split(hsv)
                v = v.astype(float)
                v *= random.choice([0.5,1.5])
                v = np.clip(v,0,255).astype(hsv.dtype)
                hsv = cv2.merge((h,s,v))
                bgr = self.HVS2BGR(hsv)
            return  bgr
    
        def RandomSaturation(self,bgr):
            if random.random()<0.5:
                hsv = self.BGR2HSV(bgr)
                h,s,v = cv2.split(hsv)
                s = s.astype(float)
                s *= random.choice([0.5,1.5])
                s = np.clip(s,0,255).astype(hsv.dtype)
                hsv = cv2.merge((h,s,v))
                bgr = self.HVS2BGR(hsv)
            return bgr
    
        def RandomHue(self,bgr):
            if random.random() < 0.5:
                hsv = self.BGR2HSV(bgr)
                h,s,v = cv2.split(hsv)
                h = h.astype(float)
                h *= random.choice([0.5,1.5])
                h = np.clip(h,0,179).astype(hsv.dtype)   # OpenCV stores 8-bit hue in [0,179], not [0,255]
                hsv=cv2.merge((h,s,v))
                bgr = self.HVS2BGR(hsv)
            return bgr
    
        def randomBlur(self,bgr):
            if random.random() < 0.5:
                bgr = cv2.blur(bgr,(5,5))
            return bgr
    
        def randomShift(self,bgr,boxes,labels):
            center = (boxes[:,2:]+boxes[:,:2])/2
            if random.random()<0.5:
                height,width,c = bgr.shape
                after_shift_imge = np.zeros((height,width,c),dtype=bgr.dtype)
                after_shift_imge[:,:,:] = (104,117,123)
                shift_x = random.uniform(-width*0.2,width*0.2)
                shift_y = random.uniform(-height*0.2,height*0.2)
                if shift_x>=0 and shift_y>=0:
                    after_shift_imge[int(shift_y):,int(shift_x):,:] = bgr[:height-int(shift_y),:width-int(shift_x),:]
                elif shift_x>=0 and shift_y<0:
                    after_shift_imge[:height+int(shift_y),int(shift_x):,:] = bgr[-int(shift_y):,:width-int(shift_x),:]
                elif shift_x <0 and shift_y >=0:
                    after_shift_imge[int(shift_y):,:width+int(shift_x),:] = bgr[:height-int(shift_y),-int(shift_x):,:]
                elif shift_x<0 and shift_y<0:
                    after_shift_imge[:height+int(shift_y),:width+int(shift_x),:] = bgr[-int(shift_y):,-int(shift_x):,:]
    
                shift_xy = torch.FloatTensor([[int(shift_x),int(shift_y)]]).expand_as(center)
                center = center + shift_xy
                mask1 = (center[:,0]>0) & (center[:,0]<width)    # center x must stay inside the image width
                mask2 = (center[:,1]>0) & (center[:,1]<height)   # center y must stay inside the image height
                mask = (mask1 & mask2).view(-1,1)
                boxes_in = boxes[mask.expand_as(boxes)].view(-1,4)
                if len(boxes_in) == 0:
                    return bgr,boxes,labels
                box_shift = torch.FloatTensor([[int(shift_x),int(shift_y),int(shift_x),int(shift_y)]]).expand_as(boxes_in)
                boxes_in = boxes_in+box_shift
                labels_in = labels[mask.view(-1)]
                return after_shift_imge,boxes_in,labels_in
            return bgr,boxes,labels
    
        def randomScale(self,bgr,boxes):
            if random.random() < 0.5:
                scale = random.uniform(0.8,1.2)
                h,w,c = bgr.shape
                bgr = cv2.resize(bgr,(int(w*scale),h))
                scale_tensor = torch.FloatTensor([[scale,1,scale,1]]).expand_as(boxes)
                boxes = boxes*scale_tensor
                return bgr,boxes
            return bgr,boxes
    
        def randomCrop(self,bgr,boxes,labels):
            if random.random() < 0.5:
                center = (boxes[:,:2]+boxes[:,2:])/2
                height,width,c = bgr.shape
                h = random.uniform(0.6*height,height)
                w = random.uniform(0.6*width,width)
                x = random.uniform(0,width-w)
                y = random.uniform(0,height-h)
                x,y,h,w = int(x),int(y),int(h),int(w)
    
                center = center - torch.FloatTensor([[x,y]]).expand_as(center)
                mask1 = (center[:,0]>0) & (center[:,0]<w)
                mask2 = (center[:,1]>0) & (center[:,1]<h)   # center y must lie inside the crop height
                mask = (mask1 & mask2).view(-1,1)
    
                boxes_in = boxes[mask.expand_as(boxes)].view(-1,4)
                if(len(boxes_in)==0):
                    return bgr,boxes,labels
    
                box_shift = torch.FloatTensor([[x,y,x,y]]).expand_as(boxes_in)
                boxes_in = boxes_in-box_shift
                boxes_in[:,0]=boxes_in[:,0].clamp(0,w)
                boxes_in[:,2]=boxes_in[:,2].clamp(0,w)
                boxes_in[:,1]=boxes_in[:,1].clamp(0,h)
                boxes_in[:,3]=boxes_in[:,3].clamp(0,h)
    
                labels_in = labels[mask.view(-1)]
                img_croped = bgr[y:y+h,x:x+w,:]
                return img_croped,boxes_in,labels_in
            return bgr,boxes,labels
    
        def subMean(self,bgr,mean):
            mean = np.array(mean,dtype=np.float32)
            bgr = bgr - mean
            return bgr
    
        def random_flip(self,im,boxes):
            if random.random() < 0.5:
                im_lr = np.fliplr(im).copy()
                h,w,_ = im.shape
                xmin = w - boxes[:,2]
                xmax = w - boxes[:,0]
                boxes[:,0] = xmin
                boxes[:,2] = xmax
                return im_lr,boxes
            return im,boxes
    
        def random_bright(self,im,delta=16):
            alpha = random.random()
            if alpha > 0.3:
                im = im * alpha + random.randrange(-delta,delta)
                im = im.clip(min=0,max=255).astype(np.uint8)
            return im
    
    if __name__=='__main__':
        from torch.utils.data import DataLoader
        import torchvision.transforms as transforms
        file_root ='/Users/ls/PycharmProjects/YOLOV1_LS'  # image root (unused when list_file holds absolute paths)
        list_f = r'/Users/ls/PycharmProjects/YOLOV1_LS/2007_trainLS.txt'  # file written by data_jpgxlm_12.py
        train_dataset = yoloDataset(root=file_root,list_file=list_f,train=True,transform = [transforms.ToTensor()])
        train_loader = DataLoader(train_dataset,batch_size=1,shuffle=False,num_workers=0)
        for i,(images,target) in enumerate(train_loader):
            print(images.shape, target.shape)   # expected: [1, 3, 448, 448] and [1, 7, 7, 30]
            print(target[target[...,4]>0])      # cells that contain an object
            break                               # inspect only the first batch
    
    

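    (2) The main augmentation functions are explained below. The key point is that every operation applied to the image must also be applied to its ground-truth boxes. For example, flipping the image moves the objects, so the boxes must be flipped the same way; otherwise the labels no longer match the image and the model trains on wrong targets.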

    1. Encoding essentials

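    I. The boxes passed to the encoder are relative positions: the box's top-left and bottom-right coordinates divided by the image width and height. The goal is to find the box's position on the feature map. The original image and the feature map have different sizes, but the box's relative position is the same in both, so scaling the relative coordinates by the feature-map size maps the box onto the feature map.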

    img.shape:[400,500,3]         # original image, h:400; w:500
    box:[100,120,200,250]         # ground-truth box [x1,y1,x2,y2]
    box_img: [100/500,120/400,200/500,250/400]            # box relative to the image [x1/w,y1/h,x2/w,y2/h]
    feature.shape:[7,7,30]        # feature map
    box_feature: [100/500*7,120/400*7,200/500*7,250/400*7] # box position on the feature map
    
    
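    The mapping in the example above can be evaluated directly. This short sketch uses a hypothetical helper, box_on_feature_map, and plain Python; it shows that dividing by the image size and multiplying by the grid size puts the box at about [1.4, 2.1, 2.8, 4.375] in grid-cell units.

```python
def box_on_feature_map(box, img_w, img_h, grid_num=7):
    """Map an absolute [x1, y1, x2, y2] box to feature-map units via relative coordinates."""
    x1, y1, x2, y2 = box
    rel = [x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h]   # box relative to the image
    return rel, [v * grid_num for v in rel]                   # same box in grid-cell units

rel, feat = box_on_feature_map([100, 120, 200, 250], img_w=500, img_h=400)
print([round(v, 4) for v in rel])   # [0.2, 0.3, 0.4, 0.625]
print([round(v, 4) for v in feat])  # [1.4, 2.1, 2.8, 4.375]
```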

    II. dxy

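    cxcy_sample = cxcy[i]          # box center, relative to the image
    ij = (cxcy_sample*grid_num).ceil()-1  # cell containing the center: round UP, then subtract 1, so a center exactly on the right/bottom edge (1.0) lands in the last cell instead of out of range
    dxy = cxcy_sample*grid_num-ij  # offset of the center within the cell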
    
    
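    The ceil-then-minus-one rule is easiest to see with one coordinate at a time. This sketch uses a hypothetical helper, cell_and_offset, with Python's math.ceil standing in for the tensor .ceil() call. For centers strictly inside a cell it agrees with rounding down; the difference shows at c = 1.0, where rounding down would give index 7, outside a 7x7 grid.

```python
import math

def cell_and_offset(c, grid_num=7):
    """Cell index and in-cell offset for one relative center coordinate c in (0, 1]."""
    scaled = c * grid_num
    i = math.ceil(scaled) - 1          # same rule as the encoder: round up, then subtract 1
    return i, scaled - i               # offset lies in (0, 1]

print(cell_and_offset(0.3))      # cell 2, offset about 0.1 (cx of the example box)
print(cell_and_offset(0.4625))   # cell 3, offset about 0.2375 (cy of the example box)
print(cell_and_offset(1.0))      # (6, 1.0); math.floor(7.0) would be 7, out of range
```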

    III. target.shape is [7,7,30]: the first 10 values hold the two boxes and their confidences, and the last 20 hold the one-hot class vector.

    B = 2        # boxes per cell
    label = 5    # stored label (class index + 1), so the one-hot 1 sits at index 5 + 9 = 14
    0   1  2  3    4    5   6   7  8    9     10  11  12  13  14 ...29
    dx  dy w  h  conf   dx  dy  w  h  conf     0   0   0   0   1 ... 0
    

    Encoding procedure:

    I. Convert each ground-truth box from corners [x1,y1,x2,y2] to center and size [cx,cy,w,h].

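    II. For each ground-truth box in the image, compute the cell on the feature map that contains its center and the center's offset within that cell, then write the box, confidence, and class into the corresponding position of the label tensor target.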

    
    def encoder(self,boxes,labels):
        grid_num = 7     # side length of the feature map, feature.shape:[7,7,30]
        target = torch.zeros((grid_num,grid_num,30))   # label tensor
        wh = boxes[:,2:] - boxes[:,:2]         # box width and height, relative to the image
        cxcy = (boxes[:,2:] + boxes[:,:2])/2   # box center, relative to the image
        for i in range(cxcy.size()[0]):  # each box
            cxcy_sample = cxcy[i]         
            ij = (cxcy_sample*grid_num).ceil()-1  # cell containing the center; ij[0] is the column (x), ij[1] the row (y)
            dxy = cxcy_sample*grid_num-ij     # center offset within the cell
            target[int(ij[1]),int(ij[0]),:2] = target[int(ij[1]),int(ij[0]),5:7] = dxy   
            target[int(ij[1]),int(ij[0]),2:4] = target[int(ij[1]),int(ij[0]),7:9] = wh[i]
            target[int(ij[1]),int(ij[0]),4] = target[int(ij[1]),int(ij[0]),9] = 1  # confidence
            target[int(ij[1]),int(ij[0]),int(labels[i])+9] = 1   # class one-hot (labels are class index + 1)
        return target
    
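    When debugging labels it helps to invert the encoding. The sketch below is illustrative only: encode mirrors the encoder above using NumPy instead of torch, and decode is a hypothetical helper that is not part of the original code. decode reads the first box of every cell whose confidence is 1 and undoes dxy = c * grid_num - ij. The round trip on the example box from section I lands in row 3, column 2 with the one-hot 1 at index 14.

```python
import numpy as np

GRID = 7

def encode(boxes, labels, grid_num=GRID):
    """NumPy mirror of the encoder; boxes are relative [x1, y1, x2, y2], labels are class index + 1."""
    target = np.zeros((grid_num, grid_num, 30), dtype=np.float32)
    wh = boxes[:, 2:] - boxes[:, :2]
    cxcy = (boxes[:, 2:] + boxes[:, :2]) / 2
    for k in range(len(boxes)):
        ij = np.ceil(cxcy[k] * grid_num) - 1
        dxy = cxcy[k] * grid_num - ij
        col, row = int(ij[0]), int(ij[1])          # ij[0] is x (column), ij[1] is y (row)
        target[row, col, 0:2] = target[row, col, 5:7] = dxy
        target[row, col, 2:4] = target[row, col, 7:9] = wh[k]
        target[row, col, 4] = target[row, col, 9] = 1
        target[row, col, int(labels[k]) + 9] = 1
    return target

def decode(target, grid_num=GRID):
    """Recover relative [x1, y1, x2, y2] boxes and labels from cells whose first confidence is set."""
    boxes, labels = [], []
    for row, col in zip(*np.nonzero(target[..., 4] > 0)):
        cell = target[row, col]
        cx = (col + cell[0]) / grid_num            # undo dxy = cx * grid_num - ij
        cy = (row + cell[1]) / grid_num
        w, h = cell[2], cell[3]
        boxes.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
        labels.append(int(np.argmax(cell[10:])) + 1)   # one-hot sits at index label + 9
    return np.array(boxes), labels

example = np.array([[0.2, 0.3, 0.4, 0.625]], dtype=np.float32)   # the box from section I, relative
print(decode(encode(example, [5])))
```

    As in the original encoder, if two boxes fall into the same cell the later one overwrites the earlier, so decode can only return one box per cell.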
    2. Random brightness
    def RandomBrightness(self,bgr):
        if random.random()<0.5:        # apply with probability 0.5
            hsv = self.BGR2HSV(bgr)    # BGR --> HSV
            h,s,v = cv2.split(hsv)     # split the channels
            v = v.astype(float)        # take the value (brightness) channel as float
            v *= random.choice([0.5,1.5])  # halve the brightness or raise it by 50%
            v = np.clip(v,0,255).astype(hsv.dtype)  # clip to the valid range and convert back
            hsv = cv2.merge((h,s,v))   # merge the channels
            bgr = self.HVS2BGR(hsv)    # HSV --> BGR
        return  bgr
    
    
    3. Random shift

    I. Draw random shift distances shift_x and shift_y (up to 20% of the width and height) and shift the image.

    II. Check which boxes remain valid after the shift: a box is kept only if its center is still inside the image.

    III. Shift the kept boxes by the same amount.

    bgr.shape[100,100,3]
    1: shift_x= 20,shift_y= 30,after_shift_imge[30:,20:,:]=bgr[:70,:80,:]   # move right and down
    2: shift_x= 20,shift_y=-30,after_shift_imge[:70,20:,:]=bgr[30:,:80,:]   # move right and up
    3: shift_x=-20,shift_y= 30,after_shift_imge[30:,:80,:]=bgr[:70,20:,:]   # move left and down
    4: shift_x=-20,shift_y=-30,after_shift_imge[:70,:80,:]=bgr[30:,20:,:]   # move left and up
    
    
    def randomShift(self,bgr,boxes,labels):
        center = (boxes[:,2:]+boxes[:,:2])/2   # box centers
        if random.random()<0.5:
            height,width,c = bgr.shape
            after_shift_imge = np.zeros((height,width,c),dtype=bgr.dtype)
            after_shift_imge[:,:,:] = (104,117,123)   # fill the uncovered area with the mean color (BGR order)
            shift_x = random.uniform(-width*0.2,width*0.2)    # up to 20% of the width
            shift_y = random.uniform(-height*0.2,height*0.2)  # up to 20% of the height
            if shift_x>=0 and shift_y>=0:      # right and down
                after_shift_imge[int(shift_y):,int(shift_x):,:] = bgr[:height-int(shift_y),:width-int(shift_x),:]
            elif shift_x>=0 and shift_y<0:     # right and up
                after_shift_imge[:height+int(shift_y),int(shift_x):,:] = bgr[-int(shift_y):,:width-int(shift_x),:]
            elif shift_x <0 and shift_y >=0:   # left and down
                after_shift_imge[int(shift_y):,:width+int(shift_x),:] = bgr[:height-int(shift_y),-int(shift_x):,:]
            elif shift_x<0 and shift_y<0:      # left and up
                after_shift_imge[:height+int(shift_y),:width+int(shift_x),:] = bgr[-int(shift_y):,-int(shift_x):,:]
    
            shift_xy = torch.FloatTensor([[int(shift_x),int(shift_y)]]).expand_as(center)
            center = center + shift_xy   # box centers after the shift
            mask1 = (center[:,0]>0) & (center[:,0]<width)    # center x still inside the image
            mask2 = (center[:,1]>0) & (center[:,1]<height)   # center y still inside the image
            mask = (mask1 & mask2).view(-1,1) 
            boxes_in = boxes[mask.expand_as(boxes)].view(-1,4)  # boxes that remain after the shift
            if len(boxes_in) == 0:
                return bgr,boxes,labels   # no box remains: return the unshifted image
            box_shift = torch.FloatTensor([[int(shift_x),int(shift_y),int(shift_x),int(shift_y)]]).expand_as(boxes_in)
            boxes_in = boxes_in+box_shift
            labels_in = labels[mask.view(-1)]   # labels of the remaining boxes
            return after_shift_imge,boxes_in,labels_in
        return bgr,boxes,labels
    
    
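    The box bookkeeping of the shift can be tested in isolation. This sketch assumes NumPy arrays instead of torch tensors, and shift_boxes is a hypothetical helper; like the method above, it compares the center's x with the width and its y with the height, and it does not clip the shifted boxes to the image.

```python
import numpy as np

def shift_boxes(boxes, labels, shift_x, shift_y, width, height):
    """Shift [x1, y1, x2, y2] boxes and keep those whose center stays inside the image."""
    boxes = np.asarray(boxes, dtype=np.float32)
    labels = np.asarray(labels)
    shifted = boxes + np.array([shift_x, shift_y, shift_x, shift_y], dtype=np.float32)
    center = (shifted[:, :2] + shifted[:, 2:]) / 2
    keep = ((center[:, 0] > 0) & (center[:, 0] < width) &    # x is compared with the width
            (center[:, 1] > 0) & (center[:, 1] < height))    # y is compared with the height
    return shifted[keep], labels[keep]

boxes = [[10, 10, 30, 30], [70, 60, 95, 90]]
kept, kept_labels = shift_boxes(boxes, [3, 7], shift_x=20, shift_y=30, width=100, height=100)
print(kept, kept_labels)   # the second box's center moves to (102.5, 105) and is dropped
```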
    4. Random scaling
    def randomScale(self,bgr,boxes):
        if random.random() < 0.5:
            scale = random.uniform(0.8,1.2)  # scale factor
            h,w,c = bgr.shape
            bgr = cv2.resize(bgr,(int(w*scale),h)) # scale the width only; the height is unchanged
            scale_tensor = torch.FloatTensor([[scale,1,scale,1]]).expand_as(boxes)
            boxes = boxes*scale_tensor    # scale the boxes' x coordinates by the same factor
            return bgr,boxes
        return bgr,boxes
    
    
    5. Random crop

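    I. Choose a random crop height and width h, w (60% to 100% of the image size), then choose the crop's top-left corner x, y so that the crop fits inside the image.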

    II. Use each box's center to decide whether the box lies inside the crop.

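    III. Move the boxes inside the crop into crop coordinates, clip them to the crop, and return the cropped image, boxes, and labels.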

    def randomCrop(self,bgr,boxes,labels):
        if random.random() < 0.5:
            center = (boxes[:,:2]+boxes[:,2:])/2          # box centers
            height,width,c = bgr.shape
            h = random.uniform(0.6*height,height)         # crop height: 60%-100% of the image
            w = random.uniform(0.6*width,width)           # crop width: 60%-100% of the image
            x = random.uniform(0,width-w)                 # top-left corner, chosen so the crop fits
            y = random.uniform(0,height-h)
            x,y,h,w = int(x),int(y),int(h),int(w)
    
            center = center - torch.FloatTensor([[x,y]]).expand_as(center)  # centers in crop coordinates
            mask1 = (center[:,0]>0) & (center[:,0]<w)     # center x inside the crop width
            mask2 = (center[:,1]>0) & (center[:,1]<h)     # center y inside the crop height
            mask = (mask1 & mask2).view(-1,1)
    
            boxes_in = boxes[mask.expand_as(boxes)].view(-1,4)   # boxes whose center is inside the crop
            if(len(boxes_in)==0):
                return bgr,boxes,labels                   # nothing left: keep the original image
    
            box_shift = torch.FloatTensor([[x,y,x,y]]).expand_as(boxes_in)
            boxes_in = boxes_in-box_shift                 # move boxes into crop coordinates
            boxes_in[:,0]=boxes_in[:,0].clamp(0,w)        # clip the boxes to the crop
            boxes_in[:,2]=boxes_in[:,2].clamp(0,w)
            boxes_in[:,1]=boxes_in[:,1].clamp(0,h)
            boxes_in[:,3]=boxes_in[:,3].clamp(0,h)
    
            labels_in = labels[mask.view(-1)]             # labels of the kept boxes
            img_croped = bgr[y:y+h,x:x+w,:]
            return img_croped,boxes_in,labels_in
        return bgr,boxes,labels
    
    
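    The crop's box handling can be checked on fixed numbers. This is a NumPy sketch with a hypothetical helper, crop_boxes, that takes the crop rectangle as arguments instead of drawing it at random. It keeps a box only if its center falls inside the crop, then moves and clips the kept boxes.

```python
import numpy as np

def crop_boxes(boxes, labels, x, y, w, h):
    """Keep boxes whose center lies inside the crop (x, y, w, h) and move them into crop coordinates."""
    boxes = np.asarray(boxes, dtype=np.float32)
    labels = np.asarray(labels)
    center = (boxes[:, :2] + boxes[:, 2:]) / 2 - np.array([x, y], dtype=np.float32)
    keep = ((center[:, 0] > 0) & (center[:, 0] < w) &
            (center[:, 1] > 0) & (center[:, 1] < h))
    out = boxes[keep] - np.array([x, y, x, y], dtype=np.float32)
    out[:, [0, 2]] = out[:, [0, 2]].clip(0, w)   # clip x to the crop width
    out[:, [1, 3]] = out[:, [1, 3]].clip(0, h)   # clip y to the crop height
    return out, labels[keep]

out, out_labels = crop_boxes([[10, 10, 50, 40], [85, 5, 99, 20]], [1, 2], x=20, y=10, w=60, h=60)
print(out, out_labels)   # the first box is moved and clipped; the second box's center is outside
```

    Unlike the method above, this helper returns an empty array when no box survives; the original instead falls back to the uncropped image.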
    6. Random horizontal flip

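    When the image is flipped left-right, each box's distance to the right edge becomes its new distance to the left edge, and vice versa.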

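    def random_flip(self,im,boxes):
        if random.random() < 0.5:
            im_lr = np.fliplr(im).copy()   # mirror the image; copy() gives a contiguous array
            h,w,_ = im.shape
            xmin = w - boxes[:,2]   # distance of the box's right edge from the right border
            xmax = w - boxes[:,0]   # distance of the box's left edge from the right border
            # these distances become the flipped box's new left and right edges
            boxes[:,0] = xmin       
            boxes[:,2] = xmax
            return im_lr,boxes
        return im,boxes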
    
    
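    A quick way to trust the flip formula is to flip twice and get the original back. This NumPy sketch uses a hypothetical helper, flip_lr, that applies the flip unconditionally (no random draw) so it can be tested.

```python
import numpy as np

def flip_lr(img, boxes):
    """Mirror an HxWxC image left-right and update [x1, y1, x2, y2] boxes to match."""
    h, w = img.shape[:2]
    flipped = np.fliplr(img).copy()
    boxes = np.asarray(boxes, dtype=np.float32).copy()
    x1 = w - boxes[:, 2]          # old right edge becomes the new left edge
    x2 = w - boxes[:, 0]          # old left edge becomes the new right edge
    boxes[:, 0], boxes[:, 2] = x1, x2
    return flipped, boxes

img = np.arange(4 * 10 * 3).reshape(4, 10, 3)
flipped, fb = flip_lr(img, [[2, 1, 5, 3]])
print(fb)   # x1 = 10 - 5, x2 = 10 - 2
```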

    2 Training

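    In the paper, the backbone consists of 24 convolutional layers followed by 2 fully connected layers. In the code, training covers the backbone (ResNet, VGG), the loss, and feeding the data in to train the model. The paper's authors pretrained the first 20 convolutional layers on ImageNet, reaching 88% top-5 accuracy.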

    2.1 Backbone

    (1)ResNet
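    ResNet here has three parts. First, a convolution and a max-pooling layer each downsample with stride 2; then the residual stages layer1 to layer5 change the number of channels and downsample three more times with stride 2 (in layer2, layer3, and layer4); finally a convolution, batch normalization, and activation produce a 30-channel output. Five stride-2 steps reduce a 448x448 input by a factor of 32, to 14x14 rather than 7x7, so unless the rest of forward downsamples again the output is [14,14,30], and grid_num in the encoder must be set to match the backbone's actual output size.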
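    The spatial size after each stage can be traced with the output-size formula from the PyTorch Conv2d/MaxPool2d documentation. This sketch assumes the part of forward not shown in this section runs layer5, conv_end, and bn_end with no further pooling, as the layer definitions suggest; conv_out is a hypothetical helper.

```python
def conv_out(n, kernel, stride, padding, dilation=1):
    """Spatial output size of a Conv2d/MaxPool2d layer (floor formula from the PyTorch docs)."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

sizes = {}
n = 448
n = conv_out(n, 7, 2, 3)          # conv1: 7x7, stride 2, padding 3
n = conv_out(n, 3, 2, 1)          # maxpool: 3x3, stride 2, padding 1
sizes['stem'] = n
for name, stride in [('layer1', 1), ('layer2', 2), ('layer3', 2), ('layer4', 2)]:
    # each stage downsamples in its first block (3x3 conv, padding 1) when stride == 2
    n = conv_out(n, 3, stride, 1)
    sizes[name] = n
n = conv_out(n, 3, 1, 2, dilation=2)   # layer5: detnet blocks, dilated 3x3 conv, stride 1
sizes['layer5'] = n
n = conv_out(n, 3, 1, 1)               # conv_end: 3x3, stride 1, padding 1
sizes['conv_end'] = n
print(sizes)  # stem 112, layer1 112, layer2 56, layer3 28, layer4 14, layer5 14, conv_end 14
```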

    resnet_yolo.py

    import math
    import torch.nn as nn
    import torch.utils.model_zoo as model_zoo
    import torch.nn.functional as F
    
    __all__ = ['ResNet','resnet18','resnet34','resnet50','resnet101','resnet152']
    
    model_urls = {
        'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
        'resnet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth',
        'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth',
        'resnet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth',
        'resnet152': 'https://download.pytorch.org/models/resnet152-b121ed2d.pth',
    }
    
    def conv3x3(in_planes,out_planes,stride=1):
        return nn.Conv2d(in_planes,out_planes,kernel_size=3,stride=stride,padding=1,bias=False)
    
    class BasicBlock(nn.Module):
        expansion = 1
        def __init__(self,inplanes,planes,stride=1,downsample=None):
            super(BasicBlock, self).__init__()
            self.conv1 = conv3x3(inplanes, planes, stride)
            self.bn1 = nn.BatchNorm2d(planes)
            self.relu = nn.ReLU(inplace=True)
            self.conv2 = conv3x3(planes, planes)
            self.bn2 = nn.BatchNorm2d(planes)
            self.downsample = downsample
            self.stride = stride
    
        def forward(self,x):
            residual = x
            out = self.conv1(x)
            out = self.bn1(out)
            out = self.relu(out)
            out = self.conv2(out)
            out = self.bn2(out)
            if self.downsample is not None:
                residual = self.downsample(x)
            out += residual
            out = self.relu(out)
            return out
    
    class Bottleneck(nn.Module):
        expansion = 4
        def __init__(self, inplanes, planes, stride=1, downsample=None):
            super(Bottleneck, self).__init__()
            self.conv1 = nn.Conv2d(inplanes,planes,kernel_size=1,bias=False)
            self.bn1 = nn.BatchNorm2d(planes)
            self.conv2 = nn.Conv2d(planes,planes,kernel_size=3,stride=stride,padding=1,bias=False)
            self.bn2 = nn.BatchNorm2d(planes)
            self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1,bias=False)  # input is conv2's output (planes channels), not inplanes
            self.bn3 = nn.BatchNorm2d(planes* self.expansion)
            self.relu = nn.ReLU(inplace=True)
            self.downsample = downsample
            self.stride = stride
    
        def forward(self,x):
            residual = x
            out = self.conv1(x)
            out = self.bn1(out)
            out = self.relu(out)
            out = self.conv2(out)
            out = self.bn2(out)
            out = self.relu(out)
            out = self.conv3(out)
            out = self.bn3(out)
            if self.downsample is not None:
                residual = self.downsample(x)
            out += residual
            out = self.relu(out)
            return out
    
    class detnet_bottleneck(nn.Module):
        # no expansion
        # dilation = 2
        # type B use 1x1 conv
        expansion = 1
        def __init__(self, in_planes, planes, stride=1, block_type='A'):
            super(detnet_bottleneck, self).__init__()
            self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False)
            self.bn1 = nn.BatchNorm2d(planes)
            self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=2, bias=False,dilation=2)
            self.bn2 = nn.BatchNorm2d(planes)
            self.conv3 = nn.Conv2d(planes, self.expansion*planes, kernel_size=1, bias=False)
            self.bn3 = nn.BatchNorm2d(self.expansion*planes)
    
            self.downsample = nn.Sequential()
            if stride != 1 or in_planes != self.expansion*planes or block_type == 'B':
                self.downsample = nn.Sequential(
                    nn.Conv2d(in_planes, self.expansion*planes, kernel_size=1, stride=stride, bias=False),
                    nn.BatchNorm2d(self.expansion*planes)
                )
    
        def forward(self, x):
            out = F.relu(self.bn1(self.conv1(x)))
            out = F.relu(self.bn2(self.conv2(out)))
            out = self.bn3(self.conv3(out))
            out += self.downsample(x)
            out = F.relu(out)
            return out
    
    class ResNet(nn.Module): # model = ResNet(BasicBlock, [3, 4, 6, 3], **kwargs)
        def __init__(self, block, layers, num_classes=1470):
            self.inplanes = 64
            super(ResNet, self).__init__()
            self.conv1 = nn.Conv2d(3,64,kernel_size=7,stride=2,padding=3,bias=False)
            self.bn1 = nn.BatchNorm2d(64)
            self.relu = nn.ReLU(inplace=True)
            self.maxpool = nn.MaxPool2d(kernel_size=3,stride=2,padding=1)  # torch.Size([2, 64, 112, 112])
            self.layer1 = self._make_layer(block,64,layers[0])
            self.layer2 = self._make_layer(block,128,layers[1],stride=2)
            self.layer3 = self._make_layer(block,256,layers[2],stride=2)
            self.layer4 = self._make_layer(block,512,layers[3],stride=2)
            self.layer5 = self._make_detnet_layer(in_channels=512)   # (in_channels=2048)
            self.conv_end = nn.Conv2d(256,30,kernel_size=3,stride=1,padding=1,bias=False)
            self.bn_end = nn.BatchNorm2d(30)
            for m in self.modules():
                if isinstance(m,nn.Conv2d):
                    n = m.kernel_size[0]*m.kernel_size[1]*m.out_channels
                    m.weight.data.normal_(0,math.sqrt(2./n))
                elif isinstance(m,nn.BatchNorm2d):
                    m.weight.data.fill_(1)
                    m.bias.data.zero_()
    
        def _make_layer(self,block, planes, blocks, stride=1): # 64,3
            downsample = None
            if stride != 1 or self.inplanes != planes * block.expansion:
                downsample = nn.Sequential(
                    nn.Conv2d(self.inplanes,planes*block.expansion,kernel_size=1,stride=stride,bias=False),
                    nn.BatchNorm2d(planes*block.expansion),
                )
            layers = []
            layers.append(block(self.inplanes,planes,stride,downsample))
            self.inplanes = planes*block.expansion
            for i in range(1,blocks):
                layers.append(block(self.inplanes,planes))
            return nn.Sequential(*layers)
    
        def _make_detnet_layer(self,in_channels):
            layers = []
            layers.append(detnet_bottleneck(in_planes=in_channels, planes=256, block_type='B'))
            layers.append(detnet_bottleneck(in_planes=256, planes=256, block_type='A'))
            layers.append(detnet_bottleneck(in_planes=256, planes=256, block_type='A'))
            return nn.Sequential(*layers)
        def forward(self,x):
            x = self.conv1(x)
            x = self.bn1(x)
            x = self.relu(x)
            x = self.maxpool(x)
            x = self.layer1(x)
            x = self.layer2(x)
            x = self.layer3(x)
            x = self.layer4(x)
            # print(x.shape)
            x = self.layer5(x)
            # print(x.shape)
    
            x = self.conv_end(x)
            x = self.bn_end(x)
            x = F.sigmoid(x)
    
            x = x.permute(0,2,3,1)
            return x
    
    def resnet18(pretrained=False,**kwargs):
        model = ResNet(BasicBlock, [2, 2, 2, 2], **kwargs)
        if pretrained:
            model.load_state_dict(model_zoo.load_url(model_urls['resnet18']))
        return model
    
    def resnet34(pretrained=False,**kwargs):
        model = ResNet(BasicBlock, [3, 4, 6, 3], **kwargs)
        if pretrained:
            model.load_state_dict(model_zoo.load_url(model_urls['resnet34']))
        return model
    
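    def resnet50(pretrained=False,**kwargs):
        # NOTE: BasicBlock with [3, 4, 6, 3] is the ResNet-34 layout (ResNet-50 uses Bottleneck
        # blocks), so the official resnet50 checkpoint will not load when pretrained=True.
        model = ResNet(BasicBlock, [3, 4, 6, 3], **kwargs)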
        if pretrained:
            model.load_state_dict(model_zoo.load_url(model_urls['resnet50']))
        return model
    
    def resnet101(pretrained=False,**kwargs):
        model = ResNet(BasicBlock, [3, 4, 23, 3], **kwargs)
        if pretrained:
            model.load_state_dict(model_zoo.load_url(model_urls['resnet101']))
        return model
    
    def resnet152(pretrained=False,**kwargs):
        model = ResNet(BasicBlock, [3, 8, 36, 3], **kwargs)
        if pretrained:
            model.load_state_dict(model_zoo.load_url(model_urls['resnet152']))
        return model
    
    
    def test():
        import torch
        model = resnet18()
        x = torch.rand(2, 3, 448, 448)  # conv1 expects 3-channel (RGB) 448x448 input
        out = model(x)
        print(out.shape)  # torch.Size([2, 14, 14, 30])
    
    if __name__ == '__main__':
        test()
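To see why this backbone ends on a 14x14 grid for a 448x448 input, the spatial size can be traced with the standard convolution output-size formula floor((n + 2p - d(k-1) - 1)/s) + 1. The plain-Python sketch below only mirrors the stride, padding and dilation settings read off the code above (conv1, maxpool, a stride-2 first block in layer2 to layer4, and the dilated 3x3 conv of detnet_bottleneck). It assumes BasicBlock (defined earlier in the file) is the usual block whose first 3x3 conv, padding 1, carries the stride; the helper names are illustrative.

```python
def conv_out(n, k, s=1, p=0, d=1):
    """Output size of a conv/pool layer along one spatial dimension."""
    return (n + 2 * p - d * (k - 1) - 1) // s + 1

def trace_resnet_variant(n=448):
    """Spatial size after each stage of the ResNet-YOLO backbone above."""
    sizes = []
    n = conv_out(n, k=7, s=2, p=3)        # conv1
    sizes.append(("conv1", n))
    n = conv_out(n, k=3, s=2, p=1)        # maxpool
    sizes.append(("maxpool", n))
    sizes.append(("layer1", n))           # stride 1, size unchanged
    for name in ("layer2", "layer3", "layer4"):
        n = conv_out(n, k=3, s=2, p=1)    # first 3x3 conv of the stage has stride 2
        sizes.append((name, n))
    n = conv_out(n, k=3, s=1, p=2, d=2)   # detnet_bottleneck: dilated 3x3, padding 2
    sizes.append(("layer5", n))
    return sizes

for name, size in trace_resnet_variant():
    print(name, size)
```

The dilated convolution in layer5 enlarges the receptive field without shrinking the map, which is why the final conv_end sees the same 14x14 grid as layer4.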
    
    
    

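    (2) VGG
    The VGG model has two parts: stacked convolution and pooling layers extract features, and fully connected layers (two 4096-unit hidden layers and a final 1470-unit layer) map those features to the 7x7x30 output.
    net.py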

    #encoding:utf-8
    import math
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.utils.model_zoo as model_zoo
    
    __all__ = ['VGG','vgg11','vgg11_bn',
               'vgg13','vgg13_bn', 'vgg16',
               'vgg16_bn','vgg19','vgg19_bn']
    
    model_urls = {
        'vgg11': 'https://download.pytorch.org/models/vgg11-bbd30ac9.pth',
        'vgg13': 'https://download.pytorch.org/models/vgg13-c768596a.pth',
        'vgg16': 'https://download.pytorch.org/models/vgg16-397923af.pth',
        'vgg19': 'https://download.pytorch.org/models/vgg19-dcbb9e9d.pth',
        'vgg11_bn': 'https://download.pytorch.org/models/vgg11_bn-6002323d.pth',
        'vgg13_bn': 'https://download.pytorch.org/models/vgg13_bn-abd245e5.pth',
        'vgg16_bn': 'https://download.pytorch.org/models/vgg16_bn-6c64b313.pth',
        'vgg19_bn': 'https://download.pytorch.org/models/vgg19_bn-c79401a0.pth',
    }
    
    class VGG(nn.Module):
        def __init__(self,features,num_classes=1000,image_size=448):
            super(VGG,self).__init__()
            self.features = features
            self.image_size = image_size
            self.classifier = nn.Sequential(
                nn.Linear(512*7*7,4096),
                nn.ReLU(True),
                nn.Dropout(),
                nn.Linear(4096, 4096),
                nn.ReLU(True),
                nn.Dropout(),
                nn.Linear(4096,1470))
            self._initialize_weights()
    
        def forward(self,x):
            x = self.features(x)
            x = x.view(x.size(0),-1)
            x = self.classifier(x)
            x = F.sigmoid(x)
            x = x.view(-1,7,7,30)
            return x
    
        def _initialize_weights(self):
            for m in self.modules():
                if isinstance(m,nn.Conv2d):
                    n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                    m.weight.data.normal_(0,math.sqrt(2./n))
                    if m.bias is not None:
                        m.bias.data.zero_()
                elif isinstance(m, nn.BatchNorm2d):
                    m.weight.data.fill_(1)
                    m.bias.data.zero_()
                elif isinstance(m, nn.Linear):
                    m.weight.data.normal_(0, 0.01)
                    m.bias.data.zero_()
    
    def make_layers(cfg,batch_norm=False):
        layers = []
        in_channels = 3
        first_flag = True
        for v in cfg:
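            s = 1
            # the first 64-channel conv uses stride 2, so together with the five 'M' pools
            # a 448x448 input is reduced 64x to the 7x7 grid the classifier expects
            if (v == 64 and first_flag):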
                s = 2
                first_flag = False
            if v == 'M':
                layers += [nn.MaxPool2d(kernel_size=2,stride=2)]
            else:
                conv2d = nn.Conv2d(in_channels,v,kernel_size=3,stride=s,padding=1)
                if batch_norm:
                    layers += [conv2d,nn.BatchNorm2d(v),nn.ReLU(inplace=True)]
                else:
                    layers += [conv2d,nn.ReLU(inplace=True)]
                in_channels = v
        return nn.Sequential(*layers)
    
    cfg = { 'A': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
            'B': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
            'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
            'E': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M']}
    
    def vgg11(pretrained=False,**kwargs):
        model = VGG(make_layers(cfg['A']),**kwargs)
        if pretrained:
            model.load_state_dict(model_zoo.load_url(model_urls['vgg11']))
        return model
    
    def vgg11_bn(pretrained=False,**kwargs):
        model  = VGG(make_layers(cfg['A'],batch_norm=True),**kwargs)
        if pretrained:
            model.load_state_dict(model_zoo.load_url(model_urls['vgg11_bn']))
        return model
    
    def vgg13(pretrained=False,**kwargs):
        model = VGG(make_layers(cfg['B']),**kwargs)
        if pretrained:
            model.load_state_dict(model_zoo.load_url(model_urls['vgg13']))
        return model
    
    def vgg13_bn(pretrained=False, **kwargs):
        model = VGG(make_layers(cfg['B'], batch_norm=True), **kwargs)
        if pretrained:
            model.load_state_dict(model_zoo.load_url(model_urls['vgg13_bn']))
        return model
    
    def vgg16(pretrained=False, **kwargs):
        model = VGG(make_layers(cfg['D']), **kwargs)
        if pretrained:
            model.load_state_dict(model_zoo.load_url(model_urls['vgg16']))
        return model
    
    def vgg16_bn(pretrained=False, **kwargs):
        model = VGG(make_layers(cfg['D'], batch_norm=True), **kwargs)
        if pretrained:
            model.load_state_dict(model_zoo.load_url(model_urls['vgg16_bn']))
        return model
    
    
    def vgg19(pretrained=False, **kwargs):
        model = VGG(make_layers(cfg['E']), **kwargs)
        if pretrained:
            model.load_state_dict(model_zoo.load_url(model_urls['vgg19']))
        return model
    
    def vgg19_bn(pretrained=False, **kwargs):
        model = VGG(make_layers(cfg['E'], batch_norm=True), **kwargs)
        if pretrained:
            model.load_state_dict(model_zoo.load_url(model_urls['vgg19_bn']))
        return model
    
    def test():
        import torch
        model = vgg16()
        img = torch.rand(2,3,448,448)
        output = model(img)
        print(output.size())  # torch.Size([2, 7, 7, 30])
    
    if __name__ == '__main__':
        test()
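make_layers shrinks the input in two ways: the first 64-channel conv has stride 2 and every 'M' is a 2x2 max-pool with stride 2, while the remaining 3x3 convs (padding 1, stride 1) keep the size. The plain-Python walk below applies the same rule to cfg['D'] (copied from the code) to confirm that the features end at 512x7x7, which is why the classifier starts with nn.Linear(512*7*7, 4096) and why its 1470 outputs can be reshaped to 7x7x30. The function name is illustrative.

```python
CFG_D = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
         512, 512, 512, 'M', 512, 512, 512, 'M']

def vgg_feature_shape(cfg, size=448):
    """Return (channels, spatial size) after make_layers-style features."""
    channels, first_flag = 3, True
    for v in cfg:
        if v == 'M':
            size //= 2                          # MaxPool2d(kernel_size=2, stride=2)
        else:
            stride = 1
            if v == 64 and first_flag:          # same rule as make_layers
                stride, first_flag = 2, False
            size = (size - 1) // stride + 1     # 3x3 conv with padding 1
            channels = v
    return channels, size

c, s = vgg_feature_shape(CFG_D)
print(c, s, c * s * s)        # 512 7 25088
print(1470 == 7 * 7 * 30)     # True
```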
    
    

    2.2 Loss

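    What is special about the YOLOv1 loss: every term is a mean squared error (MSE). For box regression, the square roots of the width and height are taken before the MSE, so the same absolute error costs more on a small box than on a large one and the losses of large and small objects stay comparable. Box regression and the foreground (object) confidence are weighted differently from the background (no-object) confidence to counter the imbalance between positive and negative samples.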
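A quick numeric illustration of the square-root trick, with made-up numbers: suppose the predicted width is off by 0.05 (image-normalized units) both for a small box of width 0.1 and for a large box of width 0.8. Plain squared error charges both the same 0.0025; in square-root space the small box costs about 0.0051 and the large one about 0.00076, roughly 6.7 times less. The helper name is illustrative.

```python
import math

def sq_err(pred, target, use_sqrt):
    """Squared error on a width, optionally in square-root space as in YOLOv1."""
    if use_sqrt:
        pred, target = math.sqrt(pred), math.sqrt(target)
    return (pred - target) ** 2

for w in (0.1, 0.8):  # a small and a large ground-truth width
    plain = sq_err(w + 0.05, w, use_sqrt=False)
    rooted = sq_err(w + 0.05, w, use_sqrt=True)
    print(f"width={w}: plain={plain:.5f}  sqrt={rooted:.5f}")
```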
    yoloLoss.py
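    The loss takes the encoded ground truth target_tensor and the network prediction pred_tensor, both of shape [batch_size,S,S,30] (S = 7 for the VGG backbone, 14 for the ResNet backbone).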

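    (1) Loss computation steps:

    1. Using the ground-truth confidence, select the cells of target_tensor and pred_tensor that contain no object, sample_nobj [-1,30]; take columns 5 and 10 (indices 4 and 9, the two box confidences) and compute the negative-sample loss noobj_loss with MSE.
    2. Using the ground-truth confidence, select the cells that do contain an object, sample_obj [-1,30]. From these, extract the boxes and the class scores of target_tensor and pred_tensor and compute the class loss.
    3. For sample_obj, compute the IoU between each predicted box and the ground-truth box, keep the predicted box with the highest IoU as the one responsible for the object, and compute the box regression loss and the positive-sample confidence loss. The target for the positive confidence is that IoU.
    4. Weight the individual losses to counter the imbalance between positive and negative samples.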
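The four steps can be condensed into a per-cell sketch in plain Python. It is a simplified restatement of the yoloLoss code, not a replacement, and it assumes: the 30-value cell layout [x, y, w, h, c] for box 1, the same for box 2, then 20 class scores; x, y as offsets inside the cell and w, h relative to the image; both box slots of a target cell holding the same ground-truth box. The weights 5 and 0.5 are the values train.py passes to yoloLoss, and the factor 2 on the positive-confidence term is copied from forward(). Like that code, it has no term for the non-responsible box of an object cell. All names are illustrative.

```python
import math

# 5 and 0.5 as in yoloLoss(7, 2, 5, 0.5); 2 as hard-coded in forward()
L_COORD, L_NOOBJ, L_CONTAIN = 5.0, 0.5, 2.0

def to_xyxy(box, S):
    """[x_offset, y_offset, w, h] -> (x1, y1, x2, y2) relative to the cell origin."""
    cx, cy, w, h = box[0] / S, box[1] / S, box[2], box[3]
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def iou(a, b):
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def cell_loss(pred, target, S=7):
    """Loss of one grid cell; pred and target are 30-element lists, pred values in [0, 1]."""
    if target[4] == 0:
        # step 1: no object, push both predicted confidences toward the label (0)
        return L_NOOBJ * ((pred[4] - target[4]) ** 2 + (pred[9] - target[9]) ** 2)
    # step 2: class loss, squared error over the 20 class scores
    class_loss = sum((p - t) ** 2 for p, t in zip(pred[10:], target[10:]))
    # step 3: the predicted box with the higher IoU is responsible for the object
    gt_box = to_xyxy(target[0:4], S)
    ious = [iou(to_xyxy(pred[k * 5:k * 5 + 4], S), gt_box) for k in range(2)]
    b = 0 if ious[0] >= ious[1] else 1
    p = pred[b * 5:b * 5 + 5]
    contain_loss = (p[4] - ious[b]) ** 2  # confidence target is the IoU
    loc_loss = ((p[0] - target[0]) ** 2 + (p[1] - target[1]) ** 2
                + (math.sqrt(p[2]) - math.sqrt(target[2])) ** 2
                + (math.sqrt(p[3]) - math.sqrt(target[3])) ** 2)
    # step 4: weight the terms
    return class_loss + L_CONTAIN * contain_loss + L_COORD * loc_loss

# A cell containing an object of class 3, predicted perfectly.
gt = [0.5, 0.5, 0.2, 0.2, 1.0] * 2 + [0.0] * 20
gt[13] = 1.0
print(cell_loss(list(gt), gt))       # 0.0

# A background cell where box 1 still predicts confidence 0.5.
empty = [0.0] * 30
pred_bg = [0.0] * 30
pred_bg[4] = 0.5
print(cell_loss(pred_bg, empty))     # 0.125
```

Summing cell_loss over all cells of a batch and dividing by the batch size should correspond to what forward() returns, under the assumptions above.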
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torch.autograd import Variable
    
    class yoloLoss(nn.Module):
        def __init__(self,S,B,l_coord,l_noobj):
            super(yoloLoss, self).__init__()
            self.S = S
            self.B = B
            self.l_coord = l_coord
            self.l_noobj = l_noobj
    
        def compute_iou(self,box1,box2):
            '''
            Args:
                box1[N,4],box2[M,4]
            Return:
                iou, sized [N,M].
            '''
            N = box1.size()[0]
            M = box2.size()[0]
    
            lt = torch.max(
                box1[:,:2].unsqueeze(1).expand(N,M,2),
                box2[:,:2].unsqueeze(0).expand(N,M,2)
                           )
            rd = torch.min(
                box1[:,2:].unsqueeze(1).expand(N,M,2),
                box2[:,2:].unsqueeze(0).expand(N,M,2)
            )
    
            wh = rd-lt
            wh[wh<0] = 0
            inter = wh[...,0] * wh[...,1]
    
            area1 = ((box1[:,2]-box1[:,0])*(box1[:,3]-box1[:,1])).unsqueeze(1).expand_as(inter)
            area2 = ((box2[:,2]-box2[:,0])*(box2[:,3]-box2[:,1])).unsqueeze(0).expand_as(inter)
    
            iou = inter/(area1+area2-inter)
            return iou
    
        def forward(self,pred_tensor,target_tensor):
            ''' pred_tensor[b,S,S,B*5+20] ; target_tensor[b,S,S,30]'''
            # 1 masks for cells with / without an object
            N = pred_tensor.size(0)
            coo_mask = target_tensor[...,4] > 0   # cells containing an object [batch_size,S,S]
            noo_mask = target_tensor[...,4] == 0  # cells without an object
            coo_mask = coo_mask.unsqueeze(-1).expand_as(target_tensor) # [b,S,S,30]
            noo_mask = noo_mask.unsqueeze(-1).expand_as(target_tensor)
    
            # 2 no-object loss
            noo_pred = pred_tensor[noo_mask].view(-1,30)     # predictions for cells without an object
            noo_target = target_tensor[noo_mask].view(-1,30) # labels for cells without an object
            noo_pred_c = noo_pred[:,[4,9]].flatten()         # predicted confidences of both boxes
            noo_target_c = noo_target[:,[4,9]].flatten()     # label confidences (all 0)
            noobj_loss = F.mse_loss(noo_pred_c,noo_target_c,reduction='sum')  # negative-sample loss
    
            # 3 object cells: boxes and classes
            coo_pred = pred_tensor[coo_mask].view(-1,30)  # predictions for cells with an object
            box_pred = coo_pred[:,:10].contiguous().view(-1,5) # predicted boxes, two rows per cell
            class_pred = coo_pred[:,10:]  # predicted class scores
            coo_target = target_tensor[coo_mask].view(-1,30)  # labels for cells with an object
            box_target = coo_target[:,:10].contiguous().view(-1,5)  # ground-truth boxes
            class_target = coo_target[:,10:] # ground-truth classes
    
            # 3.1 class loss
            class_loss = F.mse_loss(class_pred,class_target,reduction='sum')
            # 4 each cell has two predicted boxes; the one with the highest IoU with the
            #   ground-truth box is used for the regression and positive-sample losses
            coo_response_mask = torch.zeros(box_target.size(), dtype=torch.bool)
            box_target_iou = torch.zeros(box_target.size())
            for i in range(0,box_target.size(0),2):  # iterate over cells that contain an object
                box1 = box_pred[i:i+2]   # the two predicted boxes of this cell
                box1_xy = torch.zeros(box1.size())
                box1_xy[:,:2] = box1[:,:2] / float(self.S) - 0.5*box1[:,2:4]
                box1_xy[:,2:4] = box1[:,:2] / float(self.S) + 0.5*box1[:,2:4]
                box2 = box_target[i].view(-1,5)  # the ground-truth box of this cell
                box2_xy = torch.zeros(box2.size())
                box2_xy[:,:2] = box2[:,:2] / float(self.S) - 0.5*box2[:,2:4]
                box2_xy[:,2:4] = box2[:,:2] / float(self.S) + 0.5*box2[:,2:4]
                iou = self.compute_iou(box1_xy[:,:4],box2_xy[:,:4])  # [2,1]
                max_iou,max_index = iou.max(0)  # best IoU and index of the responsible predicted box
                coo_response_mask[i+max_index] = True
                box_target_iou[i+max_index,4] = max_iou.detach()  # the IoU is a target, not a gradient path
    
            # 4.1 obj_loss
            box_pred_response = box_pred[coo_response_mask].view(-1,5) # responsible predicted boxes
            box_target_response = box_target[coo_response_mask].view(-1,5)  # matching ground-truth boxes
            box_target_response_iou = box_target_iou[coo_response_mask].view(-1,5)  # confidence targets (IoU)
            # 4.1.1 contain_loss
            contain_loss = F.mse_loss(box_pred_response[:,4],box_target_response_iou[:,4],reduction='sum')   # positive-sample loss
            # 4.1.2 loc_loss: x, y directly; w, h (columns 2:4 only) after a square root
            loc_loss = F.mse_loss(box_pred_response[:,:2],box_target_response[:,:2],reduction='sum')+ \
                       F.mse_loss(torch.sqrt(box_pred_response[:,2:4]),torch.sqrt(box_target_response[:,2:4]),reduction='sum')  # box regression loss
    
            return (self.l_noobj*noobj_loss + class_loss + 2*contain_loss + self.l_coord*loc_loss)/N   # weighted loss averaged over the batch
    
    if __name__ == '__main__':
        pred_tensor = torch.rand(2,14,14,30)   # values in [0,1] like the sigmoid output, so sqrt is defined
        target_tensor = pred_tensor+0.01
        yolo_loss = yoloLoss(14,2,5,0.5)
        loss = yolo_loss(pred_tensor,target_tensor)
        print(loss)
    
    
    

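    (2) IoU
    I. Compute the top-left corner lt and the bottom-right corner rd of the intersection of each box pair.
    II. Compute the intersection area inter and the areas area1 and area2 of the two boxes.
    III. Compute the IoU as inter/(area1+area2-inter).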

    def compute_iou(self,box1,box2):
            '''
            Args:
                box1[N,4],box2[M,4]
            Return:
                iou, sized [N,M].
            '''
            N = box1.size()[0]
            M = box2.size()[0]
    
            lt = torch.max(
                box1[:,:2].unsqueeze(1).expand(N,M,2),  # box1.shape[N,4]-->box1[:,:2].shape[N,2]-->box1[:,:2].unsqueeze(1).shape[N,1,2]-->lt.shape[N,M,2]
                box2[:,:2].unsqueeze(0).expand(N,M,2)
            )
            rd = torch.min(
                box1[:,2:].unsqueeze(1).expand(N,M,2),
                box2[:,2:].unsqueeze(0).expand(N,M,2)
            )
    
            wh = rd-lt     # wh.shape(N,M,2)
            wh[wh<0] = 0
            inter = wh[...,0] * wh[...,1]  # [N,M]
    
            area1 = ((box1[:,2]-box1[:,0])*(box1[:,3]-box1[:,1])).unsqueeze(1).expand_as(inter)  # area1.shape[N,M]
            area2 = ((box2[:,2]-box2[:,0])*(box2[:,3]-box2[:,1])).unsqueeze(0).expand_as(inter)
    
            iou = inter/(area1+area2-inter)   # iou.shape[N,M]
            return iou
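The same three steps in plain Python, looping where compute_iou broadcasts with unsqueeze/expand, make it easy to check the arithmetic by hand. Boxes are [x1, y1, x2, y2] as in compute_iou; the function name and the example boxes are illustrative. For box [0, 0, 2, 2] against the three boxes below, the intersections are 1, 4 and 0, giving IoUs of 1/7, 1.0 and 0.0.

```python
def pairwise_iou(boxes1, boxes2):
    """IoU matrix [N][M] for [x1, y1, x2, y2] boxes, following steps I-III."""
    result = []
    for a in boxes1:
        row = []
        for b in boxes2:
            # I. corners of the intersection
            lt = (max(a[0], b[0]), max(a[1], b[1]))
            rd = (min(a[2], b[2]), min(a[3], b[3]))
            # II. intersection area (clamped at 0) and the two box areas
            w, h = max(0.0, rd[0] - lt[0]), max(0.0, rd[1] - lt[1])
            inter = w * h
            area1 = (a[2] - a[0]) * (a[3] - a[1])
            area2 = (b[2] - b[0]) * (b[3] - b[1])
            # III. intersection over union
            row.append(inter / (area1 + area2 - inter))
        result.append(row)
    return result

print(pairwise_iou([[0, 0, 2, 2]], [[1, 1, 3, 3], [0, 0, 2, 2], [5, 5, 6, 6]]))
```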
    
    

    2.3 Training

    train.py
    Training steps:

    1. Import libraries
    2. Set hyperparameters
    3. Build the model
    4. Load pretrained parameters
    5. Loss function
    6. Optimizer
    7. Load data
    8. Train
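Step 4 copies pretrained weights into the YOLO network only where a parameter with the same name exists (for VGG, only entries under features). The sketch below imitates that filtering with ordinary dicts mapping parameter names to (shape, origin) pairs, so it runs without downloading any weights; the parameter names and shapes are hypothetical. The shape comparison guards against same-named tensors of different sizes, which load_state_dict would reject.

```python
def transfer(pretrained, model, prefix="features"):
    """Return a copy of `model` whose entries come from `pretrained` when the name
    starts with `prefix`, exists in `model` and has the same shape.
    Values are (shape, origin) pairs standing in for real tensors."""
    merged = dict(model)
    for name, (shape, origin) in pretrained.items():
        if name.startswith(prefix) and name in merged and merged[name][0] == shape:
            merged[name] = (shape, origin)
    return merged

# Hypothetical names and shapes, loosely modeled on a VGG backbone with a 1470-output head.
pretrained = {
    "features.0.weight": ((64, 3, 3, 3), "pretrained"),
    "features.0.bias": ((64,), "pretrained"),
    "features.7.weight": ((128, 64, 3, 3), "pretrained"),
    "classifier.6.weight": ((1000, 4096), "pretrained"),
}
model = {
    "features.0.weight": ((64, 3, 3, 3), "random"),
    "features.0.bias": ((64,), "random"),
    "features.7.weight": ((256, 64, 3, 3), "random"),   # shape differs on purpose
    "classifier.6.weight": ((1470, 4096), "random"),
}
for name, (shape, origin) in transfer(pretrained, model).items():
    print(name, shape, origin)
```

Only the two matching features entries are taken from the pretrained dict; the mismatched feature and the detection head keep their random initialization.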
    # 1 Import libraries
    import os
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"
    import torch
    from torch.utils.data import DataLoader
    import torchvision.transforms as transforms
    from torchvision import models
    from torch.autograd import Variable
    
    from net import vgg16_bn
    from resnet_yolo import resnet50
    from yoloLoss import yoloLoss
    from data_label_13 import yoloDataset
    from visualize import Visualizer
    import numpy as np
    
    # 2 Hyperparameters
    # use_gpu = torch.cuda.is_available()
    file_root = r'/Users/liushuang/PycharmProjects/YOLOV1_LS'
    learning_rate = 0.001
    num_epochs = 2
    batch_size = 1
    use_resnet = False
    
    # 3 backbone
    if use_resnet:
        net = resnet50()
    else:
        net = vgg16_bn()
    # print(net)
    
    # 3.1 Load pretrained backbone parameters (copy only tensors whose name and shape match)
    if use_resnet:
        resnet = models.resnet50(pretrained=False)  # True
        new_state_dict = resnet.state_dict()
        dd = net.state_dict()
        for k in new_state_dict.keys():
            if k in dd.keys() and not k.startswith('fc') and dd[k].shape == new_state_dict[k].shape:
                dd[k] = new_state_dict[k]
        net.load_state_dict(dd)
    else:
        vgg = models.vgg16_bn(pretrained=False)
        new_state_dict = vgg.state_dict()
        dd = net.state_dict()
        for k in new_state_dict.keys():
            if k in dd.keys() and k.startswith('features') and dd[k].shape == new_state_dict[k].shape:
                dd[k] = new_state_dict[k]
        net.load_state_dict(dd)
    if False:  # set to True to resume from a saved checkpoint
        net.load_state_dict(torch.load('best.pth'))
    
    # 4 Loss
    criterion = yoloLoss(7,2,5,0.5)
    
    # if use_gpu:
    #     net.cuda()
    # switch to training mode
    net.train()
    
    # 5 Parameter groups (backbone 'features' and head currently use the same learning rate)
    params = []
    params_dict = dict(net.named_parameters())
    
    for k,v in params_dict.items():
        if k.startswith('features'):
            params += [{'params':[v],'lr':learning_rate*1}]
        else:
            params += [{'params':[v],'lr':learning_rate*1}]
    
    # 6 Optimizer
    optimizer = torch.optim.SGD(params,lr=learning_rate,momentum=0.9,weight_decay=5e-4)
    
    # 7 Load data
    
    train_dataset = yoloDataset(root=file_root,list_file='2007_train.txt',train=True,transform = [transforms.ToTensor()] )
    train_loader = DataLoader(train_dataset,batch_size=batch_size,shuffle=True,num_workers=0)
    
    test_dataset = yoloDataset(root=file_root,list_file='2007_test.txt',train=False,transform = [transforms.ToTensor()] )
    test_loader = DataLoader(test_dataset,batch_size=batch_size,shuffle=False,num_workers=0)  # same as train_loader; worker processes need a __main__ guard on macOS/Windows
    
    print('the dataset has %d images' % (len(train_dataset)))
    print('the batch_size is %d' % (batch_size))
    logfile = open('log.txt', 'w')
    
    num_iter = 0
    vis = Visualizer(env='LS')
    best_test_loss = np.inf
    
    # 8 Training
    for epoch in range(num_epochs):
        net.train()
        if epoch == 30:
            learning_rate = 0.0001
        if epoch == 40:
            learning_rate = 0.00001
        for params_group in optimizer.param_groups:
            params_group['lr'] = learning_rate
    
        print('\n\nStarting epoch %d / %d' % (epoch + 1, num_epochs))
        print('Learning Rate for this epoch: {}'.format(learning_rate))
    
        total_loss = 0.
    
        for i,(images,target) in enumerate(train_loader):
            # if use_gpu:
            #     images,target = images.cuda(),target.cuda()
    
            pred = net(images)
            # print('pred.shape:',pred.shape)
            # print('target.shape:',target.shape)
            loss = criterion(pred,target)
            total_loss += loss.item()
    
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if(i+1)%5 == 0:
                print ('Epoch [%d/%d], Iter [%d/%d] Loss: %.4f, average_loss: %.4f'
                       %(epoch+1, num_epochs, i+1, len(train_loader), loss.item(), total_loss / (i+1)))
                num_iter += 1
                vis.plot_train_val(loss_train = total_loss/(i+1))
    
        validation_loss = 0.0
        net.eval()
        with torch.no_grad():  # no gradients are needed for validation
            for i,(images,target) in enumerate(test_loader):
                # if use_gpu:
                #     images,target = images.cuda(),target.cuda()
                pred = net(images)
                loss = criterion(pred,target)
                validation_loss += loss.item()
        validation_loss /= len(test_loader)
        vis.plot_train_val(loss_val=validation_loss)
    
        if best_test_loss > validation_loss:
            best_test_loss = validation_loss
            print('get best test loss %.5f' % best_test_loss)
            torch.save(net.state_dict(),'best.pth')
        logfile.writelines(str(epoch) + '\t' + str(validation_loss) + '\n')
        logfile.flush()
        torch.save(net.state_dict(),'yolo.pth')
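The loop lowers the learning rate by hand at epochs 30 and 40; with num_epochs = 2 in this configuration those branches are never reached. As a sketch, the same step schedule can be written as a function of the epoch. PyTorch also provides torch.optim.lr_scheduler.MultiStepLR, which with milestones=[30, 40] and gamma=0.1 should produce the same values up to floating-point rounding, but using it would be a change to the script rather than what it currently does. The function name is illustrative.

```python
def lr_for_epoch(epoch):
    """Step schedule from train.py: 1e-3, then 1e-4 from epoch 30, then 1e-5 from epoch 40."""
    if epoch >= 40:
        return 0.00001
    if epoch >= 30:
        return 0.0001
    return 0.001

print([lr_for_epoch(e) for e in (0, 29, 30, 39, 40, 49)])
```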
    
    

    3 Prediction

    3.1 Prediction pipeline

    1. Preprocess the image
    2. Predict
    3. Decode
    4. Draw the boxes
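Decoding turns a cell-relative prediction back into an image box: the (x, y) offset is scaled by the cell size 1/S and shifted by the cell's top-left corner (column j along x, row i along y, as in decoder), (w, h) are already fractions of the image, and the corners are finally multiplied by the original image width and height. A plain-Python sketch of that arithmetic with made-up numbers; the function name is illustrative.

```python
def decode_cell_box(i, j, box, S, img_w, img_h):
    """Map box = [x_off, y_off, w, h] predicted in grid row i, column j to pixel corners."""
    cell = 1.0 / S
    cx = (j + box[0]) * cell  # column index j moves along x
    cy = (i + box[1]) * cell  # row index i moves along y
    x1, y1 = cx - 0.5 * box[2], cy - 0.5 * box[3]
    x2, y2 = cx + 0.5 * box[2], cy + 0.5 * box[3]
    # scale from [0, 1] image units to pixels (predict_gpu additionally truncates with int())
    return x1 * img_w, y1 * img_h, x2 * img_w, y2 * img_h

print(decode_cell_box(i=3, j=3, box=[0.5, 0.5, 0.2, 0.4], S=7, img_w=500, img_h=375))
```

Here the center lands in the middle of the image (cx = cy = 0.5), so the 0.2 x 0.4 box spans roughly pixels 200 to 300 horizontally and 112.5 to 262.5 vertically.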

    (1) Prediction
    predict.py

    import torch
    from torch.autograd import Variable
    from resnet_yolo import resnet50
    import torchvision.transforms as transforms
    import cv2
    import numpy as np
    
    VOC_CLASSES = (    # always index 0
        'aeroplane', 'bicycle', 'bird', 'boat',
        'bottle', 'bus', 'car', 'cat', 'chair',
        'cow', 'diningtable', 'dog', 'horse',
        'motorbike', 'person', 'pottedplant',
        'sheep', 'sofa', 'train', 'tvmonitor')
    
    Color = [[0, 0, 0],[128, 0, 0],[0, 128, 0],[128, 128, 0],[0, 0, 128],
             [128, 0, 128],[0, 128, 128],[128, 128, 128],[64, 0, 0],[192, 0, 0],
             [64, 128, 0],[192, 128, 0],[64, 0, 128],[192, 0,128],[64, 128, 128],
             [192, 128, 128],[0, 64, 0],[128, 64, 0],[0, 192, 0],[128, 192, 0],[0, 64, 128]]
    
    def nms(bboxes,scores,threshold=0.5):
        x1 = bboxes[:,0]
        y1 = bboxes[:,1]
        x2 = bboxes[:,2]
        y2 = bboxes[:,3]
        areas = (x2-x1)*(y2-y1)
    
        _,order = scores.sort(0,descending=True)
        keep = []
        while order.numel() > 0:
            i = int(order[0])  # index of the highest-scoring remaining box
            keep.append(i)
    
            if order.numel() == 1:
                break
    
            # intersection of box i with every remaining box
            xx1 = x1[order[1:]].clamp(min=float(x1[i]))
            yy1 = y1[order[1:]].clamp(min=float(y1[i]))
            xx2 = x2[order[1:]].clamp(max=float(x2[i]))
            yy2 = y2[order[1:]].clamp(max=float(y2[i]))
            w = (xx2-xx1).clamp(min=0)
            h = (yy2-yy1).clamp(min=0)
            inter = w*h
    
            ove = inter/(areas[i]+areas[order[1:]]-inter)
            ids = torch.nonzero(ove <= threshold).view(-1)  # keep order 1-D even for a single survivor
            if ids.numel() == 0:
                break
            order = order[ids+1]
        return torch.LongTensor(keep)
    
    def decoder(pred):
        pred = pred.data
        pred = pred.squeeze(0)  # SxSx30
        grid_num = pred.size(0)  # 7 for the VGG backbone, 14 for the ResNet backbone
        boxes = []
        cls_indexs = []
        probs = []
        cell_size = 1./grid_num
        contain1 = pred[:,:,4].unsqueeze(2)  # [S, S, 1]
        contain2 = pred[:,:,9].unsqueeze(2)  # [S, S, 1]
        contain = torch.cat((contain1,contain2),2)  # [S, S, 2]
        mask1 = contain > 0.1  # [S, S, 2]
        mask2 = (contain==contain.max())  # [S, S, 2], always keep the most confident box
        mask = mask1 | mask2  # [S, S, 2]
        for i in range(grid_num):
            for j in range(grid_num):
                for b in range(2):
                    if mask[i,j,b]:
                        box = pred[i,j,b*5:b*5+4].clone()
                        contain_prob = torch.FloatTensor([pred[i,j,b*5+4]])
                        xy = torch.FloatTensor([j,i])*cell_size  # top-left corner of the cell
                        box[:2] = box[:2]*cell_size + xy  # center (cx, cy) relative to the image
                        box_xy = torch.FloatTensor(box.size())  # convert [cx,cy,w,h] to [x1,y1,x2,y2]
                        box_xy[:2] = box[:2] - 0.5*box[2:]
                        box_xy[2:] = box[:2] + 0.5*box[2:]
                        max_prob,cls_index = torch.max(pred[i,j,10:],0)
                        if float((contain_prob*max_prob)[0]) > 0.1:
                            boxes.append(box_xy.view(1,4))
                            cls_indexs.append(cls_index.view(1))
                            probs.append(contain_prob*max_prob)
        if len(boxes) == 0:
            boxes = torch.zeros((1,4))
            probs = torch.zeros(1)
            cls_indexs = torch.zeros(1)
        else:
            boxes = torch.cat(boxes,0) #(n,4)
            probs = torch.cat(probs,0) #(n,)
            cls_indexs = torch.cat(cls_indexs,0) #(n,)
        keep = nms(boxes,probs)
        return boxes[keep],cls_indexs[keep],probs[keep]
    
    def predict_gpu(model,image_name,root_path='/Users/ls/PycharmProjects/YOLOV1_LS/VOCdevkit/VOC2007/JPEGImages/'):
    
        result = []
        image = cv2.imread(root_path+image_name)
        # 1 Image preprocessing
        h,w,_ = image.shape
        img = cv2.resize(image,(448,448))  # fixed input size expected by the model
        img = cv2.cvtColor(img,cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; convert to RGB
        mean = (123,117,104)  #RGB
        img = img - np.array(mean,dtype=np.float32)  # subtract the per-channel mean
    
        transform = transforms.Compose([transforms.ToTensor(),])
        img = transform(img)  # HxWxC array -> CxHxW tensor
        img = img[None,:,:,:]  # add the batch dimension
        # img = img.cuda()
    
        # 2 Prediction
        with torch.no_grad():
            pred = model(img) # 1xSxSx30
        pred = pred.cpu()
        # 3 Decoding
        boxes,cls_indexs,probs =  decoder(pred)
    
        for i,box in enumerate(boxes):
            x1 = int(box[0]*w)
            x2 = int(box[2]*w)
            y1 = int(box[1]*h)
            y2 = int(box[3]*h)
            cls_index = cls_indexs[i]
            if cls_index.numel()==0:return
            cls_index = int(cls_index) # convert LongTensor to int
            prob = probs[i]
            prob = float(prob)
            result.append([(x1,y1),(x2,y2),VOC_CLASSES[cls_index],image_name,prob])
        return result
    
    if __name__ == '__main__':
        model = resnet50()
        print('load model...')
        # Load the trained weights; without them the predicted boxes are not meaningful
        # model.load_state_dict(torch.load('best.pth'))
        model.eval()
        #model.cuda()
        root_path = '/Users/ls/PycharmProjects/YOLOV1_LS/VOCdevkit/VOC2007/JPEGImages/'
        image_name = '000015.jpg'
        image = cv2.imread(root_path+image_name)  # read the same file that predict_gpu reads
        print('predicting...')
        result = predict_gpu(model,image_name,root_path)
        # 4 Draw the boxes
        for left_up,right_bottom,class_name, _ ,prob in result:
            color = Color[VOC_CLASSES.index(class_name)]
            cv2.rectangle(image,left_up,right_bottom,color,2)
            label = class_name+str(round(prob,2))
            text_size, baseline = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.4, 1)
            p1 = (left_up[0], left_up[1]- text_size[1])
            # filled background for the label, then white text on top of it
            cv2.rectangle(image, (p1[0] - 1, p1[1] - 2 - baseline), (p1[0] + text_size[0], p1[1] + text_size[1]), color, -1)
            cv2.putText(image, label, (p1[0], p1[1] + baseline), cv2.FONT_HERSHEY_SIMPLEX, 0.4, (255,255,255), 1, 8)
    
        cv2.imwrite('result.jpg',image)
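
    The preprocessing and the final scaling in predict_gpu can be checked without a model. Below is a minimal NumPy sketch; preprocess, to_pixels and MEAN_RGB are hypothetical names, and the image is assumed to be already resized to 448×448 (the cv2.resize call is left out). Because the array is float32 after mean subtraction, torchvision's ToTensor only changes the layout to CHW and does not divide by 255 (it rescales only uint8 or PIL input), so the sketch just transposes.

```python
import numpy as np

MEAN_RGB = np.array((123, 117, 104), dtype=np.float32)  # same per-channel mean as predict_gpu

def preprocess(img_bgr):
    """Sketch of predict_gpu's steps after cv2.resize (resizing is omitted here).

    img_bgr: HxWx3 uint8 array in OpenCV's BGR order, assumed already 448x448.
    Returns a 1x3xHxW float32 array.
    """
    img = img_bgr[:, :, ::-1].astype(np.float32)  # BGR -> RGB, like cv2.COLOR_BGR2RGB
    img = img - MEAN_RGB                          # subtract the mean of each channel
    img = np.transpose(img, (2, 0, 1))            # HWC -> CHW, the layout ToTensor produces
    return img[None]                              # add the batch dimension

def to_pixels(box, w, h):
    """Scale a normalized [x1, y1, x2, y2] box to pixel corners of a w x h image."""
    x1, y1, x2, y2 = box
    return (int(x1 * w), int(y1 * h)), (int(x2 * w), int(y2 * h))

dummy = np.zeros((448, 448, 3), dtype=np.uint8)
dummy[..., 0] = 104                  # blue channel (index 0 in BGR order)
x = preprocess(dummy)
print(x.shape)                       # (1, 3, 448, 448)
print(x[0, 2, 0, 0], x[0, 0, 0, 0])  # 0.0 -123.0
print(to_pixels([0.25, 0.5, 0.75, 1.0], w=500, h=375))  # ((125, 187), (375, 375))
```

    The blue value ends up in channel 2 after the BGR to RGB flip, and subtracting its mean (104) leaves 0; int() truncates, so 0.5 × 375 = 187.5 becomes 187, matching the int(box[1]*h) calls above.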
    
    
    

    (2) NMS

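    NMS removes duplicate predicted boxes based on IoU. It relies on the fact that boxes predicting the same object overlap heavily (high IoU), while boxes predicting different objects overlap little (low IoU).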
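
    To make the IoU criterion concrete, here is a small plain-Python sketch. iou is a hypothetical helper, not part of the reference code, and it assumes boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2. The intersection's top-left corner is the larger of the two top-left corners, and its bottom-right corner is the smaller of the two bottom-right corners.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    # Intersection rectangle: max of the top-left corners, min of the bottom-right corners
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # 0 when the boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

same_obj = iou((0, 0, 10, 10), (1, 1, 11, 11))    # two boxes on the same object
diff_obj = iou((0, 0, 10, 10), (20, 20, 30, 30))  # boxes on different objects
print(round(same_obj, 3), diff_obj)  # 0.681 0.0
```

    The first pair shares a 9×9 = 81 area out of a union of 100 + 100 − 81 = 119, so its IoU is about 0.68, while the disjoint pair scores 0.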

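    I. Compute the area of every predicted box.

    II. Sort the scores in descending order and get the sorted indices. Take the first (highest-scoring) box and compute the IoU between it and each remaining box.

    III. Keep only the remaining boxes whose IoU with that box does not exceed the threshold, and repeat the same procedure on them until at most one box (which is also kept) or none remains.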
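
    The three steps can be traced by hand with a plain-Python version of greedy NMS. This is a sketch, not the reference implementation: nms_py and iou are hypothetical helpers, boxes are (x1, y1, x2, y2) tuples, and the areas of step I are computed inside iou instead of being precomputed.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes; the areas of step I are computed here."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms_py(boxes, scores, threshold=0.5):
    """Greedy NMS: returns the indices of the kept boxes, highest score first."""
    # Step II: indices sorted by score, descending
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    keep = []
    while order:
        i = order[0]          # highest-scoring remaining box
        keep.append(i)
        # Step III: only boxes that overlap box i by at most `threshold` survive
        order = [k for k in order[1:] if iou(boxes[i], boxes[k]) <= threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 9, 9)]
scores = [0.9, 0.8, 0.7, 0.6]
print(nms_py(boxes, scores))  # [0, 2]
```

    Box 0 wins first and suppresses boxes 1 (IoU ≈ 0.68) and 3 (IoU = 0.81); box 2 does not overlap it, so it survives and is kept in the next round.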

    
    def nms(bboxes,scores,threshold=0.5):
        # bboxes: [N, 4] as [x1, y1, x2, y2]; scores: [N]; returns the indices of the kept boxes
        x1 = bboxes[:,0]
        y1 = bboxes[:,1]
        x2 = bboxes[:,2]
        y2 = bboxes[:,3]
        # 1 Compute the area of every predicted box
        areas = (x2-x1)*(y2-y1)
        # 2 Sort by predicted score, highest first
        _,order = scores.sort(0,descending=True)
        keep = []
        while order.numel() > 0:
            # 3 Take the highest-scoring remaining box
            i = order[0]
            keep.append(int(i))
    
            if order.numel() == 1:
                break
            # 4 IoU between box i and every remaining box
            xx1 = x1[order[1:]].clamp(min=x1[i].item())
            yy1 = y1[order[1:]].clamp(min=y1[i].item())
            xx2 = x2[order[1:]].clamp(max=x2[i].item())
            yy2 = y2[order[1:]].clamp(max=y2[i].item())
            w = (xx2-xx1).clamp(min=0)
            h = (yy2-yy1).clamp(min=0)
            inter = w*h
    
            ove = inter/(areas[i]+areas[order[1:]]-inter)
            # 5 Drop boxes that overlap box i too much; view(-1) keeps ids 1-D even when only one survives
            ids = torch.nonzero(ove <= threshold).view(-1)
            if ids.numel() == 0:
                break
            # 6 Boxes with little or no overlap go to the next round (+1 because ove starts at order[1])
            order = order[ids+1]
        return torch.tensor(keep, dtype=torch.long)
    

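    (3) Decoding

    I. Take the confidence scores out of the prediction and screen the predicted boxes with a confidence threshold (0.1 in the code); the box with the highest confidence in the whole feature map is always kept.

    II. Iterate over the rows and columns of the output feature map and over the two boxes in each grid cell, taking out the corresponding box and class probabilities. Compute the box center from the predicted offset within the cell plus the cell's top-left corner, then convert [cx, cy, w, h] to corner form [x1, y1, x2, y2]. The product of the highest class probability and the confidence is the final score, and the boxes are screened once more with a threshold on that score.

    III. Apply non-maximum suppression to the predicted boxes using their final scores (this NMS does not separate classes), and output the predictions that survive.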
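
    Step II for a single box can be worked through by hand. The sketch below assumes the same layout as decoder (a 7×7 grid, [dx, dy, w, h] with the center offset in cell units and the size relative to the whole image) and the same 0.1 score threshold; decode_cell, GRID_NUM and CELL_SIZE are hypothetical names chosen for illustration.

```python
GRID_NUM = 7
CELL_SIZE = 1.0 / GRID_NUM

def decode_cell(i, j, box, conf, cls_probs, score_thresh=0.1):
    """Decode one predicted box of grid cell (row i, column j).

    box = [dx, dy, w, h]: (dx, dy) is the center offset inside the cell in cell units,
    (w, h) is the size relative to the whole image. Returns None if the final score is too low.
    """
    dx, dy, w, h = box
    cx = (j + dx) * CELL_SIZE  # the column index j moves the center along x
    cy = (i + dy) * CELL_SIZE  # the row index i moves the center along y
    corners = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)  # [cx,cy,w,h] -> [x1,y1,x2,y2]
    cls = max(range(len(cls_probs)), key=lambda k: cls_probs[k])  # best class
    score = conf * cls_probs[cls]  # final score = confidence x best class probability
    if score <= score_thresh:
        return None
    return corners, cls, score

probs = [0.0] * 20
probs[5] = 0.9
corners, cls, score = decode_cell(3, 3, [0.5, 0.5, 0.2, 0.4], conf=0.8, cls_probs=probs)
print([round(v, 3) for v in corners], cls, round(score, 3))  # [0.4, 0.3, 0.6, 0.7] 5 0.72
print(decode_cell(0, 0, [0.5, 0.5, 0.1, 0.1], conf=0.1, cls_probs=probs))  # None
```

    A box centered in the middle cell (row 3, column 3) lands at the image center (0.5, 0.5); the second call is dropped because 0.1 × 0.9 = 0.09 does not pass the threshold.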

    def decoder(pred):
        grid_num = 7
        boxes = []
        cls_indexs = []
        probs = []
        cell_size = 1./grid_num
        pred = pred.data
        pred = pred.squeeze(0)  # 7x7x30
        contain1 = pred[:,:,4].unsqueeze(2)  # [7, 7, 1] confidence of box 1
        contain2 = pred[:,:,9].unsqueeze(2)  # [7, 7, 1] confidence of box 2
        # 1 Screen boxes by confidence
        contain = torch.cat((contain1,contain2),2)  # [7, 7, 2]
        mask1 = contain > 0.1  # [7, 7, 2] above the confidence threshold
        mask2 = (contain==contain.max())  # [7, 7, 2] the most confident box is always kept
        mask = mask1 | mask2  # [7, 7, 2]
        for i in range(grid_num):
            for j in range(grid_num):
                for b in range(2):
                    if mask[i,j,b]:
                        box = pred[i,j,b*5:b*5+4].clone()  # clone so pred is not modified in place
                        contain_prob = pred[i,j,b*5+4].view(1)
                        xy = torch.FloatTensor([j,i])*cell_size  # top-left corner of the cell
                        # 2 Decode
                        box[:2] = box[:2]*cell_size + xy  # center relative to the whole image
                        box_xy = torch.zeros_like(box)  # convert [cx,cy,w,h] to [x1,y1,x2,y2]
                        box_xy[:2] = box[:2] - 0.5*box[2:]
                        box_xy[2:] = box[:2] + 0.5*box[2:]
                        max_prob,cls_index = torch.max(pred[i,j,10:],0)
                        # 3 Screen boxes by the final score (confidence x class probability)
                        if float((contain_prob*max_prob)[0]) > 0.1:
                            boxes.append(box_xy.view(1,4))
                            cls_indexs.append(cls_index.view(1))  # 0-d tensor -> shape (1,) so torch.cat works
                            probs.append(contain_prob*max_prob)
        if len(boxes) == 0:
            boxes = torch.zeros((1,4))
            probs = torch.zeros(1)
            cls_indexs = torch.zeros(1, dtype=torch.long)
        else:
            boxes = torch.cat(boxes,0)  # (n,4)
            probs = torch.cat(probs,0)  # (n,)
            cls_indexs = torch.cat(cls_indexs,0)  # (n,)
        # 4 Non-maximum suppression
        keep = nms(boxes,probs)
        return boxes[keep],cls_indexs[keep],probs[keep]
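
    The triple loop in decoder can also be expressed with array operations. The following is a NumPy sketch under stated assumptions, not the reference code: decode_vectorized is a hypothetical name, pred is assumed to be a (7, 7, 30) array laid out as two [x, y, w, h, c] groups followed by 20 class probabilities (for a PyTorch output, something like pred.detach().squeeze(0).numpy()), and NMS is still meant to run afterwards. Boolean indexing walks the array in (row, column, box) order, so the results come out in the same order as the loop; when nothing passes, it returns empty arrays instead of the zero placeholder box.

```python
import numpy as np

def decode_vectorized(pred, grid_num=7, conf_thresh=0.1, score_thresh=0.1):
    """pred: (grid_num, grid_num, 30) array. Returns boxes (n, 4) as [x1, y1, x2, y2],
    class indices (n,) and scores (n,), before NMS."""
    cell = 1.0 / grid_num
    boxes = pred[..., :10].reshape(grid_num, grid_num, 2, 5)  # (7, 7, 2, 5)
    conf = boxes[..., 4]                                       # (7, 7, 2)
    # Step I: confidence above the threshold, or the global maximum (mask1 | mask2)
    mask = (conf > conf_thresh) | (conf == conf.max())
    # Cell top-left corners: x comes from the column index j, y from the row index i
    rows, cols = np.meshgrid(np.arange(grid_num), np.arange(grid_num), indexing='ij')
    offset = np.stack([cols, rows], axis=-1)[:, :, None, :] * cell  # (7, 7, 1, 2)
    cxcy = boxes[..., :2] * cell + offset                            # (7, 7, 2, 2)
    wh = boxes[..., 2:4]
    xyxy = np.concatenate([cxcy - wh / 2, cxcy + wh / 2], axis=-1)   # (7, 7, 2, 4)
    # Step II: final score = confidence x best class probability of the cell
    cls_probs = pred[..., 10:]                                       # (7, 7, 20)
    cls_idx = cls_probs.argmax(-1)                                   # (7, 7)
    score = conf * cls_probs.max(-1)[..., None]                      # (7, 7, 2)
    keep = mask & (score > score_thresh)
    cls_b = np.broadcast_to(cls_idx[..., None], conf.shape)          # same class for both boxes
    return xyxy[keep], cls_b[keep], score[keep]

pred = np.zeros((7, 7, 30), dtype=np.float32)
pred[3, 3, 0:5] = [0.5, 0.5, 0.2, 0.4, 0.8]  # box 1 of the center cell
pred[3, 3, 15] = 0.9                         # class 5 of the center cell
pred[0, 6, 5:10] = [0.0, 0.0, 0.1, 0.1, 0.5] # box 2 of the top-right cell
pred[0, 6, 12] = 0.6                         # class 2 of the top-right cell
b, c, s = decode_vectorized(pred)
print(b.shape, c.tolist())  # (2, 4) [2, 5]
```

    The loop version is easier to read step by step; the array version avoids 98 Python-level iterations per image, which matters more as the grid or batch grows.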
    
    
  • Original article: https://blog.csdn.net/qq_35732321/article/details/126206643