深度学习目标检测——AP以及MAP

AP计算概述

在这里插入图片描述

知道了AP 的定义，下一步就是理解AP计算的实现，理论上可以通过积分来计算AP，公式如下：
$AP=\int_0^1 P(r) dr$
但通常情况下都是使用近似或者插值的方法来计算 AP。

近似计算AP

$\sum_{k=1}^{N}P(k)\Delta r(k)$

近似计算 AP(approximated average precision)，这种计算方式是 approximated 形式的；
很显然位于一条竖直线上的点对计算AP没有贡献；
这里 N 为数据总量，k 为每个样本点的索引， $Δ r (k) = r (k) - r (k - 1)$ 。

近似计算AP和绘制PR曲线代码如下：

import numpy as np
import matplotlib.pyplot as plt

class_names = ["car", "pedestrians", "bicycle"]

def draw_PR_curve(predict_scores, eval_labels, name, cls_idx=1):
    """calculate AP and draw PR curve, there are 3 types
    Parameters:
    @all_scores: single test dataset predict scores array, (-1, 3)
    @all_labels: single test dataset predict label array, (-1, 3)
    @cls_idx: the serial number of the AP to be calculated, example: 0,1,2,3...
    """
    # print('sklearn Macro-F1-Score:', f1_score(predict_scores, eval_labels, average='macro'))
    global class_names
    fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(15, 10))
    # Rank the predicted scores from large to small, extract their corresponding index(index number), and generate an array
    idx = predict_scores[:, cls_idx].argsort()[::-1]
    eval_labels_descend = eval_labels[idx]
    pos_gt_num = np.sum(eval_labels == cls_idx) # number of all gt

    predict_results = np.ones_like(eval_labels)
    tp_arr = np.logical_and(predict_results == cls_idx, eval_labels_descend == cls_idx) # ndarray
    fp_arr = np.logical_and(predict_results == cls_idx, eval_labels_descend != cls_idx)

    tp_cum = np.cumsum(tp_arr).astype(float) # ndarray, Cumulative sum of array elements.
    fp_cum = np.cumsum(fp_arr).astype(float)

    precision_arr = tp_cum / (tp_cum + fp_cum) # ndarray
    recall_arr = tp_cum / pos_gt_num
    ap = 0.0
    prev_recall = 0
    for p, r in zip(precision_arr, recall_arr):
      ap += p * (r - prev_recall)
      # pdb.set_trace()
      prev_recall = r
    print("------%s, ap: %f-----" % (name, ap))

    fig_label = '[%s, %s] ap=%f' % (name, class_names[cls_idx], ap)
    ax.plot(recall_arr, precision_arr, label=fig_label)

    ax.legend(loc="lower left")
    ax.set_title("PR curve about class: %s" % (class_names[cls_idx]))
    ax.set(xticks=np.arange(0., 1, 0.05), yticks=np.arange(0., 1, 0.05))
    ax.set(xlabel="recall", ylabel="precision", xlim=[0, 1], ylim=[0, 1])

    fig.savefig("./pr-curve-%s.png" % class_names[cls_idx])
    plt.close(fig)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47

插值计算AP

插值计算(Interpolated average precision) AP 的公式的演变过程这里不做讨论，详情可以参考这篇文章，我这里的公式和图也是参考此文章的。11点插值计算方式计算AP公式如下：
$A P = 1 11 ∑ r ∈ { 0 , 0.1 , 0.2 , … , 1.0 } P intep ( r ) P interp ( r ) = M A X r ^ : r ^ ≥ r P ( r ^ ) AP=111∑r∈{0,0.1,0.2,…,1.0}Pintep (r)Pinterp (r)=MAXˆr:ˆr≥rP(ˆr)$

这是通常意义上的 11 points_Interpolated 形式的 AP，选取固定的 {0,0.1,0.2,…,1.0} 11个阈值，这个在PASCAL2007中有使用
这里因为参与计算的只有11个点，所以 K=11，称为11points_Interpolated，k为阈值索引
$P_{interp}(k)$ 取第 k 个阈值所对应的样本点之后的样本中的最大值，只不过这里的阈值被限定在了 {0,0.1,0.2,…,1.0} 范围内。

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-KaGONv2a-1662646748371)(…/…/images/插值计算AP的PR曲线图.png)]

从曲线上看，真实 AP< approximated AP < Interpolated AP，11-points Interpolated AP 可能大也可能小，当数据量很多的时候会接近于 Interpolated AP，与 Interpolated AP 不同，前面的公式中计算 AP 时都是对 PR 曲线的面积估计，PASCAL的论文里给出的公式就更加简单粗暴了，直接计算11 个阈值处的 precision 的平均值。PASCAL 论文给出的 11 点计算 AP 的公式如下。

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-AF2dVFBg-1662646748371)(…/…/images/11点计算AP公式.png)]

三，AP计算实现

1, 在给定 recal 和 precision 的条件下计算 AP：

def voc_ap(rec, prec, use_07_metric=False):
    """ 
    ap = voc_ap(rec, prec, [use_07_metric])
    Compute VOC AP given precision and recall.
    If use_07_metric is true, uses the
    VOC 07 11 point method (default:False).
    """
    if use_07_metric:
        # 11 point metric
        ap = 0.
        for t in np.arange(0., 1.1, 0.1):
            if np.sum(rec >= t) == 0:
                p = 0
            else:
                p = np.max(prec[rec >= t])
            ap = ap + p / 11.
    else:
        # correct AP calculation
        # first append sentinel values at the end
        mrec = np.concatenate(([0.], rec, [1.]))
        mpre = np.concatenate(([0.], prec, [0.]))

        # compute the precision envelope
        for i in range(mpre.size - 1, 0, -1):
            mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])

        # to calculate area under PR curve, look for points
        # where X axis (recall) changes value
        i = np.where(mrec[1:] != mrec[:-1])[0]

        # and sum (\Delta recall) * prec
        ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
    return ap
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33

2，给定目标检测结果文件和测试集标签文件 xml 等计算 AP：

def parse_rec(filename):
    """ Parse a PASCAL VOC xml file 
    Return : list, element is dict.
    """
    tree = ET.parse(filename)
    objects = []
    for obj in tree.findall('object'):
        obj_struct = {}
        obj_struct['name'] = obj.find('name').text
        obj_struct['pose'] = obj.find('pose').text
        obj_struct['truncated'] = int(obj.find('truncated').text)
        obj_struct['difficult'] = int(obj.find('difficult').text)
        bbox = obj.find('bndbox')
        obj_struct['bbox'] = [int(bbox.find('xmin').text),
                              int(bbox.find('ymin').text),
                              int(bbox.find('xmax').text),
                              int(bbox.find('ymax').text)]
        objects.append(obj_struct)

    return objects

def voc_eval(detpath,
             annopath,
             imagesetfile,
             classname,
             cachedir,
             ovthresh=0.5,
             use_07_metric=False):
    """rec, prec, ap = voc_eval(detpath,
                                annopath,
                                imagesetfile,
                                classname,
                                [ovthresh],
                                [use_07_metric])
    Top level function that does the PASCAL VOC evaluation.
    detpath: Path to detections result file
        detpath.format(classname) should produce the detection results file.
    annopath: Path to annotations file
        annopath.format(imagename) should be the xml annotations file.
    imagesetfile: Text file containing the list of images, one image per line.
    classname: Category name (duh)
    cachedir: Directory for caching the annotations
    [ovthresh]: Overlap threshold (default = 0.5)
    [use_07_metric]: Whether to use VOC07's 11 point AP computation
        (default False)
    """
    # assumes detections are in detpath.format(classname)
    # assumes annotations are in annopath.format(imagename)
    # assumes imagesetfile is a text file with each line an image name
    # cachedir caches the annotations in a pickle file

    # first load gt
    if not os.path.isdir(cachedir):
        os.mkdir(cachedir)
    cachefile = os.path.join(cachedir, '%s_annots.pkl' % imagesetfile)
    # read list of images
    with open(imagesetfile, 'r') as f:
        lines = f.readlines()
    imagenames = [x.strip() for x in lines]

    if not os.path.isfile(cachefile):
        # load annotations
        recs = {}
        for i, imagename in enumerate(imagenames):
            recs[imagename] = parse_rec(annopath.format(imagename))
            if i % 100 == 0:
                print('Reading annotation for {:d}/{:d}'.format(
                    i + 1, len(imagenames)))
        # save
        print('Saving cached annotations to {:s}'.format(cachefile))
        with open(cachefile, 'wb') as f:
            pickle.dump(recs, f)
    else:
        # load
        with open(cachefile, 'rb') as f:
            try:
                recs = pickle.load(f)
            except:
                recs = pickle.load(f, encoding='bytes')

    # extract gt objects for this class
    class_recs = {}
    npos = 0
    for imagename in imagenames:
        R = [obj for obj in recs[imagename] if obj['name'] == classname]
        bbox = np.array([x['bbox'] for x in R])
        difficult = np.array([x['difficult'] for x in R]).astype(np.bool)
        det = [False] * len(R)
        npos = npos + sum(~difficult)
        class_recs[imagename] = {'bbox': bbox,
                                 'difficult': difficult,
                                 'det': det}

    # read dets
    detfile = detpath.format(classname)
    with open(detfile, 'r') as f:
        lines = f.readlines()

    splitlines = [x.strip().split(' ') for x in lines]
    image_ids = [x[0] for x in splitlines]
    confidence = np.array([float(x[1]) for x in splitlines])
    BB = np.array([[float(z) for z in x[2:]] for x in splitlines])

    nd = len(image_ids)
    tp = np.zeros(nd)
    fp = np.zeros(nd)

    if BB.shape[0] > 0:
        # sort by confidence
        sorted_ind = np.argsort(-confidence)
        sorted_scores = np.sort(-confidence)
        BB = BB[sorted_ind, :]
        image_ids = [image_ids[x] for x in sorted_ind]

        # go down dets and mark TPs and FPs
        for d in range(nd):
            R = class_recs[image_ids[d]]
            bb = BB[d, :].astype(float)
            ovmax = -np.inf
            BBGT = R['bbox'].astype(float)

            if BBGT.size > 0:
                # compute overlaps
                # intersection
                ixmin = np.maximum(BBGT[:, 0], bb[0])
                iymin = np.maximum(BBGT[:, 1], bb[1])
                ixmax = np.minimum(BBGT[:, 2], bb[2])
                iymax = np.minimum(BBGT[:, 3], bb[3])
                iw = np.maximum(ixmax - ixmin + 1., 0.)
                ih = np.maximum(iymax - iymin + 1., 0.)
                inters = iw * ih

                # union
                uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) +
                       (BBGT[:, 2] - BBGT[:, 0] + 1.) *
                       (BBGT[:, 3] - BBGT[:, 1] + 1.) - inters)

                overlaps = inters / uni
                ovmax = np.max(overlaps)
                jmax = np.argmax(overlaps)

            if ovmax > ovthresh:
                if not R['difficult'][jmax]:
                    if not R['det'][jmax]:
                        tp[d] = 1.
                        R['det'][jmax] = 1
                    else:
                        fp[d] = 1.
            else:
                fp[d] = 1.

    # compute precision recall
    fp = np.cumsum(fp)
    tp = np.cumsum(tp)
    rec = tp / float(npos)
    # avoid divide by zero in case the first detection matches a difficult
    # ground truth
    prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
    ap = voc_ap(rec, prec, use_07_metric)

    return rec, prec, ap
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161

四，map计算方法

因为 mAP 值的计算是对数据集中所有类别的 AP 值求平均，所以我们要计算 mAP，首先得知道某一类别的 AP 值怎么求。不同数据集的某类别的 AP 计算方法大同小异，主要分为三种：

（1）在 VOC2007，只需要选取当Recall >= 0, 0.1, 0.2, …, 1共11个点时的Precision最大值，然后AP就是这11个Precision的平均值，map就是所有类别AP值的平均。VOC 数据集中计算 AP 的代码（用的是插值计算方法，代码出自py-faster-rcnn仓库）

（2）在 VOC2010 及以后，需要针对每一个不同的 Recall 值（包括0和1），选取其大于等于这些 Recall 值时的 Precision 最大值，然后计算PR曲线下面积作为 AP 值，map 就是所有类别 AP 值的平均。

（3）COCO 数据集，设定多个 IOU 阈值（0.5-0.95,0.05为步长），在每一个IOU阈值下都有某一类别的 AP 值，然后求不同 IOU 阈值下的 AP 平均，就是所求的最终的某类别的AP值。

相关阅读:
多态day02
java计算机毕业设计精准扶贫项目管理系统源码+mysql数据库+系统+lw文档+部署
怎么突破反爬虫机制
postgresql基于postgis常用空间函数
Week 7 Latent Variable Models and Expectation Maximization
[vue3] Element Plus 入门使用
【面向就业的Linux基础】从入门到熟练，探索Linux的秘密（二）
手写一个 Redis LruCache 缓存机制
远程访问Linux的DataEase数据可视化分析，有哪些推荐的工具？
Linux虚拟内存

原文地址：https://blog.csdn.net/weixin_42917352/article/details/126773350