• 医疗图像分割指标


    医疗图像其中两种图像格式:MRI(Magnetic Resonance Imaging,磁共振成像)、CT(Computed Tomography,计算机断层),常存成 .nii.gz 格式。都是 3D 的 H × W × L H \times W \times L H×W×L,但不是 RGB,而是类似 grey-scale 图只有一个 channel,如 CT 的值可能是 Hu 值,取值范围可从负几千到正几千,参考 [1]。segmentation label 也相应是 3D 的。
    med-planes
    这里记录几种分割指标,参考 [2,3],可调 medpy 的包,其文档也有指标列表。记有 C + 1 个类,0 固定是 background,(某个数据的)prediction 和 label Y ^ , Y ∈ { 0 , … , C } H × W × L \hat{Y},Y\in \{0, \dots, C\}^{H \times W \times L} Y^,Y{0,,C}H×W×L,只留下第 c 类的 binary prediction 和 label 为 B ^ c , B c ∈ { 0 , 1 } H × W × L \hat{B}^c,B^c \in \{0, 1\}^{H \times W \times L} B^c,Bc{0,1}H×W×L ∣ ⋅ ∣ |\cdot| 表示非零元素个数, B ^ c , B c \hat{B}^c,B^c B^c,Bc 的 surface / boundray 记为 S ^ c , S c \hat{S}^c, S^c S^c,Sc,是其表面 voxels 的三维索引向量(下标)的集合。

    Dice Coefficient

    即 F1-scoure[2]。第 c 类的 dice 系数: D C c = 2 ∣ B ^ c ∩ B c ∣ ∣ B ^ c ∣ + ∣ B c ∣ DC_c=\frac{2|\hat{B}^c \cap B^c|}{|\hat{B}^c| + |B^c|} DCc=B^c+Bc2∣B^cBc 调包:medpy.metric.binary.dc

    IoU

    即 Jaccard 系数[2]。第 c 类的 IoU: I o U c = ∣ B ^ c ∩ B c ∣ ∣ B ^ c ∪ B c ∣ IoU_c = \frac{|\hat{B}^c \cap B^c|}{|\hat{B}^c \cup B^c|} IoUc=B^cBcB^cBc 调包:medpy.metric.binary.jc。由 [3],IoU 与 dice 系数有确定的转换关系,即等价,所以两者应该用其中一种就行。C 类的 IoU 取平均就是 mIoU: m I o U = ∑ c = 0 C I o U c C + 1 mIoU=\frac{\sum_{c=0}^{C}IoU_c}{C + 1} mIoU=C+1c=0CIoUc

    • (2023.11.23)代码可选包不包含 background。
    • (2023.10.17)[6,7] 都是算上 background,此处跟它们。
    • (2023.9.28)不确定算 mIoU 时要不要加上 background。

    Frequency Weighted IoU

    [5] 用到,但其加权系数的好像写错了,应参照 [6,7] 的公式。而 [5] 本身的实现 与 [6-8] 一致,是对的。公式应为: f w I o U c = ∣ B c ∣ H W L I o U c f w I o U = ∑ c = 0 C f w I o U c

    fwIoUcfwIoU=|Bc|HWLIoUc=c=0CfwIoUc
    fwIoUcfwIoU=HWLBcIoUc=c=0CfwIoUc 注意此时只求和,不再除以 C + 1。另 [6] 有一份 tensorflow 版。

    Sensitivity

    即 recall,也叫 TPR(True Positive Rate),非对称。第 c 类的 sensitivity: R c = ∣ B ^ c ∩ B c ∣ ∣ B c ∣ R_c=\frac{|\hat{B}^c \cap B^c|}{|B^c|} Rc=BcB^cBc 调包:medpy.metric.binary.sensitivitymedpy.metric.binary.recall

    Specificity

    也叫 selectivity、TNR(True Negative Rate),是 sensitivity 的补。第 c 类的 specifity: S P c = ∣ ( 1 − B ^ c ) ∩ ( 1 − B c ) ∣ ∣ 1 − B c ∣ SP_c = \frac{|(1-\hat{B}^c) \cap (1-B^c)|}{|1-B^c|} SPc=∣1Bc(1B^c)(1Bc) 调包:medpy.metric.binary.specificity

    (Pixel) Accuracy

    [5] 用到 a c c = 1 ( Y ^ = Y ) H W L acc=\frac{1(\hat{Y} = Y)}{HWL} acc=HWL1(Y^=Y)

    Average (Symmetric) Surface Distance

    由 [4],第 c 类的 average surface distance: A S D ( S ^ c , S c ) = ∑ p ∈ S ^ c d s ( p , S c ) / ∣ S ^ c ∣ ≠ A S D ( S c , S ^ c ) ASD(\hat{S}^c, S^c) = \sum_{p \in \hat{S}^c} d_s(p,S^c) \big/ |\hat{S}^c| \ne ASD(S^c, \hat{S}^c) ASD(S^c,Sc)=pS^cds(p,Sc)/S^c=ASD(Sc,S^c) 其中 d s ( ⋅ , ⋅ ) d_s(\cdot,\cdot) ds(,) 是点到点集(此处是表面 S)距离: d s ( p , S ) = min ⁡ q ∈ S d ( p , q ) d_s(p,S)=\min_{q \in S} d(p,q) ds(p,S)=qSmind(p,q) p , q ∈ { 1 , … , W } × { 1 , … , H } × { 1 , … , L } p,q \in \{1,\dots,W\} \times \{1,\dots,H\} \times \{1,\dots,L\} p,q{1,,W}×{1,,H}×{1,,L} 是三维索引向量(下标), d ( ⋅ , ⋅ ) d(\cdot,\cdot) d(,) 是欧氏距离。就是对 S ^ \hat{S} S^ 中属于此 object 表面的每一个 voxel,都算一下它到 S 的距离,然后取平均。调包:medpy.metric.binary.asd

    ASD 不对称,一般会用 average symmetric surface distance: A S S D ( S ^ c , S c ) = A S D ( S ^ c , S c ) + A S D ( S c , S ^ c ) 2 = A S S D ( S c , S ^ c ) ASSD(\hat{S}^c, S^c) = \frac{ASD(\hat{S}^c, S^c) + ASD(S^c, \hat{S}^c)}{2} = ASSD(S^c, \hat{S}^c) ASSD(S^c,Sc)=2ASD(S^c,Sc)+ASD(Sc,S^c)=ASSD(Sc,S^c) 调包:medpy.metric.binary.assd。有些文章也会将 ASSD 简叫成 ASD,感觉可以无脑只用 ASSD。

    Hausdorff Distance

    由 [2,4],第 c 类的 Hausdorff 距离: H D ( S ^ c , S c ) = max ⁡ { sup ⁡ p ∈ S ^ c d s ( p , S c ) , sup ⁡ p ∈ S c d s ( p , S ^ c ) } HD(\hat{S}^c,S^c)=\max\left\{ \sup_{p \in \hat{S}^c} d_s(p, S^c), \sup_{p \in S^c} d_s(p, \hat{S}^c) \right\} HD(S^c,Sc)=max{pS^csupds(p,Sc),pScsupds(p,S^c)} 调包:medpy.metric.binary.hd

    Calibration

    从公式看,有些指标在遇见特殊情况(全零 prediction、全零 label、全一 label)可能会有除零问题,一例见下文 example: error from distance metrics 小节。现测试 medpy.metric.binary 各 API 对这些突殊情况作何处理,以便校正不合理之处。

    import numpy as np
    import medpy.metric.binary as mmb
    
    np.bool = np.bool_ # 新的 numpy 已无 np.bool
    
    p0 = np.array([0, 0, 0, 0]) # all 0
    p1 = np.array([1, 1, 1, 1]) # all 1
    p  = np.array([0, 1, 1, 0]) # both 0 & 1
    y0 = p0
    y1 = p1
    y  = 1 - p
    
    print("\tdice")
    print(mmb.dc(p0, y0), mmb.dc(p0, y), mmb.dc(p, y0))
    
    print("\tiou")
    try:
        print(mmb.jc(p0, y0))
    except ZeroDivisionError as e:
        print("iou(p0, y0):", e)
    print(mmb.jc(p0, y), mmb.jc(p, y0))
    
    print("\tsensitivity")
    print(mmb.sensitivity(p0, y0), mmb.sensitivity(p0, y), mmb.sensitivity(p, y0))
    
    print("\tspecificity")
    print(mmb.specificity(p1, y1), mmb.specificity(p1, y), mmb.specificity(p, y1))
    
    # ASD 不对称!此处只依上文公式调包,即 pred 在前、label 在后
    print("\tasd")
    try:
        print(mmb.asd(p0, y0))
    except RuntimeError as e:
        print("asd(p0, y0):", e)
    try:
        print(mmb.asd(p0, y))
    except RuntimeError as e:
        print("asd(p0, y):", e)
    try:
        print(mmb.asd(p, y0))
    except RuntimeError as e:
        print("asd(p, y0):", e)
    
    print("\tassd")
    try:
        print(mmb.assd(p0, y0))
    except RuntimeError as e:
        print("assd(p0, y0):", e)
    try:
        print(mmb.assd(p0, y))
    except RuntimeError as e:
        print("assd(p0, y):", e)
    try:
        print(mmb.assd(p, y0))
    except RuntimeError as e:
        print("assd(p, y0):", e)
    
    print("\thd")
    try:
        print(mmb.hd(p0, y0))
    except RuntimeError as e:
        print("hd(p0, y0):", e)
    try:
        print(mmb.hd(p0, y))
    except RuntimeError as e:
        print("hd(p0, y):", e)
    try:
        print(mmb.hd(p, y0))
    except RuntimeError as e:
        print("hd(p, y0):", e)
    
    print("\thd95")
    try:
        print(mmb.hd95(p0, y0))
    except RuntimeError as e:
        print("hd95(p0, y0):", e)
    try:
        print(mmb.hd95(p0, y))
    except RuntimeError as e:
        print("hd95(p0, y):", e)
    try:
        print(mmb.hd95(p, y0))
    except RuntimeError as e:
        print("hd95(p, y0):", e)
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41
    • 42
    • 43
    • 44
    • 45
    • 46
    • 47
    • 48
    • 49
    • 50
    • 51
    • 52
    • 53
    • 54
    • 55
    • 56
    • 57
    • 58
    • 59
    • 60
    • 61
    • 62
    • 63
    • 64
    • 65
    • 66
    • 67
    • 68
    • 69
    • 70
    • 71
    • 72
    • 73
    • 74
    • 75
    • 76
    • 77
    • 78
    • 79
    • 80
    • 81
    • 82
    • 83
    • 84

    输出:

            dice
    0.0 0.0 0.0
            iou
    iou(p0, y0): float division by zero
    0.0 0.0
            sensitivity
    0.0 0.0 0.0
            specificity
    0.0 0.0 0.0
            asd
    asd(p0, y0): The first supplied array does not contain any binary object.
    asd(p0, y): The first supplied array does not contain any binary object.
    asd(p, y0): The second supplied array does not contain any binary object.
            assd
    assd(p0, y0): The first supplied array does not contain any binary object.
    assd(p0, y): The first supplied array does not contain any binary object.
    assd(p, y0): The second supplied array does not contain any binary object.
            hd
    hd(p0, y0): The first supplied array does not contain any binary object.
    hd(p0, y): The first supplied array does not contain any binary object.
    hd(p, y0): The second supplied array does not contain any binary object.
            hd95
    hd95(p0, y0): The first supplied array does not contain any binary object.
    hd95(p0, y): The first supplied array does not contain any binary object.
    hd95(p, y0): The second supplied array does not contain any binary object.
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25

    可见 medpy.metric.binary 的处理分两类:

    • 置零:dice、sensitivity、specificity
    • 报错:iou、asd、assd、hd、hd95

    分类讨论:

    • 当 prediction 与 label 一致时,即使是全零或全一,也应该视为预测正确,赋最优值,即 dice、iou、sensitivity、specificity 为 1,asd、assd、hd、hd95 为 0,这点 medpy.metric.binary 没处理好;
    • sensitivity 当 label 为空、prediction 非空时,应当为 0,medpy.metric.binary 的处理是对的;
    • asd、assd、hd、hd95 四个距离指标当 prediction 和 label 一方为空、一方非空时,没有标准解法,可能这是 medpy.metric.binary 报错、MONAI 只抛 warning 不处理 的原因。本文选择置 NaN,求平均时忽略不计,但同时记录 NaN 的个数。

    example: error from distance metrics

    • monai 1.3.0

    算 assd 时,如果 y_true 为空,会报错:

    Traceback (most recent call last):
      File "/home/tom/codes/monai-tutorials/2d_segmentation/torch/mmwhs_train.py", line 268, in 
        main(tempdir)
      File "/home/tom/codes/monai-tutorials/2d_segmentation/torch/mmwhs_train.py", line 246, in main
        _res = evaluate(val_outputs, val_labels, 4+1)
      File "/home/tom/codes/monai-tutorials/2d_segmentation/torch/mmwhs_train.py", line 66, in my_evaluate
        res[metr].append(fn(B_pred_c, B_c))
      File "/share/tom/miniconda3/envs/cu116_pt1131/lib/python3.9/site-packages/medpy/metric/binary.py", line 453, in assd
        assd = numpy.mean( (asd(result, reference, voxelspacing, connectivity), asd(reference, result, voxelspacing, connectivity)) )
      File "/share/tom/miniconda3/envs/cu116_pt1131/lib/python3.9/site-packages/medpy/metric/binary.py", line 561, in asd
        sds = __surface_distances(result, reference, voxelspacing, connectivity)
      File "/share/tom/miniconda3/envs/cu116_pt1131/lib/python3.9/site-packages/medpy/metric/binary.py", line 1215, in __surface_distances
        raise RuntimeError('The second supplied array does not contain any binary object.')
    RuntimeError: The second supplied array does not contain any binary object.
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14

    其它 distance metrics(asd、hd、hd95)只要 prediction 和 label 其中一方为空也都会遇到。好像没有标准解法:

    • [11] 中 eval_UDA.py 是报错就将 assd 置 1;
    • MONAI[12] 只抛 warning 不处理;
    • [13] 在 prediction、ground-truth 皆空时置零,只其中一方空时置 128(而 128 恰好是 [13] 中设定的 crop patch size,不知与此有无关系);
    • [14] 有置零、373.128664、NaN(即忽略) 的方案。

    Code

    • (2024.1.18)重构代码,写成类以便调用,后文对拍测试亦相应更新。
    • (2023.11.23)改了代码,现跟 MONAI[12] 在 2D 数据对拍,(基本)一致。
      distance metrics 的问题仍未解决,似乎没有标准做法,相关讨论见 [14-16]。现在 Y_predY_true 皆空时,学 [13] 置零;而只其中一方为空时,[13] 置 128、[14] 有置 0373.128664、NaN 的,不知道什么鬼,此处选置 NaN,即忽略。
      MONAI 的 distance metrics 默认忽略 background,此处强制忽略;而其它 metrics 就可选忽不忽略。
    • (2023.11.21)asd、assd、hd、hd95 等 distance metrics 有潜在报错,见后文,未完全修复
    • 分割模型应该一般 predict 的都是 (C+1)-way softmax prediction,即 Y ^ \hat{Y} Y^
    import packaging.version
    import numpy as np
    import medpy.metric.binary as mmb
    
    # https://stackoverflow.com/questions/74893742/how-to-solve-attributeerror-module-numpy-has-no-attribute-bool
    if packaging.version.parse(np.__version__) >= packaging.version.parse('1.24'):
        if hasattr(np, 'bool_'):
            np.bool = np.bool_
        else:
            np.bool = bool
    
    class SegEvaluator:
        """segmentation evaluation based on medpy.metric.binary
        It will calibrate some results from medpy.metric.binary (see __call__).
        It will also record the number of NaN in distance-based metrics that are
        caused by empty prediction or label.
        Usage:
            ```python
            # 2D image evaluation
            evaluator = SegEvaluator(N_CLASSES, IGNORE_BACKGROUND)
            for images, labels in loader:
                predictions = model(images)         # [bs, c, h, w]
                predictions = predictions.argmax(1) # -> [bs, h, w]
                for pred, lab in zip(predictions, labels):
                    evaluator(pred, lab) # eval each image
            print(evaluator.reduce())
            evaluator.reset()
            ```
        """
    
        METRICS = {
            "dice": mmb.dc,
            "iou": mmb.jc,
            "accuracy": lambda _B1, _B2: (_B1 == _B2).sum() / _B1.size,
            "sensitivity": mmb.sensitivity,
            "specificity": mmb.specificity,
            "hd": mmb.hd,
            "assd": mmb.assd,
            "hd95": mmb.hd95,
            "asd": mmb.asd
        }
    
        def __init__(self, n_classes, ignore_bg=False, select=[]):
            """
            Input:
                n_classes: int, assuming 0 is background
                ignore_bg: bool, ignore background or not in reduction
                select: List[str], list of name of metrics of interest,
                    i.e. will only evaluate on these selected metrics if provided.
                    Should be a subset of supported metrics (see METRICS).
            """
            self.n_classes = n_classes
            self.ignore_bg = ignore_bg
    
            self.metrics = {}
            _no_select = len(select) == 0
            for m, f in SegEvaluator.METRICS.items():
                if _no_select or m in select:
                    self.metrics[m] = f
    
            self.reset()
    
        def reset(self):
            # records:
            #  - records[metr][c][i] =  score of i-th datum on c-th class, or
            #  - records[metr][c] = # of NaN caused by empty pred/label
            self.records = {}
            for metr in self.metrics:
                # self.records[metr] = [[]] * self.n_classes # wrong
                self.records[metr] = [[] for _ in range(self.n_classes)]
            for metr in ("hd", "assd", "hd95", "asd"):
                if metr in self.metrics:
                    self.records[f"empty_gt_{metr}"] = [0] * self.n_classes
                    self.records[f"empty_pred_{metr}"] = [0] * self.n_classes
    
        def __call__(self, Y_pred, Y_true):
            """evaluates 1 prediction
            Input:
                Y_pred: numpy.ndarray, typically [H, W] or [H, W, L], predicted class ID
                Y_true: same as `Y_pred`, label, ground-truth class ID
            """
            for c in range(self.n_classes):
                B_pred_c = (Y_pred == c).astype(np.int64)
                B_c      = (Y_true == c).astype(np.int64)
                pred_l0, pred_inv_l0, gt_l0, gt_inv_l0 = B_pred_c.sum(), (1 - B_pred_c).sum(), B_c.sum(), (1 - B_c).sum()
                for metr, fn in self.metrics.items():
                    is_distance_metr = metr in ("hd", "assd", "hd95", "asd")
                    if 0 == c and (self.ignore_bg or is_distance_metr):
                        # always ignore bg for distance metrics
                        a = np.nan
                    elif 0 == gt_l0 and 0 == pred_l0 and metr in ("dice", "iou", "sensitivity"):
                        a = 1
                    elif 0 == gt_inv_l0 and 0 == pred_inv_l0 and "specificity" == metr:
                        a = 1
                    elif is_distance_metr and pred_l0 * gt_l0 == 0: # at least one party is all 0
                        if 0 == pred_l0 and 0 == gt_l0: # both are all 0
                            # nips23a&d, xmed-lab/GenericSSL
                            a = 0
                        else: # only one party is all 0
                            a = np.nan
                            if 0 == pred_l0:
                                self.records[f"empty_pred_{metr}"][c] += 1
                            else: # 0 == gt_l0
                                self.records[f"empty_gt_{metr}"][c] += 1
                    else: # normal cases or that medpy can solve well
                        # try:
                        a = fn(B_pred_c, B_c)
                        # except:
                        #     a = np.nan
    
                    self.records[metr][c].append(a)
    
        def reduce(self, prec=4):
            """calculate class-wise & overall average
            Input:
                prec: int, decimal precision
            Output:
                res: dict
                    - res[]: float, overall average
                    - res[_clswise]: List[float], class-wise average of each class
                    - res[empty_pred|gt_]: int, overall #NaN caused by empty pred/label
                    - res[empty_pred|gt__clswise]: List[int], class-wise #NaN
            """
            res = {}
            for metr in self.records:
                if metr.startswith("empty_"):
                    res[metr+"_clswise"] = self.records[metr]
                    res[metr] = int(np.sum(self.records[metr]))
                else:
                    CxN = np.asarray(self.records[metr], dtype=np.float32)
                    nans = np.isnan(CxN)
                    CxN[nans] = 0
                    not_nans = ~nans
    
                    # class-wise average
                    cls_n = not_nans.sum(1) # [c]
                    # cls_avg = np.where(cls_n > 0, CxN.sum(1) / cls_n, 0)
                    _cls_n_denom = cls_n.copy()
                    _cls_n_denom[0 == _cls_n_denom] = 1 # to avoid warning though not necessary
                    cls_avg = np.where(cls_n > 0, CxN.sum(1) / _cls_n_denom, 0)
                    res[f"{metr}_clswise"] = np.round(cls_avg, prec).tolist()
    
                    # overall average
                    ins_cls_n = not_nans.sum(0) # [n]
                    # ins_avg = np.where(ins_cls_n > 0, CxN.sum(0) / ins_cls_n, 0)
                    _ins_cls_n_denom = ins_cls_n.copy()
                    _ins_cls_n_denom[0 == _ins_cls_n_denom] = 1 # to avoid warning though not necessary
                    ins_avg = np.where(ins_cls_n > 0, CxN.sum(0) / _ins_cls_n_denom, 0)
                    ins_n = (ins_cls_n > 0).sum()
                    avg = ins_avg.sum() / ins_n if ins_n > 0 else 0
                    res[metr] = float(np.round(avg, prec))
    
            return res
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41
    • 42
    • 43
    • 44
    • 45
    • 46
    • 47
    • 48
    • 49
    • 50
    • 51
    • 52
    • 53
    • 54
    • 55
    • 56
    • 57
    • 58
    • 59
    • 60
    • 61
    • 62
    • 63
    • 64
    • 65
    • 66
    • 67
    • 68
    • 69
    • 70
    • 71
    • 72
    • 73
    • 74
    • 75
    • 76
    • 77
    • 78
    • 79
    • 80
    • 81
    • 82
    • 83
    • 84
    • 85
    • 86
    • 87
    • 88
    • 89
    • 90
    • 91
    • 92
    • 93
    • 94
    • 95
    • 96
    • 97
    • 98
    • 99
    • 100
    • 101
    • 102
    • 103
    • 104
    • 105
    • 106
    • 107
    • 108
    • 109
    • 110
    • 111
    • 112
    • 113
    • 114
    • 115
    • 116
    • 117
    • 118
    • 119
    • 120
    • 121
    • 122
    • 123
    • 124
    • 125
    • 126
    • 127
    • 128
    • 129
    • 130
    • 131
    • 132
    • 133
    • 134
    • 135
    • 136
    • 137
    • 138
    • 139
    • 140
    • 141
    • 142
    • 143
    • 144
    • 145
    • 146
    • 147
    • 148
    • 149
    • 150
    • 151
    • 152
    • 153

    compare to MONAI

    • MedPy 0.4.0
    • monai 1.3.0

    2D

    • 加背景共 5 类
    import os, os.path as osp
    import numpy as np
    import medpy.metric.binary as mmb
    from collections import defaultdict
    import torch
    from monai.networks.utils import one_hot
    from monai.metrics import DiceMetric, MeanIoU, GeneralizedDiceScore, ConfusionMatrixMetric, HausdorffDistanceMetric, SurfaceDistanceMetric
    
    
    class SegEvaluator:
        """同上文"""
        pass
    
    
    print("自制 pred、label,保证非空以对拍 asd、assd、hd、hd95")
    nc = 4+1
    gen_data_size = [100, 40*(nc-1)]
    gen_data_list = []
    for i in range(30):
        pred = np.zeros(gen_data_size, dtype=np.uint8)
        label = pred.copy()
        for ic in range(1, nc):
            fill_v = np.concatenate([np.zeros([70*40], dtype=np.uint8), ic * np.ones([30*40], dtype=np.uint8)])
            np.random.shuffle(fill_v)
            pred[:, (ic-1)*40: ic*40] = fill_v.reshape(100, 40)
            if i > 2: # let the 1st 3 be perfect predictions
                np.random.shuffle(fill_v)
            label[:, (ic-1)*40: ic*40] = fill_v.reshape(100, 40)
        gen_data_list.append((pred, label))
    print(len(gen_data_list)) # 30
    
    
    print("MONAI overall")
    monai_metrics = {
        "dice": DiceMetric(include_background=True, reduction="mean", get_not_nans=False, ignore_empty=True, num_classes=4+1),
        "iou": MeanIoU(include_background=True, reduction="mean", get_not_nans=False, ignore_empty=True),
        "sensitivity": ConfusionMatrixMetric(include_background=True, metric_name='sensitivity', reduction="mean", compute_sample=True, get_not_nans=False),
        "specificity": ConfusionMatrixMetric(include_background=True, metric_name='specificity', reduction="mean", compute_sample=True, get_not_nans=False),
        "accuracy": ConfusionMatrixMetric(include_background=True, metric_name='accuracy', reduction="mean", get_not_nans=False),
        # 下面 distance metrics 忽略 background
        "hd": HausdorffDistanceMetric(include_background=False, reduction="mean", get_not_nans=False),
        "hd95": HausdorffDistanceMetric(include_background=False, percentile=95, reduction="mean", get_not_nans=False),
        "assd": SurfaceDistanceMetric(include_background=False, symmetric=True, reduction="mean", get_not_nans=False),
        "asd": SurfaceDistanceMetric(include_background=False, symmetric=False, reduction="mean", get_not_nans=False),
    }
    
    print("MONAI class-wise")
    monai_metrics_cls = {
        "dice": DiceMetric(include_background=True, reduction="mean_batch", get_not_nans=False, ignore_empty=True, num_classes=4+1),
        "iou": MeanIoU(include_background=True, reduction="mean_batch", get_not_nans=False, ignore_empty=True),
        "sensitivity": ConfusionMatrixMetric(include_background=True, metric_name='sensitivity', reduction="mean_batch", compute_sample=True, get_not_nans=False),
        "specificity": ConfusionMatrixMetric(include_background=True, metric_name='specificity', reduction="mean_batch", compute_sample=True, get_not_nans=False),
        "accuracy": ConfusionMatrixMetric(include_background=True, metric_name='accuracy', reduction="mean_batch", get_not_nans=False),
        # 下面 distance metrics 忽略 background
        "hd": HausdorffDistanceMetric(include_background=False, reduction="mean_batch", get_not_nans=False),
        "hd95": HausdorffDistanceMetric(include_background=False, percentile=95, reduction="mean_batch", get_not_nans=False),
        "assd": SurfaceDistanceMetric(include_background=False, symmetric=True, reduction="mean_batch", get_not_nans=False),
        "asd": SurfaceDistanceMetric(include_background=False, symmetric=False, reduction="mean_batch", get_not_nans=False),
    }
    
    print("medpy")
    seg_eval = SegEvaluator(4+1, False)
    
    
    print("2D seg,一张张测")
    for pred, label in gen_data_list:
        pred = pred.astype(np.int32)
        label = label.astype(np.int32)
        # print(pred.shape, pred.dtype, np.unique(pred), label.shape, label.dtype, np.unique(label))
    
        seg_eval(pred, label)
    
        # MONAI 要求 [B, C, H, W],且有些指标要求 one-hot
        pred = one_hot(torch.from_numpy(pred).unsqueeze(0).unsqueeze(0), num_classes=4+1, dim=1)
        label = one_hot(torch.from_numpy(label).unsqueeze(0).unsqueeze(0), num_classes=4+1, dim=1)
        for _m in monai_metrics:
            monai_metrics[_m](y_pred=pred, y=label)
            monai_metrics_cls[_m](y_pred=pred, y=label)
    
    
    print("monai")
    for _m in monai_metrics:
        print('\t', _m)
        _agg = monai_metrics[_m].aggregate()#.item()
        _agg_cls = monai_metrics_cls[_m].aggregate()
        # print(type(_agg_cls))
        if isinstance(_agg, list):
            assert len(_agg) == 1
            _agg = _agg[0]
        if isinstance(_agg_cls, list):
            assert len(_agg_cls) == 1
            _agg_cls = _agg_cls[0]
        _agg = round(_agg.item(), 6)
        _agg_cls = [round(_x, 6) for _x in _agg_cls.tolist()]
        # monai_metrics[_m].reset()
        print("class-wise:", _agg_cls)
        print("overall:", _agg)
    
    
    print("medpy")
    medpy_res = {k: v for k, v in seg_eval.reduce(6).items()
    	if not k.startswith("empty_")} # 虑掉那些 NaN 记数
    for metr in medpy_res:
        if not metr.endswith("_clswise"):
            print('\t', metr)
            print("class-wise:", medpy_res[metr+"_clswise"])
            print("overall:", medpy_res[metr])
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41
    • 42
    • 43
    • 44
    • 45
    • 46
    • 47
    • 48
    • 49
    • 50
    • 51
    • 52
    • 53
    • 54
    • 55
    • 56
    • 57
    • 58
    • 59
    • 60
    • 61
    • 62
    • 63
    • 64
    • 65
    • 66
    • 67
    • 68
    • 69
    • 70
    • 71
    • 72
    • 73
    • 74
    • 75
    • 76
    • 77
    • 78
    • 79
    • 80
    • 81
    • 82
    • 83
    • 84
    • 85
    • 86
    • 87
    • 88
    • 89
    • 90
    • 91
    • 92
    • 93
    • 94
    • 95
    • 96
    • 97
    • 98
    • 99
    • 100
    • 101
    • 102
    • 103
    • 104
    • 105
    • 106
    • 107

    输出

    monai
    	 dice
    class-wise: [0.819524, 0.275542, 0.280583, 0.279583, 0.276667]
    overall: 0.38638
    	 iou
    class-wise: [0.699342, 0.197282, 0.200425, 0.199778, 0.197985]
    overall: 0.298962
    	 sensitivity
    class-wise: [0.819524, 0.275542, 0.280583, 0.279583, 0.276667]
    overall: 0.38638
    	 specificity
    class-wise: [0.278094, 0.961871, 0.962136, 0.962083, 0.96193]
    overall: 0.825223
    	 accuracy
    class-wise: [0.711238, 0.927554, 0.928058, 0.927958, 0.927667]
    overall: 0.884495
    	 hd
    class-wise: [3.283388, 3.363868, 3.308639, 3.316445]
    overall: 3.318085
    	 hd95
    class-wise: [2.012461, 2.004592, 2.012461, 2.012461]
    overall: 2.010494
    	 assd
    class-wise: [0.945859, 0.939, 0.945332, 0.943413]
    overall: 0.943401
    	 asd
    class-wise: [0.943933, 0.938989, 0.944227, 0.941334]
    overall: 0.942121
    
    
    medpy
    	 dice
    class-wise: [0.819523, 0.275542, 0.280583, 0.279583, 0.276667]
    overall: 0.38638
    	 iou
    class-wise: [0.699342, 0.197282, 0.200425, 0.199778, 0.197985]
    overall: 0.298962
    	 sensitivity
    class-wise: [0.819523, 0.275542, 0.280583, 0.279583, 0.276667]
    overall: 0.38638
    	 specificity
    class-wise: [0.278094, 0.961871, 0.962136, 0.962083, 0.96193]
    overall: 0.825223
    	 accuracy
    class-wise: [0.711238, 0.927554, 0.928058, 0.927958, 0.927667]
    overall: 0.884495
    	 hd
    class-wise: [0.0, 3.283388, 3.363868, 3.308639, 3.316445]
    overall: 3.318085
    	 hd95
    class-wise: [0.0, 2.004592, 2.004592, 2.005379, 2.004592]
    overall: 2.004789
    	 assd
    class-wise: [0.0, 0.945857, 0.939, 0.945332, 0.94341]
    overall: 0.9434
    	 asd
    class-wise: [0.0, 0.943933, 0.938989, 0.944227, 0.941334]
    overall: 0.942121
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41
    • 42
    • 43
    • 44
    • 45
    • 46
    • 47
    • 48
    • 49
    • 50
    • 51
    • 52
    • 53
    • 54
    • 55
    • 56
    • 57
    • 58

    结论:hd95 同,其它一致。对比 medpy 的实现MONAI 的实现,是因为它们取 95% 分位数的方式不同:

    • medpy 是取两个方向的所有 voxel 的 surface distance 的 95% 分位数;
    • MONAI 是两个方向的 surface diatance 各取 95%分位数,再取两者最大值。

    3D

    • 代码接上节
    print("2D 辑成 3D")
    gen_data_3d_list = []
    for i in range(0, len(gen_data_list), 10):
        pred3d = np.concatenate([x[0][:, :, np.newaxis] for x in gen_data_list[i: i+10]], axis=2)
        label3d = np.concatenate([x[1][:, :, np.newaxis] for x in gen_data_list[i: i+10]], axis=2)
        print(pred3d.shape, label3d.shape) # (100, 160, 10) (100, 160, 10)
        gen_data_3d_list.append((pred3d, label3d))
    print(len(gen_data_3d_list)) # 3
    
    
    print("重置 metrics 记录")
    seg_eval.reset()
    for _m in monai_metrics:
        monai_metrics[_m].reset()
        monai_metrics_cls[_m].reset()
    
    
    print("3D,逐 volume 测")
    for pred, label in gen_data_3d_list:
        pred = pred.astype(np.int32)
        label = label.astype(np.int32)
        # print(pred.shape, pred.dtype, np.unique(pred), label.shape, label.dtype, np.unique(label))
    
        seg_eval(pred, label)
    
        pred = one_hot(torch.from_numpy(pred).unsqueeze(0).unsqueeze(0), num_classes=4+1, dim=1)
        label = one_hot(torch.from_numpy(label).unsqueeze(0).unsqueeze(0), num_classes=4+1, dim=1)
        for _m in monai_metrics:
            monai_metrics[_m](y_pred=pred, y=label)
            monai_metrics_cls[_m](y_pred=pred, y=label)
    
    
    print("monai") # 同前面 2D
    for _m in monai_metrics:
        print('\t', _m)
        _agg = monai_metrics[_m].aggregate()#.item()
        _agg_cls = monai_metrics_cls[_m].aggregate()
        # print(type(_agg_cls))
        if isinstance(_agg, list):
            assert len(_agg) == 1
            _agg = _agg[0]
        if isinstance(_agg_cls, list):
            assert len(_agg_cls) == 1
            _agg_cls = _agg_cls[0]
    
        _agg = round(_agg.item(), 6)
        _agg_cls = [round(_x, 6) for _x in _agg_cls.tolist()]
        # monai_metrics[_m].reset()
        print("class-wise:", _agg_cls)
        print("overall:", _agg)
    
    
    print("medpy")
    medpy_res = {k: v for k, v in seg_eval.reduce(6).items()
    	if not k.startswith("empty_")} # 虑掉那些 NaN 记数
    for metr in medpy_res:
        if not metr.endswith("_clswise"):
            print('\t', metr)
            print("class-wise:", medpy_res[metr+"_clswise"])
            print("overall:", medpy_res[metr])
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41
    • 42
    • 43
    • 44
    • 45
    • 46
    • 47
    • 48
    • 49
    • 50
    • 51
    • 52
    • 53
    • 54
    • 55
    • 56
    • 57
    • 58
    • 59
    • 60

    输出

    monai
    	 dice
    class-wise: [0.819523, 0.275542, 0.280583, 0.279583, 0.276667]
    overall: 0.38638
    	 iou
    class-wise: [0.695224, 0.165041, 0.168134, 0.168165, 0.165894]
    overall: 0.272492
    	 sensitivity
    class-wise: [0.819523, 0.275542, 0.280583, 0.279583, 0.276667]
    overall: 0.38638
    	 specificity
    class-wise: [0.278094, 0.961871, 0.962136, 0.962083, 0.96193]
    overall: 0.825223
    	 accuracy
    class-wise: [0.711237, 0.927554, 0.928058, 0.927958, 0.927667]
    overall: 0.884495
    	 hd
    class-wise: [2.307209, 2.44949, 2.575802, 2.44949]
    overall: 2.445498
    	 hd95
    class-wise: [1.414214, 1.414214, 1.414214, 1.414214]
    overall: 1.414214
    	 assd
    class-wise: [0.815341, 0.80984, 0.813648, 0.815224]
    overall: 0.813514
    	 asd
    class-wise: [0.812973, 0.809779, 0.813435, 0.814375]
    overall: 0.812641
    
    
    medpy
    	 dice
    class-wise: [0.819523, 0.275542, 0.280583, 0.279583, 0.276667]
    overall: 0.38638
    	 iou
    class-wise: [0.695224, 0.165041, 0.168134, 0.168165, 0.165894]
    overall: 0.272492
    	 sensitivity
    class-wise: [0.819523, 0.275542, 0.280583, 0.279583, 0.276667]
    overall: 0.38638
    	 specificity
    class-wise: [0.278094, 0.961871, 0.962136, 0.962083, 0.96193]
    overall: 0.825223
    	 accuracy
    class-wise: [0.711238, 0.927554, 0.928058, 0.927958, 0.927667]
    overall: 0.884495
    	 hd
    class-wise: [0.0, 2.307209, 2.44949, 2.575802, 2.44949]
    overall: 2.445498
    	 hd95
    class-wise: [0.0, 1.414213, 1.414213, 1.414213, 1.414213]
    overall: 1.414214
    	 assd
    class-wise: [0.0, 0.815341, 0.80984, 0.813648, 0.815224]
    overall: 0.813514
    	 asd
    class-wise: [0.0, 0.812973, 0.809779, 0.813435, 0.814375]
    overall: 0.812641
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41
    • 42
    • 43
    • 44
    • 45
    • 46
    • 47
    • 48
    • 49
    • 50
    • 51
    • 52
    • 53
    • 54
    • 55
    • 56
    • 57
    • 58

    结论:

    • 跟 MONAI 一致,连 hd95 都一致;
    • 同样的数据,辑成 3D 测跟 2D 式逐 slice 测结果同(符合直觉,因为求平均的方式不同)。

    References

    1. 亨氏单位
    2. 常用的医学图像分割评价指标
    3. 医学图像分割常用指标及代码(pytorch)
    4. (MIA 2022) Rethinking adversarial domain adaptation: Orthogonal decomposition for unsupervised domain adaptation in medical image segmentation - paper, github
    5. (ICML 2018) CyCADA: Cycle-Consistent Adversarial Domain Adaptation - paper, code
    6. 评价指标
    7. 语义分割之评价指标
    8. jfzhang95/pytorch-deeplab-xception
    9. xxxzhi/Freq weight IoU
    10. (arXiv 21) Weighted Intersection over Union (wIoU): A New Evaluation Metric for Image Segmentation - paper
    11. TFboys-lzz/MPSCL
    12. Project-MONAI/MONAI
    13. xmed-lab/GenericSSL
    14. Evaluation metric #39
    15. test_model指标计算 #5
    16. 性能咨询 #4
    17. python生成列表坑
  • 相关阅读:
    LabelImg使用笔记
    PyTorch官方文档学习笔记(备忘)
    跳表论文解读
    [ATC复盘] abc329 20231118
    IPO观察丨“闷头做手机”的龙旗科技,如何拓宽价值边界?
    【大数据 - Doris 实践】数据表的基本使用(一):基本概念、创建表
    Gradle系列——Gradle插件(基于Gradle文档7.5)day3-2
    【Linux中的自旋锁、信号量、互斥体】通俗的解释
    mysql 忘记密码后重置
    时间显示(蓝桥杯)
  • 原文地址:https://blog.csdn.net/HackerTom/article/details/133382705