TP TN FP FN

The following are the four basic terms you need to know.

  • True Positives (TP): the actual value is Positive and the prediction is also Positive.
  • True Negatives (TN): the actual value is Negative and the prediction is also Negative.
  • False Positives (FP): the actual value is Negative but the prediction is Positive. Also known as a Type I error.
  • False Negatives (FN): the actual value is Positive but the prediction is Negative. Also known as a Type II error.
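As a quick illustration, here is a minimal numpy sketch (the binary labels are made up for this example) that counts all four quantities:

```python
import numpy as np

# Hypothetical binary labels: 1 = Positive, 0 = Negative.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 1, 0, 1, 0, 0, 0, 1])

TP = np.sum((y_true == 1) & (y_pred == 1))  # actual Positive, predicted Positive
TN = np.sum((y_true == 0) & (y_pred == 0))  # actual Negative, predicted Negative
FP = np.sum((y_true == 0) & (y_pred == 1))  # actual Negative, predicted Positive (Type I)
FN = np.sum((y_true == 1) & (y_pred == 0))  # actual Positive, predicted Negative (Type II)

print(TP, TN, FP, FN)  # 2 2 2 2
```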

OA, Overall Accuracy

Computes the overall classification accuracy.

$$\mathrm{OA} = \frac{\text{Number of correctly predicted samples}}{\text{Total number of samples}} = \frac{\sum_c TP_c}{\sum_c (TP_c + FN_c)}$$

Precision

$$\mathrm{Precision}_c = \frac{\text{Number of samples correctly predicted as class } c}{\text{Total number of samples predicted as class } c} = \frac{TP_c}{TP_c + FP_c}$$

Accuracy, Per-class Accuracy

Computes the classification accuracy of each individual class.

$$\mathrm{Accuracy}_c = \frac{\text{Number of correctly predicted samples in class } c}{\text{Total number of samples in class } c} = \frac{TP_c}{TP_c + FN_c}$$

Here, what counts as a sample depends on the task: in image classification one image is a sample, while in image segmentation one pixel is a sample.

Recall

Per-class Accuracy can also be called Recall; the two are computed by the same formula.

$$\mathrm{Recall}_c = \frac{\text{Number of correctly predicted samples in class } c}{\text{Total number of samples in class } c} = \frac{TP_c}{TP_c + FN_c}$$

F1 Score

$$\begin{gather*}
\frac{1}{F1_c} = \frac{1}{2}\left(\frac{1}{\mathrm{Precision}_c}+\frac{1}{\mathrm{Recall}_c}\right)\\
\Rightarrow F1_c = 2 \times \frac{\mathrm{Precision}_c \times \mathrm{Recall}_c}{\mathrm{Precision}_c + \mathrm{Recall}_c} = \frac{2TP_c}{2TP_c + FP_c + FN_c}
\end{gather*}$$
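To make the formulas concrete, here is a minimal sketch with hypothetical counts (TP=8, FP=2, FN=4) that computes Precision, Recall, and F1 and checks the closed form:

```python
# Hypothetical per-class counts, for illustration only.
TP, FP, FN = 8, 2, 4

precision = TP / (TP + FP)  # 8/10 = 0.8
recall = TP / (TP + FN)     # 8/12 ≈ 0.667
f1 = 2 * precision * recall / (precision + recall)

# The closed form 2*TP / (2*TP + FP + FN) gives the same value.
assert abs(f1 - 2 * TP / (2 * TP + FP + FN)) < 1e-12
print(precision, recall, f1)  # 0.8 0.666... 0.727...
```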

Dice, Dice similarity coefficient, DSC

The Sørensen-Dice index, also known as the Dice similarity coefficient (DSC), is defined as:

$$\mathrm{Dice}_c = \frac{2TP_c}{2TP_c + FP_c + FN_c}$$

This is exactly the same as the F1 score formula.

IoU, Intersection over Union

$$\mathrm{IoU}_c = \frac{TP_c}{TP_c + FP_c + FN_c}$$
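Since Dice (= F1) and IoU are built from the same counts, they are related by $\mathrm{Dice} = \frac{2\,\mathrm{IoU}}{1+\mathrm{IoU}}$; a quick numeric check with the same hypothetical counts as above:

```python
# Hypothetical counts, for illustration only.
TP, FP, FN = 8, 2, 4

dice = 2 * TP / (2 * TP + FP + FN)  # 16/22 ≈ 0.727
iou = TP / (TP + FP + FN)           # 8/14 ≈ 0.571
assert abs(dice - 2 * iou / (1 + iou)) < 1e-12
```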

Kappa Coefficient

$$\mathrm{Kappa} = \frac{p_o-p_e}{1-p_e}$$

$p_o = \mathrm{OA}$, and $p_e$ is computed as follows:

$$p_e = \sum_{\text{class}} \frac{\text{actual pixels of the class}}{\text{total pixels}}\times\frac{\text{predicted pixels of the class}}{\text{total pixels}}$$

For a segmentation task with only two classes, foreground and background, $p_e$ is computed as:

$$p_e = \underbrace{\frac{TP+FN}{N}}_{\text{actual foreground pixels}} \times \underbrace{\frac{TP+FP}{N}}_{\text{predicted foreground pixels}} + \underbrace{\frac{TN+FP}{N}}_{\text{actual background pixels}} \times \underbrace{\frac{TN+FN}{N}}_{\text{predicted background pixels}}$$

For a multi-class classification task, for example:

$$\begin{array}{c|ccc|c}
\frac{\text{Predicted Class}\rightarrow}{\underset{\downarrow}{\text{Actual Class}}} & A & B & C & \text{Actual Class Num}\\
\hline
A & N_{aa} & N_{ab} & N_{ac} & N_{a\cdot} \\
B & N_{ba} & N_{bb} & N_{bc} & N_{b\cdot} \\
C & N_{ca} & N_{cb} & N_{cc} & N_{c\cdot} \\
\hline
\text{Predicted Class Num} & N_{\cdot a} & N_{\cdot b} & N_{\cdot c} & N \\
\end{array}$$

$p_o$ and $p_e$ are computed as follows:

$$p_o = \mathrm{OA} = \frac{N_{aa}+N_{bb}+N_{cc}}{N}$$

$$\begin{split}
p_e &= \frac{N_{a\cdot}}{N}\cdot \frac{N_{\cdot a}}{N} + \frac{N_{b\cdot}}{N}\cdot \frac{N_{\cdot b}}{N} + \frac{N_{c\cdot}}{N}\cdot \frac{N_{\cdot c}}{N}\\
&= \frac{N_{a\cdot} N_{\cdot a} + N_{b\cdot} N_{\cdot b} + N_{c\cdot} N_{\cdot c}}{N^2}
\end{split}$$
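As a concrete illustration, here is a minimal numpy sketch of the Kappa computation on a made-up 3×3 confusion matrix, cross-checked against sklearn's cohen_kappa_score:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical confusion matrix: rows are actual classes, columns are predicted classes.
cm = np.array([[50, 10, 5],
               [8, 60, 7],
               [4, 6, 50]])
N = cm.sum()

p_o = np.diag(cm).sum() / N                           # observed agreement, i.e. OA
p_e = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / N**2  # chance agreement
kappa = (p_o - p_e) / (1 - p_e)
print(kappa)  # ≈ 0.699

# Cross-check: expand the matrix back into label arrays for sklearn.
y_true = np.repeat(np.arange(3), cm.sum(axis=1))
y_pred = np.concatenate([np.repeat(np.arange(3), row) for row in cm])
assert np.isclose(kappa, cohen_kappa_score(y_true, y_pred))
```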

Confusion Matrix

A confusion matrix is a matrix of size (class_num × class_num).

$$\begin{array}{c|ccc|c}
\frac{\text{Predicted Class}\rightarrow}{\underset{\downarrow}{\text{Actual Class}}} & A & B & C & \text{Actual Class Num}\\
\hline
A & N_{aa} & N_{ab} & N_{ac} & N_{a\cdot} \\
B & N_{ba} & N_{bb} & N_{bc} & N_{b\cdot} \\
C & N_{ca} & N_{cb} & N_{cc} & N_{c\cdot} \\
\hline
\text{Predicted Class Num} & N_{\cdot a} & N_{\cdot b} & N_{\cdot c} & N \\
\end{array}$$

How to get TP, TN, FP, FN

  • For class A, its TP is $N_{aa}$, TN is $N_{bb}+N_{bc}+N_{cb}+N_{cc}$ (every cell outside row A and column A), FP is $N_{ba}+N_{ca}$, FN is $N_{ab}+N_{ac}$

$$\begin{array}{|c|}
\hline
\color{red}{TP} \quad \color{green}{TN} \quad \color{blue}{FP} \quad \color{gold}{FN}\\
\hline
\begin{matrix}
\color{red}{N_{aa}} & \color{gold}{N_{ab}} & \color{gold}{N_{ac}}\\
\color{blue}{N_{ba}} & \color{green}{N_{bb}} & \color{green}{N_{bc}}\\
\color{blue}{N_{ca}} & \color{green}{N_{cb}} & \color{green}{N_{cc}}
\end{matrix} \\
\hline
\end{array}$$

  • For class B, its TP is $N_{bb}$, TN is $N_{aa}+N_{ac}+N_{ca}+N_{cc}$, FP is $N_{ab}+N_{cb}$, FN is $N_{ba}+N_{bc}$

$$\begin{array}{|c|}
\hline
\color{red}{TP} \quad \color{green}{TN} \quad \color{blue}{FP} \quad \color{gold}{FN}\\
\hline
\begin{matrix}
\color{green}{N_{aa}} & \color{blue}{N_{ab}} & \color{green}{N_{ac}}\\
\color{gold}{N_{ba}} & \color{red}{N_{bb}} & \color{gold}{N_{bc}}\\
\color{green}{N_{ca}} & \color{blue}{N_{cb}} & \color{green}{N_{cc}}
\end{matrix} \\
\hline
\end{array}$$

  • For class C, its TP is $N_{cc}$, TN is $N_{aa}+N_{ab}+N_{ba}+N_{bb}$, FP is $N_{ac}+N_{bc}$, FN is $N_{ca}+N_{cb}$

$$\begin{array}{|c|}
\hline
\color{red}{TP} \quad \color{green}{TN} \quad \color{blue}{FP} \quad \color{gold}{FN}\\
\hline
\begin{matrix}
\color{green}{N_{aa}} & \color{green}{N_{ab}} & \color{blue}{N_{ac}}\\
\color{green}{N_{ba}} & \color{green}{N_{bb}} & \color{blue}{N_{bc}}\\
\color{gold}{N_{ca}} & \color{gold}{N_{cb}} & \color{red}{N_{cc}}
\end{matrix} \\
\hline
\end{array}$$
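A small numpy sketch (using a made-up 3×3 matrix) verifying this reading: for each class, TP is its diagonal entry, FP is the rest of its column, FN is the rest of its row, and TN is everything else:

```python
import numpy as np

# Hypothetical confusion matrix: rows are actual classes, columns are predicted classes.
cm = np.array([[5, 1, 2],
               [3, 6, 1],
               [2, 2, 7]])

tp = np.diag(cm)
fp = cm.sum(axis=0) - tp      # column sums minus the diagonal
fn = cm.sum(axis=1) - tp      # row sums minus the diagonal
tn = cm.sum() - tp - fp - fn  # everything outside the class's row and column

# For class A (index 0): TN = N_bb + N_bc + N_cb + N_cc = 6 + 1 + 2 + 7 = 16.
assert tn[0] == 6 + 1 + 2 + 7
```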

How to compute the confusion matrix in the semantic-segmentation case using Python:

```python
import numpy as np

eps = 1e-8
class_num = 3

# The commented arrays below show one possible random draw.
ground_truth = np.random.randint(low=0, high=class_num, size=(5, 5))
# [[1 0 0 0 2]
#  [1 2 0 2 1]
#  [1 0 0 2 2]
#  [0 1 1 1 0]
#  [2 1 1 0 1]]
prediction = np.random.randint(low=0, high=class_num, size=(5, 5))
# [[2 1 1 1 1]
#  [2 1 1 2 0]
#  [2 1 2 2 1]
#  [2 0 2 0 0]
#  [0 0 1 2 0]]

'''
class_num * ground_truth + prediction, i.e. 3*[0, 1, 2] + [0, 1, 2]
By doing this, we will get 3*3=9 kinds of results:
0+0, 0+1, 0+2
3+0, 3+1, 3+2
6+0, 6+1, 6+2
'''
labels = class_num * ground_truth + prediction
# [[5 1 1 1 7]
#  [5 7 1 8 3]
#  [5 1 2 8 7]
#  [2 3 5 3 0]
#  [6 3 4 2 3]]
counts = np.bincount(labels.flatten(), minlength=class_num ** 2)
# counts: [1 5 3 5 1 4 1 3 2]
#          ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑
# labels:  0 1 2 3 4 5 6 7 8
confusion_matrix = counts.reshape(class_num, class_num)
# counts         labels
# [[1 5 3]      [[0 1 2]
#  [5 1 4]  <--  [3 4 5]
#  [1 3 2]]      [6 7 8]]

TP = np.diag(confusion_matrix)
# [1, 1, 2]
FP = confusion_matrix.sum(axis=0) - np.diag(confusion_matrix)
# [6, 8, 7] <-- [7, 9, 9] - [1, 1, 2]
FN = confusion_matrix.sum(axis=1) - np.diag(confusion_matrix)
# [8, 9, 4] <-- [9, 10, 6] - [1, 1, 2]
TN = confusion_matrix.sum() - TP - FP - FN
# [10, 7, 12] <-- 25 - [1, 1, 2] - [6, 8, 7] - [8, 9, 4]

'''Overall Accuracy'''
OA = TP.sum() / (confusion_matrix.sum() + eps)
# 0.16

'''Precision or User's Accuracy (accuracy from the viewpoint of the prediction result)'''
Precision = TP / (TP + FP + eps)
# [0.143, 0.111, 0.222]

'''Recall or Producer's Accuracy (accuracy from the viewpoint of the ground truth)'''
Recall = TP / (TP + FN + eps)
# [0.111, 0.1  , 0.333]

'''F1 score'''
F1 = (2.0 * Precision * Recall) / (Precision + Recall + eps)
# [0.125, 0.105, 0.267]

'''Intersection over Union'''
IoU = TP / (TP + FN + FP + eps)
# [0.067, 0.056, 0.154]
```

Another way to calculate TP, TN, FP, FN

In segmentation, the inputs are a Prediction with shape (Batch_size, Class_num, Height, Width) and a Mask or Ground Truth with shape (Batch_size, Height, Width). The Prediction contains Class_num values per pixel for each image in every batch, representing the predicted probability of each class.

For category c, TP, TN, FP, and FN are calculated as follows (note that the four terms sum to the total number of pixels):

$$\begin{split}
TP_c &= \sum_{h,w} p_{c,h,w}\,g_{c,h,w}\\
TN_c &= \sum_{h,w} (1-p_{c,h,w})(1-g_{c,h,w})\\
FP_c &= \sum_{h,w} p_{c,h,w}(1-g_{c,h,w})\\
FN_c &= \sum_{h,w} (1-p_{c,h,w})\,g_{c,h,w}
\end{split}$$

where $g_{h,w}$ uses a one-hot encoding scheme for the ground-truth label of pixel $(h,w)$, $g_{c,h,w}$ is the $c$-th element of $g_{h,w}$, and $p_{c,h,w}\in[0,1]$ is the predicted probability that pixel $(h,w)$ belongs to class $c$.

$g_{h,w} = (0_0,\cdots,0_{i-1},1_{i},0_{i+1},\cdots,0_{C-1})$, where $i$ is the label of pixel $(h,w)$.

E.g., prediction = [p0, p1, p2, p3] and ground truth = [0, 0, 1, 0], with 4 categories.

For the 0th category:

  • TP0 = p0 × g0 = 0,
  • TN0 = (1 - p0) × (1 - g0) = 1 - p0,
  • FP0 = p0 × (1 - g0) = p0,
  • FN0 = (1 - p0) × g0 = 0.

For the 2nd category:

  • TP2 = p2 × g2 = p2,
  • TN2 = (1 - p2) × (1 - g2) = 0,
  • FP2 = p2 × (1 - g2) = 0,
  • FN2 = (1 - p2) × g2 = 1 - p2.
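These soft counts are the building blocks of differentiable objectives such as the soft Dice loss. A minimal PyTorch sketch under the definitions above (the shapes and tensors are made up; softmax probabilities and one-hot ground truth are assumed):

```python
import torch
import torch.nn.functional as F

B, C, H, W = 2, 4, 8, 8
logits = torch.randn(B, C, H, W)                   # raw network output
gt = torch.randint(low=0, high=C, size=(B, H, W))  # integer ground-truth labels

p = logits.softmax(dim=1)                                     # p_{c,h,w} in [0, 1]
g = F.one_hot(gt, num_classes=C).permute(0, 3, 1, 2).float()  # g_{c,h,w}, one-hot

# Soft counts per class, summed over the batch and spatial dimensions.
tp = (p * g).sum(dim=(0, 2, 3))
fp = (p * (1 - g)).sum(dim=(0, 2, 3))
fn = ((1 - p) * g).sum(dim=(0, 2, 3))

soft_dice = 2 * tp / (2 * tp + fp + fn)  # shape: (C,)
print(soft_dice)
```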

Code

```python
import torch
from torch import Tensor


class Metric(object):
    def __init__(self, num_classes: int, device):
        self.num_classes = num_classes  # e.g. 6
        self.confusion_matrix = torch.zeros(size=(num_classes, num_classes),
                                            dtype=torch.long, device=device)
        self.eps = 1e-8
        self.tp = None
        self.fp = None
        self.tn = None
        self.fn = None
        self.num = None

    def calculate_tp_fp_tn_fn_num(self) -> None:
        '''calculate true positive, false positive, true negative and false negative of each class'''
        # tp, fp, tn, fn, num all have shape (num_classes,)
        self.tp = torch.diag(self.confusion_matrix)
        self.fp = self.confusion_matrix.sum(dim=0) - self.tp
        self.fn = self.confusion_matrix.sum(dim=1) - self.tp
        # TN is everything outside the class's row and column.
        self.tn = self.confusion_matrix.sum() - self.tp - self.fp - self.fn
        self.num = self.confusion_matrix.sum(dim=1)

    def get_precision(self) -> Tensor:
        """calculate precision for each class"""
        precision = self.tp / (self.tp + self.fp + self.eps)
        return precision

    def get_recall(self) -> Tensor:
        """calculate recall for each class"""
        recall = self.tp / (self.tp + self.fn + self.eps)
        return recall

    def get_f1(self) -> Tensor:
        """calculate F1 score for each class"""
        precision = self.tp / (self.tp + self.fp + self.eps)
        recall = self.tp / (self.tp + self.fn + self.eps)
        f1 = (2.0 * precision * recall) / (precision + recall + self.eps)
        return f1

    def get_mf1(self) -> Tensor:
        """calculate the mean of all F1 scores"""
        return self.get_f1().mean()

    def get_fwf1(self) -> Tensor:
        """Frequency Weighted F1 score, the weighted average of all F1 scores"""
        f1 = self.get_f1()
        return (self.num * f1).sum() / self.num.sum()

    def get_iou(self) -> Tensor:
        """calculate Intersection over Union for each class"""
        iou = self.tp / (self.tp + self.fn + self.fp + self.eps)  # shape: (num_classes,)
        return iou

    def get_miou(self) -> Tensor:
        """calculate the mean of all IoU"""
        return self.get_iou().mean()

    def get_fwiou(self) -> Tensor:
        """Frequency Weighted IoU, the weighted average of all IoU"""
        iou = self.get_iou()
        return (self.num * iou).sum() / self.num.sum()

    def get_dice(self) -> Tensor:
        """calculate Dice for each class (identical to F1)"""
        dice = 2 * self.tp / ((self.tp + self.fp) + (self.tp + self.fn) + self.eps)
        return dice

    def get_accuracy(self) -> Tensor:
        """calculate accuracy for each class (identical to recall)"""
        acc = self.tp / (self.tp + self.fn + self.eps)
        return acc

    def get_overall_accuracy(self) -> Tensor:
        """calculate the overall accuracy"""
        oa = self.tp.sum() / self.confusion_matrix.sum()
        return oa

    def get_average_accuracy(self) -> Tensor:
        """calculate the average accuracy"""
        return self.get_accuracy().mean()

    def _get_confusion_matrix(self, labels: Tensor, predictions: Tensor) -> Tensor:
        """ calculate the confusion matrix for one result or a batch of results
        labels: [height, width], predictions: [height, width];
        labels: [batch, height, width], predictions: [batch, height, width] """

        '''
        0: impervious surfaces, 1: building, 2: low vegetation, 3: tree, 4: car, 5: background

        6 * (0,1,2,3,4,5) + (0,1,2,3,4,5)
        ---------------------------------
        0:  0  1  2  3  4  5
        1:  6  7  8  9 10 11
        2: 12 13 14 15 16 17
        3: 18 19 20 21 22 23
        4: 24 25 26 27 28 29
        5: 30 31 32 33 34 35
        '''
        assert labels.shape == predictions.shape, "shape should be the same"
        index = self.num_classes * labels + predictions
        count = torch.bincount(input=index.flatten(), minlength=self.num_classes ** 2)
        confusion_matrix = count.reshape(self.num_classes, self.num_classes)  # shape: (num_classes, num_classes)
        return confusion_matrix

    def add_batch(self, labels: Tensor, predictions: Tensor) -> None:
        """labels: [height, width], predictions: [height, width]
        labels: [batch, height, width], predictions: [batch, height, width]"""
        assert labels.shape == predictions.shape, 'shape should be the same'
        self.confusion_matrix += self._get_confusion_matrix(labels, predictions)
        self.calculate_tp_fp_tn_fn_num()

    def reset_confusion_matrix(self) -> None:
        # keep the original dtype and device
        self.confusion_matrix = torch.zeros_like(self.confusion_matrix)


if __name__ == '__main__':
    num_classes = 6
    labels = torch.randint(low=0, high=num_classes, size=(2, 224, 224))
    predictions = torch.randint(low=0, high=num_classes, size=(2, 224, 224))

    metric = Metric(num_classes=num_classes, device="cpu")

    metric.add_batch(labels=labels, predictions=predictions)
    print(
        f"num: {metric.num}\n"
        f"oa: {metric.get_overall_accuracy()}\n"
        f"aa: {metric.get_average_accuracy()}\n"
        f"accuracy: {metric.get_accuracy()}\n"
        f"iou: {metric.get_iou()}\n"
        f"miou: {metric.get_miou()}\n"
        f"fwiou: {metric.get_fwiou()}\n"
        f"f1: {metric.get_f1()}\n"
        f"mf1: {metric.get_mf1()}\n"
        f"fwf1: {metric.get_fwf1()}"
    )
    print(metric.confusion_matrix)
```

Using sklearn

```bash
pip install scikit-learn
```
```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score, jaccard_score

if __name__ == "__main__":
    num_classes = 6
    B, H, W = 8, 224, 224
    labels = ["0", "1", "2", "3", "4", "5"]  # class names (unused below)
    y_true = np.random.randint(low=0, high=num_classes, size=(B, H, W))
    y_pred = np.random.randint(low=0, high=num_classes, size=(B, H, W))

    '''reshape(-1) and flatten() can do the same thing --
    converting a multidimensional array into a 1D array.'''
    confusion = confusion_matrix(y_true=y_true.reshape(-1), y_pred=y_pred.flatten())
    print(f"{confusion = }")
    # confusion = array([[10881, 11015, 11179, 11279, 11076, 11210],
    #                    [11184, 11055, 11134, 11040, 11139, 11074],
    #                    [11091, 11196, 11306, 11026, 11204, 11213],
    #                    [11171, 11303, 11299, 11218, 11095, 11178],
    #                    [11182, 11248, 11205, 10986, 11191, 10996],
    #                    [11248, 11098, 11101, 11022, 11228, 11337]], dtype=int64)

    accuracy = accuracy_score(y_true=y_true.reshape(-1), y_pred=y_pred.flatten())
    precision = precision_score(y_true=y_true.reshape(-1), y_pred=y_pred.flatten(), average=None)
    recall = recall_score(y_true=y_true.reshape(-1), y_pred=y_pred.flatten(), average=None)
    f1 = f1_score(y_true=y_true.reshape(-1), y_pred=y_pred.flatten(), average=None)
    iou = jaccard_score(y_true=y_true.reshape(-1), y_pred=y_pred.flatten(), average=None)

    print(f"{accuracy = :>.2%}")
    # accuracy = 16.69%
    np.set_printoptions(precision=4)
    print(f"{precision = }")
    print(f"{recall = }")
    print(f"{f1 = }")
    print(f"{iou = }")
    # precision = array([0.163 , 0.1652, 0.1682, 0.1685, 0.1672, 0.1692])
    # recall    = array([0.1633, 0.1659, 0.1687, 0.1668, 0.1675, 0.1691])
    # f1        = array([0.1631, 0.1656, 0.1684, 0.1676, 0.1674, 0.1692])
    # iou       = array([0.0888, 0.0903, 0.092 , 0.0915, 0.0913, 0.0924])
```
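sklearn can also report most of these per-class scores in a single call; a short optional addition to the script above (it reuses the same y_true and y_pred):

```python
from sklearn.metrics import classification_report

# Per-class precision, recall, F1, and support, plus macro and weighted averages.
print(classification_report(y_true=y_true.reshape(-1), y_pred=y_pred.flatten()))
```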

Design of the evaluator

```python
from typing import Dict
import logging

from tqdm import tqdm
import torch
from torch import nn, Tensor


def calculate_confusion_matrix(y_true: Tensor, y_pred: Tensor, num_classes: int) -> Tensor:
    """ calculate the confusion matrix
    The shape of the passed tensors should be
    [height * width], [height, width] or [batch, height, width]
    """

    '''
    0: impervious surfaces, 1: building, 2: low vegetation, 3: tree, 4: car, 5: background
    ---------------------------------
    num_classes * y_true + y_pred
    6 * (0,1,2,3,4,5) + (0,1,2,3,4,5)
    ---------------------------------
    0:  0  1  2  3  4  5
    1:  6  7  8  9 10 11
    2: 12 13 14 15 16 17
    3: 18 19 20 21 22 23
    4: 24 25 26 27 28 29
    5: 30 31 32 33 34 35
    '''
    assert y_pred.shape == y_true.shape, "shape should be the same"
    index = num_classes * y_true + y_pred
    counts = torch.bincount(input=index.flatten(), minlength=num_classes ** 2)
    confusion_matrix = counts.reshape(num_classes, num_classes)
    return confusion_matrix


def evaluator_potsdam(cfg, model, testloader, device) -> Dict:
    '''evaluate the model over the test set'''
    model.eval()
    model.to(device)

    '''initialize the confusion matrix'''
    confusion_matrix = torch.zeros(size=[cfg.num_classes, cfg.num_classes], dtype=torch.int64, device=device)

    '''create a progress bar with tqdm'''
    testloader_bar = tqdm(testloader)
    testloader_bar.set_description(desc="val")
    for batch in testloader_bar:
        images, labels = batch['img'].to(device), batch['ann'].to(device)

        '''raw_predictions: [B, Classes, Height, Width]'''
        with torch.no_grad():  # no gradients are needed during evaluation
            raw_predictions = model(images)
        raw_predictions = nn.Softmax(dim=1)(raw_predictions)
        '''[B, Classes, Height, Width] -argmax(dim=1)-> [B, Height, Width]
        predictions: [B, Height, Width]'''
        predictions = raw_predictions.argmax(dim=1)

        confusion_matrix += calculate_confusion_matrix(y_true=labels, y_pred=predictions, num_classes=cfg.num_classes)

    testloader_bar.close()

    eps = 1e-8
    proportion_per_class = confusion_matrix.sum(dim=1) / confusion_matrix.sum()

    '''true positive, false positive, true negative and false negative for each class'''
    tp = torch.diag(confusion_matrix)
    fp = confusion_matrix.sum(dim=0) - tp
    fn = confusion_matrix.sum(dim=1) - tp
    tn = confusion_matrix.sum() - tp - fp - fn  # everything outside the class's row and column

    '''overall accuracy'''
    oa = tp.sum() / confusion_matrix.sum()

    '''intersection over union'''
    iou_per_class = tp / (tp + fn + fp + eps)
    '''mean iou'''
    miou = iou_per_class.mean()
    '''frequency weighted iou'''
    fwiou = (iou_per_class * proportion_per_class).sum()

    '''f1 score'''
    precision, recall = tp / (tp + fp + eps), tp / (tp + fn + eps)
    f1_per_class = 2.0 * precision * recall / (precision + recall + eps)
    '''mean f1 score'''
    mf1 = f1_per_class.mean()
    '''frequency weighted f1 score'''
    fwf1 = (f1_per_class * proportion_per_class).sum()

    logging.info(f"OA:{oa:06.2%}, mF1:{mf1:06.2%}, fwF1:{fwf1:06.2%}, mIoU:{miou:06.2%}, fwIoU:{fwiou:06.2%}")
    for class_name, portion, iou, f1 in zip(cfg.class_names, proportion_per_class, iou_per_class, f1_per_class):
        logging.info(f"{class_name:>9}({portion:06.2%}): f1={f1:>06.2%}, iou={iou:>06.2%}")

    return {"oa": oa.item(), "mf1": mf1.item(), "fwf1": fwf1.item(), "miou": miou.item(), "fwiou": fwiou.item()}


if __name__ == "__main__":
    from ml_collections import ConfigDict
    from torch.utils.data import Dataset, DataLoader

    config = ConfigDict()
    config.num_classes = 6
    config.class_names = ('ImSurf', 'Building', 'LowVeg', 'Tree', 'Car', 'Clutter')

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s",
                        datefmt="%Y-%m-%d %H:%M:%S")

    class CustomDataset(Dataset):
        def __init__(self):
            self.images = torch.rand(size=[80, 3, 512, 512], dtype=torch.float32)
            self.labels = torch.randint(low=0, high=6, size=[80, 512, 512], dtype=torch.long)

        def __getitem__(self, index):
            return {"name": "name", "img": self.images[index], "ann": self.labels[index]}

        def __len__(self):
            return len(self.images)

    dataset = CustomDataset()
    dataloader = DataLoader(dataset, batch_size=8)

    model = nn.Sequential(
        nn.Conv2d(in_channels=3, out_channels=6, kernel_size=1)
    ).to(device)

    evaluator_potsdam(config, model, dataloader, device)
```