TP TN FP FN

The following are the four basic terms you need to know.

  • True Positives (TP): the actual value is Positive and the prediction is also Positive.
  • True Negatives (TN): the actual value is Negative and the prediction is also Negative.
  • False Positives (FP): the actual value is Negative but the prediction is Positive. Also known as a Type I error.
  • False Negatives (FN): the actual value is Positive but the prediction is Negative. Also known as a Type II error.
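As a quick illustration, here is a minimal numpy sketch (the binary labels are made up for this example) that counts all four quantities:

```python
import numpy as np

# Hypothetical binary labels: 1 = Positive, 0 = Negative.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 1, 0, 1, 0, 0, 0, 1])

TP = np.sum((y_true == 1) & (y_pred == 1))  # actual Positive, predicted Positive
TN = np.sum((y_true == 0) & (y_pred == 0))  # actual Negative, predicted Negative
FP = np.sum((y_true == 0) & (y_pred == 1))  # actual Negative, predicted Positive (Type I)
FN = np.sum((y_true == 1) & (y_pred == 0))  # actual Positive, predicted Negative (Type II)

print(TP, TN, FP, FN)  # 2 2 2 2
```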

OA, Overall Accuracy

Computes the overall classification accuracy.

$$\mathrm{OA} = \frac{\text{Number of correctly predicted samples}}{\text{Total number of samples}} = \frac{\sum_c TP_c}{\sum_c (TP_c + FN_c)}$$

Precision

$$\mathrm{Precision}_c = \frac{\text{Number of samples correctly predicted as class } c}{\text{Total number of samples predicted as class } c} = \frac{TP_c}{TP_c + FP_c}$$

Accuracy, Per-class Accuracy

Computes the classification accuracy of each individual class.

$$\mathrm{Accuracy}_c = \frac{\text{Number of correctly predicted samples in class } c}{\text{Total number of samples in class } c} = \frac{TP_c}{TP_c + FN_c}$$

Here, what counts as a sample depends on the task: in image classification one image is a sample, while in image segmentation one pixel is a sample.

Recall

Per-class Accuracy can also be called Recall; the two are computed by the same formula.

$$\mathrm{Recall}_c = \frac{\text{Number of correctly predicted samples in class } c}{\text{Total number of samples in class } c} = \frac{TP_c}{TP_c + FN_c}$$

F1 Score

$$\begin{gather*}
\frac{1}{F1_c} = \frac{1}{2}\left(\frac{1}{\mathrm{Precision}_c}+\frac{1}{\mathrm{Recall}_c}\right)\\
\Rightarrow F1_c = 2 \times \frac{\mathrm{Precision}_c \times \mathrm{Recall}_c}{\mathrm{Precision}_c + \mathrm{Recall}_c} = \frac{2TP_c}{2TP_c + FP_c + FN_c}
\end{gather*}$$
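To make the formulas concrete, here is a minimal sketch with hypothetical counts (TP=8, FP=2, FN=4) that computes Precision, Recall, and F1 and checks the closed form:

```python
# Hypothetical per-class counts, for illustration only.
TP, FP, FN = 8, 2, 4

precision = TP / (TP + FP)  # 8/10 = 0.8
recall = TP / (TP + FN)     # 8/12 ≈ 0.667
f1 = 2 * precision * recall / (precision + recall)

# The closed form 2*TP / (2*TP + FP + FN) gives the same value.
assert abs(f1 - 2 * TP / (2 * TP + FP + FN)) < 1e-12
print(precision, recall, f1)  # 0.8 0.666... 0.727...
```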

Dice, Dice similarity coefficient, DSC

The Sørensen-Dice index, also known as the Dice similarity coefficient (DSC), is defined as:

$$\mathrm{Dice}_c = \frac{2TP_c}{2TP_c + FP_c + FN_c}$$

This is exactly the same as the F1 score formula.

IoU, Intersection over Union

$$\mathrm{IoU}_c = \frac{TP_c}{TP_c + FP_c + FN_c}$$
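Since Dice (= F1) and IoU are built from the same counts, they are related by $\mathrm{Dice} = \frac{2\,\mathrm{IoU}}{1+\mathrm{IoU}}$; a quick numeric check with the same hypothetical counts as above:

```python
# Hypothetical counts, for illustration only.
TP, FP, FN = 8, 2, 4

dice = 2 * TP / (2 * TP + FP + FN)  # 16/22 ≈ 0.727
iou = TP / (TP + FP + FN)           # 8/14 ≈ 0.571
assert abs(dice - 2 * iou / (1 + iou)) < 1e-12
```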

Kappa Coefficient

$$\mathrm{Kappa} = \frac{p_o-p_e}{1-p_e}$$

$p_o = \mathrm{OA}$, and $p_e$ is computed as follows:

$$p_e = \sum_{\text{class}} \frac{\text{actual pixels of the class}}{\text{total pixels}}\times\frac{\text{predicted pixels of the class}}{\text{total pixels}}$$

For a segmentation task with only two classes, foreground and background, $p_e$ is computed as:

$$p_e = \underbrace{\frac{TP+FN}{N}}_{\text{actual foreground pixels}} \times \underbrace{\frac{TP+FP}{N}}_{\text{predicted foreground pixels}} + \underbrace{\frac{TN+FP}{N}}_{\text{actual background pixels}} \times \underbrace{\frac{TN+FN}{N}}_{\text{predicted background pixels}}$$

For a multi-class classification task, for example:

$$\begin{array}{c|ccc|c}
\frac{\text{Predicted Class}\rightarrow}{\underset{\downarrow}{\text{Actual Class}}} & A & B & C & \text{Actual Class Num}\\
\hline
A & N_{aa} & N_{ab} & N_{ac} & N_{a\cdot} \\
B & N_{ba} & N_{bb} & N_{bc} & N_{b\cdot} \\
C & N_{ca} & N_{cb} & N_{cc} & N_{c\cdot} \\
\hline
\text{Predicted Class Num} & N_{\cdot a} & N_{\cdot b} & N_{\cdot c} & N \\
\end{array}$$

$p_o$ and $p_e$ are computed as follows:

$$p_o = \mathrm{OA} = \frac{N_{aa}+N_{bb}+N_{cc}}{N}$$

$$\begin{split}
p_e &= \frac{N_{a\cdot}}{N}\cdot \frac{N_{\cdot a}}{N} + \frac{N_{b\cdot}}{N}\cdot \frac{N_{\cdot b}}{N} + \frac{N_{c\cdot}}{N}\cdot \frac{N_{\cdot c}}{N}\\
&= \frac{N_{a\cdot} N_{\cdot a} + N_{b\cdot} N_{\cdot b} + N_{c\cdot} N_{\cdot c}}{N^2}
\end{split}$$
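As a concrete illustration, here is a minimal numpy sketch of the Kappa computation on a made-up 3×3 confusion matrix, cross-checked against sklearn's cohen_kappa_score:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical confusion matrix: rows are actual classes, columns are predicted classes.
cm = np.array([[50, 10, 5],
               [8, 60, 7],
               [4, 6, 50]])
N = cm.sum()

p_o = np.diag(cm).sum() / N                           # observed agreement, i.e. OA
p_e = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / N**2  # chance agreement
kappa = (p_o - p_e) / (1 - p_e)
print(kappa)  # ≈ 0.699

# Cross-check: expand the matrix back into label arrays for sklearn.
y_true = np.repeat(np.arange(3), cm.sum(axis=1))
y_pred = np.concatenate([np.repeat(np.arange(3), row) for row in cm])
assert np.isclose(kappa, cohen_kappa_score(y_true, y_pred))
```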

Confusion Matrix

A confusion matrix is a matrix of size (class_num × class_num).

$$\begin{array}{c|ccc|c}
\frac{\text{Predicted Class}\rightarrow}{\underset{\downarrow}{\text{Actual Class}}} & A & B & C & \text{Actual Class Num}\\
\hline
A & N_{aa} & N_{ab} & N_{ac} & N_{a\cdot} \\
B & N_{ba} & N_{bb} & N_{bc} & N_{b\cdot} \\
C & N_{ca} & N_{cb} & N_{cc} & N_{c\cdot} \\
\hline
\text{Predicted Class Num} & N_{\cdot a} & N_{\cdot b} & N_{\cdot c} & N \\
\end{array}$$

How to get TP, TN, FP, FN

  • For class A, its TP is $N_{aa}$, TN is $N_{bb}+N_{bc}+N_{cb}+N_{cc}$ (every cell outside row A and column A), FP is $N_{ba}+N_{ca}$, FN is $N_{ab}+N_{ac}$

$$\begin{array}{|c|}
\hline
\color{red}{TP} \quad \color{green}{TN} \quad \color{blue}{FP} \quad \color{gold}{FN}\\
\hline
\begin{matrix}
\color{red}{N_{aa}} & \color{gold}{N_{ab}} & \color{gold}{N_{ac}}\\
\color{blue}{N_{ba}} & \color{green}{N_{bb}} & \color{green}{N_{bc}}\\
\color{blue}{N_{ca}} & \color{green}{N_{cb}} & \color{green}{N_{cc}}
\end{matrix} \\
\hline
\end{array}$$

  • For class B, its TP is $N_{bb}$, TN is $N_{aa}+N_{ac}+N_{ca}+N_{cc}$, FP is $N_{ab}+N_{cb}$, FN is $N_{ba}+N_{bc}$

$$\begin{array}{|c|}
\hline
\color{red}{TP} \quad \color{green}{TN} \quad \color{blue}{FP} \quad \color{gold}{FN}\\
\hline
\begin{matrix}
\color{green}{N_{aa}} & \color{blue}{N_{ab}} & \color{green}{N_{ac}}\\
\color{gold}{N_{ba}} & \color{red}{N_{bb}} & \color{gold}{N_{bc}}\\
\color{green}{N_{ca}} & \color{blue}{N_{cb}} & \color{green}{N_{cc}}
\end{matrix} \\
\hline
\end{array}$$

  • For class C, its TP is $N_{cc}$, TN is $N_{aa}+N_{ab}+N_{ba}+N_{bb}$, FP is $N_{ac}+N_{bc}$, FN is $N_{ca}+N_{cb}$

$$\begin{array}{|c|}
\hline
\color{red}{TP} \quad \color{green}{TN} \quad \color{blue}{FP} \quad \color{gold}{FN}\\
\hline
\begin{matrix}
\color{green}{N_{aa}} & \color{green}{N_{ab}} & \color{blue}{N_{ac}}\\
\color{green}{N_{ba}} & \color{green}{N_{bb}} & \color{blue}{N_{bc}}\\
\color{gold}{N_{ca}} & \color{gold}{N_{cb}} & \color{red}{N_{cc}}
\end{matrix} \\
\hline
\end{array}$$
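A small numpy sketch (using a made-up 3×3 matrix) verifying this reading: for each class, TP is its diagonal entry, FP is the rest of its column, FN is the rest of its row, and TN is everything else:

```python
import numpy as np

# Hypothetical confusion matrix: rows are actual classes, columns are predicted classes.
cm = np.array([[5, 1, 2],
               [3, 6, 1],
               [2, 2, 7]])

tp = np.diag(cm)
fp = cm.sum(axis=0) - tp      # column sums minus the diagonal
fn = cm.sum(axis=1) - tp      # row sums minus the diagonal
tn = cm.sum() - tp - fp - fn  # everything outside the class's row and column

# For class A (index 0): TN = N_bb + N_bc + N_cb + N_cc = 6 + 1 + 2 + 7 = 16.
assert tn[0] == 6 + 1 + 2 + 7
```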

How to compute the confusion matrix in the semantic-segmentation case using Python:

```python
import numpy as np

eps = 1e-8
class_num = 3

# The commented arrays below show one possible random draw.
ground_truth = np.random.randint(low=0, high=class_num, size=(5, 5))
# [[1 0 0 0 2]
#  [1 2 0 2 1]
#  [1 0 0 2 2]
#  [0 1 1 1 0]
#  [2 1 1 0 1]]
prediction = np.random.randint(low=0, high=class_num, size=(5, 5))
# [[2 1 1 1 1]
#  [2 1 1 2 0]
#  [2 1 2 2 1]
#  [2 0 2 0 0]
#  [0 0 1 2 0]]

'''
class_num * ground_truth + prediction, i.e. 3*[0, 1, 2] + [0, 1, 2]
By doing this, we will get 3*3=9 kinds of results:
0+0, 0+1, 0+2
3+0, 3+1, 3+2
6+0, 6+1, 6+2
'''
labels = class_num * ground_truth + prediction
# [[5 1 1 1 7]
#  [5 7 1 8 3]
#  [5 1 2 8 7]
#  [2 3 5 3 0]
#  [6 3 4 2 3]]
counts = np.bincount(labels.flatten(), minlength=class_num ** 2)
# counts: [1 5 3 5 1 4 1 3 2]
#          ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑
# labels:  0 1 2 3 4 5 6 7 8
confusion_matrix = counts.reshape(class_num, class_num)
# counts         labels
# [[1 5 3]      [[0 1 2]
#  [5 1 4]  <--  [3 4 5]
#  [1 3 2]]      [6 7 8]]

TP = np.diag(confusion_matrix)
# [1, 1, 2]
FP = confusion_matrix.sum(axis=0) - np.diag(confusion_matrix)
# [6, 8, 7] <-- [7, 9, 9] - [1, 1, 2]
FN = confusion_matrix.sum(axis=1) - np.diag(confusion_matrix)
# [8, 9, 4] <-- [9, 10, 6] - [1, 1, 2]
TN = confusion_matrix.sum() - TP - FP - FN
# [10, 7, 12] <-- 25 - [1, 1, 2] - [6, 8, 7] - [8, 9, 4]

'''Overall Accuracy'''
OA = TP.sum() / (confusion_matrix.sum() + eps)
# 0.16

'''Precision or User's Accuracy (accuracy from the viewpoint of the prediction result)'''
Precision = TP / (TP + FP + eps)
# [0.143, 0.111, 0.222]

'''Recall or Producer's Accuracy (accuracy from the viewpoint of the ground truth)'''
Recall = TP / (TP + FN + eps)
# [0.111, 0.1  , 0.333]

'''F1 score'''
F1 = (2.0 * Precision * Recall) / (Precision + Recall + eps)
# [0.125, 0.105, 0.267]

'''Intersection over Union'''
IoU = TP / (TP + FN + FP + eps)
# [0.067, 0.056, 0.154]
```

Another way to calculate TP, TN, FP, FN

In segmentation, the inputs are a Prediction with shape (Batch_size, Class_num, Height, Width) and a Mask or Ground Truth with shape (Batch_size, Height, Width). The Prediction contains Class_num values per pixel for each image in every batch, representing the predicted probability of each class.

For category c, TP, TN, FP, and FN are calculated as follows (note that the four terms sum to the total number of pixels):

$$\begin{split}
TP_c &= \sum_{h,w} p_{c,h,w}\,g_{c,h,w}\\
TN_c &= \sum_{h,w} (1-p_{c,h,w})(1-g_{c,h,w})\\
FP_c &= \sum_{h,w} p_{c,h,w}(1-g_{c,h,w})\\
FN_c &= \sum_{h,w} (1-p_{c,h,w})\,g_{c,h,w}
\end{split}$$

where $g_{h,w}$ uses a one-hot encoding scheme for the ground-truth label of pixel $(h,w)$, $g_{c,h,w}$ is the $c$-th element of $g_{h,w}$, and $p_{c,h,w}\in[0,1]$ is the predicted probability that pixel $(h,w)$ belongs to class $c$.

$g_{h,w} = (0_0,\cdots,0_{i-1},1_{i},0_{i+1},\cdots,0_{C-1})$, where $i$ is the label of pixel $(h,w)$.

E.g., prediction = [p0, p1, p2, p3] and ground truth = [0, 0, 1, 0], with 4 categories.

For the 0th category:

  • TP0 = p0 × g0 = 0,
  • TN0 = (1 - p0) × (1 - g0) = 1 - p0,
  • FP0 = p0 × (1 - g0) = p0,
  • FN0 = (1 - p0) × g0 = 0.

For the 2nd category:

  • TP2 = p2 × g2 = p2,
  • TN2 = (1 - p2) × (1 - g2) = 0,
  • FP2 = p2 × (1 - g2) = 0,
  • FN2 = (1 - p2) × g2 = 1 - p2.
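These soft counts are the building blocks of differentiable objectives such as the soft Dice loss. A minimal PyTorch sketch under the definitions above (the shapes and tensors are made up; softmax probabilities and one-hot ground truth are assumed):

```python
import torch
import torch.nn.functional as F

B, C, H, W = 2, 4, 8, 8
logits = torch.randn(B, C, H, W)                   # raw network output
gt = torch.randint(low=0, high=C, size=(B, H, W))  # integer ground-truth labels

p = logits.softmax(dim=1)                                     # p_{c,h,w} in [0, 1]
g = F.one_hot(gt, num_classes=C).permute(0, 3, 1, 2).float()  # g_{c,h,w}, one-hot

# Soft counts per class, summed over the batch and spatial dimensions.
tp = (p * g).sum(dim=(0, 2, 3))
fp = (p * (1 - g)).sum(dim=(0, 2, 3))
fn = ((1 - p) * g).sum(dim=(0, 2, 3))

soft_dice = 2 * tp / (2 * tp + fp + fn)  # shape: (C,)
print(soft_dice)
```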

Code

```python
import torch
from torch import Tensor


class Metric(object):
    def __init__(self, num_classes: int, device):
        self.num_classes = num_classes  # e.g. 6
        self.confusion_matrix = torch.zeros(size=(num_classes, num_classes),
                                            dtype=torch.long, device=device)
        self.eps = 1e-8
        self.tp = None
        self.fp = None
        self.tn = None
        self.fn = None
        self.num = None

    def calculate_tp_fp_tn_fn_num(self) -> None:
        '''calculate true positive, false positive, true negative and false negative of each class'''
        # tp, fp, tn, fn, num all have shape (num_classes,)
        self.tp = torch.diag(self.confusion_matrix)
        self.fp = self.confusion_matrix.sum(dim=0) - self.tp
        self.fn = self.confusion_matrix.sum(dim=1) - self.tp
        # TN is everything outside the class's row and column.
        self.tn = self.confusion_matrix.sum() - self.tp - self.fp - self.fn
        self.num = self.confusion_matrix.sum(dim=1)

    def get_precision(self) -> Tensor:
        """calculate precision for each class"""
        precision = self.tp / (self.tp + self.fp + self.eps)
        return precision

    def get_recall(self) -> Tensor:
        """calculate recall for each class"""
        recall = self.tp / (self.tp + self.fn + self.eps)
        return recall

    def get_f1(self) -> Tensor:
        """calculate F1 score for each class"""
        precision = self.tp / (self.tp + self.fp + self.eps)
        recall = self.tp / (self.tp + self.fn + self.eps)
        f1 = (2.0 * precision * recall) / (precision + recall + self.eps)
        return f1

    def get_mf1(self) -> Tensor:
        """calculate the mean of all F1 scores"""
        return self.get_f1().mean()

    def get_fwf1(self) -> Tensor:
        """Frequency Weighted F1 score, the weighted average of all F1 scores"""
        f1 = self.get_f1()
        return (self.num * f1).sum() / self.num.sum()

    def get_iou(self) -> Tensor:
        """calculate Intersection over Union for each class"""
        iou = self.tp / (self.tp + self.fn + self.fp + self.eps)  # shape: (num_classes,)
        return iou

    def get_miou(self) -> Tensor:
        """calculate the mean of all IoU"""
        return self.get_iou().mean()

    def get_fwiou(self) -> Tensor:
        """Frequency Weighted IoU, the weighted average of all IoU"""
        iou = self.get_iou()
        return (self.num * iou).sum() / self.num.sum()

    def get_dice(self) -> Tensor:
        """calculate Dice for each class (identical to F1)"""
        dice = 2 * self.tp / ((self.tp + self.fp) + (self.tp + self.fn) + self.eps)
        return dice

    def get_accuracy(self) -> Tensor:
        """calculate accuracy for each class (identical to recall)"""
        acc = self.tp / (self.tp + self.fn + self.eps)
        return acc

    def get_overall_accuracy(self) -> Tensor:
        """calculate the overall accuracy"""
        oa = self.tp.sum() / self.confusion_matrix.sum()
        return oa

    def get_average_accuracy(self) -> Tensor:
        """calculate the average accuracy"""
        return self.get_accuracy().mean()

    def _get_confusion_matrix(self, labels: Tensor, predictions: Tensor) -> Tensor:
        """ calculate the confusion matrix for one result or a batch of results
        labels: [height, width], predictions: [height, width];
        labels: [batch, height, width], predictions: [batch, height, width] """

        '''
        0: impervious surfaces, 1: building, 2: low vegetation, 3: tree, 4: car, 5: background

        6 * (0,1,2,3,4,5) + (0,1,2,3,4,5)
        ---------------------------------
        0:  0  1  2  3  4  5
        1:  6  7  8  9 10 11
        2: 12 13 14 15 16 17
        3: 18 19 20 21 22 23
        4: 24 25 26 27 28 29
        5: 30 31 32 33 34 35
        '''
        assert labels.shape == predictions.shape, "shape should be the same"
        index = self.num_classes * labels + predictions
        count = torch.bincount(input=index.flatten(), minlength=self.num_classes ** 2)
        confusion_matrix = count.reshape(self.num_classes, self.num_classes)  # shape: (num_classes, num_classes)
        return confusion_matrix

    def add_batch(self, labels: Tensor, predictions: Tensor) -> None:
        """labels: [height, width], predictions: [height, width]
        labels: [batch, height, width], predictions: [batch, height, width]"""
        assert labels.shape == predictions.shape, 'shape should be the same'
        self.confusion_matrix += self._get_confusion_matrix(labels, predictions)
        self.calculate_tp_fp_tn_fn_num()

    def reset_confusion_matrix(self) -> None:
        # keep the original dtype and device
        self.confusion_matrix = torch.zeros_like(self.confusion_matrix)


if __name__ == '__main__':
    num_classes = 6
    labels = torch.randint(low=0, high=num_classes, size=(2, 224, 224))
    predictions = torch.randint(low=0, high=num_classes, size=(2, 224, 224))

    metric = Metric(num_classes=num_classes, device="cpu")

    metric.add_batch(labels=labels, predictions=predictions)
    print(
        f"num: {metric.num}\n"
        f"oa: {metric.get_overall_accuracy()}\n"
        f"aa: {metric.get_average_accuracy()}\n"
        f"accuracy: {metric.get_accuracy()}\n"
        f"iou: {metric.get_iou()}\n"
        f"miou: {metric.get_miou()}\n"
        f"fwiou: {metric.get_fwiou()}\n"
        f"f1: {metric.get_f1()}\n"
        f"mf1: {metric.get_mf1()}\n"
        f"fwf1: {metric.get_fwf1()}"
    )
    print(metric.confusion_matrix)
```

Using sklearn

```bash
pip install scikit-learn
```
```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score, jaccard_score

if __name__ == "__main__":
    num_classes = 6
    B, H, W = 8, 224, 224
    labels = ["0", "1", "2", "3", "4", "5"]  # class names (unused below)
    y_true = np.random.randint(low=0, high=num_classes, size=(B, H, W))
    y_pred = np.random.randint(low=0, high=num_classes, size=(B, H, W))

    '''reshape(-1) and flatten() can do the same thing --
    converting a multidimensional array into a 1D array.'''
    confusion = confusion_matrix(y_true=y_true.reshape(-1), y_pred=y_pred.flatten())
    print(f"{confusion = }")
    # confusion = array([[10881, 11015, 11179, 11279, 11076, 11210],
    #                    [11184, 11055, 11134, 11040, 11139, 11074],
    #                    [11091, 11196, 11306, 11026, 11204, 11213],
    #                    [11171, 11303, 11299, 11218, 11095, 11178],
    #                    [11182, 11248, 11205, 10986, 11191, 10996],
    #                    [11248, 11098, 11101, 11022, 11228, 11337]], dtype=int64)

    accuracy = accuracy_score(y_true=y_true.reshape(-1), y_pred=y_pred.flatten())
    precision = precision_score(y_true=y_true.reshape(-1), y_pred=y_pred.flatten(), average=None)
    recall = recall_score(y_true=y_true.reshape(-1), y_pred=y_pred.flatten(), average=None)
    f1 = f1_score(y_true=y_true.reshape(-1), y_pred=y_pred.flatten(), average=None)
    iou = jaccard_score(y_true=y_true.reshape(-1), y_pred=y_pred.flatten(), average=None)

    print(f"{accuracy = :>.2%}")
    # accuracy = 16.69%
    np.set_printoptions(precision=4)
    print(f"{precision = }")
    print(f"{recall = }")
    print(f"{f1 = }")
    print(f"{iou = }")
    # precision = array([0.163 , 0.1652, 0.1682, 0.1685, 0.1672, 0.1692])
    # recall    = array([0.1633, 0.1659, 0.1687, 0.1668, 0.1675, 0.1691])
    # f1        = array([0.1631, 0.1656, 0.1684, 0.1676, 0.1674, 0.1692])
    # iou       = array([0.0888, 0.0903, 0.092 , 0.0915, 0.0913, 0.0924])
```
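sklearn can also report most of these per-class scores in a single call; a short optional addition to the script above (it reuses the same y_true and y_pred):

```python
from sklearn.metrics import classification_report

# Per-class precision, recall, F1, and support, plus macro and weighted averages.
print(classification_report(y_true=y_true.reshape(-1), y_pred=y_pred.flatten()))
```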

Design of the evaluator

```python
from typing import Dict
import logging

from tqdm import tqdm
import torch
from torch import nn, Tensor


def calculate_confusion_matrix(y_true: Tensor, y_pred: Tensor, num_classes: int) -> Tensor:
    """ calculate the confusion matrix
    The shape of the passed tensors should be
    [height * width], [height, width] or [batch, height, width]
    """

    '''
    0: impervious surfaces, 1: building, 2: low vegetation, 3: tree, 4: car, 5: background
    ---------------------------------
    num_classes * y_true + y_pred
    6 * (0,1,2,3,4,5) + (0,1,2,3,4,5)
    ---------------------------------
    0:  0  1  2  3  4  5
    1:  6  7  8  9 10 11
    2: 12 13 14 15 16 17
    3: 18 19 20 21 22 23
    4: 24 25 26 27 28 29
    5: 30 31 32 33 34 35
    '''
    assert y_pred.shape == y_true.shape, "shape should be the same"
    index = num_classes * y_true + y_pred
    counts = torch.bincount(input=index.flatten(), minlength=num_classes ** 2)
    confusion_matrix = counts.reshape(num_classes, num_classes)
    return confusion_matrix


def evaluator_potsdam(cfg, model, testloader, device) -> Dict:
    '''evaluate the model over the test set'''
    model.eval()
    model.to(device)

    '''initialize the confusion matrix'''
    confusion_matrix = torch.zeros(size=[cfg.num_classes, cfg.num_classes], dtype=torch.int64, device=device)

    '''create a progress bar with tqdm'''
    testloader_bar = tqdm(testloader)
    testloader_bar.set_description(desc="val")
    for batch in testloader_bar:
        images, labels = batch['img'].to(device), batch['ann'].to(device)

        '''raw_predictions: [B, Classes, Height, Width]'''
        with torch.no_grad():  # no gradients are needed during evaluation
            raw_predictions = model(images)
        raw_predictions = nn.Softmax(dim=1)(raw_predictions)
        '''[B, Classes, Height, Width] -argmax(dim=1)-> [B, Height, Width]
        predictions: [B, Height, Width]'''
        predictions = raw_predictions.argmax(dim=1)

        confusion_matrix += calculate_confusion_matrix(y_true=labels, y_pred=predictions, num_classes=cfg.num_classes)

    testloader_bar.close()

    eps = 1e-8
    proportion_per_class = confusion_matrix.sum(dim=1) / confusion_matrix.sum()

    '''true positive, false positive, true negative and false negative for each class'''
    tp = torch.diag(confusion_matrix)
    fp = confusion_matrix.sum(dim=0) - tp
    fn = confusion_matrix.sum(dim=1) - tp
    tn = confusion_matrix.sum() - tp - fp - fn  # everything outside the class's row and column

    '''overall accuracy'''
    oa = tp.sum() / confusion_matrix.sum()

    '''intersection over union'''
    iou_per_class = tp / (tp + fn + fp + eps)
    '''mean iou'''
    miou = iou_per_class.mean()
    '''frequency weighted iou'''
    fwiou = (iou_per_class * proportion_per_class).sum()

    '''f1 score'''
    precision, recall = tp / (tp + fp + eps), tp / (tp + fn + eps)
    f1_per_class = 2.0 * precision * recall / (precision + recall + eps)
    '''mean f1 score'''
    mf1 = f1_per_class.mean()
    '''frequency weighted f1 score'''
    fwf1 = (f1_per_class * proportion_per_class).sum()

    logging.info(f"OA:{oa:06.2%}, mF1:{mf1:06.2%}, fwF1:{fwf1:06.2%}, mIoU:{miou:06.2%}, fwIoU:{fwiou:06.2%}")
    for class_name, portion, iou, f1 in zip(cfg.class_names, proportion_per_class, iou_per_class, f1_per_class):
        logging.info(f"{class_name:>9}({portion:06.2%}): f1={f1:>06.2%}, iou={iou:>06.2%}")

    return {"oa": oa.item(), "mf1": mf1.item(), "fwf1": fwf1.item(), "miou": miou.item(), "fwiou": fwiou.item()}


if __name__ == "__main__":
    from ml_collections import ConfigDict
    from torch.utils.data import Dataset, DataLoader

    config = ConfigDict()
    config.num_classes = 6
    config.class_names = ('ImSurf', 'Building', 'LowVeg', 'Tree', 'Car', 'Clutter')

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s",
                        datefmt="%Y-%m-%d %H:%M:%S")

    class CustomDataset(Dataset):
        def __init__(self):
            self.images = torch.rand(size=[80, 3, 512, 512], dtype=torch.float32)
            self.labels = torch.randint(low=0, high=6, size=[80, 512, 512], dtype=torch.long)

        def __getitem__(self, index):
            return {"name": "name", "img": self.images[index], "ann": self.labels[index]}

        def __len__(self):
            return len(self.images)

    dataset = CustomDataset()
    dataloader = DataLoader(dataset, batch_size=8)

    model = nn.Sequential(
        nn.Conv2d(in_channels=3, out_channels=6, kernel_size=1)
    ).to(device)

    evaluator_potsdam(config, model, dataloader, device)
```