ISPRS-Potsdam

The Potsdam dataset is an urban semantic segmentation dataset used in the 2D Semantic Labeling Contest - Potsdam.

The dataset can be requested at the challenge homepage. You need to get a package file named Potsdam.zip (size: 13.3 GB); unzip this package to get a folder named Potsdam which contains 10 files, as follows:

Potsdam
├── 1_DSM.rar
├── 1_DSM_normalisation.zip
├── 2_Ortho_RGB.zip # <-- RGB image
├── 3_Ortho_IRRG.zip
├── 4_Ortho_RGBIR.zip
├── 5_Labels_all.zip # <-- no black boundary lines
├── 5_Labels_all_noBoundary.zip # <-- with black boundary lines
├── 5_Labels_for_participants.zip
├── 5_Labels_for_participants_no_Boundary.zip
└── assess_classification_reference_implementation.tgz

dataset_prepare.html#isprs-potsdam | mmsegmentation docs

Of these, only 2_Ortho_RGB.zip and 5_Labels_all.zip are needed:

potsdam
├── 2_Ortho_RGB.zip
└── 5_Labels_all.zip

The images in 2_Ortho_RGB.zip are in tif format, which none of the built-in Windows image tools can open properly. A tif plugin for VS Code, such as TIFF Preview, can display the original tif images of the Potsdam dataset. Note also that the original Potsdam images show some distortion; this is an issue with the dataset itself, not a processing mistake.
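Alternatively, the tif files can be inspected directly from Python. A minimal sketch, assuming Pillow is installed and the path points at an extracted tile (the path is a placeholder):

from PIL import Image

# hypothetical path to one extracted tile
img = Image.open('path/to/2_Ortho_RGB/top_potsdam_2_10_RGB.tif')
print(img.size, img.mode)  # (6000, 6000) RGB
img.show()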

Correspondence between colors and categories

\begin{array}{cclcl}
0 & \fcolorbox{black}{white}{\quad} & [255, 255, 255] & \text{white}  & \text{impervious surfaces}\\
1 & \colorbox{blue}{$\quad$}        & [0, 0, 255]     & \text{blue}   & \text{building}\\
2 & \colorbox{cyan}{$\quad$}        & [0, 255, 255]   & \text{cyan}   & \text{low vegetation}\\
3 & \colorbox{green}{$\quad$}       & [0, 255, 0]     & \text{green}  & \text{tree}\\
4 & \colorbox{yellow}{$\quad$}      & [255, 255, 0]   & \text{yellow} & \text{car}\\
5 & \colorbox{red}{$\quad$}         & [255, 0, 0]     & \text{red}    & \text{clutter/background}\\
\end{array}

'''
0: [255 255 255] : white  : impervious surface
1: [  0   0 255] : blue   : building
2: [  0 255 255] : cyan   : low vegetation
3: [  0 255   0] : green  : tree
4: [255 255   0] : yellow : car
5: [255   0   0] : red    : clutter/background
'''
color_map = np.array([[255, 255, 255], [0, 0, 255], [0, 255, 255],
                      [0, 255, 0], [255, 255, 0], [255, 0, 0]])

Note that if you use 5_Labels_all_noBoundary.zip for the labels, it includes boundary annotations (black), so the corresponding color_map differs:

'''
0: [  0   0   0] : black  : boundary
1: [255 255 255] : white  : impervious surface
2: [  0   0 255] : blue   : building
3: [  0 255 255] : cyan   : low vegetation
4: [  0 255   0] : green  : tree
5: [255 255   0] : yellow : car
6: [255   0   0] : red    : clutter/background
'''
color_map = np.array([[0, 0, 0], [255, 255, 255], [0, 0, 255],
                      [0, 255, 255], [0, 255, 0], [255, 255, 0],
                      [255, 0, 0]])
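With either color_map, an RGB label image has to be flattened into a single-channel index mask before training. A minimal sketch using exact color matching (the file path is a placeholder; the split script later in this page uses a dot-product hash instead):

import numpy as np
from PIL import Image

color_map = np.array([[255, 255, 255], [0, 0, 255], [0, 255, 255],
                      [0, 255, 0], [255, 255, 0], [255, 0, 0]])

rgb = np.array(Image.open('path/to/label.tif'))  # [H, W, 3] in RGB order
ann = np.zeros(rgb.shape[:2], dtype=np.uint8)    # [H, W] index mask
for idx, color in enumerate(color_map):
    ann[np.all(rgb == color, axis=-1)] = idx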

Configuration

Use the code below to convert the original images (6000×6000 pixels) into patches (512×512 pixels).

The 2_Ortho_RGB.zip file contains 38 images of size 6000×6000:

[figure: overview_potsdam — overview of the 38 Potsdam tiles]

In the default configuration, we assign the training, validation, and test sets as follows: 21 tiles for training, 1 for validation, and 14 for testing.

splits = {
    'train': [
        '2_11', '2_12', '3_10', '3_11', '3_12', '4_10', '4_11',
        '5_10', '5_11', '5_12', '6_8', '6_9', '6_10',  # '4_12', '6_7',
        '6_11', '6_12', '7_7', '7_8', '7_9', '7_10', '7_11', '7_12'
    ],
    'val': [
        '2_10'
    ],
    'test': [
        '2_13', '2_14', '3_13', '3_14', '4_13', '4_14', '4_15', '5_13',
        '5_14', '5_15', '6_13', '6_14', '6_15', '7_13'
    ]
}

There is a problem with the label images for 4_12 and 6_7, so we discard them (see the section on dataset errors below).

Every image is split into 12×12 = 144 patches of size 512×512, giving (38−2)×144 = 5184 = 3024+144+2016 patches in total: 21 images / 3024 patches for training, 1 image / 144 patches for validation, and 14 images / 2016 patches for testing.

For an image of size 6000×6000 and patch size 512, 6000 = 11×512 + 368 = 12×512 − 144, so 6000 is not divisible by 512. We therefore split 6000 over 512-pixel windows as follows, shifting the last window back by 144 pixels (see the sketch after the table below):

x/ymin: [0, 512, 1024, 1536, 2048, 2560, 3072, 3584, 4096, 4608, 5120, 5632]
offset: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -144]
0~512, 512~(2*512), (2*512)~(3*512), ..., (10*512)~(11*512), (11*512-144)~(12*512-144)
0~512, 512~1024, 1024~1536, ..., 5120~5632, 5488~6000
+----------------------------------------------------------------------------------------+
| x+offset, y+offset, x+offset+512, y+offset+512 |
+-----------------+--------------------+-----+---------------------+---------------------+
| 0_0_512_512 | 512_0_1024_512 | ... | 5120_0_5632_512 | 5488_0_6000_512 |
+-----------------+--------------------+-----+---------------------+---------------------+
| 0_512_512_1024 | 512_512_1024_1024 | ... | 5120_512_5632_1024 | 5488_512_6000_1024 |
+-----------------+--------------------+-----+---------------------+---------------------+
| 0_1024_512_1536 | 512_1024_1024_1536 | ... | 5120_1024_5632_1536 | 5488_1024_6000_1536 |
+-----------------+--------------------+-----+---------------------+---------------------+
| : | : | | : | : |
+-----------------+--------------------+-----+---------------------+---------------------+
| 0_5120_512_5632 | 512_5120_1024_5632 | ... | 5120_5120_5632_5632 | 5488_5120_6000_5632 |
+-----------------+--------------------+-----+---------------------+---------------------+
| 0_5488_512_6000 | 512_5488_1024_6000 | ... | 5120_5488_5632_6000 | 5488_5488_6000_6000 |
+-----------------+--------------------+-----+---------------------+---------------------+
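A minimal sketch of this clamped-window computation, in plain numpy with the 6000/512 numbers from above:

import numpy as np

size, patch = 6000, 512
starts = np.arange(0, size, patch)         # [0, 512, ..., 5632]: 12 window starts
starts = np.minimum(starts, size - patch)  # clamp the last start: 5632 -> 5488
print(starts)                              # [   0  512 ... 5120 5488]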

File structure

$ tree -L 3 path/to/datasets
datasets
├── potsdam_512x512
│   ├── ann_dir
│   │   ├── test
│   │   ├── train
│   │   └── val
│   └── img_dir
│       ├── test
│       ├── train
│       └── val
└── vaihingen_512x512
    ├── ann_dir
    │   ├── test
    │   ├── train
    │   └── val
    └── img_dir
        ├── test
        ├── train
        └── val
$ ls -l path/to/potsdam_512x512/img_dir/train/ | grep "^-" | wc -l
3024
$ ls -l path/to/potsdam_512x512/img_dir/test/ | grep "^-" | wc -l
2304
$ ls -l path/to/potsdam_512x512/img_dir/val/ | grep "^-" | wc -l
144

Errors in the Potsdam dataset

The errors come from two label images in the Potsdam dataset itself. The first is top_potsdam_6_7_label.tif, which contains an erroneous pixel value:

from PIL import Image
import numpy as np

mask_path = 'path/to/top_potsdam_6_7_label.tif'

mask = Image.open(mask_path)
mask = np.array(mask).reshape(-1,3)
values, counts = np.unique(mask, return_counts=True, axis=0)
print(values, counts)

Counting the pixel values of top_potsdam_6_7_label.tif with Python prints the following:

# values
[[  0   0 255]
 [  0 255   0]
 [  0 255 255]
 [252 255   0]   # <-- Error
 [255   0   0]
 [255 255   0]
 [255 255 255]]
# counts
[ 4857912  5669942 19962121   246304   797467     6749  4459505]

The output contains two lists: the upper one holds the RGB color values, the lower one the pixel count for each color. Note that 6_7 contains an abnormal RGB value, [252, 255, 0], whose first component is 252 rather than 255. This value corrupts the labels when pixels are assigned class indices, so simply changing the 252 to 255 fixes it.
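A minimal repair sketch, assuming Pillow and a placeholder path, replacing the bad value in place:

from PIL import Image
import numpy as np

mask_path = 'path/to/top_potsdam_6_7_label.tif'
mask = np.array(Image.open(mask_path))

bad = np.all(mask == [252, 255, 0], axis=-1)  # pixels with the erroneous value
mask[bad] = [255, 255, 0]                     # rewrite them as proper yellow (car)
Image.fromarray(mask).save(mask_path)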

The second faulty label is top_potsdam_4_12_label.tif. This image is abnormal; its pixel annotation values are thoroughly scrambled:

from PIL import Image
import numpy as np

mask_path = '../datasets/original_potsdam/label/top_potsdam_4_12_label.tif'

mask = Image.open(mask_path)
mask = np.array(mask).reshape(-1,3)
values, counts = np.unique(mask, return_counts=True, axis=0)
print(len(values)) # 24850 <-- Not Six Classes

As the output shows, the image contains 24850 distinct RGB values. Its pixel labels are hopelessly mixed, so we discard it.

Split Code

Prepare Dataset: ISPRS-Potsdam | mmsegmentation doc

tools/dataset_converters/potsdam.py | mmsegmentation github

# potsdam
# ├── 2_Ortho_RGB.zip
# ├── 5_Labels_all.zip
python path\to\potsdam.py path\to\potsdam
# ref: https://github.com/open-mmlab/mmsegmentation/blob/main/tools/dataset_converters/potsdam.py
import argparse
import glob
import math
import os
import os.path as osp
import tempfile
import zipfile
from tqdm import tqdm

import cv2
import numpy as np

def get_parser():
    parser = argparse.ArgumentParser(
        description='Convert potsdam dataset to mmsegmentation format')
    parser.add_argument('dataset_path', help='potsdam folder path')
    parser.add_argument('--tmp_dir', help='path of the temporary directory')
    parser.add_argument('-o', '--out_dir', help='output path')
    parser.add_argument(
        '--clip_size',
        type=int,
        help='clipped size of image after preparation',
        default=512)
    parser.add_argument(
        '--stride_size',
        type=int,
        help='stride of clipping original images',
        default=256)
    return parser

def clip_big_image(image_path, clip_save_dir, args, to_label=False):
    '''
    The original images of the Potsdam dataset are very large, so they are
    pre-processed. Given a fixed clip size and stride size, clipped images
    are generated over the intersection of width and height. For example,
    given one 5120 x 5120 original image with clip size 512 and stride
    size 256, it would generate 20x20 = 400 images of size 512x512.
    '''
    image = cv2.imread(image_path)  # get a BGR/BRIR image
    image = np.array(image)

    h, w, c = image.shape           # 6000, 6000, 3
    clip_size = args.clip_size      # 512
    stride_size = args.stride_size  # 256

    num_rows = math.ceil((h - clip_size) / stride_size) \
        if math.ceil((h - clip_size) / stride_size) * stride_size + clip_size >= h \
        else math.ceil((h - clip_size) / stride_size) + 1
    num_cols = math.ceil((w - clip_size) / stride_size) \
        if math.ceil((w - clip_size) / stride_size) * stride_size + clip_size >= w \
        else math.ceil((w - clip_size) / stride_size) + 1

    x, y = np.meshgrid(np.arange(num_cols + 1), np.arange(num_rows + 1))
    xmin = x * clip_size
    ymin = y * clip_size

    xmin = xmin.ravel()
    ymin = ymin.ravel()
    xmin_offset = np.where(xmin + clip_size > w, w - xmin - clip_size, np.zeros_like(xmin))
    ymin_offset = np.where(ymin + clip_size > h, h - ymin - clip_size, np.zeros_like(ymin))
    boxes = np.stack([
        xmin + xmin_offset, ymin + ymin_offset,
        np.minimum(xmin + clip_size, w),
        np.minimum(ymin + clip_size, h)
    ], axis=1)

    if to_label:
        '''This is the normal RGB color map
              R   G   B  |   B   G   R
        0: [255 255 255] | [255 255 255] : impervious surfaces
        1: [  0   0 255] | [255   0   0] : building
        2: [  0 255 255] | [255 255   0] : low vegetation
        3: [  0 255   0] | [  0 255   0] : tree
        4: [255 255   0] | [  0 255 255] : car
        5: [255   0   0] | [  0   0 255] : clutter/background
        '''
        # Note this is a BGR color map rather than RGB, because cv2
        # converts RGB to BGR in imread() and BGR back to RGB in imwrite().
        color_map = np.array([[255, 255, 255], [255, 0, 0], [255, 255, 0],
                              [0, 255, 0], [0, 255, 255], [0, 0, 255]])

        flatten_v = np.matmul(image.reshape(-1, c), np.array([2, 3, 4]).reshape(3, 1))
        out = np.zeros_like(flatten_v)
        for idx, class_color in enumerate(color_map):
            value_idx = np.matmul(class_color, np.array([2, 3, 4]).reshape(3, 1))
            out[flatten_v == value_idx] = idx
        image = out.reshape(h, w)

    for box in boxes:
        start_x, start_y, end_x, end_y = box
        clipped_image = image[start_y:end_y, start_x:end_x] if to_label else image[start_y:end_y, start_x:end_x, :]
        idx_i, idx_j = osp.basename(image_path).split('_')[2:4]

        cv2.imwrite(img=clipped_image.astype(np.uint8),
                    filename=osp.join(clip_save_dir, f'{idx_i}_{idx_j}_{start_x}_{start_y}_{end_x}_{end_y}.png'))

def main(args):

    splits = {
        'train': [
            '2_11', '2_12', '3_10', '3_11', '3_12', '4_10', '4_11',
            '5_10', '5_11', '5_12', '6_8', '6_9', '6_10',  # '4_12', '6_7',
            '6_11', '6_12', '7_7', '7_8', '7_9', '7_10', '7_11', '7_12'
        ],  # there is a problem with the label images for 4_12 and 6_7, so we discard them
        'val': [
            '2_10'
        ],
        'test': [
            '2_13', '2_14', '3_13', '3_14', '4_13', '4_14', '4_15', '5_13',
            '5_14', '5_15', '6_13', '6_14', '6_15', '7_13'
        ]
    }

    dataset_path = args.dataset_path
    out_dir = osp.join('data', 'potsdam') if args.out_dir is None else args.out_dir

    print('Making directories...')
    if not osp.exists(osp.join(out_dir, 'img_dir', 'train')):
        os.makedirs(osp.join(out_dir, 'img_dir', 'train'))
    if not osp.exists(osp.join(out_dir, 'img_dir', 'val')):
        os.makedirs(osp.join(out_dir, 'img_dir', 'val'))
    if not osp.exists(osp.join(out_dir, 'img_dir', 'test')):
        os.makedirs(osp.join(out_dir, 'img_dir', 'test'))

    if not osp.exists(osp.join(out_dir, 'ann_dir', 'train')):
        os.makedirs(osp.join(out_dir, 'ann_dir', 'train'))
    if not osp.exists(osp.join(out_dir, 'ann_dir', 'val')):
        os.makedirs(osp.join(out_dir, 'ann_dir', 'val'))
    if not osp.exists(osp.join(out_dir, 'ann_dir', 'test')):
        os.makedirs(osp.join(out_dir, 'ann_dir', 'test'))

    zipp_list = glob.glob(os.path.join(dataset_path, '*.zip'))
    print('Found the data', zipp_list)

    for zipp in zipp_list:
        with tempfile.TemporaryDirectory(dir=args.tmp_dir) as tmp_dir:  # tmp_dir changes in every loop if dir=None
            # unzip
            print('Unzipping to the temporary folder...')
            zip_file = zipfile.ZipFile(zipp)  # open zipfile
            zip_file.extractall(tmp_dir)      # extract zipfile
            # collect the tif paths
            src_path_list: list = []
            mode, to_label = None, None
            if 'Ortho' in zipp:
                mode, to_label = 'img_dir', False
                src_path_list = glob.glob(os.path.join(os.path.join(tmp_dir, os.listdir(tmp_dir)[0]), '*.tif'))
            elif 'Labels' in zipp:
                mode, to_label = 'ann_dir', True
                src_path_list = glob.glob(os.path.join(tmp_dir, '*.tif'))
            else:
                continue

            prog_bar = tqdm(src_path_list, desc=mode)
            for src_path in prog_bar:
                # e.g: 'top_potsdam_2_10_RGB.tif'.split('_')[2:4] -> ['2', '10']
                idx_i, idx_j = osp.basename(src_path).split('_')[2:4]
                if f'{idx_i}_{idx_j}' in splits['train']:
                    data_type = 'train'
                elif f'{idx_i}_{idx_j}' in splits['val']:
                    data_type = 'val'
                elif f'{idx_i}_{idx_j}' in splits['test']:
                    data_type = 'test'
                else:
                    continue

                dst_dir = osp.join(out_dir, mode, data_type)
                clip_big_image(src_path, dst_dir, args, to_label=to_label)

    print('Removing the temporary files...')
    print('Done!')

if __name__ == '__main__':
    # path/to/potsdam
    # ├── 2_Ortho_RGB.zip
    # └── 5_Labels_all.zip
    parser = get_parser()
    args = parser.parse_args(["/15T-2/zwx/datasets_package/potsdam/",  # path to *.zip
                              "--out_dir", "/15T-2/zwx/datasets/potsdam_512x512",
                              "--tmp_dir", "/15T-2/zwx/temp",
                              "--clip_size", "512", "--stride_size", "512"])
    main(args)

The color_map in mmsegmentation's potsdam.py

See the source code at potsdam.py | github; the code block below excerpts the color_map part.

Note that the color_map in the mmsegmentation source is given in BGR order, exactly the reverse of normal RGB.

# mmsegmentation's color_map is in BGR order
color_map = np.array([[0, 0, 0], [255, 255, 255], [255, 0, 0],
                      [255, 255, 0], [0, 255, 0], [0, 255, 255],
                      [0, 0, 255]])

# the normal order would be RGB
color_map = np.array([[0, 0, 0], [255, 255, 255], [0, 0, 255],
                      [0, 255, 255], [0, 255, 0], [255, 255, 0],
                      [255, 0, 0]])

The reason mmsegmentation annotates the colors in BGR order is presumably that its imread and imwrite methods call cv2's imread and imwrite under the hood, or imitate their design. For how cv2 handles RGB and BGR, see this blog post: cv2如何处理RGB和BGR | 文羊羽

  • mmsegmentation's color map, BGR (reading each triplet naively as RGB gives the apparent class names below)

    0: [  0   0   0] : boundary
    1: [255 255 255] : impervious surfaces
    2: [255   0   0] : background
    3: [255 255   0] : car
    4: [  0 255   0] : tree
    5: [  0 255 255] : low vegetation
    6: [  0   0 255] : building

  • the normal color map, RGB

    0: [  0   0   0] : boundary
    1: [255 255 255] : impervious surfaces
    2: [  0   0 255] : building
    3: [  0 255 255] : low vegetation
    4: [  0 255   0] : tree
    5: [255 255   0] : car
    6: [255   0   0] : clutter/background
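You can check cv2's channel order for yourself. A minimal sketch (the file name red.png is a placeholder): it writes a pure-red pixel with Pillow, then reads it back with cv2.

import cv2
import numpy as np
from PIL import Image

# save a 1x1 pure-red image (RGB) with Pillow
Image.fromarray(np.array([[[255, 0, 0]]], dtype=np.uint8)).save('red.png')

# cv2.imread returns the channels in BGR order
print(cv2.imread('red.png')[0, 0])  # [  0   0 255]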

ann2rgb

from typing import Union
import numpy
import torch

color_map = {
    0: [255, 255, 255],  # white
    1: [  0,   0, 255],  # blue
    2: [  0, 255, 255],  # cyan
    3: [  0, 255,   0],  # green
    4: [255, 255,   0],  # yellow
    5: [255,   0,   0],  # red
}

def ann2rgb(annimg: Union[numpy.ndarray, torch.Tensor]) -> numpy.ndarray:
    """convert an [H, W] annotation mask to an [H, W, 3] rgb image"""
    h, w = annimg.shape
    rgbimg = numpy.zeros(shape=(h, w, 3), dtype=numpy.uint8)
    for idx, rgb in color_map.items():
        rgbimg[annimg == idx] = rgb
    return rgbimg

if __name__ == "__main__":
    # annimg = torch.randint(low=0, high=6, size=(224, 224))
    annimg = numpy.random.randint(low=0, high=6, size=(224, 224))
    rgbimg = ann2rgb(annimg)
    print(rgbimg.shape)

Datasets

import os
import os.path as osp
from typing import Literal, Tuple

from PIL import Image

import torch
from torch.utils.data import Dataset
from torchvision.transforms import v2

transform = v2.Compose([
    # PILToTensor() converts a PIL Image with shape [H, W, C] and type uint8
    # to a torch tensor with shape [C, H, W] and type torch.uint8
    v2.PILToTensor(),
    # ToDtype() converts a uint8 tensor in the range [0, 255] to
    # a float32 tensor in the range [0.0, 1.0]
    v2.ToDtype(torch.float32, scale=True),
    # Normalize() normalizes a tensor with the equation below:
    # output[channel] = (input[channel] - mean[channel]) / std[channel]
    v2.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
])

CLASSES = ('ImSurf', 'Building', 'LowVeg', 'Tree', 'Car', 'Clutter')

'''
datasets
├── potsdam_512x512
│   ├── ann_dir
│   │   ├── test
│   │   ├── train
│   │   └── val
│   └── img_dir
│       ├── test
│       ├── train
│       └── val
└── vaihingen_512x512
    ├── ann_dir
    │   ├── test
    │   ├── train
    │   └── val
    └── img_dir
        ├── test
        ├── train
        └── val
'''

class IsprsDataset(Dataset):
    def __init__(self, data_root, mode: Literal["train", "test", "val"], transform=transform,
                 img_dir='img_dir', ann_dir='ann_dir', img_suffix='.png', ann_suffix='.png', idx=None):
        self.data_root = data_root
        self.img_dir = img_dir
        self.ann_dir = ann_dir
        self.img_suffix = img_suffix
        self.ann_suffix = ann_suffix
        self.transform = transform
        self.mode = mode
        self.name = self.get_name(idx)

    def __len__(self):
        return len(self.name)

    def __getitem__(self, index):
        img, ann = self.load_img_and_ann(index)
        if self.transform is None:
            # PILToTensor() converts a PIL Image with shape [H, W, C] and type uint8
            # to a torch tensor with shape [C, H, W] and type torch.uint8
            self.transform = v2.PILToTensor()

        # apply transform
        img, ann = self.transform(img, ann)
        # convert shape and type from [1, H, W], uint8 to [H, W], long
        ann = ann.squeeze(0).to(torch.long)

        name = self.name[index]
        results = {"name": name, "img": img, "ann": ann}
        return results

    def get_name(self, idx=None):
        img_basename_list = os.listdir(osp.join(self.data_root, self.img_dir, self.mode))
        ann_basename_list = os.listdir(osp.join(self.data_root, self.ann_dir, self.mode))
        if idx is not None:  # e.g. 3_13
            img_basename_list = [basename for basename in img_basename_list if basename.startswith(idx)]
            ann_basename_list = [basename for basename in ann_basename_list if basename.startswith(idx)]
        assert len(img_basename_list) == len(ann_basename_list), "numbers of img and ann don't match"
        name_list = [osp.splitext(basename)[0] for basename in ann_basename_list]
        return name_list

    def load_img_and_ann(self, index) -> Tuple[Image.Image, Image.Image]:
        name = self.name[index]
        img_path = osp.join(self.data_root, self.img_dir, self.mode, name + self.img_suffix)
        ann_path = osp.join(self.data_root, self.ann_dir, self.mode, name + self.ann_suffix)
        img = Image.open(img_path).convert('RGB')  # values in [0, 255]
        ann = Image.open(ann_path).convert('L')    # values in [0, 6)
        return img, ann


if __name__ == "__main__":
    from torch.utils.data import DataLoader

    data_root = "/15T-2/zwx/datasets/potsdam_512x512/"
    # data_root = "/15T-2/zwx/datasets/vaihingen_512x512/"

    trainset = IsprsDataset(data_root=data_root, mode="train")
    trainloader = DataLoader(dataset=trainset, batch_size=8, shuffle=True, num_workers=2)

    for batch in trainloader:
        imgs, anns = batch['img'], batch['ann']
        print(f"imgs: {imgs.shape}, dtype: {imgs.dtype}\n"
              f"anns: {anns.shape}, dtype: {anns.dtype}")
        print(anns.unique())
        break

Concatenate

import glob
import os
import os.path as osp

import cv2
import numpy as np

''' concatenate the split ann patches back into a whole ann image
+----------------------------------------------------------------------------------------+
|                   x+offset, y+offset, x+offset+512, y+offset+512                        |
+-----------------+--------------------+-----+---------------------+---------------------+
| 0_0_512_512     | 512_0_1024_512     | ... | 5120_0_5632_512     | 5488_0_6000_512     |
+-----------------+--------------------+-----+---------------------+---------------------+
| 0_512_512_1024  | 512_512_1024_1024  | ... | 5120_512_5632_1024  | 5488_512_6000_1024  |
+-----------------+--------------------+-----+---------------------+---------------------+
| 0_1024_512_1536 | 512_1024_1024_1536 | ... | 5120_1024_5632_1536 | 5488_1024_6000_1536 |
+-----------------+--------------------+-----+---------------------+---------------------+
| :               | :                  |     | :                   | :                   |
+-----------------+--------------------+-----+---------------------+---------------------+
| 0_5120_512_5632 | 512_5120_1024_5632 | ... | 5120_5120_5632_5632 | 5488_5120_6000_5632 |
+-----------------+--------------------+-----+---------------------+---------------------+
| 0_5488_512_6000 | 512_5488_1024_6000 | ... | 5120_5488_5632_6000 | 5488_5488_6000_6000 |
+-----------------+--------------------+-----+---------------------+---------------------+
'''

# model_list = {"groundtruth", "image"}
# model_list = ["fcn_resnet101", "unet", "deeplabv3_resnet101", "fpn", "farseg", "crfnet"]  # , "msfcn"
model_list = ["transunet", "swinunet", "unetformer", "cmtfnet", "sfanet"]
idx = "3_13"
data_name = f"potsdam_{idx}"
path2out = f"/home/zwx/deeplearning/Segment/images/{data_name}"
if not os.path.exists(path2out):
    os.makedirs(path2out)

for model_name in model_list:
    path2ann_dir = f"/home/zwx/deeplearning/Segment/results/{data_name}/{model_name}"
    ann_list = glob.glob(os.path.join(path2ann_dir, f'{idx}_*.png'))

    out_ann = np.zeros(shape=(6000, 6000, 3), dtype=np.uint8)

    for ann_path in ann_list:
        basename = osp.basename(ann_path)
        filename = osp.splitext(basename)[0]
        # e.g: '2_10_0_0_512_512'.split('_') -> ['2', '10', '0', '0', '512', '512']
        idx_i, idx_j = filename.split('_')[0:2]
        coordinate = filename.split('_')[2:6]
        # str -> int
        coordinate = [int(s) for s in coordinate]
        start_x, start_y, end_x, end_y = coordinate

        #           startx                           endx
        #             |                               |
        # starty ——   +-------------------------------+
        #             |    startx_starty_endx_endy    |
        #   endy ——   +-------------------------------+
        ann = cv2.imread(ann_path)  # the file's RGB is loaded as BGR
        out_ann[start_y:end_y, start_x:end_x, :] = ann

    cv2.imwrite(filename=f"{path2out}/{idx}_{model_name}.png", img=out_ann)  # BGR is written back as RGB
    print(f"saved to {path2out}/{idx}_{model_name}.png")