ISPRS-Potsdam

The Potsdam dataset is an urban semantic segmentation dataset used in the 2D Semantic Labeling Contest - Potsdam.

The dataset can be requested at the challenge homepage. You need to get a package file named Potsdam.zip (size: 13.3 GB); unzip this package to get a folder named Potsdam which contains 10 files, as follows:

Potsdam
├── 1_DSM.rar
├── 1_DSM_normalisation.zip
├── 2_Ortho_RGB.zip # <-- RGB image
├── 3_Ortho_IRRG.zip
├── 4_Ortho_RGBIR.zip
├── 5_Labels_all.zip # <-- no black boundary lines
├── 5_Labels_all_noBoundary.zip # <-- with black boundary lines
├── 5_Labels_for_participants.zip
├── 5_Labels_for_participants_no_Boundary.zip
└── assess_classification_reference_implementation.tgz

dataset_prepare.html#isprs-potsdam | mmsegmentation docs

Of these, only 2_Ortho_RGB.zip and 5_Labels_all.zip are needed:

potsdam
├── 2_Ortho_RGB.zip
└── 5_Labels_all.zip

The images in 2_Ortho_RGB.zip are in tif format, which none of the built-in Windows image tools can open properly. A tif plugin for VS Code, such as TIFF Preview, can display the original tif images of the Potsdam dataset. Note also that the original Potsdam images show some distortion; this is an issue with the dataset itself, not a processing mistake.
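Alternatively, the tif files can be inspected directly from Python. A minimal sketch, assuming Pillow is installed and the path points at an extracted tile (the path is a placeholder):

from PIL import Image

# hypothetical path to one extracted tile
img = Image.open('path/to/2_Ortho_RGB/top_potsdam_2_10_RGB.tif')
print(img.size, img.mode)  # (6000, 6000) RGB
img.show()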

Correspondence between colors and categories

\begin{array}{cclcl}
0 & \fcolorbox{black}{white}{\quad} & [255, 255, 255] & \text{white}  & \text{impervious surfaces}\\
1 & \colorbox{blue}{$\quad$}        & [0, 0, 255]     & \text{blue}   & \text{building}\\
2 & \colorbox{cyan}{$\quad$}        & [0, 255, 255]   & \text{cyan}   & \text{low vegetation}\\
3 & \colorbox{green}{$\quad$}       & [0, 255, 0]     & \text{green}  & \text{tree}\\
4 & \colorbox{yellow}{$\quad$}      & [255, 255, 0]   & \text{yellow} & \text{car}\\
5 & \colorbox{red}{$\quad$}         & [255, 0, 0]     & \text{red}    & \text{clutter/background}\\
\end{array}

'''
0: [255 255 255] : white  : impervious surface
1: [  0   0 255] : blue   : building
2: [  0 255 255] : cyan   : low vegetation
3: [  0 255   0] : green  : tree
4: [255 255   0] : yellow : car
5: [255   0   0] : red    : clutter/background
'''
color_map = np.array([[255, 255, 255], [0, 0, 255], [0, 255, 255],
                      [0, 255, 0], [255, 255, 0], [255, 0, 0]])

Note that if you use 5_Labels_all_noBoundary.zip for the labels, it includes boundary annotations (black), so the corresponding color_map differs:

'''
0: [  0   0   0] : black  : boundary
1: [255 255 255] : white  : impervious surface
2: [  0   0 255] : blue   : building
3: [  0 255 255] : cyan   : low vegetation
4: [  0 255   0] : green  : tree
5: [255 255   0] : yellow : car
6: [255   0   0] : red    : clutter/background
'''
color_map = np.array([[0, 0, 0], [255, 255, 255], [0, 0, 255],
                      [0, 255, 255], [0, 255, 0], [255, 255, 0],
                      [255, 0, 0]])
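With either color_map, an RGB label image has to be flattened into a single-channel index mask before training. A minimal sketch using exact color matching (the file path is a placeholder; the split script later in this page uses a dot-product hash instead):

import numpy as np
from PIL import Image

color_map = np.array([[255, 255, 255], [0, 0, 255], [0, 255, 255],
                      [0, 255, 0], [255, 255, 0], [255, 0, 0]])

rgb = np.array(Image.open('path/to/label.tif'))  # [H, W, 3] in RGB order
ann = np.zeros(rgb.shape[:2], dtype=np.uint8)    # [H, W] index mask
for idx, color in enumerate(color_map):
    ann[np.all(rgb == color, axis=-1)] = idx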

Configuration

Use the code below to convert the original images (6000×6000 pixels) into patches (512×512 pixels).

The 2_Ortho_RGB.zip file contains 38 images of size 6000×6000:

[figure: overview_potsdam — overview of the 38 Potsdam tiles]

In the default configuration, we assign the training, validation, and test sets as follows: 21 tiles for training, 1 for validation, and 14 for testing.

splits = {
    'train': [
        '2_11', '2_12', '3_10', '3_11', '3_12', '4_10', '4_11',
        '5_10', '5_11', '5_12', '6_8', '6_9', '6_10',  # '4_12', '6_7',
        '6_11', '6_12', '7_7', '7_8', '7_9', '7_10', '7_11', '7_12'
    ],
    'val': [
        '2_10'
    ],
    'test': [
        '2_13', '2_14', '3_13', '3_14', '4_13', '4_14', '4_15', '5_13',
        '5_14', '5_15', '6_13', '6_14', '6_15', '7_13'
    ]
}

There is a problem with the label images for 4_12 and 6_7, so we discard them (see the section on dataset errors below).

Every image is split into 12×12 = 144 patches of size 512×512, giving (38−2)×144 = 5184 = 3024+144+2016 patches in total: 21 images / 3024 patches for training, 1 image / 144 patches for validation, and 14 images / 2016 patches for testing.

For an image of size 6000×6000 and patch size 512, 6000 = 11×512 + 368 = 12×512 − 144, so 6000 is not divisible by 512. We therefore split 6000 over 512-pixel windows as follows, shifting the last window back by 144 pixels (see the sketch after the table below):

x/ymin: [0, 512, 1024, 1536, 2048, 2560, 3072, 3584, 4096, 4608, 5120, 5632]
offset: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -144]
0~512, 512~(2*512), (2*512)~(3*512), ..., (10*512)~(11*512), (11*512-144)~(12*512-144)
0~512, 512~1024, 1024~1536, ..., 5120~5632, 5488~6000
+----------------------------------------------------------------------------------------+
| x+offset, y+offset, x+offset+512, y+offset+512 |
+-----------------+--------------------+-----+---------------------+---------------------+
| 0_0_512_512 | 512_0_1024_512 | ... | 5120_0_5632_512 | 5488_0_6000_512 |
+-----------------+--------------------+-----+---------------------+---------------------+
| 0_512_512_1024 | 512_512_1024_1024 | ... | 5120_512_5632_1024 | 5488_512_6000_1024 |
+-----------------+--------------------+-----+---------------------+---------------------+
| 0_1024_512_1536 | 512_1024_1024_1536 | ... | 5120_1024_5632_1536 | 5488_1024_6000_1536 |
+-----------------+--------------------+-----+---------------------+---------------------+
| : | : | | : | : |
+-----------------+--------------------+-----+---------------------+---------------------+
| 0_5120_512_5632 | 512_5120_1024_5632 | ... | 5120_5120_5632_5632 | 5488_5120_6000_5632 |
+-----------------+--------------------+-----+---------------------+---------------------+
| 0_5488_512_6000 | 512_5488_1024_6000 | ... | 5120_5488_5632_6000 | 5488_5488_6000_6000 |
+-----------------+--------------------+-----+---------------------+---------------------+
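A minimal sketch of this clamped-window computation, in plain numpy with the 6000/512 numbers from above:

import numpy as np

size, patch = 6000, 512
starts = np.arange(0, size, patch)         # [0, 512, ..., 5632]: 12 window starts
starts = np.minimum(starts, size - patch)  # clamp the last start: 5632 -> 5488
print(starts)                              # [   0  512 ... 5120 5488]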

File structure

$ tree -L 3 path/to/datasets
datasets
├── potsdam_512x512
│   ├── ann_dir
│   │   ├── test
│   │   ├── train
│   │   └── val
│   └── img_dir
│       ├── test
│       ├── train
│       └── val
└── vaihingen_512x512
    ├── ann_dir
    │   ├── test
    │   ├── train
    │   └── val
    └── img_dir
        ├── test
        ├── train
        └── val
$ ls -l path/to/potsdam_512x512/img_dir/train/ | grep "^-" | wc -l
3024
$ ls -l path/to/potsdam_512x512/img_dir/test/ | grep "^-" | wc -l
2304
$ ls -l path/to/potsdam_512x512/img_dir/val/ | grep "^-" | wc -l
144

Errors in the Potsdam dataset

The errors come from two label images in the Potsdam dataset itself. The first is top_potsdam_6_7_label.tif, which contains an erroneous pixel value:

from PIL import Image
import numpy as np

mask_path = 'path/to/top_potsdam_6_7_label.tif'

mask = Image.open(mask_path)
mask = np.array(mask).reshape(-1,3)
values, counts = np.unique(mask, return_counts=True, axis=0)
print(values, counts)

Counting the pixel values of top_potsdam_6_7_label.tif with Python prints the following:

# values
[[  0   0 255]
 [  0 255   0]
 [  0 255 255]
 [252 255   0]   # <-- Error
 [255   0   0]
 [255 255   0]
 [255 255 255]]
# counts
[ 4857912  5669942 19962121   246304   797467     6749  4459505]

The output contains two lists: the upper one holds the RGB color values, the lower one the pixel count for each color. Note that 6_7 contains an abnormal RGB value, [252, 255, 0], whose first component is 252 rather than 255. This value corrupts the labels when pixels are assigned class indices, so simply changing the 252 to 255 fixes it.
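A minimal repair sketch, assuming Pillow and a placeholder path, replacing the bad value in place:

from PIL import Image
import numpy as np

mask_path = 'path/to/top_potsdam_6_7_label.tif'
mask = np.array(Image.open(mask_path))

bad = np.all(mask == [252, 255, 0], axis=-1)  # pixels with the erroneous value
mask[bad] = [255, 255, 0]                     # rewrite them as proper yellow (car)
Image.fromarray(mask).save(mask_path)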

The second faulty label is top_potsdam_4_12_label.tif. This image is abnormal; its pixel annotation values are thoroughly scrambled:

from PIL import Image
import numpy as np

mask_path = '../datasets/original_potsdam/label/top_potsdam_4_12_label.tif'

mask = Image.open(mask_path)
mask = np.array(mask).reshape(-1,3)
values, counts = np.unique(mask, return_counts=True, axis=0)
print(len(values)) # 24850 <-- Not Six Classes

As the output shows, the image contains 24850 distinct RGB values. Its pixel labels are hopelessly mixed, so we discard it.

Split Code

Prepare Dataset: ISPRS-Potsdam | mmsegmentation doc

tools/dataset_converters/potsdam.py | mmsegmentation github

# potsdam
# ├── 2_Ortho_RGB.zip
# ├── 5_Labels_all.zip
python path\to\potsdam.py path\to\potsdam
# ref: https://github.com/open-mmlab/mmsegmentation/blob/main/tools/dataset_converters/potsdam.py
import argparse
import glob
import math
import os
import os.path as osp
import tempfile
import zipfile
from tqdm import tqdm

import cv2
import numpy as np

def get_parser():
    parser = argparse.ArgumentParser(
        description='Convert potsdam dataset to mmsegmentation format')
    parser.add_argument('dataset_path', help='potsdam folder path')
    parser.add_argument('--tmp_dir', help='path of the temporary directory')
    parser.add_argument('-o', '--out_dir', help='output path')
    parser.add_argument(
        '--clip_size',
        type=int,
        help='clipped size of image after preparation',
        default=512)
    parser.add_argument(
        '--stride_size',
        type=int,
        help='stride of clipping original images',
        default=256)
    return parser

def clip_big_image(image_path, clip_save_dir, args, to_label=False):
    '''
    The original images of the Potsdam dataset are very large, so they are
    pre-processed. Given a fixed clip size and stride size, clipped images
    are generated over the intersection of width and height. For example,
    given one 5120 x 5120 original image with clip size 512 and stride
    size 256, it would generate 20x20 = 400 images of size 512x512.
    '''
    image = cv2.imread(image_path)  # get a BGR/BRIR image
    image = np.array(image)

    h, w, c = image.shape           # 6000, 6000, 3
    clip_size = args.clip_size      # 512
    stride_size = args.stride_size  # 256

    num_rows = math.ceil((h - clip_size) / stride_size) \
        if math.ceil((h - clip_size) / stride_size) * stride_size + clip_size >= h \
        else math.ceil((h - clip_size) / stride_size) + 1
    num_cols = math.ceil((w - clip_size) / stride_size) \
        if math.ceil((w - clip_size) / stride_size) * stride_size + clip_size >= w \
        else math.ceil((w - clip_size) / stride_size) + 1

    x, y = np.meshgrid(np.arange(num_cols + 1), np.arange(num_rows + 1))
    xmin = x * clip_size
    ymin = y * clip_size

    xmin = xmin.ravel()
    ymin = ymin.ravel()
    xmin_offset = np.where(xmin + clip_size > w, w - xmin - clip_size, np.zeros_like(xmin))
    ymin_offset = np.where(ymin + clip_size > h, h - ymin - clip_size, np.zeros_like(ymin))
    boxes = np.stack([
        xmin + xmin_offset, ymin + ymin_offset,
        np.minimum(xmin + clip_size, w),
        np.minimum(ymin + clip_size, h)
    ], axis=1)

    if to_label:
        '''This is the normal RGB color map
              R   G   B  |   B   G   R
        0: [255 255 255] | [255 255 255] : impervious surfaces
        1: [  0   0 255] | [255   0   0] : building
        2: [  0 255 255] | [255 255   0] : low vegetation
        3: [  0 255   0] | [  0 255   0] : tree
        4: [255 255   0] | [  0 255 255] : car
        5: [255   0   0] | [  0   0 255] : clutter/background
        '''
        # Note this is a BGR color map rather than RGB, because cv2
        # converts RGB to BGR in imread() and BGR back to RGB in imwrite().
        color_map = np.array([[255, 255, 255], [255, 0, 0], [255, 255, 0],
                              [0, 255, 0], [0, 255, 255], [0, 0, 255]])

        flatten_v = np.matmul(image.reshape(-1, c), np.array([2, 3, 4]).reshape(3, 1))
        out = np.zeros_like(flatten_v)
        for idx, class_color in enumerate(color_map):
            value_idx = np.matmul(class_color, np.array([2, 3, 4]).reshape(3, 1))
            out[flatten_v == value_idx] = idx
        image = out.reshape(h, w)

    for box in boxes:
        start_x, start_y, end_x, end_y = box
        clipped_image = image[start_y:end_y, start_x:end_x] if to_label else image[start_y:end_y, start_x:end_x, :]
        idx_i, idx_j = osp.basename(image_path).split('_')[2:4]

        cv2.imwrite(img=clipped_image.astype(np.uint8),
                    filename=osp.join(clip_save_dir, f'{idx_i}_{idx_j}_{start_x}_{start_y}_{end_x}_{end_y}.png'))

def main(args):

    splits = {
        'train': [
            '2_11', '2_12', '3_10', '3_11', '3_12', '4_10', '4_11',
            '5_10', '5_11', '5_12', '6_8', '6_9', '6_10',  # '4_12', '6_7',
            '6_11', '6_12', '7_7', '7_8', '7_9', '7_10', '7_11', '7_12'
        ],  # there is a problem with the label images for 4_12 and 6_7, so we discard them
        'val': [
            '2_10'
        ],
        'test': [
            '2_13', '2_14', '3_13', '3_14', '4_13', '4_14', '4_15', '5_13',
            '5_14', '5_15', '6_13', '6_14', '6_15', '7_13'
        ]
    }

    dataset_path = args.dataset_path
    out_dir = osp.join('data', 'potsdam') if args.out_dir is None else args.out_dir

    print('Making directories...')
    if not osp.exists(osp.join(out_dir, 'img_dir', 'train')):
        os.makedirs(osp.join(out_dir, 'img_dir', 'train'))
    if not osp.exists(osp.join(out_dir, 'img_dir', 'val')):
        os.makedirs(osp.join(out_dir, 'img_dir', 'val'))
    if not osp.exists(osp.join(out_dir, 'img_dir', 'test')):
        os.makedirs(osp.join(out_dir, 'img_dir', 'test'))

    if not osp.exists(osp.join(out_dir, 'ann_dir', 'train')):
        os.makedirs(osp.join(out_dir, 'ann_dir', 'train'))
    if not osp.exists(osp.join(out_dir, 'ann_dir', 'val')):
        os.makedirs(osp.join(out_dir, 'ann_dir', 'val'))
    if not osp.exists(osp.join(out_dir, 'ann_dir', 'test')):
        os.makedirs(osp.join(out_dir, 'ann_dir', 'test'))

    zipp_list = glob.glob(os.path.join(dataset_path, '*.zip'))
    print('Found the data', zipp_list)

    for zipp in zipp_list:
        with tempfile.TemporaryDirectory(dir=args.tmp_dir) as tmp_dir:  # tmp_dir changes in every loop if dir=None
            # unzip
            print('Unzipping to the temporary folder...')
            zip_file = zipfile.ZipFile(zipp)  # open zipfile
            zip_file.extractall(tmp_dir)      # extract zipfile
            # collect the tif paths
            src_path_list: list = []
            mode, to_label = None, None
            if 'Ortho' in zipp:
                mode, to_label = 'img_dir', False
                src_path_list = glob.glob(os.path.join(os.path.join(tmp_dir, os.listdir(tmp_dir)[0]), '*.tif'))
            elif 'Labels' in zipp:
                mode, to_label = 'ann_dir', True
                src_path_list = glob.glob(os.path.join(tmp_dir, '*.tif'))
            else:
                continue

            prog_bar = tqdm(src_path_list, desc=mode)
            for src_path in prog_bar:
                # e.g: 'top_potsdam_2_10_RGB.tif'.split('_')[2:4] -> ['2', '10']
                idx_i, idx_j = osp.basename(src_path).split('_')[2:4]
                if f'{idx_i}_{idx_j}' in splits['train']:
                    data_type = 'train'
                elif f'{idx_i}_{idx_j}' in splits['val']:
                    data_type = 'val'
                elif f'{idx_i}_{idx_j}' in splits['test']:
                    data_type = 'test'
                else:
                    continue

                dst_dir = osp.join(out_dir, mode, data_type)
                clip_big_image(src_path, dst_dir, args, to_label=to_label)

    print('Removing the temporary files...')
    print('Done!')

if __name__ == '__main__':
    # path/to/potsdam
    # ├── 2_Ortho_RGB.zip
    # └── 5_Labels_all.zip
    parser = get_parser()
    args = parser.parse_args(["/15T-2/zwx/datasets_package/potsdam/",  # path to *.zip
                              "--out_dir", "/15T-2/zwx/datasets/potsdam_512x512",
                              "--tmp_dir", "/15T-2/zwx/temp",
                              "--clip_size", "512", "--stride_size", "512"])
    main(args)

The color_map in mmsegmentation's potsdam.py

See the source code at potsdam.py | github; the code block below excerpts the color_map part.

Note that the color_map in the mmsegmentation source is given in BGR order, exactly the reverse of normal RGB.

# mmsegmentation's color_map is in BGR order
color_map = np.array([[0, 0, 0], [255, 255, 255], [255, 0, 0],
                      [255, 255, 0], [0, 255, 0], [0, 255, 255],
                      [0, 0, 255]])

# the normal order would be RGB
color_map = np.array([[0, 0, 0], [255, 255, 255], [0, 0, 255],
                      [0, 255, 255], [0, 255, 0], [255, 255, 0],
                      [255, 0, 0]])

The reason mmsegmentation annotates the colors in BGR order is presumably that its imread and imwrite methods call cv2's imread and imwrite under the hood, or imitate their design. For how cv2 handles RGB and BGR, see this blog post: cv2如何处理RGB和BGR | 文羊羽

  • mmsegmentation's color map, BGR (reading each triplet naively as RGB gives the apparent class names below)

    0: [  0   0   0] : boundary
    1: [255 255 255] : impervious surfaces
    2: [255   0   0] : background
    3: [255 255   0] : car
    4: [  0 255   0] : tree
    5: [  0 255 255] : low vegetation
    6: [  0   0 255] : building

  • the normal color map, RGB

    0: [  0   0   0] : boundary
    1: [255 255 255] : impervious surfaces
    2: [  0   0 255] : building
    3: [  0 255 255] : low vegetation
    4: [  0 255   0] : tree
    5: [255 255   0] : car
    6: [255   0   0] : clutter/background
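You can check cv2's channel order for yourself. A minimal sketch (the file name red.png is a placeholder): it writes a pure-red pixel with Pillow, then reads it back with cv2.

import cv2
import numpy as np
from PIL import Image

# save a 1x1 pure-red image (RGB) with Pillow
Image.fromarray(np.array([[[255, 0, 0]]], dtype=np.uint8)).save('red.png')

# cv2.imread returns the channels in BGR order
print(cv2.imread('red.png')[0, 0])  # [  0   0 255]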

ann2rgb

from typing import Union
import numpy
import torch

color_map = {
    0: [255, 255, 255],  # white
    1: [  0,   0, 255],  # blue
    2: [  0, 255, 255],  # cyan
    3: [  0, 255,   0],  # green
    4: [255, 255,   0],  # yellow
    5: [255,   0,   0],  # red
}

def ann2rgb(annimg: Union[numpy.ndarray, torch.Tensor]) -> numpy.ndarray:
    """convert an [H, W] annotation mask to an [H, W, 3] rgb image"""
    h, w = annimg.shape
    rgbimg = numpy.zeros(shape=(h, w, 3), dtype=numpy.uint8)
    for idx, rgb in color_map.items():
        rgbimg[annimg == idx] = rgb
    return rgbimg

if __name__ == "__main__":
    # annimg = torch.randint(low=0, high=6, size=(224, 224))
    annimg = numpy.random.randint(low=0, high=6, size=(224, 224))
    rgbimg = ann2rgb(annimg)
    print(rgbimg.shape)

Datasets

import os
import os.path as osp
from typing import Literal, Tuple

from PIL import Image

import torch
from torch.utils.data import Dataset
from torchvision.transforms import v2

transform = v2.Compose([
    # PILToTensor() converts a PIL Image with shape [H, W, C] and type uint8
    # to a torch tensor with shape [C, H, W] and type torch.uint8
    v2.PILToTensor(),
    # ToDtype() converts a uint8 tensor in the range [0, 255] to
    # a float32 tensor in the range [0.0, 1.0]
    v2.ToDtype(torch.float32, scale=True),
    # Normalize() normalizes a tensor with the equation below:
    # output[channel] = (input[channel] - mean[channel]) / std[channel]
    v2.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
])

CLASSES = ('ImSurf', 'Building', 'LowVeg', 'Tree', 'Car', 'Clutter')

'''
datasets
├── potsdam_512x512
│   ├── ann_dir
│   │   ├── test
│   │   ├── train
│   │   └── val
│   └── img_dir
│       ├── test
│       ├── train
│       └── val
└── vaihingen_512x512
    ├── ann_dir
    │   ├── test
    │   ├── train
    │   └── val
    └── img_dir
        ├── test
        ├── train
        └── val
'''

class IsprsDataset(Dataset):
    def __init__(self, data_root, mode: Literal["train", "test", "val"], transform=transform,
                 img_dir='img_dir', ann_dir='ann_dir', img_suffix='.png', ann_suffix='.png', idx=None):
        self.data_root = data_root
        self.img_dir = img_dir
        self.ann_dir = ann_dir
        self.img_suffix = img_suffix
        self.ann_suffix = ann_suffix
        self.transform = transform
        self.mode = mode
        self.name = self.get_name(idx)

    def __len__(self):
        return len(self.name)

    def __getitem__(self, index):
        img, ann = self.load_img_and_ann(index)
        if self.transform is None:
            # PILToTensor() converts a PIL Image with shape [H, W, C] and type uint8
            # to a torch tensor with shape [C, H, W] and type torch.uint8
            self.transform = v2.PILToTensor()

        # apply transform
        img, ann = self.transform(img, ann)
        # convert shape and type from [1, H, W], uint8 to [H, W], long
        ann = ann.squeeze(0).to(torch.long)

        name = self.name[index]
        results = {"name": name, "img": img, "ann": ann}
        return results

    def get_name(self, idx=None):
        img_basename_list = os.listdir(osp.join(self.data_root, self.img_dir, self.mode))
        ann_basename_list = os.listdir(osp.join(self.data_root, self.ann_dir, self.mode))
        if idx is not None:  # e.g. 3_13
            img_basename_list = [basename for basename in img_basename_list if basename.startswith(idx)]
            ann_basename_list = [basename for basename in ann_basename_list if basename.startswith(idx)]
        assert len(img_basename_list) == len(ann_basename_list), "numbers of img and ann don't match"
        name_list = [osp.splitext(basename)[0] for basename in ann_basename_list]
        return name_list

    def load_img_and_ann(self, index) -> Tuple[Image.Image, Image.Image]:
        name = self.name[index]
        img_path = osp.join(self.data_root, self.img_dir, self.mode, name + self.img_suffix)
        ann_path = osp.join(self.data_root, self.ann_dir, self.mode, name + self.ann_suffix)
        img = Image.open(img_path).convert('RGB')  # values in [0, 255]
        ann = Image.open(ann_path).convert('L')    # values in [0, 6)
        return img, ann


if __name__ == "__main__":
    from torch.utils.data import DataLoader

    data_root = "/15T-2/zwx/datasets/potsdam_512x512/"
    # data_root = "/15T-2/zwx/datasets/vaihingen_512x512/"

    trainset = IsprsDataset(data_root=data_root, mode="train")
    trainloader = DataLoader(dataset=trainset, batch_size=8, shuffle=True, num_workers=2)

    for batch in trainloader:
        imgs, anns = batch['img'], batch['ann']
        print(f"imgs: {imgs.shape}, dtype: {imgs.dtype}\n"
              f"anns: {anns.shape}, dtype: {anns.dtype}")
        print(anns.unique())
        break

Concatenate

import glob
import os
import os.path as osp

import cv2
import numpy as np

''' concatenate the split ann patches back into a whole ann image
+----------------------------------------------------------------------------------------+
|                   x+offset, y+offset, x+offset+512, y+offset+512                        |
+-----------------+--------------------+-----+---------------------+---------------------+
| 0_0_512_512     | 512_0_1024_512     | ... | 5120_0_5632_512     | 5488_0_6000_512     |
+-----------------+--------------------+-----+---------------------+---------------------+
| 0_512_512_1024  | 512_512_1024_1024  | ... | 5120_512_5632_1024  | 5488_512_6000_1024  |
+-----------------+--------------------+-----+---------------------+---------------------+
| 0_1024_512_1536 | 512_1024_1024_1536 | ... | 5120_1024_5632_1536 | 5488_1024_6000_1536 |
+-----------------+--------------------+-----+---------------------+---------------------+
| :               | :                  |     | :                   | :                   |
+-----------------+--------------------+-----+---------------------+---------------------+
| 0_5120_512_5632 | 512_5120_1024_5632 | ... | 5120_5120_5632_5632 | 5488_5120_6000_5632 |
+-----------------+--------------------+-----+---------------------+---------------------+
| 0_5488_512_6000 | 512_5488_1024_6000 | ... | 5120_5488_5632_6000 | 5488_5488_6000_6000 |
+-----------------+--------------------+-----+---------------------+---------------------+
'''

# model_list = {"groundtruth", "image"}
# model_list = ["fcn_resnet101", "unet", "deeplabv3_resnet101", "fpn", "farseg", "crfnet"]  # , "msfcn"
model_list = ["transunet", "swinunet", "unetformer", "cmtfnet", "sfanet"]
idx = "3_13"
data_name = f"potsdam_{idx}"
path2out = f"/home/zwx/deeplearning/Segment/images/{data_name}"
if not os.path.exists(path2out):
    os.makedirs(path2out)

for model_name in model_list:
    path2ann_dir = f"/home/zwx/deeplearning/Segment/results/{data_name}/{model_name}"
    ann_list = glob.glob(os.path.join(path2ann_dir, f'{idx}_*.png'))

    out_ann = np.zeros(shape=(6000, 6000, 3), dtype=np.uint8)

    for ann_path in ann_list:
        basename = osp.basename(ann_path)
        filename = osp.splitext(basename)[0]
        # e.g: '2_10_0_0_512_512'.split('_') -> ['2', '10', '0', '0', '512', '512']
        idx_i, idx_j = filename.split('_')[0:2]
        coordinate = filename.split('_')[2:6]
        # str -> int
        coordinate = [int(s) for s in coordinate]
        start_x, start_y, end_x, end_y = coordinate

        #           startx                           endx
        #             |                               |
        # starty ——   +-------------------------------+
        #             |    startx_starty_endx_endy    |
        #   endy ——   +-------------------------------+
        ann = cv2.imread(ann_path)  # the file's RGB is loaded as BGR
        out_ann[start_y:end_y, start_x:end_x, :] = ann

    cv2.imwrite(filename=f"{path2out}/{idx}_{model_name}.png", img=out_ann)  # BGR is written back as RGB
    print(f"saved to {path2out}/{idx}_{model_name}.png")