Vaihingen Dataset

ISPRS-Vaihingen

The Vaihingen dataset is an urban semantic segmentation dataset used in the 2D Semantic Labeling Contest - Vaihingen.

The dataset can be requested at the challenge homepage. You need to get a package file named Vaihingen.zip (size: 14.9 GB), and unzip this package to get a folder named Vaihingen which contains 14 files as follows:

Vaihingen
├── 3DLabeling
├── ALS
├── DSM
├── Images
├── Ortho
├── Reference_3d_reconstruction
├── Reference_object_detection
├── dsm_09cm_matching_area11.tif
├── ISPRS_semantic_labeling_Vaihingen_ground_truth_eroded_for_participants.zip
├── ISPRS_semantic_labeling_Vaihingen.zip # <-- original image
├── ISPRS_semantic_labeling_Vaihingen_ground_truth_COMPLETE.zip # <-- ground truth without black boundary line
├── ISPRS_semantic_labeling_Vaihingen_ground_truth_eroded_COMPLETE.zip # <-- ground truth with black boundary line
├── Overview_Vaihingen_DMC.pdf
└── Vaihingen_dsm_tiles_geoinfo.zip

dataset_prepare.html#isprs-vaihingen | mmsegmentation docs

Of these, we only need two files: ISPRS_semantic_labeling_Vaihingen.zip and ISPRS_semantic_labeling_Vaihingen_ground_truth_COMPLETE.zip.


vaihingen
├── ISPRS_semantic_labeling_Vaihingen.zip # <-- image
└── ISPRS_semantic_labeling_Vaihingen_ground_truth_COMPLETE.zip # <-- label

ISPRS_semantic_labeling_Vaihingen.zip contains the original IRRG images:

ISPRS_semantic_labeling_Vaihingen.zip
├── dsm
├── gts_for_participants
└── top # <-- IRRG image

There are two kinds of labels: ground truth without the eroded black boundary, and ground truth with it. Choose whichever suits your needs; this article uses the ground truth without the black boundary, i.e. ISPRS_semantic_labeling_Vaihingen_ground_truth_COMPLETE.zip.

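The images in the dataset are in tif format, which the image viewers bundled with Windows may fail to open. A tif extension for vscode, such as TIFF Preview, can display the original tif images.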

Correspondence between colors and categories

$\begin{array}{cclcl} 0 & \colorbox{white}{$\quad$} & [255, 255, 255] & \text{white} & \text{impervious surfaces}\\ 1 & \colorbox{blue}{$\quad$} & [0, 0, 255] & \text{blue} & \text{building}\\ 2 & \colorbox{cyan}{$\quad$} & [0, 255, 255] & \text{cyan} & \text{low vegetation}\\ 3 & \colorbox{green}{$\quad$} & [0, 255, 0] & \text{green} & \text{tree}\\ 4 & \colorbox{yellow}{$\quad$} & [255, 255, 0] & \text{yellow} & \text{car}\\ 5 & \colorbox{red}{$\quad$} & [255, 0, 0] & \text{red} & \text{clutter/background}\\ \end{array}$

'''
0: [255 255 255] : white  : impervious surface
1: [  0   0 255] : blue   : building
2: [  0 255 255] : cyan   : low vegetation
3: [  0 255   0] : green  : tree
4: [255 255   0] : yellow : car
5: [255   0   0] : red    : clutter/background
'''
color_map = np.array([[255, 255, 255], [  0,   0, 255], [  0, 255, 255], 
                      [  0, 255,   0], [255, 255,   0], [255,   0,   0]])
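
To show how this table is used, here is a minimal sketch that turns an RGB label image into a class-index mask. The function name `rgb2ann` and the `ignore_index` value of 255 are assumptions made for illustration, not part of the dataset tooling, and the input is assumed to already be in RGB channel order.

```python
import numpy as np

# RGB color map from the table above (row index = class id).
color_map = np.array([[255, 255, 255], [0, 0, 255], [0, 255, 255],
                      [0, 255, 0], [255, 255, 0], [255, 0, 0]], dtype=np.uint8)


def rgb2ann(rgbimg: np.ndarray, ignore_index: int = 255) -> np.ndarray:
    """Convert an [H, W, 3] RGB label image to an [H, W] class-index mask.

    Pixels whose color is not in color_map receive ignore_index
    (a hypothetical choice made for this sketch).
    """
    h, w, _ = rgbimg.shape
    ann = np.full((h, w), ignore_index, dtype=np.uint8)
    for idx, color in enumerate(color_map):
        # A pixel belongs to a class only if all three channels match.
        ann[np.all(rgbimg == color, axis=-1)] = idx
    return ann


if __name__ == "__main__":
    demo = np.array([[[255, 255, 255], [0, 0, 255]],
                     [[255, 255, 0], [1, 2, 3]]], dtype=np.uint8)
    print(rgb2ann(demo))
```

Comparing all three channels is slower than a hashed lookup on large images, but it cannot confuse two colors and it makes unknown colors visible instead of silently mapping them to class 0.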

Note that if you use ISPRS_semantic_labeling_Vaihingen_ground_truth_eroded_COMPLETE.zip as the labels, it includes boundary annotations, so the corresponding color_map is different:

'''
0: [  0   0   0] : black  : boundary
1: [255 255 255] : white  : impervious surface
2: [  0   0 255] : blue   : building
3: [  0 255 255] : cyan   : low vegetation
4: [  0 255   0] : green  : tree
5: [255 255   0] : yellow : car
6: [255   0   0] : red    : clutter/background
'''
color_map = np.array([[0, 0, 0], [255, 255, 255], [0, 0, 255],
                      [0, 255, 255], [0, 255, 0], [255, 255, 0],
                      [255, 0, 0]])
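
If the eroded labels are used, one possible way to reconcile the 7-entry map with the 6-class map is to treat the boundary as an ignored region and shift the remaining class ids down by one. The sketch below illustrates that idea only; the function name `drop_boundary` and the ignore value 255 are assumptions, not something the dataset or this article prescribes.

```python
import numpy as np


def drop_boundary(ann: np.ndarray, ignore_index: int = 255) -> np.ndarray:
    """Map 7-class ids (0 = boundary) to 6-class ids (0 = impervious surface).

    Boundary pixels become ignore_index; every other id is shifted down by one.
    """
    out = ann.astype(np.int64) - 1   # 1..6 -> 0..5, boundary 0 -> -1
    out[ann == 0] = ignore_index     # overwrite the boundary pixels
    return out.astype(np.uint8)


if __name__ == "__main__":
    print(drop_boundary(np.array([[0, 1], [6, 3]])))
```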

Configuration

Use the code below to cut the original images into patches of 512×512 pixels.

The top folder of ISPRS_semantic_labeling_Vaihingen.zip contains 33 images of different sizes.

In the default configuration, we assign the images to the training, validation, and test sets as follows: 15 for training, 1 for validation, and 17 for testing.

splits = {
    'train': [
        'area1', 'area3', 'area5', 'area7', 'area11', 'area13', 'area15', 'area17',
        'area21', 'area23', 'area26', 'area28', 'area32', 'area34', 'area37'
    ], # 15 for training
    'val': [
        'area30'
    ], # 1 for validation
    'test': [
        'area2', 'area4', 'area6', 'area8', 'area10', 'area12', 'area14', 'area16', 'area20',
        'area22', 'area24', 'area27', 'area29', 'area31', 'area33', 'area35', 'area38'
    ], # 17 for testing
}
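
As a quick sanity check on this split, the sketch below confirms that the three lists are disjoint and add up to 33 areas, and maps a file name to its split. The helper name `split_of` is hypothetical, and the file name pattern `top_mosaic_09cm_area30.tif` is an assumption chosen so that the area id is the fourth underscore-separated field.

```python
import os.path as osp

splits = {
    'train': ['area1', 'area3', 'area5', 'area7', 'area11', 'area13', 'area15',
              'area17', 'area21', 'area23', 'area26', 'area28', 'area32',
              'area34', 'area37'],
    'val': ['area30'],
    'test': ['area2', 'area4', 'area6', 'area8', 'area10', 'area12', 'area14',
             'area16', 'area20', 'area22', 'area24', 'area27', 'area29',
             'area31', 'area33', 'area35', 'area38'],
}


def split_of(filename: str):
    """Return 'train', 'val' or 'test' for a tile file name, else None."""
    stem = osp.splitext(osp.basename(filename))[0]  # drop the extension
    area = stem.split('_')[3]                       # assumed: 4th field is the area id
    for name, areas in splits.items():
        if area in areas:
            return name
    return None


# The three lists must not overlap and must cover 33 areas in total.
all_areas = [a for areas in splits.values() for a in areas]
assert len(all_areas) == len(set(all_areas)) == 33

if __name__ == "__main__":
    print(split_of('top_mosaic_09cm_area30.tif'))
```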

File structure

$ tree datasets -L 3
datasets
├── potsdam_512x512
│   ├── ann_dir
│   │   ├── test
│   │   ├── train
│   │   └── val
│   └── img_dir
│       ├── test
│       ├── train
│       └── val
└── vaihingen_512x512
    ├── ann_dir
    │   ├── test
    │   ├── train
    │   └── val
    └── img_dir
        ├── test
        ├── train
        └── val

$ ls -l path/to/vaihingen_512x512/img_dir/train/ | grep "^-" | wc -l
320
$ ls -l path/to/vaihingen_512x512/img_dir/test/ | grep "^-" | wc -l
398
$ ls -l path/to/vaihingen_512x512/img_dir/val/ | grep "^-" | wc -l
24
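
The same counts can be collected from Python while also checking that every image tile has a label tile of the same name. This is only a sketch: the function name `count_pairs` is hypothetical, and it assumes the `img_dir`/`ann_dir` layout shown above with `.png` tiles.

```python
from pathlib import Path


def count_pairs(root: str) -> dict:
    """Count image tiles per split and verify each one has a matching label."""
    counts = {}
    for split in ('train', 'val', 'test'):
        imgs = {p.name for p in Path(root, 'img_dir', split).glob('*.png')}
        anns = {p.name for p in Path(root, 'ann_dir', split).glob('*.png')}
        unpaired = imgs ^ anns  # names present on one side only
        if unpaired:
            raise ValueError(f'{split}: unpaired tiles {sorted(unpaired)[:5]}')
        counts[split] = len(imgs)
    return counts
```

Calling `count_pairs('path/to/vaihingen_512x512')` returns a dictionary with one count per split, which can be compared with the numbers printed by the shell commands.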

Split Code

Prepare Dataset: ISPRS-Vaihingen | mmsegmentation doc

tools/dataset_converters/vaihingen.py | mmsegmentation github

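Note that the code below uses cv2 to read the label images and to save the cropped ones, so the color map it uses is in BGR order. cv2.imread returns the channels of an RGB image file in BGR order, so the image held in memory is BGR; cv2.imwrite likewise assumes that the in-memory image is BGR and writes it to disk as an RGB file.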

'''
     R   G   B   |   B   G   R
0: [255 255 255] | [255 255 255] : impervious surfaces
1: [  0   0 255] | [255   0   0] : building
2: [  0 255 255] | [255 255   0] : low vegetation
3: [  0 255   0] | [  0 255   0] : tree
4: [255 255   0] | [  0 255 255] : car
5: [255   0   0] | [  0   0 255] : clutter/background
'''
color_map = np.array([[255, 255, 255], [255,   0,   0], [255, 255,   0],
                      [  0, 255,   0], [  0, 255, 255], [  0,   0, 255]])
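
The BGR map is simply the RGB map with the channel axis reversed, which can be checked with a short sketch. The variable names `color_map_rgb` and `color_map_bgr` are introduced here for illustration only.

```python
import numpy as np

# RGB color map (row index = class id).
color_map_rgb = np.array([[255, 255, 255], [0, 0, 255], [0, 255, 255],
                          [0, 255, 0], [255, 255, 0], [255, 0, 0]])

# Reversing the last axis swaps R and B and leaves G in place.
color_map_bgr = color_map_rgb[:, ::-1]

if __name__ == "__main__":
    print(color_map_bgr.tolist())
```

The same slicing works on whole images: for an `[H, W, 3]` array `img`, `img[..., ::-1]` switches between RGB and BGR channel order.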

# ref: https://github.com/open-mmlab/mmsegmentation/blob/main/tools/dataset_converters/vaihingen.py
import argparse
import glob
import math
import os
import os.path as osp
import tempfile
import zipfile
from tqdm import tqdm

import cv2
import numpy as np

def get_parser():
    parser = argparse.ArgumentParser(
        description='Convert vaihingen dataset to mmsegmentation format')
    parser.add_argument('dataset_path', help='vaihingen folder path')
    parser.add_argument('--tmp_dir', help='path of the temporary directory')
    parser.add_argument('-o', '--out_dir', help='output path')
    parser.add_argument(
        '--clip_size',
        type=int,
        help='clipped size of image after preparation',
        default=512)
    parser.add_argument(
        '--stride_size',
        type=int,
        help='stride of clipping original images',
        default=256)
    return parser

def clip_big_image(image_path, clip_save_dir, to_label=False):
    # The original Vaihingen images are large, so they are cut into fixed-size
    # tiles first. Tile origins are placed at multiples of the clip size
    # (xmin = x * cs below); a tile that would cross the right or bottom
    # border is shifted back so that it ends at the border. For example, a
    # 5120 x 5120 image with clip size 512 yields 10 x 10 = 100 distinct
    # tiles of 512 x 512.
    
    image = cv2.imread(image_path)  # BGR for label images; G, R, IR order for IRRG images

    h, w, c = image.shape
    cs = args.clip_size
    ss = args.stride_size

    num_rows = math.ceil((h - cs) / ss) \
            if math.ceil((h - cs) / ss) * ss + cs >= h \
            else math.ceil((h - cs) / ss) + 1
    num_cols = math.ceil((w - cs) / ss) \
            if math.ceil((w - cs) / ss) * ss + cs >= w \
            else math.ceil((w - cs) / ss) + 1

    x, y = np.meshgrid(np.arange(num_cols + 1), np.arange(num_rows + 1))
    xmin = x * cs
    ymin = y * cs

    xmin = xmin.ravel()
    ymin = ymin.ravel()
    xmin_offset = np.where(xmin + cs > w, w - xmin - cs, np.zeros_like(xmin))
    ymin_offset = np.where(ymin + cs > h, h - ymin - cs, np.zeros_like(ymin))
    boxes = np.stack([
        xmin + xmin_offset, ymin + ymin_offset,
        np.minimum(xmin + cs, w),
        np.minimum(ymin + cs, h)
    ], axis=1)

    if to_label:
        '''This is the normal RGB color map
             R   G   B   |   B   G   R
        0: [255 255 255] | [255 255 255] : impervious surfaces
        1: [  0   0 255] | [255   0   0] : building
        2: [  0 255 255] | [255 255   0] : low vegetation
        3: [  0 255   0] | [  0 255   0] : tree
        4: [255 255   0] | [  0 255 255] : car
        5: [255   0   0] | [  0   0 255] : clutter/background
        '''
        # Note that this is a BGR color map rather than an RGB one,
        # because cv2 returns the channels in BGR order when imread() is
        # called and expects BGR input when imwrite() is called.
        color_map = np.array([[255, 255, 255], [255,   0,   0], [255, 255,   0],
                              [  0, 255,   0], [  0, 255, 255], [  0,   0, 255]])
        flatten_v = np.matmul(image.reshape(-1, c), np.array([2, 3, 4]).reshape(3, 1))
        out = np.zeros_like(flatten_v)
        for idx, class_color in enumerate(color_map):
            value_idx = np.matmul(class_color, np.array([2, 3, 4]).reshape(3, 1))
            out[flatten_v == value_idx] = idx
        image = out.reshape(h, w)

    for box in boxes:
        start_x, start_y, end_x, end_y = box
        clipped_image = image[start_y:end_y, start_x:end_x] if to_label else image[start_y:end_y, start_x:end_x, :]
        # splitext removes the extension; str.strip('.tif') would strip characters, not a suffix
        area_idx = osp.splitext(osp.basename(image_path).split('_')[3])[0]
        cv2.imwrite(img=clipped_image.astype(np.uint8),
                    filename=osp.join(clip_save_dir, f'{area_idx}_{start_x}_{start_y}_{end_x}_{end_y}.png'))

def main(args):
    splits = {
        'train': [
            'area1', 'area3', 'area5', 'area7', 'area11', 'area13', 'area15', 'area17',
            'area21', 'area23', 'area26', 'area28', 'area32', 'area34', 'area37'
        ], # 15
        'val': [
            'area30'
        ], # 1
        'test': [
            'area2', 'area4', 'area6', 'area8', 'area10', 'area12', 'area14', 'area16', 'area20',
            'area22', 'area24', 'area27', 'area29', 'area31', 'area33', 'area35', 'area38'
        ], # 17
    }

    dataset_path = args.dataset_path
    if args.out_dir is None:
        out_dir = osp.join('data', 'vaihingen')
    else:
        out_dir = args.out_dir

    print('Making directories...')
    for sub_dir in ('img_dir', 'ann_dir'):
        for data_type in ('train', 'val', 'test'):
            os.makedirs(osp.join(out_dir, sub_dir, data_type), exist_ok=True)

    zipp_list = glob.glob(os.path.join(dataset_path, '*.zip'))
    print('Find the data', zipp_list)

    for zipp in zipp_list:
        with tempfile.TemporaryDirectory(dir=args.tmp_dir) as tmp_dir:
            # unzip
            print("Unzipping to the temporary folder...")
            with zipfile.ZipFile(zipp) as zip_file:  # closed automatically on exit
                zip_file.extractall(tmp_dir)         # extract zipfile
            # path2tif
            src_path_list: list = []
            mode, to_label = None, None
            if 'ISPRS_semantic_labeling_Vaihingen.zip' in zipp:
                mode, to_label = "img_dir", False
                # 'ISPRS_semantic_labeling_Vaihingen/top' folder
                src_path_list = glob.glob(os.path.join(os.path.join(tmp_dir, 'top'), '*.tif'))
            elif 'ISPRS_semantic_labeling_Vaihingen_ground_truth_COMPLETE.zip' in zipp:
                mode, to_label = "ann_dir", True
                src_path_list = glob.glob(os.path.join(tmp_dir, '*.tif'))
            else: continue

            prog_bar = tqdm(src_path_list, desc=mode)
            for src_path in prog_bar:
                # splitext removes the extension; str.strip('.tif') would strip characters, not a suffix
                area_idx = osp.splitext(osp.basename(src_path).split('_')[3])[0]
                if area_idx in splits['train']:
                    data_type = 'train'
                elif area_idx in splits['val']:
                    data_type = 'val'
                elif area_idx in splits['test']:
                    data_type = 'test'
                else: continue

                dst_dir = osp.join(out_dir, mode, data_type)
                clip_big_image(src_path, dst_dir, to_label=to_label)

        print('Removing the temporary files...')
    print('Done!')

if __name__ == '__main__':
    # path/to/vaihingen
    # ├── ISPRS_semantic_labeling_Vaihingen.zip # <-- image
    # └── ISPRS_semantic_labeling_Vaihingen_ground_truth_COMPLETE.zip # <-- label
    parser = get_parser()
    args = parser.parse_args(["/15T-2/zwx/datasets_package/vaihingen/", # path to *.zip
                              "--out_dir", "/15T-2/zwx/datasets/vaihingen_512x512",
                              "--tmp_dir", "/15T-2/zwx/temp",
                              "--clip_size", "512", "--stride_size", "512"])
    main(args)
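
To make the tiling in clip_big_image easier to follow, here is a simplified sketch of the box computation on its own. It is a re-derivation for illustration, not the mmsegmentation code: the function name `tile_boxes` is hypothetical, it steps by the clip size only (it takes no stride argument), and it assumes the image is at least as large as one tile.

```python
import math

import numpy as np


def tile_boxes(h: int, w: int, cs: int) -> np.ndarray:
    """Return [x0, y0, x1, y1] boxes of size cs x cs covering an h x w image.

    Tiles sit on a grid with step cs. A tile that would cross the right or
    bottom border is shifted back so that it ends exactly at the border.
    Assumes h >= cs and w >= cs.
    """
    xs = [min(i * cs, w - cs) for i in range(math.ceil(w / cs))]
    ys = [min(j * cs, h - cs) for j in range(math.ceil(h / cs))]
    return np.array([[x, y, x + cs, y + cs] for y in ys for x in xs])


if __name__ == "__main__":
    boxes = tile_boxes(h=1000, w=1200, cs=512)
    print(len(boxes))
    print(boxes)
```

Shifting the last tile back, rather than padding it, keeps every patch at the full clip size at the cost of some overlap with its neighbour.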

color_map in mmsegmentation's potsdam.py

See the source at potsdam.py | github; the code block below is the excerpted color_map part.

Note that the mmsegmentation source defines color_map in BGR order, the reverse of the usual RGB order.

# mmsegmentation's color_map is in BGR order
color_map = np.array([[0, 0, 0], [255, 255, 255], [255, 0, 0],
                      [255, 255, 0], [0, 255, 0], [0, 255, 255],
                      [0, 0, 255]])

# the usual color order would be RGB
color_map = np.array([[0, 0, 0], [255, 255, 255], [0, 0, 255],
                      [0, 255, 255], [0, 255, 0], [255, 255, 0],
                      [255, 0, 0]])

mmsegmentation presumably uses BGR order because its imread and imwrite methods call cv2's imread and imwrite under the hood, or imitate their design. For background, see the blog post cv2如何处理RGB和BGR | 文羊羽.

mmsegmentation color map, BGR

0: [  0   0   0] : boundary
1: [255 255 255] : impervious surfaces
2: [255   0   0] : building
3: [255 255   0] : low vegetation
4: [  0 255   0] : tree
5: [  0 255 255] : car
6: [  0   0 255] : clutter/background

Normal color map, RGB

0: [  0   0   0] : boundary
1: [255 255 255] : impervious surfaces
2: [  0   0 255] : building
3: [  0 255 255] : low vegetation
4: [  0 255   0] : tree
5: [255 255   0] : car
6: [255   0   0] : clutter/background

ann2rgb

from typing import Union
import numpy
import torch

color_map = {
  0: [255, 255, 255], # white
  1: [  0,   0, 255], # blue
  2: [  0, 255, 255], # cyan
  3: [  0, 255,   0], # green
  4: [255, 255,   0], # yellow
  5: [255,   0,   0], # red
}

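def ann2rgb(annimg: Union[numpy.ndarray, torch.Tensor]) -> numpy.ndarray:
  """convert [H, W] annotation mask to [H, W, 3] rgb image"""
  if isinstance(annimg, torch.Tensor):
    # move to CPU and convert so that numpy boolean indexing works below
    annimg = annimg.detach().cpu().numpy()
  h, w = annimg.shape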
  rgbimg = numpy.zeros(shape=(h, w, 3), dtype=numpy.uint8)
  for idx, rgb in color_map.items():
    rgbimg[annimg == idx] = rgb
  return rgbimg

if __name__ == "__main__":
  # annimg = torch.randint(low=0, high=6, size=(224, 224))
  annimg = numpy.random.randint(low=0, high=6, size=(224, 224))
  rgbimg = ann2rgb(annimg)
  print(rgbimg.shape)
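
As an alternative to the per-class loop, the same conversion can be written as a single table lookup. This is a sketch under two assumptions: the name `ann2rgb_lut` is hypothetical, and every value in the mask is an integer in the range 0 to 5.

```python
import numpy as np

# Row i holds the RGB color of class i (same order as color_map above).
palette = np.array([[255, 255, 255], [0, 0, 255], [0, 255, 255],
                    [0, 255, 0], [255, 255, 0], [255, 0, 0]], dtype=np.uint8)


def ann2rgb_lut(annimg: np.ndarray) -> np.ndarray:
    """Convert an [H, W] integer mask to an [H, W, 3] RGB image by lookup."""
    # Integer-array indexing picks one palette row per pixel.
    return palette[annimg]


if __name__ == "__main__":
    annimg = np.random.randint(low=0, high=6, size=(224, 224))
    print(ann2rgb_lut(annimg).shape)
```

The lookup avoids building one boolean mask per class, but unlike the loop version it raises an IndexError, or wraps around for negative values, when the mask contains ids outside the palette.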