Run Swin Transformer for Semantic Segmentation on Colab

Swin-Transformer-Semantic-Segmentation | GitHub

mmsegmentation | GitHub

mmsegmentation/…/get_started.md#install-on-google-colab | GitHub

get_started/installation | mmcv

Swin Transformer environment setup (semantic segmentation) | ZhiHu

MMSegmentation_Tutorial.ipynb | Colab

install mmcv and mmsegmentation

!pip install openmim
!mim install mmcv

Install mmcv rather than mmcv-full to get the full version; building mmcv-full took so long that I canceled the installation.

MMCV contains C++ and CUDA extensions, thus depending on PyTorch in a complex way. MIM solves such dependencies automatically and makes the installation easier. However, it is not a must.
To install MMCV with pip instead of MIM, please follow the MMCV installation guide. This requires manually specifying a find-url based on your PyTorch version and its CUDA version.
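
For example, installing the prebuilt MMCV wheel with pip might look like the line below. This is only a sketch: the cu118/torch2.0 part must match the CUDA and PyTorch versions of your Colab runtime, so check the MMCV installation guide for the index that fits your environment.

!pip install "mmcv>=2.0.0" -f https://download.openmmlab.com/mmcv/dist/cu118/torch2.0/index.html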

There are two ways to install mmsegmentation:

  • Option (a): If you use mmsegmentation as a dependency or third-party package, install it with pip:

    !pip install mmsegmentation
  • Option (b): If you develop and run mmseg directly, install it from source:

    !git clone https://github.com/open-mmlab/mmsegmentation.git
    %cd mmsegmentation
    !pip install -e .

Either way works; choose whichever you prefer.

In fact, Swin-Transformer-Semantic-Segmentation | GitHub is based on mmsegmentation | GitHub; the two repositories share the same code structure and runtime environment, so no additional environment adaptation is needed.
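
After installation, a quick sanity check (a minimal sketch; run it in a Colab cell) is to import the packages and print their versions:

import torch, mmcv, mmseg
print(torch.__version__, torch.cuda.is_available())
print(mmcv.__version__)
print(mmseg.__version__)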

verify the installation

mmsegmentation/…/get_started.md#verify-the-installation | GitHub

get_started.html#verify-the-installation | latest

Step 1. You need to download config and checkpoint files.

!mim download mmsegmentation --config pspnet_r50-d8_4xb2-40k_cityscapes-512x1024 --dest .

The download will take several seconds or more, depending on your network environment. When it is done, you will find two files in your current folder.

  • pspnet_r50-d8_4xb2-40k_cityscapes-512x1024.py
  • pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth

Step 2. Verify the inference demo.

Option (a). If you install mmsegmentation from source, just run the following command.

!python demo/image_demo.py demo/demo.png configs/pspnet/pspnet_r50-d8_4xb2-40k_cityscapes-512x1024.py pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth --device cuda:0 --out-file result.jpg

You will see a new image result.jpg in your current folder, where segmentation masks are overlaid on all objects.

Option (b). If you install mmsegmentation with pip, open your Python interpreter and copy & paste the following code.

from mmseg.apis import inference_model, init_model, show_result_pyplot
import mmcv

config_file = 'pspnet_r50-d8_4xb2-40k_cityscapes-512x1024.py'
checkpoint_file = 'pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth'

# build the model from a config file and a checkpoint file
model = init_model(config_file, checkpoint_file, device='cuda:0')

# test a single image and show the results
img = 'demo/005.jpg' # or img = mmcv.imread(img), which will only load it once
result = inference_model(model, img)
# visualize the results in a new window
show_result_pyplot(model, img, result, show=True)
# or save the visualization results to image files
# you can change the opacity of the painted segmentation map in (0, 1].
show_result_pyplot(model, img, result, show=True, out_file='result.jpg', opacity=0.5)

You can modify the code above to test a single image or a video; either option verifies that the installation was successful.
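
For a video, one simple approach is to run the model frame by frame. This is only a sketch: demo.mp4 and the frames/ output folder are placeholder names, and model is the one built by init_model above.

import os
import mmcv
from mmseg.apis import inference_model, show_result_pyplot

os.makedirs('frames', exist_ok=True)
video = mmcv.VideoReader('demo.mp4')  # placeholder input video
for i, frame in enumerate(video):
    result = inference_model(model, frame)
    # save each visualized frame instead of opening a window
    show_result_pyplot(model, frame, result, show=False,
                       out_file=f'frames/{i:06d}.jpg', opacity=0.5)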

Run Swin Transformer for Semantic Segmentation on local environment

Prerequisites

conda create --name openmmlab python=3.8 -y
conda activate openmmlab

On CPU platforms

conda install pytorch torchvision cpuonly -c pytorch

Installation

Step 1. Install MMCV using MIM

I had to close my Clash proxy to execute the following commands successfully, and the download takes a very long time without a proxy.
Maybe you could try appending -i https://mirrors.aliyun.com/pypi/simple/ to the install command next time.

# (openmmlab)...>
pip install openmim
mim install mmengine
mim install "mmcv>=2.0.0"

If you see ERROR: Could not build wheels for mmcv, which is required to install pyproject.toml-based projects, the solution (from onexiaophai) is to install mmcv with pip instead of mim.

Step 2. Install MMSegmentation.

Case a: If you develop and run mmseg directly, install it from source:

# using proxy of Clash to download this from github
git clone -b main https://github.com/open-mmlab/mmsegmentation.git
cd mmsegmentation

I closed the Clash proxy for the installation, and I made the download noticeably faster by appending the Aliyun mirror source to the install command.

# (openmmlab)...\mmsegmentation>
pip install -v -e . -i https://mirrors.aliyun.com/pypi/simple/
# '-v' means verbose, or more output
# '-e' means installing a project in editable mode,
# thus any local modifications made to the code will take effect without reinstallation.

I chose to install from source (case a), so I can easily check the source code of the models and methods.
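
A quick way to confirm the editable install (a minimal sketch, run inside the activated openmmlab environment) is to check where the package is imported from; for an editable install, mmseg.__file__ should point into the cloned mmsegmentation folder.

import mmseg
print(mmseg.__version__)
print(mmseg.__file__)  # should be ...\mmsegmentation\mmseg\__init__.py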

Case b: If you use mmsegmentation as a dependency or third-party package, install it with pip:

# (openmmlab)...>
pip install "mmsegmentation>=1.0.0"

Verify the installation

Step 1. We need to download config and checkpoint files.

# (openmmlab)...\mmsegmentation>
mim download mmsegmentation --config pspnet_r50-d8_4xb2-40k_cityscapes-512x1024 --dest .

Execute this command under the mmsegmentation folder; don't change your directory.

Step 2. Verify the inference demo.

Execute this command under the mmsegmentation folder.

# (openmmlab)...\mmsegmentation>
python demo/image_demo.py demo/demo.png configs/pspnet/pspnet_r50-d8_4xb2-40k_cityscapes-512x1024.py pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth --device cpu --out-file result.jpg

Note: I changed the device parameter from --device cuda:0 to --device cpu because this is a CPU-only platform.

You will see a new image result.jpg in your current folder, where segmentation masks are overlaid on all objects.

Training with mmsegmentation

Train & Test | mmsegmentation docs

run train.py in terminal

prepare dataset

Take CHASE DB1 as an example.

Download the CHASEDB1.zip file from the link given in Tutorial 2: Prepare datasets#chase-db1.

To convert the CHASE DB1 dataset to MMSegmentation format, run the following command:

# (openmmlab)...\mmsegmentation>
python tools/dataset_converters/chase_db1.py C:/Users/xiaophai/Downloads/Compressed/CHASEDB1.zip

Then, run the following command to start your training.

# (openmmlab)...\mmsegmentation>
python tools/train.py configs/unet/unet_s5-d16_deeplabv3_4xb4-40k_chase-db1-128x128.py
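
Before launching a long run, it can be useful to load the merged config and check the dataset settings. This is just a sketch using mmengine's Config with the same config path as the command above; it assumes you run it from the mmsegmentation folder.

from mmengine.config import Config
cfg = Config.fromfile('configs/unet/unet_s5-d16_deeplabv3_4xb4-40k_chase-db1-128x128.py')
print(cfg.train_dataloader)  # the dataset settings should point at data/CHASE_DB1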

debug train.py in vscode

// Folder Structure:
├── .vscode
│   └── launch.json
└── mmsegmentation
    ├── tools
    │   └── train.py
    └── data
        └── CHASE_DB1
            ├── images
            │   ├── training
            │   └── validation
            └── annotations
                ├── training
                └── validation
// launch.json
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: config_train",
            "type": "python",
            "request": "launch",
            "program": "${workspaceFolder}/mmsegmentation/tools/train.py",
            "args": "configs/unet/unet_s5-d16_deeplabv3_4xb4-40k_chase-db1-128x128.py",
            "console": "integratedTerminal",
            "cwd": "${workspaceFolder}/mmsegmentation",
            "justMyCode": true
        }
    ]
}

rewrite the mmsegmentation

mmengine | GitHub

Swin-Transformer-Semantic-Segmentation | GitHub

# (openmmlab)...\Swin-Transformer-Semantic-Segmentation-main>
python tools/train.py configs/unet/deeplabv3_unet_s5-d16_128x128_40k_chase_db1.py

Prepare Dataset

ISPRS-Potsdam

Prepare Dataset: ISPRS-Potsdam

The Potsdam dataset is for urban semantic segmentation used in the 2D Semantic Labeling Contest - Potsdam.

The dataset can be requested at the challenge homepage. You will get a file named utf-8' 'Potsdam.zip (size: 13.3 GB); unzip this file to get a folder named Potsdam which contains 10 files:

Potsdam
├── 1_DSM.rar
├── 1_DSM_normalisation.zip
├── 2_Ortho_RGB.zip <--
├── 3_Ortho_IRRG.zip
├── 4_Ortho_RGBIR.zip
├── 5_Labels_all.zip
├── 5_Labels_all_noBoundary.zip <--
├── 5_Labels_for_participants.zip
├── 5_Labels_for_participants_no_Boundary.zip
├── assess_classification_reference_implementation.tgz

Only 2_Ortho_RGB.zip and 5_Labels_all_noBoundary.zip are required:

Potsdam
├── 2_Ortho_RGB.zip <--
├── 5_Labels_all_noBoundary.zip <--

For the Potsdam dataset, run the following command to re-organize the data.

(openmmlab) ...\mmsegmentation>python tools/dataset_converters/potsdam.py "D:/Dataset/Potsdam"

And you will get a folder structure as below:

mmsegmentation
├── mmseg
├── tools
├── configs
├── data
│   ├── potsdam
│   │   ├── img_dir
│   │   │   ├── train: 3456
│   │   │   ├── val: 2016
│   │   ├── ann_dir
│   │   │   ├── train: 3456
│   │   │   ├── val: 2016

In the default setting of mmsegmentation, it will generate 3456 images for training and 2016 images for validation.

The 2_Ortho_RGB.zip file contains 38 images of size 6000x6000:

(figure: the 38 Potsdam tiles)

Each image is split into 12x12 = 144 patches of size 512x512. There are 38x144 = 5472 = 3456+2016 patches in total, of which 3456 are used for training and 2016 for validation.
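
A quick way to confirm these counts after conversion (a minimal sketch; the paths assume the default out_dir data/potsdam):

import glob
print(len(glob.glob('data/potsdam/img_dir/train/*.png')))  # expect 3456
print(len(glob.glob('data/potsdam/img_dir/val/*.png')))    # expect 2016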

The converted masks are single-channel images; the mapping between their values and the categories is as follows:

  • The original color map of the mmseg split code for potsdam

    0: [  0   0   0]
    1: [255 255 255] : impervious surfaces
    2: [255 0 0] : background
    3: [255 255 0] : car
    4: [ 0 255 0] : tree
    5: [ 0 255 255] : low vegetation
    6: [ 0 0 255] : building

    You should change the color map to:

    1: [255 255 255] : impervious surfaces
    2: [ 0 0 255] : building
    3: [ 0 255 255] : low vegetation
    4: [ 0 255 0] : tree
    5: [255 255 0] : car
    6: [255 0 0] : clutter/background
    # color_map = np.array([[0, 0, 0], [255, 255, 255], [255, 0, 0],
    #                       [255, 255, 0], [0, 255, 0], [0, 255, 255],
    #                       [0, 0, 255]])
    color_map = np.array([[0, 0, 0], [255, 255, 255], [0, 0, 255],
                          [0, 255, 255], [0, 255, 0], [255, 255, 0],
                          [255, 0, 0]])

    In practice, the labels are reduced by 1, so 0, 1, 2, 3, 4, 5 are used (instead of 1, 2, 3, 4, 5, 6).

    '''
    0: [255 255 255] : white : impervious surfaces
    1: [ 0 0 255] : blue : building
    2: [ 0 255 255] : cyan : low vegetation
    3: [ 0 255 0] : green : tree
    4: [255 255 0] : yellow : car
    5: [255 0 0] : red : clutter/background
    '''
    color_map = np.array([[255, 255, 255], [0, 0, 255], [0, 255, 255],
                          [0, 255, 0], [255, 255, 0], [255, 0, 0]])
  • About the color settings

label  rgb              name    category
0      [255, 255, 255]  white   impervious surfaces
1      [0, 0, 255]      blue    building
2      [0, 255, 255]    cyan    low vegetation
3      [0, 255, 0]      green   tree
4      [255, 255, 0]    yellow  car
5      [255, 0, 0]      red     clutter/background
0: impervious surfaces
1: building
2: low vegetation
3: tree
4: car
5: clutter/background
6: boundary

If you use 5_Labels_all.zip as your ground truth

Potsdam
├── 2_Ortho_RGB.zip <--
├── 5_Labels_all.zip <--

the mapping between mask values and categories is as follows:

1: impervious surfaces
2: building
3: low vegetation
4: tree
5: car
6: clutter/background

The file top_potsdam_6_7_label.tif contains erroneous pixel values

from PIL import Image
import numpy as np

mask_path = 'path/to/top_potsdam_6_7_label.tif'

mask = Image.open(mask_path)
mask = np.array(mask).reshape(-1,3)
values, counts = np.unique(mask, return_counts=True, axis=0)
print(values, counts)

# [[ 0 0 255]
# [ 0 255 0]
# [ 0 255 255]
# [252 255 0] <-- Error
# [255 0 0]
# [255 255 0]
# [255 255 255]]

# [ 4857912 5669942 19962121 246304 797467 6749 4459505]
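
One possible workaround (a hypothetical sketch, not what I did; in the split script below I simply exclude '6_7' and '4_12' from the train split) would be to snap the stray value [252 255 0] back to the nearest palette color [255 255 0] (car) before conversion:

import numpy as np
from PIL import Image

mask_path = 'path/to/top_potsdam_6_7_label.tif'
mask = np.array(Image.open(mask_path))
bad = np.all(mask == [252, 255, 0], axis=-1)  # boolean map of the error pixels
mask[bad] = [255, 255, 0]                     # yellow: car (assumed intent)
Image.fromarray(mask).save('path/to/top_potsdam_6_7_label_fixed.tif')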

The values of top_potsdam_4_12_label.tif are abnormal

from PIL import Image
import numpy as np

mask_path = '../datasets/original_potsdam/label/top_potsdam_4_12_label.tif'

mask = Image.open(mask_path)
mask = np.array(mask).reshape(-1,3)
values, counts = np.unique(mask, return_counts=True, axis=0)
print(len(values)) # 24850 <-- Not Six Classes

Code to show the original image and label

import os
from PIL import Image
import matplotlib.pyplot as plt

image_names = ['top_potsdam_2_10', 'top_potsdam_4_12', 'top_potsdam_6_7']
path2image = '../datasets/original_potsdam/image'
path2mask = '../datasets/original_potsdam/label'

plt.close()
nrows = 2; ncols = len(image_names)
fig, axes = plt.subplots(nrows, ncols, figsize=(5*ncols, 5*nrows))

for i in range(ncols):
    axes[0, i].set_title(image_names[i])

for i, name in enumerate(image_names):
    image_path = os.path.join(path2image, name + "_RGB.tif")
    mask_path = os.path.join(path2mask, name + "_label.tif")

    image = Image.open(image_path)
    mask = Image.open(mask_path)

    axes[0, i].imshow(image)
    axes[1, i].imshow(mask)
plt.show()

Code used to split the images

import argparse
import glob
import math
import os
import os.path as osp
import tempfile
import zipfile
from tqdm import tqdm

from PIL import Image
import numpy as np


def get_parser():
    parser = argparse.ArgumentParser(
        description='Convert potsdam dataset to mmsegmentation format')
    parser.add_argument('dataset_path', help='potsdam folder path')
    parser.add_argument('--tmp_dir', help='path of the temporary directory')
    parser.add_argument('-o', '--out_dir', help='output path')
    parser.add_argument(
        '--clip_size',
        type=int,
        help='clipped size of image after preparation',
        default=512)
    parser.add_argument(
        '--stride_size',
        type=int,
        help='stride of clipping original images',
        default=256)
    # args = parser.parse_args(arg_list)
    return parser


def clip_big_image(image_path, clip_save_dir, args, to_label=False):
    # Original image of Potsdam dataset is very large, thus pre-processing
    # of them is adopted. Given fixed clip size and stride size to generate
    # clipped image, the intersection of width and height is determined.
    # For example, given one 5120 x 5120 original image, the clip size is
    # 512 and stride size is 256, thus it would generate 20x20 = 400 images
    # whose size are all 512x512.
    # image = PIL.Image.open(image_path)
    image = Image.open(image_path)
    image = np.array(image)

    h, w, c = image.shape
    clip_size = args.clip_size
    stride_size = args.stride_size

    num_rows = math.ceil((h - clip_size) / stride_size) if math.ceil(
        (h - clip_size) /
        stride_size) * stride_size + clip_size >= h else math.ceil(
            (h - clip_size) / stride_size) + 1
    num_cols = math.ceil((w - clip_size) / stride_size) if math.ceil(
        (w - clip_size) /
        stride_size) * stride_size + clip_size >= w else math.ceil(
            (w - clip_size) / stride_size) + 1

    x, y = np.meshgrid(np.arange(num_cols + 1), np.arange(num_rows + 1))
    xmin = x * clip_size
    ymin = y * clip_size

    xmin = xmin.ravel()
    ymin = ymin.ravel()
    xmin_offset = np.where(xmin + clip_size > w, w - xmin - clip_size,
                           np.zeros_like(xmin))
    ymin_offset = np.where(ymin + clip_size > h, h - ymin - clip_size,
                           np.zeros_like(ymin))
    boxes = np.stack([
        xmin + xmin_offset, ymin + ymin_offset,
        np.minimum(xmin + clip_size, w),
        np.minimum(ymin + clip_size, h)
    ], axis=1)

    if to_label:
        # color_map = np.array([[0, 0, 0], [255, 255, 255], [255, 0, 0],
        #                       [255, 255, 0], [0, 255, 0], [0, 255, 255],
        #                       [0, 0, 255]])
        color_map = np.array([[0, 0, 0], [255, 255, 255], [0, 0, 255],
                              [0, 255, 255], [0, 255, 0], [255, 255, 0],
                              [255, 0, 0]])
        flatten_v = np.matmul(
            image.reshape(-1, c),
            np.array([2, 3, 4]).reshape(3, 1))
        out = np.zeros_like(flatten_v)
        for idx, class_color in enumerate(color_map):
            value_idx = np.matmul(class_color,
                                  np.array([2, 3, 4]).reshape(3, 1))
            out[flatten_v == value_idx] = idx
        image = out.reshape(h, w)

    for box in boxes:
        start_x, start_y, end_x, end_y = box
        clipped_image = image[start_y:end_y,
                              start_x:end_x] if to_label else image[
                                  start_y:end_y, start_x:end_x, :]
        idx_i, idx_j = osp.basename(image_path).split('_')[2:4]

        # it takes too much time to save clipped images with mmcv.imwrite.
        # mmcv.imwrite(
        #     clipped_image.astype(np.uint8),
        #     osp.join(
        #         clip_save_dir,
        #         f'{idx_i}_{idx_j}_{start_x}_{start_y}_{end_x}_{end_y}.png'))

        clipped_image = Image.fromarray(clipped_image.astype(np.uint8))
        clipped_image.save(
            fp=osp.join(
                clip_save_dir,
                f'{idx_i}_{idx_j}_{start_x}_{start_y}_{end_x}_{end_y}.png'),
            format='PNG',
            compress_level=1)
        # e.g. clip_save_dir == 'data\\potsdam\\img_dir\\train'


def main():
    parser = get_parser()
    args = parser.parse_args(["D:/Dataset/Potsdam"])
    splits = {
        'train': [
            '2_11', '2_12', '3_10', '3_11', '3_12', '4_10', '4_11',
            '5_10', '5_11', '5_12', '6_8', '6_9', '6_10',  # '4_12', '6_7',
            '6_11', '6_12', '7_7', '7_8', '7_9', '7_10', '7_11', '7_12'
        ],
        'val': [
            '2_10'
        ],
        'test': [
            '2_13', '2_14', '3_13', '3_14', '4_13', '4_14', '4_15', '5_13',
            '5_14', '6_13', '6_14', '6_15', '7_13'
        ]
    }

    dataset_path = args.dataset_path
    if args.out_dir is None:
        out_dir = osp.join('data', 'potsdam')  # 'data\\potsdam'
    else:
        out_dir = args.out_dir

    print('Making directories...')
    if not osp.exists(osp.join(out_dir, 'img_dir', 'train')):
        os.makedirs(osp.join(out_dir, 'img_dir', 'train'))
    if not osp.exists(osp.join(out_dir, 'img_dir', 'val')):
        os.makedirs(osp.join(out_dir, 'img_dir', 'val'))
    if not osp.exists(osp.join(out_dir, 'img_dir', 'test')):
        os.makedirs(osp.join(out_dir, 'img_dir', 'test'))

    if not osp.exists(osp.join(out_dir, 'ann_dir', 'train')):
        os.makedirs(osp.join(out_dir, 'ann_dir', 'train'))
    if not osp.exists(osp.join(out_dir, 'ann_dir', 'val')):
        os.makedirs(osp.join(out_dir, 'ann_dir', 'val'))
    if not osp.exists(osp.join(out_dir, 'ann_dir', 'test')):
        os.makedirs(osp.join(out_dir, 'ann_dir', 'test'))

    zipp_list = glob.glob(os.path.join(dataset_path, '*.zip'))
    print('Find the data', zipp_list)
    # ['D:/Dataset/Potsdam\\2_Ortho_RGB.zip',
    #  'D:/Dataset/Potsdam\\5_Labels_all_noBoundary.zip']

    for zipp in zipp_list:
        # tmp_dir changes in every loop
        with tempfile.TemporaryDirectory(dir=args.tmp_dir) as tmp_dir:
            zip_file = zipfile.ZipFile(zipp)
            zip_file.extractall(tmp_dir)
            # Check whether the *.tif files are unzipped to the current
            # directory or to a sub-directory
            src_path_list = glob.glob(os.path.join(tmp_dir, '*.tif'))
            # if len(src_path_list) == 0, the *.tif files were extracted to
            # a sub-directory rather than the current directory
            if not len(src_path_list):
                sub_tmp_dir = os.path.join(tmp_dir, os.listdir(tmp_dir)[0])
                src_path_list = glob.glob(os.path.join(sub_tmp_dir, '*.tif'))

            prog_bar = tqdm(src_path_list)
            for src_path in prog_bar:
                # e.g. 'top_potsdam_2_10_RGB.tif'.split('_')[2:4]
                idx_i, idx_j = osp.basename(src_path).split('_')[2:4]
                # data_type = 'train' if f'{idx_i}_{idx_j}' in splits[
                #     'train'] else 'val'
                if f'{idx_i}_{idx_j}' in splits['train']:
                    data_type = 'train'
                elif f'{idx_i}_{idx_j}' in splits['val']:
                    data_type = 'val'
                else:
                    data_type = 'test'

                if 'label' in src_path:
                    dst_dir = osp.join(out_dir, 'ann_dir', data_type)
                    clip_big_image(src_path, dst_dir, args, to_label=True)
                else:
                    # e.g. 'data\\potsdam\\img_dir\\train'
                    dst_dir = osp.join(out_dir, 'img_dir', data_type)
                    clip_big_image(src_path, dst_dir, args, to_label=False)

    print('Removing the temporary files...')
    print('Done!')


if __name__ == '__main__':
    main()

Synapse

Synapse dataset

Multi-Atlas Labeling Beyond the Cranial Vault - Workshop and Challenge

Note: You need to join the challenge first; only then will you see Abdomen and Cervix in the Files directory, which are private and invisible to non-participants.
Downloading RawData.zip (1.531 GB) is enough.