Prepare the Synapse dataset
Image Sequentialization
The input $x$ is an image of size $224\times 224$.
Reshape the input $x \in \mathbb{R}^{H\times W\times C}$ into a sequence of flattened 2D patches $\{x_p^1, x_p^2, \cdots, x_p^N\}$, where each $x_p^i \in \mathbb{R}^{1\times P^2C}$ and $N=\frac{HWC}{P^2C}=\frac{HW}{P^2}$. Here $P$ denotes the patch size (`Patch_size`), so each patch is of size $P\times P$.
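The reshaping step can be sketched in NumPy as follows. This is a minimal illustration, not the original implementation: the function name `image_to_patches`, the patch size $P=16$ and the channel count $C=3$ are assumptions chosen for the example.

```python
import numpy as np

def image_to_patches(x: np.ndarray, patch_size: int) -> np.ndarray:
    """Reshape an (H, W, C) image into N = HW / P^2 flattened patches of length P^2 * C."""
    H, W, C = x.shape
    P = patch_size
    if H % P or W % P:
        raise ValueError("H and W must be divisible by the patch size")
    # Split each spatial axis into (grid index, offset inside the patch).
    x = x.reshape(H // P, P, W // P, P, C)
    # Move the two grid axes to the front: (H/P, W/P, P, P, C).
    x = x.transpose(0, 2, 1, 3, 4)
    # Flatten the grid into N patches and each patch into a P*P*C vector.
    return x.reshape(-1, P * P * C)

# Assumed example sizes: 224x224 input, 3 channels, P = 16.
image = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
patches = image_to_patches(image, patch_size=16)
print(patches.shape)  # (196, 768), i.e. N = 224*224/16^2 and P^2*C = 16*16*3
```

Patches are ordered row by row over the patch grid, so `patches[0]` is the top-left $P\times P$ block of the image.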
Patch Embedding :
$$
z_0 = \begin{bmatrix}x_p^1E \\ x_p^2E \\ \vdots \\ x_p^NE\end{bmatrix} + E_{pos}
$$
where $E\in \mathbb{R}^{P^2C\times D}$ is the patch embedding projection matrix and $E_{pos}\in \mathbb{R}^{N\times D} = \mathbb{R}^{\frac{HW}{P^2}\times D}$ denotes the position embedding.
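As a shape check, the embedding $z_0 = x_pE + E_{pos}$ can be written as a single matrix product. The sketch below assumes $N=196$, $P=16$, $C=3$ and $D=768$, none of which are fixed by the text. It uses random arrays as stand-ins for the patches and for $E$ and $E_{pos}$, which would be learned parameters in a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for illustration only.
N, P, C, D = 196, 16, 3, 768

patches = rng.standard_normal((N, P * P * C))    # stand-in for the flattened patches x_p^i
E = rng.standard_normal((P * P * C, D)) * 0.02   # patch embedding projection, shape (P^2 C, D)
E_pos = rng.standard_normal((N, D)) * 0.02       # position embedding, shape (N, D)

# Row i of the product is x_p^i E; adding E_pos row-wise gives z_0.
z0 = patches @ E + E_pos
print(z0.shape)  # (196, 768)
```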
Pure Transformer Encoder :
$$
\begin{gathered}
\begin{split}
z'_l &= MSA(LN(z_{l-1})) + z_{l-1}\\
z_l &= MLP(LN(z'_l)) + z'_l
\end{split}\\
l = 1,2,\cdots,L\\
\Downarrow\\
z_L \in \mathbb{R}^{\frac{HW}{P^2}\times D}
\end{gathered}
$$
where MSA denotes Multi-head Self-Attention, LN denotes Layer Normalization, and MLP denotes Multi-Layer Perceptron.
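The two residual updates can be illustrated with a small NumPy sketch. It is a simplified stand-in rather than the reference implementation: biases and the learned scale and shift of LN are omitted, the weights are random, the GELU activation and the hidden width $4D$ are assumptions, and all helper names are hypothetical.

```python
import numpy as np

def layer_norm(z, eps=1e-6):
    # Normalise each token over its D features (learned scale/shift omitted).
    mean = z.mean(axis=-1, keepdims=True)
    var = z.var(axis=-1, keepdims=True)
    return (z - mean) / np.sqrt(var + eps)

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def gelu(a):
    # tanh approximation of GELU (activation choice is an assumption)
    return 0.5 * a * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (a + 0.044715 * a**3)))

def msa(z, Wq, Wk, Wv, Wo, num_heads):
    N, D = z.shape
    d = D // num_heads
    # Project, then split the feature axis into heads: (heads, N, d).
    q = (z @ Wq).reshape(N, num_heads, d).transpose(1, 0, 2)
    k = (z @ Wk).reshape(N, num_heads, d).transpose(1, 0, 2)
    v = (z @ Wv).reshape(N, num_heads, d).transpose(1, 0, 2)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))  # (heads, N, N)
    out = (attn @ v).transpose(1, 0, 2).reshape(N, D)       # merge heads back to (N, D)
    return out @ Wo

def mlp(z, W1, W2):
    return gelu(z @ W1) @ W2

def encoder_layer(z_prev, p, num_heads):
    # z'_l = MSA(LN(z_{l-1})) + z_{l-1}
    z_mid = msa(layer_norm(z_prev), p["Wq"], p["Wk"], p["Wv"], p["Wo"], num_heads) + z_prev
    # z_l = MLP(LN(z'_l)) + z'_l
    return mlp(layer_norm(z_mid), p["W1"], p["W2"]) + z_mid

def init_params(rng, D, hidden, scale=0.02):
    shapes = {"Wq": (D, D), "Wk": (D, D), "Wv": (D, D), "Wo": (D, D),
              "W1": (D, hidden), "W2": (hidden, D)}
    return {name: rng.standard_normal(shape) * scale for name, shape in shapes.items()}

rng = np.random.default_rng(0)
N, D, L, heads = 196, 64, 2, 4           # small assumed sizes to keep the example fast
layers = [init_params(rng, D, 4 * D) for _ in range(L)]

z = rng.standard_normal((N, D))          # stand-in for z_0
for p in layers:                         # l = 1, ..., L
    z = encoder_layer(z, p, heads)
print(z.shape)  # (196, 64): z_L keeps the (N, D) shape of z_0
```

Because each sub-block adds its input back, the sequence length and width never change, which is why $z_L$ has the same shape as $z_0$.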
CNN-Transformer Hybrid as Encoder :
CNN feature extractor + Transformer
$$
\text{raw image}
\xrightarrow[\text{1st feature extractor}]{\text{CNN}}
\text{feature map}
\xrightarrow[\text{patch embedding}]{xE+E_{pos}}
z_0
\xrightarrow[\text{2nd feature extractor}]{\text{Transformer}}
z_L
$$
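The data flow of the hybrid encoder can be traced with a shape-only sketch. Everything here is a placeholder chosen for illustration: `cnn_stand_in` is plain average pooling with an assumed downsampling factor of 16 (a real CNN would also change the channel count), the embedding is assumed to act on $1\times 1$ patches of the feature map, and `transformer_stand_in` is an identity function that only preserves the $(N, D)$ shape.

```python
import numpy as np

def cnn_stand_in(image, stride=16):
    """Hypothetical stand-in for the CNN: average-pool each stride x stride block."""
    H, W, C = image.shape
    blocks = image.reshape(H // stride, stride, W // stride, stride, C)
    return blocks.mean(axis=(1, 3))          # feature map of shape (H/stride, W/stride, C)

def patch_embed(feature_map, E, E_pos):
    """Flatten 1x1 patches of the feature map and apply z_0 = xE + E_pos."""
    h, w, c = feature_map.shape
    x = feature_map.reshape(h * w, c)        # one token per feature-map position
    return x @ E + E_pos

def transformer_stand_in(z0):
    """Placeholder for the L-layer Transformer: returns its input, keeping the (N, D) shape."""
    return z0

rng = np.random.default_rng(0)
image = rng.standard_normal((224, 224, 3))   # assumed input size and channel count
D = 32                                       # assumed hidden size

feature_map = cnn_stand_in(image)            # raw image -> feature map
E = rng.standard_normal((feature_map.shape[-1], D))
E_pos = rng.standard_normal((feature_map.shape[0] * feature_map.shape[1], D))
z0 = patch_embed(feature_map, E, E_pos)      # feature map -> z_0
zL = transformer_stand_in(z0)                # z_0 -> z_L

print(feature_map.shape, z0.shape, zL.shape)  # (14, 14, 3) (196, 32) (196, 32)
```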
Cascaded Upsampler :
TransUNet = CNN-Transformer Hybrid Encoder + Cascaded Upsampler
Evaluation
Sørensen–Dice coefficient (DSC)
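For two binary masks $A$ (prediction) and $B$ (ground truth), the coefficient is $DSC = \frac{2|A\cap B|}{|A|+|B|}$. A minimal NumPy sketch follows. The function name `dice_coefficient` is hypothetical, and returning 1.0 when both masks are empty is an assumed convention, not something the text specifies.

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks count as a perfect match
    return 2.0 * intersection / total

pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])
# |A| = 3, |B| = 3, |A ∩ B| = 2, so DSC = 2*2 / (3+3) ≈ 0.667
print(dice_coefficient(pred, target))
```

For multi-class segmentation, one common approach is to compute this per class on one-vs-rest masks and then average the results.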