Transposed Convolution
A sliding-window convolution can be rewritten as a matrix multiplication. Flatten the input image's pixels into a column vector in row-major order (left to right, top to bottom); each window position of the kernel then becomes one row vector, so the whole kernel corresponds to a matrix whose number of rows equals the number of convolution windows and whose number of columns equals the number of input pixels.
For example, convolving a $4\times4$ input with a $3\times3$ kernel (padding=0, stride=1) produces a $2\times2$ output. In matrix form:
$$
\scriptsize
\begin{bmatrix}
y_{00} \\ y_{01} \\ y_{10} \\ y_{11}
\end{bmatrix} =
\left[\begin{array}{cccc|cccc|cccc|cccc}
w_{0,0} & w_{0,1} & w_{0,2} & 0 &
w_{1,0} & w_{1,1} & w_{1,2} & 0 &
w_{2,0} & w_{2,1} & w_{2,2} & 0 &
0 & 0 & 0 & 0
\\
0 & w_{0,0} & w_{0,1} & w_{0,2} &
0 & w_{1,0} & w_{1,1} & w_{1,2} &
0 & w_{2,0} & w_{2,1} & w_{2,2} &
0 & 0 & 0 & 0
\\
0 & 0 & 0 & 0 &
w_{0,0} & w_{0,1} & w_{0,2} & 0 &
w_{1,0} & w_{1,1} & w_{1,2} & 0 &
w_{2,0} & w_{2,1} & w_{2,2} & 0
\\
0 & 0 & 0 & 0 &
0 & w_{0,0} & w_{0,1} & w_{0,2} &
0 & w_{1,0} & w_{1,1} & w_{1,2} &
0 & w_{2,0} & w_{2,1} & w_{2,2}
\end{array}\right]
\begin{bmatrix}
x_{00} \\ x_{01} \\ x_{02} \\ x_{03}
\\ \vdots \\
x_{30} \\ x_{31} \\ x_{32} \\ x_{33}
\end{bmatrix}
$$
Each row of the matrix above corresponds to one convolution window: its element-wise product with the input image, summed, gives the convolution output at that window position, as illustrated below:
$$
\scriptsize
\begin{bmatrix}
\textcolor{red}{y_{00}} \\ y_{01} \\ y_{10} \\ y_{11}
\end{bmatrix} =
\begin{matrix}
\textcolor{red}{
\begin{bmatrix}
w_{0,0} & w_{0,1} & w_{0,2} & 0 \\
w_{1,0} & w_{1,1} & w_{1,2} & 0 \\
w_{2,0} & w_{2,1} & w_{2,2} & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}} \\
\begin{bmatrix}
0 & w_{0,0} & w_{0,1} & w_{0,2} \\
0 & w_{1,0} & w_{1,1} & w_{1,2} \\
0 & w_{2,0} & w_{2,1} & w_{2,2} \\
0 & 0 & 0 & 0
\end{bmatrix} \\
\begin{bmatrix}
0 & 0 & 0 & 0 \\
w_{0,0} & w_{0,1} & w_{0,2} & 0 \\
w_{1,0} & w_{1,1} & w_{1,2} & 0 \\
w_{2,0} & w_{2,1} & w_{2,2} & 0
\end{bmatrix} \\
\begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & w_{0,0} & w_{0,1} & w_{0,2} \\
0 & w_{1,0} & w_{1,1} & w_{1,2} \\
0 & w_{2,0} & w_{2,1} & w_{2,2}
\end{bmatrix}
\end{matrix}
*
\textcolor{red}{
\begin{bmatrix}
x_{00} & x_{01} & x_{02} & x_{03} \\
x_{10} & x_{11} & x_{12} & x_{13} \\
x_{20} & x_{21} & x_{22} & x_{23} \\
x_{30} & x_{31} & x_{32} & x_{33}
\end{bmatrix}}
$$
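This correspondence is easy to verify numerically. Below is a minimal PyTorch sketch (variable names are my own, not from the original) that builds the $4\times16$ matrix for a random $3\times3$ kernel and checks the product against `F.conv2d`:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
w = torch.randn(3, 3)   # kernel w_{i,j}
x = torch.randn(4, 4)   # input image x_{ij}

# Build the 4x16 matrix: one row per convolution window (stride=1, padding=0)
C = torch.zeros(4, 16)
for out_row in range(2):
    for out_col in range(2):
        for i in range(3):
            for j in range(3):
                # The window at (out_row, out_col) sees input pixel
                # (out_row + i, out_col + j); column index after flattening:
                C[out_row * 2 + out_col, (out_row + i) * 4 + (out_col + j)] = w[i, j]

y_matmul = (C @ x.reshape(16)).reshape(2, 2)
y_conv = F.conv2d(x.reshape(1, 1, 4, 4), w.reshape(1, 1, 3, 3)).reshape(2, 2)
print(torch.allclose(y_matmul, y_conv))  # True
```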
Transposed convolution takes this matrix form of convolution and transposes the transform matrix. The original linear map $\overset{4\times4}{16}\rightarrow \overset{2\times2}{4}$ thus becomes a map $\overset{2\times2}{4}\rightarrow \overset{4\times4}{16}$, which enlarges the image, i.e., performs upsampling.
$$
\scriptsize
\begin{bmatrix}
\textcolor{red}{y'_{00}} \\ y'_{01} \\ y'_{02} \\ y'_{03}
\\ \vdots \\
y'_{30} \\ y'_{31} \\ y'_{32} \\ y'_{33}
\end{bmatrix} =
\left[\begin{array}{cccc}
\textcolor{red}{w_{0,0}} & \textcolor{red}{0} & \textcolor{red}{0} & \textcolor{red}{0} \\
w_{0,1} & w_{0,0} & 0 & 0 \\
w_{0,2} & w_{0,1} & 0 & 0 \\
0 & w_{0,2} & 0 & 0 \\
\hline
w_{1,0} & 0 & w_{0,0} & 0 \\
w_{1,1} & w_{1,0} & w_{0,1} & w_{0,0} \\
w_{1,2} & w_{1,1} & w_{0,2} & w_{0,1} \\
0 & w_{1,2} & 0 & w_{0,2} \\
\hline
w_{2,0} & 0 & w_{1,0} & 0 \\
w_{2,1} & w_{2,0} & w_{1,1} & w_{1,0} \\
w_{2,2} & w_{2,1} & w_{1,2} & w_{1,1} \\
0 & w_{2,2} & 0 & w_{1,2} \\
\hline
0 & 0 & w_{2,0} & 0 \\
0 & 0 & w_{2,1} & w_{2,0} \\
0 & 0 & w_{2,2} & w_{2,1} \\
0 & 0 & 0 & w_{2,2}
\end{array}\right]
\textcolor{red}{
\begin{bmatrix}
x'_{00} \\ x'_{01} \\ x'_{10} \\ x'_{11}
\end{bmatrix}}
$$
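PyTorch's `F.conv_transpose2d` computes exactly this product $C^\mathsf{T}x'$. A small sketch to confirm, reusing the matrix $C$ built in the earlier sketch:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
w = torch.randn(3, 3)
x_prime = torch.randn(2, 2)  # x'

# The same 4x16 matrix C as in the convolution sketch above
C = torch.zeros(4, 16)
for out_row in range(2):
    for out_col in range(2):
        for i in range(3):
            for j in range(3):
                C[out_row * 2 + out_col, (out_row + i) * 4 + (out_col + j)] = w[i, j]

# C^T maps 4 values back to 16, i.e. a 2x2 image up to a 4x4 image
y_matmul = (C.T @ x_prime.reshape(4)).reshape(4, 4)
y_tconv = F.conv_transpose2d(
    x_prime.reshape(1, 1, 2, 2), w.reshape(1, 1, 3, 3)
).reshape(4, 4)
print(torch.allclose(y_matmul, y_tconv))  # True
```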
Each column of the transposed matrix corresponds to one convolution window, so the product above amounts to weighting each window mask by the corresponding input pixel and summing, as illustrated below:
$$
\tiny
\begin{split}
\begin{bmatrix}
y'_{00} & y'_{01} & y'_{02} & y'_{03} \\
y'_{10} & y'_{11} & y'_{12} & y'_{13} \\
y'_{20} & y'_{21} & y'_{22} & y'_{23} \\
y'_{30} & y'_{31} & y'_{32} & y'_{33}
\end{bmatrix}
&= \left\{
\textcolor{red}{
\begin{bmatrix}
w_{0,0} & w_{0,1} & w_{0,2} & 0 \\
w_{1,0} & w_{1,1} & w_{1,2} & 0 \\
w_{2,0} & w_{2,1} & w_{2,2} & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}},
\begin{bmatrix}
0 & w_{0,0} & w_{0,1} & w_{0,2} \\
0 & w_{1,0} & w_{1,1} & w_{1,2} \\
0 & w_{2,0} & w_{2,1} & w_{2,2} \\
0 & 0 & 0 & 0
\end{bmatrix},
\begin{bmatrix}
0 & 0 & 0 & 0 \\
w_{0,0} & w_{0,1} & w_{0,2} & 0 \\
w_{1,0} & w_{1,1} & w_{1,2} & 0 \\
w_{2,0} & w_{2,1} & w_{2,2} & 0
\end{bmatrix},
\begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & w_{0,0} & w_{0,1} & w_{0,2} \\
0 & w_{1,0} & w_{1,1} & w_{1,2} \\
0 & w_{2,0} & w_{2,1} & w_{2,2}
\end{bmatrix}
\right\}
\begin{bmatrix}
\textcolor{red}{x'_{00}} \\ x'_{01} \\ x'_{10} \\ x'_{11}
\end{bmatrix} \\
&=
\begin{bmatrix}
w_{0,0} & w_{0,1} & w_{0,2} & 0 \\
w_{1,0} & w_{1,1} & w_{1,2} & 0 \\
w_{2,0} & w_{2,1} & w_{2,2} & 0 \\
0 & 0 & 0 & 0
\end{bmatrix} x'_{00} +
\begin{bmatrix}
0 & w_{0,0} & w_{0,1} & w_{0,2} \\
0 & w_{1,0} & w_{1,1} & w_{1,2} \\
0 & w_{2,0} & w_{2,1} & w_{2,2} \\
0 & 0 & 0 & 0
\end{bmatrix} x'_{01} +
\begin{bmatrix}
0 & 0 & 0 & 0 \\
w_{0,0} & w_{0,1} & w_{0,2} & 0 \\
w_{1,0} & w_{1,1} & w_{1,2} & 0 \\
w_{2,0} & w_{2,1} & w_{2,2} & 0
\end{bmatrix} x'_{10} +
\begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & w_{0,0} & w_{0,1} & w_{0,2} \\
0 & w_{1,0} & w_{1,1} & w_{1,2} \\
0 & w_{2,0} & w_{2,1} & w_{2,2}
\end{bmatrix} x'_{11} \\
&= \left\{
\begin{matrix}
\textcolor{red}{
\begin{bmatrix}
w_{0,0} & w_{0,1} & w_{0,2} & 0 \\
w_{1,0} & w_{1,1} & w_{1,2} & 0 \\
w_{2,0} & w_{2,1} & w_{2,2} & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}} &
\begin{bmatrix}
0 & w_{0,0} & w_{0,1} & w_{0,2} \\
0 & w_{1,0} & w_{1,1} & w_{1,2} \\
0 & w_{2,0} & w_{2,1} & w_{2,2} \\
0 & 0 & 0 & 0
\end{bmatrix} \\
\begin{bmatrix}
0 & 0 & 0 & 0 \\
w_{0,0} & w_{0,1} & w_{0,2} & 0 \\
w_{1,0} & w_{1,1} & w_{1,2} & 0 \\
w_{2,0} & w_{2,1} & w_{2,2} & 0
\end{bmatrix} &
\begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & w_{0,0} & w_{0,1} & w_{0,2} \\
0 & w_{1,0} & w_{1,1} & w_{1,2} \\
0 & w_{2,0} & w_{2,1} & w_{2,2}
\end{bmatrix}
\end{matrix}
\right\} *
\begin{bmatrix}
\textcolor{red}{x'_{00}} & x'_{01} \\
x'_{10} & x'_{11}
\end{bmatrix}
\end{split}
$$
This is equivalent to distributing $\left[\begin{smallmatrix}x'_{00}&x'_{01}\\x'_{10}&x'_{11}\end{smallmatrix}\right]$ as weights over the $4\times4$ output in the following pattern:
$$
\scriptsize
\left[\begin{array}{ccc}
x'_{00} & \fbox{$\begin{matrix} x'_{00}+x'_{01} \end{matrix}$} & x'_{01} \\\\
\fbox{$\begin{matrix} x'_{00}\\+\\x'_{10} \end{matrix}$} &
\fbox{$\begin{matrix} x'_{00}+x'_{01}\\+\\x'_{10}+x'_{11} \end{matrix}$} &
\fbox{$\begin{matrix} x'_{01}\\+\\x'_{11} \end{matrix}$} \\\\
x'_{10} & \fbox{$\begin{matrix} x'_{10}+x'_{11} \end{matrix}$} & x'_{11}
\end{array}\right]
$$
This result is the same as convolving a two-layer zero-padded input with the original kernel mirrored both left-right and up-down (i.e., rotated 180°):
$$
\fbox{$\begin{matrix}
w_{2,2} & w_{2,1} & w_{2,0} \\
w_{1,2} & w_{1,1} & w_{1,0} \\
w_{0,2} & w_{0,1} & w_{0,0}
\end{matrix}$}
*
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & x'_{00} & x'_{01} & 0 & 0 \\
0 & 0 & x'_{10} & x'_{11} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
\end{bmatrix}
$$
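This equivalence can also be checked directly. A minimal sketch using `torch.flip` and `F.pad`:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
w = torch.randn(1, 1, 3, 3)        # original kernel
x_prime = torch.randn(1, 1, 2, 2)  # 2x2 input

y_tconv = F.conv_transpose2d(x_prime, w)   # 1x1x4x4

w_flipped = torch.flip(w, dims=[2, 3])     # mirror up-down and left-right
x_padded = F.pad(x_prime, (2, 2, 2, 2))    # two layers of zero padding -> 1x1x6x6
y_conv = F.conv2d(x_padded, w_flipped)     # 1x1x4x4

print(torch.allclose(y_tconv, y_conv))  # True
```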
Does transposed convolution recover the image before convolution?
Note that transposed convolution does not recover the matrix (image) that went through the convolution; it only restores the dimensions, i.e., it maps the convolved image from the low-dimensional space back to the high-dimensional one.
$$
y = Ax \xLeftrightarrow{\text{$A$ is invertible}} A^{-1}y = x,
\qquad A^T y \neq x
$$
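A quick numeric check of this statement (a sketch; a random square $A$ is almost surely invertible):

```python
import torch

torch.manual_seed(0)
A = torch.randn(4, 4)
x = torch.randn(4)
y = A @ x

print(torch.allclose(torch.linalg.inv(A) @ y, x, atol=1e-5))  # True:  A^{-1} recovers x
print(torch.allclose(A.T @ y, x, atol=1e-5))                  # False: A^T does not
```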
We can examine this result through the generalized inverse. Given $Y = AX$, does there exist a transform matrix $B$ such that $YB = X$? All solutions $B$ have the form
$$
B = Y^gX + [I - Y^gY]w
$$
where $Y^g$ is any generalized inverse of $Y$ and $w$ is an arbitrary matrix.
A solution exists if and only if $Y^gX$ is itself a solution, that is, if and only if $YY^gX = X$.
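The criterion can be tried out numerically, using the Moore-Penrose pseudoinverse as one concrete choice of $Y^g$ (a sketch with made-up shapes; `torch.linalg.pinv` computes the pseudoinverse):

```python
import torch

torch.manual_seed(0)
Y = torch.randn(4, 2) @ torch.randn(2, 5)  # a rank-deficient 4x5 matrix
B0 = torch.randn(5, 3)
X = Y @ B0                                 # constructed so that YB = X is solvable
Yg = torch.linalg.pinv(Y)                  # one generalized inverse of Y

print(torch.allclose(Y @ Yg @ X, X, atol=1e-4))  # the criterion YY^gX = X holds
B = Yg @ X                                       # the particular solution Y^gX
print(torch.allclose(Y @ B, X, atol=1e-4))       # ...and it indeed solves YB = X
```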
Does a transform exist that recovers $X$ from $Y$?
For $Y = AX$, does there exist a $B$ such that $BY = BAX = X$?
$$
\begin{bmatrix}x_1 \\ \vdots \\ x_n\end{bmatrix}
\xrightarrow{\;A\;}
\begin{bmatrix}y_1 \\ \vdots \\ y_m\end{bmatrix}
$$
For a particular pair $(x, y)$ with $y_1 \neq 0$, one such $B$ can be constructed explicitly:

$$
\begin{bmatrix}x_1 \\ \vdots \\ x_n\end{bmatrix}
\leftarrow
\begin{bmatrix}
\frac{x_1}{y_1} & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
\frac{x_n}{y_1} & 0 & \cdots & 0
\end{bmatrix}_{n\times m}
\begin{bmatrix}y_1 \\ \vdots \\ y_m\end{bmatrix}
$$
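In code, the construction looks like this (a sketch; the shapes are arbitrary, and it assumes $y_1 \neq 0$):

```python
import torch

torch.manual_seed(0)
A = torch.randn(3, 5)   # maps R^5 -> R^3 (n=5, m=3)
x = torch.randn(5)
y = A @ x

B = torch.zeros(5, 3)
B[:, 0] = x / y[0]      # first column x_i / y_1, zeros elsewhere (needs y[0] != 0)
print(torch.allclose(B @ y, x, atol=1e-5))  # True: this particular B recovers x
```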
ConvTranspose2d
torch.nn.ConvTranspose2d
```python
import torch

input = torch.randn(1, 32, 28, 28)
transposed_conv = torch.nn.ConvTranspose2d(
    in_channels=32,
    out_channels=16,
    kernel_size=2,
    stride=2
)
output = transposed_conv(input)

print(input.shape)                   # torch.Size([1, 32, 28, 28])
print(output.shape)                  # torch.Size([1, 16, 56, 56])
print(transposed_conv.weight.shape)  # torch.Size([32, 16, 2, 2])
```
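With kernel_size=2 and stride=2 each spatial dimension doubles. Per the PyTorch documentation, $H_{out} = (H_{in}-1)\times\text{stride} - 2\times\text{padding} + \text{dilation}\times(\text{kernel\_size}-1) + \text{output\_padding} + 1$, which here gives $(28-1)\times2 + 1 + 1 = 56$. Note also that the weight shape is (in_channels, out_channels, kH, kW), the reverse of Conv2d's (out_channels, in_channels, kH, kW).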
PyTorch Implementation
PyTorch Image Segmentation Tutorial with U-NET: everything from scratch baby | Aladdin Persson | YouTube
```python
import torch
from torch import nn
from torchvision.transforms import functional as F


class DualConv(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(DualConv, self).__init__()
        self.dualconv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.dualconv(x)


class UNet(nn.Module):
    def __init__(self, in_channels=3, out_channels=1, features=[64, 128, 256, 512]):
        super(UNet, self).__init__()
        self.downs = nn.ModuleList()
        self.ups = nn.ModuleList()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

        # Encoder: stacked DualConv blocks, each followed by 2x2 max pooling
        for feature in features:
            self.downs.append(DualConv(in_channels, feature))
            in_channels = feature

        # Decoder: ConvTranspose2d for upsampling, then DualConv on the
        # concatenation with the skip connection
        for feature in reversed(features):
            self.ups.append(
                nn.ConvTranspose2d(in_channels=feature * 2, out_channels=feature,
                                   kernel_size=2, stride=2)
            )
            self.ups.append(DualConv(feature * 2, feature))

        self.bottleneck = DualConv(features[-1], features[-1] * 2)
        self.final_conv = nn.Conv2d(features[0], out_channels, kernel_size=1)
        print(f"{len(self.ups) = }")

    def forward(self, x):
        skip_connections = []
        for down in self.downs:
            x = down(x)
            skip_connections.append(x)
            x = self.pool(x)

        x = self.bottleneck(x)
        skip_connections = skip_connections[::-1]

        for idx in range(0, len(self.ups), 2):
            x = self.ups[idx](x)  # upsample with ConvTranspose2d
            skip_connection = skip_connections[idx // 2]
            # Shapes can mismatch when the input size is not divisible by 16
            if x.shape != skip_connection.shape:
                x = F.resize(x, size=skip_connection.shape[2:], antialias=True)
            concat_skip = torch.cat((skip_connection, x), dim=1)
            x = self.ups[idx + 1](concat_skip)

        return self.final_conv(x)


if __name__ == "__main__":
    B, C, H, W = 8, 3, 512, 512
    class_num = 6
    x = torch.randn(size=[B, C, H, W], dtype=torch.float32)
    print(x.shape, x.dtype)
    model = UNet(in_channels=C, out_channels=class_num)
    preds = model(x)
    print(preds.shape, preds.dtype)
    assert preds.shape == torch.Size([B, class_num, H, W]), "something wrong!"
```
What does the Encoder do?
For an input image of size 1x572x572, U-Net applies 64 convolutional kernels to extract 64 feature maps, giving 64x570x570 (no padding, as in the original paper) or 64x572x572 (padding=1), and then runs a second convolution to extract 64 feature maps again.
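A quick shape check of that first encoder stage (a sketch; the unpadded convolutions follow the original paper):

```python
import torch
from torch import nn

x = torch.randn(1, 1, 572, 572)
conv1 = nn.Conv2d(1, 64, kernel_size=3)   # no padding, as in the original paper
conv2 = nn.Conv2d(64, 64, kernel_size=3)

print(conv1(x).shape)         # torch.Size([1, 64, 570, 570])
print(conv2(conv1(x)).shape)  # torch.Size([1, 64, 568, 568])
```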
What does the Decoder do?
Convolution and Fully Connected Layers
Convolution can be seen as a special kind of fully connected layer, one that connects only selected neurons between the preceding and following layers. Moreover, every output neuron shares the same set of kernel parameters/weights (represented by differently colored lines in the figure).
$$
1\times4\times4
\xrightarrow[\text{(kernel\_size=2, stride=1, padding=0)}]{\text{Conv2d(in\_channels=1, out\_channels=1)}}
1\times3\times3
\xrightarrow[\text{(kernel\_size=2, stride=1, padding=0)}]{\text{Conv2d(in\_channels=1, out\_channels=1)}}
1\times2\times2
$$
When you increase the number of output channels of the convolution, you are actually increasing the number of neurons in the output layer.
$$
1\times4\times4
\xrightarrow[\text{(kernel\_size=2, stride=1, padding=0)}]{\text{Conv2d(in\_channels=1, out\_channels=3)}}
3\times3\times3
\xrightarrow[\text{(kernel\_size=2, stride=1, padding=0)}]{\text{Conv2d(in\_channels=3, out\_channels=1)}}
1\times2\times2
$$
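Both chains above are easy to confirm with actual modules (a minimal sketch):

```python
import torch
from torch import nn

x = torch.randn(1, 1, 4, 4)
c1 = nn.Conv2d(in_channels=1, out_channels=3, kernel_size=2, stride=1, padding=0)
c2 = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=2, stride=1, padding=0)

print(c1(x).shape)      # torch.Size([1, 3, 3, 3])
print(c2(c1(x)).shape)  # torch.Size([1, 1, 2, 2])
```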
Transposed convolution does the same kind of thing as convolution, but in reverse. It also connects neurons selectively and shares the kernel's parameters.
Convolution: every output neuron shares one set of parameters (weights) of the same kernel.
Transposed convolution: every input neuron shares one set of parameters (weights) of the same kernel.
$$
1\times2\times2
\xrightarrow[\text{(kernel\_size=2, stride=1, padding=0)}]{\text{ConvTranspose2d(in\_channels=1, out\_channels=1)}}
1\times3\times3
\xrightarrow[\text{(kernel\_size=2, stride=1, padding=0)}]{\text{ConvTranspose2d(in\_channels=1, out\_channels=1)}}
1\times4\times4
$$
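And the reversed chain with two ConvTranspose2d layers (a minimal sketch):

```python
import torch
from torch import nn

x = torch.randn(1, 1, 2, 2)
up1 = nn.ConvTranspose2d(in_channels=1, out_channels=1, kernel_size=2, stride=1, padding=0)
up2 = nn.ConvTranspose2d(in_channels=1, out_channels=1, kernel_size=2, stride=1, padding=0)

print(up1(x).shape)       # torch.Size([1, 1, 3, 3])
print(up2(up1(x)).shape)  # torch.Size([1, 1, 4, 4])
```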