ResNet Notes
The Paper
Deep Residual Learning for Image Recognition | arxiv
Pytorch ResNet implementation from Scratch | Aladdin Persson | YouTube
Comparison of the time complexity of the two block designs
- 2nd block of layer conv2_x in 34-layer (left block)
- 3x3, 64→64, strides=1, padding=1, 56x56→56x56
- 3x3, 64→64, strides=1, padding=1, 56x56→56x56
- 2nd block of layer conv2_x in 50-layer (right block)
- 1x1, 256→64, strides=1, padding=0, 56x56→56x56
- 3x3, 64→64, strides=1, padding=1, 56x56→56x56
- 1x1, 64→256, strides=1, padding=0, 56x56→56x56
| conv layer | in_channels | kernel_size | image_size | out_channels |
| --- | --- | --- | --- | --- |
| left block, conv 1 | 64 | 3×3 | 56×56 | 64 |
| left block, conv 2 | 64 | 3×3 | 56×56 | 64 |
| right block, conv 1 | 256 | 1×1 | 56×56 | 64 |
| right block, conv 2 | 64 | 3×3 | 56×56 | 64 |
| right block, conv 3 | 64 | 1×1 | 56×56 | 256 |

$$\text{left block: } 2 \times 64^2 \times 3^2 \times 56^2 \quad \text{vs} \quad \text{right block: } 2 \times 256 \times 64 \times 56^2 + 64^2 \times 3^2 \times 56^2$$

$$\text{left} - \text{right} = (\underbrace{576}_{64 \times 9} - \underbrace{512}_{2 \times 256}) \times 64 \times 56^2$$
In fact, the computational complexity of these two blocks is of the same order of magnitude, and the three-layer bottleneck block on the right is even slightly cheaper.
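This comparison can be checked numerically. A small sketch in pure Python, with the layer specs taken from the table above (counting multiply-accumulates per conv as C_in × C_out × K² × H_out × W_out):

```python
def conv_macs(c_in, c_out, k, out_hw):
    """Multiply-accumulates of one conv layer: C_in * C_out * K^2 * H_out * W_out."""
    return c_in * c_out * k * k * out_hw * out_hw

# Left: 2nd block of conv2_x in the 34-layer net (two 3x3 convs)
left = 2 * conv_macs(64, 64, 3, 56)

# Right: 2nd block of conv2_x in the 50-layer net (1x1 -> 3x3 -> 1x1 bottleneck)
right = (conv_macs(256, 64, 1, 56)
         + conv_macs(64, 64, 3, 56)
         + conv_macs(64, 256, 1, 56))

print(left, right, left - right)
# The difference factors as (64*9 - 2*256) * 64 * 56^2 > 0: the bottleneck is slightly cheaper.
```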
ResNet-18/34
Note: how to calculate the spatial size of an image before and after a convolution
$$\text{Output\_size} = \lfloor (\text{Image\_size} - \text{Kernel\_size} + 2 \times \text{Padding}) / \text{Stride} \rfloor + 1$$
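The formula is straightforward to turn into a helper for checking the sizes listed below (function name is my own):

```python
import math

def conv_out_size(image_size, kernel_size, padding, stride):
    """Spatial size after a conv/pool layer: floor((I - K + 2P) / S) + 1."""
    return math.floor((image_size - kernel_size + 2 * padding) / stride) + 1

# conv1 of ResNet-34: 7x7, stride 2, padding 3, on a 224x224 input
print(conv_out_size(224, 7, 3, 2))  # 112
# 3x3 maxpool, stride 2, padding 1
print(conv_out_size(112, 3, 1, 2))  # 56
# a 3x3, stride 1, padding 1 conv keeps the size
print(conv_out_size(56, 3, 1, 1))   # 56
```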
ResNet34 Structure with details
input: channels 3, size 224x224
layer conv1 in 34-layer:
- 7x7, 3→64, stride=2, padding=3, 224x224→112x112
layer conv2_x in 34-layer:
- 3x3 maxpool, channels=64, stride=2, padding=1, 112x112→56x56
- 1st block
- 3x3, 64→64, strides=1, padding=1, 56x56→56x56
- 3x3, 64→64, strides=1, padding=1, 56x56→56x56
- 2nd block
- 3x3, 64→64, strides=1, padding=1, 56x56→56x56
- 3x3, 64→64, strides=1, padding=1, 56x56→56x56
- 3rd block
- repeat 2nd block
layer conv3_x in 34-layer:
- 1st block
- 3x3, 64→128, strides=2, padding=1, 56x56→28x28
- 3x3, 128→128, strides=1, padding=1, 28x28→28x28
- 2nd block
- 3x3, 128→128, strides=1, padding=1, 28x28→28x28
- 3x3, 128→128, strides=1, padding=1, 28x28→28x28
- 3rd block
- repeat 2nd block
- 4th block
- repeat 2nd block
layer conv4_x in 34-layer:
- 1st block
- 3x3, 128→256, strides=2, padding=1, 28x28→14x14
- 3x3, 256→256, strides=1, padding=1, 14x14→14x14
- 2nd block
- 3x3, 256→256, strides=1, padding=1, 14x14→14x14
- 3x3, 256→256, strides=1, padding=1, 14x14→14x14
- 3rd block
- repeat 2nd block
- 4th block
- repeat 2nd block
- 5th block
- repeat 2nd block
- 6th block
- repeat 2nd block
layer conv5_x in 34-layer:
- 1st block
- 3x3, 256→512, strides=2, padding=1, 14x14→7x7
- 3x3, 512→512, strides=1, padding=1, 7x7→7x7
- 2nd block
- 3x3, 512→512, strides=1, padding=1, 7x7→7x7
- 3x3, 512→512, strides=1, padding=1, 7x7→7x7
- 3rd block
- repeat 2nd block
last layer in 34-layer:
- average pool, channels 512, 7x7→1x1
- fully connected layer, 512→1000
- softmax
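As a sanity check on the structure above, the "34" counts only the weighted layers (convs and the final fully connected layer; pooling and softmax carry no weights):

```python
# conv1 + (3 + 4 + 6 + 3) basic blocks of 2 convs each + final fc = 34 weighted layers
blocks_per_stage = [3, 4, 6, 3]  # conv2_x .. conv5_x
depth = 1 + sum(blocks_per_stage) * 2 + 1
print(depth)  # 34
```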
Code with Pytorch
```python
import torch
```
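The code block above is truncated in these notes. As a stand-in, here is a minimal sketch of the two-layer basic block used in ResNet-18/34; the names (`BasicBlock`, `downsample`) follow common PyTorch conventions and are my own, not necessarily those of the linked implementation:

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Two 3x3 convs with an identity (or 1x1-projected) shortcut."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = None
        if stride != 1 or in_channels != out_channels:
            # project the shortcut with a 1x1 conv when the shape changes
            self.downsample = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)

# 1st block of conv3_x: 64 -> 128, stride 2, 56x56 -> 28x28
x = torch.randn(1, 64, 56, 56)
print(BasicBlock(64, 128, stride=2)(x).shape)  # torch.Size([1, 128, 28, 28])
```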
ResNet-50/101/152
ResNet50 Structure with details
input: channels 3, size 224x224
layer conv1 in 50-layer:
- 7x7, 3→64, stride=2, padding=3, 224x224→112x112
layer conv2_x in 50-layer:
- 3x3 maxpool, stride=2, padding=1, 112x112→56x56
Some implementations set the maxpool padding to 0 instead, which also works; the image sizes then become

$$112 \xrightarrow{\text{maxpool}(k=3,\ s=2,\ p=0)} \lfloor(112-3)/2\rfloor+1=55 \to \dots \to 55 \xrightarrow{\text{conv}(k=3,\ s=2,\ p=1)} \lfloor(55-3+2 \times 1)/2\rfloor+1=28$$

so from the first stride-2 conv of conv3_x onward, the sizes match the padding=1 case again.
- 1st block
- 1x1, 64→64, strides=1, padding=0, 56x56→56x56
- 3x3, 64→64, strides=1, padding=1, 56x56→56x56
- 1x1, 64→256, strides=1, padding=0, 56x56→56x56
- 2nd block
- 1x1, 256→64, strides=1, padding=0, 56x56→56x56
- 3x3, 64→64, strides=1, padding=1, 56x56→56x56
- 1x1, 64→256, strides=1, padding=0, 56x56→56x56
- 3rd block
- repeat 2nd block
layer conv3_x in 50-layer:
- 1st block
- 1x1, 256→128, strides=1, padding=0, 56x56→56x56
- 3x3, 128→128, strides=2, padding=1, 56x56→28x28
- 1x1, 128→512, strides=1, padding=0, 28x28→28x28
- 2nd block
- 1x1, 512→128, strides=1, padding=0, 28x28→28x28
- 3x3, 128→128, strides=1, padding=1, 28x28→28x28
- 1x1, 128→512, strides=1, padding=0, 28x28→28x28
- 3rd block
- repeat 2nd block
- 4th block
- repeat 2nd block
layer conv4_x in 50-layer:
- 1st block
- 1x1, 512→256, strides=1, padding=0, 28x28→28x28
- 3x3, 256→256, strides=2, padding=1, 28x28→14x14
- 1x1, 256→1024, strides=1, padding=0, 14x14→14x14
- 2nd block
- 1x1, 1024→256, strides=1, padding=0, 14x14→14x14
- 3x3, 256→256, strides=1, padding=1, 14x14→14x14
- 1x1, 256→1024, strides=1, padding=0, 14x14→14x14
- 3rd block
- repeat 2nd block
- 4th block
- repeat 2nd block
- 5th block
- repeat 2nd block
- 6th block
- repeat 2nd block
layer conv5_x in 50-layer:
- 1st block
- 1x1, 1024→512, strides=1, padding=0, 14x14→14x14
- 3x3, 512→512, strides=2, padding=1, 14x14→7x7
- 1x1, 512→2048, strides=1, padding=0, 7x7→7x7
- 2nd block
- 1x1, 2048→512, strides=1, padding=0, 7x7→7x7
- 3x3, 512→512, strides=1, padding=1, 7x7→7x7
- 1x1, 512→2048, strides=1, padding=0, 7x7→7x7
- 3rd block
- repeat 2nd block
last layer in 50-layer:
- average pool, channels 2048, 7x7→1x1
- fully connected layer, 2048→1000
- softmax
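The block counts per stage above (3, 4, 6, 3) also explain the names of the deeper variants: per the paper, ResNet-101 and ResNet-152 only change the counts, keeping three convs per bottleneck block.

```python
# conv1 + three convs per bottleneck block + the final fc layer
configs = {"ResNet-50": [3, 4, 6, 3],
           "ResNet-101": [3, 4, 23, 3],
           "ResNet-152": [3, 8, 36, 3]}
depths = {name: 1 + sum(blocks) * 3 + 1 for name, blocks in configs.items()}
print(depths)  # {'ResNet-50': 50, 'ResNet-101': 101, 'ResNet-152': 152}
```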
Image of size 512×512
input: channels 3, size 512x512
layer conv1 in 50-layer:
- 7x7, 3→64, stride=2, padding=3, 512x512→256x256
layer conv2_x in 50-layer:
- 3x3 maxpool, stride=2, padding=1, 256x256→128x128
Some implementations set the maxpool padding to 0 instead, which also works; the image sizes then become

$$256 \xrightarrow{\text{maxpool}(k=3,\ s=2,\ p=0)} \lfloor(256-3)/2\rfloor+1=127 \to \dots \to 127 \xrightarrow{\text{conv}(k=3,\ s=2,\ p=1)} \lfloor(127-3+2 \times 1)/2\rfloor+1=64$$

so from the first stride-2 conv of conv3_x onward, the sizes match the padding=1 case again.
- 1st block
- 1x1, 64→64, strides=1, padding=0, 128x128→128x128
- 3x3, 64→64, strides=1, padding=1, 128x128→128x128
- 1x1, 64→256, strides=1, padding=0, 128x128→128x128
- 2nd block
- 1x1, 256→64, strides=1, padding=0, 128x128→128x128
- 3x3, 64→64, strides=1, padding=1, 128x128→128x128
- 1x1, 64→256, strides=1, padding=0, 128x128→128x128
- 3rd block
- repeat 2nd block
layer conv3_x in 50-layer:
- 1st block
- 1x1, 256→128, strides=1, padding=0, 128x128→128x128
- 3x3, 128→128, strides=2, padding=1, 128x128→64x64
- 1x1, 128→512, strides=1, padding=0, 64x64→64x64
- 2nd block
- 1x1, 512→128, strides=1, padding=0, 64x64→64x64
- 3x3, 128→128, strides=1, padding=1, 64x64→64x64
- 1x1, 128→512, strides=1, padding=0, 64x64→64x64
- 3rd block
- repeat 2nd block
- 4th block
- repeat 2nd block
layer conv4_x in 50-layer:
- 1st block
- 1x1, 512→256, strides=1, padding=0, 64x64→64x64
- 3x3, 256→256, strides=2, padding=1, 64x64→32x32
- 1x1, 256→1024, strides=1, padding=0, 32x32→32x32
- 2nd block
- 1x1, 1024→256, strides=1, padding=0, 32x32→32x32
- 3x3, 256→256, strides=1, padding=1, 32x32→32x32
- 1x1, 256→1024, strides=1, padding=0, 32x32→32x32
- 3rd block
- repeat 2nd block
- 4th block
- repeat 2nd block
- 5th block
- repeat 2nd block
- 6th block
- repeat 2nd block
layer conv5_x in 50-layer:
- 1st block
- 1x1, 1024→512, strides=1, padding=0, 32x32→32x32
- 3x3, 512→512, strides=2, padding=1, 32x32→16x16
- 1x1, 512→2048, strides=1, padding=0, 16x16→16x16
- 2nd block
- 1x1, 2048→512, strides=1, padding=0, 16x16→16x16
- 3x3, 512→512, strides=1, padding=1, 16x16→16x16
- 1x1, 512→2048, strides=1, padding=0, 16x16→16x16
- 3rd block
- repeat 2nd block
last layer in 50-layer:
- average pool, channels 2048, 16x16→1x1
- fully connected layer, 2048→1000
- softmax
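The 512-input sizes above follow mechanically from the output-size formula, since only the stride-2 layers shrink the feature map. A sketch that propagates the spatial size for both input resolutions (function names are my own):

```python
import math

def out_size(i, k, p, s):
    """floor((I - K + 2P) / S) + 1"""
    return math.floor((i - k + 2 * p) / s) + 1

def resnet50_feature_size(image_size):
    """Spatial size entering the average pool; only stride-2 layers shrink the map."""
    size = out_size(image_size, 7, 3, 2)   # conv1: 7x7, stride 2, padding 3
    size = out_size(size, 3, 1, 2)         # 3x3 maxpool in conv2_x
    for _ in range(3):                     # 1st blocks of conv3_x, conv4_x, conv5_x
        size = out_size(size, 3, 1, 2)     # the stride-2 3x3 conv in each
    return size

print(resnet50_feature_size(224))  # 7
print(resnet50_feature_size(512))  # 16
```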
Code with Pytorch
```python
from typing import List, Tuple
```
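As above, the code block is truncated in these notes. A minimal sketch of the three-layer bottleneck block used in ResNet-50/101/152; the names (`Bottleneck`, `downsample`) follow common PyTorch conventions and are my own, not necessarily those of the linked implementation:

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """1x1 (reduce) -> 3x3 -> 1x1 (expand by 4), with a shortcut connection."""
    expansion = 4

    def __init__(self, in_channels, mid_channels, stride=1):
        super().__init__()
        out_channels = mid_channels * self.expansion
        self.conv1 = nn.Conv2d(in_channels, mid_channels, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid_channels)
        self.conv2 = nn.Conv2d(mid_channels, mid_channels, 3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(mid_channels)
        self.conv3 = nn.Conv2d(mid_channels, out_channels, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = None
        if stride != 1 or in_channels != out_channels:
            # project the shortcut with a 1x1 conv when the shape changes
            self.downsample = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + identity)

# 1st block of conv3_x in ResNet-50: 256 -> 128 -> 512, 56x56 -> 28x28
x = torch.randn(1, 256, 56, 56)
print(Bottleneck(256, 128, stride=2)(x).shape)  # torch.Size([1, 512, 28, 28])
```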