CUDA-CNN
Introduction

This is a CNN (Convolutional Neural Network) implemented in CUDA, trained on the MNIST dataset. With epoch=10, training takes 35.8 s and the classification accuracy on the test set reaches 96.54%.
Running the Project

```shell
# clone the repository (a CUDA environment is required)
git clone git@github.com:whut-zhangwx/CUDA-CNN.git
cd ./cuda-cnn
# build the project
make all
# run it
./CNN
```
Network Architecture
Input (1, 28, 28)
→ Conv2d Layer1 (Cin=1, Cout=6, kernel=6×5×5, stride=1) → (6, 24, 24)
→ Sigmoid → (6, 24, 24)
→ Conv2d Layer2 (Cin=6, Cout=6, kernel=1×4×4, stride=4) → (6, 6, 6)
→ Sigmoid → (6, 6, 6)
→ FC Layer (fin=216, fout=10) → (10)
→ Sigmoid → (10)
→ Output (10)
The output spatial size of a convolution is

$$\mathrm{Out}=\frac{\mathrm{In}-\mathrm{Kernel}+2\times\mathrm{Padding}}{\mathrm{Stride}}+1$$
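As a quick check of this formula for the two convolution layers (a small Python sketch; `conv_out_size` is an illustrative helper, not from the repo):

```python
def conv_out_size(in_size, kernel, padding, stride):
    """Out = (In - Kernel + 2*Padding) / Stride + 1"""
    return (in_size - kernel + 2 * padding) // stride + 1

# Conv2d Layer1: 28x28 input, 5x5 kernel, padding 0, stride 1 -> 24x24
print(conv_out_size(28, 5, 0, 1))  # 24
# Conv2d Layer2: 24x24 input, 4x4 kernel, padding 0, stride 4 -> 6x6
print(conv_out_size(24, 4, 0, 4))  # 6
```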
Forward propagation
$$\mathrm{Out}_{in}[i][j]=\mathrm{img}[i][j],\quad i,j\in\{0,1,\cdots,27\}$$
Convolution layer1
Kernel size: 6×5×5, Stride: 1
Input size: 1×28×28, Output size: 6×24×24
$$\mathrm{PreA}_{c1}[i_2][i_3][i_4]=\sum_{i_7=0}^{4}\sum_{i_8=0}^{4}\mathrm{Weight}_{c1}[i_2][i_7][i_8]\cdot\mathrm{Out}_{in}[i_3+i_7][i_4+i_8]+\mathrm{Bias}_{c1}[i_2]$$

$$i_2\in\{0,1,\cdots,5\};\quad i_3,i_4\in\{0,1,\cdots,23\}$$
Activation
$$\mathrm{Out}_{c1}[i_2][i_3][i_4]=\frac{1}{1+\exp(-\mathrm{PreA}_{c1}[i_2][i_3][i_4])},\quad i_2\in\{0,1,\cdots,5\};\; i_3,i_4\in\{0,1,\cdots,23\}$$
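The two formulas above can be sketched as a plain-Python CPU reference (illustrative only; the repo parallelizes these loops as CUDA kernels, and the names here are hypothetical):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def conv1_forward(img, weight, bias):
    """(1,28,28) -> (6,24,24): one 5x5 kernel per output channel, stride 1.
    img: 28x28, weight: 6x5x5, bias: length-6."""
    out = [[[0.0] * 24 for _ in range(24)] for _ in range(6)]
    for i2 in range(6):           # output channel
        for i3 in range(24):      # output row
            for i4 in range(24):  # output column
                pre = bias[i2]    # PreA_c1[i2][i3][i4]
                for i7 in range(5):
                    for i8 in range(5):
                        pre += weight[i2][i7][i8] * img[i3 + i7][i4 + i8]
                out[i2][i3][i4] = sigmoid(pre)  # Out_c1 = sigmoid(PreA_c1)
    return out
```

With all-zero weights and biases every pre-activation is 0, so every output is sigmoid(0) = 0.5 — a handy sanity check.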
Convolution layer2
Kernel size: 1×4×4, Stride: 4
Input size: 6×24×24, Output size: 6×6×6
$$\mathrm{PreA}_{c2}[i_2][i_3][i_4]=\sum_{i_5=0}^{3}\sum_{i_6=0}^{3}\mathrm{Weight}_{c2}[i_5][i_6]\cdot\mathrm{Out}_{c1}[i_2][4i_3+i_5][4i_4+i_6]+\mathrm{Bias}_{c2}$$

$$i_2\in\{0,1,\cdots,5\};\quad i_3,i_4\in\{0,1,\cdots,5\}$$
Activation
$$\mathrm{Out}_{c2}[i_2][i_3][i_4]=\frac{1}{1+\exp(-\mathrm{PreA}_{c2}[i_2][i_3][i_4])},\quad i_2\in\{0,1,\cdots,5\};\; i_3,i_4\in\{0,1,\cdots,5\}$$
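Note that, per the formulas, layer 2 uses a single 4×4 kernel and a scalar bias shared across all six channels, applied channel-wise with stride 4 — effectively a learned downsampling. A Python sketch under that reading (hypothetical names):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def conv2_forward(out_c1, weight, bias):
    """(6,24,24) -> (6,6,6): one shared 4x4 kernel, scalar bias, stride 4."""
    out = [[[0.0] * 6 for _ in range(6)] for _ in range(6)]
    for i2 in range(6):          # channel
        for i3 in range(6):      # output row
            for i4 in range(6):  # output column
                pre = bias       # PreA_c2[i2][i3][i4]
                for i5 in range(4):
                    for i6 in range(4):
                        pre += weight[i5][i6] * out_c1[i2][4*i3 + i5][4*i4 + i6]
                out[i2][i3][i4] = sigmoid(pre)
    return out
```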
Fully Connected Layer
Input size: 6×6×6, Output size: 10
$$\mathrm{PreA}_{fc}[i_1]=\sum_{i_2=0}^{5}\sum_{i_3=0}^{5}\sum_{i_4=0}^{5}\mathrm{Weight}_{fc}[i_1][i_2][i_3][i_4]\cdot\mathrm{Out}_{c2}[i_2][i_3][i_4]+\mathrm{Bias}_{fc}[i_1],\quad i_1\in\{0,1,\cdots,9\}$$
Activation
$$\mathrm{Out}_{fc}[i_1]=\frac{1}{1+\exp(-\mathrm{PreA}_{fc}[i_1])},\quad i_1\in\{0,1,\cdots,9\}$$
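The FC layer flattens the (6,6,6) = 216 values into 10 class scores; again a plain-Python sketch with hypothetical names:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fc_forward(out_c2, weight, bias):
    """Map (6,6,6) = 216 inputs to 10 sigmoid outputs.
    weight: 10x6x6x6, bias: length-10."""
    out = [0.0] * 10
    for i1 in range(10):
        pre = bias[i1]  # PreA_fc[i1]
        for i2 in range(6):
            for i3 in range(6):
                for i4 in range(6):
                    pre += weight[i1][i2][i3][i4] * out_c2[i2][i3][i4]
        out[i1] = sigmoid(pre)
    return out
```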
Loss Function
For a sample (img, label) with network output $\mathrm{Out}_{fc}$, let $\mathrm{err}[i],\ i\in\{0,1,\cdots,9\}$ denote the prediction error for each class:

$$\mathrm{err}[i]=\begin{cases}\mathrm{Out}_{fc}[i], & i\neq \mathrm{label}\\ -(1-\mathrm{Out}_{fc}[i]), & i=\mathrm{label}\end{cases}$$

Strictly, $\mathrm{err}[\mathrm{label}]$ should be $1-\mathrm{Out}_{fc}[\mathrm{label}]$; the minus sign is added so that later, when taking gradients, $\frac{\partial \mathrm{Loss}}{\partial \mathrm{err}[i]}\cdot\frac{\partial \mathrm{err}[i]}{\partial \mathrm{Out}_{fc}[i]}=\mathrm{err}[i]$ holds uniformly for all $i$. This does not change the Loss, since every term is squared.
The loss is half the sum of squared prediction errors:

$$\mathrm{Loss}=\frac{1}{2}\sum_{i=0}^{9}\mathrm{err}[i]^2=\frac{1}{2}\left(\mathrm{Out}_{fc}[\mathrm{label}]-1\right)^2+\frac{1}{2}\sum_{\substack{i=0\\ i\neq \mathrm{label}}}^{9}\mathrm{Out}_{fc}[i]^2$$
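A sketch of the error and loss computation (hypothetical names):

```python
def loss_and_err(out_fc, label):
    """err[i] = Out_fc[i] for i != label, -(1 - Out_fc[label]) at the label;
    Loss = 0.5 * sum of squared errors."""
    err = [out_fc[i] if i != label else -(1.0 - out_fc[i]) for i in range(10)]
    loss = 0.5 * sum(e * e for e in err)
    return loss, err
```

For example, a uniform output of 0.5 gives err[label] = -0.5 and err[i] = 0.5 elsewhere, so Loss = 0.5 × 10 × 0.25 = 1.25.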
Back propagation
Fully Connected Layer
$$\frac{\partial\mathrm{Loss}}{\partial\mathrm{Weight}_{fc}[i_1][i_2][i_3][i_4]}=\frac{\partial\mathrm{Loss}}{\partial\mathrm{err}[i_1]}\cdot\frac{\partial\mathrm{err}[i_1]}{\partial\mathrm{Out}_{fc}[i_1]}\cdot\frac{\partial\mathrm{Out}_{fc}[i_1]}{\partial\mathrm{PreA}_{fc}[i_1]}\cdot\frac{\partial\mathrm{PreA}_{fc}[i_1]}{\partial\mathrm{Weight}_{fc}[i_1][i_2][i_3][i_4]}=\mathrm{err}[i_1]\cdot1\cdot\mathrm{Out}_{fc}[i_1](1-\mathrm{Out}_{fc}[i_1])\cdot\mathrm{Out}_{c2}[i_2][i_3][i_4]$$
$$\frac{\partial\mathrm{Loss}}{\partial\mathrm{Bias}_{fc}[i_1]}=\frac{\partial\mathrm{Loss}}{\partial\mathrm{err}[i_1]}\cdot\frac{\partial\mathrm{err}[i_1]}{\partial\mathrm{Out}_{fc}[i_1]}\cdot\frac{\partial\mathrm{Out}_{fc}[i_1]}{\partial\mathrm{PreA}_{fc}[i_1]}\cdot\frac{\partial\mathrm{PreA}_{fc}[i_1]}{\partial\mathrm{Bias}_{fc}[i_1]}=\mathrm{err}[i_1]\cdot1\cdot\mathrm{Out}_{fc}[i_1](1-\mathrm{Out}_{fc}[i_1])\cdot1$$
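Both FC gradients share the common factor err[i1] · Out_fc[i1](1 − Out_fc[i1]); a plain-Python sketch (illustrative names, not the repo's kernels):

```python
def fc_gradients(err, out_fc, out_c2):
    """dLoss/dBias_fc[i1] and dLoss/dWeight_fc[i1][i2][i3][i4] per the
    chain-rule expressions above."""
    d_bias = [err[i1] * out_fc[i1] * (1.0 - out_fc[i1]) for i1 in range(10)]
    d_weight = [[[[d_bias[i1] * out_c2[i2][i3][i4]
                   for i4 in range(6)]
                  for i3 in range(6)]
                 for i2 in range(6)]
                for i1 in range(10)]
    return d_weight, d_bias
```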
Convolution layer2
$$\begin{aligned}\frac{\partial\mathrm{Loss}}{\partial\mathrm{Weight}_{c2}[i_5][i_6]}&=\sum_{i_1=0}^{9}\sum_{i_2=0}^{5}\sum_{i_3=0}^{5}\sum_{i_4=0}^{5}\frac{\partial\mathrm{Loss}}{\partial\mathrm{err}[i_1]}\cdot\frac{\partial\mathrm{err}[i_1]}{\partial\mathrm{Out}_{fc}[i_1]}\cdot\frac{\partial\mathrm{Out}_{fc}[i_1]}{\partial\mathrm{PreA}_{fc}[i_1]}\cdot\frac{\partial\mathrm{PreA}_{fc}[i_1]}{\partial\mathrm{Out}_{c2}[i_2][i_3][i_4]}\cdot\frac{\partial\mathrm{Out}_{c2}[i_2][i_3][i_4]}{\partial\mathrm{PreA}_{c2}[i_2][i_3][i_4]}\cdot\frac{\partial\mathrm{PreA}_{c2}[i_2][i_3][i_4]}{\partial\mathrm{Weight}_{c2}[i_5][i_6]}\\&=\sum_{i_1=0}^{9}\sum_{i_2=0}^{5}\sum_{i_3=0}^{5}\sum_{i_4=0}^{5}\mathrm{err}[i_1]\cdot1\cdot\mathrm{Out}_{fc}[i_1](1-\mathrm{Out}_{fc}[i_1])\cdot\mathrm{Weight}_{fc}[i_1][i_2][i_3][i_4]\cdot\mathrm{Out}_{c2}[i_2][i_3][i_4](1-\mathrm{Out}_{c2}[i_2][i_3][i_4])\cdot\mathrm{Out}_{c1}[i_2][4i_3+i_5][4i_4+i_6]\end{aligned}$$
$$\begin{aligned}\frac{\partial\mathrm{Loss}}{\partial\mathrm{Bias}_{c2}}&=\sum_{i_1=0}^{9}\sum_{i_2=0}^{5}\sum_{i_3=0}^{5}\sum_{i_4=0}^{5}\frac{\partial\mathrm{Loss}}{\partial\mathrm{err}[i_1]}\cdot\frac{\partial\mathrm{err}[i_1]}{\partial\mathrm{Out}_{fc}[i_1]}\cdot\frac{\partial\mathrm{Out}_{fc}[i_1]}{\partial\mathrm{PreA}_{fc}[i_1]}\cdot\frac{\partial\mathrm{PreA}_{fc}[i_1]}{\partial\mathrm{Out}_{c2}[i_2][i_3][i_4]}\cdot\frac{\partial\mathrm{Out}_{c2}[i_2][i_3][i_4]}{\partial\mathrm{PreA}_{c2}[i_2][i_3][i_4]}\cdot\frac{\partial\mathrm{PreA}_{c2}[i_2][i_3][i_4]}{\partial\mathrm{Bias}_{c2}}\\&=\sum_{i_1=0}^{9}\sum_{i_2=0}^{5}\sum_{i_3=0}^{5}\sum_{i_4=0}^{5}\mathrm{err}[i_1]\cdot1\cdot\mathrm{Out}_{fc}[i_1](1-\mathrm{Out}_{fc}[i_1])\cdot\mathrm{Weight}_{fc}[i_1][i_2][i_3][i_4]\cdot\mathrm{Out}_{c2}[i_2][i_3][i_4](1-\mathrm{Out}_{c2}[i_2][i_3][i_4])\cdot1\end{aligned}$$
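Both layer-2 gradients accumulate over the same quadruple sum; a plain-Python sketch (illustrative names):

```python
def conv2_gradients(err, out_fc, w_fc, out_c2, out_c1):
    """Gradients of Loss w.r.t. the shared 4x4 kernel Weight_c2 and the
    scalar Bias_c2, following the quadruple sums above."""
    d_w = [[0.0] * 4 for _ in range(4)]
    d_b = 0.0
    for i1 in range(10):
        # err[i1] * 1 * Out_fc[i1] * (1 - Out_fc[i1])
        g_fc = err[i1] * out_fc[i1] * (1.0 - out_fc[i1])
        for i2 in range(6):
            for i3 in range(6):
                for i4 in range(6):
                    g = (g_fc * w_fc[i1][i2][i3][i4]
                         * out_c2[i2][i3][i4] * (1.0 - out_c2[i2][i3][i4]))
                    d_b += g  # last chain factor is 1 for the bias
                    for i5 in range(4):
                        for i6 in range(4):
                            d_w[i5][i6] += g * out_c1[i2][4*i3 + i5][4*i4 + i6]
    return d_w, d_b
```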
Convolution layer1
$$\begin{aligned}\frac{\partial\mathrm{Loss}}{\partial\mathrm{Weight}_{c1}[i_2][i_7][i_8]}&=\sum_{i_1=0}^{9}\sum_{i_3=0}^{5}\sum_{i_4=0}^{5}\sum_{i_5=0}^{3}\sum_{i_6=0}^{3}\frac{\partial\mathrm{Loss}}{\partial\mathrm{err}[i_1]}\cdot\frac{\partial\mathrm{err}[i_1]}{\partial\mathrm{Out}_{fc}[i_1]}\cdot\frac{\partial\mathrm{Out}_{fc}[i_1]}{\partial\mathrm{PreA}_{fc}[i_1]}\cdot\frac{\partial\mathrm{PreA}_{fc}[i_1]}{\partial\mathrm{Out}_{c2}[i_2][i_3][i_4]}\cdot\frac{\partial\mathrm{Out}_{c2}[i_2][i_3][i_4]}{\partial\mathrm{PreA}_{c2}[i_2][i_3][i_4]}\cdot\frac{\partial\mathrm{PreA}_{c2}[i_2][i_3][i_4]}{\partial\mathrm{Out}_{c1}[i_2][4i_3+i_5][4i_4+i_6]}\cdot\frac{\partial\mathrm{Out}_{c1}[i_2][4i_3+i_5][4i_4+i_6]}{\partial\mathrm{PreA}_{c1}[i_2][4i_3+i_5][4i_4+i_6]}\cdot\frac{\partial\mathrm{PreA}_{c1}[i_2][4i_3+i_5][4i_4+i_6]}{\partial\mathrm{Weight}_{c1}[i_2][i_7][i_8]}\\&=\sum_{i_1=0}^{9}\sum_{i_3=0}^{5}\sum_{i_4=0}^{5}\sum_{i_5=0}^{3}\sum_{i_6=0}^{3}\mathrm{err}[i_1]\cdot1\cdot\mathrm{Out}_{fc}[i_1](1-\mathrm{Out}_{fc}[i_1])\cdot\mathrm{Weight}_{fc}[i_1][i_2][i_3][i_4]\cdot\mathrm{Out}_{c2}[i_2][i_3][i_4](1-\mathrm{Out}_{c2}[i_2][i_3][i_4])\cdot\mathrm{Weight}_{c2}[i_5][i_6]\cdot\mathrm{Out}_{c1}[i_2][4i_3+i_5][4i_4+i_6](1-\mathrm{Out}_{c1}[i_2][4i_3+i_5][4i_4+i_6])\cdot\mathrm{Out}_{in}[4i_3+i_5+i_7][4i_4+i_6+i_8]\end{aligned}$$
$$\begin{aligned}\frac{\partial\mathrm{Loss}}{\partial\mathrm{Bias}_{c1}[i_2]}&=\sum_{i_1=0}^{9}\sum_{i_3=0}^{5}\sum_{i_4=0}^{5}\sum_{i_5=0}^{3}\sum_{i_6=0}^{3}\frac{\partial\mathrm{Loss}}{\partial\mathrm{err}[i_1]}\cdot\frac{\partial\mathrm{err}[i_1]}{\partial\mathrm{Out}_{fc}[i_1]}\cdot\frac{\partial\mathrm{Out}_{fc}[i_1]}{\partial\mathrm{PreA}_{fc}[i_1]}\cdot\frac{\partial\mathrm{PreA}_{fc}[i_1]}{\partial\mathrm{Out}_{c2}[i_2][i_3][i_4]}\cdot\frac{\partial\mathrm{Out}_{c2}[i_2][i_3][i_4]}{\partial\mathrm{PreA}_{c2}[i_2][i_3][i_4]}\cdot\frac{\partial\mathrm{PreA}_{c2}[i_2][i_3][i_4]}{\partial\mathrm{Out}_{c1}[i_2][4i_3+i_5][4i_4+i_6]}\cdot\frac{\partial\mathrm{Out}_{c1}[i_2][4i_3+i_5][4i_4+i_6]}{\partial\mathrm{PreA}_{c1}[i_2][4i_3+i_5][4i_4+i_6]}\cdot\frac{\partial\mathrm{PreA}_{c1}[i_2][4i_3+i_5][4i_4+i_6]}{\partial\mathrm{Bias}_{c1}[i_2]}\\&=\sum_{i_1=0}^{9}\sum_{i_3=0}^{5}\sum_{i_4=0}^{5}\sum_{i_5=0}^{3}\sum_{i_6=0}^{3}\mathrm{err}[i_1]\cdot1\cdot\mathrm{Out}_{fc}[i_1](1-\mathrm{Out}_{fc}[i_1])\cdot\mathrm{Weight}_{fc}[i_1][i_2][i_3][i_4]\cdot\mathrm{Out}_{c2}[i_2][i_3][i_4](1-\mathrm{Out}_{c2}[i_2][i_3][i_4])\cdot\mathrm{Weight}_{c2}[i_5][i_6]\cdot\mathrm{Out}_{c1}[i_2][4i_3+i_5][4i_4+i_6](1-\mathrm{Out}_{c1}[i_2][4i_3+i_5][4i_4+i_6])\cdot1\end{aligned}$$
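The layer-1 gradients chain through both sigmoids and both convolutions; the full sums above can be sketched in plain Python as follows (illustrative names; the repo's CUDA kernels distribute these loops over threads):

```python
def conv1_gradients(err, out_fc, w_fc, out_c2, w_c2, out_c1, img):
    """Gradients of Loss w.r.t. Weight_c1[i2][i7][i8] and Bias_c1[i2],
    chaining through both sigmoids and both conv layers as above."""
    d_w = [[[0.0] * 5 for _ in range(5)] for _ in range(6)]
    d_b = [0.0] * 6
    for i1 in range(10):
        g_fc = err[i1] * out_fc[i1] * (1.0 - out_fc[i1])
        for i2 in range(6):
            for i3 in range(6):
                for i4 in range(6):
                    g2 = (g_fc * w_fc[i1][i2][i3][i4]
                          * out_c2[i2][i3][i4] * (1.0 - out_c2[i2][i3][i4]))
                    for i5 in range(4):
                        for i6 in range(4):
                            r, c = 4*i3 + i5, 4*i4 + i6
                            g1 = (g2 * w_c2[i5][i6]
                                  * out_c1[i2][r][c] * (1.0 - out_c1[i2][r][c]))
                            d_b[i2] += g1  # last chain factor is 1
                            for i7 in range(5):
                                for i8 in range(5):
                                    d_w[i2][i7][i8] += g1 * img[r + i7][c + i8]
    return d_w, d_b
```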