PyTorch Installation
Install Miniconda
Install CUDA (this machine has no dedicated GPU for now, so this step is skipped)
Create a new Python virtual environment with conda
```bash
conda create --name pytorch python=3.9
```
Activate the new Python environment (conda activate pytorch).
Note: pytorch here is just the environment name; you can change it to any name you like.
Go to the PyTorch official website and choose the options that match your setup. If your machine has no dedicated GPU and CUDA is not installed, choose the CPU build.
Once you have made your selections you get an install command; run it inside the conda environment created above.
Note 1: the generated command ends with a -c pytorch flag, which downloads from the official channel and can be slow in mainland China. If you have already configured conda mirrors (e.g. the Tsinghua or Aliyun mirrors), simply remove the -c pytorch flag; otherwise run the command over a VPN.
My first attempt, installing via conda, failed whether or not -c pytorch was removed.
The second attempt, via pip, was too slow (about 30 KB/s) and I aborted it with Ctrl+C; the third attempt, pip over a VPN, downloaded normally at about 2 MB/s.
Note 2: the CUDA builds are for machines with a CUDA-capable dedicated GPU; if yours is unsupported or has no GPU, choose CPU.
Note 3: if the conda install fails, try installing with pip instead.
After the installation completes, output like the following is returned:
```
Requirement already satisfied: certifi>=2017.4.17 in c:\users\xiaophai\.conda\envs\pytorch\lib\site-packages (from requests->torchvision) (2022.12.7)
Installing collected packages: urllib3, typing-extensions, pillow, numpy, idna, charset-normalizer, torch, requests, torchvision, torchaudio
Successfully installed charset-normalizer-2.1.1 idna-3.4 numpy-1.24.1 pillow-9.4.0 requests-2.28.1 torch-1.13.1 torchaudio-0.13.1 torchvision-0.14.1 typing-extensions-4.4.0 urllib3-1.26.13
```
Test whether PyTorch runs correctly:
```
(pytorch) C:\Users\xiaophai>python
Python 3.9.15 (main, Nov 24 2022, 14:39:17) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
1.13.1+cpu
>>> print(torch.cuda.is_available())
False
```
Pytorch
Official documentation: PyTorch official documentation
Chinese documentation: PyTorch Chinese documentation
Tensor
Creating Tensors
PyTorch tensors are similar to NumPy ndarrays, and the vast majority of data in PyTorch (vectors, matrices) is stored as tensors.
```python
torch.empty(3,4)
torch.zeros(3,4)
torch.ones(3,4)
```
```
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])
```
```
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
```
```python
torch.Tensor([[1,2,3,4],
              [4,5,6,7],
              [7,8,9,0]])
```
torch.rand returns random numbers uniformly distributed on [0,1):
```
tensor([[0.1591, 0.5634, 0.0369, 0.6377],
        [0.2301, 0.8195, 0.9913, 0.6825],
        [0.8075, 0.8222, 0.6498, 0.5535]])
```
torch.randint
```python
torch.randint(low, high, size)
```
```python
>>> torch.randint(0, 10, (16,))
tensor([9, 2, 7, 9, 0, 7, 1, 9, 5, 1, 8, 3, 5, 7, 3, 0])
```
```python
torch.arange(9).reshape(3,3)
```
```
tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
```
```python
import numpy

array = numpy.ones((2,3))   # numpy.ones takes the shape as a tuple
tensor = torch.from_numpy(array)
```
```
[[1. 1. 1.]
 [1. 1. 1.]]
tensor([[1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
```
```python
x = torch.zeros(3,4)
print(x.size())
```
tensor.item()
torch.tensor.item | Torch Docs
item() is a method of PyTorch tensors that returns the tensor's value as a plain Python type; it only works on tensors with a single element.
```python
a = torch.tensor([1,2,3])
print(a[0], type(a[0]))
print(a[0].item(), type(a[0].item()))
print(a.item())   # error: a has more than one element
```
Tensor Indexing
```python
x = torch.Tensor([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9,10,11,12]])
```
Tensor indexing works like NumPy indexing; subscripts start at 0. For example, x[1] gives
```
tensor([5., 6., 7., 8.])
```
Note that x[,1] is invalid syntax and raises an error.
```
tensor([[ 2.],
        [ 6.],
        [10.]])
```
As in Python, the slice 0:3 denotes the half-open interval $[0,3)$: closed on the left, open on the right.
```
tensor([[ 5.,  6.,  7.],
        [ 9., 10., 11.]])
```
Like Python, tensor indexing also supports negative indices, which count from the end.
```
tensor([[ 5.,  6.,  7.,  8.],
        [ 9., 10., 11., 12.]])
```
A tuple (or list) of indices selects arbitrary positions:
```
tensor([[ 3.,  2.,  1.],
        [ 7.,  6.,  5.],
        [11., 10.,  9.]])
```
```python
x[[0,1,2],[2,1,0]]
x[(0,1,2),(2,1,0)]
```
Tensor Operations
Tensor addition and subtraction follow matrix addition and subtraction: tensors of the same size can be added or subtracted, and a tensor and a scalar can be added or subtracted, but tensors of different sizes cannot; otherwise an error like the following is raised:
```
RuntimeError: The size of tensor a must match the size of tensor b at non-singleton dimension 1
```
```python
result = x + y
result = x.add(y)
result = torch.add(x, y)
x.add_(y)    # methods ending in _ modify x in place
```
```python
result = x - y
result = x.sub(y)
result = torch.sub(x, y)
x.sub_(y)
```
Multiplication here is element-wise multiplication between same-sized tensors, not matrix multiplication. The size requirement is the same as for addition and subtraction: the two operands must be the same size, or one of them must be a scalar.
```python
result = x * y
result = x.multiply(y)
result = torch.multiply(x, y)
x.multiply_(y)
```
```python
result = x / y
result = x.div(y)
result = torch.div(x, y)
x.div_(y)
```
```python
result = 1 / x
result = torch.reciprocal(x)
result = x.reciprocal()
```
The two tensors in an element-wise power must have the same size; the result raises each element to the corresponding power.
```python
result = x ** y
result = x.pow(y)
result = torch.pow(x, y)
x.pow_(y)
```
Broadcasting
When two tensors of different shapes take part in an operation, the smaller tensor has to be expanded into the higher-dimensional shape. This expansion uses the broadcasting mechanism: the lower-dimensional data is broadcast (copied) to fill the larger shape.
Tensors satisfying the following conditions can be broadcast:
Each tensor has at least one dimension.
Comparing sizes from the trailing dimension: the two sizes are equal, or
the sizes differ and one of them is 1, or
the sizes differ and one of the dimensions does not exist.
Computation rules (a short sketch follows this list):
If the numbers of dimensions differ, the tensor with fewer dimensions gets extra leading dimensions.
In each dimension, the result takes the larger of the two sizes.
Expanding a dimension copies the values.
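A minimal sketch of these rules (shapes chosen purely for illustration), covering both a size-1 dimension and a missing leading dimension:
```python
import torch

a = torch.ones(5, 3, 1)
b = torch.ones(   3, 2)      # no leading dim; trailing dims compare as 3 vs 3, 1 vs 2

# (5,3,1) + (3,2): b is treated as (1,3,2), then both expand to (5,3,2)
print((a + b).shape)         # torch.Size([5, 3, 2])
```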
```python
x = torch.Tensor([[1],
                  [2],
                  [3]])
y = torch.Tensor([[1, 2]])
x + y    # (3,1) broadcast with (1,2) gives (3,2)
```
```
tensor([[2., 3.],
        [3., 4.],
        [4., 5.]])
```
PyTorch Linear Algebra
```python
A = torch.tensor([[1,2,3],[4,5,6]])
```
```
tensor([[1, 2, 3],
        [4, 5, 6]])
tensor([[1, 4],
        [2, 5],
        [3, 6]])
```
Matrix multiplication requires the two tensors' sizes to satisfy the usual linear-algebra rule (the inner dimensions must match).
```python
A = torch.arange(6).reshape(2,3)
B = torch.arange(6).reshape(3,2)
AB = torch.mm(A,B)
```
```
tensor([[0, 1, 2],
        [3, 4, 5]])
tensor([[0, 1],
        [2, 3],
        [4, 5]])
tensor([[10, 13],
        [28, 40]])
```
Both arguments of torch.mm() must be matrices, i.e. have two dimensions (even if one dimension has size 1). For matrix-vector products, use torch.mv() instead.
```python
A = torch.arange(6).reshape(2,3)
x = torch.arange(3)
Ax = torch.mv(A,x)
```
```
tensor([[0, 1, 2],
        [3, 4, 5]]) torch.Size([2, 3])
tensor([0, 1, 2]) torch.Size([3])
tensor([ 5, 14]) torch.Size([2])
```
Norm
```python
x = torch.tensor([3., 4])
Norm = torch.norm(x)
```
```
tensor([3., 4.])
tensor(5.)
```
Note that the input to norm must be of floating-point (or complex) type, not an integer type; otherwise an error is raised:
```python
x = torch.tensor([3, 4])
torch.norm(x)
```
```
RuntimeError: norm(): input dtype should be either floating point or complex. Got Long instead.
```
Automatic Differentiation
torch.Tensor.backward | pytorch docs
Autograd mechanics | pytorch docs
Mu Li: Automatic Differentiation; Mu Li: 07 Automatic Differentiation [Dive into Deep Learning v2]
One of PyTorch's core features is automatic differentiation via backward().
PyTorch's automatic differentiation is implemented with a computational graph: a function $z$ and its variables form an acyclic graph in which every node is an operation (+, -, *, sum, mean, ...). PyTorch records these operations and then computes the corresponding gradients via the chain rule for composite functions.
A PyTorch tensor has an extra attribute called requires_grad, which defaults to False; setting it to True marks the tensor as a variable whose derivative needs to be computed.
```python
x = torch.arange(3, requires_grad=False, dtype=float)
y = torch.dot(x, x)
y.backward()
```
```
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
```
Note: only floating-point tensors can take part in differentiation, which is why dtype=float is set when defining the tensors above (and in what follows).
requires_grad_() sets a tensor's requires_grad attribute in place.
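A small illustration (tensor values arbitrary):
```python
x = torch.arange(3, dtype=float)   # created with requires_grad=False
x.requires_grad_(True)             # in-place: x now tracks operations for autograd
print(x.requires_grad)             # True
```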
A tensor's grad attribute records the gradient of that variable; the gradient from every backward() call is accumulated into grad.
```python
x = torch.arange(3, requires_grad=True, dtype=float)
print(x.grad)

y = torch.dot(x, x)
y.backward()
print(x.grad)

z = torch.sum(x)
z.backward()
print(x.grad)
```
Before any backward() call, x.grad is None;
after the first backward() call, x.grad becomes $\frac{\partial y}{\partial x} = [0,2,4]$;
after the second backward() call, x.grad adds $\frac{\partial z}{\partial x} = [1,1,1]$ to the previous $[0,2,4]$ and becomes $[1,3,5]$.
Because PyTorch accumulates grad, grad.zero_() is usually called to clear a tensor's gradient before a new backward() pass.
The code below adds an x.grad.zero_() call to the grad example above:
```python
...
print(x.grad)

x.grad.zero_()
print(x.grad)

z = torch.sum(x)
z.backward()
print(x.grad)
```
For a tensor that is not a leaf node of the computational graph (an intermediate variable), you must call .retain_grad() to keep its gradient; otherwise its gradient is freed once backpropagation finishes.
$$
\begin{gather}
z = y^2,\qquad y = \bold{x}^T\bold{x} = x_1^2+x_2^2+x_3^2\\
\frac{\partial z}{\partial \bold{x}} = \frac{\partial z}{\partial y}
\frac{\partial y}{\partial \bold{x}} = 2y\,[2x_1,2x_2,2x_3] = 4y\,[x_1,x_2,x_3]
\end{gather}
$$
```python
x = torch.arange(3, requires_grad=True, dtype=float)
y = torch.dot(x, x)
print(y.requires_grad)
z = y**2
print(z.requires_grad)
z.backward()
print(y.grad)
print(x.grad)
```
In this example y is an intermediate variable and its requires_grad is True, yet after z.backward() its grad is None, and Python emits a warning:
```
UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
```
Adding a y.retain_grad() call to the code above keeps the gradient of the intermediate variable y; with it in place, y.grad now holds a value.
```python
......
y.retain_grad()
z.backward()
print(y.grad)
print(x.grad)
```
dim
PyTorch's dim is counted from 0. A $2\times3\times4$ tensor looks like this:
$$
\begin{split}
\rm tensor
(
\textcolor{red}{\overset{0}{\boldsymbol{[}}}
\textcolor{blue}{\overset{1}{\boldsymbol{[}}}
&\textcolor{green}{\overset{2}{\boldsymbol{[}}}
a_{000},a_{001},a_{002},a_{003}\textcolor{green}{\boldsymbol{]}},\\
&\textcolor{green}{\boldsymbol{[}}a_{010},a_{011},a_{012},a_{013}\textcolor{green}{\boldsymbol{]}},\\
&\textcolor{green}{\boldsymbol{[}}a_{020},a_{021},a_{022},a_{023}\textcolor{green}{\boldsymbol{]}}
\textcolor{blue}{\boldsymbol{]}},\\\\
\textcolor{blue}{\boldsymbol{[}}
&\textcolor{green}{\boldsymbol{[}}a_{100},a_{101},a_{102},a_{103}\textcolor{green}{\boldsymbol{]}},\\
&\textcolor{green}{\boldsymbol{[}}a_{110},a_{111},a_{112},a_{113}\textcolor{green}{\boldsymbol{]}},\\
&\textcolor{green}{\boldsymbol{[}}a_{120},a_{121},a_{122},a_{123}\textcolor{green}{\boldsymbol{]}}
\textcolor{blue}{\boldsymbol{]}}
\textcolor{red}{\boldsymbol{]}}
)
\end{split}
$$
Applying sum to a 3-dimensional $d_0\times d_1\times d_2$ tensor with $\mathrm{dim}\in\{0,1,2\}$ gives, respectively,
$$
\sum_{i\in d_0} a_{ijk} \qquad \sum_{j\in d_1} a_{ijk} \qquad \sum_{k\in d_2} a_{ijk}
$$
For example, summing torch.ones(2,3,4) with dim=0, 1, 2 respectively:
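A minimal sketch of the calls that produce the outputs below:
```python
a = torch.ones(2, 3, 4)
print(a)              # the all-ones (2,3,4) tensor
print(a.sum(dim=0))   # shape (3,4), every entry 2
print(a.sum(dim=1))   # shape (2,4), every entry 3
print(a.sum(dim=2))   # shape (2,3), every entry 4
```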
```
tensor([[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]])
```
```
tensor([[2., 2., 2., 2.],
        [2., 2., 2., 2.],
        [2., 2., 2., 2.]])
```
```
tensor([[3., 3., 3., 3.],
        [3., 3., 3., 3.]])
```
```
tensor([[4., 4., 4.],
        [4., 4., 4.]])
```
Logistic Regression
In classification problems what matters is not the magnitude of a value. On the MNIST handwritten-digit dataset, for example, what we compute is not some size of the input image but the probability that it belongs to each of the ten classes 0-9:
$$
Pr(x\in 0),\ Pr(x\in 1),\ \cdots,\ Pr(x\in 9)
$$
Logistic Function
$$
\sigma(x) = \frac{1}{1+e^{-x}},\quad x\in \R
$$
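A quick check of the logistic function in PyTorch (input values arbitrary): torch.sigmoid computes exactly $1/(1+e^{-x})$ element-wise.
```python
import torch

x = torch.tensor([-2.0, 0.0, 2.0])
print(torch.sigmoid(x))          # tensor([0.1192, 0.5000, 0.8808])
print(1 / (1 + torch.exp(-x)))   # same values, computed from the definition
```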
Some other sigmoid-shaped functions:
$$
{\rm erf}(\frac{\sqrt{\pi}}{2}x),\quad \frac{x}{\sqrt{1+x^2}},\quad \tanh(x)
$$
$$
\frac{2}{\pi}\arctan(\frac{\pi}{2}x),\quad \frac{2}{\pi}{\rm gd}(\frac{\pi}{2}x),\quad \frac{x}{1+|x|}
$$
Softmax
Let the output of the network's last layer be $\bold{z} = [z_1,z_2,\cdots,z_n]\in\R^n$. Its softmax value is
$$
\begin{bmatrix}
\frac{e^{z_1}}{e^{z_1}+\cdots+e^{z_n}},
\frac{e^{z_2}}{e^{z_1}+\cdots+e^{z_n}},
\cdots,
\frac{e^{z_n}}{e^{z_1}+\cdots+e^{z_n}}
\end{bmatrix}
$$
The $i$-th softmax value $\frac{e^{z_i}}{e^{z_1}+\cdots+e^{z_n}}$ is the probability that the input sample belongs to class $i$.
$$
\begin{bmatrix}
z_1\\z_2\\\vdots\\z_n
\end{bmatrix}
\overset{\rm Softmax}{\longrightarrow}
\begin{bmatrix}
\frac{e^{z_1}}{\sum e^{z_i}}\\\frac{e^{z_2}}{\sum e^{z_i}}\\\vdots\\\frac{e^{z_n}}{\sum e^{z_i}}
\end{bmatrix}
\overset{\rm CrossEntropy}{\longrightarrow}
\sum
\begin{pmatrix}
-y_1\cdot\log(\frac{e^{z_1}}{\sum e^{z_i}})\\
-y_2\cdot\log(\frac{e^{z_2}}{\sum e^{z_i}})\\
\vdots\\
-y_n\cdot\log(\frac{e^{z_n}}{\sum e^{z_i}})
\end{pmatrix}
\overset{\text{one-hot}}{=}
-\log(\frac{e^{z_j}}{\sum e^{z_i}})
$$
where $\bold{y}=[y_1,y_2,\cdots,y_n]$ is the one-hot encoding of the class label; for example $[1,0,\cdots,0]$ is the label of class 1, $[0,1,\cdots,0]$ of class 2, and $[0,0,\cdots,1]$ of class $n$. Because a one-hot encoding has a single 1 and zeros everywhere else, only one term of the sum survives; all other terms vanish.
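A small sketch of this pipeline in PyTorch (logits and label chosen arbitrarily): torch.nn.functional.cross_entropy applies log-softmax internally, so it matches picking $-\log({\rm softmax}(\bold{z}))$ at the true class.
```python
import torch
import torch.nn.functional as F

z = torch.tensor([[1.0, 2.0, 0.5]])    # raw last-layer outputs (logits) for one sample
target = torch.tensor([1])             # true class index j = 1

loss = F.cross_entropy(z, target)                # softmax + cross entropy in one call
manual = -F.log_softmax(z, dim=1)[0, target[0]]  # -log(softmax(z))_j from the formula above
print(loss.item(), manual.item())                # identical values
```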
Derivative of softmax
Let the network output be
$$
\bold{z} = [z_1,z_2,\cdots,z_n]
$$
The softmax output is
$$
\bold{\hat{y}} = [\hat{y}_1,\hat{y}_2,\cdots,\hat{y}_n] = {\rm softmax}(\bold{z}) = [\frac{e^{z_1}}{\sum e^{z_i}},\frac{e^{z_2}}{\sum e^{z_i}},\cdots,\frac{e^{z_n}}{\sum e^{z_i}}]
$$
The label of the $j$-th class is
$$
\bold{y}^{(j)} = [\underset{1}{0},\cdots,\underset{j-1}{0},\underset{j}{1},\underset{j+1}{0},\cdots,\underset{n}{0}]
$$
Computing the cross entropy of $\bold{y}$ and $\bold{\hat{y}}$ gives
$$
\begin{split}
{\rm CrossEntropy}(\bold{y},\bold{\hat{y}})
&= -\sum_{j=1}^n y_j\log(\frac{e^{z_j}}{\sum_{i=1}^n e^{z_i}})\\
&= \sum_{j=1}^n\left(y_j\left[\log(\sum_{i=1}^n e^{z_i}) - \log(e^{z_j})\right]\right)\\
&= \sum_{j=1}^ny_j\log(\sum_{i=1}^n e^{z_i}) - \sum_{j=1}^ny_jz_j\\
&= \log(\sum_{i=1}^n e^{z_i})\cdot\sum_{j=1}^ny_j - \sum_{j=1}^ny_jz_j\\
(\sum y_i = 1)\rightarrow&= \log(\sum_{i=1}^n e^{z_i}) - \sum_{j=1}^ny_jz_j
\end{split}
$$
In particular, for the one-hot label $\bold{y}^{(j)}$,
$$
{\rm CrossEntropy}(\bold{y}^{(j)},\bold{\hat{y}})
= \log(\sum_{i=1}^n e^{z_i}) - z_j
$$
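A numerical check of this identity (values arbitrary): the cross entropy of the logits against class $j$ equals the log-sum-exp of the logits minus $z_j$.
```python
import torch
import torch.nn.functional as F

z = torch.tensor([1.0, 2.0, 0.5])
j = 1
lhs = F.cross_entropy(z.unsqueeze(0), torch.tensor([j]))   # CrossEntropy(y^(j), softmax(z))
rhs = torch.logsumexp(z, dim=0) - z[j]                     # log(sum_i exp(z_i)) - z_j
print(lhs.item(), rhs.item())                              # equal
```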
Method 1 for computing the derivative
$$
\begin{split}
{\rm CrossEntropy}(\bold{y},\bold{\hat{y}}) &=
{\rm CrossEntropy}\big(\bold{y},{\rm softmax}(\bold{z})\big)\\
\bold{y},\ \bold{z}\ \text{are row vectors}\rightarrow&=
-\bold{y}\log\left[{\rm softmax}(\bold{z}^T)\right]\\
(\cdot,\cdot)\ \text{is the inner product}\rightarrow&=
-\Big(\bold{y},\log\left[{\rm softmax}(\bold{z})\right]\Big)
\end{split}
$$
Therefore
$$
\begin{split}
\frac{\partial\ {\rm CrossEntropy}(\bold{y},\bold{\hat{y}})}{\partial \bold{z}}
&= -\frac{\partial\Big(\bold{y},\log\left[{\rm softmax}(\bold{z})\right]\Big)}{\partial\bold{z}}\\
&= -\frac{\partial\Big(\bold{y},\log(\bold{\hat{y}})\Big)}{\partial\log(\bold{\hat{y}})}
\frac{\partial\log(\bold{\hat{y}}^T)}{\partial\bold{\hat{y}}}
\frac{\partial\bold{\hat{y}}^T}{\partial\bold{z}}\\
&= -\bold{y}
\begin{bmatrix}
\frac{1}{\hat{y}_1}&0&\cdots&0\\
0&\frac{1}{\hat{y}_2}&\cdots&0\\
\vdots&\vdots&\ddots&\vdots\\
0&0&\cdots&\frac{1}{\hat{y}_n}\\
\end{bmatrix}
\frac{\partial\bold{\hat{y}}^T}{\partial\bold{z}}\\
&= -\bold{y}
\left(E-
\begin{bmatrix}
\hat{y}_1&\hat{y}_2&\cdots&\hat{y}_n\\
\hat{y}_1&\hat{y}_2&\cdots&\hat{y}_n\\
\vdots&\vdots&\ddots&\vdots\\
\hat{y}_1&\hat{y}_2&\cdots&\hat{y}_n\\
\end{bmatrix}
\right)\\
&= \bold{y}
\begin{bmatrix}
\hat{y}_1&\hat{y}_2&\cdots&\hat{y}_n\\
\hat{y}_1&\hat{y}_2&\cdots&\hat{y}_n\\
\vdots&\vdots&\ddots&\vdots\\
\hat{y}_1&\hat{y}_2&\cdots&\hat{y}_n\\
\end{bmatrix}
- \bold{y}\\
&=
\begin{bmatrix}
\hat{y}_1\sum y_i&\hat{y}_2\sum y_i&\cdots&\hat{y}_n\sum y_i
\end{bmatrix}
-\bold{y}\\
(\sum y_i = 1)\rightarrow&=
\begin{bmatrix}
\hat{y}_1&\hat{y}_2&\cdots&\hat{y}_n
\end{bmatrix}
-\bold{y}\\
&= {\rm softmax}(\bold{z})-\bold{y}
\end{split}
$$
where
$$
\begin{split}
\frac{\partial\bold{\hat{y}}^T}{\partial \bold{z}}
=\frac{\partial\ {\rm softmax}(\bold{z}^T)}{\partial \bold{z}}
&=
\frac{\partial
\begin{bmatrix}
\frac{e^{z_1}}{e^{z_1}+\cdots+e^{z_n}},
\frac{e^{z_2}}{e^{z_1}+\cdots+e^{z_n}},
\cdots,
\frac{e^{z_n}}{e^{z_1}+\cdots+e^{z_n}}
\end{bmatrix}^T}
{\partial [z_1,z_2,\cdots,z_n]}\\
&=
\begin{bmatrix}
\hat{y}_1(1-\hat{y}_1) & \hat{y}_2(-\hat{y}_1) & \cdots & \hat{y}_n(-\hat{y}_1)\\
\hat{y}_1(-\hat{y}_2) & \hat{y}_2(1-\hat{y}_2) & \cdots & \hat{y}_n(-\hat{y}_2)\\
\vdots&\vdots&\ddots&\vdots\\
\hat{y}_1(-\hat{y}_n) & \hat{y}_2(-\hat{y}_n) & \cdots & \hat{y}_n(1-\hat{y}_n)\\
\end{bmatrix}\\
&=
\begin{bmatrix}
\hat{y}_1&0&\cdots&0\\
0&\hat{y}_2&\cdots&0\\
\vdots&\vdots&\ddots&\vdots\\
0&0&\cdots&\hat{y}_n\\
\end{bmatrix}
\left(E-
\begin{bmatrix}
\hat{y}_1&\hat{y}_2&\cdots&\hat{y}_n\\
\hat{y}_1&\hat{y}_2&\cdots&\hat{y}_n\\
\vdots&\vdots&\ddots&\vdots\\
\hat{y}_1&\hat{y}_2&\cdots&\hat{y}_n\\
\end{bmatrix}
\right)
\end{split}
$$
Method 2 for computing the derivative
Compute the derivative of the cross entropy with respect to $\bold{z}$:
$$
\begin{split}
\frac{\partial\ {\rm CE}(\bold{y},\bold{\hat{y}})}{\partial \bold{z}}
&=
\begin{bmatrix}
\frac{\partial\ {\rm CE}}{\partial z_1},
\frac{\partial\ {\rm CE}}{\partial z_2},
\cdots,
\frac{\partial\ {\rm CE}}{\partial z_n}
\end{bmatrix}\\
&=
\begin{bmatrix}
{\rm smx}(z_1)-y_1, & {\rm smx}(z_2)-y_2, & \cdots, & {\rm smx}(z_n)-y_n
\end{bmatrix}\\
&=
{\rm softmax}([z_1,z_2,\cdots,z_n])-[y_1,y_2,\cdots,y_n]\\
&=
{\rm softmax}(\bold{z})-\bold{y}
\end{split}
$$
where
$$
{\rm CrossEntropy}(\bold{y},\bold{\hat{y}})
= \log(\sum_{i=1}^n e^{z_i}) - \sum_{j=1}^ny_jz_j
$$
$$
\begin{split}
\frac{\partial\ {\rm CE}(\bold{y},\bold{\hat{y}})}{\partial z_i}
&=
\frac{\partial}{\partial z_i}\log(\sum_{k=1}^n e^{z_k}) - \frac{\partial}{\partial z_i}\sum_{j=1}^ny_jz_j\\
&=
\frac{e^{z_i}}{\sum_{k=1}^ne^{z_k}} - y_i\\
&=
{\rm softmax}(\bold{z})_i - y_i
\end{split}
$$
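This result is easy to verify with autograd (arbitrary logits and a one-hot label): the gradient of the cross entropy with respect to the logits is exactly softmax(z) − y.
```python
import torch

z = torch.tensor([1.0, 2.0, 0.5], requires_grad=True)
y = torch.tensor([0.0, 1.0, 0.0])                # one-hot label

loss = -(y * torch.log_softmax(z, dim=0)).sum()  # cross entropy written out by hand
loss.backward()
print(z.grad)                                    # softmax(z) - y
print(torch.softmax(z, dim=0) - y)               # same vector
```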
draft
$$
{\rm softmax}(\bold{z}) = [\frac{e^{z_1}}{\sum e^{z_i}},\frac{e^{z_2}}{\sum e^{z_i}},\cdots,\frac{e^{z_n}}{\sum e^{z_i}}]
$$
$$
\begin{split}
{\rm CrossEntropy}(\bold{y},\bold{\hat{y}}) &=
{\rm CrossEntropy}\big(\bold{y},{\rm softmax}(\bold{z})\big)\\
\bold{y},\ \bold{z}\ \text{are row vectors}\rightarrow&=
-\bold{y}\log\left[{\rm softmax}(\bold{z}^T)\right]\\
(\cdot,\cdot)\ \text{is the inner product}\rightarrow&=
-\Big(\bold{y},\log\left[{\rm softmax}(\bold{z})\right]\Big)
\end{split}
$$
$$
\log \circ\ {\rm softmax}(\bold{z})
= \bold{z}-\log(\textstyle\sum e^{\bold{z}})
$$
PolyLoss
draft
$$
\log(x+1) = x - \frac{x^2}{2} + \frac{x^3}{3}+\cdots+(-1)^{n+1}\frac{x^n}{n}+O(x^{n+1})
$$
$$
\frac{\log(x+1)}{x} \sim 1 \quad (x \rightarrow 0)
$$
$$
\log(x) = \log((x-1)+1) = (x-1) - \frac{(x-1)^2}{2} + \frac{(x-1)^3}{3}+\cdots+(-1)^{n+1}\frac{(x-1)^n}{n}+O((x-1)^{n+1})
$$
$$
\frac{\log(x)}{x-1} \sim 1 \quad(x\rightarrow 1)
$$
$$
\log(x) = (x-1) + O((x-1)^2)
$$
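Assuming this draft is heading toward the PolyLoss decomposition, applying the same expansion to the cross-entropy term $-\log(p_t)$ (with $p_t$ the predicted probability of the true class) gives the polynomial series that PolyLoss truncates and re-weights:
$$
-\log(p_t) = -\log\big(1-(1-p_t)\big) = \sum_{j=1}^{\infty}\frac{(1-p_t)^j}{j}
$$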
Fully Connected Neural Network (FCN)
FCN on MNIST
```python
import torch
from torchvision import transforms
from torchvision import datasets
from torch.utils.data import DataLoader
import torch.nn.functional as F
import torch.optim as optim

batch_size = 64
# convert images to tensors and normalize with the MNIST mean/std
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

train_dataset = datasets.MNIST(root='../dataset/mnist/',
                               train=True,
                               download=False,
                               transform=transform)
test_dataset = datasets.MNIST(root='../dataset/mnist/',
                              train=False,
                              download=False,
                              transform=transform)
train_loader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size)
test_loader = DataLoader(test_dataset, shuffle=False, batch_size=batch_size)

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.l1 = torch.nn.Linear(784, 512)
        self.l2 = torch.nn.Linear(512, 256)
        self.l3 = torch.nn.Linear(256, 128)
        self.l4 = torch.nn.Linear(128, 64)
        self.l5 = torch.nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(-1, 784)          # flatten each 28x28 image into a 784-dim vector
        x = F.relu(self.l1(x))
        x = F.relu(self.l2(x))
        x = F.relu(self.l3(x))
        x = F.relu(self.l4(x))
        return self.l5(x)            # raw logits; CrossEntropyLoss applies softmax internally

model = Net()
criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

def train(epoch):
    running_loss = 0.0
    for batch_idx, data in enumerate(train_loader, 0):
        inputs, target = data
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, target)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if batch_idx % 300 == 299:
            print('[%d, %5d] loss: %.3f' % (epoch + 1, batch_idx + 1, running_loss / 300))
            running_loss = 0.0

def test():
    correct = 0
    total = 0
    with torch.no_grad():
        for data in test_loader:
            images, labels = data
            outputs = model(images)
            _, predicted = torch.max(outputs.data, dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print('Accuracy on test set: %d %%' % (100 * correct / total))

if __name__ == '__main__':
    for epoch in range(10):
        train(epoch)
        test()
```
Convolutional Neural Network (CNN)
Computing input and output sizes
$$
Out = (In - Kernel + 2\,Padding)\,/\,Stride + 1
$$
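A small helper that just evaluates this formula (the function name and the example numbers are only for illustration); the values match the layer shapes in the MNIST CNN below.
```python
def conv_out_size(n_in, kernel, padding=0, stride=1):
    # Out = (In - Kernel + 2*Padding) / Stride + 1
    return (n_in - kernel + 2 * padding) // stride + 1

print(conv_out_size(28, 5))                        # 24: 28x28 image through a 5x5 conv
print(conv_out_size(24, 2, padding=0, stride=2))   # 12: 2x2 max-pool with stride 2
```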
Demo 1
```python
import torch

in_channels, out_channels = 5, 10
width, height = 100, 100
kernel_size = 3
batch_size = 1

input = torch.randn(batch_size, in_channels, width, height)
conv_layer = torch.nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size)
output = conv_layer(input)

print(input.shape)              # torch.Size([1, 5, 100, 100])
print(output.shape)             # torch.Size([1, 10, 98, 98])
print(conv_layer.weight.shape)  # torch.Size([10, 5, 3, 3])
```
Demo 2: padding
```python
import torch

conv_layer = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
kernel = torch.arange(1, 10, dtype=torch.float32).view(1, 1, 3, 3)
conv_layer.weight.data = kernel.data

input = torch.ones(1, 1, 5, 5, dtype=torch.float32)
output = conv_layer(input)

print(output.size())
print(output)
```
Demo 3: MaxPool
```python
import torch

maxpooling_layer = torch.nn.MaxPool2d(kernel_size=2)

input = torch.ones(1, 1, 5, 5, dtype=torch.float32)
output = maxpooling_layer(input)

print(output.size())
print(output)
```
Demo 4: CNN
Network structure
$$
\begin{array}{rcl}
\text{Input Layer} & \fbox{input}\\
&\downarrow & (batch,1,28,28)\\
\text{Conv2d Layer1} & \fbox{$C_{in}=1,C_{out}=10,kernel=5$}\\
&\downarrow & (batch,10,24,24)\\
\text{ReLU Layer1} & \fbox{ReLU Layer}\\
&\downarrow & (batch,10,24,24)\\
\text{Pooling Layer1} & \fbox{$kernel=2\times2$}\\
&\downarrow & (batch,10,12,12)\\
\text{Conv2d Layer2} & \fbox{$C_{in}=10,C_{out}=20,kernel=5$}\\
&\downarrow & (batch,20,8,8)\\
\text{ReLU Layer2} & \fbox{ReLU Layer}\\
&\downarrow & (batch,20,8,8)\\
\text{Pooling Layer2} & \fbox{$kernel=2\times2$}\\
&\downarrow & (batch,20,4,4) \rightarrow (batch,320)\\
\text{Linear Layer} & \fbox{$f_{in}=320,f_{out}=10$}\\
&\downarrow & (batch,10)\\
\text{Output Layer} & \fbox{Output}
\end{array}
$$
Model code
```python
import torch
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = torch.nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = torch.nn.Conv2d(10, 20, kernel_size=5)
        self.pooling = torch.nn.MaxPool2d(2)
        self.fc = torch.nn.Linear(320, 10)   # 320 = 20 channels * 4 * 4 after two conv+pool stages

    def forward(self, x):
        batch_size = x.size(0)
        x = self.pooling(F.relu(self.conv1(x)))
        x = self.pooling(F.relu(self.conv2(x)))
        x = x.view(batch_size, -1)
        x = self.fc(x)
        return x
```
Replacing the Net class in the earlier FCN code with the Net above is all that is needed to run it.
CNN on MNIST
```python
import torch
from torchvision import transforms
from torchvision import datasets
from torch.utils.data import DataLoader
import torch.nn.functional as F
import torch.optim as optim

batch_size = 64
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

train_dataset = datasets.MNIST(root='../dataset/mnist/',
                               train=True,
                               download=False,
                               transform=transform)
test_dataset = datasets.MNIST(root='../dataset/mnist/',
                              train=False,
                              download=False,
                              transform=transform)
train_loader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size)
test_loader = DataLoader(test_dataset, shuffle=False, batch_size=batch_size)

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = torch.nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = torch.nn.Conv2d(10, 20, kernel_size=5)
        self.pooling = torch.nn.MaxPool2d(2)
        self.fc = torch.nn.Linear(320, 10)

    def forward(self, x):
        batch_size = x.size(0)
        x = self.pooling(F.relu(self.conv1(x)))
        x = self.pooling(F.relu(self.conv2(x)))
        x = x.view(batch_size, -1)
        x = self.fc(x)
        return x

model = Net()
criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

def train(epoch):
    running_loss = 0.0
    for batch_idx, data in enumerate(train_loader, 0):
        inputs, target = data
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, target)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if batch_idx % 300 == 299:
            print('[%d, %5d] loss: %.3f' % (epoch + 1, batch_idx + 1, running_loss / 300))
            running_loss = 0.0

def test():
    correct = 0
    total = 0
    with torch.no_grad():
        for data in test_loader:
            images, labels = data
            outputs = model(images)
            _, predicted = torch.max(outputs.data, dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print('Accuracy on test set: %d %%' % (100 * correct / total))

if __name__ == '__main__':
    for epoch in range(10):
        train(epoch)
        test()
```