Introduction to the MNIST handwritten digit dataset
Reference: Reading the MNIST handwritten digit dataset in Python
Network architecture
We use a two-layer feed-forward network: an input layer with 784 nodes, a hidden layer with 30 nodes, and an output layer with 10 nodes.
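As a quick side calculation (not part of the original post), the number of trainable parameters in this layout can be counted directly:

```python
# Parameter count for the 784-30-10 network.
sizes = [784, 30, 10]
n_weights = sum(o * i for i, o in zip(sizes[:-1], sizes[1:]))  # 784*30 + 30*10 = 23820
n_biases = sum(sizes[1:])                                      # 30 + 10 = 40
print(n_weights + n_biases)  # 23860
```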
Forward propagation
Input layer
$$
o^I_i = x_i^I,\qquad i=1,2,\cdots,784
$$
Hidden layer
$$
\begin{split}
x_j^J &= \sum_{i=1}^{784} w_{ji}^J o_i^I + b_j^J\\
o_j^J &= \sigma(x_j^J)
\end{split}
,\qquad j=1,2,\cdots,30
$$
$$
\begin{bmatrix}
o_1^J\\ o_2^J \\ \vdots \\ o_{30}^J
\end{bmatrix} =
\sigma\left(
\begin{bmatrix}
w_{11}^J & w_{12}^J & \cdots & w_{1,784}^J\\
w_{21}^J & w_{22}^J & \cdots & w_{2,784}^J\\
\vdots & \vdots & \ddots & \vdots \\
w_{30,1}^J & w_{30,2}^J & \cdots & w_{30,784}^J\\
\end{bmatrix}
\begin{bmatrix}
o_1^I\\ o_2^I \\ \vdots \\ o_{784}^I
\end{bmatrix}
+
\begin{bmatrix}
b_1^J\\ b_2^J \\ \vdots \\ b_{30}^J
\end{bmatrix}
\right)
$$
Output layer
$$
\begin{split}
x_k^K &= \sum_{j=1}^{30} w_{kj}^K o_j^J + b_k^K \\
o_k^K &= \sigma(x_k^K)
\end{split}
,\qquad k=1,2,\cdots,10
$$
$$
\begin{bmatrix}
o_1^K\\ o_2^K \\ \vdots \\ o_{10}^K
\end{bmatrix} =
\sigma\left(
\begin{bmatrix}
w_{11}^K & w_{12}^K & \cdots & w_{1,30}^K\\
w_{21}^K & w_{22}^K & \cdots & w_{2,30}^K\\
\vdots & \vdots & \ddots & \vdots \\
w_{10,1}^K & w_{10,2}^K & \cdots & w_{10,30}^K\\
\end{bmatrix}
\begin{bmatrix}
o_1^J\\ o_2^J \\ \vdots \\ o_{30}^J
\end{bmatrix}
+
\begin{bmatrix}
b_1^K\\ b_2^K \\ \vdots \\ b_{10}^K
\end{bmatrix}
\right)
$$
Putting it together
$$
o_k^K = \sigma\Bigg(\sum_{j=1}^{30} w_{kj}^K\,\sigma\Big(\sum_{i=1}^{784} w_{ji}^J o_i^I + b_j^J\Big) + b_k^K\Bigg)
,\qquad k=1,2,\cdots,10
$$
$$
\begin{split}
\boldsymbol{o}^K &= \sigma(W^K\boldsymbol{o}^J+\boldsymbol{b}^K)\\
&=\sigma\Big(W^K\big[\sigma(W^J\boldsymbol{x}^I+\boldsymbol{b}^J)\big]+\boldsymbol{b}^K\Big)
\end{split}
$$
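The vectorized forward pass maps directly onto NumPy. Below is a minimal sketch with randomly initialized parameters; the variable names `W_J`, `b_J`, `W_K`, `b_K` are illustrative, not from the post's code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Randomly initialized parameters with the shapes derived above.
rng = np.random.default_rng(0)
W_J, b_J = rng.standard_normal((30, 784)), rng.standard_normal((30, 1))
W_K, b_K = rng.standard_normal((10, 30)), rng.standard_normal((10, 1))

x = rng.standard_normal((784, 1))   # one input image as a column vector
o_J = sigmoid(W_J @ x + b_J)        # hidden activations, shape (30, 1)
o_K = sigmoid(W_K @ o_J + b_K)      # network output, shape (10, 1)
```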
Error function
For a training sample $([x_1,\cdots,x_{784}],[t_1,\cdots,t_{10}])$ with network output $[o_1,\cdots,o_{10}]$, we measure the error with the sum of squared differences:
$$
E = \frac{1}{2}\sum_{k=1}^{10}(o_k^K - t_k)^2
$$
Backpropagation
Output-layer gradients
$$
\begin{gather*}
\begin{split}
\frac{\partial E}{\partial w_{kj}^K}
&=
\frac{\partial E}{\partial o_k^K}\cdot
\frac{\partial o_k^K}{\partial x_k^K}\cdot
\frac{\partial x_k^K}{\partial w_{kj}^K}\\
&=
(o_k^K - t_k) \cdot o_k^K(1-o_k^K) \cdot o_j^J
\end{split}
\\
k=1,2,\cdots,10\qquad j=1,2,\cdots,30
\end{gather*}
$$
$$
\begin{gather*}
\begin{split}
\frac{\partial E}{\partial b_{k}^K}
&=
\frac{\partial E}{\partial o_k^K}\cdot
\frac{\partial o_k^K}{\partial x_k^K}\cdot
\frac{\partial x_k^K}{\partial b_{k}^K}\\
&=
(o_k^K - t_k) \cdot o_k^K(1-o_k^K) \cdot 1
\end{split}
\\
k=1,2,\cdots,10
\end{gather*}
$$
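To make these formulas concrete, here is a tiny numerical example of the output-layer gradients. The values are made up, and the network is shrunk to 2 output and 3 hidden units instead of 10 and 30:

```python
import numpy as np

# Made-up activations and targets for a 3-hidden / 2-output toy case.
o_K = np.array([[0.8], [0.2]])          # output activations
t = np.array([[1.0], [0.0]])            # one-hot target
o_J = np.array([[0.5], [0.3], [0.9]])   # hidden activations

delta_K = (o_K - t) * o_K * (1 - o_K)   # (o_k - t_k) * o_k * (1 - o_k)
grad_b_K = delta_K                      # dE/db_k = delta_k
grad_W_K = delta_K @ o_J.T              # dE/dw_kj = delta_k * o_j, shape (2, 3)
```

The outer product `delta_K @ o_J.T` produces every `delta_k * o_j` pair at once, which is exactly the per-element formula above in matrix form.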
Hidden-layer gradients
$$
\begin{gather*}
\begin{split}
\frac{\partial E}{\partial w_{ji}^J}
&=
\sum_{k=1}^{10}
\frac{\partial E}{\partial o_k^K}\cdot
\frac{\partial o_k^K}{\partial x_k^K}\cdot
\frac{\partial x_k^K}{\partial o_j^J}\cdot
\frac{\partial o_j^J}{\partial x_j^J}\cdot
\frac{\partial x_j^J}{\partial w_{ji}^J}
\\
&=
\sum_{k=1}^{10}
(o_k^K - t_k) \cdot o_k^K(1-o_k^K) \cdot w_{kj}^K \cdot o_j^J(1-o_j^J) \cdot o_i^I\\
&=
\Big(
o_j^J(1-o_j^J) \cdot o_i^I
\Big)
\cdot
\Big(
\sum_{k=1}^{10}(o_k^K - t_k) \cdot o_k^K(1-o_k^K) \cdot w_{kj}^K
\Big)
\end{split}
\\
j=1,2,\cdots,30\qquad i=1,2,\cdots,784
\end{gather*}
$$
$$
\begin{gather*}
\begin{split}
\frac{\partial E}{\partial b_j^J}
&=
\sum_{k=1}^{10}
\frac{\partial E}{\partial o_k^K}\cdot
\frac{\partial o_k^K}{\partial x_k^K}\cdot
\frac{\partial x_k^K}{\partial o_j^J}\cdot
\frac{\partial o_j^J}{\partial x_j^J}\cdot
\frac{\partial x_j^J}{\partial b_j^J}
\\
&=
\sum_{k=1}^{10}
(o_k^K - t_k) \cdot o_k^K(1-o_k^K) \cdot w_{kj}^K \cdot o_j^J(1-o_j^J) \cdot 1\\
&=
o_j^J(1-o_j^J)
\cdot
\Big(
\sum_{k=1}^{10}(o_k^K - t_k) \cdot o_k^K(1-o_k^K) \cdot w_{kj}^K
\Big)
\end{split}
\\
j=1,2,\cdots,30
\end{gather*}
$$
Vectorized gradient computation
$$
\begin{split}
\text{output-layer error term }\delta^K\rightarrow
\delta_k^K &= (o_k^K - t_k) \cdot o_k^K(1-o_k^K),\qquad k=1,2,\cdots,10
\\
\text{output-layer bias gradient}\rightarrow
\frac{\partial E}{\partial b_k^K} &= \delta_k^K, \qquad k=1,2,\cdots,10
\\
\text{output-layer weight gradient}\rightarrow
\frac{\partial E}{\partial w_{kj}^K} &= \delta_k^K \cdot o_j^J, \qquad k=1,2,\cdots,10\quad j=1,2,\cdots,30
\\
&= \begin{bmatrix}\delta^K_1\\\delta^K_2\\\vdots\\\delta^K_{10}\end{bmatrix}
\begin{bmatrix} o^J_1 & o^J_2 & \cdots & o^J_{30}\end{bmatrix}
\\\\
\text{hidden-layer error term }\delta^J\rightarrow
\delta_j^J &=
o_j^J(1-o_j^J) \cdot
\Big(\sum_{k=1}^{10}\delta_k^K \cdot w_{kj}^K\Big),\qquad
j=1,2,\cdots,30
\\
&=
\begin{bmatrix}
o_1^J(1-o_1^J)&\cdots&0\\ \vdots&\ddots &\vdots\\ 0 &\cdots&o_{30}^J(1-o_{30}^J)
\end{bmatrix}
\begin{bmatrix}
w_{1,1}^K&\cdots&w_{1,30}^K
\\ \vdots&\ddots&\vdots \\
w_{10,1}^K&\cdots&w_{10,30}^K
\end{bmatrix}^T
\begin{bmatrix}
\delta^K_1\\ \vdots\\ \delta^K_{10}
\end{bmatrix}
\\
\text{hidden-layer bias gradient}\rightarrow
\frac{\partial E}{\partial b_j^J} &=
\delta_j^J,\qquad
j=1,2,\cdots,30
\\
\text{hidden-layer weight gradient}\rightarrow
\frac{\partial E}{\partial w_{ji}^J} &=
\delta_j^J\cdot o_i^I,\qquad
j=1,2,\cdots,30\quad i=1,2,\cdots,784
\\
&= \begin{bmatrix}\delta^J_1\\\delta^J_2\\\vdots\\\delta^J_{30}\end{bmatrix}
\begin{bmatrix} o^I_1 & o^I_2 & \cdots & o^I_{784}\end{bmatrix}
\end{split}
$$
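As a sanity check on the vectorized formulas, the analytic gradient can be compared against a finite-difference estimate. This is a sketch on a hypothetical tiny 4-3-2 network rather than the full 784-30-10 one:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Tiny 4-3-2 network with random parameters.
rng = np.random.default_rng(1)
W_J, b_J = rng.standard_normal((3, 4)), rng.standard_normal((3, 1))
W_K, b_K = rng.standard_normal((2, 3)), rng.standard_normal((2, 1))
x = rng.standard_normal((4, 1))
t = np.array([[1.0], [0.0]])

def loss(W):
    """E = (1/2) * sum_k (o_k - t_k)^2 as a function of the hidden weights."""
    o_J = sigmoid(W @ x + b_J)
    o_K = sigmoid(W_K @ o_J + b_K)
    return 0.5 * np.sum((o_K - t) ** 2)

# Analytic gradient from the derivation above.
o_J = sigmoid(W_J @ x + b_J)
o_K = sigmoid(W_K @ o_J + b_K)
delta_K = (o_K - t) * o_K * (1 - o_K)
delta_J = o_J * (1 - o_J) * (W_K.T @ delta_K)
grad_W_J = delta_J @ x.T

# Central finite difference on a single weight entry.
eps = 1e-6
W_plus, W_minus = W_J.copy(), W_J.copy()
W_plus[0, 0] += eps
W_minus[0, 0] -= eps
numeric = (loss(W_plus) - loss(W_minus)) / (2 * eps)
assert abs(numeric - grad_W_J[0, 0]) < 1e-5
```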
Python code
```python
import random

import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class MLP_np:
    def __init__(self, sizes):
        """
        :param sizes: [784, 30, 10]
        """
        self.sizes = sizes
        '''
        weights    : (2,)       <class 'list'>
        weights[0] : (30, 784)  <class 'numpy.ndarray'>
        weights[1] : (10, 30)   <class 'numpy.ndarray'>
        biases     : (2,)       <class 'list'>
        biases[0]  : (30, 1)    <class 'numpy.ndarray'>
        biases[1]  : (10, 1)    <class 'numpy.ndarray'>
        '''
        self.weights = [np.random.randn(size_out, size_in)
                        for size_in, size_out in zip(sizes[:-1], sizes[1:])]
        self.biases = [np.random.randn(size_out, 1) for size_out in sizes[1:]]

    def forward(self, x):
        '''
        Processes one sample at a time.
        :param x: [784, 1]
        :return: [10, 1]
        '''
        o = x
        for w, b in zip(self.weights, self.biases):
            x = np.dot(w, o) + b
            o = sigmoid(x)
        return o

    def backprop(self, x, t):
        '''
        :param x: [784, 1]
        :param t: [10, 1], one-hot encoding
        '''
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        nabla_b = [np.zeros(b.shape) for b in self.biases]

        # Forward pass, caching pre-activations and activations per layer.
        os = [x]
        xs = []
        o = x
        for w, b in zip(self.weights, self.biases):
            x = np.dot(w, o) + b
            o = sigmoid(x)
            xs.append(x)
            os.append(o)

        loss = np.sum(np.power(os[-1] - t, 2)) / 2

        '''
        (30,784)@(784,1) => (10,30)@(30,1) => (10,1)
        os[-1]: (10,1)   os[-2]: (30,1)   os[-3]: (784,1)
        nabla_b[-1]: (10,1)   nabla_b[-2]: (30,1)
        nabla_w[-1]: (10,30)  nabla_w[-2]: (30,784)
        delta_K: (10,1)  delta_J: (30,1)
        '''
        delta_K = os[-1] * (1 - os[-1]) * (os[-1] - t)
        nabla_b[-1] = delta_K
        nabla_w[-1] = np.dot(delta_K, os[-2].T)

        delta_J = os[-2] * (1 - os[-2]) * np.dot(self.weights[-1].T, delta_K)
        nabla_b[-2] = delta_J
        nabla_w[-2] = np.dot(delta_J, os[-3].T)

        return nabla_w, nabla_b, loss

    def train(self, training_data, epochs, batch_size, lr, test_data):
        '''
        :param training_data: list of (x, t)
        :param epochs: 1000
        :param batch_size: 10
        :param lr: 0.01, learning rate
        :param test_data: list of (x, t)
        '''
        n_test = len(test_data)
        n_train = len(training_data)
        for j in range(epochs):
            random.shuffle(training_data)
            mini_batches = [
                training_data[k:k + batch_size]
                for k in range(0, n_train, batch_size)
            ]
            for batch in mini_batches:
                loss = self.update_mini_batch(batch, lr)
            if test_data:
                print("Epoch {0}: {1} / {2}".format(j, self.evaluate(test_data), n_test))
                print("Loss: {}".format(loss))
            else:
                print("Epoch {0} complete".format(j))

    def update_mini_batch(self, batch, lr):
        """
        batch: list of (x, t)
        lr: 0.01
        """
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        loss = 0
        for x, t in batch:
            nabla_w_, nabla_b_, loss_ = self.backprop(x, t)
            nabla_w[0] += nabla_w_[0]
            nabla_w[1] += nabla_w_[1]
            nabla_b[0] += nabla_b_[0]
            nabla_b[1] += nabla_b_[1]
            loss += loss_
        # Average the gradients over the batch, then take one SGD step.
        nabla_w = [w / len(batch) for w in nabla_w]
        nabla_b = [b / len(batch) for b in nabla_b]
        loss = loss / len(batch)
        self.weights = [w - lr * nabla for w, nabla in zip(self.weights, nabla_w)]
        self.biases = [b - lr * nabla for b, nabla in zip(self.biases, nabla_b)]
        return loss

    def evaluate(self, test_data):
        """
        test_data: list of (x, t)
        """
        result = [(np.argmax(self.forward(x)), np.argmax(t)) for x, t in test_data]
        correct = sum(int(pred == t) for pred, t in result)
        return correct


def main():
    from mnist_np import get_dataset
    training_data, test_data = get_dataset()
    print(len(training_data), training_data[0][0].shape, training_data[0][1].shape)
    net = MLP_np([784, 30, 10])
    net.train(training_data, 1000, 10, 0.01, test_data=test_data)


if __name__ == '__main__':
    main()
```
The dataset loader imported above as `mnist_np`:

```python
import struct

import matplotlib.pyplot as plt
import numpy as np


def load_images(file_name):
    binfile = open(file_name, 'rb')
    buffers = binfile.read()
    magic, num, rows, cols = struct.unpack_from('>IIII', buffers, 0)
    Bytes = num * rows * cols
    images = struct.unpack_from('>' + str(Bytes) + 'B', buffers, struct.calcsize('>IIII'))
    binfile.close()
    images = np.reshape(images, [num, rows * cols])
    return images


def load_labels(file_name):
    binfile = open(file_name, 'rb')
    buffers = binfile.read()
    magic, num = struct.unpack_from('>II', buffers, 0)
    labels = struct.unpack_from('>' + str(num) + 'B', buffers, struct.calcsize('>II'))
    binfile.close()
    labels = np.reshape(labels, [num])
    return labels


def onehot(label):
    '''
    label: scalar <np.int32>, one of 0,1,...,9
    returns label_onehot: (10, 1) <np.ndarray>
    '''
    label_onehot = np.zeros([10, 1])
    label_onehot[label][0] = 1
    return label_onehot


def get_dataset():
    """
    train_images: (60000, 784) <class 'numpy.ndarray'>
    train_labels: (60000,)     <class 'numpy.ndarray'>
    test_images:  (10000, 784) <class 'numpy.ndarray'>
    test_labels:  (10000,)     <class 'numpy.ndarray'>
    """
    train_images = load_images('train-images.idx3-ubyte')
    train_labels = load_labels('train-labels.idx1-ubyte')
    test_images = load_images('t10k-images.idx3-ubyte')
    test_labels = load_labels('t10k-labels.idx1-ubyte')

    train_data = []
    for image, label in zip(train_images, train_labels):
        # image: (784,) -> (784, 1); label: scalar -> one-hot (10, 1)
        image = image[:, np.newaxis]
        label = onehot(label)
        train_data.append([image, label])

    test_data = []
    for image, label in zip(test_images, test_labels):
        image = image[:, np.newaxis]
        label = onehot(label)
        test_data.append([image, label])

    # train_data: 60000 [image, label] pairs; test_data: 10000 pairs
    return train_data, test_data


if __name__ == '__main__':
    train_images = load_images('train-images.idx3-ubyte')
    train_labels = load_labels('train-labels.idx1-ubyte')
    test_images = load_images('t10k-images.idx3-ubyte')
    test_labels = load_labels('t10k-labels.idx1-ubyte')
    print(train_images.shape, type(train_images))
    print(train_labels.shape, type(train_labels))
    print(test_images.shape, type(test_images))
    print(test_labels.shape, type(test_labels))

    # Show the first 30 training digits with their labels.
    fig = plt.figure(figsize=(8, 8))
    fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
    for i in range(30):
        images = np.reshape(train_images[i], [28, 28])
        ax = fig.add_subplot(6, 5, i + 1, xticks=[], yticks=[])
        ax.imshow(images, cmap=plt.cm.binary, interpolation='nearest')
        ax.text(0, 7, str(train_labels[i]))
    plt.show()
```