Problem Description

Github repository

0.8 0.8 0
0.6 0.6 0
0.4 0.4 0
.........
0.2 0.2 0
.........
1.0 0.8 1
1.0 0.6 1
0.12 0.23 1
.........
0.89 0.99 1

This is a dataset: the first and second columns are the inputs $x_1, x_2$, and the third column is the output $y$. The task is to build a feedforward neural network model and train its parameters with the backpropagation algorithm.
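The full implementation is in the Github repository linked above. As a minimal loading sketch (assuming the samples sit in a whitespace-separated text file; the file name `data.txt` and the function name `load_dataset` are hypothetical, not taken from the repository):

```python
# Minimal data-loading sketch; "data.txt" is a hypothetical file name,
# each row holding "x1 x2 y" separated by whitespace.
def load_dataset(path="data.txt"):
    samples = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 3:
                continue  # skip blank or malformed rows
            x1, x2, y = (float(p) for p in parts)
            samples.append(((x1, x2), y))
    return samples
```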

Mathematical Description of the Feedforward Neural Network

(Figure: bpnn — network architecture)

Activation Function

$$f(x) = \frac{1}{1+e^{-x}}$$

$$\begin{split} f'(x) &= \frac{e^{-x}}{(1+e^{-x})^2} \\ &= \left(\frac{1}{1+e^{-x}}\right)\left(1-\frac{1}{1+e^{-x}}\right) \\ &= f(x)\bigl(1-f(x)\bigr) \end{split}$$
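A direct Python transcription of the activation function and its derivative (the names `sigmoid` and `sigmoid_prime` are my own, not necessarily those used in the repository):

```python
import math

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    # f'(x) = f(x) * (1 - f(x))
    s = sigmoid(x)
    return s * (1.0 - s)
```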

Input Layer

The input layer has two inputs, $x_1$ and $x_2$.

Hidden Layer

  • Hidden layer inputs $q_1, q_2, q_3$:

$$\begin{split} q_1 &= x_1\omega_{11}+x_2\omega_{12}-\beta_1 \\ q_2 &= x_1\omega_{21}+x_2\omega_{22}-\beta_2 \\ q_3 &= x_1\omega_{31}+x_2\omega_{32}-\beta_3 \end{split}$$

  • Hidden layer outputs $h_1, h_2, h_3$ (a code sketch follows this list):

$$\begin{split} h_1 &= f(q_1) = f(x_1\omega_{11}+x_2\omega_{12}-\beta_1) \\ h_2 &= f(q_2) = f(x_1\omega_{21}+x_2\omega_{22}-\beta_2) \\ h_3 &= f(q_3) = f(x_1\omega_{31}+x_2\omega_{32}-\beta_3) \end{split}$$
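As a sketch, assuming the weights $\omega_{ij}$ are stored in a 3×2 nested list `w` and the thresholds $\beta_i$ in a list `beta` (hypothetical names), the hidden-layer forward pass is:

```python
def hidden_forward(x, w, beta):
    """x = (x1, x2); w[i][j] corresponds to omega_{i+1,j+1}; beta[i] to beta_{i+1}.
    Returns the pre-activations q and activations h of the three hidden units."""
    q = [w[i][0] * x[0] + w[i][1] * x[1] - beta[i] for i in range(3)]
    h = [sigmoid(qi) for qi in q]
    return q, h
```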

Output Layer

  • Output layer input $u$:

$$u = h_1v_1+h_2v_2+h_3v_3 - \lambda$$

  • Output layer output $\hat{y}$ (a code sketch follows this list):

$$\hat{y} = f(u) = f(h_1v_1+h_2v_2+h_3v_3 - \lambda)$$
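The output layer then reduces the three hidden activations to a single prediction; `v` and `lam` (for $\lambda$) are again hypothetical names:

```python
def output_forward(h, v, lam):
    # u = h1*v1 + h2*v2 + h3*v3 - lambda, y_hat = f(u)
    u = sum(hi * vi for hi, vi in zip(h, v)) - lam
    return u, sigmoid(u)
```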

Loss Function

$y$ ---- true value

$\hat{y}$ ---- predicted value

$$Loss = \frac{1}{2}(y-\hat{y})^2$$
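In code the loss is a one-liner:

```python
def loss(y, y_hat):
    # Loss = 1/2 * (y - y_hat)^2
    return 0.5 * (y - y_hat) ** 2
```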

Backpropagation with Gradient Descent

Parameter Updates from the Hidden Layer to the Output Layer

$\eta$ ---- learning rate (step size)

$$\begin{split} \Delta\lambda &= -\eta\frac{\partial Loss}{\partial \lambda} \\ &= -\eta \frac{\partial Loss}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial u} \frac{\partial u}{\partial \lambda} \\ &= \eta(y-\hat{y})\hat{y}(1-\hat{y})(-1) \end{split}$$

$$\begin{split} \Delta v_i &= -\eta\frac{\partial Loss}{\partial v_i} \\ &= -\eta \frac{\partial Loss}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial u} \frac{\partial u}{\partial v_i} \\ &= \eta(y-\hat{y})\hat{y}(1-\hat{y})h_i \\ & \qquad i = 1,2,3 \end{split}$$
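These two rules translate directly into code. The factor $(y-\hat{y})\,\hat{y}(1-\hat{y})$ is shared, so it is computed once (a sketch using the hypothetical names introduced above):

```python
def update_output_layer(y, y_hat, h, v, lam, eta):
    g = (y - y_hat) * y_hat * (1.0 - y_hat)              # shared factor
    new_lam = lam + eta * g * (-1.0)                      # delta_lambda
    new_v = [vi + eta * g * hi for vi, hi in zip(v, h)]   # delta_v_i
    return new_v, new_lam
```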

Parameter Updates from the Input Layer to the Hidden Layer

$$\begin{split} \Delta\beta_i &= -\eta\frac{\partial Loss}{\partial \beta_i} \\ &= -\eta \frac{\partial Loss}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial u} \frac{\partial u}{\partial h_i} \frac{\partial h_i}{\partial q_i} \frac{\partial q_i}{\partial \beta_i} \\ &= \eta(y-\hat{y})\hat{y}(1-\hat{y})v_ih_i(1-h_i)(-1) \\ & \qquad i = 1,2,3 \end{split}$$

$$\begin{split} \Delta \omega_{ij} &= -\eta\frac{\partial Loss}{\partial \omega_{ij}} \\ &= -\eta \frac{\partial Loss}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial u} \frac{\partial u}{\partial h_i} \frac{\partial h_i}{\partial q_i} \frac{\partial q_i}{\partial \omega_{ij}} \\ &= \eta(y-\hat{y})\hat{y}(1-\hat{y})v_ih_i(1-h_i)x_j \\ & \qquad i = 1,2,3 \qquad j=1,2 \end{split}$$
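Finally, the last two rules plus one full stochastic-gradient-descent step, chaining the sketches above (a sketch only; the repository's actual code layout may differ). The hidden-layer update runs first so that it still reads the pre-update output weights $v_i$, as the derivation assumes:

```python
def update_hidden_layer(y, y_hat, x, h, v, w, beta, eta):
    g = (y - y_hat) * y_hat * (1.0 - y_hat)
    for i in range(3):
        e = g * v[i] * h[i] * (1.0 - h[i])   # shared factor for hidden unit i
        beta[i] += eta * e * (-1.0)          # delta_beta_i
        for j in range(2):
            w[i][j] += eta * e * x[j]        # delta_omega_ij
    return w, beta

def train_step(sample, w, beta, v, lam, eta=0.5):
    x, y = sample                            # x = (x1, x2), y = label
    _, h = hidden_forward(x, w, beta)        # forward pass, hidden layer
    _, y_hat = output_forward(h, v, lam)     # forward pass, output layer
    w, beta = update_hidden_layer(y, y_hat, x, h, v, w, beta, eta)
    v, lam = update_output_layer(y, y_hat, h, v, lam, eta)
    return w, beta, v, lam
```

A full training run would simply loop `train_step` over the samples returned by `load_dataset` for a number of epochs, starting from small random initial weights.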