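Before We Start

These are my notes from learning the basics of AI. The textbook is *Neural Networks and Deep Learning* by Michael Nielsen.
The book is free to read online; the website is here.
These are my summary notes on the backpropagation algorithm from Chapter 2. Please read Chapter 2 before reading this post.
My notes on Chapter 1 are here and will help with understanding this one.
As a beginner I may have made mistakes; corrections are welcome.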

Preliminary concepts

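Weighted input z

$z^l = w^l a^{l-1} + b^l$, where $l$ is the layer index

Activation a

$a^l = \sigma(z^l)$, where $\sigma$ is the output (activation) function of the sigmoid neuron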
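A minimal numpy sketch of one layer's forward step, computing $z^l$ and then $a^l$. The 2×2 weights, biases and inputs are made up for illustration; as in the book's code, vectors are stored as column arrays of shape (n, 1).

```python
import numpy as np

def sigmoid(z):
    """Sigmoid output function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical layer: 2 neurons in layer l-1, 2 neurons in layer l.
w = np.array([[1.0, 2.0],
              [3.0, 4.0]])         # w^l, shape (neurons in l, neurons in l-1)
b = np.array([[0.5], [-1.0]])      # b^l, column vector
a_prev = np.array([[1.0], [1.0]])  # a^{l-1}, activations of the previous layer

z = np.dot(w, a_prev) + b  # weighted input z^l = w^l a^{l-1} + b^l
a = sigmoid(z)             # activation a^l = sigma(z^l), each entry in (0, 1)

print(z.ravel())
print(a.ravel())
```

Here $z^l = (1+2+0.5,\ 3+4-1)^T = (3.5,\ 6)^T$, and the sigmoid squashes both values into the interval (0, 1).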

Error

$\delta^l_j = \frac{\partial C}{\partial z^l_j}$, the error of the $j$-th neuron in layer $l$
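To make the definition concrete, a small sketch that estimates $\delta^L_j = \partial C/\partial z^L_j$ numerically: nudge one weighted input $z^L_j$ up and down by a tiny amount and see how much the cost changes. It assumes the quadratic cost $C = \frac{1}{2}\|y - a^L\|^2$ for a single example; the values of `z_out` and `y` are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def quadratic_cost(z_out, y):
    """C = 1/2 * ||y - sigma(z_out)||^2 for one training example."""
    a = sigmoid(z_out)
    return 0.5 * np.sum((y - a) ** 2)

def numerical_delta(z_out, y, eps=1e-6):
    """Estimate delta_j = dC/dz_j with central differences, one j at a time."""
    delta = np.zeros_like(z_out)
    for j in range(z_out.shape[0]):
        z_plus = z_out.copy()
        z_minus = z_out.copy()
        z_plus[j, 0] += eps
        z_minus[j, 0] -= eps
        delta[j, 0] = (quadratic_cost(z_plus, y) - quadratic_cost(z_minus, y)) / (2 * eps)
    return delta

z_out = np.array([[0.0], [2.0]])  # hypothetical output-layer weighted inputs z^L
y = np.array([[1.0], [0.0]])      # hypothetical target output
print(numerical_delta(z_out, y).ravel())
```

A large $|\delta^L_j|$ means the cost is sensitive to that neuron's weighted input; a value near zero means changing $z^L_j$ barely affects the cost.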

The four fundamental equations

[Figure: the four fundamental equations of backpropagation, BP1 to BP4]

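Output-layer error equation (BP1): $\delta^L_j = \frac{\partial C}{\partial a^L_j}\,\sigma'(z^L_j)$, or in matrix form $\delta^L = \nabla_a C \odot \sigma'(z^L)$.
The first term on the right is the partial derivative of the cost with respect to the output activation. For the quadratic cost $C = \frac{1}{2}\sum_j (y_j - a^L_j)^2$ it equals $a^L_j - y_j$, so BP1 can also be written as $\delta^L = (a^L - y) \odot \sigma'(z^L)$.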
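A sketch of BP1 for the quadratic cost in the vectorized form $\delta^L = (a^L - y) \odot \sigma'(z^L)$; in numpy the Hadamard product $\odot$ is simply element-wise `*`. The inputs are hypothetical. With $z^L_1 = 0$ and $y_1 = 1$ we get $\sigma(0) = 0.5$ and $\sigma'(0) = 0.25$, so the first component is $(0.5 - 1)\cdot 0.25 = -0.125$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1 - s)

def output_error(z_L, y):
    """BP1 with the quadratic cost: delta^L = (a^L - y) * sigma'(z^L), element-wise."""
    a_L = sigmoid(z_L)
    return (a_L - y) * sigmoid_prime(z_L)

z_L = np.array([[0.0], [2.0]])  # hypothetical output-layer weighted inputs
y = np.array([[1.0], [0.0]])    # hypothetical target output
delta_L = output_error(z_L, y)
print(delta_L[0, 0])  # -0.125
```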
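Equation for the error $\delta^l$ in terms of the error in the next layer, $\delta^{l+1}$ (BP2): $\delta^l = ((w^{l+1})^T \delta^{l+1}) \odot \sigma'(z^l)$.
Multiplying by the transposed weight matrix can be thought of, roughly, as moving the error backward through the network: BP2 gives us a way to compute the error at layer $l$ from the error at layer $l+1$.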
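A sketch of one BP2 step: the transpose $(w^{l+1})^T$ carries $\delta^{l+1}$ back one layer, and the element-wise product with $\sigma'(z^l)$ scales it. This mirrors the line `delta = np.dot(self.weights[-l+1].transpose(), delta) * sp` in the book's code; the shapes and numbers here are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1 - s)

def backpropagate_error(w_next, delta_next, z):
    """BP2: delta^l = ((w^{l+1})^T delta^{l+1}) * sigma'(z^l), element-wise product."""
    return np.dot(w_next.transpose(), delta_next) * sigmoid_prime(z)

# Hypothetical shapes: layer l has 3 neurons, layer l+1 has 2 neurons.
w_next = np.array([[1.0, 0.0, -1.0],
                   [2.0, 1.0, 0.0]])    # w^{l+1}, shape (2, 3)
delta_next = np.array([[0.1], [-0.2]])  # delta^{l+1}, shape (2, 1)
z = np.zeros((3, 1))                    # z^l = 0, so sigma'(z^l) = 0.25 everywhere

delta = backpropagate_error(w_next, delta_next, z)
print(delta.shape)  # (3, 1)
```

Note that the transpose turns a (2, 3) weight matrix into a (3, 2) one, which is exactly what maps a 2-entry error in layer $l+1$ to a 3-entry error in layer $l$.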
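Equation for the rate of change of the cost with respect to any bias in the network (BP3): $\frac{\partial C}{\partial b^l_j} = \delta^l_j$.
It gives us the bias gradients directly from the errors. This shows an advantage of backpropagation: one forward pass and one backward pass yield the rate of change with respect to every bias at once, instead of computing each one separately.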
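To illustrate the efficiency point, a sketch comparing two routes for a single output layer: a finite-difference estimate needs two extra forward passes per bias, while BP3 reads every $\partial C/\partial b^L_j$ directly off $\delta^L$ after one pass. The layer sizes, random seed and quadratic cost are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, x, y):
    """Quadratic cost of a single sigmoid layer for one example."""
    a = sigmoid(np.dot(w, x) + b)
    return 0.5 * np.sum((a - y) ** 2)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 3))  # hypothetical layer: 3 inputs, 4 outputs
b = rng.standard_normal((4, 1))
x = rng.standard_normal((3, 1))
y = rng.random((4, 1))

# Backprop route: one forward pass, then BP1 and BP3 give all bias gradients at once.
z = np.dot(w, x) + b
a = sigmoid(z)
delta = (a - y) * a * (1 - a)  # BP1, using sigma'(z) = a * (1 - a)
grad_b_backprop = delta        # BP3: dC/db_j = delta_j

# Finite-difference route: two cost evaluations (forward passes) per bias.
eps = 1e-6
grad_b_numeric = np.zeros_like(b)
forward_passes = 0
for j in range(b.shape[0]):
    b_plus, b_minus = b.copy(), b.copy()
    b_plus[j, 0] += eps
    b_minus[j, 0] -= eps
    grad_b_numeric[j, 0] = (cost(w, b_plus, x, y) - cost(w, b_minus, x, y)) / (2 * eps)
    forward_passes += 2

print(forward_passes)  # 8 for 4 biases, versus 1 for backprop
print(np.max(np.abs(grad_b_backprop - grad_b_numeric)))
```

The gap grows with the network: the finite-difference cost scales with the number of parameters, while backprop stays at one forward and one backward pass per example.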
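Equation for the rate of change of the cost with respect to any weight in the network (BP4): $\frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k \delta^l_j$.
It gives us the weight gradients from the errors (and the activations of the previous layer), with the same efficiency gain.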
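In matrix form BP4 reads $\partial C/\partial w^l = \delta^l (a^{l-1})^T$, an outer product with the same shape as $w^l$; this is what `np.dot(delta, activations[-2].transpose())` computes in the book's code. A small sketch with made-up column vectors, checked against the entry-by-entry form:

```python
import numpy as np

delta = np.array([[0.5], [-1.0]])         # hypothetical delta^l, 2 neurons in layer l
a_prev = np.array([[1.0], [2.0], [3.0]])  # hypothetical a^{l-1}, 3 neurons in layer l-1

# BP4 in matrix form: (2, 1) times (1, 3) gives a (2, 3) gradient, same shape as w^l.
nabla_w = np.dot(delta, a_prev.transpose())

# Same result written entry by entry, following the index form of BP4.
loop_version = np.zeros((2, 3))
for j in range(2):
    for k in range(3):
        loop_version[j, k] = a_prev[k, 0] * delta[j, 0]

print(nabla_w)
```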

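Code analysis

Note: the docstrings and the English comments `# feedforward`, `# backward pass`, the two inline list comments and the long note before the backward loop come from the book's original code; all other comments are my own annotations.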

```python
import numpy as np


class Network(object):
    ...

    # backprop implements the backpropagation algorithm
    def backprop(self, x, y):
        """Return a tuple "(nabla_b, nabla_w)" representing the
        gradient for the cost function C_x.  "nabla_b" and
        "nabla_w" are layer-by-layer lists of numpy arrays, similar
        to "self.biases" and "self.weights"."""
        # Initialize the gradients of b and w as zero arrays with the same shapes as b and w
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # feedforward
        # The activation of the first layer is the input itself
        activation = x
        activations = [x] # list to store all the activations, layer by layer
        zs = [] # list to store all the z vectors, layer by layer
        # Compute the weighted input and activation of each layer in turn
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation)+b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # backward pass
        # Output-layer error delta (BP1)
        delta = self.cost_derivative(activations[-1], y) * \
            sigmoid_prime(zs[-1])
        # Use BP3 and BP4 to get the gradients of b and w from the error
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        # Note that the variable l in the loop below is used a little
        # differently to the notation in Chapter 2 of the book.  Here,
        # l = 1 means the last layer of neurons, l = 2 is the
        # second-last layer, and so on.  It's a renumbering of the
        # scheme in the book, used here to take advantage of the fact
        # that Python can use negative indices in lists.
        # Backpropagate: compute each layer's error (BP2), then its gradients (BP3, BP4)
        # (range replaces the Python 2 xrange used in the book's code)
        for l in range(2, self.num_layers):
            z = zs[-l]
            sp = sigmoid_prime(z)
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
        # Return the gradients of b and w for this single training example
        return (nabla_b, nabla_w)

    ...

    # Derivative of the quadratic cost with respect to the output activations a
    def cost_derivative(self, output_activations, y):
        r"""Return the vector of partial derivatives \partial C_x /
        \partial a for the output activations."""
        return (output_activations-y)


# Output function of the sigmoid neuron
def sigmoid(z):
    """The sigmoid function."""
    return 1.0/(1.0+np.exp(-z))


# Derivative of the output function: sigma'(z) = sigma(z) * (1 - sigma(z))
def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z)*(1-sigmoid(z))
```

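Summary

1. The backpropagation algorithm

Input x: set the corresponding activation $a^1$ for the input layer.
Feedforward: for each $l = 2, 3, \ldots, L$, compute the corresponding weighted input $z^l = w^l a^{l-1} + b^l$ and activation $a^l = \sigma(z^l)$.
Output error: compute the output-layer error $\delta^L$ using BP1.
Backpropagate the error: for each $l = L-1, L-2, \ldots, 2$, compute the error $\delta^l$ of that layer using BP2.
Output: the gradient of the cost function is given by BP3 and BP4.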
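The five steps above can be put together in a short, self-contained sketch for one training example, followed by a finite-difference check of one weight. It assumes sigmoid neurons, the quadratic cost and a hypothetical [2, 3, 2] network with random parameters; it follows the same structure as the book's `backprop`, but it is a free-standing function rather than the book's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1 - s)

def backprop(weights, biases, x, y):
    """Return (nabla_b, nabla_w) for one example (x, y) under the quadratic cost."""
    # 1. Input: a^1 = x.
    activation = x
    activations = [x]
    zs = []
    # 2. Feedforward: z^l = w^l a^{l-1} + b^l and a^l = sigma(z^l) for l = 2..L.
    for w, b in zip(weights, biases):
        z = np.dot(w, activation) + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    nabla_b = [None] * len(biases)
    nabla_w = [None] * len(weights)
    # 3. Output error (BP1), then the last layer's gradients (BP3, BP4).
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    nabla_b[-1] = delta
    nabla_w[-1] = np.dot(delta, activations[-2].T)
    # 4. Backpropagate the error (BP2) for l = L-1, ..., 2, with BP3 and BP4 per layer.
    for l in range(2, len(weights) + 1):
        delta = np.dot(weights[-l + 1].T, delta) * sigmoid_prime(zs[-l])
        nabla_b[-l] = delta
        nabla_w[-l] = np.dot(delta, activations[-l - 1].T)
    # 5. Output: the gradient of the cost.
    return nabla_b, nabla_w

def cost(weights, biases, x, y):
    """Quadratic cost of the whole network for one example."""
    a = x
    for w, b in zip(weights, biases):
        a = sigmoid(np.dot(w, a) + b)
    return 0.5 * np.sum((a - y) ** 2)

rng = np.random.default_rng(1)
sizes = [2, 3, 2]  # hypothetical layer sizes
weights = [rng.standard_normal((n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal((n, 1)) for n in sizes[1:]]
x = rng.standard_normal((2, 1))
y = np.array([[1.0], [0.0]])

nabla_b, nabla_w = backprop(weights, biases, x, y)

# Finite-difference check on one weight of the first layer.
eps = 1e-6
w_plus = [w.copy() for w in weights]
w_minus = [w.copy() for w in weights]
w_plus[0][1, 0] += eps
w_minus[0][1, 0] -= eps
numeric = (cost(w_plus, biases, x, y) - cost(w_minus, biases, x, y)) / (2 * eps)
print(nabla_w[0][1, 0], numeric)
```

The two printed numbers should agree to many decimal places; a check like this is a cheap way to catch indexing mistakes in a backprop implementation.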

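2. Computing the gradient for a given mini-batch of size m

Input a set of training examples.
For each training example x: set the corresponding input activation $a^{x,1}$ and run the backpropagation algorithm to obtain $\delta^{x,l}$ for every layer.
Gradient descent: for each $l = 2, 3, \ldots, L$, update the layer's weights and biases using the gradient averaged over the m examples: $w^l \rightarrow w^l - \frac{\eta}{m}\sum_x \delta^{x,l}(a^{x,l-1})^T$ and $b^l \rightarrow b^l - \frac{\eta}{m}\sum_x \delta^{x,l}$, where $\eta$ is the learning rate.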
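The mini-batch step can be sketched as a function that sums the per-example gradients and then moves each parameter against $\frac{\eta}{m}$ times that sum. It is loosely modeled on the book's `update_mini_batch`, but here the per-example gradient function `grad_fn` and the learning rate `eta` are passed in as parameters; both names are hypothetical, and the stub gradient in the usage example is made up purely to show the averaging.

```python
import numpy as np

def update_mini_batch(weights, biases, mini_batch, eta, grad_fn):
    """Return updated (weights, biases) after one gradient-descent step on a mini-batch.

    grad_fn(x, y) must return (nabla_b, nabla_w) for a single example,
    for instance a backprop function; eta is the learning rate.
    """
    m = len(mini_batch)
    sum_b = [np.zeros(b.shape) for b in biases]
    sum_w = [np.zeros(w.shape) for w in weights]
    # Accumulate the gradient of every example in the mini-batch.
    for x, y in mini_batch:
        nabla_b, nabla_w = grad_fn(x, y)
        sum_b = [sb + nb for sb, nb in zip(sum_b, nabla_b)]
        sum_w = [sw + nw for sw, nw in zip(sum_w, nabla_w)]
    # Step each layer against the average gradient (eta / m times the sum).
    new_weights = [w - (eta / m) * sw for w, sw in zip(weights, sum_w)]
    new_biases = [b - (eta / m) * sb for b, sb in zip(biases, sum_b)]
    return new_weights, new_biases

def stub_grad(x, y):
    """Hypothetical stand-in for backprop: every gradient entry is 1."""
    return [np.ones((2, 1))], [np.ones((2, 3))]

# Usage: with all-ones gradients the average is also all ones,
# so every parameter drops by exactly eta.
weights = [np.zeros((2, 3))]
biases = [np.zeros((2, 1))]
batch = [(None, None)] * 4  # m = 4 placeholder examples
new_w, new_b = update_mini_batch(weights, biases, batch, eta=0.5, grad_fn=stub_grad)
print(new_w[0][0, 0])  # -0.5
```

Returning new lists instead of updating in place keeps the sketch easy to test; the book's version updates `self.weights` and `self.biases` directly.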