Plain and Simple Neural Networks (1): Backpropagation

Before we get to backpropagation, we should first talk about forward propagation, which can be written as follows.

$y = \sigma(Wx + b)$

Here $W$ is the weight matrix, $x$ is the input vector, $b$ is the bias vector, and $\sigma$ is the activation function.
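As a quick illustration of the formula above, here is a minimal NumPy sketch of one forward pass. The shapes and numbers are made up for the example, and sigmoid is assumed as the activation function $\sigma$.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation, applied element-wise.
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values only: 3 inputs, 2 outputs.
W = np.array([[0.1, -0.2, 0.3],
              [0.0, 0.5, -0.1]])   # weight matrix, shape (2, 3)
x = np.array([1.0, 2.0, 3.0])      # input vector, shape (3,)
b = np.array([0.1, -0.1])          # bias vector, shape (2,)

z = W @ x + b      # pre-activation z = Wx + b
y = sigmoid(z)     # output y = sigma(z), every entry lies in (0, 1)
```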

Next, to check how well the model performs, we need a loss function that measures the error, for example the mean squared error (MSE):

$\text{Loss} = \frac{1}{n} \sum_{i=1}^{n} (y_{true}^{(i)} - y_{pred}^{(i)})^2$
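The MSE formula translates almost directly into NumPy. The helper name `mse` and the sample values below are only for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean of the squared differences between targets and predictions.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Errors are 0.1, -0.2 and 0.4, so the squared errors sum to 0.21.
loss = mse([1.0, 0.0, 1.0], [0.9, 0.2, 0.6])
```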

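With these in place, we can now talk about backpropagation.

First, what is backpropagation?

It computes the gradient of the loss function with respect to the weights and biases, so that gradient descent can use those gradients to update them. If that sounds strange, don't worry, just keep reading the formulas below.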

Gradient of the loss with respect to the output (here $y$ is the prediction $y_{pred}$):

$\frac{\partial \text{Loss}}{\partial y} = -\frac{2}{n}(y_{true} - y_{pred})$

Gradient of the output with respect to the weights:

$\frac{\partial y}{\partial W} = \frac{\partial y}{\partial z} \cdot \frac{\partial z}{\partial W}$

where $z = Wx + b$ and, taking the sigmoid as the activation function,

$\frac{\partial y}{\partial z} = \sigma'(z) = \sigma(z)(1 - \sigma(z))$

and

$\frac{\partial z}{\partial W} = x$

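By the chain rule, these pieces multiply together into $\frac{\partial \text{Loss}}{\partial W} = \frac{\partial \text{Loss}}{\partial y} \cdot \frac{\partial y}{\partial z} \cdot \frac{\partial z}{\partial W}$. Finally, we use this gradient to update the weights ($\eta$ is the learning rate):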

$W_{\text{new}} = W - \eta \frac{\partial \text{Loss}}{\partial W}$
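To make the update rule concrete, here is a minimal sketch for a single sigmoid neuron and a single sample (so $n = 1$). The values of `w`, `x`, `b`, `y_true`, and `eta` are made up for illustration, and the finite-difference comparison is only a sanity check, not part of backpropagation itself.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_fn(w, x, b, y_true):
    # Squared error of one sigmoid neuron on one sample (n = 1).
    return (y_true - sigmoid(np.dot(w, x) + b)) ** 2

# Illustrative values, not taken from the article.
w = np.array([0.5, -0.3])
x = np.array([1.0, 2.0])
b, y_true, eta = 0.1, 1.0, 0.1

# Chain rule: dLoss/dw = dLoss/dy * dy/dz * dz/dw
y = sigmoid(np.dot(w, x) + b)
grad_analytic = -2.0 * (y_true - y) * y * (1.0 - y) * x

# Central finite differences as an independent check of the gradient.
eps = 1e-6
grad_numeric = np.array([
    (loss_fn(w + eps * e, x, b, y_true) - loss_fn(w - eps * e, x, b, y_true)) / (2 * eps)
    for e in np.eye(w.size)
])

# One gradient descent step: w_new = w - eta * dLoss/dw
loss_before = loss_fn(w, x, b, y_true)
w_new = w - eta * grad_analytic
loss_after = loss_fn(w_new, x, b, y_true)
```

If the derivation is right, `grad_analytic` and `grad_numeric` agree closely, and `loss_after` comes out smaller than `loss_before`.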

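And that completes backpropagation. The principle is simple, and one sentence sums it up: "learn from your mistakes and improve". Now that we have the theory, let's put it into practice and write backpropagation by hand.

First, define a Tensor class.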

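import numpy as np

class Tensor:
    def __init__(self, data, requires_grad=False):
        self.data = np.array(data)
        self.requires_grad = requires_grad
        self.grad = None        # gradient, filled in during backpropagation
        self._backward = None   # optional closure that computes gradients

    def backward(self):
        # Run the stored backward function, if one was attached
        if self._backward is not None:
            self._backward()

    def zero_grad(self):
        # Clear the gradient before the next iteration
        self.grad = None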

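Next, define the $\text{Sigmoid}$ function and its derivative. The derivative is written in its simplified form, which takes the output of $\text{Sigmoid}$ as its argument rather than the raw input. ($\text{Sigmoid}$ is the activation function we mentioned above.)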

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # Simplified form: x must already be sigmoid(z), not z itself
    return x * (1 - x)
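Because the simplified derivative expects the sigmoid output, it is easy to call it with the wrong argument. The sketch below compares it against a finite-difference estimate; the parameter is renamed `s` here only to make that expectation visible, and the test points are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(s):
    # s is assumed to already be sigmoid(z).
    return s * (1 - s)

z = np.array([-2.0, 0.0, 2.0])   # arbitrary test points
s = sigmoid(z)

analytic = sigmoid_derivative(s)                         # correct usage
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

wrong = sigmoid_derivative(z)    # raw input instead of sigmoid output
```

Here `analytic` matches `numeric`, while `wrong` does not, since it evaluates `z * (1 - z)` on the raw inputs.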

After that, define a Layer class that encapsulates the functionality of a single layer of the neural network.

class Layer:

    # Randomly initialize the layer's weights and biases
    def __init__(self, input_size, output_size):
        self.weights = np.random.rand(input_size, output_size)
        self.bias = np.random.rand(output_size)

    def forward(self, input_tensor):
        # Input is expected as a 2-D batch of shape (batch_size, input_size)
        self.input = input_tensor.data
        self.output = sigmoid(np.dot(self.input, self.weights) + self.bias)
        out = Tensor(self.output, requires_grad=True)

        # Store the backward function on the output tensor
        def backward():
            if out.grad is None:
                raise ValueError("set out.grad to dLoss/dy before calling backward()")
            # dLoss/dz = upstream gradient * sigmoid'(z)
            delta = out.grad * sigmoid_derivative(self.output)
            self.grad_input = np.dot(delta, self.weights.T)
            self.grad_weights = np.dot(self.input.T, delta)
            self.grad_bias = np.sum(delta, axis=0)

            if input_tensor.requires_grad:
                input_tensor.grad = self.grad_input

        out._backward = backward
        return out

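A note on the three gradients:

grad_input is the gradient with respect to the input; it measures how the input affects the output and is what gets passed back to the previous layer. grad_weights is the gradient with respect to the weights; it tells us how much each weight should change given the error at the output. grad_bias sums the bias gradient over the batch. If input_tensor requires a gradient, its grad is set to grad_input.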
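The shapes of these three gradients are easy to get wrong, so here is a small standalone sketch that computes them for a batch and checks one weight entry against a finite difference. It assumes the row-vector convention `X @ W + b` used by the layer and an MSE averaged over all output elements; the batch size, layer sizes, and random seed are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def layer_loss(X, W, b, Y):
    # MSE of a single sigmoid layer, averaged over all output elements.
    return np.mean((Y - sigmoid(X @ W + b)) ** 2)

rng = np.random.default_rng(0)   # arbitrary seed
X = rng.random((4, 3))           # batch of 4 samples, 3 features
W = rng.random((3, 2))           # (input_size, output_size)
b = rng.random(2)
Y = rng.random((4, 2))           # made-up targets

out = sigmoid(X @ W + b)
upstream = -2.0 * (Y - out) / Y.size   # dLoss/d(out)
delta = upstream * out * (1 - out)     # dLoss/dz

grad_weights = X.T @ delta             # shape (3, 2), same as W
grad_bias = delta.sum(axis=0)          # shape (2,), same as b
grad_input = delta @ W.T               # shape (4, 3), same as X

# Finite-difference check of a single weight entry.
eps = 1e-6
W_plus, W_minus = W.copy(), W.copy()
W_plus[1, 0] += eps
W_minus[1, 0] -= eps
numeric = (layer_loss(X, W_plus, b, Y) - layer_loss(X, W_minus, b, Y)) / (2 * eps)
```

Each gradient has the same shape as the quantity it belongs to, which is a handy first check when debugging.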

Finally, define the NeuralNetwork class.

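class NeuralNetwork:
    def __init__(self, input_size, output_size):
        self.layer = Layer(input_size, output_size)

    def forward(self, x):
        return self.layer.forward(x)

    def backward(self, loss, learning_rate=0.1):
        # `loss` is assumed to be the Tensor that carries the backward
        # function, with its .grad already set to dLoss/dy
        loss.backward()
        # Update the weights and biases with gradient descent
        self.layer.weights -= learning_rate * self.layer.grad_weights
        self.layer.bias -= learning_rate * self.layer.grad_bias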

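And with that, all the work is done. To summarize: the forward pass sends the input data through the network, where the layer and the $\text{Sigmoid}$ activation function produce the output; we then compute the loss, propagate it backward through the network, and update the weights and biases using the computed gradients to reduce the prediction error as much as possible. Repeating this cycle over many iterations is what trains the model.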
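To see the whole cycle run end to end, here is a compact training sketch written with plain NumPy arrays rather than the classes above, so that it stays self-contained. The OR-gate data, the learning rate `eta`, the epoch count, and the random seed are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)   # fixed seed, an arbitrary choice
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [1]], dtype=float)   # OR gate targets

W = rng.random((2, 1))           # (input_size, output_size)
b = rng.random(1)
eta = 0.5                        # hypothetical learning rate

losses = []
for epoch in range(2000):
    out = sigmoid(X @ W + b)                     # forward pass
    losses.append(np.mean((Y - out) ** 2))       # MSE loss
    # Backward pass: dLoss/dz = dLoss/dy * sigmoid'(z)
    delta = -2.0 * (Y - out) / Y.size * out * (1 - out)
    W -= eta * (X.T @ delta)                     # weight update
    b -= eta * delta.sum(axis=0)                 # bias update
```

Plotting or printing `losses` should show the error shrinking over the epochs, which is the "learn from your mistakes" loop in action.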