I’ve been trying for some time to learn and actually understand how backpropagation (aka backward propagation of errors) works and how it trains neural networks. Since I ran into many problems while writing the program, I decided to write this tutorial and include fully functional code that can learn the XOR gate.

Since it’s a lot to explain, I will try to stay on subject and talk only about the backpropagation algorithm.

## 1. What is Backpropagation?

Backpropagation is a supervised-learning method used to train neural networks by adjusting the weights and the biases of each neuron.

Important: do NOT train on a single example until its error is minimal and only then move on to the next one. Instead, present each example once, then start again from the beginning. One such pass over all the examples is called an epoch.
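The following minimal Python sketch makes the order concrete. It yields every example once per epoch and then starts over, rather than repeating one example until its error is minimal. `training_order` is a hypothetical helper name.

```python
def training_order(examples, epochs):
    """Yield (epoch, example) pairs: every example once per epoch, then start over."""
    for epoch in range(epochs):
        for example in examples:
            yield epoch, example


# The four XOR training examples: (inputs, desired output)
xor_examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
for epoch, (inputs, desired) in training_order(xor_examples, 2):
    print(epoch, inputs, desired)
```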

Steps:

1. forward propagation - calculates the output of the neural network
2. back propagation - adjusts the weights and the biases according to the global error

In this tutorial I’ll use a 2-2-1 neural network (2 input neurons, 2 hidden and 1 output). Keep an eye on this picture; it should make things easier to understand.
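The input neurons only pass their values on, so only the hidden and output neurons need weights and a bias. Below is a minimal Python sketch of how a 2-2-1 network could store them and fill them with random values in [0, 1). It assumes a hypothetical `Neuron` record that holds one weight per incoming connection plus a bias.

```python
import random
from dataclasses import dataclass


@dataclass
class Neuron:
    weights: list  # one weight per incoming connection
    bias: float


def random_neuron(n_inputs, rng):
    # Weights and bias drawn uniformly from [0, 1), as in step 1 of the algorithm
    return Neuron([rng.random() for _ in range(n_inputs)], rng.random())


def init_network(seed=None):
    """Build a 2-2-1 network: 2 inputs -> 2 hidden neurons -> 1 output neuron."""
    rng = random.Random(seed)
    hidden = [random_neuron(2, rng) for _ in range(2)]  # each sees both inputs
    output = random_neuron(2, rng)                      # sees both hidden outputs
    return hidden, output


hidden, output = init_network(seed=42)
print(hidden, output)
```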

## 2. How does it work?

1. initialize all weights and biases with random values between 0 and 1
2. calculate the output of the network
3. calculate the global error
4. adjust the weights of the output neuron using the global error
5. calculate the hidden neurons’ errors (distribute the global error back to them through the output neuron’s weights)
6. adjust the hidden neurons’ weights using their errors
7. go back to step 2 and repeat until the error is small enough
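The control flow of steps 2 to 7 can be sketched in Python as shown below. `train_step` is a hypothetical callable that stands in for steps 2 to 6 on one example: it runs the forward pass, updates the weights, and returns that example’s absolute error. The threshold and the epoch cap are assumed values.

```python
def train(train_step, examples, threshold=0.05, max_epochs=100_000):
    """Repeat steps 2-6 over all examples until the error is small enough (step 7).

    train_step(example) is a hypothetical callable: it runs the forward pass,
    adjusts the weights, and returns the absolute error for that example.
    Returns the number of epochs used, or None if max_epochs was reached.
    """
    for epoch in range(1, max_epochs + 1):
        # Every example is trained once per epoch; keep the largest error seen.
        worst = max(train_step(example) for example in examples)
        if worst < threshold:
            return epoch
    return None
```

Each error is measured during the pass, before that example’s own weight update. A stricter variant would run a separate forward-only pass after each epoch.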

## 3. Some math…

Every neuron needs an activation function, and we’ll use the sigmoid. The main idea is to adjust (shift and stretch) that function so it produces the correct output with the minimum error. This is done by modifying the weights and the biases.

Its graph looks like this (note that the output values always lie between 0 and 1):

The sigmoid formulas we’ll use (where f(x) is the sigmoid function):

1) Basic sigmoid function:
$f(x) = \frac{1}{1+e^{-x}}$

2) Sigmoid derivative (its value is used to adjust the weights with gradient descent; note that it only needs f(x), i.e. the neuron’s output):
$f'(x) = f(x)(1-f(x))$
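Both formulas translate directly into Python. This sketch uses a numerically stable form of the sigmoid, so very negative inputs do not overflow `math.exp`. Following the derivative formula, it computes the derivative from the output y = f(x) rather than from x. The function names are hypothetical.

```python
import math


def sigmoid(x):
    """f(x) = 1 / (1 + e^-x), written so that math.exp never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # small for very negative x, so no overflow
    return z / (1.0 + z)


def sigmoid_derivative_from_output(y):
    """f'(x) = f(x) * (1 - f(x)), expressed in terms of y = f(x)."""
    return y * (1.0 - y)


y = sigmoid(0.0)
print(y, sigmoid_derivative_from_output(y))  # 0.5 0.25
```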

Backpropagation always aims to reduce the error of each output. The algorithm treats an output as correct once its error falls below a chosen threshold.
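Here is a small sketch of that rule. The 0.1 threshold and the function names are assumptions, not values taken from this tutorial.

```python
def is_correct(desired, actual, threshold=0.1):
    """An output counts as correct once its error is below the threshold."""
    return abs(desired - actual) < threshold


def all_correct(desired_outputs, actual_outputs, threshold=0.1):
    """True when every output of the training set is within the threshold."""
    return all(is_correct(d, a, threshold)
               for d, a in zip(desired_outputs, actual_outputs))


print(all_correct([0, 1, 1, 0], [0.04, 0.96, 0.95, 0.05]))  # True
```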

To understand this better, take a look at the graph below, which shows the error as a function of the output:

I won’t dive into the gradient descent method, as I wrote a separate article that contains both theory and examples.

## 4. Formulas

Calculate the output of a neuron (f is the sigmoid function and f' is its derivative, df/dx):

$\text{actualOutput} = f(\text{weights}[0] \cdot \text{inputs}[0] + \text{weights}[1] \cdot \text{inputs}[1] + \text{biasWeight})$
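In code, the output of a neuron is the weighted sum passed through the sigmoid. The Python sketch below mirrors the formula. `neuron_output` and `bias_weight` are hypothetical names, and the bias is treated as a weight on a constant input of 1.

```python
import math


def sigmoid(x):
    # Numerically stable f(x) = 1 / (1 + e^-x)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def neuron_output(weights, inputs, bias_weight):
    """actualOutput = f(weights[0] * inputs[0] + weights[1] * inputs[1] + biasWeight)"""
    x = sum(w * i for w, i in zip(weights, inputs)) + bias_weight
    return sigmoid(x)


print(neuron_output([0.5, -0.5], [1.0, 1.0], 0.0))  # 0.5 (the weighted sum is 0)
```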

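Calculate the global error (the error of the output neuron). Here x is the weighted sum passed to f in the output formula, so by the derivative formula $f'(x) = \text{actualOutput}\,(1 - \text{actualOutput})$:

$\text{globalError} = f'(x) \cdot (\text{desiredOutput} - \text{actualOutput})$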
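The sigmoid derivative can be written in terms of the neuron’s output, so the global error needs only the desired and the actual output. Here is a minimal sketch; `global_error` is a hypothetical name.

```python
def global_error(desired_output, actual_output):
    """globalError = f'(x) * (desiredOutput - actualOutput).

    For the sigmoid, f'(x) = actualOutput * (1 - actualOutput), where x is the
    weighted sum that produced actualOutput, so x itself is not needed.
    """
    derivative = actual_output * (1.0 - actual_output)
    return derivative * (desired_output - actual_output)


print(global_error(1.0, 0.75))  # 0.046875 = 0.75 * 0.25 * 0.25
```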

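Adjust the weights and the bias of the output neuron. Here $\text{input}_{13}$ and $\text{input}_{23}$ are the outputs of the first and second hidden neurons, i.e. the values carried by $W_{13}$ and $W_{23}$. There is no explicit learning rate, which is the same as using a learning rate of 1:

$W_{13} \leftarrow W_{13} + \text{globalError} \cdot \text{input}_{13}$

$W_{23} \leftarrow W_{23} + \text{globalError} \cdot \text{input}_{23}$

$\text{bias} \leftarrow \text{bias} + \text{globalError}$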
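The same update can be written as a Python sketch. `update_output_neuron` is a hypothetical helper. It returns new values instead of changing the old ones, so the old $W_{13}$ and $W_{23}$ stay available for the hidden-error step. As in the formulas, the learning rate is implicitly 1.

```python
def update_output_neuron(weights, bias, hidden_outputs, global_error):
    """W13 += globalError * input13, W23 += globalError * input23, bias += globalError.

    hidden_outputs[k] is the value carried by weights[k], i.e. the output of
    hidden neuron k + 1. There is no learning rate (implicitly 1).
    """
    new_weights = [w + global_error * h for w, h in zip(weights, hidden_outputs)]
    new_bias = bias + global_error
    return new_weights, new_bias


w, b = update_output_neuron([0.5, 0.5], 0.25, [1.0, 0.5], 0.5)
print(w, b)  # [1.0, 0.75] 0.75
```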

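Calculate the error of each hidden neuron. Here $x_1$ and $x_2$ are the weighted sums of the first and second hidden neurons, so each $f'(x_k)$ equals that hidden neuron’s output times one minus its output. Strictly speaking, standard backpropagation uses the values $W_{13}$ and $W_{23}$ had before the update above:

$\text{error}_1 = f'(x_1) \cdot \text{globalError} \cdot W_{13}$

$\text{error}_2 = f'(x_2) \cdot \text{globalError} \cdot W_{23}$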
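Here is a Python sketch of the hidden-error formulas. It assumes the hidden outputs were kept from the forward pass; `hidden_errors` is a hypothetical name. Standard backpropagation evaluates this with the output weights from the forward pass. If the output neuron has already been adjusted, pass a copy of its old weights.

```python
def hidden_errors(hidden_outputs, output_weights, global_error):
    """errorK = f'(x_K) * globalError * W_K3 for each hidden neuron K.

    f'(x_K) is computed from the hidden neuron's output h as h * (1 - h).
    output_weights should hold W13 and W23 as they were in the forward pass.
    """
    return [h * (1.0 - h) * global_error * w
            for h, w in zip(hidden_outputs, output_weights)]


print(hidden_errors([0.5, 0.5], [2.0, -1.0], 0.5))  # [0.25, -0.125]
```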

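Adjust the weights and biases of the hidden neurons. Here $\text{input}_{11}$ and $\text{input}_{12}$ are the first network input, and $\text{input}_{21}$ and $\text{input}_{22}$ are the second.

First hidden neuron:

$W_{11} \leftarrow W_{11} + \text{error}_1 \cdot \text{input}_{11}$

$W_{21} \leftarrow W_{21} + \text{error}_1 \cdot \text{input}_{21}$

$\text{bias}_1 \leftarrow \text{bias}_1 + \text{error}_1$

Second hidden neuron:

$W_{12} \leftarrow W_{12} + \text{error}_2 \cdot \text{input}_{12}$

$W_{22} \leftarrow W_{22} + \text{error}_2 \cdot \text{input}_{22}$

$\text{bias}_2 \leftarrow \text{bias}_2 + \text{error}_2$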
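Both hidden neurons follow the same rule, and both take the two network inputs as their inputs. Below is a minimal sketch that updates the whole hidden layer at once. It assumes each neuron is stored as a `(weights, bias)` pair; `update_hidden_layer` is a hypothetical name.

```python
def update_hidden_layer(hidden, inputs, errors):
    """Apply W += error * input and bias += error to every hidden neuron.

    hidden:  list of (weights, bias) pairs, one per hidden neuron
    inputs:  the network inputs (input11 = input12 = first input, and so on)
    errors:  [error1, error2] from the hidden-error step
    """
    return [
        ([w + err * x for w, x in zip(weights, inputs)], bias + err)
        for (weights, bias), err in zip(hidden, errors)
    ]


hidden = [([0.5, 0.5], 0.5), ([0.25, 0.75], 0.0)]
print(update_hidden_layer(hidden, [1.0, 0.0], [0.25, -0.5]))
# [([0.75, 0.5], 0.75), ([-0.25, 0.75], -0.5)]
```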

## 5. The code

This is the best part, and also the easiest. Backpropagation can learn many things, but as an example we’ll make it learn the XOR gate. XOR is special because it is not linearly separable: a single neuron can’t learn it, so the hidden layer is really needed.
I used 2 classes just to make everything more “visible” and OOP-ish.

Note: it needs about 2000 epochs to learn.
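A compact Python sketch of an XOR learner built from the formulas in section 4 could look like the following. It also uses two classes, but `Neuron`, `Network`, `train_xor`, the 0.1 error threshold and the 20000-epoch cap are assumptions of this sketch, not the original program. Unlike the numbered steps, it computes the hidden errors before adjusting the output neuron, so they use the weights from the forward pass. The number of epochs depends on the random seed. Some seeds may end in a local minimum, in which case `epochs` is `None`.

```python
import math
import random

# (inputs, desired output) for the XOR gate
XOR_EXAMPLES = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def sigmoid(x):
    # Numerically stable f(x) = 1 / (1 + e^-x)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class Neuron:
    """One sigmoid neuron: a weight per input plus a bias, random in [0, 1)."""

    def __init__(self, n_inputs, rng):
        self.weights = [rng.random() for _ in range(n_inputs)]
        self.bias = rng.random()

    def output(self, inputs):
        x = sum(w * i for w, i in zip(self.weights, inputs)) + self.bias
        return sigmoid(x)

    def adjust(self, inputs, error):
        # W += error * input for each incoming connection, bias += error
        self.weights = [w + error * i for w, i in zip(self.weights, inputs)]
        self.bias += error


class Network:
    """2-2-1 network: two hidden neurons, one output neuron."""

    def __init__(self, seed=None):
        rng = random.Random(seed)
        self.hidden = [Neuron(2, rng), Neuron(2, rng)]
        self.out = Neuron(2, rng)

    def forward(self, inputs):
        hidden_outputs = [n.output(inputs) for n in self.hidden]
        return hidden_outputs, self.out.output(hidden_outputs)

    def train_step(self, inputs, desired):
        hidden_outputs, actual = self.forward(inputs)
        global_error = actual * (1.0 - actual) * (desired - actual)
        # Hidden errors use the output weights from the forward pass,
        # so they are computed before the output neuron is adjusted.
        hidden_errors = [h * (1.0 - h) * global_error * w
                         for h, w in zip(hidden_outputs, self.out.weights)]
        self.out.adjust(hidden_outputs, global_error)
        for neuron, error in zip(self.hidden, hidden_errors):
            neuron.adjust(inputs, error)

    def predict(self, inputs):
        return self.forward(inputs)[1]


def train_xor(seed=None, threshold=0.1, max_epochs=20000):
    """Return (network, epochs), with epochs=None if it did not converge."""
    net = Network(seed)
    for epoch in range(1, max_epochs + 1):
        for inputs, desired in XOR_EXAMPLES:  # each example once per epoch
            net.train_step(inputs, desired)
        if all(abs(d - net.predict(x)) < threshold for x, d in XOR_EXAMPLES):
            return net, epoch
    return net, None


if __name__ == "__main__":
    net, epochs = train_xor(seed=1)
    print("epochs:", epochs)
    for inputs, desired in XOR_EXAMPLES:
        print(inputs, desired, round(net.predict(inputs), 3))
```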

## 7. Wrong values?

Yep, this happens sometimes, when the algorithm gets stuck in a local minimum: it thinks it has found the minimum error and doesn’t know that the error could be even smaller.

This is usually solved by re-initializing the weights of the neural network with new random values and training again.
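A restart loop is easy to wrap around any training routine. In this sketch, `train_once` is a hypothetical callable. It creates fresh random weights from a seed, trains, and reports `epochs=None` when the run did not reach the error threshold. The limit of 10 attempts is an arbitrary assumption.

```python
def train_with_restarts(train_once, max_attempts=10):
    """Retrain from fresh random weights until a run converges.

    train_once(seed) is a hypothetical callable returning (network, epochs),
    with epochs=None when the run got stuck (e.g. in a local minimum).
    Returns (network, attempts_used), or (None, max_attempts) on failure.
    """
    for attempt in range(max_attempts):
        network, epochs = train_once(seed=attempt)  # new seed -> new random weights
        if epochs is not None:
            return network, attempt + 1
    return None, max_attempts
```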