All right, people; now that I’m done with the projects & finals for…a couple of days…I thought it would be a good idea to keep an old promise and finally publish this article too. It’s about how to make a neural network that learns to play a simple game (Flappy Bird in this case) - and training it with a genetic algorithm. This way, maybe, I won’t have to answer any more comments on YouTube…

Btw, here’s a short demo:

Hopefully I can still understand the code that I wrote ~7 months ago. Yep.

Basics

Well, to better understand this concept, a good starting point is to think of how you play this game: what are the factors that you take into account? How do you know when the bird should jump?

Pretty sure you are probably using a set of metrics like the following ones:

• horizontal distance between the bird and the closest set of pipes (dist1)
• vertical distance between the bird and the lower pipe (dist2)
• verctical distance between the bird and the upper pipe (dist3)

What we actually want now is a function that takes these 3 parameters and has 1 output (because the whole game can be resumed to a single command).

Considering we don’t know the relationship between the 3 distances, we can try to approximate the behaviour of this function using a neural network.

In my project I used a network with:

• 3 input neurons
• 2 hidden ones
• 1 output neuron.

I’m going to presume you know the basics of multilayer neural networks - especially the forward propagation (y’know, that part where you multiply each input with the corresponding weight, then sum all of this and pass it to an activation function…and the result gets fed as an input to a neuron from the next layer). I’m using the matrix version in this code because it looks a little bit more “elegant”, but it’s the same formula, ok?

The training part

Ehm…this is tricky. We don’t have a training set - so…we can’t use the traditional gradient descent (we don’t know the derivatives of the error function) for our neural network.

The alternative is to empirically find a set of weights using a genetic algorithm. Basically you start with a generation of birds which are rather…dumb (they will either spam the jump command or not jump at all) - because the weights used by their neural networks are initially some random values. The goal is to simulate the evolutionary process (survival of the fittest) so you’ll obtain a generation of birds that will jump perfectly each time.

Here are the steps:

1. Start with N sets of random weights (that means N different networks) - this would be the first generation.
2. Allow each network to play until the bird hits a pipe and save the scores
3. Keep only the sets of weights that performed better than the others (first M from a list ordered by score) - these will make it to the next generation. Also keep the solutions with the highest score (this is called elitism, btw. - it’s needed because you may not know if your new solutions are better than the old ones)
4. Pick 2 sets from the new population of weights, encode them, apply crossover (also mutation) and then decode. Save the results and add them to the new generation too.
5. Go back to step 2 and repeat until you get decent results.

Fitness function

Now let’s talk a bit about the function that assigns the score; it’s quite an important part too, because it makes the difference between an algorithm that actually converges and one that just runs mindlessly and picks random / wrong solutions.

Usually it’s not a good idea to say that the in-game score is the actual score of a solution; the approach doesn’t offer a score with a good “granularity” so you can end up with the algorithm not making a difference between a bird that actually learned to go between pipes but hits one by mistake and another which just hit the ground (they’ll both have a 0 score).

It’s not wrong, indeed, but it takes waaay too much time for the algorithm to discover a better solution - we need to offer a few more hints.

So…we can also take into account:

• how long the bird stayed alive (provides more accurate scores)
• the distance between the bird’s last position and the center of the space between the pipes (because we prefer the birds that were actually close and didn’t try to go straight through the pipe).

The fitness function that I used looks like this (if the in-game score is greater than 0)

(game.time + game.score) / (1 + Math.Abs(game.floppy_bird.birdPosition.Y - centerPos) / 100) * 0.01f

and this one if the bird didn’t pass any group of pipes:

weightsList[crtIndex].fitness = (game.time + game.score);

Mutation & Crossover Rates / Population Size

Another important part too; crossover tries to “center” the population around the best solution so far (in an attempt to rise the average score of the generation) while mutation brings diversity by altering various genes (in our case weights), enabling the algorithm to discover better (or worse) solutions.

Anyway, the whole point is to keep these in a balance; values that are too high or too low will not lead to convergence. Also a population size that is too large will just slow down the whole process but going with a small number of candidates could limit your search domain - so there’s a chance of getting stuck in an unsatisfying local optimum.

In my code I used a CROSSOVER_RATE of 0.8 and a MUTATION_RATE of 0.05, with POPULATION_SIZE = 25.

I know this might look boring (it’s already written in the code snippet, right?) but I felt that it should be mentioned here just in case someone enounters this exact problem. This and a wrong fitness function are usually the main reasons the algorithm might not work as expected.

It is fair to also mention this; there might be cases when a good solution fails (the bird hits one of the first pipes) and is thrown away (lost) - and the algorithm kind of falls back. I guess it’s probably caused by a particular set of distances that won’t make the neural network trigger the jump function (considering that the pipes are random, there might be a chance).

Might be a good idea to make some kind of average between the previous scores and the current score. Never tried it though…

Aand…this project also uses some fancy mutex synchronization between processes - so multiple instances of the same game can share the best solution between them using a memory mapped file (lazy parallelization as I like to call it). Wrote an article about using memory mapped files, you can find it here: /tips-and-tricks/c-send-data-between-processes-w-memory-mapped-file.

TODOs

Finally some small improvements that could prove useful; they’re not included in the code but implementing them might boost the performance:

• picking weights for crossover using a weighted random function (roulette wheel selection); this way better solutions have a higher chance to create new candidates for the next generation
• weighted crossovers - this implies that instead of a simple arithmetic mean between the weights you should assign more “importance” to the solution which has a higher score (weighted arithmetic mean).
• adding a few random weights in each new generation (boosting diversity).

Sourcecode

Ok, enough of me talking; I know this is the part that gets the most attention - that’s why I’m always writing it at the end :P

// I’m publishing only the part that is relevant - because I consider that having a wall of code attached in an article looks rather bad from the reader’s point of view.

Auxiliary files

The DataSharer class: