For a game project I needed to recognize certain drawings, preferably in JavaScript as it’s a web game. There are multiple solutions for tackling this problem, and the use of a neural network is one of them!

Neural networks were for a long time something that I understood conceptually but found very daunting to use. As a result, my experiences with recognizing things were mostly built around support vector machines.

The problem was that at the moment of writing I didn’t find a suitable SVM JavaScript library that worked on the client side. I did however find some ANN libraries, so it was time to try tackling neural networks once and for all.

But what are neural networks?

An ANN (Artificial Neural Network) is, simply put, a computational model inspired by biological neural networks that (among other things) can be used to recognize patterns, or simple drawings in my case.

With that in mind it’s important to note that an ANN is just a model and not a complete simulation of how a brain works. But like a biological neuron the artificial version has some inputs (dendrites), a part (soma) that processes those inputs and certain outputs (axon).

biological neuron

The cool thing about an ANN in general is that you don’t need to explicitly program it to recognize patterns. It’s an adaptable system that can teach itself.

The perceptron

To keep it simple we will start with the perceptron which is a mathematical representation of a single neuron.

It was invented in 1957 by Frank Rosenblatt and was funded by the American navy. At the time it was a machine that filled a whole room (the Mark 1 Perceptron) rather than a pure software solution.

There was a lot of hyperbole around the capabilities of the perceptron, but the ability to learn and classify even simple images gave rise to a huge interest in self-learning machines that could mimic human intelligence.

The embryo of an electronic computer that the American navy expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence

The New York Times (1958)

In a diagram, a single perceptron looks like this

perceptron

In this example we have a neuron with 2 inputs. Each input (x) has a weight (w) associated with it.

Weights are floating-point numbers (positive or negative) that refer to the strength of the individual signals. The higher the weight, the stronger the input.

Our artificial neuron also has a part that processes our inputs and signals an output. The first thing it does is take the sum of all the inputs multiplied by their associated weights.

input multiplied with their weights

Then it uses an activation function to determine what it will output. There are multiple activation functions you can use, but to keep it simple we will focus on the use of a step function.

step function

This activation function is quite simple to explain: from the moment a given threshold is reached (activated), it signals a 1, otherwise a 0. In this case, if the weighted sum of our inputs is bigger than 0, our neuron will signal a 1, otherwise a 0.

It does not always have to be a 1 or a 0 though, as an artificial neuron can output any (positive or negative) number depending on the kind of activation function that is in use.
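
To make this concrete, here is a minimal sketch of such a neuron in JavaScript. The perceptron function, its parameters and the configurable threshold are my own naming, purely for illustration:

JavaScript

// a single artificial neuron with a step activation function
function perceptron(inputs, weights, threshold) {
  // weighted sum: each input multiplied by its associated weight
  var sum = 0;
  for (var i = 0; i < inputs.length; i++) {
    sum += inputs[i] * weights[i];
  }
  // step activation: signal a 1 once the threshold is passed, otherwise a 0
  return sum > threshold ? 1 : 0;
}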

Logic gates

Believe it or not, we now have enough knowledge to build a single-neuron system that can function as a logic gate!

We will start off with the NOT gate

not gate

We set our threshold to -0.5 and our weight to -1. When we feed our neuron a value of 1 (input) and multiply it with -1 (weight), we get a result of -1. As -1 is smaller than -0.5 (threshold), the condition is false, so it will output a 0.

When we feed it a 0 (input) and multiply that with -1 (weight), we get a result of 0. As 0 is bigger than -0.5 (threshold), the condition is true and it will output a 1. Really simple!

The next gate that we will be creating is the OR gate.

or gate

The functional difference with our NOT gate is that here we first calculate the sum of our inputs multiplied by their weights, and only then compare that sum against our threshold.

We set our threshold to 0.5 and our weights to 0.6. For the first input we provide a 1, which multiplied by 0.6 (weight) gives a result of 0.6. For the second input we provide a 0, which multiplied by our weight (0.6) results in 0. When we combine those results we get a sum of 0.6. As 0.6 is bigger than 0.5 (threshold), the condition is true and the neuron outputs a 1.

When we provide both inputs with a 0, multiply them by their weights and sum that up, we get 0. As 0 is smaller than 0.5 (threshold), the condition is false and the neuron outputs a 0.

The last logic gate we are going to build is the AND gate.

and gate

We set our threshold to 1 and our weights to 0.6. When we provide both inputs with a 1, multiply them by their weights and combine them, we get 1.2 as a result. As 1.2 is bigger than our threshold of 1, the condition is true and the neuron outputs a 1.

When we provide one input with a 1 and the other with a 0, multiply them by their weights and combine them, we get 0.6 as a result. As 0.6 is smaller than 1, the condition is false and the neuron outputs a 0.
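
Using the perceptron sketch from earlier we can verify all three gates with the thresholds and weights described above:

JavaScript

// NOT gate: threshold -0.5, weight -1
console.log(perceptron([1], [-1], -0.5)); // 0
console.log(perceptron([0], [-1], -0.5)); // 1

// OR gate: threshold 0.5, weights 0.6
console.log(perceptron([1, 0], [0.6, 0.6], 0.5)); // 1
console.log(perceptron([0, 0], [0.6, 0.6], 0.5)); // 0

// AND gate: threshold 1, weights 0.6
console.log(perceptron([1, 1], [0.6, 0.6], 1)); // 1
console.log(perceptron([1, 0], [0.6, 0.6], 1)); // 0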

Multilayer network

In the previous examples we have only worked with a single neuron and manually adjusted our weights to recognize certain logical patterns.

Unfortunately this simply won’t work once you deal with complex patterns. Recognizing (even simple) drawings is one of those cases where you will need a more complex network, also called a multilayer network.

A multilayer network consists of one or more input nodes, one or more hidden layers and one or more output nodes. In a diagram, a multilayer network looks like this

ml

Sadly there isn’t a magic formula that determines how many hidden layers or how many nodes you need in those hidden layers. There are some guidelines you could follow, but in the end it’s also a bit of trial and error.

Learning by using back-propagation

Dealing with multiple nodes also means that it is extremely difficult to calculate all the weights by hand. We need an algorithm that tries some weights and adjusts them until, at a certain point, our neural network is able to recognize our patterns.

Fortunately there is a learning algorithm called back-propagation that does exactly that. Simply put, it starts with random weights, compares the result with the desired output (which it can take from training data) and feeds the error back to the previous layers, which adjust their weights accordingly.

This is then repeated many times (sometimes a hundred thousand times or more) until we get below the minimum error (for example the mean squared error, MSE) we have defined. This is the part that takes the most time when training a neural network. It’s also important to know that we won’t be able to build a neural network that is always 100% correct.
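
To give a rough idea of what that loop looks like, here is a sketch that keeps training until the mean squared error (MSE) over the training set drops below a target. The adjustWeights function is a placeholder for the actual back-propagation step, and the network and trainingSet variable names are my own:

JavaScript

// mean squared error over a training set, for a network with a single output
function meanSquaredError(network, trainingSet) {
  var totalError = 0;
  trainingSet.forEach(function (sample) {
    var output = network.activate(sample.input)[0];
    var error = sample.output[0] - output;
    totalError += error * error;
  });
  return totalError / trainingSet.length;
}

// keep adjusting the weights until the error is small enough (or we give up)
var iteration = 0;
while (meanSquaredError(network, trainingSet) > 0.005 && iteration < 100000) {
  adjustWeights(network, trainingSet); // placeholder for the back-propagation step
  iteration++;
}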

I’m not going into all the mathematical details because there are resources out there that explain back-propagation better than I personally could. Fortunately you don’t need to know all the lower-level mathematical formulas to be able to use an ANN!

Synaptic.js

I already mentioned there are multiple JavaScript ANN solutions available. In the end I settled on synaptic.js as the documentation was good and the code looked clean.

To get a feel for this library we will keep following the path of logic gates and build an XOR gate

xor gate

The first step is creating a perceptron network with 2 input nodes, 2 hidden nodes and 1 output node

JavaScript

// assuming synaptic.js is loaded (in Node.js: var { Architect, Trainer } = require('synaptic');)
var xorNetwork = new Architect.Perceptron(2, 2, 1);

We also need to have some training data which our network can learn from

JavaScript

var xorTrainingSet = [
  {
    input: [0,0],
    output: [0]
  },
  {
    input: [0,1],
    output: [1]
  },
  {
    input: [1,0],
    output: [1]
  },
  {
    input: [1,1],
    output: [0]
  },
]

The next logical step is to train our network. For this we create a Trainer and feed it our training set

JavaScript

var trainer = new Trainer(xorNetwork);
trainer.train(xorTrainingSet, {
    error: .005,
    log: 1000,
    cost: Trainer.cost.CROSS_ENTROPY
});

The minimal error we are targeting here is .005. The log parameter tells the trainer to console.log the error and the number of iterations every X iterations; here it is set to 1000.

The cost function returns a number representing how well the neural network performed; in this case we are using cross entropy.
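
To give an idea of what such a cost function looks like, here is a sketch of binary cross entropy for a single target/output pair. It is an illustration of the formula, not necessarily how synaptic.js implements it internally:

JavaScript

// binary cross entropy for one target/output pair (both values between 0 and 1)
// a tiny epsilon avoids taking the logarithm of 0
function crossEntropy(target, output) {
  var eps = 1e-15;
  return -(target * Math.log(output + eps) + (1 - target) * Math.log(1 - output + eps));
}

console.log(crossEntropy(1, 0.99)); // small number: good prediction
console.log(crossEntropy(1, 0.01)); // large number: bad prediction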

When running this code and viewing the console output, we could see something like the following

JavaScript

...
iterations 2000 error 0.013387428689509984 rate 0.2
iterations 3000 error 0.004284549328391866 rate 0.2

In this case it will stop at iteration 3000 as the error is smaller than our targeted minimal error of .005. We should now be able to classify some input.

JavaScript

console.log(xorNetwork.activate([0, 1])); // 0.99...
console.log(xorNetwork.activate([1, 0])); // 0.99...
console.log(xorNetwork.activate([0, 0])); // 0.006...

As you can see, the output will never be a clear-cut 1 or 0, but it gives a strong indication of what the network is seeing. In only a couple of lines we have built a JavaScript neural network that can solve an XOR gate!
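
If you want a hard 0 or 1 instead of those activations, you can simply round (or threshold) the output yourself:

JavaScript

// turn the network's activation into a crisp 0/1 decision
console.log(Math.round(xorNetwork.activate([0, 1])[0])); // 1
console.log(Math.round(xorNetwork.activate([1, 1])[0])); // 0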

From images to input for our neurons

The thing that I needed for my web game was to be able to recognize simple drawings. For this I’ve chosen the same strategy as for recognizing numbers, i.e. using individual pixel values.

Each individual pixel will be an input of our neural network.

input image

We are working in a JavaScript environment, so unfortunately we don’t have the luxury of OpenCV’s excellent utility methods. Luckily it’s not that difficult as long as we follow certain steps.

One of those steps is to work in black and white, as we are only interested in the shape of the drawings and not the colors that were used. Working in B&W also gives us some advantages when we need to normalize our pixel values to a value between 0.0 and 1.0.

As the Red, Green and Blue (RGB) values are equal for all the gray tones between black (0,0,0) and white (255,255,255), we can easily drop 2 channels and just focus on a single one.

So how do we normalize that data? Quite easily, if you use the following formula

formula

As the min value in this case is zero, the formula boils down to dividing our values by 255 (255 - 0). So 255 becomes 1.0, 128 becomes roughly 0.5, and so on.

scale
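
In code this normalization is a one-liner. A small helper, purely for illustration:

JavaScript

// min-max normalization: scales a value to the range 0.0 - 1.0
function normalize(value, min, max) {
  return (value - min) / (max - min);
}

console.log(normalize(255, 0, 255)); // 1
console.log(normalize(128, 0, 255)); // 0.50...
console.log(normalize(0, 0, 255));   // 0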

Reducing our number of inputs with pica

An extra step before training or classifying the drawing is resizing it to reduce the number of inputs we are dealing with. An image with dimensions of 250x250 would otherwise result in 62,500 (!) input nodes. Training these kinds of big networks takes a lot of time, too long on most machines.

For this I found a solution in Pica, a fast JavaScript library that does high-quality resizing of images and can also work with a Canvas element. Resizing a canvas to an output canvas (which we will be using to retrieve training and classification data) is as simple as using the next snippet.

JavaScript

// resize sourceCanvas into targetCanvas (quality 3 is pica's highest setting)
Pica.resizeCanvas(sourceCanvas, targetCanvas, {quality: 3});

Retrieving those pixels is easy using the following code, which uses the getImageData function.

JavaScript

let normalizedPixels = [];
// get a 2D context for the (resized) target canvas
let targetCanvasCtx = targetCanvas.getContext('2d');
let pixelsData = targetCanvasCtx.getImageData(
  0, 0,
  targetCanvas.width, targetCanvas.height
).data;

// iterate over the pixel data (4 values per pixel: R, G, B, A)
for (let i = 0; i < pixelsData.length; i += 4) {
  let red = pixelsData[i];            // get red channel value
  normalizedPixels.push(red / 255.0); // normalize and store values
}
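
Those normalized values are exactly what we feed into the network. A small sketch of that final step, where drawingNetwork is a hypothetical network that was created with one input node per pixel and trained on our drawings:

JavaScript

// drawingNetwork is assumed to be trained already; normalizedPixels.length must
// match the number of input nodes the network was created with
let result = drawingNetwork.activate(normalizedPixels);
console.log(result); // one value between 0 and 1 per output node, the highest is our best guess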

A training and playground environment

I get a lot of inquiries about the datasets that I created and used in previous machine learning articles or experiments. I also wanted to have some kind of web app where I could quickly create sets, train them and test them.

So it looked like a good idea to create an environment where I could do just that.

environment

It’s currently targeted at tic-tac-toe, but I developed it in such a way that it is easy to adapt to other scenarios. It is also something that I’m releasing as open source so that people can create their own datasets and easily test them.

A 2D (Phaser) and 3D (BabylonJS) Tic-Tac-Toe

The proof of the pudding is in the eating, so I developed 2 open source games that use one of the networks that I have trained to recognize Tic-Tac-Toe symbols.

They are 2 very rudimentary demos, but they show that it is possible to use machine learning in web games that run completely on the client side.

games

The 2D version is built using Phaser and the 3D version using BabylonJS. Nevertheless, this can easily be applied to other JavaScript engines!

Improvements

There are some improvements that we could make. The biggest one is finding the bounding box of our drawings and only using that part for classification and/or training.

One of the things that I have noticed is that it works best when the thing we want to recognize is drawn at the center, simply because the drawings in my training data are also drawn at the center. We could generate a lot of extra training data where we draw at the sides or, even better, use a bounding box algorithm so the position doesn’t matter that much, as sketched below.
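
Finding such a bounding box on a black and white canvas is a straightforward scan over the pixel data. A rough sketch, where the cutoff for what counts as a dark pixel is an assumption of mine:

JavaScript

// find the smallest rectangle that contains all dark pixels on a canvas
function findBoundingBox(canvas, cutoff) {
  let ctx = canvas.getContext('2d');
  let data = ctx.getImageData(0, 0, canvas.width, canvas.height).data;
  let minX = canvas.width, minY = canvas.height, maxX = 0, maxY = 0;

  for (let y = 0; y < canvas.height; y++) {
    for (let x = 0; x < canvas.width; x++) {
      let red = data[(y * canvas.width + x) * 4]; // red channel is enough for B&W
      if (red < cutoff) { // dark enough to be part of the drawing
        if (x < minX) minX = x;
        if (x > maxX) maxX = x;
        if (y < minY) minY = y;
        if (y > maxY) maxY = y;
      }
    }
  }

  return { x: minX, y: minY, width: maxX - minX + 1, height: maxY - minY + 1 };
}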

That being said, I have only 300 entries in my training set; in practice you certainly want to train a neural network on a lot more examples!

You can find all the relevant source code on my github account.