Simplest artificial neural network

This repository explains and demonstrates the simplest possible artificial neural network.

This is part 1 of a series of GitHub repos on neural networks:

part 1 - simplest network (you are here)
part 2 - backpropagation
part 3 - backpropagation-continued
part 4 - hopfield networks

Theory
Code example
References

Theory

Mimicking neurons

Artificial neural networks are inspired by the brain: interconnected artificial neurons store patterns and communicate with each other. The simplest form of an artificial neuron has one or more inputs $x_i$, each with a specific weight $w_i$, and one output $y$.

At the simplest level, the output is the sum of the inputs, each multiplied by its weight:

$$y = \sum_{i=1}^{n} w_i x_i$$
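
A minimal Python sketch of this weighted sum follows. The helper name `neuron_output` is hypothetical and not taken from this repository's code; it computes only the plain sum shown above, with no activation function.

```python
# Hypothetical helper for illustration; not part of this repository's code.
def neuron_output(inputs: list[float], weights: list[float]) -> float:
    """Return y = sum_i w_i * x_i for a single artificial neuron."""
    if len(inputs) != len(weights):
        raise ValueError("each input x_i needs exactly one weight w_i")
    # Multiply each input by its weight and add up the products.
    return sum(w * x for w, x in zip(weights, inputs))


print(neuron_output([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))  # 0.5 + 1.0 + 1.5 = 3.0
```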

A simple example

Say we have a network with two inputs $x_1 = 0.2$ and $x_2 = 0.4$ and two weights $w_1$ and $w_2$.

The idea is to adjust the weights in such a way that the given inputs produce the desired output.

Weights are normally initialized randomly, since we can't know their optimal values ahead of time. For simplicity, however, we will initialize them both to $1$.

The output $y$ is then

$$y = w_1 x_1 + w_2 x_2 = 1 \cdot 0.2 + 1 \cdot 0.4 = 0.6$$

The error

If the output $y$ doesn't match the expected result, we have an error.
For example, if we wanted an expected output of $t = 0.5$, we would have a difference of

$$y - t = 0.6 - 0.5 = 0.1$$

A common way to measure the error is the squared difference between the actual output $y$ and the expected output $t$:

$$E = (y - t)^2$$

If we have multiple associations of inputs and expected outputs, the total error is the sum of the squared errors of all associations:

$$E = \sum_{j} (y_j - t_j)^2$$

where $y_j$ and $t_j$ are the actual and expected outputs of the $j$-th association.
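
Both error measures are easy to sketch in Python. Here `y` stands for an actual output and `t` for the expected output; the function names `squared_error` and `total_error` are hypothetical, chosen only for this illustration.

```python
# Hypothetical helpers for illustration; not part of this repository's code.
def squared_error(y: float, t: float) -> float:
    """Squared difference between the actual output y and the expected output t."""
    return (y - t) ** 2


def total_error(outputs: list[float], targets: list[float]) -> float:
    """Sum of the squared errors over all input/expected-output associations."""
    return sum(squared_error(y, t) for y, t in zip(outputs, targets))


print(squared_error(3.0, 1.0))              # (3 - 1)^2 = 4.0
print(total_error([3.0, 0.0], [1.0, 1.0]))  # 4.0 + 1.0 = 5.0
```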

To correct the error, we need to adjust the weights so that the actual output matches the expected output. In our example, lowering $w_1$ from $1$ to $0.5$ would do the trick, since

$$y = w_1 x_1 + w_2 x_2 = 0.5 \cdot 0.2 + 1 \cdot 0.4 = 0.5$$

However, to adjust the weights of a network for many different inputs and expected outputs, we need a learning algorithm.

Gradient descent

The idea is to use the error to adjust each weight so that the error is minimized.

What is a gradient?

The gradient is a vector that points in the direction of steepest ascent of a function. It is denoted $\nabla$ and consists of the function's partial derivatives with respect to each of its variables.

For example, for a function $f(x, y)$ of two variables:

$$\nabla f(x, y) = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{bmatrix}$$
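
To make the definition concrete, the following Python sketch approximates both partial derivatives with central differences, for example $\partial f / \partial x \approx (f(x+h, y) - f(x-h, y)) / (2h)$. The example function $f(x, y) = x^2 + 3y^2$, the step `h`, and the helper name `numerical_gradient` are illustrative assumptions. The exact gradient of this function is $(2x, 6y)$, which lets us check the approximation.

```python
from typing import Callable


# Hypothetical helper for illustration; not part of this repository's code.
def numerical_gradient(f: Callable[[float, float], float], x: float, y: float,
                       h: float = 1e-5) -> tuple[float, float]:
    """Approximate (df/dx, df/dy) at (x, y) with central differences."""
    df_dx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    df_dy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return df_dx, df_dy


def f(x: float, y: float) -> float:
    # Example function; its exact gradient is (2x, 6y).
    return x ** 2 + 3 * y ** 2


gx, gy = numerical_gradient(f, 1.0, 2.0)
print(gx, gy)  # approximately 2 and 12, the exact gradient at (1, 2)
```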

What is gradient descent?

The descent part means using the gradient to find the direction of steepest ascent of our function, taking a small step in the opposite direction, and repeating this many times until we reach a minimum of the function (ideally its global minimum).
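
As a sketch, here is gradient descent applied to the squared error $E = (y - t)^2$ of the single neuron. Since $y = \sum_i w_i x_i$, the partial derivatives are $\frac{\partial E}{\partial w_i} = 2(y - t)\,x_i$, and each step assumes the standard update $w_i \leftarrow w_i - \text{step size} \cdot \frac{\partial E}{\partial w_i}$. The function name `train_neuron`, the inputs $(1, 2)$, the target $1$, the step size $0.05$, and the number of steps are hypothetical values chosen for illustration.

```python
# Hypothetical sketch for illustration; not part of this repository's code.
def train_neuron(inputs: list[float], target: float, weights: list[float],
                 learning_rate: float = 0.05, steps: int = 50) -> list[float]:
    """Gradient descent on E = (y - t)^2 for a single linear neuron."""
    weights = list(weights)  # copy so the caller's list is not modified
    for _ in range(steps):
        y = sum(w * x for w, x in zip(weights, inputs))
        # dE/dw_i = 2 * (y - t) * x_i
        gradient = [2 * (y - target) * x for x in inputs]
        # Step against the gradient, scaled by the learning rate.
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights


w = train_neuron(inputs=[1.0, 2.0], target=1.0, weights=[1.0, 1.0])
y = sum(wi * xi for wi, xi in zip(w, [1.0, 2.0]))
print(w, y)  # y should now be very close to the target 1.0
```

Each step moves the weights against the gradient, so the output approaches the target. With these hypothetical values the weights settle near $(0.6, 0.2)$.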

A constant called the learning rate, denoted $\alpha$, controls how large a step we take in that direction.

If $\alpha$ is too large, we risk overshooting the minimum; if it is too small, the network takes longer to learn and we risk getting stuck in a local minimum.
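
The effect of the learning rate can be seen on the one-variable function $f(w) = w^2$, whose minimum is at $w = 0$ and whose derivative is $2w$. In this hypothetical Python sketch (`descend` is an illustrative name), each step multiplies $w$ by $(1 - 2\alpha)$. A small rate therefore shrinks $w$ slowly, a moderate rate shrinks it quickly, and any rate above $1$ overshoots the minimum by more on every step. This function has no local minima, so only the overshooting and slow-learning effects appear here.

```python
# Hypothetical sketch for illustration; not part of this repository's code.
def descend(learning_rate: float, start: float = 1.0, steps: int = 20) -> float:
    """Run gradient descent on f(w) = w^2 (gradient 2w) and return the final w."""
    w = start
    for _ in range(steps):
        w -= learning_rate * 2 * w  # step opposite to the gradient
    return w


for lr in (0.01, 0.1, 1.1):
    print(f"learning rate {lr}: w after 20 steps = {descend(lr):.4g}")
```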