
Creating a Neural Network from Scratch

source link: https://towardsdatascience.com/creating-a-neural-network-from-scratch-302e8fb61703?gi=be68250e7c5a


Understanding the key concepts behind the algorithm by building one yourself

Jun 13 · 16 min read


[ Image by author ]

Before We Begin — A Little History…

Neural networks have been around for many years. In fact, the idea behind the algorithm was first introduced over 60 years ago by a psychologist named Frank Rosenblatt.

It wasn’t until the start of the last decade, however, that these machine learning models started getting more attention, with the publication of this paper, which demonstrated the use and effectiveness of neural networks in machine learning.

Today, neural networks are at the core of deep learning. The most complex tasks in artificial intelligence are usually handled by artificial neural networks, and there are many libraries that let you create a NN in just a few lines of code.

The Goal

In this article we will walk through the fundamental concepts behind neural networks, and understand their inner workings. We will do this by creating a flexible NN from scratch.

The full source code for this article can be found at this link!

The code we will be looking at is written in JavaScript; however, each concept and step will be thoroughly documented and explained, so you can follow along in any language you like!

What you have to know before we move on…

Neural networks are a type of algorithm that belongs to deep learning. Deep learning, in turn, is a subfield of machine learning, which is itself a subfield of artificial intelligence.


source — https://www.qubole.com/

The core idea behind machine learning is that algorithms can be trained to classify and process data without being explicitly told the rules of classification.

What this means is that instead of having a model that makes decisions based on hard-coded instructions, we can train the model on a large number of input/output pairs (the training set) and, after some time, have the model come up with its own rules of classification for never-before-seen inputs.

This idea applies to all machine learning algorithms.

All right — now that the basics of machine learning are out of the way, let’s move on to neural nets!

Part 1 — Layers??

Let’s talk about the fundamental structure of a neural network. The data processing works through layers. Any network will have an input layer, a number of hidden layers (we’ll talk about them in just a sec), and an output layer.

Each layer of a network is composed of a number of neurons, and the neurons of each layer are connected to the neurons of the next through weights.


Simple network with 1 input layer, 1 hidden layer and 1 output layer — [ Source: Wikipedia ]

The input layer is where the data will enter the algorithm. Say we want to create a network to predict the probability of a patient having a certain disease based on the symptoms they have shown.

In this case, our training data will be structured as input/output pairs. The input will be an array of 0s and 1s representing the symptoms we are analyzing, and the output will be a 0 or 1, representing the infection status of the patient that presented the ‘input’ symptoms.


Training data for an example ML algorithm
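Such input/output pairs can be sketched as plain data. The specific symptom flags below are hypothetical, made up just for illustration:

```javascript
// Hypothetical training set for the disease example: each input is an
// array of five 0/1 flags (one per symptom), and each output is a single
// value -- 1 for a sick patient, 0 for a healthy one.
const trainingSet = [
  { input: [1, 1, 0, 1, 0], output: [1] },
  { input: [0, 0, 1, 0, 1], output: [0] },
  { input: [1, 0, 1, 1, 0], output: [1] },
  { input: [0, 1, 0, 0, 0], output: [0] },
];
```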

If we were to train a neural network on this data, we would need an input layer with five neurons, one for each of the symptoms. We can have an arbitrary number of hidden layers, with an arbitrary number of neurons in each one, but we must have an output layer of just one neuron.

The neuron in the output layer will have an activation close to 1 when the algorithm thinks that the input represents a sick patient and close to 0 when the algorithm thinks that the input represents a non-infected patient.


Layer structure for our example neural network

You can think of the activation of a neuron as being the value that it stores. The activations of the neurons in the first layer will be the same as the values of our input data. These values will be propagated to the neurons of the following layers through mathematical operations we will look at in a bit.

The activations of the neurons in our output layer make up our network’s prediction. (The output layer may contain just one neuron, as in the example above.)
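This layer structure can be sketched as an array of layer sizes. The hidden layer size of 4 here is an arbitrary assumption, chosen only for illustration:

```javascript
// 5 input neurons (one per symptom), one hidden layer of 4 neurons
// (an arbitrary choice), and 1 output neuron for the prediction.
const layerSizes = [5, 4, 1];

// One array of activations per layer, initialized to zero.
const activations = layerSizes.map(size => new Array(size).fill(0));

// The first layer's activations are simply the input data.
activations[0] = [1, 0, 1, 1, 0];
```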

Part 2 — Activations, Weights & Biases

(and the sigmoid function!)

Alright, you might be wondering how the data that enters the input layer traverses the network and ends up in the output layer totally different from how it came in. This process is called feedforward, and the full explanation involves some linear algebra and matrix multiplications that we will look at soon. For now we’ll stick with the idea behind the process.

As we saw earlier, each neuron is connected to the neurons of the previous layer through what we called weights. What this means is that the activation of a neuron in, say, the second layer will depend on the sum of all the activations of the neurons in the layer before it, each multiplied by the weight connecting them.


Network representation

If we represent the activation of a neuron with the letter a and the weight connecting two neurons as w , the activation of a single neuron in the second layer may be represented as in the below image.

‘Activation’ of a neuron

The equation above is actually oversimplified. In reality, this is just the first step in calculating the activation of any given neuron. The image below introduces the remaining operations we must execute to find the final activation.


source — hackernoon.com

The two new factors that we haven’t yet looked at are: the bias, represented by the letter b; and the activation function, represented by the function g(z) .

Let’s start by tackling the bias.

The idea here is that each neuron will have a unique number that will be added to the weighted sum of the previous neurons’ activations.

Adding the bias is useful so that neural networks can generalize better to any given data. You can think of it as the constant c in the linear function y = mx + c.

Adding a bias to each neuron isn’t essential for a network to work properly on simple data; however, when we tackle very complex tasks, it becomes indispensable.

Now that we’ve got the bias down, our activation equation looks something like this:

Weighted sum + bias: z = a1·w1 + a2·w2 + … + an·wn + b
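In code, the weighted sum plus bias for a single neuron might look like the sketch below. The function and parameter names are placeholders, not the article’s actual variables:

```javascript
// z = (a1*w1 + a2*w2 + ... + an*wn) + b, for one neuron.
// prevActivations and weights are plain arrays of numbers of equal length.
function weightedInput(prevActivations, weights, bias) {
  let z = bias;
  for (let i = 0; i < prevActivations.length; i++) {
    z += prevActivations[i] * weights[i];
  }
  return z;
}
```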

The last thing we must understand to make sense of how data transverses the network are activation functions.

Let’s represent with the letter z the weighted sum of a given neuron plus its bias. The activation of the neuron in question will be the result of an activation function applied to z.

The activation function is used to smooth out changes to a neuron’s activation and keep it between 0 and 1. One very common activation function, and the one we’ll be using in our own network, is the sigmoid function.

The sigmoid is useful as an activation function because inputs much greater than 0 result in a number very close to 1 once passed through it. By the same principle, the smaller the input, the closer the result will be to 0. Inputs close to 0 produce values that transition smoothly between 0 and 1.


Sigmoid function: σ(z) = 1 / (1 + e^(−z))
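As a sketch, the sigmoid function is a one-liner:

```javascript
// Sigmoid: squashes any real number into the open interval (0, 1).
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}
```

Note that sigmoid(0) is exactly 0.5, large positive inputs approach 1, and large negative inputs approach 0.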

To sum it all up, we can find the activation of any neuron by adding the bias to the weighted sum of the activations of the neurons in the previous layer, and passing this result through the sigmoid function.


Finding the activation of a neuron

The process of finding the activation of the neurons in the output layer from any given input is called feedforward. It works by finding all the activations of the neurons in the second layer from the neurons in the input layer and the weights connecting them. The activations of the neurons in the second layer are used to find the activations of the following layer, and so on until the output layer.
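Putting the pieces together, a minimal feedforward pass might look like the sketch below. The weight/bias layout is an assumption made for this sketch: weights[l][j][i] connects neuron i of layer l to neuron j of layer l + 1, and biases[l][j] is the bias of neuron j of layer l + 1.

```javascript
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// Propagate an input array through every layer and return the
// activations of the output layer.
function feedforward(input, weights, biases) {
  let activations = input;
  for (let l = 0; l < weights.length; l++) {
    activations = weights[l].map((neuronWeights, j) => {
      // Weighted sum of the previous layer's activations, plus bias...
      let z = biases[l][j];
      for (let i = 0; i < neuronWeights.length; i++) {
        z += neuronWeights[i] * activations[i];
      }
      // ...passed through the activation function.
      return sigmoid(z);
    });
  }
  return activations;
}
```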

The weights and biases are all initialized randomly in an untrained network. The process of training a network means finding the values of these weights and biases that make the resulting output layer of any given input extremely close to the expected output.
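One possible random initialization is sketched below. The range [−1, 1) is an assumption for illustration; the article’s actual code may use a different scheme:

```javascript
// Build a rows-by-cols matrix of random values in [-1, 1).
function randomMatrix(rows, cols) {
  return Array.from({ length: rows }, () =>
    Array.from({ length: cols }, () => Math.random() * 2 - 1)
  );
}

// For layer sizes [5, 4, 1]: one weight matrix and one bias vector per
// gap between consecutive layers.
const sizes = [5, 4, 1];
const weights = sizes.slice(1).map((n, l) => randomMatrix(n, sizes[l]));
const biases = sizes.slice(1).map(n =>
  Array.from({ length: n }, () => Math.random() * 2 - 1)
);
```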

Let’s code all of this up, and then we’ll check out how we can train our network by tweaking these weights and biases to achieve optimal performance!

