
(In-depth) Machine Learning Image Classification With TensorFlow 2.0

source link: https://towardsdatascience.com/in-depth-machine-learning-image-classification-with-tensorflow-2-0-a76526b32af8?gi=1fea403a0fe6

Understand the processes involved in implementing neural networks for image classification.


Photo by Arif Riyanto on Unsplash

Introduction

This is going to be a lengthy article, since I go into great detail regarding the components and processes that are integral to the implementation of an image classification neural network.

Feel free to take some breaks, or even skip directly to sections with code.

This article aims to present practical implementation skills, accompanied by explanations of the terms and concepts involved in machine learning development.

The content of this article is intended for beginners and intermediate machine learning practitioners.

There is a link to a notebook for the code presented within this article, located at the bottom of the page.

Enjoy.

Aim

Neural networks solve a variety of tasks, such as classification, regression, and plenty more.

This article examines the process involved in developing a simple neural network for image classification.

An exploration into the following will be conducted:

  1. Definition of Image classification and other terms
  2. Theories and concepts in machine learning (Multilayer Perceptron)
  3. How to leverage tools and libraries like TensorFlow, Keras and more
  4. How to build, train and evaluate a neural network

Image Classification

Image classification is a task that is commonly associated with multi-class (and sometimes multi-label) assignments.

It involves the extraction of information from an image and then associating the extracted information to one or more class labels. Image classification within the machine learning domain can be approached as a supervised learning task.

But before we go further, we need an understanding of a few fundamental terms, along with the tools and libraries that are utilized, to follow the implementation details properly.

Perceptron

A Perceptron is a fundamental component of an artificial neural network, and it was invented by Frank Rosenblatt in 1958. A perceptron utilizes operations based on the threshold logic unit.

Perceptrons can be stacked in a single-layer format, which is only capable of solving linearly separable functions. Multilayer perceptrons are capable of solving even more complex functions and have greater processing power.


Perceptron image from missinglink.ai

MLP

A multilayer perceptron (MLP) consists of several layers of perceptrons stacked consecutively, one after the other. The MLP is composed of one input layer, one or more layers of TLUs (threshold logic units) called hidden layers, and one final layer referred to as the output layer.
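
To make the structure of an MLP concrete, below is a minimal NumPy sketch of a forward pass through a tiny network with one hidden layer and one output layer. The dimensions, random weights, and the choice of ReLU are illustrative assumptions, not values taken from the model built later in this article.

import numpy as np

def relu(x):
    # Clamp negative values to zero; positive values pass through unchanged
    return np.maximum(0, x)

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))                      # a single input example with 4 features
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)    # hidden layer: 4 inputs -> 3 units
W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)    # output layer: 3 inputs -> 2 units

hidden = relu(x @ W1 + b1)                       # hidden layer of units with ReLU activation
output = hidden @ W2 + b2                        # raw output scores of the network
print(output.shape)                              # (1, 2)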

Tools and Libraries

  • TensorFlow : An open-source platform for the implementation, training, and deployment of machine learning models.
  • Keras : An open-source library used for the implementation of neural network architectures that run on both CPUs and GPUs.
  • Pandas : A data analysis and manipulation library.
  • Matplotlib : A tool utilized to create visualization plots in Python, such as charts, graphs, and more.
  • Numpy : Enables several mathematical computations and operations of array data structures.
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import os
import time

Dataset

The Fashion-MNIST dataset consists of images of clothing (T-shirts, Trousers, Dresses and more) that originate from Zalando’s image directory. Zalando is a European e-commerce company founded in 2008.


Fashion-MNIST classes and examples of associated images

Researchers at Zalando created the Fashion-MNIST dataset, which contains 70,000 images of clothing. More specifically, it contains 60,000 training examples and 10,000 testing examples, all of which are grayscale images with 28 x 28 dimensions, categorized into 10 classes.

The classes correspond to what item of clothing is present in the image. For example, an image of an ankle boot corresponds to the numeric label ‘9’.


Visualization of the distribution of the Fashion-MNIST dataset

Dataset partitions

For this particular classification task, 55,000 training images, 10,000 test images, and 5,000 validation images are utilized.

  • Training Dataset: This is the partition of our dataset that is exposed directly to the neural network during training and used to fit its weights.
  • Validation Dataset: This group of the dataset is utilized during training to assess the performance of the network at various iterations.
  • Test Dataset: This partition of the dataset evaluates the performance of our network after the completion of the training phase.


Illustration of dataset partitioning

The Keras library has a suite of datasets readily available for use with easy accessibility.

fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

60,000 training images, each with 28 x 28 dimensions

train_images.shape
>> (60000, 28, 28)

60,000 training labels, each label corresponding to an item of clothing, for example, the label 9 corresponds to Ankle boots

train_labels.shape
>> (60000,)

train_labels[0]
>> 9

Visualization and Preprocessing the Data

Before we proceed, we have to normalize the image pixel values to the range 0 to 1. This is done by dividing each pixel value within the train and test images by 255.

train_images = train_images / 255.0
test_images = test_images / 255.0

Below are the class names that the images in the Fashion-MNIST dataset correspond to.

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]


Visualize the dataset

plt.figure(figsize=(10,10))
for i in range(20):
    plt.subplot(5,4, i+1)
    plt.xticks([])
    plt.imshow(train_images[i])
    plt.xlabel(class_names[train_labels[i]])
plt.show()


The validation partition of the dataset is derived from the training dataset. 5,000 images and labels will be utilized for validation purposes.

validation_images = train_images[:5000]
validation_labels = train_labels[:5000]

Here is an example of a corresponding clothing name identified with a specific index position.

class_names[train_labels[2]]
>> 'T-shirt/top'

Building the Model

Keras provides tools required to implement the classification model. Keras presents a Sequential API for stacking layers of the neural network on top of each other.

The classification network is a shallow network with 3 hidden layers, an input layer, and 1 output layer. The input layer is built using the ‘Flatten’ constructor, which takes the input shape as its argument, in this case [28,28].

Each input image is flattened into a 1D array. Each Dense layer has a defined number of neurons/units, which is passed in as its first argument, and a second argument that specifies the activation function to be utilized within that layer.

The three hidden layers use the ReLU activation function, while the output layer uses a softmax activation.

Definitions

  • Activation Function : A mathematical operation that transforms the result or signals of neurons into a normalized output. An activation function is a component of a neural network that introduces non-linearity within the network. The inclusion of the activation function enables the neural network to have greater representational power and solve complex functions.
  • ReLU activation : Stands for ‘rectified linear unit’. It’s a type of activation function that transforms the values from a neuron according to the formula y = max(0, x): any negative values are clamped down to 0, while positive values remain unchanged. The result of this transformation is utilized as the output of the current layer and as input to the next.
  • Softmax : An activation function that is utilized to derive the probability distribution of a set of numbers within an input vector. The output of a softmax activation function is a vector in which its set of values represents the probability of an occurrence of a class/event. The values within the vector all add up to 1.
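
Before building the model, here is a small standalone NumPy sketch (an illustration, separate from the Keras model below) showing what ReLU and softmax do to a vector of example scores:

import numpy as np

def relu(x):
    # Negative values are clamped to 0; positive values remain unchanged
    return np.maximum(0, x)

def softmax(x):
    # Subtract the maximum for numerical stability before exponentiating
    exps = np.exp(x - np.max(x))
    return exps / exps.sum()

scores = np.array([-1.5, 0.0, 2.0, 0.5])
print(relu(scores))           # [0.  0.  2.  0.5]
print(softmax(scores))        # a probability distribution over the four scores
print(softmax(scores).sum())  # sums to 1 (up to floating point precision)
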
# Classification MLP (Multilayer perceptron) with three hidden layers
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28,28]),
    keras.layers.Dense(500, activation=keras.activations.relu),
    keras.layers.Dense(250, activation=keras.activations.relu),
    keras.layers.Dense(100, activation=keras.activations.relu),
    keras.layers.Dense(10, activation=keras.activations.softmax)
])

A visual statistical summary of the model implemented above is obtainable by calling the ‘summary’ method available on our model. By calling the summary method, we gain information on model properties such as the layers, layer types, output shapes, and the number of weights (parameters) in the model.

model.summary()

Provides the output below

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 500)               392500    
_________________________________________________________________
dense_1 (Dense)              (None, 250)               125250    
_________________________________________________________________
dense_2 (Dense)              (None, 100)               25100     
_________________________________________________________________
dense_3 (Dense)              (None, 10)                1010      
=================================================================
Total params: 543,860
Trainable params: 543,860
Non-trainable params: 0

Each layer in the model has a number of perceptrons, and each layer has a set of associated weights and biases.

The model’s weights are initialized randomly. The weight values within the network are initialized using a Glorot uniform initializer, which is the default initializer for Dense layers in Keras.

  • Glorot uniform initializer : A neural network weight initialization method utilized as a solution to the problem of unstable gradients within a neural network. Weights within a network are initialized from a distribution of values over a certain range, with the mean of the values evaluating to zero and a constant variance. The maximum of the distribution is the positive value of the range, and the minimum is the negative value of the range: range = [-value, value].

The value used to determine the distribution range is derived from the formula:

value = sqrt(6 / (fan_in + fan_out))

‘fan_in’ is the number of inputs to the layer.

‘fan_out’ is the number of neurons within the layer (its outputs).

More information is provided in the official research paper by Glorot and Bengio (2010).
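
As a quick hedged sketch of how this limit works out for the first Dense layer of the model above (assuming fan_in = 784 flattened inputs and fan_out = 500 units), the range value can be computed directly:

import numpy as np

fan_in, fan_out = 784, 500               # first Dense layer: 784 inputs, 500 units
limit = np.sqrt(6 / (fan_in + fan_out))
print(limit)                             # roughly 0.068; weights are drawn from U(-limit, +limit)

The snippet below prints the actual initialized weights and biases of the first hidden Dense layer of our model.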

first_hidden_layer = model.layers[1]
weights, biases = first_hidden_layer.weights
print(weights)
print('_____________')
print('_____________')
print(biases)

Training The Model

Compilation

Keras provides the ‘compile’ method through the model object we have instantiated earlier. The compile method builds the model we have implemented behind the scenes, with some additional characteristics such as the loss function, optimizer, and metrics.

To train the network, we utilize a loss function that calculates the difference between the predicted values provided by the network and actual values of the training data.

The loss values, together with an optimization algorithm, determine the changes made to the weights within the network. Supporting factors such as momentum and a learning rate schedule provide an environment that enables the network training to converge, thereby getting the loss values as close to zero as possible.

Definitions

  • Learning Rate : An integral component of neural network training, as it is a factor that determines the size of the updates made to the weights of the network.

As a visualization exercise, the loss function to be minimized can be depicted as a curved surface in n-dimensional parameter space.

The learning rate is a component that affects the step size that the current parameter values take towards a local/global minimum; hence the learning rate directly affects the rate of convergence of a network during training. If the learning rate is too small, the network might take several iterations and epochs to converge. On the other hand, if the learning rate is too high, there is a risk of overshooting the minimum, and as a result our training doesn’t converge. Selecting the appropriate learning rate can be a time-consuming exercise.

  • Learning rate schedule : A constant learning rate can be utilized during the training of a neural network, but this can increase the amount of training that has to take place to arrive at optimal neural network performance. By utilizing the learning rate schedule, we introduce a timely reduction or increment of the learning rate during training to arrive at an optimal training outcome of the neural network.
  • Learning rate Decay: Learning rate decay reduces the oscillation of steps taken towards a local minimum during gradient descent. By reducing the learning rate to a smaller value compared to the learning rate value utilized at the start of training, we can steer the network towards a solution that oscillates in smaller ranges around a minimum.
  • Loss Function : This is a method that quantifies ‘how well’ a machine learning model performs. The quantification is an output(cost) based on a set of inputs, which are referred to as parameter values. The parameter values are used to estimate a prediction, and the ‘loss’ is the difference between the prediction and the actual values.
  • Optimizer : An optimizer within a neural network is an algorithmic implementation that facilitates gradient descent by adjusting the weights of the network to minimize the loss values provided via the loss function. To minimize the loss, it is paramount that the values of the weights within the network are selected appropriately.

Examples of Optimization algorithms:

  • Stochastic Gradient Descent
  • Mini Batch Gradient Descent
  • Nesterov Accelerated Gradient
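
To make the role of the optimizer more tangible, here is a minimal NumPy sketch of a few parameter updates under SGD with momentum and a simple time-based learning rate decay. The parameter values and the stand-in gradient are invented for illustration, and this is a simplified version rather than the exact update rule Keras applies internally.

import numpy as np

weights = np.array([0.5, -0.3])              # illustrative parameter values
velocity = np.zeros_like(weights)            # momentum buffer
learning_rate, momentum, decay = 0.01, 0.9, 1e-6

for step in range(1, 4):
    gradient = 2 * weights                   # stand-in gradient (as if the loss were sum(w**2))
    lr = learning_rate / (1 + decay * step)  # simple time-based learning rate decay
    velocity = momentum * velocity - lr * gradient
    weights = weights + velocity             # step towards the minimum
    print(step, weights)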


sgd = keras.optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss="sparse_categorical_crossentropy", optimizer=sgd, metrics=["accuracy"])
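
The ‘sparse_categorical_crossentropy’ loss compares the softmax probability vector produced by the network against an integer class label. Below is a hedged NumPy sketch of the idea for a single example; the probability values are invented for illustration, and Keras adds details such as clipping and averaging over the batch.

import numpy as np

# Invented softmax output for one image over the 10 clothing classes
predicted_probs = np.array([0.02, 0.01, 0.05, 0.02, 0.05, 0.05, 0.05, 0.05, 0.05, 0.65])
true_label = 9                                 # e.g. 'Ankle boot'

# Sparse categorical cross-entropy: negative log-probability assigned to the true class
loss = -np.log(predicted_probs[true_label])
print(loss)                                    # roughly 0.43; approaches 0 as the true-class probability approaches 1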

TensorBoard provides visual insights into the events that occur during training at each epoch.

The training visualization provided by TensorBoard is stored in a ‘runs’ folder directory. We create a function to generate a folder directory and identify each log via a timestamp.

root_logdir = os.path.join(os.curdir, "runs")

def get_run_logdir():
    run_id = time.strftime("run_%Y_%m_%d-%H_%M_%S")
    return os.path.join(root_logdir, run_id)

run_logdir = get_run_logdir()
tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)

The model’s ‘fit’ method provides the tools to train the implemented network.

Passing specific arguments into the fit function:

  • we can specify the training data used for training
  • the number of epochs we are to train the network for
  • and also the validation dataset, used to assess the performance of the network on unseen data during training.

We’ll also utilize the ‘callbacks’ argument, which in this instance, calls the TensorBoard callback created.

The default batch size within Keras when training a neural network is 32. The network is trained for a total of 60 epochs. With the utilization of early stopping, training is halted once no improvement in the validation loss is recorded after 3 epochs. Early stopping can save you hours, especially in the scenario where your network begins to overfit and stops converging.

In summary, we train the model for a maximum of 60 epochs, where we feed forward all our training data in batches of 32 (batch size) through the network at each epoch.

An update is made to our network’s weight parameters after it has seen 32 training images and labels.
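
As a rough sanity check of what this means in terms of update steps (assuming all 60,000 training images are passed to ‘fit’, as in the code below):

import math

training_examples = 60000    # images passed to fit() in this article
batch_size = 32              # the Keras default
updates_per_epoch = math.ceil(training_examples / batch_size)
print(updates_per_epoch)     # 1875 weight updates per epoch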

The ‘fit’ method takes additional arguments that are covered in the official Keras documentation.

early_stopping_cb = keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=3, verbose=0, mode='auto')
model.fit(train_images, train_labels, epochs=60, validation_data=(validation_images, validation_labels), callbacks=[tensorboard_cb, early_stopping_cb])

To run TensorBoard, place the command below in your terminal, and navigate to localhost:6006.

tensorboard --logdir=runs


Training snapshot from TensorBoard

Evaluation

Evaluating a model requires feeding forward data that the network has not been exposed to during training.

Evaluating the model before actual utilization is a good indicator of how well the model can generalize to unseen data.

With the evaluation results, you can decide either to fine-tune the network hyperparameters or move forward to production after observing the accuracy of the evaluation over the test dataset.

model.evaluate(test_images, test_labels)
>> 10000/10000 [==============================] - 1s 74us/sample - loss: 0.3942 - accuracy: 0.8934
[0.3942159619651735, 0.8934]

Predictions

To make predictions with the trained model, 10 images from our test dataset are used to emulate real-life scenario-based testing.

By using the ‘predict’ method made available through our trained model, we can pass in the batch of practical test images to our model and extract the probability vector for each image.

The probability vector contains 10 elements, and each element in the vector corresponds to the likelihood of the occurrence of a class from the 10 pieces of clothing classes defined earlier.

practical_test_images =  test_images[:10]
prediction_probabilites = model.predict(practical_test_images)
prediction_probabilites

We can create a function to loop through each vector and obtain the highest confidence score, which corresponds to the class that our model predicts the image belongs to.

def derive_predicted_classes(prediction_probabilites):
    batch_prediction = []
    for vector in prediction_probabilites:
        batch_prediction.append(np.argmax(vector))
    return batch_prediction

model_prediction = derive_predicted_classes(prediction_probabilites)
model_prediction
>> [9, 2, 1, 1, 6, 1, 4, 6, 5, 7]
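
The same result can be obtained without an explicit loop by taking np.argmax across the class axis of the probability array; this is simply an equivalent NumPy one-liner that yields the same class indices as above, not a different prediction method.

model_prediction = np.argmax(prediction_probabilites, axis=1)
model_prediction
>> array([9, 2, 1, 1, 6, 1, 4, 6, 5, 7])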

Another method we can utilize to obtain the class each image corresponds to is to leverage the ‘predict_classes’ method.

model_prediction = model.predict_classes(practical_test_images)
model_prediction

The ‘predict_classes’ method provides a 1-dimensional array containing the class each of the images corresponds to.

np.array(class_names)[model_prediction]
>>array(['Ankle boot', 'Pullover', 'Trouser', 'Trouser', 'Shirt', 'Trouser',
       'Coat', 'Shirt', 'Sandal', 'Sneaker'], dtype='<U11')

Let’s visualize the images within the practical_test_images and the predicted classes from the model.

# Visualise the prediction result
plt.figure(figsize=(10,10))
for i in range(len(practical_test_images)):
    plt.subplot(5,5, i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(practical_test_images[i])
    plt.xlabel(class_names[model_prediction[i]])
plt.show()


Saving the Model

The last step involves saving our model for future use.

Saving a trained TensorFlow model involves calling the ‘save’ method on the model itself.

model.save("image_classification_model.h5")

Using a saved model is achievable by calling the ‘load_model’ function made available via the Keras.models API.

loaded_model = keras.models.load_model("image_classification_model.h5")
predictions = loaded_model.predict_classes(practical_test_images)
print(predictions)
print(np.array(class_names)[predictions])

>> [9 2 1 1 6 1 4 6 5 7]
['Ankle boot' 'Pullover' 'Trouser' 'Trouser' 'Shirt' 'Trouser' 'Coat'
 'Shirt' 'Sandal' 'Sneaker']

Conclusions

This section contains affiliate links.

Through this article we have done the following:

  • Implemented a model
  • Trained a model
  • Evaluated a model
  • Saved a model

Following on from here, you can explore more neural network architectures to implement, or dive deeper into the TensorFlow and Keras libraries.

Below is a link to a GitHub repository that includes all code presented in this article.

Also below is a book I highly recommend in order to gain a good understanding of practical machine learning. Many readers will probably be familiar with this book or its previous edition.

Hands-On Machine Learning With Scikit-Learn, Keras & TensorFlow

