
Semantic Image Segmentation using Fully Convolutional Networks


1. Business Problem

Severstal is among the top 50 producers of steel in the world and Russia’s biggest player in efficient steel mining and production. Steel sheets are one of Severstal’s key products. The production process of flat sheet steel is delicate: from heating and rolling to drying and cutting, several machines touch flat steel by the time it is ready to ship. To ensure quality in the production of steel sheets, Severstal today uses images from high-frequency cameras to power a defect detection algorithm.

Through this competition, Severstal expects the AI community to improve the algorithm by localizing and classifying surface defects on a steel sheet.

1.1. Business objectives and constraints

  1. A defective sheet must be predicted as defective, since misclassifying a defective sheet as non-defective would raise serious quality concerns. In other words, we need a high recall value for each of the classes.
  2. We need not return results for a given image in the blink of an eye (no strict latency constraints).

2. Machine Learning Problem

2.1. Mapping the business problem to an ML problem

Our task is to

  1. Detect/localize the defects in a steel sheet using image segmentation and
  2. Classify the detected defects into one or more classes from [1, 2, 3, 4]

Putting the two together, this is a semantic image segmentation problem.

2.2. Performance metric

The evaluation metric used is the mean Dice coefficient. The Dice coefficient can be used to compare the pixel-wise agreement between a predicted segmentation and its corresponding ground truth. The formula is given by:

Dice = 2 * |X ∩ Y| / (|X| + |Y|)

where X is the predicted set of pixels and Y is the ground truth. The coefficient equals 1 for a perfect match, and is defined to be 1 when both X and Y are empty.

Read more about the Dice coefficient here.
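As an illustration (not from the original post), here is a minimal NumPy sketch that computes the Dice coefficient for two binary masks:

import numpy as np

def dice_coefficient(pred, truth, eps=1e-7):
    """Dice coefficient between two binary masks of the same shape."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    # eps guards against division by zero when both masks are empty
    return (2.0 * intersection + eps) / (pred.sum() + truth.sum() + eps)

# Two 2x3 masks that overlap on two pixels: 2*2 / (3 + 3) ~ 0.667
pred = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 1, 0], [0, 0, 1]])
print(dice_coefficient(pred, truth))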

2.3. Data Overview

We have been given a zip folder of size 2GB which contains the following:

  • train_images: a folder containing 12,568 training images (.jpg files)
  • test_images: a folder containing 5,506 test images (.jpg files) in which we need to detect and localize defects
  • train.csv: training annotations which provide segments for defects belonging to ClassId = [1, 2, 3, 4]
  • sample_submission.csv: a sample submission file in the correct format, with each ImageId repeated 4 times, once for each of the 4 defect classes

More details about the data are discussed in the next section.

3. Exploratory Data Analysis

The first step in solving any machine learning problem should be a thorough study of the raw data. This gives a fair idea about what our approaches to solving the problem should be. Very often, it also helps us find some latent aspects of the data which might be useful to our models.

Let’s analyse the data and try to draw some meaningful conclusions.

3.1. Loading train.csv file

train.csv tells us which type of defect is present at which pixel locations in an image. It contains the following columns:

  • ImageId: image file name with .jpg extension
  • ClassId: type/class of the defect, one of [1, 2, 3, 4]
  • EncodedPixels: the ranges of defective pixels in an image, given as run-length encoded pixels (pixel number where the defect starts <space> pixel length of the defect).
    e.g. ‘29102 12’ means the defect starts at pixel 29102 and runs for a total of 12 pixels, i.e. pixels 29102, 29103, …, 29113 are defective. The pixels are numbered from top to bottom, then left to right: 1 corresponds to pixel (1,1), 2 corresponds to (2,1), and so on (see the short decoding sketch below).
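To make the column-major numbering concrete, here is a small decoding sketch of my own (assuming the 256 x 1600 image size found later in section 3.2):

HEIGHT = 256  # pixels are numbered down each column first

def pixel_to_rowcol(p):
    """Convert a 1-based RLE pixel number to 0-based (row, col)."""
    return (p - 1) % HEIGHT, (p - 1) // HEIGHT

# ‘29102 12’ -> the defect starts at row 173 of column 113
print(pixel_to_rowcol(29102))  # (173, 113)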
train_df.ImageId.describe()

count               7095
unique              6666
top        ef24da2ba.jpg
freq                   3
Name: ImageId, dtype: object
  • There are 7095 data points corresponding to 6666 steel sheet images containing defects.

3.2. Analysing train_images & test_images folders

Number of train and test images

Let’s get some idea about the proportion of train and test images and check how many train images contain defects.
Number of train images : 12568
Number of test images : 5506
Number of non-defective images in the train_images folder: 5902
  • There are more images in the train_images folder than unique ImageIds in train.csv. This means that not all images in the train_images folder have at least one of the defects 1, 2, 3, 4.
Percentage of defective and non-defective images in train data

Sizes of train and test images

Let’s check if all images in train and test are of the same size. If not, we must resize them to a common size.

Unique image shapes in train_images: {(256, 1600, 3)}
Unique image shapes in test_images: {(256, 1600, 3)}

  • All images in the train and test folders have the same size (256 x 1600 x 3)

3.3. Analysis of labels: ClassId

Let’s see how train data is distributed among various classes.

Number of images in class 1 : 897 (13.456 %)
Number of images in class 2 : 247 (3.705 %)
Number of images in class 3 : 5150 (77.258 %)
Number of images in class 4 : 801 (12.016 %)
  • The dataset is highly imbalanced.
  • The number of images with class 3 defects is very high compared to the other classes: 77% of the defective images have class 3 defects.
  • Class 2 is the least occurring class; only 3.7% of the images in train.csv belong to class 2.

Note that the sum of the percentage values in the above analysis exceeds 100%, which means some images have defects belonging to more than one class.

Number of labels tagged per image

Number of images having 1 class label(s): 6239 (93.594%)
Number of images having 2 class label(s): 425 (6.376%)
Number of images having 3 class label(s): 2 (0.03%)
  • The majority of the images (93.6%) have only one class of defects.
  • Only 2 images (0.03%) have a combination of 3 classes of defects.
  • The rest of the images (6.37%) have a combination of 2 classes of defects.
  • No image has all 4 classes of defects.
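Here is a sketch of how these counts can be computed with pandas (my own reconstruction, since the original code is not preserved; train_df is the loaded train.csv):

import pandas as pd

train_df = pd.read_csv('train.csv')  # columns: ImageId, ClassId, EncodedPixels

# Images per defect class (an image can appear under several classes)
n_defective = train_df['ImageId'].nunique()
for class_id, count in train_df['ClassId'].value_counts().sort_index().items():
    print(f'Number of images in class {class_id} : {count} '
          f'({100 * count / n_defective:.3f} %)')

# Number of distinct defect classes tagged per image
labels_per_image = train_df.groupby('ImageId')['ClassId'].nunique()
print(labels_per_image.value_counts().sort_index())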

4. Data Preparation

Before we move ahead to training deep learning models, we need to convert the raw data into a form that can be fed to the models. Also, we need to build a data pipeline, which would perform the required pre-processing and generate batches of input and output images for training.

As the first step, we create a pandas dataframe containing the filenames of the train images under the column ImageId, and EncodedPixels under one or more of the columns Defect_1, Defect_2, Defect_3, Defect_4, depending on the ClassId of the image in train.csv. Images that do not have any defects have all four of these columns blank. Below is a sample of the dataframe:
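One possible way to build this dataframe with pandas (a sketch of my own; the Defect_1 to Defect_4 column names follow the description above):

import os
import pandas as pd

train_df = pd.read_csv('train.csv')

# One row per defective image, one EncodedPixels column per defect class
wide_df = train_df.pivot(index='ImageId', columns='ClassId',
                         values='EncodedPixels')
wide_df.columns = [f'Defect_{c}' for c in wide_df.columns]
wide_df = wide_df.reset_index()

# Bring in the non-defective images; their four defect columns stay blank (NaN)
all_images = pd.DataFrame({'ImageId': sorted(os.listdir('train_images'))})
full_df = all_images.merge(wide_df, on='ImageId', how='left')
print(full_df.shape)  # (12568, 5)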

4.1. Train, CV split 85:15

I train my models on 85% of the train images and validate on the remaining 15%.

(10682, 5)
(1886, 5)
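A minimal sketch of the split, assuming scikit-learn's train_test_split and the full_df dataframe sketched above (the fixed random_state is my assumption, added for reproducibility):

from sklearn.model_selection import train_test_split

# 85:15 split of the prepared dataframe
train_split, val_split = train_test_split(full_df, test_size=0.15,
                                          random_state=42)
print(train_split.shape)  # (10682, 5)
print(val_split.shape)    # (1886, 5)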

4.2. Utility Functions for converting RLE encoded pixels to masks and vice-versa

Let’s visualize some images from each class along with their masks. The pixels belonging to the defective area of the steel sheet image are shown in yellow in the mask image.

Our deep learning model takes a steel sheet image as input (X) and returns four masks (Y), one per class, as output. This implies that, to train the model, we need to feed it batches of train images and their corresponding masks.

We generate masks for all the images in the train_images folder and store them into a folder called train_masks.
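The conversion utilities were shared as code in the original post; below is a minimal reconstruction of my own, following the column-major pixel numbering described in section 3.1:

import numpy as np

def rle2mask(rle, height=256, width=1600):
    """Decode a run-length-encoded string into a binary mask."""
    mask = np.zeros(height * width, dtype=np.uint8)
    if isinstance(rle, str) and rle.strip():
        values = list(map(int, rle.split()))
        starts, lengths = values[0::2], values[1::2]
        for start, length in zip(starts, lengths):
            mask[start - 1:start - 1 + length] = 1  # RLE positions are 1-based
    # Pixels run top to bottom, then left to right: column-major order
    return mask.reshape(width, height).T

def mask2rle(mask):
    """Encode a binary mask back into a run-length string."""
    pixels = mask.T.flatten()  # back to column-major order
    pixels = np.concatenate([[0], pixels, [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[0::2]
    return ' '.join(map(str, runs))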

4.3. Data generator using tensorflow.data

The code below builds a data pipeline that applies pre-processing and augmentation to the input images and generates batches for training.
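The original pipeline code is not preserved here; the following tensorflow.data sketch conveys the idea (the batch size, file layout, and random-flip augmentation are my assumptions):

import tensorflow as tf

BATCH_SIZE = 8  # assumption; the post does not state the batch size

def load_pair(image_path, mask_path):
    # Read the input image and scale it to [0, 1]
    image = tf.io.decode_jpeg(tf.io.read_file(image_path), channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    # Read the 4-channel mask, assumed saved as PNG in the train_masks folder
    mask = tf.io.decode_png(tf.io.read_file(mask_path), channels=4)
    mask = tf.cast(mask > 0, tf.float32)
    return image, mask

def augment(image, mask):
    # Apply the same random horizontal flip to the image and its mask
    flip = tf.random.uniform(()) > 0.5
    image = tf.cond(flip, lambda: tf.image.flip_left_right(image), lambda: image)
    mask = tf.cond(flip, lambda: tf.image.flip_left_right(mask), lambda: mask)
    return image, mask

def make_dataset(image_paths, mask_paths, training=True):
    ds = tf.data.Dataset.from_tensor_slices((image_paths, mask_paths))
    ds = ds.map(load_pair, num_parallel_calls=tf.data.AUTOTUNE)
    if training:
        ds = ds.shuffle(256).map(augment, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)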

4.4. Defining metric and loss function

I have used a hybrid loss function which is a combination of binary cross-entropy (BCE) and Dice loss. BCE treats the prediction at each pixel as a binary classification against the ground truth mask (1 if the pixel is defective in the ground truth, 0 otherwise). Dice loss is given by (1 - Dice coefficient).

BCE dice loss = BCE + dice loss
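A minimal Keras sketch of this hybrid loss (my own reconstruction; the smoothing constant is an assumption to keep the soft Dice numerically stable):

import tensorflow as tf
from tensorflow.keras import backend as K

def dice_coef(y_true, y_pred, smooth=1.0):
    # Soft Dice coefficient over all pixels in the batch
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def dice_loss(y_true, y_pred):
    return 1.0 - dice_coef(y_true, y_pred)

def bce_dice_loss(y_true, y_pred):
    # BCE dice loss = BCE + dice loss
    bce = tf.keras.losses.binary_crossentropy(y_true, y_pred)
    return K.mean(bce) + dice_loss(y_true, y_pred)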

5. Models

There are several models/architectures used for semantic image segmentation. I have tried two of them in this case study: i) U-Net and ii) Google’s DeepLabV3+.

5.1. First cut Solution: U-Net for Semantic Image Segmentation

This model is based on the research paper U-Net: Convolutional Networks for Biomedical Image Segmentation, published in 2015 by Olaf Ronneberger, Philipp Fischer, and Thomas Brox of the University of Freiburg, Germany. In this paper, the authors build upon an elegant architecture called the “Fully Convolutional Network”. They used it to segment neuronal structures in electron microscopy stacks and a few other biomedical image segmentation datasets.

5.1.1. Architecture

The architecture of the network is shown in the image below. It consists of a contracting path (left side) and an expansive path (right side). The expansive path is symmetric to the contracting path, giving the network a shape resembling the English letter ‘U’, which is why it is called U-Net.
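For reference, here is a compact Keras sketch of a U-Net-style model of my own (far fewer filters than the paper; the output has 4 sigmoid channels, one mask per defect class, and it reuses the bce_dice_loss and dice_coef sketched in section 4.4):

import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Each U-Net stage applies two 3x3 convolutions
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    return x

def build_unet(input_shape=(256, 1600, 3), num_classes=4):
    inputs = layers.Input(shape=input_shape)

    # Contracting path: conv blocks followed by 2x2 max pooling
    c1 = conv_block(inputs, 16)
    c2 = conv_block(layers.MaxPooling2D()(c1), 32)
    c3 = conv_block(layers.MaxPooling2D()(c2), 64)
    b = conv_block(layers.MaxPooling2D()(c3), 128)  # bottleneck

    # Expansive path: upsample, concatenate the skip connection, convolve
    u3 = layers.Conv2DTranspose(64, 2, strides=2, padding='same')(b)
    c4 = conv_block(layers.concatenate([u3, c3]), 64)
    u2 = layers.Conv2DTranspose(32, 2, strides=2, padding='same')(c4)
    c5 = conv_block(layers.concatenate([u2, c2]), 32)
    u1 = layers.Conv2DTranspose(16, 2, strides=2, padding='same')(c5)
    c6 = conv_block(layers.concatenate([u1, c1]), 16)

    outputs = layers.Conv2D(num_classes, 1, activation='sigmoid')(c6)
    return Model(inputs, outputs)

model = build_unet()
model.compile(optimizer='adam', loss=bce_dice_loss, metrics=[dice_coef])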

