1st Place Solution for Intel Scene Classification Challenge
source link: https://www.tuicool.com/articles/aEfiiin
Introduction
Problem
You are provided with a dataset of ~25k images from a wide range of natural scenes from all around the world.
Your task is to identify which kind of scene each image can be categorized into.
Data Classes
Approach
Building and training a Convolutional Neural Network that can correctly classify images into the above-mentioned categories.
Language And Frameworks
To be able to experiment quickly across various models, I chose Python as the language, with FastAI and PyTorch as the DL frameworks.
Transfer Learning with Progressive Image Resizing
Having taken the FastAI online course about two months ago, I had learnt a few important tips and tricks that help in training models on limited data and reaching high accuracy quickly.
Transfer Learning and Progressive Image Resizing were two of these very useful techniques.
A great example of both is this CIFAR-10 notebook by FastAI: https://github.com/fastai/fastai/blob/master/courses/dl1/cifar10.ipynb
I used the built-in resnet50 architecture from the FastAI library with pretrained ImageNet weights to train a model on progressively increasing image sizes of 32x32, 64x64, 128x128 and 224x224. The default image transformations, normalization and learning-rate optimizer were applied.
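The loop described above can be sketched as follows, assuming the fastai v1 API in use at the time (`cnn_learner`, `ImageDataBunch.from_folder`); the folder layout, size schedule and epoch counts are illustrative, not the exact values used in the contest:

```python
# Hedged sketch of transfer learning with progressive image resizing in
# fastai v1. Imports live inside the function so the definition itself
# does not require fastai to be installed.
def train_progressive(path, sizes=(32, 64, 128, 224), epochs_per_size=5):
    from fastai.vision import (ImageDataBunch, cnn_learner, models,
                               get_transforms, imagenet_stats, accuracy)
    learn = None
    for size in sizes:
        # rebuild the data at the next image size, with default transforms
        data = (ImageDataBunch.from_folder(path, valid_pct=0.2, size=size,
                                           ds_tfms=get_transforms())
                .normalize(imagenet_stats))
        if learn is None:
            # pretrained ImageNet weights come bundled with models.resnet50
            learn = cnn_learner(data, models.resnet50, metrics=accuracy)
        else:
            learn.data = data  # keep the learned weights, swap in larger images
        learn.fit_one_cycle(epochs_per_size)
    return learn
```

The key idea is that the weights carry over between sizes: early epochs on tiny 32x32 images are cheap and act as a regularizer, and later epochs at 224x224 refine the same network.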
Many intermediate submissions were generated while running multiple epochs; the best test accuracy I could get with this technique was 0.946575342465753.
First Ensemble
At this point I had some 20 submissions, which I ensembled using a simple voting mechanism, a commonly employed technique in data science competitions. My test accuracy increased to 0.958904109589041.
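A minimal sketch of the voting mechanism; each "submission" here is a hypothetical mapping from image id to predicted label (in practice these would be parsed from the submission CSVs):

```python
# Majority-vote ensembling over several submissions.
from collections import Counter

def vote_ensemble(submissions):
    """Combine predictions by simple majority vote; ties are broken by the
    label seen first (Counter.most_common preserves insertion order)."""
    ensembled = {}
    for image_id in submissions[0]:
        votes = Counter(sub[image_id] for sub in submissions)
        ensembled[image_id] = votes.most_common(1)[0][0]
    return ensembled

subs = [
    {"1.jpg": "forest", "2.jpg": "street"},
    {"1.jpg": "forest", "2.jpg": "buildings"},
    {"1.jpg": "glacier", "2.jpg": "street"},
]
final = vote_ensemble(subs)   # {"1.jpg": "forest", "2.jpg": "street"}
```

Voting works best when the submissions disagree in uncorrelated ways, which is why varied techniques (mixup, kNN, TTA) help the ensemble later on.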
Mixup Augmentation
FastAI has a built-in callback for a newer technique called Mixup Augmentation: https://docs.fast.ai/callbacks.mixup.html
This technique didn’t show any significant improvement by itself, but the submission generated with it was included in the ensembling of submissions.
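Attaching the callback is a one-liner in fastai v1, per the linked docs; the data bunch and architecture here are illustrative:

```python
def build_mixup_learner(data):
    # .mixup() wraps the learner with fastai's MixUpCallback, which trains
    # on convex combinations of pairs of images and their labels
    from fastai.vision import cnn_learner, models, accuracy
    return cnn_learner(data, models.resnet50, metrics=accuracy).mixup()
```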
LR Tuning
With the help of the lr_find function, FastAI provides a way to find an optimum learning rate for training a model. This helped in setting the learning rate for the next epochs before each progressive-resizing step: https://docs.fast.ai/basic_train.html#lr_find
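The workflow from the linked docs, sketched as a helper (the learner argument is assumed to be a fastai v1 `Learner`):

```python
def tune_learning_rate(learn):
    # run the LR range test: train briefly while sweeping the learning rate
    learn.lr_find()
    # inspect the loss-vs-LR curve; a common heuristic is to pick a value
    # roughly an order of magnitude before the loss minimum
    learn.recorder.plot()
```

The chosen value is then passed to the next `fit_one_cycle` call (e.g. `learn.fit_one_cycle(5, max_lr=1e-3)`).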
Places365 Dataset Weights
For my next experiment, instead of using ImageNet pretrained weights, I used the freely available Places365 weights for the resnet50 architecture for transfer learning: http://places2.csail.mit.edu/models_places365/resnet50_places365.pth.tar
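A sketch of loading that checkpoint into a torchvision resnet50. The Places365 checkpoints were saved from an `nn.DataParallel` model, so the state-dict keys carry a `module.` prefix that has to be stripped; this assumes PyTorch and torchvision are installed and the file has been downloaded locally:

```python
def load_places365_resnet50(checkpoint_path="resnet50_places365.pth.tar"):
    import torch
    import torchvision.models as tvm
    model = tvm.resnet50(num_classes=365)   # Places365 has 365 scene classes
    ckpt = torch.load(checkpoint_path, map_location="cpu")
    # strip the "module." prefix left over from DataParallel training
    state = {k.replace("module.", "", 1): v
             for k, v in ckpt["state_dict"].items()}
    model.load_state_dict(state)
    return model
```

Since Places365 is itself a scene-classification dataset, its features transfer to this problem far more directly than ImageNet's object-centric features.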
Applying the same techniques as before, i.e. progressive resizing and default transformations, I trained a few epochs on this model and tried submitting all intermediate models, with and without mixup augmentation.
Test accuracy: 0.953424657534247
Second Ensemble
Similar to before, ensembling all previous submissions gave a test accuracy of 0.958904109589041.
kNN With Embeddings
Searching for inspiration, I stumbled upon the write-up of a Google Landmark Recognition Challenge winner (4th place), which had an interesting idea: predict the class using kNN over embeddings, i.e. the feature-vector representation of each image output by the last layer of the model before softmax. They called it few-shot learning: https://www.kaggle.com/c/landmark-recognition-challenge/discussion/57896
I experimented with k=5, k=10, k=50, k=100 and k=500, and used voting as well as average weighted distance to identify which class an image belongs to.
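Both decision rules can be sketched in plain Python. The toy 2-D vectors below stand in for the real embeddings, which were the last-layer features of the trained resnet50:

```python
# kNN over embeddings with two decision rules: plain majority voting and
# inverse-distance (weighted) voting among the k nearest neighbours.
import math
from collections import Counter, defaultdict

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(query, train_embs, train_labels, k=5, weighted=False):
    nearest = sorted(
        (euclidean(query, emb), label)
        for emb, label in zip(train_embs, train_labels)
    )[:k]
    if not weighted:
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    scores = defaultdict(float)
    for dist, label in nearest:
        scores[label] += 1.0 / (dist + 1e-8)   # closer neighbours weigh more
    return max(scores, key=scores.get)

embs = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (4.9, 5.1)]
labels = ["sea", "sea", "mountain", "mountain", "mountain"]
pred = knn_predict((5.0, 4.9), embs, labels, k=3, weighted=True)  # "mountain"
```

With real embeddings the train vectors come from a forward pass over the training set and the queries from the test set; only the distance computation changes (typically vectorized with numpy).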
This technique didn’t give a direct improvement in accuracy, but it contributed to the later ensembling of submissions.
Final Ensemble
My final submission was a simple ensemble of all submissions that scored above 0.95 in test accuracy, giving me a final score of 0.964840182648402 on the public leaderboard.
Data Preprocessing
Image Transformations
I used the basic transformations that come as defaults with FastAI’s get_transforms function. More on this here: https://docs.fast.ai/vision.transform.html
Normalization
Images were normalized during training with the built-in normalize function: https://docs.fast.ai/vision.data.html#Data-normalization
Train Validation Split
I used the default train-validation split in FastAI, which is 0.2: https://docs.fast.ai/vision.data.html#ImageDataBunch
Removing Confusing Images
One of the largest boosts in accuracy was achieved by removing images which might be mislabeled or confusing to the model. For this I loaded a previously trained resnet50 model and ran predictions on the entire training dataset. I chose to remove wrongly predicted images with confidence less than 0.55, as well as those with confidence greater than 0.999999.
The ones with lower confidence represent images containing two or more classes at the same time (like buildings and street, or mountains and glaciers); hence the confidence of the most probable class is relatively low.
Removing wrongly predicted images with confidence greater than 0.999999 handles blatant misclassification with near-certain confidence, which is likely due to mislabeled training samples.
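The filtering rule boils down to a small predicate. The records below are hypothetical (filename, true label, predicted label, confidence) tuples standing in for the real predictions:

```python
# Drop training images the previous model got WRONG with either low
# confidence (< 0.55, likely an ambiguous multi-class scene) or
# near-certain confidence (> 0.999999, likely a mislabeled sample).
def keep_image(true_label, pred_label, confidence,
               low_thresh=0.55, high_thresh=0.999999):
    if pred_label == true_label:
        return True                              # correct predictions always stay
    return low_thresh <= confidence <= high_thresh

records = [
    ("a.jpg", "street",  "street",   0.98),       # correct -> keep
    ("b.jpg", "glacier", "mountain", 0.40),       # wrong, ambiguous -> drop
    ("c.jpg", "forest",  "sea",      0.9999999),  # wrong, near-certain -> drop
    ("d.jpg", "sea",     "glacier",  0.70),       # wrong but plausible -> keep
]
cleaned = [r for r in records if keep_image(r[1], r[2], r[3])]  # a.jpg, d.jpg
```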
This yielded a test accuracy of 0.963013698630137 on the public leaderboard.
Test Time Augmentation
Using the TTA function in FastAI (https://docs.fast.ai/basic_train.html#Test-time-augmentation) not only improved the test accuracy of the above model to 0.963470319634703, but also contributed significantly to the ensemble of submissions. Hence it turned out useful to generate submissions both with and without TTA for the final ensemble.
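In fastai v1 this is a single call on the learner, sketched here as a helper (the learner argument is assumed to be a trained `Learner` with a test set attached):

```python
def tta_predictions(learn):
    from fastai.vision import DatasetType
    # TTA averages the model's predictions over several augmented
    # versions (flips/crops) of each test image
    preds, _ = learn.TTA(ds_type=DatasetType.Test)
    return preds
```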
Results
Public leaderboard: 0.9648401826
Private leaderboard: 0.9559773039
Code
https://github.com/afzalsayed96/intel_scene_classification
Key Takeaways
- A great library like FastAI, with sanely optimized defaults, helps.
- Try to find pretrained weights from a dataset similar to the problem at hand.
- Trust your solution and don’t try to overfit to climb the public leaderboard.
- Ensembling your best submissions helps, especially if you have variation such as mixup, kNN, TTA, etc.
- Follow basic hygiene and best practices, like setting seeds and journaling models, to keep your models reproducible.
Credits
Thanks to Soumendra P for providing his valuable mentorship and guidance during the contest.