1st Place Solution for Intel Scene Classification Challenge
source link: https://www.tuicool.com/articles/aEfiiin
Introduction
Problem
You are provided with a dataset of ~25k images from a wide range of natural scenes from all around the world.
Your task is to identify which kind of scene each image can be categorized into.
Data Classes
Approach
Building and training a Convolutional Neural Network that can correctly classify images into the above-mentioned categories.
Language And Frameworks
To be able to experiment quickly across various models, I chose Python as the language, with FastAI and PyTorch as the DL frameworks.
Transfer Learning with Progressive Image Resizing
Having taken the FastAI online course about two months ago, I had learnt a few important tips and tricks that help in training models on limited data and reaching high accuracy quickly.
Transfer Learning and Progressive Image Resizing were two of these very useful techniques.
A great example of both is this CIFAR-10 notebook by FastAI: https://github.com/fastai/fastai/blob/master/courses/dl1/cifar10.ipynb
I used the built-in resnet50 architecture from the FastAI library with pretrained ImageNet weights to train a model on progressively increasing image sizes of 32x32, 64x64, 128x128 and 224x224. The default image transformations, normalization and learning-rate optimizer were applied.
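The loop described above can be sketched as follows, assuming the fastai v1 API in use at the time (`cnn_learner`, `ImageDataBunch.from_folder`); the folder layout, size schedule and epoch counts are illustrative, not the exact values used in the contest:

```python
# Hedged sketch of transfer learning with progressive image resizing in
# fastai v1. Imports live inside the function so the definition itself
# does not require fastai to be installed.
def train_progressive(path, sizes=(32, 64, 128, 224), epochs_per_size=5):
    from fastai.vision import (ImageDataBunch, cnn_learner, models,
                               get_transforms, imagenet_stats, accuracy)
    learn = None
    for size in sizes:
        # rebuild the data at the next image size, with default transforms
        data = (ImageDataBunch.from_folder(path, valid_pct=0.2, size=size,
                                           ds_tfms=get_transforms())
                .normalize(imagenet_stats))
        if learn is None:
            # pretrained ImageNet weights come bundled with models.resnet50
            learn = cnn_learner(data, models.resnet50, metrics=accuracy)
        else:
            learn.data = data  # keep the learned weights, swap in larger images
        learn.fit_one_cycle(epochs_per_size)
    return learn
```

The key idea is that the weights carry over between sizes: early epochs on tiny 32x32 images are cheap and act as a regularizer, and later epochs at 224x224 refine the same network.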
Many intermediate submissions were generated while running multiple epochs; the best test accuracy I could get with this technique was 0.946575342465753.
First Ensemble
At this point I had some 20 submissions, which I ensembled using a simple voting mechanism, a commonly employed technique in data science competitions. My test accuracy increased to 0.958904109589041.
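A minimal sketch of the voting mechanism; each "submission" here is a hypothetical mapping from image id to predicted label (in practice these would be parsed from the submission CSVs):

```python
# Majority-vote ensembling over several submissions.
from collections import Counter

def vote_ensemble(submissions):
    """Combine predictions by simple majority vote; ties are broken by the
    label seen first (Counter.most_common preserves insertion order)."""
    ensembled = {}
    for image_id in submissions[0]:
        votes = Counter(sub[image_id] for sub in submissions)
        ensembled[image_id] = votes.most_common(1)[0][0]
    return ensembled

subs = [
    {"1.jpg": "forest", "2.jpg": "street"},
    {"1.jpg": "forest", "2.jpg": "buildings"},
    {"1.jpg": "glacier", "2.jpg": "street"},
]
final = vote_ensemble(subs)   # {"1.jpg": "forest", "2.jpg": "street"}
```

Voting works best when the submissions disagree in uncorrelated ways, which is why varied techniques (mixup, kNN, TTA) help the ensemble later on.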
Mixup Augmentation
FastAI has a built-in callback for a newer technique called Mixup Augmentation: https://docs.fast.ai/callbacks.mixup.html
This technique didn’t show any significant improvement by itself, but the submission generated with it was included in the ensembling of submissions.
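Attaching the callback is a one-liner in fastai v1, per the linked docs; the data bunch and architecture here are illustrative:

```python
def build_mixup_learner(data):
    # .mixup() wraps the learner with fastai's MixUpCallback, which trains
    # on convex combinations of pairs of images and their labels
    from fastai.vision import cnn_learner, models, accuracy
    return cnn_learner(data, models.resnet50, metrics=accuracy).mixup()
```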
LR Tuning
With the help of the lr_find function, FastAI provides a way to find an optimum learning rate for training a model. This helped in setting the learning rate for the next epochs before each progressive-resizing step: https://docs.fast.ai/basic_train.html#lr_find
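The workflow from the linked docs, sketched as a helper (the learner argument is assumed to be a fastai v1 `Learner`):

```python
def tune_learning_rate(learn):
    # run the LR range test: train briefly while sweeping the learning rate
    learn.lr_find()
    # inspect the loss-vs-LR curve; a common heuristic is to pick a value
    # roughly an order of magnitude before the loss minimum
    learn.recorder.plot()
```

The chosen value is then passed to the next `fit_one_cycle` call (e.g. `learn.fit_one_cycle(5, max_lr=1e-3)`).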
Places365 Dataset Weights
For my next experiment, instead of using ImageNet pretrained weights, I used the freely available Places365 weights for the resnet50 architecture for transfer learning: http://places2.csail.mit.edu/models_places365/resnet50_places365.pth.tar
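A sketch of loading that checkpoint into a torchvision resnet50. The Places365 checkpoints were saved from an `nn.DataParallel` model, so the state-dict keys carry a `module.` prefix that has to be stripped; this assumes PyTorch and torchvision are installed and the file has been downloaded locally:

```python
def load_places365_resnet50(checkpoint_path="resnet50_places365.pth.tar"):
    import torch
    import torchvision.models as tvm
    model = tvm.resnet50(num_classes=365)   # Places365 has 365 scene classes
    ckpt = torch.load(checkpoint_path, map_location="cpu")
    # strip the "module." prefix left over from DataParallel training
    state = {k.replace("module.", "", 1): v
             for k, v in ckpt["state_dict"].items()}
    model.load_state_dict(state)
    return model
```

Since Places365 is itself a scene-classification dataset, its features transfer to this problem far more directly than ImageNet's object-centric features.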
Applying the same techniques as before, i.e. progressive resizing and default transformations, I trained a few epochs on this model and tried submitting all intermediate models, with and without mixup augmentation.
Test accuracy: 0.953424657534247
Second Ensemble
Similar to before, ensembling all previous submissions gave a test accuracy of 0.958904109589041.
kNN With Embeddings
Searching for inspiration, I stumbled upon the write-up of a Google Landmark Recognition Challenge winner (4th place), which had an interesting idea: predict the class using kNN over embeddings, i.e. the feature-vector representation of each image output by the last layer of the model before softmax. They called it few-shot learning: https://www.kaggle.com/c/landmark-recognition-challenge/discussion/57896
I experimented with k=5, k=10, k=50, k=100 and k=500, and used voting as well as average weighted distance to identify which class an image belongs to.
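Both decision rules can be sketched in plain Python. The toy 2-D vectors below stand in for the real embeddings, which were the last-layer features of the trained resnet50:

```python
# kNN over embeddings with two decision rules: plain majority voting and
# inverse-distance (weighted) voting among the k nearest neighbours.
import math
from collections import Counter, defaultdict

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(query, train_embs, train_labels, k=5, weighted=False):
    nearest = sorted(
        (euclidean(query, emb), label)
        for emb, label in zip(train_embs, train_labels)
    )[:k]
    if not weighted:
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    scores = defaultdict(float)
    for dist, label in nearest:
        scores[label] += 1.0 / (dist + 1e-8)   # closer neighbours weigh more
    return max(scores, key=scores.get)

embs = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (4.9, 5.1)]
labels = ["sea", "sea", "mountain", "mountain", "mountain"]
pred = knn_predict((5.0, 4.9), embs, labels, k=3, weighted=True)  # "mountain"
```

With real embeddings the train vectors come from a forward pass over the training set and the queries from the test set; only the distance computation changes (typically vectorized with numpy).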
This technique didn’t give a direct improvement in accuracy, but it contributed to the later ensembling of submissions.
Final Ensemble
My final submission was a simple ensemble of all submissions that scored above 0.95 in test accuracy, giving me a final score of 0.964840182648402 on the public leaderboard.
Data Preprocessing
Image Transformations
I used the basic transformations that come as defaults with FastAI’s get_transforms function. More on this here: https://docs.fast.ai/vision.transform.html
Normalization
Images were normalized during training with the built-in normalize function: https://docs.fast.ai/vision.data.html#Data-normalization
Train Validation Split
I used the default train-validation split in FastAI, which is 0.2: https://docs.fast.ai/vision.data.html#ImageDataBunch
Removing Confusing Images
One of the largest boosts in accuracy was achieved by removing images which might be mislabeled or confusing to the model. For this I loaded a previously trained resnet50 model and ran predictions on the entire training dataset. I chose to remove wrongly predicted images with confidence less than 0.55, as well as those with confidence greater than 0.999999.
The ones with lower confidence represent images containing two or more classes at the same time (like buildings and street, or mountains and glaciers); hence the confidence of the most probable class is relatively low.
Removing wrongly predicted images with confidence greater than 0.999999 handles blatant misclassification with near-certain confidence, which is likely due to mislabeled training samples.
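The filtering rule boils down to a small predicate. The records below are hypothetical (filename, true label, predicted label, confidence) tuples standing in for the real predictions:

```python
# Drop training images the previous model got WRONG with either low
# confidence (< 0.55, likely an ambiguous multi-class scene) or
# near-certain confidence (> 0.999999, likely a mislabeled sample).
def keep_image(true_label, pred_label, confidence,
               low_thresh=0.55, high_thresh=0.999999):
    if pred_label == true_label:
        return True                              # correct predictions always stay
    return low_thresh <= confidence <= high_thresh

records = [
    ("a.jpg", "street",  "street",   0.98),       # correct -> keep
    ("b.jpg", "glacier", "mountain", 0.40),       # wrong, ambiguous -> drop
    ("c.jpg", "forest",  "sea",      0.9999999),  # wrong, near-certain -> drop
    ("d.jpg", "sea",     "glacier",  0.70),       # wrong but plausible -> keep
]
cleaned = [r for r in records if keep_image(r[1], r[2], r[3])]  # a.jpg, d.jpg
```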
This yielded a test accuracy of 0.963013698630137 on the public leaderboard.
Test Time Augmentation
Using the TTA function in FastAI (https://docs.fast.ai/basic_train.html#Test-time-augmentation) not only improved the test accuracy of the above model to 0.963470319634703, but also contributed significantly to the ensemble of submissions. Hence it turned out useful to generate submissions both with and without TTA for the final ensemble.
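In fastai v1 this is a single call on the learner, sketched here as a helper (the learner argument is assumed to be a trained `Learner` with a test set attached):

```python
def tta_predictions(learn):
    from fastai.vision import DatasetType
    # TTA averages the model's predictions over several augmented
    # versions (flips/crops) of each test image
    preds, _ = learn.TTA(ds_type=DatasetType.Test)
    return preds
```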
Results
Public leaderboard: 0.9648401826
Private leaderboard: 0.9559773039
Code
https://github.com/afzalsayed96/intel_scene_classification
Key Takeaways
- A great library like FastAI, with sanely optimized defaults, helps.
- Try to find pretrained weights from a dataset similar to the problem at hand.
- Trust your solution and don’t try to overfit to climb the public leaderboard.
- Ensembling your best submissions helps, especially if you have variation such as mixup, kNN, TTA, etc.
- Follow basic hygiene and best practices, like setting seeds and journaling models, to keep your models reproducible.
Credits
Thanks to Soumendra P for providing his valuable mentorship and guidance during the contest.