
What is R-CNN? - Summarizing Regions with CNN Features

 2 years ago
source link: https://hackernoon.com/what-is-r-cnn-summarizing-regions-with-cnn-features


R-CNN was first proposed by Girshick et al. in 2013 and quickly became influential. The system consists of several major modules, and their combination outperformed the state of the art (SOTA) at the time. On the PASCAL VOC 2010–12 datasets, R-CNN achieved an mAP of about 53%. On the 200-class ILSVRC2013 detection dataset, it achieved an mAP of 31.4%, compared with 24.3% for the second-best method.

Dipankar Das

Aspiring Applied Scientist

Original Paper — Rich feature hierarchies for accurate object detection and semantic segmentation

Object detection and image segmentation are two of the most researched topics in computer vision. This article looks at one of the famous yet ‘old’ algorithms in object detection: R-CNN. It was first proposed by Girshick et al. in 2013 and quickly became influential.

Overview

The system consists of several major modules, and their combination produced results better than the state of the art (SOTA) at the time.

Major Modules

The first module, the region proposal module, generates category-independent region proposals. Several region proposal methods are available; the authors chose Selective Search (fast mode) so that results could be compared with prior detection work. R-CNN itself is agnostic to the choice of method.

The second module, the feature extraction module, uses a CNN to extract features from each proposed region of interest. The image data in a proposed region must first be converted into a form compatible with the CNN's input: regardless of the region's size or aspect ratio, all pixels in a tight bounding box around it are warped to the required size.

Features are extracted by forward propagating a mean-subtracted 227×227 RGB image through five convolutional layers and two fully connected layers.
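The warping and mean subtraction described above can be sketched with NumPy. This is only an illustration, not the authors' code: the function name `warp_region`, the exclusive `(x1, y1, x2, y2)` box convention, the nearest-neighbour sampling, and the mean pixel values are all assumptions made for the example.

```python
import numpy as np

INPUT_SIZE = 227  # CNN input side length stated in the text


def warp_region(image, box, mean_pixel, size=INPUT_SIZE):
    """Crop `box` from `image` and warp it to size x size, ignoring aspect ratio.

    image: H x W x 3 uint8 array.
    box: (x1, y1, x2, y2) in pixels, with x2 and y2 exclusive (assumed convention).
    mean_pixel: per-channel mean to subtract (hypothetical values in the demo).
    """
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2].astype(np.float32)
    h, w = crop.shape[:2]
    # Nearest-neighbour sampling: map every output row/column to a source index.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    warped = crop[rows[:, None], cols[None, :]]
    return warped - np.asarray(mean_pixel, dtype=np.float32)


# Demo on a random image; the 300 x 100 box is stretched to a square.
image = np.random.default_rng(0).integers(0, 256, size=(300, 500, 3), dtype=np.uint8)
patch = warp_region(image, box=(40, 60, 340, 160), mean_pixel=(104.0, 117.0, 123.0))
print(patch.shape)  # (227, 227, 3)
```

A real pipeline would likely use a smoother interpolation from an image library; nearest-neighbour sampling is used here only to keep the sketch dependency-free.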


Prediction

At test time, selective search is run on each image to extract around 2000 region proposals. Each proposal is warped and forward propagated through the CNN to obtain a feature vector. Then, for each candidate object class, the SVM trained for that class scores the feature vector to decide whether it belongs to that class. What happens when scored regions in an image overlap? In such cases, greedy non-maximum suppression is applied (for each class independently): a region is rejected if its intersection-over-union (IoU) overlap with a higher-scoring selected region is larger than a learned threshold.
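The suppression step can be sketched in plain Python. This is a minimal illustration rather than the authors' implementation: boxes are assumed to be `(x1, y1, x2, y2)` tuples, the names `iou` and `greedy_nms` are hypothetical, and the threshold of 0.3 is a made-up value standing in for the learned one.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, threshold):
    """Return indices of the kept boxes for one class, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Reject box i if it overlaps any already selected box by more
        # than the threshold; otherwise select it.
        if all(iou(boxes[i], boxes[k]) <= threshold for k in kept):
            kept.append(i)
    return kept


# Two heavily overlapping boxes and one separate box, all for the same class.
boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, threshold=0.3))  # [0, 2]
```

The second box has an IoU of about 0.68 with the first, so it is suppressed, while the third box does not overlap anything and is kept.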

Training

  1. Pretraining: The authors discriminatively pre-trained the CNN on a large auxiliary dataset (ILSVRC2012 classification) using image-level annotations only.

  2. Domain-specific fine-tuning: Unlike the pretraining step, the target task here is object detection and the inputs are warped region proposal windows. The authors continued stochastic gradient descent (SGD) training of the CNN parameters using only warped region proposals. Apart from replacing the classification layer, the CNN architecture was unchanged.

Results

  1. On the PASCAL VOC 2010–12 datasets, R-CNN achieved an mAP of about 53%.

  2. On the 200-class ILSVRC2013 detection dataset, R-CNN achieved an mAP of 31.4%, about 7 percentage points above the second-best result of 24.3%.
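The ILSVRC2013 gap can be read in two ways, as an absolute difference in percentage points or as a relative improvement. A quick check using only the two mAP figures quoted above (the variable names are just for illustration):

```python
rcnn_map = 31.4         # R-CNN mAP on ILSVRC2013, in percent
second_best_map = 24.3  # second-best mAP, in percent

absolute_gain = rcnn_map - second_best_map               # percentage points
relative_gain = absolute_gain / second_best_map * 100.0  # percent of the baseline

print(f"absolute gain: {absolute_gain:.1f} percentage points")  # 7.1
print(f"relative gain: {relative_gain:.1f}%")                   # 29.2%
```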


