
Convolution Vs Correlation

source link: https://towardsdatascience.com/convolution-vs-correlation-af868b6b4fb5?gi=c51cb9d0a742


Convolutional Neural Networks, which are the backbone of most Computer Vision applications such as self-driving cars and facial-recognition systems, are a special kind of neural-network architecture in which the basic matrix-multiplication operation is replaced by a convolution operation. They specialize in processing data that has a grid-like topology. Examples include time-series data (a 1-D grid of samples) and image data, which can be thought of as a 2-D grid of pixels.

HISTORY

Convolutional Neural Networks were first introduced by Fukushima under the name Neocognitron in 1980. The model was inspired by the hierarchical model of the nervous system proposed by Hubel and Wiesel, but it did not become popular because of its complex unsupervised learning algorithm, referred to as learning without a teacher.

In 1989, Yann LeCun used backpropagation together with the concepts of the Neocognitron to propose an architecture named LeNet, which was used for handwritten zip-code recognition by the U.S. Postal Service. LeCun continued this work and in 1998 released LeNet-5, the first modern convnet, which introduced some of the essential concepts we still use in CNNs today. He also released the MNIST dataset of handwritten digits, perhaps the most famous benchmark dataset in machine learning.

In the 1990s the field of Computer Vision shifted its focus, and many researchers stopped working on CNN architectures. Neural-network research went through a cold winter until 2012, when a group of researchers from the University of Toronto entered a CNN-based model (AlexNet) in the famous ImageNet challenge and won it with an error rate of 16.4%. Since then Convolutional Neural Networks have kept progressing, CNN-based architectures have kept winning ImageNet, and in 2015 the CNN-based architecture ResNet surpassed the human-level error rate of 5.1% with an error rate of 3.57%.

THE MISNOMER:

The term "convolution" as widely used in CNNs is a misnomer: strictly speaking, the operation actually performed is cross-correlation. The two operators differ slightly, and we will go through each of them separately to understand the difference.

Cross-Correlation:

Correlation is the process of moving a filter mask, often referred to as a kernel, over the image and computing the sum of products at each location. Correlation is a function of the displacement of the filter: the first value of the correlation corresponds to zero displacement of the filter, the second value corresponds to one unit of displacement, and so on.


Figure 1. Cross-Correlation in 1-D


Figure 2. Cross-Correlation in 1-D

Mathematical Formula :

The mathematical formula for the cross-correlation operation in 1-D on an image I using a filter F is given in Figure 3. It is convenient to assume that F has an odd number of elements, so that as it shifts, its centre sits right on top of an element of the image I. We therefore say that F has 2N+1 elements, indexed from -N to N, so that the centre element of F is F(0).

(F ⋆ I)(x) = ∑_{i=-N}^{N} F(i) · I(x + i)

Figure 3. The formula of Cross-Correlation in 1-D
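To make the formula concrete, here is a minimal NumPy sketch of 1-D cross-correlation (the function name, the zero padding at the borders, and the example values are illustrative choices, not from the original figures):

```python
import numpy as np

def cross_correlate_1d(image, kernel):
    """Slide the kernel over the signal and sum the products at each shift.

    Assumes the kernel has odd length 2N+1 (indexed conceptually from -N
    to N) and zero-pads the signal so the output has the input's length.
    """
    n = len(kernel) // 2
    padded = np.pad(image, n)                      # zero-pad both borders
    return np.array([np.sum(kernel * padded[x:x + 2 * n + 1])
                     for x in range(len(image))])

signal = np.array([1., 2., 3., 4., 5.])
kernel = np.array([1., 2., 3.])
result = cross_correlate_1d(signal, kernel)        # sums of products: 8, 14, 20, 26, 14
print(result)
```

With zero padding and an odd-length kernel, this matches NumPy's built-in `np.correlate(signal, kernel, mode='same')`.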

Similarly, we can extend the notion to 2-D which is represented in Figure 4. The basic idea is the same, except the image and the filter are now 2D. We can suppose that our filter has an odd number of elements, so it is represented by a (2N+1)x(2N+1) matrix.

(F ⋆ I)(x, y) = ∑_{i=-N}^{N} ∑_{j=-N}^{N} F(i, j) · I(x + i, y + j)

Figure 4. The Formula of Cross-Correlation in 2-D.
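The 2-D case is the same sliding sum of products, now over a (2N+1)x(2N+1) window. A sketch under the same zero-padding assumption (the names and the example image are mine):

```python
import numpy as np

def cross_correlate_2d(image, kernel):
    """2-D cross-correlation with a square (2N+1)x(2N+1) kernel, zero-padded."""
    n = kernel.shape[0] // 2
    padded = np.pad(image, n)
    h, w = image.shape
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            window = padded[y:y + 2 * n + 1, x:x + 2 * n + 1]
            out[y, x] = np.sum(window * kernel)    # sum of products at (y, x)
    return out

image = np.arange(16.0).reshape(4, 4)
box = np.ones((3, 3))                              # summing (box) filter
print(cross_correlate_2d(image, box)[1, 1])        # 45.0: sum of the top-left 3x3
```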

The correlation operation in 2-D is very straightforward: we take a filter of a given size, place it over a local region of the image having the same size as the filter, compute the sum of products, and shift the same filter through the entire image. This also gives us two very useful properties:

  1. Translational Invariance: our vision system should be able to sense, respond to, or detect the same object regardless of where it appears in the image.
  2. Locality: our vision system focuses on local regions, without regard to what else is happening in other parts of the image.

The cross-correlation function has a characteristic property: when it is applied to a discrete unit impulse (a 2-D matrix of all zeros with just a single 1), it yields a result that is a copy of the filter rotated by 180 degrees.


Figure 5. The complete correlation operation
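We can verify this 180-degree-rotation property directly with a small NumPy experiment (the 5x5 impulse and the 3x3 kernel of values 1..9 are arbitrary illustrative choices):

```python
import numpy as np

impulse = np.zeros((5, 5))
impulse[2, 2] = 1.0                                # discrete unit impulse
kernel = np.arange(1.0, 10.0).reshape(3, 3)        # [[1,2,3],[4,5,6],[7,8,9]]

# Cross-correlate the impulse with the kernel (zero-padded borders).
padded = np.pad(impulse, 1)
out = np.zeros((5, 5))
for y in range(5):
    for x in range(5):
        out[y, x] = np.sum(kernel * padded[y:y + 3, x:x + 3])

# The non-zero 3x3 centre of the result is the kernel rotated by 180 degrees.
print(out[1:4, 1:4])
# [[9. 8. 7.]
#  [6. 5. 4.]
#  [3. 2. 1.]]
```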

Convolution:

The convolution operation is very similar to the cross-correlation operation, with one slight difference: in convolution, the kernel is first rotated by 180 degrees and then applied to the image. The fundamental property of convolution is that convolving a kernel with a discrete unit impulse yields a copy of the kernel at the location of the impulse.

We saw in the cross-correlation section that correlating a filter with a unit impulse yields a copy of the filter rotated by 180 degrees. Therefore, if we pre-rotate the filter and perform the same sliding sum-of-products operation, we obtain the desired copy of the kernel.


Figure 6. Applying the convolution operation on Image b in Figure 5.

Mathematical Formula:

The convolution operation in 1-D applied on an image I using a kernel F is given by the formula in Figure 7. Convolution is just like correlation, except that we flip the filter before correlating.

(F ∗ I)(x) = ∑_{i=-N}^{N} F(i) · I(x − i)

Figure 7. Convolution Operation in 1-D.
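In code, convolution is exactly the correlation loop applied to a pre-flipped filter. A sketch (the function name, zero padding, and example values are mine, not from the original figures):

```python
import numpy as np

def convolve_1d(image, kernel):
    """1-D convolution: flip the kernel by 180 degrees, then cross-correlate."""
    flipped = kernel[::-1]                          # rotate the 1-D kernel
    n = len(kernel) // 2
    padded = np.pad(image, n)                       # zero-pad both borders
    return np.array([np.sum(flipped * padded[x:x + 2 * n + 1])
                     for x in range(len(image))])

impulse = np.array([0., 0., 1., 0., 0.])            # unit impulse
kernel = np.array([1., 2., 3.])
print(convolve_1d(impulse, kernel))                 # copy of the kernel: 0, 1, 2, 3, 0
```

As the fundamental property predicts, convolving with a unit impulse reproduces the kernel at the impulse location, with no rotation.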

In the case of 2D convolution, we flip the filter both horizontally and vertically. This can be written as:

(F ∗ I)(x, y) = ∑_{i=-N}^{N} ∑_{j=-N}^{N} F(i, j) · I(x − i, y − j)

Figure 8. Convolution Operation in 2-D.
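A 2-D sketch of the same idea, flipping the kernel along both axes before the sliding sum of products (again with illustrative names and zero padding):

```python
import numpy as np

def convolve_2d(image, kernel):
    """2-D convolution: rotate the kernel by 180 degrees, then cross-correlate."""
    flipped = kernel[::-1, ::-1]                    # flip vertically and horizontally
    n = kernel.shape[0] // 2
    padded = np.pad(image, n)
    h, w = image.shape
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            out[y, x] = np.sum(flipped * padded[y:y + 2 * n + 1, x:x + 2 * n + 1])
    return out

# Convolving a unit impulse reproduces the kernel at the impulse location:
impulse = np.zeros((5, 5))
impulse[2, 2] = 1.0
kernel = np.arange(1.0, 10.0).reshape(3, 3)
print(convolve_2d(impulse, kernel)[1:4, 1:4])       # identical to `kernel`
```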

The convolution operation satisfies the same properties of translational invariance and locality as well.


Figure 9. The correlation operation demonstrated, which many would refer to as convolution.

NOTE:

Though the two operations differ slightly, the difference does not matter when the kernel is symmetric under a 180-degree rotation: flipping such a kernel leaves it unchanged, so correlation and convolution produce identical results.
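A quick check of that claim with a kernel that equals its own 180-degree rotation (the 3x3 blur kernel here is just an example):

```python
import numpy as np

# A kernel that is symmetric under a 180-degree rotation:
blur = np.array([[1., 2., 1.],
                 [2., 4., 2.],
                 [1., 2., 1.]]) / 16.0
print(np.array_equal(blur, blur[::-1, ::-1]))       # True: flipping changes nothing

# Hence correlation and convolution agree, e.g. in 1-D with NumPy's built-ins:
signal = np.array([1., 2., 3., 4.])
sym = np.array([1., 2., 1.])
print(np.allclose(np.convolve(signal, sym, mode='same'),
                  np.correlate(signal, sym, mode='same')))   # True
```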

Conclusion:

In this post, we briefly discussed the history and some properties of Convolutional Neural Networks. We discussed the misnomer that the convolution operation mentioned in various texts is actually a cross-correlation operation. The difference is slight yet useful, and should be known by everyone entering, practising, or experienced in the wide field of Computer Vision. I hope you liked the post; for any questions, queries, or discussion, DM me on Twitter or LinkedIn.

References:

  1. Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
  2. Digital Image Processing by Rafael C. Gonzalez and Richard E. Woods.
  3. Dive into Deep Learning by Aston Zhang, Zachary C. Lipton, Mu Li, and Alex J. Smola.
  4. Correlation and Convolution by David Jacobs.
  5. Figure 9 taken from https://towardsdatascience.com/applied-deep-learning-part-4-convolutional-neural-networks-584bc134c1e2 .
  6. https://spatial-lang.org/conv
  7. The meme is taken from https://www.mihaileric.com/posts/convolutional-neural-networks/ .
