Computer Vision 101: Working with Color Images in Python

Learn the basics of working with RGB and Lab images to boost your computer vision projects!

Mar 25 · 8 min read

Every computer vision project, be it a cat/dog classifier or colorizing old images and movies, involves working with images. In the end, the model can only be as good as the underlying data: garbage in, garbage out. That is why this post focuses on the basics of working with color images in Python: how they are represented and how to convert them from one color representation to another.

Setup

In this section, we set up the Python environment. First, we import all the required libraries:

import numpy as np
from skimage.color import rgb2lab, rgb2gray, lab2rgb
from skimage.io import imread, imshow
import matplotlib.pyplot as plt

We use scikit-image, a library from the same scikit family as scikit-learn that focuses on working with images. There are many alternatives; other libraries that can load or manipulate images include matplotlib, numpy, OpenCV, and Pillow.

In the second step, we define a helper function for printing out a summary of information about the image — its shape and the range of values in each of the layers.

The logic of the function is pretty straightforward, and the slicing of dimensions will make sense as soon as we describe how the images are stored.
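A minimal sketch of such a helper, reconstructed from the output format shown later in the post, might look like the following. The function body is an assumption and not necessarily the original code; only the name print_image_summary, its two arguments, and the printed layout come from the post.

```python
import numpy as np


def print_image_summary(image, labels):
    """Print the shape of an image and the value range of each channel."""
    print('--------------')
    print('Image Details:')
    print('--------------')
    print(f'Image dimensions: {image.shape}')
    print('Channels:')

    if len(labels) == 1:
        # Assumption: a single label means a 2D (grayscale) array,
        # so add a channel axis to reuse the same slicing below.
        image = image[..., np.newaxis]

    for i, lab in enumerate(labels):
        # Slice out one channel and report its smallest and largest value.
        min_val = float(np.min(image[:, :, i]))
        max_val = float(np.max(image[:, :, i]))
        print(f'{lab} : min={min_val:.4f}, max={max_val:.4f}')
```

The shape is printed before the extra axis is added, so a grayscale image still reports two dimensions.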

Grayscale

We start with the most basic case possible, a grayscale image. Such images are made exclusively of shades of gray. The extremes are black (lowest intensity) and white (highest intensity).

Under the hood, the images are stored as a matrix of integers, in which a pixel’s value corresponds to the given shade of gray. The scale of values for grayscale images ranges from 0 (black) to 255 (white). The illustration below provides an intuitive overview of the concept.

[Figure: illustration of how a grayscale image maps to pixel values]
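To make the matrix representation concrete, here is a small sketch with a hypothetical 3x3 grayscale image built by hand in numpy. The array name tiny_gs and its values are made up for illustration.

```python
import numpy as np

# A hypothetical 3x3 grayscale image: one 8-bit integer per pixel,
# where 0 is black and 255 is white.
tiny_gs = np.array(
    [[0, 128, 255],
     [64, 128, 192],
     [255, 128, 0]],
    dtype=np.uint8,
)

print(tiny_gs.shape)                 # (rows, columns)
print(tiny_gs.dtype)                 # integer type used for storage
print(tiny_gs.min(), tiny_gs.max())  # darkest and brightest pixel
```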

In this article, we will be working with the image you already saw as the thumbnail, the circle of colorful crayons. It was not accidental that such a colorful picture was selected :)

We start by loading the grayscale image into Python and displaying it.

image_gs = imread('crayons.jpg', as_gray=True)

fig, ax = plt.subplots(figsize=(9, 16))
imshow(image_gs, ax=ax)
ax.set_title('Grayscale image')
ax.axis('off');

[Figure: the crayons image rendered in grayscale]

As the original image is in color, we used as_gray=True to load it as a grayscale image. Alternatively, we could have loaded the image using the default settings of imread (which loads an RGB image — covered in the next section) and converted it to grayscale using the rgb2gray function.
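What such a conversion does can be sketched by hand as a weighted sum of the three color channels. The weights below are the ones documented for scikit-image's rgb2gray; the helper name to_gray and the tiny input array are hypothetical, and the sketch illustrates the idea without claiming to replicate the library's implementation.

```python
import numpy as np

# Luminance weights for R, G, B as documented for skimage.color.rgb2gray.
WEIGHTS = np.array([0.2125, 0.7154, 0.0721])


def to_gray(rgb_uint8):
    """Convert a (rows, cols, 3) uint8 RGB array to grayscale floats in [0, 1]."""
    rgb = rgb_uint8.astype(np.float64) / 255.0  # rescale 0-255 to [0, 1]
    return rgb @ WEIGHTS                        # weighted sum over the channel axis


# One row of three pixels: white, black, pure red.
tiny_rgb = np.array([[[255, 255, 255], [0, 0, 0], [255, 0, 0]]], dtype=np.uint8)
gray = to_gray(tiny_rgb)
print(gray.shape)  # the channel axis is gone, leaving a 2D array
print(gray)
```

White maps to a value of about 1, black to 0, and pure red to the red weight alone.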

Next, we run the helper function to print the summary of the image.

print_image_summary(image_gs, ['G'])

Running the code produces the following output:

--------------
Image Details:
--------------
Image dimensions: (1280, 1920)
Channels:
G : min=0.0123, max=1.0000

The image is stored as a 2D matrix with 1280 rows and 1920 columns (high resolution). The min and max values show that the pixels lie in the [0, 1] range. That is because loading with as_gray=True converts the image to floating-point values, which has the same effect as dividing the original 0 to 255 intensities by 255, a common preprocessing step for working with images.

RGB

Now it is time to work with colors. We start with the RGB model. In short, it is an additive model, in which shades of red, green and blue (hence the name) are added together in various proportions to reproduce a broad spectrum of colors.
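As a rough illustration of additive mixing, the sketch below combines hypothetical single-pixel colors stored as (R, G, B) triples. The variable names are made up for this example.

```python
import numpy as np

# Hypothetical single-pixel colors as (R, G, B) triples on the 0-255 scale.
red = np.array([255, 0, 0], dtype=np.uint8)
green = np.array([0, 255, 0], dtype=np.uint8)
blue = np.array([0, 0, 255], dtype=np.uint8)

# Each primary occupies its own channel, so these sums stay within 0-255.
yellow = red + green        # full red plus full green
white = red + green + blue  # all three channels at full intensity

print(yellow)
print(white)
```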

In scikit-image, this is the default model for loading the images using imread:

image_rgb = imread('crayons.jpg')

Before displaying the image, let’s inspect the summary to understand the way the image is stored in Python.

print_image_summary(image_rgb, ['R', 'G', 'B'])

Running the code generates the following summary:

--------------
Image Details:
--------------
Image dimensions: (1280, 1920, 3)
Channels:
R : min=0.0000, max=255.0000
G : min=0.0000, max=255.0000
B : min=0.0000, max=255.0000

In comparison to the grayscale image, this time the image is stored as a 3D np.ndarray. The additional dimension holds the 3 color channels. The intensity of each color is expressed on a 0 to 255 scale, which is frequently rescaled to the [0, 1] range. On either scale, a pixel value of 0 in a channel indicates that the pixel has no color in that particular channel.
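The layout can be sketched with a hypothetical 2x2 RGB image built by hand. The names tiny_rgb, red_channel and scaled are illustrative only.

```python
import numpy as np

# Hypothetical 2x2 RGB image: axis 0 = rows, axis 1 = columns, axis 2 = channels.
tiny_rgb = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [255, 255, 0]]],
    dtype=np.uint8,
)
print(tiny_rgb.shape)

# Slicing the last axis gives a 2D array with the intensities of one channel.
red_channel = tiny_rgb[:, :, 0]
print(red_channel)

# Dividing by 255 rescales the integer intensities to floats in [0, 1].
scaled = tiny_rgb / 255.0
print(scaled.min(), scaled.max())
```

This is the same slicing pattern that the summary helper relies on when it reports the range of each channel.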

A helpful note: when using OpenCV’s imread function, the image is loaded as BGR instead of RGB. To make it compatible with other libraries, we need to change the order of the channels.
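One way to reorder the channels is to reverse the last axis of the array. The sketch below uses a hand-built array in place of an image actually loaded with OpenCV, so the input values are an assumption.

```python
import numpy as np

# Hypothetical 1x2 image in BGR order, as OpenCV would return it:
# the first pixel is pure blue, the second pixel is pure red.
image_bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the channel order from BGR to RGB.
image_rgb_from_bgr = image_bgr[:, :, ::-1]
print(image_rgb_from_bgr)
```

With OpenCV installed, cv2.cvtColor(image, cv2.COLOR_BGR2RGB) performs the same conversion.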

It is time to display the image and the different color channels:

fig, ax = plt.subplots(1, 4, figsize=(18, 30))

ax[0].imshow(image_rgb/255.0)
ax[0].axis('off')
ax[0].set_title('original RGB')

for i, lab in enumerate(['R','G','B'], 1):
    temp = np.zeros(image_rgb.shape)
    temp[:,:,i - 1] = image_rgb[:,:,i - 1]
    ax[i].imshow(temp/255.0)
    ax[i].axis("off")
    ax[i].set_title(lab)

plt.show()

In the resulting figure, we can see the original image and the 3 color channels separately. What I like about this image is that by focusing on individual crayons, we can see which of the RGB channels, and in which proportions, make up the final color in the original image.
