Object (Drones) Detection: Step-by-Step Guide on Mask R-CNN

For this guide, I chose to use a drones dataset, which you can download here.

First up — Libraries & Packages

The main package for the algorithm is mrcnn. Start by installing the library and importing it into your environment.

!pip install mrcnn

from mrcnn.config import Config
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize
from mrcnn.model import log

I will explain each imported class as we get there. For now, just know that these are the import statements we need.

As for TensorFlow, mrcnn is not yet compatible with TensorFlow 2.0 onward, so make sure you revert to TensorFlow 1.x. Since I’m developing on Colab, I will use the %tensorflow_version magic function to revert to TensorFlow 1.x.

%tensorflow_version 1.x
import tensorflow as tf

If I’m not wrong, tf.random_shuffle was renamed to tf.random.shuffle in TensorFlow 2.0, causing the incompatibility issue. You might be able to work with TensorFlow 2.0 by changing your shuffle function in the mrcnn code.
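A small guard can make the version requirement explicit instead of letting the failure surface deep inside mrcnn. The sketch below is my own helper (require_tf1 and tf_major_version are hypothetical names, not part of mrcnn or TensorFlow); it only parses a version string such as tf.__version__.

```python
def tf_major_version(version):
    """Return the major version number from a string such as '1.15.2'."""
    return int(version.split(".")[0])


def require_tf1(version):
    """Raise early if the given TensorFlow version is 2.x or newer.

    Hypothetical helper for this guide; pass in tf.__version__.
    """
    major = tf_major_version(version)
    if major >= 2:
        raise RuntimeError("mrcnn expects TensorFlow 1.x, found " + version)
    return major


# Usage, after `import tensorflow as tf`:
#   require_tf1(tf.__version__)
```

Calling it right after the import means a wrong runtime is reported before any model code runs.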

I also had to revert Keras to an earlier version, but I can’t remember the reason. I am just putting it out there in case you encounter errors due to Keras.

!pip install keras==2.2.5

Preprocessing

The mrcnn package is rather flexible in terms of the data format it accepts. As such, I will process the data into NumPy arrays for simplicity.

Before that, I realised that video17_295 and video19_1900 can’t be read properly by cv2. Hence, I filtered out these images and created a list of file names.

import os
import re

dir = "Database1/"

# filter out images that can't be read
prob_list = ['video17_295', 'video19_1900']  # can't read format
txt_list = [f for f in os.listdir(dir) if f.endswith(".txt") and f[:-4] not in prob_list]
file_list = set([re.match(r"\w+(?=\.)", f)[0] for f in txt_list])

# create data list as tuples of (jpeg, txt)
data_list = []
for f in file_list:
    data_list.append((f + ".JPEG", f + ".txt"))
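The same pairing can be done without a regular expression. This is a minimal sketch using os.path.splitext on a plain list of names, so it runs without the dataset; pair_files and the sample file name video1_10 are made up for illustration.

```python
import os


def pair_files(names, skip=()):
    """Build sorted (image, label) name pairs from a list of file names.

    Only .txt label files are considered; stems listed in `skip` are dropped.
    """
    stems = set()
    for name in names:
        stem, ext = os.path.splitext(name)
        if ext == ".txt" and stem not in skip:
            stems.add(stem)
    return [(stem + ".JPEG", stem + ".txt") for stem in sorted(stems)]


# Illustrative names only; in practice pass os.listdir("Database1/")
names = ["video1_10.txt", "video1_10.JPEG", "video17_295.txt", "video17_295.JPEG"]
pairs = pair_files(names, skip={"video17_295"})
print(pairs)
```

Sorting the stems keeps the order of the pairs stable between runs, which a set alone does not guarantee.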

A few things to do next:

Check if the label exists (some images don’t contain drones)
Read and process the image
Read and process the coordinates of the bounding box
Draw the bounding box for visualization purposes

import cv2
import numpy as np

X, y = [], []
img_box = []
DIMENSION = 128  # set low resolution to decrease training time

for i in range(len(data_list)):
    # get bounding box and check if label exists
    with open(dir + data_list[i][1], "rb") as f:
        box = f.read().split()
    if len(box) != 5:
        continue  # skip data if it does not contain a label
    box = [float(s) for s in box[1:]]

    # read image
    img = cv2.imread(dir + data_list[i][0])
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # resize img to 128 x 128
    img = cv2.resize(img, (DIMENSION, DIMENSION), interpolation=cv2.INTER_LINEAR)

    # draw bounding box (for visualization purposes)
    # both factors are 1.0 here because img has already been resized
    resize1, resize2 = img.shape[0] / DIMENSION, img.shape[1] / DIMENSION
    p1, p2 = int(box[0] * img.shape[1] * resize2), int(box[1] * img.shape[0] * resize1)
    p3, p4 = int(box[2] * img.shape[1] * resize2), int(box[3] * img.shape[0] * resize1)
    ymin, ymax, xmin, xmax = p2 - p4 // 2, p2 + p4 // 2, p1 - p3 // 2, p1 + p3 // 2
    draw = cv2.rectangle(img.copy(), (xmax, ymax), (xmin, ymin), color=(255, 255, 0), thickness=1)

    # store data if range of y is at least 20 pixels (remove data with small drones)
    if ymax - ymin >= 20:
        X.append(img)
        y.append([ymin, ymax, xmin, xmax])
        img_box.append(draw)

# convert to numpy arrays
X = np.array(X).astype(np.uint8)
y = np.array(y)
img_box = np.array(img_box)
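The coordinate arithmetic is easier to check in isolation. The sketch below assumes, as the code above implies, that each label stores a normalized (x_center, y_center, width, height) box after the class index; box_to_corners is a hypothetical helper name.

```python
def box_to_corners(box, dimension):
    """Convert a normalized (x_center, y_center, width, height) box
    to integer pixel corners (ymin, ymax, xmin, xmax) on a square image."""
    x_c, y_c, w, h = (int(v * dimension) for v in box)
    ymin, ymax = y_c - h // 2, y_c + h // 2
    xmin, xmax = x_c - w // 2, x_c + w // 2
    return ymin, ymax, xmin, xmax


# A box centred in a 128 x 128 image, a quarter as wide and half as tall
corners = box_to_corners((0.5, 0.5, 0.25, 0.5), 128)
print(corners)  # (32, 96, 48, 80)
```

Here ymax - ymin is 64 pixels, so this box would pass the 20-pixel filter.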

Before converting to NumPy arrays, I grabbed a subset of the dataset to keep the training time down. If you have the computing power, feel free to skip that step.
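The subsampling step is not shown in the code above. One way to do it, as a sketch with a hypothetical subsample helper applied to the X and y lists before they become arrays:

```python
import random


def subsample(X, y, n, seed=42):
    """Return n randomly chosen (image, box) pairs, keeping X and y aligned."""
    if n >= len(X):
        return list(X), list(y)
    rng = random.Random(seed)  # local generator, so the global seed is untouched
    idx = sorted(rng.sample(range(len(X)), n))
    return [X[i] for i in idx], [y[i] for i in idx]


# Toy data standing in for images and boxes
X_demo = list(range(10))
y_demo = [v * 2 for v in X_demo]
X_small, y_small = subsample(X_demo, y_demo, 4)
print(len(X_small), len(y_small))
```

Sampling indices rather than items is what keeps each image matched with its own bounding box.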

Here are some sample images.

MRCNN — Processing

Now to look at mrcnn proper: we need to define an mrcnn Dataset class before the training process. This Dataset class provides each image’s information, such as the class it belongs to and the positions of the objects within it. The mrcnn.utils module, which we imported earlier, contains this Dataset class.

Here is where things get a little tricky and require some reading of the source code.

These are the functions you need to call or override:

add_class, which determines the number of classes for the model
add_image, where you define the image_id and the path to the image if applicable
load_image, where the image data is loaded
load_mask, which grabs information about the mask/bounding box of the image

# define drones dataset using mrcnn utils class
class DronesDataset(utils.Dataset):
    def __init__(self, X, y):  # init with numpy X, y
        self.X = X
        self.y = y
        super().__init__()

    def load_dataset(self):
        self.add_class("dataset", 1, "drones")  # only 1 class, drones
        for i in range(len(self.X)):
            self.add_image("dataset", i, path=None)

    def load_image(self, image_id):
        image = self.X[image_id]  # where image_id is index of X
        return image

    def load_mask(self, image_id):
        # get details of image
        info = self.image_info[image_id]
        # one mask channel per object; each image here has a single labelled drone
        masks = np.zeros([128, 128, 1], dtype='uint8')
        box = self.y[info["id"]]
        row_s, row_e = box[0], box[1]
        col_s, col_e = box[2], box[3]
        masks[row_s:row_e, col_s:col_e, 0] = 1  # mask with the same boundaries as the bounding box
        class_ids = [1]
        return masks, np.array(class_ids).astype(np.int32)

Since we took the effort to format our images into NumPy arrays, we can simply initialize the Dataset class with our array and load the images and bounding boxes by indexing into the array.
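Because the dataset only has bounding boxes, the "mask" is simply a filled rectangle. A minimal sketch of that idea for a single box, using a hypothetical box_to_mask helper and the (ymin, ymax, xmin, xmax) order stored in y:

```python
import numpy as np


def box_to_mask(box, dimension=128):
    """Return a (dimension, dimension, 1) uint8 mask filled with 1 inside the box.

    `box` is (ymin, ymax, xmin, xmax) in pixels.
    """
    ymin, ymax, xmin, xmax = box
    mask = np.zeros((dimension, dimension, 1), dtype=np.uint8)
    mask[ymin:ymax, xmin:xmax, 0] = 1  # rows are y, columns are x
    return mask


mask = box_to_mask((32, 96, 48, 80))
print(mask.shape, int(mask.sum()))
```

The sum of the mask equals the box area in pixels, which is a quick way to confirm the row and column order is not swapped.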

Next, do a train-test split the traditional way:

# train test split 80:20
np.random.seed(42)  # for reproducibility
p = np.random.permutation(len(X))
X = X[p].copy()
y = y[p].copy()

split = int(0.8 * len(X))

X_train = X[:split]
y_train = y[:split]

X_val = X[split:]
y_val = y[split:]

Now to load your data into the Dataset class.

# load dataset into mrcnn dataset class
train_dataset = DronesDataset(X_train, y_train)
train_dataset.load_dataset()
train_dataset.prepare()

val_dataset = DronesDataset(X_val, y_val)
val_dataset.load_dataset()
val_dataset.prepare()

The prepare() function uses the image_ids and class_ids information to prep your data for the mrcnn model.

Next comes the modification of the Config class we imported from mrcnn. The Config class determines the variables used in training and should be tweaked according to your dataset. The variables below are not exhaustive; you can refer to the documentation for the full list.

class DronesConfig(Config):
    # Give the configuration a recognizable name
    NAME = "drones"

    # Train on 1 GPU and 2 images per GPU.
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2

    # Number of classes (including background)
    NUM_CLASSES = 1 + 1  # background + drones

    # Use small images for faster training.
    IMAGE_MIN_DIM = 128
    IMAGE_MAX_DIM = 128

    # Reduce training ROIs per image because the images are small and have few objects.
    TRAIN_ROIS_PER_IMAGE = 20

    # Use smaller anchors because our image and objects are small
    RPN_ANCHOR_SCALES = (8, 16, 32, 64, 128)  # anchor side in pixels

    # set appropriate step per epoch and validation step
    STEPS_PER_EPOCH = len(X_train) // (GPU_COUNT * IMAGES_PER_GPU)
    VALIDATION_STEPS = len(X_val) // (GPU_COUNT * IMAGES_PER_GPU)

    # Skip detections with < 70% confidence
    DETECTION_MIN_CONFIDENCE = 0.7

config = DronesConfig()
config.display()

Depending on your computing power, you might have to adjust these variables accordingly. Otherwise, training may get stuck at ‘Epoch 1’ with no error message given. There is even a GitHub issue raised for this problem, and many solutions were proposed. Do check it out if it happens to you and test out a few of these suggestions.
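The step counts in the config follow from the dataset size and the batch size (GPU_COUNT times IMAGES_PER_GPU). A small sketch of that arithmetic; the image counts are invented for illustration, and the floor of one step is my own addition, not something mrcnn enforces.

```python
def steps_per_epoch(num_images, gpu_count=1, images_per_gpu=2):
    """Number of full batches needed to see each image about once per epoch."""
    batch_size = gpu_count * images_per_gpu
    # Assumption: never return 0, so a tiny dataset still runs one step
    return max(1, num_images // batch_size)


# Hypothetical sizes for an 80:20 split of 1000 images
train_steps = steps_per_epoch(800)
val_steps = steps_per_epoch(200)
print(train_steps, val_steps)  # 400 100
```

Raising IMAGES_PER_GPU lowers the step count but needs more GPU memory per step, which is the trade-off to adjust when training stalls.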

MRCNN — Training

Pre-trained mrcnn weights are available for the COCO and ImageNet datasets. To use them for transfer learning, we need to download them into our environment (remember to define your ROOT_DIR first).
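ROOT_DIR and MODEL_DIR are not defined anywhere in this guide, so here is one possible setup. Both values are assumptions: I use the current working directory as the root and a folder named logs for checkpoints; pick whatever suits your environment.

```python
import os

# Assumption: treat the current working directory as the project root
ROOT_DIR = os.path.abspath(os.getcwd())

# Assumption: "logs" is just a folder name chosen for training logs and checkpoints
MODEL_DIR = os.path.join(ROOT_DIR, "logs")

print(ROOT_DIR)
print(MODEL_DIR)
```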

# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")

# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

Create the model and initialize it with the pre-trained weights.

# Create model in training mode using gpu
with tf.device("/gpu:0"):
    model = modellib.MaskRCNN(mode="training", config=config, model_dir=MODEL_DIR)

# Which weights to start with?
init_with = "imagenet"  # imagenet, coco

if init_with == "imagenet":
    model.load_weights(model.get_imagenet_weights(), by_name=True)
elif init_with == "coco":
    # Load weights trained on MS COCO, but skip layers that
    # are different due to the different number of classes
    # See README for instructions to download the COCO weights
    model.load_weights(COCO_MODEL_PATH, by_name=True,
                       exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])

Finally, we can proceed on to the actual training.

model.train(train_dataset, val_dataset,
            learning_rate=config.LEARNING_RATE,
            epochs=5,
            layers='heads')  # keep the backbone frozen and train only the head layers

For this exercise, I will only train the head layers to detect drones in our dataset. If time allows, you should also fine-tune your model by training all the layers at a lower learning rate.

model.train(train_dataset, val_dataset,
            learning_rate=config.LEARNING_RATE / 10,
            epochs=7,  # epochs is the total count, so 7 means 2 more after the 5 above
            layers="all")
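If I read the mrcnn source correctly, the epochs argument of model.train is the total epoch number to reach, not the number of extra epochs for that call. When training in stages, it helps to compute the running totals up front. A minimal sketch with a hypothetical cumulative_epochs helper:

```python
def cumulative_epochs(stages):
    """Turn per-stage epoch counts into the running totals to pass as `epochs`."""
    total = 0
    schedule = []
    for layers, extra in stages:
        total += extra
        schedule.append((layers, total))
    return schedule


# 5 epochs on the heads, then 2 more on all layers
schedule = cumulative_epochs([("heads", 5), ("all", 2)])
print(schedule)  # [('heads', 5), ('all', 7)]
```

Each (layers, total) pair can then be passed to model.train as layers=... and epochs=....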

And you’re done with training your mrcnn model. You can save the model’s weights with these 2 lines of code.

# save weights
model_path = os.path.join(MODEL_DIR, "mask_rcnn_drones.h5")
model.keras_model.save_weights(model_path)

MRCNN — Inference

To run inference on other images, you will need to create a new inference model with a custom Config.

# make inference
class InferenceConfig(DronesConfig):
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

inference_config = InferenceConfig()

# Recreate the model in inference mode
model = modellib.MaskRCNN(mode="inference", config=inference_config, model_dir=MODEL_DIR)

# Load trained weights
model_path = os.path.join(MODEL_DIR, "mask_rcnn_drones.h5")
model.load_weights(model_path, by_name=True)

The visualize module from mrcnn comes in handy here.

import random
import matplotlib.pyplot as plt

def get_ax(rows=1, cols=1, size=8):
    _, ax = plt.subplots(rows, cols, figsize=(size*cols, size*rows))
    return ax

# Test on a random image
image_id = random.choice(val_dataset.image_ids)
original_image, image_meta, gt_class_id, gt_bbox, gt_mask =\
    modellib.load_image_gt(val_dataset, inference_config, image_id, use_mini_mask=False)

results = model.detect([original_image], verbose=1)
r = results[0]
visualize.display_instances(original_image, r['rois'], r['masks'], r['class_ids'], val_dataset.class_names, r['scores'], ax=get_ax())
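Beyond eyeballing the plot, a quick numeric check is the intersection over union (IoU) between a predicted box and the ground truth. The sketch below is a plain-Python helper of my own (box_iou is a hypothetical name), assuming boxes in the (y1, x1, y2, x2) order that r['rois'] and gt_bbox use.

```python
def box_iou(a, b):
    """IoU of two boxes given as (y1, x1, y2, x2) in pixels."""
    inter_h = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Usage, when both arrays are non-empty:
#   box_iou(gt_bbox[0], r['rois'][0])
print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

An IoU close to 1 means the detection nearly coincides with the label, while 0 means the boxes do not overlap at all.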
