Face Mask Detection using Mask R-CNN(Part-1)

Rishikesh Pathak
7 min read · Nov 3, 2020


Photo by Pille-Riin Priske on Unsplash

This tutorial shows you how to develop a Mask R-CNN model for face mask detection in images.

Object detection is a challenging problem in computer vision, as it involves predicting both where an object is in the image and what that object is. In this tutorial, we will use the Mask Region-based Convolutional Neural Network, or Mask R-CNN, one of the state-of-the-art approaches for object recognition tasks. We will not be working from scratch; instead, we will use the Matterport Mask R-CNN project, which provides a library for developing and training Mask R-CNN Keras models for your own object detection problems. The library is not easy to use if you are unfamiliar with it, and it also needs a well-prepared dataset. It relies on transfer learning, building on top-performing models trained on challenging object detection problems.

Install Mask R-CNN for Keras

Object detection is a challenging problem that involves building methods for object recognition, object localization, and object classification.

Region-based Convolutional Neural Networks have been used for tracking objects from a drone-mounted camera, locating text in an image, and enabling object detection in Google Lens. Mask R-CNN also serves as one of seven tasks in the MLPerf Training Benchmark, a competition to speed up the training of neural networks.

The Region-Based Convolutional Neural Network, or R-CNN, is a family of convolutional neural network models designed for object detection, developed by Ross Girshick et al. There are perhaps four main variations of the approach, culminating in the current pinnacle, Mask R-CNN. Mask R-CNN, introduced in the 2017 paper titled “Mask R-CNN”, is the most recent variation of the family and supports both object detection and object segmentation. Object segmentation not only involves localizing objects in the image but also specifies a mask for each object, indicating exactly which pixels in the image belong to it.

Perhaps the best-of-breed third-party implementation of Mask R-CNN is the Mask R-CNN project developed by Matterport. The project is open source, released under a permissive MIT license, and the code has been widely used in a variety of projects and Kaggle competitions. It is the implementation we will use for our problem statement.

So, let’s install the library.

The first step is to clone the Mask R-CNN GitHub repository.

git clone https://github.com/matterport/Mask_RCNN.git

The next step is to install the Mask R-CNN library.

This can be done by running the setup script that ships with the repository:

cd Mask_RCNN
python setup.py install

The library will then install directly and you will see a lot of successful installation messages ending with the following:

...Finished processing dependencies for mask-rcnn==2.1
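If the setup script instead fails with missing dependencies, note that the repository also ships a requirements.txt you can install before re-running the setup (an extra step my own workflow needed; the original steps above may be enough on your machine):

pip install -r requirements.txt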

You can confirm the library was installed using `pip show mask-rcnn`. If this works, we are ready to use the library.
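As one more sanity check (a small snippet of my own, assuming the dependencies installed cleanly), the import we will rely on later should already work:

# minimal import check; an ImportError here means the install did not succeed
from mrcnn.utils import Dataset
print('mrcnn imported successfully')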

Prepare the Dataset for Object Detection

Next, we need a dataset.

In this tutorial, we will use a Face Mask dataset made available by Cognitel. The dataset consists of approximately 850 images of people with or without masks, along with an XML annotation file for each image in which every face is annotated and labeled.

The Mask R-CNN is designed to learn to predict both bounding boxes for objects and masks for those detected objects, but the face mask dataset does not provide masks. As such, we will use the dataset to learn a face mask object detection task, ignore the masks, and not focus on the image segmentation capabilities of the model.
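To make that concrete, here is a minimal sketch (mine, not from the original workflow) of the workaround the load_mask() code later in this tutorial relies on: the library still expects one mask per object, so each bounding box is painted into a binary image as a filled rectangle and used as a coarse stand-in mask. The image size and box coordinates below come from the first annotation file shown later.

# fake a pixel mask from a bounding box (a coarse rectangular stand-in)
from numpy import zeros

h, w = 366, 512                             # image height and width
xmin, ymin, xmax, ymax = 79, 105, 109, 142  # one face box from maksssksksss0.xml
mask = zeros([h, w], dtype='uint8')
mask[ymin:ymax, xmin:xmax] = 1              # every pixel inside the box is 'object'
print(mask.sum(), 'pixels marked')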

First, we need the dataset in the current working directory, with a main directory named `face mask detection data` containing two subdirectories, ‘annotations’ and ‘images’. All the images go in the ‘images’ subfolder, and the corresponding annotation files go in the ‘annotations’ folder.
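With the naming pattern the dataset uses (maksssksksss<N>.png with a matching .xml per image), the layout looks like this:

face mask detection data/
├── annotations/
│   ├── maksssksksss0.xml
│   ├── maksssksksss1.xml
│   └── ...
└── images/
    ├── maksssksksss0.png
    ├── maksssksksss1.png
    └── ...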

The next step is to load the annotation files. Let’s have a look at one.

<?xml version="1.0"?>
<annotation>
    <folder>images</folder>
    <filename>maksssksksss0.png</filename>
    <size>
        <width>512</width>
        <height>366</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>without_mask</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <occluded>0</occluded>
        <difficult>0</difficult>
        <bndbox>
            <xmin>79</xmin>
            <ymin>105</ymin>
            <xmax>109</xmax>
            <ymax>142</ymax>
        </bndbox>
    </object>
    <object>
        <name>with_mask</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <occluded>0</occluded>
        <difficult>0</difficult>
        <bndbox>
            <xmin>185</xmin>
            <ymin>100</ymin>
            <xmax>226</xmax>
            <ymax>144</ymax>
        </bndbox>
    </object>
    <object>
        <name>without_mask</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <occluded>0</occluded>
        <difficult>0</difficult>
        <bndbox>
            <xmin>325</xmin>
            <ymin>90</ymin>
            <xmax>360</xmax>
            <ymax>141</ymax>
        </bndbox>
    </object>
</annotation>

Here we can see that the ‘object’ elements describe the bounding boxes and other details about the faces of people with or without masks.

Python provides the ElementTree API, which can be used to load and parse an XML file, and we can use the find() and findall() functions to perform XPath queries on the loaded document.
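Here is a tiny standalone demo of those two calls (my sketch, not from the original workflow), run on an inline XML string instead of a file. Applying the same idea to a real annotation file gives the helper that follows.

# quick demo of find() and findall() with XPath-style queries
from xml.etree import ElementTree

xml = ('<annotation><size><width>512</width></size>'
       '<object><name>with_mask</name></object>'
       '<object><name>without_mask</name></object></annotation>')
root = ElementTree.fromstring(xml)
print(root.find('.//size/width').text)                           # 512
print([o.find('name').text for o in root.findall('.//object')])  # ['with_mask', 'without_mask']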

# example of extracting bounding boxes from an annotation file
from xml.etree import ElementTree

# function to extract bounding boxes from an annotation file
def extract_boxes(filename):
    # load and parse the file
    tree = ElementTree.parse(filename)
    # get the root of the document
    root = tree.getroot()
    # extract each bounding box
    boxes = list()
    for box in root.findall('.//bndbox'):
        xmin = int(box.find('xmin').text)
        ymin = int(box.find('ymin').text)
        xmax = int(box.find('xmax').text)
        ymax = int(box.find('ymax').text)
        coors = [xmin, ymin, xmax, ymax]
        boxes.append(coors)
    # extract image dimensions
    width = int(root.find('.//size/width').text)
    height = int(root.find('.//size/height').text)
    return boxes, width, height

# extract details from an annotation file
boxes, w, h = extract_boxes('/content/drive/My Drive/face mask detection data/annotations/maksssksksss0.xml')
# summarize extracted details
print(boxes, w, h)

Running the above prints a list containing the details of each bounding box in the annotation file, along with two integers for the width and height of the photograph.

[[79, 105, 109, 142], [185, 100, 226, 144], [325, 90, 360, 141]] 512 366

Prepare FaceMaskDataset Object

The mask-rcnn library requires that train, validation, and test datasets be managed by a mrcnn.utils.Dataset object.

So, we have to make a new derived class that extends the mrcnn.utils.Dataset class, defines a function to load the dataset (with any name you like, such as load_dataset()), and overrides two functions: load_mask(), for loading a mask, and image_reference(), for loading an image reference (path or URL). This class can then be used to make dataset objects for the train and test datasets. We will also reuse the extract_boxes() function from above inside this class.

We can define classes by calling the built-in add_class() function and specifying the ‘source‘, the ‘class_id‘ (an integer for the class), and the ‘class_name‘.

For our problem, we have two classes, ‘without_mask’ and ‘with_mask’:

self.add_class("dataset", 1, "without_mask")
self.add_class("dataset", 2, "with_mask")

We also have to divide the dataset into train and test sets. We have about 850 images, so we can use roughly 80% for training and 20% for testing.

So, while writing the load_dataset() function, we will loop over the first 682 images (80% of the dataset) for training.
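Here is the split arithmetic as a quick sketch (my numbers, matching the ranges used in the full code below):

# 80/20 split over the 853 images in the dataset
n_images = 853
n_train = int(0.8 * n_images)   # 682 images for training
n_test = n_images - n_train     # the remaining 171 for testing
print(n_train, n_test)          # 682 171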

So, after all this, and after writing the code for the image_reference() function, the complete code will look like this:

# split into train and test sets
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from mrcnn.utils import Dataset

# class that defines and loads the face mask dataset
class faceMaskDataset(Dataset):
    # load the dataset definitions
    def load_dataset(self, dataset_dir, is_train=True):
        # define the two classes (id 0 is reserved for the background)
        self.add_class("dataset", 1, "without_mask")
        self.add_class("dataset", 2, "with_mask")
        # define data locations
        images_dir = dataset_dir + '/images/'
        annotations_dir = dataset_dir + '/annotations/'
        # first 682 images (80%) for training, the rest for testing
        if is_train:
            for image_id in range(0, 682):
                img_path = images_dir + "maksssksksss" + str(image_id) + '.png'
                ann_path = annotations_dir + "maksssksksss" + str(image_id) + '.xml'
                # add to dataset
                self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path, class_ids=[0, 1, 2])
        else:
            for image_id in range(682, 853):
                img_path = images_dir + "maksssksksss" + str(image_id) + '.png'
                ann_path = annotations_dir + "maksssksksss" + str(image_id) + '.xml'
                # add to dataset
                self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path, class_ids=[0, 1, 2])

    # extract bounding boxes and class names from an annotation file
    def extract_boxes(self, filename):
        # load and parse the file
        tree = ElementTree.parse(filename)
        # get the root of the document
        root = tree.getroot()
        # extract each bounding box along with its class name
        boxes = list()
        for box in root.findall('.//object'):  # change required: iterate 'object' elements, not 'bndbox'
            name = box.find('name').text  # change required: keep the class name
            xmin = int(box.find('.//xmin').text)
            ymin = int(box.find('.//ymin').text)
            xmax = int(box.find('.//xmax').text)
            ymax = int(box.find('.//ymax').text)
            coors = [xmin, ymin, xmax, ymax, name]  # change required: the name travels with the box
            boxes.append(coors)
        # extract image dimensions
        width = int(root.find('.//size/width').text)
        height = int(root.find('.//size/height').text)
        return boxes, width, height

    # load the masks for an image
    def load_mask(self, image_id):
        # get details of the image
        info = self.image_info[image_id]
        # define box file location
        path = info['annotation']
        # load XML
        boxes, w, h = self.extract_boxes(path)
        # create one array for all masks, each on a different channel
        masks = zeros([h, w, len(boxes)], dtype='uint8')
        # create masks
        class_ids = list()
        for i in range(len(boxes)):
            box = boxes[i]
            row_s, row_e = box[1], box[3]
            col_s, col_e = box[0], box[2]
            # masks are binary; the class is carried by class_ids, not by the mask value
            masks[row_s:row_e, col_s:col_e, i] = 1
            if box[4] == 'without_mask':  # change required: match the class names in your XML files
                class_ids.append(self.class_names.index('without_mask'))
            else:
                class_ids.append(self.class_names.index('with_mask'))
        return masks, asarray(class_ids, dtype='int32')

    # load an image reference (the path to the image)
    def image_reference(self, image_id):
        info = self.image_info[image_id]
        return info['path']

# train set
train_set = faceMaskDataset()
train_set.load_dataset('../drive/My Drive/face mask detection data', is_train=True)
train_set.prepare()
print('Train: %d' % len(train_set.image_ids))
# test/val set
test_set = faceMaskDataset()
test_set.load_dataset('../drive/My Drive/face mask detection data', is_train=False)
test_set.prepare()
print('Test: %d' % len(test_set.image_ids))

This will print the sizes of the train and test dataset objects:

Train: 682 
Test: 171

We can also write a small script to check that the code is working. It plots the first nine images of the dataset with the rectangular masks overlaid on each face.

# plot the first few images with their masks overlaid
from matplotlib import pyplot

for i in range(9):
    # define subplot
    pyplot.subplot(330 + 1 + i)
    # plot raw pixel data
    image = train_set.load_image(i)
    pyplot.imshow(image)
    # plot all masks over the image
    mask, _ = train_set.load_mask(i)
    for j in range(mask.shape[2]):
        pyplot.imshow(mask[:, :, j], cmap='gray', alpha=0.3)
# show the figure
pyplot.show()

This completes the preparation of the dataset object. In the next part of this blog, we will use the dataset objects we created here to train the model and make predictions.

Thank you for reading!
