Whether you wear a face mask or respirator in a medical profession, in an industrial environment, or as a member of the public, correct positioning of the mask is essential for maximum protection.
Now, with the spread of COVID-19, many places (e.g., airports and workplaces) require that masks be worn and correctly positioned over the nose and mouth. To help detect and enforce these requirements, we set out to build an image recognition model in PerceptiLabs that could classify different ways that people wear masks. A model like this could then be used at checkpoints, entrances, and other locations to help staff or authorities identify individuals who aren't complying with their organizations' mask-wearing rules.
To train our model, we used the Ways To Wear a Mask or a Respirator Database (WWMR-DB), which contains images of people wearing face masks, as shown in Figure 1.
The dataset comprises .jpg images divided into eight classes, depicting the most common ways in which masks or respirators are worn.
The original dataset is partitioned into a series of subdirectories based on the individuals wearing the masks. To simplify loading the data into PerceptiLabs, we reorganized it into eight subdirectories corresponding to the eight classifications, and moved the appropriate images into the respective subdirectories. Each image was then resized to 224x224 pixels using the resize feature in PerceptiLabs' Data Wizard.
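The reorganization step can be sketched with a short script. This is only an illustration, not the exact script we used: the directory names, and the assumption that each image's class can be recognized from a token in its filename, are hypothetical and would need adjusting for the real WWMR-DB layout. (Resizing isn't needed here, since PerceptiLabs' Data Wizard handles that.)

```python
import shutil
from pathlib import Path

# Hypothetical paths -- adjust to wherever you unpacked WWMR-DB and
# where the reorganized, class-based copy should live.
SRC_ROOT = Path("wwmr_db_original")   # original per-person subdirectories
DST_ROOT = Path("wwmr_db_by_class")   # one subdirectory per classification

# The eight classifications from the article. We assume, purely for
# illustration, that each filename contains its class as a token.
CLASSES = [
    "mask_above_chin",
    "mask_worn_correctly",
    "mask_hanging_from_ear",
    "mask_not_worn",
    "mask_on_forehead",
    "mask_on_tip_of_nose",
    "mask_under_chin",
    "mask_under_nose",
]


def class_of(filename):
    """Return the first class token found in the filename, or None."""
    for cls in CLASSES:
        if cls in filename:
            return cls
    return None


def reorganize(src_root=SRC_ROOT, dst_root=DST_ROOT):
    """Copy every .jpg into the subdirectory for its class; return the count."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    copied = 0
    for cls in CLASSES:
        (dst_root / cls).mkdir(parents=True, exist_ok=True)
    for img in sorted(src_root.rglob("*.jpg")):
        cls = class_of(img.name)
        if cls is not None:
            shutil.copy2(img, dst_root / cls / img.name)
            copied += 1
    return copied
```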
To map the classifications to the images, we created a .csv file (mask_log.csv) that associates each image file with a numeric label for use in loading the data using PerceptiLabs' Data Wizard:
0: mask above chin
1: mask worn correctly
2: mask hanging from wearer's ear
3: mask not worn
4: mask on forehead
5: mask on tip of nose
6: mask under chin
7: mask under nose
Below is a partial example of how the .csv file looks:
Example of the .csv file to load data into PerceptiLabs that maps the image files to their associated labels.
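For readers following along without the screenshot, the file might look something like the following sketch. The column names and filenames here are illustrative; the labels are the eight numeric classes listed above, and the real `mask_log.csv` is available on GitHub.

```csv
images,labels
person01_mask_above_chin_001.jpg,0
person01_mask_worn_correctly_001.jpg,1
person02_mask_under_nose_003.jpg,7
```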
Our model was built with just three Components:
Note that although InceptionV3's default input size is 299x299 pixels, setting include_top to False allows images of other dimensions (e.g., 224x224 pixels) to be used.
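A minimal Keras sketch of the same idea (outside of PerceptiLabs): with include_top=False, the network's fully connected "top" is dropped, so input sizes other than the 299x299 default are accepted. The small classification head here is an assumption for illustration, not PerceptiLabs' exact configuration.

```python
import tensorflow as tf

# With include_top=False, InceptionV3 accepts input sizes other than
# its 299x299 default, such as the 224x224 images in this dataset.
base = tf.keras.applications.InceptionV3(
    include_top=False,
    weights=None,          # use "imagenet" to start from pretrained weights
    input_shape=(224, 224, 3),
)

# A small, hypothetical classification head for the eight mask classes.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(8, activation="softmax"),
])
```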
Training and Results
We trained the model for 20 epochs in batches of 32, using the Adam optimizer, a learning rate of 0.001, and a cross-entropy loss function. Figure 3 shows PerceptiLabs' Statistics view during training.
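In Keras terms, those settings correspond to something like the sketch below. The tiny stand-in model is hypothetical and exists only so the compile/fit calls are self-contained; the real model is the InceptionV3-based classifier described above.

```python
import tensorflow as tf

# Hypothetical stand-in for the InceptionV3-based classifier.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(8, activation="softmax"),
])

# The article's training settings: Adam, learning rate 0.001,
# cross-entropy loss (integer labels 0-7), 20 epochs, batches of 32.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# With real data this would be something like:
# model.fit(train_images, train_labels, epochs=20, batch_size=32,
#           validation_split=0.2)
```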
With a training time of around 34 minutes and 28 seconds, we were able to achieve a training accuracy of 99.4% and a validation accuracy of 77.1%. The lower validation accuracy reflects the difficulty of training models on facial detection-related data.
In the following screenshot (Figure 4) from PerceptiLabs, you can see how validation accuracy ramped up during the first six or so epochs, after which it remained fairly stable. Training accuracy mostly ramped up during the first couple of epochs and fluctuated until around epoch 14 before stabilizing:
The following screenshot, shown in Figure 5, depicts the corresponding training and validation loss over the first 14 epochs:
Here we can see that training loss dropped significantly during the first epoch before stabilizing, while validation loss started low and remained relatively stable throughout.
A model like this could be used for security or health and safety purposes, to verify that workers or visitors are wearing a mask correctly before being granted access. For example, the model could be used to analyze photos or video frames acquired through on-site security cameras. The model itself could also serve as the basis for transfer learning, to create additional models for detecting the presence and correct use of other types of health or safety equipment.
This use case is an example of how image recognition can be used to help ensure health and safety. If you want to build a deep learning model similar to this, run PerceptiLabs and grab a copy of our .csv file from GitHub.