Use Case: Breast Cancer Detection

Inspired by the growing use of ML in healthcare, we decided to build an image recognition model in PerceptiLabs trained on malignant and benign microscopic biopsy images.


Breast cancer is the most common invasive cancer, affecting one in seven women worldwide. Along with lung cancer, it is the most commonly diagnosed cancer, with approximately 2.09 million cases of each in 2018.

An ML model like this could help doctors, mammographers, researchers, and other medical practitioners more easily classify and detect breast cancer.

Dataset

To train our model, we grabbed the BreaKHis 400X dataset from Kaggle, which comprises microscopic biopsy images of benign and malignant breast tumors.

Figure 1: Examples of images from the dataset.

The dataset comprises 1,146 malignant and 547 benign images captured at 400x optical zoom. Each image is a 700x460-pixel .png file.
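Note that the classes are imbalanced at roughly 2:1 (malignant to benign). If you train outside of PerceptiLabs' defaults, one common mitigation is inverse-frequency class weights. A minimal sketch (the `class_weights` helper is our own, not part of PerceptiLabs or the dataset):

```python
def class_weights(counts: dict) -> dict:
    """Inverse-frequency weights: total / (num_classes * count),
    so the rarer class contributes more to the loss."""
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}

# Counts from the BreaKHis 400X dataset described above.
weights = class_weights({"malignant": 1146, "benign": 547})
```

With these counts, the benign class ends up weighted roughly twice as heavily as the malignant class, counteracting its underrepresentation.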

We created a .csv file (dataset.csv) that maps each image file to its label (benign or malignant), so that the data can be loaded with PerceptiLabs' Data Wizard.

Below is a partial example of how the .csv file looks:

images,labels
data/train/benign/SOB_B_F-14-29960AB-400-015.png,benign
data/train/benign/SOB_B_PT-14-21998AB-400-039.png,benign
data/train/malignant/SOB_M_DC-14-20636-400-007.png,malignant
data/train/malignant/SOB_M_DC-14-15572-400-003.png,malignant

Example rows from the .csv file used to load data into PerceptiLabs, mapping each image file to its label.
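A mapping file like this is straightforward to generate with the Python standard library. The sketch below assumes the directory layout implied by the paths above (a root folder containing `benign/` and `malignant/` subfolders of .png files); the `build_csv` helper name is our own:

```python
import csv
from pathlib import Path

def build_csv(data_root: str, out_path: str) -> int:
    """Walk the benign/ and malignant/ subfolders of data_root and
    write an images,labels mapping like the one shown above.
    Returns the number of data rows written."""
    rows = []
    for label in ("benign", "malignant"):
        for png in sorted(Path(data_root, label).glob("*.png")):
            rows.append((png.as_posix(), label))
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(("images", "labels"))
        writer.writerows(rows)
    return len(rows)
```

Running `build_csv("data/train", "dataset.csv")` against the layout above would produce a file in the format shown.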

Model Summary

Our model was built with four Components:

Component 1: ResNet50

include_top=false, pretrained=imagenet

Component 2: Dense 

Activation=ReLU, Neurons=128

Component 3: Dense 

Activation=ReLU, Neurons=64

Component 4: Dense

Activation=Softmax, Neurons=2
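Outside of PerceptiLabs, an equivalent topology could be sketched in Keras. The `build_model` helper, the 224x224 input size, and the `pooling="avg"` step (which flattens the backbone output before the Dense layers) are our own assumptions; PerceptiLabs generates its own training code:

```python
import tensorflow as tf

def build_model(weights="imagenet"):
    """ResNet50 backbone (include_top=False, pretrained weights)
    followed by the three Dense components described above."""
    base = tf.keras.applications.ResNet50(
        include_top=False,
        weights=weights,          # "imagenet" mirrors pretrained=imagenet
        pooling="avg",            # assumption: average-pool to a flat vector
        input_shape=(224, 224, 3))  # assumption: images resized to 224x224
    return tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(2, activation="softmax"),
    ])
```

The final two-neuron Softmax layer outputs a probability for each of the two classes (benign and malignant).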

Figure 2: Topology of the model in PerceptiLabs.

Training and Results

Figure 3: PerceptiLabs' Statistics View during training.

With a training time of just over 256 seconds, we were able to achieve a training accuracy of 100% and a validation accuracy of 85.5%. In the following screenshot from PerceptiLabs, you can see how the training and validation accuracy quickly ramped up during the first epoch, after which validation accuracy remained fairly stable. Training accuracy continued to climb until around the third epoch before stabilizing at 100%:

Figure 4: Accuracy Plot.

Vertical Applications

A model like this could potentially help medical practitioners increase diagnostic throughput. In particular, it could screen out clear true-negative cases, leaving only the predicted positives (both true and false positives) for doctors to examine.

Such a project could also be used by medical students or practitioners looking to build next-generation ML-based medical technology. The model itself could also serve as the basis for transfer learning, to create additional models for detecting tumors in other types of medical scans and imagery.
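As a rough sketch of that transfer-learning step, one could freeze the trained layers and attach a fresh classifier head sized for the new task. The `adapt_for_new_task` helper below is our own illustration, assuming a trained Keras model whose last layer is the classifier head:

```python
import tensorflow as tf

def adapt_for_new_task(model: tf.keras.Model, num_classes: int) -> tf.keras.Model:
    """Reuse all layers except the old classifier head, freeze them,
    and add a freshly initialized head for the new task."""
    base = tf.keras.Sequential(model.layers[:-1])
    base.trainable = False  # keep the learned features fixed
    return tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
```

Only the new head's weights would then be updated during training on the new imagery, which typically needs far less data than training from scratch.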

Summary

This use case is a simple example of how ML can be used to identify ailments using image recognition. If you want to build a deep learning model similar to this, run PerceptiLabs and grab a copy of our pre-processed dataset from GitHub.