Case Study: Genetic Screening Using CNNs in PerceptiLabs

Machine learning (ML) is forever changing the face of science and is now playing a growing role in the fields of genomics and neuroscience.

One group of scientists leveraging ML is the Buchser lab in the Department of Genetics at the Washington University School of Medicine in St. Louis. There, Dr. Buchser and his team work in functional genomics, high-throughput screening, and next-generation sequencing.

One project in the group, led by Jack Bramley, is neuronal phenotyping using convolutional neural networks (CNNs). This project uses genomic editing technology coupled with high-throughput microscopy to screen variants (mutations) and gain insight into potentially disease-causing variants.

We recently had a chance to catch up with Dr. Buchser and Jack to learn about the team's work and how they have integrated PerceptiLabs into their project.

Rapid advances in genomic sequencing technologies, coupled with the CRISPR revolution, are opening up new frontiers in precision medicine and N-of-1 care. To make forays into these frontiers, platforms are needed that can screen patient genomic variants (mutations) in a massively parallel setting.

The team's project uses CRISPR-Cas9-enabled genetic screening in neuronal cells and has employed PerceptiLabs to classify microscopy images of neurons using CNNs and transfer learning. The output of the model helps determine which neurons are predicted to contain a genomic edit and should therefore be isolated for next-generation sequencing. The team's goal of replicating patient mutations in neurons inspired them to develop tools to screen neuronal cell types in a high-throughput manner. As a result, their data source comprises microscopy images of cells grown on a specialized grid that enables single-cell imaging within each square of the grid.

Figure 1: Examples of the microscopy images from the team's dataset.

The team currently uses MobileNetV2 to assign a classification to each square of the grid for target identification. Their model is currently deployed on-premises, but will likely evolve into a hybrid (i.e., combined cloud and on-premises) deployment.
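To make the per-square classification step more concrete, here is a minimal Python sketch of how a model's output probabilities might be turned into one label per grid square and a list of target squares. It is not the team's actual pipeline: the class names other than "intact" (which appears in the results discussed with Figure 2), the confidence threshold, the function names, and the example probabilities are all illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical class names; only "intact" is mentioned in the article.
CLASSES = ["intact", "edited", "empty"]

Square = Tuple[int, int]  # (row, column) of a grid square


def classify_square(probs: List[float]) -> Tuple[str, float]:
    """Return the most probable class and its probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return CLASSES[best], probs[best]


def pick_targets(predictions: Dict[Square, List[float]],
                 target_class: str = "edited",
                 min_confidence: float = 0.9) -> List[Square]:
    """Select squares confidently predicted as the target class.

    The 0.9 threshold is an assumed value, not one reported by the team.
    """
    targets = []
    for square, probs in sorted(predictions.items()):
        label, confidence = classify_square(probs)
        if label == target_class and confidence >= min_confidence:
            targets.append(square)
    return targets


# Made-up model outputs for three grid squares.
example = {
    (0, 0): [0.97, 0.02, 0.01],
    (0, 1): [0.03, 0.95, 0.02],
    (1, 0): [0.40, 0.55, 0.05],
}
targets = pick_targets(example)
print(targets)  # prints [(0, 1)]
```

Square (1, 0) is predicted as "edited" but falls below the assumed confidence threshold, so it is left out of the target list. In practice, the right threshold would depend on how costly a false positive is for the downstream sequencing step.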

Figure 2a/2b: Results of the project.

Figure 2 shows the model's predictions for the different classes. On the left is the confusion matrix, and on the right is a bar graph of the prediction distributions. For example, the leftmost bar in the bar graph indicates that, of the intact samples in the test set, 97.5% were correctly classified as intact, while 2.5% were falsely classified as a mix of other classes.
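The relationship between a confusion matrix and per-class percentages like the 97.5% figure can be sketched in a few lines of Python. The counts and the class names "class_b" and "class_c" below are invented for illustration and are not the team's data; they are chosen only so that the "intact" row reproduces the 97.5% / 2.5% split described above.

```python
from typing import Dict, List


def per_class_rates(confusion: List[List[int]],
                    labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Row-normalize a confusion matrix into percentages.

    Rows are true classes and columns are predicted classes.
    """
    rates = {}
    for true_label, row in zip(labels, confusion):
        total = sum(row)
        rates[true_label] = {
            pred_label: (100.0 * count / total if total else 0.0)
            for pred_label, count in zip(labels, row)
        }
    return rates


# Made-up counts; "class_b" and "class_c" are placeholder class names.
labels = ["intact", "class_b", "class_c"]
confusion = [
    [195, 3, 2],
    [4, 90, 6],
    [1, 5, 94],
]
rates = per_class_rates(confusion, labels)
correct = rates["intact"]["intact"]
print(correct, 100.0 - correct)  # prints 97.5 2.5
```

Each row of the normalized matrix corresponds to one bar in a distribution plot of this kind: the diagonal entry is the share classified correctly, and the remaining entries show how the misclassified samples are spread across the other classes.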

Because PerceptiLabs makes transfer learning easy, the team was also able to use the model for a different type of experiment in which they were attempting to differentiate stem cells into neurons. The model was trained on one type of neuronal cell and then deployed on an unannotated series of images depicting a completely different cell type:

Figure 3: Image from the team's stem cell experiment.

Figure 4: Another image from the team's stem cell experiment.

The model helped them identify the cells that had successfully differentiated so they could isolate them for DNA sequencing.

Genomics Meets PerceptiLabs

PerceptiLabs' ease of use and GUI were key advantages that drew the team to the tool. Since the team strives to make each component of the workflow accessible to every team member, the ability to develop models in a low-code or no-code setting supports that goal. Additionally, visualization of the model flow helps in explaining the process to a wider audience.

The team's prior ML modeling experience had primarily involved classification and anomaly detection models that used features extracted from image data rather than the images themselves. Building on that experience, they were able to develop and iterate on a preliminary model in PerceptiLabs within only a few hours. Within two weeks, they had a much larger training dataset and an even better-performing model. Much of that time was spent generating new training data and building a preprocessing workflow.

As for takeaways, Jack recommends establishing a standardized preprocessing workflow to ensure a balanced and high-quality training set. This will help prevent overfitting and facilitate better training performance.
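One common way to obtain a balanced training set is to downsample every class to the size of the smallest one. The Python sketch below shows that idea under stated assumptions: the article does not say which balancing method the team uses, and the file names, the "edited" label, and the function name are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (image_path, label)


def balance_by_downsampling(samples: List[Sample],
                            seed: int = 0) -> List[Sample]:
    """Downsample every class to the size of the smallest class."""
    if not samples:
        return []

    # Group samples by their label.
    by_label: Dict[str, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)

    smallest = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible

    balanced: List[Sample] = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], smallest))
    rng.shuffle(balanced)  # avoid long runs of a single class
    return balanced


# Made-up, imbalanced dataset: 8 "intact" images and 3 "edited" images.
samples = [(f"img_{i}.png", "intact") for i in range(8)]
samples += [(f"img_{i}.png", "edited") for i in range(8, 11)]

balanced = balance_by_downsampling(samples)
counts = Counter(label for _, label in balanced)
print(dict(counts))
```

Downsampling discards data from the larger classes, so when images are scarce, alternatives such as collecting more examples of the rare class or augmenting it may be preferable.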

A Vision of the Future

The members of the Buchser lab are strong believers that the growing availability and accessibility of low-code and no-code ML tools will result in greater democratization of ML. This will allow ML to be used by a wider range of groups, resulting in a broader understanding and acceptance of ML across fields. For the life sciences, it will enable subject matter experts to conduct and analyze larger-scale experiments more efficiently and, as a result, accelerate discovery and progress.

For more information on getting started with PerceptiLabs, check out our documentation.