Machine learning models for image classification often use convolutional neural networks (CNNs) to extract features from images while employing max-pooling layers to reduce dimensionality. The goal is to extract increasingly higher-level features from regions of the image, to ultimately make some kind of prediction such as an image classification.
To increase accuracy, it can be tempting to simply add more layers to the network. However, research has shown that adding too many layers can increase training time and even lead to decreased accuracy.
In this blog post, we'll dive into why this occurs with deep neural networks and how it can be solved, and then look at how to build a more efficient model in PerceptiLabs, using an industrial IoT use case as an example.
The Vanishing Gradient Problem
Image classification models typically accomplish their goal using a "chain" of layers that feed forward into each other, where additional layers can lead to better accuracy in the final classification. However, adding too many layers can lead to the vanishing gradient problem during back propagation. Back propagation computes the gradient of the loss function for an earlier layer by multiplying together the derivatives of all the layers that follow it (the chain rule). When many of those factors are smaller than one, the gradient can shrink exponentially as it travels backward, so the weight updates for the earliest layers become so small that they tend towards zero. This in turn can lead to the training problems mentioned above.
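To make the shrinking effect concrete, here is a minimal sketch, assuming a toy chain of scalar layers that each compute sigmoid(w * a) with every weight set to 1.0 (an illustrative assumption, not a real CNN). By the chain rule, the gradient that reaches the first layer, and therefore the size of that layer's weight updates, is the product of every later layer's local derivative. Because the sigmoid's derivative never exceeds 0.25, the product shrinks quickly with depth.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(depth: int, x: float = 0.5, w: float = 1.0) -> float:
    """Return d(output)/d(input) for a chain of `depth` toy layers a = sigmoid(w * a).

    Toy model only: every layer shares the same hypothetical weight `w`.
    """
    a = x
    grad = 1.0
    for _ in range(depth):
        s = sigmoid(w * a)
        # Local derivative of this layer w.r.t. its input: w * s * (1 - s), at most 0.25 * |w|
        grad *= w * s * (1.0 - s)
        a = s
    return grad


for depth in (1, 5, 10, 20):
    print(f"depth={depth:2d}  gradient at input = {input_gradient(depth):.3e}")
```

Each extra layer multiplies the gradient by a factor of at most 0.25 here, which is the exponential decay described above in its simplest form.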
Residual Networks to the Rescue
A popular method to overcome this problem is to incorporate residual blocks, which collectively form a residual network (ResNet). A residual block addresses the problem by introducing an architectural pattern in which the output of earlier layers skips over some of the layers that follow. These “skip connections” are generally implemented as an addition operation whereby the features from earlier in the network are added to the newly computed features after some convolutional layers, as depicted here:
Fig 1: Overview of a residual block (image source).
A skip connection provides an additional, shorter path for gradients to flow during back propagation, and it has been empirically shown that this enables deeper computer vision networks to train faster and reach higher accuracy than plain networks of the same depth.
A residual block works as follows:
- An input tensor x enters the ResNet block and flows down two paths.
- x flows down the main path through convolution layers (represented in Fig 1 as weight layers), just like in a normal CNN; this path is often called the convolution block. Its output is the residual F(x), the part the block learns to add on top of x.
- x also flows across the skip connection, also known as the identity block because it simply forwards x unchanged, retaining its dimensionality.
- F(x) and x are then added together element-wise before being sent to the activation function (ReLU, in this example).
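The four steps above can be sketched in plain Python. This is a simplified illustration, assuming small dense weight matrices stand in for the convolution layers (so x is a flat vector rather than an image), with the weight-layer, ReLU, weight-layer layout of Fig 1. The function names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w by vector x (stand-in for a convolution layer)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    # Main path: weight layer -> ReLU -> weight layer, producing F(x).
    f = matvec(w2, relu(matvec(w1, x)))
    # Skip connection: identity, so F(x) must have the same size as x.
    assert len(f) == len(x), "identity skip connection requires matching dimensions"
    # Element-wise addition of F(x) and x, followed by the final ReLU.
    return relu([fi + xi for fi, xi in zip(f, x)])


x = [1.0, -2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
print(residual_block(x, zeros, zeros))  # [1.0, 0.0, 3.0]
```

In the usage example, the weight layers output zero, so F(x) = 0 and the block reduces to relu(x). This illustrates one intuition behind residual blocks: a block can fall back to (nearly) passing its input through, so adding blocks shouldn't have to make the network worse.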
Residual blocks are typically "chained" together, as shown in Fig 2 below, to form ResNets. The output of the final residual block (which already includes the sum from its skip connection) is then passed into other types of layers (e.g., average pooling and fully connected layers) for classification:
Fig 2: Example of a 50-layer ResNet model (image source).
In a typical CNN for image classification, earlier layers usually learn lower-level abstractions (e.g., lines, corners, etc.), while subsequent layers learn higher-level abstractions (e.g., groups of shapes, etc.). Skip connections give gradients a route back through the network that bypasses the weight layers and intermediate activation functions inside each block, and in the forward direction they pass earlier features directly to later layers. Together, these let the network grow deep enough for later layers to learn complex features while greatly reducing the risk of vanishing gradients.
Fig 3 depicts the application of a ResNet block to an image classification model:
Fig 3: Example of a ResNet block consisting of two layers computed using Convolution and Batch Normalization. In this example, the output of the block's main path is added to the identity block (x) via the skip connection before its activations are generated via ReLU. This architecture can be used when the input has the same dimensions as the output activation.
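The Conv, BN, ReLU, Conv, BN, add, ReLU ordering in Fig 3 can be sketched in plain Python. This is a simplified illustration under several assumptions: the input is a single-channel 1-D signal rather than a multi-channel image, the convolutions use zero "same" padding so the output length matches the input (which is what lets the identity skip connection work), and batch normalization is shown in inference mode with fixed, hypothetical mean and variance values rather than statistics gathered during training.

```python
import math
from typing import List, Tuple

Signal = List[float]


def conv1d_same(x: Signal, kernel: Signal) -> Signal:
    """Single-channel 1-D convolution (cross-correlation, as in CNN libraries)
    with zero 'same' padding; the kernel length must be odd."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + x + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(k)) for i in range(len(x))]


def batch_norm_inference(x: Signal, mean: float, var: float,
                         gamma: float = 1.0, beta: float = 0.0,
                         eps: float = 1e-5) -> Signal:
    """Normalize with fixed statistics, then scale by gamma and shift by beta."""
    scale = gamma / math.sqrt(var + eps)
    return [scale * (v - mean) + beta for v in x]


def relu(x: Signal) -> Signal:
    return [max(0.0, v) for v in x]


def conv_bn_residual_block(x: Signal, k1: Signal, k2: Signal,
                           bn1: Tuple[float, float],
                           bn2: Tuple[float, float]) -> Signal:
    # Main path: Conv -> BN -> ReLU -> Conv -> BN
    h = relu(batch_norm_inference(conv1d_same(x, k1), *bn1))
    f = batch_norm_inference(conv1d_same(h, k2), *bn2)
    # 'Same' padding keeps len(f) == len(x), so x can be added directly (identity block).
    return relu([fi + xi for fi, xi in zip(f, x)])
```

With an identity kernel such as [0.0, 1.0, 0.0] and normalization statistics of mean 0 and variance 1, the main path approximately reproduces x, so the block outputs roughly 2x.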
Note that in some cases, such as when the input and output dimensions are different, the identity block can be replaced with a convolution block (typically a 1×1 convolution) that projects x to the required shape:
Fig 4: Example of a ResNet block in which the input and output have different dimensions. In this case, the skip connection passes x through a convolution block, and the result is added to the output of the main path before the activations are generated (image source).
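Here is a minimal sketch of the projection case, again assuming dense matrices stand in for the convolutions: the hypothetical projection matrix ws plays the role of the convolution block on the skip connection (analogous to a 1×1 convolution mapping one channel count to another), turning a 2-element input into a 3-element shortcut that matches the main path's output.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def projection_residual_block(x: Vector, w1: Matrix, w2: Matrix, ws: Matrix) -> Vector:
    # Main path: produces F(x) with len(w2) elements, which may differ from len(x).
    f = matvec(w2, relu(matvec(w1, x)))
    # Skip connection: project x so its size matches F(x) instead of forwarding it as is.
    shortcut = matvec(ws, x)
    assert len(shortcut) == len(f), "projection must match the main path's output size"
    return relu([fi + si for fi, si in zip(f, shortcut)])


# Hypothetical weights: input has 2 features, output has 3.
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]                 # 3x2
w2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # 3x3
ws = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]                 # 3x2 projection
print(projection_residual_block([1.0, 2.0], w1, w2, ws))
```

In a real ResNet, the projection weights are learned along with the rest of the network; the fixed values here only show how the shapes line up.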
Thanks to residual blocks, gradients can flow back through the network along two paths during back propagation:
Fig 5: Diagram showing the two pathways through which gradients can flow during back propagation (image source).
When the gradient flows back across Gradient Pathway-2, the weight layers in the residual block (represented as F(x) in Fig 5) are updated, and new gradient values are calculated. If the gradient only flowed through these weight layers, it could keep diminishing (i.e., tend towards 0) as it passed through earlier residual blocks. However, the gradient also flows back across the skip connections, which bypass the weight layers within the residual blocks altogether. Because the skip connection simply adds x to the block's output, its local derivative is 1, so the gradient passes along it without being scaled down. This largely intact gradient can then propagate back to the earlier layers, allowing their weights to keep learning, and thus avoiding the vanishing gradient problem.
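The two-path argument can be written with the chain rule. For a block computing y = x + F(x), the gradient arriving at x is dL/dx = dL/dy * (1 + dF/dx): the "1" comes from the skip connection and the dF/dx term from the weight layers. Chaining blocks multiplies these factors, so a plain network accumulates a product of the dF/dx terms alone, while a residual network accumulates a product of (1 + dF/dx) terms. The sketch below compares the two, assuming scalar blocks with a hypothetical local derivative of 0.1 each and ignoring the ReLU applied after each addition (as in the pre-activation ResNet variant).

```python
import math
from typing import List


def plain_chain_gradient(local_grads: List[float]) -> float:
    """Gradient at the input of a plain chain: product of each layer's local derivative."""
    return math.prod(local_grads)


def residual_chain_gradient(local_grads: List[float]) -> float:
    """Gradient at the input of chained blocks y = x + F(x): product of (1 + dF/dx)."""
    # The identity path contributes the constant 1 in each factor.
    return math.prod(1.0 + d for d in local_grads)


local = [0.1] * 20  # hypothetical dF/dx for 20 blocks
print("plain:   ", plain_chain_gradient(local))
print("residual:", residual_chain_gradient(local))
```

Even if every block's weight layers contributed nothing (dF/dx = 0), the residual product would still be exactly 1, so the gradient reaches the first layer undiminished.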
Applying ResNets to Industrial IoT Applications
Image classification techniques are particularly useful for computer vision problems in industrial IoT (e.g., manufacturing), such as visual quality inspection.
To illustrate this, we've put together a Textile-Classification project on GitHub. This project takes as input 72,000 close-up images of textile fibers, along with a label for each image giving one of six possible manufacturing classifications (e.g., color issues, cuts in the fabric, etc.):
Fig 6: Examples of textile fabric images with various types of manufacturing classifications. Five of them represent anomalies while the "good" classification means that no issues were found in the image.
The example PerceptiLabs model in the project incorporates three ResNet blocks, each of which applies convolutions to extract features from the image and includes a skip connection that forwards the block's input unchanged (i.e., as an identity block). The input is then added to the block's output using a Merge component configured to apply an addition operation:
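As a framework-agnostic analog of this structure (this is not PerceptiLabs code or its API), the sketch below chains three identity residual blocks, each ending in an element-wise addition similar to the Merge component described above, and feeds the result to a linear classification head with one score per class. Dense weight matrices stand in for the convolutions, and the names merge_add and classify, along with all weights, are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def merge_add(a: Vector, b: Vector) -> Vector:
    """Element-wise addition, analogous to a Merge component set to addition."""
    return [ai + bi for ai, bi in zip(a, b)]


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    f = matvec(w2, relu(matvec(w1, x)))  # main path F(x)
    return relu(merge_add(f, x))         # identity skip connection merged by addition


def classify(x: Vector, blocks: Sequence[Tuple[Matrix, Matrix]], head: Matrix) -> int:
    """Run x through the chained residual blocks, then return the highest-scoring class index."""
    for w1, w2 in blocks:
        x = residual_block(x, w1, w2)
    scores = matvec(head, x)  # one score per class
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical example: 4 features, three residual blocks, six classes.
zeros = [[0.0] * 4 for _ in range(4)]
blocks = [(zeros, zeros)] * 3
head = [[0.0] * 4 for _ in range(6)]
head[3][3] = 1.0
print(classify([0.5, 1.0, 0.0, 2.0], blocks, head))  # 3
```

A real model would learn these weights from the labeled textile images; the zero weights here just make the data flow easy to trace.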
Fig 7: Overview of the Textile Image classification model in PerceptiLabs.
This project shows how such a model could be used for quality control in industrial IoT. For example, a camera could take random close-up pictures of fabrics during manufacturing and present them to the model for anomaly detection.
Most importantly, this project demonstrates just how easy it is to build ResNet blocks in PerceptiLabs.
Neural networks with a large number of layers can easily suffer from the vanishing gradient problem, whereby the gradient tends towards zero during back propagation. With such small gradient values, layers located earlier in the network may have their weights updated very little or not at all, causing training to slow down or stall, and even degrading accuracy.
ResNet blocks are a great solution to this problem because they allow networks with large numbers of layers to train effectively, and best of all, they are easy to build within PerceptiLabs.