Picture this: you're standing high up a mountain with limited visibility and your goal is to find your way to the bottom. How would you accomplish this? One possible method would be to look around for paths, rejecting those which go up because they would cost you too much time and energy only to learn that they don't help you meet your goal, while analyzing and selecting from those that will get you to a lower point on the mountain. You then choose a downward path that you think will get you to the bottom using the least amount of time and energy. You then follow that path which leads you to a new point with new choices to make, and continue to repeat the process until you get to the bottom.

This is what a machine learning (ML) algorithm does during training. More specifically, the mountain analogy roughly describes stochastic gradient descent (SGD) optimization, in which the optimizer repeatedly tries new weights and biases until it reaches its goal of finding values that let the model make accurate predictions. But how does the optimizer know whether it's trying good values, and whether the results are trending in the right direction as it progresses through the training data? This is where a loss function comes in.
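To make the analogy concrete, here is a minimal sketch of the "walk downhill" loop. It assumes a made-up one-parameter "mountain", f(w) = (w - 3)^2, and uses plain gradient descent; SGD follows the same idea but estimates the slope from a small random batch of training data at each step. The function names and the learning rate are illustrative only.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, i.e. 'downhill'."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


# Hypothetical "mountain": f(w) = (w - 3)^2, whose lowest point is at w = 3.
def slope(w):
    return 2.0 * (w - 3.0)


# Start high up the mountain at w = 10 and walk down.
w_final = gradient_descent(slope, start=10.0)
print(w_final)  # very close to 3.0
```

Each step moves the parameter a little in the direction that lowers the function, which is the same "pick a downward path, then re-evaluate" process described above.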

Enter the Loss Function

A loss function plays a key role when training (optimizing) ML models. It essentially calculates how good the model is at making predictions using a given set of values (i.e., weights and biases). The calculated output is the loss or error, which is the difference between the prediction the model made using a set of parameter values and the actual ground truth.

For example, if you use a neural network [1] to perform image classification on blood-cell medical images, the loss function is used during training to gauge how well the model is able to correlate incoming pixels to varying levels of features across the network's hidden layers, and to ultimately set the correct probabilities for each classification. In the blood-cell image example, earlier layers could represent basic patterns (e.g., arcs, curves, and shapes), while subsequent layers could start to represent the higher-level features of blood cells of interest to medical practitioners. Here, the loss function's role is to help the optimizer adjust the model so that it correctly predicts these different levels of features, from basic patterns through to the final blood cells.

The term loss function (sometimes called error function) is often used interchangeably with cost function. However, it's generally accepted that the former computes loss for one single training example, while the latter computes the average loss across all training data. The overall goal is to find parameter values across all training samples which minimize the average cost (i.e., decrease the cost to some acceptably small value).

The cost function takes in all of the model's parameters and outputs the cost as a single scalar. This function is used by the model's optimizer which seeks to find the ideal set of parameters to minimize the cost (aka minimizing the function).
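The distinction between loss and cost can be sketched in a few lines. This example assumes a tiny linear model (prediction = weight * x + bias) and a squared-error loss; the data and function names are made up for illustration.

```python
def squared_error(prediction, target):
    """Loss: the error for a single training example."""
    return (prediction - target) ** 2


def cost(weight, bias, inputs, targets):
    """Cost: the average loss over all training examples, as one scalar."""
    losses = [squared_error(weight * x + bias, y)
              for x, y in zip(inputs, targets)]
    return sum(losses) / len(losses)


# Hypothetical training data generated by y = 2x + 1.
inputs = [0.0, 1.0, 2.0, 3.0]
targets = [1.0, 3.0, 5.0, 7.0]

good = cost(2.0, 1.0, inputs, targets)  # parameters that match the data
bad = cost(0.0, 0.0, inputs, targets)   # parameters that ignore the input
print(good, bad)  # 0.0 21.0
```

The optimizer's job is to move from parameter values like the second set toward values like the first, using the single scalar returned by the cost function as its guide.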

As we'll see in this blog, there are a number of cost functions you can use, and you can even customize your own. Choosing the right loss function for your use case is therefore as important as having good data labels for imposing subject matter expertise on a model. In other words, both are critical for reflecting what it means to have a correct model and what to optimize against. The model itself (i.e., the DNN operations) can then be thought of more or less as just a medium for holding and learning that information.

Common Loss Functions

Loss functions generally originate from different mathematical areas such as statistical analysis and information theory, and thus employ a variety of equations for calculating loss in different ways. It shouldn't come as any surprise, then, that each loss function has its pros and cons, and that selecting an appropriate loss function depends on many factors including the use case, type of data, and optimization method.

Loss functions generally fall under two categories: Classification and Regression losses. Classification seeks to predict a value from a finite set of categories, while the goal of regression is to predict a continuous value based on a number of parameters.

The following are some common loss functions that you'll find in PerceptiLabs:  

Classification Loss Functions:

  • Quadratic (aka mean squared error or MSE): averages the squared difference between predictions and ground truth, with a focus on the average magnitudes of errors regardless of direction [2].
  • Cross-Entropy (aka log loss): calculates the differences between the predicted class probabilities and those from ground truth across a logarithmic scale. Useful for classification tasks, including classifying the objects found in object detection.
  • Weighted Cross-Entropy: extends Cross-Entropy by assigning weights to classes (e.g., certain object classes) that are under-represented in the data (e.g., objects occurring in fewer data samples [3]). Useful for imbalanced datasets (e.g., when the backgrounds of images over-represent certain objects while objects of interest in the foreground are under-represented).
  • DICE: calculates the Dice coefficient which measures the overlap between the predicted and ground truth samples, where a result of 1 represents a perfect overlap. Useful for image segmentation.
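The classification losses listed above can be sketched in plain Python for a single example. This is an illustration under stated assumptions, not PerceptiLabs' implementation: the per-class weighting scheme, the small `eps` term that guards against log(0) and division by zero, and the convention of using 1 minus the Dice coefficient as the loss are all choices made for this sketch.

```python
import math


def cross_entropy(probs, one_hot, class_weights=None, eps=1e-12):
    """Cross-entropy for one example, with optional per-class weights."""
    if class_weights is None:
        class_weights = [1.0] * len(probs)
    return -sum(w * t * math.log(p + eps)
                for p, t, w in zip(probs, one_hot, class_weights))


def dice_coefficient(pred_mask, true_mask, eps=1e-12):
    """Overlap between two binary masks; 1.0 means perfect overlap."""
    intersection = sum(p * t for p, t in zip(pred_mask, true_mask))
    return (2.0 * intersection + eps) / (sum(pred_mask) + sum(true_mask) + eps)


# Hypothetical prediction over three classes; the true class is the first.
probs = [0.7, 0.2, 0.1]
one_hot = [1, 0, 0]
ce = cross_entropy(probs, one_hot)  # equals -log(0.7)
# Doubling the weight of the (assumed under-represented) first class
# doubles the penalty for getting it wrong.
wce = cross_entropy(probs, one_hot, class_weights=[2.0, 1.0, 1.0])

# Hypothetical flattened segmentation masks (1 = object pixel).
pred_mask = [1, 1, 0, 0]
true_mask = [1, 0, 0, 0]
dice = dice_coefficient(pred_mask, true_mask)  # 2 * 1 / (2 + 1)
dice_loss = 1.0 - dice  # lower is better, so it can be minimized
```

Note that the Dice coefficient itself rises as predictions improve, which is why this sketch subtracts it from 1 to get a quantity that an optimizer can minimize.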

Regression Loss Functions:

  • Mean Squared Error (MSE), Quadratic Loss, L2 Loss: averages the squared difference between predictions and ground truth, with a focus on the average magnitudes of errors regardless of direction.
  • Mean Absolute Error (MAE), L1 Loss (used by PerceptiLabs' Regression component): averages the absolute differences between the predictions and ground truth.
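A small comparison shows how the two regression losses treat the same errors differently. The data below is made up, with one deliberately large miss.

```python
def mean_squared_error(predictions, targets):
    """L2 loss: average of the squared differences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def mean_absolute_error(predictions, targets):
    """L1 loss: average of the absolute differences."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


# Hypothetical data: three perfect predictions and one that is off by 4.
targets = [1.0, 2.0, 3.0, 4.0]
predictions = [1.0, 2.0, 3.0, 8.0]

mse = mean_squared_error(predictions, targets)
mae = mean_absolute_error(predictions, targets)
print(mse, mae)  # 4.0 1.0
```

Because MSE squares each difference, the single large miss dominates the result, while MAE grows only linearly with it. This is one reason MAE is often considered less sensitive to outliers.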

Loss functions are used in a variety of use cases. The following table shows common image processing use cases where you might apply these, and other loss functions:


Use case                            | Quadratic | Cross-Entropy | Weighted Cross-Entropy | DICE | Regression | Other
Image Classification                | X         | X             | X                      | X    |            |
Image Segmentation                  |           |               | X                      | X    |            |
Optical Character Recognition (OCR) |           |               |                        |      |            | X (SSD Loss and CTC loss)
Object Detection                    |           | X             |                        |      | X          | X

Loss in PerceptiLabs

Configuring a loss function is extremely easy to do in PerceptiLabs – it's simply a matter of selecting the desired loss function in your model's Training component:

Figure 1: Selecting a loss function is easy to do in PerceptiLabs.


PerceptiLabs will then update the component's underlying TensorFlow code as required to integrate that loss function. For example, the following code snippet shows the code for a Training component configured with a Quadratic (MSE) loss function and an SGD optimizer:

# Defining loss function
loss_tensor = tf.reduce_mean(tf.square(output_tensor - target_tensor))

...

optimizer = tf.compat.v1.train.GradientDescentOptimizer(learning_rate=0.001)

layer_weight_tensors = {}
layer_bias_tensors = {}
layer_gradient_tensors = {}
for node in graph.inner_nodes:
    ...  # compute gradients

update_weights = optimizer.minimize(loss_tensor, global_step=global_step)

You can also easily customize the loss function by modifying the Training component's code. Simply configure and create a different loss function and pass it to optimizer.minimize(). For example, the following code creates a cross-entropy loss function:

# Defining loss function
n_classes = output_tensor.get_shape().as_list()[-1]
flat_pred = tf.reshape(output_tensor, [-1, n_classes])
flat_labels = tf.reshape(target_tensor, [-1, n_classes])
loss_tensor = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=flat_labels, logits=flat_pred))

...

update_weights = optimizer.minimize(loss_tensor, global_step=global_step)
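To see what a softmax cross-entropy loss computed from logits does numerically, here is a plain-Python sketch of the same calculation for a small batch. It is an illustration of the math only, not TensorFlow's implementation; the logits and labels are made up.

```python
import math


def softmax_cross_entropy(logits, one_hot):
    """Cross-entropy of softmax(logits) against a one-hot label."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    return -sum(t * lp for t, lp in zip(one_hot, log_probs))


# Hypothetical raw model outputs (logits) and one-hot labels for two examples.
batch_logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.0]]
batch_labels = [[1, 0, 0], [0, 1, 0]]

per_example = [softmax_cross_entropy(z, t)
               for z, t in zip(batch_logits, batch_labels)]
# Averaging over the batch mirrors the role of tf.reduce_mean above.
loss = sum(per_example) / len(per_example)
```

Working from logits rather than from already-normalized probabilities lets the log and the softmax be combined, which avoids taking the log of a value that has underflowed to zero.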

When you train your model in PerceptiLabs, the Loss tab on the statistics window shows you both the calculated loss during one epoch and the average loss over all epochs. Both are updated in real time as the model trains.

Figure 2: Viewing the loss during training in PerceptiLabs.

Since the goal is to minimize loss, you'll want to see the Loss over all epochs graph gradually decrease, which means the model's predictions are gradually improving at matching ground truth. Viewing this UI is one of the benefits of using PerceptiLabs, because you can quickly see if training is trending in the right direction, or if you should stop training and adjust your hyperparameters, change your loss function, or switch optimizers.
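If you log loss values yourself, the same "is it trending down?" judgment can be approximated in code. The sketch below is a hypothetical heuristic, not how PerceptiLabs computes its graphs: it averages the per-batch losses of each epoch, then compares the mean of the most recent epochs with the mean of the epochs just before them. The function names, the window size, and the loss values are all assumptions.

```python
def average_loss_per_epoch(batch_losses_by_epoch):
    """Collapse each epoch's per-batch losses into one average value."""
    return [sum(losses) / len(losses) for losses in batch_losses_by_epoch]


def is_trending_down(epoch_losses, window=3):
    """True if the last `window` epochs average lower than the `window` before."""
    if len(epoch_losses) < 2 * window:
        return False  # not enough history to judge
    recent = epoch_losses[-window:]
    previous = epoch_losses[-2 * window:-window]
    return sum(recent) / window < sum(previous) / window


# Hypothetical per-batch losses recorded over six epochs.
history = [[1.0, 0.8], [0.7, 0.5], [0.45, 0.35],
           [0.3, 0.3], [0.28, 0.22], [0.2, 0.2]]
epoch_losses = average_loss_per_epoch(history)

improving = is_trending_down(epoch_losses)  # loss keeps falling
stalled = is_trending_down([0.5] * 6)       # loss is flat
```

A flat or rising trend is a cue to revisit the hyperparameters, loss function, or optimizer, as described above.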

Conclusion

Loss functions play a key role when training your model, and are an essential component for your optimizer. While PerceptiLabs' UI makes choosing a loss function a small detail, knowing which loss function to choose for a given use case and model architecture is really a big deal.

For more information about loss functions, check out the following articles which do a great job of explaining some of them in detail:

And for those who are just starting out with ML or need a refresher on loss functions, be sure to check out Gradient descent, how neural networks learn. This is Part 2 of a great YouTube video series that explains how a neural network works, and this episode covers the role of loss functions in gradient descent.

[1] For an introduction or refresher on neural networks, check out this excellent YouTube video series.

[2] https://towardsdatascience.com/common-loss-functions-in-machine-learning-46af0ffc4d23

[3] https://arxiv.org/ftp/arxiv/papers/2006/2006.01413.pdf