Every industry has its lingo, and machine learning (ML) is no different. We thought we’d provide a brief checklist of key terms that every aspiring ML developer should know when building models. In case you missed it, you can also read our Tips to Steer Your Developer Career Towards Machine Learning, where we list some skills and tips for getting started in ML.
Layer and Model Types
Within ML, deep learning (DL) is the current state of the art for many tasks, and DL models are built on neural networks. A neural network comprises layers of neurons, whose values are stored as tensors. Neurons in early layers represent low-level features of the input (e.g., pixel values) and are connected to neurons in subsequent layers that represent higher-level features (e.g., groups of pixels). The connections between these layers carry weights, which are adjusted during training and ultimately used to make predictions.
Below are common types of neural network layers to become familiar with:
- Input Layer: the first layer of a neural network that represents the input data (e.g., image data).
- Output Layer: the final layer of a neural network that contains the model's result for a given input (e.g., image classification).
- Hidden Layer(s): the layers which sit between a neural network's input and output layers to perform transformations on the data.
- Dense (aka fully-connected): a layer in which every neuron is connected to every output of the previous layer.
- Convolution: a layer that slides small learnable filters (kernels) across its input to detect local features such as edges. It is the foundation of Convolutional Neural Networks (CNNs).
- Recurrent: a layer with a loop, whose input includes both the current data to analyze and the layer's own output from the previous step. Recurrent layers form the basis of Recurrent Neural Networks (RNNs).
- Residual Block: Residual Neural Networks (ResNets) group two or more layers into residual blocks, each with a skip (shortcut) connection that adds the block's input to its output. This helps mitigate the vanishing gradient problem in very deep networks.
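To make the layer types above more concrete, here is a minimal, framework-free sketch in plain Python. It is an illustration only: Keras and PerceptiLabs provide optimized versions of these layers, and the function names `dense`, `conv1d`, `run_rnn`, and `residual_block` are hypothetical. Each layer is reduced to its simplest form: a dense layer computes a weighted sum per neuron, a 1D convolution slides one small filter, a recurrent cell feeds its previous output back in, and a residual block adds its input to the output of two stacked layers.

```python
import math


def dense(x, weights, biases):
    """Dense layer: output neuron j computes sum_i weights[j][i] * x[i] + biases[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def conv1d(x, kernel):
    """1D convolution (no padding, stride 1): slide one small filter along the input.
    Like most DL frameworks, this is technically cross-correlation (kernel not flipped)."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Minimal scalar recurrent cell: each new state depends on the current
    input AND the previous state (the 'loop')."""
    h = 0.0  # initial hidden state
    states = []
    for x_t in sequence:
        h = math.tanh(w_x * x_t + w_h * h + b)
        states.append(h)
    return states


def relu(v):
    return [max(0.0, a) for a in v]


def residual_block(x, w1, b1, w2, b2):
    """Simplified residual block: two dense layers F(x) plus a skip connection."""
    f = dense(relu(dense(x, w1, b1)), w2, b2)
    return [fi + xi for fi, xi in zip(f, x)]  # output = F(x) + x


print(dense([1.0, 2.0], [[0.5, -1.0], [1.0, 1.0]], [0.0, 1.0]))  # [-1.5, 4.0]
print(conv1d([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0]))  # [-2.0, -2.0]
print(run_rnn([1.0, 0.0]))  # 2nd state is non-zero: the cell "remembers" the 1st input
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block([3.0, -1.0], zeros, [0.0, 0.0], zeros, [0.0, 0.0]))  # [3.0, -1.0]
```

With all of the residual block's weights set to zero, the block simply passes its input through. This identity path is commonly credited with letting gradients flow through very deep ResNets.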
ML practitioners combine different types of layers to architect neural networks in different ways. Common examples of such architectures include VGG16, ResNet50, InceptionV3, and MobileNetV2. Many of these models are provided by Keras Applications and are available through components in PerceptiLabs, allowing you to quickly build models using transfer learning.
A model's layers and their interconnections define its internal, or learnable, parameters, which are adjusted during training. Key parameters include:
- Weights: the degree to which a given neuron influences the neuron(s) it's connected to in the next layer.
- Biases: constants added to a neuron's weighted sum of inputs that shift the result up or down, making it easier or harder for the neuron to activate.
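A single neuron shows how weights and biases work together. The sketch below is plain Python with a sigmoid activation; the input and weight values are made up purely for illustration. It computes a weighted sum, adds the bias, and squashes the result. Raising the bias makes the neuron more likely to activate.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs, shifted by the bias, passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


inputs = [0.2, 0.4]
weights = [1.5, -0.5]  # how strongly each input influences this neuron
for bias in (-2.0, 0.0, 2.0):
    print(f"bias={bias:+.1f} -> activation={neuron(inputs, weights, bias):.3f}")
```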
During training, the underlying ML engine optimizes the model by adjusting its weights and biases. To do this, it employs the following algorithms:
- Optimizer: an algorithm that uses the gradients of the loss to update the model's weights and biases so that the model learns patterns in the data. Common algorithms include Adam, Stochastic Gradient Descent (SGD), Adagrad, and RMSprop.
- Activation Function: a mathematical function applied to a neuron's output that adds non-linearity, allowing the network to model non-linear relationships. Examples include Sigmoid, ReLU, and Tanh.
- Loss Function (aka Error function): measures how far the model's prediction for a single training sample is from the ground truth, given the current weights and bias values; the optimizer works to minimize it. Common loss functions include Cross-Entropy, Quadratic (squared error), and Dice.
- Cost Function: aggregates the loss across all training samples (typically as an average), gauging whether the optimizer's adjustments improve the model's overall performance.
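The four algorithms above can be seen working together on the smallest possible model: one sigmoid neuron with a single weight and bias, trained on a toy dataset. This is a sketch under simplifying assumptions: it uses plain full-batch gradient descent (the simplest relative of SGD, Adam, and the others) rather than any framework's optimizer, and the data and function names are hypothetical.

```python
import math


# Activation functions: each adds non-linearity to a neuron's output.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)
# (Tanh is available directly as math.tanh.)


# Loss functions: the error for ONE sample.
def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    y_pred = min(max(y_pred, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))


def quadratic(y_true, y_pred):
    return 0.5 * (y_true - y_pred) ** 2  # the 0.5 is a common convention


# Cost function: the average loss over ALL training samples.
def cost(data, w, b):
    return sum(binary_cross_entropy(y, sigmoid(w * x + b)) for x, y in data) / len(data)


# Optimizer: one full-batch gradient descent step for a single sigmoid neuron.
def gradient_descent_step(data, w, b, learning_rate):
    grad_w = grad_b = 0.0
    for x, y in data:
        error = sigmoid(w * x + b) - y  # d(loss)/dz for sigmoid + cross-entropy
        grad_w += error * x
        grad_b += error
    n = len(data)
    return w - learning_rate * grad_w / n, b - learning_rate * grad_b / n


data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]  # toy data: label is 1 when x > 0
w, b = 0.0, 0.0
print("cost before training:", cost(data, w, b))  # ln(2) ~ 0.693, a coin-flip guess
for _ in range(100):
    w, b = gradient_descent_step(data, w, b, learning_rate=0.5)
print("cost after training: ", cost(data, w, b))
```

For a sigmoid output paired with cross-entropy loss, the gradient with respect to the neuron's pre-activation reduces to prediction minus label, which is why `gradient_descent_step` needs no explicit derivative of the sigmoid.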
Training typically involves partitioning the data into three sets:
- Training Data: used to optimize the model's weights and biases by providing ground truth data (e.g., images with corresponding labels which are known to be correct).
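- Validation Data: used to evaluate the model while it's being trained (e.g., to compare hyperparameter settings and detect overfitting); it is not used to update the weights and biases.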
- Test Data: used to test the performance of the final trained model on data which it hasn't seen before.
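A common way to produce the three sets is to shuffle the samples once and then slice them. The sketch below assumes a 70/15/15 split and a fixed random seed for reproducibility; both are illustrative choices rather than rules, and `train_val_test_split` is a hypothetical helper.

```python
import random


def train_val_test_split(samples, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle once, then slice into disjoint test, validation, and training sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed => reproducible split
    n_test = round(len(items) * test_frac)
    n_val = round(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```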
It's important to avoid underfitting during training (i.e., a model too simple to capture the patterns in the data), as well as overfitting (i.e., a model that learns the training data too closely, including its noise, and therefore predicts poorly on new, unseen data). ML practitioners can adjust a number of user-controllable settings called hyperparameters to develop as good a model as possible:
- Batch Size: the number of training samples to process before the optimizer updates the model's parameters.
- Epochs: number of passes to make over the entire training dataset (including all batches).
- Loss: loss function algorithm to use.
- Learning Rate: how much to change the model's parameters in response to the estimated error each time they are updated (i.e., the size of each optimizer step).
- Optimizer: optimizer algorithm to use.
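- Shuffle: randomizes the order of the training data (typically every epoch) so the model doesn't learn patterns from the ordering itself, making it more robust.
- Dropout: a regularization technique that reduces overfitting by randomly ignoring (zeroing) a fraction of a layer's outputs during training; the dropout rate sets that fraction. Each update to a layer is then effectively performed on a different view of that layer.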
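Several of these hyperparameters show up directly in the structure of a training loop. The following plain-Python sketch uses hypothetical helper names and toy integer "samples". It shows how batch size and epochs determine how many times the optimizer updates the parameters, where shuffling happens, and how inverted dropout (the variant commonly used in practice) zeroes outputs during training. A real framework would run the forward pass, loss, and optimizer step inside the inner loop.

```python
import math
import random


def minibatches(samples, batch_size, shuffle, rng):
    """Yield successive batches of samples; an optimizer would update once per batch."""
    order = list(range(len(samples)))
    if shuffle:
        rng.shuffle(order)  # a fresh random order on every call (i.e., every epoch)
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: during training, zero each output with probability `rate`
    and scale survivors by 1 / (1 - rate) so the expected value is unchanged.
    At inference time, outputs pass through untouched."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
samples = list(range(10))
epochs, batch_size = 3, 4
updates = 0
for epoch in range(epochs):
    for batch in minibatches(samples, batch_size, shuffle=True, rng=rng):
        # A real loop would run the forward pass, compute the loss, and apply an
        # optimizer step scaled by the learning rate here.
        updates += 1
print("parameter updates:", updates)  # 9
print("expected:", epochs * math.ceil(len(samples) / batch_size))  # 9
print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, rng=rng))
```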
This checklist is certainly not exhaustive – we could easily write a whole set of books about the fundamentals! However, these key elements are a good start.