Creating a machine learning (ML) model involves many decisions, such as what data to analyze, which approach to employ (e.g., a neural network), and what type of result to generate (e.g., probabilities or classifications). The parts of an ML model that define its structure and behavior are its hyperparameters.
You can think of these hyperparameters as the model's configuration settings, which specify how the model is laid out and how it will work. Hyperparameters are often referred to as the model's external parameters because they cannot be estimated from the data during an ordinary training run (i.e., training that does not include hyperparameter tuning). Hyperparameters can be binary (true/false values), continuous (floats), or discrete (e.g., integers or enumerations). Examples of hyperparameters include the number of layers in a model, the number of epochs to run during training, and the loss function to employ.
Models also consist of model parameters. These are the model's internal parameters that are estimated during training, such as the weights calculated by the model for a given layer. How these internal parameters are set depends on how the model's hyperparameters are configured. Note that hyperparameters are sometimes also referred to as model parameters, which can make things confusing. A good rule to follow to avoid confusion is that if a given parameter has to be manually configured, it's most likely a hyperparameter.
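To make the distinction concrete, here is a minimal sketch in plain Python. It assumes a toy one-variable linear model trained with gradient descent; the data, the names, and the values are all illustrative and have nothing to do with PerceptiLabs' internals. The learning rate and epoch count are hyperparameters set by hand, while `w` and `b` are model parameters estimated from the data.

```python
# Hyperparameters: chosen by hand before training starts.
LEARNING_RATE = 0.05
EPOCHS = 500

# Toy data generated from y = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]


def train(xs, ys, learning_rate, epochs):
    """Fit y = w*x + b with batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # model parameters: estimated from the data
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = train(xs, ys, LEARNING_RATE, EPOCHS)
print(f"learned w={w:.3f}, b={b:.3f}")
```

Changing `LEARNING_RATE` or `EPOCHS` changes how (and whether) training arrives at good values for `w` and `b`, but training itself never modifies those two settings.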
The selection and adjustment of hyperparameters has a direct and significant impact on how the model performs. Thus, a process called hyperparameter optimization is used to select and tune the hyperparameter values that give the trained model the best accuracy.
Given all of the variations of hyperparameter values, there can easily be a combinatorial explosion of configurations. Numerous optimization methods can be employed, ranging from manual approaches such as simple trial and error to automatic approaches such as grid search. Currently PerceptiLabs supports manual methods, but will soon support automatic methods including Bayesian Optimization, which is considered a state-of-the-art approach.
Given the number of hyperparameters that you may want to tune, knowing where to start can be overwhelming. With this in mind, here are five hyperparameters commonly found in ML projects and how you can configure them in PerceptiLabs' visual tool for machine learning.
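The combinatorial growth, and what a grid search does about it, can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the search space is hypothetical, and `evaluate` is a made-up stand-in for "train the model and measure validation accuracy" so that the sketch runs without any ML library.

```python
from itertools import product

# Hypothetical search space: names and values are illustrative only.
search_space = {
    "num_layers": [2, 4, 8],
    "learning_rate": [0.1, 0.01, 0.001],
    "optimizer": ["sgd", "adam", "rmsprop"],
    "dropout": [True, False],
}


def grid(space):
    """Yield every combination of hyperparameter values as a dict."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def evaluate(config):
    """Made-up scoring rule standing in for a real train-and-validate run."""
    score = 0.5
    score += 0.1 if config["optimizer"] == "adam" else 0.0
    score += 0.1 if config["learning_rate"] == 0.01 else 0.0
    score += 0.05 if config["dropout"] else 0.0
    score += 0.01 * config["num_layers"]
    return score


configs = list(grid(search_space))
best = max(configs, key=evaluate)
print(len(configs), "configurations to try")  # 3 * 3 * 3 * 2 = 54
print("best:", best)
```

Four hyperparameters with two or three candidate values each already produce 54 training runs, which is why exhaustive search becomes expensive quickly and smarter strategies are attractive.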
1. Model Architecture
The model architecture is itself a hyperparameter that is inherent in all ML models. This broad hyperparameter is composed of numerous settings such as the number of layers, methods for processing the data (e.g., reshaping), and other components that dictate how the actual data analysis will be done.
In PerceptiLabs, users start by dragging and dropping components into their workspace and creating connections between them to form the basis of the model architecture. From those components, numerous parameters can be exposed and set through either user interface fields or programmatically.
2. Model Complexity
Model complexity refers to the number of model parameters (i.e., weights) that will be available in each layer. Tuning the model's complexity requires a careful balance. While more complex models may produce better accuracy, they also demand more computation and memory from the hardware.
In PerceptiLabs, complexity is controlled indirectly via Deep Learning components (the ones with red icons), whose various hyperparameters affect the sizes of the layers that those components define. For example, you can specify the number of feature maps in a Convolution Layer component, which in turn sets the complexity and the number of model parameters. Another example is the Fully Connected Layer's Neurons parameter, which sets the number of neurons in the layer and, together with the size of the layer's input, determines the number of weights.
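As a rough illustration of how these settings translate into parameter counts, the sketch below uses the standard formulas for a 2D convolution layer and a fully connected layer, each with a bias term. The function names and the example sizes are hypothetical and are not part of PerceptiLabs.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, feature_maps, bias=True):
    """Trainable parameters in a standard 2D convolution layer."""
    weights = kernel_h * kernel_w * in_channels * feature_maps
    return weights + (feature_maps if bias else 0)


def dense_params(inputs, neurons, bias=True):
    """Trainable parameters in a fully connected layer."""
    return inputs * neurons + (neurons if bias else 0)


# Doubling the number of feature maps doubles the layer's parameter count.
print(conv2d_params(3, 3, 3, 16))  # 3*3*3*16 + 16 = 448
print(conv2d_params(3, 3, 3, 32))  # 3*3*3*32 + 32 = 896

# A 784-input layer with 128 neurons.
print(dense_params(784, 128))      # 784*128 + 128 = 100480
```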
3. Loss Function and Learning Rate
A loss function maps decisions to associated costs (i.e., it scores how far the model's predictions are from the expected results), and the learning rate (aka step size) defines how much to change the model based on the estimated error when weights are updated. These are used by the model's optimizer (discussed below) to control the training process.
One or both of these hyperparameters may be set in PerceptiLabs' various Training components (the ones with the green icons).
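The interplay between the two can be seen in a small, self-contained sketch. It assumes a one-weight model `y = w * x`, mean squared error as the loss, and plain gradient descent; the data and the three learning rates are illustrative choices, not recommendations.

```python
# Toy data generated from y = 2x (illustrative only).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]


def mse_loss(w):
    """Mean squared error of the one-weight model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w):
    """Derivative of the loss with respect to w."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def fit(learning_rate, steps=20):
    """Run gradient descent from w = 0 and return the final weight and loss."""
    w = 0.0
    for _ in range(steps):
        w -= learning_rate * mse_grad(w)  # the learning rate scales every update
    return w, mse_loss(w)


initial_loss = mse_loss(0.0)
results = {lr: fit(lr) for lr in (0.01, 0.1, 0.3)}
for lr, (w, loss) in results.items():
    print(f"lr={lr}: w={w:.4g}, loss={loss:.4g}")
```

With these particular numbers, the smallest rate is still approaching the solution after 20 steps, the middle rate has essentially converged, and the largest rate overshoots on every step so the loss grows instead of shrinking.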
4. Optimizer
Not to be confused with hyperparameter optimization, a model's optimizer is the algorithm that updates the weights of the model during training, using the output of the loss function along with other model parameters.
The optimizer is specified by selecting an algorithm in PerceptiLabs' Training components. PerceptiLabs currently allows for the selection of SGD (stochastic gradient descent) and the following variants: Adam (adaptive moment estimation), Momentum, and RMSprop (root mean square propagation). Users also have the option to specify any TensorFlow optimizer by modifying the code in a Training component.
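To show what "updating the weights" means in practice, here is a simplified sketch of two update rules, plain gradient descent and the classical momentum variant, applied to a toy loss `f(w) = (w - 3)^2`. These are textbook formulations written in plain Python as an assumption for illustration; the TensorFlow optimizers used in a Training component differ in their details.

```python
def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


def run_sgd(lr, steps):
    """Plain gradient descent: step against the gradient each time."""
    w = 0.0
    for _ in range(steps):
        w = w - lr * grad(w)
    return w


def run_momentum(lr, steps, beta=0.9):
    """Momentum: accumulate a velocity so consistent gradients build up speed."""
    w, velocity = 0.0, 0.0
    for _ in range(steps):
        velocity = beta * velocity - lr * grad(w)
        w = w + velocity
    return w


w_sgd = run_sgd(lr=0.01, steps=50)
w_momentum = run_momentum(lr=0.01, steps=50)
print(f"SGD:      w={w_sgd:.3f}")
print(f"Momentum: w={w_momentum:.3f}")
```

With the same learning rate and step budget, the momentum variant ends closer to the minimum at `w = 3` on this toy problem, which illustrates why the choice of optimizer is worth tuning alongside the learning rate.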
5. Regularization
Regularization is a process to reduce overfitting. PerceptiLabs currently supports random dropout as a binary hyperparameter in Deep Learning components, which enables or disables dropout for the model. We also have plans to support L1 and L2 regularization, as well as Batch Normalization.
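For readers unfamiliar with dropout, the following is a minimal sketch of the common "inverted dropout" formulation in plain Python. The function name, the `rate` value, and the `training` flag are assumptions for illustration and do not describe how PerceptiLabs implements the feature.

```python
import random


def dropout(activations, rate, training, rng):
    """Inverted dropout: zero each activation with probability `rate`
    during training and scale the survivors by 1 / (1 - rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # dropout is disabled at inference time
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # seeded only to make the sketch repeatable
layer_output = [0.5, 1.0, 1.5, 2.0]

train_out = dropout(layer_output, rate=0.5, training=True, rng=rng)
infer_out = dropout(layer_output, rate=0.5, training=False, rng=rng)
print("training: ", train_out)
print("inference:", infer_out)
```

Scaling the surviving activations keeps their expected value unchanged, so the layer can be used as-is once dropout is switched off.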
Hyperparameters are fundamental building blocks that exist in all machine learning models to control their structure and performance. Through PerceptiLabs' inherently visual workflow, it's quick and easy to locate and tune hyperparameter values.
Based on the five hyperparameters we discussed above, we encourage you to explore the rich set of components in PerceptiLabs and discover even more hyperparameters that you can tune for your model. If you have any questions or comments on the subject, be sure to share them through our community resources such as our Slack workspace.