In our blog The Importance of Transparency in Machine Learning Models, we talked about how transparency in machine learning (ML) models helps us to build an understanding of the model, provide insight into why it’s generating certain results, and ultimately increase our trust that the model will perform as expected in the real world. However, achieving such transparency in a typical real-world ML workflow requires the right processes and tools to be in place.
At PerceptiLabs, we believe the best approach for achieving transparency is being able to “visualize” your model and how it performs, which is why our modeling tool’s workflow, user interface, and functionality are based around this philosophy. In this blog, we will review a typical ML workflow and take a closer look at some of the key features of PerceptiLabs’ visual modeling tool that can help you and the different members of your team increase transparency for your ML workflows.
Elements of a Typical ML Workflow
The diagram below shows the elements of a typical ML workflow with three key phases:
- Data Management: data is obtained and validated to ensure it’s suitable for training.
- Model Management: the ML model is developed and trained.
- Serving Management: the model is deployed in the real world and monitored to ensure it continues to predict as expected.
It’s also important to note that the workflow doesn’t end after deployment. The ML workflow is iterative because factors such as model decay, changes to business requirements, or simply another round of enhancements can make it necessary to retrain the model, or even to redesign and build a new one, often around new data.
While iterating through the ML workflow, there are a lot of moving parts spread across different types of contributors, all of which need to be aligned around the model. Most importantly, an iteration needs to be repeatable, which means the model and the surrounding processes need to be well understood. Focusing on repeatability from the start of your project is essential for facilitating experimentation, handling new business requirements, and even surviving turnover of team members.
This means that haphazard approaches, which may have sufficed for developing deterministic systems, are not going to cut it for ML. If you’ve been around development for a while, the “visualization” of your model was probably spread across disparate resources spanning code, design documents, emails, whiteboard drawings, obfuscated build scripts, and other artifacts.
Without some sort of tool around which everyone can collaborate on the model, relying on such resources can make your ML workflow difficult to repeat and error-prone.
In recognizing this, and considering that ML is still in its infancy, we wanted a way for ML teams to do things right. That is why we are focused on enabling two particular processes within the Model Management phase of the ML workflow: “Design & Build”, and “Train & Tune”. Let’s take a closer look at how we do this.
Visually Designing and Building a Model Architecture
The design and construction of your model’s architecture sits at the core of your ML solution, so the ability to see the big picture of the architecture at a glance is essential. Having a visual “blueprint” of the model allows all team members, regardless of their knowledge levels, to immediately understand what is being built and how it’s put together. In recognizing the importance of this, we built the UI and workflow in our modeling tool around a visual representation of the model:
The screenshot above shows a simple ML model example in PerceptiLabs that solves the classic problem of identifying digits from a collection of bitmaps of hand-drawn numbers.
Without even knowing the ins and outs of how to solve this problem, it’s easy for any team member to see that the model starts with two sets of data that flow through some processes and eventually converge. And even with just a little bit of knowledge about the model, any team member can quickly investigate to see what training and label data is being used, the transformative processes involved, and how everything is configured.
Of course the real power comes from the details, which is where our visual modeling tool really shines. Team members can dig in further to obtain varying degrees of information to further their understanding of the model’s elements that are important to them.
For example, a data scientist can double-click the top (blue) Data component in the model to see or set the data source:
Along with this data, they can also see and tweak how much of the data is being used for training, validation, and testing. In the example above, that split defaults to 70/20/10, respectively.
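To make the idea of a 70/20/10 split concrete, here is a minimal sketch in plain Python. It is not how PerceptiLabs implements the split; the function name `split_dataset`, the shuffling step, and the fixed seed are all assumptions made for illustration.

```python
import random


def split_dataset(samples, fractions=(0.7, 0.2, 0.1), seed=0):
    """Shuffle samples and split them into training, validation, and test sets.

    `fractions` is (train, validation, test) and must sum to 1.
    The function name, shuffling, and seed are illustrative assumptions.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * fractions[0])
    n_val = round(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))  # 700 200 100
```

Fixing the seed is a deliberate choice here: the same samples land in the same subsets on every run, which supports the repeatability discussed earlier.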
A data scientist might also double-click the model’s Reshape component to reconfigure the dimensions of the data, and then see a preview showing the results of their tweak:
This example shows what the data looks like after it has been processed by the Reshape component. In this case, the data has been transformed from a one-dimensional array of normalized values, into a two-dimensional array of grayscale values representing the digit “1”.
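The reshape described here can be sketched in a few lines of plain Python. This is only an illustration of the transformation, not the code PerceptiLabs generates; the helper `reshape_2d` is hypothetical, and the 28x28 image size is an assumption (it is the usual size for handwritten-digit datasets, but the post does not state it).

```python
def reshape_2d(flat, height, width):
    """Turn a flat sequence into a list of `height` rows, each `width` values long."""
    if len(flat) != height * width:
        raise ValueError("flat length must equal height * width")
    return [list(flat[r * width:(r + 1) * width]) for r in range(height)]


# Assumed input: 784 normalized pixel values for one digit (28 * 28 = 784).
flat_pixels = [0.0] * 784
image = reshape_2d(flat_pixels, 28, 28)
print(len(image), len(image[0]))  # 28 28
```

With an array library, the same step is typically a single reshape call; the list-based version is used here only to keep the sketch dependency-free.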
Obviously, a model isn’t just about the components, parameters, and connections. In PerceptiLabs’ visual modeling tool, each component also distills down to code behind the scenes. Programmers will likely be interested in the model’s underlying Python code, and in how the data scientists’ tweaks affected that code. To see this, you can double-click any component and select the Code tab to view and edit the code:
This example shows the code for the Reshape component. Here, a programmer can immediately see both the APIs being invoked and the “tweaked” input values being passed in, and they can modify the code for that component as they see fit. They can also programmatically access any variable from the previous component via the “X” variable.
At a higher level, a technical lead or project manager might view the model as a whole, watching for the addition or removal of nodes, or changes to connections, as they oversee development of the model. Similarly, they might start with a blank slate to architect some or all of the model, after which team members tweak and program the various components. Alternatively, they might open other models to visually compare their designs and inspect their parameter settings.
Visually Training and Tuning the Model
Training a model is an essential step in ML, because it prepares the model for inference. And training is always an iterative process as team members experiment with the model to tune values and code, run training epochs, compare results, and repeat the whole process until they’re happy with the model’s performance.
Those responsible for training the model will be particularly interested in the rich set of statistics displayed as graphs and charts by PerceptiLabs. The Statistics pane is displayed as soon as training begins and is updated each epoch:
It consists of five tabs that provide information about prediction, accuracy, loss, F1, and AUC.
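For readers less familiar with these statistics, the sketch below shows how two of them, accuracy and F1, are conventionally computed for a binary classifier. These are the standard textbook definitions, not PerceptiLabs’ own implementation, and the function names and sample labels are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
print(accuracy(y_true, y_pred))
print(f1_score(y_true, y_pred))  # 0.75
```

Accuracy counts every correct prediction equally, while F1 focuses on one class, which is why the two can diverge when classes are imbalanced.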
The updates to the graphs during each epoch are particularly useful, because they allow you to see how the weights and biases are being set during training. Using this feature, you should ideally see that prediction starts to approach ground truth over the epochs and across repeated training and tuning sessions.
You also have the option to pause training, which is handy if you want to spend some time reviewing the statistics for a given iteration in an epoch. This is useful when trying to detect overfitting, for example when validation results stop improving while training results keep getting better.
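One common way to spot overfitting from loss curves is to look for epochs where validation loss rises while training loss keeps falling. The sketch below applies that heuristic to two lists of per-epoch losses. It is an assumption-laden illustration, not a PerceptiLabs feature: the function `first_overfit_epoch`, the `patience` parameter, and the loss values are all hypothetical.

```python
def first_overfit_epoch(train_loss, val_loss, patience=2):
    """Return the first epoch of a run of `patience` consecutive epochs in which
    validation loss rose while training loss fell, or None if no such run exists.
    Epochs are zero-indexed positions in the loss lists.
    """
    run = 0
    for epoch in range(1, min(len(train_loss), len(val_loss))):
        diverging = (val_loss[epoch] > val_loss[epoch - 1]
                     and train_loss[epoch] < train_loss[epoch - 1])
        run = run + 1 if diverging else 0
        if run >= patience:
            return epoch - patience + 1
    return None


# Made-up loss curves: validation loss turns upward from epoch 3 onward.
train_loss = [0.90, 0.60, 0.40, 0.30, 0.20, 0.15]
val_loss = [0.95, 0.70, 0.55, 0.60, 0.66, 0.70]
print(first_overfit_epoch(train_loss, val_loss))  # 3
```

Requiring several consecutive diverging epochs avoids flagging a single noisy epoch as overfitting.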
Perhaps one of the coolest aspects of pausing is that you can view the output and variables of any component of the model while paused. When training is launched, PerceptiLabs’ Map pane provides a miniature view of the model below the stats, where you can click on components and see their output for the current epoch in the ViewBox to the right of the Map:
In this example, we’ve clicked on the Convolution component, for which PerceptiLabs shows the weights and output of the convolution for the current iteration of an epoch. The ability to view output at such a granular level enables contributors to better experiment with, understand, and analyze the model while training and tuning.
Also present during training is the Metrics pane, which developers may find useful for understanding the memory and compute performance of the underlying framework (e.g., TensorFlow):
When training is complete, a summary of the result is shown in a popup that includes the accuracy and loss that occurred during training and validation of the model:
Data scientists will be particularly interested in this data, and may want to record it for comparison in future experiments.
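If you do want to keep such results around, one lightweight option is to append each run’s summary to a JSON Lines file and compare runs later. The sketch below is our own illustration, not something PerceptiLabs provides: the function names, the file layout, and metric keys such as `val_accuracy` are all hypothetical.

```python
import json
from pathlib import Path


def record_run(log_path, name, metrics):
    """Append one run's summary metrics to a JSON Lines log file."""
    entry = {"run": name, **metrics}
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def load_runs(log_path):
    """Read every recorded run back as a list of dictionaries."""
    with Path(log_path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_run(runs, key="val_accuracy"):
    """Return the run with the highest value for `key` (a hypothetical metric name)."""
    return max(runs, key=lambda run: run[key])
```

A typical use would be to call `record_run("runs.jsonl", "baseline", {"val_accuracy": 0.97, "val_loss": 0.09})` after each training session, then pass `load_runs("runs.jsonl")` to `best_run` when comparing experiments. The file name and metric values here are placeholders.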
Transparency is essential for explainability, experimentation, and repeatability of the ML workflow, and from the beginning, our belief has been that “visualization” is the key to enabling transparency. PerceptiLabs has evolved into a visual modeling tool where all of the resources and output from the model are available for all team members to see.
We encourage you to try our free version and follow the guided tutorial included in the application, which shows how straightforward it is to build and train a model in PerceptiLabs.