A generative adversarial network (GAN) is a powerful approach to machine learning (ML). At a high level, a GAN is simply two neural networks that feed into each other. One produces increasingly accurate data while the other gradually improves its ability to classify such data.
In this blog we'll dive a bit deeper into how this mechanism works, where you might use it, and the potential ramifications of using a GAN in the real world.
A GAN in a Nutshell
The first neural network in a GAN is called the generator. It starts with random input[1] and repeatedly generates data that approaches the quality of real-world data. It does this by sending its output to another neural network, the discriminator, which gradually improves its ability to distinguish that output from the training data and feeds its result (a classification) back to the generator. From an implementation standpoint, the generator and discriminator each have their own loss function, and the discriminator's loss includes the term that scores the generator's output. The discriminator's classification appears in both loss functions, and the weights are updated through backpropagation during training.
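To make the two loss functions a little more concrete, here is a minimal sketch in plain Python. It assumes the discriminator outputs a probability that a sample is real and that both losses are binary cross-entropy. The function names are our own for illustration, and the generator loss shown is the common "non-saturating" variant rather than the only possible formulation.

```python
import math

def bce(prediction, target, eps=1e-12):
    """Binary cross-entropy for one predicted probability and a 0/1 target."""
    p = min(max(prediction, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def discriminator_loss(d_real, d_fake):
    """d_real / d_fake: discriminator probabilities for real and generated samples."""
    real_term = sum(bce(p, 1) for p in d_real) / len(d_real)  # real should score 1
    fake_term = sum(bce(p, 0) for p in d_fake) / len(d_fake)  # fake should score 0
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: low when the discriminator calls fakes real."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)

# The discriminator is better off when it scores fakes low...
loss_confident = discriminator_loss([0.9, 0.8], [0.1, 0.2])
loss_fooled = discriminator_loss([0.9, 0.8], [0.7, 0.6])
# ...while the generator is better off when those same scores are high.
gen_when_caught = generator_loss([0.1, 0.2])
gen_when_fooling = generator_loss([0.7, 0.6])
```

Note how both losses read the same discriminator scores for generated data but pull them in opposite directions, which is where the competition comes from.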
This design is illustrated in the following flow chart:
The adversarial aspect of a GAN is that the discriminator's results may be fed back into itself for self-improvement, back into the generator to improve the generator's output, or both. In this sense, the generator's ability to improve its output competes with the discriminator's ability to classify data as training progresses. Moreover, because the discriminator automatically trains a generative model (i.e., the generator), the GAN effectively turns an otherwise unsupervised ML process into an automated, supervised ML process.
To support such functionality, the generator is commonly built using an inverse convolutional neural network (sometimes called a deconvolutional network, and typically made of transposed convolution layers), because of that network's ability to generate data (e.g., upsampling feature maps to create new images). The discriminator is often built using a regular convolutional neural network (CNN) because of its ability to break data (e.g., images) down into feature maps, and ultimately to classify data (e.g., to determine whether an image is real or fabricated). Note that GANs can also be built using other types of neural network layers.
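The difference between the two directions can be sketched in one dimension with plain Python. This is only an illustration under simplifying assumptions (a single channel, no padding, no bias, a fixed kernel rather than learned weights), and the function names are hypothetical. A real model would use a framework's 2D layers instead.

```python
def conv_transpose_1d(signal, kernel, stride=2):
    """Upsample `signal` by scattering a scaled copy of `kernel` per input value."""
    out_len = (len(signal) - 1) * stride + len(kernel)
    out = [0.0] * out_len
    for i, value in enumerate(signal):
        for j, weight in enumerate(kernel):
            out[i * stride + j] += value * weight
    return out

def conv_1d(signal, kernel, stride=2):
    """Downsample `signal` with a strided sliding dot product (no padding)."""
    steps = (len(signal) - len(kernel)) // stride + 1
    return [
        sum(signal[s * stride + j] * kernel[j] for j in range(len(kernel)))
        for s in range(steps)
    ]

feature_map = [1.0, 2.0, 3.0]

# Generator direction: 3 values grow into 6.
upsampled = conv_transpose_1d(feature_map, kernel=[1.0, 0.5])

# Discriminator direction: 6 values shrink back to 3.
downsampled = conv_1d(upsampled, kernel=[1.0, 0.5])
```

The transposed convolution grows a small feature map into a larger output, while the regular strided convolution condenses an input into a smaller feature map that a classifier can work with.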
At the end of the training process, an ML practitioner might use the fully-trained generator, discriminator, or both components for inference, depending on what real-world problem they are trying to solve.
The "Hello World" of GANs
In the context of GANs, a good "hello world"[2] project can be created around the MNIST dataset, a collection of images of handwritten digits ranging from 0 through 9. Users who are learning neural networks for the first time often use this dataset as input to tackle the problem of classifying the digits represented in those images.
This problem can be extended into a starting point for learning about GANs. Here, the goal is to gradually generate new images of handwritten digits that approach or even match the quality and style of those in the MNIST dataset, while also improving the ability to classify whether a given image was generated by the GAN or is in fact a real-world image. The GAN for such a problem would look as follows:
The generator is seeded with random noise and generates an image of a handwritten digit. At this point the output is probably poor, since an untrained generator is unlikely to turn that noise into anything resembling a handwritten digit. This output is then fed to the discriminator along with images from the MNIST dataset (the training data). The discriminator in this example is a binary classifier, labeling a given image as either a real-world image or a fake (generated) image.
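As a rough sketch of that data flow, the snippet below draws random noise for the generator and assembles a labeled batch for the discriminator. The names, the latent size of 16, and the placeholder images are all assumptions made for illustration. In a real project the fake images would come from the generator and the real ones from MNIST.

```python
import random

LATENT_DIM = 16  # hypothetical latent-vector size, chosen only for this sketch

def sample_latent(batch_size, dim=LATENT_DIM, rng=random):
    """Draw one Gaussian noise vector per image the generator should produce."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(batch_size)]

def build_discriminator_batch(real_images, fake_images, rng=random):
    """Label real images 1 and generated images 0, then shuffle them together."""
    batch = [(img, 1) for img in real_images] + [(img, 0) for img in fake_images]
    rng.shuffle(batch)
    return batch

rng = random.Random(0)                    # fixed seed so the sketch is repeatable
noise = sample_latent(4, rng=rng)
fake = [[0.0] * 784 for _ in noise]       # placeholder for generator(noise)
real = [[1.0] * 784 for _ in range(4)]    # placeholder for 28x28 MNIST images
batch = build_discriminator_batch(real, fake, rng=rng)
```

Each item in `batch` pairs a flattened image with the binary label the discriminator is trained to predict.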
The discriminator's classification of the generator's output is then fed back into the generator, and the process repeats, hopefully improving the generator's next output. At this point the discriminator may also feed its output back into itself, along with more training data, to further improve its ability to classify images.
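The feedback loop described above can be sketched with a deliberately tiny stand-in for the MNIST GAN. Everything here is an assumption made for illustration: the "images" are single numbers drawn from a Gaussian centered at 4, the generator is a line `a*z + b`, the discriminator is a logistic unit `sigmoid(w*x + c)`, and the gradients are written out by hand. It shows the alternating updates only and makes no promise about convergence.

```python
import math
import random

def sigmoid(s):
    s = max(-60.0, min(60.0, s))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-s))

def train_toy_gan(steps=200, batch=32, lr=0.05, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generator parameters: g(z) = a*z + b
    w, c = 0.1, 0.0   # discriminator parameters: D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]   # "training data"
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]  # latent input
        fake = [a * z + b for z in noise]

        # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
        gw = gc = 0.0
        labeled = [(x, 1.0) for x in real] + [(x, 0.0) for x in fake]
        for x, target in labeled:
            err = sigmoid(w * x + c) - target  # d(cross-entropy)/d(logit)
            gw += err * x
            gc += err
        w -= lr * gw / len(labeled)
        c -= lr * gc / len(labeled)

        # Generator step: the discriminator's verdict on the fakes flows back
        # through w to update a and b, pushing D(fake) toward 1.
        ga = gb = 0.0
        for z in noise:
            err = sigmoid(w * (a * z + b) + c) - 1.0
            ga += err * w * z
            gb += err * w
        a -= lr * ga / batch
        b -= lr * gb / batch
    return a, b, w, c

a, b, w, c = train_toy_gan()
print(f"generator: a={a:.3f}, b={b:.3f}; discriminator: w={w:.3f}, c={c:.3f}")
```

The key point is the ordering inside the loop: the discriminator learns from both real and generated samples, and the generator then learns only from how the discriminator scored its output.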
Training a GAN can take a long time, from a few hours to several days, depending on the data, the compute resources available, and the level of accuracy that the ML practitioner is trying to achieve. An idealized case is to train until the discriminator misclassifies images around 50% of the time (i.e., it does no better than guessing), at which point ML practitioners often assume that the generator is outputting plausible data. However, ML practitioners may train to different levels of accuracy depending on their needs.
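One simple way to monitor that idealized stopping point is to track the discriminator's accuracy on a mixed batch and check whether it has drifted close to chance. The helper names, the 0.5 decision threshold, and the 5% tolerance below are illustrative assumptions, not a standard recipe.

```python
def discriminator_accuracy(scores, labels, threshold=0.5):
    """Fraction of samples whose thresholded score matches the 0/1 label."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)

def near_chance(accuracy, tolerance=0.05):
    """True when accuracy is within `tolerance` of 50%, i.e. close to guessing."""
    return abs(accuracy - 0.5) <= tolerance

# Scores are the discriminator's "probability real"; labels are 1=real, 0=fake.
scores = [0.9, 0.4, 0.6, 0.3]
labels = [1, 1, 0, 0]
accuracy = discriminator_accuracy(scores, labels)
can_stop = near_chance(accuracy)
```

In practice, accuracy on a single small batch is noisy, so a check like this would usually be averaged over many batches and combined with a visual inspection of the generated images.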
Uses and Ethics
The power of GANs opens up a whole world of possibilities. GANs can be used for a wide variety of image processing tasks, such as translating photos taken in the summer to look like they were taken in the winter, or similarly from day to night. They can also generate photorealistic images of objects, scenes, and people that many of us would mistake for real ones. GANs can be used for similar translations in audio, and are also being used to help identify different types of cyber threats.
If you're starting to become a bit spooked by the possibilities, you're not alone. Since data generated by GANs can be indistinguishable from real-world data, you don't have to ponder long to realize the potential implications. It's therefore important that GANs be put to good use, in ways that benefit society.
GANs are a powerful and clever way to harness the power of neural networks. By simultaneously training one neural network to create increasingly plausible data, while the other increases its ability to classify such data, ML practitioners can potentially build solutions which perform feats that seem like magic.
In the near future we will be releasing a new version of our PerceptiLabs visual modeling tool that includes built-in support for GANs. So keep an eye out for this announcement, and as always, check out our community resources for more information.
[1] The input to a generator is often called the latent space.
[2] A "hello world" project is the name given to your first project in a given framework or development environment (e.g., a programming language) to prove that you can build and run it.