9th February 2021
Because if you don’t write it down, it’s not science.
A GAN, or generative adversarial network, is a type of neural network designed by Goodfellow et al. in 2014. To go into their origins and inner workings now would be redundant; there are plenty of resources that will introduce the topic far better than I could, and at your required level of explanation. This article simply documents my own experiments with this architecture, both for my own reference and for anyone who may stumble upon it while frantically searching for a solution to this incredibly niche and specific task.
You might ask, why am I drawn to GANs? To me, artificial intelligence is sometimes greatly exaggerated. What we call AI today is largely the result of linear algebra and statistics, and while it can produce amazing results, it can drift from the true essence of ‘intelligence’. A CNN identifying a dog or a cat may seem impressive, but it feels intuitive to understand what is going on under the hood (i.e. a load of statistics and pattern identification). NLP is cool, but it doesn’t understand the language it’s using; it can’t attend to the meaning of new words. With GANs and other generative algorithms there is a sense of getting out more than you put in, and it’s hard to intuitively grasp what’s going on behind the maths and statistics that underlie their operation (don’t get me wrong, maths is important and I could explain mathematically how a GAN works, but that won’t explain why it does what it does on a holistic level). This makes them feel more intelligent. (NB: I have referenced feeling a lot, and if you think science and data should be divorced from feeling and qualia then you might disagree with my arguments, but it’s these feelings and intuitions that lead to the greatest discoveries and are yet so unexplained by science. Maybe I’m too much of a dualist…) A big component of the definition of intelligence is creativity: the ability to create new things or transfer knowledge from one area to another. GANs aren’t doing exactly that, but I would argue they are being creative in a very narrowly defined way.
And so onto this project. Being my first GAN experiment, I started with an easily obtainable dataset with a large amount of resources and code available. I chose a Kaggle competition (see References) designed to introduce the topic of GANs. However, to make it more interesting, I decided not to follow the code templates provided and to attempt a slightly different task. The tutorials implement style transfer via the CycleGAN architecture, whereas I chose to generate entirely new images with a standard deep convolutional GAN (DC-GAN). The competition provided the dataset: 300 images of paintings by Monet at 256x256 resolution. This wasn’t ideal, as neural networks generally require much larger sets of training data (for example, the Tensorflow DC-GAN tutorial (see References) uses the MNIST dataset, which has over 70,000 examples). However, for my purposes, which were to understand how a GAN works practically in code and to get an intuition for the aspects and procedures of training one, it was suitable enough. Plus I wouldn’t have to wait 9 days for it to train.
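Before the experiments, it’s worth pinning down the objective being optimised. Below is a minimal sketch of the standard GAN losses (discriminator binary cross-entropy plus the non-saturating generator loss from Goodfellow et al.), computed with NumPy for a pair of illustrative discriminator outputs. The probabilities here are made up for the example, not taken from these experiments:

```python
import numpy as np

def bce(p, label):
    # Binary cross-entropy for a single probability p against target label.
    eps = 1e-12  # avoid log(0)
    return -(label * np.log(p + eps) + (1 - label) * np.log(1 - p + eps))

# Suppose the discriminator assigns probability 0.9 to a real Monet
# and 0.2 to a generated image.
d_real, d_fake = 0.9, 0.2

# The discriminator wants real -> 1 and fake -> 0.
d_loss = bce(d_real, 1) + bce(d_fake, 0)

# The generator (non-saturating form) wants the fake scored as real -> 1.
g_loss = bce(d_fake, 1)

print(round(d_loss, 3), round(g_loss, 3))
```

The tug-of-war between these two losses is the whole training dynamic: every experiment below is really just an attempt to keep this game balanced.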
Experiment 1.1
Batch size = 32, small learning rates (generator LR = 0.00001, discriminator LR = 0.0001), hybrid architecture from various tutorials, high dropout
This was the original schematic I used while developing the GAN, designed from a combination of the Tensorflow tutorial and random code snippets from various sources. Initially it didn’t show anything more than vague shapes, and only in the 3-colour palette that matplotlib uses for a non-RGB image (as the template I was following was for greyscale images). With some tweaking and experimenting with the data types, I was able to produce images with similar colours to the data. This shows the network is learning something, which at the time felt like a huge accomplishment on par with cavemen learning to create fire. You can see from the examples above that the colours are clearly influenced by the colour palette of Monet. The textures also seemed slightly reminiscent of the paintings, but there is still a large amount of the checkerboard pattern visible. As for shape and high-level detail, these are blatantly non-existent. Additionally, the information in the latent space seems to have nearly collapsed into one image. This was alleviated somewhat with a dropout layer rate of 0.5, but it is still a problem. Whether this constitutes mode collapse is debatable, as the network hasn’t learned to reproduce a single image yet, and so early stopping, normally a way to mitigate mode collapse, wouldn’t be effective here.
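The checkerboard pattern mentioned above is a well-known artefact of transposed convolutions: when the kernel size isn’t divisible by the stride, some output pixels receive contributions from more kernel applications than others, producing a regular banding. A toy 1-D sketch (the sizes are illustrative, not my actual layer settings) makes the uneven overlap visible:

```python
import numpy as np

def overlap_counts(out_len, kernel, stride):
    # Count how many kernel applications touch each output position of a
    # 1-D transposed convolution; uneven counts -> checkerboard banding.
    counts = np.zeros(out_len, dtype=int)
    for i in range(0, out_len - kernel + 1, stride):
        counts[i:i + kernel] += 1
    return counts

# kernel=5, stride=2: 5 is not divisible by 2, so interior coverage alternates.
print(overlap_counts(11, kernel=5, stride=2))

# kernel=4, stride=2: the interior is covered evenly.
print(overlap_counts(10, kernel=4, stride=2))
```

The usual remedies are choosing kernel sizes divisible by the stride, or replacing transposed convolutions with upsampling followed by an ordinary convolution.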
Experiment 1.2
Batch size of 1
A disaster. But we learn from failures as much as we learn from successes.
Experiment 1.3
Updated architecture, dropout rate of 0.1
The updated architecture involved playing with the earlier convolutional layers and increasing the number of neurons in the initial dense (or fully-connected) layer. Results suggest that it may pick up slightly more shape information and be less prone to the checkerboard and ladder patterns seen in previous iterations. A lower dropout was also applied to see the effects.
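To give a sense of what “increasing the number of neurons in the initial dense layer” costs, here is a back-of-the-envelope parameter count for the dense projection from the noise vector to the first feature map. The sizes are illustrative, not the exact ones from this experiment:

```python
# Parameter cost of widening the generator's initial dense projection.
# Assume a 128-d noise vector projected to an 8x8 feature map.
noise_dim = 128

def dense_params(noise_dim, h, w, channels):
    units = h * w * channels
    return noise_dim * units + units  # weights + biases

small = dense_params(noise_dim, 8, 8, 128)   # original width
large = dense_params(noise_dim, 8, 8, 256)   # doubled channel count
print(small, large)
```

Doubling the channel count doubles this layer’s parameters, so the dense projection quickly becomes one of the heaviest parts of a small DC-GAN.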
Experiment 1.4
Variable dropout rate for each layer between 0.3–0.5
This experiment again focussed on the dropout as it seemed that the lack of differentiation within the latent space was influenced by the dropout. This time I tried a gradually decreasing dropout as we go deeper into the network. Results were not impressive, and even seem worse than the previous one. This highlights the temperamental nature of GANs and how mysterious and quite frankly irritating they are. Below is secret bonus experiment 1.4.5 which I refuse to comment on.
Experiment 1.5
Dropout rate back to 0.5 minus last layers
Here we can see some pleasant results. The colours of Monet’s paintings really came through well and there is some slight differentiation within the latent space (the low-dimensional space of noise vectors from which the generator maps out images of the dataset’s distribution). In terms of shape, let’s just say it’s a good job Monet was an impressionist.
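A common way to probe how much differentiation a latent space actually has is to walk a straight line between two latent vectors and generate an image at each step; if the space has (near-)collapsed, the images along the path barely change. A sketch of building such an interpolation path (the latent dimension here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two random points in the latent space.
z0 = rng.standard_normal(128)
z1 = rng.standard_normal(128)

# Linear interpolation between them; each row would be fed to the
# generator to render one frame of the walk.
alphas = np.linspace(0.0, 1.0, 8)
path = np.stack([(1 - a) * z0 + a * z1 for a in alphas])

print(path.shape)  # (8, 128)
```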
Experiment 1.6
No batch normalisation
This experiment began with the hypothesis that perhaps the lack of shape and detail within the output so far was due to the GAN generalising too much over the images and trying to reproduce the mean of the data (which is a vague blur). This is still entirely possible as there is very little consistency in terms of shape within the paintings, as opposed to something like a face dataset where the outline and general shape of a face would be largely consistent between images and give the network something specific to learn. To reduce this possible over-generalisation I removed the batch normalisation within the generator network. Results show that it makes the network slightly unstable, especially during the early phases of training where random bright flashes of colour appear. The images do seem to have slightly more detail towards the end of training though, and the colours are more varied, but nothing I would describe as having a ‘shape’.
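For reference, this is roughly what each BatchNormalization layer was doing before I removed it: normalising every channel over the batch and spatial dimensions (the learnable scale and shift that follow are omitted here for brevity):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalise each channel over the batch and spatial dimensions,
    # as a BatchNormalization layer does before its scale/shift.
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=2.0, size=(32, 8, 8, 16))  # batch of feature maps
y = batch_norm(x)

# Each channel now has roughly zero mean and unit variance; without this
# step, early-training statistics can swing wildly (the bright flashes).
print(float(y.mean()), float(y.std()))
```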
Experiment 1.7
Noise dimension increased to 256 (previously 128), discriminator architecture updated
Furthering the efforts to produce non-blurry images and prevent the quasi-mode collapse we’ve seen so far, I developed new hypotheses. Perhaps the discriminator wasn’t functioning correctly and couldn’t ‘see’ the finer details, and so wasn’t providing any useful feedback to the generator about shape and fine detail. And maybe the noise dimension wasn’t big enough to capture the complexity of Monet’s paintings, so the latent space was getting entangled. I attempted to test these hypotheses by modifying those aspects of the network. The results show that the network does seem to be considering different parts of the image independently now, as there is more variation between regions, but no shape emerges. The latent space still seems void of differentiation.
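The “can the discriminator see fine details” hypothesis can be made a little more concrete with a receptive-field calculation: each strided convolution widens the patch of input pixels that a single output unit depends on. The layer settings below are an illustrative DC-GAN-style stack, not my exact discriminator:

```python
def receptive_field(layers):
    # layers: list of (kernel, stride) pairs, input to output.
    # Standard recurrence: rf grows by (k - 1) * jump at each layer,
    # where jump is the product of all previous strides.
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

# Four 5x5 convolutions with stride 2: each output unit sees a
# 61x61 patch of the 256x256 input.
print(receptive_field([(5, 2)] * 4))
```

A final global layer (dense or pooling) still mixes these patches together, but the per-unit receptive field bounds how locally the discriminator can judge texture.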
Experiment 1.8
Batch normalisation returns, sigmoid activation added to the final layer of the discriminator, learning rates changed (G LR = 0.001, D LR = 0.0005)
These changes were taken from a notebook by a Kaggle user. The increases in learning rate seem to allow the GAN to explore more colour and shape variation, although whether this creates a more desirable output is questionable.
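One pitfall worth flagging with this change: once a sigmoid is added to the discriminator’s final layer, the binary cross-entropy loss must be configured to expect probabilities rather than raw logits (in Keras terms, `from_logits=False`). Squashing an already-sigmoided output a second time quietly distorts the loss, as this small sketch shows:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

logit = 2.0
p = sigmoid(logit)  # discriminator output after its sigmoid layer

# Correct: treat p as a probability (from_logits=False) for a 'real' target.
loss_prob = -math.log(p)

# Incorrect: treat p as if it were still a raw logit (from_logits=True),
# which applies sigmoid a second time and flattens the loss surface.
loss_double = -math.log(sigmoid(p))

print(round(loss_prob, 4), round(loss_double, 4))
```

The mismatch doesn’t crash anything, which is exactly why it’s easy to carry over silently when mixing code from different notebooks.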
Experiment 1.9
Back to initial learning rates, sigmoid removed
The final experiment hoped to combine the best hyperparameter values and the most successful architectural elements to produce the best possible output. While the colours are impressive and there is variation within the image, there is still little in the way of high-level shape, clearly defined texture or fine detail.
Final Remarks
These experiments sought to explain and document the process of building and training a GAN to produce images in the vein of Monet’s impressionist paintings. While largely unsuccessful, a lot can be learned about the mechanisms of the generative adversarial network and how tricky they can be to train. By way of explanation for the failure to produce a high-fidelity image with a reasonable level of detail, I can only suggest that the GAN is in fact working correctly: it has simply learnt the mean of the data (provided below). If this is true, then the solution is to try again with a different, more consistent, larger dataset. Once we can ascertain that the data is suitable, we can go back to tweaking hyperparameters and fine-tuning the network.
By Alex Whelan