Standard data augmentation techniques.
First of all, by standard data augmentation I'll be referring to techniques like flipping an image (up/down, left/right), affine transformations (rotations, translations, crops, shears, etc.), adjusting the brightness/contrast of an image, adding noise to the image (salt & pepper, Gaussian, etc.), and so on.
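As a concrete illustration, here is a minimal NumPy sketch of a few of these transformations. The helper names are my own, and the sketch assumes single-channel `uint8` images in the usual [0, 255] range:

```python
import numpy as np

def flip_lr(img):
    # Mirror the image left/right (columns reversed)
    return img[:, ::-1]

def flip_ud(img):
    # Mirror the image up/down (rows reversed)
    return img[::-1, :]

def adjust_brightness(img, delta):
    # Shift all pixel intensities by `delta`, clipping to the valid range
    return np.clip(img.astype(np.int16) + delta, 0, 255).astype(np.uint8)

def add_gaussian_noise(img, sigma, rng=None):
    # Add zero-mean Gaussian noise with standard deviation `sigma`
    rng = np.random.default_rng() if rng is None else rng
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def salt_and_pepper(img, p, rng=None):
    # Set a fraction p/2 of pixels to black and p/2 to white
    rng = np.random.default_rng() if rng is None else rng
    out = img.copy()
    mask = rng.random(img.shape)
    out[mask < p / 2] = 0
    out[mask > 1 - p / 2] = 255
    return out
```

In practice you would compose several of these with random parameters each time an image is drawn, which is what makes standard augmentation cheap to apply on-line during training.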
Before describing the pros/cons of standard vs GAN augmentation, we should note why data augmentation is effective in the first place. In short, deep neural networks have the capacity to memorize smaller datasets, which leads them to overfit; they benefit from more images and from more variety in those images. Data augmentation is a way of generating new images from the existing ones that preserve the semantic content of the originals. E.g. if I have a cat image and flip it, it is still a cat; the network, however, treats it as a new image. These techniques are so effective that they are even used on large datasets, which don't suffer from the problems above, to boost performance even further.
What are the problems of standard data augmentation techniques?
The main issue is that the augmentation strategies we can use depend on the input images. For instance, MNIST, one of the most popular datasets in machine learning, consists of handwritten digits; here we can't flip the images or rotate them too much without changing their meaning. Another case is medical images, which adhere to strict formats: MRIs, for example, are centered, aligned, laterally/horizontally asymmetric, and somewhat normalized with respect to brightness and contrast. This severely limits which augmentations we can apply and makes their use ad-hoc in most cases.
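To make the problem-dependence concrete, a "safe" policy for digit images might allow only tiny translations and forbid flips and large rotations entirely (a flipped or heavily rotated "6" is no longer a "6"). A minimal sketch, with illustrative helper names and assuming single-channel arrays:

```python
import numpy as np

def shift(img, dy, dx):
    # Translate the image by (dy, dx) pixels, padding the exposed border with zeros
    out = np.zeros_like(img)
    h, w = img.shape
    ys_src = slice(max(-dy, 0), min(h, h - dy))
    ys_dst = slice(max(dy, 0), min(h, h + dy))
    xs_src = slice(max(-dx, 0), min(w, w - dx))
    xs_dst = slice(max(dx, 0), min(w, w + dx))
    out[ys_dst, xs_dst] = img[ys_src, xs_src]
    return out

def digit_safe_augment(img, rng=None):
    # A conservative policy for digits: shifts of at most 2 pixels,
    # no flips, no large rotations, so the label is guaranteed to survive
    rng = np.random.default_rng() if rng is None else rng
    dy, dx = rng.integers(-2, 3, size=2)
    return shift(img, int(dy), int(dx))
```

The same code applied to, say, natural photos would be overly conservative, and applied to tightly formatted MRIs even a 2-pixel shift might be questionable; that is exactly the ad-hoc, per-problem tuning described above.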
Cons of using standard data augmentation techniques
- Might damage the semantic content of the image (e.g. rotating too much might cause a "6" to turn into a "9", or translating too much might cause the object of interest to fall out of the image).
- Augmentation schemes are dependent on the problem.
- Empirical/Ad-hoc application.
- They are naive: each transformation looks at one image at a time and can't exploit information from the whole dataset.
These drawbacks might motivate us to use a more advanced data augmentation technique, i.e. generating synthetic images with GANs. In fact, GAN augmentation, if done properly, will solve all of these problems.
However, they too have their drawbacks.
Cons of using GANs for data augmentation
- They require training. Training a GAN can take a long time and isn't trivial (instability and mode collapse are common issues).
- They can't be applied on-line: instead of transforming each batch on the fly, after training you need to generate a pool of synthetic images and add it to your original dataset.
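The off-line workflow from the last point can be sketched as follows. This is a hedged illustration, not a full GAN: `generator` stands in for an already-trained generator network (any callable mapping latent vectors to images), and the function names are my own:

```python
import numpy as np

def generate_pool(generator, n_images, latent_dim, rng=None):
    # Sample latent vectors and run the (assumed pre-trained) generator
    # once, producing a fixed pool of synthetic images
    rng = np.random.default_rng() if rng is None else rng
    z = rng.normal(size=(n_images, latent_dim))
    return generator(z)

def augment_dataset(real_images, real_labels, synth_images, synth_label):
    # Concatenate the synthetic pool onto the original dataset;
    # here all synthetic images are assumed to share one class label
    images = np.concatenate([real_images, synth_images], axis=0)
    labels = np.concatenate(
        [real_labels, np.full(len(synth_images), synth_label)]
    )
    return images, labels
```

Note the contrast with standard augmentation: the pool is generated once before training, so it does not grow or change per epoch the way on-line random transformations do.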
One final remark
Even though I was comparing one technique against the other, I'd like to point out that using one does not exclude the other. In fact, we found that combining standard and GAN-based augmentation helps more than either one individually.
If you're interested in more, you can read this study we did, which focuses on the use of GANs for data augmentation in medical images.