Inception Score (IS) and Fréchet Inception Distance (FID), which one is better for GAN evaluation?

Question

IS uses two criteria to measure the performance of a GAN: the quality of the generated images, and their diversity, based on the entropy of the distribution of the synthetic data.

On the other hand, FID uses the Inception network to extract features from an intermediate layer.

But how do I know which one to use in a given situation? Are there any comparisons between them, or recommendations for usage?

score 5 · Answer 1 · edited Jul 27 '20 at 04:03

Here is the original paper proposing FID.

Here are excerpts from Jason Brownlee's tutorial (https://machinelearningmastery.com/how-to-implement-the-frechet-inception-distance-fid-from-scratch/), along with a couple of quotes from the paper:

The inception score estimates the quality of a collection of synthetic images based on how well the top-performing image classification model Inception v3 classifies them as one of 1,000 known objects. The scores combine both the confidence of the conditional class predictions for each synthetic image (quality) and the integral of the marginal probability of the predicted classes (diversity).
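To make the quality/diversity combination concrete, here is a minimal sketch of the usual formulation, IS = exp(mean over images of KL(p(y|x) || p(y))), where p(y) is the average of the per-image predictions. It assumes you already have the softmax outputs of the classifier for each generated image; the function name `inception_score` and the `eps` smoothing term are my own choices for illustration, not something from the tutorial or the paper.

```python
import math

def inception_score(probs, eps=1e-12):
    """Illustrative IS from class probabilities (hypothetical helper).

    probs: list of rows, one per generated image; each row is p(y|x),
           the classifier's softmax output, and sums to 1.
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y): average of the conditional rows.
    marginal = [sum(row[j] for row in probs) / n for j in range(k)]
    # Mean KL divergence KL(p(y|x) || p(y)) over all images.
    mean_kl = 0.0
    for row in probs:
        mean_kl += sum(
            p * (math.log(p + eps) - math.log(m + eps))
            for p, m in zip(row, marginal)
        )
    mean_kl /= n
    return math.exp(mean_kl)

# Confident predictions spread over all 3 classes: score near 3 (the maximum).
sharp_and_diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Confident predictions that all land on one class: score near 1 (the minimum).
collapsed = [[1.0, 0.0, 0.0]] * 3
print(inception_score(sharp_and_diverse), inception_score(collapsed))
```

The score ranges from 1 up to the number of classes. Note that nothing in the computation looks at real images, which is the limitation the next excerpt points out.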

The inception score does not capture how synthetic images compare to real images. The goal in developing the FID score was to evaluate synthetic images based on the statistics of a collection of synthetic images compared to the statistics of a collection of real images from the target domain.

For the evaluation of the performance of GANs at image generation, we introduce the “Frechet Inception Distance” (FID) which captures the similarity of generated images to real ones better than the Inception Score.

Like the inception score, the FID score uses the inception v3 model. Specifically, the coding layer of the model (the last pooling layer prior to the output classification of images) is used to capture computer-vision-specific features of an input image. These activations are calculated for a collection of real and generated images. The activations are summarized as a multivariate Gaussian by calculating the mean and covariance of the images. These statistics are then calculated for the activations across the collection of real and generated images. The distance between these two distributions is then calculated using the Frechet distance, also called the Wasserstein-2 distance.

The difference of two Gaussians (synthetic and real-world images) is measured by the Frechet distance also known as Wasserstein-2 distance.
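The distance between the two Gaussians described above can be sketched in a few lines of NumPy: FID = ||mu_r - mu_f||^2 + Tr(S_r + S_f - 2 (S_r S_f)^(1/2)). This assumes the activations have already been extracted as arrays of shape (n_images, n_features); the function name `frechet_distance` is hypothetical, and the random arrays in the usage lines are stand-ins for real Inception activations. Common implementations take the matrix square root with `scipy.linalg.sqrtm`; I swap in an eigenvalue computation here, which gives the same trace and keeps the sketch NumPy-only.

```python
import numpy as np

def frechet_distance(act_real, act_fake):
    """Illustrative Frechet distance between Gaussians fitted to two
    activation sets of shape (n_images, n_features) (hypothetical helper)."""
    # Summarize each set by its mean vector and covariance matrix.
    mu_r, mu_f = act_real.mean(axis=0), act_fake.mean(axis=0)
    sigma_r = np.atleast_2d(np.cov(act_real, rowvar=False))
    sigma_f = np.atleast_2d(np.cov(act_fake, rowvar=False))

    diff = mu_r - mu_f
    # Tr((sigma_r sigma_f)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of the product, which are real and non-negative in theory.
    # Clip small negative values caused by floating-point error.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_f)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_f) - 2.0 * tr_sqrt)

# Stand-in data: in practice these would be Inception v3 pooling activations.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
print(frechet_distance(real, real))        # identical sets: approximately 0
print(frechet_distance(real, real + 3.0))  # shifted mean: larger distance
```

Lower is better, and unlike IS the result depends on a reference set of real images, so it only makes sense to compare FID values computed against the same real data and feature extractor.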

can we use other models than InceptionNet? ResNet or MobileNet? any nets trained on imagenet ? — Giang Nguyen, Mar 12 '20 at 05:30
and from your comment, can I compute IS or FID scores on CIFAR or MNIST dataset? I am asking that because the InceptionNet is trained on 1000 classes ImageNet dataset. — Giang Nguyen, Mar 12 '20 at 05:50
