Here is the original paper proposing FID.
Here is an excerpt from Jason Brownlee's article https://machinelearningmastery.com/how-to-implement-the-frechet-inception-distance-fid-from-scratch/, along with a couple of quotes from the paper:
The inception score estimates the quality of a collection of synthetic images based on how well the top-performing image classification model Inception v3 classifies them as one of 1,000 known objects. The scores combine both the confidence of the conditional class predictions for each synthetic image (quality) and the integral of the marginal probability of the predicted classes (diversity).
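The score described above is usually written as IS = exp(E_x[KL(p(y|x) ‖ p(y))]), where p(y|x) is the classifier's class distribution for image x and p(y) is the marginal class distribution over the whole generated set. The score is high when each p(y|x) is confident (quality) and p(y) is spread out (diversity). Below is a minimal NumPy sketch. It assumes the class probabilities have already been produced by a classifier (Inception v3 in the original setup, not shown here). It also omits the averaging over several splits that reference implementations typically perform.

```python
import numpy as np


def inception_score(probs: np.ndarray, eps: float = 1e-12) -> float:
    """Inception score from an (N, K) array of class probabilities p(y|x).

    Each row is assumed to be a probability distribution over K classes
    (K = 1000 for an ImageNet classifier such as Inception v3).
    """
    # Marginal class distribution p(y): average of the conditionals over all images.
    p_y = probs.mean(axis=0, keepdims=True)
    # Per-image KL divergence KL(p(y|x) || p(y)); eps guards against log(0).
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)
    # IS is the exponential of the mean KL divergence.
    return float(np.exp(kl.mean()))


# Confident and diverse predictions (one image per class) give IS close to K.
print(inception_score(np.eye(4)))
# Identical, uninformative predictions give IS close to 1.
print(inception_score(np.full((5, 4), 0.25)))
```

The two extremes bracket the score: it ranges from 1 up to the number of classes.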
The inception score does not capture how synthetic images compare to real images. The goal in developing the FID score was to evaluate synthetic images based on the statistics of a collection of synthetic images compared to the statistics of a collection of real images from the target domain.
"For the evaluation of the performance of GANs at image generation, we introduce the “Frechet Inception Distance” (FID) which captures the similarity of generated images to real ones better than the Inception Score."
Like the inception score, the FID score uses the Inception v3 model. Specifically, the coding layer of the model (the last pooling layer prior to the output classification of images) is used to capture computer-vision-specific features of an input image. These activations are calculated for a collection of real and generated images.
The activations are summarized as a multivariate Gaussian by calculating their mean and covariance. These statistics are calculated separately for the collection of real images and the collection of generated images.
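The summary step is short in code. Below is a minimal NumPy sketch. It assumes the activations have already been extracted into an (N, D) array, with one row per image. For the final pooling layer of Inception v3, D is 2048. The feature extractor itself is not shown, and the function name `activation_statistics` is hypothetical.

```python
import numpy as np


def activation_statistics(acts: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Summarize an (N, D) activation matrix as a Gaussian (mean, covariance).

    Rows are images and columns are feature dimensions.
    """
    mu = acts.mean(axis=0)              # (D,) mean activation vector
    sigma = np.cov(acts, rowvar=False)  # (D, D) sample covariance across images
    return mu, sigma


# Stand-in activations; real ones would come from the Inception v3 pooling layer.
rng = np.random.default_rng(0)
mu_real, sigma_real = activation_statistics(rng.normal(size=(500, 8)))
print(mu_real.shape, sigma_real.shape)  # (8,) (8, 8)
```

`rowvar=False` tells `np.cov` that variables are columns. Its default normalization divides by N - 1. The same function is applied once to the real set and once to the generated set.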
The distance between these two distributions is then calculated using the Frechet distance, also called the Wasserstein-2 distance.
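For two Gaussians, this distance has a closed form. Let μ_r, Σ_r be the mean and covariance of the real-image activations and μ_g, Σ_g those of the generated-image activations. These symbol names are chosen here for illustration. The FID is the squared distance:

```latex
\mathrm{FID} = d^2\big((\mu_r, \Sigma_r), (\mu_g, \Sigma_g)\big)
  = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)
```

The first term compares the means. The trace term compares the covariances. Both terms vanish when the two Gaussians are identical, so lower values indicate generated images whose feature statistics are closer to the real ones.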
"The difference of two Gaussians (synthetic and real-world images) is measured by the Frechet distance also known as Wasserstein-2 distance."
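The closed-form Fréchet distance between two Gaussians can be sketched with NumPy and SciPy's `scipy.linalg.sqrtm`. This is a minimal sketch, assuming the means and covariances of the real and generated activations are already available. The function name `frechet_distance` is hypothetical. The fallback that adds a small `eps` to the covariance diagonals when the square root is not finite is a common numerical safeguard, not something stated in the text above.

```python
import numpy as np
from scipy import linalg


def frechet_distance(mu1: np.ndarray, sigma1: np.ndarray,
                     mu2: np.ndarray, sigma2: np.ndarray,
                     eps: float = 1e-6) -> float:
    """Squared Frechet (Wasserstein-2) distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # Matrix square root of the product of the two covariance matrices.
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if not np.isfinite(covmean).all():
        # Near-singular product: nudge both covariances and retry.
        offset = np.eye(sigma1.shape[0]) * eps
        covmean = linalg.sqrtm((sigma1 + offset) @ (sigma2 + offset))
    # Numerical error can leave tiny imaginary parts; keep the real part.
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2)
                 - 2.0 * np.trace(covmean))


# Same covariance, means (0, 0) and (3, 4): only the mean term contributes, 3^2 + 4^2.
print(frechet_distance(np.zeros(2), np.eye(2), np.array([3.0, 4.0]), np.eye(2)))
```

To obtain the FID, pass the mean and covariance of the real-image activations as one pair and those of the generated-image activations as the other. Identical statistics give a distance of zero.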