I understand how KL divergence provides us with a measure of how one probability distribution is different from a second, reference probability distribution. But why is it used in particular (instead of cross-entropy) in VAEs (which are generative)?
- related: https://stats.stackexchange.com/questions/489087/why-kl-divergence-instead-of-cross-entropy-in-vae – denfromufa Aug 03 '22 at 15:49
1 Answer
Answering with some theoretical background on variational autoencoders (VAEs).
In a standard encoder–decoder architecture, the encoder maps the input to a point in a latent space, and the decoder reconstructs the input from that latent representation.
In a variational autoencoder (VAE), however, the input is encoded to a latent distribution rather than to a single point in the latent space. This latent distribution is modeled as a Gaussian (expressed in terms of a mean and a variance), and the decoder samples a point from it to reconstruct the input. Because the VAE encoder outputs a distribution rather than a point, KL divergence, which measures the difference between two distributions, is used as a regularization term in the loss function: it penalizes the encoded distribution for straying from the assumed Gaussian prior.
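As a concrete illustration, here is a minimal sketch of a VAE loss in PyTorch. The names (`recon_x`, `x`, `mu`, `logvar`) are illustrative and assume an encoder that outputs the mean and log-variance of the latent Gaussian; the KL term uses the standard closed form for a diagonal Gaussian measured against a standard normal prior.

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, logvar):
    # Reconstruction term: how well the decoder reproduces the input
    # (binary cross-entropy is a common choice for inputs in [0, 1]).
    recon = F.binary_cross_entropy(recon_x, x, reduction="sum")

    # KL divergence between the encoder's Gaussian q(z|x) = N(mu, sigma^2)
    # and the standard normal prior p(z) = N(0, I). For a diagonal Gaussian
    # this has the closed form: -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

    # The KL term acts as a regularizer on the latent distribution,
    # while the reconstruction term drives fidelity to the input.
    return recon + kl
```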
- This does not answer the question of why cross-entropy loss is not used instead of KL. – denfromufa Aug 03 '22 at 15:48