Implementation Considerations
Markov chains are inherently sequential: the state at step t is defined in terms of the state at step t-1. A long Markov chain is therefore a long sequence of calculations, each of which needs the previous state before it can start. Because of this dependency, the steps of a chain cannot be computed in parallel, unlike the gradient and inference computations in a neural net.
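To make the contrast concrete, here is a minimal sketch in NumPy. Everything in it is an illustrative assumption rather than anything from the paper: the two-state transition matrix `P`, the helper names `sample_chain` and `forward_batch`, and the single tanh layer standing in for a generator network.

```python
import numpy as np

def sample_chain(P, x0, n_steps, rng):
    """Sample a trajectory from a discrete Markov chain with transition matrix P."""
    states = np.empty(n_steps + 1, dtype=int)
    states[0] = x0
    for t in range(1, n_steps + 1):
        # Step t needs the state from step t-1, so this loop
        # cannot be vectorized or parallelized over t.
        states[t] = rng.choice(P.shape[0], p=P[states[t - 1]])
    return states

def forward_batch(W, Z):
    """One feed-forward layer applied to a whole batch of inputs at once."""
    # Rows of Z are independent, so a single matrix product handles all of them.
    return np.tanh(Z @ W)

rng = np.random.default_rng(0)

# Hypothetical two-state chain: 1000 samples require 1000 dependent steps.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
trajectory = sample_chain(P, x0=0, n_steps=1000, rng=rng)

# Hypothetical one-layer "generator": 1000 samples from one batched call.
Z = rng.standard_normal((1000, 4))
W = rng.standard_normal((4, 3))
samples = forward_batch(W, Z)
```

The loop in `sample_chain` is the bottleneck: no matter how much hardware is available, step t has to wait for step t-1, whereas the rows of `Z` can all be processed simultaneously.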
Architecture Considerations
In the paper they state several issues with Markov chains:
- "[...] methods based on Markov chains require that the distribution be somewhat blurry in order for the chains to be able to mix between modes."
- In addition, in Section 6 they state that Markov chains require "blurry" data distributions as opposed to GANs, which can represent "sharp" distributions as well.
- Section 2, first paragraph: They talk about Markov chains as a means of approximating the partition function of Deep/Restricted Boltzmann Machines, which would otherwise be intractable. However, they state that mixing is a problem here. I am not sure what they mean by mixing here.
- As I understand the caption of Figure 2, they state that Markov chain mixing leads to correlated samples. This might be due to the seed, i.e., the initial state, you must provide in order to sample from a Markov chain.
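One common reading of "mixing" is how readily a chain moves between the modes of the distribution it samples from. The sketch below illustrates that reading and is not the paper's experiment: it runs a random-walk Metropolis sampler on a two-mode Gaussian mixture, once with wide ("blurry") modes and once with narrow ("sharp") ones. The target, the mode locations at -3 and 3, the step size, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def log_target(x, sigma):
    """Unnormalized log density of an equal mixture of N(-3, sigma^2) and N(3, sigma^2)."""
    return np.logaddexp(-(x + 3.0) ** 2 / (2 * sigma**2),
                        -(x - 3.0) ** 2 / (2 * sigma**2))

def metropolis(sigma, n_steps, rng, x0=3.0, step=1.0):
    """Random-walk Metropolis chain started at x0; returns all visited states."""
    xs = np.empty(n_steps)
    x = x0
    for t in range(n_steps):
        proposal = x + step * rng.standard_normal()
        # Accept with probability min(1, p(proposal) / p(x)), computed in log space.
        if np.log(rng.uniform()) < log_target(proposal, sigma) - log_target(x, sigma):
            x = proposal
        xs[t] = x
    return xs

def mode_switches(xs):
    """Count how often the chain crosses from one side of 0 to the other."""
    signs = np.sign(xs)
    return int(np.sum(signs[1:] != signs[:-1]))

rng = np.random.default_rng(0)
blurry = metropolis(sigma=2.0, n_steps=5000, rng=rng)  # overlapping modes
sharp = metropolis(sigma=0.3, n_steps=5000, rng=rng)   # well-separated modes

switches_blurry = mode_switches(blurry)
switches_sharp = mode_switches(sharp)
print("mode switches (blurry):", switches_blurry)
print("mode switches (sharp): ", switches_sharp)
```

With sharp modes the low-density gap between them is almost never crossed, so the chain tends to stay near its initial state and consecutive samples are strongly correlated. With blurry modes the chain moves back and forth freely, which is consistent with the quoted requirement that the distribution be "somewhat blurry".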