1

Can someone share the derivation of Evidence Lower Bound in this paper ?
Zero-Shot Text-to-Image Generation

The overall procedure can be viewed as maximizing the evidence lower bound (ELB) (Kingma & Welling, 2013; Rezende et al., 2014) on the joint likelihood of the model distribution over images x, captions y, and the tokens z for the encoded RGB image. We model this distribution using the factorization ${p_\theta,_\psi(x, y, z) = p_\theta(x | y, z)p_\psi(y, z)}$, which yields the lower bound: ${\ln p_\theta,_\psi(x, y) > E_{z∼q_\phi(z | x)}\ln p_\theta(x | y, z) − \beta D_{KL}(q_\phi(y, z | x), p_\psi(y, z))}$

p1p13
  • 11
  • 1

0 Answers0