Deep generative models have received much attention recently because they can generate highly realistic synthetic images. There are two main regimes for estimating deep generative models: the generative adversarial network (GAN) and the variational autoencoder (VAE). Even though GANs are known to generate cleaner synthetic images, they suffer from numerical instability and mode collapse. The VAE is a useful alternative to the GAN, and an important advantage of the VAE is that it enables representation learning (i.e., learning latent variables).
In this talk, I explain my recent studies on VAEs. The first topic is computation. Typically, a VAE is estimated by maximizing the ELBO, a lower bound of the marginal log-likelihood. However, the ELBO maximizer is known to be inferior to the maximum likelihood estimator (MLE). I propose an efficient EM algorithm for VAEs that directly finds the maximizer of the likelihood, i.e., the MLE.
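For context, the gap between the two objectives is the KL divergence between the approximate posterior and the true posterior, which is why the ELBO lower-bounds the marginal log-likelihood:

\[
\log p_\theta(x) \;=\; \mathrm{ELBO}(\theta,\phi;x) \;+\; \mathrm{KL}\big(q_\phi(z\mid x)\,\|\,p_\theta(z\mid x)\big) \;\ge\; \mathrm{ELBO}(\theta,\phi;x).
\]

The abstract does not spell out the proposed EM algorithm. As a hedged sketch of the general E-step/M-step structure for a latent-variable generative model, the snippet below runs EM for the linear-decoder special case (probabilistic PCA), where both steps are available in closed form; all names (W, sigma2, etc.) are illustrative and not from the talk, and a deep nonlinear decoder would require an approximate E-step (e.g., posterior sampling).

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data from a linear-Gaussian latent model: x = W_true z + eps,
    # z ~ N(0, I_k), eps ~ N(0, sigma2_true * I_d).
    N, d, k = 500, 10, 2
    W_true = rng.normal(size=(d, k))
    sigma2_true = 0.1
    Z = rng.normal(size=(N, k))
    X = Z @ W_true.T + rng.normal(scale=np.sqrt(sigma2_true), size=(N, d))

    # EM for the linear decoder (probabilistic PCA): each iteration
    # provably increases the marginal likelihood of X.
    W = rng.normal(size=(d, k))
    sigma2 = 1.0
    for it in range(200):
        # E-step: exact Gaussian posterior q(z | x_n) = N(m_n, sigma2 * M^{-1}).
        M = W.T @ W + sigma2 * np.eye(k)
        Minv = np.linalg.inv(M)
        Ez = X @ W @ Minv                     # posterior means m_n, shape (N, k)
        Ezz = N * sigma2 * Minv + Ez.T @ Ez   # sum_n E[z_n z_n^T]

        # M-step: maximize the expected complete-data log-likelihood.
        W = (X.T @ Ez) @ np.linalg.inv(Ezz)
        sigma2 = (np.sum(X**2) - 2.0 * np.sum((X @ W) * Ez)
                  + np.trace(Ezz @ W.T @ W)) / (N * d)

    print("estimated noise level:", sigma2)  # should approach sigma2_true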
The second topic is theory. I explain how the MLE of a VAE behaves asymptotically. I derive a convergence rate that depends on the noise level as well as the complexity of the deep architecture. A surprising observation is that the convergence rate of the MLE becomes slower when the noise level is too low. I propose a new technique that modifies the MLE when the noise level is small, and show on real data that it outperforms the original MLE.
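The abstract does not state the rate itself. One standard way to see why the noise level enters: with a Gaussian decoder, the conditional log-density is

\[
\log p_\theta(x \mid z) \;=\; -\frac{1}{2\sigma^2}\,\lVert x - f_\theta(z) \rVert^2 \;-\; \frac{d}{2}\log\!\big(2\pi\sigma^2\big),
\]

where f_\theta is the decoder network, \sigma^2 is the noise level, and d is the data dimension (notation assumed here, not taken from the talk). As \sigma^2 \to 0 the log-likelihood surface becomes increasingly sharp and can become ill-behaved, which is consistent with the observation that estimation is harder at very low noise levels.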