Relevant Paper(s):
Abstract: Deep energy-based models have quickly become a popular and successful approach for generative modeling in high dimensions. The success of these models can mainly be attributed to improvements in MCMC sampling such as Langevin Dynamics and training with large persistent Markov chains. Because of the reliance on gradient-based sampling, these models have been successful in modeling continuous data. As it stands, the same solution cannot be applied when dealing with discrete data. In this work, we propose a general and automatic approximate sampling strategy for probabilistic models with discrete variables. Our approach uses gradients of the likelihood function with respect to discrete assignments to propose Metropolis-Hastings updates. These updates can be incorporated into larger Markov chain Monte Carlo or learning schemes. We show theoretically and empirically that this simple approach outperforms generic samplers in a number of difficult settings including Ising Models, Potts Models, Restricted Boltzmann Machines, and Factorial Hidden Markov Models -- even outperforming some samplers which exploit known structure in these distributions. We also show that our improved sampler enables the training of deep energy-based models on high dimensional discrete data which outperform variational auto-encoders and previous energy-based models.
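The core idea in the abstract — using gradients of the log-likelihood with respect to discrete assignments to build a Metropolis-Hastings proposal — can be sketched concretely. The following is a minimal illustrative sketch, not the paper's reference implementation: it assumes binary variables in {0, 1} with a quadratic unnormalized log-probability f(x) = xᵀWx + bᵀx, uses the first-order Taylor estimate (1 − 2xᵢ)·∂f/∂xᵢ of the change from flipping bit i to define a softmax proposal over flip sites, and accepts with the usual MH correction. The function names (`gwg_step`, `log_prob`) are hypothetical.

```python
import numpy as np

def log_prob(x, W, b):
    # Unnormalized log-probability: f(x) = x^T W x + b^T x
    return x @ W @ x + b @ x

def grad_log_prob(x, W, b):
    # Gradient of f treated as a function of real-valued x
    return (W + W.T) @ x + b

def flip_proposal(x, W, b):
    # First-order estimate of f(flip_i(x)) - f(x) for every coordinate i,
    # turned into a categorical distribution over which bit to flip.
    d = (1.0 - 2.0 * x) * grad_log_prob(x, W, b)
    logits = d / 2.0
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

def gwg_step(x, W, b, rng):
    # One gradient-informed Metropolis-Hastings update (illustrative sketch).
    probs = flip_proposal(x, W, b)
    i = rng.choice(len(x), p=probs)
    x_new = x.copy()
    x_new[i] = 1.0 - x_new[i]
    # Reverse-proposal probability of flipping bit i back, for detailed balance.
    probs_rev = flip_proposal(x_new, W, b)
    log_alpha = (log_prob(x_new, W, b) - log_prob(x, W, b)
                 + np.log(probs_rev[i]) - np.log(probs[i]))
    if np.log(rng.random()) < log_alpha:
        return x_new
    return x
```

Because the proposal concentrates on coordinates whose flip is estimated to raise the log-probability, it can mix far faster than uniform single-site updates, while the MH acceptance step corrects any error in the first-order estimate.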
Bio: Will Grathwohl is a PhD student at the University of Toronto, supervised by Richard Zemel and David Duvenaud. His work mainly focuses on generative models and their applications to downstream discriminative tasks. His research has covered variational inference and normalizing flows, and now focuses mainly on energy-based models. Will is currently a student researcher on the Google Brain team in Toronto. Prior to graduate school, Will worked on machine learning applications in Silicon Valley and earned his undergraduate degree in mathematics at the Massachusetts Institute of Technology.