Tuesday, March 28, 2017. 12:00PM. NSH 3305.
Manzil Zaheer - Exponential Stochastic Cellular Automata For Massively Parallel Inference
Abstract: Statistical models and their inference procedures are often not a good fit for modern computational resources. Today's computational resources are racks of fast, cheap, heavily multicore machines with limited memory bandwidth, whereas inference strategies are either inherently sequential, like Gibbs sampling, or memory-access intensive, like expectation-maximization and other variational methods.
In this talk, we discuss an embarrassingly parallel, memory-efficient inference algorithm for latent variable models whose complete data likelihood is in the exponential family. The algorithm is a stochastic cellular automaton and converges to a valid maximum a posteriori fixed point. We also explore tricks that improve performance by reducing pressure on memory bandwidth through better data structures.
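To make the flavor of a synchronous sweep concrete, below is a minimal NumPy sketch for a toy Gaussian mixture: every "cell" (data point) resamples its latent component in parallel from parameters computed out of the previous sweep's sufficient statistics. The modeling choices (unit-variance components, symmetric Dirichlet prior on the weights) and all names are illustrative assumptions standing in for the speaker's actual multicore/MPI implementation, not a description of it.

```python
# A sketch of one cellular-automaton-style sweep for a toy Gaussian mixture.
# Assumptions (not from the abstract): K spherical unit-variance components,
# a symmetric Dirichlet prior on the mixing weights, NumPy vectorization as
# a stand-in for true parallelism. Names are illustrative only.
import numpy as np

def esca_sweep(X, z, K, alpha=1.0, rng=None):
    """One synchronous sweep: aggregate sufficient statistics from the current
    assignments, form MAP-style parameters, then let every cell (data point)
    resample its latent component in parallel from those parameters."""
    rng = np.random.default_rng() if rng is None else rng
    N, D = X.shape

    # Aggregate sufficient statistics N_k and sum_{i: z_i = k} x_i.
    counts = np.bincount(z, minlength=K).astype(float)
    sums = np.zeros((K, D))
    np.add.at(sums, z, X)

    # MAP-style parameter estimates from the sufficient statistics.
    weights = (counts + alpha) / (N + K * alpha)
    means = sums / np.maximum(counts, 1.0)[:, None]

    # All cells update simultaneously: categorical resampling of z_i from
    # p(z_i = k | x_i, theta) under unit-variance Gaussians.
    sq_dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    log_p = np.log(weights)[None, :] - 0.5 * sq_dist
    log_p -= log_p.max(axis=1, keepdims=True)
    p = np.exp(log_p)
    p /= p.sum(axis=1, keepdims=True)

    # Vectorized categorical sampling via the inverse-CDF trick.
    u = rng.random((N, 1))
    z_new = np.minimum((p.cumsum(axis=1) < u).sum(axis=1), K - 1)
    return z_new, weights, means
```

Because each cell reads only the previous sweep's aggregated statistics and writes only its own assignment, the per-point updates carry no dependencies within a sweep, which is what makes this style of inference embarrassingly parallel.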
We apply the algorithm to the Gaussian mixture model (GMM) and latent Dirichlet allocation (LDA) and empirically find that it is orders of magnitude faster than state-of-the-art approaches. A simple C++/MPI implementation on a 16-node cluster can sample more than a billion tokens per second for LDA and a million images for GMM.
This is joint work with Alex Smola, Jean-Baptiste Tristan, Michael Wick, and Satwik Kottur.