Motivation
Latent variable graphical models such as hidden Markov models and topic models have become a popular framework for modeling complex dependencies among variables. Although latent variables can provide a compact parameterization of an otherwise complex model, learning in these models is often challenging. Maximum likelihood estimation of the parameters of latent variable models leads to a nonconvex objective that is typically optimized with local search heuristics such as Expectation Maximization (EM), which often suffers from local optima and slow convergence.
In many of these cases, approaching the problem by exploiting the underlying spectral properties provides a fundamentally different perspective. Consider, for example, learning hidden Markov models. While the problem is NP-hard under cryptographic assumptions, recent results have given algorithms with polynomial computational and sample complexity under modest assumptions on the rank and singular values of the transition and observation matrices. Interestingly, the sample complexity depends not only on the topology of the model but also on the underlying spectral properties of the parameters. This is fundamentally different from learning in fully observed models, where sample complexity depends primarily on the topology.
Inspired by these results, the past two years have seen the rise of many fast, provably consistent spectral methods for latent variable model learning, including algorithms for predictive state representations, finite state transducers, latent trees, latent junction trees, probabilistic context free grammars, and mixture/admixture models. While some of these methods aim to recover the original parameters, most propose an alternative parameterization, called the observable representation, that still allows efficient inference of observed marginals (but not latent ones) and is useful for many predictive tasks.
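To make the notion of an observable representation concrete, the following is a minimal sketch, in the spirit of the spectral HMM algorithm of Hsu, Kakade, and Zhang (2009), of how observation moments and an SVD yield operators that compute joint probabilities of observed sequences without ever recovering the transition or observation matrices. The synthetic HMM, sample sizes, and all variable names here are illustrative assumptions, not part of the proposal.

```python
# Sketch: observable representation of an HMM from low-order observation moments.
# Assumes a small synthetic HMM; all names and constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
m, n, N = 3, 5, 200_000          # hidden states, observation symbols, training triples

# A random "true" HMM (column-stochastic transition T and observation O, initial pi).
T = rng.dirichlet(np.ones(m), size=m).T          # T[i, j] = P(h' = i | h = j)
O = rng.dirichlet(np.ones(n), size=m).T          # O[x, j] = P(x | h = j)
pi = rng.dirichlet(np.ones(m))

def sample_triple():
    """Sample the first three observations of one HMM run."""
    h = rng.choice(m, p=pi)
    xs = []
    for _ in range(3):
        xs.append(rng.choice(n, p=O[:, h]))
        h = rng.choice(m, p=T[:, h])
    return xs

# Empirical moments: P1[x1], P21[x2, x1], P3x1[x2][x3, x1].
P1 = np.zeros(n)
P21 = np.zeros((n, n))
P3x1 = np.zeros((n, n, n))
for _ in range(N):
    x1, x2, x3 = sample_triple()
    P1[x1] += 1
    P21[x2, x1] += 1
    P3x1[x2, x3, x1] += 1
P1 /= N; P21 /= N; P3x1 /= N

# Observable representation: project through the top-m left singular vectors of P21.
U = np.linalg.svd(P21)[0][:, :m]                 # n x m
b1 = U.T @ P1                                    # initial vector
binf = np.linalg.pinv(P21.T @ U) @ P1            # normalization vector
B = [U.T @ P3x1[x] @ np.linalg.pinv(U.T @ P21) for x in range(n)]  # one operator per symbol

def joint_prob(seq):
    """Estimate P(x_1, ..., x_t) without recovering T, O, or pi."""
    b = b1
    for x in seq:
        b = B[x] @ b
    return float(binf @ b)

print(joint_prob([0, 1, 2]))     # estimated probability of observing the sequence 0, 1, 2
```

Under the rank and singular-value conditions alluded to above, these moment estimates concentrate, and the resulting operators yield consistent estimates of observed marginals; the latent state distribution itself is never reconstructed, which is exactly the trade-off the observable representation makes.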
Due to their speed and ability to handle complex features, spectral algorithms have proven useful for dynamical systems and natural language processing tasks such as parsing.
Moreover, when combined with kernel methods such as Hilbert Space Embeddings, many spectral algorithms generalize elegantly to any domain where a kernel can be defined, such as continuous, non-Gaussian data on which EM is difficult to run.
Despite these advances, the area is relatively unexplored and many challenges remain. Some examples include (but are not limited to): (1) extending these algorithms to more general latent variable models, (2) theoretical analysis when the true model deviates from the assumed one, (3) applying similar principles to other graphical model problems beyond learning, and (4) new applications such as computer vision.