Topic Modeling: Proof to Practice
April 12, 2017
Topic modeling is widely used. To scale up to very large corpora, (i) the number of topics has to grow beyond the vocabulary size, and (ii) algorithms with provably low polynomial time and space complexity are needed. For (i), we develop a new model of "deep topics", obtained by compounding pairs or triples of basic topics. For (ii), we develop an importance sampling algorithm inspired by randomized linear algebra and prove that it reconstructs the generating model. The algorithm also scales in practice, and we present empirical results. The talk will be self-contained.
Joint work with: Chiranjib Bhattacharyya, Harsha Simhadri, Kushal Dave, Shrutendra Horsala