Relevant Paper(s):
Abstract: Machine learning models are rapidly growing in size, leading to increased training and deployment costs. While the most popular approach to training compressed models is trying to guess good "lottery tickets," or sparse subnetworks, we revisit the low-rank factorization approach, in which weight matrices are replaced by products of smaller matrices. We extend recent analyses of the optimization of deep networks to motivate simple initialization and regularization schemes for improving the training of these factorized layers. Empirically, these methods yield higher accuracies than popular pruning and lottery ticket approaches at the same compression level. We further demonstrate their usefulness in two settings beyond model compression: simplifying knowledge distillation and training Transformer-based architectures such as BERT. This is joint work with Neil Tenenholtz, Lester Mackey, and Nicolo Fusi.
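To illustrate the factorization idea mentioned in the abstract, below is a minimal sketch (not the authors' code) of a factorized linear layer in PyTorch: a dense weight matrix of size d_out x d_in is replaced by a product U V of two smaller matrices, cutting parameters from d_out*d_in to r*(d_out + d_in) for a small rank r. The SVD-based initialization and the penalty on the reconstructed product are assumptions about what the "simple initialization and regularization schemes" might look like, not the talk's exact method.

# Minimal sketch of a factorized (low-rank) linear layer; the initialization and
# regularizer shown here are illustrative assumptions, not the authors' exact schemes.
import torch
import torch.nn as nn

class FactorizedLinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        # Initialize a full dense matrix as usual, then keep its top-`rank` SVD factors.
        w = torch.empty(d_out, d_in)
        nn.init.kaiming_uniform_(w)
        u, s, vt = torch.linalg.svd(w, full_matrices=False)
        sqrt_s = s[:rank].sqrt()
        self.U = nn.Parameter(u[:, :rank] * sqrt_s)              # d_out x rank
        self.V = nn.Parameter(sqrt_s.unsqueeze(1) * vt[:rank])   # rank x d_in

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Equivalent to x @ (U @ V).T, but never materializes the full d_out x d_in matrix.
        return (x @ self.V.t()) @ self.U.t()

    def product_penalty(self) -> torch.Tensor:
        # Regularize the norm of the reconstructed product U @ V rather than U and V
        # separately (a hypothetical stand-in for the regularization mentioned in the talk).
        return (self.U @ self.V).pow(2).sum()

if __name__ == "__main__":
    layer = FactorizedLinear(d_in=512, d_out=256, rank=32)
    x = torch.randn(8, 512)
    loss = layer(x).pow(2).mean() + 1e-4 * layer.product_penalty()
    loss.backward()

With rank=32, this layer stores 32*(512+256) = 24,576 parameters instead of the 131,072 of a dense 256 x 512 weight matrix, which is the kind of compression the abstract refers to.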
Bio: Misha Khodak is a PhD student in Carnegie Mellon University's Computer Science Department, advised by Nina Balcan and Ameet Talwalkar. His research focuses on the foundations and applications of machine learning, most recently neural architecture search, meta-learning, and unsupervised representation learning. He recently spent time as an intern with Nicolo Fusi at Microsoft Research New England and previously received an AB in Mathematics and an MSE in Computer Science from Princeton University, where he worked with Sanjeev Arora.