Research Summary
My doctoral research focuses on enabling machine learning researchers and practitioners to efficiently train large, complex models on big data using distributed clusters.
I spent several years developing parameter server systems that make machine learning programs run fast and efficiently, and worked on automating dependence-aware parallelization of serial, imperative ML programs for distributed training.
To complete my thesis, I am now working on dynamic scheduling (i.e., distributed device placement) for neural network training.