Machine learning (ML) techniques, especially recent advances in deep neural networks, have surpassed human predictive performance on a variety of real-world tasks. This success has been enabled by the recent development of ML systems (e.g., TensorFlow and PyTorch) whose high-level programming interfaces make it easy to prototype different ML models on modern hardware platforms.
In this course, we will explore the design of modern ML systems by learning how an ML model written in a high-level language is decomposed into low-level kernels and executed across heterogeneous hardware accelerators (e.g., TPUs and GPUs) in a distributed fashion. Topics covered in this course include neural networks and backpropagation, programming models for expressing ML models, automatic differentiation, deep learning accelerators, distributed training techniques, computation graph optimizations, automated kernel generation, and memory optimizations. The main goal of this course is to provide a comprehensive view of how existing ML systems work. Throughout the course, we will also learn the design principles behind these systems and discuss the challenges and opportunities in building future ML systems for next-generation ML applications and hardware platforms.