CMU 10-806 Course Information

Instructors: Nina Balcan and Avrim Blum

Mon/Wed 4:30-5:50, GHC 4303


Course description: This course will cover fundamental topics in Machine Learning and Data Science, including powerful algorithms with provable guarantees for making sense of and generalizing from large amounts of data. The course will start by providing a basic arsenal of useful statistical and computational tools, including generalization guarantees, core algorithmic methods, and fundamental analysis models. We will examine questions such as: Under what conditions can we hope to meaningfully generalize from limited data? How can we best combine different kinds of information such as labeled and unlabeled data, leverage multiple related learning tasks, or leverage multiple types of features? What can we prove about methods for summarizing and making sense of massive datasets, especially under limited memory? We will also examine other important constraints and resources in data science including privacy, communication, and taking advantage of limited interaction. In addressing these and related questions we will make connections to statistics, algorithms, linear algebra, complexity theory, information theory, optimization, game theory, and empirical machine learning research.


Evaluation and Responsibilities: Grading will be based on a collection of homework assignments (primarily proof-based), a take-home final, and a class project. We will compute your score under two grading schemes and use whichever one gives you the higher final grade:
  1. Homework-oriented: 60% homework, 10% final, 30% project.
  2. Project-oriented: 30% homework, 10% final, 60% project.
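
As a rough illustration of how the two schemes combine, the sketch below computes both weighted scores and keeps the larger one. The function name `final_grade` and the 0-100 scale for each component are assumptions made for this example; the course does not specify how individual homework scores are aggregated or how numeric scores map to letter grades.

```python
def final_grade(homework: float, final: float, project: float) -> float:
    """Return the higher of the two weighted scores.

    Assumes (hypothetically) that each component is already a single
    score on a 0-100 scale.
    """
    # Scheme 1: homework-oriented weighting.
    homework_oriented = 0.60 * homework + 0.10 * final + 0.30 * project
    # Scheme 2: project-oriented weighting.
    project_oriented = 0.30 * homework + 0.10 * final + 0.60 * project
    # The better of the two schemes determines the final score.
    return max(homework_oriented, project_oriented)


# Hypothetical scores: homework 90, final 80, project 70.
print(final_grade(90, 80, 70))
```

With these hypothetical scores the homework-oriented scheme gives about 83 and the project-oriented scheme about 77, so the homework-oriented score is used. Because the final carries the same weight in both schemes, the choice comes down to whether the homework score or the project score is higher.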

Topics to be covered will include:

Recommended (but not required) textbooks:

Additionally, we will use a number of survey articles and tutorials.