Georgia Tech, 8803 Machine Learning Theory, Fall 2011
MACHINE LEARNING THEORY
Time: Tues/Thurs 12:05-1:25. Place: Instr. Center 111.
Course description: Machine learning studies automatic methods for learning to make accurate predictions or useful decisions based on past observations and experience, and it has become a highly successful discipline with applications in many areas such as natural language processing, speech recognition, computer vision, and gene discovery. This course on the design and theoretical analysis of machine learning methods will cover a broad range of important problems studied in theoretical machine learning. It will provide a basic arsenal of powerful mathematical tools for their analysis, focusing on both statistical and computational aspects. We will examine questions such as "What guarantees can we prove on the performance of learning algorithms?" and "What can we say about the inherent ease or difficulty of learning problems?". In addressing these and related questions we will make connections to statistics, algorithms, complexity theory, information theory, game theory, and empirical machine learning research.
Recommended (but not required) textbooks: An Introduction to Computational Learning Theory, by M. Kearns and U. Vazirani.
Additionally, we will use a number of survey articles and tutorials.
Office hours: Wed 11:30 -- 1:00 in Klaus 2144.
Homeworks
Lecture Notes & Handouts
- 08/23: Introduction. The consistency model. See also Chapter 1 in the Kearns-Vazirani book.
- 08/25: The PAC model for passive supervised learning (the formal criterion is restated below). See also Chapter 1 in the Kearns-Vazirani book.
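A hedged restatement of the PAC criterion, in our notation (the lecture's exact formulation may differ):

    An algorithm A PAC-learns a concept class C if for every target c \in C,
    every distribution D, and all \epsilon, \delta \in (0,1), given
    m = poly(1/\epsilon, 1/\delta) i.i.d. examples labeled by c, it outputs
    a hypothesis h with
        \Pr_{S \sim D^m}\big[ err_D(h) \le \epsilon \big] \ge 1 - \delta,
    where err_D(h) = \Pr_{x \sim D}[ h(x) \ne c(x) ].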
- 08/30: The Perceptron algorithm for learning linear separators (a short sketch follows below).
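A minimal sketch of the Perceptron update in Python/NumPy, assuming linearly separable data with labels in {-1,+1}; variable names and the epoch cap are our choices, not the course's. Novikoff's classical analysis bounds the number of mistakes by (R/gamma)^2 for margin gamma and data radius R.

    import numpy as np

    def perceptron(X, y, epochs=100):
        """Online Perceptron for labels y in {-1,+1}.
        X: (m, d) array of examples; y: (m,) array of labels.
        Returns a weight vector w; predict with sign(w . x).
        (A sketch; the lecture's presentation may differ.)"""
        m, d = X.shape
        w = np.zeros(d)
        for _ in range(epochs):
            mistakes = 0
            for i in range(m):
                if y[i] * np.dot(w, X[i]) <= 0:   # mistake (or on the boundary)
                    w += y[i] * X[i]              # additive update
                    mistakes += 1
            if mistakes == 0:                     # consistent pass: stop early
                break
        return w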
- 09/01: The mistake bound model (definition restated below). See also: An Online Learning Survey by Avrim Blum (Section 3 in particular).
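For reference, one standard way to state the mistake-bound criterion (our phrasing):

    Algorithm A learns class C with mistake bound M if, for every target
    c \in C and every (possibly adversarial) ordering of examples labeled
    by c, A makes at most M prediction mistakes over the whole sequence.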
- 09/06: The Winnow algorithm (sketch below). See also: the original Winnow paper and the Kakade and Tewari lecture notes.
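A hedged sketch of Winnow's multiplicative update for Boolean inputs with labels in {0,1}. This is the division-based variant; the promotion factor alpha and the fixed threshold n are our choices (Littlestone's paper covers the general setting).

    import numpy as np

    def winnow(X, y, epochs=100, alpha=2.0):
        """Winnow on Boolean features, labels y in {0,1}.
        Predicts 1 iff w . x >= n (threshold = number of features).
        Multiplicative updates: promote on false negatives,
        demote on false positives. (A sketch, not the course's code.)"""
        m, n = X.shape
        w = np.ones(n)
        for _ in range(epochs):
            mistakes = 0
            for i in range(m):
                pred = 1 if np.dot(w, X[i]) >= n else 0
                if pred != y[i]:
                    mistakes += 1
                    if y[i] == 1:              # false negative: promote active weights
                        w[X[i] == 1] *= alpha
                    else:                      # false positive: demote active weights
                        w[X[i] == 1] /= alpha
            if mistakes == 0:
                break
        return w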
- 09/08: The mistake bound model (cont'd). See also: Blum's paper separating the PAC and Mistake Bound models.
- 09/13: Tail inequalities (Hoeffding's bound is restated below). Simple sample complexity results for the agnostic case. The Vapnik-Chervonenkis dimension.
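The workhorse tail inequality here is Hoeffding's; a hedged statement, together with the finite-class agnostic sample complexity it yields via a union bound:

    Hoeffding's inequality: for i.i.d. X_1, ..., X_m \in [0,1] with mean p
    and empirical mean \hat{p},
        \Pr[\, |\hat{p} - p| \ge \epsilon \,] \le 2 e^{-2 m \epsilon^2}.
    A union bound over a finite class H then gives: with probability at
    least 1 - \delta, all h \in H satisfy
    |err_D(h) - \hat{err}_S(h)| \le \epsilon once
        m \ge \frac{1}{2\epsilon^2} \ln \frac{2|H|}{\delta}.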
- 09/15: Sample complexity results for infinite hypothesis spaces (the classic bound is restated below).
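Up to constants, the bound in question has the following classical shape (hedged restatement):

    In the realizable case, with probability at least 1 - \delta, every
    hypothesis consistent with the sample from a class of VC dimension d
    has true error at most \epsilon once
        m = O\Big( \frac{1}{\epsilon} \big( d \ln(1/\epsilon) + \ln(1/\delta) \big) \Big).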
- 09/20: Guest lecture by Prasad Raghavendra. Hardness of learning.
- 09/22: Sample complexity results for infinite hypothesis spaces (cont'd). Sauer's lemma (restated below). See also Chapter 3 in the Kearns-Vazirani book.
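Sauer's lemma, as it is usually stated (a hedged restatement, not necessarily the lecture's exact form):

    If VCdim(H) = d, then for all m \ge d,
        \Pi_H(m) \le \sum_{i=0}^{d} \binom{m}{i} \le \Big( \frac{em}{d} \Big)^d,
    where \Pi_H(m) is the growth function: the number of distinct labelings
    H can realize on m points grows polynomially in m rather than as 2^m.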
- 09/27: Sample complexity lower bounds for passive supervised learning (the standard form is below). See also Chapter 3.6 in the Kearns-Vazirani book.
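For orientation, the standard form of the lower bound (hedged; constants vary by source):

    Any PAC learner for a class of VC dimension d requires
        m = \Omega\Big( \frac{d + \ln(1/\delta)}{\epsilon} \Big)
    examples in the realizable case; in the agnostic case the dependence
    on \epsilon worsens to \Omega(d / \epsilon^2).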
- 09/29: Boosting: Weak Learning and Strong Learning (definitions below). See Chapter 4 in the Kearns-Vazirani book.
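The two notions being related, restated in hedged form:

    A \gamma-weak learner is only required to output h with
        err_D(h) \le 1/2 - \gamma
    for some fixed \gamma > 0, while strong (PAC) learning demands error
    at most \epsilon for every \epsilon > 0. The boosting theorem shows
    these two notions are equivalent for PAC learning.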
- 10/04: AdaBoost (sketch below). See Rob Schapire's lecture notes. See also: A Short Introduction to Boosting, by Yoav Freund and Rob Schapire.
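A compact sketch of the AdaBoost loop over a user-supplied weak learner. The weak_learn interface is our assumption for illustration; Schapire's notes give the authoritative presentation.

    import numpy as np

    def adaboost(X, y, weak_learn, T=50):
        """AdaBoost for labels y in {-1,+1}.
        weak_learn(X, y, D) must return a classifier h with h(X) in {-1,+1}^m,
        trained under distribution D over examples. (Interface is ours.)
        Returns the weighted-vote ensemble as a prediction function."""
        m = len(y)
        D = np.full(m, 1.0 / m)              # uniform initial distribution
        hyps, alphas = [], []
        for _ in range(T):
            h = weak_learn(X, y, D)
            pred = h(X)
            eps = np.sum(D[pred != y])       # weighted training error
            eps = min(max(eps, 1e-10), 1 - 1e-10)
            alpha = 0.5 * np.log((1 - eps) / eps)
            D *= np.exp(-alpha * y * pred)   # upweight mistakes, downweight correct
            D /= D.sum()
            hyps.append(h)
            alphas.append(alpha)
        def H(Xnew):
            return np.sign(sum(a * h(Xnew) for a, h in zip(alphas, hyps)))
        return H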
- 10/06: AdaBoost. Generalization error bounds: naive and margins-based.
- 10/11: Finish margins-based generalization error bounds for AdaBoost (the shape of the bound is below).
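Up to constants and log factors, the margins-based bound has the following shape (hedged; see Schapire et al. for the precise statement):

    For every margin level \theta > 0, with probability at least 1 - \delta,
        err_D(H) \le \hat{\Pr}_S[\, y f(x) \le \theta \,]
                   + \tilde{O}\Big( \sqrt{ \frac{d}{m \theta^2} } \Big),
    where f is the normalized vote of the combined classifier and d is the
    VC dimension of the base hypothesis class. Notably, the bound does not
    degrade with the number of boosting rounds.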
- 10/13: Introduction to kernel methods.
- 10/20: Properties of Legal Kernel Functions (a concrete check appears below). More Examples of Kernels.
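One concrete way to see the "legal kernel" condition: K is a valid (Mercer) kernel iff every Gram matrix it produces is symmetric positive semidefinite. A small hedged check in Python; the RBF kernel and all names here are our illustration.

    import numpy as np

    def rbf_kernel(x, z, gamma=1.0):
        """Gaussian RBF kernel, a standard example of a legal kernel."""
        return np.exp(-gamma * np.sum((x - z) ** 2))

    def gram_is_psd(K, X, tol=1e-9):
        """Build the Gram matrix of kernel K on points X and test PSD-ness
        by checking that all eigenvalues are (numerically) nonnegative."""
        G = np.array([[K(a, b) for b in X] for a in X])
        return np.all(np.linalg.eigvalsh(G) >= -tol)

    X = np.random.randn(20, 5)
    print(gram_is_psd(rbf_kernel, X))   # True: RBF Gram matrices are PSD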
- 10/25: Kernels and Margins. See also: Kernels as Features: On Kernels, Margins, and Low-dimensional Mappings, Machine Learning Journal 2006.
- 10/27: Support Vector Machines (the soft-margin primal is restated below). See Andrew Ng's notes. See also Stephen Boyd's lectures on Convex Optimization.
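For reference, the soft-margin SVM primal in its standard form (our transcription; Ng's notes and Boyd's lectures give the derivation and the dual):

    \min_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \xi_i
    subject to  y_i (w \cdot x_i + b) \ge 1 - \xi_i  and  \xi_i \ge 0
    for all i. The hard-margin case corresponds to C \to \infty
    (all \xi_i = 0).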
- 11/01: Semi-Supervised Learning.
- 11/03: A PAC-style model for Semi-Supervised Learning. See also: A Discriminative Model for Semi-Supervised Learning, JACM 2010, and the Encyclopedia of Machine Learning entry on Semi-Supervised Learning by X. Zhu.
- 11/08: Active Learning.
- 11/10: Sample complexity bounds for Active Learning. The CAL and A^2 algorithms. The disagreement coefficient (definition below). See also Shai Shalev-Shwartz's notes.
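A hedged statement of the disagreement coefficient (Hanneke's definition, in our notation):

    With disagreement region
        DIS(V) = \{ x : \exists h, h' \in V \text{ with } h(x) \ne h'(x) \},
    the disagreement coefficient of h^* is
        \theta = \sup_{r > 0} \frac{ \Pr_{x \sim D}[\, x \in DIS(B(h^*, r)) \,] }{ r },
    where B(h^*, r) is the set of hypotheses whose probability of
    disagreement with h^* is at most r. A bounded \theta is what lets CAL
    and A^2 achieve label savings over passive learning.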
- 11/15: Generalization bounds based on the Rademacher Complexity.
- 11/17: Generalization bounds based on the Rademacher Complexity (cont'd; the key definitions are restated below). See also Introduction to Statistical Learning Theory, by O. Bousquet, S. Boucheron, and G. Lugosi.
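The central definition and bound, restated in hedged form (the Bousquet-Boucheron-Lugosi survey has the precise constants):

    Empirical Rademacher complexity of a class F on sample S = (z_1, ..., z_m):
        \hat{R}_S(F) = E_\sigma \Big[ \sup_{f \in F} \frac{1}{m}
                         \sum_{i=1}^{m} \sigma_i f(z_i) \Big],
    with \sigma_1, ..., \sigma_m independent uniform \pm 1 signs. For f
    bounded in [0,1], with probability at least 1 - \delta, uniformly over
    f \in F,
        E[f] \le \frac{1}{m} \sum_{i=1}^{m} f(z_i) + 2 \hat{R}_S(F)
                 + O\Big( \sqrt{ \frac{\ln(1/\delta)}{m} } \Big).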
- 11/22: Project Presentations.
- 11/29: Project Presentations.
- 12/01: Project Presentations.
- 12/06: Online Learning, Combining Expert Advice, and Regret Minimization. The Weighted Majority Algorithm (sketch below). See also: the original Weighted Majority paper by N. Littlestone and M. Warmuth.
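A minimal sketch of deterministic Weighted Majority in Python; the penalty factor beta and all names are our choices. Littlestone and Warmuth's analysis bounds the algorithm's mistakes by O(m_best + log n), where m_best is the best expert's mistake count.

    import numpy as np

    def weighted_majority(expert_preds, outcomes, beta=0.5):
        """Deterministic Weighted Majority over binary predictions.
        expert_preds: (T, n) array, expert j's {0,1} prediction at time t.
        outcomes: (T,) array of true {0,1} labels.
        Predict by weighted vote; multiply wrong experts' weights by beta.
        Returns the algorithm's mistake count. (A sketch.)"""
        T, n = expert_preds.shape
        w = np.ones(n)
        mistakes = 0
        for t in range(T):
            vote1 = w[expert_preds[t] == 1].sum()
            vote0 = w[expert_preds[t] == 0].sum()
            pred = 1 if vote1 >= vote0 else 0
            if pred != outcomes[t]:
                mistakes += 1
            w[expert_preds[t] != outcomes[t]] *= beta   # demote wrong experts
        return mistakes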
- 12/08: Experts Learning and the Minimax Theorem for Zero-Sum Games (statement below).
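The theorem in question, stated in our notation (hedged):

    For a finite zero-sum game with payoff matrix M,
        \min_{p \in \Delta_m} \max_{q \in \Delta_n} p^T M q
          = \max_{q \in \Delta_n} \min_{p \in \Delta_m} p^T M q.
    The experts-based proof sketch: if both players choose rows/columns
    with a no-regret algorithm such as Weighted Majority, their average
    mixed strategies form an approximate equilibrium, which yields the
    equality without invoking LP duality.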
Additional Readings & More Information
Books and tutorials: