10701/15781 Machine Learning

Syllabus and (tentative) Course Schedule

Date	Lecture	Topics	Readings and useful links	Handouts
Module 1
Intro to Functional Approximation
Mon 1/11	1. Overview and Decision Trees Lecturer: Eric Xing Slides (Annotated Slides)	Overview of Machine Learning Why Machine Learning? Designing a learning system Issues in Machine Learning Decision Trees Representation ID3 learning algorithm Entropy, Information gain overfitting	Mitchell: Chap 1, 3 Decision Tree Learning [Applet]
Wed 1/13	2. Probability Review Lecturer: Aarti Singh Slides (Annotated Slides)	Probability basics Kolmogorov Axioms Random variables (discrete, continuous) Independence Bayes rule Joint distribution and inference Density estimation Maximum Likelihood estimate Maximum A Posteriori estimate	Bishop: Chap 1, 2 Probability for Data Miners by Andrew Moore.	HW1 out
Mon 1/18	3. Instance-based "Learning" Lecturer: Eric Xing Slides (Annotated Slides)	Introduction to Classification Theory: 1. Bayesian Optimal Classifier 2. Nonparametric Methods & Instance-based Learning Bayesian decision rule Bayes error Parzen and nearest neighbor density estimation K-nearest neighbor (kNN) classifier Case study: classification of text documents	Bishop: Chap 2.5 Fukunaga (Intro to Statistical PR) hypothesis test nonparametric density est. nonparametric classification Tutorial on another instance of "instance-based" learning: locally weighted regression, by Andrew Moore.
Approximating Linear Separation Function
Wed 1/20	4. Naive Bayes Lecturer: Tom Mitchell Slides (Annotated Slides)	Generative classifiers: Naive Bayes classifiers with discrete and continuous (Gaussian) features Case study: classification of text documents	Naive Bayes classifiers [Applet]. Naive Bayes and Logistic Regression, Mitchell's chapter draft. Bishop: Chap 4	HW1 due HW2 out
Mon 1/25	5. Logistic Regression Lecturer: Tom Mitchell Slides (Annotated Slides)	Discriminative classifiers: Logistic regression [Applet] Relationship to Naive Bayes Case study: comparison of LR and NB on text mining	Naive Bayes and Logistic Regression, Mitchell's chapter draft. Bishop: Chap 4, 5 Mitchell: Chap 4 On Discriminative and Generative Classifiers, Ng and Jordan, NIPS, 2001.
Wed 1/27	6. Linear Regression Lecturer: Aarti Singh Slides (Annotated Slides)	Discriminative classifiers: Discriminative vs generative classifiers Regression Linear regression and its probabilistic interpretation as MLE Regularized linear regression and MAP Nonlinear regression (Polynomial, Nonlinear basis, Locally-weighted/Kernel Regression, Trees)	Linear regression [Applet]. Bishop: Chap 3 Mitchell: Chap 8.3 Tutorial on regression by Andrew Moore.
Mon 2/1	7. Neural Networks Lecturer: Tom Mitchell Slides (Annotated Slides)	Neural networks slides	Recommended reading: Mitchell: Chap 4
Wed 2/3	8. Model Selection Lecturer: Aarti Singh Slides (Annotated Slides)	Overfitting Bias-Variance Decomposition Model Selection (Cross Validation, SRM, Complexity regularization, Information Criteria)	Bishop: Chap 1, 2 Mitchell: Chap 5, 6 Matlab demo code for understanding overfitting Model comparison and Occam's Razor, Chapter 28 from David MacKay's book Model selection and Minimum Description Length principle, Mark Hansen and Bin Yu, J. Amer. Statist. Assoc., vol. 96, 746-774, 2001.	HW2 due HW3 out
Mon 2/8	Class Canceled: CMU was closed due to the snow storm.
Wed 2/10	Class Canceled: CMU was closed due to the snow storm.
Clustering
Mon 2/15	9. K-means and Hierarchical Clustering Lecturer: Aarti Singh Slides (Annotated Slides)	Introduction to Unsupervised Learning Clustering K-means clustering [Applet] Hierarchical clustering [Applet]	Bishop: Chap 9
Wed 2/17	10. Probabilistic Models for Clustering Lecturer: Aarti Singh Slides	Mixture model The Theory of Expectation-Maximization [Applet: Mixture of Gaussians]	Bishop: Chap 9
Introduction to Graphical Models
Mon 2/22	11. HMM and Bayesian Network I Lecturer: Eric Xing Slides (Annotated Slides)	Bayesian Network I: Representation and Inference HMM representation Evaluating marginal probabilities: Forward Algorithm Inference: Forward-backward Algorithm Viterbi Decoding	Bishop: Chap 8 Kevin Murphy's tutorial BayesNet Toolbox in Matlab by Kevin Murphy	HW3 due
Wed 2/24	12. Bayesian Network II (HMM) and CRF Lecturer: Eric Xing Slides (Annotated Slides)	HMM Viterbi continued. Learning: Baum-Welch algorithm Conditional Random Field (CRF): Representation Inference and learning.	Same as Lecture 17	Project Proposal Due
Mon 3/1	13. Bayesian Network III: Representation and Learning Lecturer: Eric Xing Slides (Annotated Slides)	Bayesian network semantics. Conditional independence and D-Separation Parameter learning for fully observed BN.	Bishop: Chap 8
Wed 3/3	Midterm Exam	open book, open notes, no computers
Mon 3/8	Spring Break
Wed 3/10	Spring Break
Module 2: TBA
Mon 3/15	14. Bayesian Networks IV: Exact Inference Lecturer: Eric Xing Slides (Annotated Slides)	Learning: fully observed models. Inference: Variable Elimination Junction trees and message passing	Bishop: Chap 13
Wed 3/17	15. Learning Theory I Lecturer: Tom Mitchell Annotated Slides	Computational Learning Theory Probably approximately correct (PAC) learning sample complexity VC-dimension	Mitchell: Chap 7
Mon 3/22	16. Learning Theory II Lecturer: Tom Mitchell Slides (Annotated Slides)	Computational Learning Theory II Agnostic learning Mistake bounds Weighted Majority algorithm	Mitchell: Chap 7
Wed 3/24	17. Support Vector Machines I Lecturer: Eric Xing Slides (Annotated Slides)	Max-margin classification and SVM Lagrangian Duality and KKT conditions Solving Optimal margin Classifiers The non-separable case: soft-margin and slack variables SMO: sequential minimal optimization	SVM [Applet] Bishop: Chap 6, 7 Burges tutorial The Sequential Minimal Optimization page, by Platt Pegasos: Primal Estimated Sub-Gradient Solver for SVM Fast and efficient online SVM
Mon 3/29	18. Support Vector Machines II Lecturer: Eric Xing Slides (Annotated Slides)	Kernel methods Maximum-entropy Discrimination Structured SVM	Maximum-entropy Discrimination, Jaakkola et al. Max-margin Markov network, Taskar et al. Laplace Max-margin Markov networks, Zhu et al.
Wed 3/31	19. Boosting Lecturer: Aarti Singh Slides	Combining weak classifiers Adaboost Comparison with logistic regression	Bishop: Chap 14.3 Boosting homepage Schapire: Boosting Tutorial, Video Adaboost Applet	Project Progress Report Due
Mon 4/5	20. Dimensionality Reduction Lecturer: Aarti Singh Slides	Feature Selection Identifying latent features Linear Methods - PCA Nonlinear Methods - ISOMAP	Bishop: Chap 12 Shlens' PCA tutorial Applet PCA
Wed 4/7	21. Spectral Clustering Lecturer: Aarti Singh Slides	Graph-Theoretic Methods for Clustering Graph Laplacian Balanced min-cut Spectral clustering	Ulrike von Luxburg's Tutorial	HW4 due HW5 out
Mon 4/12	22. Structure Learning I Lecturer: Eric Xing Slides (Annotated Slides)	Graphical Gaussian Model Neighborhood selection Graphical lasso Sparsistency Time-varying GGM	Bishop: Chap 8 Sparse inverse covariance estimation with the graphical lasso. Jerome Friedman, Trevor Hastie and Robert Tibshirani. Biostatistics, December 12, 2007. High-dimensional graphs and variable selection with the Lasso. Meinshausen, N. and Bühlmann, P. The Annals of Statistics, 2006, Vol. 34, No. 3
Wed 4/14	23. Structure Learning II: Bayesian Structure Learning Guest Lecturer: Zoubin Ghahramani Slides (Annotated Slides)	Parameter learning in directed models: complete and incomplete data ML and Bayesian methods Bayesian model comparison and Occam’s Razor Structure learning in directed models: complete and incomplete data Causality Parameter and Structure learning in undirected models
Mon 4/19	24. Semi-Supervised Learning Lecturer: Aarti Singh Slides	Semi-Supervised Learning Learning from labeled and unlabeled data Generative Mixture model approach Co-training and Multi-view methods Graph regularization	Co-training paper X. Zhu's SSL survey Intro to SSL - Online book Video of lecture by X. Zhu
Wed 4/21	25. Active Learning Lecturer: Aarti Singh Slides	Active learning Feedback-driven sequential learning Binary bisection search Query-by-Committee Density weighting, Estimated Error Reduction	Active learning literature survey	HW5 due
Mon 4/26	26. Reinforcement Learning I Lecturer: Tom Mitchell Slides	Reinforcement Learning Markov Decision Processes Q learning	Reinforcement Learning: A Survey, Kaelbling et al., JAIR, 1996.
Wed 4/28	27. Reinforcement Learning II Lecturer: Tom Mitchell Slides	Reinforcement Learning Models of human learning Quick course summary
Tuesday May 4th	Poster Session	NSH Atrium, 3:00pm-6:00pm
Friday May 7th	Final Exam	DH 2302, 5:30pm-8:30pm