Machine Learning

10-601, Fall 2012

Carnegie Mellon University

Tom Mitchell and Ziv Bar-Joseph

Previous material

Date	Lecture	Topics	Readings and useful links	Handouts
Sep 4	Intro to ML Decision Trees Slides	Machine learning examples Well-defined machine learning problem Decision tree learning	Mitchell: Ch 3 Bishop: Ch 14.4 The Discipline of Machine Learning	HW1 out
Sep 6	Decision Tree learning Review of Probability slides	The big picture Overfitting Random variables, probabilities	Andrew Moore's Basic Probability Tutorial Bishop: Ch. 1 thru 1.2.3 Bishop: Ch 2 thru 2.2
Sep 11	Probability and Estimation slides annotated slides	Bayes rule MLE MAP	Andrew Moore's Basic Probability Tutorial Bishop: Ch. 1 thru 1.2.3 Bishop: Ch 2 thru 2.2 Tom's VIDEO: Probability and Estimation
Sep 13	Naive Bayes slides annotated slides	Conditional independence Naive Bayes	Mitchell: Naive Bayes and Logistic Regression Tom's VIDEO: Naive Bayes
Sep 18	Gaussian Naive Bayes Slides Annotated slides	Gaussian Bayes classifiers Document classification Brain image classification Form of decision surfaces	Mitchell: Naive Bayes and Logistic Regression Tom's VIDEO: Gaussian Bayes	HW 2 out HW2 data
Sep 20	Logistic Regression Slides Annotated slides	Naive Bayes - the big picture Logistic Regression: Maximizing conditional likelihood Gradient ascent as a general learning/optimization method	Mitchell: Naive Bayes and Logistic Regression Optional: Ng & Jordan: On Discriminative and Generative Classifiers, NIPS, 2001. Tom's VIDEO: Logistic regression
Sep 25	Linear Regression Slides Annotated slides	Generative/Discriminative models minimizing squared error and maximizing data likelihood regularization bias-variance decomposition	Bishop: Ch. 1 thru 1.2.5, Ch. 3 thru 3.2 Optional: Mitchell: Ch. 6.4 Tom's VIDEO: Linear regression
Sep 27	Neural Networks Slides	Non-linear regression Gradient descent Learning of representations Deep Belief Networks	Mitchell: Ch. 4, or Bishop: Ch. 5 Optional: Le et al., 2012 Tom's VIDEO: Neural networks
Oct 2	Graphical models 1 Slides Annotated slides	Bayes nets representing joint distributions with conditional independence assumptions	Bishop: Ch 8, through 8.2 Tom's VIDEO: Graphical models 1
Oct 4	Graphical models 2 Slides Annotated slides	Inference Learning from fully observed data Learning from partially observed data	Intro. to Graphical Models, K. Murphy Tom's VIDEO: Graphical models 2	HW3 out 10/8
Oct 9	Graphical models 3 Annotated slides	EM Semi-supervised learning Mixture of Gaussian clustering K-Means clustering	Bishop: Ch. 9 through 9.2 Optional: EM and HMM tutorial J. Bilmes (sec. 1-3) Tom's VIDEO: Graphical models 3 Tom's VIDEO: Graphical models 4
Oct 11	PAC Learning I Slides Annotated slides	Computational Learning Theory Probably approximately correct learning	Mitchell: Ch. 7 Tom's VIDEO: Learning theory 1
Oct 16	PAC Learning II Annotated Slides	VC Dimension Agnostic learning models Mistake bound models	Mitchell: Ch. 7 Tom's VIDEO: Learning theory 2
Oct 18		Midterm
Oct 23	Hierarchical Clustering Slides	Distance functions Hierarchical clustering Number of clusters	Bishop: 9-9.2 Optional: Tutorial on clustering Hierarchical clustering app
Oct 25	Semi-Supervised Learning Slides	Semi-supervised learning Re-weighting labeled examples CoTraining Detecting overfitting	Optional: Advanced tutorial	HW 4 out HW 4 data
Oct 30	Bayesian Networks Slides	Graphical models Constructing a BN Inference in BNs Why it's hard Variable elimination Stochastic inference	Chap 8.1 and 8.2.2 (Bishop) Optional: Tutorials: 1 2
Nov 1	Inference in Bayesian Networks Slides	Why it's hard Variable elimination Stochastic inference Introduction to HMMs	Chap 8.1 and 8.2.2 (Bishop) Optional: Tutorials: 1 2
Nov 6	Inference in Hidden Markov models Slides	Formal definition of HMMs Inference in HMMs With no observations With observations The Viterbi algorithm	Bishop - 13-13.2.1 (inclusive) Tutorial Tutorial
Nov 8	Learning in HMMs Slides	Learning parameters when states can be observed Fully unsupervised Forward backward algorithm EM for HMM learning Introduction to Markov decision processes (MDPs)	Bishop - 13.2.1-13.2.2 Tutorial Tutorial
Nov 13	Markov Decision Processes (MDP) Slides	Formal definition of MDPs Inference in MDPs With no actions With actions	Tutorial Demo
Nov 15	HMMs in Biology Slides
Nov 20	Dimensionality reduction (PCA) Notes		Chapter 4.1.4 - 4.1.6 in Bishop Tutorial for PCA including MATLAB code.
Nov 27	Support Vector Machine (SVM) Slides	Max margin Support vectors Quadratic programming Linear separation
Nov 29	Support Vector Machine (SVM) Slides	Lagrange multipliers Dual formulation of SVM Transformation of the input vector The kernel trick	Software Optional reading
Dec 4	Boosting Slides	Weak classifiers AdaBoost Boosting and logistic regression
Dec 6	Model and feature selection Slides	Cross validation Regularization Information theoretical selection methods Feature selection