Machine Learning

10-601, Fall 2012

Carnegie Mellon University

Tom Mitchell and Ziv Bar-Joseph


Date | Lecture | Topics | Readings and useful links | Handouts
Sep 4 Intro to ML
Decision Trees
Slides

  • Machine learning examples
  • Well-defined machine learning problem
  • Decision tree learning
Mitchell: Ch 3
Bishop: Ch 14.4
The Discipline of Machine Learning
HW1 out
Sep 6 Decision Tree learning

Review of Probability

Slides
  • The big picture
  • Overfitting
  • Random variables, probabilities
Andrew Moore's Basic Probability Tutorial
Bishop: Ch. 1 thru 1.2.3
Bishop: Ch 2 thru 2.2
Sep 11
Probability and Estimation

Slides
Annotated slides
  • Bayes rule
  • MLE
  • MAP
Andrew Moore's Basic Probability Tutorial
Bishop: Ch. 1 thru 1.2.3
Bishop: Ch 2 thru 2.2

Tom's VIDEO: Probability and Estimation

Sep 13
  • Conditional independence
  • Naive Bayes
Mitchell: Naive Bayes and Logistic Regression

Tom's VIDEO: Naive Bayes

Sep 18
Gaussian Naive Bayes
Slides
Annotated slides

  • Gaussian Bayes classifiers
  • Document classification
  • Brain image classification
  • Form of decision surfaces
Mitchell: Naive Bayes and Logistic Regression

Tom's VIDEO: Gaussian Bayes

Sep 20
Logistic Regression

Slides
Annotated slides
  • Naive Bayes - the big picture
  • Logistic Regression: Maximizing conditional likelihood
  • Gradient ascent as a general learning/optimization method
Mitchell: Naive Bayes and Logistic Regression

Optional: Ng & Jordan: On Discriminative and Generative Classifiers, NIPS, 2001.

Tom's VIDEO: Logistic regression

Sep 25
Linear Regression
Slides
Annotated slides
  • Generative/Discriminative models
  • minimizing squared error and maximizing data likelihood
  • regularization
  • bias-variance decomposition
Bishop: Ch. 1 thru 1.2.5, Ch. 3 thru 3.2
Optional: Mitchell: Ch. 6.4

Tom's VIDEO: Linear regression

Sep 27
Neural Networks
Slides
  • Non-linear regression
  • Gradient descent
  • Learning of representations
  • Deep Belief Networks
Mitchell: Ch. 4, or Bishop: Ch. 5
Optional: Le et al., 2012

Tom's VIDEO: Neural networks

Oct 2
Graphical models 1
Slides
Annotated slides
  • Bayes nets
  • representing joint distributions with conditional independence assumptions

Oct 4
Graphical models 2
Slides
Annotated slides
  • Inference
  • Learning from fully observed data
  • Learning from partially observed data
Intro. to Graphical Models, K. Murphy

Tom's VIDEO: Graphical models 2
Oct 9
Graphical models 3

Annotated slides
  • EM
  • Semi-supervised learning
  • Mixture of Gaussian clustering
  • K-Means clustering
Bishop: Ch. 9 through 9.2

Optional: EM and HMM tutorial, J. Bilmes (sec. 1-3)

Tom's VIDEO: Graphical models 3
Tom's VIDEO: Graphical models 4
Oct 11
PAC Learning I
Slides
Annotated slides
  • Computational Learning Theory
  • Probably approximately correct learning
Mitchell: Ch. 7

Tom's VIDEO: Learning theory 1
Oct 16
PAC Learning II
Annotated slides
  • VC Dimension
  • Agnostic learning models
  • Mistake bound models
Mitchell: Ch. 7

Tom's VIDEO: Learning theory 2
Oct 18 Midterm
Oct 23
Hierarchical Clustering
Slides
  • Distance functions
  • Hierarchical clustering
  • Number of clusters
Bishop: Ch. 9 through 9.2
Optional: Tutorial on clustering
Hierarchical clustering app
Oct 25
Semi-Supervised Learning
Slides
  • Semi-supervised learning
  • Re-weighting labeled examples
  • Co-training
  • Detecting overfitting
Optional: Advanced tutorial
HW 4 out
HW 4 data
Oct 30
Bayesian Networks
Slides
  • Graphical models
  • Constructing a BN
  • Inference in BNs
    • Why it's hard
    • Variable elimination
    • Stochastic inference
Bishop: Ch. 8.1 and 8.2.2
Optional: Tutorials: 1 2
Nov 1
Inference in Bayesian Networks
Slides
  • Why it's hard
  • Variable elimination
  • Stochastic inference
  • Introduction to HMMs
Bishop: Ch. 8.1 and 8.2.2
Optional: Tutorials: 1 2
Nov 6
Inference in Hidden Markov models
Slides
  • Formal definition of HMMs
  • Inference in HMMs
    • With no observations
    • With observations
    • The Viterbi algorithm
Bishop: Ch. 13 through 13.2.1 (inclusive)
Tutorial Tutorial
Nov 8
Learning in HMMs
Slides
  • Learning parameters when states can be observed
  • Fully unsupervised
    • Forward-backward algorithm
    • EM for HMM learning
  • Introduction to Markov decision processes (MDPs)
Bishop: Ch. 13.2.1 through 13.2.2
Tutorial Tutorial
Nov 13
Markov Decision Processes (MDP)
Slides
  • Formal definition of MDPs
  • Inference in MDPs
    • With no actions
    • With actions
Tutorial
Demo
Nov 15
HMMs in Biology
Slides
Nov 20
Dimensionality reduction (PCA)
Notes
Bishop: Ch. 4.1.4 through 4.1.6
Tutorial for PCA including MATLAB code.
Nov 27
Support Vector Machine (SVM)
Slides
  • Max margin
  • Support vectors
  • Quadratic programming
  • Linear separation
Nov 29
Support Vector Machine (SVM)
Slides
  • Lagrange multipliers
  • Dual formulation of SVM
  • Transformation of the input vector
  • The kernel trick
Software
Optional reading
Dec 4
Boosting
Slides
  • Weak classifiers
  • AdaBoost
  • Boosting and logistic regression
Dec 6
Model and feature selection
Slides
  • Cross validation
  • Regularization
  • Information-theoretic selection methods
  • Feature selection