Date |
Lecture |
Topics |
Readings and useful links |
Handouts |
Sep 4 |
Intro to ML
Decision Trees
Slides
|
- Machine learning examples
- Well defined machine learning problem
- Decision tree learning
|
Mitchell: Ch 3
Bishop: Ch 14.4
The Discipline of Machine Learning
|
HW1 out |
Sep 6 |
Decision Tree learning
Review of Probability
Slides
|
- The big picture
- Overfitting
- Random variables, probabilities
|
Andrew Moore's Basic Probability Tutorial
Bishop: Ch. 1 through 1.2.3
Bishop: Ch. 2 through 2.2
|
|
Sep 11 |
|
|
Andrew Moore's Basic Probability Tutorial
Bishop: Ch. 1 through 1.2.3
Bishop: Ch. 2 through 2.2
Tom's VIDEO: Probability and Estimation
|
|
Sep 13 |
|
- Conditional independence
- Naive Bayes
|
Mitchell: Naive Bayes and Logistic Regression
Tom's VIDEO: Naive Bayes
|
|
Sep 18 |
|
- Gaussian Bayes classifiers
- Document classification
- Brain image classification
- Form of decision surfaces
|
Mitchell: Naive Bayes and Logistic Regression
Tom's VIDEO: Gaussian Bayes
|
|
Sep 20 |
|
- Naive Bayes - the big picture
- Logistic Regression: Maximizing conditional likelihood
- Gradient ascent as a general learning/optimization method
|
Mitchell: Naive Bayes and Logistic Regression
Optional: Ng & Jordan: On Discriminative and Generative Classifiers, NIPS, 2001.
Tom's VIDEO: Logistic regression
|
|
Sep 25 |
|
- Generative/Discriminative models
- Minimizing squared error and maximizing data likelihood
- Regularization
- Bias-variance decomposition
|
Bishop: Ch. 1 through 1.2.5, Ch. 3 through 3.2
Optional: Mitchell: Ch. 6.4
Tom's VIDEO: Linear regression
|
|
Sep 27 |
|
- Non-linear regression
- Gradient descent
- Learning of representations
- Deep Belief Networks
|
Mitchell: Ch. 4, or Bishop: Ch. 5
Optional: Le et al., 2012
Tom's VIDEO: Neural networks
|
|
Oct 2 |
|
- Bayes nets
- Representing joint distributions with conditional independence assumptions
|
|
|
Oct 4 |
|
- Inference
- Learning from fully observed data
- Learning from partially observed data
|
Intro. to Graphical Models, K. Murphy
Tom's VIDEO: Graphical models 2
|
|
Oct 9 |
|
- EM
- Semi-supervised learning
- Mixture of Gaussian clustering
- K-Means clustering
|
Bishop: Ch. 9 through 9.2
Optional: EM and HMM tutorial, J. Bilmes (sec. 1-3)
Tom's VIDEO: Graphical models 3
Tom's VIDEO: Graphical models 4
|
|
Oct 11 |
|
- Computational Learning Theory
- Probably approximately correct learning
|
Mitchell: Ch. 7
Tom's VIDEO: Learning theory 1
|
|
Oct 16 |
|
- VC Dimension
- Agnostic learning models
- Mistake bound models
|
Mitchell: Ch. 7
Tom's VIDEO: Learning theory 2
|
|
Oct 18 |
|
Midterm
|
|
|
Oct 23 |
Hierarchical Clustering
Slides
|
- Distance functions
- Hierarchical clustering
- Number of clusters
|
Bishop: 9-9.2
Optional: Tutorial on clustering
Hierarchical clustering app
|
|
Oct 25 |
Semi-Supervised Learning
Slides
|
- Semi-supervised learning
- Re-weighting labeled examples
- CoTraining
- Detecting overfitting
|
Optional: Advanced tutorial
|
HW 4 out HW 4 data |
Oct 30 |
|
- Graphical models
- Constructing a BN
- Inference in BNs
- Why it's hard
- Variable elimination
- Stochastic inference
|
Chap 8.1 and 8.2.2 (Bishop)
Optional: Tutorials: 1, 2
|
|
Nov 1 |
Inference in Bayesian Networks
Slides
|
- Why it's hard
- Variable elimination
- Stochastic inference
- Introduction to HMMs
|
Chap 8.1 and 8.2.2 (Bishop)
Optional: Tutorials: 1, 2
|
|
Nov 6 |
Inference in Hidden Markov models
Slides
|
- Formal definition of HMMs
- Inference in HMMs
- With no observations
- With observations
- The Viterbi algorithm
|
Bishop - 13-13.2.1 (inclusive)
Tutorial Tutorial
|
|
Nov 8 |
|
- Learning parameters when states can be observed
- Fully unsupervised
- Forward backward algorithm
- EM for HMM learning
- Introduction to Markov decision processes (MDPs)
|
Bishop - 13.2.1-13.2.2
Tutorial Tutorial
|
|
Nov 13 |
Markov Decision Processes (MDP)
Slides
|
- Formal definition of MDPs
- Inference in MDPs
- With no actions
- With actions
|
Tutorial
Demo
|
|
Nov 15 |
|
|
|
|
Nov 20 |
Dimensionality reduction (PCA)
Notes
|
|
Chapter 4.1.4 - 4.1.6 in Bishop
Tutorial for PCA including MATLAB code.
|
|
Nov 27 |
Support Vector Machine (SVM)
Slides
|
- Max margin
- Support vectors
- Quadratic programming
- Linear separation
|
|
|
Nov 29 |
Support Vector Machine (SVM)
Slides
|
- Lagrange multipliers
- Dual formulation of SVM
- Transformation of the input vector
- The kernel trick
|
Software
Optional reading
|
|
Dec 4 |
|
- Weak classifiers
- AdaBoost
- Boosting and logistic regression
|
|
|
Dec 6 |
Model and feature selection
Slides
|
- Cross validation
- Regularization
- Information theoretical selection methods
- Feature selection
|
|
|