Date |
Lecture |
Topics |
Readings and useful links |
Handouts |
Sept 8 |
Intro to ML
Slides |
- ML applications
- What constitutes an ML algorithm?
- Learning paradigms, Loss functions
- Supervised learning (classification, regression)
- Unsupervised learning (density estimation, clustering, dimensionality reduction)
- Bayes Optimal Learning Rule
|
Bishop: Sec 2.1, Appendix B
Mitchell: Ch 1 |
|
Sept 13 |
Learning distributions
Slides |
- Learning parametric distributions
- Maximum Likelihood Estimation (MLE)
- Maximum A Posteriori (MAP) Estimation
|
Andrew Moore's Basic Probability Tutorial
Bishop: Sec 2.2, 2.3 (up to 2.3.6) |
HW1 is out |
Sept 15 |
Optimal Classifier
Slides |
- MLE vs. MAP
- Bayes Optimal Classifier
|
Bishop: Sec 1.5
| |
Sept 20 |
Naive Bayes
Slides |
- Conditional Independence
- Naive Bayes Classifier
- Discrete Features
- Continuous Features
|
Mitchell's Chapter Draft
|
|
Sept 22 |
Logistic regression
Slides |
- Generative vs. Discriminative Classifiers
- Logistic regression
|
Mitchell's Chapter Draft
Bishop: Sec 4.1-4.3
On Discriminative and Generative Classifiers, Ng and Jordan, NIPS, 2001 (pdf)
On gradient descent and Newton's method: Boyd's slides and Chapter 9 of Convex Optimization.
|
|
Sept 27 |
Regression
Slides |
- Linear Regression
- Polynomial Regression
|
Least Squares Applet
Tutorial on regression by Andrew Moore
Bishop: Sec 3.1
|
HW1 due
|
Sept 29 |
Nonparametric methods
Slides |
- Histogram, Kernel Density Estimation
- K-NN Classifier
- Kernel Regression
|
Bishop: Sec 2.5, 6.3
Mitchell: Ch 8
Tutorial on Instance-based Learning by Andrew Moore
|
HW2 is out |
Oct 4 |
Model Selection
Slides
|
- Overfitting
- Bias-Variance Tradeoff
- Model Selection
- Cross-validation
- Structural Risk Minimization
- Complexity Regularization
- Information Criteria (AIC, BIC, MDL)
|
Bishop: Sec 1.3, 3.1.4
Hastie: Ch 7 (recommended)
A study of CV and Bootstrap (optional)
MDL website (optional)
Model Selection and MDL principle paper by M. Hansen and B. Yu (optional)
|
|
Oct 6 |
Decision Trees
Slides
|
- Decision Tree Representation
- Entropy, Information gain
- Overfitting, Pre- and Post-pruning, MDL
|
Mitchell: Ch 3
Decision Tree Applet
|
|
Oct 11 |
Boosting
Slides
|
- Combining weak classifiers
- AdaBoost algorithm
- Comparison with logistic regression and bagging
|
Bishop: Sec 14.3
Boosting homepage
Schapire: Boosting Tutorial, Video
AdaBoost Applet
|
Project Proposal due
|
Oct 13 |
Support Vector Machines
Slides
|
- Maximizing margin
- SVM formulation
- Slack variables, Hinge loss
- Multi-class SVM
|
Bishop: Sec 7.1, Sec 4.1.1, 4.1.2, Appendix E
Stephen Boyd's book: Ch 5 (optional)
|
HW2 due
HW3 is out
|
Oct 18 |
Support Vector Machines
Slides
|
- Constrained Optimization
- Dual SVM
- Kernel Trick
- Comparison with Kernel regression and Logistic Regression
|
Bishop: Sec 6.1, 6.2
Tutorials on SVMs and Kernels
Additional resource: SVM website
|
|
Oct 20 |
|
Midterm Exam
|
Score distribution
|
Exam
Solution
|
Oct 25 |
Clustering
Slides
|
- What is clustering?
- Hierarchical Clustering
- Single linkage
- Complete linkage
- Average linkage
- Partition-based Clustering
|
Bishop: Sec 9.1
|
|
Oct 27 |
EM Algorithm
Slides
|
- Gaussian Mixture Model
- Expectation Maximization Algorithm
|
Bishop: Ch 9
|
|
Nov 1 |
Learning Theory I
Slides
Annotated Slides
|
- Sample complexity
- Haussler bound
- PAC Learning
- Hoeffding's bound
|
Mitchell: Ch 7
|
HW3 due
HW4 is out
|
Nov 3 |
Learning Theory II
Slides
|
- VC dimension
- Mistake Bounds
|
Mitchell: Ch 7
|
|
Nov 8 |
HMM
Slides
|
- HMM Representation
- Forward Algorithm
- Forward-Backward Algorithm
- Viterbi Algorithm
- Baum-Welch Algorithm
|
Bishop: Ch 13
HMM and EM Tutorial
|
Midterm project report due
|
Nov 10 |
Graphical Models I
Slides
|
Representation - Directed models
- Factorization of joint distribution
- Local Markov Assumption
- D-separation
- Representation Theorem
|
Bishop: Ch 8
Graphical Models tutorial by M. Jordan
Intro to Graphical Models by K. Murphy
|
|
Nov 15 |
Graphical Models II
Slides
|
Representation - Undirected models
- Factorization of joint distribution
- Graph separation
- Hammersley-Clifford Theorem
Inference
|
Bishop: Ch 8
Graphical Models tutorial by M. Jordan
Intro to Graphical Models by K. Murphy
|
HW4 due
|
Nov 17 |
Graphical Models III
Dimensionality Reduction
Slides
|
Learning - Graphical Models
- Learning CPTs
- Learning structure - Chow-Liu Algorithm
Dimensionality Reduction
- Feature Selection
- PCA (Principal Components Analysis)
|
|
HW5 is out
|
Nov 22 |
Nonlinear Dimensionality Reduction
Slides
Spectral Clustering
Slides
|
- Laplacian Eigenmaps
- Spectral Clustering
|
Belkin-Niyogi Paper on Laplacian Eigenmaps
Spectral Clustering tutorial by Ulrike von Luxburg
Spectral Clustering demo
|
|
Nov 29 |
Neural Networks
Slides
|
Neural Networks
- Prediction - Forward Propagation
- Training - Backpropagation
|
Derivation of Backpropagation (pdf)
|
|
Dec 1 |
Semi-Supervised Learning
Slides
|
|
|
|
Dec 2 |
|
Project Poster Presentation (3-6 pm NSH Atrium)
|
|
|
Dec 7 |
|
Final Project report due (by 10:30 am)
|
Both the project report and HW5 are due by 10:30 am in Michelle's office (GHC 8001)
|
HW5 due (by 10:30 am)
|
Dec 14 |
|
Final Exam (1-4 pm), DH 2210
|
|
|