ml

Common distributions:

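Beta: conjugate prior for the Bernoulli. The larger the pseudo-counts \alpha,\beta are, the more sharply peaked the Beta distribution is around its mean \alpha/(\alpha+\beta); unequal pseudo-counts skew it towards 0 or 1. \alpha,\beta<1 favors sparse distributions (mass piles up near 0 and 1).
Dirichlet: multivariate generalization of the Beta. Conjugate prior for the categorical/multinomial.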
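
A minimal sketch of the Beta-Bernoulli update, to make the pseudo-count picture concrete. The function names and the flip data are made up for illustration; the only facts used are that a Beta(\alpha,\beta) prior plus h heads and t tails gives a Beta(\alpha+h, \beta+t) posterior, and the standard mean/variance formulas of the Beta.

```python
def beta_posterior(alpha, beta, flips):
    """Posterior (alpha, beta) after observing 0/1 flips under a Beta(alpha, beta) prior."""
    heads = sum(flips)
    tails = len(flips) - heads
    # Conjugacy: observed counts are simply added to the pseudo-counts.
    return alpha + heads, beta + tails


def beta_mean(alpha, beta):
    return alpha / (alpha + beta)


def beta_variance(alpha, beta):
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1))


flips = [1, 1, 0, 1]  # hypothetical data: 3 heads, 1 tail

weak = beta_posterior(1.0, 1.0, flips)      # small pseudo-counts: data dominates
strong = beta_posterior(50.0, 50.0, flips)  # large pseudo-counts: prior dominates

print(weak, beta_mean(*weak), beta_variance(*weak))
print(strong, beta_mean(*strong), beta_variance(*strong))
```

With the strong prior the posterior mean barely moves from 0.5 and the variance stays small, which is the "larger pseudo-counts means more peaked" effect.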

Conjugate prior relationships:

http://www.johndcook.com/conjugate_prior_diagram.html

Time Series Analysis

EM algorithms:

EM:

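E: Get a lower bound on the log-likelihood via Jensen's inequality. For convex f, E[f(X)] >= f(E[X]); log is concave, so the inequality flips: log E[X] >= E[log X]. Taking the expectation under the posterior over the latent variables (at the current parameters) makes the bound tight.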

M: Maximize that lower bound with respect to the parameters.
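
A small sketch of the E/M loop on a toy problem, assuming a mixture of two biased coins: each trial is n_flips flips of one coin, which coin is hidden, and both coins are assumed equally likely a priori. The function names, the starting values and the head counts are all made up for illustration; this is one simple instance of EM, not a general implementation.

```python
import math


def log_likelihood(trials, n_flips, theta_a, theta_b):
    """Log-likelihood of the head counts, up to an additive constant
    (binomial coefficients dropped); each coin is picked with prob 0.5."""
    total = 0.0
    for h in trials:
        t = n_flips - h
        like_a = theta_a ** h * (1 - theta_a) ** t
        like_b = theta_b ** h * (1 - theta_b) ** t
        total += math.log(0.5 * like_a + 0.5 * like_b)
    return total


def em_two_coins(trials, n_flips, theta_a, theta_b, n_iter=20):
    """Estimate the head probabilities of two coins from per-trial head counts."""
    for _ in range(n_iter):
        heads_a = tails_a = heads_b = tails_b = 0.0
        for h in trials:
            t = n_flips - h
            # E-step: posterior probability that this trial came from coin A.
            like_a = theta_a ** h * (1 - theta_a) ** t
            like_b = theta_b ** h * (1 - theta_b) ** t
            resp_a = like_a / (like_a + like_b)
            resp_b = 1.0 - resp_a
            # Accumulate expected (soft) head/tail counts per coin.
            heads_a += resp_a * h
            tails_a += resp_a * t
            heads_b += resp_b * h
            tails_b += resp_b * t
        # M-step: maximize the lower bound, i.e. re-estimate from soft counts.
        theta_a = heads_a / (heads_a + tails_a)
        theta_b = heads_b / (heads_b + tails_b)
    return theta_a, theta_b


trials = [5, 9, 8, 4, 7]  # hypothetical head counts, 10 flips per trial
theta_a, theta_b = em_two_coins(trials, 10, 0.6, 0.5)
print(theta_a, theta_b, log_likelihood(trials, 10, theta_a, theta_b))
```

Printing log_likelihood after each iteration should show it never decreasing, which is the practical consequence of maximizing a lower bound that is tight at the current parameters.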

Resources

Machine Learning Summer School 2009 - Cambridge

Overview of ML methods

SVM: max-margin
HMM
CRF
MEMM
SVM-HMM
HMM-LDA
Discriminative vs. generative: model the posterior prob p(y|x) directly vs. model the joint distribution p(x,y), i.e. likelihood p(x|y) and prior p(y).

sLDA HDP. YW Teh EPX

Prior: representing knowledge or belief about an unknown quantity
Point estimation:
p(theta|x) = p(x|theta)p(theta)/p(x)
MLE: maximize the likelihood p(x|theta). Fits the data as closely as possible and ignores the prior.
MAP: maximize the posterior p(theta|x), i.e. p(x|theta)p(theta), since p(x) does not depend on theta.
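
A quick sketch contrasting the two point estimates for a Bernoulli parameter, assuming a Beta(alpha, beta) prior with alpha, beta >= 1 so that the posterior mode is well defined. The function names and the data are illustrative only.

```python
def bernoulli_mle(flips):
    """MLE of the head probability: the empirical fraction of heads."""
    return sum(flips) / len(flips)


def bernoulli_map(flips, alpha, beta):
    """MAP estimate under a Beta(alpha, beta) prior: mode of the
    Beta(alpha + heads, beta + tails) posterior."""
    heads = sum(flips)
    n = len(flips)
    return (heads + alpha - 1) / (n + alpha + beta - 2)


flips = [1, 1, 1]  # hypothetical data: three heads in a row

print(bernoulli_mle(flips))            # 1.0: fits the data, claims tails is impossible
print(bernoulli_map(flips, 2.0, 2.0))  # about 0.8: the prior pulls the estimate towards 0.5
print(bernoulli_map(flips, 1.0, 1.0))  # 1.0: a flat prior makes MAP coincide with MLE
```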