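Common distributions:
- beta: conjugate prior for the Bernoulli. The larger the pseudo-counts \alpha+\beta are, the more concentrated the beta distribution is around its mean \alpha/(\alpha+\beta); skew comes from \alpha \neq \beta, not from the size of the counts. \alpha,\beta<1 favors sparse distributions (mass piles up near 0 and 1).
- dirichlet: multivariate generalization of the beta. Conjugate prior for the categorical/multinomial.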
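Quick check of how the pseudo-counts shape the beta. This is a minimal sketch using the closed-form mean and variance of Beta(\alpha,\beta); the helper name `beta_mean_var` is my own, not from any library.

```python
def beta_mean_var(alpha, beta):
    """Mean and variance of Beta(alpha, beta) from the closed-form formulas."""
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total ** 2 * (total + 1))
    return mean, var

# Same mean (0.5) with growing pseudo-counts: the variance shrinks,
# so the distribution gets more peaked around the mean.
for a, b in [(0.5, 0.5), (1, 1), (5, 5), (50, 50)]:
    mean, var = beta_mean_var(a, b)
    print(f"Beta({a}, {b}): mean={mean:.3f}, var={var:.5f}")
```

With \alpha,\beta<1 the variance is largest because the mass sits near 0 and 1, which is the sparse case.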
Conjugate prior relationships (diagram):
http://www.johndcook.com/conjugate_prior_diagram.html
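What conjugacy buys in the beta/Bernoulli case: the posterior is again a beta, with the observed counts added to the pseudo-counts. A minimal sketch; the function name and the toy data are made up for illustration.

```python
def beta_bernoulli_update(alpha, beta, observations):
    """Posterior Beta parameters after observing 0/1 Bernoulli outcomes."""
    successes = sum(observations)
    failures = len(observations) - successes
    # Conjugacy: prior pseudo-counts plus observed counts.
    return alpha + successes, beta + failures

prior = (2.0, 2.0)                  # assumed prior, chosen for illustration
data = [1, 0, 1, 1, 0, 1, 1, 1]     # toy data: 6 successes, 2 failures
post_alpha, post_beta = beta_bernoulli_update(prior[0], prior[1], data)
posterior_mean = post_alpha / (post_alpha + post_beta)
print(post_alpha, post_beta, posterior_mean)
```

The dirichlet/categorical pair works the same way, with one count per category.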
Time Series Analysis
EM algorithms:
Andrew Ng
EM:
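E: Get a lower bound on the log-likelihood via Jensen's inequality. For convex f, E[f(X)] >= f(E[X]); log is concave, so the inequality flips: log E[X] >= E[log X], which is what gives the lower bound.
M: Maximize that lower bound over the parameters.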
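A toy EM run to make the two steps concrete. This is a sketch under my own assumptions, not taken from the notes or lectures referenced here: a mixture of two coins, each trial is `n_flips` flips of one unknown coin, and the starting values are arbitrary. It assumes the data are not all heads or all tails (otherwise a bias hits 0 or 1 and the log fails).

```python
import math

def log_binom_pmf(k, n, p):
    """Log of the binomial pmf, using lgamma for the binomial coefficient."""
    log_coef = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coef + k * math.log(p) + (n - k) * math.log(1 - p)

def em_two_coins(heads, n_flips, p_a=0.6, p_b=0.5, mix=0.5, iters=20):
    """EM for a two-coin mixture; returns (p_a, p_b, mix, log-likelihood trace)."""
    trace = []
    for _ in range(iters):
        # E-step: posterior probability that coin A produced each trial.
        resp = []
        loglik = 0.0
        for k in heads:
            la = math.log(mix) + log_binom_pmf(k, n_flips, p_a)
            lb = math.log(1 - mix) + log_binom_pmf(k, n_flips, p_b)
            m = max(la, lb)
            log_norm = m + math.log(math.exp(la - m) + math.exp(lb - m))
            resp.append(math.exp(la - log_norm))
            loglik += log_norm
        trace.append(loglik)  # log-likelihood at the current parameters
        # M-step: maximize the lower bound (closed form: weighted averages).
        w_a = sum(resp)
        w_b = len(heads) - w_a
        p_a = sum(r * k for r, k in zip(resp, heads)) / (w_a * n_flips)
        p_b = sum((1 - r) * k for r, k in zip(resp, heads)) / (w_b * n_flips)
        mix = w_a / len(heads)
    return p_a, p_b, mix, trace

heads = [5, 9, 8, 4, 7]  # toy data: heads out of 10 flips per trial
p_a, p_b, mix, trace = em_two_coins(heads, n_flips=10)
print(p_a, p_b, mix)
```

The trace should never decrease from one iteration to the next, which is the usual sanity check for an EM implementation.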
Resources
Machine Learning Summer School 2009 - Cambridge
Overview of ML methods
- SVM: max-margin
- HMM
- CRF
- MEMM
- SVM-HMM
- HMM-LDA
- Discriminative vs. generative: discriminative models model the posterior p(y|x) directly; generative models model the joint distribution p(x,y), typically as likelihood p(x|y) times prior p(y).
sLDA
HDP. YW Teh EPX
Prior: representing knowledge or belief about an unknown quantity
Point estimation:
p(theta|x) = p(x|theta)p(theta)/p(x)
MLE: maximize the likelihood p(x|theta) over theta. Fits the data as closely as possible.
MAP: maximize the posterior p(theta|x), i.e. p(x|theta)p(theta), since p(x) does not depend on theta.
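MLE vs. MAP on a Bernoulli with a Beta(\alpha,\beta) prior, as a minimal sketch. Function names and numbers are made up for illustration; the MAP formula is the mode of the beta posterior and assumes both posterior parameters are greater than 1.

```python
def bernoulli_mle(successes, n):
    """MLE of the Bernoulli parameter: the observed success rate."""
    return successes / n

def bernoulli_map(successes, n, alpha, beta):
    """MAP estimate under a Beta(alpha, beta) prior.

    Mode of Beta(alpha + successes, beta + n - successes);
    assumes both posterior parameters are > 1.
    """
    return (successes + alpha - 1) / (n + alpha + beta - 2)

successes, n = 3, 3  # toy data: three successes in three trials
mle = bernoulli_mle(successes, n)                 # fits the data exactly
map_est = bernoulli_map(successes, n, 2.0, 2.0)   # pulled toward the prior mean
print(mle, map_est)
```

With a uniform prior (\alpha=\beta=1) the MAP estimate coincides with the MLE.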