Statistical Approaches to
Learning and Discovery
[ home | schedule | texts | assignments | readings ]
A highly selective and incomplete list of papers from which the course will draw material. The list will be updated throughout the semester.
General
- D. Mumford (1999). Dawning of the age of stochasticity.
Likelihoods and Posteriors
- P. Diaconis and D. Ylvisaker (1979). Conjugate priors for exponential families. Annals of Statistics 7, 269-281.
- D. Geiger and D. Heckerman (1997). A characterization of the Dirichlet distribution through global and local parameter independence. Annals of Statistics, 25(3):1344-1369.
The EM Algorithm and Data Augmentation
- G. Givens, D. Smith and R. Tweedie (1997). Publication bias in meta-analysis: A Bayesian data-augmentation approach to account for issues exemplified in the passive smoking debate. Statistical Science, Vol. 12, No. 4, 221-250.
- X.L. Meng and D.B. Rubin (1993). Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika 80, 267-278.
Markov Chain Monte Carlo
- P. Diaconis and L. Saloff-Coste (1998). What do we know about the Metropolis algorithm? Journal of Computer and System Sciences 57(1):20-36.
- J. Propp and D. Wilson (1998). How to get a perfectly random sample from a generic Markov chain and generate a random spanning tree of a directed graph. Journal of Algorithms, 27(2):170-217.
- A. Sinclair and M. Jerrum (1989). Approximate counting, uniform generation and rapidly mixing Markov chains. Information and Computation, 82(1):93-133, July.
Techniques for Supervised and Unsupervised Learning
- S. Della Pietra, V. Della Pietra, and J. Lafferty (1997). Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):380-393, April.
- D. Burshtein, V. Della Pietra, D. Kanevsky, and A. Nadas (1992). Minimum impurity partitions. Annals of Statistics.
- J. Friedman, T. Hastie, and R. Tibshirani (2000). Additive logistic regression: A statistical view of boosting. Annals of Statistics, to appear.
- J. Lafferty (1999). Additive models, boosting, and inference for generalized divergences. In Proceedings of the 12th Annual Conference on Computational Learning Theory (COLT'99).
Information Theory and Statistics
- I. Csiszar and G. Tusnady (1984). Information geometry and alternating minimization procedures. Statistics and Decisions, Supplement Issue 1, 205-237.
- J. O'Sullivan (1998). Alternating minimization algorithms: From Blahut-Arimoto to expectation-maximization. In Codes, Curves and Signals: Common Threads in Communications (A. Vardy, ed.) Kluwer, Boston.
lafferty@cs.cmu.edu