10-602 Selected Readings


Statistical Approaches to
Learning and Discovery

[ home | schedule | texts | assignments | readings ]


A selective and incomplete list of papers from which we will draw material in the course. It will be updated throughout the semester.


General

  1. D. Mumford (1999). Dawning of the age of stochasticity.


Likelihoods and Posteriors

  1. P. Diaconis and D. Ylvisaker (1979). Conjugate priors for exponential families. Annals of Statistics, 7(2):269-281.
  2. D. Geiger and D. Heckerman (1997). A characterization of the Dirichlet distribution through global and local parameter independence. Annals of Statistics, 25(3):1344-1369.


The EM Algorithm and Data Augmentation

  1. G. Givens, D. Smith and R. Tweedie (1997). Publication bias in meta-analysis: A Bayesian data-augmentation approach to account for issues exemplified in the passive smoking debate. Statistical Science, 12(4):221-250.
  2. X.L. Meng and D.B. Rubin (1993). Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika, 80:267-278.


Markov Chain Monte Carlo

  1. P. Diaconis and L. Saloff-Coste (1998). What do we know about the Metropolis algorithm? Journal of Computer and System Sciences, 57(1):20-36.
  2. J. Propp and D. Wilson (1998). How to get a perfectly random sample from a generic Markov chain and generate a random spanning tree of a directed graph. Journal of Algorithms, 27(2):170-217.
  3. A. Sinclair and M. Jerrum (1989). Approximate counting, uniform generation and rapidly mixing Markov chains. Information and Computation, 82(1):93-133, July.


Techniques for Supervised and Unsupervised Learning

  1. S. Della Pietra, V. Della Pietra, and J. Lafferty (1997). Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):380-393, April.
  2. D. Burshtein, V. Della Pietra, D. Kanevsky, and A. Nadas (1992). Minimum impurity partitions. Annals of Statistics.
  3. J. Friedman, T. Hastie, and R. Tibshirani (2000). Additive logistic regression: A statistical view of boosting. Annals of Statistics, to appear.
  4. J. Lafferty (1999). Additive models, boosting, and inference for generalized divergences, in Proceedings of the 12th Annual Conference on Computational Learning Theory (COLT'99).


Information Theory and Statistics

  1. I. Csiszár and G. Tusnády (1984). Information geometry and alternating minimization procedures. Statistics and Decisions, Supplement Issue 1, 205-237.
  2. J. O'Sullivan (1998). Alternating minimization algorithms: From Blahut-Arimoto to expectation-maximization. In Codes, Curves and Signals: Common Threads in Communications (A. Vardy, ed.), Kluwer, Boston.



lafferty@cs.cmu.edu