Outline for 11-761: Language and Statistics


Foundations [1 week]

Basic Tools from Probability and Statistics: Laws of probability, Bayes' theorem, maximum likelihood, estimators (variance, bias, consistency, efficiency).
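
For illustration, a minimal Python sketch (not course code) of estimator bias: the maximum-likelihood variance estimator divides by n and is biased low, while dividing by n-1 is unbiased. All numbers are toy choices.

    import random

    random.seed(0)
    true_var, n, trials = 4.0, 5, 20000
    mle_sum = unbiased_sum = 0.0
    for _ in range(trials):
        xs = [random.gauss(0.0, true_var ** 0.5) for _ in range(n)]
        m = sum(xs) / n
        ss = sum((x - m) ** 2 for x in xs)
        mle_sum += ss / n             # MLE divides by n and is biased low
        unbiased_sum += ss / (n - 1)  # dividing by n-1 removes the bias
    print("MLE variance estimate:     ", mle_sum / trials)       # close to 3.2
    print("Unbiased variance estimate:", unbiased_sum / trials)  # close to 4.0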

Basic Concepts from Information Theory: Properties of entropy, Kullback-Leibler divergence, mutual information, the data processing inequality, compression and coding, arithmetic coding, and intuitive interpretations of these quantities.
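
For illustration, a minimal Python sketch of entropy, KL divergence, and mutual information computed directly from their definitions; the distributions are toy values.

    from math import log2

    def entropy(p):
        return -sum(pi * log2(pi) for pi in p if pi > 0)

    def kl(p, q):
        return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    def mutual_information(joint):
        # I(X;Y) = KL( P(x,y) || P(x)P(y) )
        px = [sum(row) for row in joint]
        py = [sum(col) for col in zip(*joint)]
        return sum(joint[i][j] * log2(joint[i][j] / (px[i] * py[j]))
                   for i in range(len(px)) for j in range(len(py))
                   if joint[i][j] > 0)

    print(entropy([0.5, 0.5]))                 # 1.0 bit: a fair coin
    print(kl([0.9, 0.1], [0.5, 0.5]))          # > 0: divergence is nonnegative
    print(mutual_information([[0.4, 0.1],
                              [0.1, 0.4]]))    # > 0: X and Y are dependent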

The Noisy Channel Model [1 week]
The source-channel model. Applications: speech recognition, machine translation, spelling correction, OCR, speech processing with side information, and other problems in language processing.
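
For illustration, a minimal Python sketch of the noisy-channel decision rule applied to spelling correction: choose the source word w maximizing P(w) * P(observed | w). The language-model and channel probabilities below are invented for the example.

    lm = {"the": 0.06, "then": 0.002, "them": 0.003}   # P(w), toy numbers
    channel = {                                        # P("teh" | w), toy numbers
        "the": 0.01,      # a plausible transposition error
        "then": 0.0001,
        "them": 0.0001,
    }
    observed = "teh"
    best = max(lm, key=lambda w: lm[w] * channel[w])
    print(best)  # "the": both the prior and the channel favor it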
Language Modeling and N-grams [1.5 weeks]
Perplexity and alternative measures, data sparseness, conditional modeling, history partitioning, N-grams. Word frequencies, Zipf's law, type-token curves, vocabulary and N-gram growth, the zero-frequency problem, smoothing, discounting, the Good-Turing estimate. The backoff model. A Dirichlet language model. N-gram data structures, the CMU-Cambridge toolkit.
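
For illustration, a minimal Python sketch of the Good-Turing estimate on toy bigram counts: the adjusted count for an N-gram seen r times is r* = (r+1) * N_{r+1} / N_r, where N_r is the number of N-gram types seen exactly r times.

    from collections import Counter

    tokens = "the cat sat on the mat the cat ran".split()
    bigrams = list(zip(tokens, tokens[1:]))
    counts = Counter(bigrams)
    freq_of_freqs = Counter(counts.values())   # N_r

    def good_turing(r):
        # Undefined when N_{r+1} = 0; real implementations smooth N_r first.
        return (r + 1) * freq_of_freqs[r + 1] / freq_of_freqs[r]

    print(good_turing(1))                      # adjusted count for bigrams seen once
    # Probability mass reserved for unseen bigrams: N_1 / N
    print(freq_of_freqs[1] / len(bigrams))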
The EM Algorithm [1 week]
The basic algorithm and example applications. The mathematics underlying the algorithm.
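
For illustration, a minimal Python sketch of EM for a two-coin mixture: each sequence of 10 flips comes from coin A or coin B, but the coin identity is unobserved. The data and initial guesses are toy values.

    heads = [9, 8, 1, 2, 8]   # heads out of 10 flips in each sequence (toy data)
    n = 10
    pA, pB = 0.6, 0.4         # initial guesses for each coin's heads probability
    for _ in range(50):
        hA = tA = hB = tB = 0.0
        for h in heads:
            # E-step: posterior probability that coin A produced this sequence
            # (uniform prior over coins; binomial coefficients cancel).
            la = pA ** h * (1 - pA) ** (n - h)
            lb = pB ** h * (1 - pB) ** (n - h)
            w = la / (la + lb)
            hA += w * h;       tA += w * (n - h)
            hB += (1 - w) * h; tB += (1 - w) * (n - h)
        # M-step: re-estimate each coin from its expected counts.
        pA, pB = hA / (hA + tA), hB / (hB + tB)
    print(round(pA, 3), round(pB, 3))  # converges near 0.83 and 0.15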
Finite State Models [2 weeks]
Markov chains, hidden Markov models and the forward-backward algorithm, the Cave-Neuwirth analysis of English, deleted interpolation, class-based n-grams, tagging.
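
For illustration, a minimal Python sketch of the forward pass of the forward-backward algorithm on a toy two-state HMM; alpha[t][s] is the probability of the first t observations with the chain in state s. All parameters are made up.

    states = ["C", "V"]                      # toy hidden states
    start = {"C": 0.5, "V": 0.5}
    trans = {"C": {"C": 0.3, "V": 0.7}, "V": {"C": 0.8, "V": 0.2}}
    emit = {"C": {"t": 0.6, "a": 0.4}, "V": {"t": 0.1, "a": 0.9}}
    obs = ["t", "a", "t"]

    alpha = [{s: start[s] * emit[s][obs[0]] for s in states}]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({s: emit[s][o] * sum(prev[r] * trans[r][s] for r in states)
                      for s in states})
    print(sum(alpha[-1].values()))  # total probability of the observation sequence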
Clustering and Decision Trees [1.5 weeks]
Clustering: hierarchical clustering, mutual information techniques, word compounds.
Decision Trees: The CART technique, applications.
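
For illustration, a minimal Python sketch of the CART-style split criterion: choose the binary question about the history that most reduces the entropy of the predicted word. The data and candidate questions are toy examples, not the actual CART implementation.

    from math import log2
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum(c / n * log2(c / n) for c in Counter(labels).values())

    # (previous word, next word) pairs: predict the next word from the previous.
    data = [("the", "cat"), ("the", "dog"), ("a", "cat"),
            ("on", "mat"), ("on", "hill"), ("the", "cat")]

    def split_gain(question):
        yes = [w for h, w in data if question(h)]
        no = [w for h, w in data if not question(h)]
        n = len(data)
        return entropy([w for _, w in data]) - (
            len(yes) / n * entropy(yes) + len(no) / n * entropy(no))

    questions = {"prev is a determiner": lambda h: h in ("the", "a"),
                 "prev is 'on'": lambda h: h == "on"}
    best = max(questions, key=lambda q: split_gain(questions[q]))
    print(best, round(split_gain(questions[best]), 3))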
Stochastic Grammars [1 week]
The inside-outside algorithm, context-sensitive models, automatic grammar induction, link grammars.
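
For illustration, a minimal Python sketch of the inside half of the inside-outside algorithm for a toy PCFG in Chomsky normal form; inside[i, j, A] is the probability that nonterminal A derives words i through j. The grammar is invented for the example.

    from collections import defaultdict

    # Rules as (lhs, rhs, prob); rhs is a pair of nonterminals.
    binary = [("S", ("NP", "VP"), 1.0), ("NP", ("Det", "N"), 1.0),
              ("VP", ("V", "NP"), 1.0)]
    lexical = {("Det", "the"): 1.0, ("N", "cat"): 0.5, ("N", "mat"): 0.5,
               ("V", "saw"): 1.0}
    words = "the cat saw the mat".split()
    n = len(words)

    inside = defaultdict(float)              # keys: (i, j, nonterminal)
    for i, w in enumerate(words):
        for (a, term), p in lexical.items():
            if term == w:
                inside[i, i, a] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for a, (b, c), p in binary:
                for k in range(i, j):
                    inside[i, j, a] += p * inside[i, k, b] * inside[k + 1, j, c]
    print(inside[0, n - 1, "S"])  # probability of the sentence under the grammar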
Maximum Entropy [1.5 weeks]
Exponential models, triggers, feature induction and iterative scaling, priors and distance models.
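
For illustration, a minimal Python sketch of generalized iterative scaling for a conditional exponential model p(y|x) proportional to exp(sum_i lambda_i f_i(x, y)), on toy data. Strict GIS adds a slack feature so every event has total feature count C; that correction is omitted here for brevity.

    from math import exp, log

    data = [({"f1": 1}, "A"), ({"f1": 1}, "A"), ({"f2": 1}, "B")]
    feats = ["f1", "f2"]
    labels = ["A", "B"]

    def active(x, y, f):
        # Toy feature: fires when the input attribute is present and the label matches.
        return x.get(f, 0) if (f, y) in {("f1", "A"), ("f2", "B")} else 0

    lam = {f: 0.0 for f in feats}
    C = 1  # maximum total feature count per event
    for _ in range(100):
        expected = {f: 0.0 for f in feats}
        for x, _y in data:
            scores = {y: exp(sum(lam[f] * active(x, y, f) for f in feats))
                      for y in labels}
            z = sum(scores.values())
            for f in feats:
                expected[f] += sum(scores[y] / z * active(x, y, f) for y in labels)
        observed = {f: sum(active(x, y, f) for x, y in data) for f in feats}
        for f in feats:
            if expected[f] > 0:
                lam[f] += (1 / C) * log(observed[f] / expected[f])
    print({f: round(v, 2) for f, v in lam.items()})  # weights grow to fit the data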
Language Model Adaptation [0.5 weeks]
Caches, smoothing, Bayesian methods.
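
For illustration, a minimal Python sketch of a cache language model: a static unigram model interpolated with a unigram cache built from the recent document history. The probabilities and interpolation weight are toy values.

    from collections import Counter

    static_p = {"the": 0.05, "court": 0.0004, "ruled": 0.0002}  # toy base model
    history = "the court ruled the court would review the ruling".split()
    cache = Counter(history)

    def cache_p(w, lam=0.2):
        return (1 - lam) * static_p.get(w, 1e-6) + lam * cache[w] / len(history)

    print(cache_p("court"))  # boosted: "court" is frequent in this document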
Special Topics [2 weeks]
Statistical machine translation, text segmentation, tokenization and text conditioning, search algorithms.
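
For illustration, a minimal Python sketch of beam search over word hypotheses, the pruning idea behind most decoders: keep only the k best partial hypotheses at each step. The expansion function and scores are toy stand-ins.

    import heapq

    def beam_search(expand, start, steps, k=3):
        """expand(hyp) yields (extended_hyp, score_increment) pairs."""
        beam = [(0.0, start)]
        for _ in range(steps):
            candidates = [(score + inc, hyp2)
                          for score, hyp in beam
                          for hyp2, inc in expand(hyp)]
            beam = heapq.nlargest(k, candidates)   # prune to the k best
        return max(beam)

    # Toy expansion: append one of three words, each with a fixed log-score.
    words = {"a": -1.0, "b": -0.5, "c": -2.0}
    best = beam_search(lambda h: [(h + [w], s) for w, s in words.items()],
                       [], steps=4, k=2)
    print(best)  # (-2.0, ['b', 'b', 'b', 'b']): the highest-scoring path kept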

