Foundations [1 week]
Basic Tools from Probability and Statistics: laws of probability, Bayes' theorem, maximum likelihood, estimators (variance, bias, consistency, efficiency).
Basic Concepts from Information Theory: properties of entropy, Kullback-Leibler divergence, mutual information, data processing inequality, compression and coding, arithmetic coding, intuitive interpretation.
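
To make the information-theoretic quantities concrete, here is a minimal sketch (not part of the course materials) that computes entropy, Kullback-Leibler divergence, and mutual information as the divergence between a joint distribution and the product of its marginals. The joint distribution is invented for illustration.

    # Illustrative only: entropy, KL divergence, and mutual information for a
    # small hand-made joint distribution p(x, y). Values are hypothetical.
    import math

    def entropy(p):
        # H(p) = -sum p(x) log2 p(x), with 0 log 0 taken as 0
        return -sum(px * math.log2(px) for px in p if px > 0)

    def kl(p, q):
        # D(p || q) = sum p(x) log2 (p(x) / q(x)); assumes q > 0 wherever p > 0
        return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

    # Joint distribution over X in {0,1} and Y in {0,1}, rows indexed by x.
    joint = [[0.4, 0.1],
             [0.1, 0.4]]
    px = [sum(row) for row in joint]            # marginal of X
    py = [sum(col) for col in zip(*joint)]      # marginal of Y
    flat_joint = [p for row in joint for p in row]
    flat_indep = [a * b for a in px for b in py]

    # I(X;Y) = D( p(x,y) || p(x)p(y) )
    print("H(X) =", entropy(px))
    print("I(X;Y) =", kl(flat_joint, flat_indep))
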
The Noisy Channel Model [1 week]
The source-channel model. Applications: speech, translation, spelling correction, OCR, speech processing with side information, other problems in language processing.
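
A minimal sketch of source-channel decoding on a toy spelling-correction case: choose the intended word that maximizes the product of a source (language model) prior and a channel probability. All words and probabilities below are invented for illustration.

    # Illustrative only: noisy-channel decoding for a toy spelling-correction task.
    # The source model P(w) and channel model P(o | w) are made-up numbers.
    observed = "acress"

    # Candidate intended words with prior (language model) probabilities.
    prior = {"actress": 0.0004, "across": 0.0006, "acres": 0.0003}

    # Channel probabilities P(observed | intended), e.g. from an edit model.
    channel = {"actress": 0.01, "across": 0.002, "acres": 0.03}

    # Choose the word maximizing P(w) * P(o | w), i.e. the Bayes decision.
    best = max(prior, key=lambda w: prior[w] * channel[w])
    print(best)
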
Language Modeling and N-grams [1.5 weeks]
Perplexity and alternative measures, data sparseness, conditional modeling, history partitioning, N-grams. Word frequencies, Zipf's law, type-token curves, vocabulary and n-gram growth, the zero-frequency problem, smoothing, discounting, the Good-Turing estimate. The backoff model. A Dirichlet language model. N-gram data structures, the CMU-Cambridge toolkit.
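
A minimal sketch of a bigram model and its perplexity on held-out text, using add-one smoothing as a simple stand-in for the discounting, Good-Turing, and backoff methods covered in the course. The toy corpus is invented.

    # Illustrative only: a bigram model with add-one smoothing and its perplexity
    # on held-out text.
    import math
    from collections import Counter

    train = "the cat sat on the mat the dog sat on the log".split()
    heldout = "the cat sat on the log".split()

    vocab = set(train)
    V = len(vocab)
    unigrams = Counter(train)
    bigrams = Counter(zip(train, train[1:]))

    def p_bigram(w_prev, w):
        # Add-one (Laplace) smoothed conditional probability P(w | w_prev).
        return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + V)

    # Perplexity = 2 ** (average negative log2 probability per predicted word).
    log_prob = sum(math.log2(p_bigram(a, b)) for a, b in zip(heldout, heldout[1:]))
    print("perplexity:", 2 ** (-log_prob / (len(heldout) - 1)))
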
The EM Algorithm [1 week]
The basic algorithm and example applications. The mathematics underlying the algorithm.
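
A minimal sketch of EM on a classic hidden-data problem: two coins of unknown bias, where the identity of the coin used in each trial is unobserved. The observations and starting values are invented.

    # Illustrative only: EM for a mixture of two biased coins. Each entry of
    # heads is the number of heads in 10 flips of one coin, but which coin was
    # used is hidden. EM alternates posterior responsibilities (E-step) with
    # re-estimation of the coin biases (M-step).
    from math import comb

    flips = 10
    heads = [9, 8, 2, 1, 9, 2, 8, 1]    # observed heads per trial
    theta = [0.6, 0.5]                  # initial bias guesses for the two coins

    def binom(h, p):
        return comb(flips, h) * p**h * (1 - p)**(flips - h)

    for _ in range(50):
        # E-step: responsibility of each coin for each trial (uniform mixing weights).
        resp = []
        for h in heads:
            likes = [binom(h, t) for t in theta]
            z = sum(likes)
            resp.append([l / z for l in likes])
        # M-step: re-estimate each coin's bias from its expected head counts.
        theta = [
            sum(r[k] * h for r, h in zip(resp, heads)) /
            sum(r[k] * flips for r in resp)
            for k in range(2)
        ]
    print("estimated biases:", theta)
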
Finite State Models [2 weeks]
Markov chains, hidden Markov models and the forward-backward algorithm, the Cave-Neuwirth analysis of English, deleted interpolation, class-based n-grams, tagging.
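
A minimal sketch of the forward pass of a hidden Markov model on an invented weather/activity example; the full forward-backward algorithm adds a symmetric backward pass and is used to re-estimate the model's parameters.

    # Illustrative only: the forward pass of an HMM on a toy weather model.
    # It computes P(observations) by summing over all hidden state paths.
    # All probabilities here are made-up numbers.
    states = ["Rainy", "Sunny"]
    start = {"Rainy": 0.6, "Sunny": 0.4}
    trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
             "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
    emit = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
            "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

    obs = ["walk", "shop", "clean"]

    # alpha[s] = P(o_1..o_t, state_t = s), updated left to right.
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: sum(alpha[r] * trans[r][s] for r in states) * emit[s][o]
                 for s in states}
    print("P(observations) =", sum(alpha.values()))
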
Clustering and Decision Trees [1.5 weeks]
Clustering: hierarchical clustering, mutual information techniques, word compounds.
Decision Trees: the CART technique, applications.
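
A minimal sketch of bottom-up hierarchical clustering with single linkage on a few invented 2-D points; a word-clustering criterion such as mutual information loss would replace the plain Euclidean distance used here.

    # Illustrative only: agglomerative clustering with single linkage, merging
    # the closest pair of clusters until two remain. Points are made up.
    points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (9.0, 0.0)]
    clusters = [[p] for p in points]

    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

    def linkage(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(dist(a, b) for a in c1 for b in c2)

    while len(clusters) > 2:
        # Find and merge the two closest clusters.
        i, j = min(((i, j) for i in range(len(clusters))
                           for j in range(i + 1, len(clusters))),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] += clusters.pop(j)
    print(clusters)
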
Stochastic Grammars [1 week]
The inside-outside algorithm, context-sensitive models, automatic grammar induction, link grammars.
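
A minimal sketch of the inside pass used by the inside-outside algorithm: CKY-style inside probabilities for an invented probabilistic context-free grammar in Chomsky normal form, yielding the total probability of a sentence. The full algorithm adds an outside pass and rule-count re-estimation.

    # Illustrative only: inside probabilities for a tiny made-up PCFG in CNF.
    binary = {            # A -> B C rules with probabilities
        ("S", "NP", "VP"): 1.0,
        ("VP", "V", "NP"): 1.0,
    }
    lexical = {           # A -> word rules with probabilities
        ("NP", "she"): 0.4, ("NP", "fish"): 0.6,
        ("V", "eats"): 1.0,
    }
    words = ["she", "eats", "fish"]
    n = len(words)

    # inside[(i, j)][A] = P(A derives words[i:j])
    inside = {}
    for i, w in enumerate(words):
        inside[(i, i + 1)] = {A: p for (A, word), p in lexical.items() if word == w}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = {}
            for k in range(i + 1, j):
                for (A, B, C), p in binary.items():
                    pb = inside[(i, k)].get(B, 0.0)
                    pc = inside[(k, j)].get(C, 0.0)
                    if pb and pc:
                        cell[A] = cell.get(A, 0.0) + p * pb * pc
            inside[(i, j)] = cell
    print("P(sentence) =", inside[(0, n)].get("S", 0.0))
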
Maximum Entropy [1.5 weeks]
Exponential models, triggers, feature induction and iterative scaling, priors and distance models.
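
A minimal sketch of generalized iterative scaling for a small unconditional exponential model; the same update idea (adjust each weight by the log ratio of empirical to model feature expectation, scaled by the feature-sum constant) underlies the conditional, trigger-based models discussed in the course. Outcomes, features, and data are invented.

    # Illustrative only: fitting a small exponential (maximum-entropy) model with
    # generalized iterative scaling (GIS). A slack feature pads every outcome's
    # feature sum to the same constant C, as GIS requires.
    import math
    from collections import Counter

    outcomes = ["a", "b", "c", "d"]
    features = [
        lambda y: 1.0 if y in ("a", "b") else 0.0,   # feature 0
        lambda y: 1.0 if y in ("a", "c") else 0.0,   # feature 1
    ]
    C = max(sum(f(y) for f in features) for y in outcomes) + 1.0
    features.append(lambda y: C - sum(f(y) for f in features[:2]))  # slack feature

    sample = ["a"] * 5 + ["b"] * 3 + ["c"] * 1 + ["d"] * 1
    counts = Counter(sample)
    emp = [sum(counts[y] * f(y) for y in outcomes) / len(sample) for f in features]

    lam = [0.0] * len(features)
    for _ in range(200):
        weights = [math.exp(sum(l * f(y) for l, f in zip(lam, features)))
                   for y in outcomes]
        Z = sum(weights)
        p = [w / Z for w in weights]
        model = [sum(py * f(y) for py, y in zip(p, outcomes)) for f in features]
        # GIS update: move each weight toward matching its empirical expectation.
        lam = [l + math.log(e / m) / C for l, e, m in zip(lam, emp, model)]
    print(dict(zip(outcomes, (round(x, 3) for x in p))))
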
Language Model Adaptation [1/2 week]
Caches, smoothing, Bayesian methods.
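
A minimal sketch of a cache language model: a fixed unigram model interpolated with counts of the words seen so far in the current document. The mixing weight and probabilities are invented.

    # Illustrative only: cache-based adaptation of a static unigram model.
    from collections import Counter

    static_p = {"the": 0.07, "economy": 0.001, "grew": 0.0005, "stocks": 0.0008}
    lam = 0.8                     # weight on the static model
    cache = Counter()

    def p_adapted(word):
        cache_p = cache[word] / sum(cache.values()) if cache else 0.0
        return lam * static_p.get(word, 1e-6) + (1 - lam) * cache_p

    for w in ["the", "economy", "grew", "the", "economy"]:
        print(w, round(p_adapted(w), 4))
        cache[w] += 1             # update the cache after predicting the word
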
Special Topics [2 weeks]
Statistical machine translation, text segmentation, tokenization and text conditioning, search algorithms.