This course covers statistical modeling techniques for
discrete, structured data such as text. It brings together content
previously covered in Language and Statistics 2 (11-762) and
Information Extraction (10-707 and 11-748), and aims to define a
canonical set of models and techniques applicable to problems in
natural language processing, information extraction, and other
application areas. Upon completion, students will have a broad
understanding of machine learning techniques for structured outputs,
will be able to develop appropriate algorithms for use in new
research, and will be able to critically read related literature. The
course is organized around methods, with example tasks introduced
throughout.
We expect that the course will be of interest not only to LTI and MLD students, but also to students in the Lane Center, RI, and CSD.
Topics
Subject to change.
Sequence models: HMMs and MEMMs for part-of-speech tagging, BIO tagging/chunking, and segmentation; CRFs; cyclic models and pseudolikelihood (a Viterbi decoding sketch follows this list)
Large margin models: structured and ranking perceptrons; structured SVMs (perceptron sketch below)
Inference: dynamic programming (the Viterbi, CKY, and edit-distance sketches below are all instances); search; integer linear programming; stacking and Searn
Tree models: PCFGs and phrase-structure parsing; spanning trees and dependency parsing (CKY sketch below)
Kernels: kernels for inputs and relation extraction; kernels for outputs, reranking, and non-local features
Alignment models: edit distances for text or genomics (sketch below); weighted FSTs; non-monotonic alignment and machine translation
Incomplete data: EM (sketch below); latent-variable CRFs and SVMs; regularizers from unlabeled data; graph-based semi-supervised learning; associative Markov networks; Bayesian grammars
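To make the sequence-model and dynamic-programming topics concrete, here is a minimal Python sketch of Viterbi decoding for a first-order HMM tagger. It is an illustration only: the dictionary-based parameterization (log_start, log_trans, log_emit) and the assumption that every tag pair has a transition score are choices made for brevity, not code from the course.

    import math

    def viterbi(words, tags, log_start, log_trans, log_emit):
        """Return the highest-scoring tag sequence for a non-empty sentence.

        log_start[t]      -- log P(t) for the first position
        log_trans[(s, t)] -- log P(t | previous tag s)
        log_emit[(t, w)]  -- log P(w | tag t)
        """
        n = len(words)
        best = [{} for _ in range(n)]  # best[i][t]: best log score ending in t at i
        back = [{} for _ in range(n)]  # back-pointers to recover the sequence
        for t in tags:
            best[0][t] = log_start[t] + log_emit.get((t, words[0]), -math.inf)
        for i in range(1, n):
            for t in tags:
                # Dynamic-programming step: choose the best previous tag for t.
                s = max(tags, key=lambda s: best[i - 1][s] + log_trans[(s, t)])
                back[i][t] = s
                best[i][t] = (best[i - 1][s] + log_trans[(s, t)]
                              + log_emit.get((t, words[i]), -math.inf))
        path = [max(tags, key=lambda t: best[n - 1][t])]
        for i in range(n - 1, 0, -1):
            path.append(back[i][path[-1]])
        return list(reversed(path))

The same table-filling pattern, with max replaced by log-sum, gives the forward algorithm used in HMM and CRF training.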
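For the large-margin unit, the structured perceptron (Collins, 2002) has the simplest training loop. A hedged sketch follows; features and decode are placeholders for a task-specific feature extractor and argmax decoder (the Viterbi routine above could serve as the latter), not a fixed API.

    from collections import defaultdict

    def structured_perceptron(train, features, decode, epochs=5):
        """train: list of (x, y) pairs; features(x, y) -> dict; decode(x, w) -> y."""
        w = defaultdict(float)
        for _ in range(epochs):
            for x, y_gold in train:
                y_hat = decode(x, w)  # argmax_y w . features(x, y)
                if y_hat != y_gold:
                    # Promote the gold structure's features, demote the prediction's.
                    for f, v in features(x, y_gold).items():
                        w[f] += v
                    for f, v in features(x, y_hat).items():
                        w[f] -= v
        return w

Averaging the weight vector over updates, which this sketch omits, is the usual stabilizer in practice.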
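For the tree-models unit, a minimal sketch of CKY parsing under a PCFG in Chomsky normal form. The grammar encoding (log_unary and log_binary dictionaries) is an assumption of the sketch, which returns only the best parse's log probability; back-pointers would be needed to recover the tree itself.

    import math

    def cky(words, log_unary, log_binary, root="S"):
        """log_unary[(A, w)] = log P(A -> w); log_binary[(A, B, C)] = log P(A -> B C)."""
        n = len(words)
        # chart[i][j][A]: best log prob of nonterminal A spanning words[i:j]
        chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
        for i, w in enumerate(words):
            for (A, word), lp in log_unary.items():
                if word == w and lp > chart[i][i + 1].get(A, -math.inf):
                    chart[i][i + 1][A] = lp
        for width in range(2, n + 1):
            for i in range(n - width + 1):
                j = i + width
                for k in range(i + 1, j):  # split point
                    for (A, B, C), lp in log_binary.items():
                        if B in chart[i][k] and C in chart[k][j]:
                            score = lp + chart[i][k][B] + chart[k][j][C]
                            if score > chart[i][j].get(A, -math.inf):
                                chart[i][j][A] = score
        return chart[0][n].get(root, -math.inf)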
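For alignment models, the simplest instance is Levenshtein edit distance by dynamic programming; the unit costs here are an illustrative assumption, and weighted FSTs generalize the same recurrence.

    def edit_distance(a, b):
        """Minimum insertions, deletions, and substitutions turning a into b."""
        m, n = len(a), len(b)
        # d[i][j]: distance between the prefixes a[:i] and b[:j]
        d = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            d[i][0] = i  # delete all of a[:i]
        for j in range(n + 1):
            d[0][j] = j  # insert all of b[:j]
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                sub = 0 if a[i - 1] == b[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,        # deletion
                              d[i][j - 1] + 1,        # insertion
                              d[i - 1][j - 1] + sub)  # match or substitution
        return d[m][n]

For example, edit_distance("kitten", "sitting") returns 3.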
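Finally, for the incomplete-data unit, a minimal sketch of EM on the classic two-coin mixture, where the coin behind each trial is the hidden variable. The data format (per-trial heads/tails counts), the uniform prior over coins, and the starting parameters are all assumptions of the sketch.

    def em_two_coins(trials, theta_a=0.6, theta_b=0.5, iters=20):
        """trials: list of (heads, tails) counts; returns the estimated biases."""
        for _ in range(iters):
            exp_a = [0.0, 0.0]  # expected heads/tails credited to coin A
            exp_b = [0.0, 0.0]
            for h, t in trials:
                # E-step: posterior probability that coin A produced this trial
                # (binomial coefficients cancel; coins are a priori equally likely).
                like_a = theta_a ** h * (1 - theta_a) ** t
                like_b = theta_b ** h * (1 - theta_b) ** t
                p_a = like_a / (like_a + like_b)
                exp_a[0] += p_a * h; exp_a[1] += p_a * t
                exp_b[0] += (1 - p_a) * h; exp_b[1] += (1 - p_a) * t
            # M-step: re-estimate each coin's heads probability from expected counts.
            theta_a = exp_a[0] / (exp_a[0] + exp_a[1])
            theta_b = exp_b[0] / (exp_b[0] + exp_b[1])
        return theta_a, theta_b

The same alternation scales up to grammar induction, where the E-step's posterior is computed by dynamic programming (inside-outside) rather than a two-way ratio.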
Readings
Many lectures will be accompanied by a reading from
recent literature (typically machine learning or natural language
processing publications from the past decade). Supplementary readings
will be drawn from Prof. Smith's book, Linguistic Structure
Prediction.