Information extraction is the task of finding the names of entities in unstructured or partially structured text and determining the relationships that hold between those entities. More succinctly, information extraction is the problem of deriving structured factual information from text.
This course considers the problem of information extraction from a machine-learning perspective. We will survey a variety of learning methods that have been used for information extraction, including rule learning, boosting, and sequential classification methods such as hidden Markov models, conditional random fields, and structured support vector machines. We will also look at experimental results from a number of specific information extraction domains, such as biomedical text, and discuss semi-supervised "bootstrapping" learning methods for information extraction.