Models for Natural Language Learning using Unlabeled Data



Here is a lightly-annotated bibliography of papers on learning from labeled and unlabeled data. It focuses on methods especially relevant to bootstrap learning for natural language analysis, and on theoretical models for how and when we should expect unlabeled data to be helpful.

Please edit this file ( /afs/cs/project/theo-21/www/semisupervised.html) to add more citations.

Natural language bootstrap learning:

Yarowsky wrote an early paper describing how to learn to disambiguate word senses.  It relies on the assumption that every occurrence of a word (e.g., "bank") within a single document has the same meaning (e.g., river bank or financial bank).  Abney's paper is a more recent formal analysis of why Yarowsky's algorithm works.
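The heart of Yarowsky's algorithm is a self-training loop: train on a small seed set, label the unlabeled examples, and fold the most confident predictions back into the training set.  Here is a minimal sketch of that loop; the base classifier, confidence threshold, and stopping rule are illustrative assumptions, and the sketch omits Yarowsky's one-sense-per-document relabeling step.

```python
# Self-training loop in the style of Yarowsky's algorithm.
# The classifier, threshold, and round limit are illustrative
# assumptions, not the paper's actual choices.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_seed, y_seed, X_pool, threshold=0.95, max_rounds=10):
    """Grow the labeled set by repeatedly adding confident predictions."""
    X_lab, y_lab = X_seed, y_seed
    clf = LogisticRegression()
    for _ in range(max_rounds):
        clf.fit(X_lab, y_lab)
        if len(X_pool) == 0:
            break
        probs = clf.predict_proba(X_pool)
        confident = probs.max(axis=1) >= threshold   # predictions we trust
        if not confident.any():
            break                                    # nothing new to add
        new_y = clf.classes_[probs[confident].argmax(axis=1)]
        X_lab = np.vstack([X_lab, X_pool[confident]])
        y_lab = np.concatenate([y_lab, new_y])
        X_pool = X_pool[~confident]                  # shrink the unlabeled pool
    return clf
```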
Co-training uses unlabeled data together with labeled data to learn f(x)=y when x can be expressed as a pair of features x=<x1,x2> such that x1 and x2 are each individually sufficient to predict y.  This has been used to train web page classifiers, named entity recognizers, image classifiers, and more.  The idea was introduced in 1998 by Blum & Mitchell and has been applied and extended in several directions.
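A minimal sketch of the co-training loop follows, assuming the two views arrive as separate feature matrices X1 and X2 for the same examples.  The base classifier, threshold, and round count are illustrative assumptions; Blum & Mitchell instead grew the pool with a fixed number of positive and negative examples per round.

```python
# Co-training sketch: two classifiers, one per view, each labeling
# confident unlabeled examples for the joint training set.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X1_lab, X2_lab, y_lab, X1_pool, X2_pool,
             threshold=0.9, rounds=10):
    c1, c2 = GaussianNB(), GaussianNB()
    for _ in range(rounds):
        c1.fit(X1_lab, y_lab)
        c2.fit(X2_lab, y_lab)
        if len(X1_pool) == 0:
            break
        p1, p2 = c1.predict_proba(X1_pool), c2.predict_proba(X2_pool)
        # An example is adopted if either view is confident about it;
        # the more confident view supplies the label.
        conf1, conf2 = p1.max(axis=1), p2.max(axis=1)
        confident = (conf1 >= threshold) | (conf2 >= threshold)
        if not confident.any():
            break
        winner = np.where(conf1 >= conf2, p1.argmax(axis=1), p2.argmax(axis=1))
        new_y = c1.classes_[winner[confident]]
        X1_lab = np.vstack([X1_lab, X1_pool[confident]])
        X2_lab = np.vstack([X2_lab, X2_pool[confident]])
        y_lab = np.concatenate([y_lab, new_y])
        X1_pool, X2_pool = X1_pool[~confident], X2_pool[~confident]
    return c1, c2
```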
Another line of bootstrapping algorithms was started by Sergey Brin's paper on using web search as a subroutine for bootstrap learning, with the web itself serving as the training corpus.
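Brin's system alternates between two steps: find occurrences of known pairs in the corpus and record the surrounding text as extraction patterns, then apply those patterns to harvest new pairs.  A toy sketch of that alternation is below; the real system induced richer patterns (including URL prefixes and text before and after each match) over crawled web pages, and scored patterns to suppress noise.

```python
# DIPRE-style pattern bootstrapping over an in-memory corpus.
# The pattern representation (just the text between the two
# arguments) is a deliberate oversimplification of Brin's patterns.
import re

def bootstrap_pairs(corpus, seed_pairs, rounds=3):
    pairs, patterns = set(seed_pairs), set()
    for _ in range(rounds):
        # Step 1: turn occurrences of known pairs into patterns.
        for sent in corpus:
            for a, b in pairs:
                m = re.search(re.escape(a) + r"(.{1,30}?)" + re.escape(b), sent)
                if m:
                    patterns.add(m.group(1))
        # Step 2: apply patterns to the corpus to extract new pairs.
        for sent in corpus:
            for pat in patterns:
                for m in re.finditer(r"(\w[\w ]*?)" + re.escape(pat) + r"(\w[\w ]*)", sent):
                    pairs.add((m.group(1).strip(), m.group(2).strip()))
    return pairs

print(bootstrap_pairs(
    ["Herman Melville wrote Moby Dick in 1851.",
     "Jane Austen wrote Pride and Prejudice."],
    {("Herman Melville", "Moby Dick")}))
```

Even this toy version shows how spurious pairs creep in as patterns are reapplied, which is why pattern scoring (and the accuracy estimation discussed at the end of this page) matters in practice.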
Etzioni's group has pushed very large-scale extraction from the web, based on bootstrap learning of named entity extractors and relation extractors.

Theoretical models for bootstrap learning:

The above papers contain a number of theoretical models, especially the Blum & Mitchell 1998 paper, and the Abney paper. Following are more recent theoretical models for how and when unlabeled data can improve learning.

These papers provide PAC-style bounds on co-training and related learning settings that go beyond those provided in the original co-training paper.
This paper extends the co-training theory model to capture the iterative expansion of the domain of the learned function.
This paper considers multiple function approximators instead of multiple views on the data, leading to a Boosting-style approach.

Whereas the above papers focus on PAC bounds, the following paper has a very different focus.  It presents a statistical model for estimating accuracy for bootstrap learning of named entity and relation extractors, under the assumption that correct entities and relations will be repeatedly extracted from a large corpus, and that correct extractions will recur more frequently than incorrect ones.  This is used by Etzioni's system described above.
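To illustrate the redundancy intuition behind such models: if correct extractions recur at a higher rate than errors, the repetition count of a candidate carries evidence about its correctness.  The sketch below computes a posterior under a simple two-rate binomial model; the rates and prior are made-up parameters, and the actual model in the paper is considerably richer.

```python
# Toy redundancy model: score a candidate extraction by the posterior
# probability that it is correct, given it was extracted k times in n
# opportunities. p_true, p_false, and prior are illustrative numbers.
from math import comb

def p_correct(k, n, p_true=0.01, p_false=0.001, prior=0.5):
    like_true = comb(n, k) * p_true**k * (1 - p_true)**(n - k)
    like_false = comb(n, k) * p_false**k * (1 - p_false)**(n - k)
    return prior * like_true / (prior * like_true + (1 - prior) * like_false)

print(p_correct(5, 1000))   # repeatedly extracted: roughly 0.92
print(p_correct(1, 1000))   # extracted once: roughly 0.001
```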