Jeff Siskind's Lexicon Learning Research
The most closely related previous research into automated lexicon
acquisition is that of Siskind (1996), itself inspired by work
by Rayner (1988). Because we compare
our system to his in Section 5, we describe the main
features of his research in this section. His goal is
cognitive modeling of children's acquisition of the lexicon, where
that lexicon can be used for both comprehension and generation. Our
goal is a machine learning and engineering one: we focus on a lexicon for
comprehension and use in parsing, learned by
a process that does not claim any cognitive
plausibility, and we aim for a lexicon that generalizes
well from a small number of training examples.
His system takes an incremental approach to acquiring
a lexicon. Learning proceeds in two stages. The first stage learns
which symbols in the representation are to be used in the final
``conceptual expression'' that represents the meaning of a word, by
using a version-space approach. The
second stage learns how these symbols are put together to form
the final representation. For example, when learning the meaning of
the word ``raise'', the algorithm may learn the set {CAUSE, GO, UP}
during the first stage and put them together to form the expression
CAUSE(x, GO(y, UP)) during the second stage.
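The two stages can be illustrated with a small sketch. This is not Siskind's algorithm: it assumes a noise-free toy corpus, replaces the version-space bookkeeping with a plain set intersection, and builds the expression by abstracting one observed sentence meaning. The names `symbols`, `learn_symbol_sets`, `compose`, and the corpus itself are hypothetical and chosen only for this illustration.

```python
# Meanings are nested tuples: ("CAUSE", "JOHN", ("GO", "BALL", "UP")).

def symbols(expr):
    """Return the set of all symbols occurring in an expression."""
    if isinstance(expr, str):
        return {expr}
    head, *args = expr
    return {head}.union(*(symbols(a) for a in args))

def learn_symbol_sets(corpus):
    """Stage 1 (simplified): keep, for each word, only the symbols that
    occur in the meaning of every sentence containing that word."""
    possible = {}
    for words, meaning in corpus:
        seen = symbols(meaning)
        for w in words:
            possible[w] = possible.get(w, seen) & seen
    return possible

def compose(meaning, keep):
    """Stage 2 (simplified): abstract a sentence meaning into a word
    meaning by replacing every subterm whose head symbol is not in
    `keep` with a fresh variable."""
    names = iter("xyzuvw")  # enough variables for this toy example
    def walk(e):
        head = e if isinstance(e, str) else e[0]
        if head not in keep:
            return next(names)
        if isinstance(e, str):
            return e
        return (head, *(walk(a) for a in e[1:]))
    return walk(meaning)

# Hypothetical toy corpus: (words, sentence meaning) pairs.
corpus = [
    (["john", "raise", "ball"], ("CAUSE", "JOHN", ("GO", "BALL", "UP"))),
    (["mary", "raise", "flag"], ("CAUSE", "MARY", ("GO", "FLAG", "UP"))),
]

raise_symbols = learn_symbol_sets(corpus)["raise"]
raise_meaning = compose(corpus[0][1], raise_symbols)
```

On this corpus the intersection for ``raise'' leaves {CAUSE, GO, UP}, and abstracting the first sentence meaning over that set gives the tuple form of CAUSE(x, GO(y, UP)). Words seen only once, such as ``john'', keep every symbol of their sentence, which is why more examples are needed before their symbol sets converge.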
Siskind (1996) shows the effectiveness of his approach on a
series of artificial corpora. The system handles noise, lexical ambiguity,
referential uncertainty, and very large corpora, but the learned
lexicons are evaluated only by comparison with the ``correct,'' artificial
lexicon; the goal of the experiments presented there was to evaluate
the correctness and completeness of learned lexicons. Earlier work
(Siskind 1992) also evaluated versions of his technique on a
quite small corpus of real English and Japanese sentences. We extend that
evaluation to a demonstration of the system's usefulness in
performing real-world natural language processing tasks, using a larger
corpus of real sentences.
Cindi Thompson
2003-01-02