Next: Acknowledgements
Up: Acquiring Word-Meaning Mappings for
Previous: Future Work
Conclusions
Acquiring a semantic lexicon from a corpus of sentences labeled with
representations of their meaning is an important problem that has not
been widely studied. We present both a formalism of the learning problem and a greedy algorithm to find an approximate solution to it.
WOLFIE demonstrates that a fairly simple, greedy,
symbolic learning algorithm performs well on this task and obtains
performance superior to a previous lexicon acquisition system on a
corpus of geography queries. Our results also demonstrate that our
methods extend to a variety of natural languages besides English, and
that they scale fairly well to larger, more difficult corpora.
Active learning is a new area of machine learning that has been almost
exclusively applied to classification tasks. We have demonstrated its
successful application to more complex natural language
mappings from phrases to semantic meanings, supporting the acquisition
of lexicons and parsers. The wealth of unannotated natural language
data, along with the difficulty of annotating such data, make
selective sampling a potentially invaluable technique for natural
language learning. Our results on realistic corpora indicate that
example annotations savings as high as
22% can be achieved by employing active
sample selection using only simple certainty measures for predictions
on unannotated data. Improved sample selection methods and
applications to other important language problems hold the promise of
continued progress in using machine learning to construct effective
natural language processing systems.
Most experiments in corpus-based natural language have presented
results on some subtask of natural language, and there are few results
on whether the learned subsystems can be successfully integrated to
build a complete NLP system. The experiments presented in this paper
demonstrated how two learning systems, WOLFIE and CHILL, were
successfully integrated to learn a complete NLP system for parsing
database queries into executable logical form given only a single
corpus of annotated queries, and further demonstrated the potential of
active learning to reduce the annotation effort for learning for NLP.
Next: Acknowledgements
Up: Acquiring Word-Meaning Mappings for
Previous: Future Work
Cindi Thompson
2003-01-02