Next: Conclusions
Up: Acquiring Word-Meaning Mappings for
Previous: Active Learning
Future Work
Although WOLFIE's current greedy search method has performed quite
well, a better search heuristic or alternative search strategy could
result in improvements.
We should also more thoroughly evaluate WOLFIE's ability to learn long
phrases, as we restricted this ability in the evaluations here.
Another issue is robustness in the face of noise. The current
algorithm is not guaranteed to learn a correct lexicon even in a
noise-free corpus.
The addition of noise complicates an analysis of circumstances in
which mistakes are likely to happen. Further theoretical and
empirical analysis of these issues is warranted.
Referential uncertainty could be handled, with an increase in
complexity, by forming LICS from more pairs of representations with
which a phrase appears, but not between alternative representations of
the same sentence. Then, once a pair is added to the lexicon, for
each sentence containing that word, representations can be eliminated
if they do not contain the learned meaning, provided another
representation does contain it (thus allowing for lexical ambiguity).
We plan to flesh this out and evaluate the results.
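
The elimination step described above can be sketched as follows. This is only an illustration under simplifying assumptions, not WOLFIE's implementation: each candidate representation is modeled as a frozenset of predicate symbols, a learned meaning "is contained" when it is a subset of that set, and a phrase is a single token. The names prune_representations and apply_learned_pair are hypothetical.

```python
def prune_representations(candidates, meaning):
    """Drop candidates that lack `meaning`, but only if some candidate has it.

    If no candidate contains the meaning, all are kept, which leaves room
    for the word to have a second, not yet learned, meaning.
    """
    containing = [rep for rep in candidates if meaning <= rep]
    return containing if containing else list(candidates)


def apply_learned_pair(corpus, word, meaning):
    """Prune the candidate representations of every sentence containing `word`.

    `corpus` is a list of (tokens, candidates) pairs; a new list is returned
    and the input is left unchanged.
    """
    pruned = []
    for tokens, candidates in corpus:
        if word in tokens:
            candidates = prune_representations(candidates, meaning)
        pruned.append((tokens, list(candidates)))
    return pruned
```

In a real system the containment test would operate on structured representations (for example, subtree matching) rather than on flat symbol sets; the fallback that keeps all candidates when none matches is what preserves lexical ambiguity.
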
A different avenue of exploration is to apply WOLFIE to a corpus of
sentences paired with queries in a more widely used query language,
SQL. Such corpora should be easily constructible by recording
queries submitted to existing SQL applications along with their
English forms, or translating existing lists of SQL queries into
English (presumably an easier direction to translate). The fact that
the same training data can be used to learn both a semantic lexicon
and a parser also helps limit the overall burden of constructing a
complete natural language interface.
With respect to active learning, experiments on additional corpora are
needed to test the ability of our approach to reduce annotation costs
in a variety of domains. It would also be interesting to explore
active learning for other natural language processing problems such as
syntactic parsing, word-sense disambiguation, and machine translation.
Our current results are based on a certainty-based approach; however,
proponents of committee-based approaches make convincing arguments for their
theoretical advantages. Our initial attempts at adapting committee-based
approaches to our systems were not very successful, but additional
research on this topic is warranted. One critical problem is obtaining diverse
committees that properly sample the version space [Cohn et al. 1994].
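
To make the contrast between the two selection strategies concrete, here is a minimal sketch. It is not the selection code used in our experiments: the certainty function, the committee members, and the use of vote entropy as the disagreement measure are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter


def select_least_certain(examples, certainty, k):
    """Certainty-based selection: pick the k examples with the lowest score."""
    return sorted(examples, key=certainty)[:k]


def vote_entropy(votes):
    """Entropy of the committee's votes; 0 means complete agreement."""
    total = len(votes)
    counts = Counter(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_by_committee(examples, committee, k):
    """Committee-based selection: pick the k examples with the most disagreement.

    `committee` is a list of callables, each mapping an example to a label.
    """
    def disagreement(example):
        return vote_entropy([member(example) for member in committee])

    return sorted(examples, key=disagreement, reverse=True)[:k]
```

The sketch takes the committee as given; it does not address how to construct members diverse enough to sample the version space, which is the difficulty noted above.
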
Cindi Thompson
2003-01-02