Next: Conclusions Up: Acquiring Word-Meaning Mappings for Previous: Active Learning

Future Work

Although WOLFIE's current greedy search method has performed quite well, a better search heuristic or an alternative search strategy could yield improvements. We should also evaluate more thoroughly WOLFIE's ability to learn long phrases, since we restricted this ability in the evaluations reported here.

Another issue is robustness in the face of noise. The current algorithm is not guaranteed to learn a correct lexicon even in a noise-free corpus, and the addition of noise complicates any analysis of the circumstances in which mistakes are likely to occur. Further theoretical and empirical analysis of these issues is warranted.

Referential uncertainty could be handled, at the cost of increased complexity, by forming LICS from more pairs of representations with which a phrase appears, but not between alternative representations of the same sentence. Then, once a pair is added to the lexicon, for each sentence containing that word, representations can be eliminated if they do not contain the learned meaning, provided another representation of the sentence does contain it (thus allowing for lexical ambiguity). We plan to flesh this out and evaluate the results.

A different avenue of exploration is to apply WOLFIE to a corpus of sentences paired with the more common query language, SQL. Such corpora should be easy to construct, either by recording queries submitted to existing SQL applications along with their English forms, or by translating existing lists of SQL queries into English (presumably the easier direction in which to translate). The fact that the same training data can be used to learn both a semantic lexicon and a parser also helps limit the overall burden of constructing a complete natural language interface.

With respect to active learning, experiments on additional corpora are needed to test the ability of our approach to reduce annotation costs in a variety of domains. It would also be interesting to explore active learning for other natural language processing problems such as syntactic parsing, word-sense disambiguation, and machine translation. Our current results use a certainty-based approach; however, proponents of committee-based approaches make convincing arguments for their theoretical advantages. Our initial attempts at adapting committee-based approaches to our systems were not very successful, but additional research on this topic is warranted. One critical problem is obtaining diverse committees that properly sample the version space [Cohn et al. 1994].
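
Returning to referential uncertainty, the elimination step proposed above can be illustrated with a small sketch. This is not WOLFIE's implementation; it only shows the rule as stated in the text. The data model is an assumption made for illustration: each sentence is a tuple of words, each candidate representation is flattened to a set of meaning symbols (real representations are structured, not flat sets), and the function name prune_representations and the sample symbols are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# Assumed, simplified data model (not from the original system):
# a candidate representation is just the set of meaning symbols it contains.
Representation = FrozenSet[str]
Example = Tuple[Tuple[str, ...], List[Representation]]


def prune_representations(
    examples: List[Example], word: str, meaning: str
) -> List[Example]:
    """Prune candidate representations after (word, meaning) enters the lexicon.

    For each sentence containing `word`, drop the candidates that lack
    `meaning`, but only if at least one candidate does contain it.
    """
    pruned: List[Example] = []
    for words, candidates in examples:
        kept = list(candidates)
        if word in words:
            containing = [rep for rep in candidates if meaning in rep]
            # If no candidate contains the meaning, the word may have another
            # sense in this sentence (lexical ambiguity), so prune nothing.
            if containing:
                kept = containing
        pruned.append((words, kept))
    return pruned


# Hypothetical toy corpus with two candidate representations for sentence 1.
examples: List[Example] = [
    (
        ("what", "is", "the", "capital", "of", "texas"),
        [
            frozenset({"answer", "capital", "texas"}),
            frozenset({"answer", "population", "texas"}),
        ],
    ),
    # "capital" appears here with a different sense; nothing should be pruned.
    (("raise", "capital", "quickly"), [frozenset({"raise", "funds"})]),
    # Sentence without the word is left untouched.
    (("how", "big", "is", "texas"), [frozenset({"answer", "size", "texas"})]),
]

result = prune_representations(examples, "capital", "capital")
```

In this toy corpus, only the first sentence loses a candidate; the second keeps its single representation because no alternative contains the learned meaning, which is the allowance for lexical ambiguity described in the text.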


Cindi Thompson
2003-01-02