A Larger Corpus


Next, we present results on a larger, more diverse corpus from the geography domain, where the additional sentences were collected from computer science undergraduates in an introductory AI course. The questions in the smaller corpus were collected from students in a German class, with no special instructions on the complexity of queries desired. The AI students tended to ask more complex and diverse queries: their task was to give five interesting questions and the associated logical forms for a homework assignment, though again they did not have direct access to the database. They were requested to give at least one sentence whose representation included a predicate containing embedded predicates, for example largest(S, state(S)), and we asked for variety in their sentences. There were 221 new sentences, for a total of 471 (including the original 250 sentences).

For these experiments, we split the data into 425 training sentences and 46 test sentences, for 10 random splits, then trained WOLFIE and then CHILL as before. Our goal was to see whether WOLFIE was still effective for this more difficult corpus, since there were approximately 40 novel words in the new sentences. Therefore, we tested against the performance of CHILL with an extended hand-built lexicon. For this test, we stripped sentences of phrases known to have empty meanings, as in the example of Section 4.2. Again, we did not use phrases of more than one word, since these do not seem to make a significant difference in this domain. For these results, we compare CHILL using WOLFIE's learned lexicons against CHILL using hand-built lexicons that exclude phrases appearing only in the test set.

Figure 12 shows the resulting learning curves. The differences between CHILL using the hand-built and learned lexicons are statistically significant at 175, 225, 325, and 425 examples (four out of the nine data points). The more mixed results here indicate both the difficulty of the domain and the more variable vocabulary. However, the improvement of machine learning methods over the GEOBASE hand-built interface is much more dramatic for this corpus.
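The splitting protocol described above (471 sentences, 10 random splits of 425 training and 46 test sentences, with accuracy measured at increasing training-set sizes) can be sketched roughly as follows. This is only an illustrative sketch, not the code used for the experiments: the function names `random_splits` and `learning_curve_subsets`, the fixed seed, and the choice to take nested prefixes of the shuffled training set are all assumptions. Only the four training sizes named in the text are used in the example, since the remaining points on the curve are not listed here.

```python
import random


def random_splits(n_examples=471, n_test=46, n_splits=10, seed=0):
    """Yield (train_indices, test_indices) for each random split.

    Hypothetical helper: the seed and shuffling scheme are assumptions.
    """
    rng = random.Random(seed)
    for _ in range(n_splits):
        order = list(range(n_examples))
        rng.shuffle(order)
        # The first n_test shuffled indices form the test set, the rest train.
        yield order[n_test:], order[:n_test]


def learning_curve_subsets(train_indices, sizes):
    """Return one training subset per requested size.

    Assumption: smaller subsets are prefixes of larger ones, so each point
    on the learning curve adds examples to the previous point.
    """
    return {size: train_indices[:size] for size in sizes}


splits = list(random_splits())
train, test = splits[0]
# Only the sizes named in the text; the other curve points are not given.
subsets = learning_curve_subsets(train, sizes=[175, 225, 325, 425])
```

Each subset would then be passed to the lexicon learner and parser learner in turn, and the resulting parser evaluated on the 46 held-out sentences of the same split; averaging over the 10 splits gives one point per training size.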

**Figure 12:** Accuracy on the Larger Geography Corpus


Cindi Thompson
2003-01-02