Next: LICS versus Fracturing
Up: A Database Query Application
Previous: Performance for Other Natural
Next, we present results on a larger, more diverse corpus from the
geography domain, where the additional sentences were collected from
computer science undergraduates in an introductory AI course. The set
of questions in the smaller corpus was collected from students in a
German class, with no special instructions on the complexity of
queries desired. The AI students tended to ask more complex and
diverse queries: their task was to give five interesting questions and
the associated logical form for a homework assignment, though again
they did not have direct access to the database.
They were asked to give at least one sentence whose representation included
a predicate containing embedded predicates, for example
largest(S, state(S)), and to vary their sentences.
There were 221 new sentences, for a total of 471 (including the
original 250 sentences).
For these experiments, we made 10 random splits of the data into 425
training sentences and 46 test sentences, then trained WOLFIE followed
by CHILL as before. Our goal was to see whether WOLFIE was still
effective for this more difficult corpus, since there were
approximately 40 novel words in the new sentences. Therefore, we
tested against the performance of CHILL with an extended hand-built
lexicon. For this test, we stripped sentences of phrases known to have empty
meanings, as in the example of Section 4.2.
Again, we did not use phrases of more than one word, since these do
not seem to make a significant difference in this domain. For these
results, we compare CHILL using WOLFIE's lexicons against CHILL using
hand-built lexicons without phrases that only appear in the test set.
Figure 12 shows the resulting learning curves.
The differences between CHILL using the hand-built and
learned lexicons are statistically significant
at 175, 225, 325, and 425 examples (four out of the nine data points).
The more mixed results here
indicate both the difficulty of the domain and the more variable vocabulary.
However, the improvement of machine learning
methods over the GEOBASE hand-built interface is much more dramatic
for this corpus.
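The evaluation protocol described above (10 random splits of the 471 sentences into 425 training and 46 test sentences) can be sketched as follows. This is only an illustrative sketch, not the code used for the experiments: the function name random_splits, the fixed seed, and the integer ids standing in for sentences are all assumptions, and the text does not say how the original splits were drawn beyond their being random.

```python
import random


def random_splits(sentences, n_train, n_splits, seed=0):
    """Yield (train, test) pairs, each from an independent random shuffle.

    Hypothetical helper: the name, signature, and seed are illustrative
    assumptions, not taken from the original experiments.
    """
    rng = random.Random(seed)
    for _ in range(n_splits):
        shuffled = list(sentences)
        rng.shuffle(shuffled)
        # First n_train items form the training set, the rest the test set.
        yield shuffled[:n_train], shuffled[n_train:]


# Placeholder corpus: integer ids standing in for the 471 sentences.
corpus = list(range(471))
splits = list(random_splits(corpus, n_train=425, n_splits=10))
```

Each training set would then be passed to the lexicon learner and the parser learner in turn, and accuracy averaged over the 10 test sets. Test sets drawn this way may overlap across splits.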
Figure 12: Accuracy on the Larger Geography Corpus
Cindi Thompson
2003-01-02