Summary
The development and coverage of NL-Soar have been driven mainly by the system's
various applications. Each application has linguistic (i.e. grammatical,
semantic, and pragmatic) characteristics which may be idiosyncratic to it or which may
be shared with other applications. Our approach to handling the language required
in a particular domain is to collect a corpus of utterances in that domain and to
extend NL-Soar to cover that corpus.
This is an index into the corpora that the NL9702 release of NL-Soar handles.
- Regression
This corpus contains a list of sentences which exemplify the core syntactic
structures that NL-Soar covers. The form of the sentences does not reflect a particular
domain, but since the corpus's goal is to exemplify frequently occurring structures in
English, we would expect similar sentences to occur in several domains.
- Simultaneous Interpretation
This corpus is a selection of sentences simplified from the Dillinger simultaneous
translation corpus.
This is an index of other corpora that previous or future releases of NL-Soar may handle.
- TacAir
This corpus contains utterances in the tactical air domain.
- Dialogue
This corpus is a collection of dialogues which occur in the tactical air domain.
- Processing Breakdown
This corpus contains sentences testing various problematic and unproblematic constructions,
including garden paths and other syntactic ambiguities.
- Second Language Learning
This corpus contains a collection of sentence pairs in English and in Polish which
require the production of definite and indefinite NPs in various contexts.
- Speech
These corpora are from the ATIS Air Traffic Controller task, and from traffic reports
recorded from the radio. They were chosen because of the accompanying speech files.
Maintenance
NL-Soar's coverage of these corpora is maintained from release to release by use of
a suite of regression testing tools.
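The general idea of corpus-based regression testing can be illustrated with a minimal sketch. This is not the NL-Soar tool suite, whose interface is not described here; the function `find_regressions`, the stand-in `toy_parse`, and the baseline format are all hypothetical names chosen for illustration. The sketch assumes each corpus sentence has a stored result from an earlier release and flags any sentence whose current result differs.

```python
from typing import Callable, Dict, List


def find_regressions(
    baseline: Dict[str, str],
    parse: Callable[[str], str],
) -> List[str]:
    """Return corpus sentences whose current result differs from the baseline.

    `baseline` maps each corpus sentence to the result recorded for an
    earlier release (hypothetical format). A sentence that the current
    system fails on outright also counts as a regression.
    """
    regressions = []
    for sentence, expected in baseline.items():
        try:
            result = parse(sentence)
        except Exception:
            # Any failure to process the sentence is treated as lost coverage.
            regressions.append(sentence)
            continue
        if result != expected:
            regressions.append(sentence)
    return regressions


def toy_parse(sentence: str) -> str:
    """Stand-in for a real parser (assumption): just counts the words."""
    if not sentence.endswith("."):
        raise ValueError("cannot process sentence")
    return f"{len(sentence.split())} words"


# Results recorded from a hypothetical earlier release.
baseline = {
    "The dog barks.": "3 words",
    "The horse raced past the barn fell.": "7 words",
}

if __name__ == "__main__":
    lost = find_regressions(baseline, toy_parse)
    print(f"{len(lost)} regression(s) out of {len(baseline)} sentences")
```

Running such a check before each release shows whether extending the system for a new corpus has cost it coverage of an older one.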
(Last updated 08-14-97 by vandyke@cs.cmu.edu)