Known NL-Soar bugs & black holes

Summary

This page lists known NL-Soar bugs and places where the system needs additional development.

Two (and sometimes three?)
Here is a directory containing mail messages about various problematic sentences. Most of them involve the limit of two indiscriminable items on the A/R set. Rick Lewis' newest formulation of the limit of two (Soar Workshop 17, June 1997) treats the most recent item separately, so in effect we have "three". This could help in a number of these cases.
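As a rough illustration of the limit, the Python sketch below models an A/R set that holds at most two items under the same discriminating key, with an optional separate slot for the most recent item. The class name, the keys, and the drop-the-oldest policy are assumptions made for illustration only; they are not taken from the NL-Soar productions or from Lewis' formulation.

```python
from collections import defaultdict


class LimitedARSet:
    """Toy A/R set: at most `limit` indiscriminable items per key.

    Hypothetical model for illustration only, not NL-Soar code.
    """

    def __init__(self, limit=2, separate_recent=False):
        self.limit = limit
        self.separate_recent = separate_recent
        self.slots = defaultdict(list)  # key -> items, oldest first
        self.recent = {}                # key -> most recent item, held apart

    def add(self, key, item):
        """Add `item` under `key`; return the displaced item, or None."""
        if self.separate_recent:
            previous = self.recent.get(key)
            self.recent[key] = item
            if previous is None:
                return None
            # The former "most recent" item now competes for the limited slots.
            item = previous
        bucket = self.slots[key]
        bucket.append(item)
        if len(bucket) > self.limit:
            return bucket.pop(0)  # assumed policy: the oldest item is lost
        return None

    def items(self, key):
        """Return everything currently held under `key`, oldest first."""
        held = list(self.slots[key])
        if self.separate_recent and key in self.recent:
            held.append(self.recent[key])
        return held
```

With `separate_recent=False`, the third item added under one key displaces the first. With `separate_recent=True`, three items fit and only the fourth displaces one, which is the sense in which the limit becomes "three".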

Remove operator
Additional help for some of the problems in the above directory could come from a smarter implementation of the remove operator, i.e. knowledge about when to consider a constituent "closed" and consequently take it off the A/R set. We have so far done very little to fill out the ar-set space, relying mainly on a "remove-less-recent" strategy and a few exceptions. One important aspect of development, which generation may require, is to allow the remove operator to build chunks. Right now it does not, because of potential masking problems in which strategy 1 is appropriate in one case but strategy 2 is appropriate in the next. We saw this with some of the tacair sentences (e.g. "contact bearing 260 range 35"), where we would want to remove "bearing" instead of "contact", whereas in other cases the least recent NP would be the right one to remove. Some experimental code which allows the remove operator to build chunks is found here.
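The default-plus-exceptions shape of the current strategy can be sketched as follows. This is a minimal Python illustration, not the NL-Soar implementation: `choose_removal`, `protect`, and the idea of expressing an exception as a set of protected labels are all hypothetical, chosen only to reproduce the "contact"/"bearing" behaviour described above.

```python
def protect(labels):
    """Build a hypothetical exception rule that never removes `labels`.

    The returned rule picks the least recent constituent that is not
    protected, or returns None when every constituent is protected.
    """
    def rule(constituents):
        for constituent in constituents:
            if constituent not in labels:
                return constituent
        return None
    return rule


def choose_removal(constituents, exceptions=()):
    """Pick the constituent to take off a toy A/R set.

    `constituents` is ordered least recent first. Each exception rule is
    tried in turn; if none applies, fall back to remove-less-recent.
    """
    for rule in exceptions:
        choice = rule(constituents)
        if choice is not None:
            return choice
    return constituents[0]  # default strategy: remove the less recent item


# Default strategy removes the less recent constituent.
print(choose_removal(["contact", "bearing"]))
# With the illustrative exception, "bearing" is removed instead.
print(choose_removal(["contact", "bearing"], [protect({"contact"})]))
```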

Communication from semantics back to syntax
Currently we have none: semantics uses the check-syntax constraint to keep the semantics model consistent with semantics commitments already made. If, however, the resulting semantics model is itself inconsistent, we would want to go back and correct the syntax. Some early experimentation with these sorts of snip operators is described here. Since this code is experimental, it has not been integrated into the nl9702 release.

Selection Restrictions
Check-syntax is a constraint we use to maintain, in the semantics model, selection restrictions on which semantics items can reasonably go with which others. One idea for dimensions along which we might fill out these selection restrictions is to use Pustejovsky's qualia structures.

Type coercion
NL-Soar is currently very rigid about matching sense in semantics link proposals. This requires us to have multiple semantics entries for words, so that the category and psense are defined appropriately for the sentences in our corpora. This in turn causes a proliferation of semantics empty-ops, built to allow all the unattached profiles. If the implementation of these constraints were smarter, we could reduce the number of polysemous cases we have in the lexicon and also reduce the need to use the wildcard "*" in lcs entries.

Polysemy
Polysemy of syntax is handled fairly well when the items have different lexical categories (e.g. GREEN the noun and GREEN the adjective in our regression corpus). We have tried to handle polysemy within the same category with our "have" sentences (i.e. HAVE as main verb and HAVE as auxiliary). Although comprehension does fairly well at choosing an initial interpretation and then doing the appropriate snips when the alternative is chosen, generation is less able to accommodate the chunks produced. Additionally, polysemy of "have" very often causes the limit of "two" to be met sooner than desired (see the above note and the mail message for the sentence "the doctor suspects the patient has a tumor"). Polysemy in semantics could be helped by the type coercion discussed above. We do not run into the problem with the limit of "two" here as long as each polysemous word has a different psense value, because the implementation of the semantics A/R set uses the psense as a discriminator.

Lexicon
Currently we have separate entries for the syntax and semantics profiles of each word. This is a result of the history of development and the undesirability of rewriting all the lexical entries. The two profiles are joined at lexical access time by a number of productions that calculate ^semantics and ^syntax pointers to go on the zero-head of each profile. This strategy works fine as long as there is only one correspondence. We have run into problems when a word is polysemous across two syntactic category types which both map to a particular semantics category type. For example, when a word is both a determiner and an adjective, or both an adjective and an adverb, both categories map to the semantics category PROPERTY. Short of rewriting the lexicon to include both profiles in one entry, our only recourse has been to prohibit these situations from occurring. So far this hasn't been a major problem, but it no doubt will be as coverage increases.
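The correspondence problem can be illustrated with a small Python sketch. It is only a model of the idea: the real join is done by Soar productions at lexical access time, and the function name, the data layout, and the `SYN_TO_SEM` table (which contains only the three mappings mentioned above) are assumptions made for illustration.

```python
from collections import defaultdict

# Only the category correspondences named in the text; a real table
# would cover every syntactic category in the lexicon.
SYN_TO_SEM = {
    "determiner": "PROPERTY",
    "adjective": "PROPERTY",
    "adverb": "PROPERTY",
}


def join_profiles(syntax_categories, semantics_categories):
    """Pair a word's syntax profiles with its semantics profiles.

    Returns (links, ambiguous): `links` maps a semantics category to the
    single syntactic category that corresponds to it; `ambiguous` maps a
    semantics category to every competing syntactic category when more
    than one corresponds to it.
    """
    by_semantics = defaultdict(list)
    for syn in syntax_categories:
        by_semantics[SYN_TO_SEM[syn]].append(syn)

    links, ambiguous = {}, {}
    for sem in semantics_categories:
        candidates = by_semantics.get(sem, [])
        if len(candidates) == 1:
            links[sem] = candidates[0]
        elif len(candidates) > 1:
            ambiguous[sem] = candidates
    return links, ambiguous


# One correspondence: the pointers can be computed unambiguously.
print(join_profiles(["adjective"], ["PROPERTY"]))
# Adjective and adverb both map to PROPERTY: no unique correspondence.
print(join_profiles(["adjective", "adverb"], ["PROPERTY"]))
```

A check of this kind could be run over the lexicon to flag the entries that currently have to be prohibited.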

(Last updated 08-14-97 by vandyke@cs.cmu.edu)