"The Lexicon"

THE LEXICON

Overview

This page details the structure of the lexicon used in NL-Soar for both syntactic and semantic entries. Examples of lexical entries for the word 'wanted' are here.

The left-hand side of each lexical production matches the access operator for the word in question. The right-hand side adds profiles to the top-state sentence attribute; these profiles (or sem-profiles) contain information about the syntactic (or semantic) properties of the word. The attributes containing this information are discussed below. During processing, attributes will be added to, removed from, and modified in these profiles.

Syntactic attributes

There are two syntactic lexical entries for each verb [WHY?]. One contains information about the verb itself; the other contains information about the tense of the verb [???].

word-id. The unique word-identifier copied down from the top-state sentence attribute.
category. The syntactic category of the word. Categories include Verb, Noun, Determiner, Adjective, Preposition and Adverb.
bar-level. The bar level of the word (from X-Bar theory). For lexical entries this is always zero; other levels are projected from this information.
sense. An identifier to distinguish between different senses of the same word. Eventually this number will be the WordNet sense number.
language. An attribute to indicate the language of the word. This is important when more than one language is being considered (e.g., in translation). The language attribute has recently (nl9701) been added to the left-hand side of lexical productions.
downward-movement-projection. Initially set to be *empty*, the value of this pointer will be the value of the I profile associated with the verb. When the I profile is created by project-new-word productions during the access operator, the value is set to the appropriate I profile. The attribute is then used in generate-operator proposals to link the VP as the comp of the IP.
root. The morphological root of the word.
complements. The number of complements. Can have a value of zero (for intransitive verbs such as die), one (for transitive verbs such as want) or two (for ditransitive verbs such as give). Prepositions usually have a value of one, as do some nouns.
subcat. An attribute containing a list of attributes (also called subcat) which dictate what kinds of categories a verb can take as complements. In the wanted example, the subcat attributes n and c indicate that the verb can take a noun (e.g., the man wanted the horse) or a C (e.g., the man wanted the horse to eat the apple) as its complement.
empty-node. This is required for adjoinlinks, so that new nodes do not need to be gensymed. Gensyming new nodes would create problems for chunking the adjoin u-constructor.
right-edge. An attribute which holds the word-id of the right edge of this word in a u-model structure. When initially accessed, the word is isolated and not linked to any other structure, so its right edge is itself. When the word is incorporated into a u-model, this attribute is updated by compute-edges productions. For example, when the verb takes a noun complement, as in the man wanted the horse, the right edge will be the word-id of horse. When the verb takes a C complement, as in the man wanted the horse to like the apple, the right edge will be the word-id of apple.
left-edge. As above, but contains the left edge information. For example, in the sentence the man wanted the horse, the left edge of wanted will be the word-id of the first the.
agr. An attribute which contains information for noun-verb agreement. The agr attribute has a token attribute which represents a concatenation of the grammatical person and number that the verb agrees with. In the wanted example, ^token 1s2s3s1p2p3p indicates that wanted agrees with first, second and third person, singular and plural.
tense. The tense of the verb.
index. This is the index for traces in a verb (as in tense) or with chains.
label. Carries the word-name of the word.
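
The attribute list above can be pictured as a simple record. The following Python sketch is only an illustration and is not NL-Soar code: the class SynProfile, the function decode_agr_token and the word-id "w3" are hypothetical names introduced here, and only a subset of the attributes is modelled. It shows a profile for wanted whose edges initially point at the word itself, and one way of reading the agr token as person/number pairs.

```python
import re
from dataclasses import dataclass


@dataclass
class SynProfile:
    """Hypothetical stand-in for a syntactic profile (not NL-Soar's own format)."""
    word_id: str
    label: str
    category: str
    root: str
    complements: int
    subcat: list
    agr_token: str
    bar_level: int = 0      # always zero for lexical entries
    right_edge: str = ""
    left_edge: str = ""

    def __post_init__(self):
        # A freshly accessed word is isolated, so each edge is the word itself.
        self.right_edge = self.right_edge or self.word_id
        self.left_edge = self.left_edge or self.word_id


def decode_agr_token(token):
    """Split a token such as '1s2s3s1p2p3p' into (person, number) pairs."""
    pairs = re.findall(r"([123])([sp])", token)
    # Reject tokens containing anything other than person/number pairs.
    if "".join(p + n for p, n in pairs) != token:
        raise ValueError(f"malformed agr token: {token!r}")
    return [(int(p), n) for p, n in pairs]


# Illustrative entry for 'wanted'; the word-id "w3" is made up.
wanted = SynProfile(
    word_id="w3", label="wanted", category="verb", root="want",
    complements=1, subcat=["n", "c"], agr_token="1s2s3s1p2p3p",
)
```

Calling decode_agr_token(wanted.agr_token) yields six pairs, one for each person in the singular and in the plural, which matches the reading of the token given above.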

Semantic attributes

category. The semantic category of the word. Categories include thing, state, action, property, place and event. These were taken from Jackendoff (1990), as described with the Semantics A/R set.
psense. The psense of the word. Psenses are an intermediate level of meaning abstraction, used as the intermediate level of indexing in the Semantics A/R set. They are meant to come from WordNet. The choice of psense has important implications for the transfer of s-constructors, since the psense value is used in the A/R set and will therefore be part of the proposal conditions of s-constructors.
zero-head. A pointer to the profile of a word. This is the same as zero-nodes in syntax entries, and the name should ideally be changed.
wordnetsense. This attribute will eventually hold the correct WordNet sense of the word. In nl9701, however, these values are not correct.
language. The language of the entry.
word-name. The word-name of the entry.
word-id. The unique identifier of the word.
external, internal, internal2. For each possible role that the semantic entry can assign, there is a substructure which specifies constraints on how the role can be filled. The substructure is like a semantic profile itself, specifying the nature of the (L)CS that should fill the role. It therefore contains the following attributes:

category. The category that an LCS entry must have in order to fill this role. This is checked by the check-category constraint.
psense. The psense that an LCS entry must have in order to fill this role. If this attribute is * (wildcard), the role can accept any psense. This is checked by the check-psense constraint.
animate, inanimate. These attributes (with values t or f) are also used for ascertaining compatibility for semantic fuses. It is unclear, however, whether these attributes will be needed in addition to the category and psense restrictions. They are checked for compatibility by the check-restrictions constraint. Note that in 9702, this constraint has been removed.
internal *empty*. An attribute to indicate that this role is not yet filled. Once a fuse has been made and the role is filled, this attribute is removed because the profile of the receiver replaces this pseudo-profile.
zero-head. An attribute which points back to the sem-profile of the LCS.
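
The category and psense restrictions on a role can be sketched as two small checks. The Python below is a hedged illustration only: the names RoleSpec, SemEntry, check_category, check_psense and can_fill are hypothetical, as are the psense values "animal" and "food", and the real constraints are implemented inside NL-Soar rather than as functions like these. It shows the wildcard behaviour described above, where a psense of * accepts any filler psense.

```python
from dataclasses import dataclass

WILDCARD = "*"


@dataclass
class RoleSpec:
    """Restrictions attached to a role such as external or internal."""
    category: str
    psense: str = WILDCARD


@dataclass
class SemEntry:
    """Hypothetical stand-in for the LCS entry offered as a filler."""
    category: str
    psense: str


def check_category(role, filler):
    # The filler must be of the category that the role requires.
    return role.category == filler.category


def check_psense(role, filler):
    # A wildcard psense on the role accepts any filler psense.
    return role.psense == WILDCARD or role.psense == filler.psense


def can_fill(role, filler):
    return check_category(role, filler) and check_psense(role, filler)


# Illustrative values only; the psenses "animal" and "food" are made up.
internal_role = RoleSpec(category="thing")                 # any psense
strict_role = RoleSpec(category="thing", psense="food")
horse = SemEntry(category="thing", psense="animal")
```

Under these assumptions, horse can fill internal_role because the category matches and the psense is a wildcard, but it cannot fill strict_role because the psenses differ.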

Linking the syntactic and semantic profiles

To keep the syntactic u-model and the semantic s-models compatible with one another, there must be a way to go from an entry in one model to the corresponding entry in the other. An original attempt at achieving this involved matching the sense and wordnetsense of the syntactic and semantic entries. However, this approach could not be continued, because there is not necessarily a one-to-one correspondence between syntactic and semantic entries: a syntactic profile can have links to multiple semantic profiles, but a semantic profile can only be linked to one syntactic profile. For example, for wanted there is only one syntactic entry to cover the cases where an N or a C is the complement; in semantics there are two separate entries, one where the action takes an internal of category thing, the other where it takes an internal of category state. Instead, a series of productions adds links between semantics and syntax based on the category of the entries (more of these are probably required):

top*ao*access*profile*syn-sem-pointer*verb-action, which links a verb to actions and states.
top*ao*access*profile*syn-sem-pointer*noun-thing, which links a noun to a thing.
top*ao*access*profile*syn-sem-pointer*noun-place, which links a noun to a place.
top*ao*access*profile*syn-sem-pointer*adj-property, which links an adjective to a property.
top*ao*access*profile*syn-sem-pointer*prep-property, which links a preposition to a property.
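
The category-based linking performed by these productions can be approximated as a lookup table. The Python sketch below is not the productions themselves: the table LINKABLE, the function link_profiles, the dictionary layout and the word-ids are hypothetical, and matching on word-id is an assumption made here so that only entries for the same word are linked. It illustrates how one syntactic entry for wanted can link to two semantic entries.

```python
# Syntactic category -> semantic categories it may link to, read off the
# production names above (verb-action also covers states).
LINKABLE = {
    "verb": {"action", "state"},
    "noun": {"thing", "place"},
    "adjective": {"property"},
    "preposition": {"property"},
}


def link_profiles(syn, sems):
    """Return the semantic entries that a syntactic entry may link to."""
    allowed = LINKABLE.get(syn["category"], set())
    return [
        sem for sem in sems
        # Assumption: entries for the same word share a word-id.
        if sem["word-id"] == syn["word-id"] and sem["category"] in allowed
    ]


# Illustrative entries; the word-ids are made up.
wanted_syn = {"word-id": "w3", "category": "verb"}
sem_entries = [
    {"word-id": "w3", "category": "action", "internal": "thing"},
    {"word-id": "w3", "category": "action", "internal": "state"},
    {"word-id": "w5", "category": "thing"},
]

links = link_profiles(wanted_syn, sem_entries)
```

Here links contains both action entries for wanted, one taking an internal of category thing and one taking an internal of category state, while the entry belonging to the other word is left out.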

This page written by Mark H. Smith, April 1997.
Updated by Julie Van Dyke, August 1997