This definition of the lexicon acquisition problem differs from that
given by other authors, including
Riloff and Jones (1999), Siskind (1996), Manning (1993), Brent
(1991), and others, as further discussed in Section 7.
Our definition of the problem makes some assumptions about the
training input. First, by
making f a function instead of a relation, the definition assumes
that the meaning for each phrase in a sentence appears once in the
representation of that sentence, the single-use assumption.
Second, by making f one-to-one, it assumes exclusivity, that
each vertex in a sentence's representation is due to only one phrase in
the sentence. Third, it assumes that a phrase's meaning is a
connected subgraph of a sentence's representation, not a more
distributed representation, the connectedness assumption. While
the first assumption may not hold for some representation languages,
it does not present a problem in the domains we have considered. The
second and third assumptions are perhaps less problematic with respect
to general language use.
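To make these constraints concrete, here is a minimal sketch (our illustration, not part of the formal definition) of checking a candidate assignment f against all three assumptions, assuming a sentence's representation is encoded in Python as a set of vertices joined by undirected edges:

    from collections import deque

    def is_connected(vertices, edges):
        # True if `vertices` induce a connected subgraph of the representation.
        vertices = set(vertices)
        if not vertices:
            return False
        seen, frontier = set(), deque([next(iter(vertices))])
        while frontier:
            v = frontier.popleft()
            if v in seen:
                continue
            seen.add(v)
            # enqueue neighbors of v that lie inside the candidate subgraph
            frontier.extend(u for a, b in edges if v in (a, b)
                            for u in (a, b) if u != v and u in vertices)
        return seen == vertices

    def check_assumptions(f, edges):
        # f maps each phrase to exactly one vertex set, so single-use holds
        # by construction (f is a function, not a relation).
        used = set()
        for phrase, vs in f.items():
            vs = set(vs)
            if used & vs:                    # exclusivity: f is one-to-one
                return False
            if not is_connected(vs, edges):  # connectedness
                return False
            used |= vs
        return True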
Our definition also assumes compositionality: that the meaning
of a sentence is derived from the meanings of the phrases it contains,
in addition, perhaps, to some ``connecting'' information specific to
the representation at hand, but is not derived from external sources
such as noise. In other words, all the vertices of a sentence's
representation are included within the meaning of some word or phrase
in that sentence. This assumption is similar to the linking rules of
Jackendoff (1990), and has been used in previous work on
grammar and language acquisition (e.g.,
Haas, 1997;
Siskind, 1996).
While there is some debate in the linguistics community about the
ability of compositional techniques to handle all phenomena
(Fillmore, 1988; Goldberg, 1995), making this assumption simplifies
the learning process and works reasonably for the domains of interest here.
Also, since we allow multi-word
phrases in the lexicon (e.g., (``kick the bucket'',
die(_))), one common objection to compositionality (its apparent
failure on idioms) can be addressed.
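As a brief sketch of what compositionality requires (the vertex and phrase names below are invented for illustration), the phrases' meanings must jointly cover every vertex of the sentence's representation, even when a meaning such as die(_) attaches to a multi-word phrase:

    def is_compositional(f, sentence_vertices):
        # Every vertex of the sentence's representation must be included
        # within the meaning of some word or phrase in the sentence.
        covered = set()
        for vs in f.values():
            covered |= set(vs)
        return covered == set(sentence_vertices)

    # Hypothetical example: ``kick the bucket'' maps as a unit to the single
    # vertex labeled die(_), so the idiom poses no problem for coverage.
    f = {("kick", "the", "bucket"): {"v1"},  # v1 labeled die(_)
         ("john",): {"v2"}}                  # v2 labeled person(john)
    print(is_compositional(f, {"v1", "v2"}))  # prints True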
This definition also allows training input in which:
1. Words and phrases have multiple meanings. That is, homonymy
might occur in the lexicon.
2. Several phrases map to the same meaning. That is, synonymy
might occur in the lexicon (items 1 and 2 are both illustrated in the
sketch following this list).
3. Some words in a sentence do not map to any meaning, leaving them
unused in the assignment of words to meanings.
4. Phrases of contiguous words map to parts of a sentence's meaning
representation.
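As an illustration of these properties (with invented meanings), such a lexicon can be pictured as a map from phrases to sets of candidate meanings: homonymy appears as one phrase with several meanings, synonymy as several phrases sharing one:

    # A sketch of a lexicon permitting homonymy and synonymy: each
    # (possibly multi-word) phrase maps to a set of candidate meanings.
    lexicon = {
        ("bank",): {"river_bank(_)", "institution(_)"},  # homonymy (item 1)
        ("ate",): {"ingest(_, _)"},
        ("consumed",): {"ingest(_, _)"},                 # synonymy (item 2)
        ("kick", "the", "bucket"): {"die(_)"},           # contiguous phrase (item 4)
    }

    def meanings_of(phrase):
        # Words with no entry map to no meaning and go unused (item 3).
        return lexicon.get(phrase, set())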
Of particular note is lexical ambiguity (1 above).
Note that we could also have derived an alternative, ambiguous
lexicon from our sample corpus, one in which ``ate'' is an ambiguous
word. The lexicon given earlier minimizes ambiguity and is thus the
more intuitively pleasing of the two. While our problem definition first minimizes the
number of entries in the lexicon, our learning algorithm will also
exploit a preference for minimizing ambiguity.
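One way to picture this two-level bias (a sketch of the preference only, not of the WOLFIE algorithm itself) is as a lexicographic cost that compares candidate lexicons first by total number of entries and then by number of ambiguous phrases:

    def lexicon_cost(lexicon):
        # Lexicographic cost: total entries first, ambiguous phrases second.
        n_entries = sum(len(ms) for ms in lexicon.values())
        n_ambiguous = sum(1 for ms in lexicon.values() if len(ms) > 1)
        return (n_entries, n_ambiguous)

    # Two hypothetical candidates with the same number of entries: the
    # tie is broken in favor of the lexicon with less ambiguity.
    unambiguous = {("ate",): {"ingest(_, _)"}, ("drank",): {"imbibe(_, _)"}}
    ambiguous = {("ate",): {"ingest(_, _)", "imbibe(_, _)"}}
    best = min([unambiguous, ambiguous], key=lexicon_cost)  # -> unambiguous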
Also note that our definition allows training input in which sentences
themselves are ambiguous (paired with more than one meaning), since a
given sentence in S (a multiset) might appear multiple times, each
time paired with a different meaning. In
fact, the training data that we consider in Section 5
does have some ambiguous sentences.
Our definition of the lexicon acquisition problem does not fit cleanly
into the traditional definition of learning for classification. Each
training example contains a sentence and its semantic parse, and we
are trying to extract semantic information about some of the phrases
in that sentence. So each example potentially contains information
about multiple target concepts (phrases), and we are trying to pick
out the relevant ``features,'' or vertices of the representation,
corresponding to the correct meaning of each phrase. Of course, our
assumptions of single-use, exclusivity, connectedness, and
compositionality impose additional constraints. In addition to this
``multiple examples in one'' learning scenario, we do not have access
to negative examples, nor can we derive any implicit negatives,
because of the possibility of ambiguous and synonymous phrases.
In some ways the problem is related to clustering, which is also
capable of learning multiple, potentially non-disjoint categories.
However, it is not clear how a
clustering system could be made to learn the phrase-meaning
mappings needed for parsing.
Finally, current systems that learn multiple concepts commonly use
the examples of other concepts as negative examples of the concept
currently being learned. Doing so implicitly assumes that concepts
are disjoint, an assumption that is unwarranted in the presence of
synonymy.