Formal Definition

Next: Implications of the Definition Up: The Lexicon Acquisition Problem Previous: The Lexicon Acquisition Problem

Formal Definition

To present the learning problem more formally, some definitions are needed. While in the following we use the terms ``string'' and ``substring,'' these extend straight-forwardly to natural language sentences and phrases, respectively. We also refer to labeled trees, making the assumption that the semantic meanings of interest can be represented as such. Most common representations can be recast as labeled trees or forests, and our formalism extends easily to the latter.

Definition: Let $\Sigma_V$ , $\Sigma_E$ be finite alphabets of vertex labels and edge labels, respectively. Let V be a finite nonempty set of vertices, l a total function $l : V \rightarrow \Sigma_V$ , E a set of unordered pairs of distinct vertices called edges, and a a total function $a: E \rightarrow \Sigma_E$ . G = (V,l,E,a) is a labeled graph.

Definition: A labeled tree is a connected, acyclic labeled graph.

Figure 3 shows the labeled tree t₁ (with vertices 1-8) on

**Figure 3:** Labeled Trees and Interpretations
$\includegraphics[height=5in,angle=-90]{fig3.eps}$

the left, with associated vertex and edge labels on the right. The function l is:¹

$\{$	(1, `ingest`), (2, `person`), (3, `food`), (4, `female`), (5, `child`), (6, `pasta`),
	(7, `food`), (8, `cheese`) $\}$ .

The tree t₁ is a semantic representation of the sentence s₁: ``The girl ate the pasta with the cheese.'' Using a conceptual dependency [Schank1975] representation in Prolog list form, the meaning is:

`[ingest,`	`agent:[person, sex:female, age:child],`
	`patient:[food, type:pasta, accomp:[food, type:cheese]]]`.

Definition: A u-v path in a graph G is a finite alternating sequence of vertices and edges of G, in which no vertex is repeated, that begins with vertex u and ends with vertex v, and in which each edge in the sequence connects the vertex that precedes it in the sequence to the vertex that follows it in the sequence.

Definition: A directed, labeled tree T=(V,l,E,a) is a labeled tree whose edges consist of ordered pairs of vertices, with a distinguished vertex r, called the root, with the property that for every $v\in V$ , there is a directed r-v path in T, and such that the underlying undirected unlabeled graph induced by (V,E) is a connected, acyclic graph.

Definition: An interpretation f from a finite string s to a directed, labeled tree t is a one-to-one function mapping a subset s' of the substrings of s, such that no two strings in s' overlap, into the vertices of t such that the root of t is in the range of f.

The interpretation provides information about what parts of the meaning of a sentence originate from which of its phrases. In Figure 3, we show an interpretation, f₁, of s₁ to t₁. Note that ``with'' is not in the domain of f₁, since s' is a subset of the substrings of s, thus allowing some words in s to have no meaning. Because we disallow overlapping substrings in the domain, both ``cheese'' and ``the cheese'' could not map to vertices in t₁.

Definition: Given an interpretation f of string s to tree t, and an element p of the domain of f, the meaning of p relative to s, t, f is the connected subgraph of t whose vertices include f(p) and all its descendents except any other vertices in the range of f and their descendents.

Meanings in this sense concern the ``lowest level'' of phrasal meanings, occurring at the terminal nodes of a semantic grammar, namely the entries in the semantic lexicon. The grammar can then be used to construct the meanings of longer phrases and entire sentences. This is our motivation for the previously stated constraint that the root must be included in the range of f: we want all vertices in the sentence representation to be included in the meaning of some phrase. Note that the meaning of p is also a directed tree with f(p) as its root. Figure 4 shows the meanings of each phrase in the domain of interpretation function f₁ shown in Figure 3. We show only the labels on the vertices and edges for readability.

**Figure 4:** Meanings
$\includegraphics[height=4.5in,angle=-90]{fig4.eps}$

Definition: Given a finite set STF of triples $<s_1,t_1,f_1>,\ldots,<s_n,t_n,f_n>$ , where each s_i is a finite string, each t_i is a directed, labeled tree, and each f_i is an interpretation function from s_i to t_i, let the language $\mathcal{L}_{STF}=\{p_1,\ldots,p_k\}$ of STF be the union of all substrings² that occur in the domain of some f_i. For each $p_j \in \mathcal{L}_{STF}$ , the meaning set of p_j, denoted M_STF(p_j),³ is the set of all meanings of p_j relative to s_i, t_i, f_i for some $<s_i, t_i, f_i>\in STF$ . We consider two meanings to be the same if they are isomorphic trees taking labels into account.

For example, given sentence s₂: ``The man ate the cheese,'' the labeled tree t₂ pictured in Figure 5, and f₂ defined as: f₂(``ate'') =1, f₂(``man'') =2, f₂(``the cheese'') = 3; the meaning set of ``the cheese'' with respect to $STF=\{<s_1,t_1,f_1>, <s_2,t_2,f_2>\}$ is {[food, type:cheese]}, just one meaning though f₁ and f₂ map ``the cheese'' to different vertices in the two trees, because the subgraphs denoting the meaning of ``the cheese'' for the two functions are isomorphic.

**Figure 5:** A Second Tree
$\includegraphics[height=5in,angle=-90]{fig5.eps}$

$\begin{displaymath}\{(p,m) : p \in \mathcal{L}_{STF}, m \in M(p) \}.\end{displaymath}$

The covering lexicon L expressed by $STF=\{<s_1,t_1,f_1>, <s_2,t_2,f_2>\}$ is:

$\{$	(``girl'', `[person, sex:female, age:child]`),
	(``man'', `[person, sex:male, age:adult]`),
	(``ate'', `[ingest]`),
	(``pasta'', `[food, type:pasta]`),
	(``the cheese'', `[food, type:cheese]`) $\}$ .

The idea of a covering lexicon is that it provides, for each string (sentence) s_i, a meaning for some of the phrases in that sentence. Further, these meanings are trees whose labeled vertices together include each of the labeled vertices in the tree t_i representing the meaning of s_i, with no vertices duplicated, and containing no vertices not in t_i. Edge labels may or may not be included, since the idea is that some of them are due to syntax, which the parser will provide; those edges capturing lexical semantics are in the lexicon. Note that because we only include in the covering lexicon phrases (substrings) that are in the domains of the f_i's, words with the empty tree as meaning are not included in the covering lexicon. Note also that we will in general use ``phrase'' to mean substrings of sentences, whether they consist of one word, or more than one. Finally the strings in the covering lexicon may contain overlapping words even though those in the domain of an individual interpretation function must not, since those overlapping words could have occurred in different sentences. Finally, we are ready to define the learning problem at hand. The Lexicon Acquisition Problem:
Given: a multiset of strings $S=\{s_1,\ldots,s_n\}$ and a multiset of labeled trees $T=\{t_1,\ldots,t_n\}$ ,
Find: a multiset of interpretation functions, $F=\{f_1,\ldots,f_n\}$ , such that the cardinality of the covering lexicon expressed by $STF=\{<s_1,t_1,f_1>,\ldots,<s_n,t_n,f_n>\}$ is minimized. If such a set is found, we say we have found a minimal set of interpretations (or a minimal covering lexicon). $\Box$

Less formally, a learner is presented with a multiset of sentences (S) paired with their meanings (T); the goal of learning is to find the smallest lexicon consistent with this data. This lexicon is the paired listing of all phrases occurring in the domain of some $f_i\in F$ (where F is the multiset of interpretation functions found) with each of the elements in their meaning sets. The motivation for finding a lexicon of minimal size is the usual bias towards simplicity of representation and generalization beyond the training data. While this definition allows for phrases of any length, we will usually want to limit the length of phrases to be considered for inclusion in the domain of the interpretation functions, for efficiency purposes. Once we determine a set of interpretation functions for a set of strings and trees, there is only one unique covering lexicon expressed by STF. However, this might not be the only set of interpretation functions possible, and may not result in the lexicon with smallest cardinality. For example, the covering lexicon given with the previous example is not a minimal covering lexicon. For the two sentences given, we could find minimal, though rather degenerate, lexicons such as:

{	(``girl'',	`[ingest, agent:[person, sex:female, age:child],`
		`patient:[food, type:pasta, accomp:[food, type:cheese]]]`),
	(``man'',	`[ingest, agent:[person, sex:male, age:adult],`
		`patient:[food, type:cheese]]`) }

This type of lexicon becomes less likely as the size of the corpus grows.

Next: Implications of the Definition Up: The Lexicon Acquisition Problem Previous: The Lexicon Acquisition Problem

Cindi Thompson
2003-01-02