Language Technologies Institute
11-712: Self-Paced Laboratory

Algorithms for NLP:
GLR Module Design Specs, Scone semantics

Goal

Write a small parsing system that can handle simple declarative sentences, simple NPs, and prepositional phrases, using simple semantic restrictions to block attachment of PPs that aren't licensed by a basic semantic lexicon.

Syntactic Lexicon

The syntactic lexicon should contain lines with the following form:

("word" (feature value)+)

Look at the given syntactic lexicon for an example.

The syntactic lexicon should encode the following slots:

Nouns: cat (n), number (sg or pl, for irregular forms), sem (see below)
Verbs: cat (v), valency (trans or intrans), sem (see below)
Prepositions: cat (p), semrole (see below)
Determiners: cat (det), reference (definite or indefinite), number (sg or pl)

The value of the sem feature is a Scone concept from your Scone knowledge base, representing aspects of the word's meaning. It is the unique name of a Scone type or individual element inside curly braces. A list of elements means that the word may correspond to any of the elements in the list. To differentiate between two or more meanings for the same word, indices may be used in the Scone element names. For example, to encode both the transitive and intransitive meanings of "see", we could create two concepts in the Scone KB: {see.01} and {see.02}.

The semrole feature appears with prepositions, and its value is a Scone concept corresponding to the role denoted by the preposition. It is the unique name of a Scone role element inside curly braces. For readability, role elements in Scone should include the designation "(role)" in their names. For example, the word "in" we might have the semrole value {location (role)}.

Since part of your assignment is to re-use the Tomita morphology code and combine it with lexical lookup to inflect lexical entries for number, you shouldn't have to include the number feature unless you are encoding an irregular form not handled by the morphology code (this piece of the assignment is described in more detail later on).

Scone Knowledge Base

Semantics in this module are represented using the Scone semantic network formalism. Details on Scone are given in the Scone User Manual

The Scone knowledge base should contain entries with the following form:

(add-type {element-name} :parent {parent-name} :english '("element-name1" "element-name2"))
(add-indv-role {role-name} {element-owner-name} :parent {role-parent-name})

Look at the given scone KB for an example.

The Scone KB should encode the following features:

Objects (Nouns): :parent (class), indv-roles (semroles), :english (english names)
Actions (Verbs): :parent (class), indv-roles (semroles), :english (english names)

Lexical concepts (those which appear in the sem feature in the syntactic lexicon) are Scone type elements. The English strings that should trigger the type are given in the :english field of the call to add-type.

Inheritance (e.g., IS-A) is determined by the :parent field of a Scone element. IS-A links may also be added using the following syntax:

(add-is-a {element-name} {parent-name})

Semantic role names (semroles) are also Scone elements. They have an owner, which is given as the second argument to the role definition {element-owner-name}. They also belong to a parent class, which is given in the :parent field of the role definition. This class restricts the set of legal fillers for the role.

For example, to encode the semantic role "location of an event", we could use the following syntax (Note: example depends on event and geographical-area also being defined in the KB)

(add-indv-role {location (role)} {event} :parent {geographical area})

Coverage

You will need to write syntactic lexicon entries to handle the word occurrences in these sentences:

a man
the man
the men
the boy
the boys
the man sees
the man sees the boy
the man sees the boy with the telescope
the man sees the boy with the dog

Some supporting Scone elements are already defined in the given scone KB. You will need to write additional knowledge base entries to handle these concepts:

{see.01} :parent {action}
{see.02} :parent {action}
{animate} 
{man.01} :parent {animate}
(boy.01} :parent {animate}
{dog.01} :parent {animate}
{telescope.01} :parent {optical instrument}

You will need to encode these semantic roles:

{see.01}, {see.02} ->  {instrument} :parent {optical_instrument}
{animate}  ->  {accomplice} :parent {animate}

Your knowledge base should model these hierarchy fragments:

Loading the Lexicons

In the given code file, you will find functions load-lexicon and load-semantics, which you can use to load your completed lexicons into Lisp.

Syntactic Grammar

The grammar should include rules for the following constructions:

<start> <==> (<np>)
<start> <==> (<vp>)
<start> <==> (<np> <vp>)
<np> <==> (<np> <pp>)
<np> <==> (<det> <n>)
<np> <==> (<n>)
<vp> <==> (<vp> <pp>)
<vp> <==> (<v> <np>)
<vp> <==> (<v>)
<pp> <==> (<p> <np>)

Lexical Lookup, Morphological Inflection

Your grammar should not use lexical rules inside the grammar; instead, you should use the Tomita "wildcard" rule syntax, and write a Lisp callout function to read in lexical items:

<n> <-- (%)
<v> <-- (%)
<det> <-- (%)
<p> <-- (%)

The form of each rule should be like this:

(<n> <-- (%)
     ((x0 <= (parse-eng-word (string-downcase (symbol-name (x1 value)))))
      ((x0 cat) = n)))

You should write a function called parse-eng-word, which performs morphology on its string argument, and returns the inflected lexical f-structure for the word. This should be done in three steps:

Use the built-in function parse-eng-morph to return the set of ("root" morph) pairs that are possible for the word;

Look up each root form in the lexicon to see if it exists;

If a morpheme was found attached to the root, inflect any lexical entries appropriately. Write a function called inflect-lex to inflect nouns and verbs, as follows:

INFLECT-LEX

Assigns agreement features for N and V, depending on presence or
absence of +S morpheme and/or explicit lexical features:

 N: 
   - Defaults to (PERSON 3), unless feature supplied by lexicon
   - Defaults to (NUMBER SG), unless:
            * feature supplied by lexicon
            * +S is present -> (NUMBER PL)
 V: 
   - If +S present, (PERSON 3), else will unify with any SUBJ
     (functionally the same as (*OR* 1 2 3)
   - Defaults to (NUMBER PL), unless:
            * feature supplied by the lexicon
            * +S is present -> (NUMBER SG)

(Hint: you should study the data structure provided by the load-lexicon function, so you can retrieve the uninflected lexical items from the lexicon using gethash).

Compiling and Loading the Grammar

You should use the compgra function to compile and load the grammar (see the example in the given code file). You will need to recompile your grammar with compgra each time you make a change to the grammar before you will be able to test the change.

Semantic Restrictions on PP Attachment

Once you have your grammar working, you should add Lisp callouts to the rules which attach PPs to NP and VP, in order to implement semantic restrictions.

The function semrole-filler-match, provided in the given code file, will do most of the work for you -- its arguments are the scone semantics for the head (NP or VP), the semrole (from the P's syntactic lexicon entry), and the scone semantics for the filler (the PP object). This function will return T or NIL depending on whether the semantic lexicon contains information that licenses the given attachment, using some inheritance methods defined in the function.

Two additional functions are provided to help you: lookup-scone-semantics, and lookup-definitions. Lookup-scone-semantics takes the name of an element and returns the element object if an element by that name exists (NIL otherwise). Lookup-definitions takes a plain string and returns any Scone elements that have that string in their :english field (NIL if none exist).

In order to use semrole-filler-match, you will have to write a grammar callout, called license-attachment, which takes two arguments: the f-structure for the head (NP or VP) and the f-structure for the filler (PP), extracts appropriate information from the f-structure(s) and/or the semantic lexicon, calls semrole-filler-match, and returns the appropriate new f-structure (head with PP attached) or NIL depending on the result of the call to semrole-filler-match.

Your code for license-attachment should print a trace message signalling the result of each call; see the examples (mentioned below) for the format of the messages.

Examples

When you're all done, you should get outputs like those shown in this set of examples, assuming you've got all the parts right. (See the instructions on how to run the testing function, (run-tests).

03-Feb-2005 by atribble@cs.cmu.edu Updated from original by EHN.

Language Technologies Institute11-712: Self-Paced Laboratory

Algorithms for NLP: GLR Module Design Specs, Scone semantics