Write a small parsing system that can handle simple declarative sentences, simple NPs, and prepositional phrases, using simple semantic restrictions to block attachment of PPs that aren't licensed by a basic semantic lexicon.
The syntactic lexicon should contain lines with the following form:
("word" (feature value)+)
Look at the given syntactic lexicon for an example.
The syntactic lexicon should encode the following slots:
The value of the sem
feature is a Scone concept from
your Scone knowledge base, representing aspects of the word's meaning.
It is the unique name
of a Scone type or individual element inside curly braces. A list of
elements means that the word may correspond to any of the elements
in the list. To differentiate between two or more meanings for the same
word, indices may be used in the Scone element names. For example, to
encode both the transitive and intransitive meanings of "see", we could
create two concepts in the Scone KB: {see.01} and {see.02}.
The semrole
feature appears with prepositions, and its
value is a Scone concept
corresponding to the role denoted by the preposition.
It is the unique name of a Scone role
element inside curly braces. For readability, role elements in Scone
should include the designation "(role)" in their names. For example, the word "in" we
might have the semrole
value {location (role)}
.
Since part of your assignment is to re-use the Tomita morphology
code and combine it with lexical lookup to inflect lexical entries for
number, you shouldn't have to include the number
feature
unless you are encoding an irregular form not handled by the
morphology code (this piece of the assignment is described in more
detail later on).
Semantics in this module are represented using the Scone semantic network formalism. Details on Scone are given in the Scone User Manual
The Scone knowledge base should contain entries with the following form:
(add-type {element-name} :parent {parent-name} :english '("element-name1" "element-name2"))
(add-indv-role {role-name} {element-owner-name} :parent {role-parent-name})
Look at the given scone KB for an example.
The Scone KB should encode the following features:
Lexical concepts (those which appear in the sem
feature in the syntactic lexicon) are Scone type elements. The
English strings that should trigger the type are given in the
:english field of the call to add-type.
Inheritance (e.g., IS-A) is determined by the :parent field of a Scone element. IS-A links may also be added using the following syntax:
(add-is-a {element-name} {parent-name})
Semantic role names (semroles) are also Scone elements.
They have an owner, which is given as the second argument to the
role definition {element-owner-name}
.
They also belong to a parent class, which is given in the :parent
field of the role definition. This class restricts the set of
legal fillers for the role.
For example, to encode the semantic role "location
of an event", we could use the following syntax
(Note: example depends on event and geographical-area also being defined in the KB)
(add-indv-role {location (role)} {event} :parent {geographical area})
You will need to write syntactic lexicon entries to handle the word occurrences in these sentences:
a man the man the men the boy the boys the man sees the man sees the boy the man sees the boy with the telescope the man sees the boy with the dog
Some supporting Scone elements are already defined in the given scone KB. You will need to write additional knowledge base entries to handle these concepts:
{see.01} :parent {action} {see.02} :parent {action} {animate} {man.01} :parent {animate} (boy.01} :parent {animate} {dog.01} :parent {animate} {telescope.01} :parent {optical instrument}
You will need to encode these semantic roles:
{see.01}, {see.02} -> {instrument} :parent {optical_instrument} {animate} -> {accomplice} :parent {animate}
Your knowledge base should model these hierarchy fragments:
In the given code file, you will find
functions load-lexicon
and load-semantics
,
which you can use to load your completed lexicons into Lisp.
The grammar should include rules for the following constructions:
<start> <==> (<np>) <start> <==> (<vp>) <start> <==> (<np> <vp>) <np> <==> (<np> <pp>) <np> <==> (<det> <n>) <np> <==> (<n>) <vp> <==> (<vp> <pp>) <vp> <==> (<v> <np>) <vp> <==> (<v>) <pp> <==> (<p> <np>)
Your grammar should not use lexical rules inside the grammar; instead, you should use the Tomita "wildcard" rule syntax, and write a Lisp callout function to read in lexical items:
<n> <-- (%) <v> <-- (%) <det> <-- (%) <p> <-- (%)
The form of each rule should be like this:
(<n> <-- (%) ((x0 <= (parse-eng-word (string-downcase (symbol-name (x1 value))))) ((x0 cat) = n)))You should write a function called
parse-eng-word
, which
performs morphology on its string argument, and returns the inflected
lexical f-structure for the word. This should be done in three
steps:
parse-eng-morph
to return the set of ("root" morph)
pairs that are possible for the word;
inflect-lex
to inflect nouns and verbs, as follows:
INFLECT-LEX Assigns agreement features for N and V, depending on presence or absence of +S morpheme and/or explicit lexical features: N: - Defaults to (PERSON 3), unless feature supplied by lexicon - Defaults to (NUMBER SG), unless: * feature supplied by lexicon * +S is present -> (NUMBER PL) V: - If +S present, (PERSON 3), else will unify with any SUBJ (functionally the same as (*OR* 1 2 3) - Defaults to (NUMBER PL), unless: * feature supplied by the lexicon * +S is present -> (NUMBER SG)
load-lexicon
function, so you
can retrieve the uninflected lexical items from the lexicon using
gethash
).
You should use the compgra
function to compile and
load the grammar (see the example in the given code file). You will need to
recompile your grammar with compgra
each time you make a
change to the grammar before you will be able to test the change.
Once you have your grammar working, you should add Lisp callouts to the rules which attach PPs to NP and VP, in order to implement semantic restrictions.
The function semrole-filler-match
, provided in the given code file, will do most of the work
for you -- its arguments are the scone semantics for the head
(NP or VP), the semrole (from the P's syntactic lexicon entry), and
the scone semantics for the filler (the PP object). This
function will return T or NIL depending on whether the semantic
lexicon contains information that licenses the given attachment, using
some inheritance methods defined in the function.
Two additional functions are provided to help you:
lookup-scone-semantics
,
and lookup-definitions
. Lookup-scone-semantics takes
the name of an element and returns the element object if an element
by that name exists (NIL otherwise). Lookup-definitions takes
a plain string and returns any Scone elements that have that string
in their :english field (NIL if none exist).
In order to use semrole-filler-match
, you will have to write a grammar
callout, called license-attachment
, which takes two
arguments: the f-structure for the head (NP or VP) and the f-structure
for the filler (PP), extracts appropriate information from the
f-structure(s) and/or the semantic lexicon, calls
semrole-filler-match
, and returns the appropriate new
f-structure (head with PP attached) or NIL depending on the result of
the call to semrole-filler-match
.
Your code for license-attachment
should print a trace
message signalling the result of each call; see the examples
(mentioned below) for the format of the messages.
When you're all done, you should get outputs like those shown in
this set of examples, assuming you've got
all the parts right. (See the instructions on how to run the testing function, (run-tests)
.