11-711: Nyberg's Lecture Notes


Natural Language Generation (2)

In this lecture, we cover more detailed examples of the basic NL generation phases introduced in the last lecture.

Frame Planning for Text Structuring

(DIOGENES, Nirenburg et al. 1990)

"Drop by your old favorite Dunkin' Donuts shop and you'll not only find fresh donuts made by hand, fresh Munchkins donut hole treats, the delicious smell of fresh-brewed coffee, and more. You'll also find a fresh new Dunkin' Donuts shop."

  1. The input is a frame representation of the meaning of the entire text to be translated; it contains no information about sentence boundaries.
  2. An explicit text planning representation is used as the basis for generation of individual sentences in the text.
  3. A set of text planning rules is used to map from the frame representation to the text plan.
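
As a rough illustration of point 3, the sketch below maps a flat frame (no sentence boundaries) to a text plan that groups clauses into sentences. The frame layout, the `relation` slot, the clause ids, and the two rules are all assumptions made up for this example; they are not the DIOGENES representation or rule set.

```python
# Hypothetical input frame: clause-level units with no sentence boundaries.
# The ids, heads, and the "relation" slot are invented for illustration.
frame = {
    "clauses": [
        {"id": "c1", "head": "drop-by"},
        {"id": "c2", "head": "find", "relation": ("conjunction", "c1")},
        {"id": "c3", "head": "find"},
    ]
}


def plan_text(frame):
    """Map a frame to a text plan: an ordered list of sentence plans."""
    plan = []
    for clause in frame["clauses"]:
        relation = clause.get("relation")
        # Assumed rule 1: a clause that is related to a clause already in
        # the current sentence is attached to that sentence.
        if relation and plan and relation[1] in plan[-1]["clauses"]:
            plan[-1]["clauses"].append(clause["id"])
            plan[-1]["connective"] = relation[0]
        else:
            # Assumed rule 2 (default): the clause opens a new sentence.
            plan.append({"clauses": [clause["id"]], "connective": None})
    return plan
```

With this toy frame, `plan_text(frame)` groups `c1` and `c2` into one sentence and leaves `c3` as a second sentence, which loosely mirrors the two-sentence structure of the Dunkin' Donuts text above.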


Lexical Selection

(DIOGENES, Nirenburg et al. 1990; KANT, Mitamura et al. 1991)

Choosing target language lexical units to realize elements of a text meaning representation.

Two major tasks:

  • Decide on the type of lexical realization
    1. full lexical realization
    2. a definite description
    3. an anaphor
    4. ellipsis (an elliptical syntactic construction, or simple omission)
  • For types 1, 2, and 3, the actual selection must be performed.

When there is more than one lexical realization for a unit of meaning, a variety of methods can be used to select among them.

  • Context-independent lexical selection: compare the meanings of the lexical candidates with the desired meaning to determine the best lexical choice. This requires that lexical entries be explicitly represented in terms of their features:

    "a person whose sex is male and whose age is between 13 and 15 years"
    ==> boy, kid, teenager, youth, child, young man, schoolboy, adolescent, man

  • Collocational restrictions (DIOGENES)

    large_quantity_of ==> big, enormous, great, high, large, strong, wide

    We say "high voltage" and "large amount", but it would be inappropriate to generate "high selection" or "large difficulty".

    We can use explicit representations of collocational preference for each word or class of words, or we can use corpus statistics (e.g., n-gram models) to prefer certain collocations.

  • Context-sensitive lexical rules (KANT)

    The conceptual mirror image of selectional restriction in parsing: if a given slot in the interlingua frame is filled with a certain filler, then select a certain lexical realization; otherwise use the default realization:

    "Operate the engine until the engine reaches normal 
     operating temperature." (atteindre)
    
    "If the temperature reaches the cloud point, a wax 
     forms in the fuel." (baisser)
    
    (glex *A-REACH
      (pattern (and (theme *O-TEMPERATURE-MEASUREMENT)
                    (goal  *O-LOW-TEMPERATURE-READING)))
      (lex "baisser"))
    
    (glex *A-REACH
      (lex "atteindre"))
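
The two glex rules above can be read as an ordered rule list whose last entry is a default. A minimal Python sketch of that reading follows. The flat dictionary frames, the exact-match test on slot fillers, and the concept names `*O-ENGINE` and `*O-NORMAL-OPERATING-TEMPERATURE` are assumptions for illustration, not KANT's actual data structures; a real system would presumably test fillers against a concept hierarchy rather than by equality.

```python
# Each rule is (concept, pattern, lexeme). A pattern maps slot names to the
# fillers it requires. An empty pattern always matches, so it acts as the
# default realization and must come last.
GLEX_RULES = [
    ("*A-REACH",
     {"theme": "*O-TEMPERATURE-MEASUREMENT",
      "goal": "*O-LOW-TEMPERATURE-READING"},
     "baisser"),
    ("*A-REACH", {}, "atteindre"),
]


def select_lexeme(frame, rules=GLEX_RULES):
    """Return the lexeme of the first rule whose pattern matches the frame."""
    for concept, pattern, lexeme in rules:
        if concept != frame.get("concept"):
            continue
        # Assumption: a slot matches only if its filler is exactly equal.
        if all(frame.get(slot) == filler for slot, filler in pattern.items()):
            return lexeme
    return None  # no rule is defined for this concept


# "If the temperature reaches the cloud point ..."
cloud_point = {
    "concept": "*A-REACH",
    "theme": "*O-TEMPERATURE-MEASUREMENT",
    "goal": "*O-LOW-TEMPERATURE-READING",
}

# "... until the engine reaches normal operating temperature."
# (hypothetical concept names)
warm_up = {
    "concept": "*A-REACH",
    "theme": "*O-ENGINE",
    "goal": "*O-NORMAL-OPERATING-TEMPERATURE",
}
```

Here `select_lexeme(cloud_point)` picks the specific rule and `select_lexeme(warm_up)` falls through to the default. Rule order matters: if the default came first, the specific rule would never fire.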
    


Coreference Treatment

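Consider the following excerpt from a newspaper article (quoted after Brown and Yule, 1983). All of the noun phrases that refer to the accused ("A dissident Spanish priest", "Juan Fernandez Krohn, aged 32", "a man armed with a bayonet", "Fernandez", the "he" of "he trained", and "the Spaniard") are coreferential, and might be represented in the text meaning as a single concept: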

A dissident Spanish priest was charged here today with attempting to murder the Pope. Juan Fernandez Krohn, aged 32, was arrested after a man armed with a bayonet approached the Pope while he was saying prayers at Fatima on Wednesday night. According to the police, Fernandez told the investigating magistrates today, he trained for the past six months for the assault. If found guilty, the Spaniard faces a prison sentence of 15-20 years.

Choosing a lexical realization depends both on these coreferential relations and on the style of the text. Note that the first reference accomplishes a certain demographic/political placement and the second reference names the actor. The subsequent anaphors and definite descriptions are varied means of referring to the actor in the rest of the text.


Unification-based Realization

(Genkit, Tomita and Nyberg 1988; Morphe, Leavitt 1990)

  • In Genkit, grammar rules are used to decompose f-structures into smaller and smaller units via phrase structure rules:

    (<vp> ==> (<v> <np>)
          ((x2 == (x0 obj))
           (x1 = x0)))
    
  • Grammars are compiled into Lisp functions, one function for each non-terminal (subset of rules) in the grammar.
  • Rule application is ordered and deterministic.
  • Decomposition proceeds until a lexical rule fires and produces a string (a piece of the output).
  • The results of RHS function calls are concatenated to derive the output of each rule.
  • To efficiently inflect lexemes depending on their features, Morphe compiles rules into a decision tree which eliminates redundant tests:

    if (and (gender = m)
            (number = sg)
            (type = superlative))
       ==> "ending1"

    if (and (gender = m)
            (number = sg)
            (type = comparative))
       ==> "ending2"

    if (and (gender = m)
            (number = pl)
            (type = superlative))
       ==> "ending3"

    etc.

    The resulting decision tree:

                    gender
                 m /      \ f
              number      <...>
          sg /      \ pl
         type       <...>
    sup /    \ comp
     "e1"   "e2"


    By precomputing the search space, more efficient run-time processing can be achieved.
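
One way to picture the compilation step is the sketch below, which turns flat rules like those above into a nested dictionary so that each feature is tested only once per lookup. It assumes that every rule tests the same features and that they are checked in one fixed order (`FEATURE_ORDER`, a name invented here). It is an illustration of the idea only, not Morphe's actual implementation.

```python
# Flat rules as in the notes: a conjunction of feature tests plus an ending.
RULES = [
    ({"gender": "m", "number": "sg", "type": "superlative"}, "ending1"),
    ({"gender": "m", "number": "sg", "type": "comparative"}, "ending2"),
    ({"gender": "m", "number": "pl", "type": "superlative"}, "ending3"),
]

# Assumption: all rules test exactly these features, in this order.
FEATURE_ORDER = ["gender", "number", "type"]


def compile_rules(rules, order=FEATURE_ORDER):
    """Build a nested dict: one level per feature, with endings at the leaves."""
    tree = {}
    for tests, ending in rules:
        node = tree
        # Descend (creating branches as needed) for all but the last feature.
        for feature in order[:-1]:
            node = node.setdefault(tests[feature], {})
        # The last feature maps directly to the ending.
        node[tests[order[-1]]] = ending
    return tree


def inflect(tree, features, order=FEATURE_ORDER):
    """Walk the tree, testing each feature exactly once."""
    node = tree
    for feature in order:
        node = node.get(features[feature])
        if node is None:
            return None  # no rule covers this feature combination
    return node
```

Scanning the flat rule list would re-test `gender = m` for every rule; after compilation, the shared tests are merged into common branches, which is the saving the tree diagram shows.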

Examples

  • Syntactic Grammar Examples (GenKit)
  • Morphology Classes/Rules (Morphe)


25-Nov-96 by ehn@cs.cmu.edu