11-711: Nyberg's Lecture Notes

Natural Language Generation (1)

This lecture covers the basic problems that any natural language (NL) generation system must address.

The Input

  • "the need to communicate"
  • propositional content of the target text
  • pragmatic profile of the speech situation (knowledge about the speaker, the hearer, the style of communication, etc.)

Content Delimitation

The system must select which of the propositions related to the propositional goals should be overtly realized, and which should be left for the human hearer/reader to infer.

  • Text Summarization
  • Explanation in Expert Systems
  • (not an issue for MT systems?)

Text Structuring

The system must determine the order of propositions, the boundaries of sentences in the target-language (TL) text, and the nature of the discourse connectives among the elements of the target text.

  • Most tasks are inherently ordered
  • Most representations choose units of representation that correspond to utterances
  • An issue for systems that blur surface clausal boundaries
    (e.g., DIOGENES)

Lexical Selection

The system must select open-class lexical units to be used in the TL text.
(*A-INGEST (AGENT *O-BOB) (PATIENT *O-MILK)) => "drink"
(*A-INGEST (AGENT *O-BOB) (PATIENT *O-CHOCOLATE)) => "eat"
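
The two mappings above can be sketched as a lookup keyed on the event concept plus a property of its PATIENT. This is only a minimal illustration: the STATE table, the LEXICON table, and the function name select_verb are assumptions made for this example and are not part of any system described in these notes.

```python
# Hypothetical toy ontology: concept -> physical state (assumed for illustration).
STATE = {"*O-MILK": "liquid", "*O-CHOCOLATE": "solid"}

# Hypothetical lexicon: (event concept, state of the PATIENT) -> verb root.
LEXICON = {
    ("*A-INGEST", "liquid"): "drink",
    ("*A-INGEST", "solid"): "eat",
}


def select_verb(frame):
    """Pick an open-class verb for a frame given as (head, roles)."""
    head, roles = frame
    patient_state = STATE[roles["PATIENT"]]
    return LEXICON[(head, patient_state)]


if __name__ == "__main__":
    milk = ("*A-INGEST", {"AGENT": "*O-BOB", "PATIENT": "*O-MILK"})
    chocolate = ("*A-INGEST", {"AGENT": "*O-BOB", "PATIENT": "*O-CHOCOLATE"})
    print(select_verb(milk))
    print(select_verb(chocolate))
```

The point of the sketch is that the same event concept yields different verbs depending on the filler of one of its roles, so lexical selection cannot be a one-to-one concept-to-word table.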

Syntactic Selection

The system must select syntactic structures for the TL clauses and perform closed-class lexical selection according to syntactic structure decisions.
  • Case Creation
    (*A-KICK (AGENT *O-JOHN) (PATIENT *O-BALL))
    "John propelled the ball with his foot"

  • Case Absorption
    (*A-FILE-LEGAL-ACTION 
      (AGENT *O-BOB) 
      (PATIENT *O-SUIT) 
      (RECIPIENT *O-ACME))
    "Bob sued Acme"

  • Feature => Constituent
    (*O-MAN (REFERENCE DEFINITE))
    ((cat n)(root "man")(det ((cat det)(root "the"))))
    "the man"

  • Structural Changes
    (*O-WATER (QUANTITY *O-BUCKET))
    ((cat n)(root "bucket")
     (pp ((cat p)(root "of")
          (obj ((cat n)(root "water"))))))
    "bucket of water"

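As a rough sketch of the Feature => Constituent case above, a semantic feature such as (REFERENCE DEFINITE) can be turned into a determiner sub-structure of the noun's feature structure. The tables NOUN_ROOT and DETERMINER, the INDEFINITE entry, and the function name feature_to_constituent are hypothetical and serve only to make the example self-contained.

```python
# Hypothetical lexicon entries (assumed for illustration).
NOUN_ROOT = {"*O-MAN": "man"}
DETERMINER = {"DEFINITE": "the", "INDEFINITE": "a"}


def feature_to_constituent(concept, features):
    """Build a feature structure for a noun, turning the REFERENCE
    feature (if present) into a determiner constituent."""
    fs = {"cat": "n", "root": NOUN_ROOT[concept]}
    reference = features.get("REFERENCE")
    if reference in DETERMINER:
        fs["det"] = {"cat": "det", "root": DETERMINER[reference]}
    return fs


if __name__ == "__main__":
    print(feature_to_constituent("*O-MAN", {"REFERENCE": "DEFINITE"}))
```

The nested dictionaries stand in for the bracketed feature structures used in the examples; a concept with no REFERENCE feature simply gets no det slot.
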
Coreference Treatment

The system must introduce anaphora, deixis, and ellipsis phenomena when appropriate.
  • Failure to do so => "robot text" (redundant)
  • More difficult phenomena to model
    (require model of intersentential relations)
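
A deliberately crude sketch of one such phenomenon, pronominalization, follows. It assumes the only intersentential information available is the subject concept of each successive sentence, and it uses a pronoun only when the subject repeats the immediately preceding one. The tables NAME and PRONOUN, the concept *O-MARY, and the function name refer are hypothetical; a real treatment would need the richer discourse model mentioned above.

```python
# Hypothetical tables (assumed for illustration).
NAME = {"*O-BOB": "Bob", "*O-MARY": "Mary"}
PRONOUN = {"*O-BOB": "he", "*O-MARY": "she"}


def refer(subjects):
    """Choose a referring expression for the subject of each sentence:
    a full name on first mention or after a change of subject,
    a pronoun when the subject is the same as in the previous sentence."""
    expressions = []
    previous = None
    for concept in subjects:
        if concept == previous:
            expressions.append(PRONOUN[concept])
        else:
            expressions.append(NAME[concept])
        previous = concept
    return expressions


if __name__ == "__main__":
    print(refer(["*O-BOB", "*O-BOB", "*O-MARY", "*O-BOB"]))
```

Without the pronoun branch, every sentence would repeat the full name, which is the redundant "robot text" effect noted above.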

Constituent Ordering

The system must establish the order of syntactic constituents in a sentence.
  • Typically via a phrase structure grammar
  • "Floating" elements (like cases) must be placed to preserve meaning
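
One way to picture ordering by phrase structure rules is a table that lists, for each category, the left-to-right order of its slots, applied recursively to a feature structure. The ORDER table, the HEAD marker, and the function name linearize are assumptions for this sketch; it does not attempt to place "floating" elements.

```python
# Hypothetical ordering rules: category -> slot order.
# "HEAD" stands for the constituent's own root (assumed convention).
ORDER = {
    "n": ["det", "HEAD", "pp"],
    "p": ["HEAD", "obj"],
    "det": ["HEAD"],
}


def linearize(fs):
    """Return the words of a feature structure in surface order."""
    words = []
    for slot in ORDER[fs["cat"]]:
        if slot == "HEAD":
            words.append(fs["root"])
        elif slot in fs:
            # Recurse into the sub-constituent that fills this slot.
            words.extend(linearize(fs[slot]))
    return words


if __name__ == "__main__":
    the_man = {"cat": "n", "root": "man",
               "det": {"cat": "det", "root": "the"}}
    print(linearize(the_man))
```

Slots that are absent from a given structure are skipped, so the same rule for category n covers nouns with or without a determiner or prepositional phrase.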

Realization

The system must map from syntactic representations with lexical insertions into surface strings.
  • Morphological inflection
  • Orthographic phenomena (contraction, euphonic adaptation, etc.)
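
The two bullet points above can be illustrated with a toy English realizer. Everything here is an assumption made for the example: the function names, the CONTRACTIONS table, the restriction to regular verbs, and the spelling-based approximation of the a/an alternation (which is really conditioned by sound, not spelling).

```python
def inflect_verb(root, person=3, number="sg", tense="present"):
    """Toy morphological inflection for regular English verbs only."""
    if tense == "past":
        return root + "d" if root.endswith("e") else root + "ed"
    if person == 3 and number == "sg":
        if root.endswith(("s", "sh", "ch", "x", "z", "o")):
            return root + "es"
        return root + "s"
    return root


def choose_article(next_word):
    """Euphonic adaptation of the indefinite article, approximated
    by the first letter of the following word."""
    return "an" if next_word[0].lower() in "aeiou" else "a"


# Hypothetical contraction table: adjacent word pair -> contracted form.
CONTRACTIONS = {("do", "not"): "don't", ("is", "not"): "isn't"}


def realize(tokens):
    """Apply contractions, join the words, capitalize, and add a period."""
    out = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in CONTRACTIONS:
            out.append(CONTRACTIONS[pair])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    sentence = " ".join(out)
    if not sentence:
        return ""
    return sentence[0].upper() + sentence[1:] + "."


if __name__ == "__main__":
    print(inflect_verb("kick", tense="past"))
    print(choose_article("apple"))
    print(realize(["they", "do", "not", "drink", "milk"]))
```

Irregular forms (for example the past tense of "drink") would need an exception lexicon consulted before the regular rules.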

25-Nov-96 by ehn@cs.cmu.edu