Paraphrase Generation
rewording sentences into other sentences with the same content but different surface features
e.g.
- query expansion for information retrieval
- improve robustness of machine translation to lexical variations
difficulty
- task definition
- strict bidirectional entailment is too rigid a definition; in practice, paraphrase is better treated as mostly bidirectional entailment (approximate meaning equivalence)
- paucity of data
- dataset
- Quora question pair dataset
- MSCOCO captions dataset
- methods
- distributional similarity (words that appear in similar contexts tend to be similar); see the sketch below
- key idea: paraphrase patterns with empty “slots” that get filled in (e.g. “X solves Y” ≈ “Y is solved by X”)
- major problem
- hard to distinguish distributionally similar but semantically different words (e.g. antonyms)
- extremely sensitive to data sparsity
- solution: use bilingual data to learn monolingual paraphrases
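A minimal sketch of the distributional-similarity idea above (context-count vectors compared with cosine similarity), using a made-up toy corpus; it also illustrates the major problem noted above, since the antonyms "hot" and "cold" come out as highly similar.

```python
from collections import Counter
import math

# Toy corpus for illustration only.
corpus = [
    "the coffee is hot today",
    "the coffee is cold today",
    "the tea is hot today",
    "the tea is cold today",
    "cats chase mice",
]

def context_vectors(sentences, window=2):
    """Represent each word by counts of the words that co-occur with it in a window."""
    vectors = {}
    for sent in sentences:
        tokens = sent.split()
        for i, w in enumerate(tokens):
            ctx = vectors.setdefault(w, Counter())
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors

def cosine(a, b):
    dot = sum(a[k] * b.get(k, 0) for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

vecs = context_vectors(corpus)
# Antonyms share contexts, so they look "similar" distributionally — distributional
# similarity does not imply sameness of meaning.
print(cosine(vecs["hot"], vecs["cold"]))   # high (1.0 on this toy corpus)
print(cosine(vecs["hot"], vecs["mice"]))   # 0.0
```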
- bilingual pivoting
- [e.g. Paraphrasing with Bilingual Parallel Corpora; idea: phrase-based machine translation and pivoting]
$$ P(e_2\vert e_1)=\sum_fP(e_2\vert f)P(f\vert e_1) $$
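A minimal sketch of the pivoting equation above, with toy hand-written phrase-table probabilities (the foreign pivots and numbers are illustrative, not real corpus estimates).

```python
from collections import defaultdict

# P(f | e1): probability of a foreign pivot phrase f given the English phrase e1
p_f_given_e = {
    "thrown into jail": {"festgenommen": 0.6, "inhaftiert": 0.4},
}

# P(e2 | f): probability of an English phrase e2 given the foreign phrase f
p_e_given_f = {
    "festgenommen": {"arrested": 0.7, "thrown into jail": 0.2, "detained": 0.1},
    "inhaftiert": {"imprisoned": 0.5, "jailed": 0.3, "thrown into jail": 0.2},
}

def pivot_paraphrases(e1):
    """P(e2 | e1) by marginalizing over foreign pivot phrases f."""
    scores = defaultdict(float)
    for f, p_f in p_f_given_e.get(e1, {}).items():
        for e2, p_e2 in p_e_given_f.get(f, {}).items():
            if e2 != e1:                      # skip the identity "paraphrase"
                scores[e2] += p_e2 * p_f
    return dict(sorted(scores.items(), key=lambda kv: -kv[1]))

print(pivot_paraphrases("thrown into jail"))
# {'arrested': 0.42, 'imprisoned': 0.2, 'jailed': 0.12, 'detained': 0.06}
```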
- evaluate the generated paraphrases
- PINC (complements BLEU: measures the n-gram dissimilarity of the paraphrase from the original input, so simply copying the input scores poorly; usually reported alongside BLEU against references)
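A minimal sketch of PINC under its usual definition (average fraction of candidate n-grams not found in the source); the example sentences are made up.

```python
def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def pinc(source, candidate, max_n=4):
    """Average, over n = 1..max_n, of the fraction of candidate n-grams absent from the source."""
    src, cand = source.split(), candidate.split()
    scores = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        if not cand_ngrams:
            continue
        overlap = len(cand_ngrams & ngrams(src, n))
        scores.append(1.0 - overlap / len(cand_ngrams))
    return sum(scores) / len(scores) if scores else 0.0

print(pinc("how can i learn french quickly",
           "what is the fastest way to learn french"))   # ~0.90: mostly new wording
```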
Style Transformation
the same semantic content, but with a different style or register
- Text Simplification (for second language reading comprehension)
- Register Conversion (“Register” is the type of language used in a particular setting)
- Personal Style Conversion
- Demographics-level Conversion
method
- simplest method: collect a large parallel corpus and train a supervised model
- tailor phrase-based translation models to the task of style transformation
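A minimal sketch of the phrase-based idea for register conversion, assuming a tiny hand-written phrase table; a real system would learn these mappings (with probabilities) from a parallel corpus.

```python
# Hand-written informal -> formal phrase table, for illustration only.
phrase_table = {
    "gonna": "going to",
    "wanna": "want to",
    "a lot of": "a large amount of",
    "kids": "children",
    "thanks": "thank you",
}

def convert_register(sentence):
    """Greedy longest-match-first phrase replacement, left to right."""
    tokens = sentence.lower().split()
    out, i = [], 0
    max_len = max(len(p.split()) for p in phrase_table)
    while i < len(tokens):
        for span in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + span])
            if phrase in phrase_table:
                out.append(phrase_table[phrase])
                i += span
                break
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)

print(convert_register("thanks , we are gonna need a lot of help for the kids"))
# -> "thank you , we are going to need a large amount of help for the children"
```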
Summarization
condensing a document to its most salient information
- Sentence Compression
- Single-document Summarization
- Multi-document Summarization
extractive summarization vs. abstractive summarization (select and copy spans from the source vs. generate new text)
- removing irrelevant content
- deleting words:
- tree-based methods
- formulate as a constrained optimization problem: delete words down to a fixed-length summary while maximizing the amount of relevant content kept (see the sketch below)
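A minimal sketch of deletion-based compression as constrained optimization, assuming toy per-word relevance scores and a word-count budget; real formulations add grammaticality constraints (e.g. over a parse tree), which this sketch omits.

```python
def compress(words, scores, budget):
    """0/1 knapsack over words: maximize kept relevance under a length budget, preserving order."""
    n = len(words)
    # dp[k] = (best total score, kept word indices) using at most k words
    dp = [(0.0, [])] + [(float("-inf"), [])] * budget
    for i in range(n):
        for k in range(budget, 0, -1):           # iterate budget downwards so each word is used once
            cand = dp[k - 1][0] + scores[i]
            if cand > dp[k][0]:
                dp[k] = (cand, dp[k - 1][1] + [i])
    best = max(dp, key=lambda s: s[0])
    return " ".join(words[i] for i in sorted(best[1]))

# Toy relevance scores, made up for illustration.
sentence = "the prime minister said on tuesday that the new tax plan would be delayed".split()
relevance = [0.1, 0.9, 0.9, 0.4, 0.1, 0.3, 0.1, 0.1, 0.6, 0.8, 0.8, 0.3, 0.2, 0.7]
print(compress(sentence, relevance, budget=7))
# -> "prime minister said new tax plan delayed"
```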
- sequence-to-sequence transduction problem
- tree substitution grammars + copy words + control the length of the summary
- attentional neural networks + copy words + control the length of the summary
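A sketch of the copy step that lets an attentional summarizer output out-of-vocabulary source words, in the spirit of pointer-generator mixing of the vocabulary and attention distributions; all probabilities below are made-up numbers for illustration.

```python
import numpy as np

vocab = ["the", "said", "company", "profits", "rose", "<unk>"]
source_tokens = ["acme", "profits", "rose"]          # "acme" is out-of-vocabulary

p_vocab = np.array([0.05, 0.05, 0.10, 0.30, 0.20, 0.30])   # decoder softmax over the vocabulary
attention = np.array([0.70, 0.20, 0.10])                    # attention over source tokens
p_gen = 0.4                                                  # probability of generating vs. copying

# Extend the vocabulary with source words not already in it, then mix the two distributions.
extended_vocab = vocab + [w for w in source_tokens if w not in vocab]
p_final = np.zeros(len(extended_vocab))
p_final[:len(vocab)] = p_gen * p_vocab
for attn, w in zip(attention, source_tokens):
    p_final[extended_vocab.index(w)] += (1.0 - p_gen) * attn

for w, p in sorted(zip(extended_vocab, p_final), key=lambda x: -x[1]):
    print(f"{w:10s} {p:.3f}")
# "acme" gets most of its probability mass from the copy term, even though it is
# not in the decoder's output vocabulary.
```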
- evaluation
- the amount of recall of important information that can be achieved within the limited summary length (e.g. ROUGE)
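A minimal sketch of ROUGE-N recall against a single reference; full ROUGE also reports precision/F1 and handles multiple references and stemming, which this sketch omits.

```python
from collections import Counter

def rouge_n_recall(reference, system, n=1):
    """Fraction of the reference's n-grams recovered by the system summary (clipped counts)."""
    def ngram_counts(text):
        tokens = text.split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    ref, sys_ = ngram_counts(reference), ngram_counts(system)
    overlap = sum(min(count, sys_[g]) for g, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

reference = "the senate passed the budget bill on friday"
system = "senate passes budget bill"
print(rouge_n_recall(reference, system, n=1))   # 3 of 8 reference unigrams recalled
print(rouge_n_recall(reference, system, n=2))   # 1 of 7 reference bigrams recalled
```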