Paraphrase Generation
rewording sentences into other sentences with the same content but different surface features
e.g.
- query expansion for information retrieval
- improve robustness of machine translation to lexical variations
difficulty
- task definition
- strict bidirectional entailment is too rigid a definition; in practice, paraphrase is better treated as mostly bidirectional entailment (approximate meaning equivalence)
- paucity of data
- dataset
- Quora question pair dataset
- MSCOCO captions dataset
- methods
- distributional similarity (words that appear in similar contexts tend to be similar); see the sketch below
- key idea: paraphrase patterns with empty “slots” that get filled in (e.g. “X solves Y” ≈ “Y is solved by X”)
- major problem
- hard to distinguish distributionally similar but semantically different words (e.g. antonyms)
- extremely sensitive to data sparsity
- solution: use bilingual data to learn monolingual paraphrases
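A minimal sketch of the distributional-similarity idea above (context-count vectors compared with cosine similarity), using a made-up toy corpus; it also illustrates the major problem noted above, since the antonyms "hot" and "cold" come out as highly similar.

```python
from collections import Counter
import math

# Toy corpus for illustration only.
corpus = [
    "the coffee is hot today",
    "the coffee is cold today",
    "the tea is hot today",
    "the tea is cold today",
    "cats chase mice",
]

def context_vectors(sentences, window=2):
    """Represent each word by counts of the words that co-occur with it in a window."""
    vectors = {}
    for sent in sentences:
        tokens = sent.split()
        for i, w in enumerate(tokens):
            ctx = vectors.setdefault(w, Counter())
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors

def cosine(a, b):
    dot = sum(a[k] * b.get(k, 0) for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

vecs = context_vectors(corpus)
# Antonyms share contexts, so they look "similar" distributionally — distributional
# similarity does not imply sameness of meaning.
print(cosine(vecs["hot"], vecs["cold"]))   # high (1.0 on this toy corpus)
print(cosine(vecs["hot"], vecs["mice"]))   # 0.0
```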
- bilingual pivoting
- [e.g. Paraphrasing with Bilingual Parallel Corpora; idea: phrase-based machine translation and pivoting]
$$ P(e_2\vert e_1)=\sum_fP(e_2\vert f)P(f\vert e_1) $$
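A minimal sketch of the pivoting equation above, with toy hand-written phrase-table probabilities (the foreign pivots and numbers are illustrative, not real corpus estimates).

```python
from collections import defaultdict

# P(f | e1): probability of a foreign pivot phrase f given the English phrase e1
p_f_given_e = {
    "thrown into jail": {"festgenommen": 0.6, "inhaftiert": 0.4},
}

# P(e2 | f): probability of an English phrase e2 given the foreign phrase f
p_e_given_f = {
    "festgenommen": {"arrested": 0.7, "thrown into jail": 0.2, "detained": 0.1},
    "inhaftiert": {"imprisoned": 0.5, "jailed": 0.3, "thrown into jail": 0.2},
}

def pivot_paraphrases(e1):
    """P(e2 | e1) by marginalizing over foreign pivot phrases f."""
    scores = defaultdict(float)
    for f, p_f in p_f_given_e.get(e1, {}).items():
        for e2, p_e2 in p_e_given_f.get(f, {}).items():
            if e2 != e1:                      # skip the identity "paraphrase"
                scores[e2] += p_e2 * p_f
    return dict(sorted(scores.items(), key=lambda kv: -kv[1]))

print(pivot_paraphrases("thrown into jail"))
# {'arrested': 0.42, 'imprisoned': 0.2, 'jailed': 0.12, 'detained': 0.06}
```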
- evaluate the generated paraphrases
- PINC (complements BLEU: measures the n-gram dissimilarity of the paraphrase from the original input, so simply copying the input scores poorly; usually reported alongside BLEU against references)
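A minimal sketch of PINC under its usual definition (average fraction of candidate n-grams not found in the source); the example sentences are made up.

```python
def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def pinc(source, candidate, max_n=4):
    """Average, over n = 1..max_n, of the fraction of candidate n-grams absent from the source."""
    src, cand = source.split(), candidate.split()
    scores = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        if not cand_ngrams:
            continue
        overlap = len(cand_ngrams & ngrams(src, n))
        scores.append(1.0 - overlap / len(cand_ngrams))
    return sum(scores) / len(scores) if scores else 0.0

print(pinc("how can i learn french quickly",
           "what is the fastest way to learn french"))   # ~0.90: mostly new wording
```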
Style Transformation
the same semantic content, but with a different style or register
- Text Simplification (for second language reading comprehension)
- Register Conversion (“Register” is the type of language used in a particular setting)
- Personal Style Conversion
- Demographics-level Conversion
method
- simplest method: collect a large parallel corpus and train a supervised model
- tailor phrase-based translation models to the task of style transformation
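A minimal sketch of the phrase-based idea for register conversion, assuming a tiny hand-written phrase table; a real system would learn these mappings (with probabilities) from a parallel corpus.

```python
# Hand-written informal -> formal phrase table, for illustration only.
phrase_table = {
    "gonna": "going to",
    "wanna": "want to",
    "a lot of": "a large amount of",
    "kids": "children",
    "thanks": "thank you",
}

def convert_register(sentence):
    """Greedy longest-match-first phrase replacement, left to right."""
    tokens = sentence.lower().split()
    out, i = [], 0
    max_len = max(len(p.split()) for p in phrase_table)
    while i < len(tokens):
        for span in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + span])
            if phrase in phrase_table:
                out.append(phrase_table[phrase])
                i += span
                break
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)

print(convert_register("thanks , we are gonna need a lot of help for the kids"))
# -> "thank you , we are going to need a large amount of help for the children"
```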
Summarization
condensing a document to its most salient information
- Sentence Compression
- Single-document Summarization
- Multi-document Summarization
extractive summarization vs. abstractive summarization (select and copy spans from the source vs. generate new text)
- removing irrelevant content
- deleting words:
- tree-based methods
- formulate as a constrained optimization problem: delete words down to a fixed-length summary while maximizing the amount of relevant content kept (see the sketch below)
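A minimal sketch of deletion-based compression as constrained optimization, assuming toy per-word relevance scores and a word-count budget; real formulations add grammaticality constraints (e.g. over a parse tree), which this sketch omits.

```python
def compress(words, scores, budget):
    """0/1 knapsack over words: maximize kept relevance under a length budget, preserving order."""
    n = len(words)
    # dp[k] = (best total score, kept word indices) using at most k words
    dp = [(0.0, [])] + [(float("-inf"), [])] * budget
    for i in range(n):
        for k in range(budget, 0, -1):           # iterate budget downwards so each word is used once
            cand = dp[k - 1][0] + scores[i]
            if cand > dp[k][0]:
                dp[k] = (cand, dp[k - 1][1] + [i])
    best = max(dp, key=lambda s: s[0])
    return " ".join(words[i] for i in sorted(best[1]))

# Toy relevance scores, made up for illustration.
sentence = "the prime minister said on tuesday that the new tax plan would be delayed".split()
relevance = [0.1, 0.9, 0.9, 0.4, 0.1, 0.3, 0.1, 0.1, 0.6, 0.8, 0.8, 0.3, 0.2, 0.7]
print(compress(sentence, relevance, budget=7))
# -> "prime minister said new tax plan delayed"
```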
- sequence-to-sequence transduction problem
- tree substitution grammars + copy words + control the length of the summary
- attentional neural networks + copy words + control the length of the summary
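A sketch of the copy step that lets an attentional summarizer output out-of-vocabulary source words, in the spirit of pointer-generator mixing of the vocabulary and attention distributions; all probabilities below are made-up numbers for illustration.

```python
import numpy as np

vocab = ["the", "said", "company", "profits", "rose", "<unk>"]
source_tokens = ["acme", "profits", "rose"]          # "acme" is out-of-vocabulary

p_vocab = np.array([0.05, 0.05, 0.10, 0.30, 0.20, 0.30])   # decoder softmax over the vocabulary
attention = np.array([0.70, 0.20, 0.10])                    # attention over source tokens
p_gen = 0.4                                                  # probability of generating vs. copying

# Extend the vocabulary with source words not already in it, then mix the two distributions.
extended_vocab = vocab + [w for w in source_tokens if w not in vocab]
p_final = np.zeros(len(extended_vocab))
p_final[:len(vocab)] = p_gen * p_vocab
for attn, w in zip(attention, source_tokens):
    p_final[extended_vocab.index(w)] += (1.0 - p_gen) * attn

for w, p in sorted(zip(extended_vocab, p_final), key=lambda x: -x[1]):
    print(f"{w:10s} {p:.3f}")
# "acme" gets most of its probability mass from the copy term, even though it is
# not in the decoder's output vocabulary.
```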
- evaluation
- the amount of recall of important information that can be achieved within the limited summary length (e.g. ROUGE)
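A minimal sketch of ROUGE-N recall against a single reference; full ROUGE also reports precision/F1 and handles multiple references and stemming, which this sketch omits.

```python
from collections import Counter

def rouge_n_recall(reference, system, n=1):
    """Fraction of the reference's n-grams recovered by the system summary (clipped counts)."""
    def ngram_counts(text):
        tokens = text.split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    ref, sys_ = ngram_counts(reference), ngram_counts(system)
    overlap = sum(min(count, sys_[g]) for g, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

reference = "the senate passed the budget bill on friday"
system = "senate passes budget bill"
print(rouge_n_recall(reference, system, n=1))   # 3 of 8 reference unigrams recalled
print(rouge_n_recall(reference, system, n=2))   # 1 of 7 reference bigrams recalled
```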