Optimal Data Selection for Unit Selection Synthesis

In this work, we address the problem of creating a set of utterances with optimal coverage for reliable, high-quality concatenative synthesis, whether for general-purpose or domain-specific synthesis. We present an automatic method that takes into account the acoustic distinctions made by a particular speaker and selects prompts from large databases of typical utterances. A general unit selection text-to-speech system built by this process can synthesize any input text, but output quality is best when the input resembles the database in style, delivery, and coverage.
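The abstract does not spell out the selection procedure, so the following is only an illustrative sketch of coverage-driven prompt selection in general, not the method of this work. It assumes each candidate utterance has already been mapped to a set of unit-type labels (in the setting described here, these would presumably reflect the speaker's own acoustic distinctions), and it uses a plain greedy set-cover heuristic. The function name `greedy_select`, the parameter `max_prompts`, and the label strings are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_select(candidates: Dict[str, Set[str]], max_prompts: int) -> List[str]:
    """Greedy set-cover sketch: repeatedly pick the prompt that adds
    the most unit types not yet covered by earlier picks.

    `candidates` maps a prompt ID to its (assumed precomputed) unit types.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(candidates)  # shallow copy so the caller's dict is untouched

    while remaining and len(chosen) < max_prompts:
        # Prompt contributing the largest number of new unit types.
        best = max(remaining, key=lambda p: len(remaining[p] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # no remaining prompt improves coverage
        chosen.append(best)
        covered |= gain
        del remaining[best]

    return chosen


if __name__ == "__main__":
    # Toy data: unit labels are made up for illustration.
    toy = {
        "utt1": {"a-b", "b-c", "c-d"},
        "utt2": {"a-b", "b-c"},
        "utt3": {"d-e"},
    }
    print(greedy_select(toy, max_prompts=5))
```

In this toy run, `utt2` is never chosen because everything it contains is already covered by `utt1`, which is the behavior a coverage-oriented prompt list would want.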
