Unit Selection and Emotional Speech

Unit selection synthesis, in which appropriate units are selected from large databases of natural speech, has greatly improved the quality of speech synthesis, but the improvement has come at a cost. The quality relies on the fact that little or no signal processing is applied to the selected units, so the style of the recording is carried over into the synthesized speech: the synthesis style is implicitly the style of the database. If we want more flexibility, we have to record more data in the desired style, which means that our already large unit databases must be made even larger.

This paper gives examples of how to produce varied style and emotion using existing unit selection synthesis techniques and also highlights the limitations of generating truly flexible synthetic voices.
