Unit Selection Synthesis, where appropriate units are selected from
large databases of natural speech, has greatly improved the quality of
speech synthesis. But the quality improvement has come at a cost.
The quality of the synthesis relies on the fact that little or no
signal processing is done on the selected units, so the style of the
recording is carried over into the synthesized speech. The
synthesis style is implicitly the style of the database. If we want
more general flexibility, we have to record more data in the desired
style, which means that our already large unit databases must be made
even larger.
This paper gives examples of how to produce varied style and emotion
using existing unit selection synthesis techniques and also highlights
the limitations of generating truly flexible synthetic voices.