Audio files are 16-bit Microsoft .WAV, sampled (mostly) at 16 kHz.
This is a short introduction to the Festival Speech Synthesis System. Festival was developed by Alan Black and Paul Taylor, at the Centre for Speech Technology Research, University of Edinburgh.
Festival currently uses a diphone synthesizer; both residual-excited LPC and PSOLA methods are supported. The upper levels, duration and intonation, are generated from statistically trained models built from databases of natural speech. The architecture of the system is designed to be flexible and includes various tools that allow new modules to be added easily.
Festival is a multilingual synthesizer. The default language may be set at start-up time or changed easily during a session.
This Welsh synthesizer was ported from a previous CSTR Welsh synthesizer.
A Castilian Spanish synthesizer was built from diphones collected during an MSc project.
Two German synthesizers were developed as part of a summer project at the Oregon Graduate Institute.
Ihr naht euch wieder, schwankende Gestalten,
Die früh sich einst dem trüben Blick gezeigt.
Goethe, Faust
Smith, Bobbie Q, 3337 St Laurence St,
Fort Worth, TX 71611-5484, (817)839-3689
Anderson, W, 445 Sycamore Way NE,
Lincoln, NE 98125-5108, (212)404-9988
The first was produced from 460 (TIMIT) phonetically balanced sentences, using only phonetic context and pitch as selection features, with hand-tuned weights. No signal processing was applied to the selected units to modify pitch or duration. The selected units typically contain 2-3 phones. The second example was synthesized using a diphone database from the same speaker. Only the waveform synthesizers differ; both examples use the same target phones.
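The selection step described above can be pictured as a weighted target cost over candidate units. The following is a minimal sketch, not Festival's implementation: the `Unit` fields, the weight values, and the pitch normalisation range are all hypothetical, and it scores single phones only, without any cost for how well neighbouring units join.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    phone: str    # the phone this unit realises
    left: str     # preceding phone (phonetic context)
    right: str    # following phone (phonetic context)
    pitch: float  # mean pitch in Hz


# Hypothetical hand-tuned weights; the source gives no actual values.
WEIGHTS = {"left": 1.0, "right": 1.0, "pitch": 0.5}
PITCH_RANGE_HZ = 100.0  # assumed range used to scale pitch differences to [0, 1]


def target_cost(target: Unit, cand: Unit) -> float:
    """Weighted mismatch between a target specification and a candidate unit."""
    cost = 0.0
    if target.left != cand.left:
        cost += WEIGHTS["left"]
    if target.right != cand.right:
        cost += WEIGHTS["right"]
    pitch_diff = min(abs(target.pitch - cand.pitch) / PITCH_RANGE_HZ, 1.0)
    cost += WEIGHTS["pitch"] * pitch_diff
    return cost


def select_unit(target: Unit, database: list[Unit]) -> Unit:
    """Return the database unit of the right phone with the lowest target cost."""
    candidates = [u for u in database if u.phone == target.phone]
    if not candidates:
        raise ValueError(f"no unit for phone {target.phone!r}")
    return min(candidates, key=lambda u: target_cost(target, u))


# Toy database, purely illustrative.
database = [
    Unit("a", "k", "t", 120.0),
    Unit("a", "k", "t", 180.0),
    Unit("a", "p", "t", 121.0),
    Unit("e", "k", "t", 120.0),
]
target = Unit("a", "k", "t", 125.0)
chosen = select_unit(target, database)
```

Because no signal processing is applied afterwards, the pitch term matters in a sketch like this: the chosen unit is played back with whatever pitch it was recorded with.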
A different technique for finding appropriate units is described in Black and Taylor 97. Here appropriate sub-word units (diphones or demi-phones) are clustered using an acoustic measure.
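To make the idea of clustering by an acoustic measure concrete, here is a small sketch under stated assumptions. It is not taken from the cited paper: the feature vectors, the Euclidean distance, the context questions, and the names `impurity` and `best_split` are all hypothetical. It scores a cluster by the mean pairwise acoustic distance of its members and picks the context question whose split gives the acoustically tightest sub-clusters.

```python
import math
from itertools import combinations


def acoustic_distance(a, b):
    """Assumed acoustic measure: Euclidean distance between feature vectors."""
    return math.dist(a, b)


def impurity(units):
    """Mean pairwise acoustic distance within a cluster (0.0 for fewer than two units)."""
    pairs = list(combinations(units, 2))
    if not pairs:
        return 0.0
    total = sum(acoustic_distance(a["feats"], b["feats"]) for a, b in pairs)
    return total / len(pairs)


def best_split(units, questions):
    """Pick the question whose yes/no split has the lowest size-weighted impurity.

    Returns (score, question_name, yes_units, no_units), or None if no
    question separates the units into two non-empty groups.
    """
    best = None
    for name, question in questions.items():
        yes = [u for u in units if question(u)]
        no = [u for u in units if not question(u)]
        if not yes or not no:
            continue
        score = (len(yes) * impurity(yes) + len(no) * impurity(no)) / len(units)
        if best is None or score < best[0]:
            best = (score, name, yes, no)
    return best


# Toy units of one type with made-up two-dimensional acoustic features.
units = [
    {"left": "m", "right": "a", "feats": (1.0, 1.0)},
    {"left": "n", "right": "t", "feats": (1.1, 0.9)},
    {"left": "k", "right": "a", "feats": (5.0, 5.0)},
    {"left": "t", "right": "t", "feats": (5.2, 4.9)},
]
questions = {
    "left_is_nasal": lambda u: u["left"] in {"m", "n"},
    "right_is_vowel": lambda u: u["right"] in {"a", "e", "i", "o", "u"},
}
split = best_split(units, questions)
```

In this toy data the units preceded by a nasal are acoustically close to each other, so the left-context question produces the tighter clusters.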
Note: In both of the above techniques the good examples are good, but the bad examples are much worse than diphones alone. These techniques still need further research before they are stable enough to produce high-quality output all the time.