Once the phone set and the carrier material for each diphone type have been chosen, the prompt list is produced automatically from the specification using a set of templates. Consonant-vowel and vowel-consonant pairs are generally kept together: as shown in the example above, [B-AA] and [AA-B] are generated in a single prompt. Special contexts are also created for vowel-vowel diphones, also as shown above, and for transitions between silence and a phone.
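As a rough illustration of template-based prompt generation, the following Python sketch builds a prompt list from a toy phone set. The carrier templates, the phone names, the `pau` silence symbol and the prompt naming scheme are all assumptions made for this example and are not necessarily those used by the actual tools.

```python
from itertools import product

SILENCE = "pau"  # assumed silence symbol


def cv_vc_prompts(consonants, vowels):
    """Yield one nonsense word per consonant/vowel pair.

    The assumed carrier "t V C V C V" contains both C-V and V-C,
    so the two diphones share a single prompt.
    """
    for c, v in product(consonants, vowels):
        yield [SILENCE, "t", v, c, v, c, v, SILENCE], [f"{c}-{v}", f"{v}-{c}"]


def vv_prompts(vowels):
    """Yield a prompt per vowel-vowel diphone (assumed carrier "t V1 V2 t V1")."""
    for v1, v2 in product(vowels, repeat=2):
        yield [SILENCE, "t", v1, v2, "t", v1, SILENCE], [f"{v1}-{v2}"]


def silence_prompts(phones):
    """Yield a prompt covering silence-to-phone and phone-to-silence."""
    for p in phones:
        yield [SILENCE, p, SILENCE], [f"{SILENCE}-{p}", f"{p}-{SILENCE}"]


def build_prompt_list(consonants, vowels, prefix="prompt"):
    """Return (id, phone string, diphones) entries for all templates."""
    generators = (
        cv_vc_prompts(consonants, vowels),
        vv_prompts(vowels),
        silence_prompts(consonants + vowels),
    )
    entries = []
    for gen in generators:
        for phones, diphones in gen:
            # Number prompts sequentially across all templates.
            prompt_id = f"{prefix}_{len(entries) + 1:04d}"
            entries.append((prompt_id, " ".join(phones), diphones))
    return entries


if __name__ == "__main__":
    for entry in build_prompt_list(["b", "d"], ["aa", "iy"]):
        print(entry)
```

Each entry pairs a prompt identifier with the phone string to be spoken and the diphones that prompt is meant to supply, which is all the later recording and labelling steps need to know about it.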
We then synthesize these prompts using an existing synthesizer. The synthesized prompts serve two purposes. The first is to play them to the speaker while recording. Even highly trained phoneticians make mistakes when reading 2000 prompts, so the synthesized prompts help guide the speaker to say the right thing. Moreover, because we generate these prompts at constant pitch and duration, they encourage the speaker to do likewise. Since we are going to modify the pitch and duration independently, it is better if the recording is monotone and consistent. This, we feel, is best achieved by having the nonsense word delivered in what almost sounds like a chant.
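To make the idea of constant pitch and duration concrete, here is a minimal Python sketch that assigns every phone in a prompt the same length and the same F0 value. The function name and the 100 Hz and 0.1 s defaults are hypothetical values chosen for illustration, not settings taken from any particular synthesizer.

```python
def monotone_targets(phones, f0_hz=100.0, dur_s=0.1):
    """Return (phone, start, end, f0) tuples with uniform duration and pitch.

    f0_hz and dur_s are illustrative defaults, not recommended values.
    """
    segments = []
    for i, phone in enumerate(phones):
        start = i * dur_s
        end = (i + 1) * dur_s
        segments.append((phone, start, end, f0_hz))
    return segments


if __name__ == "__main__":
    for seg in monotone_targets("pau t aa b aa b aa pau".split()):
        print(seg)
```

A specification like this, handed to a synthesizer that accepts explicit durations and pitch targets, would produce the flat, chant-like delivery the speaker is asked to imitate.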
The second reason for generating prompts is their use in labelling the recorded speech, as described below.