A complete voice consists not just of a diphone set but also of the front end of the TTS system, including text analysis, lexicons and prosodic models. Although these components are difficult to build in themselves, they are much smaller than the diphone set and, for English at least, are a standard part of our distribution.
Thus, building a new English voice can be simply a matter of recording a new diphone set. Our tools provide complete scripts and a detailed walk-through for this process, building on top of the existing modules.