Next: Recognition Up: Speechalator: two-way speech-to-speech translation Previous: Background

Data Collection

The Arabic language was mostly new to this group. Although we had some recognition experience and resources for Tunisian Arabic from the GlobalPhone project [3], we were essentially starting in a new language. This made it a good test of our existing speech-to-speech translation framework.

Normal Arabic script does not include all vowels. There are diacritics which can specify full vocalization, but these appear only in children's books and the Koran, not in conventional text. Normal script would therefore be hard to use for speech processing. There are statistical techniques (e.g. [4]) that can be used to predict vocalization, but it is much easier to use a script with all vowels fully specified. There have been successful attempts to do Arabic speech recognition without explicit vowels [5], but for synthesis this would be much harder, if possible at all. Because we are embedding this use of Arabic within speech-to-speech translation, where we control the input and output mechanisms, we are in a position to stipulate that the internal form be a romanization which contains full vocalization. Transliterating from a romanized script into Arabic script is easy (it involves removing information), so we can still display the translation in Arabic script but internally preserve the vowel information. Others have noted this problem; we based our romanization on the Arabic CallHome [6] romanization, but made several refinements, so that phonetic forms can be easily derived from it.
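The claim that going from a fully vocalized romanization to Arabic script only removes information can be illustrated with a minimal sketch. The symbol table below is a hypothetical toy mapping invented for this illustration; it is not the CallHome-derived romanization described above, and it ignores details such as word-initial vowels, gemination and context-dependent letter forms.

```python
# Toy illustration only: these tables are hypothetical and are NOT the
# CallHome-derived romanization used in the system described here.
CONSONANTS = {
    "k": "\u0643",  # kaf
    "t": "\u062A",  # ta
    "b": "\u0628",  # ba
    "d": "\u062F",  # dal
    "r": "\u0631",  # ra
    "s": "\u0633",  # sin
    "m": "\u0645",  # mim
    "l": "\u0644",  # lam
    "n": "\u0646",  # nun
}
# Long vowels are written in normal script; short vowels are not.
LONG_VOWELS = {"aa": "\u0627", "ii": "\u064A", "uu": "\u0648"}
SHORT_VOWELS = {"a", "i", "u"}


def to_arabic_script(romanized: str) -> str:
    """Map a toy romanized word to unvocalized Arabic script.

    Short vowels are dropped, which is why the mapping loses
    information and cannot be inverted without extra knowledge.
    """
    out = []
    i = 0
    while i < len(romanized):
        pair = romanized[i:i + 2]
        ch = romanized[i]
        if pair in LONG_VOWELS:
            out.append(LONG_VOWELS[pair])
            i += 2
        elif ch in SHORT_VOWELS:
            i += 1  # short vowel: not written in conventional text
        elif ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            i += 1
        else:
            raise ValueError(f"symbol not in toy table: {ch!r}")
    return "".join(out)


if __name__ == "__main__":
    # Two differently vocalized forms collapse to the same written string.
    print(to_arabic_script("kataba") == to_arabic_script("kutiba"))
```

In this sketch two differently vocalized romanized forms map to the same written string, which is the sense in which the internal form carries strictly more information than the displayed script.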

The second major issue was which dialect of Arabic to use. Although there is a standard written form of Arabic, Modern Standard Arabic (MSA), it is not used for normal conversation. As we are specifically interested in spoken language translation, we decided to choose a major spoken dialect for which local experts were available. Thus we settled on Egyptian Arabic.

There were three areas for which data had to be collected: recognition, synthesis and translation. Starting from an existing database of English medical expressions used for another speech-to-speech translation system, Arabic foreign language experts (FLEs) hand-translated each utterance into a number of different paraphrases in Egyptian Arabic (up to 10 different examples). The FLEs were then asked to speak each of the utterances. In this way we collected recordings of some 7500 in-domain utterances with romanized transcriptions.


Alan W Black 2003-10-27