The principal difficulty in transcribing spoken Arabic is that the dialects lack a fixed orthography. An individual word consists of three elements: a root, which is a sequence of three or four consonants representing a broad concept; vowel diacritics and other phonological annotations, which are usually omitted in the written form; and morphological components. For example, the consonant sequence ktb represents the concept of writing. It has standalone readings such as ``kutub'' or ``kattaba,'' and it can take morphological components to become ``aktib,'' ``maktaba,'' and so forth. Because only the voweling of MSA is taught in school, speakers of the same dialect can differ significantly in their sense of which vowel is used in the spoken language, and there is a strong tendency to write the standard orthography prescribed by MSA even when the morphology of the spoken form differs.
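The root-and-pattern structure described above can be illustrated with a minimal sketch. The template notation used here (digits marking root-consonant slots, letters standing for vowels and affixes) and the function name `apply_pattern` are assumptions introduced purely for illustration; they are not the transcription scheme used for the database.

```python
def apply_pattern(root: str, pattern: str) -> str:
    """Fill the digit slots of `pattern` with the consonants of `root`.

    Hypothetical notation: '1', '2', '3' refer to the first, second and
    third root consonant; every other character is copied unchanged.
    """
    out = []
    for ch in pattern:
        if ch.isdigit():
            out.append(root[int(ch) - 1])  # slot -> root consonant
        else:
            out.append(ch)                 # vowel or affix material
    return "".join(out)


ROOT = "ktb"  # the root associated with the concept of writing

# Illustrative templates for the four readings cited in the text.
PATTERNS = ["1u2u3", "1a22a3a", "a12i3", "ma12a3a"]

forms = [apply_pattern(ROOT, p) for p in PATTERNS]
print(forms)
```

Since the written form usually omits the vowels, several distinct templates can correspond to the same written consonant string, which is one reason why transcribers may disagree on the voweling.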
The Egyptian Arabic database that we have collected (described in Section 6.2) was spoken and transcribed by native speakers of Cairene, working under the direction of an experienced Arabic linguist. Care was taken to remove the influence of the written language from both the elicitation and the transcription. The proper phonological description of individual words, however, remains an open question.