This section describes our flat category representations. We first present the categories for the syntactic analysis and then the categories for the semantic analysis.
Flat syntactic analysis assigns syntactic categories to a sequence of words, e.g., the word hypothesis sequence generated by a speech recognizer. Flat representations up to the phrase group level support local structural decisions, that is, decisions about which phrase group (abstract syntactic category) a word belongs to. For such a decision, the directly preceding words and their phrase groups can influence the current choice. For instance, the determiner "the" could be part of a prepositional group, as in "in the mine", or could start a noun group, as in "the old mine". A flat analysis therefore makes local structural decisions based on local context.
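The determiner example can be made concrete with a minimal sketch. It is not the learned analysis described in this paper: the function name `assign_phrase_groups`, the hand-written rules, and the fallback to the special group are all assumptions chosen only to illustrate how the phrase group of the directly preceding word can decide the group of the current word.

```python
from typing import List, Optional, Tuple

def assign_phrase_groups(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Label each (word, basic category) pair with a phrase group.

    Hypothetical hand-written rules; only the group of the directly
    preceding word is used as context.
    """
    result: List[Tuple[str, str]] = []
    previous_group: Optional[str] = None
    for word, basic in tagged:
        if basic == "preposition":
            group = "prepositional group"
        elif basic in ("determiner", "adjective", "noun"):
            # Local decision: stay in the prepositional group chosen for the
            # preceding word, otherwise start or continue a noun group.
            if previous_group == "prepositional group":
                group = "prepositional group"
            else:
                group = "noun group"
        elif basic == "verb":
            group = "verb group"
        else:
            # Placeholder fallback (assumption), not taken from the paper.
            group = "special group"
        result.append((word, group))
        previous_group = group
    return result

in_the_mine = assign_phrase_groups(
    [("in", "preposition"), ("the", "determiner"), ("mine", "noun")]
)
the_old_mine = assign_phrase_groups(
    [("the", "determiner"), ("old", "adjective"), ("mine", "noun")]
)
```

With these toy rules, "the" is placed in a prepositional group in the first sequence and in a noun group in the second, which mirrors the two readings discussed above.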
For flat syntactic analysis we have developed a level of basic syntactic categories and a level of abstract syntactic categories. These syntactic categories may vary depending on the language and on the degree of detail of the intended structural representation. However, the general approach is largely independent of the specific categories used. In fact, we have used the same syntactic categories for two different domains: railway counter interactions and business meeting arrangements. The basic syntactic categories we used were noun, verb, preposition, pronoun, numeral, past participle, pause, adjective, adverb, conjunction, determiner, interjection and other. They are shown with their abbreviations in Table 1.
The abstract syntactic categories we used are verb group, noun group, adverbial group, prepositional group, conjunction group, modus group, special group and interjection group. These abstract syntactic categories are shown in Table 2.
Table 2: Abstract syntactic categories
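As an illustration, the two syntactic category inventories listed in the text could be encoded as enumerations. This is only a sketch: the class and member names are assumptions, and the abbreviations from Tables 1 and 2 are not reproduced, so the full category names from the text serve as values.

```python
from enum import Enum

class BasicSyntacticCategory(Enum):
    # The thirteen basic syntactic categories named in the text.
    NOUN = "noun"
    VERB = "verb"
    PREPOSITION = "preposition"
    PRONOUN = "pronoun"
    NUMERAL = "numeral"
    PAST_PARTICIPLE = "past participle"
    PAUSE = "pause"
    ADJECTIVE = "adjective"
    ADVERB = "adverb"
    CONJUNCTION = "conjunction"
    DETERMINER = "determiner"
    INTERJECTION = "interjection"
    OTHER = "other"

class AbstractSyntacticCategory(Enum):
    # The eight abstract syntactic categories (phrase groups) named in the text.
    VERB_GROUP = "verb group"
    NOUN_GROUP = "noun group"
    ADVERBIAL_GROUP = "adverbial group"
    PREPOSITIONAL_GROUP = "prepositional group"
    CONJUNCTION_GROUP = "conjunction group"
    MODUS_GROUP = "modus group"
    SPECIAL_GROUP = "special group"
    INTERJECTION_GROUP = "interjection group"

# Look up a member by its full name as written in the text.
determiner = BasicSyntacticCategory("determiner")
```

Keeping the inventories in separate enumerations makes it straightforward to swap in a different category set, in line with the remark that the approach does not depend on these specific categories.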
The categories should express the main syntactic properties of the phrases. Most of our basic and abstract syntactic categories are widely used in different parsers. However, the approach of flat representations does not crucially rely on this specific set of basic and abstract syntactic categories. Our goal is to train, learn and generalize a flat syntactic analysis based on abstract syntactic categories and basic syntactic categories. Local syntactic decisions should be made wherever local knowledge suffices. Local syntactic ambiguities up to the phrase group level (abstract syntactic categories) can be resolved, but more global ambiguities such as prepositional phrase attachment are left open, since they require additional knowledge, e.g., from a semantics module. While a complete syntax tree commits to a certain preference (which might turn out to be wrong based on semantic knowledge), a flat syntactic representation goes as far as possible using only local syntactic knowledge for disambiguation.
Since semantic analysis is domain-dependent, the semantic categories can differ between domains. We have worked particularly on two domains: railway counter interactions (the Regensburg train corpus) and business meeting arrangements (the Blaubeuren meeting corpus). About three quarters of the semantic categories overlapped between the train corpus and the meeting corpus (Wermter & Weber, 1996b). Differences occurred mainly for verbs, e.g., NEED-events are very frequent in the railway counter interactions, while SUGGEST-events are frequent in the business meeting interactions. The semantic categories of the railway counter interactions were described in previous work (Weber & Wermter, 1995). Here we focus primarily on the semantic categories of the meeting corpus. The basic semantic categories for a word are shown in Table 3.
Table 3: Basic semantic categories
At a higher level of abstraction, each word can belong to an abstract semantic category. The possible abstract semantic categories are shown in Table 4.
Table 4: Abstract semantic categories
In summary, these categories provide a basis for a flat analysis. Each word is represented in its context by four categories: a basic and an abstract syntactic category, and a basic and an abstract semantic category.
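The four-category representation of a word can be sketched as a small record type. The class name `FlatWordAnalysis` and its field names are assumptions made for illustration. Because the semantic inventories of Tables 3 and 4 are not reproduced here, the semantic fields are left unset in the example rather than filled with invented labels.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class FlatWordAnalysis:
    """One word with its four flat categories (hypothetical structure)."""
    word: str
    basic_syntactic: str
    abstract_syntactic: str
    basic_semantic: Optional[str] = None      # would hold a label from Table 3
    abstract_semantic: Optional[str] = None   # would hold a label from Table 4

    def levels(self) -> Tuple[Optional[str], ...]:
        # Order: basic syntactic, abstract syntactic,
        # basic semantic, abstract semantic.
        return (
            self.basic_syntactic,
            self.abstract_syntactic,
            self.basic_semantic,
            self.abstract_semantic,
        )

# "the" as it appears in "in the mine"; semantic labels are left open.
the = FlatWordAnalysis("the", "determiner", "prepositional group")
```

A flat analysis of an utterance is then simply a sequence of such records, one per word, with no tree structure above the phrase group level.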