Project Report ---- The Chinese Generation System (CGS)

Wei Fang 房 蔚

This semester I read and analysed the Chinese generation system (CGS) built by Doctor Tangqiu Li. The system translates from interlingua to Chinese and successfully translates 215 interlingua sentences. The output for most of these sentences is very good. Here I analyse the system in detail.

First I will talk about the system itself. The 215 interlingua sentences certainly do not cover all the phenomena of the interlingua system, but they are still very representative. The CGS covers all the sentence structures that appear in the "ilt" file, and most of its grammar rules for Chinese generation are general. To deal with some specific sentence structures, it uses specific grammar rules in order to get the best output for those sentences. I analyse the system from general to specific in Section 1, and then discuss the "ilt" used here in Section 2. In Section 3, I suggest what to do next to improve the system.

Section 1. CGS system

Step 1. Build a *sentence-table* for the 215 interlingua sentences, and build a *lexicon* that maps the interlingua lexicon to Chinese.

Step 2. Use the sentence number as an index to get the corresponding interlingua sentence, then use the function "lexicon-lookup" to translate the interlingua into the Fstructure, in which the interlingua lexical items are replaced with Chinese lexical items.

Step 3. Use grammar rules for generation. I think most of the grammar rules for generating Chinese sentences from the interlingua are general.

1. The sentence coverage is as follows:

(1) label sentence
    label : sentence.
    for example: Note: The heat content rating per unit volume ... ---> 注意: ...

(2) hyphen sentence
    --sentence.
    for example: = Anticipate the average outside temperature for the area of operation. ---> --...

(3) bullet sentence
    bullet. sentence
    for example: 1. Operate the engine until the engine reaches normal operating temperature.

(4) heading sentence
    sentence (without punctuation)
    for example: Effect of Cold Weather on Fuel

(5) imperative sentence
    for example: Avoid sharp angles.

(6) discourse-cohesion sentence
    discourse-cohesion, sentence.
    for example: However, the filter should not be too fine. ---> 然而, ...

(7) pre-condition sentence
    if pre-condition, sentence.
    for example: Only add the coolant conditioner precharge if you are not using Caterpillar antifreeze.
    A if B ---> 如果 B, A

(8) reason sentence
    because reason, sentence.
    for example: The new dipsticks are different because they have a "FULL RANGE" mark.
    A because B ---> 因为 B, A

(9) when-event sentence
    for example: Remember that the capacity of an eyebolt decreases when the angle between the supporting members and the object becomes less than 90 *.
    A when B ---> 当 B 时, A

(10) although-event sentence
    for example: Although the PEEC system dissipates heat into the fuel, fuel heaters are still necessary.
    Although B, A ---> 尽管 B, A

(11) during-event sentence
    for example: You can use No. 2 diesel fuel in diesel engines during cold weather.
    A during B ---> 当 B, A        (root of B is a verb)
                    在 B 期间 A     (root of B is a noun)

(12) until-event sentence
    for example: Operate the engine until the engine reaches normal operating temperature.
    A until B ---> A 直到 B

(13) but-event sentence
    for example: The fuel heater should be mechanically simple, but the heater must be adequate for the application.
    A, but B ---> A, 但是 B

(14) sequential-event sentence
    for example: Drain the oil and change the oil filter.
    A and B ---> A, B

(15) or-event sentence
    for example: Add liquid conditioner or replace the coolant conditioner element.
    A or B ---> A 或 B

(16) purpose sentence
    for example: Slowly loosen the radiator filler cap in order to relieve pressure.
    A in order to B ---> 为了 B A      (root of B is a noun)
                         为了 B, A     (root of B is a verb)

All the above sentence structures are translated into Chinese correctly, without any loss of generality.

2. Special tense markers

Chinese uses specific words where English uses tense markers. For example, "在" or "正在" marks the progressive, "将" or "将要" marks the future, and "已经" marks the perfect. These are correctly implemented in this system.

3. Case marker

There are no case markers in Chinese. That means "he" and "him" are the same in Chinese.

4. Number and person

Generally speaking, Chinese has no number or person agreement between subject and verb or object, and no agreement between adjective and noun. But "I" is different from "we", "she" is different from "he", "this" is different from "that", etc. All these differences have been implemented in this system.

5. Proposition

A proposition should be translated into Chinese as follows:
    B that C ---> C 的 B
This translation is general and it is handled well in CGS.

6. Some limitations

In the generation grammar, some rules test the root directly. Such rules are specific to particular sentences or words. They are sometimes necessary in order to achieve high translation accuracy. For example:

If the root is "design" (设计) and the sentence is passive, the translation should be:
    theme 是 为 purpose 设计 的.
    theme is for purpose design.

If the root is "make" (制造) and the sentence is passive, the translation should be:
    theme 是 由 agent 制造 的.
    theme is by agent made.

This kind of translation is the best, so we need some very specific rules to get this output. One way to make these kinds of rules more general is to group words into classes and test the class instead of the root. This is also something that needs improving. Finding a good criterion for grouping words into classes is what I should do next.
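The idea of testing a word class instead of the root can be illustrated with a small sketch. This is not Doctor Li's grammar code; it is a minimal Python illustration under my own assumptions. The class names "purpose-passive" and "agent-passive", the tables, and the function generate_passive are all hypothetical. Only the two passive patterns and the two lexicon entries come from the examples above.

```python
# Chinese lexicon entries for the two roots discussed in the report.
LEXICON = {"design": "设计", "make": "制造"}

# Hypothetical verb classes (assumption: the report does not define any classes yet).
VERB_CLASS = {"design": "purpose-passive", "make": "agent-passive"}

# One passive template per class, following the two patterns given in the report.
PASSIVE_TEMPLATES = {
    "purpose-passive": "{theme} 是 为 {purpose} {verb} 的",
    "agent-passive": "{theme} 是 由 {agent} {verb} 的",
}


def generate_passive(root, roles):
    """Render a passive sentence for `root`, or return None if it has no class.

    `roles` maps role names (theme, purpose, agent) to already-generated
    Chinese strings; a role required by the template but missing raises KeyError.
    """
    verb_class = VERB_CLASS.get(root)
    if verb_class is None:
        return None  # caller would fall back to the general passive rule
    template = PASSIVE_TEMPLATES[verb_class]
    return template.format(verb=LEXICON[root], **roles)


print(generate_passive("design", {"theme": "T", "purpose": "P"}))
print(generate_passive("make", {"theme": "T", "agent": "G"}))
```

With this layout, covering another verb that behaves like "make" would only need one new entry in each of the two lookup tables, not a new rule. Whether such classes can be defined cleanly is exactly the open question raised above.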
Some translations do not handle the tense marker well. For example, the output of sentence 72 is not satisfactory. Making the output more accurate is also something I should do next.

Section 2. The interlingua

Our CGS uses an interlingua file with only 215 sentences. This interlingua covers:

    *O-  noun
    *P-  adjective
    *E-  verb & adverb
    *UNIT-RANGE
    *QUOTED-TERM
    *PP-COORDINATION
    *COORDINATION
    *PRON
    *NUMBER
    *UNITS
    *MULTIPLE*

Besides the items mentioned above, there are more than 100 annotations, which are very important for understanding this system. Here are the annotations that Doctor Li's system covers:

    NUMBER PERSON GENDER MOOD REFERENCE HEADING COLON HYPHEN LABEL
    NUMBER-BULLET PARENTHETICAL ATTRIBUTE TYPE MODAL COMPULSION
    OBLIGATION TENTATIVE HAS-PART BELONGS-TO QUOTED-LABEL QUOTED-STRING
    THEME PATIENT OBJECT AGENT AS-THEME AS-OBJECT PURPOSE REASON
    SEARCH-FOR GOAL AND-CONJUNCTION OR-CONJUNCTION AND/OR-CONJUNCTION
    CONJUNCTS PASSIVE NEGATION PERFECT TENSE PROG GERUNDIVE NEAR FAR
    PLEONASTIC-PRON COFACTOR POSSESSIVE POSSESSOR EXPERIENCER BENEFACTOR
    PREDICATED-OF-THEME PREDICATED-OF-EVENT PREDICATE EVENT
    SEQUENTIAL-EVENT DURING-EVENT WHEN-EVENT FOR-EVENT BUT-EVENT
    OR-EVENT UNTIL-EVENT ALTHOUTH-EVENT PROPOSITION LOCATION DESTINATION
    SOURCE PRE-CONDITION CONDITION TIME-POINT TIME-EXTENT TEMP-POINT
    METRIC QUANTITY DECIMAL INTEGER ENGLISH PERCENT TEXT-REFERENT
    EVENT-FREQUENCY DISCOURSE-COHESION EVENT-CONTRAST INTENSIFIER MANNER
    SERIAL-NUMBER-CONCEPT SERIAL-NUMBER-INTEGER INSTRUMENT BETWEEN-ITEMS
    PRE-MEASUREMENT-UNIT THAN-OBJECT FUNCTION ABOVE UNDER OUTSIDE
    LOCATED-IN PATH START END OR-COORDINATION EXTENT PAREN-MATERIAL
    AS-POSSIBLE OPPOSITION PROPORTION SUBSTANCE MATCH

These annotations are only a part of the real interlingua system. To make this system more useful, we need to cover other representative interlingua structures as well. This is the task of my next step.

Section 3. What I could do next semester

I have described the limitations of both the system and its interlingua coverage.
So there is still a lot to do next. First, take some more representative sentence structures from the real interlingua system, and add more rules to this system to cover those phenomena. Second, make the grammar rules more general so that they cover more sentence structures.
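As a small illustration of what a more general rule could look like, the A/B clause-ordering patterns listed in Section 1 can be written as one table-driven rule. This is only a sketch under my own assumptions and not how CGS is implemented. The connective keys, the "verb"/"noun" category labels, and the function order_clauses are hypothetical names. The Chinese templates themselves are copied from the patterns in Section 1.

```python
# Templates that do not depend on the root of clause B.
CLAUSE_TEMPLATES = {
    "if": "如果 {b}, {a}",
    "because": "因为 {b}, {a}",
    "when": "当 {b} 时, {a}",
    "although": "尽管 {b}, {a}",
    "until": "{a} 直到 {b}",
    "but": "{a}, 但是 {b}",
    "and": "{a}, {b}",
    "or": "{a} 或 {b}",
}

# Templates that depend on whether the root of clause B is a verb or a noun.
ROOT_SENSITIVE_TEMPLATES = {
    ("during", "verb"): "当 {b}, {a}",
    ("during", "noun"): "在 {b} 期间 {a}",
    ("in order to", "noun"): "为了 {b} {a}",
    ("in order to", "verb"): "为了 {b}, {a}",
}


def order_clauses(connective, a, b, b_root_category=None):
    """Combine two already-translated clauses according to the connective.

    `b_root_category` ("verb" or "noun") is only consulted for connectives
    whose pattern depends on the root of clause B.
    """
    template = CLAUSE_TEMPLATES.get(connective)
    if template is None:
        template = ROOT_SENSITIVE_TEMPLATES.get((connective, b_root_category))
    if template is None:
        raise ValueError(f"no pattern for {connective!r} with root {b_root_category!r}")
    return template.format(a=a, b=b)


print(order_clauses("if", "A", "B"))
print(order_clauses("during", "A", "B", "noun"))
```

In this form, covering a new connective from the real interlingua system would mean adding one table entry, which is the kind of generality the second goal above asks for.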