Project Report ---- The Chinese Generation System (CGS)

Wei Fang 房蔚

This semester I read and analysed the Chinese generation system (CGS) built by Doctor Tangqiu Li. The system translates from interlingua to Chinese, and it successfully translates 215 interlingua sentences. The output for most of these sentences is very good. Here I analyse the system in detail.

The architecture of the generation system is:

Grammar files:
    sen.gra   (for generating sentences)
    np.gra    (for handling noun phrase generation)
    tran.gra  (for translation)

Genkit is used to compile the grammar files into Lisp files:
    sen_gen.lisp  (containing the "generator" function)
    np_gen.lisp
    tran_trf.lisp

"ilt" file: a test file containing 215 sample interlinguas

"lexicon.chinese" file: mapping from the interlingua lexicon to Chinese

"gen-sys.lisp" file: Doctor Li's Lisp file, which
    builds hash tables for the lexicon and the interlingua sentences
    gets the Fstructure from the current interlingua sentence
    uses the generator function to do the generation

First I will talk about the system itself. The 215 interlingua sentences certainly do not cover all the phenomena of the interlingua system, but they are still very representative. The CGS covers all the sentence structures that appear in the "ilt" file, and most of its grammar rules for Chinese generation are general. For some specific sentence structures, it uses specific grammar rules in order to get the best output for those sentences. I analyse the system from general to specific in Section 1, and then discuss the "ilt" file used here in Section 2. In Section 3, I suggest what we should do next to improve the system.

Section 1. CGS system

Step 1. Build a *sentence-table* for the 215 interlingua sentences, and build a *lexicon* for the mapping from the interlingua lexicon to Chinese.

Step 2.
Use the sentence number as an index to get the corresponding interlingua sentence, then use the function "lexicon-lookup" to translate the interlingua into an Fstructure in which the interlingua lexical items are replaced with Chinese lexical items.

Step 3. Use the Genkit-compiled grammar for generation.

I think most of the grammar rules for generating Chinese sentences from the interlingua are general.

1. The sentence coverage is as follows:

(1) label sentence
    label : sentence.
    for example: Note: The heat content rating per unit volume ... ---> 注意: ...

(2) hyphen sentence
    --sentence.
    for example: = Anticipate the average outside temperature for the area of operation. ---> --...

(3) bullet sentence
    bullet. sentence
    for example: 1. Operate the engine until the engine reaches normal operating temperature.

(4) heading sentence
    sentence (without punctuation)
    for example: Effect of Cold Weather on Fuel

(5) imperative sentence
    for example: Avoid sharp angles.

(6) discourse-cohesion sentence
    discourse-cohesion, sentence.
    for example: However, the filter should not be too fine. ---> 然而, ...

(7) pre-condition sentence
    if pre-condition, sentence.
    for example: Only add the coolant conditioner precharge if you are not using Caterpillar antifreeze.
    A if B ---> 如果 B, A

(8) reason sentence
    because reason, sentence.
    for example: The new dipsticks are different because they have a "FULL RANGE" mark.
    A because B ---> 因为 B, A

(9) when-event sentence
    for example: Remember that the capacity of an eyebolt decreases when the angle between the supporting members and the object becomes less than 90 *.
    A when B ---> 当 B 时, A

(10) although-event sentence
    for example: Although the PEEC system dissipates heat into the fuel, fuel heaters are still necessary.
    Although B, A ---> 尽管 B, A

(11) during-event sentence
    for example: You can use No. 2 diesel fuel in diesel engines during cold weather.
    A during B ---> 当 B (root is verb), A
                    在 B (root is noun) 期间 A

(12) until-event sentence
    for example: Operate the engine until the engine reaches normal operating temperature.
    A until B ---> A 直到 B

(13) but-event sentence
    for example: The fuel heater should be mechanically simple, but the heater must be adequate for the application.
    A, but B ---> A, 但是 B

(14) sequential-event sentence
    for example: Drain the oil and change the oil filter.
    A and B ---> A, B

(15) or-event sentence
    for example: Add liquid conditioner or replace the coolant conditioner element.
    A or B ---> A 或 B

(16) purpose sentence
    for example: Slowly loosen the radiator filler cap in order to relieve pressure.
    A in order to B ---> 为了 B (root is noun) A
                         为了 B (root is verb), A

All of the above sentence structures are translated into Chinese correctly and without any loss of generality.

2. Special tense markers

Where English marks tense on the verb, Chinese uses specific words as tense markers. For example, "在" or "正在" marks the progressive, "将" or "将要" marks the future, and "已经" marks the perfect. These are correctly implemented in the system.

3. Case marking

Chinese has no accusative case, so "he" and "him" are the same word in Chinese.

4. Number and person

Generally speaking, Chinese has no number or person agreement between subject and verb or object, and no agreement between adjective and noun. However, "I" is different from "we", "she" is different from "he", "this" is different from "that", and so on. All these distinctions are implemented in the system.

5. Proposition

A proposition should be translated into Chinese as follows:
    B that C ---> C 的 B
This translation is general, and the CGS handles it well.

6. Some limitations

Some of the generation grammar rules try to match the root (the basic string for the structure). Rules of this kind are specific to particular sentences or words.
In order to achieve high translation accuracy, rules of this kind are sometimes necessary. For example:

If the root is "design" (设计) and the sentence is passive, the translation should be:
    theme 是 为 purpose 设计 的.
    theme is for purpose design.

If the root is "make" (制造) and the sentence is passive, the translation should be:
    theme 是 由 agent 制造 的.
    theme is by agent made.

These are the best translations, so some very specific rules are needed to produce them. One way to make these kinds of rules more general is to group a class of words together and match the class instead of the root. To do this, we must determine criteria for expanding these roots into classes.

Some translations do not handle the tense marker well. For example, the output of sentence 72 is not satisfactory.

72. Otherwise, the additives will have no effect.

The current output is:
    否则, 添加剂 将 有 没有 影响.
    otherwise, the additives will have not-have effect.

This translation will confuse the reader. The correct output should be:
    否则, 添加剂 将 没有 影响.
    otherwise, the additives will not-have effect.

This should be fixed to make the output more accurate.

Section 2. The interlingua

Our CGS uses only an interlingua file with 215 sentences. This interlingua covers:

Heads:
    *O- noun
    *P- adjective
    *E- verb

Structural Elements:
    & adverb
    *UNIT-RANGE
    *QUOTED-TERM
    *PP-COORDINATION
    *COORDINATION
    *PRON
    *NUMBER
    *UNITS
    *MULTIPLE*

Besides the features mentioned above, there are over 100 features that are very important for understanding the system. Please refer to the Appendix.

These features are only a part of the real interlingua system. To make the system more useful, we need to cover other representative interlingua structures. This is the task of my next step.

Section 3. What I could do next semester

I have described the limitations of the system and of the interlingua it covers, so there is still a lot to do next.
First, determine where the supported interlingua has been extended in the most recent KANT system, and add rules to this system to cover those phenomena.

Second, make the grammar rules more general so that they cover more sentence structures.

Appendix

Features Covered in the CGS

    NUMBER PERSON GENDER MOOD REFERENCE
    HEADING COLON HYPHEN LABEL NUMBER-BULLET
    PARENTHETICAL ATTRIBUTE TYPE MODAL COMPULSION
    OBLIGATION TENTATIVE HAS-PART BELONGS-TO
    QUOTED-LABEL QUOTED-STRING THEME PATIENT OBJECT
    AGENT AS-THEME AS-OBJECT PURPOSE REASON
    SEARCH-FOR GOAL AND-CONJUNCTION OR-CONJUNCTION
    AND/OR-CONJUNCTION CONJUNCTS PASSIVE NEGATION
    PERFECT TENSE PROG GERUNDIVE NEAR FAR
    PLEONASTIC-PRON COFACTOR POSSESSIVE POSSESSOR
    EXPERIENCER BENEFACTOR PREDICATED-OF-THEME
    PREDICATED-OF-EVENT PREDICATE EVENT
    SEQUENTIAL-EVENT DURING-EVENT WHEN-EVENT FOR-EVENT
    BUT-EVENT OR-EVENT UNTIL-EVENT ALTHOUTH-EVENT
    PROPOSITION LOCATION DESTINATION SOURCE
    PRE-CONDITION CONDITION TIME-POINT TIME-EXTENT
    TEMP-POINT METRIC QUANTITY DECIMAL INTEGER
    ENGLISH PERCENT TEXT-REFERENT EVENT-FREQUENCY
    DISCOURSE-COHESION EVENT-CONTRAST INTENSIFIER MANNER
    SERIAL-NUMBER-CONCEPT SERIAL-NUMBER-INTEGER
    INSTRUMENT BETWEEN-ITEMS PRE-MEASUREMENT-UNIT
    THAN-OBJECT FUNCTION ABOVE UNDER OUTSIDE
    LOCATED-IN PATH START END OR-COORDINATION
    EXTENT PAREN-MATERIAL AS-POSSIBLE OPPOSITION
    PROPORTION SUBSTANCE MATCH
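
As an illustration of the clause-ordering patterns listed under sentence coverage in Section 1 (pre-condition, reason, when-event, and so on), here is a minimal Python sketch. It is not the CGS implementation: the real rules are Genkit grammar rules compiled to Lisp, and the names CLAUSE_TEMPLATES, ROOT_SENSITIVE_TEMPLATES and order_clauses are hypothetical, introduced only for this example. The sketch treats the two clauses as already-generated strings A and B and only shows how they are ordered and joined.

```python
# Illustrative sketch only; the real CGS uses Genkit grammar rules compiled to Lisp.
# All names below are hypothetical and do not come from the CGS sources.

# Patterns that do not depend on the root of clause B (taken from Section 1).
CLAUSE_TEMPLATES = {
    "pre-condition": "如果 {B}, {A}",      # A if B
    "reason": "因为 {B}, {A}",             # A because B
    "when-event": "当 {B} 时, {A}",        # A when B
    "although-event": "尽管 {B}, {A}",     # Although B, A
    "until-event": "{A} 直到 {B}",         # A until B
    "but-event": "{A}, 但是 {B}",          # A, but B
    "sequential-event": "{A}, {B}",        # A and B
    "or-event": "{A} 或 {B}",              # A or B
}

# Patterns that depend on whether the root of clause B is a verb or a noun.
ROOT_SENSITIVE_TEMPLATES = {
    ("during-event", "verb"): "当 {B}, {A}",
    ("during-event", "noun"): "在 {B} 期间 {A}",
    ("purpose", "noun"): "为了 {B} {A}",
    ("purpose", "verb"): "为了 {B}, {A}",
}


def order_clauses(relation, main, sub, sub_root=None):
    """Join main clause A and subordinate clause B for the given relation.

    sub_root is "verb" or "noun" and is only needed for the
    during-event and purpose relations.
    """
    template = ROOT_SENSITIVE_TEMPLATES.get((relation, sub_root))
    if template is None:
        template = CLAUSE_TEMPLATES.get(relation)
    if template is None:
        raise KeyError(f"no pattern for relation {relation!r} with root {sub_root!r}")
    return template.format(A=main, B=sub)
```

For instance, order_clauses("reason", "A", "B") gives the 因为 B, A ordering, while order_clauses("during-event", "A", "B", sub_root="noun") gives 在 B 期间 A. Keeping the root-sensitive patterns in a separate table mirrors the way the report distinguishes the verb-root and noun-root cases.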
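
The suggestion in Section 1 to match a class of words instead of a single root can be sketched in the same way. This is again a hypothetical Python illustration, not the CGS grammar: the class names "purpose-passive" and "agent-passive" and the function generate_passive are assumptions made for this example, and each class contains only the one verb that the report actually discusses. Deciding which other roots belong in each class is exactly the open question raised in the report.

```python
# Illustrative sketch only; class names and function names are hypothetical.

# Map each root to a verb class. Only the two roots named in the report are listed;
# the criteria for adding further members are still to be determined.
VERB_CLASS = {
    "设计": "purpose-passive",   # design
    "制造": "agent-passive",     # make
}

# One passive template per class instead of one rule per root.
PASSIVE_TEMPLATES = {
    "purpose-passive": "{theme} 是 为 {purpose} {verb} 的",
    "agent-passive": "{theme} 是 由 {agent} {verb} 的",
}


def generate_passive(root, roles):
    """Return the class-specific passive string, or None if the root has no class.

    roles maps role names (theme, purpose, agent) to already-generated strings.
    """
    verb_class = VERB_CLASS.get(root)
    if verb_class is None:
        return None  # fall back to the general passive rule (not sketched here)
    return PASSIVE_TEMPLATES[verb_class].format(verb=root, **roles)
```

With this layout, extending coverage to a new verb would mean adding one entry to the class table rather than writing a new root-specific rule.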