Evaluation of Anaphora Resolution in English

The algorithm for anaphora resolution in English is based on the one developed for Spanish, suitably adapted for English. The main difference between the two algorithms lies in the ordering of the preferences, which was obtained in the training phase. From that phase, we drew the following conclusions:

Spanish has richer morphological information than English. As a consequence, morphological constraints discard more candidates in Spanish than in English.
Spanish is a nearly free-word-order language, in which the constituents of a sentence (subject, object, etc.) can appear in almost any position. For this reason, the syntactic-parallelism preference plays a more important role in the English method than in the Spanish one.
Spanish sentences are usually longer than English ones, which yields more candidates per anaphor in Spanish than in English.

After the training phase, the algorithm was evaluated on the test corpus in two experiments. The first experiment used only lexical, morphological, and syntactic information; its results on the SemCor and MTI corpora appear in Table 7.

Table 7: Anaphora resolution in English, evaluation phase: experiment 1

Corpus   He  She   It  They  Him  Her  Them  Correct  Total  P (%)
SemCor  116   10   38    50   34    0     6      175    254   68.9
MTI       1    0  347    56    0    0    66      361    470   76.8

The table shows the number of pronouns of each type in each corpus. The last three columns give the number of correctly solved pronouns, the total number of pronouns, and the precision (correctly solved pronouns divided by the total), respectively. For instance, the MTI corpus reached a precision of 76.8% (361/470).
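The Total and P (%) columns follow directly from the per-pronoun counts: Total is the sum of the seven pronoun columns, and P is Correct divided by Total. A small Python check, with the counts transcribed from Table 7 (the names TABLE_7 and precision are only illustrative), reproduces both columns:

```python
# Pronoun counts per type and number of correctly solved pronouns,
# transcribed from Table 7 (experiment 1).
TABLE_7 = {
    "SemCor": {"counts": {"he": 116, "she": 10, "it": 38, "they": 50,
                          "him": 34, "her": 0, "them": 6},
               "correct": 175},
    "MTI": {"counts": {"he": 1, "she": 0, "it": 347, "they": 56,
                       "him": 0, "her": 0, "them": 66},
            "correct": 361},
}


def precision(correct: int, total: int) -> float:
    """Precision as a percentage: correctly solved pronouns / all pronouns."""
    return 100.0 * correct / total


for corpus, row in TABLE_7.items():
    total = sum(row["counts"].values())  # the "Total" column
    print(f"{corpus}: total={total}, P={precision(row['correct'], total):.1f}%")
```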

Discussion. In the first experiment, pronominal anaphora resolution in English obtained the following precision (P) and recall (R): SemCor corpus, P = 68.9%, R = 66%; MTI corpus, P = 76.8%, R = 72.9%.

From these results, we drew the following conclusions:

The distribution of pronoun types varies considerably between corpora. In the SemCor corpus, 15% of the pronouns are occurrences of it, whereas in the MTI corpus this percentage is 73.8%. This difference is explained by the genre and domain of each corpus. SemCor is a narrative-style corpus containing many person entities¹⁴, which are referred to in the text with personal pronouns (he, she, and they). The MTI corpus, on the other hand, is a collection of technical manuals that contains almost no person entities; most references are to object entities, using the pronoun it.
In the SemCor corpus, errors had several causes:
- The lack of semantic information caused 57% of all mistakes. Seventeen mistakes occurred in the resolution of it pronouns, where the system proposed a person entity as the antecedent; a further twenty-eight occurrences of he, she, him, and her were solved incorrectly because the system proposed an object or animal entity.
- Exceptions in the application of preferences accounted for 38% of the mistakes, mainly because of the large number of candidates compatible with the anaphor¹⁵.
- Errors in POS tagging accounted for the remaining 5%.
In the MTI corpus, most errors (73.4% of all mistakes) occurred in the resolution of it pronouns. Because it carries no gender information that can filter candidates (it is compatible with both masculine and feminine), there are many candidates per anaphor¹⁶, which leads to errors in the application of preferences. The remaining errors are caused by the lack of semantic information.
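The percentages in the list above can be recomputed from the counts in Table 7 and the error counts given in the text. The sketch below does so; the counts for the preference and POS-tagging causes are not stated in the text, so the script only derives approximate values by rounding the reported 38% and 5% of the total.

```python
# Pronoun counts transcribed from Table 7 (experiment 1).
TOTAL = {"SemCor": 254, "MTI": 470}
IT_COUNT = {"SemCor": 38, "MTI": 347}

# Share of "it" among all pronouns in each corpus.
it_share = {c: 100 * IT_COUNT[c] / TOTAL[c] for c in TOTAL}

# SemCor mistakes in experiment 1: all pronouns minus correctly solved ones.
semcor_errors = TOTAL["SemCor"] - 175

# Mistakes attributed to missing semantic information:
# 17 on "it" plus 28 on he/she/him/her.
semantic_errors = 17 + 28
semantic_pct = 100 * semantic_errors / semcor_errors

# Approximate counts implied by the other two reported percentages
# (derived here by rounding; these counts are not given in the text).
preference_errors = round(0.38 * semcor_errors)
pos_errors = round(0.05 * semcor_errors)

for corpus, share in it_share.items():
    print(f"{corpus}: {share:.1f}% of pronouns are 'it'")
print(f"SemCor mistakes: {semcor_errors}, semantic: {semantic_pct:.0f}%")
print(f"implied split: {semantic_errors} + {preference_errors} + {pos_errors}")
```

The implied split (45 + 30 + 4) adds up to the 79 SemCor mistakes, which is consistent with the three reported percentages.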

After analyzing the results, we observed that precision on the SemCor corpus was approximately 8 percentage points lower than on the MTI corpus (68.9% versus 76.8%). Since the SemCor errors stemmed mainly from the lack of semantic information, a second experiment was carried out with semantic information added, in order to improve the results.

The second experiment introduced the following modifications:

Two new semantic constraints, presented in [Saiz-Noeda et al., 2000], were added to the morphological and syntactic constraints:
1. The pronouns he, she, him, and her must have a person entity as their antecedent.
2. The pronoun it must have a non-person entity as its antecedent.
To apply these constraints, the twenty-five top concepts of WordNet (the concepts at the top level of its ontology) were grouped into three categories: person, animal, and object. WordNet was then queried with the head of each candidate to obtain the candidate's semantic category.
This experiment was carried out exclusively on the SemCor corpus, because it is the only one in which content words are annotated with their WordNet senses.
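Like the morphological and syntactic constraints, the two semantic constraints discard candidates that are incompatible with the pronoun. The following is a minimal sketch of that filter, not the AGIR implementation. It assumes that WordNet's top-level concepts can be identified by lexicographer-file names such as noun.person and noun.animal, and it replaces the real WordNet lookup with a hypothetical toy dictionary (TOY_LEXICON); heads missing from that dictionary are assumed to be objects. All names here are illustrative.

```python
# Hypothetical grouping of WordNet top-level noun categories (identified
# here, as an assumption, by lexicographer-file names) into the three
# classes used by the constraints. Only person and animal are listed;
# every other top-level concept is treated as "object".
CATEGORY_GROUPS = {"noun.person": "person", "noun.animal": "animal"}

# Hypothetical stand-in for a WordNet lookup: head noun -> top-level
# category of its sense. In SemCor the sense would come from the annotation.
TOY_LEXICON = {
    "woman": "noun.person",
    "engineer": "noun.person",
    "dog": "noun.animal",
    "printer": "noun.artifact",
    "manual": "noun.communication",
}

PERSON_PRONOUNS = {"he", "she", "him", "her"}


def semantic_class(head: str) -> str:
    """Return "person", "animal", or "object" for a candidate's head noun."""
    category = TOY_LEXICON.get(head)  # None for heads missing from the lexicon
    return CATEGORY_GROUPS.get(category, "object")


def satisfies_semantic_constraints(pronoun: str, head: str) -> bool:
    """Apply the two semantic constraints to one (pronoun, candidate) pair."""
    cls = semantic_class(head)
    if pronoun in PERSON_PRONOUNS:
        return cls == "person"   # constraint 1: person antecedent
    if pronoun == "it":
        return cls != "person"   # constraint 2: non-person antecedent
    return True                  # they/them: no semantic constraint


def filter_candidates(pronoun: str, heads: list[str]) -> list[str]:
    """Discard candidate heads that violate the semantic constraints."""
    p = pronoun.lower()
    return [h for h in heads if satisfies_semantic_constraints(p, h)]


heads = ["woman", "dog", "printer"]
print(filter_candidates("she", heads))  # ['woman']
print(filter_candidates("it", heads))   # ['dog', 'printer']
```

Note that constraint 2 only excludes persons, so animal antecedents remain available for it, and they and them are left unconstrained, matching the two rules as stated.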

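Table 8 shows the number of pronouns of each type in each corpus after these changes were incorporated. Because the second experiment was carried out only on SemCor, the MTI row repeats the results of the first experiment.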

Table 8: Anaphora resolution in English, evaluation phase: experiment 2

Corpus   He  She   It  They  Him  Her  Them  Correct  Total  P (%)
SemCor  116   10   38    50   34    0     6      220    254   86.6
MTI       1    0  347    56    0    0    66      361    470   76.8

As Table 8 shows, adding the two simple semantic constraints considerably improved precision on the SemCor corpus, by approximately 18 percentage points (from 68.9% to 86.6%). We conclude that using semantic information (such as new constraints and preferences) in anaphora resolution can improve the results obtained.
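The size of the improvement depends on whether it is read as an absolute or a relative change. The short calculation below, using only the SemCor counts from Tables 7 and 8, separates the two. It also shows that the 45 newly solved pronouns coincide in number with the 45 semantic mistakes (17 + 28) of the first experiment, although the counts alone do not show that exactly the same pronouns were fixed.

```python
# Correctly solved SemCor pronouns (out of 254) in Tables 7 and 8.
TOTAL = 254
CORRECT_EXP1, CORRECT_EXP2 = 175, 220

p_exp1 = 100 * CORRECT_EXP1 / TOTAL
p_exp2 = 100 * CORRECT_EXP2 / TOTAL

# Absolute gain, in percentage points of precision.
gain_points = p_exp2 - p_exp1
# Relative gain: additional correct answers over the experiment-1 count.
newly_correct = CORRECT_EXP2 - CORRECT_EXP1
relative_gain = 100 * newly_correct / CORRECT_EXP1

print(f"P: {p_exp1:.1f}% -> {p_exp2:.1f}% (+{gain_points:.1f} points)")
print(f"{newly_correct} more pronouns solved ({relative_gain:.1f}% relative)")
```

The absolute gain is about 17.7 percentage points, while the relative gain in correctly solved pronouns is about 25.7%.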

Finally, Table 9 compares anaphora resolution in AGIR with the other approaches presented previously¹⁷. Note the high precision obtained by our system and by Hobbs's method on the SemCor corpus: both incorporate semantic information¹⁸ through semantic constraints (selectional restrictions), whereas none of the other approaches uses semantics.

Table 9: Anaphora resolution in English, comparison of AGIR with other approaches (precision, %)

Corpus  Proximity  Hobbs  Lappin  Strube  AGIR
SemCor       37.0   81.9    59.4    59.4  86.6
MTI          54.9   66.0    75.1    63.2  76.8
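To make the comparison in Table 9 concrete, the sketch below ranks the approaches by precision on each corpus and reports the margin between the best and the runner-up. The values are transcribed from the table; the names TABLE_9 and ranking are illustrative only.

```python
# Precision (%) of each approach, transcribed from Table 9.
TABLE_9 = {
    "SemCor": {"Proximity": 37.0, "Hobbs": 81.9, "Lappin": 59.4,
               "Strube": 59.4, "AGIR": 86.6},
    "MTI": {"Proximity": 54.9, "Hobbs": 66.0, "Lappin": 75.1,
            "Strube": 63.2, "AGIR": 76.8},
}


def ranking(corpus: str) -> list[tuple[str, float]]:
    """Approaches sorted by precision on one corpus, best first."""
    return sorted(TABLE_9[corpus].items(), key=lambda kv: kv[1], reverse=True)


for corpus in TABLE_9:
    (best, p_best), (second, p_second) = ranking(corpus)[:2]
    print(f"{corpus}: {best} {p_best} vs. {second} {p_second} "
          f"(margin {p_best - p_second:.1f} points)")
```

On SemCor the runner-up is Hobbs's method, the other approach that uses semantic constraints (margin 4.7 points); on MTI it is Lappin's method (margin 1.7 points).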

Jesus Peral 2002-12-13