In this experiment, we evaluated the translation into English of
Spanish third-person personal pronouns and zero pronouns (excluding
reflexive pronouns). We
tested the method on the portion of the LEXESP corpus that had previously been used
to evaluate anaphora resolution.
To apply the number and gender rules, we needed to know the semantic
category and the grammatical gender of the pronoun's antecedent.
Because the LEXESP corpus lacks semantic information, a set of
heuristics was used to determine the antecedent's semantic category.
The antecedent's gender, by contrast, was taken from the POS tag of
the antecedent's head. We conducted a blind test over the entire test
corpus, and the results appear in Table 12.
Table 12: Translation of pronominal anaphora into English, evaluation phase

          Subject   Compl   Correct   Total   P(%)
LEXESP    630       145     657       775     84.8
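The precision figure in Table 12 follows directly from the counts in the same row. The short check below is only a sketch: the function name `precision` and the variable names are illustrative and do not come from the evaluated system.

```python
def precision(correct: int, total: int) -> float:
    """Return precision as a percentage of correctly translated pronouns."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


# Counts taken from the LEXESP row of Table 12.
subject_pronouns = 630
complement_pronouns = 145
correct_translations = 657

# The total is the sum of subject and complement pronouns.
total_pronouns = subject_pronouns + complement_pronouns
p = precision(correct_translations, total_pronouns)

print(f"{correct_translations}/{total_pronouns} = {p:.1f}%")
```

Subject and complement pronouns add up to the 775 evaluated pronouns, and 657 correct translations out of 775 give the reported 84.8%.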
Discussion. In the translation of Spanish third-person personal
pronouns into English, an overall
precision of 84.8% (657 out of 775) was obtained. From
these results, we drew the following conclusions:
All instances of the Spanish plural pronouns
(ellos, ellas, les, los,
las, and the plural zero pronouns, which correspond to the
English pronouns they and them) were correctly
translated into English. There are two reasons for this:
The semantic roles of these pronouns were correctly
identified in all of the cases.
The equivalent English pronouns (they and
them) lack gender information; that is, they are valid for
both masculine and feminine antecedents. Therefore, the antecedent's
gender did not influence the translation of these pronouns.
All the errors occurred in the translation of the Spanish singular
pronouns (él, ella, le, lo,
la, and the singular zero pronouns, which correspond to the
English pronouns he, she, it, him, and
her). These errors had two causes:
There were mistakes in the anaphora-resolution stage (79.7% of all
errors), which caused an incorrect translation into English,
mainly because the proposed antecedent and the correct one had
different grammatical genders. Sometimes both had the same
gender but different semantic categories.
There were mistakes in the application of the heuristic used to identify the
antecedent's semantic category (20.3%). This involved
the application of an incorrect morphological rule.
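The conclusions above can be illustrated with a minimal sketch of how number, gender, and semantic category might select the English pronoun. This is not the system's actual rule set: the class `Antecedent`, the function `translate_pronoun`, and the two-way split of semantic categories into "person" and "other" are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Antecedent:
    number: str             # "sg" or "pl"
    gender: str             # "masc" or "fem" (e.g. from the POS tag of the head)
    semantic_category: str  # "person" or "other" (hypothetical labels)


def translate_pronoun(antecedent: Antecedent, role: str) -> str:
    """Choose an English third-person pronoun for a resolved Spanish pronoun.

    `role` is "subject" or "complement". The mapping is an assumed
    simplification, not the rule set used in the evaluated system.
    """
    if role not in ("subject", "complement"):
        raise ValueError(f"unknown role: {role!r}")

    # Plural: English 'they'/'them' carry no gender, so only the role matters.
    if antecedent.number == "pl":
        return "they" if role == "subject" else "them"

    # Singular, non-person antecedent: assumed to map to 'it' in either role.
    if antecedent.semantic_category != "person":
        return "it"

    # Singular person: grammatical gender decides between he/him and she/her.
    if antecedent.gender == "masc":
        return "he" if role == "subject" else "him"
    return "she" if role == "subject" else "her"


# A feminine noun denoting an object (e.g. "la mesa") should not yield 'she'.
mesa = Antecedent(number="sg", gender="fem", semantic_category="other")
print(translate_pronoun(mesa, "subject"))
```

Under these assumptions, a wrong antecedent changes the output only in the singular branch, either through its gender or through its semantic category, which matches the two error sources reported above.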
Our proposal was compared with the SYSTRANLinks output. As
shown in Table 13, the precision
obtained by the AGIR system was approximately 28 percentage points
higher than that obtained by Systran.
Table 13: Translation of pronominal anaphora into English, SYSTRANLinks and AGIR

          SYSTRANLinks   AGIR
LEXESP    56.9           84.8
The low precision obtained by Systran is mainly due to
errors in the translation of Spanish zero pronouns.
Specifically, 334 of the 775 Spanish pronouns were translated
incorrectly; 293 of these errors (87.7% of the total)
originated in the translation of zero pronouns, whereas
the remaining 41 (12.3%) originated in the translation
of overt (non-omitted) pronouns. The errors in the
translation of zero pronouns mainly originated in their
incorrect resolution.
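The Systran figures can be cross-checked against each other with a few lines of arithmetic. The sketch below uses only the counts reported in the text and in Table 12; the variable names are illustrative.

```python
total_pronouns = 775        # pronouns in the LEXESP test corpus
systran_errors = 334        # pronouns mistranslated by Systran
zero_pronoun_errors = 293   # Systran errors on zero pronouns
agir_correct = 657          # correct AGIR translations (Table 12)

# Errors on overt (non-omitted) pronouns are the remainder.
overt_pronoun_errors = systran_errors - zero_pronoun_errors

# Precision of each system as a percentage.
systran_precision = 100.0 * (total_pronouns - systran_errors) / total_pronouns
agir_precision = 100.0 * agir_correct / total_pronouns

# Share of Systran's errors attributable to each pronoun type.
zero_share = 100.0 * zero_pronoun_errors / systran_errors
overt_share = 100.0 * overt_pronoun_errors / systran_errors

# Difference between the two systems, in percentage points.
gap_points = agir_precision - systran_precision

print(f"Systran precision: {systran_precision:.1f}%")
print(f"Zero-pronoun share of errors: {zero_share:.1f}%")
print(f"Overt-pronoun share of errors: {overt_share:.1f}%")
print(f"AGIR minus Systran: {gap_points:.1f} points")
```

The 334 errors correspond to the 56.9% precision in Table 13, and the gap between the two systems is a difference in percentage points rather than a relative improvement.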