Next: Anaphora Resolution and its
Up: Translation of Pronominal Anaphora
Previous: Translation of Pronominal Anaphora
The anaphora phenomenon can be considered one of the most
difficult problems in natural language processing
(NLP). The term anaphora derives from the Ancient
Greek word ``anaphora'' (ἀναφορά), which
is made up of the separate words ἀνά
(``back, upstream, back in an upward direction'') and
φορά (``the act of carrying''), and which denotes the
act of carrying back upstream.
Presently, various definitions of the term anaphora exist,
but the same concept underlies all of them. Halliday & Hasan
[Halliday & Hasan, 1976] defined anaphora as ``the cohesion
(presupposition) which points back to some previous item.'' A more
formal definition was proposed by Hirst [Hirst, 1981], who
defined anaphora as ``a device for making an abbreviated reference
(containing fewer bits of disambiguating information, rather than
being lexically or phonetically shorter) to some entity (or
entities) in the expectation that the receiver of the discourse
will be able to disabbreviate the reference and, thereby,
determine the identity of the entity.'' Hirst refers to the reference
as an anaphor, and the entity to which it refers is its
antecedent:
- [Mary] went to the cinema on Thursday. She didn't like the film.
In this example, the pronoun she is the anaphor and
the noun phrase Mary is the antecedent. This type of anaphora is the
most common type, the so-called pronominal
anaphora.
The anaphora phenomenon involves two processes:
resolution and generation. ``Resolution'' refers to the process of
determining the antecedent of an anaphor; ``generation'' is the
process of creating references to a discourse entity.
In the context of machine translation (MT), resolving
anaphoric expressions is crucial to translating them correctly
into the target language [Mitkov & Schmidt, 1998]. For instance,
when translating into languages which mark the gender of pronouns,
the antecedent must be identified before the right pronoun can be
chosen. Unfortunately, the majority of MT systems do
not deal with anaphora resolution, and their successful operation
usually does not go beyond the sentence level.
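To make the gender problem concrete, here is a minimal Python sketch of why the antecedent has to be known before an English pronoun can be rendered in Spanish. It is a toy illustration only: the two-word lexicon and the function name `translate_object_it` are hypothetical, and the code does not describe how SUPAR or any particular MT system works.

```python
# Toy illustration only: the lexicon and the function name are hypothetical,
# and this is not how SUPAR or any particular MT system works.

# Grammatical gender of the Spanish translation of a few English nouns.
SPANISH_GENDER = {
    "table": "f",  # la mesa
    "book": "m",   # el libro
}

# Spanish third-person singular direct-object pronouns, by gender.
OBJECT_PRONOUN = {"m": "lo", "f": "la"}


def translate_object_it(antecedent: str) -> str:
    """Return the Spanish direct-object pronoun for English 'it',
    given the already-resolved English antecedent noun."""
    gender = SPANISH_GENDER[antecedent]
    return OBJECT_PRONOUN[gender]
```

With this sketch, the "it" in "I bought the table. I painted it." maps to "la", while the same "it" after "I bought the book." maps to "lo". The English pronoun alone does not carry the information needed to choose between them, and the antecedent sits in a different sentence.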
We have employed a computational system that focuses on anaphora
resolution in order to improve MT quality and have then measured
the improvements. The SUPAR (Slot Unification Parser for
Anaphora Resolution) system is presented in the work of
Ferrández, Palomar, & Moreno [Ferrández et al., 1999]. This system
can deal with several kinds of anaphora with good results. For
example, the system resolves pronominal anaphora in Spanish with a
precision rate of 76.8% [Palomar et al., 2001]; it resolves
one-anaphora in Spanish dialogues with a precision rate of 81.5%
[Palomar & Martínez-Barco, 2001], and it resolves definite descriptions in
Spanish direct anaphora and bridging references with precision
rates of 83.4% and 63.3%, respectively [Munoz et al., 2000]. In the
work presented here, we have used an MT system exclusively for
pronominal anaphora resolution and translation. Most MT systems
do not take this kind of anaphora into account, and therefore
pronouns are usually translated incorrectly
into the target language. Although we have focused on pronominal
anaphora, our approach can be easily extended to other kinds of
anaphora, such as one-anaphora or definite descriptions previously
resolved by the SUPAR system.
It is important to emphasize that in this work we only
resolve and translate third-person personal pronouns whose
antecedents appear before the anaphor. That is, an anaphoric
relation between the pronoun and the antecedent is established,
and cataphoric relations (in which the antecedent appears after the anaphor)
are not taken into account.
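The restriction just described can be sketched as a simple positional filter over candidate antecedents. This is an assumption-laden illustration, not the system's actual algorithm: the `Mention` representation, the pronoun list, and both function names are hypothetical, and the token indices in the usage note are assigned by hand.

```python
from typing import List, Tuple

# Hypothetical, simplified representation: (token_index, text).
Mention = Tuple[int, str]

# Illustrative English third-person personal pronouns (not exhaustive;
# "her" can also be possessive, which this toy set ignores).
THIRD_PERSON_PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}


def is_third_person_pronoun(word: str) -> bool:
    """True if the word is in the illustrative third-person pronoun set."""
    return word.lower() in THIRD_PERSON_PRONOUNS


def candidate_antecedents(pronoun_index: int,
                          noun_phrases: List[Mention]) -> List[Mention]:
    """Keep only noun phrases that occur before the pronoun (anaphoric
    reading); noun phrases after it (cataphoric reading) are discarded."""
    return [np for np in noun_phrases if np[0] < pronoun_index]
```

For the earlier example, assume the noun phrases are `(0, "Mary")`, `(3, "the cinema")`, `(6, "Thursday")` and `(11, "the film")`, with "She" at index 8. The filter keeps the first three and drops "the film", which follows the pronoun. Choosing among the remaining candidates is the actual resolution step and is outside this sketch.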
This paper focuses on the evaluation of the different tasks
carried out in our approach that lead to the final task: the
translation of the pronominal anaphora into the target language.
The main contributions of this work are a presentation and
evaluation of the multilingual anaphora resolution module
(English and Spanish) and an exhaustive evaluation of the
pronominal anaphora translation between these languages.
The paper is organized as follows: Section 2 describes the
need for anaphora resolution in MT and the deficiencies of
traditional MT systems in handling this phenomenon.
Section 3 presents the analysis module of our approach. In Section
4, we identify and evaluate the NLP problems related to pronominal
anaphora resolved in our system. Section 5 presents the generation
module of the system. In Section 6, the generation module is
evaluated in order to measure the efficiency of our proposal.
Finally, we present our conclusions.
Jesus Peral
2002-12-13