To evaluate this task, two experiments were performed: an
evaluation of zero-pronoun detection and an evaluation of
zero-pronoun resolution. In both experiments the method was tested
on two corpora. First, we used a portion of the LEXESP corpus
containing thirty-one documents (38,999 words) from different genres,
written by different authors. The LEXESP corpus contains texts of
different styles and on different topics (newspaper articles about
politics, sports, etc.; narratives about specific topics; novel
fragments; etc.). Second, the method was tested on a fragment of the
Spanish version of the Blue Book (BB) corpus (15,571 words), a
technical manual that contains the handbook of the International
Telecommunications Union (CCITT), published in English, French, and
Spanish. The two corpora were automatically tagged by different
taggers.
We randomly selected a subset of the LEXESP corpus (three documents,
6,457 words) and a fragment of the Blue Book corpus (4,723 words) as
training corpora. The remaining fragments of both corpora were
reserved as test data.
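The LEXESP split above is made at the level of whole documents. A minimal sketch of such a split follows, assuming documents are identified by hypothetical IDs (`lexesp_01` … `lexesp_31`) and that a fixed seed is used for reproducibility; the function name `split_by_document` and the seed value are illustrative, not taken from the paper. The Blue Book training set is described as a fragment rather than a set of documents, so only the word-count arithmetic applies to it.

```python
import random
from typing import List, Sequence, Tuple


def split_by_document(
    doc_ids: Sequence[str], n_train: int, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Randomly pick n_train whole documents for training; the rest form the test set."""
    if not 0 <= n_train <= len(doc_ids):
        raise ValueError("n_train must be between 0 and the number of documents")
    rng = random.Random(seed)  # hypothetical fixed seed, for reproducibility
    train = rng.sample(list(doc_ids), n_train)
    train_set = set(train)
    # Preserve the original document order in the test set.
    test = [d for d in doc_ids if d not in train_set]
    return train, test


# Hypothetical IDs for the thirty-one LEXESP documents.
lexesp_docs = [f"lexesp_{i:02d}" for i in range(1, 32)]
train, test = split_by_document(lexesp_docs, n_train=3, seed=42)
print(len(train), len(test))  # 3 28

# Test-set sizes implied by the word counts reported in the text.
print(38_999 - 6_457)  # LEXESP test words: 32542
print(15_571 - 4_723)  # Blue Book test words: 10848
```

Splitting by document rather than by sentence keeps every anaphoric chain inside a single partition, so no antecedent seen in training reappears in a test document.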
All the tasks presented in this paper were evaluated automatically
against a manual annotation of every pronoun (including zero pronouns).
To this end, each anaphoric third-person personal pronoun was annotated
with its antecedent and its translation into the target language;
co-reference chains were also identified. The annotation was carried
out as follows: (1) two annotators (native speakers) were selected for
each language, (2) the annotators agreed on the annotation scheme,
(3) each annotator annotated the corpora, and (4) a reliability test
[Carletta, J., et al., 1997] was carried out on the annotation to
ensure that the results could be trusted. The
reliability test used the kappa statistic, which measures the
agreement between two annotators' judgments about categories. The
annotation is thus treated as a classification task: choosing an
adequate solution from the list of candidates. According to Carletta
et al. [Carletta, J., et al., 1997], a kappa value in the range
$0.67 < k < 0.8$ allows us to draw encouraging conclusions, and a
value of $k > 0.8$ means there is total reliability between the
results of the two annotators. In our tests, we obtained a kappa
value of 0.83. Therefore, we consider the annotation obtained for the
evaluation to be totally reliable.
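The kappa statistic compares observed agreement $P(A)$ with the agreement $P(E)$ expected by chance: $K = \frac{P(A) - P(E)}{1 - P(E)}$. The sketch below computes it for two annotators. It assumes the formulation in which $P(E)$ is estimated from the category distribution pooled over both annotators (as in Siegel and Castellan's K); Cohen's kappa instead uses each annotator's own marginals. The function name `kappa` and the toy labels are hypothetical and are not the paper's data.

```python
from collections import Counter
from typing import Hashable, Sequence


def kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Two-annotator kappa, K = (P(A) - P(E)) / (1 - P(E)).

    P(E) is estimated from category proportions pooled over both
    annotators, not from each annotator's own marginals (Cohen's kappa).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_a = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum of squared pooled category proportions.
    pooled = Counter(labels_a) + Counter(labels_b)
    p_e = sum((count / (2 * n)) ** 2 for count in pooled.values())
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_a - p_e) / (1 - p_e)


# Hypothetical toy data: the candidate antecedent each annotator chose
# for ten pronouns (the classification view of annotation described above).
ann1 = ["np1", "np2", "np1", "np3", "np2", "np1", "np1", "np3", "np2", "np1"]
ann2 = ["np1", "np2", "np1", "np3", "np1", "np1", "np1", "np3", "np2", "np2"]
print(round(kappa(ann1, ann2), 3))  # 0.677
```

Here the annotators agree on 8 of 10 items ($P(A) = 0.8$), while the pooled proportions 0.5, 0.3, and 0.2 give $P(E) = 0.38$. Kappa therefore discounts the agreement that the skewed category distribution would produce by chance.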
Jesus Peral
2002-12-13