To show the importance of defining an adequate anaphoric accessibility
space, we studied the location of the antecedent of each pronominal and
adjectival anaphora in the training corpus. The results are given in
Table 1.
Table 1: Structural anaphoric accessibility space results
1  The antecedent is found in the same adjacency pair as the anaphor
2  The antecedent is found in the previous adjacency pair to the one containing the anaphor
3  The antecedent is found in the adjacency pair containing the adjacency pair including the anaphor
4  The antecedent is found in the topic of the dialogue
5  The antecedent is found elsewhere
As can be seen in the table, 95.9% of the antecedents were located in the
proposed structural anaphoric accessibility space. It is estimated that
the remaining antecedents (4.1%) are located in the subtopics of the dialogues.
To incorporate these remaining antecedents into the anaphoric
accessibility space, one might use the full space (i.e., all the noun
phrases from the beginning of the dialogue up to the anaphor). However,
as shown in Table 2, our proposal for the anaphoric accessibility space
(hereafter referred to as structural) reduces the average number of
candidates per anaphor (before applying constraints) to 10.74, from the
34.14 that would be obtained with the full space approach. In other words,
the full space approach would roughly triple the number of possible
candidates, thereby greatly increasing both the required computational
effort and the possibility of selecting incorrect antecedents.
Notice, too, that these experiments were performed over a collection of
short dialogues (around 332 words per dialogue). These problems will be
even more acute in longer dialogues.
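
The contrast between the two spaces can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the system described here: the `AdjacencyPair` class, its `previous` and `parent` links, and both functions are hypothetical names, the toy dialogue is invented, and each pair is assumed to list only the noun phrases that precede the anaphor.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AdjacencyPair:
    # Hypothetical structure: noun phrases introduced in this pair, in order.
    noun_phrases: List[str]
    previous: Optional["AdjacencyPair"] = None  # pair immediately before this one
    parent: Optional["AdjacencyPair"] = None    # pair that contains this one, if any


def structural_candidates(pair: AdjacencyPair, topic: List[str]) -> List[str]:
    """Collect candidates from the same pair, the previous pair, the
    containing pair, and the dialogue topic (rows 1-4 of Table 1)."""
    sources = [pair.noun_phrases]
    if pair.previous is not None:
        sources.append(pair.previous.noun_phrases)
    if pair.parent is not None:
        sources.append(pair.parent.noun_phrases)
    sources.append(topic)

    candidates: List[str] = []
    for nps in sources:
        for np in nps:
            if np not in candidates:  # keep the first occurrence only
                candidates.append(np)
    return candidates


def full_space_candidates(pairs: List[AdjacencyPair], index: int) -> List[str]:
    """Collect every noun phrase from the start of the dialogue up to and
    including the pair at position `index` (the pair holding the anaphor)."""
    return [np for p in pairs[: index + 1] for np in p.noun_phrases]


# Toy dialogue, invented for illustration.
p1 = AdjacencyPair(["the train", "Madrid"])
p2 = AdjacencyPair(["a ticket"], previous=p1)
p3 = AdjacencyPair(["the price"], previous=p2)
p4 = AdjacencyPair(["the morning"], previous=p3)
topic = ["the train"]

structural = structural_candidates(p4, topic)
full = full_space_candidates([p1, p2, p3, p4], 3)
print(len(structural), len(full))
```

In this toy dialogue the structural space yields fewer candidates than the full space, and the gap grows with dialogue length because the full space accumulates every noun phrase uttered so far.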
Table 2: Candidates to be processed for each anaphoric accessibility space

Anaphoric accessibility space   Total candidates   Candidates per anaphor   Proportion
Structural                      1,063              10.74                    100%
Full space                      3,380              34.14                    318%
Window of utterances            1,292              13.05                    122%
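
As a consistency check, the Proportion figures in Table 2 can be recomputed from the totals. The sketch below uses only the values reported in the table; the variable names are illustrative, and the implied number of anaphors is a derived quantity that the text does not report directly.

```python
# Values copied from Table 2.
total_candidates = {
    "Structural": 1063,
    "Full space": 3380,
    "Window of utterances": 1292,
}
candidates_per_anaphor = {
    "Structural": 10.74,
    "Full space": 34.14,
    "Window of utterances": 13.05,
}

# Proportion of each space relative to the structural space, in percent.
baseline = total_candidates["Structural"]
proportion = {
    space: round(100 * total / baseline)
    for space, total in total_candidates.items()
}

# Number of anaphors implied by total / average (derived, not reported).
implied_anaphors = {
    space: round(total_candidates[space] / candidates_per_anaphor[space])
    for space in total_candidates
}

for space in total_candidates:
    print(f"{space}: {proportion[space]}% of structural, "
          f"about {implied_anaphors[space]} anaphors implied")
```

The three averages imply the same number of anaphors, which suggests that the three spaces were evaluated on the same set of anaphors.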
Other researchers have proposed using a window with a fixed number of sentences
to define the anaphoric accessibility space. This type of approach might
be called a window of sentences approach. For example, Ferrández et al. (2000)
propose using the three previous sentences to define the accessibility space
for pronouns and the four previous sentences for adjectival anaphora in
Spanish. For English, Kameyama (1997) proposes the same space for pronominal
anaphora. However, there is no structural justification for these
definitions. Ferrández et al. and Kameyama performed several empirical
studies to determine the optimal space for each experiment. Table 3 below
shows the results
of a study which we performed using the Corpus Infotren: Person,
the goal of which was to define an anaphoric accessibility space based
on a window of sentences that can then be adapted to dialogues by
means of a window of utterances. As the table shows, 11 utterances
for pronominal anaphora and 10 utterances for adjectival anaphora are needed
in order to cover the same number of antecedents as was covered using the
structural anaphoric accessibility space (which was defined based on adjacency
pairs and the topic). Since an anaphoric accessibility space based on a
window of utterances rests on empirical studies rather than on any
structural principle, its optimal size may vary from one text to another,
and it is therefore inadequate. Moreover, only the structural anaphoric
accessibility space can cover those cases that refer to NPs introduced at
the outset of the dialogue (topics), whereas a window of sentences/utterances
approach does not.
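
The behaviour of a window of utterances can be illustrated with a short sketch. It is a simplified assumption of how such a window might be applied, not the procedure used in the study: the function name `window_candidates`, the choice to include the anaphor's own utterance, and the toy dialogue are all hypothetical.

```python
from typing import List


def window_candidates(utterances: List[List[str]],
                      anaphor_index: int,
                      size: int) -> List[str]:
    """Return the noun phrases from the `size` utterances preceding the
    anaphor's utterance, plus those in the anaphor's own utterance."""
    start = max(0, anaphor_index - size)
    return [np for utt in utterances[start: anaphor_index + 1] for np in utt]


# Toy dialogue, invented: one list of noun phrases per utterance.
utterances = [
    ["the Madrid train"],  # topic introduced at the outset
    ["a ticket"],
    ["the price"],
    ["the window seat"],
    [],                    # utterance containing the anaphor
]

narrow = window_candidates(utterances, anaphor_index=4, size=2)
wide = window_candidates(utterances, anaphor_index=4, size=4)
print(narrow)
print(wide)
```

With a window of two utterances, the noun phrase introduced at the outset is no longer a candidate. It is recovered only when the window is widened to span the whole toy dialogue, which also brings back every intervening noun phrase.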
In conclusion, it would appear that the structural anaphoric
accessibility space is to be preferred, at least for anaphora resolution
in dialogues.
Table 3: Empirical study of anaphoric accessibility space
based on a window of utterances