Introduction

Anaphora resolution is one of the most active areas of research in Natural Language Processing (NLP). Comprehending anaphora is an essential process in any NLP system, yet it remains among the toughest problems in computational linguistics and NLP. According to Hirst (1981):

Anaphora, in discourse, is a device for making an abbreviated reference (containing fewer bits of disambiguating information, rather than being lexically or phonetically shorter) to some entity (or entities) in the expectation that the receiver of the discourse will be able to disabbreviate the reference and, thereby, determine the identity of the entity.

The reference to an entity (e.g., a pronoun) is generally called an anaphor, the entity to which the anaphor refers is its referent, and the previous reference to the same entity is the anaphor's antecedent. For instance, in the statement "John_i ate an apple. He_i was hungry", the pronoun "he" is the anaphor and the noun "John" is the antecedent.
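As a minimal illustration of this terminology, the co-indexed example can be written as a small data structure. The names used here (Mention, antecedent_of, the entity field) are hypothetical and are not taken from ARDi. The sketch assumes that the co-reference indices are already known, which is exactly what a resolution algorithm has to discover.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mention:
    text: str         # surface form of the reference
    position: int     # order of appearance in the discourse
    entity: str       # co-reference index, e.g. "i" in John_i
    is_anaphor: bool  # True for abbreviated references such as pronouns


def antecedent_of(anaphor: Mention, mentions: List[Mention]) -> Optional[Mention]:
    """Return the closest earlier mention that shares the anaphor's index."""
    earlier = [
        m for m in mentions
        if m.entity == anaphor.entity and m.position < anaphor.position
    ]
    return max(earlier, key=lambda m: m.position, default=None)


# "John_i ate an apple. He_i was hungry"
discourse = [
    Mention("John", 0, "i", False),
    Mention("an apple", 1, "j", False),
    Mention("He", 2, "i", True),
]

antecedent = antecedent_of(discourse[2], discourse)
print(antecedent.text if antecedent else None)
```

Here the referent is the entity indexed by "i", the anaphor is the mention "He", and the antecedent is the earlier mention "John".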

Work on anaphora covers both resolution and generation: resolution is the disabbreviation of a reference, whereas generation is the production of an abbreviated reference to an entity. This paper focuses exclusively on the resolution of anaphora and not on their generation. Anaphora can be classified in many different ways, depending upon the criteria one chooses to employ. With regard to the element that carries out the reference (the anaphor), for example, clear distinctions should be made between pronominal anaphora, adjectival anaphora, definite descriptions, one-anaphora, surface-count anaphora, verbal-phrase anaphora, and time and/or location references. This paper focuses on the resolution of pronominal and adjectival anaphora.¹

It is widely agreed that the process of resolving anaphora in natural language texts may be supported by a variety of strategies that employ different kinds of knowledge. By different kinds of knowledge we mean the various sources of information usually employed for anaphora resolution, including morphological agreement, syntactic parallelism, semantic information, discourse structure, topical knowledge, and so on.

NLP in general, and anaphora resolution in particular, draws on many resources and sources of information for two reasons: (1) numerous resources are available to the scientific community; and (2) humans employ many sources of information in order to resolve different linguistic phenomena.

We present an algorithm that coordinates different forms of knowledge by distinguishing between linguistic knowledge (constraints and preferences) and dialogue-structure knowledge (anaphoric accessibility space). The algorithm identifies the noun phrase to which a third-person personal or demonstrative pronoun or adjectival anaphor² refers in a Spanish dialogue. We call this algorithm ARDi (anaphora resolution in dialogues). ARDi was implemented in Prolog.
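The separation between the two kinds of knowledge can be sketched as a three-stage pipeline: restrict the candidates to an accessibility space, discard those that violate a constraint, and rank the remainder by preferences. The following Python sketch is illustrative only and is not the Prolog implementation of ARDi. All names are hypothetical, the accessibility space is assumed to be a simple window of dialogue turns (a stand-in for the space defined from the annotation scheme), and the only constraint and preference shown are morphological agreement and recency.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Phrase:
    text: str
    gender: str  # "m" or "f"
    number: str  # "sg" or "pl"
    turn: int    # index of the dialogue turn containing the phrase


def accessible(anaphor: Phrase, candidates: List[Phrase], window: int) -> List[Phrase]:
    """Assumed accessibility space: the current turn and the `window` turns before it."""
    return [c for c in candidates if 0 <= anaphor.turn - c.turn <= window]


def agrees(anaphor: Phrase, candidate: Phrase) -> bool:
    """Constraint: gender and number must match."""
    return anaphor.gender == candidate.gender and anaphor.number == candidate.number


def recency(anaphor: Phrase, candidate: Phrase) -> float:
    """Preference: closer candidates receive a higher score."""
    return -float(anaphor.turn - candidate.turn)


def resolve(
    anaphor: Phrase,
    candidates: List[Phrase],
    constraints: List[Callable[[Phrase, Phrase], bool]],
    preferences: List[Callable[[Phrase, Phrase], float]],
    window: int = 2,
) -> Optional[Phrase]:
    # 1. Dialogue-structure knowledge: keep only accessible candidates.
    pool = accessible(anaphor, candidates, window)
    # 2. Linguistic constraints: discard candidates that violate any of them.
    pool = [c for c in pool if all(ok(anaphor, c) for ok in constraints)]
    if not pool:
        return None
    # 3. Linguistic preferences: return the best-scoring survivor.
    return max(pool, key=lambda c: sum(p(anaphor, c) for p in preferences))


noun_phrases = [
    Phrase("la mesa", "f", "sg", 0),
    Phrase("el libro", "m", "sg", 1),
    Phrase("el coche", "m", "sg", 2),
    Phrase("los libros", "m", "pl", 2),
]
lo = Phrase("lo", "m", "sg", 3)
la = Phrase("la", "f", "sg", 3)

chosen = resolve(lo, noun_phrases, [agrees], [recency])
unresolved = resolve(la, noun_phrases, [agrees], [recency])
print(chosen.text if chosen else None, unresolved)
```

For "lo", both "el libro" and "el coche" are accessible and agree in gender and number, so the recency preference selects "el coche". For "la", the only agreeing candidate, "la mesa", lies outside the assumed two-turn window, so no antecedent is proposed. Keeping constraints and preferences as separate lists lets further knowledge sources be added without changing the control flow.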

In Section 2 below, we present related work on anaphora resolution in dialogues. In Section 3, we suggest an annotation scheme for capturing Spanish dialogue structure. In Section 4, an accessibility space based on this annotation scheme is defined. In Section 5, we present the algorithm ARDi. Finally, an experimental study of the algorithm is presented in Section 6.

patricio 2001-10-17