An automatic topic detection proposal


Several works on automatic topic detection have been published (Reynar 1999, Youmans 1991, Hearst 1994). Martínez-Barco et al. 1999 present an automatic topic detection algorithm applied to anaphora resolution.

This algorithm collects the noun phrases (NPs) that occur before an anaphor into a weighted list. Each time an NP appears in a new turn (frequency), its weight is increased; each time it fails to appear in a new turn (infrequency), its weight is decreased. The dialogue topic is then determined by salience: it is the NP with the heaviest weight occurring before the anaphor, that is, one mentioned frequently and within a short distance. To compute the weights, the algorithm uses the following two coefficients:

C_f: coefficient of frequency
C_i: coefficient of infrequency

C_f increases the salience of a referring expression when the entity appears in the current intervention turn. C_i decreases the salience of expressions that appeared in previous intervention turns but not in the current one, indicating a loss of importance. Together, the two coefficients make the salience of an expression reflect both its frequency and its distance from the intervention turn in which the anaphor was found. The expression with the highest salience is the most favored candidate antecedent on the whole list and is therefore taken as the most relevant topic for the current intervention turn.
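The weighting scheme described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of Martínez-Barco et al. 1999: the text says weights are "increased" and "decreased" without giving the exact rule, so the sketch assumes a simple additive update, counts an NP at most once per turn, and assumes NP extraction has already been done. The names `update_salience` and `rank_topics` are hypothetical.

```python
def update_salience(weights, turn_nps, c_f, c_i):
    """Update NP weights in place after one intervention turn.

    Assumption (not stated in the source): the update is additive.
    An NP present in the turn gains c_f; an NP already on the list
    but absent from the turn loses c_i.
    """
    present = set(turn_nps)  # assumption: repeats within a turn count once
    # Penalize NPs seen in earlier turns that are missing from this one.
    for np_ in weights:
        if np_ not in present:
            weights[np_] -= c_i
    # Reward NPs mentioned in this turn (new NPs start from 0).
    for np_ in present:
        weights[np_] = weights.get(np_, 0) + c_f
    return weights


def rank_topics(turns, c_f, c_i):
    """Return (NP, weight) pairs ordered from most to least salient.

    `turns` holds the NPs of each turn preceding the anaphor,
    oldest turn first.
    """
    weights = {}
    for turn_nps in turns:
        update_salience(weights, turn_nps, c_f, c_i)
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)


# Toy dialogue: three turns before the anaphor, NPs already extracted.
turns = [
    ["the ticket", "Madrid"],
    ["the ticket"],
    ["the ticket", "the train"],
]
ranked = rank_topics(turns, c_f=10, c_i=1)
print(ranked)
```

In this toy dialogue "the ticket" is mentioned in every turn and ends up at the head of the list, while "Madrid", mentioned once and then dropped, falls below the more recent "the train".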

This automatic topic detection method has one advantage over other methods: it yields not a single topic but a list of topic candidates ordered by salience. That is important for our anaphora resolution system because, if the highest-ranked candidate does not fulfill the relevant constraints, the next highest candidate can be tested.
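The fallback over the ordered candidate list might look like the sketch below. The constraints themselves are not specified in this section, so the gender check used here is a toy stand-in, and `first_acceptable`, `FEATURES`, and `agrees_with_feminine_anaphor` are hypothetical names introduced only for illustration.

```python
def first_acceptable(ranked_candidates, satisfies_constraints):
    """Return the most salient candidate that passes the constraints.

    `ranked_candidates` is a list of (NP, weight) pairs, already
    ordered from highest to lowest salience. Returns None when no
    candidate passes.
    """
    for np_, _weight in ranked_candidates:
        if satisfies_constraints(np_):
            return np_
    return None


# Toy data: salience-ordered candidates and a hand-written feature table.
ranked = [("el billete", 30), ("la reserva", 10), ("el tren", 8)]
FEATURES = {
    "el billete": {"gender": "masc"},
    "la reserva": {"gender": "fem"},
    "el tren": {"gender": "masc"},
}


def agrees_with_feminine_anaphor(np_):
    # Stand-in constraint: the antecedent must be feminine.
    return FEATURES[np_]["gender"] == "fem"


print(first_acceptable(ranked, agrees_with_feminine_anaphor))
```

Here the top-ranked candidate "el billete" fails the toy agreement check, so the next candidate on the list is tested and "la reserva" is returned.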

Initially, values of 10 units and 1 unit, respectively, were assigned to C_f and C_i. These values were arrived at experimentally, but further study could lead to more precise values.
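To see what the 10:1 ratio implies, consider a small worked example. It assumes an additive update, which the text does not state explicitly: an NP that has appeared in n_f turns and been absent from n_i turns since its first mention would then carry the weight w below. The symbols w, n_f, and n_i are introduced here only for illustration.

```latex
w = n_f \, C_f - n_i \, C_i

% Established topic: mentioned in 3 turns, then absent for 2 turns
w_{\text{old}} = 3 \cdot 10 - 2 \cdot 1 = 28

% Newer NP: mentioned in the 2 most recent turns only
w_{\text{new}} = 2 \cdot 10 - 0 \cdot 1 = 20
```

Under this assumption, a single mention outweighs ten turns of absence, so an established topic stays ahead of a newer NP until the newer one has been mentioned in nearly as many turns.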


patricio 2001-10-17