Evaluation of the Focus Model


The algorithm presented here does not include a mechanism for recognizing the global structure of the discourse, as is done in the work of Grosz and Sidner [10], Mann and Thompson [23], Allen and Perrault [3], and in descendant work. In the recent literature, Walker [39] argues for a more linear-recency-based model of attentional state (though without claiming that discourse structure need not be recognized), while Rosé et al. [31] argue for a more complex model of attentional state than is represented in most current computational theories of discourse.

Many theories that address how attentional state should be modeled have the goal of performing intention recognition as well. We investigate performing temporal reference resolution directly, without also attempting to recognize discourse structure or intentions. We assess the challenges the data present to our model when only this task is attempted.

The total number of Temporal Units and the number of them specified by anaphoric noun phrases in the two training data sets are given in Figure 7.⁴ There are different units that could be counted, from the number of temporal noun phrases to the number of distinct times referred to in the dialog. Here, we count the entities that must be resolved by a temporal reference resolution algorithm, i.e., the number of distinct temporal units specified in each sentence, summed over all sentences. Operationally, this is a count of Temporal Units after the normalization phase, i.e., after Step 1 in Section 5.2. This is the unit considered in the remainder of this paper.
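The counting convention described above can be sketched as follows. This is a minimal illustration, not the authors' tooling: the `TemporalUnit` class, its fields, and the toy dialog are hypothetical, and the sketch assumes each sentence has already been reduced to its list of distinct Temporal Units by the normalization phase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemporalUnit:
    """Hypothetical stand-in for a normalized Temporal Unit."""
    text: str        # surface description, for readability only
    anaphoric: bool  # True if specified by an anaphoric noun phrase


def count_temporal_units(sentences):
    """Return (total, anaphoric) counts, summed over all sentences.

    `sentences` is a list with one inner list of TemporalUnit per sentence.
    A time mentioned in two different sentences is counted twice, because
    each mention must be resolved separately.
    """
    total = sum(len(units) for units in sentences)
    anaphoric = sum(1 for units in sentences for u in units if u.anaphoric)
    return total, anaphoric


# Toy dialog (invented for illustration, not taken from the corpus).
toy_dialog = [
    [TemporalUnit("Monday the third at two", False)],
    [TemporalUnit("Monday", True), TemporalUnit("three", True)],
    [],  # a sentence with no temporal reference
]
total, anaphoric = count_temporal_units(toy_dialog)
```

Counting per sentence rather than per distinct time keeps the unit aligned with what a resolution algorithm must actually produce: one interpretation for each Temporal Unit in each sentence.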

**Figure 7:** Counts of Temporal Unit References in the Training Data
[Table body not recoverable from the source. Surviving row: Total, 292 and 238.]

To support the evaluation presented in this section, antecedent information was manually annotated in the training data. For each Temporal Unit specified by an anaphoric noun phrase, all of the antecedents that yield the correct interpretation under one of the anaphoric relations were identified, with one exception: if both TU_i and TU_j are appropriate antecedents, and one is an antecedent of the other, only the more recent one is included. Thus, only the heads of the anaphoric chains existing at that point in the dialog are included. Competitor discourse entities were also identified, i.e., previously mentioned Temporal Units for which some relation could be established but for which the resulting interpretation would be incorrect. Again, only Temporal Units at the head of an anaphoric chain were considered. To illustrate these annotations, Figure 8 shows a graph depicting the anaphoric chain annotations of an NMSU dialog (dialog 9). In the figure, solid lines link the correct antecedents, dotted lines show competitors, and edges to nowhere indicate deictics.
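The chain-head restriction used in the annotation can be sketched as a small filter. This is only an illustration under stated assumptions: Temporal Units are identified by hypothetical integer positions in the dialog, `antecedents_of` is an assumed mapping from a Temporal Unit to its annotated antecedents, and "is an antecedent of" is read transitively along the chain. None of these names come from the paper.

```python
def chain_heads(candidates, antecedents_of):
    """Keep only candidates that are not an antecedent of another candidate.

    candidates:     set of Temporal Unit positions that would each yield a
                    correct (or, for competitors, an establishable) relation
    antecedents_of: dict mapping a position to the set of positions
                    annotated as its antecedents (hypothetical structure)
    """
    def ancestors(tu):
        # Collect every unit reachable by following antecedent links.
        seen = set()
        stack = list(antecedents_of.get(tu, ()))
        while stack:
            cur = stack.pop()
            if cur not in seen:
                seen.add(cur)
                stack.extend(antecedents_of.get(cur, ()))
        return seen

    # A candidate is dropped if a more recent candidate already links to it.
    dominated = set()
    for tu in candidates:
        dominated |= ancestors(tu) & candidates
    return candidates - dominated


# Toy example: units 1 <- 2 <- 3 form one chain; unit 4 stands alone.
links = {2: {1}, 3: {2}}
heads = chain_heads({1, 2, 3, 4}, links)
```

In the toy example only units 3 and 4 remain, since units 1 and 2 are reachable from unit 3 and are therefore not at the head of their chain.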

**Figure 8:** Anaphoric Annotations of Part of NMSU Dialog 9.
[Figure image not available (source file: ana.ps).]
