Scheduling dialogs, during which people negotiate the times of
appointments, are common in everyday life. This paper reports the
results of an in-depth empirical investigation of resolving explicit
temporal references in scheduling dialogs. The work has four
phases: data annotation and evaluation, model development,
system implementation and evaluation, and model evaluation and
analysis. The system and model were developed primarily on one set of
data and then applied to a much more complex data set to
assess the generalizability of the model for the task being performed.
A variety of empirical methods are applied
to pinpoint the strengths and weaknesses of the approach.
Detailed annotation instructions were developed and an intercoder reliability
study was performed, showing that naive annotators can reliably perform
the targeted annotations. A fully automatic system has been developed
and evaluated on unseen test data, with good results on both data
sets. We adopt a pure realization of a recency-based focus model to
identify precisely when it is and is not adequate for the task being
addressed. In addition to system results, an in-depth evaluation of
the model itself is presented, based on detailed manual annotations.
The results show that few errors are attributable specifically to the
focus model being used, and that the anaphoric relations defined in
the model are low in ambiguity for both data sets.