
Analysis

The implementation is an important proof of concept. However, as discussed in Section 6, various kinds of errors are reflected in the results, and many are not directly related to discourse processing or temporal reference resolution. Examples are (1) completely null inputs, when the semantic parser or speech recognizer fails, (2) numbers mistaken for dates, and (3) failures to recognize that a relation can be established, due to a lack of specific domain knowledge.

To evaluate the algorithm itself, this section separately evaluates the components of our method for temporal reference resolution. Sections 8.1 and 8.2 assess the key contributions of this work: the focus model (Section 8.1) and the deictic and anaphoric relations (Section 8.2). These evaluations required extensive additional manual annotation of the data. To preserve the test dialogs as unseen test data, these annotations were performed on the training data only. In Section 8.3, we isolate the architectural components of our algorithm, such as the certainty factor calculation and the critics, to assess their effects on performance.
