Coverage and Ambiguity of the Relations Defined in the Model

A question naturally arises from the evaluation presented in the previous section: in using a less complex focus model, have we merely ``pushed aside'' the ambiguity into the set of deictic and anaphoric relations? In this section, we assess the ambiguity of the anaphoric relations for the NMSU and CMU training sets. This section also presents other evaluations of the relations, assessing their coverage and redundancy, how often they are correct, and how often they are applicable.

The evaluations presented in this section required detailed, time-consuming manual annotations. The system's annotations would not suffice, because the implementation does not perfectly recognize when a rule is applicable. We annotated a sample of four randomly selected dialogs from the CMU training set and all four dialogs in the NMSU training set.

The counts derived from the manual annotations for this section are defined below. Because this section focuses on the relations, we consider them at the more specific level of the deictic and anaphoric rules presented in Online Appendix 1. In addition, we do not allow trivial extensions of the relations, as we did in the evaluation of the focus model (Section 8.1). The criterion for correctness in this section is the same as for the evaluation of the system: a field-by-field exact match with the manually annotated correct interpretations. There is one exception: the starting and ending time-of-day fields are ignored, since these are known weaknesses of the rules and represent a relatively minor proportion of the overall temporal interpretation.

The following counts were derived from the manual annotations:

TimeRefs: the total number of temporal references in the annotated dialogs.
TimeRefsC: the number of temporal references for which at least one rule yields the correct interpretation.
Interp: the total number of interpretations produced by the rules (Interp = CorrI + IncI).
CorrI: the number of correct interpretations produced.
IncI: the number of incorrect interpretations produced.
DiffI: the number of distinct interpretations produced.
DiffICorr: the number of distinct interpretations produced for the temporal references counted in TimeRefsC.

The values for each data set, together with coverage and ambiguity evaluations, are presented in Table 7.

 
Table 7: Coverage and Ambiguity

CMU Training Set (4 randomly selected dialogs)

TimeRefs  TimeRefsC  Interp  CorrI  IncI  DiffI  DiffICorr
      78         74     165    142    23     91         85

Coverage (TimeRefsC / TimeRefs) = 74/78 = 95%
Ambiguity (DiffICorr / TimeRefsC) = 85/74 = 1.15
Overall Ambiguity (DiffI / TimeRefs) = 91/78 = 1.17
Rule Redundancy (CorrI / TimeRefsC) = 142/74 = 1.92

NMSU Training Set (4 dialogs)

TimeRefs  TimeRefsC  Interp  CorrI  IncI  DiffI  DiffICorr
      98         83     210    154    56    129        106

Coverage (TimeRefsC / TimeRefs) = 83/98 = 85%
Ambiguity (DiffICorr / TimeRefsC) = 106/83 = 1.28
Overall Ambiguity (DiffI / TimeRefs) = 129/98 = 1.32
Rule Redundancy (CorrI / TimeRefsC) = 154/83 = 1.86

The ambiguity for both data sets is very low. The Ambiguity figure in Table 7 represents the average number of interpretations per temporal reference, considering only those for which the correct interpretation is possible (i.e., it is DiffICorr / TimeRefsC). The table also shows the ambiguity when all temporal references are included (i.e., DiffI / TimeRefs). As can be seen from the table, the average ambiguity in both data sets is much less than two interpretations per temporal reference.

The coverage of the relations can be evaluated as (TimeRefsC / TimeRefs), the percentage of temporal references for which at least one rule yields the correct interpretation. While the coverage of the NMSU data set, 85%, is not perfect, it is good, considering that the system was not developed on the NMSU data.

The data also show that there is often more than one way to achieve the correct interpretation. This is another type of redundancy: redundancy of the data with respect to the model. It is calculated in Table 7 as (CorrI / TimeRefsC), that is, the number of correct interpretations over the number of temporal references that have a correct interpretation. For both data sets, there are, on average, roughly two different ways to achieve the correct interpretation.
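The coverage, ambiguity, and redundancy calculations in Table 7 can be reproduced from the raw counts with a short script (counts transcribed from the table; rounding matches the table's presentation):

```python
# Compute the Table 7 metrics from the manually annotated counts.
def metrics(time_refs, time_refs_c, corr_i, diff_i, diff_i_corr):
    return {
        # Percentage of temporal references with at least one correct interpretation.
        "coverage_pct": round(100 * time_refs_c / time_refs),
        # Avg. distinct interpretations per reference that has a correct one.
        "ambiguity": round(diff_i_corr / time_refs_c, 2),
        # Avg. distinct interpretations over all temporal references.
        "overall_ambiguity": round(diff_i / time_refs, 2),
        # Avg. number of rules yielding the correct interpretation (redundancy).
        "rule_redundancy": round(corr_i / time_refs_c, 2),
    }

cmu = metrics(time_refs=78, time_refs_c=74, corr_i=142, diff_i=91, diff_i_corr=85)
nmsu = metrics(time_refs=98, time_refs_c=83, corr_i=154, diff_i=129, diff_i_corr=106)
print(cmu)   # coverage 95%, ambiguity 1.15, overall ambiguity 1.17, redundancy 1.92
print(nmsu)  # coverage 85%, ambiguity 1.28, overall ambiguity 1.32, redundancy 1.86
```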

Table 8 shows the number of times each rule applies in total (column 3) and the number of times each rule is correct (column 2), according to our manual annotations. Column 4 shows the accuracies of the rules, i.e., (column 2 / column 3). The rule labels are the ones used in Online Appendix 1 to identify the rules.


 
Table 8: Rule Applicability Based on Manual Annotations

CMU Training Set (4 randomly selected dialogs)

Rule   Correct  Total  Accuracy
D1           4      4      1.00
D2i          0      0      0.00
D2ii        35     40      0.88   (a frame-of-reference deictic relation)
D3           1      2      0.50
D4           0      0      0.00
D5           0      0      0.00
D6           2      2      1.00
A1          45     51      0.88   (a co-reference anaphoric relation)
A2           0      0      0.00
A3i          1      1      1.00
A3ii        35     37      0.95   (a frame-of-reference anaphoric relation)
A4          14     18      0.78   (a modify anaphoric relation)
A5           0      0      0.00
A6i          2      2      1.00
A6ii         1      1      1.00
A7           0      1      0.00
A8           0      0      0.00

NMSU Training Set (4 dialogs)

Rule   Correct  Total  Accuracy
D1           4      4      1.00
D2i          0      0      0.00
D2ii        24     36      0.67   (a frame-of-reference deictic relation)
D3           6      9      0.67
D4           0      1      0.00
D5           0      0      0.00
D6           0      0      0.00
A1          57     68      0.84   (a co-reference anaphoric relation)
A2           5      5      1.00
A3i          0      0      0.00
A3ii        21     32      0.66   (a frame-of-reference anaphoric relation)
A4          27     37      0.73   (a modify anaphoric relation)
A5           0      1      0.00
A6i          7      9      0.78
A6ii         0      0      0.00
A7           0      0      0.00
A8           0      0      0.00

The same four rules, those labeled D2ii, A1, A3ii, and A4, are responsible for the majority of applications in both data sets. The first is an instance of the frame-of-reference deictic relation, the second is an instance of the co-reference anaphoric relation, the third is an instance of the frame-of-reference anaphoric relation, and the fourth is an instance of the modify anaphoric relation.
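This ranking can be checked directly against the Table 8 counts. The sketch below uses only the CMU rules with a nonzero Total, transcribed from the table:

```python
# Per-rule (correct, total) counts from the manual annotations in Table 8,
# CMU training set; rules with a Total of zero are omitted.
cmu_rules = {
    "D1": (4, 4), "D2ii": (35, 40), "D3": (1, 2), "D6": (2, 2),
    "A1": (45, 51), "A3i": (1, 1), "A3ii": (35, 37), "A4": (14, 18),
    "A6i": (2, 2), "A6ii": (1, 1), "A7": (0, 1),
}

# Accuracy = correct applications / total applications.
accuracy = {r: round(c / t, 2) for r, (c, t) in cmu_rules.items()}

# The four most frequently applied rules, by total applications.
top4 = sorted(cmu_rules, key=lambda r: cmu_rules[r][1], reverse=True)[:4]
print(top4)  # ['A1', 'D2ii', 'A3ii', 'A4']
```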

How often the system considers and actually uses each rule is shown in Table 9. Specifically, the column labeled Fires shows how often each rule applies, and the column labeled Used shows how often each rule is used to form the final interpretation. To help isolate the accuracies of the rules, these experiments were performed on unambiguous data. Comparing this table with Table 8, we see that the same four rules shown to be the most important by the manual annotations are also responsible for the majority of the system's interpretations. This holds for both the CMU and NMSU data sets.
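A usage rate (Used / Fires) can be computed for each rule to see what fraction of a rule's firings survive into the final interpretation. The sketch below uses the CMU counts for the five busiest rules from Table 9; note that A2 fires often but is rarely used:

```python
# (used, fires) counts for the system's most active rules on the CMU data
# set, transcribed from Table 9.
cmu_system = {
    "D2ii": (78, 90), "A1": (95, 110), "A2": (2, 24),
    "A3ii": (72, 86), "A4": (45, 80),
}

# Fraction of firings that make it into the final interpretation.
usage_rate = {r: round(u / f, 2) for r, (u, f) in cmu_system.items()}
print(usage_rate)  # A2 is used in only 8% of its firings
```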


 
Table 9: Rule Activation by the System on Unambiguous Data

CMU data set

Rule   Used  Fires
D1       16     16
D2i       1      3
D2ii     78     90   (a frame-of-reference deictic relation)
D3        5      5
D4        9      9
D5        0      1
D6        2      2
A1       95    110   (a co-reference anaphoric relation)
A2        2     24
A3i       1      1
A3ii     72     86   (a frame-of-reference anaphoric relation)
A4       45     80   (a modify anaphoric relation)
A5        4      5
A6i      10     10
A6ii      0      0
A7        0      0
A8        1      1

NMSU data set

Rule   Used  Fires
D1        4      4
D2i       2      2
D2ii     20     31   (a frame-of-reference deictic relation)
D3        2      3
D4        0      0
D5        0      0
D6        0      0
A1       46     65   (a co-reference anaphoric relation)
A2        6     12
A3i       0      2
A3ii     18     27   (a frame-of-reference anaphoric relation)
A4       24     42   (a modify anaphoric relation)
A5        3      5
A6i       6      8
A6ii      0      0
A7        0      0
A8        0      0

