The performance of the system that uses automatic features (including auto-SLU-success) for the first utterance is given in Table 6. This system has an overall accuracy of 69.6%. These results show that, given the first exchange, the ruleset predicts that 18.3% of the dialogues will be problematic, while 33% of them actually are. It identifies 31.6% of the problematic dialogues (recall), and when it predicts that a dialogue will be problematic, it is correct 56.6% of the time (precision).
The performance of the system that uses automatic features for Exchanges 1&2 is summarized in Table 7. These results show that, given the first two exchanges, this ruleset predicts that 20% of the dialogues will be problematic, while 33% of them actually are. It identifies 49.5% of the problematic dialogues (recall), and when it predicts that a dialogue will be problematic, it is correct 79.7% of the time (precision). Relative to using the first exchange alone, this classifier improves recall by 17.87 percentage points and precision by 23.09 percentage points, for an overall improvement in accuracy of 9.6 percentage points.
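The reported figures are tied together by simple confusion-matrix arithmetic. The sketch below is an illustrative consistency check, not part of the original evaluation: it reconstructs the share of dialogues predicted problematic and the overall accuracy from the rounded base rate (33%), recall, and precision quoted above. The function names are hypothetical, and because the inputs are rounded, the outputs match the reported values only approximately.

```python
def confusion_from_rates(base_rate, recall, precision):
    """Return (tp, fp, fn, tn) as fractions of all dialogues.

    base_rate: fraction of dialogues that are actually problematic
    recall:    fraction of problematic dialogues that are detected
    precision: fraction of 'problematic' predictions that are correct
    """
    tp = recall * base_rate          # problematic and predicted problematic
    fp = tp / precision - tp         # predicted problematic but not problematic
    fn = base_rate - tp              # problematic but missed
    tn = 1.0 - tp - fp - fn          # correctly predicted non-problematic
    return tp, fp, fn, tn


def summarize(base_rate, recall, precision):
    """Derive the predicted-problematic rate and accuracy from the rates."""
    tp, fp, fn, tn = confusion_from_rates(base_rate, recall, precision)
    return {
        "predicted_problematic": tp + fp,
        "accuracy": tp + tn,
    }


# Rounded figures quoted in the text (assumed inputs for this check).
exchange1 = summarize(base_rate=0.33, recall=0.316, precision=0.566)
exchange12 = summarize(base_rate=0.33, recall=0.495, precision=0.797)

for name, result in [("Exchange 1", exchange1), ("Exchanges 1&2", exchange12)]:
    print(f"{name}: predicted problematic = {result['predicted_problematic']:.1%}, "
          f"accuracy = {result['accuracy']:.1%}")
```

Under these assumptions, the derived values come out at roughly 18% predicted problematic and 69% accuracy for the first exchange, and roughly 20% and 79% for Exchanges 1&2, which agrees with the reported numbers up to rounding.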