Error Correction Using Confidence Annotation in Automatic Speech Recognition
__________________________

This work deals with identifying and rectifying incorrect recognition output in automatic speech recognition. Recent work [1] in this direction uses confidence measures [2,3,4], treating confidence scores as an additional feature alongside acoustic and language model scores to improve recognition performance. In this work, we use an existing confidence annotator [3] to identify incorrectly recognized segments in the decoded utterance.

We propose two measures, "Overall Likelihood Difference" and "Homogeneity Count Difference", which are used to rectify incorrect segments by identifying their correct substitutes in the N-best hypothesis list produced during decoding. The top hypothesis in the N-best list is the output of the speech decoder. The first best alternative of a decoded segment is the hypothesis segment with the highest likelihood over the same time span elsewhere in the N-best list. The proposed measures are defined as follows:

1) Overall Likelihood Difference is the difference in likelihood between a decoded segment and its best alternative. The smaller this difference, the more likely it is that the best alternative is the correct segment.

2) Homogeneity Count Difference is the difference between the number of words in the decoded segment and in its best alternative whose homogeneity scores lie above a threshold. The threshold giving the maximum reduction in cross entropy [4] is chosen. Correct segments are expected to have higher homogeneity counts than incorrect segments.

We consider only the first best alternative segment because most correct segments are either the decoded segments themselves or their first best alternatives; the distribution falls off almost exponentially with position in the N-best list.

The SPHINX-III speech recognizer was used to decode the 1997 Hub4 broadcast news corpus.
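The two measures can be sketched as follows. This is a minimal illustration, assuming each segment carries a total log-likelihood and per-word homogeneity scores; all function names, the data layout, and the combination rule in should_substitute are illustrative assumptions, not the paper's actual implementation.

```python
def overall_likelihood_difference(decoded_loglik, alternative_loglik):
    """Difference in likelihood between the decoded segment and its first
    best alternative; a small value suggests the alternative may in fact
    be the correct segment."""
    return decoded_loglik - alternative_loglik

def homogeneity_count(word_scores, threshold):
    """Number of words in a segment whose homogeneity score exceeds the
    chosen threshold."""
    return sum(1 for s in word_scores if s > threshold)

def homogeneity_count_difference(decoded_scores, alternative_scores, threshold):
    """Difference in above-threshold word counts between the decoded
    segment and its best alternative; correct segments are expected to
    have the higher count."""
    return (homogeneity_count(decoded_scores, threshold)
            - homogeneity_count(alternative_scores, threshold))

def should_substitute(decoded_loglik, alternative_loglik,
                      decoded_scores, alternative_scores,
                      threshold, likelihood_margin):
    """Hypothetical combination rule: replace a segment flagged as
    incorrect when the likelihood gap to its best alternative is small
    and the alternative has the higher homogeneity count. (The paper's
    exact combination of the two measures is not specified here.)"""
    old = overall_likelihood_difference(decoded_loglik, alternative_loglik)
    hcd = homogeneity_count_difference(decoded_scores, alternative_scores, threshold)
    return old < likelihood_margin and hcd < 0
```

For example, a flagged segment whose best alternative is nearly tied in likelihood but scores better on homogeneity would be substituted: `should_substitute(-120.0, -121.5, [0.2, 0.4], [0.7, 0.9], threshold=0.5, likelihood_margin=5.0)` returns `True`.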
The output of the decoder was tagged as correct or incorrect by the confidence annotator [3], which identifies 80% of the incorrect words at a false alarm rate of 45%. As expected, correctly decoded segments show a higher Overall Likelihood Difference than incorrectly decoded segments, and correct segments show a higher Homogeneity Count than incorrect segments. A combined use of the two measures rectifies 50% of the segments where the decoded segment is incorrect and its best alternative is correct; only 15% of the correctly decoded segments were replaced by their incorrect best alternatives.

We propose to continue this work in the following directions:

1. Improving confidence annotation, and using the word lattice to identify correct segments.
2. Since many incorrect segments appear to be syntactically ill-formed, using parse scores as an additional measure.

[1] Rose R. C. et al., "Integration of Utterance Verification with Statistical Language Modeling and Spoken Language Understanding", Proc. ICASSP-98, Vol. 1, pp. 237-241, 1998.
[2] Jeanrenaud et al., "Large Vocabulary Word Scoring as a Basis of Transcription Generation", Proc. Eurospeech, pp. 2149-2152, 1995.
[3] Bansal D. and Ravishankar R., "New Features for Confidence Annotation", Proc. ICSLP-98, Paper No. 829, 1998.
[4] Chase L., "Error Responsive Feedback Mechanisms for Speech Recognizers", Ph.D. thesis, Carnegie Mellon University, TR No. CMU-RI-TR-97-18, Apr. 1997.