
Hypothesis


Single Word Recognition

For a single-word recognizer, each output is either right or wrong. The word error rate (WER) is then:
WER = wrong_words / total_words
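
The formula above can be sketched in a few lines of Python. This is only an illustration: the function name single_word_wer and the test words are hypothetical, and it assumes exactly one hypothesis per reference word.

```python
def single_word_wer(references, hypotheses):
    """Fraction of test words that were recognized incorrectly."""
    if len(references) != len(hypotheses):
        raise ValueError("one hypothesis is expected per reference word")
    if not references:
        raise ValueError("at least one test word is required")
    # Each output is simply right or wrong, so count the mismatches.
    wrong_words = sum(ref != hyp for ref, hyp in zip(references, hypotheses))
    return wrong_words / len(references)


# Hypothetical test set of four isolated words, one of them misrecognized.
print(single_word_wer(["yes", "no", "stop", "go"],
                      ["yes", "no", "stop", "no"]))  # 0.25
```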


Continuous Recognition

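The output of a continuous recognizer can contain three types of errors when compared to the reference sentence:
substitutions: a reference word is recognized as a different word
insertions: the hypothesis contains a word that has no counterpart in the reference
deletions: a reference word is missing from the hypothesis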

The total number of errors is the sum of all types. The word error rate (WER) is defined as:

WER = errors / reference_words

In most cases the WER is given in %. The word accuracy (WA) is:

WA = 100% - WER = 100% * (reference_words - errors) / reference_words

Note that the WA can be negative if there are more errors than reference words.
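
As a small worked example of these definitions, the following Python sketch computes WER and WA in percent from the three error counts. The function names are hypothetical, and the second call uses made-up counts chosen only to show how a negative WA arises.

```python
def word_error_rate(substitutions, deletions, insertions, reference_words):
    """WER in percent: all errors divided by the number of reference words."""
    if reference_words <= 0:
        raise ValueError("reference_words must be positive")
    errors = substitutions + deletions + insertions
    return 100.0 * errors / reference_words


def word_accuracy(substitutions, deletions, insertions, reference_words):
    """WA in percent, defined as 100% - WER."""
    return 100.0 - word_error_rate(
        substitutions, deletions, insertions, reference_words
    )


# One error of each type against a 5-word reference.
print(word_error_rate(1, 1, 1, 5))  # 60.0
print(word_accuracy(1, 1, 1, 5))    # 40.0

# Made-up case: 3 insertions against a 2-word reference.
# There are more errors than reference words, so the WA is negative.
print(word_accuracy(0, 0, 3, 2))    # -50.0
```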


Scoring tools

A scoring tool aligns the hypothesis of the recognizer with the reference sentence and reports the number of correct words, errors and reference words. Some tools can also indicate where the errors occurred:
REF: HI  HOW are you ** today 
HYP: HEY *** are you IN today 

WORD RECOGNITION PERFORMANCE:
Correct          =  60.0% (     3)
Substitutions    =  20.0% (     1)
Deletions        =  20.0% (     1)
Insertions       =  20.0% (     1)
Errors           =  60.0% (     3)

Ref. words       =      5
Hyp. words       =      5
WORD ACCURACY    =  40.0%
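
The alignment step can be illustrated with a minimal Python sketch. It assumes a standard minimum-edit-distance (Levenshtein) alignment with a cost of 1 for every error type; the text does not say which alignment a particular scoring tool uses, so this is not a description of any specific tool. The function name align_counts is hypothetical.

```python
def align_counts(ref, hyp):
    """Align two word lists with minimal errors.

    Returns (correct, substitutions, deletions, insertions).
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimal number of errors aligning ref[:i] with hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # all reference words deleted
    for j in range(1, m + 1):
        cost[0][j] = j  # all hypothesis words inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match_or_sub = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = cost[i - 1][j] + 1
            insertion = cost[i][j - 1] + 1
            cost[i][j] = min(match_or_sub, deletion, insertion)

    # Walk back from the end of both sentences and count each step type.
    correct = subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            if ref[i - 1] == hyp[j - 1]:
                correct += 1
            else:
                subs += 1
            i -= 1
            j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return correct, subs, dels, ins


ref = "hi how are you today".split()
hyp = "hey are you in today".split()
correct, subs, dels, ins = align_counts(ref, hyp)
print(correct, subs, dels, ins)  # 3 1 1 1
```

For the sentence pair shown above, this gives 3 correct words and one error of each type, matching the counts in the report. When several alignments have the same minimal cost, a tool may place the individual errors at different positions.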


Maintainer: westphal@ira.uka.de