Testing

Now let's see how good our recognizer is. For testing we need most of the objects that were also needed for training, so we can basically use the same startup procedure as before; we can even omit the creation of the path and the HMM object. In addition, we need a few more objects for the search, plus a vocabulary file and a language model. We can write a vocabulary file that covers the entire training set as follows:

# write one vocabulary word per line, one for every dictionary entry
set fp [open vocab w]
foreach w [dict:] { puts $fp $w }
close $fp

The language model file is a bit more complicated. We will not discuss every line of a script that produces a language model for a given database; you can use the ready-to-run script from the scripts thread. The script there is rather simple: it uses an array lm, where lm(cnt) is the total count of all words, lm(1,v) contains the occurrence frequency of word v, and lm(2,v,w) contains the frequency of the bigram v w. After the lmUpdate procedure has been called, lm(p,1,v) contains the unigram probability of word v, and lm(p,2,v,w) contains the bigram probability of the bigram v w.
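
To give you an idea of what such a script does, here is a minimal sketch of the counting and of the probability update. The procedure names lmAccu and lmUpdate and the use of plain, unsmoothed relative frequencies are only illustrative assumptions made here; the ready-to-run script from the scripts thread takes care of details such as smoothing and the language model file format.

# sketch only: count words and bigrams of one transcription (a Tcl list of words)
proc lmAccu { words } {
  global lm
  if { ![info exists lm(cnt)] } { set lm(cnt) 0 }
  set prev ""
  foreach w $words {
    incr lm(cnt)
    if { ![info exists lm(1,$w)] } { set lm(1,$w) 0 }
    incr lm(1,$w)
    if { $prev != "" } {
      if { ![info exists lm(2,$prev,$w)] } { set lm(2,$prev,$w) 0 }
      incr lm(2,$prev,$w)
    }
    set prev $w
  }
}

# sketch only: turn the counts into (unsmoothed) relative frequencies
proc lmUpdate { } {
  global lm
  foreach key [array names lm 1,*] {
    set v [string range $key 2 end]
    set lm(p,1,$v) [expr {double($lm(1,$v)) / $lm(cnt)}]
  }
  foreach key [array names lm 2,*] {
    set pair [string range $key 2 end]
    set v [lindex [split $pair ,] 0]
    set lm(p,2,$pair) [expr {double($lm($key)) / $lm(1,$v)}]
  }
}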

You can find the complete test script in the scripts thread. On this page we will explain each step of it.

After you have created a vocabulary and a language model file, let's now build the search objects:

Vocab voc vocab -dictionary dict -acousticModel amo
Lm lm voc langmod -weight 16 -penalty 0
Search search voc lm 
search.treeFwd configure -beamWidth 200 -topN 50 -phoneBeamWidth 200 \
                         -lastPhoneBeamWidth 120 -wordBeamWidth 150  \
                         -lastPhoneAloneBeamWidth 120

Here we have created the vocabulary object voc, based on the local file "vocab", the current dictionary dict, and the current acoustic model set amo.

The language model object lm is based on the local file "langmod". The -weight option defines the weighting of the language model versus the acoustic model: the higher this value is, the more emphasis we put on the language model. For now you will just have to trust that 16 is a reasonable value. The same is true for the -penalty option, the penalty that regulates the number of words in a hypothesis.
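
As a rough mental model only (the exact formula used inside the decoder may differ), you can think of the decoder as ranking hypotheses by a combined cost of the form

  total score  =  acoustic score  +  weight * language model score  +  penalty * (number of words)

so the weight scales the influence of the language model, and the penalty adds a fixed contribution per hypothesized word and thereby regulates how many words end up in a hypothesis.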

Finally we can create the search object, which is based on the two previously created objects voc and lm. The beam widths must be set experimentally as optimal values are highly task dependent.

Now we can start defining our search procedure:

proc testOne { utt } {
  set uttinfo [db get $utt]        ;# look up the utterance information in the database
  search treeFwd -eval $uttinfo    ;# run the tree-forward pass on this utterance
  puts [search.treeFwd.hypoList puts -id $utt -style simple]   ;# print the recognized hypothesis
}

The $uttinfo variable is obtained in the same way as in the training scripts. The actual trigger for the decoding is the treeFwd method of the search object, which makes the search perform the so-called "tree-forward" pass. There are other passes available, but we won't discuss them now; later we'll run a multi-pass decoding. The puts command displays the recognized hypothesis.
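
If you want to keep the hypotheses for later scoring instead of just printing them, a small variation of the procedure writes them to an open file channel. The name testOneToFile and the idea of passing the channel as an argument are just our choice here, not part of the official scripts:

proc testOneToFile { utt fp } {
  set uttinfo [db get $utt]
  search treeFwd -eval $uttinfo
  puts $fp [search.treeFwd.hypoList puts -id $utt -style simple]
}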

To test on the entire test set, we'd run the following loop:

set fp [open ../step1/testIDs r]
while { [gets $fp utt] != -1 } {
  puts "testing utterance $utt"
  testOne $utt
}
close $fp
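
Using the file-writing variant from above, the same loop can collect all hypotheses in a single file (the filename hypos is again just an example):

set out [open hypos w]
set fp  [open ../step1/testIDs r]
while { [gets $fp utt] != -1 } {
  puts "testing utterance $utt"
  testOneToFile $utt $out
}
close $fp
close $out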