Here we will only show how the most common three-pass search works. It consists of a tree-forward pass, a flat-forward pass, and a lattice pass, followed by rescoring of the lattice output with different language model parameters.
This time we will also use the configuration options for the search parameters, especially the search beams. The startup for the full-fledged search differs only slightly from that of the simple search we used before:
    [FeatureSet fs] setDesc @../step5/featDesc
    fs setAccess @../step2/featAccess
    [CodebookSet cbs fs] read ../step14/codebookSetClustered
    [DistribSet dss cbs] read ../step14/distribSetClusteredPruned
    [PhonesSet ps] read ../step14/phonesSet
    [Tags tags] read ../step2/tags
    [Tree dst ps:phones ps tags dss] read ../step14/distribTreeClusteredPruned
    SenoneSet sns [DistribStream str dss dst]
    [TmSet tms] read ../step2/transitionModels
    [TopoSet tps sns tms] read ../step2/topologies
    [Tree tpt ps:phones ps tags tps] read ../step2/topologyTree
    [DBase db] open ../step1/db.dat ../step1/db.idx -mode r
    [Dictionary dict ps:phones tags] read ../step1/convertedDict
    [FMatrix ldaMatrix] bload ../step15/ldaMatrix
    AModelSet amo tpt ROOT
    dst configure -padPhone [ps:phones index pad]
    cbs load ../step17/codebookWeights.3
    dss load ../step17/distribWeights.3

    Search  configure -silenceWordPenalty 10
    TreeFwd configure -beamWidth 200 -topN 50 -phoneBeamWidth 200 \
                      -lastPhoneBeamWidth 120 -wordBeamWidth 150 \
                      -lastPhoneAloneBeamWidth 120
    FlatFwd configure -beamWidth 250 -topN 50 -phoneBeamWidth 250 \
                      -lastPhoneBeamWidth 180 -wordBeamWidth 190
    Lattice configure -beamWidth 120 -topN 40

    set baseLz 16
    set baseLp -16

    Vocab voc ../step7/vocab -dictionary dict -acousticModel amo
    Lm lm voc langmod -weight $baseLz -penalty $baseLp
    Search search voc lm

You can see that we are creating the same objects as before, only this time we configure some default beams in advance. Every search pass has its own beams, penalties, and other such parameters. Refer to the documentation of the search module for detailed information about the meaning of each beam value. For now you can just use the values above; they should be fine for our experiments.
As before, we use a Tcl procedure, testOne, to test a single utterance:
    proc testOne { utt } {
        global rr lzList lpList
        set lzList {8 16 32 64 128}
        set lpList {-8 -4 0 4 8}
        set uttinfo [db get $utt]
        makeArray infoArray $uttinfo
        search treeFwd -eval $uttinfo
        set hypo [hypo search.treeFwd.hypoList]
        puts "==tree==$utt=== [recogRate $infoArray(text) $hypo]"
        search flatFwd
        set hypo [hypo search.flatFwd.hypoList]
        puts "==flat==$utt=== [recogRate $infoArray(text) $hypo]"
        search lattice
        foreach lz $lzList {
            foreach lp $lpList {
                search.lattice rescore -lz $lz -lp $lp
                set hypo [hypo search.lattice.hypoList]
                set rr($lz,$lp) [recogRate $infoArray(text) $hypo]
            }
        }
        reportRecogRate $utt
    }

You can see that we have introduced a few extra procedures. The procedure hypo returns the plain hypothesis without anything attached to it. The procedure recogRate returns the recognition accuracy as a percentage, and the procedure reportRecogRate prints all the accuracies for the different lattice rescorings.
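The nested rescoring loop stores one accuracy per (lz, lp) pair in the array rr. The following is a minimal stand-alone sketch of that bookkeeping in plain Tcl, with no Janus objects involved. The procedure fakeRate is a hypothetical stand-in for the rescore-and-score step and returns made-up numbers, and bestCell is an illustrative helper that is not part of the tutorial scripts.

```tcl
# Hypothetical stand-in for "rescore the lattice, then call recogRate".
# It returns a made-up accuracy that peaks at lz=32, lp=0.
proc fakeRate { lz lp } {
    return [expr {100.0 - abs($lz - 32) / 4.0 - abs($lp)}]
}

# Illustrative helper (not in the tutorial): return the "lz,lp" key of
# the highest value stored in the array whose name is passed in.
proc bestCell { arrName lzList lpList } {
    upvar 1 $arrName rr
    set best ""
    foreach lz $lzList {
        foreach lp $lpList {
            if { $best eq "" || $rr($lz,$lp) > $rr($best) } {
                set best "$lz,$lp"
            }
        }
    }
    return $best
}

set lzList {8 16 32 64 128}
set lpList {-8 -4 0 4 8}

# Same loop structure as in testOne: one array entry per parameter pair.
foreach lz $lzList {
    foreach lp $lpList {
        set rr($lz,$lp) [fakeRate $lz $lp]
    }
}

set best [bestCell rr $lzList $lpList]
puts "best lz,lp = $best with accuracy $rr($best)"
```

Keying the array by the string "$lz,$lp" is what lets a reporting procedure later walk the same two lists and look up each cell.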
Here come the procedures' definitions. Let's start with recogRate:
    proc recogRate { corr hypo } {
        set ali [lindex [align $corr $hypo] 9]
        set errN 0
        foreach i $ali { if { $i != "c" } { incr errN } }
        return [expr 100.0 * (1.0 - $errN.0 / [llength $corr])]
    }

This procedure calls the Janus built-in command align, whose return value is a list that contains a lot of alignment information. We are only interested in the list element at index 9, which is itself a list whose elements are either c (correct), s (substitution), i (insertion), or d (deletion). A hypothesis that is completely correct will have only c's in that list; anything other than a c is counted as an error. The number of errors divided by the length of the correct transcription is the error rate, and 100% minus the error rate is the recognition accuracy, which is what the procedure returns.
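To make the arithmetic concrete, here is a small sketch in plain Tcl that applies the same formula to a hand-written list of alignment labels, so it runs without Janus. The procedure name accuracyFromLabels and the example label lists are invented for illustration; they are not output of the align command.

```tcl
# Illustrative helper (hypothetical name): compute the accuracy from a
# list of alignment labels (c, s, i, d) and the number of reference words.
proc accuracyFromLabels { labels refLen } {
    set errN 0
    foreach l $labels {
        if { $l ne "c" } { incr errN }
    }
    return [expr {100.0 * (1.0 - double($errN) / $refLen)}]
}

# Reference of 4 words with one substitution: 1 error out of 4 words.
puts [accuracyFromLabels {c c s c} 4]

# Reference of 2 words with three inserted words: 3 errors out of 2 words.
puts [accuracyFromLabels {c i i i c} 2]
```

Because insertions count as errors but do not lengthen the reference, the error count can exceed the number of reference words, and the accuracy computed this way can fall below zero.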
The procedure hypo calls the puts method of the HypoList object and removes all parentheses and dollar characters from the result:
    proc hypo hl {
        regsub -all {\(|\)|\$} [$hl puts -id "" -style simple] "" hypo
        return $hypo
    }

The reportRecogRate procedure prints a matrix containing all the recognition rates stored in the global rr array. It does nothing but Tcl text processing, so we won't discuss it further. Here it is:
    proc reportRecogRate { utt } {
        global rr lzList lpList
        puts $utt
        puts -nonewline "lz\\lp |"
        foreach lp $lpList { puts -nonewline [format " %4d |" $lp] }
        puts -nonewline "\n------+"
        foreach lp $lpList { puts -nonewline "------+" } ; puts ""
        foreach lz $lzList {
            puts -nonewline [format "%5s" $lz] ; puts -nonewline " |"
            foreach lp $lpList {
                puts -nonewline [format "%6.1f" $rr($lz,$lp)] ; puts -nonewline "|"
            }
            puts ""
        }
    }

The main loop of the test script looks like the one from the previous tests. In addition, we redefine the itfOutput procedure to keep Janus from flooding us with information that we don't need at the moment:
    set body [info body itfOutput]
    set head { if {$cmd == "INFO"} return }
    eval proc itfOutput \{cmd mode text file line major minor\} \{$head \; $body \}

    foreach utt [db] {
        puts "testing utterance $utt"
        testOne $utt
    }
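The itfOutput redefinition works by fetching the existing procedure body with info body and prepending an early return for INFO messages. The sketch below shows the same technique in plain Tcl. The itfOutput defined here is only a stand-in that records messages in a list; it assumes the seven-argument signature used above and says nothing about what the real Janus procedure does. The wrapped body is built with a double-quoted string rather than eval, which should have the same effect.

```tcl
# Stand-in for the Janus itfOutput procedure (assumption: same seven
# arguments as in the redefinition above). It just records each message.
set log {}
proc itfOutput { cmd mode text file line major minor } {
    global log
    lappend log "$cmd: $text"
}

# Fetch the current body and prepend a guard that drops INFO messages.
set body [info body itfOutput]
set head { if {$cmd == "INFO"} return }
proc itfOutput { cmd mode text file line major minor } "$head\n$body"

# The INFO message is now swallowed; the WARNING still goes through.
itfOutput INFO    0 "loading codebooks" demo.c 1 0 0
itfOutput WARNING 0 "beam too narrow"   demo.c 2 0 0
puts $log
```

Since the original body is reused unchanged, all other message types keep whatever behavior they had before the redefinition.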