Running the Multi-Pass Decoder

So far we have run only simple test sessions that used just the tree-forward pass. Besides the tree-forward pass, Janus offers several more passes. We will not discuss their meaning on this page; see the Janus documentation or the discussion pages for details.

Here we only show how the most common three-pass search works: a tree-forward pass, then a flat-forward pass, then a lattice pass, and finally the rescoring of the resulting lattice with different language model parameters.

This time we also use the configuration options for the search parameters, especially the search beams. The startup for the full-fledged search differs only slightly from the simple search we used before:

[FeatureSet fs] setDesc   @../step5/featDesc
            fs  setAccess @../step2/featAccess

[CodebookSet cbs fs]                read ../step14/codebookSetClustered
[DistribSet  dss cbs]               read ../step14/distribSetClusteredPruned
[PhonesSet ps]                      read ../step14/phonesSet
[Tags tags]                         read ../step2/tags
[Tree dst ps:phones ps tags dss]    read ../step14/distribTreeClusteredPruned

SenoneSet sns [DistribStream str dss dst]

[TmSet tms]                         read ../step2/transitionModels
[TopoSet tps sns tms]               read ../step2/topologies
[Tree tpt ps:phones ps tags tps]    read ../step2/topologyTree
[DBase db]                          open ../step1/db.dat ../step1/db.idx -mode r
[Dictionary dict ps:phones tags]    read ../step1/convertedDict
[FMatrix ldaMatrix]                bload ../step15/ldaMatrix

AModelSet amo tpt ROOT
dst configure -padPhone [ps:phones index pad]

cbs load ../step17/codebookWeights.3
dss load ../step17/distribWeights.3

Search configure  -silenceWordPenalty 10
TreeFwd configure -beamWidth 200 -topN 50 -phoneBeamWidth 200 \
                  -lastPhoneBeamWidth 120 -wordBeamWidth 150  \
                  -lastPhoneAloneBeamWidth 120
FlatFwd configure -beamWidth 250 -topN 50 -phoneBeamWidth 250 \
                  -lastPhoneBeamWidth 180 -wordBeamWidth 190
Lattice configure -beamWidth 120 -topN 40

set baseLz   16
set baseLp  -16

Vocab voc ../step7/vocab -dictionary dict -acousticModel amo
Lm lm voc langmod -weight $baseLz -penalty $baseLp
Search search voc lm

As you can see, we create the same objects as before, but this time we configure some default beams in advance. Every search pass has its own beams, penalties, and other parameters. Refer to the documentation of the search module for the meaning of the many different beam values. For now, you can simply use the values above; they should be fine for our experiments.
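The beam options above share one basic idea. The following is a generic sketch in plain Tcl (it runs in tclsh, no Janus needed) of how a beam width and a topN limit could prune a set of partial hypotheses. It assumes that scores are costs, where lower is better, and that topN simply caps the number of survivors; the procedure beamPrune and the sample data are hypothetical, and the real Janus search is more elaborate (note the separate phone, word, and last-phone beams configured above).

```tcl
# Hypothetical sketch of beam pruning (not Janus code).
# Each hypothesis is a {name cost} pair; lower cost is better.
proc beamPrune {hyps beamWidth topN} {
  if {[llength $hyps] == 0} { return {} }
  # Find the best (lowest) cost among all hypotheses.
  set best [lindex $hyps 0 1]
  foreach h $hyps {
    if {[lindex $h 1] < $best} { set best [lindex $h 1] }
  }
  # Keep only the hypotheses whose cost is within beamWidth of the best.
  set kept {}
  foreach h $hyps {
    if {[lindex $h 1] <= $best + $beamWidth} { lappend kept $h }
  }
  # Sort the survivors by cost and keep at most topN of them.
  set kept [lsort -real -index 1 $kept]
  return [lrange $kept 0 [expr {$topN - 1}]]
}

set hyps {{a 100.0} {b 130.0} {c 310.0} {d 150.0}}
puts [beamPrune $hyps 200 2]   ;# → {a 100.0} {b 130.0}
puts [beamPrune $hyps 20 50]   ;# → {a 100.0}
```

A wider beam keeps more hypotheses alive, which costs time but lowers the risk of pruning away the correct path; that trade-off is what the experiments at the end of this page explore.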

As before, we use a Tcl procedure, testOne, to test a single utterance:

proc testOne { utt } {

  global rr lzList lpList

  set lzList {8 16 32 64 128}
  set lpList {-8 -4 0 4 8}

  set uttinfo [db get $utt]
  makeArray infoArray $uttinfo
  search treeFwd -eval $uttinfo
  set hypo [hypo search.treeFwd.hypoList]
  puts "==tree==$utt=== [recogRate $infoArray(text) $hypo]"
  search flatFwd
  set hypo [hypo search.flatFwd.hypoList]
  puts "==flat==$utt=== [recogRate $infoArray(text) $hypo]"
  search lattice
  foreach lz $lzList { foreach lp $lpList {
    search.lattice rescore -lz $lz -lp $lp
    set hypo [hypo search.lattice.hypoList]
    set rr($lz,$lp) [recogRate $infoArray(text) $hypo]    
  } }
  reportRecogRate $utt
}

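You can see that we have introduced a few extra procedures. The procedure hypo returns the plain hypothesis, with nothing attached to it. The procedure recogRate returns the recognition accuracy as a percentage, and the procedure reportRecogRate prints the accuracies for all the different lattice rescorings. The rescoring loop tries every combination of the values in lzList and lpList (the language model weight and penalty, compare the -weight and -penalty options of the Lm object), so each utterance is rescored 5 × 5 = 25 times.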

Here are the procedure definitions, starting with recogRate:

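proc recogRate { corr hypo } {
  # element 9 of align's result: one c/s/i/d label per aligned word
  set ali [lindex [align $corr $hypo] 9]
  set errN 0
  # every label other than c (correct) counts as an error
  foreach i $ali { if { $i != "c" } { incr errN } }
  return [expr {100.0 * (1.0 - double($errN) / [llength $corr])}]
}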

This procedure calls the Janus built-in command align, whose return value is a list containing a lot of alignment information. We are only interested in the element at index 9 (Tcl lists are indexed from zero), which is itself a list whose elements are c (correct), s (substitution), i (insertion), or d (deletion). A completely correct hypothesis has only c's in that list; anything other than c counts as an error. The number of errors divided by the number of words in the correct transcription is the error rate, and 100% minus the error rate is the recognition accuracy, which is what the procedure returns. Because insertions count as errors too, the accuracy can become negative for a hypothesis with many inserted words.
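To see the arithmetic of recogRate in isolation, here is a plain-Tcl sketch that starts from a ready-made alignment list instead of calling Janus's align. The procedure name accuracyFromAli and the sample alignments are hypothetical; refLen stands for the number of words in the correct transcription, which recogRate obtains with llength.

```tcl
# Hypothetical helper: same arithmetic as recogRate, but it takes the
# alignment labels (c, s, i, d) directly instead of calling align.
proc accuracyFromAli {ali refLen} {
  set errN 0
  foreach op $ali {
    # every label other than c (correct) counts as one error
    if {$op ne "c"} { incr errN }
  }
  return [expr {100.0 * (1.0 - double($errN) / $refLen)}]
}

# 4 reference words, one substitution
puts [accuracyFromAli {c c s c} 4]   ;# → 75.0
# 2 reference words (c and s) plus two insertions: 3 errors
puts [accuracyFromAli {c i i s} 2]   ;# → -50.0
```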

The procedure hypo calls the puts method of the HypoList object and removes all parentheses and dollar characters from the result:

proc hypo { hl } {
  # print the hypothesis in simple style without an ID, then strip ( ) and $
  regsub -all {\(|\)|\$} [$hl puts -id "" -style simple] "" hypo
  return $hypo
}
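The regular expression in hypo can be tried out in tclsh without Janus. The input string below is made up: we are assuming, not quoting, what the simple-style output of a HypoList looks like.

```tcl
# Hypothetical input: sentence boundaries shown as "(", "$" and ")" tokens.
set raw {( $ HELLO WORLD $ )}
# Same regular expression as in hypo: delete "(", ")" and "$".
regsub -all {\(|\)|\$} $raw "" cleaned
# The leftover blanks are harmless once the string is treated as a Tcl list.
puts [llength $cleaned]       ;# → 2
puts [string trim $cleaned]   ;# → HELLO WORLD
```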

The reportRecogRate procedure prints a table of all the recognition rates stored in the global array rr, with one row per lz value and one column per lp value. It does nothing but Tcl text formatting, so we won't discuss it further. Here it is:

proc reportRecogRate { utt } {

  global rr lzList lpList
  
  puts $utt
  puts -nonewline "lz\\lp |"
  foreach lp $lpList { puts -nonewline [format " %4d |" $lp] }
  puts -nonewline "\n------+"
  foreach lp $lpList { puts -nonewline "------+" } ; puts ""
  foreach lz $lzList { 
    puts -nonewline [format "%5s" $lz] ; puts -nonewline " |"
    foreach lp $lpList {
        puts -nonewline [format "%6.1f" $rr($lz,$lp)] ; puts -nonewline "|"
    }
    puts ""
  }
}
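If you would rather have the script name the best rescoring parameters than read them off the table, a small helper can scan the array. This is a hypothetical addition, not part of the original script; bestLzLp and the sample accuracies are made up for illustration. It uses upvar so that it works on any array laid out like rr, with keys of the form lz,lp.

```tcl
# Hypothetical helper: return {lz lp accuracy} for the best cell of an
# rr-style array whose keys have the form "lz,lp".
proc bestLzLp {arrName lzList lpList} {
  upvar 1 $arrName rates
  set best {}
  foreach lz $lzList {
    foreach lp $lpList {
      set r $rates($lz,$lp)
      # keep the first cell with the highest accuracy
      if {$best eq {} || $r > [lindex $best 2]} {
        set best [list $lz $lp $r]
      }
    }
  }
  return $best
}

# made-up accuracies, only for illustration
array set demo {8,-8 60.0 8,0 70.0 16,-8 75.0 16,0 72.5}
puts [bestLzLp demo {8 16} {-8 0}]   ;# → 16 -8 75.0
```

A single utterance says little on its own; for a whole test set you would accumulate errors over all utterances before choosing lz and lp.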

The main loop of the test script looks like the one from the previous tests. In addition, we redefine the itfOutput procedure so that Janus does not flood us with information we don't need at the moment: the new version starts with a check that returns immediately for INFO messages and otherwise runs the original body.

set body [info body itfOutput]
set head { if {$cmd == "INFO"} return }
eval proc itfOutput \{cmd mode text file line major minor\} \{$head \; $body \}

foreach utt [db] {
  puts "testing utterance $utt"
  testOne $utt
}
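The itfOutput trick, reading the existing body with info body and defining a new procedure whose body starts with an early-return guard, works for any Tcl procedure. The sketch below applies it to a toy procedure report, a hypothetical stand-in for itfOutput that runs in plain tclsh. Instead of spelling out the argument list, it reuses info args; that only works if the procedure has no default argument values, because info args reports names only.

```tcl
# Toy stand-in for itfOutput (hypothetical; the real one comes with Janus).
proc report {cmd text} {
  return "$cmd $text"
}

# Prepend a guard that returns early for INFO messages, keeping the old body.
set head {if {$cmd eq "INFO"} return}
proc report [info args report] "$head\n[info body report]"

puts [report WARN "disk almost full"]   ;# → WARN disk almost full
puts [report INFO "loading codebook"]   ;# prints an empty line
```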

Some Experiments

Run a few test experiments using different values for the search beams. See whether you can make Janus run slower by using larger beams, and whether you can make the recognition accuracy deteriorate by using smaller beams. Also see what happens if you use other lz and lp values.
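To build some intuition for the lz and lp experiments, the sketch below picks the better of two competing hypotheses under a deliberately simplified scoring model: total cost = acoustic cost + lz × language model cost + lp × number of words, lower being better. This formula, the procedure pickBest, and all numbers are assumptions for illustration and not Janus's exact lattice rescoring; they only show why the winning hypothesis can change as lz and lp vary.

```tcl
# Toy model (hypothetical, not Janus's exact scoring): each candidate is
# {words amCost lmCost}; cost = amCost + lz*lmCost + lp*nWords, lower wins.
proc pickBest {cands lz lp} {
  set best {}
  set bestCost {}
  foreach c $cands {
    lassign $c words am lm
    set cost [expr {$am + $lz * $lm + $lp * [llength $words]}]
    if {$bestCost eq {} || $cost < $bestCost} {
      set bestCost $cost
      set best $words
    }
  }
  return $best
}

set cands {{{HELLO WORLD} 100.0 5.0} {{HELLO WOR LD} 90.0 8.0}}
puts [pickBest $cands 1 0]   ;# → HELLO WOR LD  (acoustics dominate)
puts [pickBest $cands 8 0]   ;# → HELLO WORLD   (larger LM weight)
puts [pickBest $cands 2 8]   ;# → HELLO WORLD   (penalty per word)
```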