Lectures

Exercise 7
Search

Introduction

In the following session we are talking about objects needed for doing the decoding step. Since decoding is a process of picking the best hypothesis among a large number of possible hypotheses, we call this process search.

Search passes

Look at the script secondTest-0.tcl which you can download from here to make your changes on your local copy. This scripts performs three search passes explained below and calculates the word accuray of the recognized hypotheses (cumulative).

The core procedure of the script is testOne which calls:

search treeFwd -eval $uttInfo
You already applied this tree forward pass in earlier sessions for calculating the recognition results. The name of this search pass comes from the tree structure in which the lexicon is organized. This tree structure may causes search errors for the following reason: within a word the dynamic search procedure decides in each frame for the best matching acoustic model, but on a word boundary the language model probability has to be taken into account. The problem with considering the language model is, that the best word candidate is not known at the time the tree root (let's say the root node "a" for all words starting with an a) is entered.
search flatFwd
From the treeFwd pass we get the information which words are likely to be uttered in the sentences. The list of these words is used to perform the flatFwd search pass. This time the lexicon is not organized as a tree but in a linear data structure. The calculation of this pass can be performed with reasonable computational effort since the wordlist is already reduced compared to the wordlist used in the treeFwd.
search lattice
In this pass a lattice graph is produced which represents possible hypotheses
search.lattice rescore -lz $lz -lp $lp
The resulting lattice is rescored using different pairs of lz and lp values, i.e. this pass outputs the hypothesis which gets the highest score given lz,lp. The term lz refers to the weighting factor for the language model compared to the acoustic model and lp is the word transition penalty. This rescoring procedure enables us to try several lz/lp pairs without recalculating the treeFwd and flatFwd again and again. Since the lattice represents only a subset of all possible search hypotheses, it may happen that the lz/lp pair which gave the highest word accuracy in rescoring does not have to be the pair which is the best for the actual recognition (baseLz and baseLp).

Test using Tree, Flat and Lattice pass

Task 21: Run a test on your test set using the script secondTest-0.tcl. To do so create a subdirectory test in your directory steps. You may change some of the files to be loaded in the testscript (i.e. convertedDict). Redirect the recognizer output in order to look at the results without ruffle and to compare it to other test runs:

time janusS secondTest-0.tcl >& logTest-0 &

Possible values for the word transition penalty are positive numbers or negative numbers, therefore we start with set baseLp 0 .

Possible values for the language model weight are positive numbers. The language model penalty should be of the same order as the acoustic score for a word, therefore we start with set baseLz 32 .

If the command time precedes a program, it will output the run-time of this program when finished. Have a look at the man-pages man time during running the test.

Question 11:

What was the run-time of the test?
What is the word accuracy for the tree pass over your test set?
What is the word accuracy for the flat pass over your test set?
Which lz,lp pair in the lz,lp matrix gives the highest accuracy?
What is the highest word accuracy in the lz,lp matrix?

Configuration of the language model parameter `lz` and `lp`

Task 22: The lz,lp parameter found with the lattice pass often fit better than the original used baseLz,baseLp ones. For our data it might be something like lz = 16 and lp = -4. It is likely to get a better lattice if we use those better parameters for the actual decoding pass. In order to do so chose the lz,lp pair that gave the highest accuracy as parameters for baseLz,baseLp. Make a copy of the script secondTest-0.tcl into secondTest-1.tcl and change baseLz and baseLp accordingly. Also change the lzList and lpList in order to rebalance the matrix. Reasonable values for lzList are now {2 4 8 16 32 64 128} and for lpList {-16 -8 -4 0 4}
Start the next test run:

time janusS secondTest-1.tcl >& logTest-1 &

In real life the next step would be to get better lz,lp parameter by testing lz,lpLists of finer granularity. However, this only makes sense if your test set is large enough. If the distance between baseLz and the best lz is very large, you should prefer not to substitute baseLz by lz but choosing a value somewhere in between. The same is true for baseLp and lp.

If you perform the test on a too small test set the lz,lp matrix is not very smooth. For this reason it is not very reasonable to transfer the results to new (unseen) data.

In general, tuning of the lz,lp parameter would be performed on a cross-validation set not on the test set. The reason we are doing it on the test set in this speech lab is that we are working on a very small data set in which test and train are overlapping anyway.

The lattice should contain a large variety of hypotheses, otherwise the parameter tuning would not be sucessful. To see how large your lattice gets, look for the recognizer output.

INFO lattice.c (0708) Lattice 103 nodes 191 links
This line tells you that the lattice contains 103 nodes (with words attached to it) and 191 links between those nodes, which is large enough for the parameter tuning in our case.

The cross-validation set you are choosing for the parameter tuning should be as close as possible to the true test conditions, i.e. that the subset should not overlap with the training set, since recognition of utterances seen in the training is much easier and requires different lzlp parameter.

Question 12:

What was the run-time of the test?
What is the word accuracy for the tree pass over your test set?
What is the word accuracy for the flat pass over your test set?
Which lz,lp pair in the lz,lp matrix gives the highest accuracy?
What is the highest word accuracy in the lz,lp matrix?

Configuration of the pruning beams

In order to build efficient recognition engines Janus provides you with lots of different beams:

-beamWidth: Pruning at transitions within phonemes
-phoneBeamWidth: Pruning at transitions between phonemes
-wordBeamWidth: Pruning at transitions into words
-lastPhoneBeamWidth: Pruning at transitions into the last phoneme
-lastPhoneAloneBeamWidth: Only for the tree pass: Pruning at transitions within the last phoneme of a word (pruning against other last phonemes, not against middle or first phonemes of a word)

To make life easier in this speech lab we set all beams to the same value onebeam.

Task 23: Make another copy of the script secondTest-0.tcl into secondTest-2.tcl and change the values for all beams to half of the former size (500 -> 250). Don't change the topN parameter. Do a third copy secondTest-3.tcl and set all beams to quarter of the former size (500 -> 125), again don't change topN. Start the next test run and log your results:
time janusS secondTest-2.tcl >& logTest-2 & time janusS secondTest-3.tcl >& logTest-3 &

The beam parameter should be as large as possible (reasonable run-time). A good starting point is a beam size which corresponds to an average score of two phonems (12 frames). If your test sentence got a score of 3856 for 244 frames than the score for 12 frames is about 180. In order to pick a beam size which is not too small we start with 500.

The parameter topN gives the maximal number of word ends to be active at the same time. It depends on the size of the vocabulary, an upper bound for topN is about 10% of the size of the vocabulary. Reasonable values lie between 10 and 30 (fast demo system) and 50 to 200 for evaluation systems (offline high performant systems).

If the score of a sentence does not change after reducing the beam and topN parameter by 20-50%, the former values were too large.

If the score of a sentence does change or the hypothesis changes after increasing the beam and topN parameter by about 50%, the former values were too small.

Question 13:

What was the run-time of the tests?
Did the recognizer speed up? Take a look at the recognizers output: End of Tree Forward (xxx sec)
Did the lattice size grow?
Did the scores change? Take a look at the recognizers output: score xxx
Did the performance change?
Did the hypotheses change?
What about the size of the beam? Is 500 to big? Why?
What about the size of the beam? Is 125 to small? Why?
Give some criteria for a good beam size, what is a good beam for your recognizer?

Last modified: Thu Apr 19 22:29:55 EDT 2001
Maintainer: tanja@cs.cmu.edu.

Exercise 7 Search

Search passes

Test using Tree, Flat and Lattice pass

Configuration of the language model parameter lz and lp

Configuration of the pruning beams

Exercise 7
Search

Configuration of the language model parameter `lz` and `lp`