Starting Up a First Recognizer

In this step we will check if the creation of the description files went all right. We will start up the newly created environment and have a look at some features, some weights, and Viterbi paths.

The Startup Script

Let's jump right into it (in the step3 directory):
[FeatureSet fs] setDesc   @../step2/featDesc
            fs  setAccess @../step2/featAccess

[CodebookSet cbs fs]                read ../step2/codebookSet
[DistribSet  dss cbs]               read ../step2/distribSet
[PhonesSet ps]                      read ../step2/phonesSet
[Tags tags]                         read ../step2/tags
[Tree dst ps:phones ps tags dss]    read ../step2/distribTree

SenoneSet sns [DistribStream str dss dst]

[TmSet tms]                         read ../step2/transitionModels
[TopoSet tps sns tms]               read ../step2/topologies
[Tree tpt ps:phones ps tags tps]    read ../step2/topologyTree
You've seen the creation of all these objects before, so there should be nothing new for you, except that this time we are not building up the objects ourselves by adding every item explicitly. Instead, we just read the previously written description files.

So start a Janus process, and have it execute the above lines. If everything works without problems, then the environment files should be fine.
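
If you prefer not to type these lines interactively, you can also put them into a script file and load it from within Janus. A minimal sketch, assuming you saved the lines above as a file called startup.tcl in the step3 directory (a Janus script is plain Tcl, so the standard source command works):

source startup.tcl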

Loading the Acoustic Parameters

Now let's go for some acoustic parameters (weights). Remember that the given archive contained some generic weights. Fortunately they are accumulators for codebooks of 16 vectors with 16 dimensions each, trained with the same preprocessing that we use in our feature description file. If this were not the case, we would have to write a suitable feature description, start with random weights, or use some labels and continue at a stage of this tutorial where we already have labels. But let's not complicate things too much; let's just be happy that we can use the generic weights right away.
Well, actually, we do have to make some minor modifications. The given weights file has parameters for 16 phonemes (the same as ours) plus the phonemes SIL and GARBAGE. Each has three codebooks, because the recognizer which created the weights used three subphone segments (beginning, middle, and end). We will simply ignore the unneeded segments in the weights file. Remember that we are using the underscore for the silence phone and the plus character for the garbage phone, so if we simply loaded the weights, Janus wouldn't know what to do with the weights for "SIL". Similar problems occur when you want to initialize the weights of a recognizer for a new language: your phoneme set will very likely not match the 16 generic phones in this weights file. To cope with such problems, Janus offers rewrite rules, with which you define which name in your system corresponds to which model name in the weights file. Do the following:
RewriteSet rws
rws add SIL-m _-m
rws add GARBAGE-m +-m
cbs configure -rewriteSet rws
Now you have defined a set of two rewrite rules: one interprets the name "SIL-m" from the weights file as if it were "_-m", and the other renames "GARBAGE-m" to "+-m". The configure command tells the codebook set object to use the rewrite set we just created.
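
For the new-language case mentioned above, the same mechanism scales to a whole list of renamings. The following is only a sketch: the phone names AX and AXR and their counterparts @ and @r are hypothetical examples, and rwsNew is just an arbitrary object name; replace the mapping with whatever your weights file and your phoneme set actually use.

RewriteSet rwsNew
foreach {theirs ours} {
    SIL-m   sil-m
    AX-m    @-m
    AXR-m   @r-m
} {
    rwsNew add $theirs $ours
}
cbs configure -rewriteSet rwsNew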

The generic weights file contains codebook accumulators, i.e. the stuff that is collected during training. To get some ready-to-use weights we'll have to do the following:

cbs createAccus
cbs loadAccus ../data/codebookAccus
cbs update
First we create an accumulator object for every codebook, then we load the generic weights into these accumulators, and finally we tell the codebook set object to update its parameters from its accumulators.

After the loading of the weights accumulator file you should get the following message from Janus:

INFO    codebook.c(2706)       54 accumulators were found in the file
INFO    codebook.c(2707)       34 accumulators were loaded
INFO    codebook.c(2708)       34 codebooks were defined
INFO    codebook.c(2709)       20 codebooks were undefined
INFO    codebook.c(2710)        0 codebooks had no accumulator
INFO    codebook.c(2711)        0 refN mismatches occurred
INFO    codebook.c(2712)        0 dimN mismatches occurred
INFO    codebook.c(2713)        0 subN mismatches occurred
This means that there were 54 accumulators in the file (remember, that's 3*(16+2)), 34 of which were loaded. That is fine, because 34 is exactly the number of codebooks we have (-b and -e segments for each of the 16 phonemes, plus the -m segments of the silence and garbage phones), so all of our codebooks received parameters. The remaining 20 accumulators in the file had no corresponding codebook in our system (these were SIL-b, SIL-e, GARBAGE-b, GARBAGE-e, and the -m segments of the other 16 phonemes). There were no mismatches in the size of the codebooks (refN = number of reference vectors, dimN = number of dimensions, subN = number of accumulators per codebook).
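
The counts are easy to verify; there is nothing Janus-specific in the following lines, they are just the arithmetic from the paragraph above written as Tcl:

puts [expr {3 * (16 + 2)}]   ;# 54 accumulators in the file: 3 segments for 18 phones
puts [expr {2 * 16 + 2}]     ;# 34 codebooks in our system: -b and -e for 16 phones, -m for _ and +
puts [expr {54 - 34}]        ;# 20 accumulators without a matching codebook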

Since loading the codebook accumulators went smoothly, we can now save the actual weights into a file. This weights file will use our phoneme names, so we won't need the rewrite rules any more:

cbs save codebookWeights
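
Later, whenever you need these parameters again, you can read them back from this file. A sketch, assuming the codebook set offers a load method as the counterpart of save:

cbs load codebookWeights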

Having a Look at some Features

We've already had a look at some features, so this won't be new to you. We'll just do it again, because this time we can use our database and our feature access rule. Do the following:
[DBase db] open ../step1/db.dat ../step1/db.idx -mode r

fs eval [db get c020i]
featshow fs MSC
This is not much easier than the "fs eval" command we used earlier, but it shows how things should be done. In many cases your scripts will not load recordings explicitly; instead you will just tell some procedure to do so, and you won't want to worry about how the feature set gets its features. It is enough to define an access rule once.
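
Once the access rule is in place, looking at the features of several recordings is just a loop. A small sketch, reusing only the commands from above (the utterance IDs are merely examples; use IDs that exist in your database):

foreach utt {c020i c0310} {
    fs eval [db get $utt]
    featshow fs MSC
}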

Having a Look at some Viterbi Paths

With a little additional machinery we can even compute our first Viterbi alignment. This is useful for checking whether the loaded weights are worth anything: if the resulting Viterbi path is too uneven, i.e. a few states get most of the speech frames, then the weights are not really useful.

To be able to run a Viterbi alignment, we'll first have to create some more objects, namely a dictionary, an acoustic model set, an HMM, and a path object:

[Dictionary dict ps:phones tags] read ../step1/convertedDict
HMM hmm dict [AModelSet amo tpt ROOT]
Path path
The dictionary object dict should be self-explanatory. The Path object path will be used to hold the Viterbi alignment path, the HMM object hmm will hold the entire HMM topology of an utterance, and the acoustic model set (AModelSet) named amo is an object with little to show: it maintains a collection of ways in which phonemes can be modeled, including their topologies and acoustic units (senones).

Once these objects are created we can define the following little procedure:

proc viterbi {utt} {
  set uttInfo [db get $utt]                 ;# look up the utterance in the database
  makeArray arr $uttInfo                    ;# turn the database entry into a Tcl array
  hmm make $arr(text) -optWord SIL          ;# build the HMM for this transcription
  return [path viterbi hmm -eval $uttInfo]  ;# run the alignment and return its result
}
It accepts one argument, utt, which is an utterance ID. It then retrieves the information about this utterance that is stored in the database object db. The makeArray command turns the list uttInfo into an array arr. This array has two elements: arr(text) contains the transcription of the utterance, and arr(utt) contains the utterance ID. If we had created a richer database earlier, with more information, that information would also be part of the array. The make command lets the hmm object build all its internal structures for the entire utterance's topology; these consist of three subobjects, a word graph, a phone graph, and a state graph. The hmm is given as an argument to the viterbi method of the path object. The option "-eval $uttInfo" triggers the automatic creation of the features needed for the Viterbi alignment. Don't confuse the viterbi method (an internal, hard-coded Janus function that can be applied to path objects) with the Tcl viterbi procedure we have just defined.
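
If you want to see what such a database entry looks like and what makeArray produces from it, you can inspect one interactively. A small sketch using only the commands from above (the utterance ID is just an example):

set uttInfo [db get c0310]   ;# the entry as stored in the database
puts $uttInfo                ;# have a look at its raw form
makeArray arr $uttInfo
puts $arr(utt)               ;# the utterance ID
puts $arr(text)              ;# the transcription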

With this procedure defined, we can just do the following:

puts [viterbi c0310]
displayLabels path hmm
You can repeat it for other utterances. You can get a list of all utterance IDs by just typing
db
If the displayed Viterbi paths look more or less smooth, then the weights should be usable.
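
To get a quick impression over more than one utterance, you can loop over a few IDs, print the results of the viterbi calls, and run displayLabels for any utterance that looks suspicious. The IDs below are only examples; pick some from the list that db prints:

foreach utt {c0310 c020i} {
    puts "$utt: [viterbi $utt]"
}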