Training Context-Dependent Distributions

The polyphone collection process can easily produce hundreds of thousands, even millions, of polyphones. It is obviously not feasible to use a fully continuous HMM for each of them. Eventually we will want to use continuous-density HMMs, but for a smaller number of models. It is, however, possible to train hundreds of thousands of mixture-weight distributions, which can afterwards be clustered into a smaller number. This is what we are going to do in this step. As usual, the complete script can be found in the scripts thread.
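
To get a feeling for why one full model per polyphone is out of reach while one mixture-weight vector per polyphone is not, here is a rough parameter count in plain Tcl. It is only a sketch: the number of polyphones, the number of codebooks, the feature dimension and the use of diagonal covariances are made-up assumptions for illustration, and the procedure names are hypothetical. Only the 16 Gaussians per codebook follow from the 1/16 mentioned later in this step.

```tcl
# Rough parameter count; plain Tcl, no Janus objects involved.
# ASSUMPTIONS (not from the tutorial): diagonal covariances, and the values
# of numPolyphones, numCodebooks and featDim are made up for illustration.

# Parameters of one codebook: a mean and a variance vector per Gaussian.
proc codebookParams {numGaussians featDim} {
    return [expr {$numGaussians * 2 * $featDim}]
}

# Fully continuous: every polyphone owns a codebook plus its mixture weights.
proc fullyContinuousParams {numPolyphones numGaussians featDim} {
    set perModel [expr {[codebookParams $numGaussians $featDim] + $numGaussians}]
    return [expr {$numPolyphones * $perModel}]
}

# Shared codebooks: few codebooks, but one mixture-weight vector per polyphone.
proc sharedCodebookParams {numPolyphones numCodebooks numGaussians featDim} {
    set codebooks [expr {$numCodebooks * [codebookParams $numGaussians $featDim]}]
    set weights   [expr {$numPolyphones * $numGaussians}]
    return [expr {$codebooks + $weights}]
}

set numPolyphones 500000 ;# hypothetical
set numCodebooks  150    ;# hypothetical
set numGaussians  16     ;# matches the 1/16 used in this step
set featDim       32     ;# hypothetical

puts "fully continuous: [fullyContinuousParams $numPolyphones $numGaussians $featDim]"
puts "shared codebooks: [sharedCodebookParams $numPolyphones $numCodebooks $numGaussians $featDim]"
```

With these made-up numbers the first count comes out at 520 million parameters and the second at roughly 8.2 million, almost all of which are mixture weights.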

The startup looks a bit different from the startups we have had so far, because we now have to incorporate the ptrees:

[FeatureSet fs] setDesc   @../step5/featDesc
            fs  setAccess @../step2/featAccess

[CodebookSet cbs fs]                read ../step8/codebookSet
[DistribSet  dss cbs]               read ../step10/distribSet
[PhonesSet ps]                      read ../step2/phonesSet
[Tags tags]                         read ../step2/tags
Tree dst ps:phones ps tags dss 
     dst.ptreeSet                   read ../step10/ptreeSet
     dst                            read ../step10/distribTree

SenoneSet sns [DistribStream str dss dst]

[TmSet tms]                         read ../step2/transitionModels
[TopoSet tps sns tms]               read ../step2/topologies
[Tree tpt ps:phones ps tags tps]    read ../step2/topologyTree
[DBase db]                          open ../step1/db.dat ../step1/db.idx -mode r
[Dictionary dict ps:phones tags]    read ../step1/convertedDict
[FMatrix ldaMatrix]                bload ../step5/ldaMatrix

AModelSet amo tpt ROOT
HMM hmm dict amo
Path path

We now load the most recently trained codebook weights. Remember that we don't have any distribution weights yet. We could load the context-independent distribution weights and initialize every context-dependent distribution with its corresponding context-independent distribution, but experiments have shown that this is not necessary. It is fine not to load any distribution weights and thus to start with uniform values (i.e. every distribution value will be 1/16). Besides, the only reason we are training these distributions is to cluster them later into a smaller number of distributions, which will have to be trained anew anyway:

cbs load ../step9/codebookWeights.3
cbs createAccus
dss createAccus
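
The uniform starting point described above can be pictured with a few lines of plain Tcl. This is only an illustration of the values, not the way Janus stores or initializes its distributions; the procedure name uniformWeights is hypothetical.

```tcl
# Illustration only: build a uniform mixture-weight vector of length n.
# uniformWeights is a hypothetical helper, not part of Janus.
proc uniformWeights {n} {
    set weights {}
    for {set i 0} {$i < $n} {incr i} {
        lappend weights [expr {1.0 / $n}]
    }
    return $weights
}

# 16 entries, matching the 1/16 mentioned in the text.
set weights [uniformWeights 16]

# The weights of one distribution must sum to 1.
set sum 0.0
foreach w $weights {
    set sum [expr {$sum + $w}]
}
puts "first weight: [lindex $weights 0], sum: $sum"
```

Every polyphone distribution therefore starts out identical, and only the training data accumulated in the loop makes them differ.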

We reuse the Tcl procedure "forcedAlignment" that we used last time when training along labels, and run a "regular" training loop: