Computing a Second LDA

Now that we have many more codebooks, we can compute another LDA, which will be trained to discriminate this larger number of classes. Experience shows that LDA generally works better with a greater number of classes.

The LDA script we run now is very similar to the one from step 5; in fact, it is even simpler. Only the startup differs, because we now load the new architecture description files produced by the clustering step:

# feature description and access functions from step 2
[FeatureSet fs] setDesc   @../step2/featDesc
            fs  setAccess @../step2/featAccess

# clustered and pruned acoustic model files from step 14
[CodebookSet cbs fs]               read ../step14/codebookSetClustered
[DistribSet  dss cbs]              read ../step14/distribSetClusteredPruned
[PhonesSet ps]                     read ../step14/phonesSet
[Tags tags]                        read ../step2/tags
[Tree dst ps:phones ps tags dss]   read ../step14/distribTreeClusteredPruned

SenoneSet sns [DistribStream str dss dst]

# the remaining objects are the same as in the earlier LDA script
[TmSet tms]                        read ../step2/transitionModels
[TopoSet tps sns tms]              read ../step2/topologies
[Tree tpt ps:phones ps tags tps]   read ../step2/topologyTree
[Dictionary dict ps:phones tags]   read ../step1/convertedDict
[DBase db]                         open ../step1/db.dat ../step1/db.idx -mode r
[FMatrix ldaMatrix]                bload ../step5/ldaMatrix
AModelSet amo tpt ROOT
HMM hmm dict amo
Path path
dst configure -padPhone [ps:phones index pad]

The actual LDA loop remains the same as in step 5. This time we can omit writing a new feature description file (it did not change), and we can stop right after writing the model counts file.
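
For orientation, the loop has the familiar shape from the earlier LDA script. Sketched in pseudocode (the exact commands are those of the step 5 script, not shown here):

    foreach utterance in the database db {
        # load the features of the utterance via the FeatureSet fs
        # build the HMM for the reference transcription
        # run a forced alignment to label every frame with a senone
        # accumulate each frame's feature vector under its senone
        #   (class) index into the LDA scatter statistics
    }
    # update the LDA statistics, then write the model counts file

The only difference is that the frame labels now come from the clustered, pruned distribution tree, so the accumulated statistics discriminate the larger set of classes.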