First LDA

The LDA is the first step for which parallelization could be helpful. There will be another LDA computation at a later stage of the development process, and we will use the parallel version there, so that things stay easier to understand now. Once you have gone through the entire development scheme, you will have seen both sequential and parallel LDA, and you can choose whichever you prefer in your own scripts.

Computing an LDA is not much different from any other kind of training. The main loop goes through all training utterances, gets an alignment from somewhere (labels or Viterbi), and accumulates the training data. What is accumulated could be data for distributions and codebooks or, as in our current case, the LDA data (scatter matrices).

Since we don't have any weights yet, we can't do Viterbi, and since we don't have any Janus-written labels, we'll have to use the available ASCII-labels that came with the data.

So we will address four issues in this development stage:

non-parallel processing of the training set
computation of an LDA matrix
training along labels
using ASCII labels

The outline of the script that we're going to develop now will look like this:

  initialize system, create objects
  create and initialize LDA object
  foreach sentence in trainingset {
    read the ASCII labels and build a path object from them
    accumulate the data for the scatter matrices
  }
  compute the LDA matrix
  write out matrix and side dishes

Here come the details for each of the outlined steps. We could use the most flexible, portable and universal script, hiding everything from the user, but for the beginning let's use something simpler to get a better view of what is happening. Later, we'll introduce more universal scripts.

initialize system, create objects


  PhonesSet phonesSet
  Tags tags
  [FeatureSet fs]                    setDesc @../prepare/featdesc
  fs                                 setAccess @../prepare/feataccess
  [Phones phones]                        read ../prepare/phones 
  [TmSet tms]                            read ../prepare/trans
  [Dictionary dict phones tags]          read ../prepare/dict
  [CodebookSet cbs fs]                   read ../inienv/cbs-desc
  [DistribSet dss cbs]                   read ../inienv/dss-desc
  [Tree dst phones phonesSet tags dss]   read ../inienv/tree-desc
  SenoneSet sns [DistribStream stream dss dst] -phones phones -tags tags
  [TopoSet tps sns tms]                  read ../prepare/topoSet 
  [Tree ttree phones phonesSet tags tps] read ../prepare/topoTree 
  [DBase dbase] open ../prepare/dbase.dat ../prepare/dbase.idx -mode r
  AModelSet amo ttree ROOT
  Path p
  HMM hmm dict amo

We won't explain the purpose of all these objects here; have a look at the Janus users' manual for that. You can see that many objects depend on the existence of other objects, which restricts the order of object creation. Although we don't have any sets of phonemes defined, and although we don't want to use any tags, we still have to define the corresponding objects and just leave them empty, because other objects expect a Tags and a PhonesSet object to exist.

We've used a programming shortcut that is often seen in Tcl. The return value of an object creation is the name of the created object, so we can use this name immediately to call the object's read method and read the corresponding description file. After all the objects are created, we can read labels into the path object named p, and tell the yet-to-be-created LDA object to accumulate its data.
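The shortcut can be tried in plain Tcl without Janus. The following is only a minimal sketch: the Greeter constructor and its read method are hypothetical stand-ins that mimic how a Janus constructor returns the name of the object it has just created.

```tcl
# Hypothetical constructor, not part of Janus: it defines a new command
# called $name and returns that name, as Janus constructors do.
proc Greeter {name} {
    proc $name {method arg} {
        # The only supported method is "read"; it returns a short report.
        if {$method ne "read"} { error "unknown method: $method" }
        return "reading $arg"
    }
    return $name
}

# [Greeter g] evaluates to "g", so the outer command becomes: g read some/file
set result [[Greeter g] read some/file]
puts $result
```

This prints "reading some/file". The command g stays available afterwards, just as fs, phones or dict remain usable after the creation lines above.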

create and initialize LDA object

For accumulating data, the LDA object will get the label-filled path and read the senone indices from there. Since not every senone is necessarily a class of its own, and classes usually correspond to codebooks, we have to tell the LDA object which senone belongs to which class. Senones are created dynamically, so we can't know in advance which senone index will be assigned to which model. Therefore we'll fill the path items' senone indices with the corresponding distribution indices. For the initialization of the LDA object this means that we have to tell it which distribution index belongs to which class. In our example we have the same number and names of distributions as codebooks, so we can simply use the distributions as classes. At a later stage, we'll do it differently. For now the following script will do:

  LDA lda fs MSC 16
  foreach ds [dss:] { lda add $ds ; lda map [dss index $ds] -class $ds }

In our example we simply use the 16 mel-scale coefficients, from which we will compute a few LDA coefficients. For a real system you would probably use a much more sophisticated approach.

looping over all training sentences

If we are not doing any parallel processing then the loop is rather simple:

  set fp [open ../prepare/trainIDs r]
  while { [gets $fp utt] != -1 } {
    readPath $utt dbase fs ../data/labels/$utt p sns MSC
    lda accu p
  }
  close $fp

The lines in the body of the loop match the corresponding lines of the outline further above. We will define the "readPath" procedure next.

build path from ASCII labels

Your label format might differ from the one in the example. In that case, either filter your labels so that they look like those in this example, or modify the following procedure to suit your label format.

proc readPath { utt db fs file path sns feature } {

  # load the features of this utterance and find out how many frames it has
  $fs eval [$db get $utt]
  set frameN [$fs:$feature configure -frameN]
  $path make $sns -from 0 -to [expr {$frameN - 1}]

  # one label per line: frame index first, then the distribution name
  set str [open $file r]
  while { [gets $str item] != -1 } {
    set frX [lindex $item 0]      ; # <--- MODIFY THESE LINES IF YOUR FORMAT
    set sn  [lindex $item 1]      ; # <--- DIFFERS FROM THE EXAMPLE'S FORMAT
    # store the distribution index in place of the senone index
    $path.itemList($frX) add 1 -senoneX [$sns.stream(0).distribSet index $sn]
    $path.itemList($frX).item(0) configure -gamma 1.0
  }
  close $str
}

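The readPath procedure fills the given path object with the labels from the named file. The utt argument is the ID of the utterance, and the db argument is the name of the database that is queried for it. The fs argument is the name of the feature set, which is needed to load the features and, together with the feature argument, to find out how many frames there are in the utterance. The file argument names the label file, the path argument is the path object to be filled, and the sns argument is the senone set with which the path is built.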
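If you want to check a label file before feeding it to Janus, the parsing part can be tried on its own in plain Tcl. This is a minimal sketch under the assumption that each line holds a frame index followed by a distribution name, as the two lindex calls in readPath suggest. The parseLabels helper and the names SIL-m and A-b are made up for illustration.

```tcl
# Hypothetical helper: turn label lines of the form "<frame> <name>"
# into a list of {frame name} pairs, skipping blank lines.
proc parseLabels {text} {
    set pairs {}
    foreach line [split $text "\n"] {
        if {[string trim $line] eq ""} { continue }
        set frX [lindex $line 0]
        set sn  [lindex $line 1]
        lappend pairs [list $frX $sn]
    }
    return $pairs
}

# Three made-up label lines, one per frame.
set labels "0 SIL-m\n1 SIL-m\n2 A-b\n"
set pairs [parseLabels $labels]
puts $pairs
```

Only the two lindex lines need to change if your format puts the fields in a different order.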

compute the LDA matrix

So far we have accumulated the so-called scatter matrices: one for the total scatter of all data, and one for the average within-class scatter of all classes. Before we can use these matrices, we have to let the LDA object do an 'update' to create the actual matrices from the accumulated data.
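To make the two scatter quantities concrete, here is a small plain-Tcl sketch for one-dimensional data, where each "matrix" shrinks to a single number. It is not the Janus implementation: the data, the class names and the helper procedures are made up, and the normalisation used by the LDA object may differ.

```tcl
# Mean of a list of numbers.
proc mean {xs} {
    set sum 0.0
    foreach x $xs { set sum [expr {$sum + $x}] }
    return [expr {$sum / [llength $xs]}]
}

# Sum of squared deviations of the values in xs from the value m.
proc sqDev {xs m} {
    set sum 0.0
    foreach x $xs { set sum [expr {$sum + ($x - $m) * ($x - $m)}] }
    return $sum
}

# Made-up one-dimensional data for two classes, as name/values pairs.
set data {a {1.0 2.0 3.0} b {7.0 8.0 9.0}}

# Pool all values to get the total scatter around the global mean.
set all {}
foreach {cls xs} $data { set all [concat $all $xs] }
set n [llength $all]
set total [expr {[sqDev $all [mean $all]] / $n}]

# Within-class scatter: deviations from each class's own mean.
set within 0.0
foreach {cls xs} $data {
    set within [expr {$within + [sqDev $xs [mean $xs]]}]
}
set within [expr {$within / $n}]

puts [format "total %.3f  within %.3f" $total $within]
```

This prints "total 9.667  within 0.667". The total scatter is much larger than the within-class scatter here because the two classes lie far apart; LDA looks for directions in feature space along which this kind of separation is as large as possible.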

  lda update
  DMatrix A
  DMatrix K
  A simdiag K lda.matrixT lda.matrixW
  [FMatrix B] DMatrix A
  B bsave lda.bmat