The Sample Set Module

This module implements the object types SampleSet and SampleSetClass. They are used for extracting training samples, dumping them to files, and creating initial codebooks with the k-means or "neural gas" algorithm. The module is very similar to the LDA module: almost all of the LDA object's methods work the same way for the SampleSet object. Sample sets also have classes, and a mapping from senone indices to classes. The main difference from the LDA module lies in the accumulation functions: the LDA module offers two such functions, one for each pass, whereas a sample set has only one. The configurable parameters differ, too.
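
For readers unfamiliar with k-means, the general idea behind such an initial codebook can be sketched in plain Tcl. This is only an illustration under stated assumptions, not the module's implementation: the procedure names sqDist, nearest, and kmeans are hypothetical, the means are simply initialised with the first k samples, and no Janus objects are involved. It needs Tcl 8.5 or later (for lrepeat).

```tcl
# Squared Euclidean distance between two vectors of equal length.
proc sqDist { a b } {
    set d 0.0
    foreach x $a y $b { set d [expr { $d + ($x - $y) * ($x - $y) }] }
    return $d
}

# Index of the mean vector closest to the given sample.
proc nearest { sample means } {
    set best 0
    set bestD [sqDist $sample [lindex $means 0]]
    for { set i 1 } { $i < [llength $means] } { incr i } {
        set d [sqDist $sample [lindex $means $i]]
        if { $d < $bestD } { set best $i; set bestD $d }
    }
    return $best
}

# Hypothetical k-means: samples is a list of numeric vectors,
# k the number of codebook vectors; returns the list of k means.
proc kmeans { samples k { maxIter 20 } } {
    # Assumption: initialise the means with the first k samples.
    set means [lrange $samples 0 [expr { $k - 1 }]]
    set dim [llength [lindex $samples 0]]
    for { set it 0 } { $it < $maxIter } { incr it } {
        # Reset the per-cluster sums and counts.
        set sums {}
        set counts {}
        for { set i 0 } { $i < $k } { incr i } {
            lappend sums [lrepeat $dim 0.0]
            lappend counts 0
        }
        # Assignment step: add each sample to its nearest cluster.
        foreach s $samples {
            set c [nearest $s $means]
            set acc {}
            foreach a [lindex $sums $c] x $s { lappend acc [expr { $a + $x }] }
            lset sums $c $acc
            lset counts $c [expr { [lindex $counts $c] + 1 }]
        }
        # Update step: recompute each mean; an empty cluster keeps its old mean.
        set newMeans {}
        for { set i 0 } { $i < $k } { incr i } {
            set n [lindex $counts $i]
            if { $n == 0 } {
                lappend newMeans [lindex $means $i]
            } else {
                set m {}
                foreach a [lindex $sums $i] { lappend m [expr { $a / double($n) }] }
                lappend newMeans $m
            }
        }
        # Stop as soon as the means no longer change.
        if { $newMeans eq $means } break
        set means $newMeans
    }
    return $means
}

# Usage: two well-separated groups of two-dimensional samples.
puts [kmeans {{0.0 0.0} {0.0 1.0} {10.0 10.0} {10.0 11.0}} 2]
```

Taking the first k samples as the starting means keeps the sketch deterministic; a real codebook initialisation may well choose its starting points differently.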

Examples:

This example shows how to write two files, one containing silence samples and one containing non-silence samples. The rest of the environment (for example fs, sns, pa, and dbase) is assumed to be set up already:
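# Create the sample set "sms" on the 48-dimensional feature LDAS of feature set fs
SampleSet sms fs LDAS 48
# Add the two classes
sms add SIL
sms add NONSIL

# Map every senone index to one of the two classes
foreach sn [sns] {
  if { $sn eq "SIL-m(1)" } {
    sms map [sns index $sn] -class SIL
  } else {
    sms map [sns index $sn] -class NONSIL
  }
}

# Accumulate samples along the label paths of all training utterances
smsAlongLabels $trainingUtterancesFile dbase fs pa sns LDAS

# Write the samples still held in the buffers to their files
sms flush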

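The smsAlongLabels procedure used above looks like this:
proc smsAlongLabels { file db fs pa sns feature } {

  set str [ open $file r ]
  # One utterance per line; gets returns -1 at end of file
  while { [ gets $str utt ] != -1 } {
    puts -nonewline "training utterance $utt "
    set labelFile [computeLabelFile $utt]
    # Load the label path for this utterance, then accumulate its samples
    readPath $utt $db $fs $labelFile $pa $sns $feature
    # sms is the sample set object created by the caller
    sms accu $pa
  }
  close $str
}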
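
The senone-to-class mapping used in the example can also be tried out in plain Tcl, without any Janus objects. The sketch below is only an illustration: buildClassMap is a hypothetical helper, not part of the SampleSet module, and the senone names in the usage line are made up. It needs Tcl 8.5 or later (for dict).

```tcl
# Hypothetical helper: return a dict that maps each senone index
# (its position in senoneNames) to the class SIL or NONSIL.
proc buildClassMap { senoneNames silenceName } {
    set map [dict create]
    set idx 0
    foreach sn $senoneNames {
        if { $sn eq $silenceName } {
            dict set map $idx SIL
        } else {
            dict set map $idx NONSIL
        }
        incr idx
    }
    return $map
}

# Usage with made-up senone names; index 1 is the silence senone.
set classMap [buildClassMap [list A-b(1) SIL-m(1) A-e(1)] SIL-m(1)]
puts $classMap
```

Comparing names with eq rather than == forces a string comparison, which matters when a name could also be read as a number.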

Further information about the module: