The Sample Set Module
This module implements the object types SampleSet and SampleSetClass. They are used for extracting training samples, writing them to
files, and creating initial codebooks with the k-means or "neural gas" algorithm. The module is very similar to the LDA
module: almost all of the LDA object's methods work the same way for the SampleSet object. Sample sets also have classes and a mapping
from senone indices to classes. The main difference from the LDA module lies in the accumulation functions: the LDA module offers two such functions, one
for each pass, whereas sample sets have only one. The configurable parameters differ as well.
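To give an idea of what k-means codebook initialization does with the extracted samples, here is a minimal sketch in plain Tcl. It is not the module's implementation: the procedure names sqDist, nearest and kmeans are hypothetical, samples are plain Tcl lists of numbers, and Tcl 8.5 or later is assumed (for lrepeat).

```tcl
# Squared Euclidean distance between two vectors of equal length.
proc sqDist {a b} {
    set d 0.0
    foreach x $a y $b {
        set d [expr {$d + ($x - $y) * ($x - $y)}]
    }
    return $d
}

# Index of the codebook vector that lies closest to the given sample.
proc nearest {codebook sample} {
    set best 0
    set bestDist [sqDist [lindex $codebook 0] $sample]
    set i 0
    foreach c $codebook {
        set d [sqDist $c $sample]
        if {$d < $bestDist} {
            set best $i
            set bestDist $d
        }
        incr i
    }
    return $best
}

# Plain k-means: refine an initial codebook for a fixed number of iterations.
proc kmeans {samples codebook iterations} {
    set dim [llength [lindex $samples 0]]
    for {set it 0} {$it < $iterations} {incr it} {
        # Reset the per-vector accumulators.
        set sums {}
        set counts {}
        foreach c $codebook {
            lappend sums [lrepeat $dim 0.0]
            lappend counts 0
        }
        # Assign every sample to its nearest codebook vector.
        foreach s $samples {
            set k [nearest $codebook $s]
            set acc {}
            foreach a [lindex $sums $k] x $s {
                lappend acc [expr {$a + $x}]
            }
            lset sums $k $acc
            lset counts $k [expr {[lindex $counts $k] + 1}]
        }
        # Move each vector to the mean of its samples; keep it if it got none.
        set new {}
        foreach c $codebook sum $sums n $counts {
            if {$n == 0} {
                lappend new $c
            } else {
                set mean {}
                foreach v $sum {
                    lappend mean [expr {$v / double($n)}]
                }
                lappend new $mean
            }
        }
        set codebook $new
    }
    return $codebook
}
```

Called with four two-dimensional samples and two starting vectors, for example kmeans {{0.0 0.0} {0.0 1.0} {10.0 10.0} {10.0 11.0}} {{0.0 0.0} {10.0 10.0}} 5, the sketch returns one refined vector per starting vector. The fixed iteration count keeps the example short; a convergence test on the total distortion would be an equally reasonable stopping rule.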
Examples:
The following example writes two files, one containing silence samples and one containing non-silence samples. The rest of the environment
(the objects sns, fs, pa and dbase, and the variable trainingUtterancesFile) is assumed to exist already.
SampleSet sms fs LDAS 48
sms add SIL
sms add NONSIL
foreach sn [sns] {
    if { $sn == "SIL-m(1)" } {
        sms map [sns index $sn] -class SIL
    } else {
        sms map [sns index $sn] -class NONSIL
    }
}
smsAlongLabels $trainingUtterancesFile dbase fs pa sns LDAS
sms flush
where the smsAlongLabels procedure looks like this:
proc smsAlongLabels { file db fs pa sns feature } {
    set str [ open $file r ]
    while { [ gets $str utt ] != -1 } {
        puts -nonewline "training utterance $utt "
        set labelFile [computeLabelFile $utt]
        readPath $utt $db $fs $labelFile $pa $sns $feature
        # accumulate the samples along the path into the global sample set sms
        sms accu $pa
    }
    close $str
}
Further information about the module: