The Senones Module - Overview
This module implements the object classes SenoneSet and Senone. A single senone cannot
be created from Tcl (it wouldn't make much sense anyway). A senone set uses generic
lists to maintain its senones. It is possible to add or delete
senones at runtime, but you had better not do this unless you really know what you are doing,
because many other modules rely on the SenoneSet object they use remaining unchanged.
Besides maintaining a set of senones, the senones module performs other quite important tasks,
namely the computation of HMM emission probabilities, the accumulation of training data, and
the updating (optimization) of a system's acoustic parameters. The senones module hides from
the rest of the system how these tasks are performed. It is also the senones module's job to
make sure that the scores for the current utterance are computable (i.e. to inform the feature
module to make the needed features). Since we can think of many very different ways to compute
HMM emission probabilities (e.g. Gaussian mixtures, neural nets, hybrids, etc.), the senones
module refers to a generic score computer. Whenever a new way of computing scores is added to
JANUS, it must conform to the definition of a score computer.
Please note that we are misusing the term "senone". When we talk about a senone, we don't always
mean what Mei-Yuh Hwang meant in her PhD thesis. Originally the term "senone" meant a generalized
subtriphone. In JANUS we call all atomic acoustic units senones, even if they are not
generalized (e.g. in context-independent systems or in unclustered context-dependent systems).
For us, a senone is the smallest speech unit for which we can compute HMM emission probabilities.
What Is a Senone?
A senone is modeled by a set of streams and their corresponding stream weights. That is, the HMM
emission probability for a senone at a given frame is the weighted sum of the outputs of any number
of streams. (If you consider the output of a stream to be a log probability, then a weighted sum
of log probabilities with multiplicative weights becomes a weighted product of probabilities with
exponential weights.)
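Stated as a formula, with stream weights w_i, class indices c_i, and stream outputs o_i at frame t:

    \mathrm{score}(s, t) = \sum_{i} w_i \, o_i(c_i, t)

and if each output is a log probability, o_i(c_i, t) = \log p_i(x_t | c_i), this is equivalent to

    p(x_t | s) = \prod_{i} p_i(x_t | c_i)^{w_i}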
The internal representation of a senone thus consists of three equally sized arrays: stream
identifiers, stream weights, and class indices. When a score is computed, the class index for
each stream is passed to that stream's score computer, and the return value is what we called
the stream's output. With Gaussian mixtures, a class index is the index of a distribution; with
a neural net, it would probably be the index of an output node of the net, or some subnet
identifier.
Usage of the Senones Module
This section does not give you details about the Tcl methods and their syntax. Instead, it
explains what can be done with senones and what this is good for. Follow the links at the end
of this page for more details about Tcl syntax, scripts, etc.
Creating a SenoneSet
Usually you will create a SenoneSet object from Tcl somewhere in your training or
testing script. Then you will read a senones description file to fill the contents of the
SenoneSet object. After that it should be usable. When creating a SenoneSet,
you must also supply a score computer for every stream. In the case of Gaussian mixtures
you would use a DistribSet object as a score computer. Of course, you can use the
same score computer for different streams.
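In a script this could look roughly like the following sketch. All object and file names are
made up for illustration, and the exact creation arguments (especially how streams and their
score computers are specified) depend on your system, so check the linked syntax pages for the
real details:

    # Build a Gaussian-mixture score computer: features -> codebooks
    # -> distributions (all names here are illustrative).
    FeatureSet  fs
    CodebookSet cbs fs
    DistribSet  dss cbs
    cbs read codebookSet.desc
    dss read distribSet.desc

    # Create the senone set, supplying the DistribSet as the score
    # computer of its stream(s), then fill it from a description file.
    SenoneSet sns dss
    sns read senoneSet.desc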
Preparing for Score Computation
It is the senones module's job to prepare the system for a new utterance. Although it is possible
for the user to inform the feature module manually about a
new utterance, in order to trigger the computation and creation (preprocessing) of the needed
features, this is not recommended for two reasons: a) the user must know how to trigger the
feature computation, and b) there might be many interchangeable FeatureSet objects, and the
user knows neither which FeatureSet is to be used nor which Feature subobjects are needed.
Making the feature module compute unneeded features can not only waste CPU time and memory,
but also cause unwanted behaviour.
You can do it nevertheless if you want to do something special, for example if you want the
feature module to ignore the feature-file information from the task's database and to force
it to use some other file. Other similar situations do exist; especially when you want to play
around with preprocessing techniques or just look at some features, you will prefer to trigger
the feature computation yourself, manually.
Therefore there is a function (snsFeatEval()) and a method (featEval) in the
senones module to which you can give an utterance's entry from the task database. This entry will be
passed to all the FeatureSet objects that could be involved in score computation, together with
a list of their respective Feature subobjects that will be accessed during score computation.
After this, HMM emission probabilities can be computed. So, calling snsFeatEval can be
interpreted as: "please prepare to compute scores for the utterance with the given description".
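For example (the database object and the entry format are illustrative assumptions):

    # Get the current utterance's entry from the task database and let
    # the senones module distribute it to all involved FeatureSet objects.
    set uttInfo [db get $utt]
    sns featEval $uttInfo
    # From here on, emission scores for this utterance can be computed.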
Accumulating Training Data
The accumulation of training data is done by calling a SenoneSet method (accu), giving
as the argument the path that was created by a forced alignment (or from labels). The senones
module will then call all of the used score computers for each of the path's cells. It doesn't care
about what the score computers will actually do. A Gaussian mixture score computer will accumulate
counts, sums of squares, means, etc.; a neural net will compute an error function and accumulate
backprop data. The accumulation is proportional to a training factor. This factor itself is a product
of three factors: a) a user-supplied training factor (+1.0 for regular training, -1.0 for negative
or corrective training), b) the gamma-value from the alignment path (which is always 1.0 in Viterbi
paths), and c) the stream weight (which is effective only if the same class of a score computer is
used in different streams).
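A sketch of the accumulation step, assuming a path object that holds a previous forced alignment
and the illustrative names used above (how a user-supplied factor other than +1.0 is passed may
differ between versions):

    # Prepare the features, then accumulate along the alignment path.
    # Each path cell contributes with the weight
    #     trainingFactor * gamma * streamWeight
    # as described above.
    sns featEval $uttInfo
    sns accu path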
Updating Acoustic Parameters
Well, that's easy. Just call the SenoneSet's update method. The senones module
will then inform all involved score computers about the update command. After that it doesn't
care what the score computers are actually doing. All this hiding of score computer functions
by the senones module is there to allow uniform training scripts for any kind of score computer.
The JANUS designers' goal was to be able to plug any score computer in or out at will,
without having to modify anything else in the system (well, yes, you still have to inform the
senones module about them).
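Putting it all together, one training epoch could look roughly like this (same illustrative
names as in the sketches above):

    foreach utt $uttList {
        set uttInfo [db get $utt]    ;# illustrative database access
        sns featEval $uttInfo        ;# prepare features
        # ... build the HMM and the alignment path for this utterance ...
        sns accu path                ;# accumulate training data
    }
    sns update                       ;# one update over all accumulated data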
Definition of a Score Computer
A score computer doesn't actually exist. There is no module or object with such a name.
There might be distributions, or backprop nets, or whatever, but not a mere score computer.
Any object class that should be usable as a score computer must be defined as a structure
that begins with a number of exactly specified fields. These fields contain function
pointers for computing a score, accumulating training data, updating parameters, returning
a list of used features, returning the number of currently available frames, and others.
Have a look at the C source code documentation for further
details.
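To illustrate the idea only (this is not the actual JANUS header; all type and field names
below are invented), such a structure amounts to a small table of function pointers at the
top of the object class:

    /* Hypothetical sketch of a score-computer interface; see the JANUS
     * C sources for the real field names and signatures. */
    typedef struct {
      float (*score)    (void *cd, int classX, int frameX);  /* emission score   */
      int   (*accu)     (void *cd, int classX, int frameX,
                         float factor);                      /* training data    */
      int   (*update)   (void *cd);                          /* parameter update */
      char* (*featList) (void *cd);                          /* used features    */
      int   (*frameN)   (void *cd);                          /* available frames */
    } ScoreComputer;

    /* The senones module only ever calls through these pointers, so any
     * object whose structure begins with such fields can act as a score
     * computer without the module knowing the object's concrete type. */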
Further information about the module: