Linear Discriminant Analysis
Linear discriminant analysis (LDA) is not necessary for building a
recognizer, but it is very helpful for improving recognition accuracy.
We will not explain the theory behind LDA here beyond this much: LDA
finds a transformation matrix A such that, when every feature vector x
is multiplied by A, the ratio of the determinants of the total-scatter
matrix and the within-class scatter matrix is maximized. The total
scatter measures the diversity of all the data; the within-class
scatter measures the average diversity of the data belonging to the
same class. Finding the LDA matrix thus means moving data of the same
class a bit closer together and data of different classes a bit
further apart.
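As a plain illustration of this criterion (ordinary Python/NumPy, not
Janus code; the names lda_objective, T, and W are made up for this
sketch), the quantity being maximized can be written as:

    import numpy as np

    def lda_objective(A, T, W):
        """Determinant ratio that LDA maximizes over transforms A.

        T is the total-scatter matrix and W the within-class scatter
        matrix (both d x d); A is a k x d matrix applied to each
        feature vector x as A @ x.
        """
        return np.linalg.det(A @ T @ A.T) / np.linalg.det(A @ W @ A.T)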
If you are interested in more details about the theoretical background
of LDA, have a look at a good textbook or the relevant papers. In the
rest of this page, we will only address how LDA matrices are computed
with Janus.
Computing an LDA Matrix
Janus offers the object class LDA. The first part of an LDA
computation is to create an LDA object and define which acoustic
models belong to which LDA class. Usually we use one class per
Gaussian codebook. We generally find that computing an LDA over a
larger number of classes gives better performance, so we recompute the
LDA after switching from a context-independent to a context-dependent
system. The usual number of codebooks for a context-independent system
is three times the number of monophones, because we use three
codebooks per monophone. After the step to a context-dependent system
we usually end up with thousands of codebooks, and the new LDA matrix
computed for them discriminates better than the one computed with the
context-independent system.
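To make the bookkeeping concrete, here is a toy Python sketch
(illustrative only, not the Janus API; phone names and class indices
are invented) of what such a class definition amounts to for a small
context-independent system:

    monophones = ["A", "E", "I"]      # toy phone set
    positions  = ["-b", "-m", "-e"]   # begin / middle / end codebooks

    lda_classes = {}
    for ph in monophones:
        for pos in positions:
            lda_classes[ph + pos] = len(lda_classes)

    # three codebooks per monophone -> 3 * 3 = 9 LDA classes
    print(len(lda_classes))  # 9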
When an LDA object has been initialized and the classes are defined,
we can start training the scatter matrices. Perhaps 'computing' would
be a better term than 'training', but since the main loop that Janus
runs during the LDA computation is exactly the same as during regular
maximum-likelihood training of its Gaussian mixtures, the term
'training' fits for LDA, too. All that is done during training is
loading an utterance, getting a path from somewhere (running a Viterbi
alignment or loading labels from file), and accumulating the scatter
matrices. When all training utterances have been processed, we end up
with the two matrices mentioned above; the LDA object also stores the
mean vector for each class and the counts, i.e. the number of feature
vectors that belonged to each class.
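The statistics involved are simple to write down. The following
Python/NumPy sketch (illustrative, not Janus code; all names are
invented, and it assumes every class is observed at least once)
accumulates counts, per-class sums, and the raw second moment, from
which the scatter matrices and class means follow:

    import numpy as np

    def accumulate(frames, labels, n_classes, dim):
        # frames: (N, dim) feature vectors; labels: class index per frame
        counts = np.zeros(n_classes)
        sums = np.zeros((n_classes, dim))
        second_moment = np.zeros((dim, dim))
        for x, c in zip(frames, labels):
            counts[c] += 1
            sums[c] += x
            second_moment += np.outer(x, x)
        return counts, sums, second_moment

    def scatter_matrices(counts, sums, second_moment):
        n = counts.sum()
        mean = sums.sum(axis=0) / n
        T = second_moment / n - np.outer(mean, mean)       # total scatter
        class_means = sums / counts[:, None]
        B = sum(c * np.outer(m - mean, m - mean)
                for c, m in zip(counts, class_means)) / n  # between-class
        W = T - B                                          # within-class
        return T, W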
The computation of the LDA matrix from the scatter matrices is called
'simultaneous diagonalization' and can be done in Janus by calling the
corresponding command. All that is left to do at the end is to store
the LDA matrix and the counts. We can use the counts later for
extracting sample vectors into separate files for the k-means or
neural-gas algorithm.
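For illustration, this step can be expressed with SciPy's generalized
symmetric eigensolver (a sketch under the assumption that W is
positive definite; this is not the Janus command itself):

    from scipy.linalg import eigh

    def lda_matrix(T, W, k):
        # Solve the generalized eigenproblem T v = w W v; eigh returns
        # eigenvalues in ascending order, so the most discriminative
        # directions are the last columns.
        eigvals, eigvecs = eigh(T, W)
        return eigvecs[:, ::-1][:, :k].T  # k x d matrix, rows = eigenvectors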