Forced-Alignment Training
One iteration of the general training algorithm looks as follows:
for each utterance:
   1) load and preprocess the utterance's features
   2) get an alignment path from somewhere
   3) whatever has to be trained, let it accumulate
      the necessary training information
finally, whatever has to be trained, let it update its
parameters according to the accumulated data
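The following Python sketch illustrates this loop. It is only a sketch under assumed names: load_features(), get_alignment(), and the trainer object with its accumulate() and update() methods are placeholders and do not come from any particular toolkit.

   # Sketch of one training iteration: accumulate over all utterances, then update.
   # load_features, get_alignment and the trainer interface are placeholders.
   def train_one_iteration(utterances, trainer, aligner):
       for utt in utterances:
           # 1) load and preprocess the utterance's features
           features = load_features(utt)
           # 2) get an alignment path (computed here or loaded from a label file)
           alignment = get_alignment(utt, features, aligner)
           # 3) let every trainable component accumulate its statistics
           trainer.accumulate(features, alignment)
       # after the pass over all utterances, update the parameters
       # from the accumulated statistics
       trainer.update()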
Here step 2) can either run a Viterbi or a forward-backward
alignment, or load an already aligned path from a file.
Whenever we do not have label files, or when we want to write
some, we have to compute an alignment path with one of the
available alignment algorithms.
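As an illustration, step 2) could be wrapped in a small dispatcher like the one below. The names viterbi_align, forward_backward_align and load_alignment are hypothetical and stand in for whatever alignment routines and label-file reader are actually available; the ".align" file extension is likewise an assumption.

   import os

   # Hypothetical dispatcher for step 2); all called helpers are placeholders.
   def get_alignment(utt_id, features, model, label_dir=None, method="viterbi"):
       if label_dir is not None:
           label_file = os.path.join(label_dir, utt_id + ".align")
           if os.path.exists(label_file):
               # an alignment was already written: load the stored path
               return load_alignment(label_file)
       # no label file available: compute a path with an alignment algorithm
       if method == "viterbi":
           return viterbi_align(features, model)            # single best path
       return forward_backward_align(features, model)       # soft (posterior) alignment

If the goal is to write label files, the computed path can then be stored on disk so that later iterations can simply load it instead of realigning.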