Exercise 5
Training

Introduction

In today's session we will train a recognizer using Janus. So far you have learned a lot about the concept of objects, sub-objects, and their methods. Now you will get familiar with even more objects, namely the ones you need to train a (context-independent) recognizer. The term training refers to the technique of using the training database (speech and transcripts) to change the acoustic parameters in order to gain a better match to these training data. Training a recognition engine with Janus is basically done in four steps: first, writing labels; second, calculating an LDA matrix; third, creating codebooks using the k-means algorithm; and last but not least, running EM-training along the produced labels. The step of writing labels could be skipped, but the benefit is that once we have these labels, the subsequent training iterations can be performed much faster.

Writing Labels

Task 13: Use one of the start-up scripts from the last session to start up Janus. Create a subdirectory labels in the directory step4. From the last session you already know how the Viterbi alignment works. Use the start-up script to produce a Viterbi alignment for all training utterances. Store these labels, one file per utterance, named by its utterance ID, in the subdirectory labels. To store the labels use the method bsave of the object class Path. For more details look into Janus Tutorial Step4.
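
Conceptually, each label file stores, for every feature frame, the HMM state it was aligned to. The following Python sketch shows what such a Viterbi alignment computes. It is only an illustration of the algorithm, not Janus code; the function name, the no-skip left-to-right topology, and the assumption that there are at least as many frames as states are our own simplifications.

    import numpy as np

    def viterbi_align(log_likes, trans_penalty=0.0):
        """Force-align T frames to S left-to-right HMM states.

        log_likes: (T, S) per-frame log-likelihoods, with the states of
        the utterance's transcription in order. Returns the best state
        index for every frame -- exactly what a label file records.
        Assumes T >= S and a stay-or-advance (no skip) topology.
        """
        T, S = log_likes.shape
        score = np.full((T, S), -np.inf)
        back = np.zeros((T, S), dtype=int)
        score[0, 0] = log_likes[0, 0]        # alignment must start in state 0
        for t in range(1, T):
            for s in range(S):
                stay = score[t - 1, s]
                advance = score[t - 1, s - 1] - trans_penalty if s > 0 else -np.inf
                if stay >= advance:
                    score[t, s], back[t, s] = stay, s
                else:
                    score[t, s], back[t, s] = advance, s - 1
                score[t, s] += log_likes[t, s]
        path = [S - 1]                       # alignment must end in the last state
        for t in range(T - 1, 0, -1):
            path.append(back[t, path[-1]])
        return path[::-1]

Once such a path exists for every utterance, later training passes can read the alignment from disk instead of recomputing it, which is exactly the speed-up mentioned in the introduction.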

Question 10: Think about which objects are necessary for training with Janus. Then answer the following questions:

  1. What is the benefit of writing labels?

  2. Why is it beneficial to reduce the dimension of the feature space after calculating an LDA?

  3. Why do you think the vectors (Gaussians) of one codebook are similar to each other? (Do you?) Does the LDA increase or decrease this similarity?

  4. Discuss the benefits of the k-means approach versus reusing weights from other recognizers.

  5. If you train along the labels for two iterations, the weights change from iteration 1 to 2 even if the labels remain the same. Why? (The sketch after this list illustrates the effect.)

  6. What should the average score of all training utterances after iteration n look like compared to iteration n+1?

  7. Is it possible that the score of a training utterance gets worse after training along labels? What if the training is done not along labels but with the Forward-Backward algorithm?

  8. Under which circumstances is it possible that the error rate increases after training?
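
A hint for question 5: even if the labels, and therefore the frame-to-state alignment, never change, the Gaussians within one codebook compete for the frames of their state, and how the frames are divided among them depends on the current parameters. The toy Python demo below (our own illustration with made-up data, not Janus code) runs EM on a fixed set of frames belonging to a single state and prints how the means and mixture weights keep moving from iteration to iteration:

    import numpy as np

    rng = np.random.default_rng(0)
    # frames assigned to one state by the fixed labels (alignment never changes)
    frames = np.concatenate([rng.normal(-2, 1, 200), rng.normal(2, 1, 200)])

    # a two-Gaussian mixture for this state, starting from a poor guess
    means = np.array([-0.5, 0.5])
    variances = np.array([1.0, 1.0])
    weights = np.array([0.5, 0.5])

    def log_gauss(x, m, v):
        return -0.5 * (np.log(2 * np.pi * v) + (x - m) ** 2 / v)

    for it in range(5):
        # E-step: the component posteriors depend on the current parameters,
        # so they differ between iterations although the labels are fixed
        lp = log_gauss(frames[:, None], means, variances) + np.log(weights)
        post = np.exp(lp - lp.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
        # M-step: re-estimate means, variances, and mixture weights
        n = post.sum(axis=0)
        means = (post * frames[:, None]).sum(axis=0) / n
        variances = (post * (frames[:, None] - means) ** 2).sum(axis=0) / n
        weights = n / n.sum()
        print(it, means.round(3), weights.round(3))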

Linear Discriminant Analysis (LDA)

Task 14: Read Janus Tutorial Step5 and perform the described steps. Look at the eigenvalue matrix. Create a feature in the FeatureSet and assign the LDA matrix to this feature. Look at the LDA matrix using the grey-scale mode. What do you see? (Well, I know it is a nice picture with light grey and dark grey dots, but what else? :-)
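
If you are curious about the linear algebra behind this tutorial step, the Python sketch below shows one standard way to compute an LDA matrix (a conceptual illustration under our own assumptions, not the routine Janus uses internally). The class label of each feature vector would come from the labels written in Task 13:

    import numpy as np
    from scipy.linalg import eigh

    def lda_matrix(X, y, out_dim):
        """X: (N, D) feature vectors; y: (N,) class labels, e.g. the
        HMM states from the written labels. Returns a (D, out_dim)
        projection matrix."""
        D = X.shape[1]
        mean_total = X.mean(axis=0)
        Sw = np.zeros((D, D))                    # within-class scatter
        Sb = np.zeros((D, D))                    # between-class scatter
        for c in np.unique(y):
            Xc = X[y == c]
            mc = Xc.mean(axis=0)
            Sw += (Xc - mc).T @ (Xc - mc)
            d = (mc - mean_total)[:, None]
            Sb += len(Xc) * (d @ d.T)
        # generalized eigenproblem Sb v = lambda Sw v; eigh returns the
        # eigenvalues in ascending order, so take the last out_dim vectors
        vals, vecs = eigh(Sb, Sw)
        return vecs[:, ::-1][:, :out_dim]

Projecting the features with X @ W keeps the directions along which the classes are best separated, which is worth remembering for questions 2 and 3 above.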

Creating Codebooks Using K-means

Task 15: Read Janus Tutorial Step8 and follow the instructions. The resulting sample files are of type FMatrix. You can load them with the method bload. Look at some of the sample files and check whether the vectors they contain are similar to each other. You can look at the resulting codebooks using showDSS.
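
To see what happens to those sample vectors, here is a plain k-means pass in Python. It is a minimal sketch of the algorithm itself with made-up names, not the Janus implementation; each codebook's sample file would be clustered separately, and the resulting means become the initial Gaussians of that codebook.

    import numpy as np

    def kmeans(samples, k, iterations=10, seed=0):
        """samples: (N, D) float array from one sample file; returns
        (k, D) codebook means."""
        rng = np.random.default_rng(seed)
        means = samples[rng.choice(len(samples), size=k, replace=False)]
        for _ in range(iterations):
            # assign every sample to its nearest mean (Euclidean distance)
            d = ((samples[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
            nearest = d.argmin(axis=1)
            # move each mean to the centroid of the samples assigned to it
            for j in range(k):
                assigned = samples[nearest == j]
                if len(assigned):
                    means[j] = assigned.mean(axis=0)
        return means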

Use the test script from the last session and evaluate the brand-new weights by calculating the word error rate.

EM-Training along Labels

Task 16: Read Janus Tutorial Step9 and follow the instructions. Run the commands in the directory step9. For our purposes here it is okay if you train only a single iteration. Evaluate the weights from this iteration by running your test script.
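
Since the labels fix the frame-to-state alignment, one training iteration boils down to an accumulation pass over all frames followed by a parameter update. The Python sketch below shows that structure for the simplest possible case, one diagonal-covariance Gaussian per state (our own simplification: Janus codebooks hold mixtures, the function name is made up, and states that never occur in the labels are not handled):

    import numpy as np

    def train_along_labels(features, labels, num_states):
        """features: (T, D) frames of the training set; labels: (T,)
        state index per frame, as read from the label files."""
        D = features.shape[1]
        counts = np.zeros(num_states)
        sum1 = np.zeros((num_states, D))      # first-order statistics
        sum2 = np.zeros((num_states, D))      # second-order statistics
        for x, s in zip(features, labels):    # accumulation pass
            counts[s] += 1
            sum1[s] += x
            sum2[s] += x * x
        means = sum1 / counts[:, None]        # update step
        variances = sum2 / counts[:, None] - means ** 2
        return means, variances

The expensive part, computing the alignment, was already done when the labels were written, which is why training along labels is so much faster than running a new Viterbi or Forward-Backward pass in every iteration.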

Last modified: Fri Apr 6 00:20:13 EDT 2001
Maintainer: tanja@cs.cmu.edu.