Folks, today you are going to build the first Janus speech recognizer! It will be a very small recognition engine with simple context-independent acoustic models, which we will borrow from another engine. For now we won't be able to do a real live demo where you speak into it and so on... but hey - it's a start! The goal of this exercise is to get familiar with all the Janus objects you need to get the recognizer running and to learn more about the Janus-script language.
Repeat the formula for the multinomial multivariate Gaussian distribution from the Janus Tutorial.
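If you do not have the tutorial at hand: the density meant here is the usual weighted mixture of multivariate Gaussians defined by the codebook (means and covariances) and the mixture weights. A sketch in standard notation (the symbols are not necessarily the ones the tutorial uses):

    p(x) = \sum_{k=1}^{K} c_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)
         = \sum_{k=1}^{K} c_k \, \frac{1}{\sqrt{(2\pi)^d \, |\Sigma_k|}}
           \exp\left( -\frac{1}{2} (x-\mu_k)^\top \Sigma_k^{-1} (x-\mu_k) \right),
    \qquad \sum_{k=1}^{K} c_k = 1

where x is the d-dimensional feature vector, the mu_k and Sigma_k are the K codebook means and covariances, and the c_k are the mixture weights.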
Task 11: Create a small codebook, mixture weights, and a reference vector. Get the value of the multinomial distribution for this reference vector by following the instructions below:
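As a sanity check you can also compute such a value by hand in plain Tcl, outside of the Janus objects. The following sketch uses a made-up 2-dimensional codebook with two reference vectors, diagonal covariances, and mixture weights; all numbers are assumptions, not values from the tutorial:

    # toy codebook: two 2-dimensional reference vectors (means),
    # diagonal covariances and mixture weights -- all values made up
    set means   {{0.0 0.0} {1.0 2.0}}
    set vars    {{1.0 1.0} {0.5 0.5}}   ;# diagonal covariance entries
    set weights {0.6 0.4}
    set x       {0.5 1.0}               ;# the vector to score

    set pi 3.14159265358979
    set p  0.0
    foreach mu $means var $vars w $weights {
        # evaluate one diagonal-covariance Gaussian at x
        set exponent 0.0
        set det      1.0
        foreach xi $x mi $mu vi $var {
            set d        [expr {$xi - $mi}]
            set exponent [expr {$exponent + $d * $d / $vi}]
            set det      [expr {$det * $vi}]
        }
        set dim  [llength $x]
        set norm [expr {1.0 / sqrt(pow(2 * $pi, $dim) * $det)}]
        set g    [expr {$norm * exp(-0.5 * $exponent)}]
        # add the weighted component to the mixture density
        set p    [expr {$p + $w * $g}]
    }
    puts "mixture density p(x) = $p"

Comparing such a hand computation with the value Janus reports is a good way to convince yourself that you understand what the codebook and distribution objects actually store.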
Question 8: Reconsider the most important Janus objects, their purposes, and their relations and dependencies, and then answer the following questions:
Typing the name of an object followed by a . gives you all subobjects as output. Typing the name of an object followed by a : gives you all names of the subobjects.
Set up the environment
You will now set up your training and test environment and arrange the files and data you already prepared in the last sessions. For information about the database we are using, see also Janus Tutorial Step1.
Initialize weights / Run first Viterbi
Read Janus Tutorial Step3 and follow the instructions. Run the commands interactively in your directory step3. Try out some alternative parameters, apply the Viterbi to different utterances, and get familiar with this procedure.
To avoid typing all the commands again and again, it is very useful to write a startup script. Instead of typing the lines or cutting and pasting them, you can now source this script by typing % source startup.tcl at the Janus prompt.
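A minimal sketch of what such a startup script might look like; the file names and variables below are placeholders (assumptions), and the real content is simply the sequence of commands you typed interactively in step3, in the same order:

    # startup.tcl -- re-creates the interactive environment in one go.
    # All paths and names below are placeholders; replace them with
    # whatever you actually use in your step3 directory.
    set dataDir  ../data                 ;# hypothetical data directory
    set descDir  ../desc                 ;# hypothetical description files

    # paste here (or source from separate files) the object-creation
    # commands from your interactive session, e.g.:
    #   source createObjects.tcl         ;# hypothetical helper script

    puts "startup.tcl done: environment loaded (data in $dataDir)"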
Get recognition results
In order to test the resulting recognizer we need a Language Model and a vocabulary.
Task 12: Reuse your script countPairs.tcl from Session 1, which produces word pairs. Modify your script so that it produces a language model according to the NIST specification (see below; a possible Tcl sketch follows the format example). The language model consists of one entry for each word (unigram) and word pair (bigram), together with the probability of the word/word pair in the training corpus.
Question 9: Look into the resulting language model file and find the bigram with the highest probability. Which one is it? Is it reasonable?

NIST Language Model:
comments
\data\
ngram 1=Number of Unigrams
ngram 2=Number of Bigrams

\1-grams:
log(p(word)) word -99.9
...
log(p(word)) word -99.9

\2-grams:
log(p(word2|word1)) word1 word2
...
log(p(word2|word1)) word1 word2

\end\

Assuming we have a corpus of the following three sentences
<s> B C A </s>
<s> A A B </s>
<s> C A A </s>

then a NIST language model could look like:
\data\
ngram 1=5
ngram 2=9

\1-grams:
-1.18045354881 </s> -99.9
-0.700418578079 <s> -99.9
-0.477989694795 A -99.9
-0.87723631318 B -99.9
-0.87723631318 C -99.9

\2-grams:
-0.481485034036 <s> A
-0.481485034036 <s> B
-0.481485034036 <s> C
-0.400116075245 A </s>
-0.400116075245 A A
-0.703333310875 A B
-0.305394150245 B </s>
-0.305394150245 B C
-0.00217691461508 C A

\end\

The NIST format allows many different kinds of language models. In our experiments we are using a very simple one. You can think of others, such as trigram language models using different back-off schemes, in which the back-off factors differ from the -99.9 used above, and much more fancy stuff.
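A sketch of how the modified countPairs.tcl could turn unigram and bigram counts into this file format. The corpus file name, variable names, and the plain relative-frequency probabilities (no smoothing or discounting, so the numbers need not exactly reproduce the example above) are assumptions; adapt them to your own script:

    # makeLM.tcl -- write a NIST-style unigram+bigram language model
    # from a corpus file with one sentence per line (file name assumed)
    set corpus "transcripts.txt"
    set lmFile "lm.nist"

    set f [open $corpus r]
    while {[gets $f line] >= 0} {
        # surround every sentence with the boundary tokens
        set words [concat <s> $line </s>]
        set prev ""
        foreach w $words {
            incr uni($w)                              ;# unigram count
            if {$prev != ""} { incr bi([list $prev $w]) }  ;# bigram count
            set prev $w
        }
    }
    close $f

    # total number of tokens for the unigram probabilities
    set total 0
    foreach w [array names uni] { incr total $uni($w) }

    set out [open $lmFile w]
    puts $out "\\data\\"
    puts $out "ngram 1=[array size uni]"
    puts $out "ngram 2=[array size bi]"
    puts $out ""
    puts $out "\\1-grams:"
    foreach w [lsort [array names uni]] {
        # relative frequency; -99.9 is the dummy back-off weight
        set logp [expr {log10(double($uni($w)) / $total)}]
        puts $out "$logp $w -99.9"
    }
    puts $out ""
    puts $out "\\2-grams:"
    foreach pair [lsort [array names bi]] {
        set w1 [lindex $pair 0]
        set w2 [lindex $pair 1]
        # p(w2|w1) = count(w1 w2) / count(w1)
        set logp [expr {log10(double($bi($pair)) / $uni($w1))}]
        puts $out "$logp $w1 $w2"
    }
    puts $out ""
    puts $out "\\end\\"
    close $out

The sketch relies on incr creating array entries on first use (Tcl 8.5 or newer); on older Tcl versions you would initialize the counts explicitly.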
Read Janus Tutorial Step7 and do the following things:
Last modified: Fri Mar 16 00:47:50 EST 2001
Maintainer: tanja@cs.cmu.edu.