On this page you will find links to the pages of the do-it-yourself tutorial, one link for each step of the development. Each page contains a detailed description of what has to be done in that step to develop a recognizer, and all the scripts that are used are explained in detail. Step 1 starts right after unpacking the archive. Follow the steps in the order they are listed, and you will end up with a working recognizer.
Some of the step descriptions are a bit lengthy, especially those of the first three steps. Don't be scared by this; the intention is to explain many things carefully, step by step. Once you have completed the first couple of steps, you will find that the later steps refer to things you have already done, so their pages will be shorter.
Create a directory where you will conduct all the do-it-yourself experiments, and unpack the archive there. Also create directories named step1, step2, etc., in which you will run the scripts and store the resulting files. Most of the scripts assume that you are using this kind of file organization.
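If you like, you can create the whole directory layout in one go. A minimal Python sketch, assuming a hypothetical root directory named janus-diy and fifteen steps (adapt both to your setup):

    import os

    root = "janus-diy"  # hypothetical name; unpack the archive here too
    for step in range(1, 16):  # one subdirectory per step of the tutorial
        os.makedirs(os.path.join(root, f"step{step}"), exist_ok=True)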
Step 1: map the phonemes of the given dictionary, find out which words are missing, add the missing words, convert the dictionary to the Janus format, and create a Janus database for the given task
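To give an idea of what this step involves, here is a Python sketch of mapping phoneme symbols and finding missing words. All file names and the mapping table are made-up examples; the tutorial's own scripts and the exact Janus dictionary format are explained on the step's page.

    # Map the phoneme symbols of a source dictionary onto the target
    # phoneme set and report transcript words that have no entry.
    phone_map = {"ax": "AX", "aa": "A", "iy": "IY"}  # example mapping

    dictionary = {}
    with open("source.dict") as f:          # hypothetical input file
        for line in f:
            word, *phones = line.split()
            dictionary[word] = [phone_map.get(p, p) for p in phones]

    task_words = set()
    with open("transcripts.txt") as f:      # hypothetical transcriptions
        for line in f:
            task_words.update(line.split())

    print("missing words:", sorted(task_words - dictionary.keys()))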
Step 2: build a first Janus recognizer environment consisting of description files for the features, codebooks, and distributions, a phone set and tags, transition models, and trees for distributions and topologies
Step 3: fire up a Janus process in the created environment, load the given generic weights, have a look at the weights and the recordings, run a Viterbi alignment, and look at the resulting path
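If you have never seen a forced alignment before, the following Python sketch shows the dynamic program that a Viterbi alignment solves: assigning every feature frame to one state of a left-to-right HMM. It illustrates the algorithm only; it is not the Janus command, and all names are my own.

    import numpy as np

    def viterbi_align(log_lik, self_lp, next_lp):
        """Align T frames to S left-to-right states.
        log_lik: (T, S) per-frame state log-likelihoods;
        self_lp/next_lp: log-probs of self-loop and forward transition."""
        T, S = log_lik.shape
        score = np.full((T, S), -np.inf)
        back = np.zeros((T, S), dtype=int)
        score[0, 0] = log_lik[0, 0]          # must start in the first state
        for t in range(1, T):
            for s in range(S):
                stay = score[t - 1, s] + self_lp
                move = score[t - 1, s - 1] + next_lp if s > 0 else -np.inf
                back[t, s] = s if stay >= move else s - 1
                score[t, s] = max(stay, move) + log_lik[t, s]
        path = [S - 1]                       # backtrack from the final state
        for t in range(T - 1, 0, -1):
            path.append(back[t, path[-1]])
        return path[::-1], score[-1, -1]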
Step 4: use the generic weights for running Viterbi alignments and writing Janus-style label files
Step 5: compute an LDA transformation matrix based on the labels from the previous step, count the occurrence frequencies of all acoustic models, and write a new feature description file for LDA
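The math behind this step, as a hedged Python sketch (the function and its arguments are my own naming, not a Janus interface): LDA finds a projection that maximizes the between-class scatter of the acoustic-model classes relative to their within-class scatter.

    import numpy as np

    def lda_matrix(frames, classes, out_dim):
        """frames: (N, D) feature vectors; classes: (N,) model indices
        taken from the labels. Returns a (D, out_dim) projection."""
        D = frames.shape[1]
        mean = frames.mean(axis=0)
        Sw = np.zeros((D, D))                # within-class scatter
        Sb = np.zeros((D, D))                # between-class scatter
        for c in np.unique(classes):
            x = frames[classes == c]         # len(x) is the class count
            mc = x.mean(axis=0)
            Sw += (x - mc).T @ (x - mc)
            diff = (mc - mean)[:, None]
            Sb += len(x) * (diff @ diff.T)
        # leading eigenvectors of Sw^-1 Sb span the discriminant space
        vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
        order = np.argsort(-vals.real)
        return vecs.real[:, order[:out_dim]]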
Step 6: run a forced-alignment training using the Viterbi or the forward-backward algorithm, do parameter accumulation and weight updates, and write accumulator and weights files
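The accumulate/update cycle itself is simple. Here is a sketch of the statistics kept for one diagonal Gaussian; it shows the underlying math only, while the layout of the Janus accumulator and weights files is covered on the step's page.

    import numpy as np

    class GaussianAccu:
        def __init__(self, dim):
            self.count, self.sum, self.sqr = 0.0, np.zeros(dim), np.zeros(dim)

        def accumulate(self, frame, gamma=1.0):
            # gamma is 1.0 for a Viterbi path, or the state's posterior
            # occupancy probability for forward-backward training
            self.count += gamma
            self.sum += gamma * frame
            self.sqr += gamma * frame * frame

        def update(self):
            mean = self.sum / self.count
            var = self.sqr / self.count - mean * mean  # diagonal covariance
            return mean, var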
Step 7: run a first simple one-pass (tree-forward) decoding using the previously trained weights, create a vocabulary file, look at the hypotheses that were found, and compute the recognizer's error rate
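The error rate is the word-level Levenshtein distance between reference and hypothesis, divided by the reference length. A self-contained Python sketch (assuming plain word error rate; the tutorial's own scoring script may differ in details such as case handling):

    def word_error_rate(ref, hyp):
        r, h = ref.split(), hyp.split()
        d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
        for i in range(len(r) + 1):
            d[i][0] = i
        for j in range(len(h) + 1):
            d[0][j] = j
        for i in range(1, len(r) + 1):
            for j in range(1, len(h) + 1):
                cost = 0 if r[i - 1] == h[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[len(r)][len(h)] / len(r)

    print(word_error_rate("this is a test", "this is the test"))  # 0.25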
Step 8: extract sample vectors from the recordings into files, load these files and run the k-means algorithm on them to create new codebooks and distributions, write a new codebook description file using LDA features, and test the resulting recognizer
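K-means itself is only a few lines. This sketch mirrors what the codebook initialization does conceptually; sample extraction, file handling, and the actual Janus commands are on the step's page.

    import numpy as np

    def kmeans(samples, k, iterations=10, seed=0):
        rng = np.random.default_rng(seed)
        centers = samples[rng.choice(len(samples), k, replace=False)]
        for _ in range(iterations):
            # assign every sample vector to its nearest codebook vector
            dist = np.linalg.norm(samples[:, None] - centers[None], axis=2)
            nearest = dist.argmin(axis=1)
            # move each codebook vector to the mean of its samples
            for j in range(k):
                if (nearest == j).any():
                    centers[j] = samples[nearest == j].mean(axis=0)
        return centers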
Step 9: run a training along labels using the previously written label files, do multiple iterations over the training data, write the trained weights files, and test the resulting recognizer
Step 10: create a context-dependent environment by computing a polyphone list, write a new distribution description that contains a distribution for every context-dependent model, and modify the distribution tree to use the polyphone list
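A polyphone list is essentially a count of all phone-in-context occurrences in the training data. A Python sketch for triphones (the padding symbol and the input format are my assumptions):

    from collections import Counter

    def polyphones(phone_sequences, context=1):
        counts = Counter()
        for seq in phone_sequences:
            padded = ["<pad>"] * context + seq + ["<pad>"] * context
            for i in range(context, len(padded) - context):
                counts[tuple(padded[i - context:i + context + 1])] += 1
        return counts

    counts = polyphones([["HH", "AH", "L", "OW"], ["W", "ER", "L", "D"]])
    print(counts.most_common(3))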
Step 11: train a few iterations for the architecture that was created in the previous step and write the trained distribution weights
Step 12: write a questions file and a new phoneme-set file that contain phoneme classes for the decision tree, cluster the context-dependent models using the previously trained distribution weights, create and initialize a new codebook for each of the clusters, write new description files with and without the unclustered polyphones pruned off the distribution tree, and analyse the resulting decision tree
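To see what the clustering does with the questions, here is a simplified Python sketch that rates one yes/no question by the entropy gain of the split, using the models' trained mixture weights. This illustrates the flavour of the criterion, not the exact Janus implementation; all data in the example is made up.

    import math

    def entropy(dist):
        return -sum(p * math.log2(p) for p in dist if p > 0)

    def split_gain(models, question):
        """models: (left_context, count, mixture-weight distribution);
        question: predicate on the left-context phone."""
        def pooled(group):
            total = sum(c for _, c, _ in group)
            dim = len(group[0][2])
            return total, [sum(c * d[i] for _, c, d in group) / total
                           for i in range(dim)]
        yes = [m for m in models if question(m[0])]
        no = [m for m in models if not question(m[0])]
        if not yes or not no:
            return 0.0
        n_a, d_a = pooled(models)
        n_y, d_y = pooled(yes)
        n_n, d_n = pooled(no)
        return entropy(d_a) - (n_y * entropy(d_y) + n_n * entropy(d_n)) / n_a

    # made-up models of one phone with different left contexts
    models = [("M", 120, [0.7, 0.2, 0.1]),
              ("N", 80, [0.6, 0.3, 0.1]),
              ("IY", 50, [0.1, 0.2, 0.7])]
    print(split_gain(models, lambda left: left in {"M", "N"}))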
Step 13: compute another LDA transformation matrix, this time using the context-dependent models as the classes to be discriminated
Step 14: initialize the codebooks of the context-dependent clustered models by extracting sample vectors and running the k-means algorithm
Step 15: train the latest recognizer along labels and evaluate it with a multi-pass test, doing a tree-forward pass, a flat-forward pass, and a lattice-rescoring pass
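The point of the lattice pass is that the lattice keeps the acoustic scores, so different language-model weights and word penalties can be tried without decoding again. A hedged Python sketch of the idea (the lattice layout, the lz/lp names, and lm_score are my assumptions, not Janus structures):

    import heapq

    def best_path(lattice, start, end, lm_score, lz=32.0, lp=0.0):
        """lattice: {node: [(next_node, word, acoustic_cost), ...]};
        all costs are negative log-probabilities (lower is better)."""
        heap, seen = [(0.0, start, [])], set()
        while heap:
            cost, node, words = heapq.heappop(heap)
            if node == end:
                return words, cost
            if node in seen:
                continue
            seen.add(node)
            for nxt, word, ac in lattice.get(node, []):
                total = cost + ac + lz * lm_score(word) + lp
                heapq.heappush(heap, (total, nxt, words + [word]))
        return None, float("inf")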