On this page you will find links to the pages of the do-it-yourself tutorial, one link for each step of the development. Each page contains a detailed description of what has to be done in that step to develop a recognizer, and all the scripts that are used are explained in detail. Step 1 starts right after unpacking the archive. Follow the steps in the order they are listed, and you will end up with a working recognizer.
Some of the step descriptions are a bit lengthy, especially those of the first three steps. Don't be scared by this; the intention is to explain many things carefully, step by step. Once you have completed the first couple of steps, you will find that the later steps refer to things you have already done, so their pages will be shorter.
Create a directory where you will conduct all the do-it-yourself experiments, and unpack the archive there. Also create directories named step1, step2, etc., in which you will run the scripts and store the resulting files. Most of the scripts assume that you are using this kind of file organization.
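If you like, you can create the whole directory layout in one go. A minimal Python sketch, assuming a hypothetical root directory named janus-diy and fifteen steps (adapt both to your setup):

    import os

    root = "janus-diy"  # hypothetical name; unpack the archive here too
    for step in range(1, 16):  # one subdirectory per step of the tutorial
        os.makedirs(os.path.join(root, f"step{step}"), exist_ok=True)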
Step 1: map the phonemes of the given dictionary, find out which words are missing, add the missing words, convert the dictionary to the Janus format, and create a Janus database for the given task
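To give an idea of what this step involves, here is a Python sketch of mapping phoneme symbols and finding missing words. All file names and the mapping table are made-up examples; the tutorial's own scripts and the exact Janus dictionary format are explained on the step's page.

    # Map the phoneme symbols of a source dictionary onto the target
    # phoneme set and report transcript words that have no entry.
    phone_map = {"ax": "AX", "aa": "A", "iy": "IY"}  # example mapping

    dictionary = {}
    with open("source.dict") as f:          # hypothetical input file
        for line in f:
            word, *phones = line.split()
            dictionary[word] = [phone_map.get(p, p) for p in phones]

    task_words = set()
    with open("transcripts.txt") as f:      # hypothetical transcriptions
        for line in f:
            task_words.update(line.split())

    print("missing words:", sorted(task_words - dictionary.keys()))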
Step 2: build a first Janus recognizer environment consisting of description files for the features, codebooks, and distributions, a phone set and tags, transition models, and trees for distributions and topologies
Step 3: fire up a Janus process in the created environment, load the given generic weights, have a look at the weights and the recordings, run a Viterbi alignment, and look at the resulting path
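If you have never seen a forced alignment before, the following Python sketch shows the dynamic program that a Viterbi alignment solves: assigning every feature frame to one state of a left-to-right HMM. It illustrates the algorithm only; it is not the Janus command, and all names are my own.

    import numpy as np

    def viterbi_align(log_lik, self_lp, next_lp):
        """Align T frames to S left-to-right states.
        log_lik: (T, S) per-frame state log-likelihoods;
        self_lp/next_lp: log-probs of self-loop and forward transition."""
        T, S = log_lik.shape
        score = np.full((T, S), -np.inf)
        back = np.zeros((T, S), dtype=int)
        score[0, 0] = log_lik[0, 0]          # must start in the first state
        for t in range(1, T):
            for s in range(S):
                stay = score[t - 1, s] + self_lp
                move = score[t - 1, s - 1] + next_lp if s > 0 else -np.inf
                back[t, s] = s if stay >= move else s - 1
                score[t, s] = max(stay, move) + log_lik[t, s]
        path = [S - 1]                       # backtrack from the final state
        for t in range(T - 1, 0, -1):
            path.append(back[t, path[-1]])
        return path[::-1], score[-1, -1]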
Step 4: use the generic weights for running Viterbi alignments and writing Janus-style label files
Step 5: compute an LDA transformation matrix based on the labels from the previous step, count the occurrence frequencies of all acoustic models, and write a new feature description file for LDA
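The math behind this step, as a hedged Python sketch (the function and its arguments are my own naming, not a Janus interface): LDA finds a projection that maximizes the between-class scatter of the acoustic-model classes relative to their within-class scatter.

    import numpy as np

    def lda_matrix(frames, classes, out_dim):
        """frames: (N, D) feature vectors; classes: (N,) model indices
        taken from the labels. Returns a (D, out_dim) projection."""
        D = frames.shape[1]
        mean = frames.mean(axis=0)
        Sw = np.zeros((D, D))                # within-class scatter
        Sb = np.zeros((D, D))                # between-class scatter
        for c in np.unique(classes):
            x = frames[classes == c]         # len(x) is the class count
            mc = x.mean(axis=0)
            Sw += (x - mc).T @ (x - mc)
            diff = (mc - mean)[:, None]
            Sb += len(x) * (diff @ diff.T)
        # leading eigenvectors of Sw^-1 Sb span the discriminant space
        vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
        order = np.argsort(-vals.real)
        return vecs.real[:, order[:out_dim]]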
Step 6: run a forced-alignment training using the Viterbi or the forward-backward algorithm, do parameter accumulation and weight updates, and write accumulator and weights files
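The accumulate/update cycle itself is simple. Here is a sketch of the statistics kept for one diagonal Gaussian; it shows the underlying math only, while the layout of the Janus accumulator and weights files is covered on the step's page.

    import numpy as np

    class GaussianAccu:
        def __init__(self, dim):
            self.count, self.sum, self.sqr = 0.0, np.zeros(dim), np.zeros(dim)

        def accumulate(self, frame, gamma=1.0):
            # gamma is 1.0 for a Viterbi path, or the state's posterior
            # occupancy probability for forward-backward training
            self.count += gamma
            self.sum += gamma * frame
            self.sqr += gamma * frame * frame

        def update(self):
            mean = self.sum / self.count
            var = self.sqr / self.count - mean * mean  # diagonal covariance
            return mean, var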
Step 7: run a first simple one-pass (tree-forward) decoding using the previously trained weights, create a vocabulary file, look at the hypotheses that were found, and compute the recognizer's error rate
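The error rate is the word-level Levenshtein distance between reference and hypothesis, divided by the reference length. A self-contained Python sketch (assuming plain word error rate; the tutorial's own scoring script may differ in details such as case handling):

    def word_error_rate(ref, hyp):
        r, h = ref.split(), hyp.split()
        d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
        for i in range(len(r) + 1):
            d[i][0] = i
        for j in range(len(h) + 1):
            d[0][j] = j
        for i in range(1, len(r) + 1):
            for j in range(1, len(h) + 1):
                cost = 0 if r[i - 1] == h[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[len(r)][len(h)] / len(r)

    print(word_error_rate("this is a test", "this is the test"))  # 0.25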
Step 8: extract sample vectors from the recordings into files, load these files and run the k-means algorithm on them to create new codebooks and distributions, write a new codebook description file using LDA features, and test the resulting recognizer
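K-means itself is only a few lines. This sketch mirrors what the codebook initialization does conceptually; sample extraction, file handling, and the actual Janus commands are on the step's page.

    import numpy as np

    def kmeans(samples, k, iterations=10, seed=0):
        rng = np.random.default_rng(seed)
        centers = samples[rng.choice(len(samples), k, replace=False)]
        for _ in range(iterations):
            # assign every sample vector to its nearest codebook vector
            dist = np.linalg.norm(samples[:, None] - centers[None], axis=2)
            nearest = dist.argmin(axis=1)
            # move each codebook vector to the mean of its samples
            for j in range(k):
                if (nearest == j).any():
                    centers[j] = samples[nearest == j].mean(axis=0)
        return centers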
Step 9: run a training along labels using the previously written label files, do multiple iterations over the training data, write the trained weights files, and test the resulting recognizer
Step 10: create a context-dependent environment by computing a polyphone list, write a new distribution description that contains a distribution for every context-dependent model, and modify the distribution tree to use the polyphone list
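A polyphone list is essentially a count of all phone-in-context occurrences in the training data. A Python sketch for triphones (the padding symbol and the input format are my assumptions):

    from collections import Counter

    def polyphones(phone_sequences, context=1):
        counts = Counter()
        for seq in phone_sequences:
            padded = ["<pad>"] * context + seq + ["<pad>"] * context
            for i in range(context, len(padded) - context):
                counts[tuple(padded[i - context:i + context + 1])] += 1
        return counts

    counts = polyphones([["HH", "AH", "L", "OW"], ["W", "ER", "L", "D"]])
    print(counts.most_common(3))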
Step 11: train a few iterations for the architecture that was created in the previous step and write the trained distribution weights
Step 12: write a questions file and a new phoneme-set file that contain phoneme classes for the decision tree, cluster the context-dependent models using the previously trained distribution weights, create and initialize a new codebook for each of the clusters, write new description files with and without the unclustered polyphones pruned off the distribution tree, and analyse the resulting decision tree
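To see what the clustering does with the questions, here is a simplified Python sketch that rates one yes/no question by the entropy gain of the split, using the models' trained mixture weights. This illustrates the flavour of the criterion, not the exact Janus implementation; all data in the example is made up.

    import math

    def entropy(dist):
        return -sum(p * math.log2(p) for p in dist if p > 0)

    def split_gain(models, question):
        """models: (left_context, count, mixture-weight distribution);
        question: predicate on the left-context phone."""
        def pooled(group):
            total = sum(c for _, c, _ in group)
            dim = len(group[0][2])
            return total, [sum(c * d[i] for _, c, d in group) / total
                           for i in range(dim)]
        yes = [m for m in models if question(m[0])]
        no = [m for m in models if not question(m[0])]
        if not yes or not no:
            return 0.0
        n_a, d_a = pooled(models)
        n_y, d_y = pooled(yes)
        n_n, d_n = pooled(no)
        return entropy(d_a) - (n_y * entropy(d_y) + n_n * entropy(d_n)) / n_a

    # made-up models of one phone with different left contexts
    models = [("M", 120, [0.7, 0.2, 0.1]),
              ("N", 80, [0.6, 0.3, 0.1]),
              ("IY", 50, [0.1, 0.2, 0.7])]
    print(split_gain(models, lambda left: left in {"M", "N"}))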
Step 13: compute another LDA transformation matrix, this time using the context-dependent models as the classes to be discriminated
Step 14: initialize the codebooks of the context-dependent clustered models by extracting sample vectors and running the k-means algorithm
Step 15: train the latest recognizer along labels and evaluate it with a multi-pass test, doing a tree-forward pass, a flat-forward pass, and a lattice-rescoring pass
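The point of the lattice pass is that the lattice keeps the acoustic scores, so different language-model weights and word penalties can be tried without decoding again. A hedged Python sketch of the idea (the lattice layout, the lz/lp names, and lm_score are my assumptions, not Janus structures):

    import heapq

    def best_path(lattice, start, end, lm_score, lz=32.0, lp=0.0):
        """lattice: {node: [(next_node, word, acoustic_cost), ...]};
        all costs are negative log-probabilities (lower is better)."""
        heap, seen = [(0.0, start, [])], set()
        while heap:
            cost, node, words = heapq.heappop(heap)
            if node == end:
                return words, cost
            if node in seen:
                continue
            seen.add(node)
            for nxt, word, ac in lattice.get(node, []):
                total = cost + ac + lz * lm_score(word) + lp
                heapq.heappush(heap, (total, nxt, words + [word]))
        return None, float("inf")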