Although it is entirely possible to develop a recognizer in gazillions of different ways, this tutorial offers no more than a few alternatives. Once it is clear how these work, you should be able to design your own training schedules and architectures.
The following is a list of items with brief descriptions; each item links to a page that describes it in detail. Some of the items correspond very closely to development steps discussed in the Do-It-Yourself thread. The development schemes overview page also gives you a list of things that can be done, but from a specific point of view. Since not everything listed here is part of a development process, this extra page simply lists everything that can be done, without any particular point of view.
This page shows first-time Janus users how to run a Janus program. It does not describe how to compile the source code; refer to the Janus manual for that.
This page does not describe how to train a recognizer; it only lists the things that a recognizer needs and uses, with a description of what they are and what they are used for. If you feel unsure about what a recognizer is and what belongs to it, have a look here. If you already know a bit about speech recognition, this will probably be boring.
Here you will find some information about how to make a raw database usable by Janus. Since there are many kinds of databases, we can only cover some of the problems that can occur.
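As an illustration only, here is a minimal Python sketch of the kind of cleanup a raw transcription file typically needs. The input format (one line per utterance, starting with an utterance ID) and the bracketed noise-tag convention are assumptions for this example, not a fixed Janus requirement.

    import re

    # Hypothetical input: one line per utterance, "UTT-ID word word ...".
    # Real databases vary widely; this only illustrates typical cleanup
    # steps such as case normalization and noise-tag removal.
    def clean_transcripts(in_path, out_path):
        with open(in_path) as src, open(out_path, "w") as dst:
            for line in src:
                parts = line.split()
                if len(parts) < 2:
                    continue                      # skip empty or broken records
                utt_id, words = parts[0], parts[1:]
                words = [w.lower() for w in words
                         if not re.fullmatch(r"\[.*\]", w)]  # drop tags like [noise]
                dst.write(utt_id + " " + " ".join(words) + "\n")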
You've just received a tape with a load of speech files. If you are lucky, you also got some transcriptions and a pronunciation lexicon. If you are very lucky, you also got some kind of labels. Now you would like to create an initial environment, i.e. the central files that are needed to start up an initial untrained Janus, such as the dictionary, feature description, and database. If so, this is the right place to look.
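To make one of these steps concrete, here is a small hypothetical sketch that builds a dictionary file from a pronunciation lexicon, keeping only entries whose phonemes exist in your phoneme set. The input format and the brace-delimited output are assumptions; check the exact dictionary format your Janus version expects.

    # Minimal sketch: build a pronunciation dictionary restricted to the
    # phonemes your recognizer actually defines.
    def build_dictionary(lexicon_path, phones, out_path):
        phones = set(phones)                      # the recognizer's phoneme inventory
        with open(lexicon_path) as src, open(out_path, "w") as dst:
            for line in src:
                fields = line.split()
                if len(fields) < 2:
                    continue
                word, pron = fields[0], fields[1:]
                if all(p in phones for p in pron):
                    # hypothetical brace-delimited entry format
                    dst.write("{%s} {%s}\n" % (word, " ".join(pron)))
                # words with unknown phonemes are silently dropped here;
                # map their phonemes instead if the sets differ only in naming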
This page describes what you can do if your data didn't come with any labels and you don't know where to get labels from, but you do have a working Janus recognizer from somewhere. That recognizer should work on a language similar to the one you don't have labels for, and use a phoneme set similar to the one you would like to use for your new task.
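The core of this approach is a phoneme mapping from your new task onto the existing recognizer's phoneme set, so that the old system can force-align (and thus label) the new data. The sketch below is purely illustrative; the mapping table is made up.

    # Every phoneme of the new task is mapped to its closest counterpart
    # in the existing recognizer. This table is a made-up example, not a
    # recommended mapping.
    phone_map = {"a": "AH", "e": "EH", "i": "IY", "o": "OW", "u": "UW"}

    def map_pronunciation(pron, phone_map):
        """Translate a pronunciation into the old system's phoneme set."""
        return [phone_map[p] for p in pron]       # KeyError = no counterpart

    print(map_pronunciation(["a", "i"], phone_map))   # ['AH', 'IY']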
Before you can start to train a recognizer, you must define a preprocessing scheme. If the scheme described under the Initial Environment item does not suit you, you will find information here about what else can be done.
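Janus describes preprocessing in its own feature-description scripts; the Python sketch below is not that syntax. It only illustrates the kind of frame-based pipeline (pre-emphasis, windowing, log power spectrum) that such a description typically encodes.

    import numpy as np

    # Generic frame-based front end; assumes the signal is a 1-D numpy
    # array of at least one window's length.
    def log_power_spectrum(signal, rate=16000, win=0.016, shift=0.010):
        signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
        n, step = int(rate * win), int(rate * shift)
        window = np.hamming(n)
        frames = [signal[i:i + n] * window
                  for i in range(0, len(signal) - n, step)]
        power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
        return np.log(power + 1e-10)              # small floor avoids log(0)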
If you don't have labels and you don't want to modify an existing system so that it could write labels, but you believe that you could use some of the acoustic models from the existing system to initialize your new recognizer's weights, then you will find out how to do that here.
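Conceptually, this amounts to copying the Gaussian parameters of mapped phoneme classes from the old system into the new codebooks. A rough sketch, where old_means and phone_map are assumed stand-in structures rather than Janus objects:

    import numpy as np

    # Seed the new system's codebooks with the Gaussian means of mapped
    # phonemes from the old system (variances and weights analogous).
    def seed_codebooks(old_means, phone_map):
        return {new: old_means[old].copy() for new, old in phone_map.items()}

    # e.g. 16 Gaussians of dimension 13 per codebook (made-up sizes):
    old_means = {"AH": np.zeros((16, 13)), "IY": np.ones((16, 13))}
    new_cb = seed_codebooks(old_means, {"a": "AH", "i": "IY"})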
This page gives an overview of how Janus computes acoustic scores. It shows the entire process, from the sentence down to the Gaussian computation. Understanding this page will help you understand the rest of the tutorial.
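At the bottom of that process sits the evaluation of a Gaussian mixture for one feature frame. The sketch below shows this core computation in the log domain, assuming diagonal covariances; Janus' actual scoring adds pruning and caching tricks not shown here.

    import numpy as np

    # Score one frame x against a mixture: a weighted sum of Gaussian
    # densities, evaluated in the log domain for numerical stability.
    def log_mixture_score(x, means, variances, weights):
        d = means.shape[1]
        log_norm = -0.5 * (d * np.log(2 * np.pi)
                           + np.sum(np.log(variances), axis=1))
        log_gauss = log_norm - 0.5 * np.sum((x - means) ** 2 / variances, axis=1)
        return np.logaddexp.reduce(np.log(weights) + log_gauss)

    # One frame against a 2-Gaussian mixture in 3 dimensions:
    x = np.array([0.1, -0.2, 0.5])
    means, variances = np.zeros((2, 3)), np.ones((2, 3))
    print(log_mixture_score(x, means, variances, np.array([0.5, 0.5])))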
Compute an LDA matrix for the preprocessing, and also compute counts for each LDA class (usually one class per codebook), which can be used later for sample-vector extraction.
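For illustration, here is a textbook LDA computation in Python (within- and between-class scatter, then an eigendecomposition); it is not Janus' internal implementation. The per-class counts collected along the way are the counts mentioned above.

    import numpy as np

    # frames: (n, dim) array of feature vectors; labels: class of each frame.
    def lda_matrix(frames, labels):
        mean = frames.mean(axis=0)
        dim = frames.shape[1]
        sw, sb, counts = np.zeros((dim, dim)), np.zeros((dim, dim)), {}
        for c in np.unique(labels):
            sub = frames[labels == c]
            counts[c] = len(sub)                  # per-class counts, reusable later
            cm = sub.mean(axis=0)
            sw += (sub - cm).T @ (sub - cm)       # within-class scatter
            sb += len(sub) * np.outer(cm - mean, cm - mean)  # between-class scatter
        vals, vecs = np.linalg.eig(np.linalg.inv(sw) @ sb)
        order = np.argsort(-vals.real)            # most discriminative first
        return vecs.real[:, order], counts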
To create initial codebooks you can use the neural-gas algorithm (a generalization of the k-means, or basic ISODATA, algorithm). Because it is usually infeasible to run neural-gas on the entire training data, you should extract only some representative vectors and store them in an extra file for every codebook.
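One simple way to extract a bounded, representative subset per class is reservoir sampling, sketched below; this is an illustrative technique choice, not necessarily what Janus does internally.

    import random

    # frame_label_stream yields (frame, class-label) pairs; each class keeps
    # at most max_per_class frames, uniformly sampled, without holding all
    # training data in memory.
    def extract_samples(frame_label_stream, max_per_class=500):
        samples, seen = {}, {}
        for frame, label in frame_label_stream:
            seen[label] = seen.get(label, 0) + 1
            buf = samples.setdefault(label, [])
            if len(buf) < max_per_class:
                buf.append(frame)
            else:                                 # replace with decreasing probability
                j = random.randrange(seen[label])
                if j < max_per_class:
                    buf[j] = frame
        return samples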
After you have extracted a set of representative vectors for a class (usually a codebook), you can compute k mean vectors from this set to define a codebook. Neural-gas is just a generalization of k-means.
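A bare-bones neural-gas sketch: every codebook vector is pulled toward each sample, weighted by its distance rank. Shrinking the neighborhood to the winner alone turns this into online k-means. Annealing of the learning rate and neighborhood, which a real run would use, is omitted for brevity.

    import numpy as np

    # samples: (n, dim) array with n >= k; returns k codebook vectors.
    def neural_gas(samples, k=16, epochs=10, lr=0.1, lam=2.0):
        rng = np.random.default_rng(0)
        cb = samples[rng.choice(len(samples), k, replace=False)].astype(float)
        for _ in range(epochs):
            for x in samples:
                dists = np.sum((cb - x) ** 2, axis=1)
                ranks = np.argsort(np.argsort(dists))   # 0 = closest vector
                cb += lr * np.exp(-ranks / lam)[:, None] * (x - cb)
        return cb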
The initial codebooks might not be good enough for a good forced alignment. Also, Viterbi training can take much more time than training along labels. Here you will find out how to run a training along labels (Janus or non-Janus).
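In essence, training along labels means accumulating statistics under a fixed frame-to-model assignment and re-estimating the parameters in one pass. A single-Gaussian sketch, with mixture weights and multi-Gaussian codebooks left out:

    import numpy as np

    # frames: iterable of feature vectors; labels: the model assigned to
    # each frame by the label files. Returns (mean, variance) per model.
    def reestimate(frames, labels):
        models = {}
        for x, m in zip(frames, labels):
            s = models.setdefault(m, [0, 0.0, 0.0])   # count, sum, squared sum
            s[0] += 1
            s[1] += x
            s[2] += x * x
        return {m: (s1 / n, s2 / n - (s1 / n) ** 2)
                for m, (n, s1, s2) in models.items()}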
Here, Viterbi (or forward-backward) training is described. This is usually done to improve the quality of the acoustic models or to create label files.
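The heart of Viterbi training is the alignment itself: a dynamic program that maps every frame to a state of a left-to-right model. A minimal sketch, assuming a precomputed (log) score matrix and at least as many frames as states; the traceback is what ends up in the label files.

    import numpy as np

    # score[t, s]: log acoustic score of frame t under state s. Each frame
    # either stays in its state or advances to the next one.
    def viterbi_align(score):
        T, S = score.shape
        best = np.full((T, S), -np.inf)
        back = np.zeros((T, S), int)
        best[0, 0] = score[0, 0]                  # must start in the first state
        for t in range(1, T):
            for s in range(S):
                stay = best[t - 1, s]
                move = best[t - 1, s - 1] if s > 0 else -np.inf
                best[t, s] = score[t, s] + max(stay, move)
                back[t, s] = s if stay >= move else s - 1
        path, s = [], S - 1                       # must end in the last state
        for t in range(T - 1, -1, -1):
            path.append(s)
            s = back[t, s]
        return path[::-1]                         # state index for every frame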
Context-dependent acoustic models perform better than context-independent models. We call context-dependent models 'polyphones'. Here you'll find out how to create a first set of polyphones and the corresponding acoustic models.
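The first polyphone step is essentially counting phones in context over the training transcriptions. A sketch for a context width of one (i.e. triphones); rare contexts would later be dropped or clustered:

    from collections import Counter

    # utterances: iterable of phoneme sequences (lists of phone names).
    def count_polyphones(utterances, min_count=50):
        counts = Counter()
        for phones in utterances:
            padded = ["<s>"] + phones + ["</s>"]  # sentence-boundary padding
            for i in range(1, len(padded) - 1):
                counts[(padded[i - 1], padded[i], padded[i + 1])] += 1
        # keep only contexts seen often enough to train a model for
        return {p: n for p, n in counts.items() if n >= min_count}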