Although it is entirely possible to develop a recognizer in gazillions of different ways, this tutorial offers no more than a few alternatives. Once it is clear how these work, you should be able to design your own training schedules and architectures.
The following is a list of items with brief descriptions; each item links to a page that describes it in detail. Some of the items correspond very closely to development steps discussed in the Do-It-Yourself thread. The development schemes overview page also gives you a list of things that can be done, but from a specific point of view. Since not everything listed here is part of a development process, this extra page simply lists everything that can be done, without any particular point of view.
This page shows first-time Janus users how to run a Janus program. It does not describe how to compile the source code; refer to the Janus manual for that.
This page does not describe how to train a recognizer; it only lists the things that a recognizer needs and uses, with a description of what they are and what they are used for. If you feel unsure about what a recognizer is and what belongs to it, have a look here. If you already know a bit about speech recognition, this will probably be boring.
Here you will find some information about how to make a raw database usable by Janus. Since there are many kinds of databases, we can only cover some of the problems that can occur.
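As an illustration only, here is a minimal Python sketch of the kind of cleanup a raw transcription file typically needs. The input format (one line per utterance, starting with an utterance ID) and the bracketed noise-tag convention are assumptions for this example, not a fixed Janus requirement.

    import re

    # Hypothetical input: one line per utterance, "UTT-ID word word ...".
    # Real databases vary widely; this only illustrates typical cleanup
    # steps such as case normalization and noise-tag removal.
    def clean_transcripts(in_path, out_path):
        with open(in_path) as src, open(out_path, "w") as dst:
            for line in src:
                parts = line.split()
                if len(parts) < 2:
                    continue                      # skip empty or broken records
                utt_id, words = parts[0], parts[1:]
                words = [w.lower() for w in words
                         if not re.fullmatch(r"\[.*\]", w)]  # drop tags like [noise]
                dst.write(utt_id + " " + " ".join(words) + "\n")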
You've just received a tape with a load of speech files. If you are lucky, you also got some transcriptions and a pronunciation lexicon. If you are very lucky, you also got some kind of labels. Now you would like to create an initial environment, i.e. the central files that are needed to start up an initial untrained Janus, such as the dictionary, feature description, and database. If so, this is the right place to look.
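To make one of these steps concrete, here is a small hypothetical sketch that builds a dictionary file from a pronunciation lexicon, keeping only entries whose phonemes exist in your phoneme set. The input format and the brace-delimited output are assumptions; check the exact dictionary format your Janus version expects.

    # Minimal sketch: build a pronunciation dictionary restricted to the
    # phonemes your recognizer actually defines.
    def build_dictionary(lexicon_path, phones, out_path):
        phones = set(phones)                      # the recognizer's phoneme inventory
        with open(lexicon_path) as src, open(out_path, "w") as dst:
            for line in src:
                fields = line.split()
                if len(fields) < 2:
                    continue
                word, pron = fields[0], fields[1:]
                if all(p in phones for p in pron):
                    # hypothetical brace-delimited entry format
                    dst.write("{%s} {%s}\n" % (word, " ".join(pron)))
                # words with unknown phonemes are silently dropped here;
                # map their phonemes instead if the sets differ only in naming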
This page describes what you can do if your data didn't come with any labels and you don't know where to get labels from, but you do have a working Janus recognizer from somewhere. That recognizer should work on a language similar to the one you don't have labels for, and use a phoneme set similar to the one you would like to use for your new task.
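The core of this approach is a phoneme mapping from your new task onto the existing recognizer's phoneme set, so that the old system can force-align (and thus label) the new data. The sketch below is purely illustrative; the mapping table is made up.

    # Every phoneme of the new task is mapped to its closest counterpart
    # in the existing recognizer. This table is a made-up example, not a
    # recommended mapping.
    phone_map = {"a": "AH", "e": "EH", "i": "IY", "o": "OW", "u": "UW"}

    def map_pronunciation(pron, phone_map):
        """Translate a pronunciation into the old system's phoneme set."""
        return [phone_map[p] for p in pron]       # KeyError = no counterpart

    print(map_pronunciation(["a", "i"], phone_map))   # ['AH', 'IY']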
Before you can start to train a recognizer, you must define a preprocessing scheme. If the scheme described under the Initial Environment item does not suit you, you will find information here about what else can be done.
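Janus describes preprocessing in its own feature-description scripts; the Python sketch below is not that syntax. It only illustrates the kind of frame-based pipeline (pre-emphasis, windowing, log power spectrum) that such a description typically encodes.

    import numpy as np

    # Generic frame-based front end; assumes the signal is a 1-D numpy
    # array of at least one window's length.
    def log_power_spectrum(signal, rate=16000, win=0.016, shift=0.010):
        signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
        n, step = int(rate * win), int(rate * shift)
        window = np.hamming(n)
        frames = [signal[i:i + n] * window
                  for i in range(0, len(signal) - n, step)]
        power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
        return np.log(power + 1e-10)              # small floor avoids log(0)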
If you don't have labels and you don't want to modify an existing system so that it could write labels, but you believe that you could use some of the acoustic models from the existing system to initialize your new recognizer's weights, then you will find out how to do that here.
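Conceptually, this amounts to copying the Gaussian parameters of mapped phoneme classes from the old system into the new codebooks. A rough sketch, where old_means and phone_map are assumed stand-in structures rather than Janus objects:

    import numpy as np

    # Seed the new system's codebooks with the Gaussian means of mapped
    # phonemes from the old system (variances and weights analogous).
    def seed_codebooks(old_means, phone_map):
        return {new: old_means[old].copy() for new, old in phone_map.items()}

    # e.g. 16 Gaussians of dimension 13 per codebook (made-up sizes):
    old_means = {"AH": np.zeros((16, 13)), "IY": np.ones((16, 13))}
    new_cb = seed_codebooks(old_means, {"a": "AH", "i": "IY"})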
This page gives an overview of how Janus computes acoustic scores. It shows the entire process, from the sentence down to the Gaussian computation. Understanding this page will help you understand the rest of the tutorial.
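At the bottom of that process sits the evaluation of a Gaussian mixture for one feature frame. The sketch below shows this core computation in the log domain, assuming diagonal covariances; Janus' actual scoring adds pruning and caching tricks not shown here.

    import numpy as np

    # Score one frame x against a mixture: a weighted sum of Gaussian
    # densities, evaluated in the log domain for numerical stability.
    def log_mixture_score(x, means, variances, weights):
        d = means.shape[1]
        log_norm = -0.5 * (d * np.log(2 * np.pi)
                           + np.sum(np.log(variances), axis=1))
        log_gauss = log_norm - 0.5 * np.sum((x - means) ** 2 / variances, axis=1)
        return np.logaddexp.reduce(np.log(weights) + log_gauss)

    # One frame against a 2-Gaussian mixture in 3 dimensions:
    x = np.array([0.1, -0.2, 0.5])
    means, variances = np.zeros((2, 3)), np.ones((2, 3))
    print(log_mixture_score(x, means, variances, np.array([0.5, 0.5])))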
Compute an LDA matrix for the preprocessing, and also compute counts for each LDA class (usually one class per codebook), which can be used later for sample-vector extraction.
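For illustration, here is a textbook LDA computation in Python (within- and between-class scatter, then an eigendecomposition); it is not Janus' internal implementation. The per-class counts collected along the way are the counts mentioned above.

    import numpy as np

    # frames: (n, dim) array of feature vectors; labels: class of each frame.
    def lda_matrix(frames, labels):
        mean = frames.mean(axis=0)
        dim = frames.shape[1]
        sw, sb, counts = np.zeros((dim, dim)), np.zeros((dim, dim)), {}
        for c in np.unique(labels):
            sub = frames[labels == c]
            counts[c] = len(sub)                  # per-class counts, reusable later
            cm = sub.mean(axis=0)
            sw += (sub - cm).T @ (sub - cm)       # within-class scatter
            sb += len(sub) * np.outer(cm - mean, cm - mean)  # between-class scatter
        vals, vecs = np.linalg.eig(np.linalg.inv(sw) @ sb)
        order = np.argsort(-vals.real)            # most discriminative first
        return vecs.real[:, order], counts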
To create initial codebooks you can use the neural-gas algorithm (a generalization of the k-means, or basic ISODATA, algorithm). Because it is usually infeasible to run neural-gas on the entire training data, you should extract only some representative vectors and store them in an extra file for every codebook.
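One simple way to extract a bounded, representative subset per class is reservoir sampling, sketched below; this is an illustrative technique choice, not necessarily what Janus does internally.

    import random

    # frame_label_stream yields (frame, class-label) pairs; each class keeps
    # at most max_per_class frames, uniformly sampled, without holding all
    # training data in memory.
    def extract_samples(frame_label_stream, max_per_class=500):
        samples, seen = {}, {}
        for frame, label in frame_label_stream:
            seen[label] = seen.get(label, 0) + 1
            buf = samples.setdefault(label, [])
            if len(buf) < max_per_class:
                buf.append(frame)
            else:                                 # replace with decreasing probability
                j = random.randrange(seen[label])
                if j < max_per_class:
                    buf[j] = frame
        return samples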
After you have extracted a set of representative vectors for a class (usually a codebook), you can compute k mean vectors from this set to define a codebook. Neural-gas is just a generalization of k-means.
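A bare-bones neural-gas sketch: every codebook vector is pulled toward each sample, weighted by its distance rank. Shrinking the neighborhood to the winner alone turns this into online k-means. Annealing of the learning rate and neighborhood, which a real run would use, is omitted for brevity.

    import numpy as np

    # samples: (n, dim) array with n >= k; returns k codebook vectors.
    def neural_gas(samples, k=16, epochs=10, lr=0.1, lam=2.0):
        rng = np.random.default_rng(0)
        cb = samples[rng.choice(len(samples), k, replace=False)].astype(float)
        for _ in range(epochs):
            for x in samples:
                dists = np.sum((cb - x) ** 2, axis=1)
                ranks = np.argsort(np.argsort(dists))   # 0 = closest vector
                cb += lr * np.exp(-ranks / lam)[:, None] * (x - cb)
        return cb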
The initial codebooks might not be good enough for a good forced alignment. Also, Viterbi training can take much more time than training along labels. Here you will find out how to run a training along labels (Janus or non-Janus).
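In essence, training along labels means accumulating statistics under a fixed frame-to-model assignment and re-estimating the parameters in one pass. A single-Gaussian sketch, with mixture weights and multi-Gaussian codebooks left out:

    import numpy as np

    # frames: iterable of feature vectors; labels: the model assigned to
    # each frame by the label files. Returns (mean, variance) per model.
    def reestimate(frames, labels):
        models = {}
        for x, m in zip(frames, labels):
            s = models.setdefault(m, [0, 0.0, 0.0])   # count, sum, squared sum
            s[0] += 1
            s[1] += x
            s[2] += x * x
        return {m: (s1 / n, s2 / n - (s1 / n) ** 2)
                for m, (n, s1, s2) in models.items()}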
Here, Viterbi (or forward-backward) training is described. This is usually done to improve the quality of the acoustic models or to create label files.
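The heart of Viterbi training is the alignment itself: a dynamic program that maps every frame to a state of a left-to-right model. A minimal sketch, assuming a precomputed (log) score matrix and at least as many frames as states; the traceback is what ends up in the label files.

    import numpy as np

    # score[t, s]: log acoustic score of frame t under state s. Each frame
    # either stays in its state or advances to the next one.
    def viterbi_align(score):
        T, S = score.shape
        best = np.full((T, S), -np.inf)
        back = np.zeros((T, S), int)
        best[0, 0] = score[0, 0]                  # must start in the first state
        for t in range(1, T):
            for s in range(S):
                stay = best[t - 1, s]
                move = best[t - 1, s - 1] if s > 0 else -np.inf
                best[t, s] = score[t, s] + max(stay, move)
                back[t, s] = s if stay >= move else s - 1
        path, s = [], S - 1                       # must end in the last state
        for t in range(T - 1, -1, -1):
            path.append(s)
            s = back[t, s]
        return path[::-1]                         # state index for every frame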
Context-dependent acoustic models perform better than context-independent models. We call context-dependent models 'polyphones'. Here you'll find out how to create a first set of polyphones and the corresponding acoustic models.
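The first polyphone step is essentially counting phones in context over the training transcriptions. A sketch for a context width of one (i.e. triphones); rare contexts would later be dropped or clustered:

    from collections import Counter

    # utterances: iterable of phoneme sequences (lists of phone names).
    def count_polyphones(utterances, min_count=50):
        counts = Counter()
        for phones in utterances:
            padded = ["<s>"] + phones + ["</s>"]  # sentence-boundary padding
            for i in range(1, len(padded) - 1):
                counts[(padded[i - 1], padded[i], padded[i + 1])] += 1
        # keep only contexts seen often enough to train a model for
        return {p: n for p, n in counts.items() if n >= min_count}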