Every step in the list below has a link named "Do-It-Yourself" to the corresponding page of the do-it-yourself thread. Also, you will find links into the scripts thread and into the discussion thread.
In this step we prepare our data so that it becomes Janus-readable. We also create a first feature description file and have a look at some features. Our dictionary might have to be converted into a Janus-readable format. We create a task-database (which is a Janus-defined object) to describe the task's utterances. Some of the very basic architecture descriptions are created (HMM topologies, transition models, list of phones). We define which utterances we want to use for training and which for development testing.
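To make the last point more concrete, here is a minimal sketch of how utterances could be divided into a training set and a development test set. It is written in plain Python, not in Janus, and everything in it is an assumption made for illustration: the utterance IDs, the function name `split_utterances`, and the rule of holding out every n-th utterance. A real setup would often split by speaker instead, so that no speaker appears in both sets.

```python
def split_utterances(utt_ids, dev_every=10):
    """Hold out every dev_every-th utterance (in sorted order) for development testing."""
    train, dev = [], []
    for index, utt in enumerate(sorted(utt_ids)):
        # The last utterance of each block of dev_every goes to the dev set.
        if index % dev_every == dev_every - 1:
            dev.append(utt)
        else:
            train.append(utt)
    return train, dev


# Hypothetical utterance IDs, invented for this example only.
utterances = ["utt%03d" % i for i in range(1, 21)]
train_set, dev_set = split_utterances(utterances, dev_every=5)
print(len(train_set), "training utterances,", len(dev_set), "development utterances")
```

Sorting before splitting keeps the partition reproducible, which matters because later steps refer to the same two lists again.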
Before we can start training, we need some more architecture description files (codebook descriptions, distribution descriptions, and a distribution tree). We can let Janus create these by defining some parameters and calling a procedure from the library.
Of course, you could also train a recognizer without an LDA, but we usually do apply an LDA to our features, and thought it should be part of a default training scheme. There will be more LDA computations at a later stage of the development.
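For readers who have not met LDA before, the following sketch shows the underlying idea on a toy problem. It is not the Janus LDA procedure: it is a simplified two-class Fisher discriminant on invented 2-D feature vectors, whereas the LDA in a recognizer works with many classes and higher-dimensional features. All names and data points are assumptions made for illustration.

```python
def mean(vectors):
    """Component-wise mean of a list of 2-D vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(2)]


def scatter(vectors, mu):
    """2x2 scatter matrix: sum of outer products of the centered vectors."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        d = [v[0] - mu[0], v[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class_a, class_b):
    """Direction w = Sw^-1 (mu_a - mu_b) that best separates the two classes."""
    mu_a, mu_b = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, mu_a), scatter(class_b, mu_b)
    # Within-class scatter is the sum of the per-class scatter matrices.
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def project(v, w):
    """Project a 2-D vector onto the direction w."""
    return v[0] * w[0] + v[1] * w[1]


# Toy feature vectors for two classes, invented for this example only.
class_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0)]
class_b = [(6.0, 5.0), (7.0, 8.0), (8.0, 7.0)]
w = fisher_direction(class_a, class_b)
proj_a = [project(v, w) for v in class_a]
proj_b = [project(v, w) for v in class_b]
print("class A projections:", proj_a)
print("class B projections:", proj_b)
```

After projection, each 2-D vector is reduced to a single number, and the two classes no longer overlap on that axis. This is the property we want from the transformed features: fewer dimensions, with the class distinctions preserved as well as possible.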