Janus 3 Tutorial - Beginner's Introduction Page

What Is Janus 3

Janus is a speech-to-speech translation research project. Speech translation consists of several parts: recognition, analysis, generation, and synthesis. We on the speech recognition crew also use the name "Janus" for the recognition engine. So the name Janus stands both for the entire speech-to-speech translation project and for the recognizer that is part of it. This documentation does not address anything beyond recognition, so here "Janus" refers only to the recognizer.

Janus is an HMM-based speech recognizer. In the literature, many authors assume that HMM recognizers use Gaussian mixtures to compute their emission probabilities, which has led to terms like "semi-continuous HMMs". Strictly speaking, there are no semi-continuous HMMs, only semi-continuous emission probability computers, because by definition HMMs don't care where their emission probabilities come from. These probabilities can even come from neural nets (such systems are often called hybrids). In Janus it is generally possible to use any kind of emission probability computer: Gaussian mixtures, neural nets, or anything else that can be plugged in. We can safely say that every working continuous speech recognizer in the world is HMM-based. By allowing any kind of emission probability computation, Janus can mimic almost every other recognizer type in the world.
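
To make the point about interchangeable emission probability computers concrete, here is a minimal sketch in pure Tcl (8.5 or later). It is not Janus code and uses no Janus commands; the procedure names, the two-state model, and all numbers are made up for illustration. The Viterbi routine only ever calls the emission command it is handed, so a Gaussian scorer and a quantize-and-look-up scorer can be swapped without touching it.

```tcl
# Best-path (Viterbi) score of a tiny HMM. emitCmd is a command prefix that,
# called as "{*}$emitCmd state observation", returns an emission score.
# init maps state -> start probability; trans maps "from,to" -> probability.
proc viterbiScore {states init trans emitCmd observations} {
    set first [lindex $observations 0]
    foreach s $states {
        set score($s) [expr {[dict get $init $s] * [{*}$emitCmd $s $first]}]
    }
    foreach obs [lrange $observations 1 end] {
        foreach s $states {
            # Best predecessor for state s.
            set best 0.0
            foreach p $states {
                set cand [expr {$score($p) * [dict get $trans $p,$s]}]
                if {$cand > $best} { set best $cand }
            }
            set new($s) [expr {$best * [{*}$emitCmd $s $obs]}]
        }
        array set score [array get new]
    }
    set best 0.0
    foreach s $states {
        if {$score($s) > $best} { set best $score($s) }
    }
    return $best
}

# Emission computer 1: one Gaussian per state (returns a density value).
# params maps state -> {mean standardDeviation}.
proc gaussEmit {params state x} {
    lassign [dict get $params $state] mean sd
    set z [expr {($x - $mean) / $sd}]
    return [expr {exp(-0.5 * $z * $z) / ($sd * sqrt(2.0 * acos(-1.0)))}]
}

# Emission computer 2: quantize the observation, then look up a discrete
# probability. table maps state -> {low p high p}.
proc tableEmit {table state x} {
    set sym [expr {$x < 0.5 ? "low" : "high"}]
    return [dict get $table $state $sym]
}

set states {sil speech}
set init   {sil 0.5 speech 0.5}
set trans  {sil,sil 0.7 sil,speech 0.3 speech,sil 0.2 speech,speech 0.8}
set obs    {0.1 0.2 0.9 1.1}

set gauss [list gaussEmit {sil {0.0 0.3} speech {1.0 0.3}}]
set table [list tableEmit {sil {low 0.9 high 0.1} speech {low 0.2 high 0.8}}]

puts [viterbiScore $states $init $trans $gauss $obs]
puts [viterbiScore $states $init $trans $table $obs]
```

The same viterbiScore procedure runs with either scorer; a neural-net scorer would simply be a third command with the same calling convention.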

Janus 3 is more than just a recognizer. It is a toolkit for doing speech recognition research. It offers a powerful programming language that lets the user manipulate the innermost data structures as well as design very high-level procedures that do very complicated things. Although Janus was developed mainly by speech researchers, it has also been used successfully for other tasks, such as handwriting recognition. In various national and international speech recognition contests, Janus has proven to be among the best, so we can safely call Janus "state-of-the-art".

The Design of Janus 3

The original purpose of Janus 3 was to create a powerful tool for speech recognition research. This means that in many cases flexibility, portability, and expandability were considered more important than speed or memory consumption. Of course, speed and memory also matter for research: better results can be achieved by doing more experiments, which is easier if the time per experiment is small. Some experiments also consume so much space (RAM and disk) that they cannot be run at all, because processes start thrashing and disks fill up. Although efficiency was not Janus's primary design goal, there are many ways to speed it up and to build a near-real-time demo system with only small losses in recognition accuracy. Before Janus 3 was written, the Janus crew gained a lot of experience with its predecessor, Janus 2, and the ideas collected during that time were incorporated into Janus 3. Here is a list of the design goals:

Flexibility: Janus should be able to run many different speech recognizer architectures, different kinds of HMMs (discrete, semi-continuous, continuous), different kinds of emission probability computation (Gaussian mixtures, neural nets), and arbitrary sets of acoustic models (polyphones, clustering, subphones, phone-tags). Even HMM recognition tasks other than speech, such as handwriting recognition, should be doable.
Factory-Style: It should be possible to build a simple recognizer using a predefined development scheme by declaring the architecture and pushing a single button. This means all the development steps should fit into a single script that can run from the very beginning to the very end without any user interaction.
Foolproofness: When doing research, people often make mistakes that can cost them a lot of time. Sometimes it is possible to work for weeks or months without noticing a bug that could make all the results worthless. Janus 3 should offer many ways to make sure that everything is working correctly, and it should allow the user to control many of its functions.
Easy Usage: It should be possible for the user to train a recognizer at different levels of insight. The simplest way would be the one-button factory-style recognizer. If a user would like to do some non-standard things or perform some experiments, this should be easy, as should accessing and manipulating the deepest and most nested data structures. All this requires a scripting language that allows close interaction with C and supports high-level constructs. We chose Tcl for that purpose.
Appearance: Janus 3 should "look nice". The Tcl/Tk language allows easy programming of graphical user interfaces. This makes looking at data structures and controlling the recognizer easy.
Expandability: Janus 3 is built in a very modular way, so that it is easy to add new modules and to build new modules on top of existing ones. Also, the object-oriented Janus programming language (which is the Tcl language augmented by the Janus-defined features) makes it easy to design arbitrary recognizer architectures.

The Janus 3 Programming Language

This programming language is pure Tcl/Tk augmented by some additional features. These add-ons are object-oriented: in Janus, there are objects (each belonging to some class) and things you can do with them (the class's methods). The command syntax is not the one used in many command-oriented languages:

        commandName parameter1 parameter2 ... parameterN

but rather

        parameter1 commandName parameter2 ... parameterN

where parameter1 would be the name of an object. Say, you have an object called codebooks which belongs to the class CodebookSet, then you would save its contents into a file by entering the command:

        codebooks save fileName

This works because every top-level-visible object created in Janus is declared to Tcl as a new command. Whenever Tcl encounters such a command, it lets Janus do whatever has to be done. One advantage of this approach is that frequently occurring "methods", such as saving and loading, can always be called the same way. There is no need for functions like saveCodebooks, saveDistributions, or saveNeuralNet; instead, all these functions are simply called "save". Since every object class is implemented in a module where all the class's methods are defined, each class/module knows how to "save" its objects.
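
The mechanism can be imitated in a few lines of pure Tcl (8.5 or later). The following is only a toy sketch: Janus itself registers its objects from C, and every name here (toyRegister, toyDispatch, ToyCodebookSet, ToyDistribSet, and the object name distribs) is hypothetical. It shows how an object name can become a command and how two classes can share the method name "save".

```tcl
# Per-object contents, indexed by object name.
array set toyData {}

# Class-specific implementations of the shared method name "save".
# They only return the text a real implementation would write to fileName.
proc ToyCodebookSet_save {self fileName} {
    global toyData
    return "codebook data '$toyData($self)' -> $fileName"
}
proc ToyDistribSet_save {self fileName} {
    global toyData
    return "distribution data '$toyData($self)' -> $fileName"
}

# Look up <class>_<method> and call it with the object name and arguments.
proc toyDispatch {class self method args} {
    set impl ${class}_$method
    if {[llength [info commands $impl]] == 0} {
        error "class $class has no method \"$method\""
    }
    return [$impl $self {*}$args]
}

# Make the object name itself a Tcl command that forwards to toyDispatch.
proc toyRegister {class name contents} {
    global toyData
    set toyData($name) $contents
    interp alias {} $name {} toyDispatch $class $name
    return $name
}

toyRegister ToyCodebookSet codebooks {0.1 0.2 0.3}
toyRegister ToyDistribSet  distribs  {0.5 0.5}

puts [codebooks save cb.txt]
puts [distribs save ds.txt]
```

Both calls use the word "save", but each object reaches the implementation that belongs to its own class.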

Objects are created in Janus by entering a command of the following kind:

        className objectName parameter1 ... parameterN

which will create an object named objectName which belongs to the class className. Sometimes, for creating an object additional information must be given as parameters.
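
As a rough pure-Tcl illustration of this creation pattern (8.5 or later; not Janus code), the class name below is itself a command whose first argument is the name of the new object. ToySet, myCodebooks, and the methods type and params are invented for this sketch, and the two numeric parameters have no particular meaning.

```tcl
# Creation parameters per object, indexed by object name.
array set toyParams {}

# The class name is a command: "ToySet objectName parameter1 ... parameterN".
proc ToySet {objectName args} {
    global toyParams
    set toyParams($objectName) $args
    # Define a global command named after the object; its first argument
    # selects the method. @NAME@ is replaced by the object name.
    proc ::$objectName {method args} [string map [list @NAME@ $objectName] {
        global toyParams
        switch -- $method {
            type    { return ToySet }
            params  { return $toyParams(@NAME@) }
            default { error "unknown method \"$method\"" }
        }
    }]
    return $objectName
}

ToySet myCodebooks 16 4
puts [myCodebooks type]
puts [myCodebooks params]
```

Because the object name becomes a command in this sketch, choosing the name of an existing Tcl command would replace that command, so object names need some care.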

With the object-oriented add-ons from Janus and the powerful Tcl/Tk language, you have a very flexible, intuitive, and portable tool for controlling Janus. Tcl/Tk is freely available for all popular platforms. Its interface to C programs allows Janus to make its internal data structures visible to the user. It is very easy to access every single coefficient of every vector in a codebook, and even to modify it, all from the command prompt. On the other hand, it is possible to define powerful Tcl procedures that do things like training a recognizer from scratch. With this flexibility, Janus is a nice tool both for people who just want to train a recognizer without caring about the details and for researchers who would like to experiment by screwing around in the innermost data structures of the recognizer.