Developers Manual

Introduction to JANUS for Developers

If all you want is to train a recognizer, do some recognition, or write labels and inspect them, if you want to perform jobs that somebody else has probably done before and has written ready-to-use scripts for, or if you want to do something only a little different from what is usually done, then you will most likely be happy with the user's manual. Have a look in the user's manual at which tasks have already been done and for which ones scripts are available. The developer's manual is of interest only to people who want to write new scripts for new tasks, introduce new features to the JANUS recognizer, make major modifications to existing scripts, write or modify C source code, or write and maintain this documentation.



Object Oriented Programming

JANUS was designed to be programmable. The programming language is Tcl/Tk, extended by some object classes and their methods. Object classes are things like dictionaries and codebooks; even the decoder itself is an object class. Every object class has its methods (operations that can be performed on objects of that class). Objects can have subobjects and can be organized hierarchically. The object-oriented programming paradigm allows, at least in principle, objects to be plugged in and out as one wishes: simply replace the dictionary by assigning a new one, copy codebooks as easily as "cb1 := cb2", add distribution accumulators as easily as "ds1.accu += ds2.accu", etc.
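To make the accumulator example a little more concrete, here is a minimal C sketch of what a method behind "ds1.accu += ds2.accu" could do internally. The struct DistribAccu and the function accuPlusEq are hypothetical names invented for illustration; the real JANUS data structures and method implementations are not shown in this manual and may look quite different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accumulator of one distribution: one count per
   mixture component (illustrative only, not the JANUS struct). */
typedef struct {
    size_t  n;      /* number of mixture components */
    double *count;  /* accumulated counts, length n */
} DistribAccu;

/* Sketch of "dst += src": add the counts of src to dst component by
   component. Returns 0 on success, -1 if the sizes do not match. */
int accuPlusEq(DistribAccu *dst, const DistribAccu *src)
{
    assert(dst != NULL && src != NULL);
    if (dst->n != src->n)
        return -1;                      /* incompatible accumulators */
    for (size_t i = 0; i < dst->n; i++)
        dst->count[i] += src->count[i];
    return 0;
}
```

The Tcl-level "+=" would then be a thin wrapper that looks up both objects by name and calls such a function, reporting an error to the interpreter when the shapes differ.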


C and Tcl/Tk

Why Tcl/Tk

The JANUS recognizer is written in both C and Tcl/Tk. The predecessor version used only C code, and its user interface was rather clumsy and not very powerful: whenever a new feature had to be introduced, this meant C coding and debugging. In this version of JANUS, we decided to use Tcl as the user interface. Tcl provides a powerful shell, has its own programming language interpreter, is well documented, is freely available, and cooperates easily with C programs. Using Tcl, casual users (and developers) can now write their own Tcl scripts, or modify existing ones, and make JANUS do what they need without having to write C code and, hopefully, without having to ask somebody who is more experienced with the code.

The C/Tcl Interface Module

Most of the services that Tcl offers to C programmers should be accessed by calling the appropriate functions of the C/Tcl interface module. This module's most important job is to maintain a list of object types. The C source modules implement the data structures and methods (operations) for their object classes and call itfNewType(...) to make these object classes available to the Tcl programmer. (Please excuse us for sometimes calling the same thing an object class, an object type, or even, incorrectly, an object.)
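The idea of a central list of object types can be sketched in a few lines of C. This is not the JANUS implementation: the TypeDesc layout and the functions typeRegister and typeFind are hypothetical, and the real itfNewType(...) signature may differ. The sketch only shows the principle that each module registers a descriptor once and the interpreter later looks classes up by name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor a module hands to the interface module. */
typedef struct {
    const char *name;                       /* class name used in Tcl */
    size_t      size;                       /* bytes per instance */
    void     *(*create)(const char *objName);
    void      (*destroy)(void *obj);
} TypeDesc;

#define MAX_TYPES 64

static const TypeDesc *typeList[MAX_TYPES];  /* the list of object types */
static int             nTypes = 0;

/* Register a class once, e.g. at startup. Returns 0 on success,
   -1 if the list is full or the name is already registered. */
int typeRegister(const TypeDesc *td)
{
    assert(td != NULL && td->name != NULL);
    if (nTypes >= MAX_TYPES)
        return -1;
    for (int i = 0; i < nTypes; i++)
        if (strcmp(typeList[i]->name, td->name) == 0)
            return -1;
    typeList[nTypes++] = td;
    return 0;
}

/* Find a registered class by name; NULL if unknown. */
const TypeDesc *typeFind(const char *name)
{
    for (int i = 0; i < nTypes; i++)
        if (strcmp(typeList[i]->name, name) == 0)
            return typeList[i];
    return NULL;
}
```

Keeping the descriptors in one list is what lets generic commands (create, destroy, configure) work for every class without each module writing its own Tcl glue.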

The interface module also offers some useful functions and preprocessor macros for frequently used operations, such as creating new instances of an object class (i.e. new objects) and destroying, accessing, and configuring objects.
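Configuring an object from Tcl usually boils down to setting a C struct field from an option name and a string value. A minimal sketch, assuming a table of option descriptors built with offsetof; the names AttrDesc, objConfigure and DemoObj and the "-count"/"-scale" options are hypothetical and not part of the JANUS interface module.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef enum { ATTR_INT, ATTR_FLOAT } AttrKind;

/* One configurable option: its name, its C type and where it lives. */
typedef struct {
    const char *key;     /* option name as typed in Tcl */
    AttrKind    kind;
    size_t      offset;  /* offsetof(struct, field) */
} AttrDesc;

/* Hypothetical object class with two options. */
typedef struct {
    int   count;
    float scale;
} DemoObj;

static const AttrDesc demoAttrs[] = {
    { "-count", ATTR_INT,   offsetof(DemoObj, count) },
    { "-scale", ATTR_FLOAT, offsetof(DemoObj, scale) },
};

/* Set option `key` of `obj` from the string `value`.
   Returns 0 on success, -1 for an unknown option or a bad value. */
int objConfigure(void *obj, const AttrDesc *table, size_t nAttr,
                 const char *key, const char *value)
{
    assert(obj != NULL && table != NULL && key != NULL && value != NULL);
    for (size_t i = 0; i < nAttr; i++) {
        if (strcmp(table[i].key, key) != 0)
            continue;
        char *field = (char *)obj + table[i].offset;
        char *end;
        if (table[i].kind == ATTR_INT) {
            long v = strtol(value, &end, 10);
            if (end == value || *end != '\0')
                return -1;               /* not a complete integer */
            *(int *)field = (int)v;
        } else {
            double v = strtod(value, &end);
            if (end == value || *end != '\0')
                return -1;               /* not a complete number */
            *(float *)field = (float)v;
        }
        return 0;
    }
    return -1;                           /* unknown option */
}
```

A table-driven design like this lets one generic function serve every class; each module only supplies its descriptor table.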


The JANUS Hierarchical Modular Architecture

There are many modules in JANUS. A module usually consists of one C source file (.c) and a corresponding header file (.h); sometimes a module consists of more than one source file. Usually a module implements one or more object classes that logically belong together; e.g. the dictionary module implements the object classes "Dictionary" and "Word". Since some objects embed or refer to other objects, there is an object dependency hierarchy. These dependencies are also reflected in the module architecture: some modules have to include the header files of other modules. The logical dependencies of modules and objects and the header includes all form an almost identical hierarchy. Keep in mind, however, that it is not always possible to define what depends on what in a way that all involved people agree on. The following diagram tries to show all modules and their dependencies (upper modules depend on the connected lower modules):
             search
            /   |  \
   hypotheses   |  language model
        \       |    /
         \      |   /           labels
          vocabulary           /  |   \
           |\                 /   |    \--- dictionary...
           | \  alignment    /    |     \--- allophone models...
           |  \    \   \    /     |      \--- senone tree...
           |   \    \   path      |       \--- topology tree...
           |    \    \    |       |        \--- topologies...
           |     \    \   |       |         \--- transitions
           |      \    Markov model          \--- tags
           |       \  /       |               \--- phones
           |        \/        |                \--- database
           |        /\        |
           dictionary allophone models
            |    |         |        \ 
            |    |         |         \
            |    |         |          \
          tags   |     senone tree   topology tree
                 |     /     | \  \    /   /    | 
                 |    /      |  \  tags   /     | 
                phones       |   \       /      |
                             |    \     /       |
                             |  generic tree    |
                             |                  |
                             |                  |
                           senones          topologies
                            /   \               |
                           /     \              |
                      (score computer)          |
                         /         \            |
                        /           \           |
              distributions  neural net     transitions
               |         |
               |         |
            codebooks    |              C/Tcl interface
     lda       |     \   |              generic lists
      / \      |     rewriting          sample sets
  path   features

Some of the modules appear more than once; this is only to avoid too many crossing dependency lines. The lines in the above diagram mean the following:
If (and only if) module A is connected to a lower module B, then some part of A's source code needs to know something about B's source code.
This doesn't necessarily mean that you can't define an object of type A without having one of type B. It also doesn't necessarily mean that an object of type A can have a subobject of type B.
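As an illustration of such a connection, the diagram links dictionary to phones and tags. The following single-file C sketch marks where the header boundaries would be; all type and function names are hypothetical. It shows the dictionary code needing to know the phones interface (here, to validate pronunciations), while a Dictionary merely refers to a phone set it does not own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ---- phones module (phones.h / phones.c): depends on nothing below ---- */
typedef struct {
    const char **name;   /* phone names */
    int          n;
} Phones;

/* Index of phone p, or -1 if it is not in the set. */
int phonesIndex(const Phones *ps, const char *p)
{
    for (int i = 0; i < ps->n; i++)
        if (strcmp(ps->name[i], p) == 0)
            return i;
    return -1;
}

/* ---- tags module (tags.h / tags.c) ---- */
typedef struct {
    const char **name;   /* tag names (illustrative) */
    int          n;
} Tags;

/* ---- dictionary module: dictionary.h would #include "phones.h" and
        "tags.h", because the code below needs both interfaces ---- */
typedef struct {
    const Phones *phones;  /* refers to a phone set it does not own */
    const Tags   *tags;
} Dictionary;

/* Return 1 if every phone of a pronunciation is known, else 0.
   This is the part of the dictionary code that "knows" about phones. */
int dictCheckPron(const Dictionary *d, const char **pron, int len)
{
    assert(d != NULL && d->phones != NULL);
    for (int i = 0; i < len; i++)
        if (phonesIndex(d->phones, pron[i]) < 0)
            return 0;
    return 1;
}
```

Note that the phones code never mentions Dictionary, which is what makes phones the lower module in the diagram.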

There are three special cases in the diagram:


About the Makefile

The Makefile should work with the standard UNIX make command. The safest way to create a JANUS executable is: create an empty directory, make a symbolic link from the JANUS source RCS directory to ./RCS, run co RCS/*, check some definitions in the Makefile to make sure that all paths are correct, run make depend, and then run make. This is a transcript of such a session:
(i13a6:/home/i13hp1/rogina) mkdir tmp
(i13a6:/home/i13hp1/rogina) cd tmp
(i13a6:/home/i13hp1/rogina/tmp) ln -s /home/i13d4/speech/janus3/RCS .
(i13a6:/home/i13hp1/rogina/tmp) co RCS/*
... many RCS messages ...
(i13a6:/home/i13hp1/rogina/tmp) make depend
grep include *.c | grep "\.tclc" | cut -f2 -d'"' | xargs touch -t 199601011200
makedepend --  -I/home/i13d4/speech/janus3/include   -g -- *.c
(i13a6:/home/i13hp1/rogina/tmp) make janus
... many messages from make ...
(i13a6:/home/i13hp1/rogina/tmp) janusA
# ==================================================
#  JANUS-SR  Version 3.0 [Jan 10 1996 13:02:21]
#            ---------------------------------------
#            University of Karlsruhe, Germany       
#            Carnegie Mellon University, USA        
#                                                   
#            (c) 1993-95 - Interactive Systems Labs 
# ==================================================
% 
There are a few things not usually found in other Makefiles. Some JANUS C source files include a preprocessed Tcl script and assign it to a string, so that this string can be passed to Tcl_Eval(...). We found it nicer and easier to keep the Tcl scripts in separate human-readable files: this way they are easy to read and edit, and they can also be used and debugged as standalone scripts. To do this, we first have to convert a Tcl script into a single C string by replacing all double quotes with backslash-double quote, all backslashes with backslash-backslash, and all newlines with backslash-n. This is done by a one-liner which is used as the rule for creating .tclc files from .tcl files; the .tclc files are the ones that are included in the C source.

Because makedepend complains when files to be included don't exist, we've also added a line to the make depend rule which creates dummy .tclc files with very old dates. (It might be a waste of time trying to optimize the one-liner .tcl to .tclc rule.)
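The conversion performed by the .tcl to .tclc rule can be sketched in C. This is not the Makefile one-liner (which is not reproduced here); tclToCString is a hypothetical helper that applies the three replacements described above in a single pass and follows snprintf-style truncation. It produces only the escaped body, without the enclosing double quotes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Escape a Tcl script for use as the body of a C string literal:
   '\' -> "\\", '"' -> "\"", newline -> "\n". Single pass, so
   inserted backslashes are never escaped a second time.
   Writes at most outSize bytes including the terminating NUL (like
   snprintf) and returns the length of the complete escaped text. */
size_t tclToCString(const char *in, char *out, size_t outSize)
{
    size_t len = 0;
    assert(in != NULL);
    for (const char *p = in; *p != '\0'; p++) {
        char        one[2] = { *p, '\0' };
        const char *rep;
        switch (*p) {
        case '\\': rep = "\\\\"; break;
        case '"':  rep = "\\\""; break;
        case '\n': rep = "\\n";  break;
        default:   rep = one;    break;
        }
        for (const char *r = rep; *r != '\0'; r++, len++)
            if (out != NULL && len + 1 < outSize)
                out[len] = *r;
    }
    if (out != NULL && outSize > 0)
        out[len < outSize ? len : outSize - 1] = '\0';
    return len;
}
```

Doing everything in one pass sidesteps an ordering trap: if the substitutions were applied one after another as global replacements, the backslashes inserted for quotes and newlines would get doubled unless backslashes were handled first. For the result to be included with #include as a string literal, the .tclc file would also need the enclosing double quotes, which a caller could write before and after the escaped text.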
Maintainer: monika@ira.uka.de, rogina@ira.uka.de, finkem@cs.cmu.edu, maier@ira.uka.de