The Task's Database

Every task should have a database that stores all the needed information about the task's utterances. It is possible to work without a database, but then you have to do some more low-level Tcl programming of JANUS, so using one is much more convenient: training on an utterance is then done by simply saying "train <utteranceID>", and testing by saying "test <utteranceID>".

A database is usually created in two steps. First you build a human-readable file that contains one line per utterance. The first item in each line is the utterance ID; the second item is a list of two-element Tcl lists, each consisting of a variable name followed by that variable's value:

  uttID_1 { { varName_1 varValue_11 } { varName_2 varValue_21 } ... { varName_m varValue_m1 } }
  uttID_2 { { varName_1 varValue_12 } { varName_2 varValue_22 } ... { varName_m varValue_m2 } }
  ...
  uttID_n { { varName_1 varValue_1n } { varName_2 varValue_2n } ... { varName_m varValue_mn } }

The { varName varValue } lists mean: "If we are talking about utterance uttID, then the variable varName has the value varValue." For example, you could use the variables "speaker", "gender", "sentence", and "transcript", and one line of the file could look like:

  abcd1234 { { speaker John } { gender male } { sentence 14.2 } { transcript {Hello World} } }

This would mean that the male speaker John said "Hello World" in his sentence number 14.2. When you later have to define how to get the speech file from an utterance ID, you could, e.g., define

  "set speechFile /home/mytask/${speaker}/${sentence}.adc"

It is usually advisable to use many variables, because this gives you flexibility in organizing your speech files. If you ever have to reorganize your speech files (e.g. wrap them in directories, move them elsewhere, etc.), it should be quite easy to modify the above rule to work with the new organization.
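The following plain-Tcl sketch illustrates how such a rule can work with the variables of one utterance. It is only an illustration and does not use any JANUS commands: it assumes the rule is kept as a small Tcl script, binds every { varName varValue } pair to a Tcl variable of the same name, and then evaluates the rule. The second path (/data/speech/...) is a made-up layout that only serves to show a reorganization.

```tcl
# One line of the human-readable database file (the example line from above).
set line {abcd1234 { { speaker John } { gender male } { sentence 14.2 } { transcript {Hello World} } }}

# The path rule, kept as a Tcl script so that it can be changed in one place.
set rule {set speechFile /home/mytask/${speaker}/${sentence}.adc}

# The line is a two-element Tcl list: the utterance ID and the variable list.
set uttID [lindex $line 0]

# Bind each { varName varValue } pair to a Tcl variable of the same name.
# (In this sketch the names must not collide with line, rule, uttID or pair.)
foreach pair [lindex $line 1] {
    set [lindex $pair 0] [lindex $pair 1]
}

# Evaluate the rule; it sets speechFile from the variables bound above.
eval $rule
puts "$uttID -> $speechFile"

# After a (hypothetical) reorganization by gender, only the rule changes.
set rule {set speechFile /data/speech/${gender}/${speaker}/${sentence}.adc}
eval $rule
puts "$uttID -> $speechFile"
```

Because the rule refers to the variables by name, adding further variables to the database file does not affect it, and a new directory layout needs only a new rule.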

The second step of the database creation is to build a binary file that allows fast access without having to keep the whole database in memory. For this purpose JANUS offers the object class "DBase". Basically, all you have to do is create a DBase object and call the object's "add" method with each line of the human-readable file as its arguments.
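The loop over the human-readable file can be sketched in plain Tcl as follows. This is not the JANUS API: the exact syntax for creating and opening a DBase object is not shown here, so a stand-in procedure (fakeAdd, a made-up name) takes the place of the object's "add" method and simply stores the entries in a Tcl array. The helper loadUtterances and the file name utterances.txt are likewise assumptions made for this sketch.

```tcl
# Read the human-readable file and pass every line to an "add" command.
# addCmd is a command prefix; with JANUS it would be the "add" method of a
# DBase object, here it is a stand-in so that the sketch runs in plain tclsh.
proc loadUtterances {fileName addCmd} {
    set fp [open $fileName r]
    set count 0
    while {[gets $fp line] >= 0} {
        # Skip empty lines.
        if {[string trim $line] eq ""} continue
        # Each line is a two-element list: utterance ID and variable list.
        eval $addCmd [list [lindex $line 0] [lindex $line 1]]
        incr count
    }
    close $fp
    return $count
}

# Stand-in for the database: an in-memory Tcl array (NOT the DBase class).
proc fakeAdd {uttID info} {
    global fakeDB
    set fakeDB($uttID) $info
}

# Write a one-line example file, then load it.
set fp [open utterances.txt w]
puts $fp {abcd1234 { { speaker John } { gender male } { sentence 14.2 } { transcript {Hello World} } }}
close $fp

set added [loadUtterances utterances.txt fakeAdd]
puts "added $added utterance(s)"
```

Passing the add command in as a parameter keeps the file-reading loop independent of the storage behind it, so the same loop could be tried out with the stand-in before it is pointed at a real database object.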
Transcription texts that contain double quotes or other characters that are special to Tcl (such as braces, brackets, dollar signs, or backslashes) can be tricky. If you need such characters, you have to take extra care that every line of the file is still a well-formed Tcl list.
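One general Tcl technique that can help here, shown as a sketch and not as something JANUS prescribes, is to let the Tcl "list" command build each line of the file instead of writing the braces by hand. "list" adds whatever quoting is needed, so the text comes back unchanged when the line is later read as a list. The utterance ID abcd1235 and the transcript text are made up for this example.

```tcl
# A transcript containing characters that are special to Tcl.
# (Inside braces no substitution takes place, so this is taken literally.)
set text {He said "stop" and paid $5 [laughs]}

# Build the database line with "list" so that Tcl does the quoting.
set line [list abcd1235 [list [list speaker John] [list transcript $text]]]
puts $line

# Reading the line back as a list returns the original text unchanged.
set info [lindex $line 1]
set back [lindex [lindex $info 1] 1]
puts $back
```

The same approach works for any other variable value that may contain special characters, for example speaker names with spaces.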