The Database Module

The database module's most important job is to offer an easy-to-use and transparent way of managing the set of utterances of a task. Although a database can store whatever you want, we will describe only the suggested usage for a task database. Once you know how to make a task database, you will be able to create your own database of anything else.

A database usually starts with a human-readable file which contains all the entries in human-readable format. Once the database is finished, you'll have two more files: a database file and an index file. The index file contains the keys of the database entries and pointers to the values, which reside in the database file.


Every task should have a database which stores all the needed information about the task's utterances. Although it is possible to work without a database (you would then have to do some more low-level Tcl programming of JANUS), it is much more convenient to use one. Training an utterance is then done by simply saying "train <utteranceID>", and testing it by saying "test <utteranceID>". A database is usually created in two steps. First, you build a human-readable file which contains one line per utterance. The first item in each line should be the utterance ID. The second item should be a list of two-element Tcl lists, each of which holds a variable name followed by that variable's value:

  uttID_1 { { varName_1 varValue_11 } { varName_2 varValue_21 } ... { varName_m varValue_m1 } }
  uttID_2 { { varName_1 varValue_12 } { varName_2 varValue_22 } ... { varName_m varValue_m2 } }
  ...
  uttID_n { { varName_1 varValue_1n } { varName_2 varValue_2n } ... { varName_m varValue_mn } }
The { varName varValue } lists mean: "If we are talking about utterance uttID, then the variable varName has the value varValue." For example, you could use the variables "speaker", "gender", "sentence", and "transcript", and one line of the file could look like:
  abcd1234 { { speaker John } { gender male } { sentence 14.2 } { transcript {Hello World} } }
This would mean that the male speaker John said "Hello World" in his sentence number 14.2. When you later have to define how to get the speech file from an utterance ID, you could, for example, define
  "set speechFile /home/mytask/${speaker}/${sentence}.adc" 
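To make the idea concrete, here is a minimal sketch in plain Tcl (no JANUS commands) of how one line of the human-readable file can be taken apart and how the speech file name can be stuck together from its variables. The array name "utt" is only an illustration and not part of JANUS.

```tcl
# One line of the human-readable file, exactly as in the example above.
set line {abcd1234 { { speaker John } { gender male } { sentence 14.2 } { transcript {Hello World} } }}

# The line is a two-element Tcl list: the utterance ID and the list of pairs.
set uttID [lindex $line 0]
set pairs [lindex $line 1]

# Copy every { varName varValue } pair into an array.
# "utt" is a hypothetical name chosen for this sketch.
foreach pair $pairs {
    set utt([lindex $pair 0]) [lindex $pair 1]
}

# Stick the small pieces together to get the speech file name.
set speechFile /home/mytask/$utt(speaker)/$utt(sentence).adc
puts "$uttID -> $speechFile"
```

Run in a plain tclsh, this prints "abcd1234 -> /home/mytask/John/14.2.adc".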
It is usually recommended to use many variables, because this gives you flexibility in organizing your speech files. If you have to reorganize your speech files (e.g. wrap them in directories, move them elsewhere, etc.), it should be quite easy to modify the above rule to work with your new organization. Remember that sticking parts together is much easier than parsing a long string. So you should preferably not use variable values like { fftFile somePath/someFile.fft }, because if you ever need only the "someFile" part, you'll have to parse it out.

The second step of the database creation is the creation of a binary file which allows fast access without having to keep the whole database in memory. For this, JANUS offers the object class "DBase". Basically, all you have to do is create a DBase object, open it (i.e. assign files to it), and call the object's "add" method once for each line of the human-readable file, passing the line's two items as its arguments. Here's a sample session:

% DBase db
db
% db.dbaseIdx configure -hashSizeX 10
% db open myBase myIdx -mode rwc
% db add abcd1234 { { speaker John } { gender male } { sentence 14.2 } { transcript {Hello World} } }
% db add abcd1235 { { speaker Mary } { gender female } { sentence 11.3 } { transcript {Hi There} } }
% db
abcd1234 abcd1235
% db get abcd1235
 { speaker Mary } { gender female } { sentence 11.3 } { transcript {Hi There} } 
% makeArray arr [db get abcd1235]
% puts $arr(gender)
female
% db close
% db destroy
% DBase again
again
% again open myBase myIdx -mode r
% dbase
invalid command name "dbase"
% again
abcd1234 abcd1235
The -hashSizeX parameter defines, as an exponent of 2, the size of the hash table that is used to implement the database. In the above example the size of the hash table would be 2 to the 10th = 1024. A good size is slightly more than the number of items you want to store in the database. So if you want to store 80000 utterances, you'd probably set -hashSizeX to 17 (2 to the 17th = 131072, whereas 2 to the 16th = 65536 would be too small). You don't have to add every entry manually: as with any well-designed object class, you can use the read method, which will read all lines from a file and do the add command for each line.
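The two rules of thumb above can be sketched in plain Tcl (8.4 or later assumed). The helpers hashSizeExponent and addFile, and the file name sample.txt, are hypothetical names made up for this sketch and are not part of JANUS. The loop only illustrates what "do the add command for each line" amounts to. The small "db" procedure is a stand-in so that the sketch runs in an ordinary tclsh; with JANUS loaded you would use a real DBase object, as in the session above.

```tcl
# Hypothetical helper: smallest exponent x such that 2^x exceeds n entries.
proc hashSizeExponent {n} {
    set x 0
    while {(1 << $x) <= $n} { incr x }
    return $x
}

# Hypothetical helper: call "<dbCmd> add <uttID> <pairs>" for every
# non-empty line of a human-readable file; returns the number of lines added.
proc addFile {dbCmd fileName} {
    set fp [open $fileName r]
    set count 0
    while {[gets $fp line] >= 0} {
        if {[string trim $line] eq ""} continue
        $dbCmd add [lindex $line 0] [lindex $line 1]
        incr count
    }
    close $fp
    return $count
}

# Stand-in for a DBase object (an assumption of this sketch, not JANUS code):
# it just remembers each added entry in a global array.
proc db {method key value} { set ::entries($key) $value }

# Write a two-line sample file, then load it.
set fp [open sample.txt w]
puts $fp {abcd1234 { { speaker John } { gender male } { sentence 14.2 } { transcript {Hello World} } }}
puts $fp {abcd1235 { { speaker Mary } { gender female } { sentence 11.3 } { transcript {Hi There} } }}
close $fp

set exponent [hashSizeExponent 80000]
set added    [addFile db sample.txt]
puts "$exponent $added"
file delete sample.txt
```

This prints "17 2": the exponent suggested for 80000 utterances, and the number of lines handed to the add method.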

CAUTION: It can be tricky to use transcription texts that contain double quotes, braces, or other characters that are special to Tcl. If you need such characters, you have to quote or escape them carefully, so that every line is still a well-formed Tcl list.
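One possible way to be careful, sketched here in plain Tcl (8.4 or later assumed), is to let Tcl's own "list" command do the quoting instead of typing the braces by hand. The utterance ID abcd1236 and the transcript are made up for this illustration.

```tcl
# A transcript containing double quotes and an unmatched brace.
set transcript "She said \"hello\" \{ twice"

# Build the entry with "list", which adds whatever quoting is needed
# to keep the result a well-formed Tcl list.
set value [list [list speaker Mary] [list transcript $transcript]]
set line  [list abcd1236 $value]

# Take the line apart again: second item, second pair, second element.
set back [lindex [lindex [lindex $line 1] 1] 1]
puts [expr {$back eq $transcript}]
```

This prints 1, i.e. the transcript survives the round trip unchanged, special characters included.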



Further information about the module: