The Vocabulary

The Vocabulary, that is, the list of valid words, is stored in a Vocab object, which defines common indices for the search and the language model. It also provides an interface between the acoustic models and the search.

There are two kinds of Vocab objects:

Full Vocab:
A full Vocab contains the following information:
- A list of words that the recognizer can recognize. The word list is read from a vocab file. The indices of this list are used for communication between the language model object (Lm) and the Search object, both of which are defined over this Vocab.
- An optional list of words that are to be treated as fillers, i.e. words that are ignored by the language model. The format is the same as for the word list.
- For each word in the word list, the sequence of monophone indices that make up the word. This is taken from the Dictionary the Vocab is defined over.
- Information about the senone indices that are used to model the phonemes in the context of the words in the word list. To get at this information, the Vocab has to be defined over an AModelSet.
- Some lookup tables required for recognition. The more complicated ones are needed for cross-word triphones. The user usually has little to do with these.
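
The parts of a full Vocab listed above can be pictured with a small Python sketch. It is only an illustration of the data layout, not the toolkit's implementation: the class name FullVocabSketch, its attributes, the toy phone set, and the assumption that filler words also appear in the word list are all hypothetical. Senone information and the lookup tables are left out.

```python
# Hypothetical sketch of the data a full Vocab holds; none of these names
# belong to the toolkit's real interface.
class FullVocabSketch:
    def __init__(self, words, fillers, dictionary, phone_index):
        # Word list: a word's position is the index shared by Lm and Search.
        self.words = list(words)
        self.index = {w: i for i, w in enumerate(self.words)}
        # Filler words (assumed here to be entries of the word list as well).
        self.fillers = set(fillers)
        # Per word: monophone indices of its pronunciation, from the dictionary.
        self.phones = [[phone_index[p] for p in dictionary[w]] for w in self.words]

    def is_filler(self, word_idx):
        # Fillers are the words the language model ignores.
        return self.words[word_idx] in self.fillers


# Toy data, invented for this example.
phone_index = {"HH": 0, "AH": 1, "L": 2, "OW": 3, "NOISE": 4}
dictionary = {"hello": ["HH", "AH", "L", "OW"], "+noise+": ["NOISE"]}
vocab = FullVocabSketch(["hello", "+noise+"], ["+noise+"], dictionary, phone_index)
```

In this sketch, vocab.index maps a word to the index that both a language model and a search would use, and vocab.phones holds the monophone index sequence for the word at each index.
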
Simple Vocab:
To experiment with language models without loading a full recognizer, a simple Vocab can be used. To create a simple Vocab, simply do not give a Dictionary or AModelSet when creating it. The simple Vocab then consists only of the list of words needed for an Lm object. Trying to build a Search object with it will fail.
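
The difference between the two kinds can be sketched in Python as follows. This is a hedged illustration only: SimpleVocabSketch, build_search_sketch, and the toy unigram scores are hypothetical names and values, and the sketch assumes a Vocab counts as simple whenever the Dictionary or the AModelSet is missing.

```python
# Hypothetical sketch; not the toolkit's actual Vocab or Search interface.
class SimpleVocabSketch:
    def __init__(self, words, dictionary=None, amodel_set=None):
        self.words = list(words)
        self.index = {w: i for i, w in enumerate(self.words)}
        self.dictionary = dictionary
        self.amodel_set = amodel_set

    def is_simple(self):
        # Assumption: missing either object makes the Vocab a simple one.
        return self.dictionary is None or self.amodel_set is None


def build_search_sketch(vocab):
    # A search needs pronunciations and acoustic models, so it rejects
    # a simple Vocab.
    if vocab.is_simple():
        raise ValueError("a simple Vocab cannot be used to build a search")
    return {"vocab": vocab}


simple = SimpleVocabSketch(["hello", "world"])

# A language model only needs the word indices (toy unigram log scores).
unigram_logprob = {simple.index["hello"]: -0.7, simple.index["world"]: -1.2}

# Building a search over the simple Vocab fails.
try:
    build_search_sketch(simple)
    search_failed = False
except ValueError:
    search_failed = True
```

The word list alone is enough to key the language model scores, while the attempt to build a search is refused.
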

Other information on Vocabularies:

Tcl Methods and Information
File Formats
Source Code info (currently none)
Example Scripts (currently none)

monika@ira.uka.de