The Vocabulary
The Vocabulary, that is, the list of valid words, is stored in
a Vocab object,
which defines common indices for the search and the language model. It also
provides an interface between the acoustic models and the search.
There are two kinds of Vocab objects:
- Full Vocab:
A full Vocab contains the following information:
- a list of words defining which words the recognizer can recognize.
The word list is read from a vocab file.
The indices of the list are used to communicate between the language
model object Lm defined over this Vocab and the
Search object defined over this Vocab.
- an optional list of words that are to be treated as fillers, i.e. that
are ignored by the language model. The format is the same as for the
word list.
- for each word in the word list, the sequence of monophone indices
that make up the word. This is taken from the Dictionary the
Vocab is defined over.
- information about the senone indices that are to be used to
model the phonemes in the context of the words in the word list.
To get at this information, a Vocab has to be defined over an
AModelSet.
- some lookup tables required for recognition. The more complicated
ones are required for cross-word triphones. The user usually has little
to do with these.
- Simple Vocab:
To experiment with language models without loading a full recognizer,
a simple Vocab can be used. To create a simple Vocab,
simply omit the Dictionary and the AModelSet when creating it.
The simple Vocab then consists only of a list of words, as needed for an Lm
object. Trying to build a Search object with it will fail.
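
The difference between the two kinds of Vocab can be sketched in a few
lines of Python. This is only a conceptual illustration and not the
recognizer's own interface: the class ToyVocab, the helpers lm_indices and
make_search, the filler word "+noise+" and the phone symbols are all
hypothetical names chosen for the example. It assumes that a word's index
is simply its position in the word list and that fillers are kept in a
separate list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyVocab:
    """Toy stand-in for a Vocab object (hypothetical, not the real API)."""
    words: List[str]
    fillers: List[str] = field(default_factory=list)
    # word -> monophone sequence; None means this is a "simple" Vocab
    dictionary: Optional[Dict[str, List[str]]] = None

    def __post_init__(self) -> None:
        # Assumption: the shared index of a word is its list position.
        self.index = {w: i for i, w in enumerate(self.words)}
        if self.dictionary is not None:
            missing = [w for w in self.words if w not in self.dictionary]
            if missing:
                raise KeyError(f"no pronunciation for: {missing}")

    @property
    def is_simple(self) -> bool:
        return self.dictionary is None


def lm_indices(vocab: ToyVocab, sentence: List[str]) -> List[int]:
    """Word indices as a language model would see them (fillers skipped)."""
    return [vocab.index[w] for w in sentence if w not in vocab.fillers]


def make_search(vocab: ToyVocab) -> Dict[int, List[str]]:
    """Hypothetical search setup: needs pronunciations, so a full Vocab."""
    if vocab.is_simple:
        raise ValueError("a Search object needs a full Vocab")
    return {i: vocab.dictionary[w] for w, i in vocab.index.items()}


words = ["hello", "world"]
simple = ToyVocab(words, fillers=["+noise+"])
full = ToyVocab(
    words,
    fillers=["+noise+"],
    dictionary={"hello": ["HH", "AH", "L", "OW"],
                "world": ["W", "ER", "L", "D"]},
)

# Both kinds give the language model the same indices.
print(lm_indices(simple, ["hello", "+noise+", "world"]))
# Only the full one can back a search.
print(make_search(full)[1])
```

Because both objects index the same word list, the language model side
behaves identically for the simple and the full variant. Only the search
side needs the extra pronunciation information, which is why calling
make_search(simple) raises an error in this sketch.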
Other information on Vocabularies:
monika@ira.uka.de