The Sample Set Module - Tcl Usage

This page contains the explanations, usage, and syntax of the methods and the configurable parameters for the SampleSet and SampleSetClass object types.

Creating a `SampleSet` Object

The command for creating a SampleSet is:

SampleSet <name> <featureSet> <feature> <dimension>

where <name> is the name of the SampleSet object to be created, and <featureSet> is the name of the FeatureSet object that contains the <dimension>-dimensional feature space <feature>. Object creation works the same way as for the LDA object.

SampleSetClass Methods

<object> puts
Print the following seven items:
- name of the class
- name of the file into which the buffer is dumped
- modulus, defining that only one out of every 'modulus' training vectors is taken into account
- filling level of the buffer (i.e. number of vectors that are currently in the buffer)
- number of vectors that have already been dumped since the buffer was created or cleared
- total number of vectors that have been offered to the buffer, accepted or not
- maximum number of vectors that will be written into a file (-1 = unlimited)
<object> clear
Reset the filling level counter and the counter for the number of dumped samples to 0. This does not flush the samples residing in the buffer, so if you don't want to lose any data, flush the buffer before clearing it.
<object> flush
Dump all samples that are still residing in the buffer into their file, reset the filling level counter to 0 and increment the counter for the total number of flushed samples by the number of samples that have been flushed. Flushing will happen automatically when a buffer is full.
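
The interplay of the counters described above can be illustrated with a small model. The following Python sketch is not JANUS code: the class `BufferModel`, its attribute names, and the in-memory list standing in for the dump file are all hypothetical. It assumes that clear discards the buffered samples without writing them (as the warning above implies), that clear leaves the "offered" counter untouched, and that the modulus keeps the last sample of every group.

```python
class BufferModel:
    """Toy model of a SampleSetClass buffer (hypothetical, not the JANUS implementation)."""

    def __init__(self, size, modulus=1):
        self.size = size        # buffer capacity in samples
        self.modulus = modulus  # keep one out of every `modulus` offered samples
        self.buf = []           # samples currently residing in the buffer
        self.file = []          # stands in for the file the buffer is dumped into
        self.sum_count = 0      # samples dumped since creation or the last clear
        self.all_count = 0      # samples offered to the buffer, accepted or not

    def offer(self, sample):
        """Offer one sample; it is buffered only if the modulus selects it."""
        self.all_count += 1
        if self.all_count % self.modulus != 0:  # assumption: which sample of a group is kept
            return
        self.buf.append(sample)
        if len(self.buf) == self.size:          # a full buffer is flushed automatically
            self.flush()

    def flush(self):
        """Dump the buffered samples and add their number to the dumped counter."""
        self.file.extend(self.buf)
        self.sum_count += len(self.buf)
        self.buf = []

    def clear(self):
        """Reset filling level and dumped counter; buffered samples are NOT written."""
        self.buf = []
        self.sum_count = 0


b = BufferModel(size=3, modulus=2)
for i in range(10):
    b.offer(i)
print(b.file, b.buf, b.sum_count, b.all_count)  # prints [1, 3, 5] [7, 9] 3 10

b.clear()                                       # samples 7 and 9 are lost
print(b.file, b.buf, b.sum_count)               # prints [1, 3, 5] [] 0
```

In this model, calling `flush()` before `clear()` would have written samples 7 and 9 to the file first.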

SampleSet Methods

<object> puts
List the names of all classes defined for the given object. Entering only the name of the SampleSet has the same behaviour as the puts method.
<object> add <name> [-filename <filename>] [-size <size>] [-mod <modulus>]
Expects the name of a SampleSetClass subobject to be created. The class is added to the existing classes unless a SampleSetClass subobject with the same name already exists. If you don't explicitly specify a filename, the file into which the buffer is dumped will have the same name as the class. The size argument defines the size of the buffer (i.e. how many samples can be stored before the buffer must be flushed). Usually you'll prefer a buffer that is as big as possible, to avoid time-consuming flushes (each flush is an access to the file system, including an fopen and an fclose). However, for large databases or very many classes, the memory of your computer might pose a limit. If no modulus option is given, the default value of 1 is assumed, which means that every training sample will be buffered.
<object> delete <name>
Expects the name of a SampleSetClass subobject, which will be deleted from the given object. If you try to delete a non-existent subobject, an error message is issued. Keep in mind that removing subobjects can change the indices of the other subobjects.
<object> index <name*>
Expects any number of SampleSetClass subobject names. The method returns the list of indices of the named subobjects.
<object> name <index*>
Expects any number of integers and returns the names of the SampleSetClass subobjects whose indices they represent. If an index is less than zero or greater than the greatest index in the object, the resulting name will be (null). Please keep in mind that deleting subobjects from an object can change the indices of other subobjects.
<object> map <index> [-class ]
Expects the index of a senone (strictly speaking, the method does not care what this integer is; it is only compared to whatever is given as the senone index in a Path object during accumulation). You have to assign a class to every senone for which you'd like to accumulate data. All senones that use the same class accumulate their samples in the same buffer and flush them into the same file.
<object> showmap
Print the entire mapping from senone indices to class names that has been defined so far. The output is a list of class names; the n-th printed name is the name of the class to which senone index n is mapped.
<object> accu <path>
Expects the name of a Path object. Every cell of the path contains a training factor (gamma) and a senone index. The corresponding sample frame is added to the buffer of the class to which the senone index is mapped, supplemented by the gamma factor. So, if the dimensionality of the underlying feature space is d, then a (d+1)-dimensional vector is added to the buffer for every path cell. The gamma factor is stored in the last coefficient. Buffers that are full are automatically flushed into their files.
<object> clear
Clear the buffers of all classes of the SampleSet object. This is just a loop over all classes doing a clear for every single class.
<object> flush
Flush the buffers of all classes of the SampleSet object. This is just a loop over all classes doing a flush for every single class.
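
The accumulation step performed by map and accu can be sketched as follows. This is a minimal Python model, not JANUS code: the function `accumulate`, the representation of a path as `(frame_index, senone_index, gamma)` tuples, and the dictionaries used for the mapping and the buffers are hypothetical. It also assumes that cells whose senone index has no class are skipped, which this page does not state explicitly.

```python
def accumulate(path, frames, senone_to_class, buffers):
    """Append one (d+1)-dimensional vector per path cell to its class buffer.

    path            -- list of (frame_index, senone_index, gamma) cells (hypothetical layout)
    frames          -- list of d-dimensional feature vectors, one per frame
    senone_to_class -- dict mapping senone index to class name
    buffers         -- dict mapping class name to a list of buffered vectors
    """
    for frame_index, senone_index, gamma in path:
        class_name = senone_to_class.get(senone_index)
        if class_name is None:  # assumption: unmapped senones are skipped
            continue
        # The gamma factor is stored as the last coefficient of the vector.
        vector = list(frames[frame_index]) + [gamma]
        buffers.setdefault(class_name, []).append(vector)


frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]    # d = 2
senone_to_class = {7: "vowel", 9: "vowel", 12: "noise"}
path = [(0, 7, 1.0), (1, 9, 0.5), (2, 12, 1.0)]

buffers = {}
accumulate(path, frames, senone_to_class, buffers)
print(buffers["vowel"])  # prints [[0.1, 0.2, 1.0], [0.3, 0.4, 0.5]]
print(buffers["noise"])  # prints [[0.5, 0.6, 1.0]]
```

Senones 7 and 9 share the class "vowel", so their samples end up in the same buffer, and every buffered vector has d+1 = 3 coefficients.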

Accessing Subobjects

If you know the name of a SampleSetClass subobject of a SampleSet object, then you can access the subobject with the term <SampleSetName>:<SampleSetClassName>. If you don't know the name, but you know the index, then you can access the subobject with the term <SampleSetName>.item(<SampleSetClassIndex>).
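
Because subobjects can be addressed by name or by index, it helps to see how the two relate when classes are deleted. The Python sketch below models the class list as a plain list; `ClassListModel` and its methods are hypothetical stand-ins for the add, delete, index and name methods, and it assumes that deleting a class shifts all later classes down by one index. The behaviour of index for an unknown name is not specified on this page; the model simply raises an error.

```python
class ClassListModel:
    """Toy model of the list of classes in a SampleSet (hypothetical, not JANUS code)."""

    def __init__(self):
        self.names = []

    def add(self, name):
        """Add a class unless one with the same name already exists."""
        if name not in self.names:
            self.names.append(name)

    def delete(self, name):
        """Delete a class; deleting a non-existent class is an error."""
        if name not in self.names:
            raise KeyError("no such class: " + name)
        self.names.remove(name)  # assumption: later classes move down by one index

    def index(self, *names):
        """Return the indices of the named classes (raises ValueError if unknown)."""
        return [self.names.index(n) for n in names]

    def name(self, *indices):
        """Return the class names for the indices, '(null)' if out of range."""
        return [self.names[i] if 0 <= i < len(self.names) else "(null)"
                for i in indices]


s = ClassListModel()
for n in ("A", "B", "C"):
    s.add(n)
print(s.index("A", "C"))  # prints [0, 2]
s.delete("B")
print(s.index("A", "C"))  # prints [0, 1]
print(s.name(1, 2, -1))   # prints ['C', '(null)', '(null)']
```

An index obtained before a delete should therefore not be reused afterwards; looking the subobject up by name is the safer choice.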

Every sample set class has a subobject .buf, which is of the type FMatrix.

Configuration Parameters of SampleSet Objects (all read-only)

- -name <string> is the name of the object.
- -useN <integer> is the count of super-objects that are using the configured object.
- -itemN <integer> is the number of items in the configured object, i.e. the number of monophones in a Phones object or the number of Phones objects in a PhonesSet object.
- -blkSize <integer> is the block size of the generic list that is used.
- -featureSet <object> is the FeatureSet object that contains the underlying feature space.
- -featX <integer> is the index of the underlying feature space in its FeatureSet object.
- -dimN <integer> is the dimensionality of the underlying feature space (not including the gamma-factor coefficient).
- -indexN <integer> is the highest senone index that has been mapped so far; unmapped indices that are smaller than this are automatically mapped to -1.

Configuration Parameters of SampleSetClass Objects

- -name <string> is the name of the object (read-only).
- -useN <integer> is the count of super-objects that are using the configured object (read-only).
- -fileName <string> is the name of the file into which the buffer is flushed.
- -modulus <integer> defines that only one out of every 'modulus' training samples is taken into account.
- -count <integer> is the current filling level of the buffer.
- -sumCount <integer> is the number of samples that have already been flushed.
- -allCount <integer> is the number of samples that have been offered to the buffer.
- -maxCount <integer> is the maximum number of samples to be flushed (-1 = unlimited); once this number is reached, every further sample is ignored.
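
The rule that unmapped indices below -indexN are mapped to -1 can be illustrated with a short model. The Python sketch below is not JANUS code: the functions `map_senone` and `index_n` are hypothetical, and it assumes that the map is stored as a list of class indices that grows on demand.

```python
def map_senone(index_map, senone_index, class_index):
    """Map a senone index to a class index, filling any gap with -1."""
    if senone_index >= len(index_map):
        # Newly created entries below senone_index stay unmapped (-1).
        index_map.extend([-1] * (senone_index + 1 - len(index_map)))
    index_map[senone_index] = class_index


def index_n(index_map):
    """Highest senone index that the map covers (-1 for an empty map)."""
    return len(index_map) - 1


index_map = []
map_senone(index_map, 3, 0)
print(index_map)           # prints [-1, -1, -1, 0]
map_senone(index_map, 1, 2)
print(index_map)           # prints [-1, 2, -1, 0]
print(index_n(index_map))  # prints 3
```

Senone indices 0 and 2 were never mapped explicitly, so they carry -1 in this model.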
