The LDA Module

I/O methods

Extracting the means

Multiple machines can be used to extract the means and scatter matrices in parallel. The training data is divided into n parts, and the means are calculated in the usual way for each part. All means of a part (the total mean and the class means) can then be stored at once with the saveMeans method.

% #--- Do that on n machines ---
% lda saveMeans means.$x
% close [open ok.mean.$x w]

Here $x identifies the part, for example its number running from 1 to n. We also "touch" a file ok.mean.$x to signal that this part is done.

The loadMeans method, on the other hand, accumulates the means as it reads them from file, so make sure to clear the means before loading the first part. The while loop checks every 100 seconds whether the 'ok' file for the current part exists.

% #--- collect means on one machine ---
% lda clearMeans
% for {set x 1} {$x <= $n} {incr x} { 
>   while {![file exists ok.mean.$x]} {after 100000}
>   lda loadMeans means.$x
> }
% lda saveMeans means.all

Now the file means.all contains the same mean vectors as if you had processed the whole database in one run. (This is not exactly true, because the values are stored in the file as floats rather than as the doubles used at runtime. As long as the counts in the files do not differ much from each other, this is not a problem.)
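
To see why accumulating per-part means can reproduce the overall mean, the sketch below merges partial results by hand in plain Tcl. It assumes that each part carries a sample count next to its mean vector, as the remark about counts above suggests. The proc name mergeMeans and the list layout are made up for illustration and are not part of the lda object.

```tcl
# Hypothetical helper, not part of the lda object: merge per-part means
# into one overall mean. Each part is a two-element list {count meanVector}.
proc mergeMeans {parts} {
    set total 0
    set sum   {}
    foreach part $parts {
        lassign $part count mean
        # First part: start the weighted sum with zeros of the right length.
        if {[llength $sum] == 0} {
            set sum [lrepeat [llength $mean] 0.0]
        }
        set next {}
        foreach s $sum m $mean {
            # Weight each partial mean by the number of samples behind it.
            lappend next [expr {$s + $count * $m}]
        }
        set sum $next
        incr total $count
    }
    # Divide the weighted sum by the total count to get the overall mean.
    set result {}
    foreach s $sum {
        lappend result [expr {$s / double($total)}]
    }
    return [list $total $result]
}

# Two parts: 2 samples with mean (1 2) and 6 samples with mean (3 4).
puts [mergeMeans {{2 {1.0 2.0}} {6 {3.0 4.0}}}]   ;# prints: 8 {2.5 3.5}
```

The result is pulled towards the part with more samples, which is why the counts have to travel with the means.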

Extracting the scatter matrices

The scatter matrices can be extracted on multiple machines in a similar way. First we load the means and calculate the scatter matrices for each part. Then we save them and use the loadScatters method to accumulate them.
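
The means have to be loaded first because scatter matrices from different parts only add up when they are computed around the same mean. The plain Tcl sketch below checks this on four toy samples. It assumes the usual definition of a scatter matrix as a sum of outer products (x - mu)(x - mu)^T, which the text does not spell out. The procs scatter and addMatrices are hypothetical helpers and not part of the lda object.

```tcl
# Hypothetical helper: scatter matrix sum (x-mu)(x-mu)^T of a list of
# vectors around a fixed mean mu, returned as a list of rows.
proc scatter {samples mu} {
    set d [llength $mu]
    set S [lrepeat $d [lrepeat $d 0.0]]
    foreach x $samples {
        set diff {}
        foreach xi $x mi $mu { lappend diff [expr {$xi - $mi}] }
        for {set i 0} {$i < $d} {incr i} {
            for {set j 0} {$j < $d} {incr j} {
                set v [expr {[lindex $S $i $j] + [lindex $diff $i] * [lindex $diff $j]}]
                lset S $i $j $v
            }
        }
    }
    return $S
}

# Hypothetical helper: element-wise sum of two matrices of equal shape.
proc addMatrices {A B} {
    set C {}
    foreach ra $A rb $B {
        set row {}
        foreach a $ra b $rb { lappend row [expr {$a + $b}] }
        lappend C $row
    }
    return $C
}

set mu    {2.0 1.0}                ;# overall mean of all four samples
set part1 {{1.0 0.0} {3.0 2.0}}
set part2 {{2.0 2.0} {2.0 0.0}}
set whole [concat $part1 $part2]

# Scatter of the whole set versus the sum of the per-part scatters.
puts [scatter $whole $mu]                                        ;# prints: {2.0 2.0} {2.0 4.0}
puts [addMatrices [scatter $part1 $mu] [scatter $part2 $mu]]     ;# prints: {2.0 2.0} {2.0 4.0}
```

Both lines agree because every part used the overall mean. With per-part means instead, the sum would in general differ from the scatter of the whole set.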

% #--- Do that on n machines ---
% #    creating scatter.$x.T 
% #    and  scatter.$x.W
% #-----------------------------
% lda saveScatters scatter.$x
% close [open ok.scatter.$x w]

% #--- collect scatters on one machine ---
% lda clearScatters
% for {set x 1} {$x <= $n} {incr x} { 
>   while {![file exists ok.scatter.$x]} {after 100000}
>   lda loadScatters scatter.$x
> }
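
Both collection loops above use the same polling pattern and wait forever if a part never finishes. A minimal sketch of a helper with a timeout follows. The name waitForFile, its parameters, and the default values are hypothetical and chosen for illustration; only standard Tcl commands are used.

```tcl
# Hypothetical helper: poll until a marker file exists or a timeout expires.
# Returns 1 if the file appeared, 0 on timeout. All times are in milliseconds.
proc waitForFile {name {intervalMs 100000} {timeoutMs 3600000}} {
    set deadline [expr {[clock milliseconds] + $timeoutMs}]
    while {![file exists $name]} {
        if {[clock milliseconds] >= $deadline} {
            return 0
        }
        # Sleep before checking again, as in the loops above.
        after $intervalMs
    }
    return 1
}

# Possible use inside a collection loop (lda is the object from the text):
#   if {![waitForFile ok.scatter.$x]} { error "part $x did not finish" }
#   lda loadScatters scatter.$x
```

With the defaults the helper checks every 100 seconds, like the original loops, and gives up after one hour.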

maier@ira.uka.de