Extracting Feature Vectors

If you want to start with more than one vector per codebook, you need an algorithm that can create multiple vectors, such as neural gas or k-means. Both are iterative. The training can be set up so that every training iteration (one pass over the entire training set) corresponds to one iteration of k-means or neural gas. Depending on the kind of data and the size of the codebooks, you might need a large number of iterations. Since reading the entire training set over and over again can take a long time, you might prefer to 'sort' the training set first by collecting all the vectors that belong to the same codebook in one file. You can then give this file to the neural gas algorithm (k-means is a special case of neural gas), which will run many iterations without having to reread the training data. See the createSbuf method of the CodebookSet for details.
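
The idea can be illustrated with a small stand-alone sketch in Python. It is not the toolkit's implementation and does not use createSbuf. The names group_by_codebook and neural_gas, the batch form of the update, and the annealing schedule for the neighbourhood range are all assumptions made for this example. It also assumes that the codebook each training vector belongs to is already known. The first function 'sorts' the training vectors into one in-memory buffer per codebook. The second runs many neural gas iterations on such a buffer without touching the training data again.

```python
import math
import random


def group_by_codebook(labelled_vectors):
    """Collect vectors per codebook: (name, vector) pairs -> {name: [vectors]}."""
    buffers = {}
    for name, vec in labelled_vectors:
        buffers.setdefault(name, []).append(list(vec))
    return buffers


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def neural_gas(samples, n_vectors, n_iter=20, lam_start=None, lam_end=0.01, seed=0):
    """Batch neural gas on an in-memory buffer (hypothetical helper).

    Each iteration is one pass over `samples`. Every codebook vector is
    replaced by a weighted mean of all samples, weighted by exp(-rank / lam),
    where rank is the vector's position when sorted by distance to the sample.
    As lam approaches 0 only the nearest vector gets weight, which is a
    k-means step.
    """
    rng = random.Random(seed)
    dim = len(samples[0])
    # Initialise with distinct samples drawn from the buffer.
    codebook = [list(v) for v in rng.sample(samples, n_vectors)]
    if lam_start is None:
        lam_start = n_vectors / 2.0  # assumed starting neighbourhood range

    for it in range(n_iter):
        # Shrink the neighbourhood range geometrically from lam_start to lam_end.
        frac = it / max(1, n_iter - 1)
        lam = lam_start * (lam_end / lam_start) ** frac

        num = [[0.0] * dim for _ in range(n_vectors)]
        den = [0.0] * n_vectors
        for x in samples:
            order = sorted(range(n_vectors), key=lambda i: sq_dist(x, codebook[i]))
            for rank, i in enumerate(order):
                h = math.exp(-rank / lam)
                den[i] += h
                for d in range(dim):
                    num[i][d] += h * x[d]

        for i in range(n_vectors):
            if den[i] > 0.0:
                codebook[i] = [num[i][d] / den[i] for d in range(dim)]
    return codebook
```

A typical use would be to call group_by_codebook once while reading the training set, then call neural_gas on each buffer with the desired number of vectors for that codebook. Because the final iterations use a very small neighbourhood range, they behave like plain k-means steps.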