Main Page Namespace List Class Hierarchy Alphabetical List Compound List File List Namespace Members Compound Members File Members Related Pages
This application (PLSA.cpp
) will either perform PLSA on a collection, building three probability tables: P(z), P(d|z), and P(w|z) where z in Z are the latent variables (categories), d in D are the documents in the collection, and w in W are the terms in the vocabulary over the collection, or open those tables and read them into memory to illustrate their potential use.
The parameter doTrain (true|false)
determines whether the tables are constructed or read. The default value is true
.
The other parameters accepted by PLSA are:
- index -- the index to use. Default is none.
- numCats -- the number of latent variables (categories) to use. Default is 20.
- beta -- The value of beta for Tempered EM (TEM). Default is 1.
- betaMin -- The minimum value for beta, TEM iterations stop when beta falls below this value. Default is 0.6.
- eta -- Multiplier to scale beta before beginning a new set of TEM iterations. Must be less than 1. Default is 0.92.
- annealcue -- Minimum allowed difference between likelihood in consecutive iterations. If the difference is less than this, beta is updated. Default is 0.
- numIters -- Maximum number of iterations to perform. Default is 100.
- numRestarts -- Number of times to recompute with different random seeds. Default is 1.
- testPercentage -- Percentage of events (d,w) to hold out for validation.
- doTrain -- whether to construct the probability tables or read them in. Default is true.
Generated on Wed Nov 3 13:00:03 2004 for Lemur Toolkit by
1.2.18