LSI
Synopsis
Latent Semantic Indexing (LSI)
Description
Usage
LSI is maintained by Xin Liu (xliu@cs.cmu.edu).
The best way to start is to copy the Makefile into your working directory. It contains the following commands:
Wrappers are in moscow:/usr9/xliu/lsi and /usr9/xliu/gvsm.
If you want to see the source code for LSI and GVSM, it is at moscow:/usr3/xliu/work/lsi.
Example
Repeating part of Xin's work on the UNICEF corpus is a good example.
His documentation is complete, so I'll review only a small part of it here.
Goal: estimate the 11-avgp on the monolingual and translingual data sets of the UNICEF corpus. Try different numbers of eigenvectors (singular values).
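For readers unfamiliar with the measure, here is a minimal sketch of how an 11-avgp score can be computed for a single query. It assumes that "11-avgp" means the usual 11-point interpolated average precision (precision averaged over recall levels 0.0, 0.1, ..., 1.0). The function name and the toy document ids are made up for illustration, and Xin's evaluation scripts may differ in details.

```python
def eleven_point_avgp(ranked_ids, relevant_ids):
    """11-point interpolated average precision for one query (illustrative sketch)."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    # Record (recall, precision) each time a relevant document is found.
    points = []
    hits = 0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    total = 0.0
    for i in range(11):
        level = i / 10
        # Interpolated precision: best precision at any recall >= this level.
        candidates = [p for r, p in points if r >= level - 1e-12]
        total += max(candidates, default=0.0)
    return total / 11


# Hypothetical toy query: four retrieved documents, two of them relevant.
score = eleven_point_avgp(["d1", "d2", "d3", "d4"], {"d1", "d3"})
print(round(score, 4))  # prints 0.8485
```

A corpus-level figure would then be the mean of this value over all test queries.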
Copy the Makefile into your lsi directory.
Try "make init". It should create a source directory with a bunch of data files.
Modify the SV value; "make all" will then generate the corresponding evaluation files (look in the eval directory).
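To give a feel for what changing the number of singular values does, here is a small sketch of textbook LSI retrieval using NumPy. This is not the code behind the wrappers. It assumes the standard formulation: take the SVD of the term-document matrix, keep the k largest singular values, project documents and query onto the k left singular vectors, and rank by cosine similarity. The function name `lsi_rank` and the five-term toy matrix are invented for illustration.

```python
import numpy as np


def lsi_rank(term_doc, query, k):
    """Rank documents for a query in a k-dimensional LSI space (illustrative sketch)."""
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    docs_k = Vt_k.T * s_k      # document coordinates, shape (n_docs, k)
    query_k = U_k.T @ query    # query projected onto the same k axes
    norms = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(query_k)
    scores = docs_k @ query_k / np.where(norms == 0, 1.0, norms)
    return np.argsort(-scores), scores


# Toy matrix (rows: car, auto, engine, flower, petal; columns: three documents).
term_doc = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
], dtype=float)
query = np.array([1, 0, 0, 0, 0], dtype=float)  # the single term "car"

for k in (2, 3):
    ranking, scores = lsi_rank(term_doc, query, k)
    print(k, ranking, np.round(scores, 3))
```

In this toy case the second document never contains "car". With k = 3 (full rank) its score is about 0, while with k = 2 it scores about 1 because it shares "engine" with the first document. That is the effect being tuned when you vary SV.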
Example
For the UNICEF corpus, using ntc weighting.
| SV          | 100    | 200    | 300    |
| MIR 11-avgp | 0.3954 | 0.4275 | 0.4267 |
| TIR 11-avgp | 0.3967 | 0.4114 | 0.4145 |
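As a reminder of what the ntc weighting mentioned above involves, here is a minimal sketch. It assumes "ntc" follows the SMART naming convention: n for natural (raw) term frequency, t for idf, and c for cosine normalization. The function name and the token lists are hypothetical, and the wrappers may compute the weights differently.

```python
import math
from collections import Counter


def ntc_weights(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)  # n: raw term frequency
        # t: multiply by idf = log(N / df)
        w = {t: f * math.log(n_docs / df[t]) for t, f in tf.items()}
        # c: divide by the Euclidean length of the document vector
        norm = math.sqrt(sum(v * v for v in w.values()))
        weighted.append({t: (v / norm if norm else 0.0) for t, v in w.items()})
    return weighted


# Hypothetical two-document collection.
for vec in ntc_weights([["a", "b"], ["a", "c", "c"]]):
    print(vec)
```

Because of the cosine normalization, the base of the logarithm does not affect the final weights.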
DON'T FORGET TO CLEAN AND TIDY ('gmake clean tidy') AS SOON AS YOU GET THE DESIRED RESULTS. DATA AND TEMPORARY FILES USE LOTS OF RESOURCES.
Links
Latent semantic analysis at Colorado
Susan Dumais Home Page
Bellcore LSI Page
Mike Berry LSI Page