
LSI

Synopsis

Latent Semantic Indexing (LSI)

Description

Usage

LSI is maintained by Xin Liu (xliu@cs.cmu.edu).
The best way to start is, of course, to copy the Makefile into your working directory. It contains the following commands:
Wrappers are in moscow:/usr9/xliu/lsi and /usr9/xliu/gvsm.
The source code for LSI and GVSM is at moscow:/usr3/xliu/work/lsi.

Example

Repeating part of Xin's work on the UNICEF corpus is a good example.
His documentation is complete and I'll review just a small part of it.
Goal: estimate the 11-avgp (11-point average precision) on the monolingual and translingual data sets of the UNICEF corpus. Try different numbers of eigenvectors (singular values): modify the SV value, and "make all" will generate the corresponding evaluation files (look in the eval directory). For the UNICEF corpus, use ntc weighting.
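To make the role of the SV value concrete, here is a minimal sketch in Python with NumPy. It is not Xin's code. It assumes "ntc" is read in the usual SMART sense (natural term frequency, idf, cosine normalization) and that LSI is done with a plain truncated SVD. The function names ntc_weight and lsi_scores and the toy term-document matrix are made up for illustration.

```python
import numpy as np

def ntc_weight(counts):
    """SMART-style 'ntc' weighting (assumed reading): raw tf * idf, cosine-normalized.

    counts: terms x documents matrix of raw term frequencies.
    Returns the weighted matrix and the idf vector (reused to weight queries).
    """
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]
    df = np.count_nonzero(counts, axis=1)        # document frequency per term
    idf = np.log(n_docs / np.maximum(df, 1))     # guard against unused terms
    weighted = counts * idf[:, None]
    norms = np.linalg.norm(weighted, axis=0)     # one norm per document column
    norms[norms == 0] = 1.0
    return weighted / norms, idf

def lsi_scores(weighted, query_vec, k):
    """Cosine similarity of a query to every document, keeping k singular values."""
    U, s, Vt = np.linalg.svd(weighted, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    q_k = U_k.T @ query_vec                      # query projected onto k dimensions
    docs_k = s_k[:, None] * Vt_k                 # documents in the same coordinates
    denom = np.linalg.norm(docs_k, axis=0) * np.linalg.norm(q_k) + 1e-12
    return (q_k @ docs_k) / denom

# Hypothetical toy data. Rows: car, auto, engine, flower, petal. Columns: 4 documents.
counts = [[1, 0, 0, 0],
          [0, 1, 0, 0],
          [1, 1, 0, 0],
          [0, 0, 1, 1],
          [0, 0, 1, 0]]
weighted, idf = ntc_weight(counts)
query = np.array([1, 0, 0, 0, 0]) * idf          # the one-word query "car"

for k in (2, 4):                                 # k plays the role of the SV value
    print(k, np.round(lsi_scores(weighted, query, k), 3))
```

In this toy matrix, with k=2 the second document (which contains "auto" and "engine" but not "car") scores about as high as the first for the query "car", while at full rank (k=4) its score is close to zero. That is the effect the SV value trades off on the real corpus.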
Example results (11-avgp by SV value)
SV            100     200     300
MIR 11-avgp   0.3954  0.4275  0.4267
TIR 11-avgp   0.3967  0.4114  0.4145
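For readers who have not met the measure, the following is a rough Python sketch of 11-point interpolated average precision for a single query. It assumes the standard definition (interpolated precision at recall 0.0, 0.1, ..., 1.0, then averaged). The evaluation files in the eval directory may be produced by a different tool, and the figures above are presumably averaged over all queries. The function name and the document ids are hypothetical.

```python
def eleven_point_avg_precision(ranked_ids, relevant_ids):
    """11-point interpolated average precision for one ranked result list."""
    relevant = set(relevant_ids)
    n_rel = len(relevant)
    if n_rel == 0:
        return 0.0

    # Record (relevant docs seen so far, precision) each time a relevant doc appears.
    hits = 0
    points = []
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            points.append((hits, hits / rank))

    total = 0.0
    for i in range(11):                      # recall levels 0.0, 0.1, ..., 1.0
        # Interpolated precision: best precision at any recall >= i/10.
        # Integer comparison avoids floating-point trouble at the boundaries.
        candidates = [p for h, p in points if h * 10 >= i * n_rel]
        total += max(candidates, default=0.0)
    return total / 11

# Hypothetical ranking: relevant documents d1 and d3 appear at ranks 1 and 3.
print(eleven_point_avg_precision(["d1", "d2", "d3", "d4"], ["d1", "d3"]))
```

Averaging this value over all queries in a test set gives a single figure of the kind reported in the table.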


Links

Latent semantic analysis at Colorado
Susan Dumais Home Page
Bellcore LSI Page
Mike Berry LSI Page