
GenerateSmoothSupport Application

This application (GenerateSmoothSupport.cpp) generates two support files for retrieval using the language modeling approach. Both files contain pre-computed quantities that speed up the retrieval process.

The first file (its name is given by the parameter smoothSupportFile, see below) is needed for retrieval with a smoothed unigram language model based on BasicIndex. Each entry in this support file corresponds to one document and records two pieces of information: (a) the count of unique terms in the document; (b) the sum of the collection language model probabilities of the words in the document.

The second file (same name with an extra suffix ".mc") is needed if you run feedback based on the Markov chain query model. Each line in this file contains a term and the sum of the probabilities of that term given each document in the collection (i.e., the sum of p(w|d) over all documents d).
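The quantities stored in the two support files can be illustrated with a small, self-contained sketch. It does not use the Lemur API: the types Document, DocSupport, SupportData and the function computeSupport are hypothetical names introduced only for this illustration, and the documents are plain in-memory term lists rather than a real index. The sketch also assumes that quantity (b) is summed over the distinct terms of a document and that both p(w|C) and p(w|d) are maximum likelihood estimates; the actual application may differ on these points.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One tokenized document: a toy stand-in for a doc->term index entry.
using Document = std::vector<std::string>;

// Per-document values, mirroring one entry of the first support file.
struct DocSupport {
    std::size_t uniqueTerms;   // (a) number of distinct terms in the document
    double collectionProbSum;  // (b) sum of p(w|C) over those distinct terms (assumption)
};

// Hypothetical container for everything the two support files hold.
struct SupportData {
    std::vector<DocSupport> perDoc;          // one entry per document
    std::map<std::string, double> termSums;  // term -> sum over all d of p(w|d)
};

SupportData computeSupport(const std::vector<Document>& docs) {
    // Collection counts for the maximum likelihood estimate
    // p(w|C) = count(w in collection) / total tokens in collection.
    std::map<std::string, std::size_t> collCount;
    std::size_t totalTokens = 0;
    for (const Document& d : docs) {
        for (const std::string& w : d) {
            ++collCount[w];
            ++totalTokens;
        }
    }

    SupportData out;
    for (const Document& d : docs) {
        // Within-document term frequencies; the map size is the unique term count.
        std::map<std::string, std::size_t> tf;
        for (const std::string& w : d) {
            ++tf[w];
        }

        DocSupport s{tf.size(), 0.0};
        for (const auto& entry : tf) {
            const std::string& w = entry.first;
            // Quantity (b): accumulate p(w|C) for each distinct term.
            s.collectionProbSum +=
                static_cast<double>(collCount[w]) / static_cast<double>(totalTokens);
            // ".mc" quantity: accumulate p(w|d) = tf(w,d) / |d| for this document.
            out.termSums[w] +=
                static_cast<double>(entry.second) / static_cast<double>(d.size());
        }
        out.perDoc.push_back(s);
    }
    return out;
}
```

Calling computeSupport on a vector of term lists returns one DocSupport per document plus the per-term sums. Precomputing these values once is what lets retrieval avoid iterating over every term of every document at query time.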

To run the application, follow the general steps for running a Lemur application and set the following variables in the parameter file:

(1) index: the table-of-contents (TOC) record file of the index (e.g., the .bsc file created by BuildBasicIndex or the .ifp file created by PushIndexer).

(2) smoothSupportFile: file path for the support file (e.g., /usr0/mydata/index.supp)
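As an illustration, a parameter file for this application might look like the fragment below. This is a sketch that assumes the usual Lemur convention of one "name = value;" assignment per line; the index path /usr0/mydata/index.bsc is a hypothetical example chosen to match the support file path given above.

```
index = /usr0/mydata/index.bsc;
smoothSupportFile = /usr0/mydata/index.supp;
```

With these settings, the second support file would be written alongside the first with the extra ".mc" suffix.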

This application is also a good example of using the doc index (i.e., doc->term index).

Generated on Wed Nov 3 13:00:02 2004 for Lemur Toolkit by Doxygen 1.2.18