This application builds an FP passage index for a collection of documents. If the index already exists, new documents are added to it; otherwise a new index is created. Documents are segmented into passages of passageSize terms, with consecutive passages overlapping by passageSize/2 terms.
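The segmentation described above can be pictured with a minimal sketch. This is not the Lemur implementation: the function name segment_passages is hypothetical, and the handling of the final (possibly shorter) passage and of odd passage sizes (integer division) are assumptions, since the text does not specify them.

```python
def segment_passages(terms, passage_size):
    """Split a list of terms into overlapping passages.

    Each passage holds up to passage_size terms and starts
    passage_size // 2 terms after the previous one, so consecutive
    passages share half of their terms.
    Assumption: the last passage may be shorter than passage_size.
    """
    if passage_size < 2:
        raise ValueError("passage_size must be at least 2")
    if not terms:
        return []

    step = passage_size // 2  # overlap of passageSize/2 terms
    passages = []
    start = 0
    while True:
        passages.append(terms[start:start + passage_size])
        # Stop once the current passage reaches the end of the document.
        if start + passage_size >= len(terms):
            break
        start += step
    return passages


if __name__ == "__main__":
    doc = "a b c d e f g h i j".split()
    for passage in segment_passages(doc, 4):
        print(" ".join(passage))
```

With a ten-term document and a passage size of 4, the sketch yields four passages starting at terms 0, 2, 4 and 6, each sharing two terms with its predecessor.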
To use it, follow the general steps for running a Lemur application.
The parameters are:
index: name of the index table-of-content file without the .ifp extension.
memory: memory (in bytes) of InvFPPushIndex (def = 96000000).
stopwords: name of file containing the stopword list.
acronyms: name of file containing the acronym list.
countStopWords: If true, count stopwords in document length.
docFormat:
stemmer:
KstemmerDir: Path to directory of data files used by Krovetz's stemmer.
arabicStemDir: Path to directory of data files used by the Arabic stemmers.
arabicStemFunc: Which stemming algorithm to apply, one of:
dataFiles: name of file containing list of datafiles to index.
passageSize: Number of terms per passage.