
Structured Query Evaluation

This application (StructQueryEval.cpp) runs retrieval experiments that evaluate the performance of the structured query model using the InQuery retrieval method. StructQueryEval requires its index parameter to name a positional index (currently one of InvFPIndex or KeyfileIncIndex).

Feedback is implemented as a WSUM of the original query combined with terms selected from the feedback documents based on belief score. The expanded query has the form:

wsum( (1 - a) <original query>
      a*w1  t1
      a*w2  t2
      ...
      a*wN  tN
      )

where a is the value of the parameter feedbackPosCoeff, each ti is a term selected from the feedback documents, and wi is the weight of that term.
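The weighting scheme above can be sketched as follows. This is a minimal illustration, not the toolkit's implementation: buildExpandedQuery is a hypothetical helper, the textual wsum(...) rendering simply mirrors the notation used on this page, and the four-digit weight formatting is an arbitrary choice.

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Lemur): renders the expanded query
//   wsum( (1 - a) <original query>  a*w1 t1 ... a*wN tN )
// as text. `a` plays the role of feedbackPosCoeff; each pair holds a
// selected feedback term and its weight.
std::string buildExpandedQuery(
    const std::string& originalQuery,
    const std::vector<std::pair<std::string, double> >& feedbackTerms,
    double a) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(4);
    // The original query keeps weight (1 - a).
    out << "wsum( " << (1.0 - a) << " " << originalQuery;
    // Each feedback term ti contributes with weight a * wi.
    for (std::size_t i = 0; i < feedbackTerms.size(); ++i) {
        out << " " << (a * feedbackTerms[i].second)
            << " " << feedbackTerms[i].first;
    }
    out << " )";
    return out.str();
}
```

With a = 0 the original query keeps its full weight and every feedback term gets weight zero; larger values of a shift weight toward the feedback terms.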

Scoring is done either over a working set of documents (essentially re-ranking) or over the whole collection, as selected by the parameter "useWorkingSet". When "useWorkingSet" has a non-zero integer value or the value true, scoring is restricted to the working set given in the file named by "workSetFile". That file should have three columns: the query id, the document id, and a numerical value, which is ignored. By default, scoring is over the whole collection.
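As an illustration of the three-column working-set file, the sketch below groups document ids by query id. It is not the toolkit's own reader: readWorkingSets and WorkingSets are hypothetical names, whitespace-separated columns are assumed, ids are kept as strings, and lines that do not have three parsable columns are silently skipped.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical type: query id -> document ids in its working set.
typedef std::map<std::string, std::vector<std::string> > WorkingSets;

// Hypothetical helper (not part of Lemur): reads lines of the form
//   <queryID> <docID> <number>
// The third column must be numeric but its value is ignored.
WorkingSets readWorkingSets(std::istream& in) {
    WorkingSets sets;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string queryId, docId;
        double ignored;
        // Assumption: malformed or empty lines are skipped, not reported.
        if (fields >> queryId >> docId >> ignored) {
            sets[queryId].push_back(docId);
        }
    }
    return sets;
}
```

Passing a std::ifstream opened on the file named by workSetFile would yield, for each query, the list of documents to re-rank.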

The parameters are:

index: the complete name of the index table-of-contents file for the database index. This must be a positional index (currently one of InvFPIndex or KeyfileIncIndex).
textQuery: the query text stream parsed by ParseInQuery
resultFile: the result file
resultFormat: the format of the result file, either the TREC format (six columns) or a simple three-column format <queryID, docID, score>. String value: trec for the TREC format or 3col for the three-column format. The integer values used in previous versions of Lemur are still accepted: zero for the non-TREC format and non-zero for the TREC format. Default: TREC format.
resultCount: the number of documents to return as results for each query
defaultBelief: the default belief for a document. Default: 0.4
feedbackDocCount: the number of documents to use for pseudo-feedback (0 means no feedback)
feedbackTermCount: the number of terms to add to a query when doing feedback.
feedbackPosCoeff: the coefficient for positive terms in the expanded query.
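The two values of resultFormat can be illustrated with a short sketch. Both functions are hypothetical helpers, not Lemur code. The six TREC columns follow the standard TREC run layout (query id, the literal Q0, document id, rank, score, run tag); the run tag passed in and the treatment of unrecognized strings as legacy integers are assumptions.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical helper: interprets a resultFormat value.
// "trec" (or an empty value, the documented default) selects TREC format,
// "3col" selects three columns, and legacy integers map zero to
// three-column and non-zero to TREC.
bool isTrecFormat(const std::string& value) {
    if (value.empty() || value == "trec") return true;
    if (value == "3col") return false;
    // Assumption: anything else is read as a legacy integer value.
    return std::strtol(value.c_str(), NULL, 10) != 0;
}

// Hypothetical helper: formats one result line in either layout.
std::string formatResultLine(bool trecFormat, const std::string& queryId,
                             const std::string& docId, int rank,
                             double score, const std::string& runTag) {
    std::ostringstream out;
    if (trecFormat) {
        // Standard TREC run layout: qid Q0 docid rank score tag
        out << queryId << " Q0 " << docId << " " << rank << " "
            << score << " " << runTag;
    } else {
        // Simple three-column layout: <queryID, docID, score>
        out << queryId << " " << docId << " " << score;
    }
    return out.str();
}
```

The three-column layout drops the rank and run tag, so it suits cases where only the scores are consumed by a later step.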

Generated on Wed Nov 3 13:00:03 2004 for Lemur Toolkit by Doxygen 1.2.18