
Cross Lingual Retrieval Evaluation

This application (XlingRetEval.cpp) runs cross-lingual retrieval experiments.

Parameters are:

sourceIndex: The complete name of the index for the source language collection. This provides the background model for the source language.
targetIndex: The complete name of the index for the target language collection. This is the collection that is searched.
textQuery: the query text stream, in the source language
XLlambda: The smoothing parameter for mixing P(t|D) and P(s|GS).
XLbeta: The Jelinek-Mercer lambda for estimating P(t|D).
sourceBackgroundModel: One of "term" or "doc". If term, the background model for the source language is estimated as tf(s)/|V|. If doc, it is estimated as df(s)/sum_w_in_V df(w). Default is term.
targetBackgroundModel: One of "term" or "doc". If term, the background model for the target language is estimated as tf(t)/|V|. If doc, it is estimated as df(t)/sum_w_in_V df(w). Default is term.
resultFile: the result file
resultFormat: The format of the result file: either the TREC format (i.e., six-column) or a simple three-column format <queryID, docID, score>. String value, either trec for the TREC format or 3col for the three-column format. Default: TREC format.
resultCount: the number of documents to return for each query
feedbackDocCount: the number of documents to use for pseudo-feedback (0 means no feedback)
feedbackTermCount: the number of terms to add to a query when doing feedback.
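
The two background-model estimates described for sourceBackgroundModel and targetBackgroundModel can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in XlingRetEval.cpp: TermStats, Collection, termBackground, and docBackground are hypothetical names, and the sketch assumes that the denominator |V| in the "term" estimate stands for the total number of term occurrences in the collection.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-term collection statistics (not a Lemur type).
struct TermStats {
    long tf;  // total occurrences of the term in the collection
    long df;  // number of documents that contain the term
};

using Collection = std::map<std::string, TermStats>;

// "term" estimate: tf(w) divided by the total term count.
// Assumption: |V| in the parameter description is the total term count.
double termBackground(const Collection& c, const std::string& w) {
    long total = 0;
    for (const auto& kv : c) total += kv.second.tf;
    assert(total > 0 && "collection must contain at least one term");
    auto it = c.find(w);
    return it == c.end() ? 0.0 : static_cast<double>(it->second.tf) / total;
}

// "doc" estimate: df(w) divided by the sum of df over the vocabulary.
double docBackground(const Collection& c, const std::string& w) {
    long total = 0;
    for (const auto& kv : c) total += kv.second.df;
    assert(total > 0 && "collection must contain at least one term");
    auto it = c.find(w);
    return it == c.end() ? 0.0 : static_cast<double>(it->second.df) / total;
}
```

With a two-term collection where "a" has tf=2, df=1 and "b" has tf=2, df=3, the "term" estimate for "a" is 2/4 and the "doc" estimate is 1/4, which shows how the two settings can rank the same term differently.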




 Simple KL parameters: 

 smoothSupportFile: The name of the smoothing support file 
 smoothMethod: One of the following four:
jelinikmercer or jm for Jelinek-Mercer 
dirichletprior or dir for Dirichlet prior  
absolutediscount or ad for Absolute discounting 
twostage or 2s for two stage. 
 smoothStrategy: Either interpolate for the interpolation strategy or backoff for the back-off strategy.

 adjustedScoreMethod: Which type of score to output, one of:  
 "querylikelihood" or "ql" for query likelihood. 
 "crossentropy" or "ce" for cross entropy. 
 "negativekld" or "-d" for negative KL divergence. 
 JelinekMercerLambda: The collection model weight in the JM interpolation method. Default: 0.5.

 DirichletPrior: The prior parameter in the Dirichlet prior smoothing method. Default: 1000.

 discountDelta: The delta (discounting constant) in the absolute discounting method. Default: 0.7.
 queryUpdateMethod: feedback method, one of: 
relevancemodel1 or rm1 for relevance model 1. 
relevancemodel2 or rm2 for relevance model 2. 

 feedbackCoefficient: the coefficient of the feedback model for interpolation. The value is in [0,1], with 0 meaning using only the original model (thus no updating/feedback) and 1 meaning using only the feedback model (thus ignoring the original model).

 feedbackTermCount: Truncate the feedback model to no more than a given number of words/terms.

 feedbackProbThresh: Truncate the feedback model to include only words with a probability higher than this threshold. Default value: 0.001.

 feedbackProbSumThresh: Truncate the feedback model once the sum of the probabilities of the included words reaches this threshold. Default value: 1.
Parameters feedbackTermCount, feedbackProbThresh, and feedbackProbSumThresh work conjunctively to control the truncation, i.e., the truncated model must satisfy all three constraints.
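
The conjunctive truncation can be sketched as below. This is an illustrative sketch, not the toolkit's implementation: truncateModel and TermProb are hypothetical names, the three arguments mirror feedbackTermCount, feedbackProbThresh, and feedbackProbSumThresh, and the sketch assumes that the word whose probability pushes the running sum up to the threshold is still included.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical representation of one feedback-model entry: (word, probability).
using TermProb = std::pair<std::string, double>;

// Keep words in descending probability order while all three limits hold.
std::vector<TermProb> truncateModel(std::vector<TermProb> model,
                                    std::size_t termCount,
                                    double probThresh,
                                    double probSumThresh) {
    assert(probThresh >= 0.0 && probSumThresh >= 0.0);
    std::stable_sort(model.begin(), model.end(),
                     [](const TermProb& a, const TermProb& b) {
                         return a.second > b.second;
                     });
    std::vector<TermProb> kept;
    double sum = 0.0;
    for (const auto& tp : model) {
        if (kept.size() >= termCount) break;  // feedbackTermCount limit
        if (tp.second <= probThresh) break;   // must be higher than feedbackProbThresh
        if (sum >= probSumThresh) break;      // feedbackProbSumThresh already reached
        kept.push_back(tp);
        sum += tp.second;
    }
    return kept;
}
```

For a model with probabilities 0.5, 0.3, 0.15, and 0.05, a probability threshold of 0.1 keeps the first three words, a term count of 2 keeps the first two, and a probability-sum threshold of 0.6 also keeps the first two, since the running sum passes 0.6 once the second word is added.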
Generated on Wed Nov 3 13:00:03 2004 for Lemur Toolkit by Doxygen 1.2.18