Cross Lingual Retrieval Evaluation

This application (XlingRetEval.cpp) runs cross-lingual retrieval experiments.

Parameters are:

  1. sourceIndex: The complete name of the index for the source language collection. This provides the background model for the source language.
  2. targetIndex: The complete name of the index for the target language collection. This is the collection that is searched.

  3. textQuery: the query text stream, in the source language
  4. XLlambda: The smoothing parameter for mixing P(t|D) and P(s|GS).
  5. XLbeta: The Jelinek-Mercer lambda for estimating P(t|D).

  6. sourceBackgroundModel: One of "term" or "doc". If term, the background model for the source language is estimated as tf(s)/|V|. If doc, it is estimated as df(s)/sum_w_in_V df(w). Default is term.
  7. targetBackgroundModel: One of "term" or "doc". If term, the background model for the target language is estimated as tf(t)/|V|. If doc, it is estimated as df(t)/sum_w_in_V df(w). Default is term.

  8. resultFile: the name of the file to which results are written
  9. resultFormat: the format of the result file. String value, either trec for the six-column TREC format or 3col for a simple three-column format <queryID, docID, score>. Default: TREC format.
  10. resultCount: the number of documents to return for each query

  11. feedbackDocCount: the number of documents to use for pseudo-feedback (0 means no feedback)
  12. feedbackTermCount: the number of terms to add to a query when doing feedback.
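
As an illustration of the "term" and "doc" options of sourceBackgroundModel and targetBackgroundModel, the following Python sketch computes both estimates on a toy collection. It is not taken from XlingRetEval.cpp: the function names and the toy documents are hypothetical, and |V| is assumed here to mean the total number of term occurrences in the collection, so that the term estimates sum to one.

```python
from collections import Counter

def term_background(docs):
    """'term' option: P(w) = tf(w) / |V|, with |V| assumed to be the
    total number of term occurrences in the collection."""
    tf = Counter(w for doc in docs for w in doc)
    total = sum(tf.values())
    return {w: c / total for w, c in tf.items()}

def doc_background(docs):
    """'doc' option: P(w) = df(w) / sum of df(w') over all words w'."""
    # set(doc) counts each word at most once per document.
    df = Counter(w for doc in docs for w in set(doc))
    total = sum(df.values())
    return {w: c / total for w, c in df.items()}

# Hypothetical toy collection: each document is a list of tokens.
docs = [["a", "a", "b"], ["a", "c"], ["b", "c", "c", "c"]]
term_model = term_background(docs)
doc_model = doc_background(docs)
```

In this toy collection "c" occurs four times out of nine tokens but appears in only two of the three documents, so the term option gives it more weight than the doc option does.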

Simple KL parameters:

  1. smoothSupportFile: The name of the smoothing support file
  2. smoothMethod: One of the four:
    • jelinikmercer or jm for Jelinek-Mercer
    • dirichletprior or dir for Dirichlet prior
    • absolutediscount or ad for Absolute discounting
    • twostage or 2s for two stage.
  3. smoothStrategy: Either interpolate for interpolation or backoff for back-off smoothing.

  4. adjustedScoreMethod: Which type of score to output, one of:
    • "querylikelihood" or "ql" for query likelihood.
    • "crossentropy" or "ce" for cross entropy.
    • "negativekld" or "-d" for negative KL divergence.
  5. JelinekMercerLambda: The collection model weight in the JM interpolation method. Default: 0.5

  6. DirichletPrior: The prior parameter in the Dirichlet prior smoothing method. Default: 1000

  7. discountDelta: The delta (discounting constant) in the absolute discounting method. Default: 0.7
  8. queryUpdateMethod: feedback method, one of:
    • relevancemodel1 or rm1 for relevance model 1.
    • relevancemodel2 or rm2 for relevance model 2.
    1. feedbackCoefficient: the coefficient of the feedback model for interpolation. The value is in [0,1], with 0 meaning using only the original model (thus no updating/feedback) and 1 meaning using only the feedback model (thus ignoring the original model).

    2. feedbackTermCount: Truncate the feedback model to no more than a given number of words/terms.

    3. feedbackProbThresh: Truncate the feedback model to include only words with a probability higher than this threshold. Default value: 0.001.

    4. feedbackProbSumThresh: Truncate the feedback model until the sum of the probability of the included words reaches this threshold. Default value: 1.
    Parameters feedbackTermCount, feedbackProbThresh, and feedbackProbSumThresh work conjunctively to control the truncation, i.e., the truncated model must satisfy all three constraints.
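
The interaction of feedbackCoefficient with the three truncation parameters can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the toolkit's implementation: the function names and toy models are hypothetical, words are assumed to be considered in order of decreasing probability, and the truncated model is assumed not to be renormalized before interpolation.

```python
def truncate_feedback_model(model, term_count, prob_thresh=0.001,
                            prob_sum_thresh=1.0):
    """Keep the highest-probability words while all three constraints hold."""
    kept = {}
    prob_sum = 0.0
    ranked = sorted(model.items(), key=lambda kv: kv[1], reverse=True)
    for word, prob in ranked:
        if len(kept) >= term_count:      # feedbackTermCount
            break
        if prob <= prob_thresh:          # feedbackProbThresh
            break
        if prob_sum >= prob_sum_thresh:  # feedbackProbSumThresh
            break
        kept[word] = prob
        prob_sum += prob
    return kept

def interpolate(original, feedback, coefficient):
    """Mix the two models: 0 keeps only the original model,
    1 keeps only the feedback model (feedbackCoefficient)."""
    words = set(original) | set(feedback)
    return {w: (1.0 - coefficient) * original.get(w, 0.0)
               + coefficient * feedback.get(w, 0.0)
            for w in words}

# Hypothetical toy models.
original = {"river": 0.5, "bank": 0.5}
feedback = {"bank": 0.4, "water": 0.3, "flood": 0.2,
            "boat": 0.0995, "the": 0.0005}
truncated = truncate_feedback_model(feedback, term_count=3,
                                    prob_sum_thresh=0.8)
updated = interpolate(original, truncated, coefficient=0.5)
```

Here the term-count limit is the binding constraint: the three most probable feedback words are kept, and the fourth is dropped even though it passes the other two thresholds.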

Generated on Wed Nov 3 13:00:03 2004 for Lemur Toolkit by doxygen 1.2.18