This project extends my 11-741 project. The topic is translingual information retrieval using different retrieval methods and models. In the first part of the project, the following methods will be applied to the English-Spanish UNICEF collection: DICT, DICT+WordNet, GVSM, PRF, and LSI. In the second part, an automatically generated corpus will be used for the methods that require training on bilingual corpora. The corpus was obtained from the translation service that AltaVista provides on the Web, which is based on the SYSTRAN MT system.
The original project will be extended with an experiment in which the DICT method is run using an automatically translated dictionary. The dictionary will be obtained by extracting all terms appearing in the UNICEF collection and translating them using SYSTRAN. This experiment will provide a baseline for comparing the results of the runs in which the whole corpus was automatically translated.
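The vocabulary extraction and dictionary-based query translation described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the tokenizer, the function names, and the toy `toy_mt` lookup (which merely stands in for the MT system) are all assumptions, and unknown query terms are simply dropped.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of Unicode letters (no digits, no underscores).
TOKEN_RE = re.compile(r"[^\W\d_]+", re.UNICODE)


def extract_vocabulary(documents):
    """Count every lowercased word type that appears in the collection."""
    vocab = Counter()
    for text in documents:
        vocab.update(tok.lower() for tok in TOKEN_RE.findall(text))
    return vocab


def build_dictionary(vocab, translate_term):
    """Translate each term once; translate_term is a hypothetical stand-in
    for whatever MT system produces the translations."""
    return {term: translate_term(term) for term in sorted(vocab)}


def translate_query(query, dictionary):
    """Term-by-term translation: replace each query term with its dictionary
    entries and skip terms that have no translation."""
    translated = []
    for tok in TOKEN_RE.findall(query):
        entries = dictionary.get(tok.lower())
        if entries:
            translated.extend(entries)
    return translated


# Toy data, invented for illustration only.
toy_mt = {"children": ["niños"], "health": ["salud"], "water": ["agua"]}
docs = ["Children need clean water.", "Health of children"]

vocab = extract_vocabulary(docs)
dictionary = build_dictionary(vocab, lambda term: toy_mt.get(term, []))
translated = translate_query("water and health for children", dictionary)
print(translated)
```

Translating the vocabulary once and caching it in a dictionary keeps the number of MT calls proportional to the number of word types rather than to the size of the corpus.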
Also, a more sophisticated evaluation method will be used to assess how well all the methods perform. In particular, many runs will be done on different partitions of the corpus into training, fine-tuning, and test sets. The addition of the fine-tuning set is especially interesting because in the previous results the parameters were tuned on the training set.
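The evaluation protocol above can be sketched roughly as below. The split proportions (60/20/20), the number of runs, and the `score` callback are assumptions made for illustration; `score(train, held_out, param)` is a hypothetical function that trains a method on `train` with parameter `param` and returns its effectiveness on `held_out`.

```python
import random


def partition(doc_ids, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle the ids and split them into training, fine-tuning and test sets."""
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * fractions[0])
    n_tune = int(len(ids) * fractions[1])
    train = ids[:n_train]
    tune = ids[n_train:n_train + n_tune]
    test = ids[n_train + n_tune:]
    return train, tune, test


def run_experiment(doc_ids, candidates, score, n_runs=5):
    """Average test-set score over several random partitions.

    The parameter is chosen on the fine-tuning set only, so the test set
    never influences parameter selection.
    """
    results = []
    for run in range(n_runs):
        train, tune, test = partition(doc_ids, seed=run)
        best = max(candidates, key=lambda p: score(train, tune, p))
        results.append(score(train, test, best))
    return sum(results) / len(results)


# Toy scoring function, invented for illustration: it prefers param == 3.
def toy_score(train, held_out, param):
    return -abs(param - 3)


train, tune, test = partition(range(10))
mean_score = run_experiment(range(10), [1, 2, 3, 4], toy_score)
```

Using a different seed per run gives a different partition each time, so the averaged score is less sensitive to any single split.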
Task | to be done by | status |
Set up the web page | October 20th | done |
Extract and translate the vocabulary | November 1st | done |
Evaluate DICT using the automatically translated dictionary | November 5th | done |
Code for automatic evaluation | | done |
Evaluation of PRF | November 1st | pending |
Evaluation of GVSM and LSI on UNICEF | November 15th | pending |
Write-up | December 10th | pending |
No WordNet expansion | | | | | | | | | |
Single WordNet expansion | | | | | | | | | |
Full WordNet expansion | | | | | | | | | |
No WordNet expansion | | | | | | | | | |
Single WordNet expansion | | | | | | | | | |
Full WordNet expansion | | | | | | | | | |
Method | no WordNet expansion | single WordNet expansion | full WordNet expansion |
GVSM | | | |
LSI | | | |