The English-to-Spanish Translingual Information Retrieval (TLIR) project described on the IR11741 project web page will be carried out. Query-translation experiments will use the "DICT" (Collins) English-Spanish machine-readable dictionary. Further experiments will use translingual pseudo-relevance feedback (PRF), the generalized vector space model (GVSM), and latent semantic indexing (LSI). These experiments are intended to reproduce the results reported by researchers at CMU [1]. In addition, experiments will combine translation using DICT with pre-translation query expansion, in which synonyms extracted from WordNet are added to the English query before it is translated.
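As a rough illustration of the DICT-based runs, the sketch below expands an English query with synonyms and then replaces every term with all of its dictionary translations. It is only a sketch under stated assumptions. `TOY_DICT` and `TOY_SYNONYMS` are hypothetical stand-ins for the Collins dictionary and WordNet, and the helper names are invented for illustration. Untranslatable terms are simply dropped, which is one policy among several. With NLTK installed, the synonym lists could instead be built from `wordnet.synsets(term)` and each synset's `lemma_names()`.

```python
# Hypothetical toy resources: TOY_DICT stands in for the Collins DICT lexicon,
# TOY_SYNONYMS for synonym lists extracted from WordNet.
TOY_DICT = {
    "child": ["niño", "niña"],
    "kid": ["chico"],
    "health": ["salud"],
    "care": ["cuidado", "atención"],
}
TOY_SYNONYMS = {
    "child": ["kid"],
    "care": ["attention"],
}


def expand_query(terms, synonyms):
    """Pre-translation expansion: append each term's synonyms, keeping order, no duplicates."""
    expanded = []
    for term in terms:
        for t in [term] + synonyms.get(term, []):
            if t not in expanded:
                expanded.append(t)
    return expanded


def translate_query(terms, bilingual_dict):
    """Replace each English term by all of its dictionary translations.
    Terms with no dictionary entry are dropped (an assumed policy)."""
    spanish = []
    for term in terms:
        spanish.extend(bilingual_dict.get(term, []))
    return spanish


query = ["child", "health", "care"]
plain = translate_query(query, TOY_DICT)
# ['niño', 'niña', 'salud', 'cuidado', 'atención']
expanded = translate_query(expand_query(query, TOY_SYNONYMS), TOY_DICT)
# ['niño', 'niña', 'chico', 'salud', 'cuidado', 'atención']  ("attention" has no entry)
```

Because every translation is kept, expansion before translation can add useful variants (`chico` via `kid`) as well as noise. The DICT versus DICT-plus-WordNet comparison in the experiment list is where that trade-off would show up.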
If time permits, some English-to-Serbo-Croatian TLIR experiments will also be done. The UNICEF corpus will be machine-translated into Serbo-Croatian using DIPLOMAT, and experiments like those carried out at CMU will be run on the result. If comparable results are obtained, this will be taken as evidence that machine translation is a viable way to obtain a parallel training corpus.
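The proposal does not say how "comparable" will be judged. Assume that effectiveness is summarized by non-interpolated average precision, averaged over queries. This is a common IR measure, but it is an assumption here. The comparison could then be computed roughly as follows. The function names and the run/judgment layout are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Non-interpolated average precision for one query's ranked list."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank  # precision at the rank of this relevant document
    # Relevant documents never retrieved contribute 0 to the sum.
    return total / len(relevant)


def mean_average_precision(runs, judgments):
    """runs: query id -> ranked doc ids; judgments: query id -> relevant doc ids."""
    scores = [average_precision(runs[qid], judgments.get(qid, [])) for qid in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical runs and relevance judgments for two queries.
runs = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d2", "d1"]}
judgments = {"q1": ["d1", "d3"], "q2": ["d1", "d5"]}
print(mean_average_precision(runs, judgments))  # 13/24, roughly 0.542
```

The same function would be applied to the English-to-Spanish and English-to-Serbo-Croatian runs. The proposal leaves open what difference still counts as comparable.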
References:
[1] Y. Yang, J. G. Carbonell, R. E. Frederking, and R. Brown. Translingual Information Retrieval: Learning from Bilingual Corpora. Artificial Intelligence Journal, Special Issue: Best of IJCAI-97, 1998.
Schedule (for more details see proposal):
Task | To be done by | Status |
Implement program to drive WordNet, test-run all English-to-Spanish experiments except LSI, submit design of LSI experiment to Xin Liu. | 4/10/98 | Active |
Run experiments, machine-translate corpus to Serbo-Croatian if time allows. | 4/17/98 | Open |
Perform English-to-Serbo-Croatian experiments, if any. Prepare final report. | 4/23/98 | Open |
1. English-to-Spanish TLIR on the UNICEF corpus using DICT.
2. English-to-Spanish TLIR on the UNICEF corpus using DICT, with pre-translation query expansion via WordNet.
3. English-to-Spanish TLIR on the UNICEF corpus using TL-PRF.
4. English-to-Spanish TLIR on the UNICEF corpus using TL-GVSM.
5. English-to-Spanish TLIR on the UNICEF corpus using TL-LSI.
6. English-to-Serbo-Croatian TLIR experiments as time permits.
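The three corpus-based methods in the list (experiments 3 to 5) can be sketched in a few lines of NumPy. The sketch makes several assumptions: the names, the raw-count term weighting, and the toy data are all illustrative.

- The parallel training corpus is given as two term-by-document matrices: `E` for English terms and `S` for Spanish terms. Column j of each holds the two versions of the same document.
- `D` is a Spanish test collection and `q` is an English query vector.

The methods are implemented as follows:

- TL-PRF uses the centroid of the Spanish halves of the top-ranked training pairs as the Spanish query.
- TL-GVSM compares `E.T @ q` with `S.T @ d` in the space of training pairs.
- TL-LSI takes an SVD of the stacked matrix `[E; S]` and folds queries and documents in with the usual q^T U_k Σ_k^{-1} projection.

The CMU implementations [1] may differ in weighting and parameter choices.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return 0.0 if na == 0 or nb == 0 else float(a @ b / (na * nb))


def score_columns(query, docs):
    """Cosine of `query` against every column of the matrix `docs`."""
    return np.array([cosine(query, docs[:, j]) for j in range(docs.shape[1])])


def tl_prf(q, E, S, D, top_p=2):
    # Rank training pairs by their English halves, then use the centroid
    # of the Spanish halves of the top_p pairs as a Spanish query.
    train_scores = score_columns(q, E)
    top = np.argsort(-train_scores, kind="stable")[:top_p]
    q_es = S[:, top].mean(axis=1)
    return score_columns(q_es, D)


def tl_gvsm(q, E, S, D):
    # Represent the query and each document by their dot products with
    # the training pairs (English side for q, Spanish side for D).
    return score_columns(E.T @ q, S.T @ D)


def tl_lsi(q, E, S, D, k=2):
    # SVD of the stacked bilingual matrix [E; S] (terms x training pairs).
    U, sigma, _ = np.linalg.svd(np.vstack([E, S]), full_matrices=False)
    Uk, sk = U[:, :k], sigma[:k]
    # Pad with zeros for the other language's terms, then fold in:
    # x -> U_k^T x / sigma_k (one value per latent dimension).
    q_full = np.concatenate([q, np.zeros(S.shape[0])])
    D_full = np.vstack([np.zeros((E.shape[0], D.shape[1])), D])
    q_hat = (Uk.T @ q_full) / sk
    D_hat = (Uk.T @ D_full) / sk[:, None]
    return score_columns(q_hat, D_hat)


# Toy data (hypothetical). English term rows: child, health, water.
# Spanish term rows: niño, salud, agua. Columns are 3 training document pairs.
E = np.array([[1, 0, 0],
              [1, 0, 1],
              [0, 1, 1]], dtype=float)
S = E.copy()  # in this toy, each Spanish half mirrors its English half
# Spanish test documents (columns): doc 0 = {niño, salud}, doc 1 = {agua}.
D = np.array([[1, 0],
              [1, 0],
              [0, 1]], dtype=float)
q = np.array([1, 0, 0], dtype=float)  # English query: "child"

for name, method in [("TL-PRF", tl_prf), ("TL-GVSM", tl_gvsm), ("TL-LSI", tl_lsi)]:
    print(name, np.round(method(q, E, S, D), 3))
```

On this toy data all three methods rank test document 0 above document 1 for the query "child". The scores say nothing about how the UNICEF runs will behave. They only show that each method bridges the two vocabularies through the training pairs rather than through a dictionary.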
No plans at this time.