IR term project of Paul Kennedy - Revised

Basic Information

Contents


Abstract

This project will carry out the English-to-Spanish Translingual Information Retrieval (TLIR) project described on the IR11741 project web page. It includes experiments using the "DICT" (Collins) English-Spanish machine-readable dictionary for query translation, plus experiments with translingual PRF, GVSM, and LSI. These should duplicate the results reported by researchers at CMU [1]. In addition, experiments will combine DICT translation with pre-translation query expansion, adding synonyms extracted using WordNet.
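As a rough illustration of how pre-translation query expansion and dictionary translation fit together, here is a minimal sketch. The `SYNONYMS` and `EN_ES` tables are tiny hypothetical stand-ins for WordNet and the Collins dictionary, and the function names are assumptions made for this example, not part of the project's actual software.

```python
from typing import Dict, List

# Hypothetical toy data: stand-ins for WordNet synonym sets and the
# Collins English-Spanish dictionary, used only for illustration.
SYNONYMS: Dict[str, List[str]] = {
    "child": ["kid", "youngster"],
    "health": ["wellness"],
}
EN_ES: Dict[str, List[str]] = {
    "child": ["niño"],
    "kid": ["niño", "chico"],
    "youngster": ["joven"],
    "health": ["salud"],
    "wellness": ["bienestar"],
}


def expand_query(terms: List[str], synonyms: Dict[str, List[str]]) -> List[str]:
    """Add each term's synonyms, keeping first-seen order and dropping duplicates."""
    expanded: List[str] = []
    for term in terms:
        for candidate in [term] + synonyms.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def translate_query(terms: List[str], dictionary: Dict[str, List[str]]) -> List[str]:
    """Replace each term with all of its dictionary translations.

    Terms missing from the dictionary are passed through unchanged.
    """
    translated: List[str] = []
    for term in terms:
        for target in dictionary.get(term, [term]):
            if target not in translated:
                translated.append(target)
    return translated


query = ["child", "health"]
expanded = expand_query(query, SYNONYMS)      # expansion happens before translation
spanish = translate_query(expanded, EN_ES)    # expanded query mapped into Spanish
print(expanded)
print(spanish)
```

Expanding before translating means that a synonym with its own dictionary entry can contribute target-language terms that the original query word alone would not produce.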

If time permits, some experiments will be done in English-to-Serbo-Croatian TLIR. The UNICEF corpus will be machine-translated into Serbo-Croatian using DIPLOMAT, and experiments like those carried out at CMU will be run on the result. If comparable results are obtained, this will be taken as evidence that machine translation is a viable approach to obtaining a parallel training corpus.

References:

[1] Y. Yang, J.G. Carbonell, R.E. Frederking, and R. Brown, Translingual Information Retrieval: Learning from Bilingual Corpora. Artificial Intelligence Journal Special Issue: Best of IJCAI-97, 1998.

Proposal and Timelines

Proposal

Midterm Exam

Schedule (for more details see proposal):

Task | To be done by | Status
Implement program to drive WordNet, test-run all English-to-Spanish experiments except LSI, submit design of LSI experiment to Xin Liu. | 4/10/98 | active
Run experiments, machine-translate corpus to Serbo-Croatian if time allows. | 4/17/98 | open
Perform English-to-Serbo-Croatian experiments, if any. Prepare final report. | 4/23/98 | open

System Description

Experiments

1. English-to-Spanish TLIR on the UNICEF corpus using DICT.

2. English-to-Spanish TLIR on the UNICEF corpus using DICT with pre-translation query expansion using WordNet.

3. English-to-Spanish TLIR on the UNICEF corpus using TL-PRF.

4. English-to-Spanish TLIR on the UNICEF corpus using TL-GVSM.

5. English-to-Spanish TLIR on the UNICEF corpus using TL-LSI.

6. English-to-Serbo-Croatian TLIR experiments as time permits.
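
Experiment 3 relies on translingual pseudo-relevance feedback (TL-PRF). The sketch below shows the general idea under simplifying assumptions: the English query is matched against the English side of a parallel corpus, and the Spanish counterparts of the top-ranked documents supply the terms of the Spanish query. The three-document `PARALLEL` corpus, the raw term-overlap scoring, and the function name `tl_prf_query` are all hypothetical choices for this example. They are not the weighting scheme or code used in [1].

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical toy parallel corpus: (English document, Spanish counterpart).
PARALLEL: List[Tuple[str, str]] = [
    ("child health vaccine program", "programa de vacuna para salud infantil"),
    ("water supply in rural areas", "suministro de agua en zonas rurales"),
    ("child nutrition and health", "nutrición y salud infantil"),
]


def tl_prf_query(query_terms: List[str],
                 parallel: List[Tuple[str, str]],
                 k: int = 2,
                 n_terms: int = 2) -> List[str]:
    """Build a target-language query from the counterparts of the top-k source documents."""
    # Score each English document by how often the query terms occur in it
    # (a deliberately simple stand-in for a real retrieval score).
    scored = []
    for idx, (english, _spanish) in enumerate(parallel):
        counts = Counter(english.split())
        score = sum(counts[term] for term in query_terms)
        scored.append((score, idx))

    # Highest score first; ties are broken by corpus position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    top = [idx for score, idx in scored[:k] if score > 0]

    # Pool the Spanish sides of the top documents and keep the most frequent terms.
    target_counts: Counter = Counter()
    for idx in top:
        target_counts.update(parallel[idx][1].split())
    return [term for term, _count in target_counts.most_common(n_terms)]


spanish_query = tl_prf_query(["child", "health"], PARALLEL)
print(spanish_query)
```

No dictionary is involved here: the parallel corpus itself carries the query across languages, which is why this kind of approach needs bilingual training text such as the UNICEF corpus.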

Results

-

Demo

No plans at this time.


last update: April 2, 1998