IR Lab project of Paul Kennedy

Abstract

The Panlite EBMT system produces a chart of alternative translations
of chunks of the source-language input.  TLIR experiments were
conducted on the UNICEF test bed at CMU using the chart contents for
query expansion.  Baseline experiments were run for comparison,
employing Panlite's normal "one-best" translation and also
dictionary-based query translation.  The chart-based experiments
produced worse retrieval performance than either of these.  Thus the
experiments reported here must be considered unsuccessful.  Two
avenues might still be explored to establish the viability of this
approach.  Query expansion in general is known not to work well with
the UNICEF corpus, so a more suitable test bed might be sought.  Also,
ranking of the chart contents for sparsification ought to be done
according to the same algorithm used by Panlite to select its
"one-best" translation; a cruder method was used here.

Introduction

The most obvious and most widely used approach to translingual
information retrieval is machine translation of the query into the
language of the document collection to be searched.  The author has
recently used the example-based machine translation (EBMT) system
Panlite (II, IV, V) for this purpose in a few TLIR experiments.
Panlite can provide more information than just the finished
translation of the query, however.  Panlite scans its example base for
possible translations of substrings of its input.  It puts the results
into a chart, then uses a statistical model of the target language to
forge a finished translation from the most likely alternatives in the
chart.

Panlite can be configured at runtime to output the contents of the
chart instead of a finished translation.  The idea for this project is
to use multiple alternatives from the chart to produce a target-
language version of the query expanded beyond what Panlite
would have output as a finished translation.  The percentage of the
chart contents to use becomes a tunable parameter.  Each item in the
chart comes with a confidence score or a probability attached, so
chart contents can be prioritized for possible inclusion in the
target-language query.

Using Panlite for query translation and expansion is in effect one
more example of exploiting parallel corpora in TLIR.  This is an
active area of inquiry at a number of research centers.  Panlite may
be unique, however, in its approach to aligning a parallel corpus.  In
other reports surveyed by the author (VIII, IX, XI), the conceptual
point of departure was to compute the most-probable alignment(s) of
parallel documents using fully automated methods, based on a Bayesian
approach.  The designers of Panlite, by contrast, strive for a 100%
accurate sentence-by-sentence alignment of the parallel corpus, using
manual methods where necessary.  Probabilistic and heuristic methods
come into play at runtime (during actual production of a translation)
to produce alternative subsentential alignments from which chart
entries are produced (II, III, IV, V, VI, VII).

This paper reports an experiment employing the Panlite chart that was
run on the UNICEF test bed developed at CMU (I).  The advantage of
using this test bed is that it employs a carefully aligned parallel
bilingual corpus with a set of well-crafted queries and a full set of
human relevance judgements.  A disadvantage for this experiment is
that the queries consist of isolated words and two-word phrases.  The
full power of EBMT in finding multiple translation alternatives for
phrases and sentences must go unexploited.  Another disadvantage is
that query expansion techniques are known frequently to give
disappointing results with this test bed, probably because the queries
are on average fairly long to begin with, and already include sets of
terms that thoroughly cover the given topic.  It was hoped that the
test corpus used for the CLIR track of the Text Retrieval Conference
could be obtained for this study.  Unfortunately, the TREC data is
available only to CLIR-track participants because of copyright
restrictions.

Panlite Chart Output

Appendix 1 gives an example of an English sentence to be translated to
Spanish, and the corresponding chart output from Panlite.  The
sentence to be translated is "The Spanish president addressed the
General Assembly."  A typical chart entry is as follows:

0 2 :EBMT 1 "Presidente español"

The numbers 0 and 2 at the start of the entry specify the beginning
and end words in the input sentence for which this entry gives a
proposed translation.  Note that the words in the input sentence are
numbered 0 to N-1 where N is the number of words in the sentence.
":EBMT" indicates that the EBMT engine is proposing this translation,
i.e. example-based machine translation found a probable correspondence
in its parallel corpus between "the Spanish president" on the English
side and "Presidente español" on the Spanish side.  The number 1 is
a confidence score computed by the EBMT engine for the proposed
translation.  Confidence scores range from 0 to 1.  (So the EBMT
engine is pretty certain it's found the correct translation of this
particular chunk.)  "Presidente español" is of course the proposed
translation.

Possible values for the translation-engine field are as follows:

:EBMT Example-based machine translation

:DICT Dictionary engine using a mixture of machine-readable
handcrafted glossaries and statistical dictionaries.

:GLOSS  Engine using handcrafted glossaries

:ELM The target-language statistical modeller.  This indicates the
probable best choice found among the proposals output by the
translation engines starting at a given word of the input sentence.

The last line of the chart output gives the one best translation
derived by the language modeller from the chart.
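
Mechanically, a chart entry splits into five fields.  The following
Python sketch (a reconstruction from the format described above, not
Panlite code; the field names are the author's) parses one entry:

```python
import re

# One Panlite chart entry, e.g.:
#   0 2 :EBMT 1 "Presidente español"
# Fields: first word index, last word index, engine tag,
# confidence score, quoted translation.
ENTRY_RE = re.compile(r'^(\d+)\s+(\d+)\s+(:\w+)\s+([\d.]+)\s+"(.*)"$')

def parse_entry(line):
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    start, end, engine, score, text = m.groups()
    return {
        "start": int(start),    # index of first input word covered
        "end": int(end),        # index of last input word covered
        "engine": engine,       # :EBMT, :DICT, :GLOSS, or :ELM
        "score": float(score),  # confidence score in [0, 1]
        "text": text,           # proposed target-language translation
    }

entry = parse_entry('0 2 :EBMT 1 "Presidente español"')
```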

The example in appendix 1 conveys the spirit of Panlite in finding a
translation for a complete sentence.  Appendix 2 gives the chart
output for one of the UNICEF test queries, which generally consist of
one- and two-word fragments.

 Appendix 1
 Appendix 2
 

Experiments

The UNICEF test queries were run through Panlite.  A special version
of Panlite (courtesy of Ralf Brown) was used which excluded the UNICEF
test corpus from its example base.  (Otherwise the author would be
guilty of testing on training data.)  Each query was presented to
Panlite as a sentence.  The chart of hypotheses in Spanish was
obtained for each one.  New sets of queries were formed by extracting
the Spanish words from the charts.  A simple sparsification procedure
was used.  For each word of the English query, a certain percentage of
the Spanish chart entries beginning at that word was included in the
Spanish query.  Ten sets of Spanish queries were derived, keeping
respectively 10%, 20%, and so on up to 100% of the chart entries.
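
The sparsification procedure amounts to grouping chart entries by
their starting word and truncating each group.  The sketch below
assumes the entries arrive already sorted best-first by the priority
scheme described next; it is a reconstruction, not the original
script, and keeping at least one entry per position is an assumption
of this sketch.

```python
from collections import defaultdict

def sparsify(entries, percent):
    # Group chart entries by the input-word position at which they begin.
    by_start = defaultdict(list)
    for e in entries:
        by_start[e["start"]].append(e)
    kept = []
    for group in by_start.values():
        # Keep the top `percent` of the entries at this position.
        # Keeping at least one entry is this sketch's assumption,
        # not something stated in the report.
        n = max(1, round(len(group) * percent / 100))
        kept.extend(group[:n])
    return kept
```

Running this with percent = 10, 20, ..., 100 yields the ten derived
query sets.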

To prioritize chart entries for inclusion in the Spanish query, the
author used a rough-and-ready sorting process.  (This scheme was
recommended by Ralf Brown.)  Chart entries produced by the EBMT engine
received highest priority, followed by GLOSS entries.  DICT entries
received lowest priority.  This of course is because EBMT,
human-compiled glossaries, and statistical dictionaries, in that
order, reflect decreasing reliability of translation on average.
Among chart entries from the same engine, longer entries (covering
more input words) got priority over shorter entries.  Among entries of
the same length, those with higher confidence scores got higher
priority.  Entries from the same translation engine with the same
length and confidence scores were arbitrarily prioritized by ASCII
collating sequence.
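
The sort just described can be expressed as a single composite key.
The ranks and tie-breakers below follow the text; the code itself is a
sketch, not the original script:

```python
# Engine reliability: EBMT highest, then glossaries, then dictionaries.
ENGINE_RANK = {":EBMT": 0, ":GLOSS": 1, ":DICT": 2}

def priority_key(entry):
    # Longer spans and higher confidence scores sort earlier;
    # ASCII order of the translation text breaks remaining ties.
    length = entry["end"] - entry["start"] + 1
    return (ENGINE_RANK.get(entry["engine"], 3),
            -length,
            -entry["score"],
            entry["text"])

entries = [
    {"start": 0, "end": 0, "engine": ":DICT", "score": 1.0, "text": "el"},
    {"start": 0, "end": 2, "engine": ":EBMT", "score": 0.8,
     "text": "Presidente español"},
]
ranked = sorted(entries, key=priority_key)
```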

The Spanish corpus and all the Spanish queries were stopped and
stemmed.  The Spanish stoplist developed at CMU was used (I).
The author employed his own crude stemming procedure,
developed for another project.  A list of inflectional endings was
culled from a Spanish grammar textbook.  For any given word the
longest sequence of terminating letters matching an ending was
truncated away.
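
The stemmer thus amounts to longest-suffix truncation.  The sketch
below uses a few illustrative endings only; the actual list culled
from the grammar textbook is not reproduced in this report.

```python
# Illustrative Spanish inflectional endings, tried longest first;
# the real list was taken from a grammar textbook and was longer.
ENDINGS = sorted(["amos", "aron", "ando", "ado", "ada",
                  "os", "as", "es", "o", "a", "e", "s"],
                 key=len, reverse=True)

def stem(word):
    # Truncate the longest matching ending, leaving at least one letter.
    for ending in ENDINGS:
        if word.endswith(ending) and len(word) > len(ending):
            return word[: -len(ending)]
    return word
```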

In addition to the sparsification approach just described, a couple of
simple pseudo-relevance feedback experiments were done.  These were
inspired by work in spoken-document retrieval presented by AT&T
researchers at the most recent Text Retrieval Conference (X).  The
TREC SDR track involved retrieval done against transcripts of
broadcast news extracted by a speech recognizer.  The AT&T researchers
among other things did document expansion by doing retrieval from a
corpus of text-based news using the spoken document as the query.  To
be included in the expanded document, a new term thus obtained also
had to be found among the alternative hypotheses produced by the
speech recognizer when transcribing the document.

For the current study two PRF experiments were run similar to the AT&T
approach.  A preliminary retrieval run used the standard Panlite
"one-best" translation of the queries.  From the highest-ranked N
documents (5 in one case, 10 in the other), the non-stopped Spanish
terms that were also found in Panlite's chart for the given query were
included in the expanded query.  (I am indebted to Alex Hauptmann for
pointing out the possibility of a PRF experiment analogous to the AT&T
work.)
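
In outline, the expansion step looks like the following.  This is a
sketch under the assumption that queries and documents are already
available as stopped term lists; it is not the original script.

```python
def prf_expand(one_best_query, top_docs, chart_terms, n):
    # Add to the one-best translation any term from the top-n
    # retrieved documents that also appears somewhere in the
    # Panlite chart for this query.
    expanded = list(one_best_query)
    seen = set(one_best_query)
    for doc in top_docs[:n]:
        for term in doc:
            if term in chart_terms and term not in seen:
                expanded.append(term)
                seen.add(term)
    return expanded
```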

The PRF experiments gave somewhat better performance than the
sparsification approach, but neither approach was as good as the
baseline experiments.
 

System Description

The UNICEF testbed was used, as stated above.  This of course means
that the SMART retrieval engine was used.  ntc.ntc term weighting was
used (TF*IDF with cosine normalization), since it was found to be optimal
in the CMU research (I).  Stopping and stemming in Spanish were not
done by SMART.  The queries and Spanish document collection were
instead stopped and stemmed ahead of time.
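
For reference, ntc weighting multiplies raw term frequency by idf and
then length-normalizes the resulting vector.  A minimal sketch of the
scheme (not SMART itself):

```python
import math

def ntc_weights(tf, df, num_docs):
    # n: natural (raw) term frequency
    # t: idf = log(N / df)
    # c: cosine (L2) length normalization
    raw = {t: tf[t] * math.log(num_docs / df[t]) for t in tf}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {t: w / norm for t, w in raw.items()} if norm else raw
```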

Links to the Spanish document collection and query set (all stopped and
stemmed):

 Spanish documents
 Queries pruned during PRF or sparsified
 "One-best" translation of queries by Panlite
 Dictionary-based translation of queries

 Scripts
 

Results

11-point interpolated average precision by experiment

% of chart kept      average precision

10                   0.2010
20                   0.2300
30                   0.2518
40                   0.2773
50                   0.3192
60                   0.3097
70                   0.3081
80                   0.3248
90                   0.3403
100                  0.3374
 

Panlite 1-best translation    0.4109
Panlite dictionary-based translation    0.3967
prf - top 5 documents intersected with chart   0.3512
prf - top 10 documents intersected with chart   0.3571
 

For a truer test of the potential of this approach, the confidence
assigned by Panlite's language modeller to alternative hypotheses
ought to be used in sparsification, rather than the sort procedure
described above.  This might require an enhancement to Panlite to
output alternative coverings of the chart ranked by corresponding
confidence score, or duplication of the language modeller's logic in a
separate program with this end in view.  Also, as mentioned before, a
test bed more friendly to query expansion might be sought.
 

References

I.  Y. Yang, J.G. Carbonell, R.E. Frederking, and R. Brown, Translingual
Information Retrieval: Learning from Bilingual Corpora.  Artificial
Intelligence Journal Special Issue: Best of IJCAI-97, 1998

II. Ralf D. Brown, "Example-Based Machine Translation in the Pangloss
System". In Proceedings of the 16th International Conference on
Computational Linguistics (COLING-96), p. 169-174. Copenhagen,
Denmark, August 5-9, 1996.

III. Ralf D. Brown, "Automated Dictionary Extraction for ``Knowledge-Free''
Example-Based Translation". In Proceedings of the Seventh
International Conference on Theoretical and Methodological Issues in
Machine Translation, p. 111-118.  Santa Fe, July 23-25, 1997.

IV. Robert Frederking and Ralf D. Brown, "The Pangloss-Lite Machine
Translation System". In Expanding MT Horizons: Proceedings of the
Second Conference of the Association for Machine Translation in the
Americas, Montreal, Canada. pp 268-272.

V.  Ralf Brown and Robert Frederking, "Applying Statistical English
Language Modelling to Symbolic Machine Translation". In Proceedings of
the Sixth International Conference on Theoretical and Methodological
Issues in Machine Translation (TMI'95), p. 221-239. Leuven, Belgium,
July 5-7, 1995.

VI.  Ralf D. Brown. "Automatically-Extracted Thesauri for Cross-Language
IR: When Better is Worse", In Proceedings of the First Workshop on
Computational Terminology (COMPUTERM'98), Montreal, 15 August 1998,
pp. 15-21.

VII.  R.D. Brown, "Corpus-Based Query Translation for Translingual
Information Retrieval". Position paper for SIGIR-97 workshop on
Cross-Lingual Information Retrieval (Philadelphia, 31 July 1997).

VIII.  Mark W. Davis and William C. Ogden, "Free Resources and Advanced
Alignment for Cross-Language Text Retrieval". The Sixth Text REtrieval
Conference (TREC-6), NIST Special Publication 500-240,
U.S. Govt. Printing Office, Washington, D.C., August 1998,
pp. 385-394.

IX.  Mark Davis, Ted Dunning, and Bill Ogden, "Text Alignment in the
Real World: Improving Alignments of Noisy Translations Using Common
Lexical Features, String Matching Strategies and N-Gram Comparisons".
Proceedings of the 7th Conference of the European Chapter of the
Association for Computational Linguistics, Dublin, Ireland, March 1995.

X.  Amit Singhal, John Choi, Donald Hindle, David D. Lewis, Fernando
Pereira.  AT&T at TREC-7.  The Seventh Text Retrieval Conference, 1998

XI. J. Scott McCarley.  Multilingual Information Retrieval at IBM.  The
Seventh Text Retrieval Conference, 1998