Panlite can be configured at runtime to output the contents of the
chart instead of a finished translation. The idea for this project is
to use multiple alternatives from the chart to produce a
target-language version of the query, expanded beyond what Panlite
would have output as a finished translation. The percentage of the
chart contents to use becomes a tunable parameter. Each item in the
chart comes with a confidence score or a probability attached, so
chart contents can be prioritized for possible inclusion in the
target-language query.
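The tunable percentage parameter can be sketched as a simple top-p% cut over scored chart entries. This is an illustrative sketch only; the entry representation (score, translation) pairs is an assumption, not Panlite's actual data structure.

```python
def sparsify(entries, percent):
    """Keep the highest-scoring `percent`% of chart entries.

    `entries` is a list of (score, translation) pairs; the fraction
    of the chart to keep is the tunable parameter described above.
    """
    ranked = sorted(entries, key=lambda e: e[0], reverse=True)
    keep = max(1, round(len(ranked) * percent / 100))
    return ranked[:keep]

entries = [(0.9, "presidente"), (0.4, "jefe"),
           (0.7, "español"), (0.1, "hispano")]
print(sparsify(entries, 50))  # keeps the top half by score
```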
Using Panlite for query translation and expansion is in effect one
more example of exploiting parallel corpora in TLIR. This is an
active area of inquiry at a number of research centers. Panlite may
be unique, however, in its approach to aligning a parallel corpus. In
other reports surveyed by the author (VIII, IX, XI), the conceptual
point of departure was to compute the most-probable alignment(s) of
parallel documents using fully automated methods, based on a Bayesian
approach. The designers of Panlite, by contrast, strive for a 100%
accurate sentence-by-sentence alignment of the parallel corpus, using
manual methods where necessary. Probabilistic and heuristic methods
come into play at runtime (during actual production of a translation)
to produce alternative subsentential alignments from which chart
entries are produced (II, III, IV, V, VI, VII).
This paper reports an experiment employing the Panlite chart that was
run on the UNICEF test bed developed at CMU (I). The advantage of
using this test bed is that it employs a carefully aligned parallel
bilingual corpus with a set of well-crafted queries and a full set of
human relevance judgements. A disadvantage for this experiment is
that the queries consist of isolated words and two-word phrases. The
full power of EBMT in finding multiple translation alternatives for
phrases and sentences must go unexploited. Another disadvantage is
that query expansion techniques frequently give disappointing results
with this test bed, probably because the queries are on average
fairly long to begin with and already include sets of terms that
thoroughly cover the given topic. It was hoped that the test corpus
used for the CLIR track of the Text Retrieval Conference could be
obtained for this study. Unfortunately, the TREC data is available
only to CLIR-track participants because of copyright restrictions.
0 2 :EBMT 1 "Presidente español"
The numbers 0 and 2 at the start of the entry specify the beginning
and end words in the input sentence for which this entry gives a
proposed translation. Note that the words in the input sentence are
numbered 0 to N-1, where N is the number of words in the sentence.
":EBMT" indicates that the EBMT engine is proposing this translation,
i.e. example-based machine translation found a probable correspondence
in its parallel corpus between "the Spanish president" on the English
side and "Presidente español" on the Spanish side. The number 1 is
a confidence score computed by the EBMT engine for the proposed
translation. Confidence scores range from 0 to 1. (So the EBMT
engine is pretty certain it's found the correct translation of this
particular chunk.) "Presidente español" is of course the proposed
translation.
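A chart entry of this layout can be pulled apart with a short parser. This is a sketch based on the example line shown above; Panlite's actual output format may differ in details such as whitespace or quoting.

```python
import re

# Fields: start word, end word, engine tag, confidence, quoted translation.
CHART_LINE = re.compile(r'^(\d+)\s+(\d+)\s+(:\w+)\s+([\d.]+)\s+"(.*)"$')

def parse_chart_entry(line):
    """Parse one chart entry line into a (start, end, engine, score, text) tuple."""
    m = CHART_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized chart line: {line!r}")
    start, end, engine, score, text = m.groups()
    return int(start), int(end), engine, float(score), text

print(parse_chart_entry('0 2 :EBMT 1 "Presidente español"'))
# (0, 2, ':EBMT', 1.0, 'Presidente español')
```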
Possible entries for the translation engine field are as follows:
:EBMT Example-based machine translation
:DICT Dictionary engine using a mixture of machine-readable
handcrafted glossaries and statistical dictionaries
:GLOSS Engine using handcrafted glossaries
:ELM The target-language statistical modeller. This indicates the
probable best choice found among the proposals output by the
translation engines starting at a given word of the input sentence.
The last line of the chart output gives the one best translation
derived by the language modeller from the chart.
The example in appendix 1 conveys the spirit of Panlite in finding a
translation for a complete sentence. Appendix 2 gives the chart
output for one of the UNICEF test queries, which generally consist of
one- and two-word fragments.
To prioritize chart entries for inclusion in the Spanish query, the
author used a rough-and-ready sorting process. (This scheme was
recommended by Ralf Brown.) Chart entries produced by the EBMT engine
received highest priority, followed by GLOSS entries; DICT entries
received lowest priority. This of course is because EBMT,
human-compiled glossaries, and statistical dictionaries, in that
order, reflect decreasing reliability of translation on average.
Among chart entries from the same engine, longer entries (covering
more input words) got priority over shorter entries. Among entries of
the same length, those with higher confidence scores got higher
priority. Entries from the same translation engine with the same
length and confidence scores were arbitrarily prioritized by ASCII
collating sequence.
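The rough-and-ready sort described above can be expressed as a single composite sort key. The tuple layout for an entry is an assumption (matching the chart example earlier), but the ordering criteria follow the text: engine first, then covered length, then confidence, then ASCII order as a tie-break.

```python
# Engine ranks reflect decreasing average translation reliability.
ENGINE_RANK = {":EBMT": 0, ":GLOSS": 1, ":DICT": 2}

def priority_key(entry):
    """Sort key for a (start, end, engine, confidence, translation) entry."""
    start, end, engine, conf, text = entry
    length = end - start + 1          # number of input words covered
    return (
        ENGINE_RANK.get(engine, 3),   # EBMT before GLOSS before DICT
        -length,                      # longer entries first
        -conf,                        # higher confidence first
        text,                         # ASCII collating sequence tie-break
    )

entries = [
    (0, 0, ":DICT", 0.9, "jefe"),
    (0, 2, ":EBMT", 1.0, "Presidente español"),
    (0, 1, ":GLOSS", 0.8, "presidente"),
]
print(sorted(entries, key=priority_key)[0][4])  # "Presidente español"
```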
The Spanish corpus and all the Spanish queries were stopped and
stemmed. The Spanish stoplist developed at CMU was used (I). The
author employed his own crude stemming procedure, developed for
another project: a list of inflectional endings was culled from a
Spanish grammar textbook, and for any given word the longest sequence
of terminating letters matching an ending was truncated away.
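The crude longest-suffix stemmer can be sketched as below. The ending list here is a tiny illustrative sample, not the list actually culled from the grammar textbook.

```python
# A few illustrative Spanish inflectional endings (sample only).
ENDINGS = ["aciones", "mente", "iendo", "ando", "ados",
           "idas", "ar", "er", "ir", "os", "as", "o", "a", "e", "s"]

def stem(word, endings=ENDINGS):
    """Strip the longest matching inflectional ending from `word`."""
    # Try longer endings before shorter ones, per the procedure above.
    for suffix in sorted(endings, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word

print(stem("negociaciones"))  # "negoci"
print(stem("hablando"))       # "habl"
```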
In addition to the sparsification approach just described, a couple of
simple pseudo-relevance feedback experiments were done. These were
inspired by work in spoken-document retrieval presented by AT&T
researchers at the most recent Text Retrieval Conference (X). The
TREC SDR track involved retrieval done against transcripts of
broadcast news extracted by a speech recognizer. The AT&T
researchers, among other things, did document expansion by doing
retrieval from a corpus of text-based news using the spoken document
as the query. To be included in the expanded document, a new term
thus obtained also had to be found among the alternative hypotheses
produced by the speech recognizer when transcribing the document.
For the current study two PRF experiments were run similar to the AT&T
approach. A preliminary retrieval run used the standard Panlite
"one-best" translation of the queries. From the highest-ranked N
documents (5 in one case, 10 in the other), the non-stopped Spanish
terms that were also found in Panlite's chart for the given query were
included in the expanded query. (I am indebted to Alex Hauptmann for
pointing out the possibility of a PRF experiment analogous to the AT&T
work.)
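The PRF term-selection step just described amounts to intersecting feedback terms with the chart vocabulary. This is a minimal sketch: documents are plain term lists, and stopword handling is reduced to a set lookup.

```python
def expand_query(query_terms, top_docs, chart_terms, stopwords=frozenset()):
    """Expand the query with top-document terms confirmed by the chart."""
    candidates = set()
    for doc in top_docs:            # top-N documents from the
        candidates.update(doc)      # preliminary one-best retrieval run
    # Keep only non-stopped terms that also appear in the query's chart.
    confirmed = {t for t in candidates
                 if t in chart_terms and t not in stopwords}
    return sorted(set(query_terms) | confirmed)

docs = [["presidente", "gobierno", "madrid"],
        ["presidente", "elecciones"]]
chart = {"presidente", "español", "elecciones"}
print(expand_query(["presidente", "español"], docs, chart))
# ['elecciones', 'español', 'presidente']
```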
The PRF experiments gave somewhat better performance than the
sparsification approach, but neither approach was as good as the
baseline experiments.
Links to the Spanish document collection and query set (all stopped
and stemmed):
    Spanish documents
    Queries pruned during PRF or sparsified
    "One-best" translation of queries by Panlite
    Dictionary-based translation of queries
% of chart kept    average precision
 10                0.2010
 20                0.2300
 30                0.2518
 40                0.2773
 50                0.3192
 60                0.3097
 70                0.3081
 80                0.3248
 90                0.3403
100                0.3374

Panlite 1-best translation                       0.4109
Panlite dictionary-based translation             0.3967
PRF - top 5 documents intersected with chart     0.3512
PRF - top 10 documents intersected with chart    0.3571
For a truer test of the potential of this approach, the confidence
assigned by Panlite's language modeller to alternative hypotheses
ought to be used in sparsification, rather than the sort procedure
described above. This might require an enhancement to Panlite to
output alternative coverings of the chart ranked by corresponding
confidence score, or duplication of the language modeller's logic in
a separate program with this end in view. Also, as mentioned before,
a test bed more friendly to query expansion might be sought.
II. Ralf D. Brown, "Example-Based Machine Translation in the Pangloss
System". In Proceedings of the 16th International Conference on
Computational Linguistics (COLING-96), pp. 169-174. Copenhagen,
Denmark, August 5-9, 1996.
III. Ralf D. Brown, "Automated Dictionary Extraction for ``Knowledge-Free''
Example-Based Translation". In Proceedings of the Seventh
International Conference on Theoretical and Methodological Issues in
Machine Translation, pp. 111-118. Santa Fe, July 23-25, 1997.
IV. Robert Frederking and Ralf D. Brown, "The Pangloss-Lite Machine
Translation System". In Expanding MT Horizons: Proceedings of the
Second Conference of the Association for Machine Translation in the
Americas, pp. 268-272. Montreal, Canada.
V. Ralf Brown and Robert Frederking, "Applying Statistical English
Language Modelling to Symbolic Machine Translation". In Proceedings of
the Sixth International Conference on Theoretical and Methodological
Issues in Machine Translation (TMI'95), pp. 221-239. Leuven, Belgium,
July 5-7, 1995.
VI. Ralf D. Brown, "Automatically-Extracted Thesauri for Cross-Language
IR: When Better is Worse". In Proceedings of the First Workshop on
Computational Terminology (COMPUTERM'98), pp. 15-21. Montreal,
15 August 1998.
VII. R.D. Brown, "Corpus-Based Query Translation for Translingual
Information Retrieval". Position paper for SIGIR-97 workshop on
Cross-Lingual Information Retrieval (Philadelphia, 31 July 1997).
VIII. Mark W. Davis and William C. Ogden, "Free Resources and Advanced
Alignment for Cross-Language Text Retrieval". In The Sixth Text
REtrieval Conference (TREC-6), NIST Special Publication 500-240,
U.S. Govt. Printing Office, Washington, D.C., August 1998,
pp. 385-394.
IX. Mark Davis, Ted Dunning, and Bill Ogden, "Text Alignment in the
Real World: Improving Alignments of Noisy Translations Using Common
Lexical Features, String Matching Strategies and N-Gram Comparisons".
In Proceedings of the 7th Conference of the European Chapter of the
Association for Computational Linguistics, Dublin, Ireland, March 1995.
X. Amit Singhal, John Choi, Donald Hindle, David D. Lewis, and
Fernando Pereira, "AT&T at TREC-7". In The Seventh Text REtrieval
Conference (TREC-7), 1998.
XI. J. Scott McCarley, "Multilingual Information Retrieval at IBM".
In The Seventh Text REtrieval Conference (TREC-7), 1998.