Interactive and Automatic Refinement of Translation Rules for Transfer-based MT Systems OR Can the Internet help improve MT?
Achieving high translation quality remains the biggest challenge for Machine Translation (MT) systems. To address this challenge, researchers have explored a variety of methods for including user feedback in the MT loop. However, most MT systems have failed to incorporate post-editing efforts beyond adding corrected translations to the parallel training data of Statistical and Example-Based systems or to a translation memory database. My research centers on developing a largely automated approach that uses online post-editing feedback from non-experts to refine translation rules. Precise error correction information that is relevant to the system allows the Automatic Rule Refiner to trace errors back to the incorrect lexical and grammar rules responsible for them and to propose concrete fixes to those rules. Since this approach attacks the problem at its core, it generalizes beyond the input sentences corrected by bilingual speakers and allows for correct translation of unseen data. The reach of the Internet further enhances the relevance of this work. We envision turning the product of my research into an online game with a purpose. This game will allow bilingual speakers to correct MT output, get rewards for making good corrections, and compare their scores and speed to those of other players. For the MT community, this game will provide a free and easy way to get feedback for MT system improvement.
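The idea of tracing a user correction back to the rule that caused the error can be illustrated with a minimal sketch. This is not the Automatic Rule Refiner itself: it assumes a toy word-by-word lexicon, and the names `LexicalRule`, `translate`, and `propose_fix`, as well as the placeholder tokens, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalRule:
    """Hypothetical lexical transfer rule: one source word, one target word."""
    rule_id: str
    source: str
    target: str


def translate(words, lexicon):
    """Translate word by word, recording which rule produced each target word."""
    return [(lexicon[w].target, lexicon[w]) for w in words]


def propose_fix(trace, position, corrected_word):
    """Blame the rule behind the corrected word and propose a refined rule."""
    _, blamed = trace[position]
    refined = LexicalRule(blamed.rule_id + "'", blamed.source, corrected_word)
    return blamed, refined


# Toy lexicon with placeholder tokens (assumption: no real language data).
lexicon = {
    "src_a": LexicalRule("L1", "src_a", "tgt_a"),
    "src_b": LexicalRule("L2", "src_b", "tgt_wrong"),
}

trace = translate(["src_a", "src_b"], lexicon)
# A bilingual user replaces the second output word with "tgt_b".
blamed, fix = propose_fix(trace, 1, "tgt_b")
```

Because the proposed fix is attached to the rule rather than to the corrected sentence, any later sentence that uses the same rule would benefit from it, which is the sense in which such an approach can generalize to unseen data.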
I am currently in charge of developing the resources for Quechua. During the summer of 2005, I spent three months in Cusco building the resources and infrastructure to implement a Quechua-Spanish MT prototype system, as part of the V-Unit (Vision Unit). The V-Unit is part of the TechBridgeWorld Initiative at CMU. Currently the Quechua-Spanish MT prototype system has 25 translation rules and 683 lexical entries (40 manually and 643 semi-automatically created).
From May 2002 until January 2005, I was in charge of Mapudungun. In April and November of 2002, I travelled to Temuco, Chile, and worked with the local team at the Instituto de Estudios Indígenas (Universidad de la Frontera) to develop resources and NLP tools for Mapudungun-Spanish (transcribed spoken corpus, dictionary, morphological analyzer). I also provided technical assistance in setting up and using software developed at CMU.
For my thesis, I am working on automatically refining translation rules by using minimal corrections from non-expert bilingual users.
Quechua
Quechua or runasimi, which means language of the people, is the indigenous language of a large portion of the South American highlands, and there are about 10 million speakers today. However, we know of no electronic resources in Quechua, let alone any information and communication technologies in Quechua.
The term Quechua covers a variety of distinct languages and dialects. The Ethnologue database shows 46 dialects of Quechuan, 32 of them spoken in Peru. Quechua is also spoken in Bolivia, Ecuador, southern Colombia and northern Argentina. The most important dialect is that spoken in Cuzco, the seat of the former Inca Empire. Quechua spread by means of conquests carried out before and during that empire. It displaced several earlier languages, only to find itself increasingly displaced today by Spanish. In spite of this intense competition, Quechua in its various forms remains a vital language in Peru and elsewhere.
A piece of good news for us computational linguists is that the endless battle over which of the two competing orthographies, the pentavocalic and the trivocalic, should be the official one has finally ended in favor of the pentavocalic orthographic system, which has the closest correspondence with the Quechuan letter-to-sound rules.
In the Spring 2005 semester, I audited Quechua II at the University of Pittsburgh, taught by Salome Gutierrez. During my time in Cusco (June-August 2005), I studied both the Quechua language and culture at Centro Bartolome de las Casas, where I enjoyed daily classes taught by native speaker and educator Gina Maldonado.
Mapudungun
Mapudungun is an indigenous American language spoken in Chile and Argentina by about half a million Mapuche people.