Alon's Home Page
I am currently a Consulting (adjunct) Professor at the Language Technologies
Institute (LTI) at Carnegie Mellon University (CMU), where I have been a
member of the faculty since 1996. For almost 20 years (1996-2015) I was a
Research Professor at the LTI.
Concurrently, I am the VP of AI Research at
Phrase, a leading enterprise translation
automation technology platform, where I lead and manage our AI research team
in Pittsburgh, Prague and Edinburgh, and provide strategic leadership on our
AI R&D and product development company-wide. Prior to joining Phrase in August
2023, I was the VP of Language Technologies at
Unbabel, with leadership
responsibilities for AI R&D company-wide, and with a focus on the development
of Translation Quality Technologies.
My primary research interests and activities focus on Machine Translation
(MT) and on MT Evaluation. I directed and led the ten-year development
(2004-2014) of the
METEOR automated MT
evaluation metric. More recently, while at Unbabel, I directed the
development of a new neural MT evaluation metric named
COMET, and a
complementary tool for MT quality analysis named
MT-Telescope. My
other main research interests focus on MT adaptation approaches with and
without human feedback, applied to both high-resource language pairs and
low-resource and minority languages. Additional interests include
translation Quality Estimation and methods for multi-engine MT system
combination.
In 2009, I co-founded a technology start-up company by the name of
Safaba Translation Solutions, and I
served the company as Chairman of the Board, President and CTO. Safaba
developed automated translation solutions for large global enterprises that
allowed them to translate large volumes of content in all the languages of
their markets. Safaba's approach focused on generating client-adapted
high-quality translations using machine-learning-based technology. In June
2015, Safaba was acquired by Amazon.
From June 2015 to March 2019, I was a senior manager at Amazon, where I led
and managed the Amazon Machine Translation R&D group in Pittsburgh.
I served as President of the International Association for Machine
Translation (IAMT) (2013-2015). I previously served two terms as president
of the Association for Machine Translation in the Americas (AMTA)
(2008-2012), and was General Chair of the AMTA 2010 and 2012 conferences,
and of the 2015 MT Summit conference. I am also a member of the Association
for Computational Linguistics (ACL), where I was president of SIGParse,
ACL's special interest group on parsing (2008-2013).
In August 2021, at the 18th biennial Machine Translation Summit conference,
I was honored to receive the 2021 Makoto Nagao IAMT Award of Honour
for my contributions to the field of Machine Translation.
Research
My main areas of research are Machine Translation (MT) and Natural Language
Processing (NLP), and in particular, NLP technologies applied to language
translation and multi-lingual processing problems. My most active current
areas of research are Machine Translation adaptation approaches with human
feedback and syntax-driven statistical and hybrid approaches to Machine
Translation, applied to both high-resource language pairs and low-resource
and minority languages. One main focus of work has been the
development of novel syntax-based methods for acquisition of the resources
that are necessary for MT. I have also actively worked on frameworks for
Multi-Engine Machine Translation (MEMT) and on developing automatic metrics
for MT evaluation (particularly, METEOR).
I have also worked extensively in the past on developing parsing approaches
for accurate annotation of Grammatical Relations (GRs) in spoken language data,
on robust parsing algorithms for analysis of spoken language, and on the
design and development of Speech-to-Speech Machine Translation systems.
Select Research Projects:
The AVENUE and LETRAS Projects:
I was co-PI of the AVENUE and LETRAS projects (funded by NSF). AVENUE is
concerned with the design and rapid development of new Machine Translation
methods for languages for which only scarce resources are available. Our goal
in AVENUE is to apply these new MT methods to minority languages, with a
specific focus on native languages of North and Latin America. We worked on
developing MT systems between Spanish and Mapudungun, a native language spoken
in southern Chile, and have started working on Quechua, a native language
spoken mainly in Peru, Ecuador and Bolivia. The LETRAS project is a follow-on
project to AVENUE, where we are focusing on further development of the
underlying general MT framework and expanding its application to new
languages, including Inupiaq (a native Alaskan language), and native languages
in Bolivia and Brazil. Together with
Jaime Carbonell,
Lori Levin, and a team of several
graduate students, the primary research topics I am working on include: the
design and implementation of a transfer-based MT framework specifically
suitable for learning from data and for rapid prototyping of MT systems (work
with Erik Peterson); automatic
learning of MT transfer-rules for languages with limited amounts of data
resources (work with Kathrin
Probst); automatic rule refinement based on feedback from users (work with
Ariadna Font-Llitjos); and
unsupervised learning of morphological inflection classes from monolingual
data (work with Christian Monson).
The Hebrew-English MT Project:
As a direct follow-up to our AVENUE project work and in collaboration with
Shuly Wintner and his
Computational Linguistics Group
at the University of Haifa
in Israel, we are developing a prototype Hebrew-to-English Machine Translation
system that is based on the framework developed under AVENUE. This work is
being supported by a small grant from the
Caesaria Rothschild Institute at
the University of Haifa.
The MEMT Project:
I was the lead PI of a project on a new approach to Multi-Engine Machine
Translation (MEMT). The goal of MEMT is to synthesize the output of multiple
MT systems into a new output that is of higher accuracy than all of the
contributing systems. The new approach involves two main stages. An explicit
word matcher is first used in order to identify the words that are common
between the MT engine outputs. A decoding algorithm then uses this
information, in conjunction with confidence estimates for the various engines
and several statistical language model features, in order to score and rank a
collection of sentence hypotheses that are synthetic combinations of words
from the various original engines. The highest scoring sentence hypothesis is
selected as the final output of our system. The project was funded by the
DARPA GALE program, where our MEMT system served as an essential component for
combining the output from multiple MT engines within the Interoperability
Demonstration system (IOD). The MEMT system has been made available for
experimentation to other research groups. Contact me by email to obtain a
copy.
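To make the scoring-and-ranking step more concrete, below is a rough Python
sketch of how a synthetic sentence hypothesis could be scored by combining
engine confidence estimates with a language model feature. This is not the
actual MEMT decoder: the hypotheses, confidence values, feature weight, and
the toy unigram language model are all hypothetical stand-ins, intended only
to illustrate the idea described above.

import math
from collections import Counter

def lm_logprob(sentence, unigram_counts, total):
    # Unigram log-probability with add-one smoothing: a toy stand-in for the
    # statistical language model features used in the real system.
    vocab = len(unigram_counts) + 1
    return sum(
        math.log((unigram_counts.get(tok, 0) + 1) / (total + vocab))
        for tok in sentence.split()
    )

def select_hypothesis(hypotheses, engine_confidence, unigram_counts, total, lm_weight=1.0):
    # Rank synthetic hypotheses by average engine confidence plus a weighted
    # per-word language model score, and return the highest-scoring one.
    def score(hyp):
        text, engines = hyp  # engines = which MT engines contributed words
        tokens = text.split()
        conf = sum(engine_confidence[e] for e in engines) / len(engines)
        avg_lm = lm_logprob(text, unigram_counts, total) / len(tokens)
        return conf + lm_weight * avg_lm
    return max(hypotheses, key=score)

# Toy usage: three synthetic hypotheses assembled from two engines, A and B.
counts = Counter("the committee approved the new budget yesterday".split())
total = sum(counts.values())
hypotheses = [
    ("the committee approved the budget", ["A"]),
    ("committee has approved budget the", ["B"]),
    ("the committee approved the budget yesterday", ["A", "B"]),
]
confidence = {"A": 0.7, "B": 0.4}
print(select_hypothesis(hypotheses, confidence, counts, total))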
The METEOR Project:
METEOR is an automatic metric for MT evaluation that we have been
developing at CMU for the past couple of years. METEOR is designed to
address a number of weaknesses in the currently commonly used BLEU and NIST
metrics. The metric heavily relies on an algorithm for finding an optimal
word-to-word matching between a candidate MT translation and a human-produced
reference translation for the same input sentence. METEOR produces normalized
scores (in the range of [0,1]), and has been demonstrated to have
significantly higher levels of correlation with human judgments of MT quality,
as compared with the more commonly used BLEU and NIST metrics. METEOR is
freely available, and can be downloaded from here.
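As a rough illustration of the core idea, the Python sketch below computes a
simplified METEOR-style score from exact unigram matches only. It omits the
metric's stemming and synonym modules and uses a simple greedy alignment
rather than METEOR's optimal word matching; the recall-weighted harmonic mean
and the fragmentation penalty roughly follow the original formulation, but the
function name and example sentences are purely illustrative.

def meteor_sketch(candidate, reference):
    # Simplified, exact-match-only METEOR-style score; illustrative only,
    # not the released metric.
    cand = candidate.lower().split()
    ref = reference.lower().split()

    # Greedy word-to-word alignment: map each candidate token to the first
    # unused reference token with the same surface form.
    used = set()
    alignment = []  # (candidate position, reference position) pairs
    for i, tok in enumerate(cand):
        for j, rtok in enumerate(ref):
            if j not in used and tok == rtok:
                alignment.append((i, j))
                used.add(j)
                break

    m = len(alignment)
    if m == 0:
        return 0.0

    precision = m / len(cand)
    recall = m / len(ref)
    # Harmonic mean with recall weighted 9 times more than precision.
    fmean = 10 * precision * recall / (recall + 9 * precision)

    # Fragmentation penalty: count maximal runs of matches that are contiguous
    # in both candidate and reference; fewer chunks means better word order.
    chunks = 1
    for (i1, j1), (i2, j2) in zip(alignment, alignment[1:]):
        if i2 != i1 + 1 or j2 != j1 + 1:
            chunks += 1
    penalty = 0.5 * (chunks / m) ** 3

    return fmean * (1 - penalty)

print(meteor_sketch("the cat sat on the mat", "the cat was sitting on the mat"))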
The GRASP Project:
I am PI of the GRASP Project (funded by NSF), where I am working together with
Brian MacWhinney (co-PI) and
Kenji Sagae on developing a
framework for robust high-accuracy parsing of grammatical relations in spoken
language data. Our goal is to automatically annotate the CHILDES database
(a large database of child-parent conversations) with grammatical relations,
in order to support advanced corpus-based research of child language
acquisition.
Previous Research Projects
I was a co-PI of the Nespole!
and C-STAR speech translation projects
and of the LingWear
and Babylon mobile speech translation projects.
I was the lead PI of the AMTEXT project (2003-2005, funded by DoD), a small pilot
project that investigated the feasibility of a rapid development approach to
Machine Translation based on Information Extraction. The approach builds upon
the MT transfer framework developed in the AVENUE project and on
Fei Huang's work on translation of
Named Entities. The main idea is to use a small elicitation corpus of
translated and word-aligned sentences to semi-automatically learn pattern
transfer-rules that can then be used to both extract the information of
interest in the source-language and translate this information into the
target-language.
I was a co-PI of the Clarity project (1997-1999, funded by DoD) on the
automatic detection and classification of the discourse structure of spoken
language.
Other Research Interests
I have a general interest in parsing algorithms for natural and programming
languages and in theoretical problems related to parsing. My own research
has primarily focused on the area of robust analysis and understanding of
spoken language. In my PhD work, I developed GLR*, one of the first robust
parsers for spoken language analysis, and a key component in the earlier
versions of the JANUS speech translation system.
Teaching
From 1996 to 2014, I was the lead instructor of the
Algorithms for NLP (11-711) course at the LTI. Algorithms for NLP is an
introductory graduate-level course on the computational properties of natural
languages and the fundamental algorithms for processing natural languages.
The course provides an in-depth presentation of the major algorithms used in
NLP, including Lexical, Morphological, Syntactic and Semantic analysis, with
the primary focus on parsing algorithms and their analysis.
I was also a co-instructor of the
Machine Translation
(11-731) and
Advanced MT
Seminar (11-734) courses, and co-supervised the
NLP Lab (11-712)
and the
MT Lab (11-732)
courses.
Contact Information
Office:
5715 Gates-Hillman Complex
+1-412-268-5655
Fax: +1-412-268-6298
Administrative Assistant:
Mary Jo Bensasi
65xx Gates-Hillman Complex
maryjob AT cs DOT cmu DOT edu
+1-412-268-7517
Mailing Address:
Dr. Alon Lavie
Language Technologies Institute
School of Computer Science
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213-3891
Email:
alavie AT cs DOT cmu DOT edu (anti-spam notation)
Home:
5124 Beeler St.
Pittsburgh, PA 15217
+1-412-621-0933