My general area of interest is the application of
statistical machine learning techniques to real-world problems. While a Ph.D. student
at CMU, my research focused on advancing the state of the art in Question Answering
(QA), the task of retrieving accurate answers to natural language questions
(e.g. "Who invented the computer?") from information sources.

My early work led to a flexible and extensible QA
architecture that supports the integration of multiple search and answer
generation strategies, and that has served as a test bed for the development
of new QA algorithms. I am the primary author of the Ephyra QA system, which
has been evaluated at the Text REtrieval Conference (TREC), an annual workshop
organized by NIST that has been the main evaluation forum for English QA
research. Ephyra has been
released as open source software to the QA community and is now used by
researchers all over the world. The open source release, OpenEphyra, lowers
the barrier to entry for QA research and facilitates evaluations and
comparisons of different algorithms by providing a common platform for
experiments. The current system combines a statistical pattern learning and
matching approach with answer-type-based extraction techniques and a
semantic extractor based on semantic role labeling. Please take a look at
the Ephyra website for more information about this project, or visit the
SourceForge project site to download the latest release.

The focus of my Ph.D. thesis research was a statistical method for
automatically expanding document collections with related information from
large, unstructured sources. The approach improves the coverage of relevant knowledge
and adds paraphrases of information that is already present in the
documents. A QA system that uses the expanded text collections as knowledge
sources benefits from more relevant search results and additional
supporting evidence for identifying correct answers. The source expansion
approach provides a principled way of building large, local stores of
relevant information. Source expansion also has applications in other
natural language processing tasks beyond QA, such as machine reading, where
extended source material can facilitate automatic knowledge extraction.

For most of my time as a Ph.D. student at CMU, I worked with the DeepQA
group at IBM Research on Watson, an open-domain question answering system
that defeated the best human contestants on the Jeopardy! TV show.
My contributions to Watson are discussed in a newsmaker interview with Science.
The following news articles describe CMU's role in the Watson project:
- It's man vs. machine in 'Jeopardy!' showdown, Pittsburgh Tribune-Review, 02/09/11.
- Human champs of 'Jeopardy!' vs. Watson the IBM computer: a close match, Pittsburgh Post-Gazette, 02/13/11.
- CMU-IBM Super Computer Beats Humans On 'Jeopardy', CBS Pittsburgh, 02/16/11.
- Man versus machine: Chalk one up for the latter in 'Jeopardy!', Pittsburgh Post-Gazette, 02/17/11.
- CMU and IBM Collaborate on Open Computing System for Advancing Research on Question Answering, PR Newswire, 02/11/11.