
Research


My general area of interest is the application of statistical machine learning techniques to real-world problems. While a Ph.D. student at CMU, my research focused on advancing the state of the art in Question Answering (QA), the task of retrieving accurate answers to natural language questions (e.g. "Who invented the computer?") from information sources.

My early work led to a flexible and extensible QA architecture that supports the integration of multiple search and answer generation strategies and has served as a test bed for the development of new QA algorithms. I am the primary author of the Ephyra QA system, which has been evaluated in the Text REtrieval Conference (TREC), an annual workshop organized by NIST that has been the main evaluation forum for English QA research. Ephyra has been released to the QA community as open source software and is now used by researchers worldwide. The open source release, OpenEphyra, lowers the barrier to entry for QA research and makes it easier to evaluate and compare different algorithms by providing a common platform for experiments. The current system combines a statistical pattern learning and matching approach with answer-type based extraction techniques and a semantic extractor based on semantic role labeling. Please take a look at the Ephyra website for more information about this project, or visit the SourceForge project site to download the latest release.
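The idea of an architecture in which several answer extraction strategies feed a shared merging step can be illustrated with a minimal Python sketch. It is not Ephyra's code or API: all class and function names are hypothetical, the "learned" surface pattern is hard-coded for one question type, the answer-type check is a crude capitalization heuristic standing in for real named-entity tagging, and the passages are made-up toy input rather than retrieved search results.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    answer: str
    score: float
    source: str  # name of the strategy that produced this candidate


# A strategy maps (question, retrieved passages) to a list of answer candidates.
Extractor = Callable[[str, List[str]], List[Candidate]]


def pattern_extractor(question: str, passages: List[str]) -> List[Candidate]:
    # Hand-written surface pattern for "Who invented X?" questions; a real
    # system would learn such patterns from data instead of hard-coding one.
    m = re.match(r"Who invented (?:the )?(.+?)\?$", question)
    if not m:
        return []
    target = re.escape(m.group(1))
    pattern = r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)*) invented (?:the )?" + target
    return [Candidate(hit.group(1), 1.0, "pattern")
            for p in passages for hit in re.finditer(pattern, p)]


def answer_type_extractor(question: str, passages: List[str]) -> List[Candidate]:
    # Crude answer typing: "Who" questions expect a person, approximated here
    # by any two consecutive capitalized words.
    if not question.startswith("Who"):
        return []
    return [Candidate(hit.group(0), 0.5, "answer_type")
            for p in passages
            for hit in re.finditer(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", p)]


def answer(question: str, passages: List[str],
           extractors: List[Extractor]) -> List[Tuple[str, float]]:
    # Run every strategy and merge candidates by summing their scores, so an
    # answer proposed by several strategies ranks higher.
    totals = defaultdict(float)
    for extract in extractors:
        for cand in extract(question, passages):
            totals[cand.answer] += cand.score
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up toy passages standing in for search results.
passages = ["Some sources say Charles Babbage invented the computer.",
            "Alan Turing described a universal machine."]
print(answer("Who invented the computer?", passages,
             [pattern_extractor, answer_type_extractor]))
# prints [('Charles Babbage', 1.5), ('Alan Turing', 0.5)]
```

Because new strategies only need to match the `Extractor` signature, adding one is a matter of appending it to the list; summing scores is just one simple way to combine evidence from several strategies.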

The focus of my Ph.D. thesis research was a statistical method for automatically expanding document collections with related information from large, unstructured sources. The approach improves the coverage of relevant knowledge and adds paraphrases of information that is already present in the documents. A QA system that uses the expanded text collections as knowledge sources benefits from more relevant search results and from additional supporting evidence for identifying correct answers. Source expansion thus provides a principled way of building large, local stores of relevant information. It also has applications in natural language processing tasks beyond QA, such as machine reading, where the extended source material can facilitate automatic knowledge extraction.
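The general shape of source expansion (start from a seed document, gather related text, keep only the parts that are relevant to the seed) can be sketched in a few lines of Python. This page does not specify the statistical model used in the thesis, so the sketch substitutes plain cosine similarity over term counts as a stand-in relevance score. Splitting related documents into sentence-level "nuggets", the threshold value, the stopword list, and all function names are assumptions for illustration only.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "is", "of", "on", "the", "to", "was"}


def terms(text: str) -> Counter:
    # Lowercased term counts, ignoring stopwords.
    return Counter(w for w in re.findall(r"[a-z0-9]+", text.lower())
                   if w not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-count vectors.
    dot = sum(count * b[t] for t, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def split_nuggets(doc: str) -> List[str]:
    # Treat each sentence as a candidate text nugget (an assumed granularity).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]


def expand(seed: str, related_docs: List[str],
           threshold: float = 0.2) -> List[Tuple[float, str]]:
    # Score every nugget from the related documents against the seed document
    # and keep those at or above the (hypothetical) threshold, best first.
    seed_vec = terms(seed)
    kept = []
    for doc in related_docs:
        for nugget in split_nuggets(doc):
            score = cosine(seed_vec, terms(nugget))
            if score >= threshold:
                kept.append((score, nugget))
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


seed = "Watson is a question answering system developed by IBM Research."
related = ["IBM built the Watson system to answer quiz questions. "
           "The weather was sunny yesterday."]
for score, nugget in expand(seed, related):
    print(f"{score:.2f}  {nugget}")
# prints: 0.43  IBM built the Watson system to answer quiz questions.
```

The sketch deliberately does not discard nuggets that overlap heavily with the seed, since restatements of known facts are one of the things the expansion is meant to add; the off-topic sentence is filtered out because it shares no terms with the seed.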

For most of my time as a Ph.D. student at CMU, I worked with the DeepQA group at IBM Research on Watson, an open-domain question answering system that defeated the best human contestants on the Jeopardy! TV quiz show. My contributions to Watson are discussed in this newsmaker interview with Science, and here are some additional news articles that describe CMU's role in the Watson project:
