Update: As of May 13, 2013, I have graduated from the PhD program in LTI and am now with the Computer Science Department at San Francisco State University. Visit my new webpage.
Research Interests: Information Retrieval, Natural Language Processing, Machine Learning
PhD Research: As part of my PhD thesis research, I investigated the problem of efficient and effective search of large-scale document collections.
Search engine indexes for large document collections are often divided into multiple disjoint partitions ('shards') that are distributed across multiple computers and searched in parallel to provide rapid interactive search.
Typically, all index shards are searched for each query (exhaustive search).
My research proposes an alternative, 'selective search', that partitions collections into topical shards and searches only a few relevant shards for each query.
As per the 'cluster hypothesis' ('similar documents tend to be relevant to the same request'), topical organization of the document collection concentrates the relevant documents for any given query into a few shards.
Such an organization of documents enables selective search to ignore large portions of the collection without degrading search accuracy.
In summary, selective search is an efficient alternative to the current de facto search paradigm of exhaustive search.
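The contrast between exhaustive and selective search can be illustrated with a minimal Python sketch. It is only a toy under stated assumptions, not the method developed in the thesis: the shards, their contents, the term-count shard ranking, and all function names (`rank_shards`, `selective_search`, `exhaustive_search`) are hypothetical and chosen purely for illustration.

```python
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()

# Hypothetical toy collection, assumed to be already partitioned by topic.
SHARDS = {
    "sports": ["the team won the football match",
               "tennis player wins the final match"],
    "cooking": ["bake the bread in a hot oven",
                "simmer the tomato sauce with garlic"],
    "astronomy": ["the telescope observed a distant galaxy",
                  "a comet passed close to the planet"],
}

def shard_profile(docs):
    """Summarize a shard by the term counts of all its documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts

PROFILES = {name: shard_profile(docs) for name, docs in SHARDS.items()}

def rank_shards(query, profiles):
    """Order shards by how often the query terms occur in each profile.

    This simple count is an assumed stand-in for a real shard-ranking
    (resource selection) algorithm.
    """
    terms = tokenize(query)
    scores = {name: sum(p[t] for t in terms) for name, p in profiles.items()}
    return sorted(scores, key=lambda name: (-scores[name], name))

def score_doc(query, doc):
    """Score a document by the number of query-term occurrences."""
    doc_terms = Counter(tokenize(doc))
    return sum(doc_terms[t] for t in tokenize(query))

def search(query, shards, shard_names):
    """Search only the named shards; return matching docs, best first."""
    hits = []
    for name in shard_names:
        for doc in shards[name]:
            s = score_doc(query, doc)
            if s > 0:
                hits.append((s, name, doc))
    return sorted(hits, key=lambda h: (-h[0], h[1], h[2]))

def selective_search(query, shards, profiles, k=1):
    """Search only the k top-ranked shards for this query."""
    return search(query, shards, rank_shards(query, profiles)[:k])

def exhaustive_search(query, shards):
    """Search every shard, as in the conventional setup."""
    return search(query, shards, list(shards))
```

In this toy collection, `selective_search("football match", SHARDS, PROFILES, k=1)` scans only the sports shard yet returns the same hits as `exhaustive_search("football match", SHARDS)`, because the topical partitioning has placed all matching documents in a single shard.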
Brief Background: I pursued my Master's in Computer Science at the University of Minnesota, Duluth, under Dr. Ted Pedersen's guidance. I defended my Master's thesis, titled 'Unsupervised Context Discrimination and Automatic Cluster Stopping', in July 2006. More information about my research on Context Discrimination and Cluster Stopping can be found here. Here is a link to my UMD web-page.
During my time at CMU I also worked on the REAP project with Jamie Callan and Maxine Eskenazi. My work focused on developing sense-based document selection methods. More specifically, given a word of interest, the goal was to select (or filter out) a document based on the sense in which the word is used in that document.
Here is a more comprehensive list of my publications.