Tom Pierce's IR term project

Basic Information

Project Title: Scatter/Gather Cluster Browsing
Name: Tom Pierce (email: tomp@cs.cmu.edu)
Presentation Date: Tue Apr 28th
Demo Date: TBA

Abstract
Proposal and Timelines
System Description
Experiments
Results
Demo

Abstract

One of the major drawbacks to hierarchical document clusters is that they are not easily navigable. While the underlying method may produce good results under an automated evaluation, it can only be useful in a real-world application if we can design a reasonable front-end for users. Scatter/Gather is an idea that has been proposed to provide this interface.

Scatter/Gather proposes giving the user a short list of document groups during browsing, allowing the user to interactively pick relevant subclusters from any search point. In this way, the cluster tree becomes a dynamic, rather than static, structure. By trying various methods of tree reduction for presentation, and by re-clustering user-selected subgroups, I hope to demonstrate Scatter/Gather as a useful solution.
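As a rough illustration of the browsing loop described above, here is a minimal Python sketch. The names scatter and gather, the bag-of-words vectors, and the farthest-first seeding are all assumptions made for illustration; they are not the clustering method used in this project.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (a deliberately crude representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def scatter(docs, k):
    """Split docs into at most k groups (hypothetical one-pass heuristic).

    Seeds are chosen farthest-first: each new seed is the document least
    similar to the seeds picked so far. Every document then joins the
    group of its most similar seed.
    """
    if not docs:
        return []
    vecs = [vectorize(d) for d in docs]
    seeds = [0]
    while len(seeds) < min(k, len(docs)):
        candidates = [i for i in range(len(docs)) if i not in seeds]
        seeds.append(min(
            candidates,
            key=lambda i: max(cosine(vecs[i], vecs[s]) for s in seeds),
        ))
    groups = [[] for _ in seeds]
    for i, doc in enumerate(docs):
        best = max(range(len(seeds)),
                   key=lambda g: cosine(vecs[i], vecs[seeds[g]]))
        groups[best].append(doc)
    return groups


def gather(groups, picked):
    """Merge the groups the user selected into one working set."""
    return [doc for g in picked for doc in groups[g]]
```

One browsing step would then be: call scatter on the current working set, show the groups, call gather on the user's picks, and scatter the result again, so the hierarchy is rebuilt around what the user chose rather than fixed in advance.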

Also, you can take a look at my midterm literature review.

Proposal and Timelines

You can find my Proposal here.

When	What	Status
2/25	Proposal Due; Begin System Design	Complete
3/7	Finalize Design; Begin implementation of tree reduction	Complete
3/14	Working implementation of tree reduction; Apply tree reduction to old results; Begin retooling of clustering software	Complete; Tested; Complete
3/17	Preliminary Results Due; Planned Delivery: Overall system design and preliminary tree reduction results	Complete
3/21	Complete retooling of clustering software; Begin work on web interface	Complete
3/28	Complete web interface; Commence testing/tuning	Complete; Added functionality; Ongoing
4/15	Stable System; Begin Evaluation
4/18	Begin final report
4/23	Project Due Date; Completed Scatter/Gather System and evaluation results

System Description

System design is complete. I expect that the details of implementation (i.e., how the web interface is wired into the various clustering tools) are not as interesting as the capabilities. The system will perform on-the-fly clustering of documents and adaptive tree-splitting (based on user feedback) to further Scatter/Gather browsing. Scatter/Gather browsing of traditional keyword search results (from SMART) will also be accommodated.
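To make the idea of reducing a cluster tree for presentation concrete, here is a small Python sketch of one plausible heuristic: start at the root and keep splitting the largest cluster until there are k groups to show. This is an assumption for illustration only, not necessarily the tree-splitting algorithm used in this system; the nested-tuple tree format and the names leaves and reduce_tree are hypothetical, and the tree is assumed to be binary.

```python
def leaves(node):
    """Return the document ids under a node, left to right."""
    if isinstance(node, tuple):
        return [leaf for child in node for leaf in leaves(child)]
    return [node]


def reduce_tree(root, k):
    """Cut a binary cluster tree down to at most k groups.

    Internal nodes are tuples of children; leaves are document ids.
    The largest splittable node on the current frontier is replaced
    by its children until k groups exist or only leaves remain.
    """
    frontier = [root]
    while len(frontier) < k:
        splittable = [n for n in frontier if isinstance(n, tuple)]
        if not splittable:
            break  # every remaining group is a single document
        biggest = max(splittable, key=lambda n: len(leaves(n)))
        i = frontier.index(biggest)
        frontier[i:i + 1] = list(biggest)  # replace node by its children
    return [leaves(n) for n in frontier]
```

Splitting by size keeps the displayed groups roughly balanced. A feedback-driven variant could instead split only the nodes the user selected, which is closer in spirit to the adaptive tree-splitting mentioned above.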

"Experiments"

These will include thoughts on usefulness, usability, etc., by myself and whichever kind souls I can get to play with my final system and/or its components. This will include comparisons in usability with "Scatter/Gather over traditional IR" (also to be implemented).

"Results"

As this project promises to be mostly implementation, the primary result will be a working system. Musings on its potential usefulness, ways to improve it, and the like will also be forthcoming.

You can find my preliminary results (for the tree-splitting algorithm) here.

Demo

For the Demo, I expect to be able to demonstrate a functional web-based Scatter/Gather interface, using clustering results on the TDT corpus. The clustering hierarchies used will be both pre-computed (for the top level) and generated on-the-fly. Usability of "pure Scatter/Gather" will be contrasted with "Scatter/Gather over traditional IR".

last update: Apr 13th, 1998