Tom Pierce's IR term project

Basic Information


Abstract

One of the major drawbacks to hierarchical document clusters is that they are not easily navigable. While the underlying method may produce good results under an automated evaluation, it can only be useful in a real-world application if we can design a reasonable front-end for users. Scatter/Gather is an idea that has been proposed to provide this interface.

Scatter/Gather proposes giving the user a short list of document groups during browsing, allowing the user to interactively pick relevant subclusters from any search point. In this way, the cluster tree becomes a dynamic, rather than static, structure. By trying various methods of tree reduction for presentation, and re-clustering user-selected subgroups, I hope to demonstrate Scatter/Gather as a useful solution.
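The browsing loop described above can be sketched roughly as follows. This is only a minimal illustration, not the clustering method used in this project: the function names `scatter` and `gather`, the Jaccard word-overlap similarity, and the farthest-first seed selection are all assumptions chosen to keep the example short and deterministic.

```python
def tokens(text):
    """Lower-cased set of whitespace-separated words (a toy representation)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def scatter(docs, k):
    """Split docs into at most k groups (hypothetical stand-in for real clustering).

    Seeds are picked farthest-first, then every document joins its most
    similar seed.
    """
    if not docs:
        return []
    sets = [tokens(d) for d in docs]
    seeds = [0]
    while len(seeds) < min(k, len(docs)):
        # Next seed: the document least similar to all seeds chosen so far.
        best = min(
            (i for i in range(len(docs)) if i not in seeds),
            key=lambda i: max(jaccard(sets[i], sets[s]) for s in seeds),
        )
        seeds.append(best)
    clusters = [[] for _ in seeds]
    for i, s in enumerate(sets):
        nearest = max(range(len(seeds)), key=lambda c: jaccard(s, sets[seeds[c]]))
        clusters[nearest].append(docs[i])
    return clusters


def gather(clusters, chosen):
    """Merge the user-selected groups back into one document pool."""
    return [doc for idx in chosen for doc in clusters[idx]]


docs = [
    "stock market falls",
    "stock market rallies",
    "football match tonight",
    "football match cancelled",
]
groups = scatter(docs, 2)        # first scatter: show the user two groups
pool = gather(groups, [1])       # the user picks the second group
subgroups = scatter(pool, 2)     # re-scatter only the selected documents
```

Each round narrows the pool to what the user selected and clusters it again, which is what makes the hierarchy dynamic rather than fixed.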

Also, you can take a look at my midterm literature review.

Proposal and Timelines

You can find my Proposal here.

When | What | Status
2/25 | Proposal Due; Begin System Design | Complete
3/7 | Finalize Design; Begin implementation of tree reduction | Complete
3/14 | Working implementation of tree reduction; Apply tree reduction to old results; Begin retooling of clustering software | Complete; Tested; Complete
3/17 | Preliminary Results Due; Planned Delivery: Overall system design and preliminary tree reduction results | Complete
3/21 | Complete retooling of clustering software; Begin work on web interface | Complete
3/28 | Complete web interface; Commence testing/tuning | Complete; Added functionality.. Ongoing
4/15 | Stable System; Begin Evaluation |
4/18 | Begin final report |
4/23 | Project Due Date; Completed Scatter Gather System, and evaluation results |

System Description

System design is complete. I expect that the details of implementation (i.e., how the web interface is wired into various clustering tools) are not as interesting as the capabilities. The system will perform on-the-fly clustering of documents, and adaptive tree-splitting (based on user feedback) to further Scatter/Gather browsing. Scatter/Gather browsing of traditional keyword search results (from SMART) will also be accommodated.
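As a rough illustration of what reducing a cluster tree for presentation can look like, the sketch below cuts a hierarchy down to at most k groups by repeatedly splitting the largest group on the current frontier. The splitting rule, the `Node` class, and the `reduce_tree` function are assumptions made for this example; the algorithm actually used in the system is not described on this page.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity, not by contents
class Node:
    """One cluster in a hierarchy (hypothetical structure for illustration)."""
    docs: list
    children: list = field(default_factory=list)


def reduce_tree(root, k):
    """Return at most k nodes that together cover every document under root.

    Assumed rule: always split the frontier node holding the most documents,
    as long as the split does not push the frontier past k groups.
    """
    frontier = [root]
    while len(frontier) < k:
        splittable = [
            n for n in frontier
            if n.children and len(frontier) - 1 + len(n.children) <= k
        ]
        if not splittable:
            break  # only leaves remain, or any further split would exceed k
        target = max(splittable, key=lambda n: len(n.docs))
        i = frontier.index(target)
        frontier[i:i + 1] = target.children  # replace the node by its children
    return frontier


# A small hand-built hierarchy over five document ids.
left = Node([1, 2], [Node([1]), Node([2])])
right = Node([3, 4, 5], [Node([3]), Node([4, 5], [Node([4]), Node([5])])])
root = Node([1, 2, 3, 4, 5], [left, right])

shown = [n.docs for n in reduce_tree(root, 3)]
```

Other rules, such as splitting the least coherent group first, fit the same frame by changing only the `key` function.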

"Experiments"

These will include thoughts on usefulness, usability, etc., by myself and whichever kind souls I can get to play with my final system and/or its components. This will include comparisons in usability with "Scatter/Gather over traditional IR" (also to be implemented).

"Results"

As this project promises to be mostly implementation, the primary result will be a working system. Also, musings on its potential usefulness, ways to improve it, and the like, will be forthcoming.

You can find my preliminary results (for the tree-splitting algorithm) here.

Demo

For the Demo, I expect to be able to demonstrate a functional web-based Scatter/Gather interface, using clustering results on the TDT corpus. The clustering hierarchies used will be both pre-computed (for the top-level) and generated on-the-fly. Usability of "pure Scatter/Gather" will be contrasted with "Scatter/Gather over traditional IR".


last update: Apr 13th, 1998