One of the major drawbacks to hierarchical document clusters is that they are not easily navigable. While the underlying method may produce good results under an automated evaluation, it can only be useful in a real-world application if we can design a reasonable front-end for users. Scatter/Gather is an idea that has been proposed to provide this interface.
Scatter/Gather proposes giving the user a short list of document groups during browsing, allowing the user to interactively pick relevant subclusters from any point in the search. In this way, the cluster tree becomes a dynamic, rather than static, structure. By trying various methods of tree reduction for presentation, and by re-clustering user-selected subgroups, I hope to demonstrate that Scatter/Gather is a useful solution.
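To make "tree reduction for presentation" concrete, here is a minimal sketch of one possible strategy, not necessarily the one used in this project. It assumes a hypothetical `Node` class for the cluster hierarchy and a hypothetical `reduce_tree` function. The function greedily expands the largest group on offer into its children as long as the user would still see at most `k` groups, so the returned list always covers every document under the starting node.

```python
import heapq
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical cluster-tree node; leaves hold document ids."""
    name: str
    children: List["Node"] = field(default_factory=list)
    docs: List[int] = field(default_factory=list)

    def size(self) -> int:
        # Number of documents under this node.
        if not self.children:
            return len(self.docs)
        return sum(child.size() for child in self.children)


def reduce_tree(root: Node, k: int) -> List[Node]:
    """Return at most k nodes that together cover every document under root.

    Greedy: pop the largest node on the frontier and replace it with its
    children if the frontier would still hold at most k groups; otherwise
    keep it as a presented group.
    """
    counter = 0  # tie-breaker so Node objects are never compared
    heap = [(-root.size(), counter, root)]
    kept: List[Node] = []
    while heap:
        _, _, node = heapq.heappop(heap)
        frontier = len(heap) + len(kept) + 1  # groups currently on offer
        if node.children and frontier - 1 + len(node.children) <= k:
            for child in node.children:
                counter += 1
                heapq.heappush(heap, (-child.size(), counter, child))
        else:
            kept.append(node)
    return kept


# Toy hierarchy for illustration only.
root = Node("all", children=[
    Node("sports", children=[Node("soccer", docs=[1, 2, 3]),
                             Node("tennis", docs=[4, 5])]),
    Node("politics", docs=[6, 7, 8, 9]),
    Node("weather", docs=[10]),
])

print([n.name for n in reduce_tree(root, 3)])  # ['sports', 'politics', 'weather']
print([n.name for n in reduce_tree(root, 4)])  # ['politics', 'soccer', 'tennis', 'weather']
```

Because the result is just a list of tree nodes, a browsing step can call `reduce_tree(selected_node, k)` on whichever group the user clicks, which is one way the static tree can be traversed dynamically.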
You can also take a look at my midterm literature review.
When | What | Status
---|---|---
2/25 | Proposal Due; Begin System Design | Complete
3/7 | Finalize Design; Begin implementation of tree reduction | Complete
3/14 | Working implementation of tree reduction; Apply tree reduction to old results; Begin retooling of clustering software | Complete; Tested; Complete
3/17 | Preliminary Results Due (Planned Delivery: overall system design and preliminary tree reduction results) | Complete
3/21 | Complete retooling of clustering software; Begin work on web interface | Complete
3/28 | Complete web interface; Commence testing/tuning | Complete (added functionality); Ongoing
4/15 | Stable System; Begin Evaluation |
4/18 | Begin final report |
4/23 | Project Due Date (Completed Scatter/Gather System and evaluation results) |
System design is complete. I expect that the details of implementation (i.e., how the web interface is wired into the various clustering tools) are less interesting than the system's capabilities. The system will perform on-the-fly clustering of documents and adaptive tree-splitting (based on user feedback) to support Scatter/Gather browsing. Scatter/Gather browsing of traditional keyword search results (from SMART) will also be accommodated.
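The on-the-fly part of a Scatter/Gather step can be sketched as: gather the documents of the groups the user selected, then scatter them again into a few new groups, each labelled by its most frequent terms. The sketch below assumes hypothetical helpers `scatter`, `gather`, and `digest`, uses raw term-frequency vectors with cosine similarity, and substitutes a small k-means with deterministic farthest-first seeding for whatever clustering tools the project actually uses. The input document set could equally be the top results of a keyword search (SMART itself is not modeled here).

```python
import math
from collections import Counter
from typing import Dict, List


def vectorize(text: str) -> Counter:
    # Raw term frequencies; a real system would add stemming, stopwords, tf-idf.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[t] for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def summed(vecs: List[Counter]) -> Counter:
    # Sum of term vectors; its direction is all cosine similarity needs.
    total: Counter = Counter()
    for v in vecs:
        total.update(v)
    return total


def scatter(doc_ids: List[int], texts: Dict[int, str], k: int,
            iters: int = 10) -> List[List[int]]:
    """Cluster doc_ids into at most k groups with a small cosine k-means."""
    if not doc_ids:
        return []
    vecs = {d: vectorize(texts[d]) for d in doc_ids}
    # Deterministic farthest-first seeding: add the document least similar
    # to its closest existing seed until there are k seeds.
    seeds = [doc_ids[0]]
    while len(seeds) < min(k, len(doc_ids)):
        rest = [d for d in doc_ids if d not in seeds]
        seeds.append(min(rest, key=lambda d: max(cosine(vecs[d], vecs[s]) for s in seeds)))
    centers = [vecs[s] for s in seeds]
    assignment: Dict[int, int] = {}
    for _ in range(iters):
        new = {d: max(range(len(centers)), key=lambda c: cosine(vecs[d], centers[c]))
               for d in doc_ids}
        if new == assignment:
            break  # converged
        assignment = new
        groups = [[d for d in doc_ids if assignment[d] == c] for c in range(len(centers))]
        centers = [summed([vecs[d] for d in g]) if g else centers[c]
                   for c, g in enumerate(groups)]
    groups = [[d for d in doc_ids if assignment[d] == c] for c in range(len(centers))]
    return [g for g in groups if g]


def gather(clusters: List[List[int]], selected: List[int]) -> List[int]:
    """Pool the documents of the clusters the user picked (by index)."""
    return [d for i in selected for d in clusters[i]]


def digest(cluster: List[int], texts: Dict[int, str], n: int = 3) -> List[str]:
    """Most frequent terms in a cluster, used as its on-screen label."""
    return [t for t, _ in summed([vectorize(texts[d]) for d in cluster]).most_common(n)]


# Toy collection for illustration only.
texts = {
    0: "soccer match goal striker",
    1: "soccer goal keeper save",
    2: "election vote senate bill",
    3: "senate vote budget bill",
    4: "storm rain flood warning",
    5: "flood warning river storm",
}

clusters = scatter(list(texts), texts, k=3)  # [[0, 1], [2, 3], [4, 5]]
print([digest(c, texts, 2) for c in clusters])
picked = gather(clusters, [0, 1])             # user keeps the first two groups
print(scatter(picked, texts, k=2))            # [[0, 1], [2, 3]]
```

Seeding deterministically keeps repeated browsing sessions reproducible, which is convenient when comparing presentation methods; a production system would likely need a faster clustering method for large result sets.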
Evaluation results will include thoughts on usefulness, usability, etc., from myself and from whichever kind souls I can persuade to try my final system and/or its components. They will also include usability comparisons with "Scatter/Gather over traditional IR" (also to be implemented).
As this project is mostly implementation, the primary result will be a working system. Musings on its potential usefulness, ways to improve it, and the like will also be forthcoming.
You can find my preliminary results (for the tree-splitting algorithm) here.

For the demo, I expect to be able to show a functional web-based Scatter/Gather interface using clustering results on the TDT corpus. The clustering hierarchies used will be both precomputed (for the top level) and generated on the fly. The usability of "pure Scatter/Gather" will be contrasted with that of "Scatter/Gather over traditional IR".