next up previous
Next: Identifying Variable Frontiers Up: No Title Previous: Discussion of Iterative

Simplifying Hierarchical Clusterings

 

A hierarchical clustering can be grown to arbitrary height. If there is structure in the data, then ideally the top layers of the clustering reflect this structure, with substructure emerging as one descends the hierarchy. However, lower levels of the clustering may not reflect meaningful structure. This is a result of overfitting, which also arises in supervised induction. Inspired by certain forms of retrospective (or post-tree-construction) pruning in decision-tree induction, we use resampling to identify 'frontiers' of a hierarchical clustering that are good candidates for pruning. This simplification process follows initial hierarchy construction and iterative optimization; it is a final phase of search through the space of hierarchical clusterings, and it is intended to ease the burden on a data analyst.
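To make the notion of a frontier concrete, the following is a minimal sketch of pruning a hierarchy at a given frontier. It is illustrative only: the `Cluster` class, the `prune_below_frontier` and `leaves` functions, and the toy hierarchy are all hypothetical names introduced here, and the frontier is supplied by hand rather than identified by resampling.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Cluster:
    """A node in a hierarchical clustering (hypothetical structure)."""
    name: str
    children: List["Cluster"] = field(default_factory=list)


def prune_below_frontier(node: Cluster,
                         on_frontier: Callable[[Cluster], bool]) -> Cluster:
    """Return a copy of the tree in which every frontier node becomes a leaf.

    `on_frontier` is a placeholder for whatever test marks a node as the
    deepest level worth keeping; it is an assumption of this sketch.
    """
    if on_frontier(node) or not node.children:
        return Cluster(node.name)
    return Cluster(node.name,
                   [prune_below_frontier(c, on_frontier) for c in node.children])


def leaves(node: Cluster) -> List[str]:
    """Names of the leaf clusters, left to right."""
    if not node.children:
        return [node.name]
    return [name for c in node.children for name in leaves(c)]


# Toy hierarchy: root -> {A -> {A1, A2}, B -> {B1 -> {B1a, B1b}, B2}}
tree = Cluster("root", [
    Cluster("A", [Cluster("A1"), Cluster("A2")]),
    Cluster("B", [Cluster("B1", [Cluster("B1a"), Cluster("B1b")]),
                  Cluster("B2")]),
])

# Hand-picked frontier {A, B1, B2}: everything beneath these nodes is dropped.
frontier = {"A", "B1", "B2"}
pruned = prune_below_frontier(tree, lambda n: n.name in frontier)
```

In this sketch the pruned hierarchy keeps the upper levels intact and ends at the frontier nodes, while the original tree is left unmodified. The frontier need not lie at a uniform depth: here `A` and `B1` sit at different levels.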






JAIR, 4
Douglas H. Fisher
Sat Mar 30 11:37:23 CST 1996