Clustering is often used for discovering structure in data. Clustering systems differ in the objective function used to evaluate clustering quality and in the control strategy used to search the space of clusterings. Ideally, a search strategy should consistently construct clusterings of high quality, yet be computationally inexpensive as well. Given the combinatorial complexity of the general clustering problem, however, a search strategy cannot be computationally inexpensive and still guarantee the quality of discovered clusterings across a diverse set of domains and objective functions. Instead, we can partition the search so that an initial clustering is inexpensively constructed, followed by iterative optimization procedures that continue to search in the background for improved clusterings. This allows an analyst to get an early indication of the possible presence and form of structure in the data, while search continues for as long as it seems worthwhile. This appears to be a primary motivation behind the design of systems such as AUTOCLASS [Cheeseman, Kelly, Self, Stutz, Taylor & Freeman, 1988] and SNOB [Wallace & Dowe, 1994].
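To make this control regimen concrete, the following sketch illustrates the two-phase, anytime organization of search; build_initial, optimize_step, and worthwhile are hypothetical placeholders, and the sketch illustrates the idea only, not the implementation of any particular system.

    def anytime_clustering(data, build_initial, optimize_step, worthwhile):
        # Phase 1: inexpensively construct an initial clustering, giving
        # the analyst an early indication of structure in the data.
        clustering = build_initial(data)
        yield clustering
        # Phase 2: background iterative optimization continues to search
        # for improved clusterings as long as it seems worthwhile.
        while worthwhile(clustering):
            clustering = optimize_step(clustering)
            yield clustering

Each yielded clustering can be inspected immediately, and the generator is resumed only while further optimization seems warranted.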
This paper describes and evaluates three strategies for iterative optimization: one inspired by the iterative `seed' selection strategy of CLUSTER/2 [Michalski & Stepp, 1983a; 1983b]; a second that is a common form of optimization, which iteratively reclassifies single observations; and a third that appears novel in the clustering literature. This last strategy was inspired, in part, by macro-learning strategies [Iba, 1989]: collections of observations are reclassified en masse, which appears to mitigate problems associated with local maxima of the objective function. For evaluation purposes, we couple these strategies with a simple, inexpensive procedure, used by COBWEB [Fisher, 1987a; 1987b] and a system by Anderson and Matessa [Anderson & Matessa, 1991], that constructs an initial hierarchical clustering. These iterative optimization strategies can, however, be paired with other methods for constructing initial clusterings.
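As an illustration of the second strategy, the following minimal sketch iteratively reclassifies single observations by hill climbing over flat partitions of numeric data. The negative within-cluster squared error used here is a stand-in objective assumed for illustration (the systems above use other measures, such as category utility), and the sketch is not the procedure evaluated in this paper.

    def neg_sse(clusters):
        # Stand-in objective: negative within-cluster sum of squared
        # error (higher is better); the cited systems use other measures.
        total = 0.0
        for c in clusters:
            if not c:
                continue
            mean = [sum(xs) / len(c) for xs in zip(*c)]
            total += sum(sum((x - m) ** 2 for x, m in zip(p, mean))
                         for p in c)
        return -total

    def reclassify_single(clusters, objective=neg_sse, max_passes=20):
        # Repeatedly remove one observation and reinsert it into the
        # cluster that maximizes the objective; stop at a local maximum.
        for _ in range(max_passes):
            moved = False
            for i in range(len(clusters)):
                for obs in list(clusters[i]):
                    clusters[i].remove(obs)
                    scores = []
                    for j in range(len(clusters)):
                        clusters[j].append(obs)
                        scores.append((objective(clusters), j))
                        clusters[j].pop()
                    _, best = max(scores)
                    clusters[best].append(obs)
                    moved = moved or (best != i)
            if not moved:
                break
        return clusters

The third strategy differs in that whole collections of observations are moved together rather than one observation at a time, which can escape local maxima that single-observation moves cannot.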
Once a clustering has been constructed, it is judged by analysts, often according to task-specific criteria. Several authors [Fisher, 1987a; 1987b; Cheeseman et al., 1988; Anderson & Matessa, 1991] have abstracted these criteria into a generic performance task akin to pattern completion, in which the error rate over completed patterns can be used to `externally' judge the utility of a clustering. In each of these systems, the objective function has been selected with this performance task in mind. Given this performance task, we adapt resampling-based pruning strategies used by supervised learning systems to the task of simplifying hierarchical clusterings, thus easing post-clustering analysis. Experimental evidence suggests that hierarchical clusterings can be greatly simplified with no increase in pattern-completion error rate.
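The following sketch shows one simple form of this performance task for numeric data: each test observation is classified using its visible attributes, and the hidden attribute is predicted from the host cluster's mean. The classification and prediction details here are assumptions for illustration only; the cited systems handle nominal attributes and probabilistic prediction differently.

    def completion_error(clusters, test_points, hidden):
        # Summarize each cluster by its mean vector.
        means = [[sum(xs) / len(c) for xs in zip(*c)]
                 for c in clusters if c]

        def visible_dist(m, p):
            # Distance over the visible attributes only.
            return sum((m[d] - p[d]) ** 2
                       for d in range(len(p)) if d != hidden)

        err = 0.0
        for p in test_points:
            host = min(means, key=lambda m: visible_dist(m, p))
            # Squared error on the hidden (to-be-completed) attribute.
            err += (host[hidden] - p[hidden]) ** 2
        return err / len(test_points)

Averaging such errors over attributes and test observations yields the `external' error rate by which competing clusterings can be compared.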
Our experiments with clustering simplification suggest `external' criteria of simplicity and classification cost, in addition to pattern-completion error rate, for judging the relative merits of differing objective functions in clustering. We suggest several objective functions, adapted from selection measures used in supervised decision-tree induction, that may do well along the dimensions of simplicity and error rate.