Footnotes

...moves.
Considering global changes also motivated redistribution of individual observations in ITERATE. As Nevins [1995] notes in commentary on experimental comparisons between ITERATE and COBWEB [Fisher, Xu, & Zard, 1992], even global movement of single observations typically did not perform as well as local movement of sets of observations simultaneously, as implemented by COBWEB's merging and splitting operators.

...repository.
A reduced mushroom data set was obtained by randomly selecting 1000 observations from the original data set.

...orderings.
A standard deviation of 0.00 indicates that the standard deviation was non-0, but not observable at the 2nd decimal place after rounding.

...seconds.
Routines were implemented in SUN Common Lisp, compiled, and run on a SUN 3/60.

...stabilization.
Similar timing results occur in other computational contexts as well. Consider the relation between insertion sort and Shell sort. Shell sort's final `pass' over a table is an insertion sort, which moves a table element only between consecutive table locations at each step. The large efficiency advantage of Shell sort stems from the fact that previous passes over the table have moved elements large distances, so that by the final pass the table is nearly sorted.
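
The analogy can be illustrated with a minimal Python sketch. The function names and the halving gap sequence are assumptions chosen for illustration; they are not taken from the paper. The sketch counts element moves per pass, so that the work done by the final (gap 1) pass can be compared with the work done by insertion sort alone on the same table.

```python
def insertion_sort_pass(table, gap):
    """Run one gapped insertion-sort pass in place; return the number of element moves."""
    moves = 0
    for i in range(gap, len(table)):
        item = table[i]
        j = i
        # Shift larger elements 'gap' positions to the right.
        while j >= gap and table[j - gap] > item:
            table[j] = table[j - gap]
            j -= gap
            moves += 1
        table[j] = item
    return moves


def shell_sort(table):
    """Sort in place with gaps n//2, n//4, ..., 1; return the moves made in each pass."""
    moves_per_pass = []
    gap = len(table) // 2
    while gap >= 1:
        moves_per_pass.append(insertion_sort_pass(table, gap))
        gap //= 2
    return moves_per_pass


if __name__ == "__main__":
    data = list(range(50, 0, -1))          # worst case for plain insertion sort
    plain_moves = insertion_sort_pass(list(data), 1)
    table = list(data)
    per_pass = shell_sort(table)
    print("insertion sort alone:", plain_moves, "moves")
    print("Shell sort moves per pass:", per_pass)
```

The last entry of the per-pass list is the cost of the final pass, which is itself an ordinary insertion sort; it can be no larger than the cost of insertion sort alone, because each earlier move removes an out-of-order pair.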

...observations.
Importantly, SNOB (and AUTOCLASS) assumes probabilistic assignment of observations to clusters.

...paper.
ITERATE uses a measure for redistribution [Fisher & Langley, 1990] that probably smooths `cliffs', and it uses an ISODATA, non-sequential version of redistribution.

...earlier.
Classification is identical to sorting except that the observation is not added to the clustering and statistics at each node encountered during sorting are not permanently updated to reflect the new observation.

...trials.
The `standard deviations' given in Row 3 are actually the mean of the standard deviations over the frontier sizes for individual variables.

...construction.
For purposes of evaluating the merits of our validation strategy in terms of error rate, we also held out a separate test set. Having demonstrated the point, however, we would not require that a separate test set be held out when using resampling as a validation strategy.

...removed,
The observation is physically removed, and variable value statistics at clusters that lie along the path from root to the observation are decremented.
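
A minimal Python sketch of this bookkeeping follows. The `Cluster` class, its fields, and the helper names are hypothetical and are not the paper's Lisp implementation; observations are assumed to be dictionaries mapping variables to values, each stored at its own leaf. The sketch does not handle clusters that become empty after a removal.

```python
from collections import Counter, defaultdict


class Cluster:
    """Hypothetical hierarchy node holding per-variable value counts."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.count = 0                           # observations under this node
        self.value_counts = defaultdict(Counter)  # variable -> Counter of values
        if parent is not None:
            parent.children.append(self)


def insert_leaf(parent, observation):
    """Add an observation as a new leaf and increment statistics up to the root."""
    leaf = Cluster(parent)
    node = leaf
    while node is not None:
        node.count += 1
        for variable, value in observation.items():
            node.value_counts[variable][value] += 1
        node = node.parent
    return leaf


def remove_leaf(leaf, observation):
    """Detach the leaf and decrement statistics at every cluster on its path to the root."""
    node = leaf.parent
    node.children.remove(leaf)   # physical removal
    leaf.parent = None
    while node is not None:
        node.count -= 1
        for variable, value in observation.items():
            node.value_counts[variable][value] -= 1
        node = node.parent
```

For example, after inserting two observations under the same cluster and removing one of them, the counts at that cluster and at the root reflect only the remaining observation.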

...nodes.
In fact, cost is not constant across observations, even those that are classified along exactly the same path -- the number of variables that one need test depends on the observation's values along previously examined variables.

...gain.
Jan Hajek independently pointed out the relationship between the CU measure and the Gini Index, and made suggestions on when one might select one or another of the normalizations above.

...prediction strategy [...].
Importantly, prediction with COBWEB is actually performed using a probability maximizing strategy -- the most frequent value of a variable at a cluster is always predicted. Fisher [1987b] discusses the advantage of constructing clusters with an implicit probability matching strategy, even in cases where these clusters will be exploited with a probability maximizing strategy.
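
The two strategies can be contrasted with a small Python sketch. The function names and the example counts are illustrative assumptions, not part of COBWEB. Given the value counts for one variable at a cluster, probability maximizing always predicts the most frequent value, whereas probability matching predicts each value with probability equal to its relative frequency. If the true value is drawn from the same distribution, the expected accuracies are the largest relative frequency and the sum of squared relative frequencies, respectively.

```python
import random
from collections import Counter


def predict_maximizing(counts):
    """Predict the most frequent value at the cluster."""
    return max(counts, key=counts.get)


def predict_matching(counts, rng=random):
    """Predict a value with probability equal to its relative frequency."""
    values = list(counts)
    weights = [counts[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


def expected_accuracy_maximizing(counts):
    """Expected accuracy of always guessing the mode: max_i p_i."""
    total = sum(counts.values())
    return max(counts.values()) / total


def expected_accuracy_matching(counts):
    """Expected accuracy of frequency-proportional guessing: sum_i p_i ** 2."""
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values())


if __name__ == "__main__":
    # Hypothetical value counts for one variable at one cluster.
    counts = Counter({"convex": 7, "flat": 3})
    print(predict_maximizing(counts))
    print(expected_accuracy_maximizing(counts), expected_accuracy_matching(counts))
```

With these hypothetical counts, maximizing is expected to be correct 70% of the time and matching about 58% of the time, so maximizing is never the worse choice at prediction time.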

...decoupled).
The MML and Bayesian approaches of SNOB and AUTOCLASS support probabilistic assignment of observations to clusters, but the importance of decoupling and cohesion remains.


JAIR, 4
Douglas H. Fisher
Sat Mar 30 11:37:23 CST 1996