Evaluation

To evaluate the post-processor, it was applied to all data sets containing continuous attributes from the UCI machine learning repository [Murphy and Aha, 1993] that were at that time held (due to previous machine learning experimentation) in the local repository at Deakin University. These data sets are believed to be broadly representative of those in the repository as a whole. After experimentation with these eleven data sets, two additional data sets, sick euthyroid and discordant results, were retrieved from the UCI repository and added to the study in order to investigate specific issues, as discussed below.

The resulting thirteen data sets are described in Table 1. The second column gives the number of attributes by which each object is described. The third gives the percentage of these attributes that are continuous. The fourth column gives the percentage of attribute values in the data that are missing (unknown). The fifth column gives the number of objects that the data set contains. The sixth column gives the percentage of these objects that belong to the class represented by the most objects within the data set. The final column gives the number of classes that the data set describes. Note that the glass type data set uses the three-class Float/Not Float/Other classification rather than the more commonly used six-class classification.

Each data set was divided into training and evaluation sets 100 times. Each training set consisted of 80% of the data, randomly selected. Each evaluation set consisted of the remaining 20% of the data. Both C4.5 and C4.5X were applied to each of the resulting 1300 (13 data sets by 100 trials) training and evaluation set pairs.
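
The resampling procedure described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name `repeated_holdout`, the fixed seed, and the rounding of the training set size are all assumptions, and the list of integers merely stands in for a data set of 150 objects.

```python
import random

def repeated_holdout(objects, n_trials=100, train_fraction=0.8, seed=0):
    """Yield (training, evaluation) pairs from repeated random splits."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    n_train = round(len(objects) * train_fraction)  # rounding rule is assumed
    for _ in range(n_trials):
        shuffled = objects[:]          # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        # First 80% forms the training set, the remainder the evaluation set.
        yield shuffled[:n_train], shuffled[n_train:]

# Hypothetical stand-in for a data set of 150 objects.
data = list(range(150))
pairs = list(repeated_holdout(data))
print(len(pairs), len(pairs[0][0]), len(pairs[0][1]))
```

Repeating this for each of the thirteen data sets gives the 1300 training and evaluation set pairs to which both systems are applied.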

**Table 1:** UCI data sets used for experimentation
Name	No. of attrs.	% continuous	% missing	No. of objects	% most common class	No. of classes
breast cancer Wisconsin	9	100	<1	699	66	2
Cleveland heart disease	13	46	<1	303	54	2
credit rating	15	40	1	690	56	2
discordant results	29	24	6	3772	98	2
echocardiogram	6	83	3	74	68	2
glass type	9	100	0	214	40	3
hepatitis	19	32	6	155	79	2
Hungarian heart disease	13	46	20	295	64	2
hypothyroid	29	24	6	3772	92	4
iris	4	100	0	150	33	3
new thyroid	5	100	0	215	70	3
Pima indians diabetes	8	100	0	768	65	2
sick euthyroid	29	24	6	3772	94	2

Table 2 summarizes the percentage predictive accuracy obtained for the unpruned decision trees generated by both C4.5 and C4.5X. For each data set it presents the mean and standard deviation (s) over the 100 trials for both C4.5 and C4.5X, along with the results of a two-tailed matched-pairs t-test comparing these means. For twelve of the thirteen data sets C4.5X obtained a higher mean accuracy than C4.5. For the remaining data set, hypothyroid, C4.5 obtained higher mean predictive accuracy than C4.5X, albeit by a small margin (measured to two decimal places, the mean accuracies were 99.51 and 99.46, respectively). For nine of the data sets the advantage toward C4.5X is statistically significant at the 0.05 level (p<=0.05), although the advantage with respect to the discordant results data is too small to be apparent when measured to one decimal place (measured to two decimal places the values are 98.58 and 98.62, respectively). The advantage toward C4.5 for the hypothyroid data is also statistically significant at the 0.05 level. The differences in mean predictive accuracy for the Hungarian heart disease, new thyroid and sick euthyroid data sets are not significant at the 0.05 level.
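
As an illustration of the matched-pairs comparison, the following sketch computes the t statistic from per-trial accuracies. The accuracy values are hypothetical and are not taken from the experiments, and the helper name `paired_t` is an assumption. The differences are taken as C4.5 minus C4.5X, which appears to be the convention behind the tables, since t is negative wherever C4.5X has the higher mean.

```python
import math
import statistics

def paired_t(a, b):
    """Return the matched-pairs t statistic for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]   # one difference per trial
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)          # sample standard deviation (n - 1)
    return mean_d / (sd_d / math.sqrt(n))   # mean difference over its standard error

# Hypothetical per-trial accuracies for five trials, NOT values from the study.
c45 = [94.0, 93.5, 95.0, 94.5, 93.0]
c45x = [94.5, 94.0, 95.0, 95.5, 93.5]
t = paired_t(c45, c45x)
print(round(t, 2))
```

The two-tailed p value would then be obtained from the t distribution with n - 1 degrees of freedom (99 for the 100 trials used here); that step is omitted to keep the sketch free of external libraries.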

**Table 2:** Percentage predictive accuracy for unpruned decision trees.
	C4.5		C4.5X
Name	mean	s	mean	s	t	p
breast cancer Wisconsin	94.1	1.8	94.4	1.7	-3.2	0.002
Cleveland heart disease	72.8	5.0	74.4	4.8	-6.1	0.000
credit rating	82.2	3.4	83.0	3.3	-7.6	0.000
discordant results	98.6	0.5	98.6	0.5	-5.4	0.000
echocardiogram	72.0	9.8	73.5	10.2	-2.8	0.007
glass type	74.0	7.0	75.3	7.2	-4.2	0.000
hepatitis	79.6	7.1	80.8	6.9	-3.3	0.001
Hungarian heart disease	77.0	5.3	77.4	5.2	-1.8	0.082
hypothyroid	99.5	0.2	99.5	0.2	4.4	0.000
iris	95.4	3.4	95.7	3.5	-2.2	0.028
new thyroid	89.9	4.2	90.1	4.3	-1.0	0.302
Pima indians diabetes	70.2	3.5	71.3	3.6	-8.1	0.000
sick euthyroid	98.7	0.5	98.7	0.5	-0.0	0.963

Table 3 uses the same format as Table 2 to summarize the predictive accuracy obtained for the pruned decision trees generated by both C4.5 and C4.5X. For the same twelve data sets C4.5X obtained a higher mean predictive accuracy than C4.5. For the remaining data set, hypothyroid, C4.5 again obtained higher mean predictive accuracy, although again the difference is too small to be apparent at the level of precision displayed (measured to two decimal places the mean accuracies are 99.51 and 99.46). For six of the data sets the advantage toward C4.5X is statistically significant at the 0.05 level, although for the discordant results data the difference is only apparent at a precision of two decimal places (98.81 and 98.82, respectively). The advantage toward C4.5 for the hypothyroid data is also statistically significant at the 0.05 level. The differences for breast cancer Wisconsin, echocardiogram, Hungarian heart disease, iris, new thyroid and sick euthyroid are not statistically significant at the 0.05 level.

After experimentation on the initial eleven data sets was completed, the results for the hypothyroid data stood in stark contrast to those for the other ten. This raised the possibility that there might be distinguishing features of the hypothyroid data that accounted for this difference in performance. Table 1 indicates that this data set is clearly distinguishable from the other ten initial data sets in the following six respects--

To explore these issues, the discordant results and sick euthyroid data sets were retrieved from the UCI repository and added to the study. These data sets are identical to the hypothyroid data set except that each has a different class attribute. All three data sets contain the same objects, described by the same attributes. The addition of the discordant results and sick euthyroid data did little to illuminate this issue, however. For all three data sets the changes in accuracy are of very small magnitude. For hypothyroid there is a significant advantage to C4.5. For sick euthyroid there is no significant advantage to either system. For the discordant results data there is a significant advantage to C4.5X.

The question of whether there is a distinguishing feature of the hypothyroid data that explains the observed results remains unanswered. Further investigation of this issue lies beyond the scope of the current paper but remains an interesting direction for future research.

**Table 3:** Percentage accuracy for pruned decision trees.
	C4.5		C4.5X
Name	mean	s	mean	s	t	p
breast cancer Wisconsin	95.1	1.7	95.2	1.7	-2.0	0.051
Cleveland heart disease	74.1	5.3	74.8	5.3	-3.7	0.000
credit rating	84.1	3.2	84.6	3.2	-5.3	0.000
discordant results	98.8	0.4	98.8	0.4	-2.6	0.010
echocardiogram	74.2	9.3	75.1	9.8	-1.6	0.118
glass type	74.4	6.9	75.4	6.9	-3.3	0.001
hepatitis	79.9	6.2	80.7	6.2	-3.0	0.003
Hungarian heart disease	79.2	4.9	79.4	4.8	-1.0	0.310
hypothyroid	99.5	0.2	99.5	0.2	5.4	0.000
iris	95.4	3.6	95.7	3.7	-1.6	0.109
new thyroid	89.6	4.2	89.8	4.2	-0.8	0.451
Pima indians diabetes	72.2	3.5	72.8	3.5	-5.9	0.000
sick euthyroid	98.7	0.4	98.7	0.4	-0.7	0.480

These results suggest that, for the type of data found in the UCI repository, C4.5X's post-processing increases predictive accuracy more often than it decreases it. (Of the twenty-six comparisons, fifteen showed a significant increase and only two a significant decrease. A sign test reveals that this rate of success is significant at the 0.05 level, p=0.001.)
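
The sign test can be reproduced approximately with the short sketch below. It assumes that the nine comparisons without a significant difference are discarded as ties and that the test is one-tailed; under those assumptions, which are not stated in the text, the result rounds to the reported 0.001. The function name `sign_test_one_tailed` is hypothetical.

```python
from math import comb

def sign_test_one_tailed(wins, losses):
    """Probability of at most `losses` losses in wins + losses fair coin flips."""
    n = wins + losses
    # Sum the binomial probabilities for 0..losses losses under p = 0.5.
    return sum(comb(n, k) for k in range(losses + 1)) / 2 ** n

# Fifteen significant increases and two significant decreases.
p = sign_test_one_tailed(15, 2)
print(round(p, 4))
```

A two-tailed version would double this value, which would still fall well below the 0.05 level.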

Tables 4 and 5 summarize the number of nodes in the decision trees developed. Table 4 addresses unpruned decision trees and Table 5 addresses pruned decision trees. Each post-processing modification replaces a single leaf with a split and two leaves. At most one such modification can be performed per leaf in the original tree. For all data sets the post-processed decision trees are significantly more complex than the original decision trees. In most cases post-processing has increased the mean number of nodes in the decision trees by approximately 50%. This demonstrates that the post-processing is causing substantial change.

**Table 4:** Number of nodes for unpruned decision trees.
	C4.5		C4.5X
Name	mean	s	mean	s	t	p
breast cancer Wisconsin	38.1	6.0	64.0	10.3	-51.5	0.000
Cleveland heart disease	66.7	7.1	100.2	11.3	-61.9	0.000
credit rating	117.6	18.1	177.9	28.4	-44.2	0.000
discordant results	64.0	10.6	85.2	16.2	-33.3	0.000
echocardiogram	15.4	4.1	22.1	6.3	-26.1	0.000
glass type	43.0	5.2	69.7	8.4	-57.2	0.000
hepatitis	24.5	4.2	34.8	6.0	-49.1	0.000
Hungarian heart disease	62.1	7.5	94.8	13.0	-50.1	0.000
hypothyroid	29.4	4.4	47.5	7.1	-57.8	0.000
iris	9.0	1.9	16.0	4.0	-31.5	0.000
new thyroid	14.7	2.4	23.4	3.8	-41.5	0.000
Pima indians diabetes	164.8	10.8	238.8	16.3	-108.9	0.000
sick euthyroid	71.7	6.6	111.4	12.1	-65.8	0.000
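
The relative growth in tree size can be checked directly against the means in Table 4. The sketch below copies those means and computes the percentage increase for each data set; the variable names are illustrative only.

```python
# Mean node counts for unpruned trees (C4.5, C4.5X), copied from Table 4.
table4 = {
    "breast cancer Wisconsin": (38.1, 64.0),
    "Cleveland heart disease": (66.7, 100.2),
    "credit rating": (117.6, 177.9),
    "discordant results": (64.0, 85.2),
    "echocardiogram": (15.4, 22.1),
    "glass type": (43.0, 69.7),
    "hepatitis": (24.5, 34.8),
    "Hungarian heart disease": (62.1, 94.8),
    "hypothyroid": (29.4, 47.5),
    "iris": (9.0, 16.0),
    "new thyroid": (14.7, 23.4),
    "Pima indians diabetes": (164.8, 238.8),
    "sick euthyroid": (71.7, 111.4),
}

# Percentage increase in mean node count caused by post-processing.
increase = {name: 100 * (x - c) / c for name, (c, x) in table4.items()}

for name, pct in sorted(increase.items(), key=lambda item: item[1]):
    print(f"{name:26s}{pct:6.1f}%")
```

On these figures the increases range from roughly 33% (discordant results) to roughly 78% (iris), with most data sets falling between about 40% and 60%.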

**Table 5:** Number of nodes for pruned decision trees.
	C4.5		C4.5X
Name	mean	s	mean	s	t	p
breast cancer Wisconsin	19.2	5.0	33.1	8.6	-34.9	0.000
Cleveland heart disease	44.6	8.3	68.3	12.8	-43.6	0.000
credit rating	51.2	14.8	78.4	24.2	-25.8	0.000
discordant results	24.9	5.6	32.5	8.8	-21.1	0.000
echocardiogram	10.4	3.0	14.8	4.8	-21.0	0.000
glass type	36.6	5.5	61.0	9.5	-48.5	0.000
hepatitis	13.7	4.8	19.8	6.6	-30.7	0.000
Hungarian heart disease	26.8	11.4	41.2	17.3	-22.1	0.000
hypothyroid	23.6	2.9	37.1	5.6	-46.7	0.000
iris	8.2	1.9	14.8	3.9	-30.3	0.000
new thyroid	14.1	2.7	22.5	4.3	-36.9	0.000
Pima indians diabetes	112.0	16.4	163.9	24.0	-62.5	0.000
sick euthyroid	46.5	5.8	72.6	8.7	-76.7	0.000