Additional comparison to one-sided selection and SHRINK


For the oil dataset, we also followed a slightly different line of experiments to obtain results comparable to [9]. To alleviate the problem of imbalanced datasets, the authors proposed (a) one-sided selection for under-sampling the majority class [17] and (b) the SHRINK system [9]. Table 4 contains the results from [9]. Acc+ is the accuracy on the positive (minority) examples and Acc- is the accuracy on the negative (majority) examples. Figure 25 shows the trend of Acc+ and Acc- for one combination of the SMOTE strategy and varying degrees of under-sampling of the majority class. The Y-axis represents the accuracy and the X-axis represents the percentage of the majority class under-sampled. The graphs indicate that in the band of under-sampling between 50% and 125% the results are comparable to those achieved by SHRINK, and better than SHRINK in some cases: at 50% under-sampling, for example, Acc+ is 89.5% and Acc- is 78.9%, against 82.5% and 60.9% for SHRINK. Table 5 summarizes the results for the combination of SMOTE at 500% and under-sampling. We also tried combinations of SMOTE at 100-400% with varying degrees of under-sampling and achieved comparable results. The SHRINK approach and our SMOTE approach are not directly comparable, though, as they see different data points. SMOTE offers no clear improvement over one-sided selection.
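As an illustration of the two measures defined above, the following minimal Python sketch computes Acc+ and Acc- from true and predicted labels. The function name `class_accuracies`, the label encoding (1 for the positive class), and the toy labels are assumptions made for this example; they are not taken from the experiments.

```python
def class_accuracies(y_true, y_pred, positive=1):
    """Return (acc_pos, acc_neg): accuracy on positive and on negative examples.

    Hypothetical helper for illustration; `positive` marks the minority label.
    """
    pos_total = pos_correct = 0
    neg_total = neg_correct = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            pos_total += 1
            pos_correct += int(pred == truth)
        else:
            neg_total += 1
            neg_correct += int(pred == truth)
    if pos_total == 0 or neg_total == 0:
        raise ValueError("both classes must be present in y_true")
    return pos_correct / pos_total, neg_correct / neg_total


# Toy labels (made up for this sketch): 4 positive and 6 negative examples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
acc_pos, acc_neg = class_accuracies(y_true, y_pred)
print(f"Acc+ = {acc_pos:.1%}, Acc- = {acc_neg:.1%}")
```

Reporting the two accuracies separately keeps the minority-class performance visible, whereas a single overall accuracy would be dominated by the majority class.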

**Figure 25:** SMOTE (500 OU) and Under-sampling combination performance
[Figure: Acc+ and Acc- (Y-axis: accuracy) plotted against the percentage of the majority class under-sampled (X-axis); source file myunder5.eps]

**Table 4:** Cross-validation results (Kubat et al., 1998)
Method	Acc+	Acc-
SHRINK	82.5%	60.9%
One-sided selection	76.0%	86.6%

**Table 5:** Cross-validation results for SMOTE at 500% combined with under-sampling on the Oil data set.
Under-sampling %	Acc+	Acc-
10%	64.7%	94.2%
15%	62.8%	91.3%
25%	64.0%	89.1%
50%	89.5%	78.9%
75%	83.7%	73.0%
100%	78.3%	68.7%
125%	84.2%	68.1%
150%	83.3%	57.8%
175%	85.0%	57.8%
200%	81.7%	56.7%
300%	89.0%	55.0%
400%	95.5%	44.2%
500%	98.0%	35.5%
600%	98.0%	40.0%
700%	96.0%	32.8%
800%	90.7%	33.3%
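The comparison between the two tables can be checked mechanically. The sketch below copies the values from Tables 4 and 5 and lists the under-sampling levels at which the SMOTE combination meets or exceeds a baseline on both Acc+ and Acc-. The function name `dominating_levels` and the both-measures criterion are assumptions chosen for this example; they are one possible reading of "comparable or better", not a procedure described in the text.

```python
# (Acc+, Acc-) in percent, copied from Table 4.
SHRINK = (82.5, 60.9)
ONE_SIDED_SELECTION = (76.0, 86.6)

# Under-sampling % -> (Acc+, Acc-) in percent, copied from Table 5.
SMOTE_500 = {
    10: (64.7, 94.2), 15: (62.8, 91.3), 25: (64.0, 89.1), 50: (89.5, 78.9),
    75: (83.7, 73.0), 100: (78.3, 68.7), 125: (84.2, 68.1), 150: (83.3, 57.8),
    175: (85.0, 57.8), 200: (81.7, 56.7), 300: (89.0, 55.0), 400: (95.5, 44.2),
    500: (98.0, 35.5), 600: (98.0, 40.0), 700: (96.0, 32.8), 800: (90.7, 33.3),
}


def dominating_levels(baseline, results):
    """Return the under-sampling levels whose Acc+ and Acc- both reach the baseline."""
    base_pos, base_neg = baseline
    return [
        level
        for level, (acc_pos, acc_neg) in sorted(results.items())
        if acc_pos >= base_pos and acc_neg >= base_neg
    ]


print(dominating_levels(SHRINK, SMOTE_500))               # [50, 75, 125]
print(dominating_levels(ONE_SIDED_SELECTION, SMOTE_500))  # []
```

Under this criterion the levels 50%, 75%, and 125% exceed SHRINK on both measures, while no level matches one-sided selection on both, which agrees with the observations in the text. As noted there, the methods saw different data points, so this is only a rough comparison.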

Nitesh Chawla (CS)
6/2/2002