Next: Future Work Up: Experiments Previous: Additional comparison to changing

Additional comparison to one-sided selection and SHRINK

For the oil dataset, we also followed a slightly different line of experiments to obtain results comparable to [9]. To alleviate the problem of imbalanced datasets, the authors have proposed (a) one-sided selection for under-sampling the majority class [17] and (b) the SHRINK system [9]. Table 4 contains the results from [9]. Acc+ is the accuracy on the positive (minority) examples and Acc- is the accuracy on the negative (majority) examples.

Figure 25 shows the trend in Acc+ and Acc- for one combination of the SMOTE strategy and varying degrees of under-sampling of the majority class. The Y-axis represents the accuracy and the X-axis represents the percentage at which the majority class is under-sampled. The graphs indicate that in the band of under-sampling between 50% and 125% the results are comparable to those achieved by SHRINK, and better than SHRINK in some cases. Table 5 summarizes the results for the combination of SMOTE at 500% and under-sampling. We also tried combinations of SMOTE at 100-400% and varying degrees of under-sampling and achieved comparable results. The SHRINK approach and our SMOTE approach are not directly comparable, though, as they see different data points. SMOTE offers no clear improvement over one-sided selection.
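To make the two metrics concrete, the following is a minimal sketch of how Acc+ and Acc- could be computed from a set of predictions. The function name `class_accuracies` and the toy labels are hypothetical and are not taken from the experiments; the sketch only assumes the definitions given above (per-class accuracy on the positive and negative examples).

```python
from typing import Sequence, Tuple


def class_accuracies(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Return (Acc+, Acc-): accuracy on positive and on negative examples."""
    pos_total = pos_correct = neg_total = neg_correct = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            pos_total += 1
            pos_correct += int(pred == truth)
        else:
            neg_total += 1
            neg_correct += int(pred == truth)
    if pos_total == 0 or neg_total == 0:
        raise ValueError("both classes must be present in y_true")
    return pos_correct / pos_total, neg_correct / neg_total


# Hypothetical toy labels (1 = minority/positive, 0 = majority/negative).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

acc_pos, acc_neg = class_accuracies(y_true, y_pred)
print(f"Acc+ = {acc_pos:.3f}, Acc- = {acc_neg:.3f}")  # Acc+ = 0.750, Acc- = 0.667
```

Reporting the two numbers separately, rather than a single overall accuracy, keeps a classifier that ignores the minority class from looking good simply because the majority class dominates the data.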


  
Figure 25: SMOTE (500 OU) and Under-sampling combination performance
[Figure: myunder5.eps]


  
Table 4: Cross-validation results (Kubat et al., 1998)
Method                 Acc+     Acc-
SHRINK                 82.5%    60.9%
One-sided selection    76.0%    86.6%



  
Table 5: Cross-validation results for SMOTE at 500% on the Oil data set.
Under-sampling %    Acc+     Acc-
10%                 64.7%    94.2%
15%                 62.8%    91.3%
25%                 64.0%    89.1%
50%                 89.5%    78.9%
75%                 83.7%    73.0%
100%                78.3%    68.7%
125%                84.2%    68.1%
150%                83.3%    57.8%
175%                85.0%    57.8%
200%                81.7%    56.7%
300%                89.0%    55.0%
400%                95.5%    44.2%
500%                98.0%    35.5%
600%                98.0%    40.0%
700%                96.0%    32.8%
800%                90.7%    33.3%
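
The comparison between Table 4 and Table 5 can be checked mechanically. The sketch below is an illustration only: it copies the values from the two tables and lists the under-sampling levels at which the SMOTE 500% combination matches or exceeds a reference method on both Acc+ and Acc-. The helper name `dominating_levels` and the "at least as good on both metrics" criterion are assumptions chosen for this example, not the criterion used in the experiments.

```python
from typing import Dict, List, Tuple

# Values copied from Table 4: (Acc+, Acc-) in percent.
SHRINK = (82.5, 60.9)
ONE_SIDED_SELECTION = (76.0, 86.6)

# Values copied from Table 5: under-sampling % -> (Acc+, Acc-) in percent.
SMOTE_500: Dict[int, Tuple[float, float]] = {
    10: (64.7, 94.2), 15: (62.8, 91.3), 25: (64.0, 89.1), 50: (89.5, 78.9),
    75: (83.7, 73.0), 100: (78.3, 68.7), 125: (84.2, 68.1), 150: (83.3, 57.8),
    175: (85.0, 57.8), 200: (81.7, 56.7), 300: (89.0, 55.0), 400: (95.5, 44.2),
    500: (98.0, 35.5), 600: (98.0, 40.0), 700: (96.0, 32.8), 800: (90.7, 33.3),
}


def dominating_levels(
    results: Dict[int, Tuple[float, float]], reference: Tuple[float, float]
) -> List[int]:
    """Under-sampling levels whose Acc+ and Acc- both reach the reference."""
    ref_pos, ref_neg = reference
    return [
        level
        for level, (acc_pos, acc_neg) in results.items()
        if acc_pos >= ref_pos and acc_neg >= ref_neg
    ]


print(dominating_levels(SMOTE_500, SHRINK))               # [50, 75, 125]
print(dominating_levels(SMOTE_500, ONE_SIDED_SELECTION))  # []
```

Under this strict criterion, the levels that match or beat SHRINK all fall within the 50% to 125% band, and no level matches one-sided selection on both metrics at once, which is consistent with the observations in the text.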



Nitesh Chawla (CS)
6/2/2002