Next: Datasets Up: SMOTE: Synthetic Minority Over-sampling Previous: Under-sampling and SMOTE Combination

Experiments

We used three different machine learning algorithms for our experiments. Figure 6 provides an overview of our experiments.
  
Figure 6: Experiments Overview
[figure: experiments.eps]

1. C4.5: We compared various combinations of SMOTE and under-sampling with plain under-sampling, using C4.5 release 8 [21] as the base classifier.
2. Ripper: We compared various combinations of SMOTE and under-sampling with plain under-sampling, using Ripper [22] as the base classifier. We also varied Ripper's loss ratio [31,4] from 0.9 to 0.001 (as a means of varying misclassification cost) and compared the effect of this variation with the combination of SMOTE and under-sampling. Reducing the loss ratio from 0.9 to 0.001 allowed us to build a set of rules for the minority class.
3. Naive Bayes Classifier: The Naive Bayes classifier can be made cost-sensitive by varying the priors of the minority class. We varied the prior of the minority class from 1 to 50 times that of the majority class and compared the results with C4.5's SMOTE and under-sampling combination.
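As a sketch of how varying class priors makes a Naive Bayes classifier cost-sensitive (item 3 above), the toy example below hand-rolls a one-dimensional Gaussian Naive Bayes on hypothetical imbalanced data. The dataset, the Gaussian parameters, and the cap on the boosted prior are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy imbalanced data (hypothetical): 200 majority (class 0) and
# 20 minority (class 1) examples with 1-D Gaussian features.
X0 = rng.normal(0.0, 1.0, 200)
X1 = rng.normal(2.0, 1.0, 20)

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Class-conditional parameters estimated from the samples.
mu0, s0 = X0.mean(), X0.std()
mu1, s1 = X1.mean(), X1.std()

def predict(x, prior_ratio):
    """Classify x with the minority prior scaled to prior_ratio times
    its empirical value (1x to 50x in the text), capped below 1."""
    p1 = len(X1) / (len(X0) + len(X1))
    p1_adj = min(p1 * prior_ratio, 0.999)  # boosted minority prior
    p0_adj = 1.0 - p1_adj
    score0 = p0_adj * gaussian_pdf(x, mu0, s0)
    score1 = p1_adj * gaussian_pdf(x, mu1, s1)
    return int(score1 > score0)

# A point near the majority mode flips to the minority class
# once its prior is inflated enough.
labels = [predict(0.8, r) for r in (1, 5, 50)]
print(labels)
```

Scaling the minority prior shifts the decision boundary toward the majority class, trading false positives for true positives without touching the training data.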

These different learning algorithms allowed SMOTE to be compared with some methods that handle misclassification costs directly. %FP and %TP were averaged over 10-fold cross-validation runs for each data combination. The minority class was over-sampled by computing each minority example's five nearest neighbors and generating synthetic examples from them. The AUC was calculated using the trapezoidal rule, after extrapolating an extra point of TP = 100% and FP = 100% for each ROC curve. We also computed the ROC convex hull to identify the optimal classifiers, as the points lying on the hull are potentially optimal classifiers [1].
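The ROC computations described above can be sketched as follows. The (FP%, TP%) points are hypothetical, and the hull routine is a standard monotone-chain upper hull, one plausible way to compute the ROC convex hull; neither is taken from the paper's code.

```python
def trapezoidal_auc(points):
    """AUC via the trapezoidal rule over (FP%, TP%) points,
    assuming the extrapolated (100, 100) corner is already included."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / (100.0 * 100.0)  # rescale percentages to [0, 1]

def roc_upper_hull(points):
    """Upper convex hull of ROC points; classifiers on it are
    the potentially optimal ones."""
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    hull = []
    for p in sorted(points):
        # Pop points that would make the chain convex-from-below.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull

# Hypothetical ROC sweep, with the extrapolated (100, 100) point appended.
roc = [(0, 0), (10, 60), (25, 80), (30, 70), (50, 92), (100, 100)]
print(trapezoidal_auc(roc))
print(roc_upper_hull(roc))  # the dominated point (30, 70) drops off the hull
```

Keeping the raw curve and its hull separate mirrors the evaluation in the text: the AUC summarizes a single classifier, while the hull compares the whole family of classifiers produced by different sampling combinations.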



Nitesh Chawla (CS)
6/2/2002