SVM Decision Boundary Based
Discriminative Subspace Induction

Jiayong Zhang   Yanxi Liu

Abstract

We study the problem of linear dimension reduction for classification, with a focus on sufficient dimension reduction, i.e., finding subspaces without loss of discrimination power. First, we formulate the concept of a sufficient subspace for classification in terms parallel to those used for regression. Then we present a new method to estimate the smallest sufficient subspace based on an improvement of decision boundary analysis (DBA). The main idea is to combine DBA with support vector machines (SVM) to overcome the inherent difficulty of DBA in small sample size situations while keeping DBA's estimation simplicity. The compact representation of the SVM boundary results in a significant gain in both speed and accuracy over previous DBA implementations. Alternatively, this technique can be viewed as a way to reduce the run-time complexity of SVM itself. Comparative experiments on one simulated and four real-world benchmark datasets highlight the superior performance of the proposed approach.
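
To make the idea concrete, here is a minimal sketch of decision boundary analysis driven by an SVM-style decision function. It is not the paper's implementation: the support vectors, coefficients, bias, and polynomial degree are hypothetical hand-picked values rather than the output of a trained SVM, and all helper names are illustrative. The sketch locates boundary points by bisection between oppositely labeled samples, averages the outer products of the unit boundary normals at those points, and extracts the leading direction of that matrix by power iteration. It uses only the Python standard library.

```python
import itertools
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical SVM-style model (hand-picked, NOT trained): polynomial kernel
# (s . x + 1)^P with signed coefficients COEFS (alpha_i * y_i) and bias B.
SVS = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, -1.0, 0.0)]
COEFS = [1.0, 0.5, -1.0]
B = 0.0
P = 2

def decision(x):
    """Decision value f(x) = sum_i coef_i * (sv_i . x + 1)^P + B."""
    return sum(c * (dot(s, x) + 1.0) ** P for s, c in zip(SVS, COEFS)) + B

def gradient(x):
    """Gradient of f at x; on the boundary it is normal to the boundary."""
    g = [0.0] * len(x)
    for s, c in zip(SVS, COEFS):
        w = c * P * (dot(s, x) + 1.0) ** (P - 1)
        for j, sj in enumerate(s):
            g[j] += w * sj
    return g

def boundary_point(neg, pos, iters=60):
    """Bisect along the segment neg -> pos until f is (numerically) zero."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        x = [a + mid * (b - a) for a, b in zip(neg, pos)]
        if decision(x) < 0:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return [a + t * (b - a) for a, b in zip(neg, pos)]

def boundary_scatter(negs, poss):
    """Average outer product of unit normals over sampled boundary points."""
    d = len(negs[0])
    M = [[0.0] * d for _ in range(d)]
    count = 0
    for a in negs:
        for b in poss:
            g = gradient(boundary_point(a, b))
            norm = math.sqrt(dot(g, g))
            if norm == 0.0:
                continue  # no usable normal at this point
            n = [v / norm for v in g]
            for i in range(d):
                for j in range(d):
                    M[i][j] += n[i] * n[j]
            count += 1
    return [[v / count for v in row] for row in M]

def top_eigenvector(M, iters=200):
    """Leading eigenvector of a symmetric PSD matrix by power iteration."""
    v = [1.0] * len(M)
    for _ in range(iters):
        w = [dot(row, v) for row in M]
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return v

# Toy samples on a 3x3x3 grid, labeled by the sign of the decision function.
grid = list(itertools.product((-2.0, 0.0, 2.0), repeat=3))
negs = [x for x in grid if decision(x) < 0]
poss = [x for x in grid if decision(x) > 0]

M = boundary_scatter(negs, poss)
v = top_eigenvector(M)
print("scatter matrix:", M)
print("leading direction:", v)
```

Because every hypothetical support vector has a zero third coordinate, the boundary normals have no third component. The third row and column of `M` therefore vanish, which marks the third feature as irrelevant for discrimination in this toy setting.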

Publication

Pattern Recognition, 38(10):1746-1758, 2005

Results

Table 3. Summary of dataset information

Dataset    #Classes   #Features   #Samples   Benchmark error (%)
WAVE-40    3          21+19                  14
PIMA       2          8           768        22.3
VEHICLE    4          18          846        15.0
LETTER     26         16          20,000     6.4
MFEAT      10         649         2,000      2.3

Table 4. Experimental setup

Dataset (#Training, #Test) p C r
WAVE-40 (100–1500, 5000) 3 0.01–0.6 1.0 0.9
PIMA 12-fold cross validation 2 2–60 0.2 0.04
VEHICLE 9-fold cross validation 5 0.5–60 0.2 0.05
LETTER (15,000, 5000) 5 2–100 0.2 0.005
MFEAT (500, 1500) 5 2 0.6 5.0


Fig. 1. Quality of IDS estimation as a function of sample size. An SVM classifier is used for evaluation.


Fig. 2. Scatter plots of training and test samples from WAVE-40 in the 2D intrinsic discriminative subspaces estimated via different methods. The number of training samples n=500.


Fig. 3. Comparison of subspaces over all output dimensions on real-world datasets using an SVM classifier.


Fig. 4. Comparison of subspaces over all output dimensions on real-world datasets using a 1-NN classifier.


Last update: Nov 18, 2005