Results on TREC10 data (84 topic categories)

2.0 kNN(standard)

Result tuned for macrof0.5 and result tuned for microf0.5.
 
 

The graph tuning the number of selected features for f0.5 (it shows that 8000 is a good value for both microf0.5 and macrof0.5)
The graph tuning the fbr score for f0.5 (0.5 is the best value for microf0.5 and 0.1 for macrof0.5)
The graph tuning kn for f0.5 (100 is a good value for both microf0.5 and macrof0.5)
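Since every result here is reported as macro or micro averaged f0.5, a minimal sketch of the two averages may help. It assumes the standard F-beta definition (beta = 0.5 weights precision more than recall) computed from per-category contingency counts; the function names and the (tp, fp, fn) input layout are illustrative and are not taken from the evaluation scripts used for these runs.

```python
def f_beta(tp, fp, fn, beta=0.5):
    """F_beta from true positives, false positives and false negatives.

    Equivalent to (1 + b^2) * P * R / (b^2 * P + R); returns 0.0 when undefined.
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def macro_micro_f(counts, beta=0.5):
    """counts: list of (tp, fp, fn) tuples, one per category (assumed layout)."""
    # Macro average: compute F_beta per category, then average the scores,
    # so every category counts equally regardless of its frequency.
    macro = sum(f_beta(tp, fp, fn, beta) for tp, fp, fn in counts) / len(counts)
    # Micro average: pool the counts over all categories, then compute one score,
    # so frequent categories dominate.
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    micro = f_beta(tp, fp, fn, beta)
    return macro, micro


if __name__ == "__main__":
    # One frequent, easy category and one rare, hard category (made-up counts).
    print(macro_micro_f([(90, 10, 10), (1, 0, 9)]))
```

Because the macro average gives the rare category the same weight as the frequent one, it is plausible that the two measures prefer different parameter settings, as with the fbr values above.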
 

2.1 Rocchio

Result tuned for macrof0.5 and Result tuned for microf0.5
(This is the result using scut with s tuned. I also tried rcut with r=3, but rcut performs clearly worse. I plotted a graph that compares
recall, precision, f1 and f0.5 between rcut with r=3 and scut with s tuned.)
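The two thresholding strategies compared above can be sketched as follows. This is a simplified illustration, not the code used for these runs: it assumes rcut assigns each document its r top-scoring categories, and scut picks one score threshold s per category by maximizing F-beta on validation data. The data layout (nested dicts of scores) and all function names are hypothetical.

```python
def f_beta(tp, fp, fn, beta=0.5):
    """Standard F_beta from contingency counts; 0.0 when undefined."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def rcut(scores, r=3):
    """scores: {doc: {category: score}}. Keep the r best-scoring categories per doc."""
    return {doc: set(sorted(cats, key=cats.get, reverse=True)[:r])
            for doc, cats in scores.items()}


def tune_scut(val_scores, val_labels, categories, beta=0.5):
    """Pick, per category, the threshold s that maximizes F_beta on validation data."""
    thresholds = {}
    for c in categories:
        best_f, best_t = -1.0, None
        # Candidate thresholds are the scores actually observed for this category.
        for t in sorted({s[c] for s in val_scores.values()}):
            tp = fp = fn = 0
            for doc, s in val_scores.items():
                pred = s[c] >= t
                gold = c in val_labels[doc]
                tp += pred and gold
                fp += pred and not gold
                fn += gold and not pred
            f = f_beta(tp, fp, fn, beta)
            if f > best_f:
                best_f, best_t = f, t
        thresholds[c] = best_t
    return thresholds


def scut(scores, thresholds):
    """Assign every category whose score reaches its tuned threshold."""
    return {doc: {c for c, s in cats.items() if s >= thresholds[c]}
            for doc, cats in scores.items()}
```

With a fixed r, rcut must assign the same number of categories to every document, whereas scut can adapt to each category, which is one plausible reason for the gap seen in the comparison graph.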
 
 

The graph tuning the fbr score for f0.5 (the best value is 0.5 for micro avg. performance and 0.1 for macro avg. performance)
The graph tuning the number of selected features for f0.5 (it can be seen that feature selection is useful; 7000 is a good choice for both micro avg. and macro avg. performance)
The graph tuning beta for f0.5 (-2 is the best for both micro and macro avg. performance)
The graph tuning pmax for f0.5 (14000 is the best for both micro and macro avg. performance)

The graph tuning the fbr score for f1 (0.1 is good for both macro and micro avg. performance)
The graph tuning the number of selected features for f1 (it can be seen that feature selection is useful; 7000 is a good choice for both micro and macro avg. performance)
The graph tuning beta for f1 (-2 is the best for both micro and macro avg. performance)
The graph tuning pmax for f1 (14000 is the best for both micro and macro avg. performance)
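To make the roles of beta and pmax concrete, here is a rough sketch of a Rocchio prototype. It rests on two assumptions that this page does not spell out: that beta is the weight applied to the centroid of the negative examples (so beta = -2 subtracts it twice), and that pmax caps the number of terms kept in the prototype, here by largest absolute weight. The function name and the sparse dict representation are illustrative only.

```python
def rocchio_prototype(pos_docs, neg_docs, beta=-2.0, pmax=14000):
    """Build a category prototype from sparse {term: weight} document vectors.

    Assumed form: centroid(positives) + beta * centroid(negatives),
    truncated to the pmax terms with the largest absolute weight.
    """
    def centroid(docs):
        c = {}
        for d in docs:
            for term, w in d.items():
                c[term] = c.get(term, 0.0) + w / len(docs)
        return c

    proto = centroid(pos_docs)
    for term, w in centroid(neg_docs).items():
        proto[term] = proto.get(term, 0.0) + beta * w

    # Keep only the pmax strongest terms (assumed meaning of pmax).
    top = sorted(proto.items(), key=lambda kv: abs(kv[1]), reverse=True)[:pmax]
    return dict(top)


def score(doc, proto):
    """Dot product between a document vector and the prototype."""
    return sum(w * proto.get(term, 0.0) for term, w in doc.items())
```

Under these assumptions, a document is scored by its dot product with the prototype, and the resulting scores are what scut or rcut would then threshold.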
 

2.2 NB

Result tuned for macrof0.5 and Result tuned for microf0.5


(For the macro avg. result, the top 2000 features are used with fbr=0. For the micro avg. result, the top 6000 features are used with fbr=0.4.)
 
 

2.3 SVM

Result tuned for macrof0.5

The graph tuning the number of selected features for f0.5 (it seems that feature selection is not very useful for SVM, so we use all the features)
The graph tuning the fbr score for f0.5 (0.6 is the best value for micro avg. performance and 0.2 for macro avg. performance)

The graph tuning the number of selected features for f1 (it shows that feature selection is not very useful for SVM, so we use all the features)
The graph tuning the fbr score for f1 (0.3 is the best value for micro avg. performance and 0 for macro avg. performance)
 
 

2.4 Conclusion

A graph that compares the feature selection curves of the 5 classifiers for f0.5 on TREC10 data.
A graph showing the f0.5 performance on categories with different frequencies on TREC10 data. (The first graph shows the original data, which is very bumpy. The second graph shows the local (width=8) linear regression curve, which is much smoother. To my surprise, the curve does not increase greatly as category frequency increases. The reason may be that these 84 categories all have relatively moderate frequencies. When we plot such a curve on 2001t, it will show different behavior.)

The New Graph
 
 

Optimized        Topic set          SVM.1   SVM.2(Dave)   kNN     Rocchio   NB(rainbow)
Macro avg F0.5   TREC10 (84 cats)   0.598   0.606         0.534   0.507     0.490
Micro avg F0.5   TREC10 (84 cats)   0.746   0.758         0.679   0.653     0.659