Results on TREC10 data (84 topic categories)
2.0 kNN (standard)
Result tuned for macro F0.5 and Result tuned for micro F0.5.
The graph tuning the number of selected features for F0.5 (it shows that 8000 is a good value for both micro and macro F0.5).
The graph tuning the fbr score for F0.5 (0.5 is the best value for micro F0.5 and 0.1 for macro F0.5).
The graph tuning kn for F0.5 (100 is a good value for both micro and macro F0.5).
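As a reminder of what the macro and micro F0.5 values in these results measure, here is a minimal sketch. It assumes the usual definitions: macro averaging takes the mean of the per-category F scores, and micro averaging pools the counts over all categories before computing a single F score. The function names and the convention of returning 0.0 when F is undefined are my own assumptions, not taken from the evaluation scripts.

```python
def f_beta(tp, fp, fn, beta=0.5):
    """F_beta from true positives, false positives and false negatives.

    Returns 0.0 when the score is undefined (assumed convention).
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def macro_micro_f(counts, beta=0.5):
    """counts: list of (tp, fp, fn) tuples, one per category."""
    # Macro average: every category has the same weight.
    macro = sum(f_beta(tp, fp, fn, beta) for tp, fp, fn in counts) / len(counts)
    # Micro average: pool the counts, so frequent categories dominate.
    tp_sum = sum(c[0] for c in counts)
    fp_sum = sum(c[1] for c in counts)
    fn_sum = sum(c[2] for c in counts)
    micro = f_beta(tp_sum, fp_sum, fn_sum, beta)
    return macro, micro
```

With beta=0.5 precision counts more than recall, and the macro score is much more sensitive to rare categories than the micro score, which may be why the two are tuned separately.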
2.1 Rocchio
Result tuned for macro F0.5 and Result tuned for micro F0.5.
(These results use scut with s tuned. I also tried rcut with r=3, but rcut clearly performs worse. I plotted a graph that compares the recall, precision, F1 and F0.5 of rcut with r=3 and scut with s tuned.)
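To make the difference between the two thresholding strategies concrete, here is a minimal sketch. It assumes the usual meaning of the names: rcut assigns each document its r top-ranked categories, while scut picks one score threshold per category by maximizing F on validation data. All function names and the data layout (one dict of category scores per document) are hypothetical, and the sketch does not model the fbr parameter.

```python
def f_beta(tp, fp, fn, beta=0.5):
    """F_beta from counts; 0.0 when undefined (assumed convention)."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def rcut(scores, r=3):
    """Assign every document its r highest-scoring categories."""
    return [set(sorted(doc, key=doc.get, reverse=True)[:r]) for doc in scores]


def scut_fit(val_scores, val_labels, categories, beta=0.5):
    """Per category, pick the threshold that maximizes F_beta on validation data."""
    thresholds = {}
    for cat in categories:
        best_f, best_t = -1.0, float("inf")
        # Candidate thresholds are the observed validation scores.
        for t in sorted({doc[cat] for doc in val_scores}):
            tp = fp = fn = 0
            for doc, labels in zip(val_scores, val_labels):
                predicted, actual = doc[cat] >= t, cat in labels
                tp += predicted and actual
                fp += predicted and not actual
                fn += actual and not predicted
            f = f_beta(tp, fp, fn, beta)
            if f > best_f:
                best_f, best_t = f, t
        thresholds[cat] = best_t
    return thresholds


def scut_apply(scores, thresholds):
    """Assign a category whenever its score reaches that category's threshold."""
    return [{c for c, s in doc.items() if s >= thresholds[c]} for doc in scores]
```

In this sketch rcut forces exactly r categories onto every document regardless of how confident the scores are, whereas scut adapts to each category, which is one plausible reason for the gap observed here.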
The graph tuning the fbr score for F0.5 (the best value is 0.5 for micro avg. performance and 0.1 for macro avg. performance).
The graph tuning the number of selected features for F0.5 (feature selection is clearly useful; 7000 is a good value for both micro and macro avg. performance).
The graph tuning beta for F0.5 (-2 is the best value for both micro and macro avg. performance).
The graph tuning pmax for F0.5 (14000 is the best value for both micro and macro avg. performance).
The graph tuning the fbr score for F1 (0.1 is a good value for both macro and micro avg. performance).
The graph tuning the number of selected features for F1 (feature selection is clearly useful; 7000 is a good value for both micro and macro avg. performance).
The graph tuning beta for F1 (-2 is the best value for both micro and macro avg. performance).
The graph tuning pmax for F1 (14000 is the best value for both micro and macro avg. performance).
2.2 NB
Result tuned for macro F0.5 and Result tuned for micro F0.5.
(For the macro avg. result, the top 2000 features are used, with fbr=0. For the micro avg. result, the top 6000 features are used, with fbr=0.4.)
2.3 SVM
The graph tuning the number of selected features for F0.5 (feature selection does not seem very useful for SVM, so we use all the features).
The graph tuning the fbr score for F0.5 (0.6 is the best value for micro avg. performance and 0.2 for macro avg. performance).
The graph tuning the number of selected features for F1 (again, feature selection is not very useful for SVM, so we use all the features).
The graph tuning the fbr score for F1 (0.3 is the best value for micro avg. performance and 0 for macro avg. performance).
2.4 Conclusion
A graph that compares the feature selection curves of the 5 classifiers for F0.5 on TREC10 data.
A graph that shows the F0.5 performance on categories of different frequency on TREC10 data. (The first graph shows the original data, which is very bumpy. The second graph shows the local (width=8) linear regression curve, which is much smoother. To my surprise, the curve does not rise much as category frequency increases. The reason may be that these 84 categories all have mid-range frequencies. When we plot such a curve on 2001t, it should show different behavior.)
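For reference, the smoothing can be sketched as follows. This is only one reading of "local (width=8) linear regression": for every point, a least-squares line is fitted to a window of 8 neighbouring points (ordered by category frequency) and evaluated at that point. The function name and the window handling at the two ends are assumptions; the original plotting script may differ.

```python
def local_linear_smooth(xs, ys, width=8):
    """Smooth ys over xs with a sliding-window least-squares line fit.

    Returns the x values in sorted order and the smoothed y values.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    x = [xs[i] for i in order]
    y = [ys[i] for i in order]
    n = len(x)
    half = width // 2
    smoothed = []
    for i in range(n):
        # Keep the window at full width near the ends by shifting it inward.
        lo = max(0, min(i - half, n - width))
        hi = min(n, lo + width)
        wx, wy = x[lo:hi], y[lo:hi]
        m = len(wx)
        mean_x = sum(wx) / m
        mean_y = sum(wy) / m
        sxx = sum((a - mean_x) ** 2 for a in wx)
        sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(wx, wy))
        # Fall back to the window mean if all x values in the window coincide.
        slope = sxy / sxx if sxx else 0.0
        smoothed.append(mean_y + slope * (x[i] - mean_x))
    return x, smoothed
```

Here xs would be the category frequencies and ys the per-category F0.5 scores; a larger width gives a smoother but less local curve.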
The New Graph
Optimized | Topic set | SVM.1 | SVM.2 (Dave) | kNN | Rocchio | NB (rainbow)
Macro avg F0.5 | TREC10 (84 cats) | 0.598 | 0.606 | 0.534 | 0.507 | 0.490
Micro avg F0.5 | TREC10 (84 cats) | 0.746 | 0.758 | 0.679 | 0.653 | 0.659