IR term project of JianBing Li

Basic Information

Contents


Abstract

In this project, I apply a high-accuracy classification method (kNN) to a very large category space (Reuters-21450, Apte version, and Reuters-21578). The hope is that a divide-and-conquer strategy will make the large problem more tractable with little or no loss of classification accuracy.

I will use existing tools to help implement the experiment system, namely the SMART system, a kNN procedure, and sampling and feature-selection procedures. I will implement the solution-composition program myself. Two experiment systems will be built: one using the hierarchical approach, and one using the normal "flat" approach.

I will test the two experiment systems on Reuters-21450 (Apte version) and Reuters-21578. Precision-recall evaluation will be used to compare the results of the two systems. Accuracy percentages will be used to compare the results of my hierarchical approach with those of M. Sahami.

Proposal and Timelines

My Proposal
Task | To be done by | Status
Read the relevant reference papers to learn the ideas and algorithms | March 8 | finished
Set up the existing tools (SMART, kNN, sampling and feature-selection procedures), learn how to use them, and prepare them to be embedded into the experiment system | March 11 | finished
Implement the flat approach | April 2 | finished
Implement the hierarchical approach | April 16 | ongoing
Test the experiment systems on Reuters-21450 (Apte version) and Reuters-21578, and tune each system for its best result | April 18 | N/A
Write the project report | April 20 | N/A
Prepare the project presentation | April 22 | N/A

System Description

I will use existing tools to help implement the experiment system, namely the SMART system, a kNN procedure, and sampling and feature-selection procedures. I will implement the solution-composition program myself. Two experiment systems will be built: one using the hierarchical approach, and one using the normal "flat" approach.
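
To illustrate the difference between the two systems, here is a minimal Python sketch of flat versus hierarchical kNN classification. It is not the SMART-based implementation. The two-level hierarchy (PARENT_OF), the toy training vectors, the similarity-weighted vote, and all function names are assumptions made for this example only.

```python
import math
from collections import defaultdict

# Hypothetical two-level hierarchy: leaf category -> top-level group.
PARENT_OF = {"wheat": "grain", "corn": "grain", "gold": "metal", "copper": "metal"}

# Hypothetical training set: (sparse term-weight vector, leaf category).
TRAINING = [
    ({"wheat": 1.0, "harvest": 1.0}, "wheat"),
    ({"wheat": 1.0, "export": 1.0}, "wheat"),
    ({"corn": 1.0, "harvest": 1.0}, "corn"),
    ({"corn": 1.0, "export": 1.0}, "corn"),
    ({"gold": 1.0, "mine": 1.0}, "gold"),
    ({"gold": 1.0, "price": 1.0}, "gold"),
    ({"copper": 1.0, "mine": 1.0}, "copper"),
    ({"copper": 1.0, "price": 1.0}, "copper"),
]


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(doc, training, k=3):
    """Pick the label with the largest similarity-weighted vote among the k nearest neighbours."""
    scored = [(cosine(doc, vec), label) for vec, label in training]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = defaultdict(float)
    for sim, label in scored[:k]:
        votes[label] += sim
    return max(votes, key=votes.get)


def flat_classify(doc, training, k=3):
    """Flat approach: one kNN decision over all leaf categories at once."""
    return knn_classify(doc, training, k)


def hierarchical_classify(doc, training, parent_of, k=3):
    """Hierarchical approach: choose a top-level group, then a leaf inside that group."""
    # Step 1: relabel every training document with its group and classify.
    top_training = [(vec, parent_of[label]) for vec, label in training]
    group = knn_classify(doc, top_training, k)
    # Step 2: classify again using only the training documents of the chosen group.
    sub_training = [(vec, label) for vec, label in training if parent_of[label] == group]
    return knn_classify(doc, sub_training, k)


if __name__ == "__main__":
    doc = {"corn": 1.0, "harvest": 0.5}
    print(flat_classify(doc, TRAINING))
    print(hierarchical_classify(doc, TRAINING, PARENT_OF))
```

In this sketch the second step only compares the document against training examples from the chosen group, which is where the divide-and-conquer saving would come from. A mistake in the first step cannot be recovered in the second.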

Experiments

I will test the two experiment systems on Reuters-21450 (Apte version) and Reuters-21578. Precision-recall evaluation will be used to compare the results of the two systems. Accuracy percentages will be used to compare the results of my hierarchical approach with those of M. Sahami.
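
As a rough illustration of the two measures, the Python sketch below computes per-category precision and recall, plus an overall accuracy percentage. It assumes each test document has exactly one true label and one predicted label. The function names and the toy label lists are hypothetical, and the actual evaluation procedure may differ.

```python
def precision_recall(true_labels, predicted_labels, category):
    """Return (precision, recall) for one category, assuming one label per document."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == category and p == category)
    fp = sum(1 for t, p in pairs if t != category and p == category)
    fn = sum(1 for t, p in pairs if t == category and p != category)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def accuracy_percent(true_labels, predicted_labels):
    """Percentage of documents whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return 100.0 * correct / len(true_labels)


if __name__ == "__main__":
    # Hypothetical labels for four test documents.
    truth = ["corn", "corn", "wheat", "gold"]
    predicted = ["corn", "wheat", "wheat", "gold"]
    for cat in ("corn", "wheat", "gold"):
        print(cat, precision_recall(truth, predicted, cat))
    print(accuracy_percent(truth, predicted))
```

Running the same functions on the output of the flat system and of the hierarchical system would give directly comparable figures for each category.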

Results

...TBA...

Demo

...TBA...


last update: Mar 2nd, 1998