IR term project of JianBing Li
Basic Information
- Project Title: Text Categorization: a hierarchical approach
- Name: JianBing Li (email: jbli@cs.cmu.edu)
- Presentation Date: April 30
- Demo Date: TBA
Contents
Abstract
In this project, I am trying to apply a high-accuracy
classification method (kNN) to a very large category space (Reuters-21450
Apte version and Reuters-21578). The hope is that a divide-and-conquer strategy
will make the large problem more tractable without significant loss, and
ideally without any loss, of classification accuracy.
I will use some existing tools to help implement the
experimental system, namely the SMART system, a kNN procedure, and sampling
and feature selection procedures. I will implement the solution-composition
program myself. In the end, two experimental systems will be built:
one using the hierarchical approach, and one using the normal "flat" approach.
I will test the two experimental systems on Reuters-21450
Apte version and Reuters-21578. Precision-recall evaluation will be
used to compare the results of the two systems. Accuracy percentages
will be used to compare the results of my hierarchical approach
with those of M. Sahami.
Proposal and Timelines
My Proposal
Task | To be done by | Status
Read the relevant reference papers to learn the ideas and algorithms | March 8 | finished
Set up the existing tools (SMART, kNN, sampling and feature selection procedures), learn how to use them, and prepare them to be embedded in the experimental system | March 11 | finished
Implement the flat approach | April 2 | finished
Implement the hierarchical approach | April 16 | ongoing
Test the experimental systems on Reuters-21450 Apte version and Reuters-21578, and tune each system to obtain its best result | April 18 | N/A
Write the project report | April 20 | N/A
Prepare the project presentation | April 22 | N/A
System Description
I will use some existing tools to help implement the
experimental system, namely the SMART system, a kNN procedure, and sampling
and feature selection procedures. I will implement the solution-composition
program myself. In the end, two experimental systems will be built:
one using the hierarchical approach, and one using the normal "flat" approach.
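To make the difference between the two systems concrete, here is a minimal sketch in Python. It is not the SMART-based implementation: the term-frequency vectors, the cosine similarity, the similarity-weighted vote, the two-level hierarchy, and every name in it (`flat_classify`, `hierarchical_classify`, the toy `TRAINING` data and its group labels) are assumptions made only for illustration. The flat version runs kNN over all categories at once; the hierarchical version first picks a top-level group and then runs kNN only among that group's categories.

```python
import math
from collections import Counter, defaultdict

def vectorize(text):
    # Plain term-frequency bag of words (an assumption; SMART has its own weighting).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term vectors.
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * \
           math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def knn_label(query, examples, k=3):
    # examples: list of (vector, label). The k nearest neighbours vote,
    # each vote weighted by its similarity to the query.
    nearest = sorted(examples, key=lambda ex: cosine(query, ex[0]), reverse=True)[:k]
    scores = defaultdict(float)
    for vec, label in nearest:
        scores[label] += cosine(query, vec)
    return max(scores, key=scores.get)

def flat_classify(text, training, k=3):
    # Flat approach: one kNN decision over every leaf category.
    examples = [(vectorize(t), cat) for t, _, cat in training]
    return knn_label(vectorize(text), examples, k)

def hierarchical_classify(text, training, k=3):
    # Hierarchical approach: choose a top-level group first, then
    # choose among the categories that belong to that group only.
    query = vectorize(text)
    top = [(vectorize(t), grp) for t, grp, _ in training]
    group = knn_label(query, top, k)
    sub = [(vectorize(t), cat) for t, grp, cat in training if grp == group]
    return group, knn_label(query, sub, k)

# Hypothetical toy data: (document text, top-level group, leaf category).
TRAINING = [
    ("wheat harvest exports rise", "commodities", "wheat"),
    ("wheat crop yield falls", "commodities", "wheat"),
    ("corn harvest exports rise", "commodities", "corn"),
    ("corn crop prices climb", "commodities", "corn"),
    ("bank raises interest rates", "finance", "interest"),
    ("central bank cuts interest rates", "finance", "interest"),
    ("company reports quarterly earnings profit", "finance", "earn"),
    ("quarterly profit earnings beat forecast", "finance", "earn"),
]

if __name__ == "__main__":
    print(flat_classify("wheat crop exports", TRAINING))
    print(hierarchical_classify("bank interest rates rise", TRAINING))
```

In the hierarchical version the second kNN call only compares the query against documents of the chosen group, which is where the divide-and-conquer saving would come from; the cost is that a wrong top-level decision cannot be corrected further down.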
Experiments
I will test the two experimental systems on Reuters-21450
Apte version and Reuters-21578. Precision-recall evaluation will be
used to compare the results of the two systems. Accuracy percentages
will be used to compare the results of my hierarchical approach
with those of M. Sahami.
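The two measures can be sketched as follows. The exact definitions are assumptions: precision and recall are micro-averaged here over per-document sets of assigned categories, and accuracy is taken as the percentage of documents whose single predicted category matches the true one. The function names and sample labels are hypothetical.

```python
def precision_recall(predicted, actual):
    # predicted, actual: lists of sets of category labels, one set per document.
    # Micro-averaging: pool all category assignments before dividing.
    true_pos = sum(len(p & a) for p, a in zip(predicted, actual))
    n_predicted = sum(len(p) for p in predicted)
    n_actual = sum(len(a) for a in actual)
    precision = true_pos / n_predicted if n_predicted else 0.0
    recall = true_pos / n_actual if n_actual else 0.0
    return precision, recall

def accuracy_percent(predicted, actual):
    # predicted, actual: lists with exactly one label per document.
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * correct / len(actual) if actual else 0.0

if __name__ == "__main__":
    pred_sets = [{"wheat"}, {"corn", "grain"}, {"earn"}]
    true_sets = [{"wheat", "grain"}, {"corn"}, {"interest"}]
    print(precision_recall(pred_sets, true_sets))
    print(accuracy_percent(["wheat", "corn", "earn", "earn"],
                           ["wheat", "corn", "interest", "earn"]))
```

Precision and recall suit the comparison between the flat and hierarchical systems because a Reuters document may carry several categories, whereas a single accuracy percentage only makes sense when each document receives exactly one label.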
Results
...TBA...
Demo
...TBA...
last update: Mar 2nd, 1998