IR term project of JianBing Li

Basic Information

Project Title: Text Categorization: a hierarchical approach
Name: JianBing Li (email: jbli@cs.cmu.edu)
Presentation Date: April 30
Demo Date: TBA

Abstract
Proposal and Timelines
System Description
Experiments
Results
Demo

Abstract

In this project, I am trying to apply a high-accuracy classification method (kNN) to a very large category space (Reuters-21450 Apte version and Reuters-21578). I hope that a divide-and-conquer strategy will make the large problem more tractable without significant (ideally, without any) loss of classification accuracy.

I will use some existing tools to help implement the experiment system, namely the SMART system, a kNN procedure, and sampling and feature selection procedures. I will implement the solution-composition program myself. In the end, two experiment systems will be implemented: one using the hierarchical approach, and one using the conventional "flat" approach.
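The difference between the two systems can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation (which relies on the existing kNN procedure): documents are assumed to be sparse term-weight dictionaries, such as a SMART-style indexer might produce; kNN scores each category by summing the cosine similarities of the k nearest training documents that carry it; and the two-level category hierarchy (`groups`) and all function names are hypothetical. For brevity each classifier returns only the single top-scoring category, although Reuters documents can carry several.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_scores(doc, training, k):
    """Score categories by summing the similarities of the k nearest
    training documents that carry each category.
    `training` is a list of (vector, set_of_categories) pairs."""
    ranked = sorted(((cosine(doc, vec), cats) for vec, cats in training),
                    key=lambda pair: pair[0], reverse=True)
    scores = defaultdict(float)
    for sim, cats in ranked[:k]:
        for cat in cats:
            scores[cat] += sim
    return dict(scores)


def flat_classify(doc, training, k):
    """Flat approach: a single kNN decision over the whole category space."""
    scores = knn_scores(doc, training, k)
    return max(scores, key=scores.get) if scores else None


def hierarchical_classify(doc, training, groups, k):
    """Hierarchical approach over a hypothetical two-level hierarchy.
    `groups` maps a group name to the set of leaf categories it contains."""
    # Level 1: label each training document with the groups of its categories,
    # then choose the best-scoring group for the new document.
    top_training = [(vec, {g for g, members in groups.items() if cats & members})
                    for vec, cats in training]
    group_scores = knn_scores(doc, top_training, k)
    if not group_scores:
        return None
    best_group = max(group_scores, key=group_scores.get)
    # Level 2: run kNN only over training documents (and categories) in that group.
    members = groups[best_group]
    sub_training = [(vec, cats & members) for vec, cats in training if cats & members]
    return flat_classify(doc, sub_training, k)


# Toy example with made-up term weights and a made-up two-group hierarchy.
training = [
    ({"wheat": 1.0, "export": 0.5}, {"wheat"}),
    ({"corn": 1.0, "export": 0.5}, {"corn"}),
    ({"dollar": 1.0, "rate": 0.7}, {"money-fx"}),
    ({"interest": 1.0, "rate": 0.8}, {"interest"}),
]
groups = {"grain": {"wheat", "corn"}, "finance": {"money-fx", "interest"}}
doc = {"wheat": 0.9, "export": 0.3}
print(flat_classify(doc, training, k=2))                  # wheat
print(hierarchical_classify(doc, training, groups, k=2))  # wheat
```

In this sketch the second level ranks only the categories inside the chosen group, so each final decision is made over a much smaller category space. Note that the naive first level still compares the document against every training document, and a wrong first-level decision cannot be recovered later.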

I will test the two experiment systems on Reuters-21450 Apte version and Reuters-21578. Precision-recall evaluation will be used to compare the results of the two systems, and percentage accuracy will be used to compare the results of my hierarchical approach with those of M. Sahami.
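As a hedged sketch of the two evaluation measures, the functions below compute micro-averaged precision and recall over all (document, category) assignments, and a percentage accuracy for single-label predictions. The proposal does not state whether micro- or macro-averaging will be used, nor exactly how Sahami's accuracy figures are defined, so both definitions here are assumptions; the function names, document ids, and example assignments are made up for illustration.

```python
def micro_precision_recall(predicted, gold):
    """Micro-averaged precision and recall over all (document, category)
    decisions. Both arguments map a document id to a set of categories.
    Only documents in `gold` are evaluated; documents missing from
    `predicted` count as having no assigned categories."""
    tp = fp = fn = 0
    for doc_id, true_cats in gold.items():
        pred_cats = predicted.get(doc_id, set())
        tp += len(pred_cats & true_cats)   # correct assignments
        fp += len(pred_cats - true_cats)   # spurious assignments
        fn += len(true_cats - pred_cats)   # missed assignments
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def accuracy_percent(predicted_single, gold):
    """Percentage of documents whose single predicted category is among
    their true categories (one possible definition of accuracy)."""
    if not gold:
        return 0.0
    correct = sum(1 for doc_id, cats in gold.items()
                  if predicted_single.get(doc_id) in cats)
    return 100.0 * correct / len(gold)


# Made-up example assignments.
gold = {"d1": {"wheat", "grain"}, "d2": {"money-fx"}, "d3": {"interest"}}
predicted = {"d1": {"wheat", "grain"}, "d2": {"money-fx", "interest"}, "d3": {"money-fx"}}
print(micro_precision_recall(predicted, gold))  # (0.6, 0.75)
print(accuracy_percent({"d1": "wheat", "d2": "money-fx", "d3": "money-fx"}, gold))
```

Micro-averaging weights every category assignment equally, so frequent categories dominate the result; a macro-averaged variant would instead compute precision and recall per category and then average them.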

Proposal and Timelines

My Proposal

Task	Due date	Status
Read relevant reference papers to learn the ideas and algorithms	March 8	finished
Set up the existing tools (SMART, kNN, sampling and feature selection procedures), learn how to use them, and prepare them for embedding into the experiment system	March 11	finished
Implement the flat approach	April 2	finished
Implement the hierarchical approach	April 16	ongoing
Test the experiment systems on Reuters-21450 Apte version and Reuters-21578, and tune each system to obtain its best results	April 18	N/A
Write the project report	April 20	N/A
Prepare the project presentation	April 22	N/A

System Description

Experiments

Results

...TBA...

Demo