kNN
Synopsis
kNN: k-nearest neighbors
Description
kNN: a brief description.
kNN stands for k-nearest neighbor classification, a text categorization method. Given an arbitrary input document, the system ranks its nearest neighbors among the training documents and uses the categories of the k top-ranking neighbors to predict the categories of the input document. The similarity score of each neighbor document to the new document being classified is used as the weight of its categories, and the sum of category weights over the k nearest neighbors is used for category ranking.
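The ranking step can be sketched in a few lines. The following Python sketch is illustrative only and is not one of the scripts listed below; the function name knn_categorize and the toy similarities and category labels are hypothetical.

    from collections import defaultdict

    def knn_categorize(scored_neighbors, k):
        # scored_neighbors: (similarity, categories) pairs, one per
        # training document; categories is a list of labels.
        # Keep the k top-ranking training documents by similarity.
        top_k = sorted(scored_neighbors, key=lambda p: p[0], reverse=True)[:k]
        weights = defaultdict(float)
        for sim, categories in top_k:
            for cat in categories:
                # Each neighbor votes for its categories, weighted by
                # its similarity to the input document.
                weights[cat] += sim
        # Rank categories by their summed similarity weight.
        return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)

    # Toy example: three neighbors with similarities and labels.
    neighbors = [(0.92, ["grain"]), (0.85, ["grain", "wheat"]), (0.40, ["trade"])]
    print(knn_categorize(neighbors, k=2))
    # -> [('grain', 1.77), ('wheat', 0.85)]; 'trade' falls outside the top 2

Each neighbor votes for all of its categories with its similarity score as the weight, which is exactly the weighted-sum ranking described above.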
Location of Executable/Scripts
Several versions of kNN are available.
/afs/cs.cmu.edu/academic/class/11741-s98/linux/kNN has a set of scripts tested on moscow:
The Makefile
The SMART procedure: knn_rafa.sh
The 11-pt average precision procedure for SMART (knn_11pt.sh).
It also generates a number of files used by the other evaluation procedures.
The Pcut and Rcut evaluation procedures. They take a real-valued matrix and output a binary matrix after thresholding. They need: the input doc-category similarity matrix, the training doc-category matrix (for computing the proportion list), the true doc-category relevance judgments, the average number of documents kept per category, and an output file. A sketch of both thresholding strategies follows this list.
make_texrels.pl
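As a rough illustration of the two thresholding strategies, here is a Python sketch assuming their usual definitions: Rcut keeps the t highest-scoring categories for each document, while Pcut keeps, for each category, a number of top-scoring documents proportional to that category's share of the training assignments (the proportion list). The function names and arguments are hypothetical and do not mirror the actual script interfaces.

    def rcut(scores, t):
        # Rcut: for each document, assign its t highest-scoring categories.
        assignments = {}
        for doc, cat_scores in scores.items():
            ranked = sorted(cat_scores, key=cat_scores.get, reverse=True)
            assignments[doc] = set(ranked[:t])
        return assignments

    def pcut(scores, proportions, total_assignments):
        # Pcut: threshold per category; the number of documents kept for
        # a category is proportional to its training-set frequency.
        assignments = {doc: set() for doc in scores}
        cats = {c for cat_scores in scores.values() for c in cat_scores}
        for cat in cats:
            n = round(proportions.get(cat, 0.0) * total_assignments)
            ranked = sorted(scores, key=lambda d: scores[d].get(cat, 0.0),
                            reverse=True)
            for doc in ranked[:n]:
                assignments[doc].add(cat)
        return assignments

    # Toy doc-category score matrix and training proportions.
    scores = {"d1": {"grain": 0.9, "trade": 0.2},
              "d2": {"grain": 0.4, "trade": 0.7}}
    print(rcut(scores, t=1))
    print(pcut(scores, {"grain": 0.5, "trade": 0.5}, total_assignments=2))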
Other scripts make some assumptions about the data (they were written for the TDT project):
knn_det.pl
kNN.dlee
results.prl
A number of people contributed documentation on preparing data, makefiles, etc.
Arguments
Example
Using the apteMod dataset (based on Reuters_21450), with a training set and a test set.
Looking at the evaluation files:
for K=45:
eval.ann.ann.45.10: 11-pt Avg: 0.8679
for K=30:
eval.ann.ann.30.10: 11-pt Avg: 0.8729
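The 11-pt Avg figures above are 11-point interpolated average precision (see knn_11pt.sh above). As a reference, here is a Python sketch of the standard computation, with a hypothetical function name and a toy ranking: precision is interpolated at the eleven recall levels 0.0, 0.1, ..., 1.0 and the eleven values are averaged.

    def eleven_point_avg_precision(ranked_relevance):
        # ranked_relevance: 0/1 relevance flags in system ranking order.
        total_relevant = sum(ranked_relevance)
        if total_relevant == 0:
            return 0.0
        # Recall/precision after each rank position.
        points = []
        hits = 0
        for i, rel in enumerate(ranked_relevance, start=1):
            hits += rel
            points.append((hits / total_relevant, hits / i))
        # Interpolated precision at a level: the maximum precision
        # achieved at any recall >= that level.
        levels = [i / 10 for i in range(11)]
        interp = [max((p for r, p in points if r >= level), default=0.0)
                  for level in levels]
        return sum(interp) / len(levels)

    # Toy ranking with relevant items at ranks 1, 3, and 5.
    print(round(eleven_point_avg_precision([1, 0, 1, 0, 1]), 4))  # 0.7636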
Links
kNN web interface