kNN
Synopsis
kNN: k-nearest neighbors
Description
kNN: a brief description.
kNN stands for k-nearest neighbor classification, a text categorization method. Given an arbitrary input document, the system ranks its nearest neighbors among the training documents and uses the categories of the k top-ranking neighbors to predict the categories of the input document. The similarity score of each neighbor document to the new document being classified is used as the weight of its categories, and the sum of category weights over the k nearest neighbors is used for category ranking.
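The ranking step can be sketched in a few lines. The following Python sketch is illustrative only and is not one of the scripts listed below; the function name knn_categorize and the toy similarities and category labels are hypothetical.

    from collections import defaultdict

    def knn_categorize(scored_neighbors, k):
        # scored_neighbors: (similarity, categories) pairs, one per
        # training document; categories is a list of labels.
        # Keep the k top-ranking training documents by similarity.
        top_k = sorted(scored_neighbors, key=lambda p: p[0], reverse=True)[:k]
        weights = defaultdict(float)
        for sim, categories in top_k:
            for cat in categories:
                # Each neighbor votes for its categories, weighted by
                # its similarity to the input document.
                weights[cat] += sim
        # Rank categories by their summed similarity weight.
        return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)

    # Toy example: three neighbors with similarities and labels.
    neighbors = [(0.92, ["grain"]), (0.85, ["grain", "wheat"]), (0.40, ["trade"])]
    print(knn_categorize(neighbors, k=2))
    # -> [('grain', 1.77), ('wheat', 0.85)]; 'trade' falls outside the top 2

Each neighbor votes for all of its categories with its similarity score as the weight, which is exactly the weighted-sum ranking described above.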
Location of Executable/Scripts
Several versions of kNN are available.
/afs/cs.cmu.edu/academic/class/11741-s98/linux/kNN has a set of scripts tested on moscow:
The Makefile
The SMART procedure: knn_rafa.sh
The 11-pt average precision procedure for SMART (knn_11pt.sh).
It also generates a number of files used by the other evaluation procedures.
The Pcut and Rcut evaluation procedures. They take a real-valued matrix and output a binary matrix after thresholding. They need: the input doc-category similarity matrix, the training doc-category matrix (for computing the proportion list), the true doc-category relevance judgments, the average number of documents kept per category, and an output file. A sketch of both thresholding strategies follows this list.
make_texrels.pl
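As a rough illustration of the two thresholding strategies, here is a Python sketch assuming their usual definitions: Rcut keeps the t highest-scoring categories for each document, while Pcut keeps, for each category, a number of top-scoring documents proportional to that category's share of the training assignments (the proportion list). The function names and arguments are hypothetical and do not mirror the actual script interfaces.

    def rcut(scores, t):
        # Rcut: for each document, assign its t highest-scoring categories.
        assignments = {}
        for doc, cat_scores in scores.items():
            ranked = sorted(cat_scores, key=cat_scores.get, reverse=True)
            assignments[doc] = set(ranked[:t])
        return assignments

    def pcut(scores, proportions, total_assignments):
        # Pcut: threshold per category; the number of documents kept for
        # a category is proportional to its training-set frequency.
        assignments = {doc: set() for doc in scores}
        cats = {c for cat_scores in scores.values() for c in cat_scores}
        for cat in cats:
            n = round(proportions.get(cat, 0.0) * total_assignments)
            ranked = sorted(scores, key=lambda d: scores[d].get(cat, 0.0),
                            reverse=True)
            for doc in ranked[:n]:
                assignments[doc].add(cat)
        return assignments

    # Toy doc-category score matrix and training proportions.
    scores = {"d1": {"grain": 0.9, "trade": 0.2},
              "d2": {"grain": 0.4, "trade": 0.7}}
    print(rcut(scores, t=1))
    print(pcut(scores, {"grain": 0.5, "trade": 0.5}, total_assignments=2))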
Other scripts make some assumptions about the data (they were written for the TDT project):
knn_det.pl
kNN.dlee
results.prl
A number of people contributed documentation on preparing data, makefiles, etc.
Arguments
Example
Using the apteMod dataset (based on Reuters_21450), with a training set and a test set.
Looking at the evaluation files:
for K=45:
eval.ann.ann.45.10: 11-pt Avg: 0.8679
for K=30:
eval.ann.ann.30.10: 11-pt Avg: 0.8729
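The 11-pt Avg figures above are 11-point interpolated average precision (see knn_11pt.sh above). As a reference, here is a Python sketch of the standard computation, with a hypothetical function name and a toy ranking: precision is interpolated at the eleven recall levels 0.0, 0.1, ..., 1.0 and the eleven values are averaged.

    def eleven_point_avg_precision(ranked_relevance):
        # ranked_relevance: 0/1 relevance flags in system ranking order.
        total_relevant = sum(ranked_relevance)
        if total_relevant == 0:
            return 0.0
        # Recall/precision after each rank position.
        points = []
        hits = 0
        for i, rel in enumerate(ranked_relevance, start=1):
            hits += rel
            points.append((hits / total_relevant, hits / i))
        # Interpolated precision at a level: the maximum precision
        # achieved at any recall >= that level.
        levels = [i / 10 for i in range(11)]
        interp = [max((p for r, p in points if r >= level), default=0.0)
                  for level in levels]
        return sum(interp) / len(levels)

    # Toy ranking with relevant items at ranks 1, 3, and 5.
    print(round(eleven_point_avg_precision([1, 0, 1, 0, 1]), 4))  # 0.7636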
Links
kNN web interface