Machine learning used by Personal WebWatcher
Dunja Mladenic
This paper describes the design of Personal WebWatcher, a personal browsing
assistant that suggests interesting hyperlinks on the Web documents
requested by the user.
Machine learning is used to generate a model of the user's
interests.
We consider two approaches that differ in the information
included in the training examples:
(1) including information presented to the user, that is, a part of the text
of the document that contains the hyperlink, and
(2) including information that was not presented to the user,
that is, the content of the document the hyperlink points to.
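To make the difference between the two approaches concrete, here is a minimal sketch of how one training example could be built under each. The `Hyperlink` record, its field names, and the labeling rule (a hyperlink counts as a positive example if the user followed it) are all hypothetical assumptions for illustration, not Personal WebWatcher's actual data structures.

```python
from dataclasses import dataclass

@dataclass
class Hyperlink:
    # Hypothetical record of one hyperlink shown to the user (field names are assumptions).
    context_text: str  # part of the text of the page containing the hyperlink
    target_text: str   # content of the document the hyperlink points to
    clicked: bool      # assumed label: True if the user followed the hyperlink

def example_approach1(link: Hyperlink) -> tuple[str, bool]:
    """Approach (1): use only information that was presented to the user."""
    return (link.context_text, link.clicked)

def example_approach2(link: Hyperlink) -> tuple[str, bool]:
    """Approach (2): use information that was not presented to the user."""
    return (link.target_text, link.clicked)
```

Both functions return the same label; only the text from which the bag-of-words features will later be extracted differs. Approach (2) also requires fetching the target document, which approach (1) avoids.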
We compare two classification algorithms,
k-Nearest Neighbor and Naive Bayes.
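The following sketch shows the two compared classifiers operating on bag-of-words documents. It assumes cosine similarity with majority voting for k-Nearest Neighbor and a multinomial Naive Bayes with Laplace smoothing; the abstract does not state which similarity measure, value of k, or smoothing the paper uses, so these choices are illustrative assumptions.

```python
import math
from collections import Counter

def bag_of_words(text: str) -> Counter:
    # Simplifying assumption: lowercase whitespace tokenization.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(train, doc: Counter, k: int = 3):
    """train: list of (Counter, label). Majority vote among the k most similar examples."""
    neighbours = sorted(train, key=lambda ex: cosine(ex[0], doc), reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, train):
        self.class_counts = Counter(label for _, label in train)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for bag, label in train:
            self.word_counts[label].update(bag)  # adds word frequencies per class
        self.vocab = {w for bag, _ in train for w in bag}
        self.n = len(train)
        return self

    def predict(self, doc: Counter):
        best, best_score = None, -math.inf
        for c, count in self.class_counts.items():
            total = sum(self.word_counts[c].values())
            score = math.log(count / self.n)  # log prior P(c)
            for w, f in doc.items():
                if w not in self.vocab:
                    continue  # ignore words never seen during training
                p = (self.word_counts[c][w] + 1) / (total + len(self.vocab))
                score += f * math.log(p)  # log likelihood of each word occurrence
            if score > best_score:
                best, best_score = c, score
        return best
```

Usage: build `train` as `[(bag_of_words(text), label), ...]` from examples produced by either approach, then call `knn_predict(train, bag_of_words(new_text))` or `NaiveBayes().fit(train).predict(bag_of_words(new_text))`.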
Documents are represented as bags of words, and
features are selected using information gain.
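Information gain scores a word by how much knowing whether it occurs in a document reduces the entropy of the class label. A minimal sketch follows, assuming a binary "word occurs / does not occur" feature and documents given as sets of words; the paper may define the feature differently, and `select_features` is a hypothetical helper name.

```python
import math
from collections import Counter

def entropy(labels) -> float:
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(docs, labels, word) -> float:
    """IG of the binary feature 'word occurs in the document' (assumed feature type)."""
    with_w = [y for d, y in zip(docs, labels) if word in d]
    without_w = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    conditional = (len(with_w) / n) * entropy(with_w) + (len(without_w) / n) * entropy(without_w)
    return entropy(labels) - conditional

def select_features(docs, labels, k: int):
    """Keep the k words with the highest information gain (ties broken alphabetically)."""
    vocab = {w for d in docs for w in d}
    ranked = sorted(vocab, key=lambda w: (-information_gain(docs, labels, w), w))
    return ranked[:k]
```

Keeping only the top-ranked words is what makes "a small number of features" cheap to test: one changes `k` and retrains.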
Preliminary experiments show that there is no significant difference
between the two classifiers and that using only a small number of features
gives almost the same results as using all features.
In all experiments, the achieved classification accuracy is the same as or
slightly higher than the default accuracy (the accuracy of always predicting
the majority class).
Since the default accuracy is higher for approach (1) than for
approach (2), approach (1) also shows higher classification
accuracy.
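Because the comparison above rests on the default accuracy, it helps to see how it is computed. This sketch assumes the default accuracy is the share of the most frequent class; the label lists in the usage note are invented for illustration and are not results from the paper.

```python
from collections import Counter

def default_accuracy(labels) -> float:
    """Accuracy of a classifier that always predicts the most frequent class."""
    return Counter(labels).most_common(1)[0][1] / len(labels)

def accuracy(predictions, labels) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)
```

For example, if 8 of 10 hypothetical hyperlinks are labeled uninteresting, `default_accuracy` returns 0.8, so a classifier reaching 0.8 accuracy has learned nothing beyond the class distribution. This is why a higher default accuracy alone can make one approach look better.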