Systems for collaborative filtering

The term collaborative filtering itself was coined by Doug Terry at Xerox PARC as part of the development of the Information Tapestry system for retrieving documents from a growing corpus.[13] Among its other features, Tapestry was the first system to support collaborative filtering, in that it allows its users to annotate the documents they read. Other Tapestry users can then retrieve documents to read based not only on the content of the documents themselves, but also on what other users have said about them. Tapestry provides free text annotations as well as explicit ``likeit'' and ``hateit'' annotations, so users can easily indicate which of the documents they read were most (or least) valuable.
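
The retrieval idea described above can be illustrated with a minimal sketch. It is not Tapestry's actual data model or query language: the Annotation and Document classes, the select function, and the notion of a ``trusted'' set of colleagues are all hypothetical names introduced only for illustration. The sketch picks documents that match on content (a keyword) and that at least one trusted reader has marked ``likeit''.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    user: str
    kind: str        # "likeit", "hateit", or "text" (free-text note)
    note: str = ""


@dataclass
class Document:
    doc_id: str
    text: str
    annotations: list = field(default_factory=list)


def select(documents, keyword, trusted):
    """Return ids of documents that mention `keyword` and that at least
    one user in `trusted` has annotated with "likeit" (hypothetical rule)."""
    picked = []
    for doc in documents:
        # Content-based test: the keyword must appear in the text.
        if keyword.lower() not in doc.text.lower():
            continue
        # Collaborative test: a trusted colleague endorsed the document.
        if any(a.kind == "likeit" and a.user in trusted
               for a in doc.annotations):
            picked.append(doc.doc_id)
    return picked


docs = [
    Document("d1", "Notes on filtering Net News",
             [Annotation("ann", "likeit")]),
    Document("d2", "Filtering mailing lists",
             [Annotation("bob", "hateit")]),
    Document("d3", "Lunch menu", [Annotation("ann", "likeit")]),
]
print(select(docs, "filtering", {"ann", "bob"}))
```

Here only d1 satisfies both the content test and the annotation test; d2 matches on content but carries no endorsement, and d3 is endorsed but does not mention the keyword.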

``A Tour through Tapestry'' demonstrates how the collaborative filtering abilities of Tapestry can help users process incoming documents with much greater ease than current systems.[27] In part by using annotations placed on documents by his co-workers, the protagonist in ``A Tour through Tapestry'' is able to focus his attention on the documents most likely to be of interest to him. He harnesses his co-workers' expertise to help him find useful information, and he repays this service by providing annotations on the documents he reads. In a situation where each co-worker has expertise in a slightly different area, the information sharing created by document annotations gives everyone in the group access to each area expert's judgment without taking up the expert's time.

In its current incarnation, Tapestry suffers from two distinct problems. The first problem is the size of its user base. Because Tapestry is based on a commercial database system, it cannot be given away freely. Further, Tapestry was not designed for use by large numbers of people at distributed sites. Together, these factors limit the pool of potential Tapestry users to researchers at Xerox PARC. Based on anecdotal evidence, this pool does not seem large enough to support a critical mass of users. The vast majority of documents go unannotated, so there is little collaborative information to use when filtering. The second problem with Tapestry is the means by which users enter filters. One common interface to Tapestry requires users to specify requests for information as queries in an SQL-like language. Writing such a query requires the user to have a firm sense of what types of articles he wants to read, which hinders the exploration of new areas. Our goal in this thesis is to describe a collaborative filtering system for Net News that can scale up to handle at least a critical mass of users, and that provides those users with a simple method for collaboratively filtering articles.

Collaborative filtering is rapidly gaining popularity as a research topic, and many groups are currently working to develop new strategies for collaborative filtering. Simon is a collaborative system being developed by Mark Johnson for use with the World Wide Web.[10] Users of Simon create ``hotlists,'' which are lists of the interesting World Wide Web pages that they have found. Individual users can use these lists, called ``subject spaces,'' to keep track of their own explorations, but they can also send their subject spaces to a group Simon server. The Simon server then combines the individual subject spaces to form global maps, which can be searched or browsed by Web users at large.

GroupLens, by Paul Resnick et al., is a filtering system that combines collaboration with user profiles. In GroupLens, communities of users rate the articles they read on a numerical scale. The GroupLens system then finds correlations between the ratings users have given the articles. Essentially, a user's profile consists of the ratings that she has given to the articles she has read. When user Jane wishes to filter new articles, the ratings other users have given those new articles are combined to form a recommendation for Jane on how interesting the new articles will be for her. The ratings from other users are combined by weighting each user's rating in proportion to how well that user's profile correlates with Jane's. The goal of the system is to identify a peer group of users whose interests are similar to Jane's, and then to use their opinions of new articles to predict whether Jane will like the articles. A key difference between our work and GroupLens is the effort we make to support ``exploratory users'' who have not yet developed a user profile.
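
The correlation-weighted combination described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the GroupLens implementation: it assumes Pearson correlation as the similarity measure and a mean-centered weighted average as the combining rule, and the function names, user names, and sample ratings are invented for the example.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two rating dicts over the articles both rated."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[k] for k in common) / len(common)
    mean_b = sum(b[k] for k in common) / len(common)
    cov = sum((a[k] - mean_a) * (b[k] - mean_b) for k in common)
    var_a = sum((a[k] - mean_a) ** 2 for k in common)
    var_b = sum((b[k] - mean_b) ** 2 for k in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / sqrt(var_a * var_b)


def predict(ratings, user, article):
    """Predict `user`'s rating of `article` from other users' ratings,
    weighting each rater by correlation with `user` (assumed rule)."""
    mine = ratings[user]
    my_mean = sum(mine.values()) / len(mine)
    num = 0.0
    den = 0.0
    for other, theirs in ratings.items():
        if other == user or article not in theirs:
            continue
        weight = pearson(mine, theirs)
        their_mean = sum(theirs.values()) / len(theirs)
        # How far above or below their own average this rater scored it.
        num += weight * (theirs[article] - their_mean)
        den += abs(weight)
    if den == 0:
        return my_mean  # nobody comparable rated it; fall back to the mean
    return my_mean + num / den


# Invented sample data: jane has not yet read article "a4".
ratings = {
    "jane": {"a1": 5, "a2": 1, "a3": 4},
    "ken":  {"a1": 4, "a2": 2, "a3": 5, "a4": 5},
    "lee":  {"a1": 1, "a2": 5, "a3": 2, "a4": 1},
}
print(predict(ratings, "jane", "a4"))
```

In this sample, ken's past ratings track jane's and lee's run opposite to hers, so ken's high rating and lee's low rating of a4 both push the prediction for jane above her average. A user with no ratings at all has no profile to correlate against, which is the ``exploratory user'' case noted above; this sketch does not handle it.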


David A. Maltz (dmaltz@cs.cmu.edu)