A glimpse of the future





The array of new information sources available over the networks seems bewildering at first. A brief glance through the literature surrounding the National Information Infrastructure shows many different types of commercial products. Examples include companies that specialize in creating ``custom storefront'' services for other companies, so that users can purchase items over the net; software companies setting up product information services to field questions and distribute updates and product enhancements over the Internet rather than the phone network; and quote services that provide constantly updated stock market quotes and information for investors.

There has also been growth in the non-commercial sector. The Library of Congress has begun to make its special exhibits and catalogs available through online services. The World Wide Web system allows anyone with a machine on the Internet to create hypertext multimedia documents and link them to other documents at other sites.[7] The Internet Gopher system provides a consistent interface to many different types of information sources, ranging from full-text retrieval services to school catalogs and course schedules.[1]

Characteristic of many of these new information systems are the following properties:

Article based:
Information is added to them in small, discrete chunks which are reasonably self-contained.

Independently created:
Information chunks are added to the system by large numbers of users resident at different sites or at different times.

Multiply accessed:
Each chunk of information can expect to be read or accessed by a large number of people over the course of its life.

Another example of an information system with these properties is Usenet Net News. The Usenet is a loosely organized network of heterogeneous computers stretching across all seven continents. These computers communicate via a number of protocols, such as uucp[19] and TCP/IP data streams. A primary use of the Usenet network is the exchange of Usenet Net News.[28] Net News functions like a huge distributed bulletin board system in which messages created at one site are eventually seen at all sites on Usenet.

Information on Net News takes the form of articles written by individual users. Users enter articles into the Net News system by posting them. Each site which participates in Usenet Net News runs a program called a news server that exchanges articles with news servers at other sites using the Network News Transfer Protocol (NNTP).[12] The news server stores all the articles it receives for a number of days and deletes older articles to make room for the new. Until the articles are deleted, the news server makes them available to be read by users at the site. Users request articles from the news server using any of a number of news reader programs which provide a user interface for reading and browsing the articles.
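The store-and-expire behavior described above can be sketched in a few lines of Python. This is only an illustration, not the design of any real news server: the class and method names are hypothetical, the seven-day retention period is an assumed value, and the exchange step is a plain method call standing in for NNTP. Skipping articles the peer already holds, keyed on a message identifier, is also an assumption of the sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Article:
    message_id: str   # identifier assumed unique across all sites
    newsgroup: str
    body: str


@dataclass
class NewsServer:
    retention_days: int = 7                           # assumed; each site picks its own
    articles: dict = field(default_factory=dict)      # message_id -> Article
    received_on: dict = field(default_factory=dict)   # message_id -> day received here

    def receive(self, article, today):
        """Store an article; return False if this server already holds it."""
        if article.message_id in self.articles:
            return False
        self.articles[article.message_id] = article
        self.received_on[article.message_id] = today
        return True

    def exchange_with(self, peer, today):
        """Offer every held article to a peer; return how many it accepted."""
        return sum(peer.receive(a, today) for a in list(self.articles.values()))

    def expire(self, today):
        """Delete articles held for retention_days or longer."""
        old = [mid for mid, day in self.received_on.items()
               if today - day >= self.retention_days]
        for mid in old:
            del self.articles[mid]
            del self.received_on[mid]

    def read(self, newsgroup):
        """Return the articles a news reader would see for one newsgroup."""
        return [a for a in self.articles.values() if a.newsgroup == newsgroup]
```

A posting at one site is a `receive` call on the local server; repeated `exchange_with` calls between neighboring servers then carry the article outward until every site has seen it, and `expire` makes room for new arrivals.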

To help users find articles they are interested in reading, the articles are arranged into a hierarchy of newsgroups which is organized by topic. To keep discussions on topic, some newsgroups are moderated. Any article posted to a moderated newsgroup is first sent to a human moderator, who decides whether or not the article will be distributed across the Usenet.

Usenet Net News is currently growing to an enormous size. Estimates show that there are over 2.6 million users of Net News at some 87 thousand sites throughout the world. These users generate over 26 thousand new articles a day, amounting to 57 Mbytes of data.[22] In the past, users attempted to control the number of articles they had to read each day by subscribing only to newsgroups on topics in which they were interested. However, the continuing increase in the number of newsgroups and in the number of articles posted daily means that many users are now presented with far more messages a day than they can possibly read. To cope with this overload of incoming data, users have adopted reading strategies for dealing with the flood, but these strategies often sacrifice any chance the user has of finding useful information in the data. In fact, one of the few things the users of Net News seem to agree on is that they need a simple way of filtering the available messages to find the ones in which they will be interested.[11]
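To put these estimates in perspective, the short calculation below derives a few per-article and per-user figures from them. It is a back-of-the-envelope sketch: the function names are hypothetical, 1 Mbyte is assumed to mean 10**6 bytes, the quoted figures are lower bounds ("over"), and the ten-second skimming rate is an invented assumption used only to illustrate the scale.

```python
def feed_summary(users, sites, articles_per_day, mbytes_per_day):
    """Derive simple ratios from the daily traffic estimates."""
    bytes_per_day = mbytes_per_day * 1_000_000   # assumes 1 Mbyte = 10**6 bytes
    return {
        "bytes_per_article": bytes_per_day / articles_per_day,
        "articles_per_site": articles_per_day / sites,
        "users_per_article": users / articles_per_day,
    }


def hours_to_read_all(articles_per_day, seconds_per_article):
    """Hours needed to look at every article posted in one day."""
    return articles_per_day * seconds_per_article / 3600


summary = feed_summary(users=2_600_000, sites=87_000,
                       articles_per_day=26_000, mbytes_per_day=57)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")

# Assumed skimming rate of 10 seconds per article (not a figure from the text).
print(f"hours to skim one day's feed: {hours_to_read_all(26_000, 10):.1f}")
```

Under these assumptions the average article is roughly 2.2 kbytes, about one article is posted per day for every 100 users, and skimming a full day's feed would take about 72 hours. No one reads every newsgroup, but the figures suggest why subscribing to a subset of groups no longer keeps the volume manageable.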

The idea of filtering is not a new one, and many filtering systems of different types have been created in the past, but none of them has all the characteristics needed for filtering information systems like Net News. Furthermore, few existing filtering systems take advantage of a key property of networked information systems, namely, that many people read each message. Making use of the multiply accessed property to create an effective filtering system is the primary goal of this thesis.

As an overview, the system we developed for filtering Net News enables users to cast votes on the articles they read, and users can either associate their names with these votes or not, as they choose. The votes are distributed by a series of methods from where they are cast to where later users might access them. When users go to read the articles available in a newsgroup, they can access the votes cast by previous users and thereby find out whether previous users found those articles to be useful. Before describing our system for filtering Net News in more detail, let us first examine several general strategies that are currently used for filtering.
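As a rough illustration of the voting idea in the overview, the following Python sketch records votes, copies them between sites, and tallies them for a later reader. It is not the thesis's implementation: the names `Vote` and `VoteStore` are hypothetical, a vote is assumed to be a simple useful/not-useful flag, and `merge_from` is a stand-in for the distribution methods, which the overview does not specify.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Vote:
    article_id: str
    useful: bool                  # assumed vote form: useful or not useful
    voter: Optional[str] = None   # None means the vote was cast anonymously


class VoteStore:
    """Votes known at one site, grouped by the article they refer to."""

    def __init__(self):
        self._votes = defaultdict(list)

    def cast(self, article_id, useful, voter=None):
        """Record a vote from a local reader, named or anonymous."""
        self._votes[article_id].append(Vote(article_id, useful, voter))

    def merge_from(self, other):
        """Copy another site's votes here (a stand-in for distribution).

        Merging the same store twice would double-count its votes; a real
        system would need to detect duplicates.
        """
        for article_id, votes in other._votes.items():
            self._votes[article_id].extend(votes)

    def tally(self, article_id):
        """Return (useful, not_useful) counts for one article."""
        votes = self._votes.get(article_id, [])
        useful = sum(v.useful for v in votes)
        return useful, len(votes) - useful

    def named_supporters(self, article_id):
        """Names of readers who signed a 'useful' vote for the article."""
        return sorted(v.voter for v in self._votes.get(article_id, [])
                      if v.useful and v.voter is not None)
```

A news reader could call `tally` for each article in a newsgroup and present first the articles that earlier readers found useful, which is the multiply accessed property put to work.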






David A. Maltz (dmaltz@cs.cmu.edu)