Trends in the data
Once analyzed, the collected data show the following trends which
we believe are relevant to users' ability to find interesting articles
in the Net News system:
- Users read an average of 15 newsgroups each session. If we
ignore sessions in which users read only 0 to 4 newsgroups as
representing ``quickie'' sessions, our data on the number of newsgroups
users read correlate well with data from a net-wide survey of Net
News reading habits taken by Jolicoeur.[11] This
correlation helps to establish that the PARC Net News community is
similar to the net-wide community.

- Often users read none of the articles in the newsgroups they subscribe to.
The figure below is a histogram of the fraction of
available articles in a newsgroup that the users actually read. The graph
shows that the vast majority of the time, users enter a newsgroup,
view a list of the available articles, and then exit the newsgroup
without reading any articles. Presumably the users subscribed to the
groups because they thought the groups would contain useful
information. This failure to read any of the articles indicates
that either the information content of the group really was very low
or, more likely, the user was unable to easily identify any of the
articles as being of possible interest.
Figure: Histogram of the fraction
of available articles users actually read. The height of each bar
represents the number of times users read the listed fraction of
available articles.
- A scatter plot of the number of articles users read versus the
number of articles available to be read
shows a telling breakpoint between
200 and 300 articles (see the scatter plot figure below). If there are
fewer than 200 articles available to be read in a newsgroup, users
read some proportion of the available articles. In groups containing
more than 200 available articles, however, few users read any articles
at all. The common behavior is to simply skip all the articles rather
than searching for ones that might be interesting.
Figure: Scatterplot of the
number of articles available to be read in a newsgroup versus the
number of articles users actually read.
- Far more people read Net News than post. Based on
data gathered from 632 sites by the Network Measurement Project at the DEC
Network Systems Laboratory, it is clear that for all groups, there are more
``lurkers'' than ``posters.''[21][20] This is
true regardless of the number of people who read the group or the number
who post articles to it.
The table below shows some sample data for groups with the largest
readership, the smallest readership, the greatest traffic, and several
others.
Table: Posting volume and estimated worldwide readership for several
newsgroups. (Data from Reid, USENET Readership report for Aug 93)
- The time a user spends on each article, even after he or she has
taken an action to display the full text of the article, is very short. The
data taken so far show that users spend an average of 40 seconds reading an
article.
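The measurements behind the first three trends can be illustrated with a minimal sketch. It assumes a hypothetical log with one record per newsgroup visit; the record layout, the function names, and the sample values are invented for illustration and are not the instrumentation or data used in this study.

```python
from collections import Counter
from statistics import mean

# Hypothetical log: (session_id, newsgroup, articles_available, articles_read).
# These sample values are made up; they are not data from the study.
visits = [
    ("s1", "comp.lang.c", 120, 6),
    ("s1", "rec.humor", 450, 0),
    ("s1", "sci.math", 40, 0),
    ("s2", "comp.lang.c", 80, 20),
    ("s2", "news.answers", 250, 0),
]

def groups_per_session(visits, min_groups=0):
    """Mean newsgroups visited per session, ignoring sessions that
    visited fewer than min_groups newsgroups (the "quickie" sessions)."""
    counts = Counter(session for session, _, _, _ in visits)
    kept = [n for n in counts.values() if n >= min_groups]
    return mean(kept) if kept else 0.0

def fraction_read_histogram(visits, bins=10):
    """Number of visits falling in each bin of read / available."""
    hist = [0] * bins
    for _, _, available, read in visits:
        if available == 0:
            continue  # nothing to read, so the fraction is undefined
        idx = min(int(read / available * bins), bins - 1)
        hist[idx] += 1
    return hist

def skip_rate(visits, threshold=200):
    """Share of visits in which no article was read, split by whether
    the newsgroup offered fewer than threshold articles or not."""
    totals = {"below": 0, "at_or_above": 0}
    skipped = {"below": 0, "at_or_above": 0}
    for _, _, available, read in visits:
        key = "below" if available < threshold else "at_or_above"
        totals[key] += 1
        if read == 0:
            skipped[key] += 1
    return {k: (skipped[k] / totals[k] if totals[k] else 0.0) for k in totals}

print(groups_per_session(visits, min_groups=3))
print(fraction_read_histogram(visits))
print(skip_rate(visits))
```

The threshold of 200 in `skip_rate` mirrors the breakpoint described above; with real logs one would vary it to see where the skip rate changes most sharply.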
We distill these observations into the following design requirements
for the collaborative filtering system:
[Majority Focus]
Help the majority of Usenet users. There is a huge number of silent
readers: all groups have far more lurkers than active posters.
These lurkers are the people we believe collaborative filtering can
help the most, and also the people who least want to wade through all
the traffic on Usenet. These are also the people least willing to
announce their presence (i.e., they do not post), and most likely the
ones who will least tolerate any extra overhead in their news reading.
[Streamline]
Do not interrupt the existing flow. On average, users spend so little
time reading most articles that any operation we expect a majority of
users to perform must be exceedingly quick and consistent with the
flow of the interface they use for reading articles. Voting for
articles must be seamlessly integrated into the way the user processes
Net News.
[Intelligible]
The filter must behave in a simple way that is easy to understand. If
users cannot easily understand how the system is trying to help them,
the system will only get in the way of the users' tasks.
To help users find interesting articles, we need to find a way
of cutting down the number of articles they must consider reading;
otherwise they will tend not to read any. If we can accomplish that,
we will probably help users meet their goal of increasing the number
of newsgroups they can read. Our data suggest that even if our
filtering is not very accurate at picking out the best articles, it
may still help users find articles interesting to them by reducing the
psychological burden of sifting through a huge number of available
articles.
David A. Maltz (dmaltz@cs.cmu.edu)