This dataset is a collection of 20,000 messages drawn from 20 different netnews newsgroups. One thousand messages from each of the twenty newsgroups were chosen at random and partitioned by newsgroup name. The newsgroups from which the messages were chosen are as follows:
alt.atheism
talk.politics.guns
talk.politics.mideast
talk.politics.misc
talk.religion.misc
soc.religion.christian
comp.sys.ibm.pc.hardware
comp.graphics
comp.os.ms-windows.misc
comp.sys.mac.hardware
comp.windows.x
rec.autos
rec.motorcycles
rec.sport.baseball
rec.sport.hockey
sci.crypt
sci.electronics
sci.space
sci.med
misc.forsale
This dataset was assembled by Ken Lang.
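
Because the messages are partitioned by newsgroup name, a quick sanity check is to count the messages in each partition. The sketch below is illustrative only: it assumes the data is unpacked as one directory per newsgroup with one file per message, which this description does not confirm, and the function name count_messages is hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_messages(root):
    """Count message files under each newsgroup directory in `root`.

    Assumption (not stated in the description): `root` holds one
    subdirectory per newsgroup, and each regular file inside a
    subdirectory is a single message.
    """
    counts = Counter()
    for group_dir in sorted(Path(root).iterdir()):
        if group_dir.is_dir():
            # Count only regular files; nested directories are ignored.
            counts[group_dir.name] = sum(
                1 for path in group_dir.iterdir() if path.is_file()
            )
    return counts
```

If the layout matches that assumption, calling count_messages on the unpacked directory should report 20 newsgroups with 1,000 messages each, for a total of 20,000.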