CMU World Wide Knowledge Base (Web->KB) project

Goal:

To develop a probabilistic, symbolic knowledge base that mirrors the content of the World Wide Web. If successful, this will make text information on the web available in computer-understandable form, enabling much more sophisticated information retrieval and problem solving.

Approach:

We are developing a system that can be trained to extract symbolic knowledge from hypertext, using a variety of machine learning methods.
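
As a rough illustration of what "trained to extract symbolic knowledge from hypertext" can mean in practice, the sketch below trains a multinomial naive Bayes classifier that assigns a class label to a web page. This is only one of many possible learning methods and is not presented as the project's actual system. The class names ("faculty", "course"), the toy training pages, and all function and class names are hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(html):
    """Crudely strip HTML tags and return lowercase word tokens."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesPageClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.class_counts = Counter()            # label -> number of pages
        self.vocab = set()

    def train(self, pages):
        """pages: iterable of (html, label) pairs."""
        for html, label in pages:
            tokens = tokenize(html)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def log_scores(self, html):
        """Return an unnormalized log-probability for each known label."""
        tokens = tokenize(html)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # class prior
            for tok in tokens:
                # Smoothed word likelihood given the class.
                score += math.log((counts[tok] + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def classify(self, html):
        """Return the label with the highest score."""
        scores = self.log_scores(html)
        return max(scores, key=scores.get)


# Hypothetical toy training data; real training sets would be far larger.
TRAINING_PAGES = [
    ("<html><h1>Jane Doe</h1> Professor of computer science. "
     "Research interests and teaching.</html>", "faculty"),
    ("<html><h1>Intro to Algorithms</h1> Lecture schedule, "
     "homework and exam dates.</html>", "course"),
]

classifier = NaiveBayesPageClassifier()
classifier.train(TRAINING_PAGES)
```

With a trained classifier, a predicted label for a page could be recorded as a symbolic assertion about that page, with the classifier's score supplying the probabilistic part. How labels map to knowledge base entries is an assumption of this sketch.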

Datasets:

The first experiments involved extracting knowledge about computer science departments. We have assembled two datasets for this task:

Other Datasets used by the WebKB Group

Related research on machine learning and text:

See the other research on text learning by our research group.

Publications:

Overview of Cora, a related project:

           Automatic Corpus Construction from the Web

            Spidering:

Researchers:

Project Alumni:

Internal project page (visible to project members only).


theo-11. Last updated: Jan 2001 by Rayid Ghani

This web page is stored at /afs/cs.cmu.edu/project/theo-11/www/wwkb/index.html