The Scone Knowledge-Base Project
Scone is a high-performance, open-source knowledge-base (KB) system intended for use as a component in many different software applications. Like other KB systems (for example, Cyc and the various Description Logic systems), Scone provides support for representing symbolic knowledge about the world. This may be general "common sense" knowledge or knowledge about a specific application domain.
Our plan is to release Scone (the software, a relatively small "core" knowledge base, and a programmer-level manual) as open-source software as soon as we have tested the system with "friendly" users in various research groups at Carnegie Mellon. This release will be followed by periodic updates as we continue to develop the Scone engine and its associated knowledge bases.
We are also working on a tutorial book that should make it much easier for beginners to use the Scone software in their own projects. We hope that this will lead to an active worldwide community of Scone users who will extend the system in various ways and develop open-source knowledge bases for many domains.
The Scone Engine
Scone supports simple inference over the elements and statements in the knowledge base: inheritance of properties from more general descriptions, following chains of transitive relations, detection of type mismatches, and so on. Scone also supports search within the knowledge base. For example, we can ask Scone to return all individuals or types represented in the KB that exhibit some set of properties, whether these properties are explicitly stated or inherited from a more general class in the type hierarchy.
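To make inherited properties and property-based search concrete, here is a minimal Python sketch. It is not Scone's engine or API: the class ToyKB, its methods add, ancestors, get and find, and the sample types are all hypothetical, and the "nearest ancestor wins" rule is a deliberate simplification of how a real system resolves multiple inheritance.

```python
from collections import deque


class ToyKB:
    """Hypothetical toy KB: IS-A links plus locally stated properties per node."""

    def __init__(self):
        self.parents = {}  # node -> list of more general nodes (IS-A links)
        self.props = {}    # node -> properties stated directly on that node

    def add(self, node, parents=(), **props):
        self.parents[node] = list(parents)
        self.props[node] = dict(props)

    def ancestors(self, node):
        """The node and everything above it, nearest first (breadth-first)."""
        order, seen, queue = [node], {node}, deque([node])
        while queue:
            for parent in self.parents.get(queue.popleft(), []):
                if parent not in seen:
                    seen.add(parent)
                    order.append(parent)
                    queue.append(parent)
        return order

    def get(self, node, prop):
        """Look up a property, inheriting it from the nearest node that states it."""
        for n in self.ancestors(node):
            if prop in self.props.get(n, {}):
                return self.props[n][prop]
        return None

    def find(self, **wanted):
        """All nodes whose stated or inherited properties match every item in `wanted`."""
        return sorted(n for n in self.parents
                      if all(self.get(n, k) == v for k, v in wanted.items()))


kb = ToyKB()
kb.add("thing")
kb.add("animal", ["thing"], alive=True)
kb.add("mammal", ["animal"], covering="hair")
kb.add("elephant", ["mammal"], color="gray")
kb.add("dog", ["mammal"])
kb.add("Clyde", ["elephant"])
kb.add("Fido", ["dog"], color="brown")

print(kb.get("Clyde", "covering"))        # hair (inherited from mammal)
print(kb.find(color="gray", alive=True))  # ['Clyde', 'elephant']
```

Nothing about color, covering, or being alive is stated on Clyde itself; every answer about Clyde comes from following the chain of IS-A links upward.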
Scone's type hierarchy allows multiple inheritance and exceptions. In addition, Scone supports multiple contexts in the knowledge base. The context mechanism allows us to represent and reason efficiently about different states of the knowledge base, including hypothetical or counterfactual states, differing opinions, and groups of statements that are true only at some specific time or place.
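One simple way to picture a context mechanism is as a chain of overlays: each context inherits every statement from its parent context unless it states something different or explicitly cancels the inherited statement, so a hypothetical context stores only its differences. The Python sketch below illustrates that idea under our own assumptions; the Context class and its state, cancel and lookup methods are hypothetical and do not describe how Scone's context mechanism is implemented.

```python
_CANCELLED = object()  # sentinel meaning "this statement does not hold in this context"


class Context:
    """Hypothetical context that inherits statements from its parent context."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.local = {}  # (subject, relation) -> value stated in this context only

    def state(self, subject, relation, value):
        self.local[(subject, relation)] = value

    def cancel(self, subject, relation):
        # Hide an inherited statement without touching the parent context.
        self.local[(subject, relation)] = _CANCELLED

    def lookup(self, subject, relation):
        """Walk up the context chain; the nearest context that says anything wins."""
        ctx = self
        while ctx is not None:
            if (subject, relation) in ctx.local:
                value = ctx.local[(subject, relation)]
                return None if value is _CANCELLED else value
            ctx = ctx.parent
        return None


general = Context("general knowledge")
general.state("Clyde", "color", "gray")
general.state("Clyde", "location", "zoo")

what_if = Context("what if Clyde were painted pink?", parent=general)
what_if.state("Clyde", "color", "pink")

earlier = Context("a time before Clyde came to the zoo", parent=general)
earlier.cancel("Clyde", "location")

print(what_if.lookup("Clyde", "color"))     # pink
print(what_if.lookup("Clyde", "location"))  # zoo (inherited from general knowledge)
print(earlier.lookup("Clyde", "location"))  # None
```

The hypothetical context holds a single statement, yet it answers questions about everything in its parent, which is what makes many such contexts cheap to keep around.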
The Scone "engine", a large Common Lisp program, implements Scone's basic procedures for representation, search, and inference. Procedures supporting more complex kinds of inference, such as unit conversion or checking the plausibility of new knowledge, can be added to the system. These procedures can be triggered by KB queries or by changes to Scone's stored knowledge.
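As an illustration of procedures that fire on changes and on queries, the Python sketch below runs registered plausibility checks whenever a fact is added and applies a unit conversion when a query asks for a different unit from the one stored. It is a hypothetical sketch: the TriggeredStore class, its hook lists, and the non_negative check are our own assumptions, not Scone's extension interface. The only outside fact used is the definition of the international pound as exactly 0.45359237 kg.

```python
class TriggeredStore:
    """Hypothetical store that runs attached procedures on additions and queries."""

    def __init__(self):
        self.facts = {}       # (subject, attribute) -> (value, unit)
        self.on_add = []      # procedures run before a new fact is stored
        self.converters = {}  # (from_unit, to_unit) -> conversion function

    def add(self, subject, attribute, value, unit):
        for check in self.on_add:
            check(subject, attribute, value, unit)  # a check may raise to reject the fact
        self.facts[(subject, attribute)] = (value, unit)

    def query(self, subject, attribute, unit):
        value, stored_unit = self.facts[(subject, attribute)]
        if stored_unit == unit:
            return value
        # Query-triggered procedure: convert to the unit the caller asked for.
        return self.converters[(stored_unit, unit)](value)


def non_negative(subject, attribute, value, unit):
    """Plausibility check: a weight can never be negative."""
    if attribute == "weight" and value < 0:
        raise ValueError(f"implausible {attribute} for {subject}: {value} {unit}")


LB_IN_KG = 0.45359237  # the international pound, by definition

store = TriggeredStore()
store.on_add.append(non_negative)
store.converters[("kg", "lb")] = lambda kg: kg / LB_IN_KG
store.converters[("lb", "kg")] = lambda lb: lb * LB_IN_KG

store.add("Clyde", "weight", 5000, "kg")
print(round(store.query("Clyde", "weight", "lb")))  # 11023
```

Keeping the checks and converters in lists and tables, outside the core lookup, mirrors the idea that such procedures are add-ons rather than part of the basic engine.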
A major goal of our research on Scone has been to find search and inference algorithms that are efficient and that remain usable even as the knowledge base grows to millions of entities and statements. Scone differs from other knowledge-base systems in the way it implements search and inference: it uses marker-passing algorithms originally designed for a hypothetical massively parallel machine (the NETL machine). These marker-passing algorithms cannot perform every kind of search and inference that a general theorem-prover can handle. However, the Scone algorithms are very fast, and they can handle most kinds of search and inference needed for common-sense reasoning. Scone's marker-passing algorithms will be described more fully in a forthcoming paper.
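Until that paper appears, the general idea of marker passing can be illustrated with a serial simulation. In the Python sketch below, a marker is placed on one node and spread along IS-A links; a membership question such as "Is Clyde an animal?" becomes "Does the animal node carry the marker?", and an intersection query becomes "Which nodes carry both markers?" The MarkerNet class and the sample network are hypothetical, and this is a generic NETL-style illustration, not Scone's actual algorithms.

```python
from collections import defaultdict, deque


class MarkerNet:
    """Hypothetical semantic network that simulates marker passing serially."""

    def __init__(self):
        self.up = defaultdict(set)       # node -> immediate supertypes
        self.down = defaultdict(set)     # node -> immediate subtypes and instances
        self.markers = defaultdict(set)  # node -> markers currently on the node

    def is_a(self, child, parent):
        self.up[child].add(parent)
        self.down[parent].add(child)

    def clear(self, marker):
        for marks in self.markers.values():
            marks.discard(marker)

    def propagate(self, start, marker, links):
        """Put `marker` on `start`, then spread it along `links` (self.up or self.down)."""
        self.markers[start].add(marker)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in links[node]:
                if marker not in self.markers[nxt]:  # each node is marked at most once
                    self.markers[nxt].add(marker)
                    queue.append(nxt)

    def marked(self, *markers):
        """Nodes that currently carry every one of the given markers."""
        return {n for n, marks in self.markers.items() if set(markers) <= marks}


net = MarkerNet()
for child, parent in [("mammal", "animal"), ("elephant", "mammal"),
                      ("Clyde", "elephant"), ("Dumbo", "elephant"),
                      ("Clyde", "circus performer"), ("Bozo", "circus performer")]:
    net.is_a(child, parent)

# "Is Clyde an animal?": mark upward from Clyde and inspect the animal node.
net.propagate("Clyde", "M1", net.up)
print("animal" in net.marked("M1"))  # True

# "Which elephants are circus performers?": mark downward from both types.
net.clear("M1")
net.propagate("elephant", "M1", net.down)
net.propagate("circus performer", "M2", net.down)
print(net.marked("M1", "M2"))  # {'Clyde'}
```

Here each propagation visits nodes one at a time through a queue; in the NETL design, markers were meant to be propagated in parallel by simple per-node hardware, which is why these algorithms map naturally onto parallel machines.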
At present, the knowledge bases we have developed for Scone are relatively small: a few thousand statements and entities. However, we have successfully run benchmarks on a synthetic knowledge base with several million items on a $3000 workstation, with most simple queries processed in a few milliseconds; most other KB systems bog down when loaded with a few thousand statements. If more processing power is needed, the Scone algorithms are well suited to parallel implementation on a network or grid of workstations, or on a data-parallel machine.
Adding Knowledge to Scone
In addition to the engine, the Scone system comes with a number of knowledge-base files, each of which is a collection of descriptions and statements about the entities in some subject area. The "core" KB includes a body of general knowledge that is useful in most other domains: knowledge about physical objects, materials, units of measure, time and space, people, and so on.
The greatest problem for users of current KB systems has been the difficulty of adding new knowledge to the system and making that knowledge fully effective. A second major focus of our research, therefore, is to make it easy for users with no special training to add new knowledge to the Scone KB. Scone eases the burden of knowledge entry through a relatively clean design and by separating system-efficiency concerns from knowledge-entry concerns.
Our general plan for creating new Scone knowledge bases is as follows:
· At present, complex knowledge must be entered into Scone as a collection of knowledge-entry statements, which are specialized Common Lisp expressions. For example, to create a new elephant named Clyde, we would enter the following form:
(new-indv {Clyde} {elephant})
When this form is entered, it is checked for consistency with any information already in the KB.
· A body of fundamental knowledge, such as Scone's representation of time, space, objects, and materials, has been created in this form by members of the Scone project. This work is ongoing.
· When we work with members of another research group that wants to use Scone, we teach them to create their own knowledge bases in Scone format. Many of these KBs are of general value and are added to the Scone library.
· One of our goals in releasing Scone as open-source software is to build a community that will create and share high-quality knowledge bases in any number of areas.
· We are also looking at techniques such as those developed by the Open Mind Project and the creators of the Peekaboom game, which entice large numbers of untrained Internet users to enter new knowledge by turning the process into a game.
· Knowledge can also be obtained by mining existing structured or semi-structured knowledge sources and converting their information content into Scone format. For example, as a demonstration, one student in the Scone group extracted information about all the countries of the world (area, population, cities, and so on) from the HTML files of the online "CIA World Factbook" and from other sources on the Web. Technically, this conversion is straightforward in most cases, though it may require some hand-editing and correction.
· Ultimately, we want Scone to accept new knowledge in the form of simple English statements (or statements in the human language of your choice). We can already translate many simple declarative English sentences into Scone format, and our coverage of English is increasing steadily. However, to handle the full range of English statements, such as the text we might find in newspapers and textbooks, we must use the knowledge already in Scone to help us disambiguate the new text we are trying to process. Several students in the Scone Research Group are working on various aspects of this challenging problem.
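The first item in the list above notes that a form such as (new-indv {Clyde} {elephant}) is checked for consistency with what the KB already holds. The Python sketch below mimics one simple kind of check: it rejects a new individual whose types, together with all their supertypes, include two types declared disjoint. It is an illustration under our own assumptions; EntryKB, new_type, declare_disjoint and new_indv are hypothetical Python stand-ins, not Scone's Lisp functions, and Scone's real checks may cover far more than disjointness.

```python
class EntryKB:
    """Hypothetical KB that checks each new individual against disjointness rules."""

    def __init__(self):
        self.parents = {}      # element -> set of immediate supertypes
        self.disjoint = set()  # frozensets of two types that can share no members

    def new_type(self, name, *supers):
        self.parents[name] = set(supers)

    def declare_disjoint(self, a, b):
        self.disjoint.add(frozenset((a, b)))

    def supertypes(self, name):
        """Every type above `name`, following IS-A links transitively."""
        found, stack = set(), list(self.parents.get(name, ()))
        while stack:
            t = stack.pop()
            if t not in found:
                found.add(t)
                stack.extend(self.parents.get(t, ()))
        return found

    def new_indv(self, name, *types):
        if name in self.parents:
            raise ValueError(f"{name} already exists")
        all_types = set(types)
        for t in types:
            if t not in self.parents:
                raise ValueError(f"unknown type {t}")
            all_types |= self.supertypes(t)
        for pair in self.disjoint:
            if pair <= all_types:
                a, b = sorted(pair)
                raise ValueError(f"{name} cannot be both {a} and {b}")
        self.parents[name] = set(types)  # stored only after every check has passed
        return name


kb = EntryKB()
kb.new_type("thing")
kb.new_type("animal", "thing")
kb.new_type("plant", "thing")
kb.new_type("elephant", "animal")
kb.new_type("tree", "plant")
kb.declare_disjoint("animal", "plant")

kb.new_indv("Clyde", "elephant")  # accepted
try:
    kb.new_indv("Ent", "elephant", "tree")
except ValueError as err:
    print(err)  # Ent cannot be both animal and plant
```

The conflict is caught even though "animal" and "plant" never appear in the request; they are reached only by following supertype links, which is why the check needs the existing knowledge.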
Current and Potential Applications of Scone
In the long run, we believe that Scone could become a standard component for people writing knowledge-based software. A knowledge base could be used in as many different ways as databases are used today.
Of course, this depends on the efficiency and reliability of the system, and most of all on its ease of use. Our goal is to make Scone so easy to use that any smart college undergraduate who is developing an adventure game will be able to read the Scone tutorial book, download the open-source software, and begin using Scone as a tool to hold the game's knowledge: "An ogre typically carries a club, lives in a cave, and likes to eat hobbits. Igor the Ogre has met Frodo and will recognize him if they meet again. But a character in disguise probably won't be recognized."
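The ogre example mixes typical properties, individual exceptions, and a default rule that can be overridden. The Python sketch below encodes just that pattern so the logic is visible; the Character class, its fields, and the second ogre Grum are hypothetical, and in Scone this knowledge would be written as KB statements rather than as Python code.

```python
from dataclasses import dataclass, field

# What an ogre "typically" has; individuals may override any of these.
OGRE_TYPICAL = {"carries": "club", "lives in": "cave", "likes to eat": "hobbits"}


@dataclass
class Character:
    name: str
    typical: dict = field(default_factory=dict)     # defaults shared by the character's kind
    exceptions: dict = field(default_factory=dict)  # facts that override those defaults
    has_met: set = field(default_factory=set)       # names of characters already met
    disguised: bool = False

    def prop(self, key):
        """An exception stated for this individual beats the typical value."""
        return self.exceptions.get(key, self.typical.get(key))

    def recognizes(self, other):
        """Default rule: recognize anyone already met, unless they are in disguise."""
        return other.name in self.has_met and not other.disguised


igor = Character("Igor", typical=OGRE_TYPICAL, has_met={"Frodo"})
grum = Character("Grum", typical=OGRE_TYPICAL, exceptions={"carries": "axe"})
frodo = Character("Frodo")

print(igor.prop("carries"), grum.prop("carries"))  # club axe
print(igor.recognizes(frodo))                      # True
frodo.disguised = True
print(igor.recognizes(frodo))                      # False
```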
Scone can also be used for more serious purposes. Here are a few example applications:
· Online catalogs: It is straightforward in Scone to represent hierarchies of products, their characteristics, their intended applications, which components work together, information on prices, vendors, and availability, and so on.
· Help-desk support: Just as products can be described and searched in Scone, so too can families of problems, their symptoms, and their causes.
· Autonomic computing: Companies that develop or manage complex hardware/software installations face a serious problem in configuring these systems correctly, recognizing vulnerabilities and attacks, and diagnosing and repairing problems. The first step in managing this complexity is to create a symbolic description of the installation: its components, tasks, personnel and permissions, and the external environment. This is a job for a KB.
· Federated databases: Suppose two companies merge. Company A has a database of employees, but it does not cover temporary or part-time employees. Company B has a database, also labeled employees, which does contain its part-time and temporary employees, but it does not include salespeople who are paid a commission rather than a salary. If we can represent the different types and subtypes of employees in a knowledge base, then we can begin to combine these two ontologies and resolve the differences between them.
One possible solution is to send all database queries first to Scone, which will pick off and answer any odd or exceptional queries; Scone can then pass the straightforward queries (perhaps in modified form) on to the appropriate DB.
· Computational biology: The literature in this field is huge and is growing at an alarming rate. Representing and organizing all this diverse knowledge, so that connections can be noticed and researchers can find the information they need, is another job well suited to a knowledge base, perhaps backed up by multiple databases for low-level data.
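For the federated-database case in the list above, a first step is knowing which employee subtypes each database actually covers. The Python sketch below records that coverage for the two-company example and splits a query into the parts each database can answer, plus the (company, category) combinations that neither database stores, which a front end would have to answer from its own knowledge or report as missing. The category names, the PARENT and STORED tables, and the route function are hypothetical; a real system would use far richer descriptions than these flat sets.

```python
# Hypothetical subtype hierarchy: each employee category and its parent category.
PARENT = {
    "salaried employee": "employee",
    "commissioned salesperson": "employee",
    "part-time employee": "employee",
    "temporary employee": "employee",
}

# Leaf categories each company's database stores, following the scenario above.
STORED = {
    "A": {"salaried employee", "commissioned salesperson"},
    "B": {"salaried employee", "part-time employee", "temporary employee"},
}


def leaves_under(kind):
    """The most specific categories at or below `kind`."""
    children = [c for c, p in PARENT.items() if p == kind]
    if not children:
        return {kind}
    return set().union(*(leaves_under(c) for c in children))


def route(kind):
    """Split a query into per-database parts plus the (company, category) gaps."""
    leaves = leaves_under(kind)
    plan = {db: sorted(leaves & held) for db, held in STORED.items() if leaves & held}
    gaps = sorted((db, leaf) for db, held in STORED.items() for leaf in leaves - held)
    return plan, gaps


plan, gaps = route("employee")
print(plan["A"])  # ['commissioned salesperson', 'salaried employee']
print(gaps)       # [('A', 'part-time employee'), ('A', 'temporary employee'), ('B', 'commissioned salesperson')]
```

A query about "employee" is thus answered by both databases together, while the gap list makes explicit what a naive union of the two would silently leave out.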
In all of these applications, it is important to keep in mind that we do not have to choose between a knowledge-based approach using Scone and a statistical approach, or one using conventional database technology. In many problem domains today, none of these approaches provides a complete solution, but fortunately they all play well together.
For example, a little bit of symbolic knowledge (perhaps just a type hierarchy and some properties) can add a lot of power to a search engine or a classifier by augmenting queries and by filtering what the search engine returns. As Scone's knowledge base grows and evolves, it can play an ever-greater role in this partnership. But the key point is that we do not have to wait for this. This research project is ambitious, but it is not an all-or-nothing proposition.
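Here is a small Python sketch of augmenting queries and filtering results with a little symbolic knowledge, under our own assumptions: a query term is augmented with the more specific terms below it in a type hierarchy, and search results that the KB knows belong to a different category are filtered out, while results it knows nothing about are kept. The NARROWER and CATEGORY tables and both functions are hypothetical.

```python
# Hypothetical fragment of a type hierarchy: term -> more specific terms.
NARROWER = {
    "vehicle": ["car", "truck", "bicycle"],
    "car": ["sedan", "convertible"],
}

# Hypothetical category facts for ambiguous result titles.
CATEGORY = {"jaguar (cat)": "animal", "jaguar (car)": "vehicle"}


def expand(term):
    """The term itself plus every narrower term below it, depth-first."""
    terms = [term]
    for child in NARROWER.get(term, []):
        terms.extend(expand(child))
    return terms


def keep(results, wanted):
    """Drop results known to be in another category; keep the ones the KB can't place."""
    return [r for r in results if CATEGORY.get(r, wanted) == wanted]


print(expand("vehicle"))
# ['vehicle', 'car', 'sedan', 'convertible', 'truck', 'bicycle']
print(keep(["jaguar (cat)", "jaguar (car)", "used trucks for sale"], "vehicle"))
# ['jaguar (car)', 'used trucks for sale']
```

Keeping results the KB cannot place is the design choice that lets even a small, incomplete knowledge base help without ever making the search engine worse.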
During the spring 2006 term, Scone was tested by three research projects at Carnegie Mellon's School of Computer Science: Radar, Javelin (Question Answering), and "Read the Web". In these applications, Scone serves both as a repository for background knowledge and as the store where newly learned knowledge can be saved.
Scone has already been used to improve message classification within the Radar system by augmenting the "bag of words" features with "implied" features. If a message mentions "Scott Fahlman", Scone adds features for "faculty", "CMU", "AI", "research", "Scone Project", and so on, based on the background knowledge in the KB. If these new virtual features are irrelevant, the classifier will learn to ignore them, but often they are valuable: if a user asks whether there are any upcoming "AI" talks and we have a message saying that "Scott Fahlman" is speaking, we can make the connection.
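This kind of feature augmentation can be sketched in a few lines of Python. It is our own hypothetical illustration, not Radar's or Scone's code: the BACKGROUND table stands in for the KB, its "knowledge representation" entry is invented for the example, and implied_features follows links out to a fixed depth so that facts about facts can also become features.

```python
# Hypothetical background knowledge: entity -> things the KB associates with it.
BACKGROUND = {
    "Scott Fahlman": ["faculty", "CMU", "AI", "research", "Scone Project"],
    "Scone Project": ["AI", "knowledge representation"],
}


def implied_features(words, kb=BACKGROUND, max_depth=2):
    """Bag-of-words features plus those implied by the KB, up to max_depth links away."""
    features = set(words)
    frontier = [w for w in words if w in kb]
    for _ in range(max_depth):
        next_frontier = []
        for entity in frontier:
            for feature in kb.get(entity, []):
                if feature not in features:
                    features.add(feature)
                    next_frontier.append(feature)
        frontier = next_frontier
    return features


message = ["talk", "Tuesday", "Scott Fahlman"]
features = implied_features(message)
print("AI" in features)                        # True
print("knowledge representation" in features)  # True (two links away)
```

The classifier then sees "AI" as an ordinary feature of this message, so a query about AI talks can match it even though the word never appears in the text.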
Software & Publications
Open-Source Scone Software (Coming Soon)
The Scone User's Guide (Word, PDF)
SconeEdit Browser/Editor for Scone
Other Scone-Related Publications
Members of the Scone Research Group
Faculty:
Scott E. Fahlman
Current Grad Students:
Allen Benson (Pitt), Ben Lambert, Wei Chen
Former Grad Students:
Daniel Chung Yong Lim, Daniel Olsher,
E. Cinar Sahin, Alicia Tribble Sagae
Former Visitors:
Maria Jose Santofimia Romero, David Manzano-Macho
Recent Undergraduate Students:
Matthew Gormley, Jiquan Ngiam, Apaorn Suveepattananont
Acknowledgments
Development of Scone from 2003 through 2008 was supported in part by the Defense Advanced Research Projects Agency (DARPA) under contract numbers NBCHD030010 and FA8750-07-D-0185. Additional support for Scone development has been provided by generous research grants from Cisco Systems Inc. and Google Inc.