5. The Future of Knowledge Organization Systems on the Web
As online databases moved to the Web, they began to provide
their products, including vocabulary aids, in this environment.
Portable document format (PDF) versions of printed vocabulary aids are
common, since PDF can be easily produced from a Postscript file and
it retains the look of the printed product. With Adobe's tools for
indexing and searching, the PDF file can provide some level of support
for linking. Many of these aids, however, remain in the form of
HTML files only; there is no database structure to easily support the
linking and searching. In some cases, the full structure of the KOS is
not made available on the Web; the only format for a Web-based
thesaurus may be an alphabetical list of terms that does not let the
user navigate the hierarchical structure easily. As unique ways of
using these resources are developed, it is hoped that more KOS
providers will be encouraged to provide their systems in formats that are
conducive to such networked uses.
Some of the requirements for such electronic KOSs were
identified at a workshop entitled "Electronic Thesauri: Planning for a
Standard" and sponsored by NISO (1999). While the focus of this
meeting was digital thesauri, consideration was also given to other
KOSs in digital form. The identified requirements include persistent
identification at the concept level, a simple protocol for
distributed querying of and response from a KOS, and a
standard set of metadata attributes for describing a
remote KOS.
To facilitate the search and display of information from a
previously unknown KOS, the system must have unique and
persistent identifiers for each of the concepts in the system. For example,
the California Environmental Resources Evaluation System (of the
California Natural Resources Agency) and the U.S. Geological
Survey have developed a system for remote querying and response
(CERES 1999). It requires that each concept in the thesaurus have a
unique identifier. In the case of the previously described ITIS, which is
accessed remotely by the CERES system, the ITIS record number
is used as the identifier. Other unique identifiers could include
the DOI, or a classification notation that has been made unique by
appending the scheme name or the URL to the notation.
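The last option, qualifying a notation with its scheme name or URL, can be illustrated with a minimal sketch. The function names, the colon separator, and the example.org URL are assumptions made for illustration; they are not taken from CERES, ITIS, or any standard.

```python
from urllib.parse import quote


def concept_id_from_scheme(scheme_name: str, notation: str) -> str:
    """Qualify a classification notation with the name of its scheme."""
    # Assumption: a colon separates scheme name and notation.
    return f"{scheme_name.strip()}:{notation.strip()}"


def concept_id_from_url(scheme_url: str, notation: str) -> str:
    """Qualify a classification notation with the URL of its scheme."""
    # Percent-encode the notation so that characters such as spaces or
    # slashes cannot be mistaken for part of the URL path.
    return f"{scheme_url.rstrip('/')}/{quote(notation.strip(), safe='')}"


# The same notation in two hypothetical schemes yields distinct identifiers.
id_a = concept_id_from_scheme("SchemeA", "551.5")
id_b = concept_id_from_scheme("SchemeB", "551.5")
id_url = concept_id_from_url("https://example.org/scheme-a/", "551.5")
```

On its own, the notation 551.5 is ambiguous across schemes; once qualified, each identifier refers to exactly one concept in one scheme.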
The second requirement is a protocol for distributed
querying of and response from KOSs. This is particularly critical for
highly structured systems such as thesauri, semantic networks, and
ontologies. Work has been done in this area within the Z39.50
community. (Z39.50 is the NISO standard for searching distributed
bibliographic databases.) A profile has been proposed by the Zthes Working
Group to tailor the Z39.50 protocol to operate on thesauri that follow
the Z39.19 standard.
A similar effort is under way at the CERES Project. Instead of
a Z39.50-based protocol, CERES has developed a structure that
is based on the Resource Description Framework (RDF) and the
HTTP, the protocol used by standard browsers. RDF's concept of containers is
a natural fit for managing the hierarchical structure of complex
systems such as thesauri. The structure proposed by CERES is likely to be
encoded using XML, a mark-up format that lends itself to
structured information. This protocol for linking distributed vocabularies
will support both searching and cataloging. The user will be
presented with remote vocabularies that can be displayed and navigated by
a local client.
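How an RDF container might carry a thesaurus hierarchy in XML can be sketched as follows. This is an illustration only and not the structure proposed by CERES: the `kos` namespace, the `label` and `narrower` property names, and the example concepts are hypothetical. Only the RDF namespace and the `rdf:Bag` and `rdf:li` container elements come from RDF itself.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
KOS_NS = "https://example.org/kos#"  # hypothetical vocabulary namespace

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("kos", KOS_NS)


def concept_element(concept_id, label, narrower=()):
    """Build one concept; narrower concepts are grouped in an rdf:Bag."""
    node = ET.Element(f"{{{RDF_NS}}}Description",
                      {f"{{{RDF_NS}}}about": concept_id})
    ET.SubElement(node, f"{{{KOS_NS}}}label").text = label
    if narrower:
        holder = ET.SubElement(node, f"{{{KOS_NS}}}narrower")
        bag = ET.SubElement(holder, f"{{{RDF_NS}}}Bag")
        for child in narrower:
            # Each member of the container is wrapped in an rdf:li element.
            ET.SubElement(bag, f"{{{RDF_NS}}}li").append(child)
    return node


def narrower_labels(concept):
    """Return the labels of the concepts directly below the given concept."""
    path = (f"{{{KOS_NS}}}narrower/{{{RDF_NS}}}Bag/{{{RDF_NS}}}li/"
            f"{{{RDF_NS}}}Description/{{{KOS_NS}}}label")
    return [el.text for el in concept.findall(path)]


# Serialize a small hierarchy, as a remote KOS might send it.
root = ET.Element(f"{{{RDF_NS}}}RDF")
wetlands = concept_element("https://example.org/kos/c2", "Wetlands")
rivers = concept_element("https://example.org/kos/c3", "Rivers")
root.append(concept_element("https://example.org/kos/c1", "Water bodies",
                            [wetlands, rivers]))
xml_text = ET.tostring(root, encoding="unicode")

# Parse it again, as a local client would, and list the narrower terms.
received = ET.fromstring(xml_text)
children = narrower_labels(received[0])
```

Because the hierarchy travels as nested containers, the client can walk it level by level without knowing in advance how deep the vocabulary is.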
The third major finding from the NISO workshop was the
need for a metadata content standard for the description of KOSs. Such
a standard is key to the provision of knowledge organization services
over the Internet. The metadata identify the Web resource as a KOS
and provide important information to allow an application to use it
remotely without prior knowledge of its content or structure.
A draft set of attributes for describing KOSs available in a
networked environment has been developed by a task group of the
Networked Knowledge Organization Systems (NKOS) Working Group,
an ad hoc group of terminology experts from organizations that are
interested in issues related to the use and interoperability of KOSs
over the Internet. The draft attributes are based on work originally
done by Linda Hill (Alexandria Digital Library at the University of
California at Santa Barbara) and Michael Raugh (Interconnect
Technologies).
The attributes describe the KOS so that content from the
system can be transferred over the Internet and handled by a remote
browser or client application. The attributes include the depth of
hierarchy, the types of relationships included, the subject (described by free
text or by a declared classification scheme), storage format, and copyright
and rights management. To facilitate the
transfer of information, the attribute set also includes information
on character set and file size. To facilitate the acquisition and
licensing of the KOSs, the draft content description includes point-of-contact
information.
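A description built from the attributes listed above might look like the following sketch. The field names, the sample values, and the JSON serialization are assumptions for illustration; the draft attribute set itself is not reproduced here, and the `name` field is added only to label the record.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class KosDescription:
    """Hypothetical record holding the attributes named in the text."""
    name: str                      # assumption: a label for the record itself
    hierarchy_depth: int           # depth of hierarchy
    relationship_types: List[str]  # types of relationships included
    subject: str                   # free text or a classification notation
    storage_format: str
    rights: str                    # copyright and rights management
    contact: str                   # point of contact
    character_set: str
    file_size_bytes: int


# Invented sample values for a fictitious thesaurus.
record = KosDescription(
    name="Example Environmental Thesaurus",
    hierarchy_depth=4,
    relationship_types=["broader", "narrower", "related", "use-for"],
    subject="environmental sciences",
    storage_format="XML",
    rights="Free for noncommercial use",
    contact="kos-admin@example.org",
    character_set="UTF-8",
    file_size_bytes=2500000,
)

# Serialize the description so that a remote application can read it
# before deciding whether and how to load the KOS.
as_json = json.dumps(asdict(record), indent=2)
```

A client could inspect `character_set` and `file_size_bytes` before transfer, and `relationship_types` to decide which navigation features to offer.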
During discussions about the metadata content standard,
workshop attendees identified three methods for storing the metadata
for a KOS. First, the metadata could be stored with the KOS, as
metadata elements for that resource. Second, the metadata could be stored in
a physically separate knowledge organization registry. The third
possibility is a hybrid approach, where a minimal set of metadata
elements is contained in a central registry (i.e., sufficient information
to identify the resource, where it is located, and how more
information can be obtained). The more detailed information would be
stored with the KOS itself.
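The hybrid approach can be sketched as a two-step lookup: the registry returns a minimal entry, and the detailed description is then obtained from the KOS itself. All names, URLs, and the in-memory stand-in for the remote KOS are hypothetical; a real client would retrieve the details over the network.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class RegistryEntry:
    """Minimal metadata held centrally: what the KOS is, where it is
    located, and where more information can be obtained."""
    kos_id: str
    location: str
    details_location: str


def describe(kos_id: str,
             registry: Dict[str, RegistryEntry],
             fetch_details: Callable[[str], dict]) -> dict:
    """Combine the registry entry with the details stored with the KOS."""
    entry = registry[kos_id]
    details = fetch_details(entry.details_location)
    return {"kos_id": entry.kos_id, "location": entry.location, **details}


# Central registry holding one minimal entry (invented values).
registry = {
    "env-thesaurus": RegistryEntry(
        kos_id="env-thesaurus",
        location="https://example.org/env-thesaurus",
        details_location="https://example.org/env-thesaurus/metadata",
    )
}

# Stand-in for the detailed metadata stored with the KOS itself.
remote_metadata = {
    "https://example.org/env-thesaurus/metadata": {
        "hierarchy_depth": 4,
        "character_set": "UTF-8",
    }
}

full = describe("env-thesaurus", registry, remote_metadata.__getitem__)
```

The registry stays small and easy to maintain, while the detailed attributes remain under the control of the KOS provider.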
There is significant interest in the use of KOSs to organize
and search material on the Internet. It is hoped that this interest will
result in knowledge organization services that will make these
sources more readily accessible to a variety of software applications and to
a variety of users. As services and enabled software proliferate, it
will be easier to integrate these KOSs into digital libraries.