4. Planning and Implementing Knowledge Organization Systems in Digital Libraries
This section provides general guidelines that may be useful for
an organization that wants to use knowledge organization systems
to organize a digital library. The framework described is applicable
for KOSs of any type or subject.
Planning Knowledge Organization Systems
Analyzing User Needs
Of primary importance to any digital library project is an analysis
of its users' needs, in terms of content and functionality. Many
volumes have already been written about needs assessment, and
providing detailed guidance on this subject is beyond the scope of this
paper. However, when analyzing how a KOS might be used with a
particular digital library, it is essential to thoroughly understand the
environment of the user. One must look not only at the needs for
organizing the digital library materials but also at possible links
between content within and outside the digital library walls. This is
particularly important for KOSs that are acting as intermediate
authority files, because in such cases the links may not be readily apparent.
It is important to consider other views that might be valuable for
users and peripheral communities that might benefit from the digital
library's content were it accessible to them through a KOS.
Locating Knowledge Organization Systems
Once the users' needs have been analyzed, it is necessary to
locate KOSs that meet those needs. While a new system can be built
locally, it is preferable to find an existing KOS for several reasons. First,
it is costly and time-consuming to build a KOS. Second, KOSs
often benefit by having been built over time. Many of the systems
described in this report have been built over decades; some existed
in paper before digitization. Third, the value of a KOS comes from its
acceptance by the user community; sources built by noted authorities
such as learned societies, trade associations, or standards groups will
be viewed as more trustworthy than those built internally. Finally,
the networked environment has resulted both in an explosion of
primary materials, including documents, electronic journals, and
Web-based databases, and in an equivalent explosion of KOSs on the Web.
There are several ways to identify KOSs that may be of
interest. Many users are already aware of KOSs on the Web within their
discipline. Developers may also turn to directories, librarians in the
field, and reference sources, or they may perform a general search of
the Internet.
Planning the Infrastructure
It is necessary to make decisions about the architecture of the KOS
in the context of the digital library setting. The physical location of
the KOS is important. Will the system be held externally or
internally? There are pros and cons to either approach.
If the system is available on the Web, it is possible to
consider linking to the KOS as an external system. This architecture requires
a script or some search query to locate the resource. One must
then launch a query against the resource to obtain the piece of
information that will serve as the key between the two files. This key
could be a uniform resource locator (URL) or input to another search
query. A query may be necessary if the KOS is stored in a database.
The script may transfer log-on information (including user ID and
password) from the digital library system to the external KOS, in order
to provide access to the Web-enabled database. In the case of a
more direct link, the access may be by URL.
However, the use of a URL as the link has the same
problem with persistence as does direct access via a URL from a browser.
The organization may move the KOS, thereby changing the URL that
is being used as the key. It is important to determine how often
the URLs in the KOS change, whether there is a means of notification
of these changes, and whether it is possible to consider an
alternative that would be more persistent. Schemes such as the Digital
Object Identifier and the Persistent URL have been devised to enable
resources to be physically moved among servers without having
their names changed. Another alternative is the use of other Uniform
Resource Identifier (URI) schemes and the Uniform Resource
Name (URN), which can be sent from the newer Web browsers. The
benefit of linking to a remote resource is that the resource will always be
up-to-date. The maintenance of the KOS is in the hands of the
owner, not the digital librarian. It may also be more apparent to users
that the KOS is not owned by the digital library.
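The indirection on which schemes such as the Persistent URL and the Digital Object Identifier rely can be illustrated with a minimal sketch. The resolver table, the identifier syntax, and the host names below are hypothetical and do not reflect any actual resolver service; the point is only that the digital library stores a stable name while the current location is looked up at the time of use.

```python
# Hypothetical resolver table: persistent name -> current location.
# A real resolver would be a service run by the scheme's maintainer.
resolver = {"kos:thesaurus/1234": "https://old-host.example.org/thesaurus/1234"}


def resolve(name: str, table: dict) -> str:
    """Return the current URL for a persistent name; raise KeyError if unknown."""
    return table[name]


# The digital library stores only the persistent name as its key.
stored_key = "kos:thesaurus/1234"

# The KOS owner moves the resource and updates the resolver entry.
# The key stored in the digital library does not change.
resolver[stored_key] = "https://new-host.example.org/terms/1234"

current_url = resolve(stored_key, resolver)
```

Under this arrangement a move of the KOS requires one change in the resolver rather than a change to every link held by the digital library.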
Linking to a remote KOS also has disadvantages. Persistence
and unexpected changes in the organization and content of the
system may cause problems. The software or telecommunications route
between the digital library server and the KOS may be unreliable.
In systems requiring fast response time or large amounts of data
transfer, and, therefore, high bandwidth (such as full-motion video or
detailed graphics), the fact that a connection must be made between
the digital library and the external KOS may make the system
unacceptable to the user.
Alternatively, the KOS may be obtained from the owner
and loaded locally. In many cases, this requires licensing that may not
be required when the KOS is accessed remotely, because a copy of
the whole resource is being provided to the digital library. Loading
a KOS locally also requires that one consider issues such as
maintenance, local system administration, and disk storage. If the KOS
uses special software, such as a database management system, loading
the KOS locally will require a copy of that software, which may
require additional purchase or licensing. Other considerations are the
need for firewalls and interface design. On the positive side, the KOS
is under more local control. Therefore, it may be possible to
improve the response time by not accessing the KOS over the Internet. If
the KOS is to be used behind the scenes (that is, the system is not
visible to the user), concerns of speed and integration become more
important. If additional modifications (including digitization) need to
be made to the KOS to integrate it with the digital library, it will also
be necessary to load the KOS locally.
If the digital library intends to incorporate numerous
secondary KOSs, it is important to consider the degree to which the
architecture is scalable. The National Library of Medicine's UMLS
incorporates more than 40 different sources. While its main purpose has been
to develop a metathesaurus for moving among these vocabularies,
the management of the systems, regardless of the mapping issues,
has been a major consideration. Ingest has been a major concern,
with the need to develop a system that can handle a variety of input
formats, from ASCII text files to highly structured database
output. The architecture must also accommodate the character sets of the
incoming sources. This is particularly important if a mark-up
language has been used to represent special characters and diacritical
marks. Sources that have been developed in Unicode, which extends
ASCII to accommodate diacritical marks and non-Roman character
sets, cannot be handled by systems that deal only with ASCII or
extended ASCII sets.
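The character-set check described above can be sketched as follows. This is an illustration only: the sample terms are invented for the example, and the assumption that special characters arrive as HTML-style entities (such as &ouml;) may not hold for a given source.

```python
import html


def needs_unicode(term: str) -> bool:
    """Return True if `term` contains characters outside 7-bit ASCII."""
    return not term.isascii()


# Illustrative incoming terms; the second uses mark-up for its diacritical mark.
incoming = ["Pneumonia", "Sj&ouml;gren syndrome", "Ménière disease"]

# Decode mark-up entities first so that every term is held as plain Unicode text.
decoded = [html.unescape(t) for t in incoming]

# Terms that an ASCII-only system could not store without loss.
flagged = [t for t in decoded if needs_unicode(t)]
```

Running such a check at ingest shows early whether the receiving system must support Unicode or whether a lossy transliteration step would be needed.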
Since many digital library systems are being built as
extensions or applications of existing integrated library systems (ILS), it is
important to consider how the KOSs will integrate with the library
system. Unfortunately, many ILS vendors have not considered links
to external files or databases in their system designs. In some cases,
the vendor may require that the information be stored in the
proprietary format of the ILS. The system may require that the files be on
the same directory or server as the accessing ILS. The fields that can
be linked to the Web or searched may be limited. Outside
communications may require Z39.50 client-server connections. With
relatively closed systems, ILSs may be a difficult environment in which to
implement alternative and nontraditional KOSs.
Digital libraries that are interested in using KOSs should
consider this integration when developing requirements for the
procurement of a system to support them. Vendors should be encouraged
to support relatively open architectures and to consider the extension
of traditional library systems to support broader digital library
functionality.
In addition to these immediate concerns, it is important to
consider the incorporation of future KOSs. Initial success may spur
the desire for integration of additional KOSs or enhanced
functionality for the existing KOS. Success may breed additional requirements
and increase the strain on hardware, software, and network architectures.
Maintaining the Knowledge Organization System
For a digital library, an outdated KOS can be more of a
hindrance than a benefit. Maintenance, both of content and of the
system, should be considered when planning a KOS. This is particularly
important if the digital library is to be self-supporting or revenue
generating.
Version control of the KOS is extremely important. Reloading
a new version from the system provider is one way to
accommodate changes; however, this may not be acceptable if the locally held
version differs substantially from that held by the system's provider.
If there has been significant transformation or processing of the
original KOS, it may be difficult, or impossible, to reload the original
and recreate the changes that have been made.
A transaction-based approach, whereby only changes are
transferred between the KOS provider and the library, is also
possible; however, this requires that the system provider have the
infrastructure, both machine and human, to produce these transactions. It
also requires that the changes to the original KOS be identifiable in
order to create change transactions. For example, Stuart Nelson of
the NLM's UMLS Project recently reported that many systems can
create annual transaction records to inform the UMLS about the
changes that have occurred to the original system. However, the changes
are often not indicated with enough detail to support automatic
change transactions in the UMLS. If a change date, for example, is
recorded only at the level of the concept record, it is impossible to tell
whether the term has changed (a correction of a typographic error, for
example) or whether the relationship between this concept and another
concept has changed. Since the UMLS splits the incoming terminology
and its relationships into a variety of files, it is often difficult to tell
how the UMLS files must be changed based on the changes made
during the maintenance of the original KOS (NISO 1999).
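A transaction-based update can be sketched as below. The record layout, the field names, and the identifiers are hypothetical and are not the UMLS format; the sketch assumes that the provider records each change at the level of the individual field, which is the level of detail the preceding paragraph identifies as necessary.

```python
from dataclasses import dataclass


@dataclass
class Change:
    concept_id: str
    field: str       # which part of the record changed, e.g. "term" or "broader"
    new_value: str


def apply_changes(kos: dict, changes: list) -> list:
    """Apply each change in place; return the ids of concepts not held locally."""
    missing = []
    for change in changes:
        record = kos.get(change.concept_id)
        if record is None:
            missing.append(change.concept_id)
            continue
        record[change.field] = change.new_value
    return missing


# Local copy of the KOS, keyed by concept identifier (illustrative data).
local_kos = {"C001": {"term": "Pnuemonia", "broader": "C000"}}

changes = [
    Change("C001", "term", "Pneumonia"),   # correction of a typographic error
    Change("C999", "broader", "C001"),     # concept the library does not hold
]

missing = apply_changes(local_kos, changes)
```

Because each transaction names the field that changed, the term is corrected without touching the relationship, and changes that cannot be applied are reported rather than silently dropped.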
Presenting the Knowledge Organization System to the User
In addition to deciding which KOS should be used and what
functions it should serve, the digital library will need to determine
how to present the KOS to its users. A KOS may be exposed to the user
or made relatively transparent.
The KOS can be exposed to the user in different ways.
Material can be grouped into KOS-related themes or categories on the
digital library's Web site. The KOS may be used at a higher level to
identify specific portals for different uses or users. If the content of the
digital library includes metadata records, the KOS may be displayed as
index terms on the records or in its entirety as a navigation aid
to searching.
In other cases, the KOS may be transparent. For example, a
thesaurus can be used behind the scenes to extend the user's search
to include synonyms, to connect the digital library's resources to
other information and resources, or to filter or rank the information
obtained.
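The behind-the-scenes use of a thesaurus to extend a search can be sketched as follows. The thesaurus fragment and the function name are hypothetical; a production system would also need to handle broader and narrower terms, which this sketch ignores.

```python
# Hypothetical thesaurus fragment: preferred term -> synonyms.
THESAURUS = {
    "heart attack": ["myocardial infarction", "cardiac infarction"],
}


def expand_query(term: str, thesaurus: dict) -> list:
    """Return the user's term followed by any synonyms, without duplicates."""
    key = term.strip().lower()
    expanded = [key]
    for preferred, synonyms in thesaurus.items():
        group = [preferred] + list(synonyms)
        if key in group:
            for candidate in group:
                if candidate not in expanded:
                    expanded.append(candidate)
    return expanded


search_terms = expand_query("Heart Attack", THESAURUS)
```

The user enters a single term and never sees the thesaurus; the digital library searches on every term in the returned list.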
Implementing Knowledge Organization Systems
Acquisition and Intellectual Property Issues
It is critical to properly handle the acquisition of knowledge
organization systems. The first question is whether the KOS is under
copyright. If so, the copyright holder should be contacted concerning
the KOS. It is important to ensure that the apparent contact is the
official one. Many references have been reprinted or put on the Web
without proper acknowledgment of the real owner.
Once the contact has been made, there are several points for
discussion:
- If the provider maintains the KOS, how will the digital library find out about any changes that may be made in it? Is there a notification mechanism in place? How frequently must the information be updated to be of benefit to the digital library's users? Will the maintenance be self-evident, or must the agreement include notification requirements? What will the owner do if the maintenance can no longer be performed?
- What will happen if the provider discontinues the product or sells or transfers it to someone else?
- What uses can the digital library make of the KOS under the proposed agreement? As with other licensing, it is advisable to aim for the broadest permissions and the longest term possible. At a minimum, the library should be able to renegotiate the terms of the agreement relatively easily.
- In a networked environment, it is beneficial to develop mechanisms for linking to online versions rather than to maintain a local copy of the resource. This ensures that what is presented is up-to-date, and acknowledges more clearly the ownership of the KOS. However, there are numerous factors to consider. Will the KOS be used on an intranet or behind a firewall, where access to the outside or information coming into the organization might be prohibited? Does the KOS service use "cookies" or require knowledge of the user's Internet Protocol (IP) address? Does it require a user ID and password?
- If the KOS is to be accessed remotely, are there service issues? Is it likely to be accessed with bandwidth, modem, and computer speeds that are adequate for outside connections of this type? Is the use of such a critical nature that unreliable service on the part of the KOS or the Internet connection will cause the digital library itself to be viewed as less useful? Does the KOS require a specialized search engine or search query formulation? Can the digital library system properly display the results, or would the results be better displayed through the KOS system? Will the resulting information be used in its native form or must it be extracted or transformed? If the KOS is to be loaded locally, in what formats can the content be received?
- If the KOS is not available electronically, can it be digitized? Is the owner interested in a cooperative venture, and are the human and financial resources for such an effort available?
Making the Link
There are two parts to establishing the link between the digital
library and the KOS. The first is locating the key anchor
information in the digital library's resource. The second involves the
lookup against the target file. The creation of this link may be more or
less automatic, depending on the particular situation. The
characterization of this activity is meant to be general and to allow both
"on-the-fly" links and embedded links.
Regardless of what function the KOS is going to serve in the
digital library, the essential information contained in the digital
library resource from which the link is to be made must be identified.
The mechanism for doing this depends on the type of object from
which the link is being made and on the information that is expected to
be identified in the digital library's resource.
The first step is to review any metadata related to the digital
library resource. Do the metadata carry the term (such as SIC
code, artist's name, place name, geographic coordinates) that is needed
to make the link? If this information is included, the level at which
the metadata are assigned should be reviewed. If the metadata
indicate the subject matter of the specific resource in which the user will
be interested, the metadata can be used to make the links. However,
in some cases, the terms that appear in the title or description at the
resource level (e.g., the book) may not be indicative of the subject
at the individual item level (e.g., the chapter). Automatically making
a link on the basis of the content description for an entire book
may misrepresent the content of a chapter. Whether or not the
metadata can be used will depend on the amount and type of information
given in the metadata and the level at which the metadata are assigned.
If a text resource in the digital library provides no
appropriate metadata, the procedure for identifying the key information may
involve text analysis. A program to perform simple string searching
or a search engine that can preserve hit locations can be used if the
text string has distinguishing characteristics, such as a database
acronym, or a specific structure, such as a latitude and longitude coordinate.
If the text string has no such cues, text mining or more complex
text-analysis tools may be necessary. These tools use a variety of
semantic and syntactic algorithms to locate key information. There have
been significant advances in commercially available text-mining
tools, such as IBM's Intelligent Agent, which includes specific
algorithms for identification of names of places and persons.
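Simple string searching for text with a specific structure can be sketched with a regular expression. The coordinate format assumed here (decimal degrees followed by a hemisphere letter, with latitude first) is only one of many in use, and the sample sentence is invented for the example.

```python
import re

# Assumed format: decimal degrees plus hemisphere letter, e.g. "38.89 N, 77.03 W".
COORD = re.compile(
    r"(?P<lat>\d{1,2}(?:\.\d+)?)\s*(?P<ns>[NS]),\s*"
    r"(?P<lon>\d{1,3}(?:\.\d+)?)\s*(?P<ew>[EW])"
)


def find_coordinates(text: str) -> list:
    """Return (offset, latitude, longitude) for each coordinate pair in `text`."""
    hits = []
    for m in COORD.finditer(text):
        # Southern and western hemispheres are stored as negative values.
        lat = float(m["lat"]) * (1 if m["ns"] == "N" else -1)
        lon = float(m["lon"]) * (1 if m["ew"] == "E" else -1)
        hits.append((m.start(), lat, lon))
    return hits


sample = "Survey site at 38.89 N, 77.03 W was sampled."
hits = find_coordinates(sample)
```

The offset returned with each hit preserves the location in the text, so the same position can later serve as the anchor point for the link.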
The second step of the linking activity is to make the
connection to the KOS. The methods for doing this vary, depending on
whether the system is being loaded locally or is referenced remotely. If
the system is loaded locally, it is possible to perform a
significant amount of processing to match the two files, assuming that
computer resources of this type are available to the digital library
organization. If the system is only available remotely over the Web, the
interaction will require knowledge of scripting and various Web-based
access techniques. Scripting should be considered in both local and
remote approaches, since the more integrated the linking is with the
resource, the more maintenance may be required if there are changes
in either the resource or the KOS. Regardless of the approach that
is taken, making the link requires an analysis of both the information
in the original digital library material and the corresponding
information in the KOS.
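When the KOS is loaded locally, the matching of the two files can be sketched as below. The normalization rule, the entry labels, and the identifiers are assumptions made for the example; real data usually call for more careful handling of punctuation, word order, and variant spellings.

```python
from typing import Optional


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(term.lower().split())


def match_terms(library_terms: list, kos_entries: dict) -> dict:
    """Map each library term to a KOS identifier, or None when no entry matches."""
    index = {normalize(label): ident for label, ident in kos_entries.items()}
    result: dict = {}
    for term in library_terms:
        found: Optional[str] = index.get(normalize(term))
        result[term] = found
    return result


# Illustrative KOS entries: label -> identifier.
kos_entries = {"Heart attack": "K100"}

matches = match_terms(["  Heart   Attack", "Asthma"], kos_entries)
```

Terms that map to None are the ones that would need review by hand or a more capable matching technique.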
If the KOS is being used as an intermediate file to bridge
between the digital library's resource and another resource, it is
also important to understand the data and the process whereby
the search is performed and information returned from the target
resource. If the KOS must return a value to the original digital
library resource, the data and process must be evaluated in a
bidirectional sense.
Choosing the linking mechanism is equally important. The
link may be fixed or "on-the-fly." In the case of a fixed link, a
specific URL is embedded at the link point in the digital library
material. However, as stated before, problems of persistence are inherent
in this approach. Alternatively, a URN can be used. The URN
requires the owner of the target file to create a namespace, and
the link resolves through this namespace rather than to a specific URL.
Persistent URLs (PURLs) and digital object identifiers (DOIs) can also
solve this problem. These schemes are sufficient if the material is an
HTML document.
Content in databases is more difficult to retrieve. The
National Library of Medicine now supports the searching of a variety of
its databases through its Internet Grateful Med (IGM) URL
function. IGM users can create URLs that will actually perform
searches against the databases. For example, the following URL would
perform a search for "pneumonia" in the HealthSTAR file:
http://igm-02.nlm.nih.gov/cgi-bin/IGM_robot.pl?datafile=HealthSTAR&search=Subject=pneumonia.
Information on the syntax for creating such a URL is
provided on the NLM Web site. While the intent is that the search URL will
be bookmarked by an individual user, the same concept can be used
for creating an active link at the anchor point for the link. With
additional scripting, the fixed term pneumonia can be replaced automatically, so that the active link picks up whatever term appears at the point where the link is made.
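The additional scripting can be sketched as follows. The base address and the parameter pattern are copied from the example URL above; the function names are hypothetical, and nothing here assumes that the service still responds at that address.

```python
import html
from urllib.parse import quote

# Base address taken from the example URL above.
IGM_BASE = "http://igm-02.nlm.nih.gov/cgi-bin/IGM_robot.pl"


def igm_search_url(term: str, datafile: str = "HealthSTAR") -> str:
    """Build a search URL that follows the pattern of the example above."""
    return f"{IGM_BASE}?datafile={quote(datafile)}&search=Subject={quote(term)}"


def link_term(text: str, term: str) -> str:
    """Wrap the first occurrence of `term` in `text` in an HTML anchor."""
    href = html.escape(igm_search_url(term), quote=True)
    anchor = f'<a href="{href}">{html.escape(term)}</a>'
    return text.replace(term, anchor, 1)


url = igm_search_url("pneumonia")
linked = link_term("Cases of pneumonia rose.", "pneumonia")
```

Because the URL is generated from the term found at the anchor point, the same routine serves any term in the digital library's material rather than one search fixed in advance.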
Summary
The framework for developing an infrastructure to support the
use of KOSs in digital libraries requires an analysis of user needs,
the identification and location of the appropriate KOSs, and the
development of the hardware, software, and network architecture to
support their integration and maintenance. The digital librarian must make
decisions concerning the degree to which the KOSs will be presented to
the user, acquisition and intellectual property issues, and
maintenance and update procedures. There are several technical ways to make
the link between the digital library and the KOS. As knowledge
organization systems are increasingly available on the Web,
requirements are beginning to be defined to improve the interoperability and
general use of these resources through the development of
knowledge organization services on the Web.