3. Making Resources Accessible to Other Communities
Someone recently compared the Web with a large room filled
with books that were scattered all over the floor. The Web is the
world's largest mass of bits and bytes. It is a meeting place that brings
together disparate communities. The "Internet Commons," as
this meeting place has been called, requires connections between
and among disparate communities in order for an "economy" to
develop (Weibel 1999). This economy will provide the framework
within which both commercial and noncommercial transactions can
occur. Knowledge organization systems (KOSs) are one means of connecting
these disparate communities. KOSs can be used to (1) provide
alternate subject access, (2) add modes of understanding to digital library
resources, (3) support multilingual access, and (4) supply terms for
expansion of free-text searches in domains that are relatively
unknown to the user.
Providing Alternate Subject Access
Alternate subject access refers to the provision of one or more
additional subject orientations that make the resources of the digital
library accessible to different audiences. This approach is particularly
valuable when the digital library resources appeal to groups that do
not share a common terminology. The alternate scheme can be a system of subject
headings, a classification scheme, or any other subject-oriented system.
Alternate subject access can be provided by
- indexing or classifying the resources using multiple schemes,
- retaining original schemes from organizations that contribute to the digital library, or
- mapping between the primary scheme and an alternate scheme.
Indexing the Material with Multiple Schemes
The most direct method for providing alternate subject access to
a collection is by classifying or indexing the resources with
multiple schemes, but it may also be the most costly. This approach
requires redundant cataloging or catalogers who are knowledgeable in
both schemes. It may also require modifications to the cataloging
tools and procedures. However, if the cataloging is at a high level
(resources versus individual documents), or if the schemes are not
difficult or detailed, it may be a reasonable approach.
Retaining Alternate Indexing from Contributors
If the digital library is being built through contributions from a
variety of sources, the originating organization may have applied an
alternate scheme that could be used. For example, the NASA
database on aeronautics and astronautics receives relevant
bibliographic records from other U.S. agencies, such as the Department of
Defense and the Department of Energy. The controlled vocabulary terms
assigned by the contributing organization are processed through a
machine-aided indexing process to create candidate indexing
terms from the NASA Thesaurus for review by NASA's indexers.
However, the final records contain both the NASA
Thesaurus terms and the controlled vocabulary terms from the contributing organization,
with the alternate indexing terms retained in a separate data element
in the bibliographic record. The terms collected from other
organizations can be viewed as an alternate access point, so that at least
part of the collection is accessible through another discipline's terminology.
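The idea of keeping contributor indexing in its own data element can be sketched as follows. This is only an illustration, not the NASA record format: the `BibRecord` class, its field names, the `search` function, and the sample terms are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class BibRecord:
    """Hypothetical record: primary and contributor indexing kept apart."""
    title: str
    primary_terms: Set[str] = field(default_factory=set)    # primary scheme
    alternate_terms: Set[str] = field(default_factory=set)  # contributor's scheme


def search(records, term, include_alternate=True):
    """Return records indexed with `term` in the primary element or,
    optionally, in the retained alternate element."""
    wanted = term.casefold()
    hits = []
    for rec in records:
        pool = set(rec.primary_terms)
        if include_alternate:
            pool |= rec.alternate_terms
        if wanted in {t.casefold() for t in pool}:
            hits.append(rec)
    return hits


# Invented sample data, for illustration only.
records = [
    BibRecord("Report A", {"Propulsion system performance"}, {"Rocket engines"}),
    BibRecord("Report B", {"Rocket engines"}),
]
```

Searching with `include_alternate=False` consults only the primary scheme, while the default also matches terms supplied by the contributing organization, which is what makes the retained element work as an extra access point.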
Mapping Multiple Schemes
The third method for providing alternate subject access is the
most indirect: mapping between two or more schemes. Several examples
of this approach can be found among A&I services. Both BIOSIS,
the world's largest private sector A&I service in the life sciences, and
the NLM apply MeSH to BIOSIS documents. The records that
BIOSIS contributes to NLM's TOXLINE database are processed
automatically to have appropriate MeSH terms added. This is based on a
mapping of the natural language terms that occur in the toxicology
literature and BIOSIS' normalized natural language keyword
indexing with the MeSH terminology. In the new BIOSIS relational
indexing structure, BIOSIS builds and maintains authority files that
connect natural language disease names to the MeSH-controlled
disease terms. When the BIOSIS indexer assigns the free text keyword for
the disease name, the appropriate MeSH term is also added to the
record as an alternate access point (BIOSIS 1999). The assignment is
based on the development over time of a mapping between the
terminology used by BIOSIS and the MeSH-controlled terms.
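A mapping of this kind can be pictured as a lookup from normalized free-text keywords to controlled terms. The sketch below is a minimal illustration under stated assumptions: the `AUTHORITY_FILE` dictionary, its two entries, and the function names are hypothetical and do not reproduce the BIOSIS authority files.

```python
# Hypothetical authority file: normalized free-text disease name -> MeSH heading.
AUTHORITY_FILE = {
    "heart attack": "Myocardial Infarction",
    "high blood pressure": "Hypertension",
}


def normalize(keyword):
    """Lowercase the keyword and collapse runs of whitespace."""
    return " ".join(keyword.casefold().split())


def add_alternate_terms(keywords, authority=AUTHORITY_FILE):
    """Map free-text keywords to controlled terms.

    Returns (mapped, unmapped): the controlled terms to add to the record,
    without duplicates, and the keywords that have no mapping yet.
    """
    mapped, unmapped = [], []
    for kw in keywords:
        term = authority.get(normalize(kw))
        if term is None:
            unmapped.append(kw)
        elif term not in mapped:
            mapped.append(term)
    return mapped, unmapped
```

Returning the unmapped keywords separately reflects the point that such a mapping is developed over time: the keywords that fail to match are candidates for new authority file entries.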
In addition to providing alternate access points to BIOSIS
products, the inclusion of the MeSH terms makes it possible to
perform cross database searching on the indexing field with MEDLINE
and other databases that include MeSH terms. From 1999 forward,
users can search BIOSIS databases using MeSH disease terms. The
disease terms can be extracted from the MeSH authority file or from a
MEDLINE record and then used in a search against the BIOSIS files,
or vice versa. This helps users find relevant records that are unique
to either BIOSIS or MEDLINE. The inclusion of terms from an
alternate KOS, such as MeSH, therefore supports the use of BIOSIS by
medical librarians and practitioners who are familiar with MeSH terminology.
A more extensive example of mapping variant schemes is
the metathesaurus developed by the NLM's Unified Medical
Language System (UMLS). This system has linked more than 40 separate
KOSs from various medical specialties. They range from MeSH to
coding and classification schemes used by insurance companies and
physicians to describe treatments and diseases on patient records.
The UMLS is licensed by many other organizations for inclusion in
applications that can bridge various health care communities.
How can digital libraries use alternate indexing? While
many digital libraries do not have the A&I resources of large database
producers such as NLM and BIOSIS, the concept of applying
alternate indexing can be scaled to fit. Although the systems described deal
with item-level bibliographic records, alternate indexing can be applied
at several levels. For example, alternate subject access can be applied only
at the resource level (the database, electronic book, electronic journal,
or image collection), so that other communities can identify resources
of interest that must then be searched or browsed individually.
This concept is conducive to use with portals that provide access to
the same resources with different views for different audiences.
Alternatively, if the digital library has bibliographic records or
metadata records at a very detailed level, it may be possible to develop
switching programs that will translate concepts from the original
organization of the digital library or resource to that of the alternate scheme.
Adding New Modes of Understanding to the Digital Library
People perceive the world through many modes, including
textual and graphical. Some people comprehend information more easily
in one mode than another. Most people benefit from a variety of
modes that reinforce one another or that can be used when appropriate
to the context. Many digital library projects remain text-based;
however, this text-only dimension is changing as digital libraries
become oriented more to multimedia and as other modes of information
presentation become viable on the Internet.
KOSs can be used to bring new dimensions to an
information resource or a collection in a digital library. In the digital library
environment, these dimensions can be viewed as layers that can be
added on top of one or more objects. Various tools and services can
be developed that are geared to a particular mode. For example, the
results of a text search can be presented in graphical or visual
form, based on the number of occurrences of a term or concept or on
the occurrences of documents from a particular country, journal title,
or author.
A more complex dimension that can be added is the
geospatial dimension, which emphasizes access by place. A "geolibrary" is
defined as a digital library consisting of "geoinformation," or
material that can be accessed by place (National Research Council 1999).
This so-called georeferencing can be either direct (by a geospatial
footprint, a series of latitudes and longitudes for the location) or
indirect (by a textual place name). Georeferencing of textual objects is
facilitated by a gazetteer, which brings together the place name and
the spatial footprint for its location (see footnote 2). Many gazetteers
also include feature types for each footprint. The vocabulary used for the
feature types varies among gazetteers, but may include terms such as
"airport," "harbor," and "railroad station."
Although many organizations, including federal and state
agencies, are currently required to provide geospatial referencing as
part of the National Spatial Data Infrastructure Program, the
geospatial referencing is not readily available for older works.
How can the data sets of today be integrated with the textual information of
yesterday? The answer is by adding geospatial referencing to the
text resource. Geospatial referencing requires that the text name for
a place have an associated spatial footprint. This can be achieved
by using a georeferenced, digital gazetteer that provides geospatial
footprints for place names.
Through this type of knowledge organization system,
place names in a library catalog or bibliographic database can have
footprints assigned (Blair 1999; Tahirkheli 1999). If one or more of the
library's resources have latitude and longitude coordinates in the
catalog record or in the full text but no place name, the coordinates
can be extracted and submitted to the gazetteer service. The service
will return the place name for the footprint. Alternatively, the
resource may have a textual place name. This place name can be
extracted and searched against the gazetteer, and the footprint can be
provided to a mapping application. The latter search may result in more
than one footprint, since place names may be ambiguous. Therefore, it
is important that the user interface be designed to allow the user to
distinguish the locations. Once the footprint has been determined,
a user can access the text resource through a geographic mapping
tool. Alternatively, a user of the text resource can find a set of results
and have the place names displayed as footprints on a map.
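The two lookups described above (place name to footprint, and coordinates to place name) can be sketched with a toy in-memory gazetteer. Everything here is an assumption made for illustration: the class and function names, the use of a simple bounding box as the footprint, and the approximate coordinates. A real gazetteer service would define its own data model and interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    """Bounding box in decimal degrees (a simplifying assumption)."""
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon, lat):
        return self.west <= lon <= self.east and self.south <= lat <= self.north


@dataclass(frozen=True)
class GazetteerEntry:
    name: str
    qualifier: str      # text that lets a user tell same-named places apart
    feature_type: str
    footprint: Footprint


# Toy data with approximate, illustrative coordinates.
GAZETTEER = [
    GazetteerEntry("Springfield", "Illinois, USA", "populated place",
                   Footprint(-89.8, 39.7, -89.5, 39.9)),
    GazetteerEntry("Springfield", "Massachusetts, USA", "populated place",
                   Footprint(-72.7, 42.0, -72.4, 42.2)),
]


def footprints_for(place_name):
    """Place name -> entries. More than one entry means the name is ambiguous."""
    wanted = place_name.casefold()
    return [e for e in GAZETTEER if e.name.casefold() == wanted]


def names_for(lon, lat):
    """Coordinates -> qualified names of entries whose footprint contains the point."""
    return [f"{e.name}, {e.qualifier}" for e in GAZETTEER
            if e.footprint.contains(lon, lat)]
```

Because `footprints_for` returns every match instead of picking one, the calling interface can show the qualifiers and let the user choose the intended location.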
In disciplines such as ecology, environmental science, and
even public health and epidemiology, it would be beneficial to build a
digital library with access to such a digital gazetteer service. Users
could then access the system through the text mode or the
geographic mode, depending on their comfort level and the type of
information needed. Presenting the results on a map allows users to make
new associations and analyze the results more easily. Through a
geospatial KOS, they can see connections between disparate data,
because the data are presented in an alternate mode.
Providing Multilingual Access
A third way that KOSs can support the use of digital libraries by
disparate communities is to provide multilingual access. A variety
of sources, including multilingual dictionaries and multilingual
thesauri, can support this type of access.
One of the most extensive multilingual thesaurus efforts is
the Generalized Multilingual Environmental Thesaurus (GEMET)
from the European Environment Agency (EEA), produced by Italy's
research council, the Consiglio Nazionale delle Ricerche (CNR).
The GEMET is available in 12 languages, and plans for a global
environmental thesaurus in many more languages were recently
announced. GEMET is available by agreement with the EEA.
The European Topic Centre on Catalogue of Data Sources in
Germany is developing a system that will link data sources and
metadata information in a virtual library. GEMET will be used to convert
a search in one language into searches for the same concepts in
other languages. Users will retrieve documents not only in their native
language but also in other languages. This will allow data systems
from throughout the EEA and beyond to be accessed as a virtual
library collection with both controlled vocabulary and free-text term
searching in multiple languages.
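Converting a search in one language into searches in others amounts to looking up the concept behind a term and reading off its labels in the target languages. The following sketch assumes a concept-based structure; the `CONCEPTS` table, the concept identifiers, and the labels are invented for illustration and have not been checked against GEMET.

```python
# Hypothetical concept table: concept id -> {language code: preferred label}.
CONCEPTS = {
    "c1": {"en": "water pollution", "de": "Wasserverschmutzung",
           "fr": "pollution de l'eau"},
    "c2": {"en": "waste", "de": "Abfall", "fr": "déchet"},
}


def translate_query(term, source_lang, target_langs):
    """Return {language: label} for the concept whose source-language label
    matches `term`; languages without a label are omitted."""
    wanted = term.casefold()
    for labels in CONCEPTS.values():
        if labels.get(source_lang, "").casefold() == wanted:
            return {lang: labels[lang] for lang in target_langs if lang in labels}
    return {}  # term is not in the thesaurus
```

An empty result signals that the term is outside the controlled vocabulary, in which case a system might fall back to an untranslated free-text search.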
Expanding Free-Text Search Terms
Free-text searching is the main method of searching on the
Web. Only a small percentage of Web resources have metadata, and
an even smaller percentage have controlled vocabulary assigned.
However, variations in natural language make free-text searching
problematic. Even a knowledgeable user may not know all the
terminology (synonyms or related terms) that can be used in the literature
to express a concept. The problem is exacerbated when the user is
unfamiliar with the topic or is interested in an interdisciplinary
area. How can the user expand his or her search to overcome these
terminology differences? One possibility is to use KOSs as aids to the
selection of free-text keywords.
The Getty Vocabulary Project emphasizes support for
searching as a significant application of its vocabularies. Harpring (1999)
reports that the vocabularies are increasingly being used in search
engines to look for different terms that refer to the same concept.
The Getty vocabularies (the Art and Architecture Thesaurus, the
Union List of Artist Names, and the Thesaurus of Geographic Names)
are particularly rich in equivalence relationships. "When these
equivalence relationships are exploited in search engines, there are
typically two possible scenarios: the user may be allowed to first query
the vocabulary database, locating appropriate terms, and then
applying those chosen terms in a query across target databases; or there
may be little or no user interaction with the vocabulary, when the
vocabularies are used behind the scenes [to expand the search] . . . "
(Harpring 1999). Getty developed a prototype called
a.k.a. to experiment with the use of equivalence terms to broaden or narrow
searches across databases on the Web.
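The "behind the scenes" scenario can be sketched as expanding a user's term into all of its equivalents and joining them into a Boolean OR query. This is a minimal sketch, not a description of the a.k.a. prototype: the `EQUIVALENTS` sets, the function names, and the quoted OR syntax are assumptions.

```python
# Hypothetical equivalence sets (all entries lowercased), illustrative only.
EQUIVALENTS = [
    {"mumbai", "bombay"},
    {"le corbusier", "charles-edouard jeanneret"},
]


def expand(term):
    """Return all equivalents of `term`, sorted; unknown terms expand to themselves."""
    wanted = term.casefold()
    for ring in EQUIVALENTS:
        if wanted in ring:
            return sorted(ring)
    return [wanted]


def to_boolean_query(term):
    """Build an OR query over the equivalents, with each phrase quoted."""
    return " OR ".join(f'"{t}"' for t in expand(term))
```

In the interactive scenario, the list returned by `expand` would instead be shown to the user, who selects which terms to carry into the search.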
In addition to expanding routine search queries, KOSs can
be used in Web mining tools. Northern Light has developed a Web
mining tool that reportedly returns a high proportion of relevant hits.
The KOS that supports the Northern Light site was built by
ingesting large existing vocabularies and thesauri. The result was then
organized under an extensive classification scheme developed by
Northern Light. The terms can be used to extend a user's search or to
distinguish between multiple meanings of the terms supplied by
the user. The results of a search are organized into "folders" based on
the classification scheme. These high-level categories, represented by
the folders, help distinguish multiple meanings of the same term.
For example, an ambiguous word such as "pitcher" might result in
two folders being presented to the user. One folder would be
titled "Sports" (as in baseball pitcher), the second "Decorative Arts" (as
in water pitcher). The user who chooses only the Sports folder will
be presented with only those Web resources that use "pitcher" in
the baseball sense. The user who selects the folder called
"Decorative Arts" will be presented only with those resources that are related
to water pitchers.
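Grouping results into folders by classification category can be sketched in a few lines. The sketch assumes that each result already carries a category assigned from the classification scheme; the function name and sample results are hypothetical, and this is not a description of how Northern Light implements its folders.

```python
from collections import defaultdict


def group_into_folders(results):
    """Group (title, category) pairs into {category: [titles]}."""
    folders = defaultdict(list)
    for title, category in results:
        folders[category].append(title)
    return dict(folders)


# Invented results for the query "pitcher".
results = [
    ("Pitching statistics", "Sports"),
    ("Antique ceramic pitchers", "Decorative Arts"),
    ("How to throw a curveball", "Sports"),
]
```

Selecting one key of the returned dictionary corresponds to opening a single folder, which narrows the result set to one sense of the ambiguous term.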
KOSs can be very powerful in supporting free-text
searching within digital libraries and in integrating Web resources into
existing digital libraries. However, these systems must be used with
caution. KOSs have generally been developed for a specific discipline, task,
or function, or for the indexing of a specific collection or
database. Therefore, depending on the domain in which the KOS is being
used and the complexity of the system, it may or may not suggest
relevant free-text terms. Expanding a search with related terms, rather
than pure synonyms, may return hits that are only peripherally
relevant to the user.
Summary
One of the benefits of the Internet, the Web, and digital libraries
is the degree to which resources can be made available to broader
audiences. The technology facilitates the connection of disparate
knowledge communities at the network level. However, discovery of
the resources and true accessibility require that the content and its
organization be understood by these disparate communities. By
providing alternate subject access, adding modes of understanding,
supporting multilingual access, and supplying terms for
expanding free-text searching, KOSs can facilitate discovery and
understanding by disparate communities, and allow these communities to
interact in new ways.
Footnotes
2. A recent National Science Foundation-sponsored workshop, "Digital
Gazetteer Information Exchange," addressed the issues of digital gazetteers. One of
the critical issues is that there is no standard for the interchange of
information, either to provide gazetteer information physically to another gazetteer or
to interoperate with one or more distributed gazetteers through the Internet.
The workshop participants emphasized the need for such protocols and
for enhancements to current gazetteers. (Many gazetteers do not include
coordinates or are incomplete in this regard.) The goal is to develop a digital gazetteer
service that can be accessed by any application.
Such a service is central to the vision of a geolibrary. A report on
distributed geolibraries from the National Research Council (1999) envisions the
geolibrary as a physical globe. One would walk into such a geolibrary and be
confronted not by a card catalog or an OPAC terminal but by a large physical globe. The
user would indicate his or her area of interest by pointing to a place on the globe.
The librarian would use the geospatial location information to retrieve and
present materials related to that place. By comparing feature types, the user could ask
for other place names and locations that were similar to the original.
Significant work on digital gazetteer services and geospatial libraries
has been conducted by the Alexandria Digital Library (ADL) Project at the
University of California at Santa Barbara, with support from the National Science
Foundation's Digital Library Initiative-1 (Hill and Zheng 1999). An ADL Gazetteer
was created by merging place name authority files from the National Imagery and
Mapping Agency and the U.S. Board on Geographic Names of the U.S. Geological
Survey. The project also added controlled feature types to the gazetteer. With the aid of
a visualization tool, the information can be provided on a map and accessed
using other geographic visualization tools.