2. Linking Digital Library Resources to Related Resources
This section emphasizes the ability of knowledge organization
systems to link digital library resources to other related resources.
The basis for this linking is the identification of information within a
digital resource that can be extracted and used to search and locate
information within a KOS. The KOS may then be used to expand
codes into fuller explanatory text, to provide more descriptive records,
or to link entity names to physical specimens.
Expanding Codes to Full Text
Practitioners of a discipline use coding schemes to facilitate
communication within that discipline. It is often helpful to connect
the codes in these schemes to the full names or data for which they stand. The
examples provided here include links between databank
registration codes and the biological sequence data they identify, and between
industrial codes and the full names that the codes represent.
Linking Sequence Numbers to Biosequence Databanks
The lengthy biochemical and genetic sequences that molecular
biologists, biotechnologists, and geneticists identify each day are kept
in databanks. Several databanks have been developed, for example,
to cover protein sequences, nucleotides, and cell lines. One of the
largest databanks contains information on the mapping of the
human genome. As molecular biologists began to discover these
sequences, they reported them in scientific journals. Because the
sequences are so long, difficulties in composing, proofreading, and printing
them soon arose. Through an ad hoc standards process, major biomedical
publishers agreed to require that published articles include the codes, or
databank numbers, for these sequences. In addition, the sequence itself
must be registered in a databank before the paper can be published.
Some of the most frequently referenced databanks are listed
on the Web site of the National Center for Biotechnology
Information. They include GenBank and the Research Collaboratory for
Structural Bioinformatics Protein Data Bank. Each sequence number is
different, but all begin with a persistent code identifying the databank.
How can the link be made between the literature and the
databank? Through a search profile, a text analysis program, or
keyword indexing, the text can be analyzed and the sequence databank
numbers identified. An active link can then be embedded in the text. The
active link consists of a search strategy (possibly implemented as a CGI
script) that locates the sequence number in the databank where the actual
sequence is stored. When the user clicks on the link, the query is sent
from the user's browser to the Web-enabled database, the database is
searched, and the sequence record is returned to the user.
Depending on the services provided by the databank site, the user can
analyze the sequence using a number of tools provided by the databank
or download the sequence for local manipulation.
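As an illustration of the text-analysis step, the following Python sketch scans text for identifiers that look like GenBank nucleotide accession numbers and wraps each one in an active link. The accession pattern is deliberately simplified, and the NCBI record URL template is an assumption to check against NCBI's current documentation; a production system would need a pattern for each databank it links to.

```python
import re
from urllib.parse import quote

# Simplified pattern for GenBank-style nucleotide accession numbers (an
# assumption): one letter + 5 digits or two letters + 6 digits, with an
# optional ".version" suffix that is kept in the link text only.
ACCESSION = re.compile(r"\b([A-Z]\d{5}|[A-Z]{2}\d{6})(?:\.\d+)?\b")

# Assumed URL template for displaying a nucleotide record at NCBI.
RECORD_URL = "https://www.ncbi.nlm.nih.gov/nuccore/{}"


def find_accessions(text):
    """Return the distinct accession numbers in text, in order of appearance."""
    found = []
    for match in ACCESSION.finditer(text):
        accession = match.group(1)
        if accession not in found:
            found.append(accession)
    return found


def add_active_links(html_text):
    """Wrap each accession number in an HTML link to its databank record.

    The surrounding text is assumed to be HTML already and is left unchanged.
    """

    def to_link(match):
        url = RECORD_URL.format(quote(match.group(1)))
        return f'<a href="{url}">{match.group(0)}</a>'

    return ACCESSION.sub(to_link, html_text)


if __name__ == "__main__":
    sample = "The sequence was deposited in GenBank (AF123456.1)."
    print(find_accessions(sample))
    print(add_active_links(sample))
```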
This type of connection exists between PubMed, the search service of
the National Library of Medicine (NLM), and GenBank at the
National Center for Biotechnology Information. If a search in
PubMed yields records that have GenBank numbers, the user can go
directly from those records to search and display the corresponding
sequence records in GenBank.
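The same kind of PubMed-to-sequence link can also be requested programmatically. The sketch below only builds a request URL for NCBI's E-utilities ELink service, using its `dbfrom`, `db`, and `id` parameters with `nuccore` as the target nucleotide database; the function name is hypothetical, and sending the request and parsing the response are left out.

```python
from urllib.parse import urlencode

# NCBI E-utilities ELink endpoint.
ELINK_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def pubmed_to_sequence_query(pmids, target_db="nuccore"):
    """Build an ELink URL asking for records in target_db linked to PubMed IDs.

    pmids: iterable of PubMed identifiers (ints or strings).
    target_db: Entrez database to link into; "nuccore" holds nucleotide records.
    """
    params = {
        "dbfrom": "pubmed",
        "db": target_db,
        "id": ",".join(str(p) for p in pmids),
    }
    return f"{ELINK_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder IDs; no network request is made here.
    print(pubmed_to_sequence_query([12345, 67890]))
```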
Linking Individual Industrial Codes to the Full Scheme
In business, classification schemes serve to communicate
important facts about a company or product. These codes are generally
controlled by a government, professional, trade, or international
standards organization. They often serve as shorthand for users
interested in material in a particular area of industry or a specific
business sector.
Perhaps the most familiar scheme is the Standard Industrial Classification
(SIC) code, which was last updated in 1987. SIC codes have been used by the U.S.
government, economists, financial markets, regulators, and
procurement offices to identify the manufacturing, agriculture, and service
sectors of the economy. In 1997, a new scheme was approved for use
within the United States. The North American Industry Classification
System (NAICS) was developed with Canada and Mexico to provide an
agreed-upon scheme for the collection, reporting, and
analysis of information about the economy by sector, both within and
across borders. Information about NAICS is available from the Web site
of the U.S. Census Bureau (see references for address).
The digital library can provide related information by using
the authority files for these coding schemes as linked authority files. If
a company or economic sector mentioned in the digital library's
collection can be linked to an SIC or NAICS code, the code can be
searched against the official tables of definitions maintained by the U.S.
Census Bureau. These files define each code and show where it sits
in the classification scheme relative to other economic sectors.
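A minimal sketch of placing a code within its scheme, assuming the definitions table has been loaded into a Python dictionary that maps each code to its title. It relies on the NAICS convention that each level of the hierarchy, from the 2-digit sector down to the 6-digit national industry, is a prefix of the full code; sectors defined as ranges, such as 31-33, are not handled. The function name and sample data are hypothetical.

```python
def naics_hierarchy(code, definitions):
    """Return (code, title) pairs from the 2-digit sector down to `code`.

    definitions: dict mapping NAICS codes (2 to 6 digits) to titles.
    Each level is assumed to be a prefix of the full code; levels missing
    from the table are skipped. Range sectors such as 31-33 are not handled.
    """
    if code not in definitions:
        raise KeyError(f"code not in definitions table: {code}")
    return [
        (code[:length], definitions[code[:length]])
        for length in range(2, len(code) + 1)
        if code[:length] in definitions
    ]


if __name__ == "__main__":
    # Placeholder codes and titles, not real NAICS entries.
    table = {"12": "Sector", "123": "Subsector", "1234": "Industry group",
             "123456": "National industry"}
    for level_code, title in naics_hierarchy("123456", table):
        print(level_code, title)
```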
The digital library's content can be further enhanced by
making a link between the SIC and NAICS codes. If a digital library
resource carries an SIC code, the code can be extracted and searched against
the Census Bureau's 1997 NAICS and 1987 SIC Correspondence Tables,
which return the corresponding code or codes in the other scheme.
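The correspondence lookup can be sketched as a pair of dictionaries built from a two-column table. Because one SIC code may correspond to several NAICS codes and vice versa, each lookup keeps a set. The column names `SIC` and `NAICS` and the sample rows are placeholders, not the layout or content of the Census Bureau files.

```python
import csv
import io
from collections import defaultdict


def load_crosswalk(csv_text):
    """Build SIC->NAICS and NAICS->SIC lookups from a two-column CSV.

    Assumes hypothetical column names "SIC" and "NAICS"; the real
    correspondence files have their own layout, which would need mapping.
    """
    sic_to_naics = defaultdict(set)
    naics_to_sic = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        sic, naics = row["SIC"].strip(), row["NAICS"].strip()
        sic_to_naics[sic].add(naics)
        naics_to_sic[naics].add(sic)
    return sic_to_naics, naics_to_sic


def corresponding_codes(code, table):
    """Return the sorted codes in the other scheme (empty list if none)."""
    return sorted(table.get(code, ()))


if __name__ == "__main__":
    # Placeholder rows, not taken from the Census Bureau tables.
    rows = "SIC,NAICS\n1111,222221\n1111,222222\n3333,222221\n"
    sic_to_naics, naics_to_sic = load_crosswalk(rows)
    print(corresponding_codes("1111", sic_to_naics))
    print(corresponding_codes("222221", naics_to_sic))
```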
Linking to Descriptive Records
Linking the name of an entity, such as a personal name,
organization, or location, to additional information about that entity was one
of the first uses of hyperlinking. Knowledge organization systems
such as dictionaries, glossaries, and classification schemes can be used
to link the entities in one resource to richer descriptions of those entities
in another resource. This is particularly helpful for users who are
new to a topic and in cases where the additional information can
make the user's task more efficient.
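The basic mechanism can be sketched with a glossary that maps entity names to the addresses of their descriptive records. This Python sketch builds one regular expression from the glossary, tries longer names first so that a longer name is not split by a shorter name it contains, and escapes the output as HTML. The glossary and URLs in the example are hypothetical, and the word-boundary matching assumes names that begin and end with letters or digits.

```python
import html
import re


def link_entities(text, glossary):
    """Replace glossary terms in text with links to their descriptive records.

    glossary: dict mapping a term to the URL of a richer description.
    Longer terms are tried first, so "New York City" wins over "New York".
    """
    if not glossary:
        return html.escape(text)
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b")
    out, last = [], 0
    for match in pattern.finditer(text):
        out.append(html.escape(text[last:match.start()]))
        term = match.group(1)
        url = html.escape(glossary[term])
        out.append(f'<a href="{url}">{html.escape(term)}</a>')
        last = match.end()
    out.append(html.escape(text[last:]))
    return "".join(out)


if __name__ == "__main__":
    glossary = {"New York": "https://example.org/ny-state",
                "New York City": "https://example.org/nyc"}
    print(link_entities("Visit New York City today.", glossary))
```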
The examples that follow are from three disciplines. The first
example links organism names to records that not only describe
the species more fully but also put it in the context of the overall
classification scheme for living organisms. The second example links
chemical names to descriptive records and molecular structures. In
the third example, personal names are linked to biographies of
the people named.
Linking Organism Names to Taxonomic Records
Genus-species names are the Latin names for
organisms such as plants, animals, and microorganisms. Taxonomists, who study
and classify living organisms, create records for each of these
organisms. Generally, these records are linked to the records for related
organisms in a hierarchy. Beyond the organism name and the
information that it and its placement in the hierarchy convey, taxonomic
records use other elements to describe the organism. These may include
distribution patterns, the authority for naming and classification,
and the date the organism was identified. Scientists base the
information on specimens that are retained because they serve as the physical
evidence of the description. Natural history museums, private
collections, and individual scientists number, or code, the specimens
in their collections. Sometimes specimens are supported by
photographs or line drawings, which may be digitized.
By using a taxonomic authority file as an intermediate
file, one can link a text or an image file containing the name or
picture of an organism to additional related information. By
automatically processing the text, or by embedding a link from the organism
name in the text or from the image to the taxonomic authority record, one
can extend the knowledge conveyed by the text. The text is then enriched by
the descriptive and historical information in the taxonomic record
and, ultimately, by links to a photograph, a drawing, or appropriate video
or audio segments.
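A sketch of the automatic-processing route, assuming a hypothetical intermediate file loaded as a dictionary from scientific name to record. It uses a deliberately naive pattern for genus-species names (a capitalized word followed by a lowercase word) and keeps only candidates that the authority file actually contains, which filters out most false matches such as ordinary sentence openings.

```python
import re

# Naive candidate pattern: a capitalized genus followed by a lowercase
# epithet, e.g. "Quercus alba". Real text needs far more care (abbreviated
# genera such as "Q. alba", subspecies, author names).
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{2,})\b")


def link_organism_names(text, authority):
    """Return (name, record) pairs for candidate names found in the authority.

    authority: hypothetical dict mapping a scientific name to a record
    (description, image URL, and so on).
    """
    links = []
    for match in BINOMIAL.finditer(text):
        name = f"{match.group(1)} {match.group(2)}"
        record = authority.get(name)
        if record is not None:
            links.append((name, record))
    return links


if __name__ == "__main__":
    authority = {"Quercus alba": {"common": "white oak"}}
    print(link_organism_names("The white oak, Quercus alba, is common.", authority))
```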
Because organism names are ambiguous, many links
of this type are now created manually. However, depending on the
extent of the files involved, the ambiguity of the Latin and
common names for organisms can be overcome. An example of a
taxonomic intermediate file is the Integrated Taxonomic Information
System (ITIS). ITIS is a partnership of U.S., Canadian, and Mexican
government agencies, private organizations, and taxonomic specialists
cooperating to develop an online, scientifically credible list of
biological names of North American plants and animals. It is used by
many U.S. government agencies for consistent naming of plants and
animals for regulatory and monitoring purposes. To link textual
material in a digital library to the ITIS record, the organism name can
be identified manually or automatically in the text and submitted as
a query to the ITIS database. When a match is found, ITIS returns
the corresponding record, which provides essential information about the
organism: its synonyms, some of its common names, and an indication of
its placement in the larger taxonomic classification scheme.
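The ambiguity problem can be made concrete with an index from every scientific name, synonym, and common name to the records that use it. In this sketch, a name that leads to more than one record is flagged as ambiguous so it can be reviewed rather than linked automatically. The record layout is an assumption for illustration; it is not the ITIS data model or query interface.

```python
from collections import defaultdict


def build_name_index(records):
    """Map every scientific, synonym, and common name to its record IDs.

    records: dict mapping a record ID to a dict with "name" and optional
    "synonyms" and "common_names" lists (a hypothetical layout).
    """
    index = defaultdict(set)
    for record_id, rec in records.items():
        names = [rec["name"], *rec.get("synonyms", []), *rec.get("common_names", [])]
        for name in names:
            index[name.casefold()].add(record_id)
    return index


def resolve(name, index):
    """Return (status, ids) where status is 'unique', 'ambiguous', or 'not found'."""
    ids = sorted(index.get(name.casefold(), ()))
    if not ids:
        return "not found", ids
    return ("unique" if len(ids) == 1 else "ambiguous"), ids


if __name__ == "__main__":
    # Placeholder records, not real taxa.
    records = {
        1: {"name": "Genus alpha", "synonyms": ["Genus beta"], "common_names": ["red plant"]},
        2: {"name": "Genus gamma", "common_names": ["red plant"]},
    }
    index = build_name_index(records)
    print(resolve("red plant", index))
```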
Linking Chemical Names to Molecular Structures
The unique identification for a chemical substance is not its name
but its molecular structure. However, chemical names are
commonly used in research documents, project plans, catalogs, and
directories, all of which may be resources in a digital library. There are
competing systems of nomenclature (namely, those of the Chemical Abstracts
Service [CAS] and of the International Union of Pure and
Applied Chemistry), as well as common and commercial synonyms.
The ambiguity is resolved by providing links between the
chemical names in the text and the molecular structure. This is
done through a chemical registry number or code, which is associated with
a particular chemical name under a given set of nomenclature
standards, and through an authority record that provides additional
information about the chemical. This information includes the chemical's
synonyms and some of its chemical and physical properties. Most important
in today's research environment is the link from this authority file to
a chemical structure file. Structure files, used with the
appropriate software, graphically depict the molecular structure. This
sophisticated software allows for three-dimensional visualization,
rotation, and substitution of the chemical bonds.
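The chain from chemical name to registry number to authority record and structure file can be sketched as follows. The three tables are hypothetical in-memory stand-ins for a registry database, and the structure-file path is a placeholder. Because different substances can share a name, the lookup returns every matching registry number rather than a single one.

```python
def build_synonym_index(registry):
    """Map each normalized synonym to the set of registry numbers that use it."""
    index = {}
    for rn, record in registry.items():
        for synonym in record.get("synonyms", []):
            index.setdefault(synonym.strip().casefold(), set()).add(rn)
    return index


def link_chemical_name(name, synonym_index, registry, structures):
    """Return one entry per substance whose synonyms include `name`.

    registry: hypothetical dict, registry number -> authority record.
    structures: hypothetical dict, registry number -> structure-file location.
    """
    matches = sorted(synonym_index.get(name.strip().casefold(), ()))
    return [
        {
            "registry_number": rn,
            "record": registry[rn],
            "structure_file": structures.get(rn),  # None if no structure file
        }
        for rn in matches
    ]


if __name__ == "__main__":
    registry = {"7732-18-5": {"synonyms": ["Water", "Dihydrogen oxide"]}}
    structures = {"7732-18-5": "structures/water.mol"}  # placeholder path
    index = build_synonym_index(registry)
    print(link_chemical_name("dihydrogen oxide", index, registry, structures))
```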
An example of the use of the chemical registry number to
link chemical names with molecular structures can be seen in the work
of BIOSIS, the world's largest not-for-profit producer of biological
and biomedical databases. In 1993, BIOSIS began processing its
bibliographic citations (titles and keywords) to automatically
identify chemical names (Hodge, Nelson, and Vleduts-Stokolov 1989).
BIOSIS assigns CAS Registry Numbers (RNs) to the chemical
names identified in this process. In the STN International online
system, hosted in the United States by CAS, a user of BIOSIS can select
one or more of the records resulting from a search and extract the
RN. The extracted RN can be searched against the CAS Registry
File, which contains more than 21 million substances, including
organics, inorganics, biosequences, metals, and alloys. The user can then
access the Registry File record for the substance, including its
synonyms and a link to the structure file itself. With special tools
developed by CAS, the structure can be viewed and manipulated, or
imported into modeling tools that allow the chemist to alter the
structure and thereby envision new chemicals. Alternatively, the user can
start with any database that contains CAS RNs, extract the RNs from the
search results, and use them to search for complementary bibliographic
records in the BIOSIS database.
Linking chemical names to structures using RNs on a large
scale is neither inexpensive nor easy. There are two approaches to
identifying chemical names in text. Some journal articles include the
CAS RN for the major chemicals discussed. In this case, an analysis of
the text for the terms "RN," "CAS RN," and variations preceding
numerics can identify RNs that can be used as a link. Alternatively,
a program to identify chemical names in text, similar to that
developed by BIOSIS, could be devised. Developing the identification
program, as well as searching chemical databases, is costly; however, if
the digital library has license agreements for chemistry databases,
this type of linkage may be possible. In addition, many
organizations have small chemical files of their own that may include RNs and
other information of particular relevance to the organization's
research. It may be possible to link to these local databases more
directly.
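The first approach, scanning for "RN" or "CAS RN" before a number, can be sketched with a regular expression combined with the check digit that CAS Registry Numbers carry: the last digit equals the sum of the other digits, each multiplied by its position counted from the right, modulo 10. Checking it discards many typographical errors. The label pattern covers only the variants shown and is an assumption, not an exhaustive rule.

```python
import re

# "RN" or "CAS RN", optionally followed by a colon, then a number in CAS
# form: 2 to 7 digits, 2 digits, and a single check digit.
RN_PATTERN = re.compile(r"\b(?:CAS\s+)?RN\s*:?\s*(\d{2,7}-\d{2}-\d)\b")


def valid_cas_rn(rn):
    """Verify the CAS check digit: the sum of each digit times its position
    (from the right, excluding the check digit), modulo 10, must equal the
    last digit."""
    digits = rn.replace("-", "")
    body, check = digits[:-1], int(digits[-1])
    total = sum(pos * int(d) for pos, d in enumerate(reversed(body), start=1))
    return total % 10 == check


def extract_rns(text):
    """Return labeled registry numbers in text that pass the check digit."""
    return [rn for rn in RN_PATTERN.findall(text) if valid_cas_rn(rn)]


if __name__ == "__main__":
    sample = "Water (CAS RN 7732-18-5) and ethanol (RN: 64-17-5); typo RN 7732-18-4."
    print(extract_rns(sample))  # ['7732-18-5', '64-17-5']
```

For water (7732-18-5), the weighted sum is 8×1 + 1×2 + 2×3 + 3×4 + 7×5 + 7×6 = 105, and 105 mod 10 = 5, which matches the check digit.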
Linking Personal Names to Biographical Information
A common type of authority file is the personal name
authority, which controls variants of personal names. For example, the
Library of Congress Name Authority File (LCNAF) is used to control
variant personal names for authors, editors, artists, and others. The
Union List of Artist Names (ULAN), developed by the Getty
Vocabulary Program, is another example. Name authorities serve as tools for
catalogers and indexers. They ensure that the proper form of the
name, rather than an unapproved variant, is used and bring together
all works by or about the person.
A name authority file can also be used to link a
bibliographic record or document containing the person's name to a variety of
other related materials. If a resource in the digital library contains the
standardized form of a name, that name can be identified and searched against
the authority file to locate its variants. The standardized and variant
forms can then be combined in a search of other resources that
can provide related information.
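Combining the standardized and variant forms into one search can be as simple as an OR query of quoted phrases. This sketch assumes a hypothetical in-memory authority mapping from the authorized form to its variants, and a target system that accepts quoted phrases joined by OR; real search systems differ in their query syntax.

```python
def variant_query(name, authority):
    """Build a Boolean OR query from a name and its authority-file variants.

    authority: hypothetical dict mapping an authorized name form to a list
    of variant forms. Each form is quoted as a phrase; embedded double
    quotes are removed so they cannot break the query.
    """
    forms = [name, *authority.get(name, [])]
    unique = list(dict.fromkeys(forms))  # drop duplicates, keep order
    return " OR ".join('"' + form.replace('"', "") + '"' for form in unique)


if __name__ == "__main__":
    authority = {"Doe, Jane": ["Jane Doe", "Doe, J."]}  # hypothetical person
    print(variant_query("Doe, Jane", authority))
```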
For example, in the case of a digital library of images of
artists' works or biographical or critical text, a name authority file such
as the ULAN or the LCNAF can act as an intermediate file to
provide additional information. The file, which integrates
variant names, can be searched using the name as it appears in the digital
library collection. When the record is found, the information about the
artist can be displayed, providing a wide range of contextual material
for the user. The name authority file may also provide citations to
significant biographical or critical works about the artist, some of which
may be available on the Web.
The variant names from a name authority can also be used
to locate and provide automatic links from the personal name in
the text to a biography, without requiring that the name be presented
in the same form in both resources. One resource to which such links
could point is Gale's Biography
Resource, which contains more than 142,000 biographies and
related citations from more than 1,000 periodicals.
However, to produce this kind of link, there must be a
mechanism for locating personal names in text. Several programs can
do this type of text analysis; among those that have been
developed commercially are NameFinder from the Carnegie Group and the
Intelligent Agent from IBM. In addition, variant names can be
extracted from the name authority itself, grouped, and run as a
search against the text to locate name occurrences.
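The last approach, grouping the variants from one authority record and searching the text for them, can be sketched as a single case-insensitive regular expression. Lookarounds are used instead of `\b` so that variants ending in punctuation, such as an initial with a period, still match. The function name and the variant list in the example are illustrative.

```python
import re


def find_name_occurrences(text, variants):
    """Return (start, end, matched text) for each occurrence of any variant.

    variants: the group of name forms taken from one authority record.
    Matching is case-insensitive and prefers the longest variant at a position.
    """
    ordered = sorted(set(variants), key=len, reverse=True)
    if not ordered:
        return []
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(map(re.escape, ordered)) + r")(?!\w)",
        re.IGNORECASE,
    )
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


if __name__ == "__main__":
    variants = ["Jane Doe", "J. Doe", "Doe, J."]  # hypothetical person
    print(find_name_occurrences("A letter from Jane Doe to J. Doe's editor.", variants))
```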
Linking Entity Names to Physical Specimens
In some cases, it is possible to go a step further and connect
entity names in digital library resources to physical specimens.
The curation of physical specimens or artifacts is critical to the
advancement of many disciplines. Exhibition catalogs describe the art
objects in a particular exhibition. Museum catalogs provide inventories
of the art, natural history, or cultural objects held by a particular
museum. These catalogs, increasingly available as computerized
databases, are knowledge organization systems that not only provide
descriptive records but also point to the location of the object in
a museum, an archive, or another collection.
For example, in biology, a physical specimen is particularly
important when it is the result of the discovery and description of
a new organism or of the reclassification of a known organism. A
type specimen is the specimen collected in the field by a taxonomist
to serve as the reference for the description of the organism
and the validation of its taxonomic classification and naming. These
specimens are held by natural history collections, and their deposit is
required by the rules of various taxonomic societies.
As part of their curatorial activities, collections assign identification
codes to their specimens. While the primary use of identification codes has been
to organize the physical collections, numerous projects are under
way in the natural history community to digitize photographs of
specimens and create database records for the specimens, including
their identifiers, and thereby make them more readily accessible. The
degree of digitization varies from specialty to specialty. For example,
in botany, virtually all significant research herbaria are digitally
cataloging their type collections instead of maintaining paper
records. Many are also making digital photographs of the type
specimens available over the Web.
The publication of identification codes in the journal literature
is also changing. Historically, identification codes have been
presented in the "Materials Used" sections of journal articles. The level of
specificity of the identification code has varied, depending on the
biological discipline. For example, botanical journals tend to list only
the institution and the catalog, while vertebrate journals provide
the code to the specimen level. The current trend is to require more
detailed lists of specimens. As the lists become longer and
printing costs increase, journal publishers are beginning to
request links to independent Web sites, maintained by the researchers or
their organizations, that list all the specimens used in the study and
identify each of them to some level.
If the digital library collection contains resources that include
the identification codes, these codes can be extracted and
matched against the Web-based catalogs or databases. This link can
provide users with location and contact information to allow them to
access the physical object mentioned in the digital library resource.
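A sketch of the extraction and matching step, assuming a hypothetical code format of institution, collection, and catalog number separated by colons (for example, `ABC:Herb:1234`), and a catalog loaded as a dictionary keyed by those three parts. Real specimen codes vary widely between disciplines and institutions, so the pattern would need adapting to each source. Codes that find no catalog record are returned separately for follow-up.

```python
import re

# Hypothetical specimen-code format: institution, collection, catalog number,
# e.g. "ABC:Herb:1234". Real formats vary by discipline and institution.
SPECIMEN_CODE = re.compile(r"\b([A-Z]{2,6}):([A-Za-z]+):(\d+)\b")


def locate_specimens(text, catalog):
    """Match specimen codes in text against a catalog of holdings.

    catalog: dict mapping (institution, collection, number) to a record with
    location and contact details. Returns (found, missing), where found holds
    (key, record) pairs and missing holds keys with no catalog record.
    """
    found, missing = [], []
    for key in SPECIMEN_CODE.findall(text):
        record = catalog.get(key)
        if record is None:
            missing.append(key)
        else:
            found.append((key, record))
    return found, missing


if __name__ == "__main__":
    catalog = {("ABC", "Herb", "1234"): {"location": "Cabinet 5",
                                         "contact": "curator@example.org"}}
    print(locate_specimens("Material examined: ABC:Herb:1234, ABC:Herb:9999.", catalog))
```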
Curators or registrars of artistic, archaeological, and cultural
history collections also assign inventory or accession numbers to
items in their collections. Identification numbers may also be found
in scholarly catalogues raisonnés. Links similar to those described
for natural history can be made between text related to works of art
and the physical work in a particular collection. An article about a
work of art can be connected to additional information about the
physical object by linking the identification number in the text to an
online catalog containing that number and additional information
about the work.
As museums digitize their collections to establish a presence
on the Web or to reduce the handling of the physical objects, KOSs
that can link the digital library resources to the physical object are
being developed. If a museum's collection complements that of the
digital library, it is worthwhile to discuss ways in which the digital
library and the digital museum collections may "co-evolve."
Summary
Digital libraries can use KOSs to link digital resources to other
digital resources or, indirectly, to physical objects. A simple example is
the expansion of codes and acronyms. Descriptive records may also
be provided either directly from the KOS or indirectly by using the
KOS to capture a search key that can be used to access another
resource. This concept may be taken a step further by using a KOS, such as
a museum or exhibition catalog, to provide information about the
location of the physical object.