Position paper
for GeoLibraries Workshop, June 15-16, 1998
Christos Faloutsos
Carnegie Mellon University
WHAT IS A GEOLIBRARY?
A geolibrary is a collection of geo-referenced facts, like "Olympic games
in Atlanta, Georgia, Aug 1996". The facts can be geo-referenced explicitly
(e.g., through a GPS reading) or implicitly, as above. Facts can be
- text (like the example above)
- a formatted record (in a relational/SQL form)
- multimedia (e.g., a video clip, an image, or a voice clip)
The geo-referencing should include the precision, and could potentially
include elevation and/or a time-stamp. Thus, it would be of the form:
(x +/- Dx, y +/- Dy, z +/- Dz, t +/- Dt)
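As a minimal sketch of the form above, a geo-reference could be held in a small record in which every coordinate carries its own uncertainty. The class and field names (Measurement, GeoReference) and the sample coordinates are illustrative assumptions, not part of any proposed standard; elevation and time-stamp are optional, as in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Measurement:
    """A value with a symmetric uncertainty: value +/- delta."""
    value: float
    delta: float = 0.0

    def overlaps(self, other: "Measurement") -> bool:
        # Two uncertain values may coincide if their intervals intersect.
        return abs(self.value - other.value) <= self.delta + other.delta


@dataclass(frozen=True)
class GeoReference:
    """Hypothetical record for (x +/- Dx, y +/- Dy, z +/- Dz, t +/- Dt)."""
    x: Measurement
    y: Measurement
    z: Optional[Measurement] = None  # elevation, optional
    t: Optional[Measurement] = None  # time-stamp, optional

    def may_coincide(self, other: "GeoReference") -> bool:
        # Compare only the horizontal position; z and t are ignored here.
        return self.x.overlaps(other.x) and self.y.overlaps(other.y)


# Illustrative values only: an approximate longitude/latitude for Atlanta
# with a coarse, city-level precision, and a much more precise GPS-style fix.
atlanta = GeoReference(x=Measurement(-84.39, 0.2), y=Measurement(33.75, 0.2))
gps_fix = GeoReference(x=Measurement(-84.40, 0.0001), y=Measurement(33.76, 0.0001))
print(atlanta.may_coincide(gps_fix))
```

Keeping the precision next to each coordinate lets an implicit, city-level reference and an explicit GPS reading be compared without pretending that both are equally exact.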
TYPES OF QUERIES
We envision two major types of queries: (A) queries on geographic attributes
and (B) spatial data mining. Examples of the first are selections and 'spatial joins'
on geographic attributes:
- find tour-guides for Washington DC
- find the 5 nearest restaurants to our hotel
- overlay a climate map (average temperature) with an income map
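The nearest-neighbor query in the list above ("the 5 nearest restaurants") can be sketched with a plain linear scan. This is only a baseline under stated assumptions: the place names and coordinates are made up, the Earth is treated as a sphere of radius 6371 km, and a real system would use a spatial index rather than scanning every record.

```python
import heapq
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a spherical Earth (radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def k_nearest(places, lat, lon, k=5):
    """Return the k places closest to (lat, lon), nearest first, by a linear scan."""
    return heapq.nsmallest(k, places, key=lambda p: haversine_km(lat, lon, p[1], p[2]))


# Made-up (name, latitude, longitude) tuples, for illustration only.
restaurants = [
    ("A", 38.90, -77.03), ("B", 38.95, -77.10), ("C", 38.89, -77.04),
    ("D", 39.30, -76.60), ("E", 38.91, -77.02), ("F", 38.80, -77.20),
    ("G", 38.92, -77.05),
]
hotel = (38.90, -77.04)
nearest = k_nearest(restaurants, *hotel, k=5)
print([name for name, _, _ in nearest])
```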
Examples of data mining are:
- find correlations between weather, vegetation, and economic growth
across U.S. regions
- find articles on tornado touch-downs; geo-reference them and plot them
DATABASE PERSPECTIVES
Collaboration among Geography, Databases, and Information Retrieval
seems necessary and very promising. Database research has much to offer:
- mature tools for spatial and temporal data ("R-trees", clustering
methods, fast algorithms for spatial joins and overlays)
- interoperability tools ('mediators', 'multi-databases', data warehouses)
to handle multiple information sources and to map them into a unified schema
- Traditionally, the emphasis in database research is on large datasets.
The results are scalable algorithms, compression, and multi-resolution representations.
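To make the spatial-join and overlay operations in the list above concrete, the sketch below pairs up objects from two datasets whose bounding rectangles intersect. It is a naive nested-loop baseline, not an R-tree; an index would avoid most of the pairwise comparisons. The Rect type, the zone names, and the coordinates are invented for illustration.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned bounding rectangle (hypothetical helper type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "Rect") -> bool:
        # Rectangles overlap unless one lies entirely to one side of the other.
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)


def spatial_join(left, right):
    """Naive O(n*m) join: all (left_id, right_id) pairs whose rectangles intersect."""
    return [(lid, rid)
            for lid, lrect in left.items()
            for rid, rrect in right.items()
            if lrect.intersects(rrect)]


# Toy data, invented for illustration.
climate_zones = {"warm": Rect(0, 0, 5, 5), "cold": Rect(5.5, 0, 10, 5)}
income_zones = {"high": Rect(4, 1, 6, 3), "low": Rect(8, 4, 12, 9)}
pairs = spatial_join(climate_zones, income_zones)
print(pairs)
```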
Data mining (which draws on AI, databases, and statistics) also has a lot
to offer:
- spatial data mining ('find correlations between distance from lakes
and house values')
- powerful statistics tools (regression, singular value decomposition,
forecasting)
- several visualization tools (scatter plots, parallel axis, interactive
visualization)
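The lake-distance example above amounts, in its simplest form, to computing a correlation coefficient between two attributes. The sketch below computes a Pearson correlation by hand; the numbers are synthetic, invented purely to exercise the function, and say nothing about real house prices.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Synthetic numbers, not measurements.
distance_km = [0.5, 1.0, 2.0, 4.0, 8.0]
value_k_usd = [400, 380, 330, 300, 250]
r = pearson(distance_km, value_k_usd)
print(round(r, 3))
```

A strongly negative coefficient on such data would only suggest that values fall with distance; regression or the other statistics tools listed above would be needed to quantify the relationship.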
NEEDS
It seems that standards of representation are necessary. Specifically,
standards should be established to represent geographical entities: points
(cities, etc.), lines and polylines (roads, rivers), and regions (counties,
lakes, islands).
In that respect, an XML extension seems very promising, since it would
facilitate the exchange and processing of geographic data.
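As a rough illustration of what such an XML representation might look like, the sketch below serializes a point, a polyline, and a region, then parses the result back. The element and attribute names (geodata, feature, kind, coords) are hypothetical and do not follow any existing standard; the coordinates are arbitrary.

```python
import xml.etree.ElementTree as ET


def coords_text(points):
    """Encode a list of (x, y) pairs as 'x,y x,y ...'."""
    return " ".join(f"{x},{y}" for x, y in points)


def feature(kind, name, points):
    """Build one <feature> element; kind is 'point', 'polyline' or 'region'."""
    if kind == "point" and len(points) != 1:
        raise ValueError("a point needs exactly one coordinate pair")
    if kind == "region" and len(points) < 3:
        raise ValueError("a region needs at least three vertices")
    elem = ET.Element("feature", {"kind": kind, "name": name})
    ET.SubElement(elem, "coords").text = coords_text(points)
    return elem


root = ET.Element("geodata")
root.append(feature("point", "city", [(1.0, 2.0)]))
root.append(feature("polyline", "river", [(0.0, 0.0), (1.0, 1.5), (2.0, 1.0)]))
root.append(feature("region", "lake", [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]))

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Round trip: parse the text back and recover the feature kinds.
parsed = ET.fromstring(xml_text)
kinds = [f.get("kind") for f in parsed.findall("feature")]
```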