Indexing and Mining Streams
Christos Faloutsos, CMU
DESCRIPTION - OBJECTIVES
How can we find patterns in a sequence of sensor measurements (e.g., a sequence of temperatures or water-pollutant measurements)? How can we compress it? What are the major tools for forecasting and outlier detection? The objective of this tutorial is to provide a concise and intuitive overview of the most important tools that can help us find patterns in sensor sequences. Sensor data analysis is becoming increasingly important, thanks to the decreasing cost of hardware and the increasing on-sensor processing capabilities. We review the state of the art in three related fields: (a) fast similarity search for time sequences, (b) linear forecasting with the traditional AR (autoregressive) and ARIMA methodologies, and (c) non-linear forecasting for chaotic/self-similar time sequences, using lag-plots and fractals. The emphasis of the tutorial is on giving the intuition behind these powerful tools, which is usually lost in the technical literature, and on presenting case studies that illustrate their practical use.
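To make item (b) slightly more concrete, here is a minimal sketch of AR(p) forecasting. It estimates the coefficients a_1, ..., a_p of x_t ≈ a_1 x_{t-1} + ... + a_p x_{t-p} by ordinary least squares and then predicts the next value. This is an illustration under stated assumptions, not the tutorial's own code. The function names (`fit_ar`, `forecast_next`, `solve_linear`) are hypothetical. The model has no intercept, no mean removal and no ARIMA differencing, and it solves the normal equations in plain Python rather than calling a numerical library.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's lists are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: series too short or degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ar(series: Sequence[float], p: int) -> List[float]:
    """Least-squares estimate of a_1..a_p in x_t ~ a_1 x_{t-1} + ... + a_p x_{t-p}.

    Hypothetical helper: no intercept, no detrending, no differencing.
    """
    if len(series) <= p:
        raise ValueError("need more than p observations")
    # One regression row per time tick t >= p: the p lagged values predict x_t.
    rows = [[series[t - k] for k in range(1, p + 1)] for t in range(p, len(series))]
    targets = [series[t] for t in range(p, len(series))]
    # Normal equations: (X^T X) a = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve_linear(xtx, xty)


def forecast_next(series: Sequence[float], coeffs: Sequence[float]) -> float:
    """One-step-ahead forecast from the last p observed values."""
    return sum(a * series[-k] for k, a in enumerate(coeffs, start=1))
```

A straight line such as 1, 2, ..., 8 satisfies x_t = 2 x_{t-1} - x_{t-2} exactly. On that series, `fit_ar(series, 2)` should recover coefficients close to [2, -1] and forecast a value close to 9. Solving the normal equations is the simplest route. For long or ill-conditioned series, a QR-based solver is numerically safer. An incremental scheme such as Recursive Least Squares avoids refitting from scratch each time a new sample arrives.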
NOTICE: At SIGMOD, Prof. Dennis Shasha will deliver a related but complementary tutorial, which will discuss multi-window techniques for burst detection, moving-window correlation, a query language for order, and applications in physics, music and finance.
FOILS
The PDF of the foils is here.
CONTENT AND OUTLINE
- why we need similarity search
- distance functions (Euclidean, Lp norms, time-warping)
- fast searching (R-trees, M-trees)
- feature extraction (DFT, Wavelets, SVD, FastMap)
- main idea behind linear forecasting
- AR methodology
- multivariate regression
- Recursive Least Squares
- de-trending; periodicities
- Non-linear/chaotic forecasting
- main idea: lag-plots
- 'fractals' and 'fractal dimensions'
- definition and intuition
- algorithms for fast computation
- case studies
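The filtering idea behind the distance-function and feature-extraction items can be sketched as follows. The sketch assumes equal-length sequences, plain Euclidean distance, and the orthonormal DFT (scaled by 1/sqrt(n)). By Parseval's theorem, the orthonormal DFT preserves Euclidean distance. Keeping only the first k coefficients can therefore shrink distances but never enlarge them. A range query can thus filter on the cheap feature distance without missing any true match, then verify the survivors on the raw sequences. The function names are hypothetical. The linear scan stands in for the spatial index (such as an R-tree) that would normally hold the feature vectors.

```python
import cmath
import math
from typing import List, Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Euclidean (L2) distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dft_features(x: Sequence[float], k: int) -> List[complex]:
    """First k coefficients of the orthonormal DFT (scaled by 1/sqrt(n)).

    Direct O(n*k) evaluation for clarity; an FFT would be used in practice.
    """
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)) / math.sqrt(n)
        for f in range(k)
    ]


def feature_distance(fx: Sequence[complex], fy: Sequence[complex]) -> float:
    """Euclidean distance in the truncated DFT feature space (a lower bound)."""
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(fx, fy)))


def range_query(
    query: Sequence[float],
    database: Sequence[Sequence[float]],
    eps: float,
    k: int,
) -> List[Sequence[float]]:
    """Return all sequences within distance eps of query (hypothetical helper)."""
    fq = dft_features(query, k)
    # Step 1: cheap filter on k features; may keep false alarms, never drops a match.
    candidates = [s for s in database if feature_distance(fq, dft_features(s, k)) <= eps]
    # Step 2: exact check on the raw sequences removes the false alarms.
    return [s for s in candidates if euclidean(query, s) <= eps]
```

For example, `range_query(q, db, 0.5, 3)` returns the sequences in `db` within Euclidean distance 0.5 of `q`. A larger k makes the filter tighter but the feature vectors longer, which is the usual trade-off when choosing how many coefficients to index.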
GOAL - WHO SHOULD ATTEND
Researchers who want to get up to speed with the major tools in time sequence analysis.
Also, practitioners who want a concise, intuitive overview of the state of the art.
PREREQUISITES:
None.
The emphasis is on the intuition behind all these mathematical tools.
PRESENTER - BIO
Christos Faloutsos is a Professor at Carnegie Mellon University. He has received the Presidential Young Investigator Award from the National Science Foundation (1989), four "best paper" awards, and several teaching awards. He is a member of the executive committee of SIGKDD; he has published over 120 refereed articles and one monograph, and holds four patents. His research interests include data mining in streams and graphs, fractals, indexing methods for multimedia and text databases, and database performance.