Title: Sensor data mining: similarity search and pattern analysis
Instructor: Christos Faloutsos, CMU
DESCRIPTION - OBJECTIVES
How can we find patterns in a sequence of sensor measurements (e.g.,
a sequence of temperatures, or water-pollutant measurements)? How can
we compress it? What are the major tools for forecasting and outlier
detection? The objective of this tutorial is to provide a concise and
intuitive overview of the most important tools that can help us find patterns
in sensor sequences. Sensor data analysis is becoming increasingly
important, thanks to the decreasing cost of hardware and the increasing
on-sensor processing abilities. We review the state of the art in three
related fields: (a) fast similarity search for time sequences, (b) linear
forecasting with the traditional AR (autoregressive) and ARIMA methodologies,
and (c) non-linear forecasting for chaotic/self-similar time sequences,
using lag-plots and fractals. The emphasis of the tutorial is on giving the
intuition behind these powerful tools, which is usually lost in
the technical literature, as well as on case studies that illustrate
their practical use.
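As an illustration of point (a), the Python sketch below shows one common way that DFT feature extraction supports fast similarity search: keep only the first k coefficients of an orthonormal DFT, prune candidates whose feature-space distance already exceeds the query radius, then verify the survivors with the exact Euclidean distance. Because the orthonormal DFT preserves Euclidean distance (Parseval's theorem), the truncated feature distance can only underestimate the true distance, so pruning causes no false dismissals. The function names and the default k = 3 are illustrative assumptions, and a linear scan stands in for the spatial index (such as an R-tree) that would normally store the feature vectors.

```python
import cmath
import math
from typing import List, Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Euclidean (L2) distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dft_features(x: Sequence[float], k: int) -> List[complex]:
    """First k coefficients of the orthonormal DFT (scaled by 1/sqrt(n)).

    With this scaling the transform preserves Euclidean distance (Parseval),
    so the distance over any subset of coefficients never exceeds the true one.
    """
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)) / math.sqrt(n)
        for f in range(k)
    ]


def feature_distance(fx: Sequence[complex], fy: Sequence[complex]) -> float:
    """Euclidean distance between two truncated DFT feature vectors."""
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(fx, fy)))


def range_query(db: List[Sequence[float]], q: Sequence[float],
                eps: float, k: int = 3) -> List[int]:
    """Indices of sequences in db within Euclidean distance eps of q.

    Hypothetical helper: a linear scan over precomputed features stands in
    for a spatial index such as an R-tree.
    """
    db_feats = [dft_features(x, k) for x in db]  # would be stored in the index
    fq = dft_features(q, k)
    hits = []
    for i, (x, fx) in enumerate(zip(db, db_feats)):
        # Filter step: safe to prune, the true distance is at least this large.
        if feature_distance(fx, fq) > eps:
            continue
        # Refine step: check the surviving candidate exactly.
        if euclidean(x, q) <= eps:
            hits.append(i)
    return hits
```

For example, with db holding an increasing ramp 1..8, its reverse 8..1, and the ramp with its last value changed to 9, querying with the ramp 1..8 and eps = 1.5 returns [0, 2]: the reversed sequence lies at distance about 12.96 and is rejected.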
FOILS
In PDF
CONTENT AND OUTLINE
- Similarity Search
  - why we need similarity search
  - distance functions (Euclidean, Lp norms, time-warping)
  - fast searching (R-trees, M-trees)
  - feature extraction (DFT, Wavelets, SVD, FastMap)
- Linear Forecasting
  - main idea behind linear forecasting
  - AR methodology
  - multivariate regression
  - Recursive Least Squares
  - de-trending; periodicities
- Non-linear/chaotic forecasting
  - main idea: lag-plots
  - 'fractals' and 'fractal dimensions'
    - definition and intuition
    - algorithms for fast computation
  - case studies
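The AR methodology and multivariate regression in the outline can be sketched in a few lines: an AR(p) model predicts each value as a linear combination of the p previous values, so fitting it is ordinary least squares with the lagged values as independent variables. The Python below is a minimal batch version under stated assumptions: the series is already de-trended (so there is no intercept term), the order p is chosen by hand, and a small Gaussian-elimination solver stands in for a numerical library. The helper names solve, fit_ar and forecast are hypothetical; Recursive Least Squares would instead update the coefficients incrementally as each new sample arrives.

```python
from typing import List, Sequence


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def fit_ar(x: Sequence[float], p: int) -> List[float]:
    """Least-squares AR(p) coefficients a[0..p-1] for the model
    x[t] ~ a[0]*x[t-1] + a[1]*x[t-2] + ... + a[p-1]*x[t-p].

    Assumes x is already de-trended: no intercept term is fitted.
    """
    # Each regression row holds the p previous values (the lagged regressors).
    rows = [[x[t - i] for i in range(1, p + 1)] for t in range(p, len(x))]
    targets = [x[t] for t in range(p, len(x))]
    # Normal equations: (X^T X) a = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve(xtx, xty)


def forecast(x: Sequence[float], a: Sequence[float]) -> float:
    """One-step-ahead forecast from the last len(a) observations."""
    return sum(a[i] * x[-1 - i] for i in range(len(a)))
```

For example, the series 1, 1, 2, 3, 5, 8, 13, 21, 34, 55 satisfies x[t] = x[t-1] + x[t-2] exactly, so fit_ar(x, 2) recovers coefficients close to [1.0, 1.0] and the one-step forecast is close to 89. Solving the normal equations directly is the simplest choice for a small p; for longer or ill-conditioned series, a QR-based least-squares solver is the more robust option.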
WHO SHOULD ATTEND
Researchers who want to get up to speed with the major tools in time sequence
analysis, and practitioners who want a concise, intuitive overview of
the state of the art.
ABOUT THE INSTRUCTOR
Christos Faloutsos is a Professor at Carnegie Mellon University. He has
received the Presidential Young Investigator Award from the National Science
Foundation (1989), three "best paper" awards (SIGMOD 94, VLDB 97, KDD 01 runner-up),
and four teaching awards. He is a member of the executive committee of
SIGKDD; he has published over 100 refereed articles and one monograph, and
holds four patents. His research interests include data mining, fractals,
indexing methods for multimedia and text databases, and database performance.