Mining Large Time-evolving Data Using Matrix and Tensor Tools
ICML 2007 tutorial, Corvallis, OR, USA
DESCRIPTION - OBJECTIVES
How can we find patterns in sensor streams (e.g., a
sequence of temperatures, water-pollutant measurements, or
machine room measurements)?
How can we mine Internet traffic graphs over time?
Further, how can we make the process incremental?
We review the state of the art in four related fields:
(a) numerical analysis and linear algebra, (b) multi-linear/tensor analysis,
(c) graph mining, and (d) stream mining.
We will present theoretical results and algorithms, as well as
case studies on several real applications.
Our emphasis is on the intuition behind each method,
and on guidelines for the practitioner.
CONTENT AND OUTLINE
Foils in pdf
- Part I. Core
- Data model - Fundamental concepts
- Time series
- Matrices
- Tensors
- Matrix analysis
- SVD, PCA and eigen-decomposition
- PageRank, HITS
- Sparse decompositions: CUR
- Co-clustering and cross-associations
- Tensor analysis
- Intro
- PARAFAC
- Tucker Model
- Tucker 1 and PCA;
- Tucker 2 and Tensor PCA;
- Tucker 3 and High-order SVD (HO-SVD)
- Other models
- Combination of PARAFAC and Tucker
- DEDICOM
- Part II. Extensions
- Non-negativity
- Nonnegative matrix factorization
- Nonnegative tensor factorization
- Missing values
- Stream mining
- Incremental PCA
- Dynamic tensor analysis
- Window-based tensor analysis
- Part III. Practitioner's guide
- Software
- Intro
- Issues: Scalability, Accuracy, Sparsity
- Case studies
- sensor network, machine monitoring
- Internet forensic computing
- social network analysis
- web graph study
WHO SHOULD ATTEND
Researchers who want to get up to speed with the major tools
in stream mining and graph mining, as well as practitioners who want a
concise, intuitive overview of the state of the art.
ABOUT THE INSTRUCTORS
- Christos Faloutsos is a Professor
at Carnegie Mellon University. He has received the Presidential Young
Investigator Award from the National Science Foundation (1989), seven
"best paper" awards, and several teaching awards. He has served as a
member of the executive committee of SIGKDD; he has published over 140
refereed articles and one monograph, and holds five patents. His research
interests include data mining for streams and networks, fractals,
indexing for multimedia and bioinformatics databases, and
performance.
- Tamara G. Kolda is a researcher at
Sandia National Laboratories in Livermore, California and has received
the Presidential Early Career Award for Scientists and Engineers
(2003). She has published over 25 refereed articles and released
several software packages including the MATLAB Tensor Toolbox. She is
an associate editor for the SIAM Journal on Scientific Computing. Her
research interests include multilinear algebra and tensor
decompositions, data mining, optimization, nonlinear solvers, graph
algorithms, parallel computing and the design of scientific software.
- Jimeng Sun is a PhD candidate in the
Computer Science Department at Carnegie Mellon University. His research
interests include data mining on streams, graphs, and tensors, and anomaly
detection.
Last updated: June 22, 2007