Large graph mining: patterns, tools
and case studies
Christos Faloutsos and Hanghang Tong,
Carnegie
Mellon University
How do graphs look like? How do they evolve over time? How
can we find patterns, anomalies and regularities in them? How to find
influential nodes in the network? We will present both theoretical results and
algorithms as well as case studies on several real applications. Our emphasis
is on the intuition behind each method, and on guidelines for the practitioner.
The tutorial has the following parts: (a) Statistical
properties and models and graph generators of static and evolving networks. (b) Tools for the analysis of static and dynamic graphs, like the
Singular Value Decomposition, tensor decomposition for community detection,
HITS/PageRank etc. (c) Proximity measurements on graphs, the
main ideas to quantify the closeness of two nodes of the graph, fast algorithms
to compute the proximity scores, applications of proximity, like CenterPiece
subgraphs, pattern match, trend analysis etc. (d) Case studies
of how a virus or information or influence spreads through the network, how
to find influential bloggers or nodes to target for viral marketing, how
to find fraudsters on eBay, how to find communities on graphs.
Keywords: Graph mining, linear algebra, SVD, tensors, pageRank
Foils (to show up)
The
goal of the tutorial is to cover the most powerful tools for the analysis of
large, real graphs. The tutorial starts old and new patterns that most real
graphs obey (small diameter, power laws etc). It continues with powerful,
traditional tools from linear algebra (singular value decomposition SVD,
eigenvalue analysis); it shows that they form the basis for the extremely
successful PageRank and HITS algorithms; and it concludes with more advanced
tools, namely, sparse low rank approximations ('CUR' and derivatives).
The next part focuses on proximity of two nodes on a graph, and how to assess
it. We describe several measures (electric current, maximum flow, escape
probability), we compare them and we focus on the most successful ones and on
fast algorithms to compute them.
The tutorial concludes with several case studies: influence propagation, fraud
detection on e-bay, a survey of algorithms for community detection and graph
partitioning, and a description of the map/reduce method for the analysis of
Tera- and Peta-byte scale graphs.
The
proposed format is 3 hours.
·
Part
I: Patterns [0.5h - Faloutsos]
The target audience is data management, data mining and
machine learning researchers and professionals who work on static or
time-evolving graphs and want to know about tools and models when dealing with
large network datasets.
Prerequisites: Computer science background (B.Sc. or equivalent);
familiarity with undergraduate linear algebra (eigenvectors). The tutorial will
focus on intuition and examples, carefully introducing only the minimal
necessary mathematical tools, and always focusing on practical applications.
|
Christos Faloutsos is a Professor at Carnegie Mellon
University. He has received the Presidential Young Investigator Award by the
National Science Foundation (1989), the Research |
|
Hanghang
Tong
is a senior Ph.D. student in the Machine Learning Department at Carnegie
Mellon University. He has received best paper awards from SIAM-DM 2008
and ICDM 2006, and he has 25 refereed publications. He holds an M.S. degree
and a B.S. degree from Tsinghua University, P.R. China. His research
interests include data mining for multimedia and for graphs. (Full CV
at www.cs.cmu.edu/~htong/pdf/cv_Tong.pdf
) |