TITLE: Mining Large Graphs: Laws and tools. INSTRUCTORS: Christos Faloutsos and Jure Leskovec (CMU) INTENDED DURATION: 3 hours OVERVIEW How do graphs look like? How do they evolve over time? How can we find patterns, anomalies and regularities in them? The tutorial has three parts. In the first, we review some static and temporal 'laws', like the 'six degrees of separation', and we also describe several graph generators that try to match properties of real graphs. In the second part, we present powerful tools for the analysis of static graphs, like the Singular Value Decomposition, the CUR decomposition, community detection and graph partitioning, and related tools. We show how they can be used for community and topic detection on real datasets, for discovery of important nodes and important connections on social networks, computer networks, and the web. In the third part we focus on time-evolving graphs, where we cover tensors, internet traffic analysis, influence propagation in blogs, and virus propagation results. The emphasis is on the intuition behind all these tools, and on their practical impact for the analysis of large, real datasets. The more theoretical aspects and proofs are delegated to the bibliography that the tutorial will provide. TARGET AUDIENCE and EXPECTED PRIOR KNOWLEDGE The target audience is data management and machine learning researchers and professionals who work on static or time-evolving graphs. The expected prior knowledge is a Bachelors in Computer Science or equivalent, for maximum benefit. Specifically, the course needs college level background in matrix algebra, machine learning and data base management. However, the tutorial will mainly emphasize the intuition behind all the tools, so that any college graduate would be able to follow the vast majority of the tutorial material. OUTLINE Part 1) Laws and Generators [1 hour] * Laws: - Power law degree distributions - small diameters - densification * Graph Generators - Erdos-Renyi and phase transitions - preferential attachment and variants - Kronecker generators and fitting - Sampling from graphs Part 2) Tools for static and dynamics graphs [1 hour] * Singular Value Decomposition (SVD) * HITS, PageRank * Other decompositions (CUR) * Tensors and Higher-Order SVD * Connection subgraphs, CenterPiece subgraphs Part 3) Case studies [1 hour] * Virus propagation and epidemic thresholds * Influence propagation in blogs * Cost-effective outbreak detection * Web projections (* Monitoring communities over time - intrusion detection) (* Frequent subgraphs) RELATED TUTORIALS: A related tutorial with about 40% overlap was delivered at KDD 2004 (see www.cs.cmu.edu/~christos/TALKS/KDD04-tut) RELATED MATERIAL: See the foils at http://www.cs.cmu.edu/~jure/pubs/graphs-nescai07tutorial.pdf http://www.cs.cmu.edu/~christos/courses/826.S06/FOILS-pdf/420_GraphMining1.pdf http://www.cs.cmu.edu/~christos/courses/826.S06/FOILS-pdf/430_GraphMining2.pdf INSTRUCTORS' QUALIFICATIONS AND BIOS Christos Faloutsos (www.cs.cmu.edu/~christos/short-bio.txt). He is a Professor at Carnegie Mellon University. He has received the Presidential Young Investigator Award by the National Science Foundation (1989), the ICDM 'research contributions' award in 2006, and several ``best paper'' and teaching awards. He has delivered over 20 tutorials in database and data mining conferences, he has published over 160 refereed articles, one monograph, and holds five patents. His research interests include data mining for graphs and streams, fractals, indexing methods for spatial and multimedia bases, and data base performance. Jure Leskovec (www.cs.cmu.edu/~jure). He is a PhD candidate in machine learning at Carnegie Mellon University. He received the ACM KDD 2005 ``best research paper'' award, he won the ACM KDD cup competition in 2003, he holds two patents, and he is a Microsoft Research Graduate Fellow. His research interests include machine learning and data mining on large real-world graphs and networks.