TWC: Medium: Collaborative: Know Thy Enemy: Data Mining Meets Networks for Understanding Web-Based Malware Dissemination
This material is based upon work supported by the National Science Foundation under Grant No. CNS-1314632. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
1. GENERAL INFORMATION
1.1. Abstract
How does web-based malware spread? We use the term web-based malware to describe malware that is distributed through websites and through malicious posts in social networks. We are in an arms race against web-based malware distributors, and, as in any war, knowledge is power: the more we know about them, the better we can defend ourselves. Our goal is to understand the dissemination of web-based malware by creating "MalScope," a suite of methods and tools that uses cutting-edge approaches to build spatiotemporal models, generators, and sampling techniques for malware dissemination. From a scientific point of view, this project brings together two disciplines: Data Mining and Network Security. The outcome is a suite of novel, sophisticated, and scalable techniques and models that will enhance our understanding of malware dissemination at a large scale. We use two types of web-based malware dissemination data: (1) user machines accessing dangerous sites and downloading web-based malware; and (2) Facebook users being exposed to malicious posts. We already have such data and will continue to obtain more from our industry partners (e.g., Symantec's WINE project) and open-access projects, or collect it on our own (e.g., MyPageKeeper).
The broader impact of our work is that it will enable the development of security solutions for end-users and industry. A 15-minute network outage costs a 200-employee company about $40K, while identity theft costs about $1,500 per person on average. By knowing the enemy better, security researchers and industry can more effectively stop the interconnected manifestations of Internet threats: identity theft, the creation of botnets, and DoS attacks. The PIs have a track record of technology transfer, with collaborators at industrial labs (Yahoo, MSR, Symantec, AT&T, IBM), national labs (LLNL, Sandia), open-source software ("Pegasus"), and spin-off startups (StopTheHacker). Educational impacts include developing a new course and providing publicly available educational material and open-source software.
1.2. Keywords
Data mining, web-based malware dissemination, graph mining.
1.3. Funding agency
- NSF, Award Number: CNS-1314632, Duration: September 1, 2013 - August 31, 2017 (Estimated)
2. PEOPLE INVOLVED
The following professors are co-PIs on this project:
The following CMU graduate students work on the project:
Postdocs and other collaborators:
3. RESOURCES
Refereed publications:
-
Yasuko Matsubara, Yasushi Sakurai, Willem van Panhuis, and Christos Faloutsos
FUNNEL: Automatic Mining of Spatially Coevolving Epidemics
KDD 2014, New York City, NY, USA, Aug. 24-27, 2014.
-
Yasuko Matsubara, Yasushi Sakurai, and Christos Faloutsos
AutoPlait: Automatic Mining of Co-evolving Time Sequences
SIGMOD'14, Snowbird, Utah, USA, June 22-27, 2014.
-
Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang,
CatchSync: Catching Synchronized Behavior in Large Directed Graphs
KDD 2014, New York City, NY, USA, Aug. 24-27, 2014.
-
Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang,
Inferring Strange Behavior from Connectivity Pattern in Social Networks
PAKDD'14, Tainan, Taiwan, May 13-16, 2014.
-
Ching-Hao Mao, Chung-Jung Wu, Evangelos E. Papalexakis, Christos Faloutsos, Kuo-Chen Lee, and Tien-Cheu Kao.
MalSpot: Multi² malicious network behavior patterns analysis.
In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 1-14. Springer International Publishing, 2014.
-
Danai Koutra, Di Jin, Yuanchi Ning, and Christos Faloutsos.
Perseus: an interactive large-scale graph mining and visualization tool.
Proceedings of the VLDB Endowment 8.12 (2015): 1924-1927.
-
Alceu Ferraz Costa, Yuto Yamaguchi, Agma Juci Machado Traina, Caetano Traina Jr, and Christos Faloutsos.
RSC: Mining and modeling temporal activity in social media.
Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015.
-
Kijung Shin, Bryan Hooi, and Christos Faloutsos.
M-Zoom: Fast Dense-Block Detection in Tensors with Quality Guarantees.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD) 2016, Riva del Garda, Italy
-
Bryan Hooi, Hyun Ah Song, Alex Beutel, Neil Shah, Kijung Shin, and Christos Faloutsos,
FRAUDAR: Bounding Graph Fraud in the Face of Camouflage
ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2016, San Francisco, USA
-
Miguel Araujo, Pedro Ribeiro, and Christos Faloutsos.
FastStep: Scalable Boolean Matrix Decomposition.
Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer International Publishing, 2016.
-
Evangelos E. Papalexakis, Bryan Hooi, Konstantinos Pelechrinis, Christos Faloutsos.
Power-Hop: A Pervasive Observation for Real Complex Networks.
PLoS ONE 11(3), 2016.
-
Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos.
D-Cube: Dense-Block Detection in Terabyte-Scale Tensors.
ACM International Conference on Web Search and Data Mining (WSDM) 2017, Cambridge, UK
-
Dhivya Eswaran, Stephan Günnemann, Christos Faloutsos, Disha Makhija, and Mohit Kumar.
ZooBP: belief propagation for heterogeneous networks.
Proceedings of the VLDB Endowment 10.5 (2017): 625-636
-
Dhivya Eswaran, Stephan Günnemann, and Christos Faloutsos.
The Power of Certainty: A Dirichlet-Multinomial Model for Belief Propagation.
Proceedings of the 2017 SIAM International Conference on Data Mining (SDM) 2017, Houston, USA
-
Daniel YT Chino, Alceu F. Costa, Agma JM Traina, and Christos Faloutsos.
VolTime: Unsupervised Anomaly Detection on Users' Online Activity Volume.
Proceedings of the 2017 SIAM International Conference on Data Mining (SDM) 2017, Houston, USA
-
Tsubasa Takahashi, Bryan Hooi, and Christos Faloutsos.
AutoCyclone: Automatic Mining of Cyclic Online Activities with Robust Tensor Factorization.
Proceedings of the 26th International Conference on World Wide Web (WWW) 2017, Perth, Australia
-
Bryan Hooi, Kijung Shin, Hyun Ah Song, Alex Beutel, Neil Shah, and Christos Faloutsos.
Graph-Based Fraud Detection in the Face of Camouflage
ACM Transactions on Knowledge Discovery from Data (TKDD) 11.4 (2017): 44
-
Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos.
DenseAlert: Incremental Dense-Subtensor Detection in Tensor Streams.
ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2017, Halifax, Canada (to appear)
-
Hemank Lamba, Bryan Hooi, Kijung Shin, Christos Faloutsos, and Juergen Pfeffer.
zooRank: Ranking Suspicious Entities in Time-Evolving Tensors.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD) 2017, Skopje, Macedonia (to appear)
-
Bryan Hooi, Shenghua Liu, Asim Smailagic, and Christos Faloutsos.
BEATLEX: Summarizing and Forecasting Time Series with Patterns.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD) 2017, Skopje, Macedonia (to appear)
Refereed publications joint with the co-PIs
-
Pravallika Devineni, Danai Koutra, Michalis Faloutsos, and Christos Faloutsos.
If walls could talk: Patterns and anomalies in Facebook wallposts.
Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) 2015, Paris, France
-
Venkata Krishna Pillutla, Zhanpeng Fang, Pravallika Devineni, Christos Faloutsos, and Danai Koutra.
On Skewed Multi-dimensional Distributions: the FusionRP Model, Algorithms, and Discoveries.
Proceedings of the 2016 SIAM International Conference on Data Mining (SDM) 2016, Miami, USA
-
Priya Govindan, Sucheta Soundarajan, Tina Eliassi-Rad, and Christos Faloutsos.
NIMBLECORE: A Space-efficient External Memory Algorithm for Estimating Core Numbers.
Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) 2016, San Francisco, CA
-
Huy Hang, Adnan Bashir, Michalis Faloutsos, Christos Faloutsos, and Tudor Dumitras.
"Infect-me-not": A User-centric and Site-centric Study of Web-Based Malware.
IFIP Networking 2016, Vienna, Austria.
-
Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos.
CoreScope: Graph Mining Using k-Core Analysis - Patterns, Anomalies and Algorithms.
IEEE International Conference on Data Mining (ICDM) 2016, Barcelona, Spain
-
Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos.
Patterns and Anomalies in k-Cores of Real-World Graphs with Applications.
Knowledge and Information Systems (2017): 1-34
Papers of the co-PIs
-
Guowu Xie, Huy Hang, Michalis Faloutsos.
Scanner Hunter: Understanding HTTP scanning traffic.
ASIACCS 2014, Kyoto, Japan.
Tutorials
Ph.D. Thesis
- Alex Beutel
User Behavior Modeling with Large-Scale Graph Analysis
available as CMU Tech report CMU-CS-16-105, 2016.
- Evangelos (Vagelis) Papalexakis
Mining Large Multi-Aspect Data: Algorithms and Applications
available as CMU Tech report CMU-CS-16-124, 2016.
Last updated: July 31, 2017, by Christos Faloutsos