Overview
PEGASUS is a peta-scale graph mining system written entirely in Java. It runs in a parallel, distributed manner on
top of Hadoop, a cloud computing platform and an open-source implementation of the MapReduce framework, which
Google originally designed for web-scale data processing.
PEGASUS provides large-scale algorithms for important graph mining tasks:
• Degree
• PageRank
• Random Walk with Restart (RWR)
• Radius
• Connected Components
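To give a feel for what two of these tasks compute, here is a minimal single-machine Java sketch. It is not PEGASUS code and does not use Hadoop; the class `GraphSketch` and its methods `outDegree` and `pageRank` are hypothetical names chosen for illustration. It assumes the graph is a directed edge list with nodes numbered 0..n-1. Out-degree is computed with a "emit (src, 1), then sum per key" pattern that mirrors a MapReduce job. PageRank is computed by power iteration, where each iteration is one sparse matrix-vector multiplication over the edges. The damping factor 0.85 is a common textbook choice, not a value taken from this page.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical single-machine sketch; not PEGASUS code and not Hadoop-based. */
public class GraphSketch {

    /**
     * Out-degree of every node that has at least one out-edge.
     * "Map": each edge (src, dst) emits (src, 1). "Reduce": sum the 1s per src.
     */
    public static Map<Integer, Integer> outDegree(int[][] edges) {
        Map<Integer, Integer> degree = new HashMap<>();
        for (int[] e : edges) {
            degree.merge(e[0], 1, Integer::sum);
        }
        return degree;
    }

    /**
     * PageRank by power iteration. Assumes nodes are numbered 0..n-1.
     * Each iteration is one sparse matrix-vector multiplication over the edge list.
     */
    public static double[] pageRank(int n, int[][] edges, double damping, int iterations) {
        int[] outDeg = new int[n];
        for (int[] e : edges) {
            outDeg[e[0]]++;
        }
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from the uniform distribution
        for (int it = 0; it < iterations; it++) {
            double[] incoming = new double[n];
            // Each edge passes an equal share of its source's rank to its destination.
            for (int[] e : edges) {
                incoming[e[1]] += rank[e[0]] / outDeg[e[0]];
            }
            // Rank held by nodes with no out-edges (dangling nodes) is spread evenly.
            double dangling = 0.0;
            for (int v = 0; v < n; v++) {
                if (outDeg[v] == 0) {
                    dangling += rank[v];
                }
            }
            // Combine incoming rank with the uniform random-jump term.
            for (int v = 0; v < n; v++) {
                incoming[v] = (1.0 - damping) / n + damping * (incoming[v] + dangling / n);
            }
            rank = incoming;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Small directed graph: 0->1, 1->2, 2->0, 2->1
        int[][] edges = {{0, 1}, {1, 2}, {2, 0}, {2, 1}};
        System.out.println("out-degree: " + outDegree(edges));
        System.out.println("PageRank:   " + Arrays.toString(pageRank(3, edges, 0.85, 50)));
    }
}
```

On a Hadoop cluster, each pass over the edges in this sketch would roughly correspond to one MapReduce job over the edge file; here everything is kept in memory for readability.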
The details of PEGASUS can be found in the following paper:
U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos.
PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations.
IEEE International Conference on Data Mining (ICDM) 2009, Miami, Florida, USA.
Graph Mining with PEGASUS
Graph mining is the area of data mining concerned with finding patterns, rules, and anomalies in graphs.
Why Should We Care?
Graphs, or networks, are everywhere: the Web graph of the Internet, social networks (Facebook, Twitter),
biological networks, and many more. Finding patterns, rules, and anomalies in them has numerous applications,
including, but not limited to, the following:
• Ranking web pages for search engines
• 'Viral' or 'word-of-mouth' marketing
• Patterns of disease with potential impact on drug discovery
• Computer network security: email/IP traffic and anomaly detection
Why PEGASUS?
Existing work on graph mining has limited scalability: the maximum graph size is usually on the order of millions.
PEGASUS breaks this limit by scaling the algorithms up to billion-scale graphs. This breakthrough was made possible
by careful algorithm design and implementation for Hadoop, a massive cloud computing platform. In summary,
PEGASUS has three major advantages:
1. Large Graph Mining Package: graphs with billions of nodes and edges
2. Parallel Algorithms on Hadoop: a massive cloud computing platform
3. Open Source: Apache License 2.0
Thanks to PEGASUS, we could analyze one of the largest publicly available Web graphs, from Yahoo!, with 6.7 billion
edges.
Publicity
PEGASUS is gaining popularity in academia as well as in industry.
The PEGASUS paper received the Best Paper Runner-Up Award at the
IEEE International Conference on Data Mining (ICDM) 2009.
The PEGASUS web site has been visited by people from 64 countries.
SCHOOL OF COMPUTER SCIENCE