Radius Plot
The radius distribution is plotted with the plot radius
[graph_name] command, which writes the output file
yweb_radius.eps to the current directory. The resulting
plot shows the radius distribution.
PageRank Plot
The PageRank distribution is plotted with the plot
pagerank [graph_name] command, which writes the output
file yweb_pagerank.eps to the current directory. The
resulting plot shows the PageRank distribution.
Plotting Results
Once you have run the algorithms, you can plot the results to find interesting patterns and anomalies. We will show
how to plot the distributions of degree, PageRank, and radius, and the correlations among them.
Degree Plot
The degree distribution is plotted with the plot deg [graph_name] command,
which writes the output file www_deg_inout.eps to the
current directory. The resulting plot shows the
degree distribution.
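To make concrete what a "distribution" plot contains, here is a minimal Python sketch that turns per-node values (degrees, for instance) into (value, number of nodes) pairs, which are the points such a plot would draw. The function name and the toy input are illustrative assumptions; this is not how PEGASUS computes or renders its plots.

```python
from collections import Counter

def distribution(values):
    """Return sorted (value, count) pairs: how many nodes share each value.

    `values` holds one number per node, e.g. the degree of each node.
    """
    counts = Counter(values)
    return sorted(counts.items())

# Toy input (hypothetical): the degrees of five nodes.
toy_degrees = [2, 2, 2, 1, 5]
toy_points = distribution(toy_degrees)
```

Degree distributions of web graphs are usually heavy-tailed, so points like these are typically drawn on log-log axes.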
Overview
This demo shows how to use PEGASUS for mining large graphs. We will analyze a web graph by computing the
degree, PageRank, and radius distributions, and the correlations among them. This demo is composed of the following
four parts:
1. Interactive Shell
2. Managing graphs
3. Running algorithms
4. Plotting results
Interactive Shell
PEGASUS provides an interactive shell in which users can manage graphs, run algorithms, and generate plots. To
start the shell, type pegasus.sh in the PEGASUS installation directory. To see the
available commands, type help in the shell.
Managing graphs
To use PEGASUS, the graphs to be analyzed must first be uploaded to the Hadoop Distributed File System (HDFS). In the shell,
the add command uploads a graph to HDFS. To add a local edge file 'www_edges.tab' to HDFS and
name it 'www', issue the following command:
add www_edges.tab www
You can list the current graphs with the list command.
The graph 'www' should appear in the list, confirming that it was added to HDFS. Now we are ready to run algorithms.
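If you want a small edge file to experiment with before uploading a real graph, the following Python sketch writes and re-reads one. It assumes the edge file holds one edge per line as two integer node IDs separated by a tab (suggested by the .tab extension, but not stated in this demo); the file name toy_edges.tab and both helper functions are hypothetical.

```python
import os
import tempfile

def write_edge_file(edges, path):
    """Write (src, dst) pairs, one tab-separated edge per line (assumed format)."""
    with open(path, "w") as f:
        for src, dst in edges:
            f.write(f"{src}\t{dst}\n")

def read_edge_file(path):
    """Read the file back into a list of (src, dst) integer pairs."""
    edges = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            src, dst = line.split("\t")
            edges.append((int(src), int(dst)))
    return edges

# Toy usage: write a four-edge graph to a temporary directory.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
toy_path = os.path.join(tempfile.mkdtemp(), "toy_edges.tab")
write_edge_file(toy_edges, toy_path)
```

A file produced this way could then be passed to the add command in place of www_edges.tab, provided the assumed format matches what your PEGASUS installation expects.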
Running Algorithms
We will compute the degree, PageRank, and radius of the www graph. For this purpose, we use the compute
command.
Degree
To compute the degree, use the compute deg [graph_name] command. After you enter the command, the shell prompts for
additional parameters: the degree type and the number of reducers. In this demo, we use inout for the
degree type and 10 for the number of reducers. After you enter the parameters, the degree is computed on Hadoop.
When the computation is finished, completion messages are printed in the shell.
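As a point of reference for the degree type, here is a minimal single-machine sketch in Python. It assumes that inout means the sum of a node's in-degree and out-degree, and it also accepts in and out as degree types; both are assumptions for illustration, since this demo names only inout. It does not reproduce the Hadoop job that PEGASUS runs.

```python
from collections import Counter

def degrees(edges, kind="inout"):
    """Count per-node degree from directed (src, dst) edges.

    kind: "in", "out", or "inout" (assumed here to be in-degree + out-degree).
    Nodes with degree zero for the chosen kind are omitted from the result.
    """
    if kind not in ("in", "out", "inout"):
        raise ValueError(f"unknown degree type: {kind}")
    deg = Counter()
    for src, dst in edges:
        if kind in ("out", "inout"):
            deg[src] += 1  # edge leaves src
        if kind in ("in", "inout"):
            deg[dst] += 1  # edge enters dst
    return dict(deg)

# Toy graph (hypothetical): three directed edges.
toy_edges = [(0, 1), (0, 2), (1, 2)]
toy_inout = degrees(toy_edges, "inout")
```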
PageRank
To compute the PageRank, use the compute pagerank [graph_name] command. After you enter the command, the shell
prompts for additional parameters: the number of nodes in the graph, the number of reducers, and whether to symmetrize
the graph. In this demo, we use 325729 for the number of nodes, 10 for the number of reducers, and 'nosym',
which means the graph is not symmetrized. After you enter the parameters, the PageRank is computed on Hadoop.
When the computation is finished, completion messages are printed in the shell.
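To illustrate why the number of nodes is a required input, here is a minimal power-iteration sketch of PageRank in Python. The damping factor of 0.85, the fixed iteration count, and the uniform handling of nodes without out-links are common textbook choices assumed here; they are not taken from PEGASUS, whose Hadoop implementation may differ.

```python
def pagerank(edges, num_nodes, damping=0.85, iterations=50):
    """Textbook power iteration over directed (src, dst) edges.

    Node IDs are assumed to be integers in [0, num_nodes).
    num_nodes sets both the uniform starting rank and the teleport term.
    """
    out_links = [[] for _ in range(num_nodes)]
    for src, dst in edges:
        out_links[src].append(dst)

    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(iterations):
        # Every node receives the same teleport share.
        new_rank = [(1.0 - damping) / num_nodes] * num_nodes
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[n] for n in range(num_nodes) if not out_links[n])
        for n in range(num_nodes):
            new_rank[n] += damping * dangling / num_nodes
            for dst in out_links[n]:
                new_rank[dst] += damping * rank[n] / len(out_links[n])
        rank = new_rank
    return rank

# Toy graph (hypothetical): nodes 0 and 1 link to 2, and 2 links back to 0.
toy_rank = pagerank([(0, 2), (1, 2), (2, 0)], num_nodes=3)
```

In this toy graph node 2 receives the most links and ends up with the highest rank, while node 1, which nothing links to, keeps only its teleport share.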
Radius
To compute the radius, use the compute radius [graph_name] command. After you enter the command, the shell prompts for
additional parameters: the number of nodes in the graph, the number of reducers, and whether to symmetrize the
graph. In this demo, we use 325729 for the number of nodes, 10 for the number of reducers, and makesym,
which symmetrizes the graph so that we get the undirected radius. After you enter the parameters, the
radius is computed on Hadoop.
When the computation is finished, completion messages are printed in the shell.
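The effect of symmetrizing can be seen on a small example. The Python sketch below treats the radius of a node as the largest shortest-path distance from it to any node it can reach, and computes it exactly with breadth-first search. Both that definition and the helper names are assumptions for illustration; this is not the distributed computation PEGASUS runs on Hadoop, and exact search like this does not scale to large graphs.

```python
from collections import deque

def symmetrize(edges):
    """Add the reverse of every edge, making the graph undirected."""
    sym = set()
    for src, dst in edges:
        sym.add((src, dst))
        sym.add((dst, src))
    return sorted(sym)

def node_radii(edges, num_nodes):
    """Per-node radius: max BFS distance to any reachable node.

    Node IDs are assumed to be integers in [0, num_nodes).
    """
    neighbors = [[] for _ in range(num_nodes)]
    for src, dst in edges:
        neighbors[src].append(dst)

    radii = []
    for start in range(num_nodes):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        radii.append(max(dist.values()))
    return radii

# Toy graph (hypothetical): a directed path 0 -> 1 -> 2 -> 3.
toy_path_edges = [(0, 1), (1, 2), (2, 3)]
directed_radii = node_radii(toy_path_edges, 4)
undirected_radii = node_radii(symmetrize(toy_path_edges), 4)
```

On the directed path the last node can reach nothing and has radius 0, whereas after symmetrizing both end nodes have radius 3.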
Correlation Plots
In addition to the distributions of individual graph properties, you can plot the correlation between two
properties. PEGASUS generates three correlation plots: degree vs. PageRank, radius vs. PageRank, and radius vs.
degree. To generate the correlation plots, type the plot corr [graph_name] command.
The following three output files are then created: [graph_name]_deg_radius.png,
[graph_name]_pagerank_deg.png, and [graph_name]_pagerank_radius.png.
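A correlation plot pairs two properties of the same node. As a rough sketch of that pairing step, the Python code below joins two per-node mappings into (x, y) points and also computes a Pearson coefficient as a one-number summary. The function names and toy values are hypothetical, and the coefficient is an addition for illustration only; the demo itself describes only the plots.

```python
import math

def paired_points(x_by_node, y_by_node):
    """Join two {node: value} mappings on the nodes present in both."""
    common = sorted(set(x_by_node) & set(y_by_node))
    return [(x_by_node[n], y_by_node[n]) for n in common]

def pearson(points):
    """Pearson correlation of (x, y) points.

    Raises ZeroDivisionError if either coordinate is constant
    or if `points` is empty.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    return cov / math.sqrt(var_x * var_y)

# Toy values (hypothetical): degree and PageRank for a few nodes.
toy_degree = {0: 1, 1: 2, 2: 3}
toy_pagerank = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
toy_pairs = paired_points(toy_degree, toy_pagerank)
toy_corr = pearson(toy_pairs)
```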
Demo