III:Small:Influence and Virus Propagation in Large Graphs - Theory and Algorithms
This material
is based upon work supported by the National Science Foundation
under Grant No. IIS-1017415. Any opinions, findings, and
conclusions or recommendations expressed in this material are those
of the author(s) and do not necessarily reflect the views of the
National Science Foundation.
1. GENERAL
INFORMATION
1.1.
Abstract
Given a graph, such as a social/computer network, or the blogosphere, how will a virus (or rumor or new product) propagate in
it? Will it take over, creating a pandemic? How to select the
'k' best nodes/edges for immunization, or, conversely, the best
'k' blogs for the fastest dissemination of a new idea? The answer to such questions is vital, for public health, for network
security, for market penetration, for blog monitoring and many
more applications.
In the PI's past work, they studied arbitrary graphs, and showed
that propagation depends on a single number, namely, the first
eigenvalue of the adjacency matrix of the network. Specifically,
they studied the so-called 'epidemic threshold' for flu-like
propagation (``SIS'' model = susceptible-infectious-susceptible),
on un-directed, un-weighted static graphs. All earlier work focused on full cliques, or homogeneous graphs, or specific cases
of power-law graphs - *all* of which are special cases of the
PI's eigenvalue result.
The major thrusts of the current proposal are two. The first is
*theory*: For a mumps-like model (``SIR'' = susceptible - infected - recovered), and for additional models, when will a virus result in a pandemic? What can we say about weighted graphs? About
time-evolving graphs, like an ad-hoc network of mobile phone
users? The second thrust is on *algorithms*: Given a graph, a
virus model (SIS, SIR, etc), and a fixed budget of 'k'
nodes/edges to immunize, how can we quickly find an optimal or
near-optimal solution, to best contain the virus? How can we
modify the algorithm, when the network changes over time?
The TECHNICAL MERIT of the work is that it is the first to focus
on *arbitrary* graphs, thus including real ones. In contrast,
the vast majority of past analytical work makes unrealistic assumptions about the graph topology (cliques, homogeneous graphs
etc.).
The BROADER IMPACT is high, as dynamics of large-scale graphs appear in numerous settings: cascades on blogs; product penetration
and viral marketing; rumor/information propagation; immunization
policies; advertisement policies etc.
1.2.
Keywords
Data mining, Virus Propagation, Immunization, Viral Marketing.
1.3. Funding
agency
- NSF, Award Number: IIS-1017415, Duration:
09/01/2010-08/31/2012 (Expected)
2. PEOPLE
INVOLVED
In addition to the PI, the following people work on
the project.
3.
RESEARCH
3.1. Project
goals
There are two motivating questions, one for each proposed thrust: (a) Theory: Given the
virus/influence property and the graph structure, what is the condition that an epidemic will
break out (or a piece of information will survive)? (b) Algorithms: Given the virus/influence
property and the graph structure, which k nodes/edges are the best to immunize so that we can
prevent an epidemic from breaking out to the maximum extent? We give the highlights of each
thrust next.
Theory: The first thrust tries to understand the epidemic threshold of virus propagation. In
our past work, we discovered a general epidemic threshold for SIS model with un-directed static
graphs. First of all, we propose to generalize it to (1) other types of viruses (e.g., SIR), and (2)
more general graphs (e.g., directed graphs). Unlike the existing work
on epidemic threshold, which are designed for
some special graphs; in our proposed work, we aim to address the arbitrary, real graphs. What
is more, we also want to study some important settings, which are barely (if at all) studied by
the existing work. To name a few, we plan to study the epidemic threshold (1) for multiple
competing viruses, (2) on dynamic graphs, etc.
Algorithms: The second thrust involves the development of immunization algorithms. In
our on-going work, we developed a fast near-optimal immunization strategy for SIS model with
un-directed, un-weighted static graphs. We propose to continue this work by generalizing it to
(1) SIR virus propagation model; (2) directed weighted graphs. Unlike the existing immuniza-
tion algorithms which are based on some heuristic and/or designed for some special graphs,
the proposed immunization algorithms will focus on arbitrary, real graphs; and try to (ap-
proximately) maximize the corresponding epidemic threshold. What is more, we also want to
develop immunization algorithms under some important, yet barely (if at all) studied settings
in the existing work, including (1) dynamic immunization, (2) multiple viruses cost-sensitive
immunization. Finally, we want to handle
the scalability issue by implementing our algorithms on map/reduce architecture.
3.2. Current
Results
Our main results so far are as follows:
- We gave the first parameter-free algorithm
to spot the seed nodes
(=patient(s) zero = 'culprits') in an epidemic,
using MDL (Minimum Description
Language) to estimate the most likely count of culprits.
-
We introduced the 'spikeM' model,
where we showed that epidemics and cascades in general,
follow a strange temporal pattern: they have
an exponential growth phase, followed by a power-law
decay phase.
-
We showed how to strengthen a network, given
the ability to add a fixed number of new edges;
conversely, we showed how to weaken it,
given the ability to severe $k$ edges.
-
We studied multiple, competing viruses,
and we were the first to show that, under mild assumptions,
and for arbitrary underlying topology,
we have a 'winner takes all' phenomenon,
no matter how strong is the second-best virus (idea, product).
- We are the first to show the generalized threshold theorem - which nicely de-couples the effect of the topology
and the virus model on the epidemic threshold. Our result unifies and includes as
special case older results and shows that the threshold depends
on the first eigenvalue of the connectivity matrix, (a) for
any static graph and (b) for all propagation models in standard
literature (more than 25, including H.I.V.).
- We derived the first closed formula for the epidemic threshold for the flu-like SIS model for a given set of alternating graphs. We showed
that it depends on the first eigenvalue of the so-called 'system' matrix.
- We developed a general framework to analyze epidemic spreading processes on mobile ad hoc networks, and using our framework, we are the first to derive the epidemic threshold
for any mobility model under the SIS model, and (c) we show that the node velocity in mobility models does not affect the epidemic threshold.
- We developed NetShield, a fast, scalable and provably near-optimal (constant-factor of optimal) algorithm to find the best-k nodes to be immunized
in a graph. We outperform all the prominent competitors including 'Accquaintance' immunization, 'PageRank' etc.
- We also developed fast heuristics for immunization under time-varying graphs which again outperformed other baselines like average degree, maximum degree etc.
- We developed Complex Linear Dynamical Systems (CLDS) to automatically find features for multiple time-sequences. We were able to cluster motion capture and BGP instability (which can be
thought of spreading virally) time-sqeuences automatically with CLDS.
3.3.
Publications
- Efficiently Spotting the Starting Points of an Epidemic in a Large Graph.
B. Aditya Prakash, Jilles Vreeken and Christos Faloutsos.
In Knowledge and Information Systems Journal, Springer. 2013.
- Fractional Immunization on Networks [PDF]
B. Aditya Prakash, Lada Adamic, Theodore Iwashyna, Hanghang Tong and Christos Faloutsos
in SDM 2013, Austin
- Spotting Culprits in Epidemics: How many and Which ones? [PDF]
B. Aditya Prakash, Jilles Vreeken and Christos Faloutsos
in IEEE ICDM 2012, Brussels
Invited to KAIS Journal Special Issue (ICDM Best papers)
- Competing Meme Propagation on Networks: A Case Study of Composite Networks [PDF]
Xuetao Wei, Nicholas Valler, B. Aditya Prakash, Iulian Neamtiu, Michalis Faloutsos and Christos Faloutsos
in ACM SIGCOMM Computer Communication Review, October 2012
- Gelling, and Melting, Large Graphs through Edge Manipulation [PDF]
Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad, Michalis Faloutsos and Christos Faloutsos
in ACM CIKM 2012, Mauii
Received the Best Paper Award (among all three DB, IR, KM tracks)
- Rise and Fall Patterns of Information Diffusion: Model and Implications
[PDF] [ foils ]
[
code ]
Yasuko Matsubara, Yasushi Sakurai, B. Aditya Prakash, Lei Li and Christos Faloutsos
in SIGKDD 2012, Beijing
- Interacting Viruses on a Network: Can both survive? [PDF]
Alex Beutel, B. Aditya Prakash, Roni Rosenfeld and Christos Faloutsos
in SIGKDD 2012, Beijing
- Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks [PDF]
B. Aditya Prakash, Deepayan Chakrabarti, Michalis Faloutsos, Nicholas Valler, Christos Faloutsos
in Knowledge and Information Systems Journal, Springer. 2012.
- Winner-takes-all: Competing Viruses on fair-play networks [PDF]
B. Aditya Prakash, Alex Beutel, Roni Rosenfeld, Christos Faloutsos
in WWW 2012, Lyon
- Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks [PDF] [extended arXiv version]
B. Aditya Prakash, Deepayan Chakrabarti, Michalis Faloutsos, Nicholas Valler, Christos Faloutsos
in IEEE ICDM 2011, Vancouver
- Time Series Clustering: Complex is Simpler! [PDF]
Lei Li, B. Aditya Prakash
in ICML 2011, Bellevue
- Epidemic Spread in Mobile Ad Hoc Networks: Determining the Tipping Point [PDF]
Nicholas Valler, B. Aditya Prakash, Hanghang Tong, Michalis Faloutsos, Christos Faloutsos
in IFIP NETWORKING 2011, Valencia
- On the Vulnerability of Large Graphs [PDF][CODE]
Hanghang Tong, B. Aditya Prakash, Charalampos Tsourakakis, Tina Eliassi-Rad, Christos Faloutsos, Duen Horng Chau
in IEEE ICDM 2010, Sydney
- Virus Propagation on Time-Varying Networks: Theory and Immunization Algorithms [PDF]
B. Aditya Prakash, Hanghang Tong, Nicholas Valler, Michalis Faloutsos, Christos Faloutsos
in ECML-PKDD 2010, Barcelona
- Parsimonious Linear Fingerprinting for Time Series [PDF][CODE]
Lei Li, B. Aditya Prakash, Christos Faloutsos
in VLDB 2010, Singapore
Last updated: September 23,
2013, by Christos Faloutsos and Alex Beutel