Goldenberg et al. ICML 2006

From ScribbleWiki: Analysis of Social Media

Jump to: navigation, search
Created and maintained by Sachin Agarwal (User:Sachina)
HomePage: Sachin Agarwal  Contact me

Anna Goldenberg, Alice Zheng, Exploratory study of a new model for evolving networks, Workshop on Statistical Network Analysis: Models, Issues, and New Directions at the 23rd International Conference on Machine Learning (ICML 2006), Pittsburgh, PA USA

Contents

Exploratory study of a new model for evolving networks

This paper propose generative model of evolving social networks. The model allows for birth and death of social links and addition of new people. The social interaction spheres for each person is called "context" and a distribution is assumed for that person over "contexts". It focuses on interpersonal relationships over time. The authors study the statistical properties of simulated networks using this model relative to the famous social networks. The parameters are learned using Gibbs sampling.

There has been some previous work on social network modeling. One of the approaches[ Snijders et al. ] model network using regression on sufficient statistics of graph motifs such as dyads and triads. [ Albert et al. ] and [ Newman ]propose generative models using random graphs to mimic large-scale average behaviors such as degree distribution. Although these graphs are generated dynamically, their links cannot be modified once they are established which contradicts the real-life behavior.

The authors state an example which is used to present and motivate their model in paper. It has been exactly stated in paper as "Imagine that Andy moves to a new town. He may find some new collaborators at work, make friends at parties, or meet fellow gym-goers while exercising. In general, Andy lives in a number of different spheres of interaction or contexts. As time goes on, he may find himself repeatedly meeting certain people in different contexts, consequently developing stronger bonds. Acquaintances he never meets again may quickly fade away. Andy’s new friends may also introduce him to their friends (a well known transitive phenomenon called triadic closures in social science".

The Model

Notation

Dynamic Contextual Friendship Model (DCFM)allows the addition of new people in network at each time step. T denotes the total number of time steps and N_t denotes number of people at time t. M_t denotes the number of people added to the network. Hence N=N_{t-1} + M_t where N is the total number of people at any time step. W denotes the weight matrix and W^t denotes weight matrix at time step t.

The Generative Process

Figure 1. Graphical representation of one time step of the generative model (Goldenberg et al. ICML 2006)
Figure 1. Graphical representation of one time step of the generative model (Goldenberg et al. ICML 2006)

Every person selects his/her context R_i^t from a multinomial distribution whose parameters are sampled from a Dirichlet prior. A DYAD^t represents the number of possible pairwise meetings at time t. DYAD^t is defined as DYAD^t={(i,j)| 1<=i<=N_t, i<j<=N_t}. The random variable F_{ij}^t is sampled from a bernoulli distribution where F_{ij}^t=1 denotes that i and j meet at time t. The parameters of the bernoulli distribution denotes the effectiveness of their meeting depending on the friendliness parameter both for each i and j (i and j should be in same context).

The newcomers at time step t may form triadic closures with the people already in context at time step t-1. It is assumed that the probability that a newcomer j is introduced to existing person i is proportional to the weight of the links between i and the people whom j meets in his context. A TRIAD^t is defined as {(i,j)| 1<=i<=N_t-1, 1<=j<=M_t}. The random variable G_{ij}^t is sampled from a Bernoulli distribution whose parameters depend on the weights of links between i and people whom j meets in that context.

The weights are Poisson distributed. If i and j meet at a time step (F_{ij}^t=1 or G_{ij}^t=1), then the mean of Poisson distribution of W{ij}^t is equal to some multiple (r_h) of their old connection. This multiple signifies the effectiveness of their meeting. If they do not meet, then their mean weight will decrease with rate (r_l <1). r_h and r_l have conjugate gamma distribution priors. The complete graphical model has been displayed in Figure 1.

Experiments

Figure 2. Effects of the friendliness parameter on a network of 200 people with fixed contexts (Goldenberg et al. ICML 2006)
Figure 2. Effects of the friendliness parameter on a network of 200 people with fixed contexts (Goldenberg et al. ICML 2006)
Figure 3. Effects of the frequency of context switching on a network of 200 people. (b = 8) (Goldenberg et al. ICML 2006)
Figure 3. Effects of the frequency of context switching on a network of 200 people. (b = 8) (Goldenberg et al. ICML 2006)
Figure 4. Log-log plot of the degree distributions of a network with 200 people. (Goldenberg et al. ICML 2006)
Figure 4. Log-log plot of the degree distributions of a network with 200 people. (Goldenberg et al. ICML 2006)
Figure 5. Birth (top) and death (bottom) of links in a network of 600 people over 600 time steps. (Goldenberg et al. ICML 2006)
Figure 5. Birth (top) and death (bottom) of links in a network of 600 people over 600 time steps. (Goldenberg et al. ICML 2006)
Figure 6. Weight dynamics for 4 different pairs in a network of 600 people over 600 time steps. (Goldenberg et al. ICML 2006)
Figure 6. Weight dynamics for 4 different pairs in a network of 600 people over 600 time steps. (Goldenberg et al. ICML 2006)
Figure 7. Weekly email exchange counts for four randomly selected pairs between 136 Enron employees. (Goldenberg et al. ICML 2006)
Figure 7. Weekly email exchange counts for four randomly selected pairs between 136 Enron employees. (Goldenberg et al. ICML 2006)

The metrics used in experiments are:

  • Degree distribution: degree of a node is the number of its neighbors. The degree distribution in large natural networks follow power law distribution.
  • Clustering coefficient: It is the ratio of the number of edges that exists between its neighbors and the number of edges that would exist if neighbors form a clique. The clustering coefficient of whole network is average over all nodes.
  • Average path length: It is the average of shortest paths between every pair of nodes in a graph.
  • Effective diameter: It is the maximum of shortest path distances between any pair of nodes.

In all the experiments number of contexts K=10. The fixed settings for different parameters have been specified in the paper.

Friendliness

The friendliness parameter is sampled from Beta(a,b). The effect of b (keeping a=1) has been demonstrated on various metrics discussed above in figure 2. The results show that as people become less friendly, the average node degree decreases. The clustering coefficient increases as friendliness goes down. This is because low friendliness makes small clusters which become more densely connected than bigger clusters. Average path and effective diameter has large variance at low friendliness levels.

Frequency of context switching

Figure 3 shows the effect of context switching between on a 200 node network. It shows a phase transition every 30 steps. When people switch too frequently, they cannot meet everybody in the context before moving on. Thus they have fewer neighbors and form small clusters. The average path length and effective diameter are slightly long. When people never switch contexts, the clustering coefficient is high because everybody in the same context knows everybody else, and average path length and diameter are long because there are few paths to people outside of the current context.

Degree distribution

The model is used to generate variety of degree distributions. Lower levels of friendliness have more power law like degree distributions while higher level results in long tails. Figure 4 shows when people are more friendly, the drop off in the number of people with high node degree is slower than would be expected under the power law.


Birth and death of links

Figure 5 shows link birth rates as the proportion of newly established ties to the number of possible births, and link death rates as the proportion of the number of deaths to the number of links that exist at that point in time. At the beginning, there are few existing links. Therefore the birth rate is relatively high. The birth rate becomes smaller at larger values of t. At each context switch, a fresh pool of possible connections becomes available, and weaker links from previous connections are now more likely to die out.

Weight Distributions

The authors compare evolution of simulated weights with email exchange in the Enron dataset. Figure 6 shows the weight progressions over time in simulated network and Figure 7 shows patterns of weekly email exchanges between Enron employees.

References

Views
Personal tools
  • Log in / create account