Lecture 8: Evolution (1/1)

How does time modify large networks? (i.e., how will FB look in 10 years?)

COMS 4995-1: Introduction to Social Networks Thursday, September 22nd

Outline

* Why is that important?
* What could we expect from models so far?
* What is empirically observed?
  o Densification, shrinking diameter
* How can we explain and understand it?

Motivation

* So far, we have only analyzed static graphs
  o But many social properties are dynamic
  o One may think of social graphs as the result of ongoing dynamics (see part II of the lecture)
* Understanding evolution has direct implications
  o Are graph generators/models accurate?
  o Extrapolation (what if the network grows?)
  o Anomaly detection (see part III of the lecture)


Previous models

* Arrival of new nodes:
  o Users joining Facebook, G+, etc. by invitation
  o Citation/Collaboration: new paper/movie/webpage
  o Communication networks: new routers, new subscribers
* These nodes connect to the current graph:
  o Creating new connections, citations, links, etc.
* Let the process run for n steps, then look at the final result!

Previous assumptions

* Degree = constant or slowly varying with size
  o Typically fixed to match the original graph being modeled
  o Assumes that even if the population grows large, each node keeps a finite local neighborhood
* Distance = slowly growing with size
  o With constant degree, should be ~ log(N)
  o Explains the “small-world” phenomenon

Example 1: Uniform Random Graph

* N goes to infinity, p = c/N (resp. p = c*log(N)/N)
  o Avg degree: deg(N) ~ c (resp. deg(N) ~ c*log(N))
  o Ensures a giant connected component (resp. connectivity) when c > 1
  o Diameter of the connected component grows as log(N)
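As a quick sanity check of deg(N) ~ c, the sketch below samples a uniform random graph with p = c/N and measures its average degree. The function name `gnp_average_degree` and the values N = 600, c = 3 are illustrative choices, not taken from the lecture.

```python
import random

def gnp_average_degree(n, p, seed=0):
    """Sample G(n, p) edge by edge and return the average degree 2E/n."""
    rng = random.Random(seed)
    edges = 0
    for u in range(n):
        for v in range(u + 1, n):
            # Each of the n(n-1)/2 possible edges is present independently.
            if rng.random() < p:
                edges += 1
    return 2 * edges / n

n, c = 600, 3.0
avg_degree = gnp_average_degree(n, c / n)
print(f"average degree with p = c/N: {avg_degree:.2f} (expected about {c})")
```

The average degree stays near c whatever N is, which is the sense in which this model keeps the degree constant as the graph grows.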

Example 2: the Copying model

* Nodes arrive and use preferential attachment
  o Nodes join sequentially, out-degree is fixed
  o Some very large degrees, but the average remains constant
  o Diameter of the connected component ~ log(N)

Riordan, O., & Bollobas, B. (2004). The diameter of a scale-free random graph. Combinatorica.

Example 3: Augmented lattice

* Nodes have a constant # of neighbors
  o Diameter ~ log(N) (in fact, greedy routing ~ log(N)^2)
  o Same result for any augmentation


Paper Distillation

Paper: Graph Evolution: Densification and Shrinking Diameters
Jure Leskovec (Carnegie Mellon University), Jon Kleinberg (Cornell University), Christos Faloutsos (Carnegie Mellon University)

DATA SET
* Citation (arXiv, US-patents)
* Affiliation (arXiv, IMDB)
* Technology (Inter-AS links BGP)
* Communication (Email, Recommendation)

EMPIRICAL RESULT
1. Densification (degree grows polyn.)
2. Shrinking diameters
  o effective diameter
  o effect of missing past
  o effect of disconnected component
3. Relation densification/degree

MODEL AND ANALYSIS
* Community Guided Attachment (CGA): “Topical hierarchy + distance matters”
  o Proof that CGA produces densification and heavy-tailed degree
* Forest Fire (FF): “attachment with exploration”
  o FF numerically produces densification, heavy-tailed degree and shrinking diameter

Abstract of the paper: How do real graphs evolve over time? What are normal growth patterns in social, technological, and information networks? Many studies have discovered patterns in static graphs, identifying properties in a single snapshot of a large network or in a very small number of snapshots; these include heavy tails for in- and out-degree distributions, communities, small-world phenomena, and others. However, given the lack of information about network evolution over long periods, it has been hard to convert these findings into statements about trends over time. Here we study a wide range of real graphs, and we observe some surprising phenomena. First, most of these graphs densify over time with the number of edges growing superlinearly in the number of nodes. Second, the average distance between nodes often shrinks over time in contrast to the conventional wisdom that such distance parameters should increase slowly as a function of the number of nodes (like O(log n) or O(log(log n))). Existing graph generation models do not exhibit these types of behavior even at a qualitative level. We provide a new graph generator, based on a forest fire spreading process that has a simple, intuitive justification, requires very few parameters (like the flammability of nodes), and produces graphs exhibiting the full range of properties observed both in prior work and in the present study.

Leskovec, J., Kleinberg, J., & Faloutsos, C. (2007). Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data, 1(1), Article 2. DOI 10.1145/1217299.1217301

Temporal Data Sets

* Citation (arXiv, US-patents)

* Affiliation (arXiv, IMDB)

* Technology (Inter-AS links BGP), Communication (Email, Recommendation)

Analysis of degree evolution

o How does degree grow as time passes?
o How does degree grow as size grows?

It obeys a power law: degree ~ N^(a-1), #{edges} ~ N^a
a = 1 (sparse, constant degree), a = 2 (dense, constant fraction of possible edges)
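The exponent a in #{edges} ~ N^a can be estimated as the slope of a least-squares line through (log N, log E) over successive snapshots. The sketch below uses synthetic snapshots with a known exponent; the function name `densification_exponent` and the snapshot values are illustrative, not data from the paper.

```python
import math

def densification_exponent(nodes, edges):
    """Least-squares slope of log(edges) against log(nodes)."""
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic snapshots following E = 2 * N^1.5 (hypothetical values).
nodes = [100, 1_000, 10_000, 100_000]
edges = [2 * n ** 1.5 for n in nodes]

a = densification_exponent(nodes, edges)
print(f"densification exponent a = {a:.3f}, average degree grows as N^{a - 1:.3f}")
```

A fitted value strictly between 1 and 2 means the graph densifies without becoming a constant fraction of the complete graph.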

Example of Results

Example of Results

How to define diameter D?
o D = max{ d(u,v) | u,v in V }
o D = ∞ if the graph is not connected, so consider only connected pairs
o D can be large because of a single pair, so use the 90th percentile of distances (effective diameter)

Diameter shrinks with time!
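The diameter definition above can be sketched as follows: run a breadth-first search from every node, collect the distances of connected pairs only, and return the smallest distance that covers 90% of them. This is an unsmoothed integer version under those assumptions; the names `bfs_distances` and `effective_diameter` and the toy graph are illustrative.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def effective_diameter(adj, q=0.9):
    """Smallest d such that a fraction q of connected pairs are within d hops."""
    counts = {}
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                counts[d] = counts.get(d, 0) + 1
    total = sum(counts.values())
    covered = 0
    for d in sorted(counts):
        covered += counts[d]
        if covered >= q * total:
            return d
    return 0  # no connected pairs at all

# Toy undirected graph: a path 0-1-2-3-4 plus a separate edge 5-6.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3], 5: [6], 6: [5]}
d90 = effective_diameter(adj)
print("effective diameter:", d90)
```

On the toy graph the longest finite distance is 4, but the 90th percentile ignores that single far-apart pair, and the disconnected edge 5-6 does not make the result infinite.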

Validation (1): The missing past

o Partial data set: What about citations before 93?
o Post-t0, Post-t0-no-past
o Little effect, so this should not explain the shrinkage

Validation (2): Connectedness

o Random graph: the giant component appears and then distances shrink

Densification: Flickr and Y! 360

Kumar, R., Novak, J., & Tomkins, A. (2010). Structure and evolution of online social networks.

Diameter: Flickr and Y! 360


Summary

* Across 8 various data-sets + 2 online social services:
  o Densification: degree increases polynomially with n
  o Shrinking diameter: nodes in the network become closer
* Questions:
  o Can we reproduce this property?
  o Is there a general predictive model?
  o Is densification responsible for shrinking diameter?


Origin of densification?

* Model 1: One could simply add densification
  o Nodes join sequentially, each adding N^(a-1) edges (a in [1;2])
* But why should they obey this law?
  o Who decides a? From which dynamics?
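To see why Model 1 densifies by construction, the sketch below assumes that the t-th arriving node adds t^(a-1) edges (reading N as the current size) and checks that the total number of edges then grows roughly as N^a. The function name `edges_after` and the values a = 1.5, N = 10^4 and 10^5 are illustrative.

```python
import math

def edges_after(n, a):
    """Total edges when the t-th arriving node adds about t**(a-1) edges."""
    return sum(t ** (a - 1) for t in range(1, n + 1))

a = 1.5
e_small = edges_after(10_000, a)
e_large = edges_after(100_000, a)

# Slope of log(edges) against log(nodes) between the two sizes.
slope = math.log(e_large / e_small) / math.log(100_000 / 10_000)
print(f"edges grow as N^{slope:.3f} for a = {a}")
```

The exponent is recovered, but only because it was put in by hand, which is exactly the objection raised on the slide.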

A hierarchical model

* Model 2: Community Guided Attachment (CGA)
  o Ass. 1: we all belong to a series of nested sets
  o Ass. 2: Probability to link decreases with distance

c > 1 is called the difficulty constant

A hierarchical model

* Let us consider the leaves of a regular b-ary tree
* Thm1: as n grows large, the avg degree d is
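The formula on this slide did not survive extraction. As a hedged sketch, assume (as in the CGA model of Leskovec et al. (2007)) that two leaves at tree distance h, the height of their lowest common ancestor, link with probability c^(-h). A leaf has (b-1) b^(h-1) leaves at distance h, so its expected degree is the sum over h of (b-1) b^(h-1) c^(-h). For c < b this sum grows like n^(1 - log_b(c)) with n = b^height leaves, and for c > b it stays bounded. The function names and parameter values below are illustrative.

```python
import math

def expected_degree(b, c, height):
    """Expected degree of a leaf in a b-ary tree of the given height,
    assuming link probability c**(-h) at tree distance h (an assumption)."""
    return sum((b - 1) * b ** (h - 1) * c ** (-h) for h in range(1, height + 1))

def growth_exponent(b, c, h1, h2):
    """Slope of log(degree) against log(n), where n = b**height leaves."""
    d1 = expected_degree(b, c, h1)
    d2 = expected_degree(b, c, h2)
    return (math.log(d2) - math.log(d1)) / ((h2 - h1) * math.log(b))

b, c = 2, math.sqrt(2)            # difficulty below expansion: c < b
slope = growth_exponent(b, c, 20, 40)
predicted = 1 - math.log(c, b)    # exponent 1 - log_b(c)
print(f"measured exponent {slope:.3f}, predicted {predicted:.3f}")

flat = growth_exponent(2, 4, 20, 40)   # c > b: degree stays bounded
print(f"exponent when c > b: {flat:.3f}")
```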

A hierarchical model

* Let us consider the complete regular b-ary tree
* Thm1: as n grows large, the avg degree d is

Summary

* When difficulty < expansion (i.e., when c < b)
  o Densification occurs, and heavy-tailed degree
* This model does not exhibit shrinking diameter

A local exploration model

* Model 3: Forest Fire (FF)
  o Ass. 1: Every node first finds a point of attachment
  o Ass. 2: Then it explores the graph

1. Choose an anchor (uniformly in the graph)
2. Choose a # of out-edges and a # of in-edges

A local exploration

1. Choose an anchor (uniformly in the graph)
2. Choose a number of neighbors: No out-edges and Ni in-edges to follow, drawn from geometric variables with survival probabilities p and pb
3. Choose these neighbors (uniformly at random)
4. Apply step 2 recursively to these neighbors, without revisiting previously seen nodes
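Steps 1 to 4 can be sketched as below. Two points are assumptions rather than statements from the slide: the geometric draw counts successes before the first failure (survival probability p or pb, both below 1), and the new node links to every node it visits, as in the Forest Fire model of Leskovec et al. (2007). The names `geometric` and `forest_fire` are illustrative.

```python
import random
from collections import deque

def geometric(rng, survival):
    """Number of successes before the first failure (assumes survival < 1)."""
    k = 0
    while rng.random() < survival:
        k += 1
    return k

def forest_fire(n, p, pb, seed=0):
    """Grow a directed graph of n nodes; returns {node: set of out-neighbors}."""
    rng = random.Random(seed)
    out_nbrs = {0: set()}
    in_nbrs = {0: set()}
    for v in range(1, n):
        anchor = rng.randrange(v)          # step 1: uniform anchor
        visited = {anchor}
        queue = deque([anchor])
        while queue:                       # step 4: recurse on chosen nodes
            w = queue.popleft()
            # Step 2: how many out-edges and in-edges of w to follow.
            for nbrs, k in ((out_nbrs[w], geometric(rng, p)),
                            (in_nbrs[w], geometric(rng, pb))):
                # Step 3: pick that many unvisited neighbors uniformly.
                candidates = sorted(u for u in nbrs if u not in visited)
                for u in rng.sample(candidates, min(k, len(candidates))):
                    visited.add(u)
                    queue.append(u)
        out_nbrs[v] = set(visited)         # assumed: link to all visited nodes
        in_nbrs[v] = set()
        for u in visited:
            in_nbrs[u].add(v)
    return out_nbrs

graph = forest_fire(200, p=0.37, pb=0.32, seed=1)
n_edges = sum(len(nbrs) for nbrs in graph.values())
print(f"{len(graph)} nodes, {n_edges} edges")
```

Feeding snapshots of such a graph at increasing sizes into an edge-count fit or a percentile-based diameter is one way to look for densification and shrinking diameter numerically.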

An example

Shrinking diameter

* Densification is observed

[Figure: Fig. 6 of Leskovec et al. (2007). Degree distribution of a sparse graph with decreasing diameter (forward-burning probability: 0.37, backward probability: 0.32). Panels: in-degree and out-degree; axes: count vs. node degree, log-log scales.]

* If parameters p and pb are well chosen:

[Figure: Fig. 7 of Leskovec et al. (2007), p = 0.37, pb = 0.34. Evolution of effective diameter of the Forest Fire model while generating a large graph. Panels: (a) effective diameter vs. number of nodes, (b) effective diameter vs. log number of nodes. Both plots show the same data; the left one plots on linear scales and the right one plots on log-linear scales. Error bars show the confidence interval of the estimated effective diameter. Notice that the effective diameter shrinks and then slowly converges.]

Excerpt from the paper (p. 29):

[...] at time t = 1, or we can have some probability q > 0 that a newcomer will form no links (not even to its ambassador) and so become an orphan. We find that such variants of the model have a more pronounced decrease in the effective diameter over time, with large distances caused by groups of nodes linking to different orphans gradually diminishing as further nodes arrive to connect them together.

Multiple ambassadors. We experimented with allowing newcomers to choose more than one ambassador with some positive probability, that is, rather than burning links starting from just one node, there is some probability that a newly arriving node burns links starting from two or more. This extension also accentuates the decrease in effective diameter over time as nodes linking to multiple ambassadors serve to bring together formerly far-apart parts of the graph.

A Sharp phase transition

* As a function of pb: a sharp transition
  o Small pb: sparse graph
  o High pb: dense graph
* The transition seems simultaneous (densification exponent and diameter)

[Figure: Fig. 8 of Leskovec et al. (2007). Panels: (a) we fix burning ratio, r = 0.5, and vary forward-burning probability p; (b) we fix backward-burning probability pb = 0.3 and vary forward-burning probability p. We vary the forward-burning probability while fixing burning ratio (a) or backward-burning probability (b). The plot gives a very precise cut through Forest Fire parameter space. Notice that each plot has two vertical axes: DPL exponent on the left, and the diameter log-fit factor on the right. Observe a very sharp transition in DPL exponent and a narrow region, indicated by vertical dashed lines, where Forest Fire produces slowly densifying graphs with decreasing effective diameter.]

[Figure: Fig. 10 of Leskovec et al. (2007). Panels: (a) densification exponent a = 1.3, diameter factor α = −0.05; (b) densification exponent a = 1.6, diameter factor α = −0.20; axes: backward burning ratio vs. forward probability. We superimpose the densification power law exponent a and diameter log-fit α factor over the Forest Fire Model parameter space. Notice that the shape of the transition boundary of the densification and the shrinking diameter very much follow the same shape.]

Excerpts from the paper (p. 30):

Burning a fixed percentage of neighbors. We also considered a version of Forest Fire where the fire burns a fixed percentage of node's edges, that is, the number of burned edges is proportional to the node's degree. When a fire comes into a node, for each unburned neighbor we independently flip a biased coin, and the fire spreads to nodes where the coin came up heads. This process continues recursively until no nodes are burned. In case of forward- and backward-burning probabilities, we have two coins, one for out- and one for in-edges. The problem with this version of the model is that, once there is a single large fire that burns a large fraction of the graph, many subsequent fires will also burn much of the graph. This results in a bell-shaped, nonheavy-tailed degree distribution and gives two regimes of densification: slower densification before the first big fire, and quadratic (a = 2) densification afterwards.

We also experimented with the model where burning probability decayed exponentially as the fire moves away from the ambassador node.

4.2.3 Phase Plot. In order to understand the densification and the diameter properties of graphs produced by the Forest Fire Model, we explored the full parameter space of the basic model in terms of the two underlying parameters, the forward-burning probability p and the backward-burning ratio r. Note there are two equivalent ways to parameterize the Forest Fire model. We can use the forward-burning probability p and the backward-burning ratio r or the forward-burning probability p and the backward-burning probability pb (pb = rp). We examine both and show two cuts through the parameter space. Figure 8 shows how the densification exponent and the effective diameter depend on forward-burning probability p. In the left plot of Figure 8, we fix the backward-burning probability pb = 0.3, and, in the right plot, we fix the backward-burning ratio r = 0.5. We vary forward-burning probability and plot the densification power law exponent. The densification exponent a is computed as in Section 3 by fitting a relation of the form e(t) ∝ n(t)^a. Notice the very sharp [...]

Excerpts from the paper (p. 32):

Figure 9(b) gives the contour plot for the effective diameter log-fit factor α as previously defined. Each contour corresponds to diameter factor α. We vary α in range −0.3 ≤ α ≤ 0.1, with step-size 0.05. Notice, that the boundary in parameter space between decreasing and increasing effective diameter is very narrow. Do contour plots of densification power law and shrinking diameters from Figure 9 follow the same shape? More exactly, does the boundary between decreasing and increasing diameters follow the same shape as the transition in the densification exponent? We answer this question in Figure 10, where we superimpose phase contours of DPL and the effective diameter over the Forest Fire parameter space. The left plot superimposes phase contours for the densification power law exponent a = 1.3 and the diameter log-fit factor α = −0.05. The right plot superimposes contours for a = 1.6 and α = −0.30. In both cases we observe very good alignment of the two phase lines which suggests the same shape of the transition boundary for the densification power law exponent and the effective diameter.

We also observe similar behavior with orphans and multiple ambassadors. These additional features in the model help further separate the diameter decrease/increase boundary from the densification transition and so widen the region of parameter space for which the model produces reasonably sparse graphs with decreasing effective diameters.

5. DENSIFICATION AND THE DEGREE DISTRIBUTION OVER TIME. Many real-world graphs exhibit power-law degree distributions [Albert and Barabasi 1999; Faloutsos et al. 1999]. As we saw in Section 3, the average degree increases over time, and the graphs densify following the power-law relationship between the number of nodes and the number of edges. Here we analyze the relation between the densification and the power-law degree distribution over time and find evidence that some of the real world graphs obey [...]

Summary

* Densification and shrinking diameter are typical
  o Contradicts previous beliefs
  o Seen in many contexts, and not just artefacts
* Densification is relatively easy to obtain
  o Similar to an expanding graph
* Shrinking diameter is more subtle
  o It is not as pronounced, and typically occurs close to the critical densification regime

BACK-UP SLIDES

Example of Results