Degrees, Power Laws and Popularity

Degrees, Power Laws and Popularity Gonzalo Mateos Dept. of ECE and Goergen Institute for Data Science University of Rochester [email protected] http://www.ece.rochester.edu/~gmateosb/ February 13, 2020 Network Science Analytics Degrees, Power Laws and Popularity 1 Degree distributions Degree distributions Power-law degree distributions Visualizing and fitting power laws Popularity and preferential attachment Network Science Analytics Degrees, Power Laws and Popularity 2 Descriptive analysis of network characterstics I Given a network graph representation of a complex system ) Structural properties of G key to system-level understanding Example I Q1: Underpinning of various types of basic social dynamics? A: Study vertex triplets (triads) and patterns of ties among them I Q2: How can we formalize the notion of ìmportance' in a network? A: Define measures of individual vertex (or group) centrality I Q3: Can we identify communities and cohesive subgroups? A: Formulate as a graph partitioning (clustering) problem I Characterization of individual vertices/edges and network cohesion I Social network analysis, math, computer science, statistical physics Network Science Analytics Degrees, Power Laws and Popularity 3 Degree I Def: The degree dv of vertex v is its number of incident edges ) Degree sequence arranges degrees in non-decreasing order 3 3 2 5 4 2 1 2 4 6 2 3 I In figure ) Vertex degrees shown in red, e.g., d1 = 2 and d5 = 3 ) Graph's degree sequence is 2,2,2,3,3,4 I In general, the degree sequence does not uniquely specify the graph I High-degree vertices are likely to be influential, central, prominent Network Science Analytics Degrees, Power Laws and Popularity 4 Degree distribution I Let N(d) denote the number of vertices with degree d ) Fraction of vertices with degree d is P (d) := N(d) Nv I Def: The collection fP(d)gd≥0 is the degree distribution of G I Histogram formed from the degree sequence (bins of size one) P(d) d I P(d) = probability that randomly chosen node has degree d ) Summarizes the local connectivity in the network graph Network Science Analytics Degrees, Power Laws and Popularity 5 Joint degree distribution I Q: What about patterns of association among nodes of given degrees? I A: Define the two-dimensional analogue of a degree distribution 3 Router-level Internet Protein interaction (Degree) (Degree) 2 2 log log 02468 0 2 4 6 8 10 0 2 4 6 8 10 02468 log2(Degree) log2(Degree) Fig. 4.3 Image representation of the logarithmically transformed joint degree distribution log f , for the router-level Internet data (left) and the protein interaction data (right). Col- { 2 d,d′ } I Prob.ors of range random from blue (low edge relative having frequency) incident to red (high relative vertices frequency), with with whitedegrees indicating (d1; d2) areas with no data. Note that both x- and y-axes are also on base-2 logarithmic scales. Network Science Analytics Degrees, Power Laws and Popularity 6 A simple random graph model I Def: The Erdös-Renyirandom graph model Gn;p I Undirected graph with n vertices, i.e., of order Nv = n I Edge (u; v) present with probability p, independent of other edges n I Simulation is easy: draw 2 i.i.d. Bernoulli(p) RVs Example I Three realizations of G 1 . The size Ne is a random variable 10; 6 Network Science Analytics Degrees, Power Laws and Popularity 7 Degree distribution of Gn;p I Q: Degree distribution P (d) of the Erdös-Renyigraph Gn;p? I Define I f(v; u)g = 1 if (v; u) 2 E, and I f(v; u)g = 0 otherwise. ) Fix v. For all u 6= v, the indicator RVs are i.i.d. Bernoulli(p) I Let Dv be the (random) degree of vertex v. Hence, X Dv = I f(v; u)g u6=v ) Dv is binomial with parameters (n − 1; p) and n − 1 P(d) = P (D = d) = pd (1 − p)(n−1)−d v d I In words, the probability of having exactly d edges incident to v ) Same for all v 2 V , by independence of the Gn;p model Network Science Analytics Degrees, Power Laws and Popularity 8 Behavior for large n I Q: How does the degree distribution look like for a large network? I Recall Dv is a sum of n − 1 i.i.d. Bernoulli(p) RVs ) Central Limit Theorem: Dv ∼ N (np; np(1 − p)) for large n 0.2 0.2 p=0.5, n=20 Binomial(20,1/2) 0.18 p=0.5, n=40 0.18 p=0.5, n=60 Binomial(60,1/6) Poisson(10) 0.16 0.16 0.14 0.14 0.12 0.12 0.1 0.1 P(d) P(d) 0.08 0.08 0.06 0.06 0.04 0.04 0.02 0.02 0 0 0 5 10 15 20 25 30 35 40 0 5 10 15 20 25 d d I Makes most sense to increase n with fixed E [Dv ] = (n − 1)p = µ ) Law of rare events: Dv ∼ Poisson(µ) for large n Network Science Analytics Degrees, Power Laws and Popularity 9 Law of rare events I Substituting p = µ/n in the binomial PMF yields n! µ d µ n−d P (d) = 1 − n (n − d)!d! n n n(n − 1) ::: (n − d + 1) µd (1 − µ/n)n = nd d! (1 − µ/n)d n −µ I In the limit, red term is lim (1 − µ/n) = e n!1 I Black and blue terms converge to 1. Limit is the Poisson PMF d −µ d µ e −µ µ lim Pn(d) = 1 = e n!1 d! 1 d! I Approximation usually called \law of rare events" I Individual edges happen with small probability p = µ/n I The aggregate (degree, number of edges), though, need not be rare Network Science Analytics Degrees, Power Laws and Popularity 10 The Gn;p model and real-world networks I For large graphs, Gn;p suggests P (d) with an exponential tail ) Unlikely to see degrees spanning several orders of magnitude Linear scale Logarithmic scale 0.12 100 -50 0.1 10 10-100 0.08 10-150 0.06 P(d) P(d) 10-200 0.04 10-250 0.02 10-300 0 0 10 20 30 40 50 101 102 103 d d I Concentrated distribution around the mean E [Dv ] = (n − 1)p I Q: Is this in agreement with real-world networks? Network Science Analytics Degrees, Power Laws and Popularity 11 World Wide Web I Degree distributions of the WWW analyzed in [Broder et al '00] ) Web a digraph, study both in- and out-degree distributions I Majority of vertices naturally have small degrees ) Nontrivial amount with orders of magnitude higher degrees Network Science Analytics Degrees, Power Laws and Popularity 12 Internet autonomous systems 3 I The topology of the AS-level Internet studied in [Faloutsos '99] I Right-skewed degree distributions also found for router-level Internet Network Science Analytics Degrees, Power Laws and Popularity 13 Seems to be a structural pattern I More heavy-tailed degree distributions found in [Barabasi-Albert '99] P(d) d d d Author collaboration Web graph Power grid I These heterogeneous, diffuse degree distributions are not exponential Network Science Analytics Degrees, Power Laws and Popularity 14 Power laws Degree distributions Power-law degree distributions Visualizing and fitting power laws Popularity and preferential attachment Network Science Analytics Degrees, Power Laws and Popularity 15 Power-law degree distributions 1 4 − 5 − ) ) 6 − 10 8 requency requency − − (F (F 2 2 og og l l 10 − 15 − 12 − 0 2 4 6 8 10 02468 log2(Degree) log2(Degree) Fig. 4.1 Degree distributions. Left: the router-level Internet network graph described in Sec- tion 3.5.2. Right: the network of measured interactions among proteins in S. cerevisiae (yeast), I Log-logas of Januaryplots 2007. show In each roughly plot, both x- a and lineary-axes are decay in base-2, logarithmic suggesting scale. the power law P(d) / d −α ) log P (d) = C − α log d I Power-law exponent (negative slope) is typically α 2 [2; 3] I Normalization constant C is mostly uninteresting I Power laws often best followed in the tail, i.e., for d ≥ dmin Network Science Analytics Degrees, Power Laws and Popularity 16 Copyright 2009 Springer Science+Business Media, LLC. These figures may be used for noncom- mercial purposes as long as the source is cited: Kolaczyk, Eric D. Statistical Analysis of Network Data: Methods and Models (2009) Springer Science+Business Media LLC. Power law and exponential degree distributions (a) (b) 0.15 101000 Figure 4.4 0 0.15 10 -2.1-2.1 -2.1-2.1 0 p ~~ kk 0.15 pk ~ k -2.1 10 kk -2.1 Poisson vs. Power-law Distributions P(d)=dk -2.1 -1-1 P(d)=dp ~ k-2.1 p ~ k-2.1 1010 k -2.1 k -1 p ~ k pk ~ k 10 k 10-1 (a) Comparing a Poisson function with a 0.1 1010-2-2 power-law function (= 2.1) on a linear plot. -2 0.1 p 10-2 0.1 p10kk Both distributions have k= 10. p pk -3-3 P(d)p kk p1010 p POISSONPoisson k -3 p k POISSON 10-3 (b) The same curves as in (a), but shown on a k POISSON 10 0.05 1010-4-4 log-log plot, allowing us to inspect the dif- -4 0.05 10-4 0.05 10 POISSONPoisson ference between the two functions in the 1010-5-5 POISSON high-k regime.

Load more