Degrees, Power Laws and Popularity

Gonzalo Mateos
Dept. of ECE and Goergen Institute for Data Science
University of Rochester
http://www.ece.rochester.edu/~gmateosb/
February 13, 2020
Network Science Analytics

Degree distributions

• Degree distributions
• Power-law degree distributions
• Visualizing and fitting power laws
• Popularity and preferential attachment

Descriptive analysis of network characteristics

• Given a network graph representation of a complex system
  ⇒ Structural properties of $G$ are key to system-level understanding

Example
• Q1: Underpinning of various types of basic social dynamics?
  A: Study vertex triplets (triads) and patterns of ties among them
• Q2: How can we formalize the notion of `importance' in a network?
  A: Define measures of individual vertex (or group) centrality
• Q3: Can we identify communities and cohesive subgroups?
  A: Formulate as a graph partitioning (clustering) problem

• Characterization of individual vertices/edges and network cohesion
• Social network analysis, math, computer science, statistical physics

Degree

• Def: The degree $d_v$ of vertex $v$ is its number of incident edges
  ⇒ Degree sequence arranges degrees in non-decreasing order

[Figure: example graph on six vertices, with vertex degrees shown in red]

• In figure
  ⇒ Vertex degrees shown in red, e.g., $d_1 = 2$ and $d_5 = 3$
  ⇒ Graph's degree sequence is 2, 2, 2, 3, 3, 4
• In general, the degree sequence does not uniquely specify the graph
• High-degree vertices are likely to be influential, central, prominent

Degree distribution

• Let $N(d)$ denote the number of vertices with degree $d$
  ⇒ Fraction of vertices with degree $d$ is $P(d) := N(d) / N_v$
• Def: The collection $\{P(d)\}_{d \geq 0}$ is the degree distribution of $G$
• Histogram formed from the degree sequence (bins of size one)

[Figure: histogram of $P(d)$ versus $d$]

• $P(d)$ = probability that a randomly
chosen node has degree $d$
  ⇒ Summarizes the local connectivity in the network graph

Joint degree distribution

• Q: What about patterns of association among nodes of given degrees?
• A: Define the two-dimensional analogue of a degree distribution

[Figure: two panels, Router-level Internet (left) and Protein interaction (right); both axes are $\log_2(\text{Degree})$]
Fig. 4.3 Image representation of the logarithmically transformed joint degree distribution $\{\log_2 f_{d,d'}\}$, for the router-level Internet data (left) and the protein interaction data (right). Colors range from blue (low relative frequency) to red (high relative frequency), with white indicating areas with no data. Note that both x- and y-axes are also on base-2 logarithmic scales.

• Prob. of random edge having incident vertices with degrees $(d_1, d_2)$

A simple random graph model

• Def: The Erdős-Rényi random graph model $G_{n,p}$
  • Undirected graph with $n$ vertices, i.e., of order $N_v = n$
  • Edge $(u, v)$ present with probability $p$, independent of other edges
• Simulation is easy: draw $\binom{n}{2}$ i.i.d. Bernoulli($p$) RVs

Example
• Three realizations of $G_{10, 1/6}$. The size $N_e$ is a random variable

Degree distribution of $G_{n,p}$

• Q: Degree distribution $P(d)$ of the Erdős-Rényi graph $G_{n,p}$?
• Define $\mathbb{I}\{(v, u)\} = 1$ if $(v, u) \in E$, and $\mathbb{I}\{(v, u)\} = 0$ otherwise.
  ⇒ Fix $v$. For all $u \neq v$, the indicator RVs are i.i.d. Bernoulli($p$)
• Let $D_v$ be the (random) degree of vertex $v$.
Hence,
$$D_v = \sum_{u \neq v} \mathbb{I}\{(v, u)\}$$
  ⇒ $D_v$ is binomial with parameters $(n - 1, p)$ and
$$P(d) = P(D_v = d) = \binom{n-1}{d} p^d (1 - p)^{(n-1)-d}$$
• In words, the probability of having exactly $d$ edges incident to $v$
  ⇒ Same for all $v \in V$, by independence of the $G_{n,p}$ model

Behavior for large $n$

• Q: What does the degree distribution look like for a large network?
• Recall $D_v$ is a sum of $n - 1$ i.i.d. Bernoulli($p$) RVs
  ⇒ Central Limit Theorem: $D_v \sim \mathcal{N}(np, np(1 - p))$ for large $n$

[Figure: $P(d)$ versus $d$. Left panel: $p = 0.5$ with $n = 20, 40, 60$. Right panel: Binomial(20, 1/2), Binomial(60, 1/6), Poisson(10)]

• Makes most sense to increase $n$ with fixed $E[D_v] = (n - 1)p = \mu$
  ⇒ Law of rare events: $D_v \sim$ Poisson($\mu$) for large $n$

Law of rare events

• Substituting $p = \mu/n$ in the binomial PMF yields
$$P_n(d) = \frac{n!}{(n-d)!\,d!} \left(\frac{\mu}{n}\right)^d \left(1 - \frac{\mu}{n}\right)^{n-d} = \frac{n(n-1)\cdots(n-d+1)}{n^d} \, \frac{\mu^d}{d!} \, \frac{(1 - \mu/n)^n}{(1 - \mu/n)^d}$$
• In the limit, the term $(1 - \mu/n)^n$ (red in the slides) satisfies $\lim_{n \to \infty} (1 - \mu/n)^n = e^{-\mu}$
• The terms $\frac{n(n-1)\cdots(n-d+1)}{n^d}$ and $(1 - \mu/n)^d$ (black and blue in the slides) converge to 1. Limit is the Poisson PMF
$$\lim_{n \to \infty} P_n(d) = 1 \cdot \frac{\mu^d e^{-\mu}}{d!} \cdot \frac{1}{1} = e^{-\mu} \frac{\mu^d}{d!}$$
• Approximation usually called "law of rare events"
  • Individual edges happen with small probability $p = \mu/n$
  • The aggregate (degree, number of edges), though, need not be rare

The $G_{n,p}$ model and real-world networks

• For large graphs, $G_{n,p}$ suggests $P(d)$ with an exponential tail
  ⇒ Unlikely to see degrees spanning several orders of magnitude

[Figure: $P(d)$ versus $d$ on a linear scale (left) and on a logarithmic scale (right)]

• Concentrated distribution around the mean $E[D_v] = (n - 1)p$
• Q: Is this in agreement with real-world networks?
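The binomial degree distribution of $G_{n,p}$ and its Poisson approximation can be checked numerically. The following is a minimal sketch, not taken from the slides: the function names, the values $n = 1000$ and $\mu = 10$, and the fixed seed are all illustrative assumptions. It draws one Bernoulli($p$) variable per vertex pair, as in the simulation recipe above, and tabulates the empirical $P(d)$ next to the binomial and Poisson PMFs.

```python
import math
import random
from collections import Counter


def sample_gnp_degrees(n, p, rng):
    """Draw one G(n, p) realization and return its list of vertex degrees."""
    degrees = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            # One independent Bernoulli(p) draw per unordered pair {u, v}.
            if rng.random() < p:
                degrees[u] += 1
                degrees[v] += 1
    return degrees


def empirical_degree_distribution(degrees):
    """Return {d: P(d)} with P(d) = N(d) / N_v."""
    n_v = len(degrees)
    return {d: count / n_v for d, count in Counter(degrees).items()}


def binomial_pmf(d, trials, p):
    """Probability of exactly d successes in `trials` Bernoulli(p) draws."""
    return math.comb(trials, d) * p**d * (1 - p) ** (trials - d)


def poisson_pmf(d, mu):
    """Poisson(mu) PMF, evaluated in the log domain to avoid overflow."""
    return math.exp(-mu + d * math.log(mu) - math.lgamma(d + 1))


if __name__ == "__main__":
    n, mu = 1000, 10.0      # illustrative values, not from the slides
    p = mu / (n - 1)        # keeps E[D_v] = (n - 1) p = mu
    rng = random.Random(0)  # fixed seed, only for reproducibility
    p_hat = empirical_degree_distribution(sample_gnp_degrees(n, p, rng))
    print(" d  empirical  binomial  poisson")
    for d in range(0, 21, 2):
        print(f"{d:2d}  {p_hat.get(d, 0.0):9.4f}  "
              f"{binomial_pmf(d, n - 1, p):8.4f}  {poisson_pmf(d, mu):7.4f}")
```

With a single realization the empirical column is noisy; averaging over several graphs, or increasing $n$ at fixed $\mu$, should bring the three columns closer together.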
World Wide Web

• Degree distributions of the WWW analyzed in [Broder et al '00]
  ⇒ Web a digraph, study both in- and out-degree distributions
• Majority of vertices naturally have small degrees
  ⇒ Nontrivial amount with orders of magnitude higher degrees

Internet autonomous systems

• The topology of the AS-level Internet studied in [Faloutsos '99]
• Right-skewed degree distributions also found for router-level Internet

Seems to be a structural pattern

• More heavy-tailed degree distributions found in [Barabasi-Albert '99]

[Figure: $P(d)$ versus $d$ for three networks: author collaboration, Web graph, power grid]

• These heterogeneous, diffuse degree distributions are not exponential

Power laws

Power-law degree distributions

[Figure: two panels of $\log_2(\text{Frequency})$ versus $\log_2(\text{Degree})$]
Fig. 4.1 Degree distributions. Left: the router-level Internet network graph described in Section 3.5.2. Right: the network of measured interactions among proteins in S. cerevisiae (yeast), as of January 2007. In each plot, both x- and y-axes are in base-2 logarithmic scale.

• Log-log plots show roughly a linear decay, suggesting the power law
$$P(d) \propto d^{-\alpha} \quad \Rightarrow \quad \log P(d) = C - \alpha \log d$$
• Power-law exponent (negative slope) is typically $\alpha \in [2, 3]$
• Normalization constant $C$ is mostly uninteresting
• Power laws often best followed in the tail, i.e., for $d \geq d_{\min}$

Copyright 2009 Springer Science+Business Media, LLC.
These figures may be used for noncommercial purposes as long as the source is cited: Kolaczyk, Eric D. Statistical Analysis of Network Data: Methods and Models (2009) Springer Science+Business Media LLC.

Power law and exponential degree distributions

[Figure: panels (a) and (b), each showing a power law $P(d) = d^{-2.1}$ (labeled $p_k \sim k^{-2.1}$ in the figure) and a Poisson distribution; (a) linear axes, (b) log-log axes]
Figure 4.4 Poisson vs. Power-law Distributions
(a) Comparing a Poisson function with a power-law function (exponent 2.1) on a linear plot. Both distributions have $\langle k \rangle = 10$.
(b) The same curves as in (a), but shown on a log-log plot, allowing us to inspect the difference between the two functions in the high-$k$ regime.
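The contrast between the two tails, and the relation $\log P(d) = C - \alpha \log d$, can be reproduced with a few lines of code. The sketch below is not taken from the slides and does not try to match the figure: the truncation at `d_max = 1000`, the function names, and the use of a plain least-squares line on the log-log values are assumptions made only for illustration.

```python
import math


def power_law_pmf(alpha, d_max):
    """Return {d: P(d)} with P(d) proportional to d**(-alpha), d = 1..d_max."""
    weights = {d: d ** (-alpha) for d in range(1, d_max + 1)}
    z = sum(weights.values())  # normalization over the truncated support
    return {d: w / z for d, w in weights.items()}


def poisson_pmf(d, mu):
    """Poisson(mu) PMF, evaluated in the log domain to avoid overflow."""
    return math.exp(-mu + d * math.log(mu) - math.lgamma(d + 1))


def loglog_fit(pmf, d_min=1):
    """Least-squares fit of log P(d) = C - alpha * log d for d >= d_min.

    Returns the pair (alpha, C).
    """
    pts = [(math.log(d), math.log(p)) for d, p in pmf.items()
           if d >= d_min and p > 0]
    x_bar = sum(x for x, _ in pts) / len(pts)
    y_bar = sum(y for _, y in pts) / len(pts)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pts)
    sxx = sum((x - x_bar) ** 2 for x, _ in pts)
    slope = sxy / sxx
    return -slope, y_bar - slope * x_bar


if __name__ == "__main__":
    pl = power_law_pmf(alpha=2.1, d_max=1000)  # illustrative parameters
    for d in (10, 30, 100):
        print(f"d={d:3d}  power law={pl[d]:.3e}  Poisson(10)={poisson_pmf(d, 10.0):.3e}")
    alpha_hat, c_hat = loglog_fit(pl)
    print(f"fitted alpha = {alpha_hat:.4f}, C = {c_hat:.4f}")
```

Because the input here is an exact power law, the fitted slope recovers the exponent up to rounding; with empirical degree counts the same line fit would be noisy in the tail, which is why restricting it to $d \geq d_{\min}$ is exposed as a parameter.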
