Degrees, Power Laws and Popularity

Gonzalo Mateos Dept. of ECE and Goergen Institute for Data Science University of Rochester [email protected] http://www.ece.rochester.edu/~gmateosb/

February 13, 2020

Network Science Analytics Degrees, Power Laws and Popularity 1 distributions

Degree distributions

Power-law degree distributions

Visualizing and fitting power laws

Popularity and

Network Science Analytics Degrees, Power Laws and Popularity 2 Descriptive analysis of network characterstics

I Given a network graph representation of a complex system ⇒ Structural properties of G key to system-level understanding

Example

I Q1: Underpinning of various types of basic social dynamics? A: Study triplets (triads) and patterns of ties among them

I Q2: How can we formalize the notion of ‘importance’ in a network? A: Define measures of individual vertex (or group)

I Q3: Can we identify communities and cohesive subgroups? A: Formulate as a graph partitioning (clustering) problem

I Characterization of individual vertices/edges and network cohesion

I analysis, math, computer science, statistical physics

Network Science Analytics Degrees, Power Laws and Popularity 3 Degree

I Def: The degree dv of vertex v is its number of incident edges ⇒ Degree sequence arranges degrees in non-decreasing order 3 3 2 5 4 2 1 2 4 6 2 3

I In figure ⇒ Vertex degrees shown in red, e.g., d1 = 2 and d5 = 3 ⇒ Graph’s degree sequence is 2,2,2,3,3,4

I In general, the degree sequence does not uniquely specify the graph

I High-degree vertices are likely to be influential, central, prominent

Network Science Analytics Degrees, Power Laws and Popularity 4 Degree distribution

I Let N(d) denote the number of vertices with degree d ⇒ Fraction of vertices with degree d is P (d) := N(d) Nv

I Def: The collection {P(d)}d≥0 is the degree distribution of G

I Histogram formed from the degree sequence (bins of size one)

P(d)

d

I P(d) = probability that randomly chosen node has degree d ⇒ Summarizes the local connectivity in the network graph

Network Science Analytics Degrees, Power Laws and Popularity 5 Joint degree distribution

I Q: What about patterns of association among nodes of given degrees?

I A: Define the two-dimensional analogue of a degree distribution 3 Router-level Protein interaction (Degree) (Degree) 2 2 log log 02468 0 2 4 6 8 10

0 2 4 6 8 10 02468

log2(Degree) log2(Degree)

Fig. 4.3 Image representation of the logarithmically transformed joint degree distribution log f , for the router-level Internet data (left) and the protein interaction data (right). Col- { 2 d,d′ } I Prob.ors of range random from blue (low edge relative having frequency) incident to red (high relative vertices frequency), with with whitedegrees indicating (d1, d2) areas with no data. Note that both x- and y-axes are also on base-2 logarithmic scales.

Network Science Analytics Degrees, Power Laws and Popularity 6 A simple model

I Def: The Erd¨os-Renyirandom graph model Gn,p

I Undirected graph with n vertices, i.e., of order Nv = n

I Edge (u, v) present with probability p, independent of other edges

n I Simulation is easy: draw 2 i.i.d. Bernoulli(p) RVs

Example

I Three realizations of G 1 . The size Ne is a random variable 10, 6

Network Science Analytics Degrees, Power Laws and Popularity 7 Degree distribution of Gn,p

I Q: Degree distribution P (d) of the Erd¨os-Renyigraph Gn,p?

I Define I {(v, u)} = 1 if (v, u) ∈ E, and I {(v, u)} = 0 otherwise. ⇒ Fix v. For all u 6= v, the indicator RVs are i.i.d. Bernoulli(p)

I Let Dv be the (random) degree of vertex v. Hence, X Dv = I {(v, u)} u6=v

⇒ Dv is binomial with parameters (n − 1, p) and

n − 1 P(d) = P (D = d) = pd (1 − p)(n−1)−d v d

I In words, the probability of having exactly d edges incident to v

⇒ Same for all v ∈ V , by independence of the Gn,p model

Network Science Analytics Degrees, Power Laws and Popularity 8 Behavior for large n

I Q: How does the degree distribution look like for a large network?

I Recall Dv is a sum of n − 1 i.i.d. Bernoulli(p) RVs

⇒ Central Limit Theorem: Dv ∼ N (np, np(1 − p)) for large n

0.2 0.2 p=0.5, n=20 Binomial(20,1/2) 0.18 p=0.5, n=40 0.18 p=0.5, n=60 Binomial(60,1/6) Poisson(10) 0.16 0.16

0.14 0.14

0.12 0.12

0.1 0.1 P(d) P(d) 0.08 0.08

0.06 0.06

0.04 0.04

0.02 0.02

0 0 0 5 10 15 20 25 30 35 40 0 5 10 15 20 25 d d

I Makes most sense to increase n with fixed E [Dv ] = (n − 1)p = µ ⇒ Law of rare events: Dv ∼ Poisson(µ) for large n

Network Science Analytics Degrees, Power Laws and Popularity 9 Law of rare events

I Substituting p = µ/n in the binomial PMF yields

n!  µ d  µ n−d P (d) = 1 − n (n − d)!d! n n n(n − 1) ... (n − d + 1) µd (1 − µ/n)n = nd d! (1 − µ/n)d

n −µ I In the limit, red term is lim (1 − µ/n) = e n→∞

I Black and blue terms converge to 1. Limit is the Poisson PMF

d −µ d µ e −µ µ lim Pn(d) = 1 = e n→∞ d! 1 d!

I Approximation usually called “law of rare events”

I Individual edges happen with small probability p = µ/n

I The aggregate (degree, number of edges), though, need not be rare

Network Science Analytics Degrees, Power Laws and Popularity 10 The Gn,p model and real-world networks

I For large graphs, Gn,p suggests P (d) with an exponential tail ⇒ Unlikely to see degrees spanning several orders of magnitude

Linear scale Logarithmic scale 0.12 100

-50 0.1 10

10-100 0.08

10-150 0.06 P(d) P(d)

10-200 0.04

10-250 0.02

10-300

0 0 10 20 30 40 50 101 102 103 d d

I Concentrated distribution around the mean E [Dv ] = (n − 1)p

I Q: Is this in agreement with real-world networks?

Network Science Analytics Degrees, Power Laws and Popularity 11 World Wide Web

I Degree distributions of the WWW analyzed in [Broder et al ’00] ⇒ Web a digraph, study both in- and out-degree distributions

I Majority of vertices naturally have small degrees ⇒ Nontrivial amount with orders of magnitude higher degrees

Network Science Analytics Degrees, Power Laws and Popularity 12 Internet autonomous systems

3 I The topology of the AS-level Internet studied in [Faloutsos ’99]

I Right-skewed degree distributions also found for router-level Internet

Network Science Analytics Degrees, Power Laws and Popularity 13 Seems to be a structural pattern

I More heavy-tailed degree distributions found in [Barabasi-Albert ’99] P(d) P(d)

d d d Author collaboration Web graph Power grid

I These heterogeneous, diffuse degree distributions are not exponential

Network Science Analytics Degrees, Power Laws and Popularity 14 Power laws

Degree distributions

Power-law degree distributions

Visualizing and fitting power laws

Popularity and preferential attachment

Network Science Analytics Degrees, Power Laws and Popularity 15 Power-law degree distributions 1 4 − 5 − ) ) 6 − 10 8 requency requency − − (F (F 2 2 og og l l 10 − 15 − 12 − 0 2 4 6 8 10 02468

log2(Degree) log2(Degree)

Fig. 4.1 Degree distributions. Left: the router-level Internet network graph described in Sec- tion 3.5.2. Right: the network of measured interactions among proteins in S. cerevisiae (yeast), I Log-logas of Januaryplots 2007. show In each roughly plot, both x- a and lineary-axes are decay in base-2, logarithmic suggesting scale. the

P(d) ∝ d −α ⇒ log P (d) = C − α log d

I Power-law exponent (negative slope) is typically α ∈ [2, 3]

I Normalization constant C is mostly uninteresting

I Power laws often best followed in the tail, i.e., for d ≥ dmin

Network Science Analytics Degrees, Power Laws and Popularity 16

Copyright 2009 Springer Science+Business Media, LLC. These figures may be used for noncom- mercial purposes as long as the source is cited: Kolaczyk, Eric D. Statistical Analysis of Network Data: Methods and Models (2009) Springer Science+Business Media LLC. Power law and exponential degree distributions

(a) (b)

0.15 101000 Figure 4.4 0 0.15 10 -2.1-2.1 -2.1-2.1 0 p ~~ kk 0.15 pk ~ k -2.1 10 kk -2.1 Poisson vs. Power-law Distributions P(d)=dk -2.1 -1-1 P(d)=dp ~ k-2.1 p ~ k-2.1 1010 k -2.1 k -1 p ~ k pk ~ k 10 k 10-1 (a) Comparing a Poisson function with a 0.1 1010-2-2 power-law function (= 2.1) on a linear plot. -2 0.1 p 10-2 0.1 p10kk Both distributions have k= 10. p pk -3-3 P(d)p kk p1010 p POISSONPoisson k -3 p k POISSON 10-3 (b) The same curves as in (a), but shown on a k POISSON 10 0.05 1010-4-4 log-log plot, allowing us to inspect the dif- -4 0.05 10-4 0.05 10 POISSONPoisson ference between the two functions in the 1010-5-5 POISSON high-k regime. 10-5 POISSON 10-5 1010-6-6 -6 (c) A random network with k= 3 and N = 50, 10 0 1 2 3 0 1010 2020 30 40 50 10-6 10100 10101 10102 10103 k 0 1 k 2 3 illustrating that most nodes have compara- 0 10 20 k 30 40 50 100 101 d k 102 103 0 10 20 d k 30 40 50 10 10 k 10 10 ble degree k k. (d) (c) (d) A scale-free network with =2.1 and k= 3, illustrating that numerous small-degree nodes coexist with a few highly connected hubs.

I Erd¨os-Renyi’sPoisson degree distribution exhibits a sharp cutoff ⇒ PowerThelaws Largest upperHub bound exponential tails for large enough d All real networks are finite. The size of the WWW is estimated to be N Network Science Analytics1012 nodes; the size of the social Degrees, network Power is the Laws Earth’s and Popularity population, about N 17 7 × 109. These numbers are huge, but finite. Other networks pale in com- parison: The genetic network in a human cell has approximately 20,000 genes while the metabolic network of the E. Coli bacteria has only about a thousand metabolites. This prompts us to ask: How does the network size affect the size of its hubs? To answer this we calculate the expected maxi-

mum degree, kmax, called the natural cutoff of the degree distribution pk. It represents the expected size of the largest hub in a network.

It is instructive to perform the calculation first for the exponential dis- tribution p(k) = Cek .

For a network with minimum degree kmin, the normalization condition

∞ pk()dk = 1 (4.15) ∫kmin

kmin provides C = e . To calculate kmax we assume that in a network of N

nodes we expect at most one node in the (kmax, ∞) regime (ADVANCED TOPICS 3.B). In other words the probability to observe a node whose degree exceeds

kmax is 1/N: ∞ 1 ∫ pk()dk = . (4.16) kmax N

THE SCALE-FREE PROPERTY 10 Scale-free networks

I Scale-free network: degree distribution with power-law tail

I Name motivated for the scale-invariance property of power laws

I Def: A scale-free function f (x) satisfies f (ax) = bf (x), for a, b ∈ R

Example

−α I Power-law functions f (x) = x are scale-free since

f (ax) = (ax)−α = a−αf (x) = bf (x), where b := a−α

x I Exponential functions f (x) = c are not scale-free because

f (ax) = cax = (cx )a = f a(x) 6= bf (x), except when a = b = 1

I No ‘characteristic scale’ for the degrees. More soon ⇒ Functional form of the distribution is invariant to scale

Network Science Analytics Degrees, Power Laws and Popularity 18 Power-law distributions are ubiquitous

I Power-law distributions widespread beyond networks [Clauset et al ’07]

Network Science Analytics Degrees, Power Laws and Popularity 19 Normalization

−α I The power-law degree distribution P (d) = Cd is a PMF, hence

∞ ∞ X X 1 1 = P(d) = Cd −α ⇒ C = P∞ d −α d=0 d=0 d=0

I Often a power law is only valid for the tail d ≥ dmin, hence 1 1 C = ≈ = (α − 1)d α−1 P∞ d −α R ∞ x −αdx min d=dmin dmin ⇒ Sound approximation since P (d) varies slowly for large d

I The normalized power-law degree distribution is

α − 1  d −α P(d) = , d ≥ dmin dmin dmin

Network Science Analytics Degrees, Power Laws and Popularity 20 Power-law probability density function

I Often convenient to treat degrees as real valued, i.e., d ∈ R+

I Define a power-law PDF for the tail of the degree distribution as

α − 1  d −α p(d) = , d ≥ dmin dmin dmin

⇒ A valid PDF, already showed that R ∞ p(x)dx = 1 dmin ⇒ Convergence of the integral requires α > 1

I Ex: Probability that a random node has degree exceeding 100 is

Z ∞ α − 1  x −α  100 1−α P(Dv > 100) = dx = 100 dmin dmin dmin

Network Science Analytics Degrees, Power Laws and Popularity 21 Divergent moments

I Q: What is the m-th moment of a power-law distributed RV?

I From the definition of moment and the power-law PDF one has

Z ∞ α − 1  x m+1−α ∞ [Dm] = x mp(x)dx = E v 1−α m + 1 − α dmin dmin dmin ⇒ Convergence of the integral requires m + 1 < α

I For real-world networks, typically α ∈ (2, 3) so

α − 1 [D ] = d < ∞ and [Dm] = ∞, m ≥ 2 E v α − 2 min E v

I In particular, the second moment and variance are infinite ⇒ Consistent with variability and heterogeneity of degrees

Network Science Analytics Degrees, Power Laws and Popularity 22 Revisiting the scale-free property

I A measure of scale of a RV is its standard deviation σ

P(d) P(d)=d-2.1

Poisson

μ d

Large random network Gn,p √ I Randomly chosen node has degree d = µ ± µ. The scale is µ

Scale-free network

I Randomly chosen node has degree d = µ ± ∞. There is no scale

Network Science Analytics Degrees, Power Laws and Popularity 23 Visualizing and fitting power laws

Degree distributions

Power-law degree distributions

Visualizing and fitting power laws

Popularity and preferential attachment

Network Science Analytics Degrees, Power Laws and Popularity 24 Visualizing power-law degree distributions

I A simple histogram may be problematic for visualizing P (d) ⇒ Use log-log scale to warp probabilities and widespread degrees

P(d) P(d)

d d

I Large statistical fluctuations (‘noise’) in the tail for large d ⇒ With bins of size one, high-degree counts are small ⇒ Makes sense to increase the bin size

Network Science Analytics Degrees, Power Laws and Popularity 25 Logarithmic binning

I Uniformly widening bins sacrifices resolution for small degrees ⇒ Use bins of different sizes in different parts of the histogram

P(d)

d I Logarithmic binning is widely used. The n-th bin is an−1 ≤ d < an, n = 1, 2,... Ex: Common choice is a = 2, n-th bin has width 2n − 2n−1 = 2n−1

I Normalize by the bin width. Wider bins will accrue higher counts

Network Science Analytics Degrees, Power Laws and Popularity 26 Complementary cumulative distribution function

I Def: The complementary cumulative distribution function (CCDF) is

F¯(d) = P (Dv ≥ d)

⇒ Function F¯(d) is the fraction of vertices with degree at least d

I For a power-law PDF, the CCDF also obeys a power law since

Z ∞ α − 1  x −α  d −(α−1) P(Dv ≥ d) = dx = d dmin dmin dmin

I If the PDF has exponent α, then CCDF F¯(d) has exponent α − 1

Network Science Analytics Degrees, Power Laws and Popularity 27 Computing the CCDF

Step 1: List the degrees dv in descending order Step 2: Assign ranks rv (from 1 to Nv ) to vertices in that order Step 3: The CCDF is the plot of rv /Nv versus degree dv

dv rv rv/Nv F(d)

4 1 0.1 0.1

3 2 0.2

3 3 0.3 0.3

2 4 0.4 0.4 F(d) 1 5 0.5

1 1 6 0.6

1 7 0.7 1 8 0.8 0.4 0.3 1 9 0.9 0.1 1 10 1.0 1.0 1 2 3 4 d

I If degrees are repeated, CCDF is the largest value of rv /Nv I If d not observed, F¯(d) = value for next (larger) observed degree

Network Science Analytics Degrees, Power Laws and Popularity 28 Visualizing power laws with the CCDF

I Plot the CCDF in a log-log scale and look for a straight-line behavior

F(d)

d

I Mitigates noise using cumulative frequencies (cf. raw frequencies)

I No binning needed ⇒ Avoids information loss as bins widen

Network Science Analytics Degrees, Power Laws and Popularity 29 Fitting power-law distributions

I Basic, yet nontrivial task is to estimate the exponent α from data

I A power law implies the linear model log P (d) = C − α log d +  ⇒ Natural to form the linear least-squares (LS) estimator

X 2 {α,ˆ Cˆ} = arg min (log P (di ) − C + α log di ) α,C i

I Simple, very popular, but not advisable for at least three reasons: 1) Extremely noisy high-degree data, where the counts are the lowest 2) Estimates are biased. The log transform distorts unevenly the errors 3) If the power law is only valid in the tail, need to hand pick dmin

Network Science Analytics Degrees, Power Laws and Popularity 30 Linear regression inference on the CCDF

I A solution to the noise problem is to use the CCDF F¯(d) ⇒ Cumulative frequencies smoothen the noise

I Recall the CCDF follows a power law with exponent α − 1 ⇒ Can use a linear regression-based approach to findα ˆ, but . . .

F(d)P(d) P(d)

d d d

I Successive points in the CCDF plot are not mutually independent ⇒ (Ordinary) LS is not optimal for correlated errors

Network Science Analytics Degrees, Power Laws and Popularity 31 Maximum-likelihood estimator

Nv I Suppose {di }i=1 are independent and follow a power law. MLE of α? −α α−1  d  ⇒ The data PDF is f (d; α) = , d ≥ dmin dmin dmin

I The log-likelihood function is (up to constants independent of α)

Nv Nv   X X di `Nv (α) = log f (di ; α) ∝ Nv log (α − 1) − α log dmin i=1 i=1

I The MLEα ˆ (a.k.a. Hill estimator) solves the equation

Nv   ∂`Nv (α) Nv X di = − log = 0 ∂α αˆ − 1 dmin α=α ˆ i=1

I The solution is −1 " Nv  # 1 X di αˆ = 1 + log Nv dmin i=1

Network Science Analytics Degrees, Power Laws and Popularity 32 Hill plot of ML estimates

I Q: How can we go around hand-picking the value of dmin?

1) Rank-order degrees to obtain the sequence d(1) ≤ ... ≤ d(Nv )

2) For each k ∈ {1,..., Nv − 1} let dmin = d(Nv −k). The MLEs are

" k−1  #−1 1 X d(Nv −i) αˆ(k) = 1 + log k d i=0 (Nv −k) 3) Draw and examine the Hill plot ofα ˆ(k) versus k

I If a power law is credible, the Hill plot should ‘settle down’ ⇒ Identify stableα ˆ for a wide range of (intermediate) k values

I Q: Why focus on values on the intermediate range?

I Small k: Inaccurate estimation due to limited data

I Large k: Bias if power law is only valid in the tail

Network Science Analytics Degrees, Power Laws and Popularity 33 Example: Internet and protein interaction data 1 4 − 5 − ) ) 6 − 10 8 requency requency − − (F (F 2 2 og og l l 10 − 15 − 12 2 − 0 2 4 6 8 10 02468

log2(Degree) log2(Degree)

Fig. 4.1 Degree distributions. Left: the router-level Internet network graph described in Sec- tion 3.5.2. Right: the network of measured interactions among proteins in S. cerevisiae (yeast), as of January 2007.Power In each plot,law both x- and y-axes are in base-2Power logarithmic law scale. is is credible inappropriate k k ^ ^ α α 2345678 1.5 2.5 3.5 4.5 0 50000 100000 150000 0 1000 2000 3000 4000 5000 k k

I SharpFig. decay 4.2 Hill in plotsα ˆ ofsuggests maximum likelihood a simple estimates αˆ power-lawk, as a function of modelk (i.e., the number is inappropriate of order statistics used), for the Internet (left) and protein interaction (right) datasets.

Network Science Analytics Degrees, Power Laws and Popularity 34

Copyright 2009 Springer Science+Business Media, LLC. These figures may be used for noncom- mercial purposes as long as the source is cited: Kolaczyk, Eric D. Statistical Analysis of Network Data: Methods and Models (2009) Springer Science+Business Media LLC. Example: Flickr data

I Flickr social network: Nv ≈ 0.6M, Ne ≈ 3.5M [Leskovec et al ’08]

d-1.75

Linear scale Log scale P(d) P(d)

d d

-0.75 d-0.75 d d-0.75e-0.002d F(d) F(d) CCDF, log scale Power law with exp. cutoff, log scale

d d

−α −βd I Good fit to a power law with exponential cutoff F¯(d) ∝ d e

Network Science Analytics Degrees, Power Laws and Popularity 35 Popularity and preferential attachment

Degree distributions

Power-law degree distributions

Visualizing and fitting power laws

Popularity and preferential attachment

Network Science Analytics Degrees, Power Laws and Popularity 36 Popularity as a network phenomenon

I Popularity is a phenomenon characterized by extreme imbalances

I How can we quantify these imbalances? Why do they arise?

I Basic models of network behavior can be very insightful ⇒ Result of coupled decisions, correlated behavior in a population

Network Science Analytics Degrees, Power Laws and Popularity 37 Preferential attachment model

I Simple model for the creation of e.g., links among Web pages

I Vertices are created one at a time, denoted 1,..., Nv

I When node j is created, it makes a single arc to i, 1 ≤ i < j

I Creation of (j, i) governed by a probabilistic rule:

I With probability p, j links to i chosen uniformly at random in I With probability 1 − p, j links to i with probability ∝ di

out I The resulting graph is directed, each vertex has dv = 1

I Preferential attachment model leads to “rich-gets-richer” dynamics ⇒ Arcs formed preferentially to (currently) most popular nodes ⇒ Prob. that i increases its popularity ∝ i’s current popularity

Network Science Analytics Degrees, Power Laws and Popularity 38 Preferential attachment yields power laws

Theorem The preferential attachment model gives rise to a power-law in-degree 1 distribution with exponent α = 1 + 1−p , i.e.,

  − 1+ 1 P d in = d ∝ d 1−p

in I Key: “j links to i with probability ∝ di ” equivalent to copying, i.e., “j chooses k uniformly at random, and links to i if (k, i) ∈ E”

I Reflect: Copy other’s decision vs. independent decisions in Gn,p

I As p → 0 ⇒ Copying more frequent ⇒ Smaller α → 2

I Intuitive: more likely to see extremely popular pages (heavier tail)

Network Science Analytics Degrees, Power Laws and Popularity 39 Continuous approximation

in I In-degree di (t) of node i at time t ≥ i is a RV. Two facts in F1) Initial condition: di (i) = 0 since node i just created at time t = i in F2) Dynamics of di (t): Probability that new node t + 1 > i links to i is

1 d in(t) P ((t + 1, i) ∈ E) = p × + (1 − p) × i t t

I Will study a deterministic, continuous approximation to the model

I Continuous time t ∈ [0, Nv ] in I Continuous degrees xi (t):[i, Nv ] 7→ R+ are deterministic

I Require in-degrees to satisfy the following growth equation

dx in(t) p (1 − p)x in(t) i = + i , x in(i) = 0 dt t t i

Network Science Analytics Degrees, Power Laws and Popularity 40 Solving the differential equation

in I Solve the first-order differential equation for xi (t) (let q = 1 − p)

dx in p + qx in i = i dt t

in I Divide both sides by p + qxi (t) and integrate over t

Z in Z 1 dxi 1 in dt = dt p + qxi dt t

I Solving the integrals, we obtain (c is a constant)

in ln (p + qxi ) = q ln (t) + c

Network Science Analytics Degrees, Power Laws and Popularity 41 Solving the differential equation (cont.)

c I Exponentiating and letting K = e we find 1 ln (p + qx in(t)) = q ln (t) + c ⇒ x in(t) = (Ktq − p) i i q

I To determine the unknown constant K, use the initial condition 1 p 0 = x in(i) = (Ki q − p) ⇒ K = i q i q

in I Hence, the deterministic approximation of di (t) evolves as 1  p  p t q  x in(t) = × tq − p = − 1 i q i q q i

Network Science Analytics Degrees, Power Laws and Popularity 42 Obtaining the degree distribution

I Q: At time t, what fraction F¯(d) of all nodes have in-degree ≥ d? in Approximation: What fraction of all functions xi (t) ≥ d by time t? p t q  x in(t) = − 1 ≥ d i q i

I Can be rewritten in terms of i as

q  −1/q i ≤ t d + 1 p

I By time t there are exactly t nodes in the graph, so the fraction is

q  −1/q F¯(d) = d + 1 = 1 − F (d) p

Network Science Analytics Degrees, Power Laws and Popularity 43 Identifying the power law

I The degree distribution is given by the PDF p(d)

I Recall that the PDF, CDF and CCDF are related, namely

dF (x) dF¯(x) p(x) = = − dx dx

−1/q ¯ h q  i I Differentiating F (d) = p d + 1 yields

−(1+ 1 ) 1 q   q p(d) = d + 1 p p

−(1+1/q) 1 I Showed p(d) ∝ d , a power law with exponent α = 1 + 1−p ⇒ Disclaimer: Relied on heuristic arguments ⇒ Rigorous, probabilistic analysis possible

Network Science Analytics Degrees, Power Laws and Popularity 44 Glossary

I Degree distribution I Characteristic scale

I Erd¨os-Renyimodel I Logarithmic binning

I I Cumulative frequencies

I Law of rare events I Hill estimator and plot

I Right-skewed distribution I Exponential cutoff

I Logarithmic scale I Coupled decisions

I Power law I Preferential attachment model

I Exponential and heavy tails I Rich-gets-richer phenomena

I Scale-free network I Growth equation

Network Science Analytics Degrees, Power Laws and Popularity 45