So, You Think You Have a Power Law, Do You? Well Isn't That Special?

Cosma Shalizi

Statistics Department, Carnegie Mellon University

Santa Fe Institute

18 October 2010, NY Machine Learning Meetup

Outline: Power Laws: What? So What? | Bad Practices | Better Practices | No Really, So What? | References

Summary

1. Everything good in the talk I owe to my co-authors, Aaron Clauset and Mark Newman
2. Power laws, p(x) ∝ x^(−α), are cool, but not that cool
3. Most of the studies claiming to find them use unreliable 19th century methods, and have no value as evidence either way
4. Reliable methods exist, and need only very straightforward mid-20th century statistics
5. Using reliable methods, lots of the claimed power laws disappear, or are at best not proven

You are now free to tune me out and turn on social media


Definitions and Examples: What Are Power Law Distributions? Why Care?

p(x) ∝ x^(−α) (continuous); P(X = x) ∝ x^(−α) (discrete)
∴ P(X ≥ x) ∝ x^(−(α−1)), and log p(x) = log C − α log x
Named Pareto (continuous), Zipf or zeta (discrete)
Explicitly:
p(x) = ((α − 1)/xmin) (x/xmin)^(−α)
(the discrete version involves the Hurwitz zeta function)
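Since the explicit density pins down the survival function, P(X ≥ x) = (x/xmin)^(−(α−1)), Pareto samples can be drawn by inverse-CDF sampling. A minimal numpy sketch (not from the talk; parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, xmin, n = 2.5, 1.0, 100_000

# Inverse-CDF sampling: P(X >= x) = (x/xmin)^-(alpha-1),
# so X = xmin * U^(-1/(alpha-1)) with U ~ Uniform(0, 1).
u = rng.uniform(size=n)
x = xmin * u ** (-1.0 / (alpha - 1.0))

# Empirical survival function at a few points vs. the theoretical one
for s in [1.0, 2.0, 5.0, 10.0]:
    emp = np.mean(x >= s)
    theo = (s / xmin) ** (-(alpha - 1.0))
    print(f"P(X >= {s:4.1f}): empirical {emp:.4f}, theoretical {theo:.4f}")
```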

Money, Words, Cities: the three classic power law distributions

Pareto's law: wealth (richest 400 in US, 2003)
[Figure: survival function of net worth in US$, log-log axes]

Zipf's law: word frequencies (Moby Dick)
[Figure: survival function of number of occurrences, log-log axes]

Zipf's law: city populations (US Census, 2000)
[Figure: survival function of population, log-log axes]

Cosma Shalizi So, You Think You Have a Power Law? Heavy (fat, long, . . . ) tails: sub-exponential decay of p(x) Extreme inequality (80/20): high proportion of summed values comes from small fraction of samples/population Scale-free: 1x −α p x X s α − ( | ≥ ) = s s i.e., another power law, same α ∴ no typical scale though xmin is the typical value

Power Laws: What? So What? Bad Practices Better Practices Denitions and Examples No Really, So What? References Properties

Highly right skewed

Cosma Shalizi So, You Think You Have a Power Law? Extreme inequality (80/20): high proportion of summed values comes from small fraction of samples/population Scale-free: 1x −α p x X s α − ( | ≥ ) = s s i.e., another power law, same α ∴ no typical scale though xmin is the typical value

Power Laws: What? So What? Bad Practices Better Practices Denitions and Examples No Really, So What? References Properties

Highly right skewed Heavy (fat, long, . . . ) tails: sub-exponential decay of p(x)

Cosma Shalizi So, You Think You Have a Power Law? Scale-free: 1x −α p x X s α − ( | ≥ ) = s s i.e., another power law, same α ∴ no typical scale though xmin is the typical value

Power Laws: What? So What? Bad Practices Better Practices Denitions and Examples No Really, So What? References Properties

Highly right skewed Heavy (fat, long, . . . ) tails: sub-exponential decay of p(x) Extreme inequality (80/20): high proportion of summed values comes from small fraction of samples/population

Cosma Shalizi So, You Think You Have a Power Law? ∴ no typical scale though xmin is the typical value

Power Laws: What? So What? Bad Practices Better Practices Denitions and Examples No Really, So What? References Properties

Highly right skewed Heavy (fat, long, . . . ) tails: sub-exponential decay of p(x) Extreme inequality (80/20): high proportion of summed values comes from small fraction of samples/population Scale-free: 1x −α p x X s α − ( | ≥ ) = s s i.e., another power law, same α

Cosma Shalizi So, You Think You Have a Power Law? though xmin is the typical value

Power Laws: What? So What? Bad Practices Better Practices Denitions and Examples No Really, So What? References Properties

Highly right skewed Heavy (fat, long, . . . ) tails: sub-exponential decay of p(x) Extreme inequality (80/20): high proportion of summed values comes from small fraction of samples/population Scale-free: 1x −α p x X s α − ( | ≥ ) = s s i.e., another power law, same α ∴ no typical scale

Cosma Shalizi So, You Think You Have a Power Law? Power Laws: What? So What? Bad Practices Better Practices Denitions and Examples No Really, So What? References Properties

Highly right skewed Heavy (fat, long, . . . ) tails: sub-exponential decay of p(x) Extreme inequality (80/20): high proportion of summed values comes from small fraction of samples/population Scale-free: 1x −α p x X s α − ( | ≥ ) = s s i.e., another power law, same α ∴ no typical scale though xmin is the typical value
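The scale-free property can be checked numerically: conditioning Pareto samples on X ≥ s and rescaling by s should reproduce the original distribution. A small numpy sketch (illustrative, arbitrary parameter values):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, xmin, n = 2.5, 1.0, 200_000

# Pareto samples via the inverse CDF
x = xmin * rng.uniform(size=n) ** (-1.0 / (alpha - 1.0))

# Condition on X >= s and rescale by s: should match the original law
s = 3.0
tail = x[x >= s] / s

# Compare survival functions of the original and the rescaled tail
for t in [1.0, 2.0, 5.0]:
    print(f"P(X/xmin >= {t}): {np.mean(x / xmin >= t):.3f}   "
          f"P(X/s >= {t} | X >= s): {np.mean(tail >= t):.3f}")
```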

Origin Myths

Catchy and mysterious origin myth from physics:
Distinct phases co-exist at phase transitions
∴ each phase can appear by fluctuation inside the other, and vice versa
∴ infinite-range correlations in space and time
∴ the central limit theorem breaks down, but macroscopic physical quantities are still averages
∴ they must have a scale-free distribution
So critical phenomena ⇒ power laws

Origin Myths (cont.)

Deflating origin myths: piles of papers on my office floor [1, 2, 3]
I start new piles at rate λ, so the age of a pile ∼ Exponential(λ)
All piles start with size xmin
Once a pile starts, on average it grows exponentially at rate µ
X ∼ Pareto(λ/µ + 1, xmin)
Mixtures of exponentials work too [4]
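The pile mechanism is easy to simulate: exponentially distributed ages plus exponential growth give Pareto sizes with exponent λ/µ + 1, since P(X ≥ x) = P(T ≥ (1/µ) ln(x/xmin)) = (x/xmin)^(−λ/µ). A numpy sketch (illustrative parameter values):

```python
import numpy as np

rng = np.random.default_rng(2)
lam, mu, xmin, n = 1.0, 0.5, 1.0, 200_000

# Pile ages are exponential; sizes grow exponentially with age
age = rng.exponential(scale=1.0 / lam, size=n)
size = xmin * np.exp(mu * age)

# The resulting sizes are Pareto(lam/mu + 1, xmin):
# P(X >= x) = (x/xmin)^-(lam/mu)
for x in [2.0, 5.0, 10.0]:
    emp = np.mean(size >= x)
    theo = (x / xmin) ** (-lam / mu)
    print(f"P(size >= {x:4.1f}): simulated {emp:.4f}, Pareto {theo:.4f}")
```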

There are lots of claims that things follow power laws, especially in the last ≈ 20 years, especially from physicists: word frequency, protein interaction degree (yeast), metabolic network degree (E. coli), Internet autonomous-system network, calls received, intensity of wars, terrorist attack fatalities, bytes per HTTP request, species per genus, # sightings per bird species, population affected by blackouts, sales of best-sellers, population of US cities, area of wildfires, solar flare intensity, earthquake magnitude, religious sect size, surname frequency, individual net worth, citation counts, # papers authored, # hits per URL, in-degree per URL, # entries in e-mail address books, . . .

⇒ Mason Porter's Power Law Shop

You Can Do Everything with Least Squares, Right? How do physicists come up with their power laws?

Remember: log p(x) = log C − α log x, & similarly for the CDF
Suggests: take a log-log plot of the histogram, or of the CDF, fit an ordinary regression line, then use the fitted slope as a guess for α, and check goodness of fit by R²
This is a clever idea, for the 1890s
Fun fact: statistical physics involves no actual statistics
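The recipe above, applied to the survival function, looks like the following deliberately naive numpy sketch (an illustration of the practice being criticized, not an endorsement; the slope of log P(X ≥ x) estimates −(α − 1)):

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, xmin, n = 2.5, 1.0, 5_000

x = np.sort(xmin * rng.uniform(size=n) ** (-1.0 / (alpha - 1.0)))

# The naive recipe: plot the empirical survival function on log-log
# axes and fit a least-squares line; the slope estimates -(alpha - 1).
surv = 1.0 - np.arange(n) / n          # P(X >= x_(i)) at the sorted values
slope, intercept = np.polyfit(np.log(x), np.log(surv), 1)
alpha_regression = 1.0 - slope

print(f"slope = {slope:.3f}, regression alpha = {alpha_regression:.3f}")
```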

Why Is This Bad?

Histograms: binning always throws away information and adds lots of error; logarithmically-sized bins are only infinitesimally better
CDF or rank-size plot: the values are not independent; inefficient
Least-squares line: not a normalized distribution; all the inferential assumptions for regression fail; always has avoidable error as an estimate of α; easily gives large R² for non-power-law distributions

Some Distributions Which Are Not Power Laws

Log-normal: ln X ∼ N(µ, σ²):
p(x) = (1 / (x √(2πσ²))) e^(−(ln x − µ)²/2σ²) / (1 − Φ((ln xmin − µ)/σ))

Stretched exponential/Weibull: X^β ∼ Exponential(λ):
p(x) = βλ e^(λ xmin^β) x^(β−1) e^(−λ x^β)

Power law with exponential cut-off (negative gamma):
p(x) = (1/L) (x/L)^(−α) e^(−x/L) / Γ(1 − α, xmin/L)
like a power law for x ≪ L, like an exponential for x ≫ L

[Figure: R² of least-squares log-log fits vs. sample size, 500 replicates at each size; black = Pareto, blue = log-normal. The R² for a log-normal has a limiting value > 0.9.]

Abusing linear regression makes the baby Gauss cry
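The large-R²-for-non-power-laws point is easy to reproduce: fit a least-squares line to the log-log survival plot of log-normal data, and R² comes out large even though the data are not a power law. A numpy sketch (illustrative, arbitrary parameters):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000

# Log-normal data, which is NOT a power law...
x = np.sort(rng.lognormal(mean=0.0, sigma=2.0, size=n))
surv = 1.0 - np.arange(n) / n

# ...but a least-squares line on the log-log survival plot still
# looks like a "good fit" by the usual R^2 standard.
logx, logs = np.log(x), np.log(surv)
slope, intercept = np.polyfit(logx, logs, 1)
resid = logs - (slope * logx + intercept)
r2 = 1.0 - np.sum(resid**2) / np.sum((logs - logs.mean())**2)
print(f"R^2 for a log-normal on log-log axes: {r2:.3f}")
```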

Blogospheric Navel-Gazing

Shirky [5]: the in-degree of weblogs follows a power law, with many consequences for media ecology, etc., etc.
Data via [6]

[Figure: survival function of the in-degree distribution of weblogs, late 2003, log-log axes.]

Estimating the Exponent

Use maximum likelihood:

L(α, xmin) = n log((α − 1)/xmin) − α Σ_{i=1}^{n} log(x_i/xmin)

∂L/∂α = n/(α − 1) − Σ_{i=1}^{n} log(x_i/xmin)

α̂ = 1 + n / Σ_{i=1}^{n} log(x_i/xmin)
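The closed-form MLE is a one-liner in practice; a numpy sketch (illustrative, function name is mine):

```python
import numpy as np

def pareto_mle(x, xmin):
    """Maximum-likelihood (Hill) estimate of the Pareto exponent:
    alpha_hat = 1 + n / sum(log(x_i / xmin)), over the tail x >= xmin."""
    tail = x[x >= xmin]
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

rng = np.random.default_rng(5)
alpha, xmin = 2.5, 1.0
x = xmin * rng.uniform(size=50_000) ** (-1.0 / (alpha - 1.0))
print(f"true alpha = {alpha}, MLE = {pareto_mle(x, xmin):.3f}")
```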

Cosma Shalizi So, You Think You Have a Power Law? Standard error: Var n−1 1 2 O n−2 [αb] = (α − ) + ( ) Ecient: no consistent alternative with less variance In particular, dominates regression 2 Asymptotically Gaussian: (α−1) αb N (α, n ) Ancient: Worked out in the 1950s [7, 8] Computationally trivial

Power Laws: What? So What? Estimating the Exponent Bad Practices Estimating the Scaling Region Better Practices Goodness-of-Fit No Really, So What? Testing Against Alternatives References Visualization Properties of the MLE

Consistent: αb → α

Cosma Shalizi So, You Think You Have a Power Law? Ecient: no consistent alternative with less variance In particular, dominates regression 2 Asymptotically Gaussian: (α−1) αb N (α, n ) Ancient: Worked out in the 1950s [7, 8] Computationally trivial

Power Laws: What? So What? Estimating the Exponent Bad Practices Estimating the Scaling Region Better Practices Goodness-of-Fit No Really, So What? Testing Against Alternatives References Visualization Properties of the MLE

Consistent: αb → α Standard error: Var n−1 1 2 O n−2 [αb] = (α − ) + ( )

Cosma Shalizi So, You Think You Have a Power Law? 2 Asymptotically Gaussian: (α−1) αb N (α, n ) Ancient: Worked out in the 1950s [7, 8] Computationally trivial

Power Laws: What? So What? Estimating the Exponent Bad Practices Estimating the Scaling Region Better Practices Goodness-of-Fit No Really, So What? Testing Against Alternatives References Visualization Properties of the MLE

Consistent: αb → α Standard error: Var n−1 1 2 O n−2 [αb] = (α − ) + ( ) Ecient: no consistent alternative with less variance In particular, dominates regression

Cosma Shalizi So, You Think You Have a Power Law? Ancient: Worked out in the 1950s [7, 8] Computationally trivial

Power Laws: What? So What? Estimating the Exponent Bad Practices Estimating the Scaling Region Better Practices Goodness-of-Fit No Really, So What? Testing Against Alternatives References Visualization Properties of the MLE

Consistent: αb → α Standard error: Var n−1 1 2 O n−2 [αb] = (α − ) + ( ) Ecient: no consistent alternative with less variance In particular, dominates regression 2 Asymptotically Gaussian: (α−1) αb N (α, n )

Cosma Shalizi So, You Think You Have a Power Law? Computationally trivial

Power Laws: What? So What? Estimating the Exponent Bad Practices Estimating the Scaling Region Better Practices Goodness-of-Fit No Really, So What? Testing Against Alternatives References Visualization Properties of the MLE

Consistent: αb → α Standard error: Var n−1 1 2 O n−2 [αb] = (α − ) + ( ) Ecient: no consistent alternative with less variance In particular, dominates regression 2 Asymptotically Gaussian: (α−1) αb N (α, n ) Ancient: Worked out in the 1950s [7, 8]

Cosma Shalizi So, You Think You Have a Power Law? Power Laws: What? So What? Estimating the Exponent Bad Practices Estimating the Scaling Region Better Practices Goodness-of-Fit No Really, So What? Testing Against Alternatives References Visualization Properties of the MLE

Consistent: αb → α Standard error: Var n−1 1 2 O n−2 [αb] = (α − ) + ( ) Ecient: no consistent alternative with less variance In particular, dominates regression 2 Asymptotically Gaussian: (α−1) αb N (α, n ) Ancient: Worked out in the 1950s [7, 8] Computationally trivial

Cosma Shalizi So, You Think You Have a Power Law? Hill Plot for weblog in-degree 35 30 25 20 ^ α 15 10 5 0

1 5 10 50 100 500 1000

xmin

Power Laws: What? So What? Estimating the Exponent Bad Practices Estimating the Scaling Region Better Practices Goodness-of-Fit No Really, So What? Testing Against Alternatives References Visualization depends on x ; Hill plot [9] αb min

Cosma Shalizi So, You Think You Have a Power Law? Power Laws: What? So What? Estimating the Exponent Bad Practices Estimating the Scaling Region Better Practices Goodness-of-Fit No Really, So What? Testing Against Alternatives References Visualization depends on x ; Hill plot [9] αb min

Hill Plot for weblog in-degree 35 30 25 20 ^ α 15 10 5 0

1 5 10 50 100 500 1000

xmin

Estimating the Scaling Region

Maximizing the likelihood over x_min leads to trouble (try it and see)
Only want the scaling region in the tail anyway
Minimize the discrepancy between the fitted and empirical distributions [10]:

x̂_min = argmin_{x_min} max_{x ≥ x_min} |P̂_n(x) − P(x; α̂, x_min)| = argmin_{x_min} d_KS(P̂_n, P(α̂, x_min))
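A minimal sketch of this selection rule, assuming the continuous-tail MLE at each candidate x_min and a hypothetical floor of 10 tail points (so tiny tails can't win by luck); all names are illustrative:

```python
import math
import random

def ks_for_xmin(data, xmin):
    """Fit alpha to the tail above xmin; return (KS distance, alpha)."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)
    d = 0.0
    for i, x in enumerate(tail):
        cdf = 1.0 - (x / xmin) ** (1.0 - alpha)  # continuous power-law CDF
        d = max(d, abs(cdf - i / n), abs(cdf - (i + 1) / n))
    return d, alpha

def estimate_xmin(data, min_tail=10):
    """Pick xmin minimizing the KS distance between fitted and empirical tails."""
    xs = sorted(data)
    candidates = sorted(set(xs[: len(xs) - min_tail + 1]))  # keep >= min_tail points
    best = min(candidates, key=lambda xm: ks_for_xmin(data, xm)[0])
    d, alpha = ks_for_xmin(data, best)
    return best, alpha, d

random.seed(2)
data = [(1 - random.random()) ** (-1 / 1.5) for _ in range(500)]  # pure Pareto, alpha = 2.5
xmin_hat, alpha_hat, d_hat = estimate_xmin(data)
```

On a pure power-law sample the minimizer should sit near the smallest data value, keeping nearly the whole sample in the scaling region.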

[Figure: d_KS against x_min for the weblog data — d_KS from 0 to 1, x_min from 0 to 2000, with the minimum at x_min = 268, keeping the top 2.8% of the data]

[Figure: in-degree distribution of weblogs, late 2003 — survival function on log-log axes; in-degree from 1 to 1000, survival from 5e-04 to 5e-01]
Goodness-of-Fit

How can we tell if it's a good fit or not, if we can't use R²?
You shouldn't use R² that way for a regression, either
Use a goodness-of-fit test!
The Kolmogorov-Smirnov statistic is nice: for CDFs P, Q,

d_KS(P, Q) = max_x |P(x) − Q(x)|

Compare the empirical CDF to the theoretical one
Tabulated p-values assume the theoretical CDF isn't estimated
Analytic corrections via heroic probability theory [11, p. 99], or use the bootstrap, like a civilized person
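The statistic itself is a few lines of code; as a sketch (the helper name `ks_statistic` is illustrative), evaluating the empirical-vs-theoretical gap at the jump points:

```python
import random

def ks_statistic(sample, cdf):
    """d_KS = max_x |empirical CDF - theoretical CDF|, checked at the jump points."""
    xs = sorted(sample)
    n = len(xs)
    return max(
        max(abs(cdf(x) - i / n), abs(cdf(x) - (i + 1) / n))
        for i, x in enumerate(xs)
    )

random.seed(7)
u = [random.random() for _ in range(1000)]
d = ks_statistic(u, lambda x: x)  # uniform sample against its true CDF
```

For a correctly specified, non-estimated CDF with n = 1000, d_KS should be on the order of 1/√n ≈ 0.03; the caveat on the slide is that this reference distribution is wrong once you estimate the CDF from the same data.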

Cosma Shalizi So, You Think You Have a Power Law? Power Laws: What? So What? Estimating the Exponent Bad Practices Estimating the Scaling Region Better Practices Goodness-of-Fit No Really, So What? Testing Against Alternatives References Visualization

Given: n data points x1:n 1 Estimate α and xmin; ntail = # of data points ≥ xmin ∗ 2 Calculate dKS for data and best-t power law = d 3 Draw n random values b1,... bn as follows: 1 with probability ntail/n, draw from power-law 2 otherwise, pick one of the xi < xmin uniformly 4 Find , x , d for b αb dmin KS 1:n 5 Repeat many times to get distribution of dKS values 6 p-value = fraction of simulations where d ≥ d ∗ For the blogs: p = 6.6 × 10−2
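A sketch of the procedure above, simplified in one respect: x_min is held fixed across replicates, whereas the full procedure of [10] re-estimates it in each one; all names and the synthetic datasets are illustrative:

```python
import math
import random

def fit_alpha(tail, xmin):
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

def ks_tail(tail, alpha, xmin):
    xs = sorted(tail)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        d = max(d, abs(cdf - i / n), abs(cdf - (i + 1) / n))
    return d

def bootstrap_pvalue(data, xmin, n_sims=200, seed=0):
    """Semi-parametric bootstrap p-value for the power-law fit (xmin held fixed)."""
    rng = random.Random(seed)
    body = [x for x in data if x < xmin]
    tail = [x for x in data if x >= xmin]
    p_tail = len(tail) / len(data)
    alpha = fit_alpha(tail, xmin)
    d_star = ks_tail(tail, alpha, xmin)
    exceed = 0
    for _ in range(n_sims):
        boot = []
        for _ in range(len(data)):
            if not body or rng.random() < p_tail:
                # inverse-CDF draw from the fitted power law
                boot.append(xmin * (1 - rng.random()) ** (-1.0 / (alpha - 1.0)))
            else:
                boot.append(rng.choice(body))
        btail = [x for x in boot if x >= xmin]
        exceed += ks_tail(btail, fit_alpha(btail, xmin), xmin) >= d_star
    return exceed / n_sims

random.seed(8)
good = [(1 - random.random()) ** (-1 / 1.5) for _ in range(500)]  # true power law
bad = [random.uniform(1, 2) for _ in range(500)]                  # clearly not one
p_good = bootstrap_pvalue(good, xmin=1.0)
p_bad = bootstrap_pvalue(bad, xmin=1.0)
```

For the genuinely power-law sample the p-value is some unremarkable number; for the uniform sample the observed d* is far beyond anything the fitted power law generates, so the p-value collapses toward zero.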

Testing Against Alternatives

Compare against alternatives: more statistical power, more substantive information
*IC is sub-optimal here
Better: Vuong's normalized log-likelihood-ratio test [12]
Two models, θ and ψ:

R(ψ, θ) = log p_ψ(x_{1:n}) − log p_θ(x_{1:n})

R(ψ, θ) > 0 means the data were more likely under ψ than under θ
How much more likely do they need to be?

Distribution of Likelihood Ratios: Fixed Models

Assume X_1, X_2, … are all IID, with true distribution ν
Fix θ and ψ; what is the distribution of n⁻¹ R(ψ, θ)?

n⁻¹ R(ψ, θ) = n⁻¹ [log p_ψ(x_{1:n}) − log p_θ(x_{1:n})] = (1/n) Σ_{i=1}^{n} log [p_ψ(x_i)/p_θ(x_i)]

A mean of IID terms, so use the law of large numbers:

(1/n) R(ψ, θ) → E_ν[log p_ψ(X)/p_θ(X)] = D(ν‖θ) − D(ν‖ψ)

R(ψ, θ) > 0 ≈ ψ diverges less from ν than θ does

Use the CLT:

n^{−1/2} R(ψ, θ) ⇝ N(√n [D(ν‖θ) − D(ν‖ψ)], ω²_{ψ,θ})

where ω²_{ψ,θ} = Var[log p_ψ(X)/p_θ(X)]

So if the models are equally good, we get a mean-zero Gaussian; but if one is better, R(ψ, θ) → ±∞, with the sign depending on which

Distribution of R with Estimated Models

Two classes of models, Ψ and Θ; ψ̂, θ̂ = ML estimated models
ψ̂ → ψ*, θ̂ → θ*: converging to pseudo-truths; ψ* ≠ θ*
Some regularity assumptions
Everything works out as if there were no estimation:

n^{−1/2} R(ψ̂, θ̂) ⇝ N(√n [D(ν‖θ*) − D(ν‖ψ*)], ω²_{ψ*,θ*})
n⁻¹ R(ψ̂, θ̂) → D(ν‖θ*) − D(ν‖ψ*)
ω̂² ≡ sample Var[log p_ψ̂(X)/p_θ̂(X)] → ω²_{ψ*,θ*}

Vuong's Test for Non-Nested Model Classes

Assume all the conditions from before
If the two models are really equally close to the truth,

R/√(n ω̂²) ⇝ N(0, 1)

but if one is better, the normalized log-likelihood ratio goes to ±∞, telling you which is better
No need to adjust for the number of parameters, but any o(n) adjustment is fine; [13] is probably better than *IC
Does not assume that the truth is in either Ψ or Θ
Does assume ψ* ≠ θ*

Back to Blogs

Fit a log-normal to the same tail (to give the advantage to the power law):

R(power law, log-normal) = −0.85
ω̂ = 0.098
R/√(n ω̂²) = −0.83

So the log-normal fits better, but not by much: we'd see fluctuations at least that big 41% of the time if the two were equally good

Fitting a log-normal to the complete data

[Figure: survival function of weblog in-degree, late 2003, on log-log axes (in-degree 1 to 1000, survival 5e-04 to 5e-01), with the log-normal fitted to the complete data]

Cosma Shalizi In-degreeSo, You Think You Have a Power Law? Have a reference distribution, CDF F0 (or just a reference sample) and a comparison sample y1,... yn Construct relative data

ri = F0(yi ) relative CDF: −1 G(r) = F (F0 (r)) relative density f F −1 r g r ( 0 ( )) ( ) = −1 f0(F0 (r))

Power Laws: What? So What? Estimating the Exponent Bad Practices Estimating the Scaling Region Better Practices Goodness-of-Fit No Really, So What? Testing Against Alternatives References Visualization Visualization

Beyond the log-log plot: Handcock and Morris's relative distribution [14, 15] Compare two whole distributions, not just mean/variance etc.

Cosma Shalizi So, You Think You Have a Power Law? Power Laws: What? So What? Estimating the Exponent Bad Practices Estimating the Scaling Region Better Practices Goodness-of-Fit No Really, So What? Testing Against Alternatives References Visualization Visualization

Beyond the log-log plot: Handcock and Morris's relative distribution [14, 15] Compare two whole distributions, not just mean/variance etc. Have a reference distribution, CDF F0 (or just a reference sample) and a comparison sample y1,... yn Construct relative data

ri = F0(yi ) relative CDF: −1 G(r) = F (F0 (r)) relative density f F −1 r g r ( 0 ( )) ( ) = −1 f0(F0 (r))

The relative data are uniform ⇔ the distributions are the same
g(r) tells us where and how the distributions differ
Can estimate G(r) by the empirical CDF of the r_i
Can estimate g(r) by non-parametric density estimation on the r_i
Invariant under any monotone transformation of the data (multiplication, taking logs, etc.)
Related to Neyman's smooth test of goodness-of-fit
Can adjust for covariates flexibly [15]
R package: reldist, from CRAN
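A minimal sketch of the construction (the reldist package does this properly; names here are illustrative), using a power-law CDF as the reference F_0. Since the demo sample is actually drawn from the reference, the relative data should look uniform:

```python
import random

def relative_data(sample, f0_cdf):
    """Map each observation through the reference CDF; uniform iff sample ~ reference."""
    return [f0_cdf(y) for y in sample]

random.seed(4)
alpha, xmin = 2.5, 1.0
pareto_cdf = lambda x: 1 - (x / xmin) ** (1 - alpha)

# comparison sample drawn from the reference power law itself
y = [xmin * (1 - random.random()) ** (-1 / (alpha - 1)) for _ in range(2000)]
r = relative_data(y, pareto_cdf)

# crude uniformity check: empirical CDF of the relative data vs. the diagonal
r.sort()
d = max(abs(ri - (i + 1) / len(r)) for i, ri in enumerate(r))
```

With a real comparison sample, deviations of the r_i from uniformity (estimated by a histogram or kernel density on [0, 1]) show where the two distributions differ.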

Relative Distribution with Power Laws

1 Estimate the power-law distribution from the data

2 Use that as the reference distribution

[Figure: relative density of weblog in-degrees against the fitted power law — reference proportion from 0 to 1 (corresponding in-degrees 270, 300, 340, 410, 560, 2100 marked along the top axis), relative density from 0 to 1.4]

How Bad Is the Literature?

[10] looked at 24 claimed power laws: word frequency, protein interaction degree (yeast), metabolic network degree (E. coli), the Internet autonomous-system network, calls received, intensity of wars, terrorist attack fatalities, bytes per HTTP request, species per genus, # sightings per bird species, population affected by blackouts, sales of best-sellers, population of US cities, area of wildfires, solar flare intensity, earthquake magnitude, religious sect size, surname frequency, individual net worth, citation counts, # papers authored, # hits per URL, in-degree per URL, # entries in e-mail address books

Of these, the only clear power law is word frequency
The rest are indistinguishable from a log-normal and/or a stretched exponential; and/or a cut-off fits significantly better than a pure power law; and/or the goodness-of-fit is just horrible

What's Bad About Hallucinating Power Laws?

Scientists should not try to explain things which don't happen
e.g., a dozen years of theorizing about why animal foraging patterns should follow a power law, after [16], when they don't [17]
Decision-makers waste resources planning for power laws which don't exist

Does It Really Matter Whether It's a Power Law?

Maybe all that matters is that the distribution has a heavy tail
Probably true for Shirky
Then don't say that it's a power law
Do look at density-estimation methods for heavy-tailed distributions [18, 19]:

1 a data-independent transformation from [0, ∞) to [0, 1]
2 a nonparametric density estimate on [0, 1]
3 the inverse transform
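A sketch of that three-step recipe, assuming the transformation T(x) = x/(1+x) and a Gaussian kernel; [18, 19] discuss better choices of transform and bandwidth, and all names here are illustrative:

```python
import math
import random

def heavy_tail_density(sample, bandwidth=0.02):
    """Density estimate on [0, inf): transform by T(x) = x/(1+x),
    kernel-smooth on [0, 1], then change variables back."""
    t = [x / (1 + x) for x in sample]
    n = len(t)
    norm = n * bandwidth * math.sqrt(2 * math.pi)

    def kde(u):  # Gaussian kernel density estimate on the transformed scale
        return sum(math.exp(-0.5 * ((u - ti) / bandwidth) ** 2) for ti in t) / norm

    def density(x):
        jacobian = 1.0 / (1 + x) ** 2  # dT/dx, for the change of variables
        return kde(x / (1 + x)) * jacobian

    return density

random.seed(5)
# heavy-tailed sample on [0, inf): shifted Pareto with tail index 1.5
sample = [(1 - random.random()) ** (-1 / 1.5) - 1 for _ in range(2000)]
f = heavy_tail_density(sample)
```

The transform compresses the infinite tail into [0, 1], where ordinary kernel smoothing behaves, and the Jacobian factor converts the estimate back to the original scale.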

The Correct Line

1 Lots of distributions give straightish log-log plots

2 Regression on log-log plots is bad; don't do it, and don't believe those who do it.

3 Use maximum likelihood to estimate the scaling exponent

4 Use goodness of fit to estimate the scaling region

5 Use goodness-of-fit tests to check goodness of fit

6 Use Vuong's test to check alternatives

7 Ask yourself whether you really care

References

[1] Herbert A. Simon. On a class of skew distribution functions. Biometrika, 42:425–440, 1955. URL http://www.jstor.org/pss/2333389.
[2] Yuji Ijiri and Herbert A. Simon. Skew Distributions and the Sizes of Business Firms. North-Holland, Amsterdam, 1977. With Charles P. Bonini and Theodore A. van Wormer.
[3] William J. Reed and Barry D. Hughes. From gene families and genera to incomes and Internet file sizes: Why power laws are so common in nature. Physical Review E, 66:067103, 2002. doi: 10.1103/PhysRevE.66.067103.
[4] B. A. Maguire, E. S. Pearson, and A. H. A. Wynn. The time intervals between industrial accidents. Biometrika, 39:168–180, 1952. URL http://www.jstor.org/pss/2332475.
[5] Clay Shirky. Power laws, weblogs, and inequality. In Mitch Ratcliffe and Jon Lebkowsky, editors, Extreme Democracy, forthcoming. URL http://www.shirky.com.writings/powerlaw_weblog.html.
[6] Henry Farrell and Daniel Drezner. The power and politics of blogs. Public Choice, 134:15–30, 2008. URL http://www.utsc.utoronto.ca/~farrell/blogpaperfinal.pdf.
[7] A. N. M. Muniruzzaman. On measures of location and dispersion and tests of hypotheses in a Pareto population. Bulletin of the Calcutta Statistical Association, 7:115–123, 1957.
[8] H. L. Seal. The maximum likelihood fitting of the discrete Pareto law. Journal of the Institute of Actuaries, 78:115–121, 1952. URL http://www.actuaries.org.uk/files/pdf/library/JIA-078/0115-0121.pdf.
[9] B. M. Hill. A simple general approach to inference about the tail of a distribution. Annals of Statistics, 3:1163–1174, 1975. URL http://projecteuclid.org/euclid.aos/1176343247.
[10] Aaron Clauset, Cosma Rohilla Shalizi, and M. E. J. Newman. Power-law distributions in empirical data. SIAM Review, 51:661–703, 2009. URL http://arxiv.org/abs/0706.1062.
[11] David Pollard. Convergence of Stochastic Processes. Springer Series in Statistics. Springer-Verlag, New York, 1984. URL http://www.stat.yale.edu/~pollard/1984book/.
[12] Quang H. Vuong. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57:307–333, 1989. URL http://www.jstor.org/pss/1912557.
[13] Hwan-sik Choi and Nicholas M. Kiefer. Differential geometry and bias correction in nonnested hypothesis testing. Online preprint, 2006. URL http://www.arts.cornell.edu/econ/kiefer/GeometryMS6.pdf.
[14] Mark S. Handcock and Martina Morris. Relative distribution methods. Sociological Methodology, 28:53–97, 1998. URL http://www.jstor.org/pss/270964.
[15] Mark S. Handcock and Martina Morris. Relative Distribution Methods in the Social Sciences. Springer-Verlag, Berlin, 1999.
[16] G. M. Viswanathan, V. Afanasyev, S. V. Buldyrev, E. J. Murphy, P. A. Prince, and H. E. Stanley. Lévy flight search patterns of wandering albatrosses. Nature, 381:413–415, 1996. URL http://polymer.bu.edu/hes/articles/vabmps96.pdf.
[17] Andrew M. Edwards, Richard A. Phillips, Nicholas W. Watkins, Mervyn P. Freeman, Eugene J. Murphy, Vsevolod Afanasyev, Sergey V. Buldyrev, M. G. E. da Luz, E. P. Raposo, H. Eugene Stanley, and Gandhimohan M. Viswanathan. Revisiting Lévy flight search patterns of wandering albatrosses, bumblebees and deer. Nature, 449:1044–1048, 2007. doi: 10.1038/nature06199. URL http://polymer.bu.edu/hes/articles/epwfmabdrsgv07.pdf.
[18] Natalia M. Markovitch and Udo R. Krieger. Nonparametric estimation of long-tailed density functions and its application to the analysis of World Wide Web traffic. Performance Evaluation, 42:205–222, 2000. doi: 10.1016/S0166-5316(00)00031-6.
[19] Natalia Markovich. Nonparametric Analysis of Univariate Heavy-Tailed Data: Research and Practice. John Wiley, New York, 2007.