Random Walk Models of Sparse Graphs

RANDOM WALK MODELS OF SPARSE GRAPHS Peter Orbanz · Columbia University Collaborators Benjamin Reddy · Columbia Daniel M. Roy · Toronto Balázs Szegedy · Alfréd Rényi Institute GRAPH-VALUED DATA Peter Orbanz 2 / 13 EXCHANGEABLE GRAPHS π random graph exchangeable m π adjacency matrix jointly exchangeable Representation theorem (Aldous, Hoover) Any exchangeable graph can be sampled as: U 1. Sample measurable and symmetric function 0 Ui j 0 2 Θ:[0; 1] [0; 1] Ui 2. Sample U1; U2;::: ∼iid Uniform[0; 1] Uj 3. Sample edge i,j ∼ Bernoulli(Θ(Ui; Uj)) 1 1 Peter Orbanz 3 / 13 CONVERGENCE Law of large numbers (Kallenberg, ’99; Lovász & Szegedy, ’06) −−−!weakly ≡ −−−!weakly Lifting Theorem (O. & Szegedy, 2011) There exists a measurable mapping ^ ^ ^ ξ : Wb W such that ξ(θ) 2 [θ]≡ for all θ 2 Wb : set of equivalence classes Peter Orbanz 4 / 13 APPLICATIONS IN STATISTICS 2 Model ⊂ fPθ j θ :[0; 1] ! [0; 1] measurable g Semiparametric Estimate functional of θ Bickel, Chen & Levina (2011) Nonparametric (Bayesian) Prior on θ Lloyd et al (2012) Nonparametric ML ML estimator of θ Wolfe & Olhede (2014) Gao, Lu & Zhou (2014) Parametric e.g. ERGMs (too many) Peter Orbanz 5 / 13 APPLICATIONS IN STATISTICS 2 Model ⊂ fPθ j θ :[0; 1] ! [0; 1] measurable g Semiparametric Estimate functional of θ Bickel, Chen & Levina (2011) Nonparametric (Bayesian) Prior on θ Lloyd et al (2012) Nonparametric ML ML estimator of θ Wolfe & Olhede (2014) Gao, Lu & Zhou (2014) Parametric e.g. ERGMs (too many) Peter Orbanz 5 / 13 MISSPECIFICATION PROBLEM Dense vs sparse A random graph is called 2 I dense if #edges = Ω((#vertices) ). I sparse if #edges = O(#vertices). Most real-world graph data ("networks") is sparse. Exchangeable graphs are dense (or empty) Z # edges observed in G p = Θ(x; y)dxdy ^p = n = p · Ω(n2) = Ω(n2) n # edges in complete graph Sparsiﬁcation (Bollobas & Riordan) Generating sparse graphs: Choose decreasing rate function ρ : N ! R+ 1. U1; U2;::: ∼iid Uniform[0; 1] 2. edge(i; j) ∼ Bernoulli(ρ(#vertices)θ(Ui; Uj)) Peter Orbanz 6 / 13 SAMPLING FROM AN EXCHANGEABLE MODEL Input data Estimate of parameter function Exchangeable sampling 1 Rate ρ(n) = Rate ρ(n) = 1 (rate ρ(n) = 1) log(n) n Peter Orbanz 7 / 13 SAMPLING IN SPARSE GRAPHS Sampling with random walks I Independent subsampling only adequate in dense case I Sparse case: Need to follow existing link structure I This means edges are not conditionally independent A simple random walk model For n = 1;:::; #edges: 1. Choose a vertex vn uniformly at random from the current graph. 2. With probability α, create a new vertex attached to vn. 3. With probability 1 − α: Start a simple random walk at vn. Stop after Poisson(λ) + 1 steps. Connect vn to terminal vertex. Peter Orbanz 8 / 13 PARAMETER EFFECTS Increasing probability of new vertex Increasing probability of long random walk Peter Orbanz 9 / 13 PARAMETER ESTIMATION (with Benjamin Reddy) Probability of random walk in graph G degree matrix graph Laplacian 1 k X −λ λ −1 k+1 −λ∆ P(v ! ujG; λ) = e (D A) = (I − ∆)e k! vu vu k=0 Poisson weight adjacency matrix Parameter estimation I Case 1: History observed N−1 Y P(GN jα; λ) = P(Gn+1jGn; α; λ) n=1 Maximum likelihood estimator derivable from random walk probability. I Case 2: Only ﬁnal graph observed Impute history by sampling. Importance sampling possible. Peter Orbanz 10 / 13 RESULTS:CARTOON Input data Reconstruction exchangeable model preferential attachment model random walk model Peter Orbanz 11 / 13 INVARIANCE UNDER A GROUP extreme points: ergodic measures all invariant measures (e.g. all graphs sampled from graphons) (e.g. all exchangeable graph laws) every invariant measure is mixture of extreme points Consequences I Representation theorem (Aldous-Hoover, de Finetti, etc). I Invariant statistical models are subsets of extreme points. I Law of large numbers: Convergence to extreme points. Open problem What is a statistically useful notion of invariance in sparse graphs? Peter Orbanz 12 / 13 CONCLUSIONS Approach Statistical theory of graphs as statistical theory of random structures. What we have I Exchangeable graphs: Some technical problems aside, statistics carries over. I “Dependent” edge case: I Inference possible in principle. I No general theory. I Excellent prediction results with very simple model. I We cannot prove much about the model. Well-studied invariance properties 1. Exchangeability: Only dense case. 2. Involution invariance: Too weak for statistical purposes. Peter Orbanz 13 / 13.

Random Walk Models of Sparse Graphs

Modern Discrete Probability I

Markov Chain Monte Carlo Estimation of Exponential Random Graph Models

Exploring Healing Strategies for Random Boolean Networks

Percolation Threshold Results on Erdos-Rényi Graphs

Probability on Graphs Random Processes on Graphs and Lattices

3 a Few More Good Inequalities, Martingale Variety 1 3.1 From

Random Boolean Networks As a Toy Model for the Brain

An Introduction to Exponential Random Graph (P*) Models for Social Networks

A Review on Statistical Inference Methods for Discrete Markov Random Fields Julien Stoehr, Richard Everitt, Matthew T

A Review on Statistical Inference Methods for Discrete Markov Random Fields Julien Stoehr

The Random Graph

Critical Points for Random Boolean Networks James F