Applications of Dense Graph Limits in Probability and Statistics

Applications of dense graph limits in probability and statistics Sourav Chatterjee (Courant Institute, NYU) Based on joint works with Persi Diaconis and S. R. S. Varadhan Sourav Chatterjee Dense graph limits in probability and statistics Real world networks I The last decade has seen an explosion in the study of real world networks, e.g. rail and road networks, biochemical networks, data communication networks such as the Internet, and social networks. I Concerted interdisciplinary effort to develop new mathematical network models to explain characteristics of observed real world networks, such as power law degree behavior, small world properties, and a high degree of clustering. I Clustering/transitivity/reciprocity refers to the prevalence of triangles in a graph. I That is, a friend of a friend is more likely to be a friend than a random individual. I Most of the popular network models, such as the preferential attachment and the configuration models, are locally tree-like and thus do not model the transitivity observed in real social networks. Sourav Chatterjee Dense graph limits in probability and statistics Exponential Random Graphs I In the social science literature, efforts to mathematically model transitivity have centered around the so-called Exponential Random Graph Models (ERGM), also called p∗ models. I Statistically, ERGM's are exponential families of distributions on the space of graphs on a given number of vertices. I The sufficient statistics are usually simple graph parameters, such as the number of edges, number of triangles, etc. I Notable early papers due to Holland and Leinhardt (1981), Frank and Strauss (1986). General development in the book of Wasserman and Faust. Recent progresses in Handcock (2003), Snijders et. al. (2006), Park and Newman (2004, 2005), etc. Sourav Chatterjee Dense graph limits in probability and statistics Example I Consider the model on simple graphs with n vertices, β p (G) = exp β E + 2 ∆ − n2 (β ; β ) β1,β2 1 n n 1 2 where E, ∆ denote the number of edges and triangles in the graph G, and n is the normalizing constant. I The normalization of the model ensures non-trivial large n limits. Without scaling, for large n, almost all graphs are empty or full. I This model is studied by Strauss (1986), Park and Newman (2004, 2005), Häggstromand Jonasson (1999), and many others. Sourav Chatterjee Dense graph limits in probability and statistics Challenges I For thirty years, nothing much could be done mathematically with these models. For example, no formula for n, no rigorously proven information about qualitative behavior. I Approximation of the normalizing constant, necessary for evaluation of maximum likelihood estimates, is usually done with the aid of Markov Chain Monte Carlo. I But: Bhamidi, Bresler and Sly (2008) have shown that, depending on β1 and β2, I either the model behaves like an Erd}os-Rényirandom graph (the uninteresting case), I or the usual MCMC algorithms take exponentially long time to converge. I Alternative (widely used) approach via pseudo-likelihood methods. But: properties are poorly understood, and appreciably higher variability than MLE. Sourav Chatterjee Dense graph limits in probability and statistics An asymptotic formula for the edge-triangle model Recall the model: β p (G) = exp β E + 2 ∆ − n2 (β ; β ) ; β1,β2 1 n n 1 2 where E and ∆ are the number of edges and triangles in G and n is the normalizing constant. Theorem (Chatterjee & Diaconis, 2011) There is a negative constant −c, depending on β1, such that when −c < β2 < 1, lim n(β1; β2) n!1 β u β u3 1 1 = sup 1 + 2 − u log u − (1 − u) log(1 − u) : 0≤u≤1 2 6 2 2 There is another negative constant −d, again depending on β1, such that the formula is not valid if β2 < −d. Sourav Chatterjee Dense graph limits in probability and statistics The symmetric phase and symmetry breaking I The region where the formula is valid is called the symmetric phase in our paper. Partially identified in an earlier work of Chatterjee & Dey (2009). I In the symmetric phase, we prove that the model behaves essentially like an Erd}os-Rényigraph (i.e. independent edges) with edge-probability u∗, where u∗ is the value of u that solves the maximization problem in the theorem. I When β2 < −d, we prove that the model stops behaving like an Erd}os-Rényimodel and enters the region of broken symmetry. I We do not understand this region very well. We can only prove that when β2 is large and negative, the random graphs generated from the model look approximately like bipartite graphs. Sourav Chatterjee Dense graph limits in probability and statistics Large negative β2 2 5 6 7 8 10 13 14 17 18 20 1 349 11 12 15 16 19 Figure: A simulated realization of the exponential random graph model on 20 nodes with edges and triangles as sufficient statistics, where β1 = 120 and β2 = −400. (Picture by Sukhada Fadnavis.) Sourav Chatterjee Dense graph limits in probability and statistics Degeneracy I Researchers, e.g. Handcock (2003), have observed that models like the edge-triangle model tend to exhibit a certain degeneracy: As the parameter values vary, the random graphs are either very sparse, or almost complete, skipping all intermediate structures. I We have a theorem that gives a proof of this phenomenon. Sourav Chatterjee Dense graph limits in probability and statistics Rigorous result about degeneracy Theorem (Chatterjee & Diaconis, 2011) Let Gn be a random graph from the edge-triangle model. Fix β1 < 0. Let eβ1=2 1 c := ; c := 1 + : 1 β =2 2 1 + e 1 β1 Suppose jβ1j is so large that c1 < c2. Let e(Gn) be the number of n edges in Gn and let f (Gn) := e(Gn)= 2 be the edge density. Then there exists q(β1) such that if β2 < q(β1), then as n ! 1, P(f (Gn) > c1) ! 0; and if β2 > q(β1), then P(f (Gn) < c2) ! 0. Remark. The difference in the values of c1 and c2 can be quite striking even for relatively small values of β1. For example, β1 = −10 gives c1 ' 0:007 and c2 = 0:9. Sourav Chatterjee Dense graph limits in probability and statistics Phase transitions and degeneracy 1.0 1.0 1.0 0.9 0.9 0.8 0.8 0.8 0.7 0.7 0.6 0.6 0.6 0.5 0.4 0.5 0.4 0.4 0.2 0.3 0.0 0.5 1.0 1.5 0.0 0.5 1.0 1.5 0.0 0.5 1.0 1.5 (b) β1 = −0:35 (a) β1 = −0:45 (b) β1 = −0:8 Figure: Plot of asymptotic edge density (on y-axis) vs. β2 (on x-axis) for three different values of β1. More progress on this recently by Charles Radin and Mei Yin. Sourav Chatterjee Dense graph limits in probability and statistics General formula for n Theorem (Chatterjee & Diaconis, 2011) For any β1 and β2, lim n(β1; β2) n!1 β ZZ β ZZZ = sup 1 f (x; y)dxdy + 2 f (x; y)f (y; z)f (z; x)dxdydz f 2 6 1 ZZ − f (x; y) log f (x; y) + (1 − f (x; y)) log(1 − f (x; y)) dxdy ; 2 where the supremum is over all measurable f : [0; 1]2 ! [0; 1] satisfying f (x; y) = f (y; x) and the integrals are from 0 to 1. Remarks. (a) The symmetric phase is where the maximizer is a constant function. (b) There is a general version of this theorem in our paper which applies to essentially all exponential random graph models. Sourav Chatterjee Dense graph limits in probability and statistics First step: counting graphs with a given property n(n−1)=2 I 2 simple graphs on n vertices. I Question: Given a property P and an integer n, roughly how many of these graphs have property P? 3 I For example, P may be: #triangles ≥ tn , where t is a given constant. I How this helps: The ability to count the number of graphs with a given number of triangles and a given number of edges will lead to the evaluation of the normalizing constant in the edge-triangle model. I To make any progress, need to assume some regularity on P. For example, we may demand that P be continuous or at least measurable with respect to some metric. I What metric? What space? Sourav Chatterjee Dense graph limits in probability and statistics An abstract topological space of graphs I Beautiful unifying theory developed by Laszlo Lovászand coauthors V. T. Sós,B. Szegedy, C. Borgs, J. Chayes, K. Vesztergombi, A. Schrijver and M. Freedman in the last six years. Related to earlier works of Aldous, Hoover, Kallenberg. I Let Gn be a sequence of simple graphs whose number of nodes tends to infinity. I For every fixed simple graph H, let hom(H; G) denote the number of homomorphisms of H into G (i.e. edge-preserving maps V (H) ! V (G), where V (H) and V (G) are the vertex sets). I This number is normalized to get the homomorphism density hom(H; G) t(H; G) := : jV (G)jjV (H)j This gives the probability that a random mapping V (H) ! V (G) is a homomorphism. Sourav Chatterjee Dense graph limits in probability and statistics Abstract space of graphs contd. I Suppose that t(H; Gn) tends to a limit t(H) for every H. I Then Lovász& Szegedy proved that there is a natural \limit object" in the form of a function f 2 W, where W is the space of all measurable functions from [0; 1]2 into [0; 1] that satisfy f (x; y) = f (y; x) for all x; y.

Applications of Dense Graph Limits in Probability and Statistics

Packing and Covering Dense Graphs

Embedding Trees in Dense Graphs

Fractional Arboricity, Strength, and Principal Partitions in Graphs and Matroids

Limits of Graph Sequences

The $ K $-Strong Induced Arboricity of a Graph

Sparse Exchangeable Graphs and Their Limits Via Graphon Processes

4. Elementary Graph Algorithms in External Memory∗

Sparse Graphs

Finding Bipartite Subgraphs Efficiently

Secret-Sharing Schemes for Very Dense Graphs

Dependent Random Choice

Percolation on Dense Graph Sequences