Random Geometric Graphs Efficiently

arXiv:cond-mat/0203026v2 [cond-mat.stat-mech] 8 May 2002 rtosi h cetfccmuiy[] trans- [6], collab- community 5], scientific 4, the [3, very in WWW on orations data the where like 2] networks [1, large years exploded five has last networks the over complex in interest The Introduction 1 graph scaling, graphs, bi-partitioning. random transitions, WORDS KEY included. are PACS bi-partitioning graph for Insights random relevant dimensionality. standard infinite the for from even graphs graphs, for different the distinctly cluster. expression that shows are largest analytical which the coefficient an cluster of derive size numerically present. We the found are examining is only by connectivity points and critical adjacent The dimensionality geometric is between arbitrary a vertex edges in each of which coordinates space in random graphs assigned analyse We Abstract ∗ mi:[email protected] Email: 51.n 46.k 89.75.Da 64.60.Ak, 05.10.Ln, : ewrs eclto,phase percolation, Networks, : yikIsiu,SUOes Universitet SDU–Odense Institut, Fysisk eprDall Jesper admGoercGraphs Geometric Random apse 5 K53 dneM Odense DK–5230 55, Campusvej eray1 2008 1, February ∗ n ihe Christensen Michael and Denmark 1 ayhg lseigadlwaeaept length, path neces- average low the and argue clustering have high could sary lattices One high-dimensional models networks. that world poor are real forms many pure of their lat- in and models lattices. tice graphs in like random by much individually, and localness, However, of sites, degree two small high av- any a graphs, short between random by distances characterized Like erage are networks 22]. character- world 21, world small [8, possess manyistics networks that shown the consistently of been have has It topology physi- from and cists. primarily 20], attention, 19, much attracted 18, [11, growth 17], mod- network of 15]. spectrum at [14, the be els of to end said other often the therefore are myopic Lattices a usually world. reflecting are lattice neighbours, a nearest inter- 11, in between the [4, sites graphs the random between to place actions contrast In taking on 13]. been research 12, has theoretical graphs intense fortyrandom [10], and Erdös than ago since more years Ever work groundbraking Rényi’s [9]. networks complex etc. accessible. [8] become collaborations have actor movie [7], portation rpriso elntok ierbsns [16, robustness like networks real of Properties model to used often are graphs Random 0.7 though this has not been explored much [23]. In N = 103 N = 104 the current paper we provide results on high- 0.6 N = 105 dimensional systems. N = 106 A random geometric graph (RGG) is a ran- 0.5 N = 107 8 dom graph with a metric. It is constructed by N = 10 0.4 theory

assigning each vertex random coordinates in a d- G dimensional box of volume 1, i.e. each coordinate 0.3 0.98 0.99 1 1.01 1.02 is drawn from a uniform distribution on the unit 0.04 0.2 interval. RGGs have been used sporadically in 0.02 real networks modeling [24] and extensively in 0.1 continuum percolation [25, 26, 27, 28, 29], but 0 0 0.75 1 1.25 1.5 1.75 almost exclusively in two and three dimensions. α Although RGGs are the continuum version of lattices, they deserve some attention of their Figure 1: The size of the largest cluster in random own, since percolating continuum systems dis- graphs as a function of the connectivity. Note that play behaviour that lattices are incapable of [30]. for N > 106 the Monte Carlo data is almost indis- In addition, the connectivity in RGGs can be in- tinguishable from the theoretical result in Eq. (3). creased in a more natural way than by adding Error bars are not shown since they are in all cases new bonds randomly in lattices. less than the width of the lines. Inset: A closer look at the percolation threshold α = 1. Recently, continuum percolation has been c used in the study of the stretched exponential de- cay of the correlation function in random walks ical region. Furthermore, an expression for the on fractals and the conjectured relation to relax- cluster coefficient, a quantity that has attracted ation in complex systems [31]. However, con- much interest in network theory recently, is de- tinuous systems in general and RGGs in par- rived. Results relevant for graph bi-partitioning ticular are relevant whenever we need a multi- are established. Finally, we discuss how to im- dimensional system with a metric, as for exam- plement random geometric graphs efficiently. ple when modeling the spread of diseases [32]. The layout of this paper is as follows. In Sec- In this paper we study RGGs in arbitrary di- tion 2 and 3 we describe random graphs and ran- mensions. In low dimensions the systems are dom geometric graphs, respectively. In Section 4 dominated by local interactions. For higher di- we present our results, and Section 5 contains the mensions RGGs are usually believed to approach details regarding the implementation. Finally, in standard random graphs, which we show is true Section 6 we sum up. only in some respects. We focus on ‘phase transitions’ [13, 33, 34] at the percolation threshold by looking at the size of the largest cluster, 2 Random Graphs and we determine how the value of the critical parameter in RGGs approaches that of random Random graphs consist of N vertices graphs as the dimension increases. We also ex- (points/sites) and K edges (lines) where tract the distribution of cluster sizes in the crit- each possible edge is present with probability p,

2 1 i.e. K = pN(N 1)/2. To keep the discussion 1 − independent of the system size N, graphs are often characterized by the connectivity (degree) α = 2K/N = pN, i.e. the average number of 0.75 connections per vertex, instead of K or p. As the connectivity increases clusters of vertices 0.5 appear, where a cluster consists of all vertices linked together by edges, directly or indirectly.

The size of the largest cluster in the macro- 0.25 scopic limit N can be calculated analyti- →∞ cally [10, 12]. It is NG(α), where

0 n 1 0 0.25 0.5 0.75 1 1 ∞ n − α n G(α) = 1 (αe− ) . (1) α n! − n=1 Figure 2: A 2D random geometric graph with N = X 500 and α = 5. The graph is bi-partitioned—see Sec- By the use of [12] tion 4. There are no edges across the boundaries, i.e. the boundary conditions are open, not continuous. n 1 ∞ n y = − xn x = yey (2) n! ⇔ nX=1 N, which is exactly the limit in which we are in- we can invert Eq. (1), getting terested. The critical connectivity αc for graphs with arbitrary random degree distribution pk has 1 α(G)= log(1 G). (3) recently been derived by other techniques than −G − those originally leading to Eq. (1) [4, 36, 37]. Un- fortunately, we cannot use these results in con- from which it is trivial to show that αc = 1. With Eq. (3) it is an easy task to plot the frac- nection with random geometric graphs, as will tion of vertices in the largest cluster—the giant become clear in the next section. component—as done in Fig. 1, where we see the prototype of a phase transition in combinatorial problems. 3 Random Geometric Graphs In random graphs the probability distribution A d-dimensional random geometric graph of edges p is binomial k (RGG) is a graph where each of the N vertices is k α assigned random coordinates in the box [0, 1]d, N k N k α e− pk = p (1 p) − , (4) and only points ‘close’ to each other are con- k ! − ≃ k! nected by an edge. The degree distribution of where the approximation resulting in the Pois- a RGG with average connectivity α is therefore son distribution is valid for large systems sizes given by Eq. (4) as well. However, a RGG is 1 a special kind of random graph with properties From here on we consider N ≃ N − 1 in accordance with the literature [4, 35], since we are only investigating not captured by the theoretical tools mentioned large systems. above. For one thing, the probability that three

3 0.5 between their centers is < 2r, i.e. if the spheres overlap. Since the total volume of our box is 1, 0.4 the probability that two arbitrarily chosen vertices are connected is equal to the volume of a R 10D 0.3 sphere with radius R = 2r. In continuum perco- 8D lation theory this volume is denoted the excluded 6D d 0.2 volume Vex, where Vex = 2 V in a RGG. The excluded volume is the basic quantity of interest critical distance 4D because it is directly related to the connectivity 0.1 2D α = Np = NVex, (6) 0 3 4 5 6 10 10 10 10 N/α from which it is clear why the connectivity is frequently called the total excluded volume of Figure 3: The critical distance in random geometric the system. Eqs. (5) and (6) give us graphs in various dimensions. Points within this distance of each other are connected by an edge. The 1 critical distance is equivalent to the radius R of the 1 α d + 2 d R = Γ . (7) excluded volume associated with each point. √ π N 2 vertices are cyclically connected is different in Fig. 3 shows the radius R of the excluded vol- random graphs and RGGs, regardless of the de- ume as a function of N/α = 1/p = 1/Vex. R gree distribution of the random graph. decreases monotonically: for a given connecti- RGGs are sometimes named spatial graphs [8]. vity α the spheres have to become smaller when Fig. 2 illustrates a RGG in 2D. As in lattices, more vertices are added to the graph. different boundary conditions can be applied. Eq. (7) provides us with the required relation We will see that toroidal (continuous) boundary between α and R when creating a RGG. The conditions make a vital difference compared to distance between every pair of vertices must be having open boundary conditions. calculated, and an edge is added if the distance The volume of a d-dimensional (hyper)sphere is less than R. Thus, it seems unavoidable to 2 with radius r is have a runtime of O(N ) making it unfeasible to investigate as large systems as with random πd/2rd graphs—see Fig. 1—where the number of cal- V = , (5) sphere d+2 culations for a given α needed to create all the Γ( 2 ) edges is O(N). To overcome this obstacle we where Γ(x) is the gamma function. This volume have designed a data structure which is described is needed in order to find the edges in RGGs. in Section 5, with a runtime of O(N β) where To ‘visualise’ a RGG in general, one can think β 1.3. This allows us to study RGGs with ≃ of a box filled with small spheres with radius r up to N = 411 > 4 106 vertices, which is more · and volume V given by Eq. (5), where points than an order of magnitude larger than usually are connected by an edge only if the distance accomplished [38].

4 4 Results 1 1 0.8 2D 0.8 3D 0.6 0.6

In our simulations of RGGs we define αc to be G G 0.4 0.4 the lowest connectivity at which the fraction of −15 0 15 0.2 x 0.2 vertices in the largest cluster is > 0 in the macro- 0 0 3.5 4 4.5 5 2 2.5 3 3.5 scopic limit. We make the bold claim that the α α systems we are able to analyse consist of enough 1 1 points to make the critical connectivity almost as 0.8 4D 0.8 5D sharply defined as in Fig. 1. However, our main 0.6 0.6 G G purpose is not to derive high precision percola- 0.4 0.4 tion thresholds. Instead, we are more interested 0.2 0.2 0 0 in the critical connectivity as a function of the 1.5 2 2.5 3 1.5 2 2.5 3 α α dimension of the RGGs. In this paper we express our threshold values Figure 4: The average fraction of vertices in the in terms of α. Other popular choices are the frac- largest cluster for various system sizes N (see the tional volume s occupied by the spheres [30] or legend in Fig. 5) in random geometric graphs with the density N of spheres. The relation between no edges across the boundaries. The inset in 2D il- these parameters at the percolation threshold is lustrates a finite size scaling—see the text. In higher dimensions the general shape of the curves as N in- d creases is nontrivial. Compare with Fig. 5. Error αc = NcVex = 2 ln(1 sc) (8) − − bars are < 10−3 for all curves and therefore omitted. (see e.g. [25] for a derivation). Usually, in continuum percolation the volume V of each sphere in the limit of infinite dimension is often as- is fixed while N is the independent variable in a sumed equivalent to a random graph, we expect system of size [0,L]d. The approach of measur- that Eq. (3) provides us with an expression for ing Nc or sc for various values of L has been used G (α). But what does Gd(α) look like for finite ∞ in both two [39] and three [38] dimensions, i.e. d? And what is the behaviour of αc(d)? How for discs and spheres, where the critical values does it approach α ( ) as d increases? These c ∞ are determined by the use of finite size scaling. are the questions addressed in this and the fol- This procedure resembles site percolation in lat- lowing section. tices. From the previous sections it is clear that Figs. 4 and 5 illustrate the average size of the we take a route closer to bond percolation in lat- largest cluster in RGGs in 2, 3, 4, and 5 dimen- tices by fixing L = 1 while tuning α for different sions with and without toroidal boundary condi- values of N. In Section 5 we describe how this tions. The curves correspond to N = 4k vertices has been carried out in practice. with k = 5, 6, ..., 11, where the larger systems display the sharpest transitions. The legend in The Size of the Largest Cluster Fig. 5 applies to all diagrams in Figs. 4 and 5. In these 8 diagrams each curve is based on 300 data Let Gd(α) denote the fraction of vertices in the points. In other words, Gd(α) is calculated in largest cluster in d dimensions. Since a RGG intervals of ∆α = 0.005 resulting in the smooth

5 1 1 10 0.8 2D 0.8 3D 0.6 0.6 G G 0.4 0.4

0.2 0.2 ) ∞

0 0 ( 3.5 4 4.5 5 2 2.5 3 3.5 c α α α 1 )− d 1 1 ( c α 0.8 4D 0.8 5D N = 45 0.6 0.6 N = 46 7 G G N = 4 0.4 0.4 N = 48 N = 49 0.2 0.2 N = 410 N = 411 0 0 0.1 1.5 2 2.5 3 1.5 2 2.5 3 2 3 4 5 6 7 8 9 10 α α Dimension d

Figure 5: Like Fig. 4 but with continuous boundary Figure 6: Scaling of the critical connectivity as a conditions. We see that the point at which the largest function of the dimension of the random geometric cluster becomes macroscopic is sharply defined and graphs reveals a power-law relation, Eq. (9). For d can immediately be determined by the eye with high 5 the data points are estimated by close inspection of≤ 10 precision (Table 1). The overall behaviour of the Fig. 5. For d> 5, αc is based on runs with N =4 graphs for higher dimensions is much closer to Fig. 1 points. Error bars are included. See Table 1. than Fig. 4 is. As d increases the α-interval where there is a significant difference between curves with different N get smaller and smaller. Error bars are From Figs. 4 and 5 we see that the contin- < 10−3 for all curves and therefore omitted. uous boundary conditions make the transition where G > 0 more abrupt, but that an estima- tion of αc does not depend much on the bound- lines in the figures. For every data set we have ary conditions if only we base our judgment on averaged over enough runs for error bars to be large enough systems. This is confirmed in the completely negligible. inset of Fig. 4, where αc = 4.53 is obtained Since continuous boundary conditions mean by finite size scaling, i.e. plotting G(x) where addition of extra edges, the size of the largest x = N 1/ν (α α ). However, it is clearly easier − c component G(α) obviously grows faster in Fig. 5 to make precise estimates of the critical connec- than in Fig. 4, especially in the smaller systems. tivity with than without continuous boundary These relatively few extra edges make a decisive conditions. We note in passing that the expo- difference, connecting vertices not already in the nent ν = 3 is equal to the value of ν found in same cluster. Since toroidal systems are mod- random graphs [13]. els of bulk systems, G is much less N-dependent in that case. However ‘unphysical’ RGGs with The Critical Connectivity open boundaries may seem, they are the most popular RGG version in the literature. Conse- With numerically obtained knowledge of G(α), quently, we consider them alongside the contin- it is possible to extract αc. The procedure is uous case. simple. By inspection of Fig. 5 we can estimate

6 1 d 2 3 4 5 6 7 8 10 α = 2.1 1 10 α αc 4.52 2.74 2.06 1.72 1.51 1.39 1.30 α = 2.4 = 2.6 0 10 α −1 0.01 0.01 0.02 0.02 0.02 0.02 0.02 = 2.7 10 ± α = 3.0 −3 −1 α = 3.3 10 Table 1: The critical connectivity αc in random geo- 10 metric graphs of dimension d with continuous bound- 1 10 100 −2 ary conditions. The data are plotted in Fig. 6. The 10 estimated errors in αc in the last row are rather con- −3 10 servative. Prob(cluster size)

−4 10 αc for d 5. To obtain further data points we

≤ 10 −5 have run our algorithm on RGGs with N = 4 10 0 200 400 600 800 1000 for systems of larger dimensions as well. Though cluster size this results in increased runtime per graph, the results get more homogeneous and fewer runs Figure 7: The distribution of cluster sizes in 3D are needed in order to get a decent estimate of random geometric graphs with N = 1000 vertices in the vicinity of the critical connectivity αc = 2.74. Gd(α). Our findings presented in Table 1 and The inset shows that for α αc the cluster sizes are Fig. 6 strongly suggest that given by a power-law. For≃ each value of α the data points are based on 106 graphs. γ α (d)= α ( )+ Ad− , (9) c c ∞ The inset illustrates the scale free power-law where αc( ) = 1, γ = 1.74(2) and A = 11.78(5). ∞ distribution at α = 2.6. Right below α , clusters As expected, Eq. (9) predicts that αc( ) is equal c ∞ of all sizes can be encountered. The small hump to αc in random graphs, confirming that RGGs and random graphs become more and more sim- at large cluster sizes is always present because ilar as d increases. However, when we derive the the clusters cannot contain more than all of the cluster coefficient, we will see that this is not vertices. The clusters pile up when their size true in all respects. approaches this boundary, in this case a cluster Finally, we note that our findings are in ac- size of 1000, just below the inevitable cut-off. cordance with the most precise estimates that Our simulations show that for α significantly below α the distribution is approximately ex- we know of: αc = 4.51223(5) [29] and αc = c 2.734(6) [38] in 2D and 3D, respectively, ob- ponential. As the connectivity increases the dis- tained by the use of finite size scaling. For d> 3 tribution becomes power-law-like. As α is fur- we have not been able to find any estimates of ther increased the distribution is separated in two parts; there are no clusters of medium size, αc to compare with [40]. only the largest macroscopic cluster and a few The Distribution of Cluster Sizes small ones around it. We have observed this overall behaviour in all our tests of the distri- Having examined the size of the largest cluster bution of cluster sizes in various dimensions. and the critical connectivity, we now look at the Fig. 7 shows our data in 3D. For α = 2.1 distribution of cluster sizes in RGGs. ( ) the data points lie on an almost straight line ·

7 indicating an exponential distribution. Increas-

−1 ing the connectivity to α = 2.4 ( ) results in a 10 △ broader distribution that is no longer exponen- d −2 tial. Right at the critical connectivity ( ) the C 10 ◦ −3 distribution flattens out. Clusters of all sizes are 10 observed. Right above αc ( ) two separate re- ⋄ −4 gions begin to materialise. Already at α = 3.3 10

−5 (⋆) the largest cluster makes it highly unlikely 10 that a cluster of medium size can be present as Cluster Coefficient −6 well. The distribution is cut in two. 10

0 20 40 60 80 100 The Cluster Coefficient Dimension d

In network theory the cluster coefficient C is Figure 8: The cluster coefficient C in random geo- an often calculated quantity [1, 21, 23], which metric graphs. The full line is the asymptotic solu- is defined in the following way. Let the vertices i tion, Eq. (13), valid for large d only. and j be connected directly to a common vertex k. C is then the probability that vertex i and lem. The fractional volume ‘overlap’ ρd of two vertex j are directly connected as well. From spheres only depends on the distance r between this we see that the cluster coefficient is a mea- the centers and not on any angular parts, i.e. sure of the ‘cliquishness’ of the graph. In this ρd = ρd(r). In general, the cluster coefficient section we derive C = Cd analytically in arbi- can therefore be written as trary dimensions d, showing that C decreases d 1 in an exponential fashion. Cd = ρd(r)dV. (10) Vex Vex To determine Cd we make use of the concept of Z the excluded volume Vex. If we again use the ver- In Appendix A we derive that tices i, j, and k, then i and j must both be within the excluded volume of k. Put differently, the 1 Hd(1) even d − probability that i and j are connected is equal to Cd =  (11) 3 1 the probability that two randomly chosen points  Hd( ) odd d 2 − 2 in a sphere of volume V and radius R is less ex where  than a distance R apart. In other words, given the coordinates of vertex i the probability that d/2 1 1 Γ(i) 3 i+ 2 there is an edge between i and j is equal to the Hd(x)= . (12) √π Γ(i + 1 ) 4 fraction of the excluded volume of vertex i that Xi=x 2 lies inside the excluded volume of k. By aver- When d is large Eq. (11) reduces to (see Ap- aging over all points in Vex we get the cluster pendix A) coefficient Cd. d+1 The task of calculating Cd is considerably 2 3 2 C 3 . (13) simplified by the spherical symmetry of the prob- d πd 4 ∼ r 8 The cluster coefficient is plotted in Fig. 8 ( ) d 2 3 4 5 ◦ GBP together with the asymptotic solution in Eq. (13) αc 4.52 2.84 2.275 1.99 (full line). 0.02 0.01 0.005 0.005 ± Eq. (11) shows that the cluster coefficient is GBP a purely geometric quantity depending only on Table 2: The critical connectivity αc in random geometric graphs with toroidal boundary conditions. the dimension d; neither the connectivity α nor GBP Only in 2D does αc depend noticably on N for the system size N are present. In random graphs N > 1000 (see Fig. 5). Note that without contin- GBP C = α/N, since there is per definition no corre- uous boundaries Fig. 4 shows that αc is highly lation between edges. So, in contrast to what size-dependent for d > 2. The estimated errors in GBP is usually believed, RGGs are not identical to αc in the last row are on the safe side. random graphs when d . →∞ In higher dimensions, the cluster coefficient in subsets, is minimized, is called the graph bi- RGGs becomes exceedingly small. This peculiar partitioning (GBP) problem. Fig. 2 illustrates fact can be explained by noting that the distribution of distances between two connected vertices a bi-partitioned RGG, where N/2 of the points gets more and more peaked at the maximal dis- are marked by squares, the other half being dots. tance R as d increases. This implies that if the The GBP problem of RGGs with open bound- vertices i and j are both connected to vertex k ary conditions has been tested by various heuris- in a high-dimensional space, then it is highly un- tics [41, 42, 43]. In this section we use our numer- likely that i and j are directly connected by an ical findings to establish the critical connectivity GBP edge as well. Only in low dimensions are RGGs in relation to GBP. Additionally, for α > αc dominated by small loops. On the contrary, the we argue that the cutsize E depends on N and way that a standard random graph is designed α in a simple way. implies a cluster coefficient which can only be In GBP the connectivity is critical when G = interpreted statistically, and not geometrically. 1/2. As soon as the largest cluster contains more Despite the fact that αc = 1 in both random than half of the vertices, it becomes impossible graphs and RGGs of inﬁnite dimensionality, they to bipartition the graph without violating any do not have the same topology. edges. For random graphs Eq. (3) immediately GBP gives us αc = 2 ln 2 1.386. GBP ≃ Graph Bi-partitioning In RGGs αc (d) can be extracted in the same way as αc was in Section 4. Our numerical Random geometric graphs are useful outside net- findings in RGGs with continuous boundary con- work modeling and percolation theory as well. ditions are presented in Table 2. We stress that In this section we look at RGGs in relation to the results are valid only for large N, as a closer graph bi-partitioning, a well known problem in look at Fig. 5 reveals. In 2D the average fraction combinatorial optimization. of vertices in the largest cluster is independent GBP The NP-hard problem of partitioning a graph of N only for α > αc . This means that if one with N vertices in two subsets with N/2 ver- looks at GBP in 2D with N = 1000, one cannot GBP tices each, in such a way that the cutsize E, i.e. use the value of αc in Table 2. In higher di- the number of edges between vertices in different mensions the interval around αc where Gd(α) is

9 size-dependent gets smaller and does not play a a scaling relation like [45, 46] role in relation to GBP. E N 1/ν αβ(d), (14) With open boundary conditions the picture d ∝ is messy, as Fig. 4 shows. In this case G(α) where the exponents ν and β only depend on the is highly N-dependent, and it is not possible GBP dimension of the RGG. to speak of a critical connectivity α without c The exponents in Eq. (14) can be determined specifying N. This is true despite the fact that in the following way. Given the radius R of the G(α) is an averaged quantity, i.e. for small N will excluded volume of each vertex, the cutsize must a fraction of the graphs contain a cluster with GBP be proportional to NR, since only vertices with more than N/2 vertices even when α < αc . GBP 1/2 R 2. In 2D however, all look at the vertices at one side of the partitioning curves cross at almost the same (pivotal) point, GBP plane at x = 1/2), times the average number of and it is reasonable to speak of α without i c violated edges per vertex in this region, which is specifying N. As the inset in Fig. 4 shows this d GBP proportional to NR . In other words, would lead to an estimate of αc = 4.53(1), GBP close to αc in RGGs with toroidal boundary 2 d+1 Ed N R . (15) conditions. ∝ The size of the largest cluster near αc grows If instead of R we want to express the result in GBP d so rapidly in 2D that αc = αc cannot be ruled terms of α(d) NR , we get out on the basis of our numerical data. This is ∝ 1 1 true with both open and continuous boundary 1/ν = 1 , β =1+ . (16) conditions. However, as this would imply that − d d the phase transition is of 1st order in 2D only, Since E N 2 in Eq. (15), the relation 1/ν +β = we believe that the two critical connectivities are ∝ 2 holds in arbitrary dimensions. close but not identical. Now, it is obvious that the scaling Ansatz is When bi-partitioning a RGG, it is obvious GBP reasonable only for α > αc . As Fig. 2 il- that the ‘area of contact’ [44] between the two GBP lustrates, the optimal partition at α α is subsets in the optimal configuration must be ∼ c highly complex and not at all close to a straight close to a minimum. In 2D this means that the GBP line. If we incorporate that E = 0 for α < α best achievable partition must be close to sim- c and replace Eq. (14) with ply cutting the graph in two at the coordinate values x = 1/2 or x = 1/2. This observa- 1/ν GBP β 1 2 Ed N (α(d) αc ) , (17) tion is especially relevant for large connectivi- ∝ − ties where the cutsize is, fluctuations neglected, we do not expect Eq. (16) to hold if we focus proportional to the length of the dividing line. only on a region near the critical connectivity. All this tentatively indicates how the cutsize E By the use of extremal optimization, a heuristic in GBP behaves as a function of N and α by that works particularly well near phase transi- looking at RGGs partitioned at xi = 1/2, where tions in hard combinatorial problems, Boettcher 1 i d. As we are about to argue, we expect and Percus [45, 46] have found αGBP 4.1, ≤ ≤ c ≃

10 3 1/ν 0.6 and β 1.4 in 2D for 4 <α< 6, 10 ≃ ≃ not far off our estimates in Eq. (16) valid for 2 10 large connectivities. Note that the low estimate GBP 1 of αc is expected; the algorithm does not al- 10 ways find the best partition, and some graphs 0 10 with α < αc does have E > 0. 2D 3D −1 10 4D 5D runtime (sec)

−2 RG 5 Implementation 10 RG fit 2D fit −3 5D fit The implementation is of major importance 10

−4 when studying random geometric graphs, since 10 3 4 5 6 7 8 10 10 10 10 10 10 a straightforward check of all possible edges be- N tween the N points will result in unfeasible runtimes O(N 2). We now outline how our pro- Figure 9: The runtimes (on a 400MHz SUN) of the gram works and describe how to avoid runtimes algorithms used in Sections 2 and 4. The straight β O(N 2). lines indicate t N , where β =1.2in 2D, β =1.33 in 5D and β =1∼.15 in random graphs (RG). The main idea is to divide and conquer. Par- tition the d-dimensional box in smaller subboxes and determine which subbox each vertex belongs 4. Calculate the relevant quantities (G, cluster to. Given the connectivity and thereby the ra- sizes etc.) as α increases. dius R of the excluded volume, for each vertex Obviously, a trade-off in Step 2 is involved when we then only have to look for potential edges to choosing the number of small boxes. vertices in the subboxes adjacent to the subbox Being the most time consuming part of the where the vertex itself is located. This leads to algorithm, Step 3 is the main contributor when a huge reduction in the number of comparisons. deciding how the runtime depends on N. The And this just gets better when N increases, re- runtimes for most of our runs are shown in sulting in a decrease in R as we saw in Fig. 3. Fig. 9. We see that the runtime is O(N β), By partitioning the box further as N increases where β 1.3, resulting in ‘feasible’ runtimes ≈ we avoid a linear increase in the number of com- for graphs with up to N > 4 106. Note that · parisons per vertex, which would lead to the un- the runtime of the much simpler algorithm used 2 desirable O(N ) growth. on random graphs also grows like a power-law The algorithm used when looking at RGGs is with β = 1.15, even though the number of op- simple. It works like this: erations is clearly O(N). In fact, the number of comparisons with potential neighbours per ver- 1. Generate d coordinates for each vertex. tex is very nearly constant in our implementation, i.e. the total number of neighbour tests is 2. Partition the space in small subboxes. O(N) in RGGs as well. Of course, this is only possible if the number of subboxes also increases 3. Find the edges. with N. Managing the partitioning part of the

11 algorithm adds to the runtime. To sum up, the that the distribution of cluster sizes is cut in two power-law increase in the runtime illustrated in just when the connectivity becomes larger than Fig. 9 for both random graphs and RGGs is prob- αc. Interestingly, the derivation of the cluster ably mainly due to cache misses. The slightly coefficient shows that, even in the limit of infinite higher values of β in the RGGs stems from the dimensionality d, random geometric graphs are additional time used when partitioning the d- not identical to random graphs. dimensional box into smaller boxes. Random geometric graphs share properties Step 4 is worth a comment. When running with both lattice models and standard random the algorithm we are interested in information at graphs. Random geometric graphs allow us to certain values of α. Instead of generating a new work with random graphs with a local structure. graph for every data point needed, we first set In addition, it is straightforward to add ‘long’ up the graph with the minimal connectivity we edges if one wishes to simulate, e.g., a small want to look at. This is easily accomplished with world network. With all this in mind, we hope our algorithm. Given an α-window [αmin, αmax] this paper will make random geometric graphs in which we want to examine the graph, we find more widely used in network theory. all the edges belonging to the graph when α = αmax, but we only add the edges corresponding to α = αmin. The rest of the edges, those who A Derivation of Cd are to be added when α is gradually increased to α , are stored in a priority queue. It is then max In order to determine the cluster coefficient for a simple task to increase α as one wishes. As arbitrary d, one must ﬁnd the fractional overlap mentioned earlier, in Figs. 4 and 5 each curve is ρ . Since ρ has no angular dependence, Eq. (10) based upon 300 data points, i.e. ∆α = 0.005. d d reduces to The source code, written in C, is available upon request. For a more accurate and tech- R d d 1 nical discussion of fast algorithms in relation to Cd = d ρd(r)r − dr. (18) R 0 RGGs, see e.g. [47]. Z Since ρ = 1 r , C = 3 . From Fig. 10 1 − 2R 1 4 6 Summary we see that in 2D the overlapping area—the area circumscribed by the fat lines—is 2(A B), − In this paper we have illustrated the usefulness where A is the area of the part of the circle of random geometric graphs in network theory swept out by the angle θ = 2 arccos(r/2R) be- and how to implement them efficiently. Sev- tween the two dashed lines originating from the eral properties of random geometric graphs in center of the lowest circle, and B is the area of 1 2 the vicinity of the critical connectivity α have the dashed triangle. Now, A = 2 θR and B = c 2 1 2 been analysed. We have determined the size of R cos(θ/2) sin(θ/2) = 2 R sin θ. The area of the 2 1 the largest cluster numerically and shown that overlap is then R (θ sin θ), so ρ2 = π (θ sin θ) 3√3 − − α (d) approaches α ( ) = 1 found in random and C2 = 1 . c c ∞ − 4π graphs in a power-law fashion. We have veriﬁed For d 3, the use of cylindrical coordinates ≥

12 By putting x = cos θ 1/2, the cluster coefficient − can therefore be written as r/2 d 1 6d 3 2 2 d 1 Cd exp − ln f(x) dx, (22) ≃ s π 4 2 Z0 θ R where f(x) = 1 4x (1 + x). Since the contri- − 3 butions to the integral for large d are signiﬁcant only when x 0, ln f can be expanded to 1st ≃ order and Eq. (13) is recovered.

Acknowledgements

Figure 10: Determination of the cluster coefficient We thank J. Neil Bearden, Stefan Boettcher and C, which in 2D is equal to the average fractional area overlap of the two circles. R is the radius of the circles Paolo Sibani for helpful comments. We are espe- and r the distance between their centers. The area of cially indebted to Allon Percus for very valuable the overlap is confined within the fat arcs originating discussions and correspondence. This work was from the two circles (dotted). The dashed lines are financially supported by Statens Naturvidenska- helpful in the derivation of the overlap—see the text. belige Forskningsr˚ad. and the relation References n 1 π n/2 − n i nπ 2π sin − θidθi = n+2 (19) [1] Albert-László Barabási and Réka Albert. 0 Γ( 2 ) iY=2 Z Statistical mechanics of complex networks. cond-mat/0106096v1, pages 1–54, 2001. results in

r [2] Steven H. Strogatz. Exploring complex net- d+2 arccos( R ) 2 Γ( 2 ) 2 d works. Nature, 410:268–276, 2001. ρd(r)= d+1 sin θdθ. (20) √π Γ( ) 0 2 Z [3] Bernardo A. Huberman. Growth dynamics

By reversing the integration in Cd we get of the World-Wide Web. Nature, 401:131, 1999. d+2 π 3 Γ( ) 3 C = 2 sind θdθ, (21) [4] M. E. J. Newman, S.H. Strogatz, and D.J. d √π d+1 Γ( 2 ) Z0 Watts. Random graphs with arbitrary degree distributions and their applications. which can be solved by integration by parts. The Physical Review E, 64:026118, 2001. use of the duplicate formula for the Gamma function then finally leads to Eq. (11). [5] Réka Albert and Albert-László Barabási. For large d, the ratio of the Gamma functions Emergence of scaling in random networks. in Eq. (21) is given by Stirling’s approximation. Science, 286:509–512, 1999.

13 [6] M. E. J. Newman. Clustering and preferen- the Internet to Random Breakdowns. Phys- tial attachment in growing networks. Phys- ical Review Letters, 85:4626–4628, 2000. ical Review E, 64:025102, 2001. [17] Réka Albert, Hawoong Jeong, and Albert- [7] L. A. N. Amaral, A. Scala, M. Barthélémy, LászlóBarabási. Error and attack tolerance and H. E. Stanley. Classes of small-world of complex networks. Nature, 406:378–382, networks. Proc. Natl. Acad. Sci. USA, 2000. 97:11149–11152, 2000. [18] P. L. Krapivsky, S. Redner, and F. Leyvraz. [8] Duncan J. Watts and Steven H. Strogatz. Connectivity of Growing Random Net- Collective dynamics of ‘small-world’ networks. Physical Review Letters, 85:4629– works. Nature, 393:440–442, 1998. 4632, 2000.

[9] Stuart Kauffman. At Home in the Universe. [19] Réka Albert and Albert-László Barabási. Oxford University Press, 1995. Topology of Evolving Networks: Local Events and Universality. Physical Review [10] P. Erdös and A. Rényi. On the evolution Letters, 85:5234–5237, 2000. of random graphs. Publ. Math. Inst. Hung. Acad. Sci., 5:17–61, 1960. [20] S. N. Dorogovtsev and J. F. F. Mendes. [11] Duncan S. Callaway, John E. Hopcroft, Evolution of networks with aging of sites. Jon M. Kleinberg, M. E. J. Newman, and Physical Review E, 62:1842–1845, 2000. Steven H. Strogatz. Are randomly grown [21] D. J. Watts. Small Worlds. Princeton Uni- graphs really random? Physical Review E, versity Press, 1999. 64:041902, 2001. [22] M. E. J. Newman and D. J. Watts. Scaling [12] Béla Bollobás. Random Graphs. Academic and percolation in the small-world network Press, 1985. model. Physical Review E, 60:7332–7342, [13] Scott Kirkpatrick and Bart Selman. Criti- 1999. cal behavior in the satisfiability of random [23] M. E. J. Newman. Models of the small boolean expressions. Science, 264:1297– world. Journal of Statistical Physics, 1301, 1994. 101:819–841, 2000. [14] Jon M. Kleinberg. Navigation in a small world. Nature, 406:845, 2000. [24] Jayanth R. Banavar, Amos Maritan, and Andrea Rinaldo. Size and form in efficient [15] D. Stauffer and A. Ahorony. Introduction transportation networks. Nature, 399:130– to Percolation Theory. Taylor and Francis, 132, 1999. 1992. [25] W. Xia and M. F. Thorpe. Percolation [16] Reuven Cohen, Keren Erez, Daniel ben properties of random ellipses. Physical Re- Avraham, and Shlomo Havlin. Resilience of view A, 38(5):2650–2656, 1988.

14 [26] I. Balberg. ”Universal” percolation- [35] K. Y. M. Wong and D. Sherringham. Graph threshold limits in the continuum. Physical bipartitioning and spin glasses on a ran- Review B, 31:4053–4055, 1985. dom network of fixed finite valence. Journal of Physics A: Math. Gen., 20:L793–L799, [27] U. Alon, A. Drory, and I. Balberg. Sys- 1987. tematic derivation of percolation thresholds in continuum systems. Physical Review A, [36] Michael Molloy and Bruce Reed. A Criti- 42:4634–4638, 1990. cal Point for Random Graphs with a Given Degree Sequence. Random Structures and [28] U. Alon, I. Balberg, and A. Drory. New, Algorithms, 6:161–179, 1995. heuristic, percolation criterion for continuum systems. Physical Review Letters, [37] Michael Molloy and Bruce Reed. The Size of 66:2879–2882, 1991. the Giant Component of a Random Graph with a Given Degree Sequence. Combina- [29] J. Quantanilla, S. Torquato, and R. M. Ziff. torics, Probability and Computing, 7:295– Efficient measurements of the percolation 305, 1998. threshold for fully penetrable disks. Journal [38] M. D. Rintoul and S. Torquato. Precise de- of Physics A: Math. Gen., 33:L399–L407, termination of the critical threshold and ex- 2000. ponents in a three-dimensional continuum [30] I. Balberg. Recent developments in contin- percolation model. Journal of Physics A: uum percolation. Phil. Mag. B, 56(6):991– Math. Gen., 30:L585–L592, 1997. 1003, 1987. [39] Edward T Gawlinski and H Eugene Stan- ley. Continuum percolation in two dimen- [31] Philippe Jund, Rémi Jullien, and Ian sions: Monte Carlo tests of scaling and uni- Campbell. Random walks on fractals and versality for non-interacting discs. Journal stretched exponential relaxation. Physical of Physics A: Math. Gen., 14:L291–L299, Review E, 63:036131, 2001. 1981.

[32] Romualdo Pastor-Satorras and Alessandro [40] Salvatore Torquato. Random Heteroge- Vespignani. Epidemic Spreading in Scale- nouos Materials: Microstructure and Free Networks. Physical Review Letters, Macroscopic Properties. Springer, 2002. 86:3200–3203, 2001. [41] David S. Johnson et. al. Optimization by [33] Peter Cheeseman, Bob Kanefsky, and simulated annealing: An experimental eval- William M. Taylor. Where the Really Hard uation; part 1, graph bi-partitioning. Oper- Problems Are. Proc. of IJCAI-91, pages ations Research, 37:865–892, 1989. 331–337, 1991. [42] Peter Merz and Bernd Freisleben. Memetic [34] Philip W. Anderson. Solving problems in algorithms and the ﬁtness landscape of ﬁnite time. Nature, 400:115–116, 1999. the graph bi-partitioning problem. Lecture

15 Notes in Computer Science, 1498:765–774, 1998.

[43] Stefan Boettcher and Allon G. Percus. Na- ture’s way of optimizing. Artiﬁcial Intelli- gence, 64:275–286, 2000.

[44] Stefan Boettcher and Allon G. Percus. Ex- tremal optimization for graph partitioning. Physical Review E, 64:026114, 2001.

[45] Stefan Boettcher and Allon G. Percus. Ex- tremal optimization: Methods derived from co-evolution. GECCO-99: Proceedings of the Genetic and Evolutionary Computation Conference, pages 825–832, 1999.

[46] Stefan Boettcher. Extremal optimization of graph partitioning at the percolation threshold. Journal of Physics A: Math. Gen., 32:5201–5211, 1999.

[47] Matthew T. Dickerson and David Epp- stein. Algorithms for Proximity Problems in Higher Dimensions. Comp. Geom. The- ory and Appl., 5:277–291, 1996.