arXiv:1003.3700v2 [math.PR] 5 Jan 2011

Statistical Science
2010, Vol. 25, No. 3, 275–288
DOI: 10.1214/10-STS335
© Institute of Mathematical Statistics, 2010

Connected Spatial Networks over Random Points and a Route-Length Statistic

David J. Aldous and Julian Shun

Abstract. We review mathematically tractable models for connected networks on random points in the plane, emphasizing the class of proximity graphs which deserves to be better known to applied probabilists and statisticians. We introduce and motivate a particular statistic $R$ measuring shortness of routes in a network. We illustrate, via Monte Carlo in part, the trade-off between normalized network length and $R$ in a one-parameter family of proximity graphs. How close this family comes to the optimal trade-off over all possible networks remains an intriguing open question. The paper is a write-up of a talk developed by the first author during 2007–2009.

Key words and phrases: Proximity graph, random graph, spatial network, geometric graph.

David J. Aldous is Professor, Department of Statistics, University of California, 367 Evans Hall # 3860, Berkeley, California 94720, USA (e-mail: [email protected]; URL: www.stat.berkeley.edu/users/aldous). Julian Shun is Graduate Student, Machine Learning Department, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA (e-mail: [email protected]).

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2010, Vol. 25, No. 3, 275–288. This reprint differs from the original in pagination and typographic detail.

1. INTRODUCTION

The topic called random networks or complex networks has attracted huge attention over the last 20 years. Much of this work focuses on examples such as social networks or WWW links, in which edges are not closely constrained by two-dimensional geometry. In contrast, in a spatial network not only are vertices and edges situated in two-dimensional space, but also it is actual distances, rather than number of edges, that are of interest. To be concrete, we visualize idealized inter-city road networks, and a feature of interest is the (minimum) route length between two given cities. Because we work only in two dimensions, the word spatial may be misleading, but equally the word planar would be misleading because we do not require networks to be planar graphs (if edges cross, then a junction is created).

Our major purpose is to draw the attention of readers from the applied probability and statistics communities to a particular class of spatial network models. Recall that the most studied network model, the random geometric graph [40] reviewed in Section 2.1, does not permit both connectivity and bounded normalized length in the $n \to \infty$ limit. An attractive alternative is the class of proximity graphs, reviewed in Section 2.3, which in the deterministic case have been studied within computational geometry. These graphs are always connected. Proximity graphs on random points have been studied in only a few papers, but are potentially interesting for many purposes other than the specific “short route lengths” topic of this paper (see Section 6.5). One could also imagine constructions which depend on points having specifically the Poisson point process distribution, and one novel such network, which we name the Hammersley network, is described in Section 2.5.

Visualizing idealized road networks, it is natural to take total network length as the “cost” of a network, but what is the corresponding “benefit”? Primarily we are interested in having short route lengths. Choosing an appropriate statistic to measure the latter turns out to be rather subtle, and the (only) technical innovation of this paper is the introduction (Section 3.2) and motivation of a specific statistic $R$ for measuring the effectiveness of a network in providing short routes.

In the theory of spatial networks over random points, it is a challenge to quantify the trade-off between network length [precisely, the normalized length $L$ defined at (2)] and route length efficiency statistics such as $R$. Our particular statistic $R$ is not amenable to explicit calculation even in comparatively tractable models, but in Section 4 we present the results from Monte Carlo simulations. In particular, Figure 7 shows the trade-off for the particular β-skeleton family of proximity graphs.

Given a normalized network length $L$, for any realization of cities there is some network of normalized length $L$ which minimizes $R$. As indicated in Section 5, by general abstract mathematical arguments, there must exist a deterministic function $R_{\mathrm{opt}}(L)$ giving (in the “number of cities $\to \infty$” limit under the random model) the minimum value of $R$ over all possible networks of normalized length $L$. An intriguing open question is as follows:

    how close are the values $R_{\beta\text{-skel}}(L)$ from the β-skeleton proximity graphs to the optimum values $R_{\mathrm{opt}}(L)$?

As discussed in Section 5.3, at first sight it looks easy to design heuristics for networks which should improve over the β-skeletons, for example, by introducing Steiner points, but in practice we have not succeeded in doing so.

This paper focuses on the random model for city positions because it seems the natural setting for theoretical study. As a complement, in [10] we give empirical data for the values of $(L, R)$ for certain real-world networks (on the 20 largest cities, in each of 10 US States). In [8] we give analytic results and bounds on the trade-off between $L$ and the mathe...

Finally, recall this is a nontechnical account. Our purpose is to elaborate verbally the ideas outlined above; some technical aspects will be pursued elsewhere.

2. MODELS FOR CONNECTED SPATIAL NETWORKS

There are several conceptually different ways of defining networks on random points in the plane. To be concrete, we call the points cities; to be consistent about language, we regard $x_i$ as the position of city $i$ and represent network edges as line segments $(x_i, x_j)$.

First (Sections 2.1–2.3) are schemes which use deterministic rules to define edges for an arbitrary deterministic configuration of cities; then one just applies these rules to a random configuration. Second, one can have random rules for edges in a deterministic configuration (e.g., the probability of an edge between cities $i$ and $j$ is a function of Euclidean distance $d(x_i, x_j)$, as in popular small worlds models [39]), and again apply to a random configuration. Third, and more subtly, one can have constructions that depend on the randomness model for city positions; Section 2.5 provides a novel example.

We work throughout with reference to Euclidean distance $d(x, y)$ on the plane, even though many models could be defined with reference to other metrics (or even when the triangle inequality does not hold, for the MST).

2.1 The Geometric Graph

In Sections 2.1–2.3 we have an arbitrary configuration $\mathbf{x} = \{x_i\}$ of city positions, and a deterministic rule for defining the edge-set $\mathcal{E}$. Usually in graph theory one imagines a finite configuration, but note that everything makes sense for locally finite configurations too. Where helpful, we assume “general position,” so that intercity distances $d(x_i, x_j)$ are all distinct.

For the geometric graph one fixes 0

A moment’s thought shows these graphs are in general not connected, so we turn to models which are “by construction” connected. We remark that the connectivity threshold $c_n$ in the finite $n$-vertex model of the random geometric graph has been studied in detail; see Chapter 13 of [40].

2.2 A Nested Sequence of Connected Graphs

The material here and in the next section was developed in graph theory with a view toward algorithmic applications in computational geometry and pattern recognition. The 1992 survey [28] gives the history of the subject and 116 citations. But everything we need is immediate from the (careful choice of) definitions. On our arbitrary configuration $\mathbf{x}$ we can define four graphs whose edge-sets are nested as follows:

(1)    MST $\subseteq$ relative n’hood $\subseteq$ Gabriel $\subseteq$ Delaunay.

Here are the definitions (for MST and Delaunay, it is easy to check these are equivalent to more familiar definitions). In each case, we write the criterion for an edge $(x_i, x_j)$ to be present:

• Minimum spanning tree (MST) [24]. There does not exist a sequence $i = k_0, k_1, \ldots, k_m = j$ of cities such that
      $\max\bigl(d(x_{k_0}, x_{k_1}), d(x_{k_1}, x_{k_2}), \ldots, d(x_{k_{m-1}}, x_{k_m})\bigr) < d(x_i, x_j)$.
• Relative neighborhood graph. There does not exist a city $k$ such that
      $\max\bigl(d(x_i, x_k), d(x_k, x_j)\bigr) < d(x_i, x_j)$.
• Gabriel graph. There does not exist a city inside the disc whose diameter is the line segment from $x_i$ to $x_j$.
• Delaunay triangulation [23]. There exists some disc, with $x_i$ and $x_j$ on its boundary, so that no city is inside the disc.

The inclusions (1) are immediate from these definitions. Because the MST (for a finite configuration) is connected, all these graphs are connected.

Figure 1 illustrates the relative neighborhood and Gabriel graphs. Figures for the MST and the Delaunay triangulation can be found online at http://www.spss.com/research/wilkinson/Applets/edges.html.

Fig. 1. The relative neighborhood graph (left) and Gabriel graph (right) on different realizations of 500 random points.

Constructions such as the relative neighborhood and Gabriel graphs have become known loosely as proximity graphs in [28] and subsequent literature, and we next take the opportunity to turn an implicit definition in the literature into an explicit definition.

2.3 Proximity Graphs

Write $v_-$ and $v_+$ for the points $(-\tfrac12, 0)$ and $(\tfrac12, 0)$. The lune is the intersection of the open discs of radii 1 centered at $v_-$ and $v_+$. So $v_-$ and $v_+$ are not in the lune but are on its boundary. Define a template $A$ to be a subset of $\mathbb{R}^2$ such that:

(i) $A$ is a subset of the lune.
(ii) $A$ contains the open line segment $(v_-, v_+)$.
(iii) $A$ is invariant under the “reflection in the y-axis” map $\mathrm{Reflect}_x(x_1, x_2) = (-x_1, x_2)$ and the “reflection in the x-axis” map $\mathrm{Reflect}_y(x_1, x_2) = (x_1, -x_2)$.
(iv) $A$ is open.

For arbitrary points $x, y$ in $\mathbb{R}^2$, define $A(x, y)$ to be the image of $A$ under the natural transformation (translation, rotation and scaling) that takes $(v_-, v_+)$ to $(x, y)$.

Definition. Given a template $A$ and a locally finite set $\mathcal{V}$ of vertices, the associated proximity graph $G$ has edges defined by, for each $x, y \in \mathcal{V}$,

    $(x, y)$ is an edge of $G$ iff $A(x, y)$ contains no vertex of $\mathcal{V}$.

From the definitions:

• if $A$ is the lune, then $G$ is the relative neighborhood graph;
• if $A$ is the disc centered at the origin with radius 1/2, then $G$ is the Gabriel graph.

But the MST and Delaunay triangulation are not instances of proximity graphs.

Note that replacing $A$ by a subset $A'$ can only introduce extra edges. It follows from (1) that the proximity graph is always connected. The Gabriel graph is planar. But if $A$ is not a superset of the disc centered at the origin with radius 1/2, then $G$ might not be a subgraph of the Delaunay triangulation, and in this case edges may cross, so $G$ is not planar (e.g., if the vertex-set is the four corners of a square, then the diagonals would be edges).

For a given configuration $\mathbf{x}$, there is a collection of proximity graphs indexed by the template $A$, so by choosing a monotone one-parameter family of templates, one gets a monotone one-parameter family of graphs, analogous to the one-parameter family $G_c$ of geometric graphs. Here is a popular choice [30] in which $\beta = 1$ gives the Gabriel graph and $\beta = 2$ gives the relative neighborhood graph.

Definition (The β-skeleton family). (i) For $0 < \beta < 1$ let $A_\beta$ be the intersection of the two open discs of radius $(2\beta)^{-1}$ passing through $v_-$ and $v_+$.
(ii) For $1 \le \beta \le 2$ let $A_\beta$ be the intersection of the two open discs of radius $\beta/2$ centered at $(\pm(\beta - 1)/2, 0)$.

2.4 Networks Based on Powers of Edge-Lengths

It is not hard to think of other ways to define one-parameter families of networks. Here is one scheme used in, for example, [38]. Fix $1 \le p < \infty$. Given a configuration $\mathbf{x}$, and a route (sequence of vertices) $x_0, x_1, \ldots, x_k$, say, the cost of the route is the sum of $p$th powers of the step lengths. Now say that a pair $(x, y)$ is an edge of the network $G_p$ if the cheapest route from $x$ to $y$ is the one-step route. As $p$ increases from 1 to $\infty$, these networks decrease from the complete graph to the MST. Moreover, for $p \ge 2$ the network $G_p$ is a subgraph of the Gabriel graph.

2.5 The Hammersley Network

There is a quite separate recent literature in theoretical probability [26, 27] defining structures such as trees and matchings directly on the infinite Poisson point process. In this spirit, we observe that the Hammersley process studied in [6] can be used to define a new network on the infinite Poisson point process, which we name the Hammersley network. This network is designed to have the feature that each vertex has exactly 4 edges, in directions NE (between North and East), NW, SE and SW. The conceptual difference from the networks in the previous section is that there is not such a simple “local” criterion for whether a potential edge $(x_i, x_j)$ is in the network. And edges cross, creating junctions.

For a picturesque description, imagine one-eyed frogs sitting on an infinitely long, thin log, each being able to see only the part of the log to their left before the next frog. At random times and positions (precisely, as a space–time Poisson point process of rate 1) a fly lands on the log, at which instant the (unique) frog which can see it jumps left to the fly’s position and eats it. This defines a continuous time Markov process (the Hammersley process) whose states are the configurations of positions of all the frogs. There is a stationary version of the process in which, at each time, the positions of the frogs form a Poisson (rate 1) point process on the line.

Now consider the space–time trajectories of all the frogs, drawn with time increasing upward on the page. See Figure 2. For each frog, the part of the trajectory between the completions of two successive jumps consists of an upward edge (the frog remains in place as time increases) followed by a leftward edge (the frog jumps left).

Reinterpreting the time axis as a second space axis, and introducing compass directions, that part of the trajectory becomes a North edge followed by a West edge. Now replace these two edges by a single North-West straight edge. Doing this procedure for each frog and each pair of successive jumps, we obtain a collection of NW paths, that is, a network in which each city (the reinterpreted space–time random points) has an edge to the NW and an edge to the SE. Finally, we repeat the construction with the same realization of the space–time Poisson point process but with frogs jumping rightward instead of leftward. This yields a network on the infinite Poisson point process, which we name the Hammersley network. See Figure 3.
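Returning to the β-skeleton family of Section 2.3, the template criterion can be illustrated by a minimal brute-force sketch for the range $1 \le \beta \le 2$. The function name `beta_skeleton_edges` and the use of NumPy are assumptions made for illustration, not part of the paper. The sketch relies on the observation that, after translating, rotating and scaling $(v_-, v_+)$ to $(x, y)$, the two discs defining $A_\beta(x, y)$ have radius $\beta\, d(x, y)/2$ and centers $(1 - \beta/2)x + (\beta/2)y$ and $(\beta/2)x + (1 - \beta/2)y$.

```python
import numpy as np


def beta_skeleton_edges(points, beta):
    """Edges (i, j), i < j, of the beta-skeleton for 1 <= beta <= 2.

    Brute-force O(n^3) illustration: the pair (x, y) is an edge iff no other
    point lies in the intersection of the two open discs of radius
    beta * d(x, y) / 2 described in the lead-in.
    """
    if not 1.0 <= beta <= 2.0:
        raise ValueError("this sketch only covers 1 <= beta <= 2")
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            x, y = pts[i], pts[j]
            radius = beta * np.linalg.norm(y - x) / 2.0
            center_1 = (1.0 - beta / 2.0) * x + (beta / 2.0) * y
            center_2 = (beta / 2.0) * x + (1.0 - beta / 2.0) * y
            dist_1 = np.linalg.norm(pts - center_1, axis=1)
            dist_2 = np.linalg.norm(pts - center_2, axis=1)
            inside = (dist_1 < radius) & (dist_2 < radius)
            # The endpoints lie on the boundary of the open template;
            # exclude them explicitly to guard against rounding.
            inside[[i, j]] = False
            if not inside.any():
                edges.append((i, j))
    return edges
```

With `beta=1.0` the two discs coincide with the disc whose diameter is the segment from $x$ to $y$ (Gabriel graph), and with `beta=2.0` they are the discs of radius $d(x, y)$ centered at the endpoints (relative neighborhood graph), so the edge set for `beta=2.0` should be contained in the edge set for `beta=1.0`.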

Remarks. (a) To draw the Hammersley network on random points in a finite square, one needs external randomization to give the initial (time 0) frog positions, in fact, two independent randomizations for the leftward and the rightward processes. So to be pedantic, one gets a random network over the given realization of cities. However, one can deduce from the theoretical results in [6] that the external randomization has effect only near the boundary of the square.

(b) The property that each vertex has exactly 4 edges, in directions NE (between North and East), NW, SE and SW, is immediate from the construction. Note, however, that while adjacent NW space–time trajectories in Figure 2 do not cross, the corresponding diagonal roads in the Hammersley network may cross, so it is not a planar graph, though this has only negligible effect on route lengths.

(c) Intuition, confirmed by Figure 7 later, says that the Hammersley network is not very efficient as a road network. It serves to demonstrate that there do exist random networks other than the familiar ones, and provides an instance where imposing deterministic constraints (the four edges, in this case) on a random network makes it much less efficient. How general a phenomenon is this?

Fig. 2. Space–time trajectories in Hammersley’s process.

2.6 Normalized Length

The notion of normalized network length $L$ is most easily visualized in the setting of an infinite deterministic network which is “regular” in the sense of consisting of a repeated pattern. First choose the unit of length so that cities have an average density of one per unit area. Then define

(2)    $L$ = average network length per unit area,
(3)    $\bar\Delta$ = average degree (number of incident edges) of cities.

Figure 4 shows the values of $L$ and $\bar\Delta$ for some simple “repeated pattern” networks. Though not directly relevant to our study of the random model, we find Figure 4 helpful for two reasons: as intuition for the interpretation of the different numerical values of $L$, and because we can make very loose analogies (Section 6.6) between particular networks on random points and particular deterministic networks.

Fig. 3. The Hammersley network on 2500 random points.

Fig. 4. Variant square, triangular and hexagonal lattices. Drawn so that the density of cities is the same in each diagram, and ordered by value of $L$.

3. NORMALIZED LENGTH AND ROUTE-LENGTH EFFICIENCY

3.1 The Random Model

For the remainder of the paper we work with “the random model” for city positions. The finite model assumes $n$ random vertices (cities) distributed independently and uniformly in a square of area $n$. The infinite model assumes the Poisson point process of rate 1 (per unit area) in the plane. The quantities $L$, $\bar\Delta$ above and $R$ below that we discuss may be interpreted as exact values in the infinite model or as $n \to \infty$ limits in the finite model; see Section 5. We use the word normalized as a reminder of the “density 1” convention: we choose the normalized unit of distance to make cities have average density 1 per unit area. After this normalization, $L$ is the average network length per unit area.

3.2 The Route-Length Efficiency Statistic $R$

In designing a network, it is natural to regard total length as a “cost”. The corresponding “benefit” is having short routes between cities. Write $\ell(i, j)$ for the route length (length of shortest path) between cities $i$ and $j$ in a given network, and $d(i, j)$ for Euclidean distance between the cities. So $\ell(i, j) \ge d(i, j)$, and we write

    $r(i, j) = \dfrac{\ell(i, j)}{d(i, j)} - 1$

so that “$r(i, j) = 0.2$” means that route length is 20% longer than straight line distance. With $n$ cities we get $\binom{n}{2}$ such numbers $r(i, j)$; what is a reasonable way to combine these into a single statistic? Two natural possibilities are as follows:

(4)    $R_{\max} := \max_{j \ne i} r(i, j)$,    $R_{\mathrm{ave}} := \mathrm{ave}_{(i,j)}\, r(i, j)$.
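The quantities $r(i, j)$, $R_{\max}$ and $R_{\mathrm{ave}}$ in (4) can be computed for any finite network given as city positions plus an edge list. The sketch below is one straightforward way to do so, using Dijkstra's algorithm from each city with Euclidean edge lengths as weights; the function names are hypothetical, and the average is taken over all unordered pairs of distinct cities.

```python
import heapq
import math


def build_adjacency(points, edges):
    """Adjacency lists with Euclidean edge lengths as weights."""
    adjacency = [[] for _ in points]
    for i, j in edges:
        weight = math.dist(points[i], points[j])
        adjacency[i].append((j, weight))
        adjacency[j].append((i, weight))
    return adjacency


def route_lengths_from(adjacency, source):
    """Dijkstra: shortest route length ell(source, j) for every city j."""
    ell = [math.inf] * len(adjacency)
    ell[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        dist_u, u = heapq.heappop(heap)
        if dist_u > ell[u]:
            continue  # stale heap entry
        for v, weight in adjacency[u]:
            if dist_u + weight < ell[v]:
                ell[v] = dist_u + weight
                heapq.heappush(heap, (ell[v], v))
    return ell


def r_max_and_r_ave(points, edges):
    """Return (R_max, R_ave) over all unordered pairs of distinct cities."""
    adjacency = build_adjacency(points, edges)
    ratios = []
    for i in range(len(points)):
        ell = route_lengths_from(adjacency, i)
        for j in range(i + 1, len(points)):
            # r(i, j) = ell(i, j) / d(i, j) - 1; infinite if i, j are disconnected.
            ratios.append(ell[j] / math.dist(points[i], points[j]) - 1.0)
    return max(ratios), sum(ratios) / len(ratios)
```

For example, for the four corners of a unit square joined by the four sides, the only pairs with $r(i, j) > 0$ are the two diagonal pairs, for which the route has length 2 against a straight-line distance $\sqrt{2}$.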

In (4), $\mathrm{ave}_{(i,j)}$ denotes the average over all distinct pairs $(i, j)$. The statistic $R_{\max}$ has been studied in the context of the design of networks [37] where it is called the stretch. However, being an “extremal” statistic, $R_{\max}$ seems unsatisfactory as a descriptor of real world networks; for instance, it seems unreasonable to characterize the UK rail network as inefficient simply because there is no very direct route between Oxford and Cambridge.

The statistic $R_{\mathrm{ave}}$ has a more subtle drawback. Consider a network consisting of:

• the minimum-length connected network (Steiner tree) on given cities;
• and a superimposed sparse collection of randomly oriented lines (a Poisson line process [45]).

See Figure 5. By choosing the density of lines to be sufficiently low, one can make the normalized network length be arbitrarily close to the minimum needed for connectivity. But it is easy to show (see [7] for careful analysis and a stronger result) that one can construct such networks so that $R_{\mathrm{ave}} \to 0$ as $n \to \infty$. Of course no one would build a road network looking like Figure 5 to link cities, because there are many pairs of nearby cities with only very indirect routes between them. The disadvantage of $R_{\mathrm{ave}}$ as a descriptive statistic is that (for large $n$) most city-pairs are far apart, so the fact that a given network has a small value of $R_{\mathrm{ave}}$ says nothing about route lengths between nearby cities.

Fig. 5. Efficient or inefficient? $R_{\mathrm{ave}}$ would judge this network efficient in the $n \to \infty$ limit.

We propose a statistic $R$ which is intermediate between $R_{\mathrm{ave}}$ and $R_{\max}$. First consider (see discussion below for details)

    $\rho(d)$ := mean value of $r(i, j)$ over city-pairs with $d(i, j) = d$

and then define

(5)    $R := \max_{0 \le d < \infty} \rho(d)$.

In words, $R = 0.2$ means that on every scale of distance, route lengths are on average at most 20% longer than straight line distance.

On an intuitive level, $R$ provides a sensible and interpretable way to compare efficiency of different networks in providing short routes. On a technical level, we see two advantages and one disadvantage of using $R$ instead of $R_{\mathrm{ave}}$.

Advantage 1. Using $R$ to measure efficiency, there is a meaningful $n \to \infty$ limit for the network length/efficiency trade-off [the function $R_{\mathrm{opt}}(L)$ discussed in Section 5], and so, in particular, it makes sense to compare the values of $R$ for networks with different $n$.

Advantage 2. A more realistic model for traffic would posit that volume of traffic between two cities varies as a power-law $d^{-\gamma}$ of distance $d$, so that in calculating $R_{\mathrm{ave}}$ it would be more realistic to weight by $d^{-\gamma}$. This means that the optimal network, when using $R_{\mathrm{ave}}$ as optimality criterion, would depend on $\gamma$. Use of $R$ finesses this issue; the value of $\gamma$ does not affect $R$. A related issue is that volume of traffic between two cities should depend on their populations. Intuitively, incorporating random population sizes should make the optimal $R$ smaller because the network designer can create shorter routes between larger cities. We see this effect in data [10]; $R$ calculated via population-weighting is typically slightly smaller. But we have not tried theoretical study.

Disadvantage. The statistic $R$ is tailored to the infinite model, in which it makes sense to consider two cities at exactly distance $d$ apart (then the other city positions form a Poisson point process). For finite $n$ we need to discretize. For the empirical data in [10], where $n = 20$, we average over intervals of width 1 unit (recall the unit of distance is taken such that the density of cities is 1 per unit area), that is, for $d = 1, 2, \ldots, 5$, we calculate

(6)    $\tilde\rho(d)$ := mean value of $r(i, j)$ over city-pairs with $d - \tfrac12 < d(i, j) < d + \tfrac12$,
       $\tilde R := \max_{1 \le d < \infty} \tilde\rho(d)$

and use $\tilde R$ as proxy for $R$. For larger $n$ we can use shorter intervals. Thus, there is, in principle, a certain fuzziness to the notion of $R$ for finite networks, and, in particular, it is not clear how to assign a value of $R$ to regular networks such as those in Figure 4. But in practice, for networks we have studied on real-world data and on random points, this is not a problem, as explained next.

3.3 Characteristic Shape of the Function $\rho(d)$

For the connected networks on random points (excluding the Hammersley network) we are discussing, the function $\rho(d)$ has a characteristic shape (see Figure 6) attaining its maximum between 2 and 3 and slowly decreasing thereafter. We suspect that “this characteristic shape holds for any reasonable model,” but we do not know how to turn that phrase into a precise conjecture. Note that “smoothness near the maximum” implies that any calculated value $\tilde R$ at (6) is quite insensitive to the choice of discretization.

This characteristic shape has a common-sense interpretation. Any efficient network will tend to place roads directly between unusually close city-pairs, implying that $\rho(d)$ should be small for $d < 1$. For large $d$ the presence of multiple alternate routes helps prevent $\rho(d)$ from growing. At distance 2–3 from a typical city $i$ there will be about $\pi 3^2 - \pi 2^2 \approx 16$ other cities $j$. For some of these $j$ there will be cities $k$ near the straight line from $i$ to $j$, so the network designer can create roads from $i$ to $k$ to $j$. The difficulty arises where there is no such intermediate city $k$: including a direct road $(x_i, x_j)$ will increase $L$, but not including it will increase $\rho(d)$ for $2 < d < 3$.

... a value of 0.21 at $d = 5$. This arises from the particular structure (from each city there is one road in each quadrant) resembling the deterministic “diagonal lattice” of Figure 4, in which the route between some nearby pairs will be via two diagonal roads and a junction.

Fig. 6. The function $\rho(d)$ for three theoretical networks on random cities. Irregularities are Monte Carlo random variation.

4. LENGTH-EFFICIENCY TRADE-OFF FOR TRACTABLE NETWORKS

Recall that our overall theme is the trade-off between network length and route-length efficiency, and that in this paper we focus on $n \to \infty$ limits in the random model and the particular statistics $L$ and $R$.

The models described in Section 2 are “tractable” in the specific sense that one can find exact analytic formulas for normalized length $L$. Unfortunately $R$ is not amenable to analytic calculation, and we resort to Monte Carlo simulation to obtain values for $R$. Table 1 and Figure 7 show the values of $(L, R)$ in the models. We explain below how the values of $L$ are calculated.

Table 1. Statistics of tractable networks on random points.

Fig. 7. The normalized network length $L$ and the route-length efficiency statistic $R$ for certain networks on random points. The ◦ show the β-skeleton family, with RN the relative neighborhood graph and G the Gabriel graph. The • are special models: △ shows the Delaunay triangulation, □ shows the network $G_2$ from Section 2.4 and ♦ shows the Hammersley network.

Notes on Table 1. (a) Values of $R$ from our simulations with $n = 2500$.

(b) Value of $L$ for MST from Monte Carlo [19]. In principle, one can calculate arbitrarily close bounds [11], but apparently this has never been carried out. Of course, $\bar\Delta = 2$ for any tree.

(c) The Gabriel graph and the relative neighborhood graph fit the assumptions of Lemma 1 with $c = \pi/4$ and $c = \frac{2\pi}{3} - \frac{\sqrt3}{2}$, respectively, and their table entries for $L$ and $\bar\Delta$ are obtained from Lemma 1, as are the values for β-skeletons in Figure 7.

(d) For the Hammersley network, every degree equals 4, so $L = 2 \times$ (mean edge-length). It follows from theory [6] that a typical edge, say, NE from $(x, y)$, goes to a city at position $(x + \xi_x, y + \xi_y)$, where $\xi_x$ and $\xi_y$ are independent with Exponential(1) distribution. So mean edge-length equals

(7)    $\displaystyle\int_0^\infty \int_0^\infty \sqrt{x^2 + y^2}\; e^{-x-y}\, dx\, dy \approx 1.62$.

(e) For any triangulation, $\bar\Delta = 6$ in the infinite model. For the Delaunay triangulation, $L = \mathbb{E}S$ where $S$ is the perimeter length of a typical cell, and it is known ([35], page 113) that $\mathbb{E}S = \frac{32}{3\pi}$. Note [33] that the Delaunay triangulation is in general not the minimum-length triangulation. Our simulation results in Figure 6 for $\rho(d)$ for the Delaunay triangulation are roughly consistent with a simulation result in [13] saying that $\rho(65) \approx 0.05$.

4.1 A Simple Calculation for Proximity Graphs

Let us give an example of an elementary calculation for proximity graphs over random points.

Lemma 1. For a proximity graph with template $A$ on the Poisson point process,

(8)    $L = \dfrac{\pi^{3/2}}{4 c^{3/2}}$,

(9)    $\bar\Delta = \dfrac{\pi}{c}$,

where $c = \mathrm{area}(A)$.
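The closed forms (8) and (9) can be sanity-checked numerically. The sketch below rests on two stated assumptions: first, that for cities at distance $s$ the scaled template has area $c s^2$, so that a rate-1 Poisson process leaves it empty with probability $\exp(-c s^2)$; second, that for $1 \le \beta \le 2$ the template area is given by the standard lens-area formula for two discs of radius $\beta/2$ whose centers are $\beta - 1$ apart (this formula is our own addition, not taken from the paper). All function names are hypothetical.

```python
import math


def lemma1_closed_form(c):
    """(L, mean degree) from the closed forms (8) and (9)."""
    return math.pi ** 1.5 / (4.0 * c ** 1.5), math.pi / c


def lemma1_numeric(c, s_max=12.0, steps=200_000):
    """Midpoint-rule integration of the edge probability exp(-c s^2).

    mean degree = integral of exp(-c s^2) * 2 pi s ds,
    L           = (1/2) * integral of s * exp(-c s^2) * 2 pi s ds.
    """
    h = s_max / steps
    mean_degree = 0.0
    length = 0.0
    for k in range(steps):
        s = (k + 0.5) * h
        weight = math.exp(-c * s * s) * 2.0 * math.pi * s * h
        mean_degree += weight
        length += 0.5 * s * weight  # each edge is counted from both endpoints
    return length, mean_degree


def beta_template_area(beta):
    """Area of A_beta for 1 <= beta <= 2 (lens of two equal discs)."""
    if not 1.0 <= beta <= 2.0:
        raise ValueError("this sketch only covers 1 <= beta <= 2")
    radius = beta / 2.0
    gap = beta - 1.0  # distance between the two disc centers
    return (2.0 * radius ** 2 * math.acos(gap / (2.0 * radius))
            - 0.5 * gap * math.sqrt(4.0 * radius ** 2 - gap ** 2))


for beta in (1.0, 1.5, 2.0):
    c = beta_template_area(beta)
    L, mean_degree = lemma1_closed_form(c)
    print(f"beta = {beta:.1f}  c = {c:.4f}  L = {L:.4f}  mean degree = {mean_degree:.4f}")
```

At `beta = 1.0` the area is $\pi/4$, giving $L = 2$ and $\bar\Delta = 4$ for the Gabriel graph; at `beta = 2.0` the area is that of the lune, $\frac{2\pi}{3} - \frac{\sqrt3}{2}$.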

Proof. Take a typical city at position $x_0$. For a city $x$ at distance $s$ the chance that $(x_0, x)$ is an edge equals $\exp(-c s^2)$ and so

    mean-degree $= \displaystyle\int_0^\infty \exp(-c s^2)\, 2\pi s \, ds$,

    $L = \dfrac12 \displaystyle\int_0^\infty s \exp(-c s^2)\, 2\pi s \, ds$.

Evaluating the integrals gives (8) and (9). □

One can derive similar integral formulas for other “local” characteristics, for example, mean density of triangles and moments of vertex degree. See [18, 20, 21, 34] for a variety of such generalizations and specializations.

4.2 Other Tractable Networks

We do not know any other ways of defining networks on random points which are both “natural” and are tractable in the sense that one can find exact analytic formulas for $L$. In particular, we know no tractable way of defining networks with deliberate junctions as in Figure 8. Note also that, while it is easy to make ad hoc modifications to the geometric graph to ensure connectivity, these destroy tractability. On the other hand, one can construct “unnatural” networks (see, e.g., [8]) designed to permit calculation of $L$.

Fig. 8. An ad hoc modification of the relative neighborhood graph, introducing junctions.

5. OPTIMAL NETWORKS AND $n \to \infty$ LIMITS

5.1 Tractable Models

As mentioned earlier, the quantities $L$, $\bar\Delta$, $R$ we discuss may be interpreted as exact values in the infinite model or as $n \to \infty$ limits in the finite model. To elaborate briefly, in a realization of the finite model ($n$ cities distributed independently and uniformly in a square of area $n$), a network in Table 1 has a normalized length $L_n = n^{-1} \times$ (network length) and an average degree $\bar\Delta_n$ which are random variables, but there is convergence (in probability and in expectation)

(10)    $L_n \to L$,  $\bar\Delta_n \to \bar\Delta$  as $n \to \infty$

to limit constants definable in terms of the analogous network on the infinite model (rate 1 Poisson point process on the infinite plane). For the proximity graphs or Delaunay triangulation, the network definition applies directly to the infinite model and proof of (10) is straightforward. For the Hammersley network, (10) is implicit in [6], and for the MST detailed arguments can be found in [9, 43].

5.2 Optimal Networks

We now turn to consideration of optimal networks. Given a configuration $\mathbf{x}$ of $n$ cities in the area-$n$ square, and a value of $L$ which is greater than $n^{-1} \times$ (length of Steiner tree), one can define a number

(11)    $\tilde R_n(\mathbf{x}, L) = \min$ of $\tilde R$ over all networks on $\mathbf{x}$ with normalized length $\le L$,

where $\tilde R$ is the discretized version (6) calculated using intervals of some suitable length $\delta_n$. Applying this to a random configuration $\mathbf{X}_n$ in the finite model gives, for each $L$, a random variable

    $\Xi_n(L) := \tilde R_n(\mathbf{X}_n, L)$.

One intuitively expects convergence to some deterministic limit

(12)    $\Xi_n(L) \to R_{\mathrm{opt}}(L)$ say, as $n \to \infty$.

The analogous result for $R_{\max}$ will be proved carefully in [8], and the same “superadditivity” argument could be used to prove (12). See [43, 44, 47] for general background to such results. The point is that we do not have any explicit description of the optimal [i.e., attaining the minimum in (11)] networks in the finite or infinite models, so it seems very challenging to prove the natural stronger supposition that the finite optimal networks themselves converge (in some appropriate sense) to a unique infinite optimal network for which the value $R = R_{\mathrm{opt}}(L)$ is attained.

5.3 The Curve $R_{\mathrm{opt}}(L)$

Every possible network on the infinite Poisson point process defines a pair $(L, R)$, and the curve $R = R_{\mathrm{opt}}(L)$ can be defined equivalently as the lower boundary of the set of possible values of $(L, R)$. There is no reason to believe that proximity graphs are exactly optimal, and, indeed, Figure 7 shows that the Delaunay triangulation is slightly more efficient than the corresponding β-skeleton. But our attempts to do better by ad hoc constructions (e.g., by introducing degree-3 junctions; see Figure 8 for an example) have been unsuccessful. And, indeed, the fact that the two special models in Figure 7 lie close to the β-skeleton curve lends credence to the idea that this curve is almost optimal. We therefore speculate that the function $R_{\mathrm{opt}}$ looks something like the curve in Figure 9, which we now discuss.

Fig. 9. Speculative shape for the curve $R_{\mathrm{opt}}(L)$, with ◦ and • values from tractable networks in Figure 7.

What can we say about $R_{\mathrm{opt}}(L)$? It is a priori nonincreasing. It is known [47] that there exists a Euclidean Steiner tree constant $L_{ST}$ representing the limit normalized Steiner tree length in the random model, and clearly $R_{\mathrm{opt}}(L) = \infty$ for $L < L_{ST}$. The facts

(13)    $R_{\mathrm{opt}}(L) < \infty$ for all $L > L_{ST}$;    $R_{\mathrm{opt}}(L) \to 0$ as $L \to \infty$

are not trivial to prove rigorously, but follow from the corresponding facts for $R_{\max}$ proved in [8]. But we are unable to prove rigorously that $R_{\mathrm{opt}}(L)$ is strictly decreasing or that it is continuous.

6. FINAL REMARKS

6.1 Toy Models for Road Networks

The idea of using proximity graphs as toy models for road networks has previously been noted [30] but not investigated very thoroughly. It is an intuitively natural idea to a network designer: whether or not to place a direct road from city $i$ to a nearby city $j$ depends (partly) on whether some other city $k$ is close to the line between them.

As observed by a referee, for the kind of models studied in this paper we expect route length $\ell(i, j)$ between distant cities to be roughly proportional to graph distance (number of edges), which is a more relevant quantity in some contexts. However, when one considers design of optimal networks, replacing or partially replacing route length by graph distance leads to quite different optimal networks [1, 22]. For some other cost/benefit functionals leading to yet different optimal networks see [2, 14].

6.2 Rigorous Proof of Finite $R$ in Random Proximity Graphs

Table 1 presented the Monte Carlo numerical value $\approx 0.38$ of $R$ for the relative neighborhood graph on random points. From a rigorous viewpoint, the assertion that a random network has $R < \infty$ is essentially the assertion that $\rho(d) = O(d)$ as $d \to \infty$. This is often nontrivial to prove. A general sufficient condition for this property, which applies to the relative neighborhood graph (and hence all proximity graphs), is proved in [3]. The related fact that the limit $\lim_{d \to \infty} \rho(d)/d$ exists is proved in [4].

6.3 Real-World Trade-Off Between Network Length and Route-Length Efficiency

Recall that our central theme is seeking to quantify the trade-off between normalized network length $L$ and route-length efficiency $R$. Figure 9 suggests that for optimal networks the “law of diminishing returns” sets in: $R$ drops to about 0.13 as $L$ increases to 2 but decreases only slowly as $L$ increases further. This suggests a kind of “economic prediction” for the lengths of real-world networks which are perceived by users to be efficient in providing short routes:

    the length of an efficient network linking $n$ cities in a region of area $A$ will be roughly $2\sqrt{An}$.

Here the $\sqrt{An}$ arises from undoing the normalization and the “2” is the value of $L$. Of course, this is rough: we mean “closer to 2 than to 1 or 3.”

6.4 Other Results for the Random Network Models

There is substantial literature on the networks (MST, proximity graphs, Delaunay triangulation) in the deterministic setting. In the random case, central limit theorems for total network length have been studied in many models: for the MST in [29, 31, 32], and for the Delaunay triangulation, Voronoi tessellation, relative neighborhood and Gabriel graphs in [12, 25, 42]. Large deviation estimates for total network length are given for the Gabriel graph in [46], Section 11.4, and presumably could be extended to other models. Otherwise the literature for the random case is rather diffuse, with different focuses for different networks. For instance, work on MSTs has focused on connections with critical continuum percolation [17]. For the relative neighborhood graph and the Gabriel graph, [20] calculates $\bar\Delta$ and [18] shows that, in the finite model, in a certain range the β-skeletons have

(14)    $R_{\max}$ grows as order $\sqrt{\log n / \log\log n}$

and [21] shows the same order for maximum vertex degree in the Gabriel graph. As for the Delaunay triangulation, there has been surprisingly little follow-up to the seminal analysis by Miles [35] (various maximal statistics are studied in [16]), though the closely related Voronoi tessellation has been studied in more detail [36].

... being natural models for road networks, proximity graphs might be useful in modeling communication networks suffering line of sight interference.

At a more mathematical level, for questions such as spread-out percolation [41] or critical value of contact processes [15], random proximity graphs with small $A$ are an interesting alternative to the usual lattice- or random graph-based models. For instance, it is natural to conjecture that the critical value $p^*_A$ for edge percolation on a random proximity graph with template $A$ satisfies

(15)    $p^*_A \sim \pi^{-1}\, \mathrm{area}(A)$ as $\mathrm{area}(A) \to 0$

[the right side $= 1/\bar\Delta$ from (9)] and that the critical value $\lambda^*_A$ for the contact process has the same asymptotics.

6.6 Analogies Between Deterministic and Random Networks

As mentioned earlier, we may make very loose
analogies between particular networks on random 6.5 Speculative Applications of Random points and particular deterministic networks in Fig- Proximity Graphs ure 4, based in part on exact equality of ∆¯ in the latter three cases: Random proximity graphs seem an interesting ob- ject of study from many viewpoints, in particular, Relative n’hood graph punctured lattice, as an attractive alternative to random geometric ↔ graphs for modeling spatial networks that are con- Gabriel graph square lattice, ↔ nected by design. It is remarkable that results such Hammersley network diagonal lattice, as (14) are the only nonelementary results about ↔ them that we can find in the literature. As well as Delaunay triangulation triangular lattice. ↔ CONNECTED NETWORKS OVER RANDOM POINTS 13
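To make the quantities $L_n$ and $\bar\Delta_n$ of Section 5.1 concrete for one of the networks in the list above, the following is a minimal Python sketch that builds the Gabriel graph on a sample from the finite model and reports its normalized length and average degree. It is a brute-force illustration under our own assumptions, not the code used for Table 1: the function names, the sample size 200 and the seed are all hypothetical choices, and the Gabriel rule used (keep an edge when no other point lies strictly inside the disc having that edge as diameter) is the standard definition rather than something taken from this section. For the Gabriel graph the limiting average degree is 4, matching the square lattice; at small $n$ the sample value can differ from this because of boundary effects and randomness.

```python
import math
import random
from itertools import combinations


def gabriel_edges(points):
    """Return Gabriel graph edges (i, j) with i < j, by brute force.

    Edge (i, j) is kept when no third point lies strictly inside the
    disc whose diameter is the segment joining points i and j.
    """
    edges = []
    for i, j in combinations(range(len(points)), 2):
        (xi, yi), (xj, yj) = points[i], points[j]
        cx, cy = (xi + xj) / 2.0, (yi + yj) / 2.0      # disc centre
        r2 = ((xi - xj) ** 2 + (yi - yj) ** 2) / 4.0   # squared radius
        blocked = any(
            (x - cx) ** 2 + (y - cy) ** 2 < r2
            for k, (x, y) in enumerate(points)
            if k != i and k != j
        )
        if not blocked:
            edges.append((i, j))
    return edges


def network_statistics(points, edges):
    """Return (L_n, average degree) for a network on n points in area n."""
    n = len(points)
    total_length = sum(math.dist(points[i], points[j]) for i, j in edges)
    return total_length / n, 2.0 * len(edges) / n


def sample_finite_model(n, seed=0):
    """n points, independent and uniform in a square of area n."""
    rng = random.Random(seed)
    side = math.sqrt(n)
    return [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]


# Hypothetical sample size and seed, chosen only for illustration.
points = sample_finite_model(200)
edges = gabriel_edges(points)
L_n, avg_degree = network_statistics(points, edges)
print(f"L_n = {L_n:.3f}, average degree = {avg_degree:.3f}")
```

The cubic-time loop is adequate for a few hundred points; a serious experiment would more likely extract the Gabriel edges from a Delaunay triangulation, of which the Gabriel graph is a subgraph.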

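As a worked check on the rough prediction $2\sqrt{An}$ of Section 6.3, the step of "undoing the normalization" can be written out as follows. This is only a sketch of the scaling argument; the scale factor $s$ and the numerical values in the closing comment are hypothetical and introduced purely for illustration.

```latex
% Finite model: n cities in a square of area n, network length ~ L n.
% Rescale all distances by s = \sqrt{A/n}, so that the region has area A.
\[
  \text{length} \;\approx\; s \cdot L\,n
  \;=\; \sqrt{A/n}\,\cdot L\,n
  \;=\; L\sqrt{A n},
  \qquad\text{so } L = 2 \text{ gives } 2\sqrt{A n}.
\]
% Illustrative (hypothetical) numbers: n = 100 cities, A = 10^4 km^2
% give 2\sqrt{10^6} km = 2000 km.
```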
6.7 Scale Invariant Continuum Networks

Introducing the statistic $R$ can be viewed as one approach to resolving the "paradox" from [7], discussed in Section 3.2, that the more natural statistic $R_{\mathrm{ave}}$ does not lead to realistic optimal networks in the $n \to \infty$ limit. This particular approach was prompted by visualizing real-world road networks (cf. discussion in Section 3.3). Let us mention a mathematically more sophisticated alternative, under study as a work in progress [5]. Instead of a discrete Poisson process of cities, we imagine a continuum limit. That is, for each finite set $(z_1, \ldots, z_k)$ of points in the plane, there is a random network $\mathcal{S}(z_1, \ldots, z_k)$ linking the points, consistent as more points are added. Mathematically natural structural properties for the distribution of such a process are as follows:

(i) translation and rotation invariance,
(ii) scale invariance,

where the latter means that routes, as point-sets in $\mathbb{R}^2$, are invariant in distribution under Euclidean scaling. This implies that the quantity $\rho(d)$ analogous to (5), assumed finite, is a constant, which we can call $R'$. The analog $L'$ of $L$ is defined by

    the expected length of the network on $n$ uniform random points in the area-$n$ square grows $\sim L'n$ as $n \to \infty$.

In this setting we can study the optimal trade-off between $L'$ and $R'$, and the kind of "paradoxical" Figure 5 network cannot arise because it violates scale-invariance.

ACKNOWLEDGMENTS

Aldous's research supported by NSF Grant DMS-0704159. We thank three anonymous referees for helpful comments.

REFERENCES

[1] Aldous, D. J. (2008). Spatial transportation networks with transfer costs: Asymptotic optimality of hub and spoke models. Math. Proc. Cambridge Philos. Soc. 145 471–487. MR2442138
[2] Aldous, D. J. (2008). Optimal spatial transportation networks where link-costs are sublinear in link-capacity. J. Stat. Mech. 2008 P03006.
[3] Aldous, D. J. (2009). Which connected spatial networks on random points have linear route-lengths? Available at arXiv:0911.5296v1.
[4] Aldous, D. J. (2009). The shape theorem for route-lengths in connected spatial networks on random points. Available at arXiv:0911.5301v1.
[5] Aldous, D. J. (2010). Scale-invariant random spatial networks. To appear.
[6] Aldous, D. J. and Diaconis, P. (1995). Hammersley's interacting particle process and longest increasing subsequences. Probab. Theory Related Fields 103 199–213. MR1355056
[7] Aldous, D. J. and Kendall, W. S. (2008). Short-length routes in low-cost networks via Poisson line patterns. Adv. in Appl. Probab. 40 1–21. MR2411811
[8] Aldous, D. J., Bhamidi, S. and Lando, T. (2010). The stretch-length tradeoff in geometric networks: Worst-case and average-case study. To appear.
[9] Aldous, D. J. and Steele, J. M. (1992). Asymptotics for Euclidean minimal spanning trees on random points. Probab. Theory Related Fields 92 247–258. MR1161188
[10] Aldous, D. J. and Choi, A. (2009). A route-length efficiency statistic for road networks. Unpublished manuscript. Available at www.stat.berkeley.edu/~aldous/Spatial/paper.pdf.
[11] Avram, F. and Bertsimas, D. (1992). The minimum spanning tree constant in geometric probability and under the independent model: A unified approach. Ann. Appl. Probab. 2 113–130. MR1143395
[12] Avram, F. and Bertsimas, D. (1993). On central limit theorems in geometrical probability. Ann. Appl. Probab. 3 1033–1046. MR1241033
[13] Baccelli, F., Tchoumatchenko, K. and Zuyev, S. (2000). Markov paths on the Poisson–Delaunay graph with applications to routeing in mobile networks. Adv. in Appl. Probab. 32 1–18. MR1765174
[14] Barthélemy, M. and Flammini, A. (2006). Optimal traffic networks. J. Stat. Mech. Theory Exp. 2006 L07002.
[15] Berger, N., Borgs, C., Chayes, J. T. and Saberi, A. (2005). On the spread of viruses on the internet. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms 301–310 (electronic). ACM, New York. MR2298278
[16] Bern, M., Eppstein, D. and Yao, F. (1991). The expected extremes in a Delaunay triangulation. Internat. J. Comput. Geom. Appl. 1 79–91. MR1099499
[17] Bezuidenhout, C., Grimmett, G. and Löffler, A. (1998). Percolation and minimal spanning trees. J. Stat. Phys. 92 1–34. MR1645627
[18] Bose, P., Devroye, L., Evans, W. and Kirkpatrick, D. (2006). On the spanning ratio of Gabriel graphs and β-skeletons. SIAM J. Discrete Math. 20 412–427 (electronic). MR2257270
[19] Cortina-Borja, M. and Robinson, T. (2000). Estimating the asymptotic constants of the total length of Euclidean minimal spanning trees with power-weighted edges. Statist. Probab. Lett. 47 125–128. MR1747099
[20] Devroye, L. (1988). The expected size of some graphs in computational geometry. Comput. Math. Appl. 15 53–64. MR0937563
[21] Devroye, L., Gudmundsson, J. and Morin, P. (2009). On the expected maximum degree of Gabriel and Yao graphs. Adv. in Appl. Probab. 41 1123–1140. MR2663239
[22] Gastner, M. T. and Newman, M. E. J. (2006). Shape and efficiency in spatial distribution networks. J. Stat. Mech. Theory Exp. 2006 P01015 (electronic).
[23] George, P.-L. and Borouchaki, H. (1998). Delaunay Triangulation and Meshing. Editions Hermès, Paris. MR1686530
[24] Graham, R. L. and Hell, P. (1985). On the history of the minimum spanning tree problem. IEEE Ann. History Comput. 7 43–57. MR0783327
[25] Heinrich, L. (1994). Normal approximation for some mean-value estimates of absolutely regular tessellations. Math. Methods Statist. 3 1–24. MR1272628
[26] Holroyd, A. E., Pemantle, R., Peres, Y. and Schramm, O. (2009). Poisson matching. Ann. Inst. H. Poincaré Probab. Statist. 45 266–287. MR2500239
[27] Holroyd, A. E. and Peres, Y. (2003). Trees and matchings from point processes. Electron. Comm. Probab. 8 17–27 (electronic). MR1961286
[28] Jaromczyk, J. W. and Toussaint, G. T. (1992). Relative neighborhood graphs and their relatives. Proc. IEEE 80 1502–1517.
[29] Kesten, H. and Lee, S. (1996). The central limit theorem for weighted minimal spanning trees on random points. Ann. Appl. Probab. 6 495–527. MR1398055
[30] Kirkpatrick, D. G. and Radke, J. D. (1985). A framework for computational morphology. In Computational Geometry (G. T. Toussaint, ed.) 217–248. Elsevier, Amsterdam.
[31] Lee, S. (1997). The central limit theorem for Euclidean minimal spanning trees. I. Ann. Appl. Probab. 7 996–1020. MR1484795
[32] Lee, S. (1999). The central limit theorem for Euclidean minimal spanning trees. II. Adv. in Appl. Probab. 31 969–984. MR1747451
[33] Lloyd, E. L. (1977). On triangulations of a set of points in the plane. In 18th Annual Symposium on Foundations of Computer Science (Providence, RI, 1977) 228–240. IEEE Comput. Soc., Long Beach, CA. MR0660693
[34] Matula, D. W. and Sokal, R. R. (1980). Properties of Gabriel graphs relevant to geographical variation research and the clustering of points in the plane. Geog. Anal. 12 205–222.
[35] Miles, R. E. (1970). On the homogeneous planar Poisson point process. Math. Biosci. 6 85–127. MR0279853
[36] Møller, J. (1994). Lectures on Random Voronoĭ Tessellations. Lecture Notes in Statistics 87. Springer, New York. MR1295245
[37] Narasimhan, G. and Smid, M. (2007). Geometric Spanner Networks. Cambridge Univ. Press, Cambridge. MR2289615
[38] Narayanaswamy, S., Kawadia, V., Sreenivas, R. S. and Kumar, P. R. (2002). Power control in ad-hoc networks: Theory, architecture, and implementation of the COMPOW protocol. In Proc. European Wireless Conference, Florence, Italy.
[39] Newman, M. E. J. (2003). The structure and function of complex networks. SIAM Rev. 45 167–256. MR2010377
[40] Penrose, M. D. (2003). Random Geometric Graphs. Oxford Univ. Press, Oxford. MR1986198
[41] Penrose, M. D. (1993). On the spread-out limit for bond and continuum percolation. Ann. Appl. Probab. 3 253–276. MR1202526
[42] Penrose, M. D. and Yukich, J. E. (2001). Central limit theorems for some graphs in computational geometry. Ann. Appl. Probab. 11 1005–1041. MR1878288
[43] Penrose, M. D. and Yukich, J. E. (2003). Weak laws of large numbers in geometric probability. Ann. Appl. Probab. 13 277–303. MR1952000
[44] Steele, J. M. (1997). Probability Theory and Combinatorial Optimization. CBMS-NSF Regional Conference Series in Applied Math 69. SIAM, Philadelphia, PA. MR1422018
[45] Stoyan, D., Kendall, W. S. and Mecke, J. (1995). Stochastic Geometry and Its Applications, 2nd ed. Wiley, Chichester. MR0895588
[46] Talagrand, M. (1995). Concentration of measure and isoperimetric inequalities in product spaces. Inst. Hautes Études Sci. Publ. Math. 81 73–205. MR1361756
[47] Yukich, J. E. (1998). Probability Theory of Classical Euclidean Optimization Problems. Lecture Notes in Math. 1675. Springer, Berlin. MR1632875