Statistical Science 2010, Vol. 25, No. 3, 275–288
DOI: 10.1214/10-STS335
© Institute of Mathematical Statistics, 2010
arXiv:1003.3700v2 [math.PR] 5 Jan 2011

Connected Spatial Networks over Random Points and a Route-Length Statistic

David J. Aldous and Julian Shun

Abstract. We review mathematically tractable models for connected networks on random points in the plane, emphasizing the class of proximity graphs which deserves to be better known to applied probabilists and statisticians. We introduce and motivate a particular statistic R measuring shortness of routes in a network. We illustrate, via Monte Carlo in part, the trade-off between normalized network length and R in a one-parameter family of proximity graphs. How close this family comes to the optimal trade-off over all possible networks remains an intriguing open question. The paper is a write-up of a talk developed by the first author during 2007–2009.

Key words and phrases: Proximity graph, random graph, spatial network, geometric graph.

David J. Aldous is Professor, Department of Statistics, University of California, 367 Evans Hall # 3860, Berkeley, California 94720, USA (e-mail: [email protected]; URL: www.stat.berkeley.edu/users/aldous). Julian Shun is Graduate Student, Machine Learning Department, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA (e-mail: [email protected]).

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2010, Vol. 25, No. 3, 275–288. This reprint differs from the original in pagination and typographic detail.

1. INTRODUCTION

The topic called random networks or complex networks has attracted huge attention over the last 20 years. Much of this work focuses on examples such as social networks or WWW links, in which edges are not closely constrained by two-dimensional geometry. In contrast, in a spatial network not only are vertices and edges situated in two-dimensional space, but also it is actual distances, rather than number of edges, that are of interest. To be concrete, we visualize idealized inter-city road networks, and a feature of interest is the (minimum) route length between two given cities. Because we work only in two dimensions, the word spatial may be misleading, but equally the word planar would be misleading because we do not require networks to be planar graphs (if edges cross, then a junction is created).

Our major purpose is to draw the attention of readers from the applied probability and statistics communities to a particular class of spatial network models. Recall that the most studied network model, the random geometric graph [40] reviewed in Section 2.1, does not permit both connectivity and bounded normalized length in the $n \to \infty$ limit. An attractive alternative is the class of proximity graphs, reviewed in Section 2.3, which in the deterministic case have been studied within computational geometry. These graphs are always connected. Proximity graphs on random points have been studied in only a few papers, but are potentially interesting for many purposes other than the specific "short route lengths" topic of this paper (see Section 6.5). One could also imagine constructions which depend on points having specifically the Poisson point process distribution, and one novel such network, which we name the Hammersley network, is described in Section 2.5.

Visualizing idealized road networks, it is natural to take total network length as the "cost" of a network, but what is the corresponding "benefit"? Primarily we are interested in having short route lengths. Choosing an appropriate statistic to measure the latter turns out to be rather subtle, and the (only) technical innovation of this paper is the introduction (Section 3.2) and motivation of a specific statistic R for measuring the effectiveness of a network in providing short routes.

In the theory of spatial networks over random points, it is a challenge to quantify the trade-off between network length [precisely, the normalized length L defined at (2)] and route length efficiency statistics such as R. Our particular statistic R is not amenable to explicit calculation even in comparatively tractable models, but in Section 4 we present the results from Monte Carlo simulations. In particular, Figure 7 shows the trade-off for the particular β-skeleton family of proximity graphs.

Given a normalized network length L, for any realization of cities there is some network of normalized length L which minimizes R. As indicated in Section 5, by general abstract mathematical arguments, there must exist a deterministic function $R_{\mathrm{opt}}(L)$ giving (in the "number of cities $\to \infty$" limit under the random model) the minimum value of R over all possible networks of normalized length L. An intriguing open question is as follows:

    how close are the values $R_{\beta\text{-skel}}(L)$ from the β-skeleton proximity graphs to the optimum values $R_{\mathrm{opt}}(L)$?

As discussed in Section 5.3, at first sight it looks easy to design heuristic algorithms for networks which should improve over the β-skeletons, for example, by introducing Steiner points, but in practice we have not succeeded in doing so.

This paper focuses on the random model for city positions because it seems the natural setting for theoretical study. As a complement, in [10] we give empirical data for the values of (L, R) for certain real-world networks (on the 20 largest cities, in each of 10 US States). In [8] we give analytic results and bounds on the trade-off between L and the mathematically more tractable stretch statistic $R_{\max}$ at (4), in both worst-case and random-case settings for city positions. Let us also point out a (perhaps) nonobvious insight discussed in Section 3.3: in designing networks to be efficient in the sense of providing short routes, the main difficulty is providing short routes between city-pairs at a specific distance (2–3 standardized units) apart, rather than between pairs at a large distance apart.

Finally, recall this is a nontechnical account. Our purpose is to elaborate verbally the ideas outlined above; some technical aspects will be pursued elsewhere.

2. MODELS FOR CONNECTED SPATIAL NETWORKS

There are several conceptually different ways of defining networks on random points in the plane. To be concrete, we call the points cities; to be consistent about language, we regard $x_i$ as the position of city $i$ and represent network edges as line segments $(x_i, x_j)$.

First (Sections 2.1–2.3) are schemes which use deterministic rules to define edges for an arbitrary deterministic configuration of cities; then one just applies these rules to a random configuration. Second, one can have random rules for edges in a deterministic configuration (e.g., the probability of an edge between cities $i$ and $j$ is a function of Euclidean distance $d(x_i, x_j)$, as in popular small worlds models [39]), and again apply to a random configuration. Third, and more subtly, one can have constructions that depend on the randomness model for city positions; Section 2.5 provides a novel example.

We work throughout with reference to Euclidean distance $d(x, y)$ on the plane, even though many models could be defined with reference to other metrics (or even when the triangle inequality does not hold, for the MST).

2.1 The Geometric Graph

In Sections 2.1–2.3 we have an arbitrary configuration $\mathbf{x} = \{x_i\}$ of city positions, and a deterministic rule for defining the edge-set $\mathcal{E}$. Usually in graph theory one imagines a finite configuration, but note that everything makes sense for locally finite configurations too. Where helpful, we assume "general position," so that intercity distances $d(x_i, x_j)$ are all distinct.

For the geometric graph one fixes $0 < c < \infty$ and defines

    $(x_i, x_j) \in \mathcal{E}$ iff $d(x_i, x_j) \le c$.

For the K-neighbor graph one fixes $K \ge 1$ and defines

    $(x_i, x_j) \in \mathcal{E}$ iff $x_i$ is one of the $K$ closest neighbors of $x_j$, or $x_j$ is one of the $K$ closest neighbors of $x_i$.

A moment's thought shows these graphs are in general not connected, so we turn to models which are "by construction" connected. We remark that the connectivity threshold $c_n$ in the finite $n$-vertex model of the random geometric graph has been studied in detail; see Chapter 13 of [40].

2.2 A Nested Sequence of Connected Graphs

The material here and in the next section was developed in graph theory with a view toward algorithmic applications in computational geometry and pattern recognition. The 1992 survey [28] gives the history of the subject and 116 citations. But everything we need is immediate from the (careful choice of) definitions. On our arbitrary configuration $\mathbf{x}$ we can define four graphs whose edge-sets are nested as follows:

• Gabriel graph. There does not exist a city inside the disc whose diameter is the line segment from $x_i$ to $x_j$.
• Delaunay triangulation [23]. There exists some disc, with $x_i$ and $x_j$ on its boundary, so that no city is inside the disc.

The inclusions (1) are immediate from these definitions. Because the MST (for a finite configuration) is connected, all these graphs are connected.

Figure 1 illustrates the relative neighborhood and Gabriel graphs. Figures for the MST and the Delaunay triangulation can be found online at http://www.spss.com/research/wilkinson/Applets/edges.html.

Constructions such as the relative neighborhood and Gabriel graphs have become known loosely as proximity graphs in [28] and subsequent literature, and we next take the opportunity to turn an implicit definition in the literature into an explicit definition.
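As an illustrative aside, the edge rules of Section 2.1 (geometric graph and K-neighbor graph) and the Gabriel graph rule of Section 2.2 can be sketched directly from their definitions. The following is a minimal brute-force sketch, not the code used for the simulations in this paper. All function names (`geometric_graph`, `k_neighbor_graph`, `gabriel_graph`, `is_connected`) and the four-city example are hypothetical, and treating a city exactly on the boundary of the disc as "not inside" is an assumption that is harmless under general position.

```python
import math
from itertools import combinations


def dist(p, q):
    """Euclidean distance d(p, q) in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def geometric_graph(points, c):
    """Edge (i, j) iff d(x_i, x_j) <= c."""
    return {(i, j) for i, j in combinations(range(len(points)), 2)
            if dist(points[i], points[j]) <= c}


def k_neighbor_graph(points, k):
    """Edge (i, j) iff i is among the k closest neighbors of j, or vice versa."""
    n = len(points)
    edges = set()
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(points[i], points[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def gabriel_graph(points):
    """Edge (i, j) iff no other city lies strictly inside the disc
    whose diameter is the segment from x_i to x_j."""
    n = len(points)
    edges = set()
    for i, j in combinations(range(n), 2):
        center = ((points[i][0] + points[j][0]) / 2,
                  (points[i][1] + points[j][1]) / 2)
        radius = dist(points[i], points[j]) / 2
        # Assumption: a city on the boundary does not block the edge.
        if all(dist(points[k], center) >= radius
               for k in range(n) if k not in (i, j)):
            edges.add((i, j))
    return edges


def is_connected(n, edges):
    """Depth-first search connectivity check on vertices 0..n-1."""
    adj = {v: set() for v in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in adj[v] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == n


# Hypothetical configuration: two well-separated pairs of cities.
cities = [(0, 0), (1, 0), (5, 0), (7, 0)]
print(sorted(geometric_graph(cities, 2.5)))   # [(0, 1), (2, 3)]
print(sorted(k_neighbor_graph(cities, 1)))    # [(0, 1), (2, 3)]
print(sorted(gabriel_graph(cities)))          # [(0, 1), (1, 2), (2, 3)]
```

On this small configuration the geometric graph with c = 2.5 and the 1-neighbor graph both split into two components, while the Gabriel graph is connected, which matches the remarks in the text. The brute-force Gabriel test is cubic in the number of cities and is meant only to mirror the definition.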