Statistical Science 2010, Vol. 25, No. 3, 275–288. DOI: 10.1214/10-STS335. © Institute of Mathematical Statistics, 2010

Connected Spatial Networks over Random Points and a Route-Length Statistic

David J. Aldous and Julian Shun

Abstract. We review mathematically tractable models for connected networks on random points in the plane, emphasizing the class of proximity graphs which deserves to be better known to applied probabilists and statisticians. We introduce and motivate a particular statistic R measuring shortness of routes in a network. We illustrate, via Monte Carlo in part, the trade-off between normalized network length and R in a one-parameter family of proximity graphs. How close this family comes to the optimal trade-off over all possible networks remains an intriguing open question. The paper is a write-up of a talk developed by the first author during 2007–2009.

Key words and phrases: Proximity graph, random graph, spatial network, geometric graph.

David J. Aldous is Professor, Department of Statistics, University of California, 367 Evans Hall # 3860, Berkeley, California 94720, USA (e-mail: [email protected]; URL: www.stat.berkeley.edu/users/aldous). Julian Shun is Graduate Student, Machine Learning Department, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA (e-mail: [email protected]).

1. INTRODUCTION

The topic called random networks or complex networks has attracted huge attention over the last 20 years. Much of this work focuses on examples such as social networks or WWW links, in which edges are not closely constrained by two-dimensional geometry. In contrast, in a spatial network not only are vertices and edges situated in two-dimensional space, but also it is actual distances, rather than number of edges, that are of interest. To be concrete, we visualize idealized intercity road networks, and a feature of interest is the (minimum) route length between two given cities. Because we work only in two dimensions, the word spatial may be misleading, but equally the word planar would be misleading because we do not require networks to be planar graphs (if edges cross, then a junction is created).

Our major purpose is to draw the attention of readers from the applied probability and statistics communities to a particular class of spatial network models. Recall that the most studied network model, the random geometric graph [40] reviewed in Section 2.1, does not permit both connectivity and bounded normalized length in the $n \to \infty$ limit. An attractive alternative is the class of proximity graphs, reviewed in Section 2.3, which in the deterministic case have been studied within computational geometry. These graphs are always connected. Proximity graphs on random points have been studied in only a few papers, but are potentially interesting for many purposes other than the specific “short route lengths” topic of this paper (see Section 6.5). One could also imagine constructions which depend on points having specifically the Poisson point process distribution, and one novel such network, which we name the Hammersley network, is described in Section 2.5.

Visualizing idealized road networks, it is natural to take total network length as the “cost” of a network, but what is the corresponding “benefit”? Primarily we are interested in having short route lengths. Choosing an appropriate statistic to measure the latter turns out to be rather subtle, and the (only) technical innovation of this paper is the introduction (Section 3.2) and motivation of a specific statistic R for measuring the effectiveness of a network in providing short routes.

In the theory of spatial networks over random points, it is a challenge to quantify the trade-off between network length [precisely, the normalized length L defined at (2)] and route length efficiency statistics such as R. Our particular statistic R is not amenable to explicit calculation even in comparatively tractable models, but in Section 4 we present the results from Monte Carlo simulations. In particular, Figure 7 shows the trade-off for the particular β-skeleton family of proximity graphs.

Given a normalized network length L, for any realization of cities there is some network of normalized length L which minimizes R. As indicated in Section 5, by general abstract mathematical arguments, there must exist a deterministic function $R_{\mathrm{opt}}(L)$ giving (in the “number of cities $\to \infty$” limit under the random model) the minimum value of R over all possible networks of normalized length L. An intriguing open question is as follows:

how close are the values $R_{\beta\text{-skel}}(L)$ from the β-skeleton proximity graphs to the optimum values $R_{\mathrm{opt}}(L)$?

As discussed in Section 5.3, at first sight it looks easy to design heuristic algorithms for networks which should improve over the β-skeletons, for example, by introducing Steiner points, but in practice we have not succeeded in doing so.

This paper focuses on the random model for city positions because it seems the natural setting for theoretical study. As a complement, in [10] we give empirical data for the values of (L, R) for certain real-world networks (on the 20 largest cities, in each of 10 US States). In [8] we give analytic results and bounds on the trade-off between L and the mathematically more tractable stretch statistic $R_{\max}$ at (4), in both worst-case and random-case settings for city positions. Let us also point out a (perhaps) nonobvious insight discussed in Section 3.3: in designing networks to be efficient in the sense of providing short routes, the main difficulty is providing short routes between city-pairs at a specific distance (2–3 standardized units) apart, rather than between pairs at a large distance apart.

Finally, recall this is a nontechnical account. Our purpose is to elaborate verbally the ideas outlined above; some technical aspects will be pursued elsewhere.

2. MODELS FOR CONNECTED SPATIAL NETWORKS

There are several conceptually different ways of defining networks on random points in the plane. To be concrete, we call the points cities; to be consistent about language, we regard $x_i$ as the position of city $i$ and represent network edges as line segments $(x_i, x_j)$. First (Sections 2.1–2.3) are schemes which use deterministic rules to define edges for an arbitrary deterministic configuration of cities; then one just applies these rules to a random configuration. Second, one can have random rules for edges in a deterministic configuration (e.g., the probability of an edge between cities $i$ and $j$ is a function of Euclidean distance $d(x_i, x_j)$, as in popular small worlds models [39]), and again apply to a random configuration. Third, and more subtly, one can have constructions that depend on the randomness model for city positions; Section 2.5 provides a novel example.

We work throughout with reference to Euclidean distance $d(x, y)$ on the plane, even though many models could be defined with reference to other metrics (or even when the triangle inequality does not hold, for the MST).

2.1 The Geometric Graph

In Sections 2.1–2.3 we have an arbitrary configuration $\mathbf{x} = \{x_i\}$ of city positions, and a deterministic rule for defining the edge-set $\mathcal{E}$. Usually in graph theory one imagines a finite configuration, but note that everything makes sense for locally finite configurations too. Where helpful, we assume “general position,” so that intercity distances $d(x_i, x_j)$ are all distinct.

For the geometric graph one fixes $0 < c < \infty$ and defines

\[ (x_i, x_j) \in \mathcal{E} \quad \text{iff} \quad d(x_i, x_j) \le c. \]

For the K-neighbor graph one fixes $K \ge 1$ and defines

$(x_i, x_j) \in \mathcal{E}$ iff $x_i$ is one of the $K$ closest neighbors of $x_j$, or $x_j$ is one of the $K$ closest neighbors of $x_i$.

A moment’s thought shows these graphs are in general not connected, so we turn to models which are “by construction” connected. We remark that the connectivity threshold $c_n$ in the finite $n$-vertex model of the random geometric graph has been studied in detail; see Chapter 13 of [40].

2.2 A Nested Sequence of Connected Graphs

The material here and in the next section was developed in graph theory with a view toward algorithmic applications in computational geometry and pattern recognition. The 1992 survey [28] gives the history of the subject and 116 citations. But everything we need is immediate from the (careful choice of) definitions. On our arbitrary configuration $\mathbf{x}$ we can define four graphs whose edge-sets are nested as follows:

\[ \text{MST} \subseteq \text{relative n'hood} \subseteq \text{Gabriel} \subseteq \text{Delaunay}. \tag{1} \]

Here are the definitions (for MST and Delaunay, it is easy to check these are equivalent to more familiar definitions). In each case, we write the criterion for an edge $(x_i, x_j)$ to be present:

• Minimum spanning tree (MST) [24]. There does not exist a sequence $i = k_0, k_1, \ldots, k_m = j$ of cities such that
\[ \max\bigl(d(x_{k_0}, x_{k_1}), d(x_{k_1}, x_{k_2}), \ldots, d(x_{k_{m-1}}, x_{k_m})\bigr) < d(x_i, x_j). \]
• Relative neighborhood graph. There does not exist a city $k$ such that
\[ \max\bigl(d(x_i, x_k), d(x_k, x_j)\bigr) < d(x_i, x_j). \]
• Gabriel graph.

2.3 Proximity Graphs

Write $v_-$ and $v_+$ for the points $(-\tfrac{1}{2}, 0)$ and $(\tfrac{1}{2}, 0)$. The lune is the intersection of the open discs of radii 1 centered at $v_-$ and $v_+$. So $v_-$ and $v_+$ are not in the lune but are on its boundary. Define a template $A$ to be a subset of $\mathbb{R}^2$ such that:

(i) $A$ is a subset of the lune.
(ii) $A$ contains the open line segment $(v_-, v_+)$.
(iii) $A$ is invariant under the “reflection in the y-axis” map $\mathrm{Reflect}_x(x_1, x_2) = (-x_1, x_2)$ and the “reflection in the x-axis” map $\mathrm{Reflect}_y(x_1, x_2) = (x_1, -x_2)$.
(iv) $A$ is open.

For arbitrary points $x, y$ in $\mathbb{R}^2$, define $A(x, y)$ to be the image of $A$ under the natural transformation
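
As an illustration of the deterministic edge rules stated in Sections 2.1 and 2.2, here is a minimal Python sketch. It is not taken from the paper: the function names are hypothetical, the loops are brute force (fine for a few dozen cities, too slow for large n), and the MST is built with Kruskal's algorithm, a standard construction swapped in for the path criterion in the text (the text notes that its criterion is equivalent to more familiar definitions). Cities are assumed to be (x, y) tuples, and edges are stored as index pairs (i, j) with i < j.

```python
import itertools
import math
import random


def dist(p, q):
    """Euclidean distance d(p, q) between two points in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def geometric_graph(pts, c):
    """Edge (i, j) is present iff d(x_i, x_j) <= c."""
    return {(i, j) for i, j in itertools.combinations(range(len(pts)), 2)
            if dist(pts[i], pts[j]) <= c}


def k_neighbor_graph(pts, k):
    """Edge (i, j) iff either city is among the k closest neighbors of the other."""
    edges = set()
    for i in range(len(pts)):
        others = sorted((j for j in range(len(pts)) if j != i),
                        key=lambda j: dist(pts[i], pts[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def relative_neighborhood_graph(pts):
    """Edge (i, j) unless some city k has max(d(i,k), d(k,j)) < d(i,j)."""
    n = len(pts)
    edges = set()
    for i, j in itertools.combinations(range(n), 2):
        d_ij = dist(pts[i], pts[j])
        blocked = any(
            max(dist(pts[i], pts[k]), dist(pts[k], pts[j])) < d_ij
            for k in range(n) if k != i and k != j)
        if not blocked:
            edges.add((i, j))
    return edges


def minimum_spanning_tree(pts):
    """Kruskal's algorithm on the complete graph, with a small union-find."""
    parent = list(range(len(pts)))

    def find(a):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    edges = set()
    pairs = sorted(itertools.combinations(range(len(pts)), 2),
                   key=lambda e: dist(pts[e[0]], pts[e[1]]))
    for i, j in pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # shortest remaining edge joining two components
            parent[root_i] = root_j
            edges.add((i, j))
    return edges


if __name__ == "__main__":
    random.seed(0)  # arbitrary seed, only for reproducibility
    cities = [(random.random(), random.random()) for _ in range(30)]
    mst = minimum_spanning_tree(cities)
    rng = relative_neighborhood_graph(cities)
    # Prints the two edge counts and whether the first inclusion in (1) holds.
    print(len(mst), len(rng), mst <= rng)
```

On uniform random cities the set comparison `mst <= rng` checks the first inclusion in (1) numerically, and the MST always has n − 1 edges, whereas `geometric_graph` and `k_neighbor_graph` may return disconnected graphs for small c or K.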