Query Processing in Spatial Network Databases
Total Page:16
File Type:pdf, Size:1020Kb
Query Processing in Spatial Network Databases Dimitris Papadias Jun Zhang Department of Computer Science Department of Computer Science Hong Kong University of Science and Technology Hong Kong University of Science and Technology Clearwater Bay, Hong Kong Clearwater Bay, Hong Kong [email protected] [email protected] Nikos Mamoulis Yufei Tao Department of Computer Science and Information Systems Department of Computer Science University of Hong Kong City University of Hong Kong Pokfulam Road, Hong Kong Tat Chee Avenue, Hong Kong [email protected] [email protected] than their Euclidean distance. Previous work on spatial Abstract network databases (SNDB) is scarce and too restrictive Despite the importance of spatial networks in for emerging applications such as mobile computing and real-life applications, most of the spatial database location-based commerce. This necessitates the literature focuses on Euclidean spaces. In this development of novel and comprehensive query paper we propose an architecture that integrates processing methods for SNDB. network and Euclidean information, capturing Every conventional spatial query type (e.g., nearest pragmatic constraints. Based on this architecture, neighbors, range search, spatial joins and closest pairs) we develop a Euclidean restriction and a has a counterpart in SNDB. Consider, for instance, the network expansion framework that take road network of Figure 1.1, where the rectangles advantage of location and connectivity to correspond to hotels. If a user at location q poses the efficiently prune the search space. These range query "find the hotels within a 15km range", the frameworks are successfully applied to the most result will contain a, b and c (the numbers in the figure popular spatial queries, namely nearest correspond to network distance). Similarly, a nearest neighbors, range search, closest pairs and e- neighbor query will return hotel b. Note that the results of distance joins, in the context of spatial network the corresponding conventional queries are different (e.g., databases. the Euclidean nearest neighbor is d, which is actually the farthest hotel in the network). Furthermore, queries may 1. Introduction combine both location and network aspects, such as "find Spatial databases have been well studied in the last 20 the nearest hotel to the south" (e.g., hotel a). years resulting in the development of numerous conceptual models, multi-dimensional indexes and query processing techniques [RSV02]. Surprisingly, most of existing work considers Cartesian (typically, Euclidean) spaces, where the distance between two objects is determined solely by their relative position in space. However, in practice, objects can usually move only on a pre-defined set of trajectories as specified by the underlying network (road, railway, river etc.). Thus, the important measure is the network distance, i.e., the length Figure 1.1: Road network query example of the shortest trajectory connecting two objects, rather A crucial pre-requisite for solving these queries in SNDB is a realistic architecture, which captures spatial entities Permission to copy without fee all or part of this material is granted (e.g., hotels) and the underlying network, preserving both provided that the copies are not made or distributed for direct Cartesian co-ordinates and connectivity. In addition, this commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by architecture must take into account real-life constraints, permission of the Very Large Data Base Endowment. To copy such as unidirectional roads, “off-network” (but still otherwise, or to republish, requires a fee and/or special permission from reachable) entities, etc. Furthermore, although the the Endowment th network is almost static, the entities may change with Proceedings of the 29 VLDB Conference, high frequencies (e.g., a new/existing hotel opens/closes). Berlin, Germany, 2003 It is also possible that entire entity sets are added as more R-trees (like most spatial access methods) were motivated information or services become available (e.g., a new by the need to efficiently process range queries, where "restaurant" dataset is incorporated in the system). the range usually corresponds to a rectangular window or In this paper we propose a flexible architecture for SNDB, a circular area around a query point. The R-tree answers by separating the network from the entity datasets. In the query q (shaded area) in Figure 2.1 as follows. The particular, we employ a disk-based network representation root is first retrieved and the entries (e.g., E1, E2) that that preserves connectivity and location, while spatial intersect the range are recursively searched because they entities are indexed by respective spatial access methods may contain qualifying points. Non-intersecting entries for supporting Euclidean queries and dynamic updates. (e.g., E3) are skipped. Note that for non-point data (e.g., Using this architecture, we develop two frameworks, lines, polygons), the R-tree provides just a filter step to Euclidean restriction and network expansion, for prune non-qualifying objects. The output of this phase has processing the most common spatial queries, namely to pass through a refinement step that examines the actual nearest neighbors, range search, closest pairs and distance object representation to determine the actual result. The joins. The resulting algorithms expand conventional concept of filter and refinement steps applies to all spatial processing techniques by integrating connectivity and queries on non-point objects. location information for efficient pruning of the search A nearest neighbor (NN) query retrieves the (k≥1) data space. To the best of our knowledge, this is the first work point(s) closest to a query point q. The R-tree NN dealing with the efficient processing of spatial queries in algorithm proposed in [HS99] keeps a heap with the SNDB. entries of the nodes visited so far. Initially, the heap The rest of the paper is organized as follows: Section 2 contains the entries of the root sorted according to their overviews related work. Section 3 describes our minimum distance (mindist) from q. The entry with the architecture and its application in real-life scenarios. minimum mindist in the heap (E1 in Figure 2.1) is Sections 4 and 5 present algorithms for nearest neighbor expanded, i.e., it is removed from the heap and its and range search queries, respectively, while Sections 6 children (E3, E4, E5) are added together with their mindist. and 7 discuss closest pairs and spatial joins. Section 8 The next entry visited is E2 (its mindist is currently the evaluates the proposed techniques with comprehensive minimum in the heap), followed by E6, where the actual experiments, and Section 9 concludes the paper with a result (h) is found and the algorithm terminates, because discussion. the mindist of all entries in the heap is greater than the distance of h. The algorithm can be easily extended for 2. Related Work the retrieval of k nearest neighbors (kNN). Furthermore, it In this section we overview previous work related to is optimal (it visits only the nodes necessary for obtaining location (Section 2.1) and connectivity (Section 2.2) the nearest neighbors) and incremental, i.e., it reports representation and processing in databases. neighbors in ascending order of their distance to the query point, and can be applied when the number k of nearest 2.1 Spatial Query Processing in the Euclidean Space neighbors to be retrieved is not known in advance. R-trees [G84, SRF87, BKSS90] are the most popular An intersection join retrieves all intersecting object pairs indexes for Euclidean query processing due to their (s,t) from two datasets S and T. If both S and T are simplicity and efficiency. The R-tree can be viewed as a indexed by R-trees, the R-tree join algorithm [BKS93] multi-dimensional extension of the B-tree. Figure 2.1 traverses synchronously the two trees, following entry shows an exemplary R-tree for a set of points {a,b,…,j} pairs that overlap; non-intersecting pairs cannot lead to assuming a capacity of three entries per node. Points that solutions at the lower levels. Several spatial join are close in space (e.g., a,b) are clustered in the same leaf algorithms have been proposed for the case where only node (E ) represented as a minimum bounding rectangle 3 one of the inputs is indexed by an R-tree or no input is (MBR). Nodes are then recursively grouped together indexed [RSV02]. For point datasets, where intersection following the same principle until the top level, which joins are meaningless, the corresponding problem is the e- consists of a single root. distance join, which finds all pairs of objects (s,t) s ∈ S, t mindist( E 1 ) 10 ∈ T within (Euclidean) distance e from each other. R-tree i E 7 f E1 j Root join can be applied in this case as well, the only difference 8 E E E e 6 g 1 2 being that a pair of intermediate entries is followed if their E2 d E5 h 6 E E distance is below (or equal to) e. The intersection join can E 1 E E E 2 E E 4 3 4 5 6 7 q mindist( E 2 ) be considered as a special case of the e-distance join, 4 c = mindist( E 6 ) E3 where e=0. a a b c d e f g h i j 2 E E E E E b 3 4 5 6 7 Finally, a closest-pairs query outputs the (k≥1) pairs of mindist( E ) 3 objects (s,t) s ∈ S, t ∈ T with the smallest (Euclidean) 0 2 4 6 8 10 distance. The algorithms for processing such queries Figure 2.1: An R-tree example [CMTV00] combine spatial joins with nearest neighbor search.