Query and Update Efficient B+-Tree Based Indexing of Moving Objects
Total Page:16
File Type:pdf, Size:1020Kb
Query and Update Efficient B -Tree Based Indexing of Moving Objects ¢ ¢ Christian S. Jensen ¡ Dan Lin Beng Chin Ooi £ ¤ Department of Computer Science School of Computing Aalborg University, Denmark National University of Singapore, Singapore ¥ [email protected] lindan, ooibc ¦ @comp.nus.edu.sg Abstract 1 Introduction A number of emerging applications of data man- An infrastructure is emerging that enables data manage- agement technology involve the monitoring and ment applications that rely on the tracking of the locations querying of large quantities of continuous vari- of moving objects such as vehicles, users of wireless de- ables, e.g., the positions of mobile service users, vices, and goods. Further, a wide range of other applica- termed moving objects. In such applications, large tions, beyond those to do with moving objects, rely on the quantities of state samples obtained via sensors sampling of continuous, multidimensional variables. The are streamed to a database. Indexes for moving provisioning of high performance and scalable data man- objects must support queries efficiently, but must agement support for such applications presents new chal- also support frequent updates. Indexes based on lenges. One key challenge derives from the need to accom- minimum bounding regions (MBRs) such as the modate frequent updates while simultaneously allowing for R-tree exhibit high concurrency overheads dur- efficient query processing [6, 13]. ing node splitting, and each individual update is This combination of desired functionality is particularly known to be quite costly. This motivates the de- troublesome in the context of indexing of multidimensional sign of a solution that enables the B § -tree to man- data. The dominant indexing technique for multidimen- age moving objects. We represent moving-object sional data with low dimensionality, the R-tree [5] (and locations as vectors that are timestamped based on its descendants such as the R © -tree [1]), was conceived their update time. By applying a novel lineariza- for largely static data sets and exhibits poor update perfor- tion technique to these values, it is possible to mance. The Time-Parameterized R-tree (TPR-tree) [19] (as index the resulting values using a single B § -tree well as several of its recent descendants [11]) models object that partitions values according to their timestamp locations as linear functions of time and supports queries on and otherwise preserves spatial proximity. We de- the current and anticipated near-future positions of mov- velop algorithms for range and ¨ nearest neigh- ing objects. While the use of linear rather than constant bor queries, as well as continuous queries. The functions may reduce the need for updates by a factor of proposal can be grafted into existing database sys- three [3], update performance remains a problem. tems cost effectively. An extensive experimental Individual updates tend to be costly, and the problem is study explores the performance characteristics of exacerbated by the concurrency control algorithms of the the proposal and also shows that it is capable of R-trees, such as the Rlink-tree [8], not being able to ade- substantially outperforming the R-tree based TPR- quately handling a high degree of concurrent accesses that tree for both single and concurrent access scenar- involve updates. Notably, frequent tree ascents caused by ios. node splitting and propagation of MBR updates lead to costly lock conflicts. This problem is inherent in many multi-dimensional indexes. Another problem with existing solutions to moving-object indexing is that they are not eas- Permission to copy without fee all or part of this material is granted pro- ily integrated into existing database systems. vided that the copies are not made or distributed for direct commercial This paper proposes a novel way of indexing moving ob- advantage, the VLDB copyright notice and the title of the publication and jects using the classical B § -tree without compromising on its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, query and storage efficiency. The motivation for using the § requires a fee and/or special permission from the Endowment. B § -tree is threefold. First, the B -tree is used widely in Proceedings of the 30th VLDB Conference, commercial database systems and has proven to be very Toronto, Canada, 2004 efficient with respect to queries as well as updates, ro- 768 bust with respect to varying workloads, and scalable. Sec- database systems cost effectively. ond, being a one-dimensional index, it does not exhibit the The paper reports on an extensive experimental study, update performance problems associated with MBR-based which includes a comparison with the TPR-tree. The re- multi-dimensional indexes. Third, it is typically appropri- sults show that the B -tree is efficient with respect to stor- ate to model moving-object extents as points. This enables age space and range and ¨ nearest neighbor queries. Indeed, § linearization and subsequent B -tree indexing. the B -tree is capable of outperforming the TPR-tree by a To use the B § -tree, we must be able to linearize the rep- wide margin in single and concurrent access environments. resentation of the locations of the moving objects. This The rest of the paper is organized as follows. Section 2 is done by means of a space-filling curve, which enumer- reviews related work. Section 3, describes the structure of ates every point in a discrete, multi-dimensional space. At- the proposed B -tree, and it presents the associated query tractive space-filling curves such as the Peano curve (or Z- and update operations. Section 4 covers comprehensive curve) and the Hilbert curve, which we use in this paper, performance experiments. Finally, Section 5 concludes. preserve proximity, meaning that points close in multidi- mensional space tend to be close in the one-dimensional 2 Related Work space obtained by the curve [12]. Traditional indexes for multi-dimensional databases, such A B § -tree with the above space-filling curves works as the R-tree [5] and its variants (e.g., [1]) were, implicitly very well for static databases. A naive way to accommodate or explicitly, designed with the main objective of support- moving points is to update each object in the database at ing efficient query processing as opposed to enabling effi- each time interval. To avoid an excessive update overhead, cient update. This works well in applications where queries we propose a novel indexing method, termed the B -tree, are relatively much more frequent than updates. However, where “ ” indicates the flexibility of the proposed method applications involving the indexing of moving objects ex- in employing a specific (“ ”) space-filling curve as part of hibit workloads characterized by heavy loads of updates, in the linearization function. addition to frequent queries. First, we model moving objects as linear functions of Several new index structures have been proposed for time. Thus, the data to be indexed in the B -tree are not moving-object indexing, and recent surveys exist that cover points (constant functions), but linear functions coupled different aspects of these [11, 13]. One may distinguish with the times they were updated. Intuitively, an update oc- between indexing of the past positions versus indexing of curs when the position predicted by an existing function is the current and near-future positions of spatial objects. Our deemed inaccurate [3]. Second, we effectively “partition” approach belongs to the latter category. the index, placing entries in partitions based on their update Past positions of moving objects are typically approxi- time. More specifically, we first partition the time axis into mated by polylines composed of line segments. It is pos- intervals where the duration of an interval is an approxi- sible to index line segments by R-trees, but the trajectory mation of the maximum duration in-between two updates memberships of segments are not taken into account. In of any object location. We then partition each such inter- contrast to this, the Spatio-Temporal R-tree [15] attempts to val into equal-length sub-intervals, termed phases, where also group segments according to their trajectory member- is determined based on minimum time duration within ships, while also taking spatial locations into account. The which each object issues an update of its position. Each Trajectory-Bundle tree [15] aims only for trajectory preser- phase is assigned the time point it ends as a label times- vation, leaving other spatial properties aside. tamp, and a label timestamp is mapped to a partition. An The representations of the current and near-future posi- update is placed in the partition given by the label time- tions of moving objects are quite different, as are the index- stamp of the phase during which it occurs. For an object, ing challenges and solutions. Positions are represented as the value indexed by the B -tree is the concatenation of its points (constant functions) or functions of time, typically partition number and the result of applying the underlying linear functions. The Lazy Update R-tree [9] aims to re- space-filling method to the position of the object as of the duce update cost by handling updates of objects that do not label timestamp of its phase. move outside their leaf-level MBRs specially, and a gener- This mapping scheme overcomes the limitation of the alized approach to bottom-up update in R-trees has recently B § -tree, which is able to only keep the snapshot of all the been examined [10]. objects at the same time point. This scheme reduces the Tayeb et al. [24] use PMR-Quadtrees [20] for index- update frequency, it preserves spatial proximity within each ing the future linear trajectories of one-dimensional mov- partition, and it facilitates queries on anticipated near-future ing points as line segments in -space.