Query and Update Efficient B - Based Indexing of

Moving Objects

¢ ¢

Christian S. Jensen ¡ Dan Lin Beng Chin Ooi

£ ¤ Department of Computer Science School of Computing

Aalborg University, Denmark National University of Singapore, Singapore ¥

[email protected] lindan, ooibc ¦ @comp.nus.edu.sg

Abstract 1 Introduction

A number of emerging applications of data man- An infrastructure is emerging that enables data manage- agement technology involve the monitoring and ment applications that rely on the tracking of the locations querying of large quantities of continuous vari- of moving objects such as vehicles, users of wireless de- ables, e.g., the positions of mobile service users, vices, and goods. Further, a wide range of other applica- termed moving objects. In such applications, large tions, beyond those to do with moving objects, rely on the quantities of state samples obtained via sensors sampling of continuous, multidimensional variables. The are streamed to a database. Indexes for moving provisioning of high performance and scalable data man- objects must support queries efficiently, but must agement support for such applications presents new chal- also support frequent updates. Indexes based on lenges. One key challenge derives from the need to accom- minimum bounding regions (MBRs) such as the modate frequent updates while simultaneously allowing for R-tree exhibit high concurrency overheads dur- efficient query processing [6, 13]. ing node splitting, and each individual update is This combination of desired functionality is particularly known to be quite costly. This motivates the de- troublesome in the context of indexing of multidimensional

sign of a solution that enables the B § -tree to man- data. The dominant indexing technique for multidimen- age moving objects. We represent moving-object sional data with low dimensionality, the R-tree [5] (and

locations as vectors that are timestamped based on its descendants such as the R © -tree [1]), was conceived their update time. By applying a novel lineariza- for largely static data sets and exhibits poor update perfor- tion technique to these values, it is possible to mance. The Time-Parameterized R-tree (TPR-tree) [19] (as

index the resulting values using a single B § -tree well as several of its recent descendants [11]) models object that partitions values according to their timestamp locations as linear functions of time and supports queries on and otherwise preserves spatial proximity. We de- the current and anticipated near-future positions of mov-

velop algorithms for range and ¨ nearest neigh- ing objects. While the use of linear rather than constant bor queries, as well as continuous queries. The functions may reduce the need for updates by a factor of proposal can be grafted into existing database sys- three [3], update performance remains a problem. tems cost effectively. An extensive experimental Individual updates tend to be costly, and the problem is study explores the performance characteristics of exacerbated by the concurrency control algorithms of the the proposal and also shows that it is capable of R-trees, such as the Rlink-tree [8], not being able to ade- substantially outperforming the R-tree based TPR- quately handling a high degree of concurrent accesses that tree for both single and concurrent access scenar- involve updates. Notably, frequent tree ascents caused by ios. node splitting and propagation of MBR updates lead to costly lock conflicts. This problem is inherent in many multi-dimensional indexes. Another problem with existing solutions to moving-object indexing is that they are not eas- Permission to copy without fee all or part of this material is granted pro- ily integrated into existing database systems. vided that the copies are not made or distributed for direct commercial This paper proposes a novel way of indexing moving ob- advantage, the VLDB copyright notice and the title of the publication and

jects using the classical B § -tree without compromising on its date appear, and notice is given that copying is by permission of the

Very Large Data Base Endowment. To copy otherwise, or to republish, query and storage efficiency. The motivation for using the § requires a fee and/or special permission from the Endowment. B § -tree is threefold. First, the B -tree is used widely in Proceedings of the 30th VLDB Conference, commercial database systems and has proven to be very Toronto, Canada, 2004 efficient with respect to queries as well as updates, ro-

768 bust with respect to varying workloads, and scalable. Sec- database systems cost effectively. ond, being a one-dimensional index, it does not exhibit the The paper reports on an extensive experimental study, update performance problems associated with MBR-based which includes a comparison with the TPR-tree. The re-

multi-dimensional indexes. Third, it is typically appropri- sults show that the B -tree is efficient with respect to stor-

ate to model moving-object extents as points. This enables age space and range and ¨ nearest neighbor queries. Indeed, § linearization and subsequent B -tree indexing. the B -tree is capable of outperforming the TPR-tree by a

To use the B § -tree, we must be able to linearize the rep- wide margin in single and concurrent access environments. resentation of the locations of the moving objects. This The rest of the paper is organized as follows. Section 2 is done by means of a space-filling curve, which enumer- reviews related work. Section 3, describes the structure of ates every point in a discrete, multi-dimensional space. At- the proposed B -tree, and it presents the associated query tractive space-filling curves such as the Peano curve (or Z- and update operations. Section 4 covers comprehensive curve) and the Hilbert curve, which we use in this paper, performance experiments. Finally, Section 5 concludes. preserve proximity, meaning that points close in multidi- mensional space tend to be close in the one-dimensional 2 Related Work space obtained by the curve [12]. Traditional indexes for multi-dimensional databases, such

A B § -tree with the above space-filling curves works as the R-tree [5] and its variants (e.g., [1]) were, implicitly very well for static databases. A naive way to accommodate or explicitly, designed with the main objective of support- moving points is to update each object in the database at ing efficient query processing as opposed to enabling effi- each time interval. To avoid an excessive update overhead, cient update. This works well in applications where queries we propose a novel indexing method, termed the B -tree, are relatively much more frequent than updates. However, where “ ” indicates the flexibility of the proposed method applications involving the indexing of moving objects ex- in employing a specific (“ ”) space-filling curve as part of hibit workloads characterized by heavy loads of updates, in the linearization function. addition to frequent queries. First, we model moving objects as linear functions of Several new index structures have been proposed for time. Thus, the data to be indexed in the B -tree are not moving-object indexing, and recent surveys exist that cover points (constant functions), but linear functions coupled different aspects of these [11, 13]. One may distinguish with the times they were updated. Intuitively, an update oc- between indexing of the past positions versus indexing of curs when the position predicted by an existing function is the current and near-future positions of spatial objects. Our deemed inaccurate [3]. Second, we effectively “partition” approach belongs to the latter category. the index, placing entries in partitions based on their update Past positions of moving objects are typically approxi- time. More specifically, we first partition the time axis into mated by polylines composed of line segments. It is pos- intervals where the duration of an interval is an approxi- sible to index line segments by R-trees, but the trajectory mation of the maximum duration in-between two updates memberships of segments are not taken into account. In of any object location. We then partition each such inter- contrast to this, the Spatio-Temporal R-tree [15] attempts to val into equal-length sub-intervals, termed phases, where also group segments according to their trajectory member- is determined based on minimum time duration within ships, while also taking spatial locations into account. The which each object issues an update of its position. Each Trajectory-Bundle tree [15] aims only for trajectory preser- phase is assigned the time point it ends as a label times- vation, leaving other spatial properties aside. tamp, and a label timestamp is mapped to a partition. An The representations of the current and near-future posi- update is placed in the partition given by the label time- tions of moving objects are quite different, as are the index- stamp of the phase during which it occurs. For an object, ing challenges and solutions. Positions are represented as the value indexed by the B -tree is the concatenation of its points (constant functions) or functions of time, typically partition number and the result of applying the underlying linear functions. The Lazy Update R-tree [9] aims to re- space-filling method to the position of the object as of the duce update cost by handling updates of objects that do not label timestamp of its phase. move outside their leaf-level MBRs specially, and a gener- This mapping scheme overcomes the limitation of the alized approach to bottom-up update in R-trees has recently

B § -tree, which is able to only keep the snapshot of all the been examined [10]. objects at the same time point. This scheme reduces the Tayeb et al. [24] use PMR- [20] for index- update frequency, it preserves spatial proximity within each ing the future linear trajectories of one-dimensional mov-

partition, and it facilitates queries on anticipated near-future ing points as line segments in   -space. The segments positions. span the time interval that starts at the current time and ex- Based on the above, we propose efficient algorithms for tends some time into the future, after which time, a new tree

range and ¨ nearest neighbor queries, as well as for continu- must be built. Kollis et al. [7] employ dual transformation

ous queries. The algorithms are general and can be applied techniques which represent the position of an object mov-  to indexes that use sampling techniques to model moving ing in a  -dimensional space as a point in a -dimensional objects. Like any new indexing method built on top of the space. Their work is largely theoretical in nature. Based on

B § -tree, the paper’s proposal can be grafted into existing a similar technique, Patel et al. [14] have most recently de-

769 veloped a practical indexing method, termed STRIPES, that Specifically, we use a space-filling curve for this pur- supports efficient updates and queries at the cost of higher pose. Such a curve is a continuous path which visits every space requirements. point in a discrete, multi-dimensional space exactly once Finally, we cover the Time-Parameterized R-tree (TPR- and never crosses itself.

tree) [19] in some detail, as we use this tree for comparison We consider versions of the B -tree that use the Peano

in our performance study. An extension to the R © -tree, the curve (or Z-curve) and the Hilbert curve (see Figure 1). Al- TPR-tree indexes linear functions of time. The current loca- though other curves may be used, these two are expected tion of a moving point is found simply by applying the func- to be particularly good. Analytical and empirical stud- tion representing its location to the current time. MBRs are ies [4, 12] show that for the two-dimensional space we con- also functions of time. Specifically, in each dimension, the sider, these curves are effective in preserving proximity, lower bound of an MBR is to move with the maximum meaning that points close in multidimensional space tend downward speed of all enclosed objects, while the upper to be close in the one-dimensional space obtained by the bound is set to move with the maximum upward speed of all curve. The Hilbert curve is expected to be (slightly) better enclosed objects. As enclosed objects may be both moving than the Peano curve [4]. points and moving rectangles, this ensures that the bound- ing rectangles are indeed bounding at all times considered. Frequent updates are needed to ensure that moving objects that are currently close are assigned to the same bound- ing rectangles. Further, bounding rectangles never shrink and are generally larger than strictly needed. To counter this phenomenon, the so-called “tightening” is applied to bounding rectangles when they are accessed. Algorithms for nearest neighbor and reverse nearest Peano curve (Z−curve) Hilbert curve (H−curve) neighbor queries on moving objects have been proposed Figure 1: Space-Filling Curves based on the TPR-tree [2]. Next, two notable proposals

exist that build on the ideas of the TPR-tree. Procopiuc et In what follows, we term the value obtained from the   al. [16] propose the STAR-tree. This index seems to be best space-filling curve the ; and for brevity, we use the suited for workloads with infrequent updates. Tao et al. [22] Peano curve in most discussions. adopt assumptions about the query workload that differ To reduce this load, we model point values as linear slightly from those underlying the TPR-tree. This leads to functions of time, rather than simply as static points, i.e., a different grouping of objects into index tree nodes. constant functions. A recent study of GPS logs obtained To the best of the authors’ knowledge, no proposals for from two dozen cars traveling in a semi-urban environment the indexing of moving objects exist that use a combina- measures the number of updates needed to ensure that the tion of temporal partitioning and space-filling curves. How- values recorded in the database do not differ by more than ever, our work adopts a design philosophy similar to that some threshold from the real values. For realistic thresh- of iDistance [25], where application of a mapping func- olds, the use of linear functions reduces the amount of up-

dates to one third in comparison to constant functions [3].

$ ' tion that uses reference points and metric distances with $

"!" # %&#  respect to the reference points enables B § -tree indexing of An object location is thus given by , a po-

high-dimensional points for the purpose of nearest neighbor sition and a velocity, and an update time, or timestamp, )( ,

search. where these values are valid. ( In a leaf-node entry, an object updated at is repre-

3 Structure and Algorithms sented by a value *+ ,- . / 01(. :

¤GF ¤

*  . 2 01(34!6587:9<;)= > ?A@27:@27CB9ED 5 ?)H>ED We first describe the structure of the B -tree. We then (1)

cover algorithms for range and ¨ NN queries and contin- NEOQP,RIHRIRS,

uous queries. Finally, update, insertion, and deletion are where I2 JLKM is an index partition determined ?)H>

covered. by the update time, is obtained using a space-filling

F

¤

TD curve, 5 denotes the binary value of , and denotes concatenation. We proceed to detail this definition. 3.1 Index Structure

If we index the timestamped object locations without § The base structure of the B -tree is that of the B -tree. differentiating them based on their timestamps, we not only Thus, the internal nodes serve as a directory. In order to lose the proximity preserving property of the space-filling

support B-link concurrency control [21], each internal node curve; the index will also be ineffective in locating an object   contains a pointer to its right sibling (the pointer is non-null based on its . To overcome such problems, we ef- if one exists). The leaf nodes contain the moving-object lo- fectively “partition” the index, placing entries in partitions cations being indexed and corresponding index time. We based on their update time. More specifically, we denote

proceed to describe how object locations are mapped to by U+1VW( the time duration that is the maximum duration single-dimensional values. in-between two updates of any object location. We then

770 

partition the time axis into intervals of duration U+VW( , and , the storage space is a little more than that of the TPR-tree.

we sub-partition each such interval into equal-length sub- When is , query windows must be enlarged more than p intervals, termed phases. the enlargements of MBRs in the TPR-tree (enlargement By mapping the update times in the same phase to the details are covered in Section 3.2.1). Therefore, we choose

same so-called label timestamp and by using the label x!m .

y!z U+ ! -e

timestamps as prefixes of the representations of the object To exemplify, let , VW( , and assume a p

locations, we obtain index partitions, and the update times Peano curve of order 3 (i.e., the space domain is {}|s{ ).

£ ¤

!~  23)\€ eT e  eL‚ !

of updates determine the partitions they go to. In particular, Object positions ,

# p

 Ce ƒQ\€ /eT„3 e  L ‡†ˆ!‰  )QAM CeT )eT  

an update with timestamp ( is assigned a label timestamp , and are in-

# p p p

RXZY\[]!_^ Ua

(]` , where operation returns the serted at times 0, 10, and 100, respectively. We calculate *a ,- 

nearest future label timestamp of . the for each as follows.

For example, Figure 2 shows a B -tree with d!6 .

 !fe

( Step 1: Calculate label timestamps and index partitions.

Objects with timestamp obtain label timestamp £

£

£ £

¤

¤ ¤

X

 !‹^1 Ce 7:9E;)= >T-?A@/7:@27CB9 !Œea!y /eeQ e Hc !mƒe

` b

RXZY\[g! Ua eihj U+

VW( (lk VW(

; objects with obtain XŠY\[ ,

p ¤

¤

1XZY\[G!mU+

 !‹^1 e VW( e QHcMXE! -e 7:9E;)= >T-?A@/7:@27CB9 !

` b

label timestamp ; and so on. XŠY\[ ,

p p p p

¤

!i Ce 

p

†

†

X

 !‹^1 ee 7:9<;)= >T-?A@/7:@27CB9 ! -e QHc ! {e

` b

XŠY\[ ,

p p p

¤ eQ x !i

B −tree p

£ ¤

£ ¤

 E†   XZY&[

Step 2: Calculate positions and at XZY\[ , , and

† 

XZY\[ , respectively.

£ ¤

EŽ !o ‚ Ž !o /3) Q Ž !i ‘ 

, , † .

p p

Step 3: Calculate Z-values.

¤ ¤

£

5„’ 8 .  Ž HD !o Ce ee 

p pp

¤ ¤

¤

5„’ 8 .  Ž HD !o Cee e  p

0 ∆tmu 2∆tmu time pp

¤ ¤

5„’ 8 .  Ž HD !o eQeeQe 

† p update insert ... p insert

Step 4: Calculate *+ ,-  .

£ ¤

eL“!o /eee eQe  !

update *+ T L 2

p pp p€”

¤ ¤

*  L 2  eQ“!y /e ee e  !Q

p p pQp p

Figure 2: The B -Tree

¤

*+ T L 2  eeQ•!i e eQeeQe  ! ƒ

†

p p p p p p XZY&[ Next, for an object with label timestamp  , we com-

It is worth noting that at most three ranges exist at a XZY\[ pute its position at  according to its position and velocity single point in time. As time passes, repeatedly the first at 1( . We then apply the space-filling curve to this (future) position to obtain the second component of Equation 1. range expires (shaded area), and a new range is appended This mapping has two main advantages. First, it en- (dashed line). This use of rolling ranges enables the B -tree ables the tree to index object positions valid at different to handle time effectively.

times, overcoming the limitation of the B § -tree, which is only able to index a snapshot of all positions at the same 3.2 Querying time. Second, it reduces the update frequency compared to In the following, we outline the search strategies for the

having to update the positions of all objects at each times- B -tree. tamp when only some of them need to be updated. The two components of the mapping function in Equation 1 are 3.2.1 Range Query consequently defined as follows:

A range query retrieves all objects whose location falls

X X

( (

£ £ ¤ ¤

–, )–, D2M5 –, )–, D

within the rectangular range –—!˜ 5

 7:9<;)= >T-?A@/7:@27CB9n!o CRXZY\[ 2Ua  JrsS% C

b VW(3b `

' $ $ 'mw

p #qp u

at time ™ not prior to the current time (“ ” denotes lower

?R>n!t O.uCvEK. 1# # RXZY\[  

` ( #

bound, and “ v ” denotes upper bound).

With the transformation, the B -tree will contain data A key challenge is to support predictive queries, i.e.,

belonging to ` phases, each given by a label timestamp queries that concern future times. Traditionally, indexes p and corresponding to a time interval. Within each of these, that use linear functions handle predictive queries by means we apply a space-filling curve to an object position. of MBR enlargement (e.g., the TPR-tree); to the best of

The choice of the value of affects query performance our knowledge, no algorithm for the predictive queries has

and storage space. A large results in smaller enlargements been proposed for indexes that use snapshots of moving ob- of query windows (covered in Section 3.2), but also results jects (e.g., the LUR-tree). We present a generic approach in more partitions and therefore a looser relationship among to processing such queries that is not constrained by the

object locations. In addition, a large yields a higher space base structure. Figure 6 outlines the range query algorithm,

overhead due to more internal nodes. When is larger than which we proceed to explain.

771 To handle queries on the anticipated near future posi- each cell, we obtain the final enlargement speed in the area

tions of objects, the B -tree uses query-window enlarge- where the query window resides. Such a histogram can eas- ment instead of MBR enlargement. This is done through ily be maintained in main memory.

the TimeParameterizedRegion function call in the algo- The time argument of a query exceeds the reference time rithm. Because the B -tree stores an object’s location as of of any object by at most U+VW( . This is reasonable, as it is some time after its update time, the enlargement involves of little use to query so far into the future that all the values two cases: a location must either be brought back to an ear- on which the result is based will have been updated before

lier time or forward to a later time. that time is reached. Considering the example in Figure 4,

£

¤ U+1VW(

suppose a query is issued between U+1VW( and and

£ ¤

¥ ¥

that ¥<¦ , , and are the partitions corresponding to the £

q q † ¤

current t t t time ¤

U+ Ua U+

VW( VW(

1 ref 2 label timestamps VW( , , and , respectively.

¥§¦ U+ Partition (or subtree) may need to be extended to VG(

q' p1 vu vl p3 q' 1 2 2 2 T T T T q 0 1 2 3 1 q2 ∆ ∆ l p'1 u u p'3 l 0 t mu 2 tmu time v1 v1 v1 v1 p'2 p'4 query p interval vl 4 vu 2 p2 2 T0 time length Figure 3: Query Window Enlargement T 1 enlargement

Consider the example in Figure 3, where šH›Zœ denotes the T2 time when the locations of four moving objects are updated

to their current value index, and where predictive queries Figure 4: Time Length Enlargement

£ ¤

– – )™1

and (solid rectangles) have time parameters and £ ¥

at most, and after that, ¥§¦ expires; the may be extended

£ ™Hž

, respectively. The figure shows the stored positions as ¤

¤

Ua U+ ¥ VW(

backward to and forward to VG( ; may be ex-

£ )™1

solid dots and positions of the two first objects at and the †

¤ ¤ U+1VW(

tended backward to U+1VW( and forward to . For

 ž positions of the two last at ™ as circles. The two positions

for each object are connected by an arrow. either subtree, we can see that the maximum enlargement U+

The relationship between the two positions for each ob- length is VW( .

$ '¡w

Ÿ

#    NEŽŸ4! N

™ šH›Zœ

ject is ` . The first two of the four # objects thus are in the result of the first query, and the last two objects are in the result of the second query. To obtain

I £

£ 4 –-Ž this result, query rectangle – needs to be enlarged to I

(dashed). This is achieved by attaching maximum speeds 2 Index pages

' ' ' ' '

X X

( (

( I

£

¤ £ ¤ £ £ 1 I to the sides of – : , , , and . For example, is 3 jump

obtained as the largest projection onto the x-axis of a ve-

£ £ –-Ž locity of an object in –-Ž . (As we do not yet know , a ...... Data pages conservative approximation is used; more on this shortly.) I I

¤ 1 m

For – , the enlargement speeds are computed similarly.

' ( For example, ¤ is obtained by projecting all velocities of

' Figure 5: “Jump” in the Index

(

¤ ¤ objects in –Ž onto the y-axis; is then set to the largest

Next, we traverse the partitions of the B -tree with label

speed multiplied by . w

#]p

X X

( (

£ £ ¤ ¤

 

b VG( ™

–, –, DHE5 –, –, D:

The enlargement of query –¢!z 5 timestamp no less than (i.e., they

#¨p #

X X

( (

£ £ ¤ ¤



™

K,–, K,–, DH35 K,–, K,–, D

is given by query –-ŽE!o 5 : should be valid at ) to find objects falling in the enlarged

†

¤

–-Ž Ua h© Ua

™%k VW(

query window . For example, if VW( ,

£ ' w

X X

¥<¦

  –,   hq

™ ŸE` Ÿ šR›Zœ ™ šH›Zœ

X if Partition needs not be searched. In each partition, the use

#

' w

Ÿ

K,–, ! X

( (2)

  –, 

šH›Zœ Ÿ ™ ŸE` otherwise of a space-filling curve means that a range query in the na-

# tive, two-dimensional space becomes a set of range queries

' w

£

( (

  –,   hq

™ Ÿ¤` Ÿ šH›Šœ ™ šH›Zœ

( if in the transformed, one-dimensional space—see Figure 5.

#

' w

Ÿ

K,–, ! X

( (3)

  –, C

šH›Šœ Ÿ ™ Ÿ¤` otherwise Hence multiple traversals of the index result. We optimize # these traversals by calculating the start and end points of The implementation of the computation of enlargement the one-dimensional ranges and traverse the intervals by speeds proceeds in two steps. We first set them according “jumping” in the index (as in [17]).

to the maximum speeds of all objects, thus obtaining a pre- Let us step through the entire algorithm in detail. For

Ž liminary – . Then, with the aid of a two-dimensional his- each partition of the B -tree, we check whether it is valid at

togram (e.g., a grid) that captures the maximum and min- the query time ™ according to its label timestamp (lines 1– –Ž imum projections of velocities onto the axes of objects in 2). If it is valid, we enlarge query window – to by

772

£ Æ]Ž

function TimeParameterizedRegion (line 3) and calculate by enlarging it to a range ™ and proceeding as described

£ ¤ ¤ £ ¤

I &&€\I V‡ª I V ¨

all start and end points I in ascending in the previous section. If at least objects are currently

£

 \« r Æ]Ž

order of their , where is the number of intervals covered by ™ and are enclosed in the inscribed circle of

¤1¬ £ ¤R¬ £

I I  Æ  ¨ ¨

™ ™

(line 4). The pair of points ª start and end interval at time , the NN algorithm returns the nearest

­

¬ r

( k¯®°k ). objects and then stops. It is safe to stop because we have

p £

We locate the leaf node containing the first point I , then considered all the objects that can possibly be in the result.

£ ¤

Æ P Æ

™ ™

traverse its right siblings using the B-link sibling pointers Otherwise, we extend ™ by to obtain and an

­

¤ £

¤

I Æ]Ž

until we reach the next point , where interval ends enlarged window ™ . This time, we search the region

¤ £

ƤŽ ƤŽ ™

(lines 5–10). ™ and adjust the neighbor list accordingly. This

#

­

¤

Ÿ Æϙ To find IR† , the start point of interval , we backtrack to process is repeated until we obtain an so that there are

a higher level where one “jumping” occurs, upon which we ¨ objects within its inscribed circle. ­

proceed to retrieve the objects with positions in interval ¤ .

£ ¤

/–3 C–, –, \)¨<™\

This takes place in line 6 of the algorithm, where traversal Algorithm ¨ NN query

£ ¤

)–,  ¨ from the root is avoided as much as possible. When all Input: a query point –3 C–, , a number of neighbors,

the intervals have been checked in this manner, we have and a query time ™

£

Æ – P ™

obtained the set of all objects that may possibly belong to 1. construct range ™ with as center and extension

£

£

ƤŽ /Æ  

±_²W³Š´}µ€¶•·¸)·´0µ&¹µ€¸³Šº&µM»T¼½µ&¾Q³Z¿QÀ ™ ™ –

the result of range query . For each object, we compute its 2. ™ ±Ð¹)¸ÑTµ

position at  and return only those objects whose positions 3. flag // not enough objects

I ±

are actually in the query window – (lines 11–14). 4. // first query region is being searched p

5. while ÒW€Ó –. Algorithm Range query( ™ )

6. if IÔ! then

p

– 

Input: is the query range and ™ is the query time

£ Æ]Ž

7. find all objects in region ™

e

1. for ®0± to 8. else

¬

£

¥ ™

Æ]Ž ƤŽ

Ÿ Ÿ

™ ª

2. if partition of the B -tree is valid at then 9. find all objects in region ™

#

–Ž C–.™&

±_²W³Z´}µM¶•·-¸·-´}µ\¹)µ&¸)³Zº€µ€»T¼½µ€¾³Š¿À

Ÿ

¨ Æ

3. 10. if objects exist in inscribed circle of ™ then

£ ¤

Ž

I €&€&I V –

4. calculate start and end points for 11. flag ±ÖÕC·-×Ùص

¨ r 5. for ± to do

p 12. else

¤)Á £

I ª

I I `

6. locate leaf node containing point 13. ±

p

£

Ÿ Ÿ

Ƥ™ CƤ™ ª P,™&

7. repeat 14. ±_ړÀ ׊·¸¾Qµ

Â

Ÿ

Ÿ

ƤŽ CƤ™ ™\ 8. store candidate objects in ±_²W³Š´0µM¶•·-¸·-´}µ\¹)µ&¸)³Zº€µ€»T¼Ûµ&¾³Š¿À

15. ™ –

9. follow the right pointer to the sibling node 16.return ¨ NNs with respect to ¤)Á

10. until node with point I is reached ¨

11. for each object in  do Figure 7: NN Query Algorithm

 – 12. if the object’s position at ™ is inside then 13. add the object to the result set In some B § -tree implementations, leaf nodes are not 14. return result set only chained left to right, but also right to left. The ¨ NN search algorithm can exploit right to left sibling pointers to Figure 6: Range Query Algorithm avoid always having to traverse the tree from the root when an interval is extended for a next iterative range search. This reduces the search cost but increases the update cost.

3.2.2 ¨ Nearest Neighbor Query 3.2.3 Continuous Queries

Assuming a set of ÃÅȨ objects and given a query object

£ ¤

–,  ¨

with position –a!l C–, , the nearest neighbor query The queries considered so far in this section may be con- ¨

( ¨ NN query) retrieves objects for which no other objects sidered as one-time queries: they run once and complete  are nearer to the query object at time ™ not prior to the when a result has been returned. Intuitively, a continuous current time. query is a one-time query that is run at each point in time We compute this query by iteratively performing range during a time interval. Further, a continuous query takes a

queries with an incrementally enlarged search region until

9

¨ answers are obtained. The algorithm is outlined in Fig-

fixed time ™ we have used so far. The query then main-

£ –

ure 7. We first construct a range Æ]™ centered at and with tains the result of the corresponding one-time query at time

Á Á

P,™a!oÇ Ç ¨

b

U+ )Ý8Þ/Þ2ß

extension , where is the estimated distance 9

™ ›

` from when the query is issued at time and

Á Ç between the query object and its ¨ ’th nearest neighbor; until it is deactivated. can be estimated by the equation [23]:

Such a query can be supported by a query –MX with time

ŸŠà1à ŸÙà1à

5  Ua  U+ uD

(,á½` ™ (áÛ` ™½`

 interval (“l” is a time in-

žÎ

–&X ¨

 terval) [2]. Query can be computed by the algorithms we

Á

È ÉgÊ

Ç ! p½#"Ì

p½#qË have presented previously, with relatively minor modifica- Ã Í

tions: (i) we use the end time of the time interval to perform

£ ™ We compute the range query with range Æϙ at time , forward enlargements, and we use the start time of the time

773

interval for backward enlargements; (ii) we store the an- does differ with respect to update in the B § -tree. The B -

ŸÙà1à (á

swer sets during the time interval. Then, from time  tree only updates objects when their moving functions have

ÝÞ2Þ/ß u –&X

to ›G` , the answer to is maintained during update been changed. This is realized by clustering updates during

)ÝÞ2Þ/ß u

operations. At ݉` , a new query with time interval a certain period to one time point and maintaining several

5 ÝÞ/Þ2ß U+ uÝÞ2Þ/ß U+ -u:D

™“` ›“` ™“`

›“` is computed. corresponding sub-trees. For example (see Figure 8), ob-

£

 ¥§¦

To maintain a continuous range query during updates, jects updated between 1¦ and are stored in partition ;

£ ¤ ¤

 ¥ ¥<¦

we simply add or remove the object from the answer set if objects updated between  and are stored in ; etc. ,

£ ¤ £ ¤

¥ † 1† Rå ¥ ¥

the inserted or deleted object resides in the query window. ¥ , and co-exist before . From to , , , and

¦ ¥ Such operations only introduce CPU cost. ¥§† co-exist, and has expired. The total size of the three

The maintenance of continuous ¨ NN queries is some- sub-trees is equal to that of one tree indexing all the objects. what more complex. Insertions also only introduce CPU In some applications, there may be some object posi- cost: an inserted object is compared with the current an- tions that are updated relatively rarely. For example, most swer set. Deletions of objects not in the answer set does objects may be updated at least each 10 minutes, but a few not affect the query. However, if a deleted object is in the objects are updated once a day. Instead of letting outliers current answer set, the answer set is no longer valid. In this force a large maximum update interval, we use a “maxi-

case, we issue a new query with a time interval of length u at mum update interval” within which a high percentage of €ã

the time of the deletion. If the deletion time is ›Hä , a query objects have been updated.

&ã U+ \ã U+ u

™ ›2ä\` ™ `

with time interval [ ›2ä\` ] is triggered at Object positions that are not updated within this interval

\ã €ã \ã u

›2ä ›Hä3` ›2ä , and the answer set is maintained from to . are “flushed” to a new partition using their positions at the

The choice of the “optimal” u value involves a trade-off label timestamp of the new partition. In the example shown

between the cost of the computation of the query with the in Figure 8, suppose that some object positions in ¥J¦ are not

time interval and the cost of maintaining its result. On the updated at the time when ¥§¦ expires. At this time, we move

¤ u one hand, we want to avoid a small as this entails frequent these objects to ¥ . Although this introduces additional up- recomputations of queries, which involves a substantial I/O date cost, the (controllable) amortized cost is expected to

cost. On the other hand, a large u introduces a substantial be very small since outliers are rare. cost: Although computing one or a few queries is cost ef- The forced movement of an object’s position to a new fective in itself, we must also take into account the cost of partition does not cause any problem with respect to lo- maintaining the larger answer set, which may generate ad- cating the object, since the new partition can be calculated ditional I/Os on each update. based on the original update time. Likewise, the query effi- We note that maintenance of continuous range queries ciency is not affected.

incur only CPU cost. Thus, we compute a range query with

u u U+ U+ ™ a relatively large such that is bounded by VG(

# 4 Performance Studies

ŸÙà1à (á

since the answer set obtained at  is no longer valid

ŸŠà1à

 (,á Ua1VW( ¨ at ` . For the continuous NN queries, we 4.1 Experimental Settings

examine the effect of u further in the experiments.

Two versions of the B -tree were implemented: B (Z-

3.3 Update, Insertion, and Deletion curve) and B (H-curve), denoting the B -tree using the

Peano and the Hilbert curve, respectively. Both B -trees The insertion algorithm is straightforward. Given a new ob- and the TPR-tree were implemented in C, and all the ex-

ject, we calculate its index key according to Equation 1, and periments were conducted on a 2.6G PentiumIV Personal § then insert it into the B -tree as in the B -tree. To delete Computer with 1 Gbyte of memory. an object, we assume that the positional information for the We use synthetic datasets of moving objects with posi-

object used at its last insertion and the last insertion time eQee

tions in the space domain of eQeeæ| . In most ex-

p p are known. Then we calculate its index key and employ the periments, we use uniform data, where object positions are

same deletion algorithm as in the B § -tree. Therefore, the chosen randomly, where the objects move in a randomly § B -tree directly inherits the good properties of the B -tree, chosen direction, and where a speed ranging from 0 to 3 is and we expect efficient update performance. chosen at random. One may think of the unit of space being kilometer and the unit of speed being kilometer per minute. t t t t t t 0 1 2 3 4 5 time Other datasets were generated using an existing data generator, where objects move in a network of two-way T T T T T 0 1 2 3 4 routes that connect a given number of uniformly distributed x B −tree destinations [19]. Objects start at random positions on x B −tree routes and are assigned at random to one of three groups of

x objects with maximum speeds of 0.75, 1.5, and 3. When- B −tree ever an object reaches one of the destinations, it chooses the next target destination at random. Objects accelerate Figure 8: B -Tree Evolution as they leave a destination, and they decelerate as they ap-

However, one should note that update in the B -tree proach a destination.

774 For each dataset, we constructed the index at time e , and 0.14 1400 TPR-tree TPR-tree X X measured the average query cost after the index ran for e B-tree (Z-curve) 0.12 B-tree (Z-curve) p 1200 X X B-tree (H-curve) B-tree (H-curve) 0.1 time units. The parameters used are summarized in Table 1, 1000 0.08 where values in bold denote default values used. 800 0.06 Parameter Setting 600 400 0.04 Page size 4K Range query I/Os 200 0.02 Node capacity 200 Range query cpu time (s) 0 0 Max update interval 120 100K 300K 500K 700K 900K 100K 300K 500K 700K 900K Max predictive interval 60, 120 Number of moving objects Number of moving objects

Query window size 10, . . . , 50, . . . , 100 (a) I/O (b) CPU time ç ç ( NN query) 10, 20, 30, 40, 50 Number of queries 200 Figure 10: Effect of Data Sizes on Range Query Perfor- Dataset size 100K, . . . , 500K, . . . , 1M mance Space-filling curve Z, H is relatively independent of the number of moving objects. Dataset Uniform, Network-based

As the dataset grows, the range query cost of the B -trees Table 1: Parameters and Their Settings increases mainly due to the increase of the number of ob- jects inside the range. However, the structure of the TPR- tree is affected more by the dataset size. When the number 4.2 Storage Requirement of objects increases, the MBRs in the TPR-tree have higher probabilities of overlapping; this is consistent with earlier Storage requirement is an important issue in moving ob- findings for the R-tree [18].

ject databases since some applications may choose to cache The B -tree(H-curve) achieves better performance than

the whole index in main memory to improve performance. the B -tree(Z-curve) because the Hilbert curve generates a

Figure 9 shows the storage requirement of both indexes, in better distance-preserving mapping than the Peano curve, which the B -trees require less storage space than the TPR- and hence yields fewer search intervals on the B -tree, i.e., tree. The TPR-tree requires slightly more storage space as less disk access.

its fanout is slightly less than that of the B -tree.

35000 4.3.2 Effect of Data Distribution TPR-tree 30000 BX-tree (Z-curve) This experiment uses the road network dataset to study the BX-tree (H-curve) 25000 effect of data distribution on the indexes. The dataset con-

20000 tains 500K data points. Figure 11 shows the range query

15000 cost when the number of destinations in the simulated net- work of routes is varied. The term “uniform” in the figure 10000 indicates the case where the objects can choose their mov- Storage space (Kbytes) 5000 ing directions freely. 0 100K 200K 300K 400K 500K 600K 700K 800K 900K 1M 0.07 Number of moving objects 800 TPR-tree TPR-tree X BX-tree (Z-curve) 0.06 B-tree (Z-curve) 700 X X B-tree (H-curve) Figure 9: Storage Requirement B-tree (H-curve) 600 0.05

500 0.04 400 0.03 4.3 Range Query 300 0.02

Range query I/Os 200 4.3.1 Effect of Data Sizes 0.01 100 Range query cpu time (s)

0 0 In the first set of experiments, we study the range query per- 100 120160 200300 400 500Uniform 100 120160 200300 400500Uniform Number of destinations Number of destinations formance of the TPR-tree and the B -trees while varying the number of uniformly distributed moving objects from (a) I/O (b) CPU time 100K to 1M. Figure 10 shows the average number of I/O Figure 11: Effect of Data Distribution on Range Query Per- operations and the CPU time per range query for each in- formance dex.

We observe that both B -tree variants scale very well Observe that the query cost in the TPR-tree increases and maintain consistent performance, while the TPR-tree slightly with the number of destinations, i.e., as the datasets degrades linearly with the increase of the dataset size. becomes increasingly “uniform.” This is consistent with

When the dataset reaches 1M objects, the B -trees are previous results [19]. In contrast, the performance of the

nearly 5 times better than the TPR-tree. This behavior may B -trees is not affected by the data skew because objects are

be explained as follows. In the B -trees, every object has a stored using space-filling curves, meaning that the density linear order, which is determined by the space domain and has less of an effect on the index.

775 2000 0.3 4.3.3 Effect of Speed Distribution TPR-tree TPR-tree 1800 BX-tree (Z-curve) BX-tree (Z-curve) 0.25 Figure 12 shows the effect of speed of moving objects on 1600 BX-tree (H-curve) BX-tree (H-curve) 1400

0.2 the TPR-tree and the B -trees, by varying the è value of the 1200 Zipf distribution from 0 (uniform distribution) to 2 (skewed, 1000 0.15 800 0.1 80% objects have speed lower than 20% of the maximum 600

Range query I/Os 400 speed). All the indexes yield better performance when the 0.05 number of fast moving objects decreases because MBRs in 200 Range query cpu time (s) 0 0 the TPR-tree obtain smaller expanding speeds and because 0 100K 200K 300K 400K 500K 0 100K 200K 300K 400K 500K Number of updates Number of updates the enlargements made to query windows for the B -trees (a) I/O (b) CPU time also become smaller. The results for ¨ NN queries exhibit similar performance trends, so we omit the results due to Figure 14: Effect of Time Elapsed on Range Query Perfor- space constraints. mance

700 0.06 as time passes. In contrast, the B -tree structure is not af- TPR-tree 600 TPR-tree 0.05 X BX-tree (Z-curve) B-tree (Z-curve) fected as much by the updates. In fact, the B -trees are X X 500 B-tree (H-curve) B-tree (H-curve) 0.04 almost time independent. 400 0.03 300

4.4 ¨ NN Query 0.02 200 Range query I/Os

100 0.01 We proceed to evaluate the efficiency of ¨ NN queries using Range query cpu time (s) 0 0 the same settings as for range queries. Figures 15–17 show 0 0.5 1 1.5 2 0 0.5 1 1.5 2 in turn the effect of dataset size, data distribution, and time Zipf's factor, theta Zipf's factor, theta

passed on ¨ NN query performance. The performance dif-

(a) I/O (b) CPU time ¨ ference between the TPR-tree and the B -tree of the NN

Figure 12: Effect of Speed on Range Query Performance queries exhibits a behavior similar to that of range queries. ¨ The B -tree’s NN search algorithm is essentially an in- 4.3.4 Effect of Query Window Sizes cremental range query algorithm; hence, the results exhibit similar patterns as the results for range queries. We next study the effect of the query window size, vary- ing the window length from 10 to 100 for a dataset of size 0.18 1400 TPR-tree TPR-tree 0.16 X X 500K. As expected, the result in Figure 13 shows that the 1200 B-tree (Z-curve) B-tree (Z-curve) 0.14 X BX-tree (H-curve) B-tree (H-curve) query cost increases with the query window size. Larger 1000 0.12 windows contain more objects and therefore lead to more 800 0.1 node accesses, and the effect is slightly more obvious on 600 0.08 0.06

KNN query I/Os 400 the TPR-tree. 0.04 200 KNN query cpu time (s) 0.02 1000 0.12 0 0 TPR-tree TPR-tree 100K 300K 500K 700K 900K 100K 300K 500K 700K 900K 900 BX-tree (Z-curve) X 0.1 B-tree (Z-curve) Number of moving objects 800 X X Number of moving objects B-tree (H-curve) B-tree (H-curve) 700 0.08 (a) I/O (b) CPU time 600 500 0.06 Figure 15: Effect of Data Sizes on ¨ NN Query Performance 400 0.04 300

Range query I/Os 200 0.02 1000 100 Range query cpu time (s) 0.1 TPR-tree TPR-tree 0 0 900 0.09 BX-tree (Z-curve) BX-tree (Z-curve) 800 0.08 10 2030 4050 60 70 80 90100 10 20 30 40 50 60 70 80 90100 X X B-tree (H-curve) B-tree (H-curve) Query window size Query window size 700 0.07 600 0.06 (a) I/O (b) CPU time 500 0.05 400 0.04 Figure 13: Effect of Query Window Sizes on Range Query 300 0.03 KNN query I/Os Performance 200 0.02 100 KNN query cpu time (s) 0.01 0 0 4.3.5 Effect of Time 100 120 160 200 300 400 500Uniform 100 120 160 200 300 400 500Uniform Number of destinations Number of destinations To study the search performance of the indexes with the (a) I/O (b) CPU time passage of time and updates, we compute the query cost

Figure 16: Effect of Data Distribution on ¨ NN Query Per- using the same 200 range queries with query window size formance 50, but after every 50K updates in a 500K dataset. Fig-

ure 14 summarizes the results, showing that the TPR-tree Figure 18 shows the effect on performance of the num-

¨ ¨ degrades considerably faster than the B -trees due to con- ber of required nearest neighbors. As increases, the tinuous enlargements of the MBRs which are not updated search and CPU costs increase slightly for both indexes.

776 2500 0.4 0.07 0.012 TPR-tree TPR-tree TPR-tree TPR-tree BX-tree (Z-curve) 0.35 X X X B-tree (Z-curve) 0.06 B-tree (Z-curve) 0.01 B-tree (Z-curve) 2000 X X X B-tree (H-curve) 0.3 B-tree (H-curve) B-tree (H-curve) BX-tree (H-curve) 0.05 0.008 1500 0.25 0.04 0.2 0.006 0.03 1000 0.15 0.004

Maintenace I/Os 0.02 KNN query I/Os 0.1 500 0.002 KNN query cpu time (s) 0.05 0.01 Maintenace cpu time (ms)

0 0 0 0 0 100K 200K 300K 400K 500K 0 100K 200K 300K 400K 500K 2 4 6 8 10 12 2 4 6 8 10 12 Number of updates Number of updates Query recomputation interval length Query recomputation interval length

(a) I/O (b) CPU time (a) I/O (b) CPU time ¨ Figure 17: Effect of Time Elapsed on ¨ NN Query Perfor- Figure 20: Maintenance Cost of Continuous NN Query mance

maximum update interval is e , the maximum recomputa-

p ƒQe}!yƒe

Due to the data size and side effect of the query and MBR tion interval tested is -e . As can be observed,

p #

enlargement, the effect of ¨ is not significant. the maintenance cost decreases with the increase of the re- computation length in all indexes. This is because the cost 1000 0.1 TPR-tree TPR-tree to maintain the answer set under continuous updates is very 900 0.09 BX-tree (Z-curve) BX-tree (Z-curve) 800 0.08 X BX-tree (H-curve) B-tree (H-curve) small, and the recomputation cost constitutes the major I/O 700 0.07 600 0.06 cost. The smaller the recomputation interval u , the more 500 0.05 number of recomputations. 400 0.04

300 0.03 Figure 20 shows the performance of the continuous ¨ NN KNN query I/Os 200 0.02

KNN query cpu time (s) query. We observe that, for all the indexes, the mainte- 100 0.01 0 0 nance cost first decreases until a point before it increases

10 20 30 40 50 10 20 30 40 50 ‘

k k again. The best u is approximately for the TPR-tree and

u (a) I/O (b) CPU time approximately for the B -trees. As the becomes larger,

the number of recomputation decreases, however, the pos- ¨ Figure 18: Effect of ¨ on NN Query Performance sibility to remove objects from the results increases, and consequently, additional recomputations result.

4.5 Continuous Range and ¨ NN Queries 4.6 Update In moving object database applications, continuous queries We compare the average update cost (amortized over in- are expected to be common, and hence efficient support for

sertion and deletion) of the B -trees against the TPR-tree. such queries is important. To investigate the maintenance Note that for each update, one deletion and one insertion cost of continuous queries, we perform a series of experi- are issued, leaving the size of the tree unchanged. ments where we vary the length of the query recomputation interval u . We evaluate the amortized cost per single update 4.6.1 Effect of Data Sizes operation (insertion or deletion) in maintaining one contin-

uous query. Indexes were created at time e , and after run- First we examine the update performance with respect to -ee

ning e time units, queries (with maximum predictive dataset size. We compute the average update cost after the p

interval ƒe ) were issued. Then the workload was run for maximum update interval of 120 time units. From Fig-

another e time units while maintaining the result set of ure 21, we can see that the B -trees achieve significant im- p the queries. provement over the TPR-tree. In most cases, one update

0.014 80 0.025 0.0012 TPR-tree TPR-tree TPR-tree TPR-tree 70 BX-tree (Z-curve) X 0.012 X X B-tree (Z-curve) B-tree (Z-curve) B-tree (Z-curve) X 0.001 ) 0.02 X X X B-tree (H-curve B-tree (H-curve) B-tree (H-curve) B-tree (H-curve) 60 0.01 0.0008 50 0.015 0.008 0.0006 40 0.006 0.01 30

0.0004 Update I/Os 0.004 20 Maintenance I/Os

Update cpu time (s) 0.005 0.002 0.0002 10 Maintenance cpu time (ms) 0 0 0 0 6 12 18 24 30 36 42 48 54 60 6 1218 2430 36 42 48 54 60 100K 300K 500K 700K 900K 100K 300K 500K 700K 900K Query recomputation interval length Query recomputation interval length Number of moving objects Number of moving objects (a) I/O (b) CPU time (a) I/O (b) CPU time Figure 19: Maintenance Cost of Continuous Range Query Figure 21: Effect of Data Sizes on Update Cost

Figure 19 shows the continuous range query perfor- in the B -tree only incurs several I/O operations. However,

mance. Since the maximum predictive length is ƒe and the the update cost in the TPR-tree increases significantly as

777 the dataset grows in size. This is because in the B -trees, 120 0.03 TPR-tree TPR-tree X given the key, an insertion and a deletion needs to traverse X 0.025 100 B-tree (Z-curve) B-tree (Z-curve) X X B-tree (H-curve) B-tree (H-curve) only one path, no matter how large the dataset is. Thus, the 80 0.02

cost of update in the B -tree is only related to the height of 60 0.015 the tree. 40 0.01 Update I/Os

In this experiment, the performance of the B -tree(Z-

20 Update cpu time (s) 0.005 curve) and the B -tree(H-curve) are comparable, since the update efficiency is independent of the spatial proximity 0 0 60 120 180 240 60 120 180 240 preservation. However, in the TPR-tree, traversal of mul- Maximum update interval Maximum update interval tiple paths is inevitable due to the overlaps among MBRs. (a) I/O (b) CPU time As the density of values increases due to the increase in data size, more overlap and hence higher update cost results. Figure 23: Effect of Maximum Update Interval affects the performance of the TPR-tree significantly. In 4.6.2 Effect of Time

contrast, the update operation in the B -tree depends only Next, we investigate performance degradation across time. on the key value which does not change over time.

We measure the performance of the TPR-tree and the B - trees after every ‚e K updates. Figure 22 shows the update 4.7 Effect of Concurrent Accesses and Buffer Space cost as a function of the number of updates. As before, the In this section, we compare the concurrent performance

50 0.016 of the TPR-tree and the B -tree. We implemented the R- TPR-tree TPR-tree X 45 BX-tree (Z-curve) 0.014 B-tree (Z-curve) link technique [8] for the TPR-tree and the B-link tech- X 40 BX-tree (H-curve) B-tree (H-curve) 0.012 35 nique [21] for the B -tree. 30 0.01 We used multi-thread programs to simulate multi-user 25 0.008

environments. The number of threads varies from to { . 20 0.006 p

Update I/Os 15 0.004 Workloads contain an equal number of queries and updates. 10 Update cpu time (s) We investigated the throughput and response time of search 5 0.002 0 0 and update operations. Throughput is the rate at which op- 50K 150K 250K 350K 450K 550K 50K 150K 250K350K 450K 550K erations could be served by the system. Response time is Number of updates Number of updates the time interval between issuing an operation and getting (a) I/O (b) CPU time the response from the system when the task was success- Figure 22: Effect of the Number of Updates on Update Cost fully completed. Figure 24 shows throughput and response time for the

B -trees are not affected by time, which again illustrates the three indexes. The throughputs of the B -trees are much efficiency and feasibility of the B § -tree. We observe that

the gap between the TPR-tree and the B -trees widens as 350 0.25 TPR-tree TPR-tree time passes. At the point when the TPR-tree stabilizes after 300 BX-tree (Z-curve) BX-tree (Z-curve) X 0.2 B-tree (H-curve) BX-tree (H-curve) 500K updates, the cost of the TPR-tree is nearly 10 times 250 0.15 that of the B -trees. The reason for the degeneration of the 200

150 TPR-tree is that each deletion entails a search to retrieve the 0.1 100

object to be removed, and the cost of this search increases Throughput (/s)

Response time (s) 0.05 with the number of updates. 50 We note that this cost can be reduced by maintaining a 0 0 hash-table for quickly locating the object and then perform- 1 2 4 6 8 1 2 4 6 8 Number of threads Number of threads ing a bottom-up update [10]. However, such an auxiliary (a) Throughput (b) Response time structure incurs additional storage overhead and increases complexity. Figure 24: Effect of Concurrent Operations higher than that of the TPR-tree, and the response times 4.6.3 Effect of Update Interval Length

of the B -trees are always less than those of the TPR-tree.

In this experiment, we investigate the effect of maximum The main reason is that the B -trees seldom lock internal update interval length on the indexes, by varying the maxi- nodes. Recall that, in the query processing, we will first mum update interval from 60 to 240. Figure 23 shows the travel down to the leaf level, then retrieve the leaf nodes for average update cost after the indexes run for one maximum the answers by following the left-to-right sibling links. We update interval. We observe that the performance of the may occasionally ascend to an internal node for a “jump,” TPR-tree degrades fairly quickly as the maximum update but this often happens at the lower levels of the index. For

interval increases, whereas the B -trees are not affected. the TPR-tree, a query triggers searching of multiple paths, The main reason is that, as the update interval increases, which introduces locks on the internal nodes that reduce the the overlap among MBRs becomes more severe and thus parallelism of concurrent operations.

778 We also included an LRU buffer and studied the effects [3] A. Civilis, C. S. Jensen, J. Nenortaite, and S. Pakalnis. Ef- of different buffer sizes. As with other indexes, both in- ficient Tracking of Moving Objects with Precision Guaran- dexes experience reduced I/O’s as the buffer size increases. tees. In Proc. MobiQuitous, 2004, 10 pages, to appear. [4] C. Faloutsos and S. Roseman. Fractals for Secondary Key Since the B -tree incurs less page reads originally, the ef- Retrieval. In Proc. PODS, pp. 247–252, 1989. fect of an increasing buffer size on the index is conse- [5] A. Guttman. R-trees: A Dynamic Index Structure for Spatial quently less pronounced than for the TPR-tree. We also Searching. In Proc. ACM SIGMOD, pp. 47–57, 1984. examined the effects of the density of objects over the data [6] C. S. Jensen and S. Saltenis. Towards Increasingly Update space. The results are as may be expected; due to space Efficient Moving-Object Indexing. IEEE Data Eng. Bull., constraints, we do not include the graphs here. 25(2): 35–40, 2002. [7] G. Kollios, D. Gunopulos, V. J. Tsotras. On Indexing Mo- bile Objects. In Proc. PODS, pp. 261–272, 1999. 5 Conclusion and Research Directions [8] M. Kornacker and D. Banks. High-Concurrency Locking in R-Trees. In Proc. VLDB, pp. 134–145, 1995. Database applications that entail the storage of samples [9] D. Kwon, S. Lee, and S. Lee. Indexing the Current Positions of continuous, multi-dimensional variables pose new chal- of Moving Objects Using the Lazy Update R-Tree. In Proc. lenges to database technology. This paper addresses the MDM, pp. 113–120, 2002. challenge of providing support for indexing that is efficient [10] M. L. Lee, W. Hsu, C. S. Jensen, B. Cui, and K. L. Teo. for querying as well as update. Supporting Frequent Updates in R-Trees: A Bottom-Up Ap- proach. In Proc. VLDB, pp. 608–619, 2003.

We proposed a new indexing scheme, the B -tree, which [11] M. F. Mokbel, T. M. Ghanem, and W. G. Aref. Spatio- is based on the B § -tree. This scheme uses a new lineariza- Temporal Access Methods. IEEE Data Eng. Bull., tion technique that exploits the volatility of the data values, 26(2): 40–49, 2003. i.e., moving-object locations, being indexed. Algorithms [12] B. Moon, H. V. Jagadish, C. Faloutsos, and J. H. Saltz. Anal-

are provided for range and ¨ NN queries on the current or ysis of the Clustering Properties of the Hilbert Space-Filling near-future positions of the indexed objects, as well as for Curve. IEEE TKDE, 13(1): 124–141, 2001. so-called continuous counterparts of these types of queries. [13] B. C. Ooi, K. L. Tan, and C. Yu. Fast Update and Effi- cient Retrieval: an Oxymoron on Moving Object Indexes. Queries that reach into the future are handled via query In Proc. of Int. Web GIS Workshop, Keynote, 2002. region enlargement, as opposed to the MBR enlargement [14] J. M. Patel, Y. Chen and V. P. Chakka. STRIPES: An Effi- used in TPR-trees. cient Index for Predicted Trajectories. In Proc. ACM SIG- Extensive performance studies were conducted that in- MOD, 2004, to appear.

dicate that the B -tree is both efficient and robust. In fact, [15] D. Pfoser, C. S. Jensen, and Y. Theodoridis. Novel Ap- it is capable of outperforming the TPR-tree by factors of proaches in Query Processing for Moving Objects. In Proc.

VLDB, pp. 395–406, 2000. as much as 10. Further, being a B § -tree index, the B -tree [16] C. M. Procopiuc, P. K. Agarwal, and S. Har-Peled. Star- may be grafted into existing database systems cost effec- Tree: An Efficient Self-Adjusting Index for Moving Ob- tively. jects. In Proc. ALENEX, pp. 178–193, 2002. Several promising directions for future work exist, one [17] F. Ramsak, V. Markl, R. Fenk, M. Zirkel, K. Elhardt, and

being to consider the use of the B -tree for the processing R. Bayer. Integrating the UB-Tree Into a Database System Kernel. In Proc. VLDB, pp. 263–272, 2000. of new kinds of queries. Another is the use of the B -tree for other continuous variables than the positions of mobile [18] N. Roussopoulos and D. Leifker. Direct Spatial Search on Pictorial Databases Using Packed R-Trees. In Proc. ACM service users. Yet another direction is to apply the lineariza- SIGMOD, pp. 17–31, 1985. tion technique to other index structures. [19] S. Saltenis, C. S. Jensen, S. T. Leutenegger, and M. A. Lopez. Indexing the Positions of Continuously Mov- Acknowledgements ing Objects. In Proc. ACM SIGMOD, pp. 331–342, 2000. [20] H. Samet. The and Related Hierarchical Data The work of D. Lin and B. C. Ooi was in part funded by Structures. ACM Comp. Surv., 16(2): 187–260, 1984. an A*STAR project on spatial-temporal databases. In ad- [21] V. Srinivasan and M. J. Carey. Performance of B-Tree Con- currency Control Algorithms. In the Proc. ACM SIGMOD, dition to his primary affiliation with Aalborg University, pp. 416–425, 1991. C. S. Jensen is an adjunct professor in Department of Tech- [22] Y. Tao, D. Papadias, and J. Sun. The TPR*-Tree: An nology, Agder University College, Norway. His work was Optimized Spatio-Temporal Access Method for Predictive funded in part by grants 216 and 333 from the Danish Na- Queries. In Proc. VLDB, pp. 790–801, 2003. tional Center for IT Research. [23] Y. Tao, J. Zhang, D. Papadias, and N. Mamoulis. An Ef- ficient Cost Model for Optimization of Nearest Neighbor Search in Low and Medium Dimensional Spaces. IEEE References TKDE, to appear. [1] N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger. The [24] J. Tayeb, O. Ulusoy, and O. Wolfson. A Quadtree Based Dy- R*-Tree: An Efficient and Robust Access Method for Points namic Attribute Indexing Method. The Computer Journal, and Rectangles. In Proc. ACM SIGMOD, pp. 322–331, 41(3): 185–200, 1998. 1990. [25] C. Yu, B. C. Ooi, K. L. Tan and H. V. Jagadish. Indexing [2] R. Benetis, C. S. Jensen, G. Karciauskas, and S. Saltenis. the Distance: An Efficient Method to KNN Processing. In Nearest Neighbor and Reverse Nearest Neighbor Queries Proc. VLDB, pp. 421–430, 2001. for Moving Objects. In Proc. IDEAS, pp. 44–53, 2002.

779