Parallel R-trees



Ibrahim Kamel and Christos Faloutsos

Department of CS

University of Maryland

College Park, MD 20742

Abstract

We consider the problem of exploiting parallelism to accelerate the performance of spatial access methods and specifically, R-trees [11]. Our goal is to design a server for spatial data, so as to maximize the throughput of range queries. This can be achieved by (a) maximizing parallelism for large range queries, and (b) engaging as few disks as possible on point queries [22].

We propose a simple hardware architecture consisting of one processor with several disks attached to it. On this architecture, we propose to distribute the nodes of a traditional R-tree, with cross-disk pointers (the `Multiplexed' R-tree). The R-tree code is identical to the one for a single-disk R-tree, with the only addition that we have to decide which disk a newly created R-tree node should be stored on. We propose and examine several criteria to choose a disk for a new node. The most successful one, termed `proximity index' or PI, estimates the similarity of the new node with the other R-tree nodes already on a disk, and chooses the disk with the lowest similarity. Experimental results show that our scheme consistently outperforms all the other heuristics for node-to-disk assignment, achieving up to 55% gains over the Round Robin one. Experiments also indicate that the multiplexed R-tree with the PI heuristic gives better response time than the disk-striping (= "super-node") approach, and imposes a lighter load on the I/O sub-system. The speed-up of our method is close to linear, increasing with the size of the queries.

* Also a member of UMIACS. This research was sponsored partially by the National Science Foundation under grants IRI-8719458 and IRI-8958546, by a Department of Commerce Joint Statistical Agreement JSA-91-9, by a donation from EMPRESS Software Inc. and by a donation from Thinking Machines Inc.

1 Introduction

One of the requirements for the database management systems (DBMSs) of the future is the ability to handle spatial data. Spatial data arise in many applications, including: cartography [26]; Computer-Aided Design (CAD) [18], [10]; computer vision and robotics [2]; traditional databases, where a record with k attributes corresponds to a point in a k-d space; rule indexing in expert database systems [25]; temporal databases, where time can be considered as one more dimension [15]; scientific databases, with spatial-temporal data, etc.

In the above applications, one of the most typical queries is the range query: given a rectangle, retrieve all the elements that intersect it. A special case of the range query is the point query or stabbing query, where the query rectangle degenerates to a point.
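Since everything that follows is driven by this intersection test, a minimal sketch may help. It is our own illustration in C (the language the authors later say they implemented in), not code from the paper; the type and function names are assumptions:

#include <stdbool.h>

/* A minimal sketch (not from the paper) of the test behind range
 * queries: a 2-d rectangle qualifies if it overlaps the query
 * rectangle on every axis. A point query is the degenerate case
 * where xlo == xhi and ylo == yhi. */
typedef struct {
    double xlo, ylo;   /* lower-left corner  */
    double xhi, yhi;   /* upper-right corner */
} rect;

bool rect_intersects(rect a, rect b)
{
    return a.xlo <= b.xhi && b.xlo <= a.xhi &&   /* overlap on x */
           a.ylo <= b.yhi && b.ylo <= a.yhi;     /* overlap on y */
}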

In this paper we study the problem of improving the search performance using parallelism, and specifically, multiple disk units. There are two main reasons for using multiple disks, as opposed to a single disk:

(a) All of the above applications will be I/O bound. Our measurements on a DECstation 5000 showed that the CPU time to process an R-tree page, once brought in core, is 0.12 msec. This is 156 times smaller than the average disk access time (20 msec). Therefore it is important to parallelize the I/O operation.

(b) The second reason for using multiple disk units is that several of the above applications involve huge amounts of data, which do not fit on one disk. For example, NASA expects 1 Terabyte (=10^12 bytes) of data per day; this corresponds to 10^16 bytes per year of satellite data. Geographic databases can be large; for example, the TIGER database of the U.S. Bureau of the Census is 19 Gigabytes. Historic and temporal databases tend to archive all the changes and grow quickly in size.

The target system is intended to operate as a server, responding to range queries of concurrent users. Our goal is to maximize the throughput, which translates into the following two requirements:

`minLoad': Queries should touch as few nodes as possible, imposing a light load on the I/O sub-system. As a corollary, queries with small search regions should activate as few disks as possible.

`uniSpread': Nodes that qualify under the same query should be distributed over the disks as uniformly as possible. As a corollary, queries that retrieve a lot of data should activate as many disks as possible.

The proposed hardware architecture consists of one processor with several disks attached to it. We do not consider multi-processor architectures, because multiple CPUs will probably be an over-kill, increasing the dollar cost and the complexity of the system without relieving the I/O bottleneck. Moreover, a multi-processor loosely-coupled architecture, like GAMMA [5], will have communication costs, which are non-existent in the proposed single-processor architecture. In addition, our architecture is simple: it requires only widely available off-the-shelf components, without the need for synchronized disks, multiple CPUs, or a specialized operating system.

On this architecture, we will distribute the nodes of a traditional R-tree. We propose and study several heuristics on how to choose a disk to place a newly created R-tree node. The most successful heuristic, based on the `proximity index', estimates the similarity of the new node with the other R-tree nodes already on a disk, and chooses the disk with the least similar contents. Experimental results showed that our scheme consistently outperforms other heuristics.

The paper is organized as follows. Section 2 briefly describes the R-tree and its variants; it also surveys previous efforts to parallelize other file structures. Section 3 proposes the `multiplexed' R-tree as a way to store an R-tree on multiple disks. Section 4 examines alternative criteria to choose a disk for a newly created R-tree node; it also introduces the `proximity' measure and derives the formulas for it. Section 5 presents experimental results and observations. Section 6 gives the conclusions and directions for future research.

2 Survey

Several spatial access methods have been proposed. A recent survey can be found in [21]. This classification includes methods that transform rectangles into points in a higher dimensionality space [12], methods that use linear quadtrees [8] [1] or, equivalently, the z-ordering [17] or other space filling curves [6] [13], and finally, methods based on trees (k-d-trees [4], k-d-B-trees [19], hB-trees [16], cell-trees [9], etc.).

One of the most characteristic approaches in the last class is the R-tree [11]. Due to space limitations, we omit a detailed description of the method. Figure 1 illustrates data rectangles (in black), organized in an R-tree with fanout 3. Figure 2 shows the file structure for the same R-tree, where nodes correspond to disk pages.

Figure 1: Data (dark rectangles) organized in an R-tree. Fanout=3 - Dotted rectangles indicate queries.

Figure 2: The file structure for the R-tree of the previous figure (fanout=3).

In the rest of this paper, the term `node' and the term `page' will be used interchangeably, except when discussing the super-node method of subsection 3.2. Extensions, variations and improvements to the original R-tree structure include the packed R-trees [20], the R+-tree [23] and the R*-tree [3].

There is also much work on how to organize traditional file structures on multi-disk or multi-processor machines. For the B-tree, Pramanik and Kim proposed the PNB-tree [24], which uses a `super-node' (`super-page') scheme on synchronized disks. Seeger and Larson [22] proposed an algorithm to distribute the nodes of the B-tree on different disks. Their algorithm takes into account not only the response time of the individual query but also the throughput of the system.

Parallelization of R-trees is an unexplored topic, to the best of the authors' knowledge. Next, we present some software designs to achieve efficient parallelization.

3 Alternative designs

The underlying file structure is the R-tree. Given that, our goal is to design a server for spatial objects on a parallel architecture, to achieve high throughput under concurrent range queries.

The first step is to decide on the hardware architecture. For the reasons mentioned in the introduction, we propose a single processor with multiple disks attached to it. The next step is to decide how to distribute an R-tree over multiple disks. There are three major approaches to do that: (a) d independent R-trees, (b) disk striping (or `super-nodes', or `super-pages'), and (c) the `Multiplexed' R-tree, or MX R-tree for short, which we describe and propose later. We examine the three approaches qualitatively:

3.1 Independent R-trees

In this scheme we can distribute the data rectangles between the d disks and build a separate R-tree index for each disk. This mainly works for unsynchronized disks. The performance will depend on how we distribute the rectangles over the different disks. There are two major approaches:

Data Distribution: The data rectangles are assigned to the different disks in a round robin fashion, or using a hashing function. The data load (number of rectangles per disk) will be balanced. However, this approach violates the minimum load (`minLoad') requirement: even small queries will activate all the disks.

Space Partitioning: The idea is to divide the space into d partitions, and assign each partition to a separate disk. For example, for the R-tree of Figure 1, we could assign nodes 1, 2 and 3 to disks A, B and C, respectively. The children of each node follow their parent on the same disk. This approach will activate few disks on small queries, but it will fail to engage all disks on large queries (uniform spread, or `uniSpread' requirement).

3.2 Super-nodes

In this scheme we have only one large R-tree, with each node (= `super-node') consisting of d pages; the i-th page is stored on the i-th disk (i = 1,...,d). To retrieve a node from the R-tree we read in parallel all d pages that constitute this node. In other words, we `stripe' the super-node on the d disks, using page striping [7]. Almost identical performance will be obtained with bit- or byte-level striping.

This scheme can work both with synchronized and unsynchronized disks. However, this scheme violates the `minLoad' requirement: regardless of the size of the query, all d disks become activated.

3.3 Multiplexed (`MX') R-tree

In this scheme we use a single R-tree, with each node spanning one disk page. Nodes are distributed over the d disks, with pointers across disks. For example, Figure 3 shows one possible multiplexed R-tree, corresponding to the R-tree of Figure 1. The root node is kept in main memory, while the other nodes are distributed over the disks A, B and C.

Figure 3: R-tree stored on three disks.

For the multiplexed R-tree, each pointer consists of a disk id, in addition to the page id of the traditional R-tree. However, the fanout of the node is not affected, because the disk id can be encoded within the 4 bytes of the page id.
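The encoding can be illustrated with a short sketch. The bit split below is an assumption (the paper only states that the disk id fits inside the 4-byte page id); the helper disk_of() mirrors the paper's diskOf() function:

#include <stdint.h>

/* Hypothetical layout: the top 5 bits of the 32-bit page id hold the
 * disk id (enough for 32 disks; the experiments use up to 25), and
 * the low 27 bits hold the page number within that disk. */
typedef uint32_t node_ptr;

#define DISK_BITS 5u
#define PAGE_BITS (32u - DISK_BITS)

static inline node_ptr make_ptr(uint32_t disk, uint32_t page)
{
    return (disk << PAGE_BITS) | (page & ((1u << PAGE_BITS) - 1u));
}

static inline uint32_t disk_of(node_ptr p) { return p >> PAGE_BITS; }
static inline uint32_t page_of(node_ptr p) { return p & ((1u << PAGE_BITS) - 1u); }

Because the pointer still occupies 4 bytes, an R-tree node holds exactly as many entries as in the single-disk case, which is why the fanout is unchanged.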

Notice that the proposed method fulfills both requirements (minLoad and uniSpread): For example, from Figure 3, we see that the `small' query Q_s of Figure 1 will activate only one disk per level (disk B, for node 2, and disk A, for node 7), fulfilling the minimum load requirement. The large query Q_l will activate almost all the disks in every level (disks B and C at level 2, and then all three disks at the leaf level), fulfilling the uniform spread requirement.

Thus, with a careful node-to-disk assignment, the MX R-tree should outperform both the methods that use super-nodes and the ones that use d independent R-trees. Our goal now is to find a good heuristic to assign nodes to disks.

By its construction, the multiplexed R-tree fulfills the minimum load requirement. To meet the uniform spread requirement, we have to find a good heuristic to assign nodes to disks. In order to measure the quality of such heuristics, we shall use the response time as a criterion, which we calculate as follows.

Let R(q) denote the response time for the query q. First, we have to discuss how the search algorithm works. Given a range query q, the search algorithm needs a queue of nodes, which is manipulated as follows:

Algorithm 1: Range Search

S1. Insert the root node of the R-tree in the processing queue.

S2. while (more nodes in queue)
  - Pick a node n from the processing queue.
  - Process node n, by checking for intersections with the query rectangle. If this is a leaf node, print the results; otherwise send a list of requests to some or all of the d disks, in parallel.

The nodes that the disks retrieve are inserted at the end of the processing queue (FIFO).
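A compact sketch of Algorithm 1 follows. It is an illustration under assumptions, not the paper's code: it reuses rect/rect_intersects() and node_ptr from the earlier sketches, fetch_from_disk() and report() are hypothetical helpers, the queue is fixed-size without overflow checks for brevity, and the per-disk requests - issued in parallel in the real server - are shown as a plain loop:

#include <stddef.h>

#define MAX_FANOUT 64

typedef struct rtree_node {
    int      is_leaf;
    int      nchildren;
    rect     mbr[MAX_FANOUT];     /* children's bounding rectangles  */
    node_ptr child[MAX_FANOUT];   /* cross-disk pointers (disk+page) */
} rtree_node;

extern rtree_node *fetch_from_disk(node_ptr p); /* hypothetical I/O call */
extern void report(node_ptr p);                 /* hypothetical output   */

void range_search(rtree_node *root, rect q)
{
    rtree_node *queue[4096];          /* S1: FIFO processing queue */
    size_t head = 0, tail = 0;
    queue[tail++] = root;

    while (head < tail) {             /* S2: while more nodes in queue */
        rtree_node *n = queue[head++];
        for (int i = 0; i < n->nchildren; i++) {
            if (!rect_intersects(n->mbr[i], q))
                continue;
            if (n->is_leaf)
                report(n->child[i]);  /* data rectangle qualifies */
            else                      /* request the page from its disk;
                                         retrieved nodes are appended to
                                         the queue (FIFO) */
                queue[tail++] = fetch_from_disk(n->child[i]);
        }
    }
}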

Since the CPU is much faster than the disk, we assume that the CPU time is negligible (=0) compared to the time required by a disk to retrieve a page. Thus, the measure for the response time is the number of disk accesses that the latest disk will require. The `disk-time' diagram helps visualize this concept better. Figure 4 presents the `disk-time' diagram for the query Q_l of Figure 1. The horizontal axis is time, divided in slots. The duration of each slot is the time for a disk access (which is considered constant). The diagram indicates when each disk is busy, as well as the page it is seeking, during each time slot. Thus, the response time for Q_l is 2, while its load is L(Q_l) = 4, because Q_l retrieved 4 pages in total. As another example, the `huge' query Q_h of Figure 1 results in the disk-time diagram of Figure 5, with response time R(Q_h) = 3 and a load of L(Q_h) = 7.

Figure 4: Disk-Time diagram for the large query Q_l of Figure 1.

Figure 5: Disk-Time diagram for the huge query Q_h of Figure 1.

Given the above examples, we have the following definition for the response time:

Definition 1 (Response Time) The response time R(q) for a query q is the response time of the latest disk in the disk-time diagram.

4 Disk assignment algorithms

The problem we examine in this section is how to assign nodes to disks, within the Multiplexed R-tree framework. The goal is to minimize the response time, to satisfy the requirement for uniform disk activation (`uniSpread'). As discussed before, the minimum load requirement is fulfilled.

When a node (page) in the R-tree overflows, it is split into two nodes. One of these nodes, say, N_0, has to be assigned to another disk. If we carefully select this new disk we can improve the search time. Let diskOf() be the function that maps nodes to the disks where they reside. Ideally, we should consider all the nodes that are on the same level with N_0, before we decide where to store it. However, this would require too many disk accesses. Thus, we consider only the sibling nodes N_1,...,N_k, that is, the nodes that have the same father N_father with N_0. Accessing the father node comes at no extra cost, because we have to bring it into main memory anyway, to insert N_0. Notice that we do not need to access the sibling nodes N_1,...,N_k, because all the information we need about them (extent of the MBR (= minimum bounding rectangle) and disk of residence) is recorded in the father node.

Thus, the problem can be informally abstracted as follows:

Problem 1: Disk assignment
Given a node (= rectangle) N_0, a set of nodes N_1,...,N_k and the assignment of nodes to disks (the diskOf() function), assign N_0 to a disk, to minimize the response time on range queries.

There are several criteria that we have considered:

Data balance: Ideally, all disks should have the same number of R-tree nodes. If a disk has many more pages than the others, it is more likely to become a `hot spot' during query processing.

Area balance: Since we are storing not only points but also rectangles, the area of the pages stored on a disk is another factor. A disk that covers a larger area than the rest is again more likely to become a hot spot.

Proximity: Another factor that affects the search time is the spatial relation between the nodes that are stored on the same disk. If two nodes are intersecting, or are close to each other, they should be stored on different disks, to maximize the parallelism.

We can not satisfy all these criteria simultaneously, because some of them may conflict. Next, we describe some heuristics, each trying to satisfy one or more of the above criteria. In Section 5 we compare these heuristics experimentally.

Round Robin (`RR'): When a new page is created by splitting, this criterion assigns it to a disk in a round robin fashion. Without deletions, this scheme achieves perfect data balance. For example, in Figure 6, RR will assign N_0 to the least populated disk, that is, disk C.

Minimum Area (`MA'): This heuristic tries to balance the area of the disks: When a new node is created, the heuristic assigns it to the disk that has the smallest area covered. For example, in Figure 6, MA would assign N_0 to disk A, because the light gray rectangles N_1, N_3, N_4 and N_6 of disk A have the smallest sum of area.

Minimum Intersection (`MI'): This heuristic tries to minimize the overlap of nodes that belong to the same disk. Thus, it assigns a new node to such a disk that the new node intersects as little as possible with the other nodes on that disk. Ties are broken using one of the above criteria.

Proximity Index (`PI'): This heuristic is based on the proximity measure, which we describe in detail in the next subsection. Intuitively, this measure compares two rectangles and assesses the probability that they will be retrieved by the same query. As we shall see soon, it is related to the Manhattan (or city-block, or L_1) distance. Rectangles with high proximity (i.e., intersecting, or close to each other) should be assigned to different disks. The proximity index of a new node N_0 and a disk D (which contains the sibling nodes N_1,...,N_k) is the proximity of the most `proximal' node to N_0. This heuristic assigns node N_0 to the disk with the lowest proximity index, i.e., to the disk with the least similar nodes with respect to N_0. Ties are resolved using the number of nodes (data balance): N_0 is assigned to the disk with the fewest nodes. For the setting of Figure 6, PI will assign N_0 to disk B, because it contains the most remote rectangles. Intuitively, disk B is the best choice for N_0.

Figure 6: Node N_0 is to be assigned to one of the three disks.

Although favorably prepared, the example of Figure 6 indicates that PI should perform better than the rest of the heuristics. Next we show how to calculate exactly the `proximity' of two rectangles.

4.1 Proximity measure

Whenever a new R-tree node N_0 is created, it should be placed on the disk that contains nodes (= rectangles) that are as dissimilar to N_0 as possible. Here we try to quantify the notion of similarity between two rectangles. The proposed measure can be trivially generalized to hold for hyper-rectangles of any dimensionality. For clarity, we examine 1- and 2-dimensional spaces first.

Intuitively, two rectangles are similar if they often qualify under the same query. Thus, a measure of similarity of two rectangles R and S is the proportion of queries that retrieve both rectangles. Thus,

proximity(R, S) = Prob{ a query retrieves both R and S }

or, formally,

proximity(R, S) = (# of queries retrieving both) / (total # of queries) = |q| / |Q|    (1)

To avoid complications with infinite numbers, let's assume during this subsection that our address space is discrete, with very fine granularity (the case of a continuous address space will be the limit for infinitely fine granularity).

Based on the above definition, we can derive the formulas for the proximity, given the coordinates of the two rectangles R and S. To simplify the presentation, let's consider the 1-dimensional case first.

1-d Case: Without loss of generality, we can normalize our coordinates, and assume that all our data segments lie within the unit line segment [0,1]. Consider two line segments R and S, where R = (r_start, r_end) and S = (s_start, s_end). If we represent each segment X as the point (x_start, x_end), the segments R and S are transformed to 2-dimensional points [12], as shown in Figure 7. In the same figure, the area within the dashed lines is a measure of the number of all the possible query segments, i.e., queries whose size is ≤ 1 and which intersect the unit segment. There are two cases to consider, depending on whether R and S intersect or not. Without loss of generality, we assume that R starts before S (i.e., r_start ≤ s_start).

Figure 7: Mapping line segments to points.

(a) If R and S intersect, let `I' denote their intersection, and let δ be its length. Every query that intersects `I' will retrieve both segments R and S. The total number |Q| of possible queries is proportional to the trapezoidal area within the dashed lines in Figure 7; its area is |Q| = (2 × 2 − 1 × 1)/2 = 3/2. The total number of queries |q| that retrieve both R and S is proportional to the shaded area of Figure 8:

|q| = ((1 + δ)² − δ²)/2 = 1/2 × (1 + 2δ)    (2)

Thus, for intersecting segments R and S we have

proximity(R, S) = |q| / |Q| = 1/3 × (1 + 2δ)    (3)

where δ is the length of the intersection.

Figure 8: The shaded area contains all the segments that intersect R and S simultaneously.

(b) If R and S are disjoint, let Δ be the distance between them (see Figure 9). Then, a query has to cover the segment (r_end, s_start) in order to retrieve both segments. The number of such queries is proportional to the shaded area in Figure 9; this area is given by

|q| = 1/2 × (1 − Δ)²    (4)

and the proximity measure for R and S is

proximity(R, S) = |q| / |Q| = 1/3 × (1 − Δ)²    (5)

Figure 9: The shaded area contains all the segments that intersect R and S.

Notice that the two formulas agree when R and S just touch: in this case, δ = Δ = 0 and the proximity is 1/3.
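To make the two cases concrete, here is a small C sketch (our illustration, not the authors' code) that computes the 1-d proximity of eqs. (3) and (5); it assumes the segments are normalized to lie within [0,1]:

/* Proximity of two 1-d segments in [0,1], per eqs. (3) and (5):
 * overlapping segments score 1/3*(1+2*delta), disjoint ones
 * 1/3*(1-Delta)^2; both give 1/3 when the segments just touch. */
typedef struct { double lo, hi; } seg;

double proximity1(seg r, seg s)
{
    double overlap = (r.hi < s.hi ? r.hi : s.hi) -
                     (r.lo > s.lo ? r.lo : s.lo);
    if (overlap >= 0.0)             /* case (a): delta = overlap length  */
        return (1.0 + 2.0 * overlap) / 3.0;
    /* case (b): -overlap is the gap Delta between the segments */
    return (1.0 + overlap) * (1.0 + overlap) / 3.0;
}

Note that a single min/max computation distinguishes the two cases: a negative "overlap" is exactly the gap Δ, so (1 − Δ) = (1 + overlap).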

n-d Case: The previous formulas can be generalized by assuming uniformity and independence. For a 2-d space, let R and S be two data rectangles, with R_x, R_y denoting the x and y projections of R. A query X will retrieve both R and S if and only if (a) its x-projection X_x retrieves both R_x and S_x, and (b) its y-projection X_y retrieves both R_y and S_y. Since the x and y sizes of the query rectangles are independent, the fraction of queries that meet both of the above criteria is the product of the fractions for each individual axis, i.e., the proximity measure proximity_2() in two dimensions is given by:

proximity_2(R, S) = proximity(R_x, S_x) × proximity(R_y, S_y)    (6)

The generalization for n dimensions is straightforward:

proximity_n(R, S) = Π_{i=1..n} proximity(R_i, S_i)    (7)

where R_i and S_i are the projections on the i-th axis, and the proximity() function for segments is given by eqs. 3 and 5.

The proximity index measures the similarity of a rectangle R_0 with a set of rectangles ℛ = {R_1,...,R_k}. We need this concept to assess the similarity of a new rectangle R_0 and a disk D, containing the rectangles of the set ℛ. The proximity index is the proximity of the most similar rectangle in ℛ. Formally:

proximityIndex(R_0, ℛ) = max_{R_i ∈ ℛ} proximity_n(R_0, R_i)    (8)

where R_i ∈ ℛ, and n is the dimensionality of the address space.
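Putting eqs. (6)-(8) and the PI heuristic together, the following sketch (again illustrative: the sibling bookkeeping arrays and per-disk node counters are assumed, simplified inputs) picks a disk for a new node using only information recorded in the father node. It reuses seg/proximity1() from the sketch above:

/* Proximity of two n-d hyper-rectangles, eq. (7): the product of the
 * per-axis segment proximities. */
#define NDIMS 2

typedef struct { seg axis[NDIMS]; } hrect;

double proximity_n(const hrect *r, const hrect *s)
{
    double p = 1.0;
    for (int i = 0; i < NDIMS; i++)
        p *= proximity1(r->axis[i], s->axis[i]);
    return p;
}

/* PI heuristic: for each disk, compute the proximity index (eq. 8) of
 * the new node n0 against the siblings already stored there, and pick
 * the disk with the lowest index; ties go to the disk with fewer nodes
 * (data balance), as the paper prescribes. */
int choose_disk_PI(const hrect *n0,
                   const hrect sib[], const int disk_of_sib[], int k,
                   const int nodes_on_disk[], int d)
{
    int best = 0;
    double best_pi = 2.0;                /* proximities never exceed 1 */
    for (int disk = 0; disk < d; disk++) {
        double pi = 0.0;                 /* max over siblings on this disk */
        for (int i = 0; i < k; i++)
            if (disk_of_sib[i] == disk) {
                double p = proximity_n(n0, &sib[i]);
                if (p > pi) pi = p;
            }
        if (pi < best_pi ||
            (pi == best_pi && nodes_on_disk[disk] < nodes_on_disk[best])) {
            best_pi = pi;
            best = disk;
        }
    }
    return best;
}

A disk holding no siblings gets a proximity index of 0, so empty or distant disks are naturally preferred; all inputs (sibling MBRs and disks of residence) come from the father node, so no extra disk accesses are needed.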

Symbol          Definition
a               average area of a data rectangle
c               cover quotient
d               number of disks
diskOf()        maps nodes to disks
L(q)            total number of pages touched by q (`load')
N               number of data rectangles
P               size of a disk page in Kbytes
proximity_n()   proximity of two n-d rectangles
q_s             side of a query rectangle
R(q)            response time for query q
r(q)            relative response time (compared to PI)
s               speed-up

Table 1: Summary of Symbols and Definitions

5 Experimental results

To assess the merit of the proximity index heuristic over the other heuristics, we ran simulation experiments on two-dimensional rectangles. We augmented the original R-tree code with some routines to handle the multiple disks (e.g., `choose_disk()', `proximity()', etc.). The code is written in C under Ultrix and the simulation experiments ran on a DECstation 5000. We used both the linear and the quadratic splitting algorithms of Guttman [11]. The quadratic algorithm resulted in better R-trees, i.e., with smaller father nodes. The exponential algorithm was very slow and was not used. Unless otherwise stated, all the results we present are based on R-trees that used the quadratic split algorithm.

In our experiments we assume that

- all d disk units are identical.
- the page access time is constant.
- the first two levels of the multiplexed R-tree (the root and its children) fit in main memory. The required space is of the order of 100 Kb, which is a modest requirement even for personal computers.
- the CPU time is negligible. As discussed before, the CPU is two orders of magnitude faster than the disk. Thus, for the numbers of disks we have examined (1-25 disks), the delay caused by the CPU is negligible.

Without loss of generality, the address space was normalized to the unit square. There are several factors that affect the search time. We studied the following input parameters, ranging as mentioned: the number of disks d (5-25); the total number of data rectangles N (25,000 to 200,000); the size of queries q_s × q_s (q_s ranged from 0 (point queries) to 0.25); the page size P (1Kb to 4Kb).

Another important factor, which is derived from N and the average area a of the data rectangles, is the "cover quotient" c (or "density") of the data rectangles. This is the sum of the areas of the data rectangles in the unit square, or, equivalently, the average number of rectangles that cover a randomly selected point. Mathematically: c = N × a. For the selected values of N and a, the cover quotient ranges from 0.25 to 2.0.
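As a quick sanity check of the definition (the average area below is back-derived from the reported setting, not stated explicitly in the paper):

\[
c = N \cdot a \;\Rightarrow\; a = \frac{c}{N} = \frac{0.26}{25{,}000} \approx 1.04 \times 10^{-5},
\]

i.e., the smallest experiment (N = 25,000, c = 0.26) uses rectangles whose average area is roughly 10^-5 of the unit square.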

The data rectangles were generated as follows: their centers were uniformly distributed in the unit square; their x and y sizes were uniformly distributed in the range [0, max], where max = 0.006. The query rectangles were squares with side q_s; their centers are uniformly distributed in the unit square. For every experiment, 100 randomly generated queries were asked and the results were averaged. Data or query rectangles that were not completely inside the unit square were clipped.

The proximity index heuristic performed very well in our experiments, and therefore is the proposed approach. To make the comparison easier, we normalize the response time of the different heuristics to that of the proximity index and plot the ratios of the response times.

In the following subsections, we present (a) a comparison among the node-to-disk assignment heuristics (MI, MA, RR and PI); recall that they are all within the multiplexed R-tree framework; (b) a more detailed study of the RR heuristic vs. the PI one; (c) a comparison of PI vs. the super-node method; and (d) a study of the speed-up achieved by PI.

5.1 Comparison of the disk assignment heuristics

Figure 10 shows the actual response time for each of the four heuristics (RR, MI, MA, PI), as a function of the query size q_s. The parameters were as follows: N=25,000, c=0.26, d=10, P=4. This behavior is typical for several combinations of the parameter values: P=1,2,4; c=0.5,1,2; d=5,10,20. The main observation is that PI and MI, the two heuristics that take the spatial relationships into account, perform the best.

Figure 10: Comparison among all heuristics (PI, MI, RR and MA) - response time vs query size.

Round Robin is the next best, while the Minimum Area heuristic has the worst performance.

Comparing the MI and PI heuristics, we see that MI performs as well as the proximity index heuristic for small queries; for larger queries, the proximity index wins. The reason is that MI may assign the same disk to two non-intersecting rectangles that are very close to each other.

5.2 Proximity index versus Round Robin

Here we study the savings that the proposed heuristic PI can achieve over RR. The reason we have chosen RR is that it is the simplest heuristic to design and implement. We show that the extra effort to design the PI heuristic pays off consistently.

Figure 11: Relative response time (RR over PI) vs query size.

Figure 11 plots the response time of RR relative to PI as a function of the query size q_s. The number of disks was d=10, the cover quotient was c=0.26 and the page sizes P varied: 1, 2 and 4Kb. The conclusion is that the gains of PI increase with increasing page size. This is because the PI heuristic considers only sibling nodes (nodes under the same father); with a larger page size, the heuristic takes more nodes into account, and therefore makes better decisions.

Figure 12: Relative response time (RR over PI) vs query size.

Figure 12 illustrates the effect of the cover quotient on the relative gains of PI over RR. The page size P was fixed at 4Kb; the cover quotient varied (c=0.26, 0.5, 1 and 2). Everything else was the same as in Figure 11. The main observation is that r(q) decreases with the cover quotient c. This is explained as follows: for large c, there are more rectangles in the vicinity of the newly created rectangle, which means that we need to take more sibling nodes into account and use more disk units to get better results.

An observation common to both figures is that r(q) peaks for medium size queries. For small queries, the number of nodes to be retrieved is small, leaving little room for improvement. For huge queries, almost all the nodes need to be retrieved, in which case the data balance of RR achieves good results.

We should re-emphasize that Figures 11 and 12 reveal only a small fraction of the experimental data we gathered. We also performed experiments with several values for the number of disks d, as well as with the linear splitting algorithm for the R-tree. In all our experiments, PI invariantly outperformed RR.

5.3 Comparison with the super-node method

In order to justify our claims about the advantages of the Multiplexed (`MX') R-tree over the super-node method, we compared the two methods with respect to the two requirements, `uniform spread' and `minimum load'. The measure for the first is the response time R(q); the measure for the second is the load L(q). We present graphs with respect to both measures.

Figure 13: Response time vs query size for the Multiplexed R-tree, with PI, and for super-nodes (d=5 disks).

Figure 13 compares the response time of the Multiplexed R-tree (with PI) against the super-node method. Notice that the difference in performance increases with the query size q_s. In general, the multiplexed R-tree outperforms the super-node scheme for large queries. The only situation where the super-node scheme performs slightly better is when there are many disks d and the query is small. The explanation is that, since d is large, the R-tree with super-nodes has fewer levels than the multiplexed R-tree; in addition, since the query is small, the response time of both trees is bounded by the height of the respective tree. However, this is exactly the situation where the super-node method violates the `minimum load' requirement, imposing a large load on the I/O sub-system and paying penalties in throughput. To gain insight on the effect on the throughput, we plot the `load' for each method, for various parameter values. Recall that the load L(q) for a query q is the total number of pages touched (one super-page counts as d simple pages). Figure 14 shows the results for the same setting as before (Figure 13). The multiplexed R-tree imposes a much lighter load: for small queries, its load is 2-3 times smaller than the load of the super-node method. Interestingly, the absolute difference increases with the query size.

Figure 14: Total number of pages retrieved (load) vs. query size q_s - d=5.

The conclusion of the comparisons is that the proposed method has better response time than the super-node method, at least for large queries. In addition, it will lead to higher throughput, because it tends to impose lighter loads on the disks. Both results agree with our intuition, and indicate that the proposed method will offer a higher throughput for a spatial object server.

5.4 Speed-up

The standard measure of the efficiency of a parallel system is the speed-up s, which is defined as follows: Let R_d(q) be the response time for the query q on a system with d disks. Then: s = R_1(q)/R_d(q). We examined exclusively the multiplexed R-tree method with the PI heuristic, since it seems to offer the best performance. Due to space limitations, we report here the conclusions only; the details are in a technical report [14]. The major result was that the speed-up is high, e.g., 84% of the linear speed-up, for cover quotient c=2, page size P=4 and query side size q_s=0.25. Moreover, the speed-up increases with the size of the query.
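For concreteness (the d = 10 below is an illustrative value of ours; the paper reports the 84% figure without fixing d in this passage):

\[
s = \frac{R_1(q)}{R_d(q)}, \qquad s_{\mathrm{linear}} = d
\;\Rightarrow\; \text{with } d = 10 \text{ disks, } s \approx 0.84 \times 10 = 8.4 .
\]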

6 Conclusions

We have studied alternative designs for a spatial object server, using R-trees as the underlying file structure. Our goal is to maximize the parallelism for large queries, and at the same time to engage as few disks as possible for small queries. To achieve these goals, we propose:

- a hardware architecture with one CPU and multiple disks, which is simple, effective and inexpensive. It has no communication costs, it requires inexpensive, general purpose components, it can easily be expanded (by simply adding more disks) and it can easily take advantage of large buffer pools.

- a software architecture (termed the `Multiplexed' R-tree). It operates exactly like a single-disk R-tree, with the only difference that its nodes are carefully distributed over the d disks. Intuitively, this approach should be better than the super-node approach and the `independent R-trees' approach with respect to throughput.

- the `proximity index' (PI) criterion, which decides how to distribute the nodes of the R-tree on the d disks. Specifically, it tries to store a new node on that one disk that contains nodes as dissimilar to the new node as possible.

Extensive simulation experiments show that the PI criterion consistently outperforms other criteria (round robin and minimum area), and that it performs approximately as well as or better than the minimum intersection criterion.

A comparison with the super-node (= disk striping) approach shows that the proposed method offers a better response time for large queries, and that it imposes a lighter load, leading to higher throughput.

With respect to the speed-up, the proposed method can achieve a near-linear speed-up for large queries. Thus, the multiplexed R-tree with the PI heuristic seems to be the best method to use, in order to implement a spatial object server.

Future research could focus on the use of the proximity concept to aid the parallelization of other spatial file structures.

Acknowledgements: The authors would like to thank George Panagopoulos for his help with the typesetting.

References

[1] Walid G. Aref and Hanan Samet. Optimization strategies for spatial query processing. Proc. of VLDB (Very Large Data Bases), pages 81-90, September 1991.

[2] D. Ballard and C. Brown. Computer Vision. Prentice Hall, 1982.

[3] N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: an efficient and robust access method for points and rectangles. ACM SIGMOD, pages 322-331, May 1990.

[4] J.L. Bentley. Multidimensional binary search trees used for associative searching. CACM, 18(9):509-517, September 1975.

[5] D. DeWitt, R.H. Gerber, G. Graefe, M.L. Heytens, K.B. Kumar, and M. Muralikrishna. GAMMA - a high performance dataflow database machine. In Proc. 12th International Conference on VLDB, pages 228-237, Kyoto, Japan, August 1986.

[6] C. Faloutsos and S. Roseman. Fractals for secondary key retrieval. Eighth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS), pages 247-252, March 1989. Also available as UMIACS-TR-89-47 and CS-TR-2242.

[7] H. Garcia-Molina and K. Salem. The impact of disk striping on reliability. IEEE Database Engineering, 11(1):26-39, March 1988.

[8] I. Gargantini. An effective way to represent quadtrees. Comm. of ACM (CACM), 25(12):905-910, December 1982.

[9] O. Gunther. The cell tree: an index for geometric data. Memorandum No. UCB/ERL M86/89, Univ. of California, Berkeley, December 1986.

[10] A. Guttman. New Features for Relational Database Systems to Support CAD Applications. PhD thesis, University of California, Berkeley, June 1984.

[11] A. Guttman. R-trees: a dynamic index structure for spatial searching. Proc. ACM SIGMOD, pages 47-57, June 1984.

[12] K. Hinrichs and J. Nievergelt. The grid file: a data structure to support proximity queries on spatial objects. Proc. of the WG'83 (Intern. Workshop on Graph Theoretic Concepts in Computer Science), pages 100-113, 1983.

[13] H.V. Jagadish. Linear clustering of objects with multiple attributes. ACM SIGMOD Conf., pages 332-342, May 1990.

[14] Ibrahim Kamel and Christos Faloutsos. Parallel R-trees. Proc. of ACM SIGMOD Conf., pages 195-204, June 1992. Also available as Tech. Report UMIACS TR 92-1, CS-TR-2820.

[15] Curtis P. Kolovson and Michael Stonebraker. Segment indexes: Dynamic indexing techniques for multi-dimensional interval data. Proc. ACM SIGMOD, pages 138-147, May 1991.

[16] David B. Lomet and Betty Salzberg. The hB-tree: a multiattribute indexing method with good guaranteed performance. ACM TODS, 15(4):625-658, December 1990.

[17] J. Orenstein. Spatial query processing in an object-oriented database system. Proc. ACM SIGMOD, pages 326-336, May 1986.

[18] J. K. Ousterhout, G. T. Hamachi, R. N. Mayo, W. S. Scott, and G. S. Taylor. Magic: a VLSI layout system. In 21st Design Automation Conference, pages 152-159, Albuquerque, NM, June 1984.

[19] J.T. Robinson. The k-d-B-tree: a search structure for large multidimensional dynamic indexes. Proc. ACM SIGMOD, pages 10-18, 1981.

[20] N. Roussopoulos and D. Leifker. Direct spatial search on pictorial databases using packed R-trees. Proc. ACM SIGMOD, May 1985.

[21] H. Samet. The Design and Analysis of Spatial Data Structures. Addison-Wesley, 1989.

[22] Bernhard Seeger and Per-Ake Larson. Multi-disk B-trees. Proc. ACM SIGMOD, pages 138-147, May 1991.

[23] T. Sellis, N. Roussopoulos, and C. Faloutsos. The R+-tree: a dynamic index for multi-dimensional objects. In Proc. 13th International Conference on VLDB, pages 507-518, England, September 1987. Also available as SRC-TR-87-32, UMIACS-TR-87-3, CS-TR-1795.

[24] S. Pramanik and M.H. Kim. Parallel processing of large node B-trees. IEEE Trans. on Computers, 39(9):1208-1212, 1990.

[25] M. Stonebraker, T. Sellis, and E. Hanson. Rule indexing implementations in database systems. In Proceedings of the First International Conference on Expert Database Systems, Charleston, SC, April 1986.

[26] M. White. N-Trees: Large Ordered Indexes for Multi-Dimensional Space. Application Mathematics Research Staff, Statistical Research Division, U.S. Bureau of the Census, December 1981.