Parallel R-trees

Ibrahim Kamel and Christos Faloutsos*
Department of CS, University of Maryland, College Park, MD 20742

Abstract

We consider the problem of exploiting parallelism to accelerate the performance of spatial access methods and, specifically, R-trees [11]. Our goal is to design a server for spatial data, so as to maximize the throughput of range queries. This can be achieved by (a) maximizing parallelism for large range queries, and (b) engaging as few disks as possible on point queries [22].

We propose a simple hardware architecture consisting of one processor with several disks attached to it. On this architecture, we propose to distribute the nodes of a traditional R-tree, with cross-disk pointers (`Multiplexed' R-tree). The R-tree code is identical to the one for a single-disk R-tree, with the only addition that we have to decide on which disk a newly created R-tree node should be stored. We propose and examine several criteria to choose a disk for a new node. The most successful one, termed `proximity index' or PI, estimates the similarity of the new node with the other R-tree nodes already on a disk, and chooses the disk with the lowest similarity. Experimental results show that our scheme consistently outperforms all the other heuristics for node-to-disk assignment, achieving up to 55% gains over the Round Robin one. Experiments also indicate that the multiplexed R-tree with the PI heuristic gives better response time than the disk-striping (= "super-node") approach, and imposes a lighter load on the I/O sub-system. The speed-up of our method is close to linear, increasing with the size of the queries.

* Also, a member of UMIACS. This research was sponsored partially by the National Science Foundation under grants IRI-8719458 and IRI-8958546, by a Department of Commerce Joint Statistical Agreement JSA-91-9, by a donation from EMPRESS Software Inc., and by a donation from Thinking Machines Inc.

1 Introduction

One of the requirements for the database management systems (DBMSs) of the future is the ability to handle spatial data. Spatial data arise in many applications, including: cartography [26]; Computer-Aided Design (CAD) [18], [10]; computer vision and robotics [2]; traditional databases, where a record with k attributes corresponds to a point in a k-d space; rule indexing in expert database systems [25]; temporal databases, where time can be considered as one more dimension [15]; scientific databases, with spatial-temporal data; etc.

In the above applications, one of the most typical queries is the range query: given a rectangle, retrieve all the elements that intersect it. A special case of the range query is the point query or stabbing query, where the query rectangle degenerates to a point.

In this paper we study the problem of improving the search performance using parallelism and, specifically, multiple disk units. There are two main reasons for using multiple disks, as opposed to a single disk:

(a) All of the above applications will be I/O bound. Our measurements on a DECstation 5000 showed that the CPU time to process an R-tree page, once brought in core, is 0.12 msec. This is 156 times smaller than the average disk access time (20 msec). Therefore it is important to parallelize the I/O operation.

(b) The second reason for using multiple disk units is that several of the above applications involve huge amounts of data, which do not fit on one disk. For example, NASA expects 1 Terabyte (= 10^12 bytes) of data per day; this corresponds to 10^16 bytes per year of satellite data. Geographic databases can be large; for example, the TIGER database mentioned above is 19 Gigabytes. Historic and temporal databases tend to archive all the changes and grow quickly in size.
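As a concrete illustration of the range and point queries defined above, the following is a minimal Python sketch (our own, not from the paper), using a hypothetical Rect class; the point query is simply a range query whose rectangle degenerates to a point.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    xlow: float
    ylow: float
    xhigh: float
    yhigh: float

    def intersects(self, other: "Rect") -> bool:
        # Two rectangles intersect unless one lies entirely to the left/right
        # of, or entirely above/below, the other.
        return not (self.xhigh < other.xlow or other.xhigh < self.xlow or
                    self.yhigh < other.ylow or other.yhigh < self.ylow)

def range_query(data, query: Rect):
    """Range query: retrieve all elements that intersect the query rectangle."""
    return [r for r in data if r.intersects(query)]

def point_query(data, x: float, y: float):
    """Point (stabbing) query: a range query whose rectangle degenerates to a point."""
    return range_query(data, Rect(x, y, x, y))
```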
The target system is intended to operate as a server, responding to range queries of concurrent users. Our goal is to maximize the throughput, which translates into the following two requirements:

`minLoad': Queries should touch as few nodes as possible, imposing a light load on the I/O sub-system. As a corollary, queries with small search regions should activate as few disks as possible.

`uniSpread': Nodes that qualify under the same query should be distributed over the disks as uniformly as possible. As a corollary, queries that retrieve a lot of data should activate as many disks as possible.

The proposed hardware architecture consists of one processor with several disks attached to it. We do not consider multi-processor architectures, because multiple CPUs would probably be overkill, increasing the dollar cost and the complexity of the system without relieving the I/O bottleneck. Moreover, a multi-processor loosely-coupled architecture, like GAMMA [5], will have communication costs, which are non-existent in the proposed single-processor architecture. In addition, our architecture is simple: it requires only widely available off-the-shelf components, without the need for synchronized disks, multiple CPUs, or a specialized operating system.

Figure 1: Data (dark rectangles) organized in an R-tree. Fanout = 3. Dotted rectangles indicate queries.

On this architecture, we will distribute the nodes of a traditional R-tree. We propose and study several heuristics on how to choose a disk to place a newly created R-tree node. The most successful heuristic, based on the `proximity index', estimates the similarity of the new node with the other R-tree nodes already on a disk, and chooses the disk with the least similar contents. Experimental results showed that our scheme consistently outperforms other heuristics.

The paper is organized as follows. Section 2 briefly describes the R-tree and its variants; it also surveys previous efforts to parallelize other file structures. Section 3 proposes the `multiplexed' R-tree as a way to store an R-tree on multiple disks. Section 4 examines alternative criteria to choose a disk for a newly created R-tree node; it also introduces the `proximity' measure and derives the formulas for it. Section 5 presents experimental results and observations. Section 6 gives the conclusions and directions for future research.

2 Survey

Several spatial access methods have been proposed. A recent survey can be found in [21]. This classification includes methods that transform rectangles into points in a higher dimensionality space [12], methods that use linear quadtrees [8] [1] or, equivalently, the z-ordering [17] or other space filling curves [6] [13], and finally, methods based on trees (k-d-trees [4], k-d-B-trees [19], hB-trees [16], cell-trees [9], etc.).

One of the most characteristic approaches in the last class is the R-tree [11]. Due to space limitations, we omit a detailed description of the method. Figure 1 illustrates data rectangles (in black), organized in an R-tree with fanout 3. Figure 2 shows the file structure for the same R-tree, where nodes correspond to disk pages.

Figure 2: The file structure for the R-tree of the previous figure (fanout = 3).

In the rest of this paper, the term `node' and the term `page' will be used interchangeably, except when discussing the super-nodes method (subsection 3.2). Extensions, variations and improvements to the original R-tree structure include the packed R-trees [20], the R+-tree [23] and the R*-tree [3].
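As an illustration of how a range query descends an R-tree such as the one in Figures 1 and 2, here is a minimal sketch (our own, not the authors' code) that reuses the hypothetical Rect class from the earlier sketch. Only subtrees whose bounding rectangles intersect the query are visited, which is why small queries touch few pages, the behavior that the minLoad requirement builds on.

```python
from dataclasses import dataclass, field

@dataclass
class RTreeNode:
    mbr: "Rect"                                    # minimum bounding rectangle of this subtree
    children: list = field(default_factory=list)   # up to `fanout' child nodes (internal nodes)
    data: list = field(default_factory=list)       # data rectangles (leaf nodes)

def range_search(node: RTreeNode, query: "Rect", pages_touched: list) -> list:
    """Return all data rectangles under `node' that intersect `query'."""
    pages_touched.append(node)        # each visited node costs one page access
    if not node.children:             # leaf: report the qualifying data rectangles
        return [r for r in node.data if r.intersects(query)]
    results = []
    for child in node.children:       # descend only into intersecting subtrees
        if child.mbr.intersects(query):
            results.extend(range_search(child, query, pages_touched))
    return results
```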
There is also much work on how to organize traditional file structures on multi-disk or multi-processor machines. For the B-tree, Pramanic and Kim proposed the PNB-tree [24], which uses a `super-node' (`super-page') scheme on synchronized disks. Seeger and Larson [22] proposed an algorithm to distribute the nodes of the B-tree on different disks. Their algorithm takes into account not only the response time of the individual query but also the throughput of the system.

Parallelization of R-trees is an unexplored topic, to the best of the authors' knowledge. Next, we present some software designs to achieve efficient parallelization.

3 Alternative designs

The underlying file structure is the R-tree. Given that, our goal is to design a server for spatial objects on a parallel architecture, to achieve high throughput under concurrent range queries.

3.3 Multiplexed (`MX') R-tree

In this scheme we use a single R-tree, with each node spanning one disk page. Nodes are distributed over the d disks, with pointers across disks. For example, Figure 3 shows one possible multiplexed R-tree, corre-
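The following minimal sketch (our own, not the authors' code) illustrates the two ingredients of the multiplexed scheme: cross-disk child pointers, and a policy for assigning a newly created node to a disk. The Round Robin baseline is the one mentioned in the abstract; the proximity-based variant only captures the general idea of placing the new node on the disk with the least similar contents, since the actual proximity-index formula is derived in Section 4. The names NUM_DISKS, choose_disk_* and the `proximity' argument are our own.

```python
from itertools import count

NUM_DISKS = 5                    # d disks attached to the single processor
_next_disk = count()

# A cross-disk pointer is simply a (disk id, page id) pair stored in the
# parent entry; the R-tree search and insertion code is otherwise unchanged.
ChildPointer = tuple  # (disk_id, page_id)

def choose_disk_round_robin(new_node) -> int:
    """Baseline: cycle through the disks, ignoring the node's contents."""
    return next(_next_disk) % NUM_DISKS

def choose_disk_proximity(new_node, nodes_on_disk, proximity) -> int:
    """Proximity-based placement: put the new node on the disk whose resident
    nodes are least similar to it, so that nodes likely to qualify under the
    same query end up on different disks (uniSpread).  Here a disk's similarity
    is taken as its largest pairwise proximity, one plausible aggregation."""
    def disk_similarity(d: int) -> float:
        return max((proximity(new_node.mbr, other.mbr)
                    for other in nodes_on_disk[d]), default=0.0)
    return min(range(NUM_DISKS), key=disk_similarity)
```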