The R*-tree: An Efficient and Robust Access Method for Points and Rectangles+

Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger
Praktische Informatik, Universitaet Bremen, D-2800 Bremen 33, West Germany

Abstract

The R-tree, one of the most popular access methods for rectangles, is based on the heuristic optimization of the area of the enclosing rectangle in each inner node. By running numerous experiments in a standardized testbed under highly varying data, queries and operations, we were able to design the R*-tree which incorporates a combined optimization of area, margin and overlap of each enclosing rectangle in the directory. Using our standardized testbed in an exhaustive performance comparison, it turned out that the R*-tree clearly outperforms the existing R-tree variants: Guttman's linear and quadratic R-tree and Greene's variant of the R-tree. This superiority of the R*-tree holds for different types of queries and operations, such as map overlay, for both rectangles and multidimensional points in all experiments. From a practical point of view the R*-tree is very attractive because of the following two reasons: 1. It efficiently supports point and spatial data at the same time and 2. Its implementation cost is only slightly higher than that of other R-trees.

1. Introduction

In this paper we will consider spatial access methods (SAMs) which are based on the approximation of a complex spatial object by the minimum bounding rectangle with the sides of the rectangle parallel to the axes of the data space.

The most important property of this simple approximation is that a complex object is represented by a limited number of bytes. Although a lot of information is lost, minimum bounding rectangles of spatial objects preserve the most essential geometric properties of the object, i.e. the location of the object and the extension of the object in each axis.

In [SK 88] we showed that known SAMs organizing (minimum bounding) rectangles are based on an underlying point access method (PAM) using one of the following three techniques: clipping, transformation and overlapping regions.

The most popular SAM for storing rectangles is the R-tree [Gut 84]. Following our classification, the R-tree is based on the PAM B+-tree [Knu 73] using the technique overlapping regions. Thus the R-tree can be easily implemented, which considerably contributes to its popularity.

The R-tree is based on a heuristic optimization. The optimization criterion which it pursues is to minimize the area of each enclosing rectangle in the inner nodes. This criterion is taken for granted and not shown to be the best possible. Questions arise such as: Why not minimize the margin or the overlap of such minimum bounding rectangles? Why not optimize storage utilization? Why not optimize all of these criteria at the same time? Could these criteria interact in a negative way?
Only an engineering approach will help to find the best possible combination of optimization criteria.

A necessary condition for such an engineering approach is the availability of a standardized testbed which allows us to run large volumes of experiments with highly varying data, queries and operations. We have implemented such a standardized testbed and used it for performance comparisons, particularly of point access methods [KSSS 89].

As the result of our research we designed a new R-tree variant, the R*-tree, which outperforms the known R-tree variants under all experiments. For many realistic profiles of data and operations the gain in performance is quite considerable. In addition to the usual point query, rectangle intersection and rectangle enclosure query, we have analyzed our new R*-tree for the map overlay operation, also called spatial join, which is one of the most important operations in geographic and environmental database systems.

This paper is organized as follows. In section 2, we introduce the principles of R-trees including their optimization criteria. In section 3 we present the existing R-tree variants of Guttman and Greene. Section 4 describes in detail the design of our new R*-tree. The results of the comparisons of the R*-tree with the other R-tree variants are reported in section 5. Section 6 concludes the paper.

+ This work was supported by grant no. Kr 670/4-3 from the Deutsche Forschungsgemeinschaft (German Research Society) and by the Ministry of Environmental and Urban Planning of Bremen.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1990 ACM 089791-365-5/90/0005/0322 $1.50

2. Principles of R-trees and possible optimization criteria

An R-tree is a B+-tree like structure which stores multidimensional rectangles as complete objects without clipping them or transforming them to higher dimensional points before.

A non-leaf node contains entries of the form (cp, Rectangle) where cp is the address of a child node in the R-tree and Rectangle is the minimum bounding rectangle of all rectangles which are entries in that child node. A leaf node contains entries of the form (Oid, Rectangle) where Oid refers to a record in the database, describing a spatial object, and Rectangle is the enclosing rectangle of that spatial object. Leaf nodes containing entries of the form (dataobject, Rectangle) are also possible. This will not affect the basic structure of the R-tree. In the following we will not consider such leaf nodes.

Let M be the maximum number of entries that will fit in one node and let m be a parameter specifying the minimum number of entries in a node (2 ≤ m ≤ M/2). An R-tree satisfies the following properties:
• The root has at least two children unless it is a leaf
• Every non-leaf node has between m and M children unless it is the root
• Every leaf node contains between m and M entries unless it is the root
• All leaves appear on the same level

An R-tree (R*-tree) is completely dynamic: insertions and deletions can be intermixed with queries and no periodic global reorganization is required. Obviously, the structure must allow overlapping directory rectangles. Thus it cannot guarantee that only one search path is required for an exact match query. For further information we refer to [Gut 84]. We will show in this paper that the overlapping-regions-technique does not imply bad average retrieval performance. Here and in the following, we use the term directory rectangle, which is geometrically the minimum bounding rectangle of the underlying rectangles.

The main problem in R-trees is the following: For an arbitrary set of rectangles, dynamically build up bounding boxes from subsets of between m and M rectangles, in a way that arbitrary retrieval operations with query rectangles of arbitrary size are supported efficiently. The known parameters of good retrieval performance affect each other in a very complex way, such that it is impossible to optimize one of them without influencing other parameters, which may cause a deterioration of the overall performance. Moreover, since the data rectangles may have very different size and shape and the directory rectangles grow and shrink dynamically, the success of methods which will optimize one parameter is very uncertain. Thus a heuristic approach is applied, which is based on many different experiments carried out in a systematic framework.

In this section some of the parameters which are essential for the retrieval performance are considered. Furthermore, interdependencies between different parameters and optimization criteria are analyzed.

(O1) The area covered by a directory rectangle should be minimized, i.e. the area covered by the bounding rectangle but not covered by the enclosed rectangles, the dead space, should be minimized. This will improve performance since decisions which paths have to be traversed can be taken on higher levels.

(O2) The overlap between directory rectangles should be minimized. This also decreases the number of paths to be traversed.

(O3) The margin of a directory rectangle should be minimized. Here the margin is the sum of the lengths of the edges of a rectangle. Assuming fixed area, the object with the smallest margin is the square. Thus, minimizing the margin instead of the area, the directory rectangles will be shaped more quadratic. Essentially queries with large quadratic query rectangles will profit from this optimization. More important, minimization of the margin will basically improve the structure. Since quadratic objects can be packed easier, the bounding boxes of a level will build smaller directory rectangles in the level above. Thus clustering rectangles into bounding boxes with only little variance of the lengths of the edges will reduce the area of directory rectangles.

(O4) Storage utilization should be optimized. Higher storage utilization will generally reduce the query cost as the height of the tree will be kept low. Evidently, query types with large query rectangles are influenced more since the concentration of rectangles in several nodes will have a stronger effect if the number of found keys is high.

Keeping the area and overlap of a directory rectangle small requires more freedom in the number of rectangles stored in one node. Thus minimizing these parameters will be paid with lower storage utilization. Moreover, when applying (O1) or (O2) more freedom in choosing the shape is necessary. Thus rectangles will be less quadratic. With (O1) the overlap between directory rectangles may be affected in a positive way since the covering of the data space is reduced. As for every geometric optimization, minimizing the margins will also lead to reduced storage utilization. However, since more quadratic directory rectangles support packing better, it will be easier to maintain high storage utilization. Obviously, the performance for queries with sufficiently large query rectangles will be affected more by the storage utilization than by the parameters of (O1)-(O3).

3.

Algorithm QuadraticSplit
[Divide a set of M+1 entries into two groups]
QS1 Invoke PickSeeds to choose two entries to be the first entries of the groups
QS2 Repeat DistributeEntry
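
The quantities behind criteria (O1) to (O3) of section 2 can be made concrete with a minimal Python sketch. It is an illustration only, not code from the paper: the names Rect, area, margin, overlap and mbr are assumptions introduced here, and rectangles are assumed to be axis-parallel and given by their lower and upper corners.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Rect:
    """Axis-parallel rectangle (hypothetical helper type, not from the paper)."""
    lo: Tuple[float, ...]  # lower corner, one coordinate per axis
    hi: Tuple[float, ...]  # upper corner, one coordinate per axis

    def area(self) -> float:
        # Product of the extensions along all axes (criterion O1).
        result = 1.0
        for l, h in zip(self.lo, self.hi):
            result *= h - l
        return result

    def margin(self) -> float:
        # Sum of the lengths of all edges (criterion O3). In d dimensions
        # each axis extension occurs on 2**(d-1) edges, i.e. twice in 2-D.
        d = len(self.lo)
        return 2 ** (d - 1) * sum(h - l for l, h in zip(self.lo, self.hi))

    def overlap(self, other: "Rect") -> float:
        # Area of the intersection with another rectangle (criterion O2);
        # 0.0 if the rectangles are disjoint or only touch.
        result = 1.0
        for l1, h1, l2, h2 in zip(self.lo, self.hi, other.lo, other.hi):
            side = min(h1, h2) - max(l1, l2)
            if side <= 0:
                return 0.0
            result *= side
        return result


def mbr(rects: Sequence[Rect]) -> Rect:
    """Minimum bounding rectangle of a non-empty set of rectangles."""
    dims = range(len(rects[0].lo))
    lo = tuple(min(r.lo[i] for r in rects) for i in dims)
    hi = tuple(max(r.hi[i] for r in rects) for i in dims)
    return Rect(lo, hi)
```

Under these assumptions, Rect((0, 0), (2, 2)) and Rect((0, 0), (4, 1)) both have area 4, but the square has margin 8 while the elongated rectangle has margin 10. This is the effect (O3) relies on: for fixed area, minimizing the margin favors more quadratic directory rectangles.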
