The Buddy-Tree: an Efficient and Robust Access Method for Spatial Data Base Systems *
Bernhard Seeger+ and Hans-Peter Kriegel
Praktische Informatik, University of Bremen, D-2800 Bremen 33, West Germany

(*) This work was supported by grant no. Kr 670/4-3 from the Deutsche Forschungsgemeinschaft DFG (German Research Society) and by the Ministry of Environmental and Urban Planning of Bremen.
(+) B. Seeger has a one-year leave of absence from the University of Bremen, which he is presently spending at the University of Waterloo, Canada, financially supported by a Post Doctoral Fellowship from the DFG.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.
Proceedings of the 16th VLDB Conference, Brisbane, Australia, 1990

In this paper, we propose a new multidimensional access method, called the buddy-tree, to support point as well as spatial data in a dynamic environment. The buddy-tree can be seen as a compromise between the R-tree and the grid file, but it is fundamentally different from each of them. Because grid files lose performance for highly correlated data, the buddy-tree is designed to organize such data very efficiently, partitioning only those parts of the data space which contain data and not partitioning empty data space. The directory is based on a very flexible partitioning and reorganization scheme, a generalization of the buddy system. As for B-trees, the buddy-tree fulfills the property that insertions and deletions are restricted to exactly one path of the directory. Additional important properties, which no other multidimensional tree-based access method fulfills in this combination, are: (i) the directory grows linearly in the number of records, (ii) no overflow pages are allowed, (iii) the data space is partitioned into minimum bounding rectangles of the actual data and (iv) the performance is basically independent of the sequence of insertions. In this paper, we introduce the principles of the buddy-tree, the organization of its directory and the most important algorithms. Using our standardized testbed, we present a performance comparison of the buddy-tree with other access methods, demonstrating the superiority and robustness of the buddy-tree.

In non-standard database applications, such as geographic information processing or CAD/CAM, access methods are required that support efficient manipulation of multidimensional geometric objects on secondary storage. Moreover, efficient access methods are an essential part of knowledge-based systems [HCKW 90]. We can basically distinguish between point access methods (PAMs) and spatial access methods (SAMs), which are designed to handle multidimensional point data (e.g. records ordered by a multidimensional key) and spatial data (e.g. polygons or rectangles), respectively. First of all, these access methods must be dynamic, i.e. they should support arbitrary insertions and deletions of objects without any global reorganization and without any loss of performance. Moreover, they should efficiently support a large set of queries, such as range, partial match, join and nearest neighbor queries.

The basic principle of all multidimensional PAMs is to partition the data space into page regions, regions for short, such that all records of a data page are taken from one region. We classify PAMs according to the following three properties of regions: whether the regions are pairwise disjoint, whether the regions are rectangular, and whether the partition into regions is complete, i.e. whether the union of all regions spans the complete data space. This classification yields six classes, four of which are filled with known PAMs. Without going into detail, Table 1 lists well-known PAMs according to these three criteria.

Table 1: Classification of multidimensional PAMs.

class | rectangular | complete | disjoint | PAMs
(C1)  | x | x | x | interpolation hashing [Bur 83], MOLHPE [KS 86], quantile hashing [KS 89], PLOP-hashing [KS 88], K-D-B-tree [Rob 81], multidimensional extendible hashing [Tam 82, Oto 84], balanced multidimensional extendible hash tree [Oto 86], grid file [NHS 84], 2-level grid file [Hin 85], interpolation-based grid file [Ouk 85]
(C2)  | x | x |   | twin grid file [HSW 88]
(C3)  | x |   | x | buddy-tree, multilevel grid file [WK 85]
(C4)  |   | x | x | B+-tree with z-order [OM 84], BANG file [Fre 87], hB-tree [LS 89]

All of the PAMs in class (C1) perform rather efficiently for uniform and uncorrelated data. However, for highly correlated data their performance degenerates. Therefore other PAMs, like the BANG file or the hB-tree, have been proposed that allow more general shapes of regions, constructed by difference and union of rectangles.

Quite a different approach for the efficient organization of highly correlated data is the buddy-tree. Its most important characteristic is that the union of all regions does not span the complete data space; thus the buddy-tree avoids partitioning empty data space. Instead, the buddy-tree uses a concept similar to that of the R-tree [Gut 84] and the R*-tree [BKSS 90] for spatial data, but differs from the R-tree variants by avoiding overlap in the tree directory. In comparison to previously proposed tree structures such as the K-D-B-tree, the buddy-tree guarantees a more efficient dynamic behavior. Moreover, indirect splits, which cause low storage utilization and high insertion costs in the K-D-B-tree, are completely avoided. Therefore, the same properties are fulfilled as for B-trees [BM 72]: deletions, insertions and exact match queries are restricted to one path of the directory. This behavior is guaranteed by using a generalization of the buddy system, which was originally proposed for the grid file. Due to this concept, the performance of the buddy-tree is almost independent of the sequence in which data is inserted.

Furthermore, we propose a special implementation technique for the buddy-tree which can be generalized to other access methods, such as the R-tree variants. With this technique the buddy-tree gains a high fan-out of the directory nodes, which reduces both the height of the tree and the retrieval cost.

Most SAMs assume that geometric objects are approximated by a minimal bounding rectangle whose sides are parallel to the axes of the data space. One technique to generate such a SAM from a PAM is the transformation of d-dimensional rectangles into 2d-dimensional points where, for example, the first d components represent the center and the remaining d components represent the extension of the rectangle ([Hin 85], [SK 88]). These 2d-dimensional points are highly correlated and occupy only a small part of the data space. The buddy-tree performs particularly efficiently for such distributions.

In this paper we use the following notation. The parameter d, d ≥ 1, specifies the dimension of the data space D. The data space D is composed of the domains D_i, 1 ≤ i ≤ d, of the i-th axis. On these domains an order relation should be well defined. Without loss of generality we assume that D is the d-dimensional unit cube [0,1)^d. The parameters b, b > 1, and c, c > 1, denote the capacity of a data page and of a directory page, respectively.

The paper is organized as follows. In section 2 we introduce the principles and the properties of the buddy-tree on a more informal level. In section 3 we present a formal description of the structure of the buddy-tree, and in section 4 we propose a generally applicable implementation technique for increasing the fan-out of directory nodes. Section 5 contains a description of the essential algorithms of the buddy-tree. Finally, in section 6 we present an experimental performance comparison which demonstrates the superiority of the buddy-tree over other PAMs, such as the hB-tree, the BANG file and the grid file.

2 Principles of the Buddy-Tree

The buddy-tree organizes data using a tree-based directory in which each axis is treated equally. In contrast to the K-D-B-tree [Rob 81] (one of the first multidimensional trees), the buddy-tree performs well in a highly dynamic environment, i.e. insertions, deletions and changes of the data distribution do not affect performance. This property is achieved by applying to the buddy-tree a modified version of the so-called buddy system, which is well known from the grid file [NHS 84]. Additionally, the performance of the buddy-tree is almost independent of the sequence of insertions, whereas dependence on the insertion sequence is an essential drawback of previous tree structures such as the K-D-B-tree or the hB-tree [LS 89].

Another important feature of the buddy-tree is that it does not partition empty data space. Therefore queries such as partial match queries, where the query region intersects empty data space, can be performed much faster than by conventional structures that partition the complete data space. This property is very similar to the variants of the R-tree, originally designed for spatial data.

• empty data space is not partitioned
• insertion and deletion of a record is restricted to exactly one path
• no overflow pages
• the directory grows linearly in the number of records
• performance is basically independent of the sequence of insertions
• efficient behavior for insertions and deletions
• very high fan-out of the directory nodes

With the following example we intend to visualize the basic properties of the buddy-tree. Let the dimension be d = 2, the capacity of a directory page be c = 5 and the capacity of a data page be b = 4. The following snapshots then depict the growth of the buddy-tree, starting with the empty file. The actual points are stored in the data pages. Minimum bounding rectangles of at most 4 points are represented in the directory pages, indicated by a light fill pattern.
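To make the example more concrete, here is a minimal Python sketch of what could happen when a data page of capacity b = 4 receives a fifth point in the data space [0,1)^2. It assumes a hypothetical split policy: the page's region is halved along the axes in cyclic order, and a half that contains no points is discarded, in the spirit of the buddy system. The buddy-tree's actual split algorithm is the subject of section 5 and is not reproduced here. The names `mbr`, `buddy_split` and `intersects` are illustrative only. The directory keeps only the minimum bounding rectangles of the two new pages, so a partial match query that hits empty data space touches no page at all.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (lower corner, upper corner), treated as a closed box

B = 4  # data page capacity b, as in the example (d = 2, b = 4)


def mbr(points: Sequence[Point]) -> Box:
    """Minimum bounding rectangle of a non-empty set of points."""
    d = len(points[0])
    lo = tuple(min(p[i] for p in points) for i in range(d))
    hi = tuple(max(p[i] for p in points) for i in range(d))
    return lo, hi


def buddy_split(points: List[Point], region: Box):
    """Split an overflowing page by repeatedly halving its region.

    Hypothetical policy (not the paper's algorithm): axes are cycled
    0, 1, ..., d-1, and a half without points is dropped, so empty
    space never becomes a page of its own.
    """
    if len(set(points)) < 2:
        raise ValueError("need at least two distinct points to split")
    lo, hi = region
    d = len(lo)
    depth = 0
    while True:
        axis = depth % d
        mid = (lo[axis] + hi[axis]) / 2
        low = [p for p in points if p[axis] < mid]
        high = [p for p in points if p[axis] >= mid]
        low_reg = (lo, hi[:axis] + (mid,) + hi[axis + 1:])
        high_reg = (lo[:axis] + (mid,) + lo[axis + 1:], hi)
        if low and high:
            return (low, low_reg), (high, high_reg)
        # All points fell into one half: keep halving only that half.
        lo, hi = low_reg if low else high_reg
        depth += 1


def intersects(box: Box, query: Box) -> bool:
    """True if two closed axis-parallel boxes overlap."""
    (blo, bhi), (qlo, qhi) = box, query
    return all(q_lo <= b_hi and b_lo <= q_hi
               for b_lo, b_hi, q_lo, q_hi in zip(blo, bhi, qlo, qhi))


# Five points in a page of capacity B = 4 -> overflow -> split.
page = [(0.1, 0.1), (0.15, 0.2), (0.2, 0.15), (0.6, 0.55), (0.65, 0.6)]
assert len(page) > B
halves = buddy_split(page, ((0.0, 0.0), (1.0, 1.0)))
directory = [mbr(pts) for pts, _region in halves]  # the directory keeps MBRs only
```

With these points the directory holds the two boxes ((0.1, 0.1), (0.2, 0.2)) and ((0.6, 0.55), (0.65, 0.6)). The partial match query x = 0.9, i.e. the box ((0.9, 0.0), (0.9, 1.0)), intersects neither of them, while x = 0.15 reaches only the first page.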
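The transformation of d-dimensional rectangles into 2d-dimensional points mentioned in the introduction ([Hin 85], [SK 88]) can be sketched in a few lines of Python. This sketch assumes that the "extension" is the half side length along each axis; the text only says that the remaining d components represent the extension, so that choice is an assumption. `rect_to_point` and `point_to_rect` are hypothetical helper names.

```python
from typing import Sequence, Tuple

Rect = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (lower corner, upper corner)


def rect_to_point(rect: Rect) -> Tuple[float, ...]:
    """Map a d-dimensional axis-parallel rectangle to a 2d-dimensional point.

    Components 0..d-1 hold the center, components d..2d-1 the extension,
    taken here (an assumption) as half the side length along each axis.
    """
    lo, hi = rect
    if len(lo) != len(hi):
        raise ValueError("corners must have the same dimension")
    if any(low > high for low, high in zip(lo, hi)):
        raise ValueError("lower corner must not exceed upper corner")
    center = tuple((low + high) / 2 for low, high in zip(lo, hi))
    extension = tuple((high - low) / 2 for low, high in zip(lo, hi))
    return center + extension


def point_to_rect(p: Sequence[float]) -> Rect:
    """Inverse mapping: rebuild the rectangle from its 2d-dimensional point."""
    if len(p) % 2:
        raise ValueError("point must have an even number of components")
    d = len(p) // 2
    center, extension = p[:d], p[d:]
    lo = tuple(c - e for c, e in zip(center, extension))
    hi = tuple(c + e for c, e in zip(center, extension))
    return lo, hi


# A 2-dimensional rectangle becomes a 4-dimensional point.
print(rect_to_point(((0.25, 0.5), (0.75, 0.5))))  # (0.5, 0.5, 0.25, 0.0)
```

Because every rectangle lies inside the data space [0,1)^d, each center/extension pair (c_i, e_i) satisfies e_i ≤ min(c_i, 1 - c_i). Per axis, the transformed points can therefore only fill a triangle covering one quarter of the unit square, and small rectangles additionally crowd near e_i = 0. This illustrates why such point sets are highly correlated and occupy only a small part of the data space.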