Multidimensional Access Methods
VOLKER GAEDE, IC-Parc, Imperial College, London, and OLIVER GÜNTHER, Humboldt-Universität, Berlin

Search operations in databases require special support at the physical level. This is true for conventional databases as well as spatial databases, where typical search operations include the point query (find all objects that contain a given search point) and the region query (find all objects that overlap a given search region). More than ten years of spatial database research have resulted in a great variety of multidimensional access methods to support such operations. We give an overview of that work. After a brief survey of spatial data management in general, we first present the class of point access methods, which are used to search sets of points in two or more dimensions. The second part of the paper is devoted to spatial access methods to handle extended objects, such as rectangles or polyhedra. We conclude with a discussion of theoretical and experimental results concerning the relative performance of various approaches.

Categories and Subject Descriptors: H.2.2 [Database Management]: Physical Design - access methods; H.2.4 [Database Management]: Systems; H.2.8 [Database Management]: Database Applications - spatial databases and GIS; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - search process, selection process

General Terms: Design, Experimentation, Performance

Additional Key Words and Phrases: Data structures, multidimensional access methods

This work was partially supported by the German Research Society (DFG/SFB 373) and by the ESPRIT Working Group CONTESSA (8666).
Authors' address: Institut für Wirtschaftsinformatik, Humboldt-Universität zu Berlin, Spandauer Str. 1, 10178 Berlin, Germany; email: {gaede,guenther}@wiwi.hu-berlin.de.
Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.
© 1998 ACM 0360-0300/98/0600–0170 $05.00
ACM Computing Surveys, Vol. 30, No. 2, June 1998

CONTENTS
1. INTRODUCTION
2. ORGANIZATION OF SPATIAL DATA
   2.1 What Is Special About Spatial?
   2.2 Definitions and Queries
3. BASIC DATA STRUCTURES
   3.1 One-Dimensional Access Methods
   3.2 Main Memory Structures
4. POINT ACCESS METHODS
   4.1 Multidimensional Hashing
   4.2 Hierarchical Access Methods
   4.3 Space-Filling Curves for Point Data
5. SPATIAL ACCESS METHODS
   5.1 Transformation
   5.2 Overlapping Regions
   5.3 Clipping
   5.4 Multiple Layers
6. COMPARATIVE STUDIES
7. CONCLUSIONS

1. INTRODUCTION

With an increasing number of computer applications that rely heavily on multidimensional data, the database community has recently devoted considerable attention to spatial data management. Although the main motivation originated in the geosciences and mechanical CAD, the range of possible applications has expanded to areas such as robotics, visual perception, autonomous navigation, environmental protection, and medical imaging [Günther and Buchmann 1990].

The range of interpretation given to the term spatial data management is just as broad as the range of applications. In VLSI CAD and cartography, this term refers to applications that rely mostly on two-dimensional or layered two-dimensional data. VLSI data are usually represented by rectilinear polylines or polygons whose edges are iso-oriented, that is, parallel to the coordinate axes. Typical operations include intersection and geometric routing [Shekhar and Liu 1995]. Cartographic data are also two-dimensional, with points, lines, and regions as basic primitives. In contrast to VLSI CAD, however, the shapes are often characterized by extreme irregularities. Common operations include spatial searches and map overlay, as well as distance-related operations. In mechanical CAD, on the other hand, data objects are usually three-dimensional solids. They may be represented in a variety of data formats, including cell decomposition schemes, constructive solid geometry (CSG), and boundary representations [Kemper and Wallrath 1987]. Yet other applications emphasize the processing of unanalyzed images, such as X-rays and satellite imagery, from which features are extracted. In those areas, the terms spatial database and image database are sometimes even used interchangeably.

Strictly speaking, however, spatial databases contain multidimensional data with explicit knowledge about objects, their extent, and their position in space. The objects are usually represented in some vector-based format, and their relative positions may be explicit or implicit (i.e., derivable from the internal representation of their absolute positions). Image databases often place less emphasis on data analysis. They provide storage and retrieval for unanalyzed pictorial data, which are typically represented in some raster format. Techniques developed for the storage and manipulation of image data can be applied to other media as well, such as infrared sensor signals or sound.

In this survey we assume that the goal is to manipulate analyzed multidimensional data and that unanalyzed images are handled only as the source from which spatial data can be derived. The challenge for the developers of a spatial database system lies not so much in providing yet another collection of special-purpose data structures. Rather, one has to find abstractions and architectures to implement generic systems, that is, to build systems with generic spatial data-management capabilities that can be tailored to the requirements of a particular application domain. Important issues in this context include the handling of spatial representations and data models, multidimensional access methods, and pictorial or spatial query languages and their optimization.

This article is a survey of multidimensional access methods to support search operations in spatial databases. Figure 1, which was inspired by a similar graph by Lu and Ooi [1993], gives a first overview of the diversity of existing multidimensional access methods. The goal is not to describe all of these structures, but to discuss the most prominent ones, to present possible taxonomies, and to establish references to other literature.

Figure 1. History of multidimensional access methods.

Several shorter surveys have been published previously in various Ph.D. theses such as Ooi [1990], Kolovson [1990], Oosterom [1990], and Schiwietz [1993]. Widmayer [1991] gives an overview of work published before 1991. Like the thesis by Schiwietz, however, his survey is available only in German. Samet's books [1989, 1990] present the state of the art until 1989. However, they primarily cover quadtrees and related data structures. Lomet [1991] discusses the field from a systems-oriented point of view.

The remainder of the article is organized as follows. Section 2 discusses some basic properties of spatial data and their implications for the design and implementation of spatial databases. Section 3 gives an overview of some traditional data structures that had an impact on the design of multidimensional access methods. Sections 4 and 5 form the core of this survey, presenting a variety of point access methods (PAMs) and spatial access methods (SAMs), respectively. Some remarks about theoretical and experimental analyses are contained in Section 6, and Section 7 concludes the article.

2. ORGANIZATION OF SPATIAL DATA

2.1 What Is Special About Spatial?

To obtain a better understanding of the requirements in spatial database systems, we first discuss some basic properties of spatial data. First, spatial data have a complex structure. A spatial data object may be composed of a single point or several thousands of polygons, arbitrarily distributed across space. It is usually not possible to store collections of such objects in a single relational table with a fixed tuple size. Second, spatial data are often dynamic. Insertions and deletions are interleaved with updates, and data structures used in this context have to support this dynamic behavior without deteriorating over time. Third, spatial databases tend to be large. Geographic maps, for example, typically occupy several gigabytes of storage. The integration of secondary and tertiary memory is therefore essential for efficient processing [Chen et al. 1995]. Fourth, there is no standard algebra defined on spatial data, although several proposals have been made in the past [Egenhofer 1989; Güting 1989; Scholl and Voisard 1989; Güting and Schneider 1993]. This means in particular that there is no standardized set of base operators. The set of operators depends heavily on the given application domain, although some operators (such [...]

[...] one needs special multidimensional access methods. The main problem in the design of such methods, however, is that there exists no total ordering among spatial objects that preserves spatial proximity. In other words, there is no mapping from two- or higher-dimensional space into one-dimensional space such that any two objects that are spatially close in the higher-dimensional space are also close to each other in the one-dimensional sorted sequence. This makes the design of efficient access methods in the spatial domain much more difficult than in traditional databases, where a broad range of efficient and well-understood access methods is available. Examples of such one-dimensional access methods (also called single key structures, although that term is somewhat misleading) include the B-tree [Bayer and McCreight 1972] and extendible hashing [Fagin et al. 1979]; see Section 3.1 for a brief discussion.
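The claim that no ordering of two-dimensional space preserves spatial proximity can be illustrated, though not proved, with one concrete ordering. The sketch below assumes a Z-order (Morton) curve over a small integer grid, which sorts cells by interleaving the bits of their two coordinates; the helpers morton_key and z_order are hypothetical names introduced only for this example. It exhibits both failure modes: two adjacent cells land far apart in the sorted sequence, and two consecutive positions in the sequence lie far apart in space.

```python
from itertools import product


def morton_key(x: int, y: int, bits: int) -> int:
    """Z-order (Morton) key: interleave the low `bits` bits of x and y.

    Bit i of x goes to key position 2*i, bit i of y to position 2*i + 1.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def z_order(bits: int) -> list[tuple[int, int]]:
    """All (x, y) cells of a 2**bits x 2**bits grid, sorted along the Z-curve."""
    side = 1 << bits
    cells = product(range(side), repeat=2)
    return sorted(cells, key=lambda c: morton_key(c[0], c[1], bits))


if __name__ == "__main__":
    bits = 3                                  # an 8 x 8 grid with 64 cells
    order = z_order(bits)
    pos = {cell: i for i, cell in enumerate(order)}

    # Spatially adjacent cells that end up far apart in the 1-D sequence.
    a, b = (0, 3), (0, 4)
    print(pos[a], pos[b])                     # 10 32

    # Neighbours in the 1-D sequence that are far apart in space.
    print(order[31], order[32])               # (7, 3) (0, 4)
```

On this 8 x 8 grid, cells (0, 3) and (0, 4) share an edge but sit at sorted positions 10 and 32, while positions 31 and 32 hold (7, 3) and (0, 4). Other space-filling curves move the jumps elsewhere; the text's point is that no one-dimensional mapping removes them entirely.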
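The point query and the region query named in the abstract can be made concrete with a baseline that uses no access method at all. The sketch below assumes, for illustration only, that every spatial object is an axis-parallel rectangle with closed boundaries; Rect, point_query, and region_query are hypothetical names, not structures from the survey. Each query scans the whole collection, which is exactly the cost that special support at the physical level is meant to avoid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-parallel rectangle [xlo, xhi] x [ylo, yhi] with closed boundaries."""
    xlo: float
    ylo: float
    xhi: float
    yhi: float

    def contains_point(self, x: float, y: float) -> bool:
        return self.xlo <= x <= self.xhi and self.ylo <= y <= self.yhi

    def overlaps(self, other: "Rect") -> bool:
        # Closed rectangles overlap iff their extents intersect on both axes.
        return (self.xlo <= other.xhi and other.xlo <= self.xhi
                and self.ylo <= other.yhi and other.ylo <= self.yhi)


def point_query(objects: dict[str, Rect], x: float, y: float) -> list[str]:
    """Ids of all objects that contain the search point (x, y).

    Full scan: every object is inspected because no index is used.
    """
    return [oid for oid, r in objects.items() if r.contains_point(x, y)]


def region_query(objects: dict[str, Rect], region: Rect) -> list[str]:
    """Ids of all objects that overlap the search region (full scan)."""
    return [oid for oid, r in objects.items() if r.overlaps(region)]


if __name__ == "__main__":
    objects = {
        "a": Rect(0, 0, 2, 2),
        "b": Rect(1, 1, 4, 3),
        "c": Rect(5, 5, 6, 6),
    }
    print(point_query(objects, 1.5, 1.5))              # ['a', 'b']
    print(region_query(objects, Rect(3, 2, 5.5, 5.5)))  # ['b', 'c']
```

Both functions touch every object on every call, so their cost grows linearly with the collection regardless of how small the answer is.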