The Quadtree and Related Hierarchical Data Structures
HANAN SAMET
Computer Science Department, University of Maryland, College Park, Maryland 20742

A tutorial survey is presented of the quadtree and related hierarchical data structures. They are based on the principle of recursive decomposition. The emphasis is on the representation of data used in applications in image processing, computer graphics, geographic information systems, and robotics. There is a greater emphasis on region data (i.e., two-dimensional shapes) and to a lesser extent on point, curvilinear, and three-dimensional data. A number of operations in which such data structures find use are examined in greater detail.

Categories and Subject Descriptors: E.1 [Data]: Data Structures - trees; H.3.2 [Information Storage and Retrieval]: Information Storage - file organization; I.2.1 [Artificial Intelligence]: Applications and Expert Systems - cartography; I.2.10 [Artificial Intelligence]: Vision and Scene Understanding - representations, data structures, and transforms; I.3.3 [Computer Graphics]: Picture/Image Generation - display algorithms; viewing algorithms; I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling - curve, surface, solid, and object representations; geometric algorithms, languages, and systems; I.4.2 [Image Processing]: Compression (Coding) - approximate methods; exact coding; I.4.7 [Image Processing]: Feature Measurement - moments; projections; size and shape; J.6 [Computer-Aided Engineering]: Computer-Aided Design (CAD)

General Terms: Algorithms

Additional Key Words and Phrases: Geographic information systems, hierarchical data structures, image databases, multiattribute data, multidimensional data structures, octrees, pattern recognition, point data, quadtrees, robotics

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1984 ACM 0360-0300/84/0600-0187 $00.75

Computing Surveys, Vol. 16, No. 2, June 1984

CONTENTS

INTRODUCTION
1. OVERVIEW OF QUADTREES
2. REGION DATA
   2.1 Neighbor-Finding Techniques
   2.2 Alternative Ways to Represent Quadtrees
   2.3 Conversion
   2.4 Set Operations
   2.5 Transformations
   2.6 Areas and Moments
   2.7 Connected Component Labeling
   2.8 Perimeter
   2.9 Component Counting
   2.10 Space Requirements
   2.11 Skeletons and Medial Axis Transforms
   2.12 Pyramids
   2.13 Quadtree Approximation Methods
   2.14 Volume Data
3. POINT DATA
   3.1 Point Quadtrees and k-d Trees
   3.2 Region-Based Quadtrees
   3.3 Comparison of Point Quadtrees and Region-Based Quadtrees
   3.4 CIF Quadtrees
   3.5 Bucket Methods
4. CURVILINEAR DATA
   4.1 Strip Trees
   4.2 Methods Based on a Regular Decomposition
   4.3 Comparison
5. CONCLUSIONS
ACKNOWLEDGMENTS
REFERENCES

INTRODUCTION

Hierarchical data structures are becoming increasingly important representation techniques in the domains of computer graphics, image processing, computational geometry, geographic information systems, and robotics. They are based on the principle of recursive decomposition (similar to divide and conquer methods [Aho et al. 1974]). One such data structure is the quadtree. As we shall see, the term quadtree has taken on a generic meaning. In this survey it is our goal to show how a number of data structures used in different domains are related to each other and to quadtrees. This presentation concentrates on these different representations and illustrates how a number of basic operations that use them are performed.

Hierarchical data structures are useful because of their ability to focus on the interesting subsets of the data. This focusing results in an efficient representation and improved execution times and is thus particularly useful for performing set operations. Many of the operations that we describe can often be performed equally as efficiently, or more so, with other data structures. However, hierarchical data structures are attractive because of their conceptual clarity and ease of implementation.

As an example of the type of problems to which the techniques described in this survey are applicable, consider a cartographic database consisting of a number of maps and some typical queries. The database contains a contour map, say at 50-foot elevation intervals, and a land use map classifying areas according to crop growth. Our wish is to determine all regions between 400- and 600-foot elevation levels where wheat is grown. This will require an intersection operation on the two maps. Such an analysis could be rather costly, depending on the way the data are represented. For example, areas where corn is grown are of no interest, and we wish to spend a minimal amount of effort searching such regions. Yet, traditional region representations such as the boundary code [Freeman 1974] are very local in application, making it difficult to avoid examining a corn-growing area that meets the desired elevation criterion. In contrast, hierarchical methods such as the region quadtree are more global in nature and enable the elimination of larger areas from consideration.

Another query might be to determine whether two roads intersect within a given area. We could check them point by point, but a more efficient method of analysis would be to represent them by a hierarchical sequence of enclosing rectangles and to discover whether in fact the rectangles do overlap. If they do not, then the search is terminated, but if an intersection is possible, then more work may have to be done, depending on which method of representation is used.

A similar query can be constructed for point data, for example, to determine all cities within 50 miles of St. Louis that have a population in excess of 20,000 people. Again, we could check each city individually, but using a representation that decomposes the United States into square areas having sides of length 100 miles would mean that at most four squares need to be examined. Thus California and its adjacent states can be safely ignored.

Finally, suppose that we wish to integrate our queries over a database containing many different types of data (e.g., points, lines, and areas). A typical query might be, "Find all cities with a population in excess of 5000 people in wheat-growing regions within 20 miles of the Mississippi River." In the remainder of this survey we shall present a number of different ways of representing data so that such queries and other operations can be efficiently processed.

The coverage and scope of the survey are focused on region data, and are concerned to a lesser extent with point, curvilinear, and three-dimensional data. Owing to space limitations, algorithms are presented only in a descriptive manner. Whenever possible, however, we have tried to motivate critical steps by a liberal use of examples. The concept of a pyramid is discussed only briefly, and the reader is referred to the collection of papers edited by Rosenfeld [1983] for a more comprehensive exposition. Similarly, we discuss image compression and coding only in the context of hierarchical data structures. Results from computational geometry, although related to many of the topics covered in this survey, are only discussed briefly in the context of representations for curvilinear data. For more details on early results involving some of these and related topics, the interested reader may consult the surveys by Bentley and Friedman [1979], Edelsbrunner [1984], Nagy and Wagle [1979], Requicha [1980], Srihari [1981], Samet and Rosenfeld [1980], and Toussaint [1980]. Overmars [1983] has produced a particularly good treatment of point data. A broader view of the literature can be found in related bibliographies, for example, Edelsbrunner and van Leeuwen [1983] and Rosenfeld [1984]. Nevertheless, given the broad and rapidly expanding nature of the field, we are bound to have omitted significant concepts and references. In addition we at times devote a disproportionate amount of attention to some concepts at the expense of others. This is principally for expository purposes as we feel that it is better to understand some structures well rather than to give the reader a quick run

position process is applied) may be fixed beforehand, or it may be governed by properties of the input data.

Our first example of quadtree representation of data is concerned with the representation of region data. The most studied quadtree approach to region representation, termed a region quadtree, is based on the successive subdivision of the image array into four equal-sized quadrants. If the array does not consist entirely of 1's or entirely of 0's (i.e., the region does not cover the entire array), it is then subdivided into quadrants, subquadrants, etc. until blocks are obtained (possibly single pixels) that consist entirely of 1's or entirely of 0's; that is, each block is entirely contained in the region or entirely disjoint from it. Thus the region quadtree can be characterized as a variable resolution data structure. For example, consider the region shown in Figure 1a, which is represented by the 2^3 by 2^3 binary array in Figure 1b. Observe that the 1's correspond to picture elements (termed pixels) that are in the region and the 0's correspond to picture elements that are outside the region. The resulting blocks for the array of Figure 1b are shown in Figure 1c. This process is represented by a tree of degree 4 (i.e., each nonleaf node has four sons). The root node corresponds to the entire array.
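
The subdivision rule for the region quadtree described above can be illustrated with a minimal Python sketch. It is not taken from the survey, which presents algorithms only descriptively. The names `Node`, `build`, `count_leaves`, and `area`, the NW, NE, SW, SE ordering of the four sons, and the 4 by 4 array `IMAGE` are all assumptions made for illustration; the array is hypothetical and is not the array of Figure 1b. The sketch assumes a square binary array whose side is a power of 2.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Quadtree node: a leaf stores a color (0 or 1); a nonleaf stores four sons."""
    color: Optional[int] = None
    # Son order NW, NE, SW, SE is an illustrative assumption.
    sons: Optional[Tuple["Node", "Node", "Node", "Node"]] = None


def build(image: List[List[int]], row: int = 0, col: int = 0,
          size: Optional[int] = None) -> Node:
    """Recursively subdivide the block at (row, col) until it is uniform."""
    if size is None:
        size = len(image)  # assumes a square array with power-of-2 side
    first = image[row][col]
    uniform = all(image[r][c] == first
                  for r in range(row, row + size)
                  for c in range(col, col + size))
    if uniform:  # entirely 1's or entirely 0's: stop subdividing
        return Node(color=first)
    half = size // 2
    return Node(sons=(
        build(image, row, col, half),                # NW quadrant
        build(image, row, col + half, half),         # NE quadrant
        build(image, row + half, col, half),         # SW quadrant
        build(image, row + half, col + half, half),  # SE quadrant
    ))


def count_leaves(node: Node) -> int:
    """Number of blocks in the decomposition."""
    if node.sons is None:
        return 1
    return sum(count_leaves(son) for son in node.sons)


def area(node: Node, size: int) -> int:
    """Number of pixels in the region (1's), summed over leaf blocks."""
    if node.sons is None:
        return node.color * size * size
    return sum(area(son, size // 2) for son in node.sons)


# Hypothetical 4 by 4 binary array (not the array of Figure 1b).
IMAGE = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

root = build(IMAGE)
print(count_leaves(root), area(root, len(IMAGE)))
```

In this hypothetical array only the NE quadrant is mixed, so it alone is split into single pixels, while the other three quadrants remain single blocks. This is the sense in which the structure has variable resolution.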