
Data Structures for Range Searching

JON LOUIS BENTLEY
Departments of Computer Science and Mathematics, Carnegie-Mellon University, Pittsburgh, Pennsylvania 15213

JEROME H. FRIEDMAN
Computation Research Group, Stanford Linear Accelerator Center, Stanford, California 94305

Much research has recently been devoted to "multikey" searching problems. In this paper the particular multikey problem of range searching is investigated and a number of data structures that have been proposed as solutions to this problem are surveyed. The purposes of this paper are to bring together a collection of widely scattered results, to acquaint the reader with the structures currently available for solving the particular problem of range searching, and to display a set of general methods for attacking multikey searching problems.

Keywords and Phrases: analysis of algorithms, orthogonal range queries, range searching, cells, multidimensional binary search trees, projection

CR Categories: 3.63, 3.74, 5.25

This research was supported in part by the Office of Naval Research under Contract N00014-76-C-0370 and in part by the Department of Energy.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1979 ACM 0010-4892/79/1200-0397 $00.75
Computing Surveys, Vol. 11, No. 4, December 1979

CONTENTS

INTRODUCTION
1. THE DATA STRUCTURES
   1.1 Sequential Scan
   1.2 Projection
   1.3 Cells
   1.4 k-d Trees
   1.5 Range Trees
   1.6 k-ranges
   1.7 Other Structures
   1.8 Comparison of Methods
2. ADDITIONAL WORK
3. CONCLUSIONS
REFERENCES

INTRODUCTION

The study of data structures for facilitating rapid searching is a fascinating subject of both practical and theoretical interest. Knuth [KNUT73] provides a definitive treatise on the subject of searching when the search is based on only one "key," but he points out that not much was known at the time his book was published about data structures for sets that have many "keys." This subject area, which is often called "multikey searching," "multidimensional searching," or "multiple attribute retrieval," has been the focus of a great deal of research in the past few years. In this paper we study a small part of this area by surveying the work that has been done on one particular multikey searching problem. This problem is important in itself (having applications in such areas as database systems, statistics, and design automation) and, in addition, serves as a representative of the entire class of multikey searching problems.

We need some definitions to describe this particular searching problem precisely. In database terminology a file is a collection of records, each containing several attributes or keys. A query asks for all records satisfying certain characteristics. An orthogonal range query asks for all records with key values each within specified ranges (that is, each key is between specified upper and lower bounds). The process of retrieving the appropriate records is called range searching. This problem can also be cast in geometric terms by regarding the record attributes as coordinates and the k values for each record as representing a point in a k-dimensional coordinate space. The file of records then becomes a point set in k-space. The intersection of the query ranges is a k-dimensional hyperrectangle in the space (that is, a "box"), and a range query calls for finding all points lying inside this hyperrectangle. We will often cast range searching in this geometric framework as an aid to intuition.

Range searching arises in many applications. In a geographic database of U.S. cities one might seek a list of all those with latitude between 37° and 41° and longitude between 102° and 109° (defining the state of Colorado). To compile an honor list of older students, a university administrator may wish to know those students whose age is between 21 and 24 years and whose grade point average is between 3.5 and 4.0. In data analysis it is often useful to do separate analyses on sets of data lying in different regions (hyperrectangles) of the observation space and then compare (or contrast) the respective results. (At the Stanford Linear Accelerator Center, for example, over 10 hours per week of IBM 370/168 time is devoted to this application.) In statistics, range searching can be employed to determine the empirical probability content of a hyperrectangle, to determine empirical cumulative distributions, and to perform density estimation (see [LOFT65]). Lauther [LAUT78] describes how range searching can be used to solve a design automation problem in very large-scale integrated circuitry (VLSI).

This paper has been written with two distinct audiences in mind. For the expert in searching (with background either in database systems or theoretical computer science), this paper is intended as a survey that gathers together and presents in a common terminology a number of results that have recently appeared on the problem of range searching. This problem is of particular interest for two reasons: First, it is an important problem in many practical applications (and a difficult theoretical problem!); second, the methods that we investigate are broadly applicable to many other multikey searching problems. The second type of reader for whom this paper is intended is a computer scientist who is somewhat familiar with data structures for single-key searching, and who would like a tutorial on the problem of range searching. For this reader, the methods that we discuss are described on an intuitive level, and references are given to more precise descriptions elsewhere in the literature.

In Section 1 of this paper we examine six data structures for the range searching problem in some detail, and then briefly compare those structures at the end of the section. Additional work (that both has been done and needs to be done) is described in Section 2, and conclusions are then offered in Section 3.

1. THE DATA STRUCTURES

In this section we investigate a number of search methods for range searching. Each search method is specified by a data structure for storing the data and algorithms for building (which we call preprocessing) and searching the structure. We will analyze a search structure (say A) by giving three cost functions of N (the number of points) and k (the number of dimensions):

• $P_A(N, k)$, the cost of preprocessing N points in k-space into a data structure;
• $S_A(N, k)$, the storage required by the data structure;
• $Q_A(N, k)$, the search time or query cost.

These costs can be analyzed in terms of their average or their worst case; we usually speak of the worst-case cost, explicitly mentioning the average whenever we employ it.

In many applications one may desire various utility operations on data structures, such as insertion and deletion. In this section we ignore this issue, considering only static (unchanging) files; we then return to the question of dynamic structures in Section 2.

1.1 Sequential Scan

The simplest approach to range searching is to store the N points in a sequential list. As each query arrives, all elements of the list are scanned and every record that satisfies the query is reported. If the queries do not have to be handled immediately, then they can be "batched" so that many queries can be processed with one sequential pass through the file. Since all k keys of the N records must be stored and each k-key record is examined as the structure is built or searched, it is easy to see that the sequential scan structure SS has the properties

$P_{SS}(N, k) = O(Nk)$,
$S_{SS}(N, k) = O(Nk)$.

1.2 Projection

The projection technique involves keeping, for each attribute, a sequence of the records in the file sorted by that attribute. One can view this geometrically as a projection of the points on each coordinate. The k lists representing the projections can be obtained by using a standard sorting algorithm k times. After preprocessing, a range query can be answered by the following search procedure: Choose one of the attributes, say the ith. Look up the two positions in the ith sequence (using a binary search) of the extreme values defining the range on the ith attribute of the query. All records satisfying the query will be in the list between these two positions just found. This (smaller) list is then searched by brute force. The projection technique is referred to as inverted lists by Knuth [KNUT73].

FIGURE 1. Illustration of projection.
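The sequential scan and the projection search procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names (sequential_scan, build_projections, projection_query), the representation of records as tuples of k numbers, the treatment of query bounds as inclusive, and the sample points are all hypothetical choices made for this example.

```python
from bisect import bisect_left, bisect_right
from operator import itemgetter


def sequential_scan(points, lo, hi):
    """Scan every record and keep those with lo[j] <= p[j] <= hi[j] for all j.

    Assumption: bounds are inclusive on both ends.
    """
    return [p for p in points
            if all(l <= x <= h for l, x, h in zip(lo, p, hi))]


def build_projections(points, k):
    """Preprocessing: sort the file once per attribute (k sorts in total).

    Returns a list of k pairs (keys, ordered), where `ordered` holds the
    records sorted by attribute i and `keys` holds their i-th values.
    """
    projections = []
    for i in range(k):
        ordered = sorted(points, key=itemgetter(i))
        keys = [p[i] for p in ordered]
        projections.append((keys, ordered))
    return projections


def projection_query(projections, lo, hi, i=0):
    """Answer a range query using the projection on attribute i.

    Two binary searches locate the positions of the extreme values of the
    query range on attribute i; the records between those positions are
    then checked by brute force against all k ranges.
    """
    keys, ordered = projections[i]
    start = bisect_left(keys, lo[i])    # first record with key >= lo[i]
    stop = bisect_right(keys, hi[i])    # one past last record with key <= hi[i]
    return sequential_scan(ordered[start:stop], lo, hi)


# Hypothetical sample data: six points in 2-space and one query box.
points = [(1, 5), (2, 2), (3, 7), (4, 4), (6, 3), (7, 8)]
lo, hi = (2, 2), (6, 5)
projections = build_projections(points, k=2)
by_scan = sequential_scan(points, lo, hi)
by_first = projection_query(projections, lo, hi, i=0)
by_second = projection_query(projections, lo, hi, i=1)
```

Whichever attribute is chosen, the set of reported records is the same as for the full scan; only the number of records examined by brute force changes. How to choose the attribute is left open in this sketch.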