CS-1996-05

Range Searching¹

Pankaj K. Agarwal²
Department of Computer Science
Duke University
Durham, North Carolina 27708-0129

September 8, 1996

¹ Work on this paper has been supported by National Science Foundation Grant CCR-93-01259, by an Army Research Office MURI grant DAAH04-96-1-0013, by a Sloan fellowship, by an NYI award, and by matching funds from Xerox Corporation.
² Department of Computer Science, Duke University, [email protected]

INTRODUCTION

Range searching is one of the central problems in computational geometry, because it arises in many applications and a wide variety of geometric problems can be formulated as range-searching problems. A typical range-searching problem has the following form. Let $S$ be a set of $n$ points in $\mathbb{R}^d$, and let $\mathcal{R}$ be a family of subsets of $\mathbb{R}^d$; elements of $\mathcal{R}$ are called ranges. We wish to preprocess $S$ into a data structure so that for a query range $R \in \mathcal{R}$, the points in $S \cap R$ can be reported or counted efficiently. Typical examples of ranges include rectangles, halfspaces, simplices, and balls.

If we are only interested in answering a single query, it can be done in linear time, using linear space, by simply checking for each point $p \in S$ whether $p$ lies in the query range. However, most applications call for querying the same point set $S$ several times (sometimes with points also being inserted or deleted periodically), in which case we would like to answer a query faster by preprocessing $S$ into a data structure.

Range counting and range reporting are just two instances of range-searching queries. Other examples include emptiness queries, where one wants to determine whether $S \cap R = \emptyset$, and extremal queries, where one wants to return a point with a certain property (e.g., a point with the largest $x$-coordinate). In order to encompass all the different types of range-searching queries, a general range-searching problem can be defined as follows. Let $(\mathbf{S}, +)$ be a semigroup.
For each point $p \in S$, we assign a weight $w(p) \in \mathbf{S}$. For a query range $R \in \mathcal{R}$, we wish to compute $\sum_{p \in S \cap R} w(p)$. For example, range-counting queries can be answered by setting $w(p) = 1$ for every $p \in S$ and choosing the semigroup to be $(\mathbb{Z}, +)$, where $+$ denotes integer addition; range-emptiness queries by setting $w(p) = 1$ and choosing the semigroup to be $(\{0, 1\}, \vee)$; and range-reporting queries by setting $w(p) = \{p\}$ and choosing the semigroup to be $(2^S, \cup)$.

Most range-searching data structures construct a family of "canonical" subsets of $S$, and for each canonical subset $A$ they store the weight $w(A) = \sum_{p \in A} w(p)$. For a query range $R$, the data structure searches for a small subfamily of disjoint canonical subsets $A_1, \ldots, A_k$ such that $\bigcup_{i=1}^{k} A_i = R \cap S$, and then computes $\sum_{i=1}^{k} w(A_i)$. In order to expedite the search, the structure also stores some auxiliary information. Typically, the canonical subsets are organized in a tree-like data structure, each of whose nodes $v$ is associated with a canonical subset $A$; $v$ stores the weight $w(A)$ and some auxiliary information. A query is answered by searching the tree in a top-down fashion, using the auxiliary information to guide the search.

MODEL OF COMPUTATION

The performance of a data structure is measured by the time spent in answering a query, called the query time and denoted by $Q(n, d)$; by the size of the data structure, denoted by $S(n, d)$; and by the time spent in constructing the data structure, called the preprocessing time and denoted by $P(n, d)$. Since the data structure is constructed only once, its query time and size are more important than its preprocessing time. If a data structure supports insertion and deletion operations, the update time is also relevant.
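As a concrete illustration of the semigroup formulation from the introduction, the following minimal sketch answers a query by the linear scan mentioned there, so it has no preprocessing and linear query time. The function name `semigroup_query`, its parameters, and the sample points are hypothetical. The `empty` parameter is also an assumption of this sketch: a semigroup need not contain an identity element, so the caller supplies the value to return when $S \cap R$ is empty.

```python
from functools import reduce

def semigroup_query(points, in_range, weight, plus, empty=None):
    """Combine the weights of all points lying in the query range.

    points   -- the point set S
    in_range -- predicate telling whether a point lies in the range R
    weight   -- the weight function w
    plus     -- the semigroup operation
    empty    -- value returned when no point lies in R (an assumption of this
                sketch; a semigroup need not contain an identity element)
    """
    weights = [weight(p) for p in points if in_range(p)]
    if not weights:
        return empty
    return reduce(plus, weights)

# Hypothetical point set and rectangular range [1, 4] x [1, 3].
S = [(1, 1), (2, 5), (4, 3), (6, 2)]
in_rect = lambda p: 1 <= p[0] <= 4 and 1 <= p[1] <= 3

# Range counting: w(p) = 1 in (Z, +).
count = semigroup_query(S, in_rect, lambda p: 1, lambda a, b: a + b, empty=0)

# Range emptiness: w(p) = 1 in ({0, 1}, OR); the result is 1 iff some point is in R.
nonempty = semigroup_query(S, in_rect, lambda p: 1, lambda a, b: a | b, empty=0)

# Range reporting: w(p) = {p} in (2^S, union).
reported = semigroup_query(
    S, in_rect, lambda p: frozenset([p]), lambda a, b: a | b, empty=frozenset()
)
```

The three calls differ only in the weight function and the operation, which is the point of the general definition: one query procedure serves counting, emptiness, and reporting.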
We should remark that the query time of a range-reporting query on any reasonable machine depends on the output size, so the query time for a range-reporting query consists of two parts: the search time, which depends only on $n$ and $d$, and the reporting time, which depends on $n$, $d$, and the output size. Throughout this survey we will use $k$ to denote the output size.

We assume that $d$ is a small fixed constant, and that the big-Oh notation hides constants depending on $d$. The dependence on $d$ of the performance of all the data structures mentioned here is exponential, which makes them unsuitable for large values of $d$.

We assume that each memory cell can store $\log n$ bits. The upper bounds will be given on the pointer-machine or RAM models, which are described in [15, 107]. The main difference between the two models is that on the pointer machine a memory cell can be accessed only through a series of pointers, while in the RAM model any memory cell can be accessed in constant time.

Most of the lower bounds will be given in the so-called semigroup model, which was originally introduced by Fredman [61] and which is much weaker than the pointer machine or the RAM model. In this model, a data structure is regarded as a set of precomputed sums in the underlying semigroup. The size of the data structure is the number of sums stored, and the query time is the number of semigroup operations performed (on the precomputed sums) to answer a query; the query time ignores the cost of various auxiliary operations, e.g., the cost of determining which of the precomputed sums should be added to answer a query.

A weakness of the semigroup model is that it does not allow subtractions even if the weights of points belong to a group. Therefore, we will also consider the group model, in which both additions and subtractions are allowed.
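The difference between the two models can be sketched for one-dimensional interval queries. This is an illustration under stated assumptions, not a structure taken from the survey: the class `CanonicalSums`, the helpers `prefix_sums` and `group_query`, and the sample keys are all hypothetical. The first structure stores one precomputed sum per canonical subset of a balanced binary decomposition and reports how many semigroup additions a query performs, which is the query time in the semigroup model. The second uses prefix sums and a single subtraction, which is only permitted in the group model.

```python
from bisect import bisect_left, bisect_right

class CanonicalSums:
    """Semigroup-model sketch: one stored sum per canonical subset (1D)."""

    def __init__(self, keys, weights, plus):
        self.keys = list(keys)          # must be sorted in increasing order
        self.plus = plus                # the semigroup operation
        self.sums = {}                  # (i, j) -> sum of weights[i:j]
        if self.keys:
            self._build(list(weights), 0, len(self.keys))

    def _build(self, w, i, j):
        if j - i == 1:
            self.sums[(i, j)] = w[i]
        else:
            m = (i + j) // 2
            self.sums[(i, j)] = self.plus(self._build(w, i, m), self._build(w, m, j))
        return self.sums[(i, j)]

    def _collect(self, i, j, a, b, parts):
        # Gather the stored sums of maximal canonical subsets inside [a, b).
        if a >= b or b <= i or j <= a:
            return
        if a <= i and j <= b:
            parts.append(self.sums[(i, j)])
            return
        m = (i + j) // 2
        self._collect(i, m, a, b, parts)
        self._collect(m, j, a, b, parts)

    def query(self, lo, hi):
        """Return (sum over keys in [lo, hi], number of semigroup additions)."""
        a, b = bisect_left(self.keys, lo), bisect_right(self.keys, hi)
        parts = []
        self._collect(0, len(self.keys), a, b, parts)
        if not parts:
            return None, 0
        total = parts[0]
        for s in parts[1:]:
            total = self.plus(total, s)
        return total, len(parts) - 1

def prefix_sums(weights):
    """Group-model sketch: prefix[i] is the sum of the first i weights."""
    out = [0]
    for w in weights:
        out.append(out[-1] + w)
    return out

def group_query(keys, prefix, lo, hi):
    """One subtraction answers the query; this needs inverses, i.e. a group."""
    a, b = bisect_left(keys, lo), bisect_right(keys, hi)
    return prefix[b] - prefix[a]

# Hypothetical data: eight sorted keys.
keys = [1, 3, 4, 7, 9, 12, 15, 20]
counting = CanonicalSums(keys, [1] * 8, lambda x, y: x + y)
maxima = CanonicalSums(keys, [5, 2, 8, 1, 9, 3, 7, 4], max)
```

Here `counting.query(3, 15)` combines four stored sums with three additions, while `group_query` obtains the same count with one subtraction. The `maxima` structure uses the semigroup of numbers under `max`, where no subtraction exists, so only the first approach applies.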
The size of any range-searching data structure is at least linear, for it has to store each point (or its weight) at least once, and the query time on any reasonable model of computation (e.g., pointer machine, RAM) is $\Omega(\log n)$ even for $d = 1$. Therefore, one would like to develop a linear-size data structure with logarithmic query time. Although near-linear-size data structures that can answer a query in polylogarithmic time are known for orthogonal range searching in any fixed dimension, no similar bounds are known for range searching with more complex ranges (e.g., simplices, disks). In such cases, one seeks a tradeoff between the query time and the size of the data structure: how fast can a query be answered using $n \log^{O(1)} n$ space, how much space is required to answer a query in $\log^{O(1)} n$ time, and what kind of tradeoff between the size and the query time can be achieved?

The chapter is organized as follows. In Section 32.1 we review the orthogonal range-searching data structures, and in Section 32.2 we review simplex range-searching data structures. Section 32.3 surveys other variants and extensions of range searching. In Section 32.4 we study intersection-searching problems, which can be regarded as a generalization of range searching. Finally, Section 32.5 deals with various optimization queries.

1 ORTHOGONAL RANGE SEARCHING

In $d$-dimensional orthogonal range searching, the ranges are $d$-rectangles, each of the form $\prod_{i=1}^{d} [a_i, b_i]$, where $a_i, b_i \in \mathbb{R}$. This is an abstraction of "multi-key" searching; see [21, 115]. For example, the points of $S$ may correspond to employees of a company, each coordinate corresponding to a key such as age, salary, experience, etc. A query of the form "report all employees between the ages of 30 and 40 who earn more than \$30,000 and who have worked for more than 5 years" can be formulated as an orthogonal range-reporting query.
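The employee query above can be written out as a $d$-rectangle with $d = 3$. The sketch below is a simple illustration, not one of the data structures surveyed in this section: it sorts the points by the first key, narrows the candidates by binary search on that key, and filters the remaining keys by a scan. The class name `SortedByFirstKey`, the record layout, and the sample data are hypothetical. Two further assumptions: keys are integer-valued, so the strict conditions "more than \$30,000" and "more than 5 years" become the closed bounds 30001 and 6, and unbounded sides of the rectangle are represented by `math.inf`.

```python
from bisect import bisect_left, bisect_right
from math import inf

def in_rectangle(p, rect):
    """True if point p lies in the product of closed intervals rect."""
    return all(a <= x <= b for x, (a, b) in zip(p, rect))

class SortedByFirstKey:
    """Points sorted by their first coordinate; other keys are checked by scanning."""

    def __init__(self, points):
        self.points = sorted(points)
        self.first = [p[0] for p in self.points]

    def report(self, rect):
        a, b = rect[0]
        lo, hi = bisect_left(self.first, a), bisect_right(self.first, b)
        # Only points whose first key lies in [a, b] are examined further.
        return [p for p in self.points[lo:hi] if in_rectangle(p, rect)]

# Hypothetical records: (age, salary in dollars, years of experience).
employees = [
    (28, 45000, 4),
    (31, 52000, 7),
    (35, 29000, 10),
    (38, 61000, 6),
    (40, 30001, 5),
    (44, 80000, 20),
]

index = SortedByFirstKey(employees)
# Ages in [30, 40], salary above 30,000, more than 5 years of experience.
query = [(30, 40), (30001, inf), (6, inf)]
matches = index.report(query)
```

In the worst case every point passes the first-key test, so this sketch gives no guarantee better than a linear scan; it only shows how the multi-key query maps onto a rectangle.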
Because of its numerous applications, orthogonal range searching has been studied extensively for the last 25 years. A survey of earlier results can be found in the books by Mehlhorn [88] and Preparata and Shamos [99]. In this section we review the more recent data structures and the lower bounds.

GLOSSARY

EPM: A pointer machine with the $+$ operation.

APM: A pointer machine with basic arithmetic and shift operations.

Faithful semigroup: A semigroup $(\mathbf{S}, +)$ is called faithful if for each $n > 0$, for any $T_1, T_2 \subseteq \{1, \ldots, n\}$ with $T_1 \neq T_2$, and for every sequence of integers $\alpha_i, \beta_j > 0$ ($i \in T_1$, $j \in T_2$), there are $s_1, s_2, \ldots, s_n \in \mathbf{S}$ such that
$$\sum_{i \in T_1} \alpha_i s_i \neq \sum_{j \in T_2} \beta_j s_j.$$
Notice that $(\mathbb{R}, +)$ is a faithful semigroup, but $(\{0, 1\}, \oplus)$ is not a faithful semigroup.

UPPER BOUNDS

Most of the recent orthogonal range-searching data structures are based on range trees, introduced by Bentley [20].