Range Searching

• A data structure on a set of objects (points, rectangles, polygons) that supports efficient range queries.
[Figure: a point set in the XY-plane with a query rectangle Q.]
• The right structure depends on the type of objects and queries. We consider basic data structures with broad applicability.
• Time-space tradeoff: the more we preprocess and store, the faster we can answer a query.
• We consider data structures with (nearly) linear space.

Subhash Suri, UC Santa Barbara

Orthogonal Range Searching

• Fix an $n$-point set $P$. It has $2^n$ subsets. How many of them are possible answers to geometric range queries?
[Figure: six points labeled 1 to 6 in the XY-plane. Some impossible rectangular ranges: (1,2,3), (1,4), (2,5,6). Range (1,5,6) is possible.]
• Efficiency comes from the fact that only a small fraction of the subsets can be formed.
• Orthogonal range searching deals with point sets and axis-aligned rectangle queries.
• These generalize 1-dimensional sorting and searching, and the data structures are built as compositions of 1-dimensional structures.

1-Dimensional Search

• Points in 1D: $P = \{p_1, p_2, \ldots, p_n\}$.
• Queries are intervals.
[Figure: points 3, 7, 9, 21, 23, 25, 45, 70, 72, 100, 120 on a line; the marks 15 and 71 delimit a query interval.]
• If the range contains $k$ points, we want to answer the query in $O(\log n + k)$ time.
• Does hashing work? Why not?
• A sorted array achieves this bound, but it does not extend to higher dimensions.
• Instead, we use a balanced binary tree.

Tree Search

[Figure: a balanced binary tree with root 15 over the leaves 1, 3, 4, 7, 9, 12, 14, 15, 17, 20, 22, 24, 25, 27, 29, 31. The query is $x_{lo} = 2$, $x_{hi} = 23$; $u$ and $v$ mark the leaves reached by the two searches.]
• Build a balanced binary tree on the sorted list of points (keys).
• Leaves correspond to points; internal nodes are branching nodes.
• Given an interval $[x_{lo}, x_{hi}]$, search down the tree for $x_{lo}$ and $x_{hi}$.
• All leaves between the two form the answer.
• The two tree searches take about $2 \log n$ steps, and reporting the points in the answer set takes $O(k)$ time, assuming the leaves are linked together.
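The $O(\log n + k)$ bound for 1D queries can be illustrated with the sorted-array approach mentioned above; a tree with linked leaves behaves the same way. This is a minimal Python sketch, not part of the original slides: `range_report` is a hypothetical name, and the sample keys are the leaf values from the Tree Search figure.

```python
from bisect import bisect_left, bisect_right

def range_report(sorted_keys, x_lo, x_hi):
    """Return all keys in [x_lo, x_hi] from an already sorted list."""
    lo = bisect_left(sorted_keys, x_lo)    # first index with key >= x_lo, O(log n)
    hi = bisect_right(sorted_keys, x_hi)   # first index with key > x_hi, O(log n)
    return sorted_keys[lo:hi]              # copying the k answers costs O(k)

# Leaf keys from the figure, with the query x_lo = 2, x_hi = 23.
keys = [1, 3, 4, 7, 9, 12, 14, 15, 17, 20, 22, 24, 25, 27, 29, 31]
print(range_report(keys, 2, 23))  # [3, 4, 7, 9, 12, 14, 15, 17, 20, 22]
```

The two binary searches play the role of the two root-to-leaf searches, and the slice plays the role of walking the linked leaves.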
Canonical Subsets

• $S_1, S_2, \ldots, S_k$ are canonical subsets, $S_i \subseteq P$, if the answer to any range query can be written as the disjoint union of some of the $S_i$'s.
• The canonical subsets themselves may overlap one another.
• The key is to choose the right $S_i$'s and, given a query, to efficiently determine which ones to use.
• In 1D, there is a canonical subset for each node of the tree: $S_v$ is the set of points at the leaves of the subtree rooted at $v$.
[Figure: a balanced binary tree with root 15 over the leaves 1, 3, 4, 7, 9, 12, 14, 15, 17, 20, 22, 24, 25, 27, 29, 31. For the query $x_{lo} = 2$, $x_{hi} = 23$, with $u$ and $v$ marking the search leaves, the canonical subsets used are {3}, {4,7}, {9,12,14,15}, {17,20}, {22}.]

1D Range Query

[Figure: the same tree, query, and canonical subsets as on the Canonical Subsets slide.]
• Given a query $[x_{lo}, x_{hi}]$, search down the tree for the leftmost leaf $u \ge x_{lo}$ and the leftmost leaf $v \ge x_{hi}$.
• All leaves between $u$ and $v$ are in the range.
• If $u = x_{lo}$ or $v = x_{hi}$, include that leaf's canonical set (a singleton) in the answer.
• The remainder of the range is determined by the maximal subtrees lying in the range $[u, v)$.

Query Processing

• Let $z$ be the last node common to the search paths from the root to $u$ and $v$.
• Follow the left path from $z$ to $u$. Whenever the path goes left, add the canonical subset of the right child. (Nodes 7, 3, 1 in the figure.)
• Follow the right path from $z$ to $v$. Whenever the path goes right, add the canonical subset of the left child. (Nodes 20, 22 in the figure.)
[Figure: the same tree, query, and canonical subsets as on the Canonical Subsets slide.]

Analysis

[Figure: the same tree, query, and canonical subsets as on the Canonical Subsets slide.]
• Since the search paths have $O(\log n)$ nodes, there are $O(\log n)$ canonical subsets, which are found in $O(\log n)$ time.
• To list the points, traverse those subtrees in linear time, for an additional $O(k)$ time.
• If only the count is needed, storing the sizes of the canonical sets at the nodes suffices.
• The data structure uses $O(n)$ space and answers counting queries in $O(\log n)$ time and reporting queries in $O(\log n + k)$ time.

Multi-Dimensional Data

[Figure: a point set in the XY-plane with a query rectangle Q.]
• Range searching in higher dimensions?
• kD-trees [Jon Bentley 1975]. The name stands for k-dimensional trees.
• Simple, general, and applicable in arbitrary dimension, but the asymptotic search complexity is not very good.
• Extends the 1D tree, but alternates between x- and y-coordinates to split. In k dimensions, cycle through the dimensions.

kD-Trees

[Figure: a kD-tree on the points p1, ..., p10, shown both as a subdivision of the plane and as a tree structure.]
• A binary tree. Each node has two values: a split dimension and a split value.
• If a node splits along x at coordinate $s$, then the left child gets the points with x-coordinate $\le s$ and the right child gets the remaining points. The same holds for y.
• When $O(1)$ points remain, put them in a leaf node.
• Data points are stored at leaves only; internal nodes are for branching and splitting.

Splitting

[Figure: the same subdivision and tree structure as on the kD-Trees slide.]
• To get balanced trees, use the median coordinate for splitting; the median itself can be put in either half.
• With median splitting, the height of the tree is guaranteed to be $O(\log n)$.
• Either cycle through the splitting dimensions or make data-dependent choices, e.g., select the dimension with maximum spread.

Space Partitioning View

[Figure: the same subdivision and tree structure as on the kD-Trees slide.]
• A kD-tree induces a space subdivision: each node introduces an x- or y-aligned cut.
• Points lying on the two sides of the cut are passed to the two child nodes.
• The subdivision consists of rectangular regions, called cells (possibly unbounded).
• The root corresponds to the entire space; each child inherits one of the halfspaces, and so on.
• Leaves correspond to the terminal cells.
• A special case of a general binary space partition (BSP).

Construction

[Figure: a kD-tree on the points p1, ..., p10, shown both as a subdivision of the plane and as a tree structure.]
• Can be built recursively in $O(n \log n)$ time.
• Presort the points by x- and y-coordinates, and cross-link the two sorted lists.
• Find the x-median, say, by scanning the x-list. Split the list into two. Use the cross-links to split the y-list in $O(n)$ time.
• Now there are two subproblems, each of size $n/2$ and each with its own sorted lists. Recurse.
• The recurrence $T(n) = 2T(n/2) + n$ solves to $T(n) = O(n \log n)$.

Searching kD-Trees

[Figure: the same kD-tree, showing the query range and the nodes visited in the search.]
• Suppose the query rectangle is $R$. Start at the root node.
• Suppose the current splitting line is vertical (the horizontal case is analogous). Let $v, w$ be the left and right child nodes.
• If $v$ is a leaf, report $\mathrm{cell}(v) \cap R$; if $\mathrm{cell}(v) \subseteq R$, report all points of $\mathrm{cell}(v)$; if $\mathrm{cell}(v) \cap R = \emptyset$, skip it; otherwise, search the subtree of $v$ recursively.
• Do the same for $w$.
• The procedure is clearly correct. What is its time complexity?

Search Complexity

[Figure: the same kD-tree, showing the query range and the nodes visited in the search.]
• When $\mathrm{cell}(v) \subseteq R$, the complexity is linear in the output size.
• It suffices to bound the number of nodes $v$ visited for which the boundaries of $\mathrm{cell}(v)$ and $R$ intersect.
• If $\mathrm{cell}(v)$ lies outside $R$, we do not search it; if $\mathrm{cell}(v)$ lies inside $R$, we enumerate all points in the region of $v$; a recursive call is made only if $\mathrm{cell}(v)$ partially overlaps $R$; the kD-tree height is $O(\log n)$.
• Let $\ell$ be the line defining one side of $R$.
• We prove a bound on the number of cells that intersect $\ell$; this is more than what is needed; multiply by 4 for the total bound.
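The median-split construction and the recursive search can be sketched in Python as follows. This is an illustrative sketch under several assumptions, not the slides' own code: points are 2D tuples, the names `Node`, `build`, `collect`, and `search` are hypothetical, and `build` simply re-sorts at every level (roughly $O(n \log^2 n)$) instead of using the presorted, cross-linked lists. Points that tie on the split coordinate may land on either side, so cells are treated as closed rectangles.

```python
import math

class Node:
    def __init__(self, dim=None, split=None, left=None, right=None, points=None):
        self.dim, self.split = dim, split    # split dimension and split value
        self.left, self.right = left, right
        self.points = points                 # set only at leaves

def build(points, depth=0, leaf_size=1):
    """Median-split kD-tree in 2D; data points live at the leaves only."""
    if len(points) <= leaf_size:
        return Node(points=list(points))
    dim = depth % 2                          # cycle through x (0) and y (1)
    pts = sorted(points, key=lambda p: p[dim])
    mid = len(pts) // 2                      # lower half goes to the left child
    return Node(dim, pts[mid - 1][dim],
                build(pts[:mid], depth + 1, leaf_size),
                build(pts[mid:], depth + 1, leaf_size))

def collect(node, out):
    """Append every point stored below node (the cell(v) inside R case)."""
    if node.points is not None:
        out.extend(node.points)
    else:
        collect(node.left, out)
        collect(node.right, out)

def search(node, lo, hi, cell_lo=(-math.inf, -math.inf),
           cell_hi=(math.inf, math.inf), out=None):
    """Report points p with lo[d] <= p[d] <= hi[d] for d = 0, 1."""
    if out is None:
        out = []
    if node.points is not None:              # leaf: test its points directly
        out.extend(p for p in node.points
                   if all(lo[d] <= p[d] <= hi[d] for d in (0, 1)))
        return out
    if any(cell_hi[d] < lo[d] or cell_lo[d] > hi[d] for d in (0, 1)):
        return out                           # cell disjoint from R: skip
    if all(lo[d] <= cell_lo[d] and cell_hi[d] <= hi[d] for d in (0, 1)):
        collect(node, out)                   # cell inside R: report everything
        return out
    d, s = node.dim, node.split              # partial overlap: recurse
    left_hi = tuple(s if i == d else cell_hi[i] for i in (0, 1))
    right_lo = tuple(s if i == d else cell_lo[i] for i in (0, 1))
    search(node.left, lo, hi, cell_lo, left_hi, out)
    search(node.right, lo, hi, right_lo, cell_hi, out)
    return out

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1),
       (7, 2), (1, 9), (6, 8), (3, 5), (10, 10)]
tree = build(pts)
result = sorted(search(tree, (3, 2), (8, 7)))   # query R = [3, 8] x [2, 7]
print(result)
```

Each recursive call carries the cell of the current node, which is how the three cases (disjoint, contained, partial overlap) are distinguished without storing cells in the tree.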
