CS 240 Module 08: Data Structures for Multi-Dimensional Data

CS 240 Module 08: Data Structures for Multi-Dimensional Data

Multi-Dimensional Data Various applications Module 8: Data Structures for Multi-Dimensional Data I Attributes of a product (laptop: price, screen size, processor speed, RAM, hard drive, ) ··· CS 240 - Data Structures and Data Management I Attributes of an employee (name, age, salary, ) ··· Dictionary for multi-dimensional data A collection of d-dimensional items Sajed Haque Veronika Irvine Taylor Smith Each item has d aspects (coordinates): (x0, x1, , xd 1) ··· − Based on lecture notes by many previous cs240 instructors Operations: insert, delete, range-search query David R. Cheriton School of Computer Science, University of Waterloo (Orthogonal) Range-search query: specify a range (interval) for certain aspects, and find all the items whose aspects fall within given Spring 2017 ranges. Example: laptops with screen size between 11 and 13 inches, RAM between 8 and 16 GB, price between 1,500 and 2,000 CAD Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 1 / 22 Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 2 / 22 Multi-Dimensional Data One-Dimensional Range Search First solution: ordered arrays Each item has d aspects (coordinates): (x0, x1, , xd 1) ··· − I Running time: Aspect values (x ) are numbers i I Problem: does not generalize to higher dimensions Each item corresponds to a point in d-dimensional space Second solution: balanced BST (e.g., AVL tree) We concentrate on d = 2, i.e., points in Euclidean plane BST-RangeSearch(T , k1, k2) price (CAD) T : A balanced search tree, k1, k2: search keys Report keys in T that are in range [k1, k2] 1200 1. if T = nil then return (1200,1000) 1000 2. if key(T ) < k1 then range-search query (1350 x 1550, 700 y 1100) 3. BST-RangeSearch(T .right, k , k ) ≤ ≤ ≤ ≤ 1 2 800 4. if key(T ) > k2 then 5. BST-RangeSearch(T .left, k1, k2) 600 6. if k1 key(T ) k2 then item: ordered pair (x, y) R R ≤ ≤ ∈ × 7. BST-RangeSearch(T .left, k1, k2) 8. report key(T ) processor speed (MHz) 1100 1300 1400 1500 1600 1700 1800 9. BST-RangeSearch(T .right, k1, k2) Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 3 / 22 Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 4 / 22 Range Search example One-Dimensional Range Search BST-RangeSearch(T , 30, 65) P1: path from the root to a leaf that goes right if k < k1 and left Nodes either on boundary, inside, or outside. otherwise P2: path from the root to a leaf that goes left if k > k2 and right 52 otherwise Partition nodes of T into three groups: 1 boundary nodes: nodes in P1 or P2 2 inside nodes: non-boundary nodes that belong to either (a subtree 35 74 rooted at a right child of a node of P1) or (a subtree rooted at a left child of a node of P2) 3 outside nodes: non-boundary nodes that belong to either (a subtree rooted at a left child of a node of P1) or (a subtree rooted at a right 15 42 65 97 child of a node of P2) k: number of reported items Nodes visited during the search: I O(log n) boundary nodes 9 27 39 46 60 69 86 99 I O(k) inside nodes I No outside nodes Note: Not every boundary node is returned. Running time O(log n + k) Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 5 / 22 Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 6 / 22 2-Dimensional Range Search Quadtrees Each item has 2 aspects (coordinates): (xi , yi ) We have n points P = (x0, y0), (x1, y1), , (xn 1, yn 1) in the { ··· − − } Each item corresponds to a point in Euclidean plane plane Options for implementing d-dimensional dictionaries: How to build a quadtree on P: I Reduce to one-dimensional dictionary: combine the d-dimensional key I Find a square R that contains all the points of P (We can compute into one key minimum and maximum x and y values among n points) Problem: Range search on one aspect is not straightforward I Root of the quadtree corresponds to R I Use several dictionaries: one for each dimension I Split: Partition R into four equal subsquares (quadrants), each Problem: inefficient, wastes space correspond to a child of R I Partition trees I Recursively repeat this process for any node that contains more than F A tree with n leaves, each leaf corresponds to an item one point F Each internal node corresponds to a region I Points on split lines belong to left/bottom side F quadtrees, kd-trees I Each leaf stores (at most) one point I multi-dimensional range trees I We can delete a leaf that does not contain any point Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 7 / 22 Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 8 / 22 Quadtrees Quadtree Operations Example: We have 13 points P = (x , y ), (x , y ), , (x , y ) in { 0 0 1 1 ··· 12 12 } the plane Search: Analogous to binary search trees Insert: I Search for the point I Split the leaf if there are two points Delete: I Search for the point I Remove the point I If its parent has only one child left, delete that child and continue the process toward the root. Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 9 / 22 Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 10 / 22 Quadtree: Range Search Quadtree Conclusion QTree-RangeSearch(T , R) T : A quadtree node, R: Query rectangle 1. if (T is a leaf) then 2. if (T .point R) then Very easy to compute and handle ∈ 3. report T .point No complicated arithmetic, only divisions by 2 (usually the boundary 4. for each child C of T do box is padded to get a power of two). 5. if C.region R = then ∩ 6 ∅ 6. QTree-RangeSearch(C, R) Space wasteful Major drawback: can have very large height for certain nonuniform distributions of points spread factor of points P: β(P) = dmax /dmin Easily generates to higher dimensions (octrees, etc. ). dmax (dmin): maximum (minimum) distance between two points in P dmax height of quadtree: h Θ(log2 ) ∈ dmin Complexity to build initial tree: Θ(nh) Complexity of range search: Θ(nh) even if the answer is ∅ Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 11 / 22 Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 12 / 22 kd-trees kd-trees We have n points P = (x0, y0), (x1, y1), , (xn 1, yn 1) in the { ··· − − } plane Quadtrees split square into quadrants regardless of where points kd-tree idea: Split the points into two (roughly) equal subsets actually lie A balanced binary tree kd-tree idea: Split the points into two (roughly) equal subsets How to build a kd-tree on P: p4 I Split P into two equal subsets using a vertical line p3 p9 I Split each of the two subsets into two equal pieces using horizontal lines p8 I Continue splitting, alternating vertical and horizontal lines, until every point is in a separate region p1 More details: p I Initially, we sort the n points according to their x-coordinates. 0 I The root of the tree is the point with median x coordinate (index p p n/2 in the sorted list) p 6 5 b c 2 I All other points with x coordinate less than or equal to this go into the p7 left subtree; points with larger x-coordinate go in the right subtree. I At alternating levels, we sort and split according to y-coordinates instead. Complexity: Θ(n log n), height of the tree: Θ(log n) Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 13 / 22 Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 14 / 22 kd-tree: Range Search kd-tree: Range Search Complexity kd-rangeSearch(T , R, split[ ‘x’]) T : A kd-tree node, R: Query← rectangle The complexity is O(k + U) where k is the number of keys reported 1. if T is empty then return and U is the number of regions we go to but unsuccessfully 2. if T .point R then ∈ U corresponds to the number of regions which intersect but are not 3. report T .point 4. for each child C of T do fully in R 5. if C.region R = then Those regions have to intersect one of the four sides of R ∩ 6 ∅ 6. kd-rangeSearch(C, R) Q(n): Maximum number of regions in a kd-tree with n points that 7. if split = ‘x’ then intersect a vertical (horizontal) line 8. if T .point.x R.leftSide then ≥ 9. kd-rangeSearch(T .left, R, ‘y’) Q(n) satisfies the following recurrence relation: 10. if T .point.x < R.rightSide then 11. kd-rangeSearch(T .right, R, ‘y’) Q(n) = 2Q(n/4) + O(1) 12. if split = ‘y’ then 13. if T .point.y R.bottomSide then It solves to Q(n) = O(√n) ≥ 14. kd-rangeSearch(T .left, R, ‘x’) Therefore, the complexity of range search in kd-trees is O(k + √n) 15. if T .point.y < R.topSide then 16. kd-rangeSearch(T .right, R, ‘x’) Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 15 / 22 Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 16 / 22 kd-tree: Higher Dimensions Range Trees kd-trees for d-dimensional space We have n points P = (x0, y0), (x1, y1), , (xn 1, yn 1) in the I At the root the point set is partitioned based on the first coordinate { ··· − − } I At the children of the root the partition is based on the second plane coordinate A range tree is a tree of trees (a multi-level data structure) I At depth d 1 the partition is based on the last coordinate − How to build a range tree on P: I At depth d we start all over again, partitioning on first coordinate I Build a balanced binary search tree τ determined by the x-coordinates Storage: O(n) of the n points I For every node v τ, build a balanced binary search tree τ (v) Construction time: O(n log n) ∈ assoc 1 1/d (associated structure of τ) determined by the y-coordinates of the Range query time: O(n − + k) nodes in the subtree of τ with root node v (Note: d is considered to be a constant.) Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 17 / 22 Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 18 / 22 Range Tree Structure Range Trees: Operations T Section 5.3 binary search tree on RANGE TREES x-coordinates Search: trivially as in a binary search tree Tassoc(ν) binary search tree Insert: insert point in τ by x-coordinate ν on y-coordinates From inserted leaf, walk back up to the root and insert the point in all associated trees τassoc (v) of nodes v on path to the root Delete: analogous to insertion Note: re-balancing is a problem! P(ν) Figure 5.6 P(ν) A 2-dimensional range tree returns the root of a 2-dimensional range tree T of P.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    6 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us