CS 240 Module 08: Data Structures for Multi-Dimensional Data

Multi-Dimensional Data Various applications Module 8: Data Structures for Multi-Dimensional Data I Attributes of a product (laptop: price, screen size, processor speed, RAM, hard drive, ) ··· CS 240 - Data Structures and Data Management I Attributes of an employee (name, age, salary, ) ··· Dictionary for multi-dimensional data A collection of d-dimensional items Sajed Haque Veronika Irvine Taylor Smith Each item has d aspects (coordinates): (x0, x1, , xd 1) ··· − Based on lecture notes by many previous cs240 instructors Operations: insert, delete, range-search query David R. Cheriton School of Computer Science, University of Waterloo (Orthogonal) Range-search query: specify a range (interval) for certain aspects, and find all the items whose aspects fall within given Spring 2017 ranges. Example: laptops with screen size between 11 and 13 inches, RAM between 8 and 16 GB, price between 1,500 and 2,000 CAD Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 1 / 22 Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 2 / 22 Multi-Dimensional Data One-Dimensional Range Search First solution: ordered arrays Each item has d aspects (coordinates): (x0, x1, , xd 1) ··· − I Running time: Aspect values (x ) are numbers i I Problem: does not generalize to higher dimensions Each item corresponds to a point in d-dimensional space Second solution: balanced BST (e.g., AVL tree) We concentrate on d = 2, i.e., points in Euclidean plane BST-RangeSearch(T , k1, k2) price (CAD) T : A balanced search tree, k1, k2: search keys Report keys in T that are in range [k1, k2] 1200 1. if T = nil then return (1200,1000) 1000 2. if key(T ) < k1 then range-search query (1350 x 1550, 700 y 1100) 3. BST-RangeSearch(T .right, k , k ) ≤ ≤ ≤ ≤ 1 2 800 4. if key(T ) > k2 then 5. BST-RangeSearch(T .left, k1, k2) 600 6. if k1 key(T ) k2 then item: ordered pair (x, y) R R ≤ ≤ ∈ × 7. BST-RangeSearch(T .left, k1, k2) 8. report key(T ) processor speed (MHz) 1100 1300 1400 1500 1600 1700 1800 9. BST-RangeSearch(T .right, k1, k2) Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 3 / 22 Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 4 / 22 Range Search example One-Dimensional Range Search BST-RangeSearch(T , 30, 65) P1: path from the root to a leaf that goes right if k < k1 and left Nodes either on boundary, inside, or outside. otherwise P2: path from the root to a leaf that goes left if k > k2 and right 52 otherwise Partition nodes of T into three groups: 1 boundary nodes: nodes in P1 or P2 2 inside nodes: non-boundary nodes that belong to either (a subtree 35 74 rooted at a right child of a node of P1) or (a subtree rooted at a left child of a node of P2) 3 outside nodes: non-boundary nodes that belong to either (a subtree rooted at a left child of a node of P1) or (a subtree rooted at a right 15 42 65 97 child of a node of P2) k: number of reported items Nodes visited during the search: I O(log n) boundary nodes 9 27 39 46 60 69 86 99 I O(k) inside nodes I No outside nodes Note: Not every boundary node is returned. Running time O(log n + k) Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 5 / 22 Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 6 / 22 2-Dimensional Range Search Quadtrees Each item has 2 aspects (coordinates): (xi , yi ) We have n points P = (x0, y0), (x1, y1), , (xn 1, yn 1) in the { ··· − − } Each item corresponds to a point in Euclidean plane plane Options for implementing d-dimensional dictionaries: How to build a quadtree on P: I Reduce to one-dimensional dictionary: combine the d-dimensional key I Find a square R that contains all the points of P (We can compute into one key minimum and maximum x and y values among n points) Problem: Range search on one aspect is not straightforward I Root of the quadtree corresponds to R I Use several dictionaries: one for each dimension I Split: Partition R into four equal subsquares (quadrants), each Problem: inefficient, wastes space correspond to a child of R I Partition trees I Recursively repeat this process for any node that contains more than F A tree with n leaves, each leaf corresponds to an item one point F Each internal node corresponds to a region I Points on split lines belong to left/bottom side F quadtrees, kd-trees I Each leaf stores (at most) one point I multi-dimensional range trees I We can delete a leaf that does not contain any point Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 7 / 22 Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 8 / 22 Quadtrees Quadtree Operations Example: We have 13 points P = (x , y ), (x , y ), , (x , y ) in { 0 0 1 1 ··· 12 12 } the plane Search: Analogous to binary search trees Insert: I Search for the point I Split the leaf if there are two points Delete: I Search for the point I Remove the point I If its parent has only one child left, delete that child and continue the process toward the root. Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 9 / 22 Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 10 / 22 Quadtree: Range Search Quadtree Conclusion QTree-RangeSearch(T , R) T : A quadtree node, R: Query rectangle 1. if (T is a leaf) then 2. if (T .point R) then Very easy to compute and handle ∈ 3. report T .point No complicated arithmetic, only divisions by 2 (usually the boundary 4. for each child C of T do box is padded to get a power of two). 5. if C.region R = then ∩ 6 ∅ 6. QTree-RangeSearch(C, R) Space wasteful Major drawback: can have very large height for certain nonuniform distributions of points spread factor of points P: β(P) = dmax /dmin Easily generates to higher dimensions (octrees, etc. ). dmax (dmin): maximum (minimum) distance between two points in P dmax height of quadtree: h Θ(log2 ) ∈ dmin Complexity to build initial tree: Θ(nh) Complexity of range search: Θ(nh) even if the answer is ∅ Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 11 / 22 Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 12 / 22 kd-trees kd-trees We have n points P = (x0, y0), (x1, y1), , (xn 1, yn 1) in the { ··· − − } plane Quadtrees split square into quadrants regardless of where points kd-tree idea: Split the points into two (roughly) equal subsets actually lie A balanced binary tree kd-tree idea: Split the points into two (roughly) equal subsets How to build a kd-tree on P: p4 I Split P into two equal subsets using a vertical line p3 p9 I Split each of the two subsets into two equal pieces using horizontal lines p8 I Continue splitting, alternating vertical and horizontal lines, until every point is in a separate region p1 More details: p I Initially, we sort the n points according to their x-coordinates. 0 I The root of the tree is the point with median x coordinate (index p p n/2 in the sorted list) p 6 5 b c 2 I All other points with x coordinate less than or equal to this go into the p7 left subtree; points with larger x-coordinate go in the right subtree. I At alternating levels, we sort and split according to y-coordinates instead. Complexity: Θ(n log n), height of the tree: Θ(log n) Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 13 / 22 Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 14 / 22 kd-tree: Range Search kd-tree: Range Search Complexity kd-rangeSearch(T , R, split[ ‘x’]) T : A kd-tree node, R: Query← rectangle The complexity is O(k + U) where k is the number of keys reported 1. if T is empty then return and U is the number of regions we go to but unsuccessfully 2. if T .point R then ∈ U corresponds to the number of regions which intersect but are not 3. report T .point 4. for each child C of T do fully in R 5. if C.region R = then Those regions have to intersect one of the four sides of R ∩ 6 ∅ 6. kd-rangeSearch(C, R) Q(n): Maximum number of regions in a kd-tree with n points that 7. if split = ‘x’ then intersect a vertical (horizontal) line 8. if T .point.x R.leftSide then ≥ 9. kd-rangeSearch(T .left, R, ‘y’) Q(n) satisfies the following recurrence relation: 10. if T .point.x < R.rightSide then 11. kd-rangeSearch(T .right, R, ‘y’) Q(n) = 2Q(n/4) + O(1) 12. if split = ‘y’ then 13. if T .point.y R.bottomSide then It solves to Q(n) = O(√n) ≥ 14. kd-rangeSearch(T .left, R, ‘x’) Therefore, the complexity of range search in kd-trees is O(k + √n) 15. if T .point.y < R.topSide then 16. kd-rangeSearch(T .right, R, ‘x’) Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 15 / 22 Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 16 / 22 kd-tree: Higher Dimensions Range Trees kd-trees for d-dimensional space We have n points P = (x0, y0), (x1, y1), , (xn 1, yn 1) in the I At the root the point set is partitioned based on the first coordinate { ··· − − } I At the children of the root the partition is based on the second plane coordinate A range tree is a tree of trees (a multi-level data structure) I At depth d 1 the partition is based on the last coordinate − How to build a range tree on P: I At depth d we start all over again, partitioning on first coordinate I Build a balanced binary search tree τ determined by the x-coordinates Storage: O(n) of the n points I For every node v τ, build a balanced binary search tree τ (v) Construction time: O(n log n) ∈ assoc 1 1/d (associated structure of τ) determined by the y-coordinates of the Range query time: O(n − + k) nodes in the subtree of τ with root node v (Note: d is considered to be a constant.) Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 17 / 22 Haque, Irvine, Smith (SCS, UW) CS240 - Module 8 Spring 2017 18 / 22 Range Tree Structure Range Trees: Operations T Section 5.3 binary search tree on RANGE TREES x-coordinates Search: trivially as in a binary search tree Tassoc(ν) binary search tree Insert: insert point in τ by x-coordinate ν on y-coordinates From inserted leaf, walk back up to the root and insert the point in all associated trees τassoc (v) of nodes v on path to the root Delete: analogous to insertion Note: re-balancing is a problem! P(ν) Figure 5.6 P(ν) A 2-dimensional range tree returns the root of a 2-dimensional range tree T of P.

Load more