Representation of multidimensional point data
• How can we store multidimensional data, such as points in the plane, to also allow fast retrieval, updates and queries?
• k-d trees
• Range trees and priority search trees

Queries to point databases
• Point query - determine if a specific point is in the database (rarely used)
• Range query - identify the set of data points whose coordinate values fall within a certain range, e.g., all points within distance k of a given point, all points within a rectangular region, etc.
• Nearest neighbor query - find the nearest neighbor(s) to a given point, independent of how far away they might be.

• Find all records whose key values fall between two limiting values
• Easy to do for one-dimensional data using binary search trees
• Let L and U specify the lower and upper bounds for the search
• Start at the root of the BST holding key K
• 1) If L <= K <= U, include K in the answer set
• 2) If L <= K, then search the left subtree
• 3) If K <= U, search the right subtree

Range searching

procedure BSTRangeSearch (L, U, P, Op);
{Perform Op on each node, n, in the BST pointed at by P if L <= key(n) <= U}
  If P = nil then return;
  K <-- Key(P);
  If L <= K <= U then Op(P);
  If L <= K then BSTRangeSearch(L, U, Left(P), Op);
  If K <= U then BSTRangeSearch(L, U, Right(P), Op)

[Figure: example BST with letter keys; range search from e to p]
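The range search procedure can be sketched in Python as follows. This is a minimal illustration, not the course's reference code: the names `Node`, `bst_insert` and `bst_range_search` are made up for this sketch, and the left subtree is visited before `op` is applied (unlike the pseudocode) so that keys are reported in sorted order.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def bst_insert(root: Optional[Node], key: int) -> Node:
    """Standard BST insertion; equal keys go to the right."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    else:
        root.right = bst_insert(root.right, key)
    return root


def bst_range_search(lo: int, hi: int, p: Optional[Node],
                     op: Callable[[Node], None]) -> None:
    """Apply op to every node whose key lies in [lo, hi]."""
    if p is None:
        return
    k = p.key
    if lo <= k:                      # keys in range may lie to the left
        bst_range_search(lo, hi, p.left, op)
    if lo <= k <= hi:
        op(p)
    if k <= hi:                      # keys in range may lie to the right
        bst_range_search(lo, hi, p.right, op)


root = None
for key in (34, 50, 9, 28):
    root = bst_insert(root, key)

found = []
bst_range_search(10, 40, root, lambda n: found.append(n.key))
print(found)
```

Subtrees that cannot contain a key in [lo, hi] are never entered, which is the pruning idea reused by every structure that follows.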

Sets of points

• Goal is to develop data structures for representing sets of points in the plane, S = { (xi, yi) }.
• Simplest solution for bounded sets: an array of {0,1} with 1’s inserted at locations of points in S.
• Need to specify two parameters to determine the size of the array:
• 1) Range of x and y coordinates. For example, we could have 0 <= x <= 1 and 0 <= y <= 1.
• 2) Precision of x and y coordinates. For example, x and y can be represented as 6-bit numbers: .b1b2b3b4b5b6
• Combining the range and precision, we can determine the maximum number of points in any set chosen from that range and represented with that precision, which is the size of the array.

• So, for example, if 0 <= x < 100 and 0 <= y < 100 and the precision of a coordinate is 1.0, then we can have a maximum of 100 x 100 = 10,000 distinct points.
• Insertion and deletion using this representation are trivial
• But answering queries takes time proportional to the size of the array, rather than the number of points stored.
• And typically points are “sparse”: think restaurants, banks, …

Trees and tries (again)

• Suppose we want a data structure to store one-dimensional points (locations on a line segment) for fast lookup.
• Let the points be 6-bit numbers (think of dividing the line segment into 64 bins).
• Consider the points 100010 (34), 110010 (50), 001001 (9), 011100 (28) and insert them into a BST in that order. (For the trie, the final structure is independent of the order in which the points are inserted.)

Insert 100010 (34), 110010 (50), 001001 (9), 011100 (28) into BST

[Figure: the BST after each insertion, drawn over the interval from 0 to 64. 34 becomes the root, 50 its right child, 9 its left child, and 28 the right child of 9.]

Insert 100010 (34), 110010 (50), 001001 (9), 011100 (28) into a trie

[Figure: the trie after each insertion, drawn over the interval from 0 to 64.]

BST and tries break up the search key space in different ways.
• The BST breaks up space based on the VALUES of the keys, so the root can break space into an arbitrary pair of intervals.
• The trie breaks up space by regular decomposition of the interval: it splits subintervals in half (0,1) whenever a new point is inserted.
• We will use both of these methods for two-dimensional points.

Quadtrees

• Point quadtree: like a binary search tree, space is decomposed at the values of the coordinates of the stored points.
• Each data point is represented as a node that contains seven fields
• Four fields are pointers to the sons (NW, NE, SW, SE)
• Two fields hold the coordinates (x, y)
• One field points to a record containing information about the point (name)

Example data for quad tree

Name      X    Y
Chicago   35   40
Mobile    50   10
Toronto   60   75
Buffalo   80   65
Denver     5   45
Omaha     25   35
Atlanta   85   15
Miami     90    5

Point quadtree

[Figure: the plane partitioned by the point quadtree for the example cities, next to the corresponding tree.]

Insertion into a point quadtree

• Insertion is similar to binary search trees
• Compare the (x,y) coordinates of the new point to those of the quadtree nodes, branching until encountering a nil pointer

procedure which_quadrant (newp, quadp);
  return (if x(newp) < x(quadp) then {West}
            if y(newp) < y(quadp) then SW else NW
          else {East}
            if y(newp) < y(quadp) then SE else NE)

• This procedure is used to choose the appropriate son as we descend through the point quadtree

Example
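As a rough sketch of the insertion procedure, the Python below inserts the example cities in table order. The class `QuadNode` and the function `pq_insert` are hypothetical names chosen for this illustration; `which_quadrant` mirrors the pseudocode, with ties on a coordinate going East or North.

```python
class QuadNode:
    """Point quadtree node: coordinates, a name, and four son pointers."""

    def __init__(self, name, x, y):
        self.name, self.x, self.y = name, x, y
        self.sons = {"NW": None, "NE": None, "SW": None, "SE": None}


def which_quadrant(newp, quadp):
    """Quadrant of quadp in which newp falls (ties go East / North)."""
    if newp.x < quadp.x:  # West
        return "SW" if newp.y < quadp.y else "NW"
    return "SE" if newp.y < quadp.y else "NE"  # East


def pq_insert(root, node):
    """Descend until a nil son pointer is found, then hang the node there."""
    if root is None:
        return node
    t = root
    while True:
        q = which_quadrant(node, t)
        if t.sons[q] is None:
            t.sons[q] = node
            return root
        t = t.sons[q]


cities = [("Chicago", 35, 40), ("Mobile", 50, 10), ("Toronto", 60, 75),
          ("Buffalo", 80, 65), ("Denver", 5, 45), ("Omaha", 25, 35),
          ("Atlanta", 85, 15), ("Miami", 90, 5)]

root = None
for name, x, y in cities:
    root = pq_insert(root, QuadNode(name, x, y))

print(root.sons["SE"].name, root.sons["NE"].sons["SE"].name)
```

With this insertion order Chicago is the root, Mobile is its SE son, and Buffalo ends up as the SE son of Toronto.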

[Figure: the point quadtree after inserting Chicago (35,40), Mobile (50,10), Toronto (60,75), and Buffalo (80,65), one point per step.]

Proximity search

• It is all about pruning: ruling out entire subtrees that cannot possibly contain any points that fall in the query region
• In the BST case, if we are visiting a node, N, holding key k, our search interval is [l,u] and k

[Figure: the plane divided into numbered regions 1 to 9 around the query region, drawn over the quadtree for Chicago, Mobile, Toronto, and Buffalo.]

Example

Quadrants to search when the quadtree node falls in regions 1 to 6:
1. SE
2. SE, SW
3. SW
4. SE, NE
5. SW, NW
6. NE

Proximity search

• Find all points in a point quadtree within distance r of a given point, A.
• Quadtree allows us to rule out large parts of the data structure from the search

Which quadrants to search based on location of root of quadtree

If the quadtree node falls in region i relative to the query point, then search the quadrants listed:
1. SE
2. SE, SW
3. SW
4. SE, NE
5. SW, NW
6. NE
7. NE, NW
8. NW
9. All but NW
10. All but NE
11. All but SW
12. All but SE
13. All

[Figure: the 13 numbered regions around the query circle of radius r.]

Search algorithm
• Let (u,v) be the coordinates of the point whose fixed radius neighbors we are searching for.
• Algorithm will construct a list, V, of quadtree nodes to visit, and a list, N, of fixed radius neighbors.
• Put the root, R, onto V; initialize N to empty
• If V is empty, halt; otherwise remove a node, T, from V. Let (x,y) be the coordinates of T
• Compute the region, r, that (u,v) falls in relative to (x,y) using mask.
• Case on r
• r = 1: Add SE(T) to V
• …
• r = 13: Add T to N; add SE(T), SW(T), NE(T), NW(T) to V
• Repeat

Example - Find all cities within 10 units of (83,10)
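The example query can be tried with the sketch below. It is a simplification and an assumption of this illustration, not the 13-region mask from the slides: each node is pruned against the bounding square of the query circle, which may visit slightly more nodes than the mask near the circle's corners but never misses a neighbor. The names `make_node`, `insert` and `radius_search` are made up for the sketch.

```python
import math

CITIES = [("Chicago", 35, 40), ("Mobile", 50, 10), ("Toronto", 60, 75),
          ("Buffalo", 80, 65), ("Denver", 5, 45), ("Omaha", 25, 35),
          ("Atlanta", 85, 15), ("Miami", 90, 5)]


def make_node(name, x, y):
    return {"name": name, "x": x, "y": y,
            "NW": None, "NE": None, "SW": None, "SE": None}


def quadrant(x, y, node):
    if x < node["x"]:
        return "SW" if y < node["y"] else "NW"
    return "SE" if y < node["y"] else "NE"


def insert(root, name, x, y):
    new = make_node(name, x, y)
    if root is None:
        return new
    t = root
    while True:
        q = quadrant(x, y, t)
        if t[q] is None:
            t[q] = new
            return root
        t = t[q]


def radius_search(root, u, v, r):
    """Return (names within distance r of (u, v), names of nodes visited)."""
    to_visit, neighbors, visited = [root], [], []
    while to_visit:
        t = to_visit.pop()
        if t is None:
            continue
        visited.append(t["name"])
        if math.hypot(t["x"] - u, t["y"] - v) <= r:
            neighbors.append(t["name"])
        # Which half-planes of T can the square [u-r, u+r] x [v-r, v+r] reach?
        west, east = u - r < t["x"], u + r >= t["x"]
        south, north = v - r < t["y"], v + r >= t["y"]
        if north and west:
            to_visit.append(t["NW"])
        if north and east:
            to_visit.append(t["NE"])
        if south and west:
            to_visit.append(t["SW"])
        if south and east:
            to_visit.append(t["SE"])
    return sorted(neighbors), sorted(visited)


root = None
for name, x, y in CITIES:
    root = insert(root, name, x, y)

neighbors, visited = radius_search(root, 83, 10, 10)
print(neighbors, visited)
```

For this query only the SE quadrant of Chicago and the SE and NE quadrants of Mobile are explored, so four of the eight nodes are visited.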

• Look only in the SE quadrant of the root (Chicago), since Chicago is in region 1 relative to the search point
• Look at the SE and NE quadrants of Mobile (SE son of Chicago), since Mobile is in region 4 relative to the search point

[Figure: the 13 regions around the search point, drawn over the example cities.]

Example

1. Chicago is in R4: visit Toronto, Mobile
2. Mobile is in R6: visit Atlanta
3. Toronto is in R2: visit Buffalo

[Figure: regions 1 to 9 around a second query region, drawn over the example cities.]

Deletion from point quadtrees

• Difficult to achieve efficiently
• because deleting a node changes the partition lines for all of the descendants of that node
• When deleting a node, remove the subtree rooted at the node and reinsert all of the nodes in the subtree
• This is an expensive process, and often results in a deeper tree, yielding longer search times
• Samet contains a better, but more complicated algorithm

Point quadtree summary
• Lookup and insertion are straightforward extensions of BST search
• just have to make a four-way rather than a two-way decision
• Deletion is much more complicated and expensive
• For higher dimensional data, storage requirements can be large
• (x1) - binary search tree, and each node has 2 pointers to children
• (x1, x2) - 2D point quadtree, and each node has 4 pointers
• (x1, x2, x3) - 3D point quadtree, and each node has 8 pointers
• (x1, …, xn) - nD point quadtree, and each node has 2^n pointers
• Can overcome this problem by storing only non-null pointers at each node, but this makes the implementation more complex
• can use a linked list of (pointer name, pointer value) pairs for the non-null pointers

k-d trees - 2 pointers per node

• k - dimensionality of the data being represented
• 2-d tree - representation for points in the plane
• k-d tree is a binary tree
  1) at each stage, a different coordinate of the key is used to determine the branching
  2) first branch on the x coordinate of a key (ties break left)
  3) then branch on the y coordinate (ties break down)
  4) then alternate between the x and y coordinates
  5) at a branch, the left son will have all keys smaller than the tested attribute, and the right son will have all keys greater than or equal to the tested attribute
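The branching rule can be sketched as below, assuming rule 5 literally: keys smaller than the tested coordinate go left, all others go right. `KDNode` and `kd_insert` are illustrative names, and the insertion order is an assumption chosen for this sketch.

```python
class KDNode:
    def __init__(self, point, name):
        self.point, self.name = point, name
        self.left = self.right = None


def kd_insert(node, point, name, depth=0):
    """Insert into a 2-d tree; even levels test x, odd levels test y."""
    if node is None:
        return KDNode(point, name)
    axis = depth % 2
    if point[axis] < node.point[axis]:
        node.left = kd_insert(node.left, point, name, depth + 1)
    else:
        node.right = kd_insert(node.right, point, name, depth + 1)
    return node


cities = [("Chicago", (35, 40)), ("Denver", (5, 45)), ("Mobile", (50, 10)),
          ("Toronto", (60, 75)), ("Miami", (90, 5)), ("Buffalo", (80, 65)),
          ("Omaha", (25, 35)), ("Atlanta", (85, 15))]

root = None
for name, point in cities:
    root = kd_insert(root, point, name)

print(root.name, root.left.name, root.right.name)
```

Each node needs only two son pointers regardless of k, which is the storage advantage over an n-dimensional point quadtree.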

k-d tree example

[Figure: the k-d tree and the matching partition of the plane, built up one point at a time: Chicago (35,40), Denver (5,45), Mobile (50,10), Toronto (60,75), Miami (90,5), then Buffalo (80,65), Omaha (25,35), and Atlanta (85,15). Node labels in the final tree: Chicago(x), Denver(y), Mobile(y), Omaha(x), Miami(x), Toronto(x), Buffalo(y), Atlanta(x).]

Insertion into k-d tree

k-d traverse (point, kdnode)
begin
  return (if even_level (kdnode) then {test x}
            if x(point) < x(kdnode) then Left else Right
          else {test y}
            if y(point) < y(kdnode) then Left else Right)
end

Search in k-d trees
• Goal: Find all nodes in a k-d tree whose distance from a point (a,b) is less than or equal to r
• k-d tree can be used to prune many nodes from the search
  1) The minimum x coordinate of any node in the k-d tree within distance r of (a,b) is a-r.
  2) The minimum y coordinate is b-r.
  3) The maximum x and y coordinates are a+r and b+r, respectively.

[Figure: the circle of radius r centered at (a,b), inside the square with corners (a-r, b+r) and (a+r, b-r).]

Search in k-d trees
• If the search reaches a node with coordinates (e,f), and if (a-r, b-r) is in the "right" subtree of that node, then there is no need to search the left subtree of that node
• i.e., the leftmost point in the search region is to the right of (e,f), or the bottom-most point of the search region is above (e,f)
• Similarly, if the point (a+r, b+r) (the right-most point of the search region) is in the left subtree of (e,f), then its right subtree need not be searched.

Searching in k-d trees
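A minimal sketch of the pruning rule: at each node only the discriminating coordinate is compared against the interval [center - r, center + r]. The function and variable names are assumptions of this sketch, and the query (83,10) with r = 10 is borrowed from the point quadtree example.

```python
import math


class KDNode:
    def __init__(self, point, name):
        self.point, self.name = point, name
        self.left = self.right = None


def kd_insert(node, point, name, depth=0):
    if node is None:
        return KDNode(point, name)
    axis = depth % 2
    if point[axis] < node.point[axis]:
        node.left = kd_insert(node.left, point, name, depth + 1)
    else:
        node.right = kd_insert(node.right, point, name, depth + 1)
    return node


def kd_radius_search(node, center, r, depth=0, found=None, visited=None):
    """Collect names within distance r of center, plus every node visited."""
    if found is None:
        found, visited = [], []
    if node is None:
        return found, visited
    visited.append(node.name)
    dx, dy = node.point[0] - center[0], node.point[1] - center[1]
    if math.hypot(dx, dy) <= r:
        found.append(node.name)
    axis = depth % 2
    split = node.point[axis]
    if center[axis] - r < split:    # search square reaches keys < split
        kd_radius_search(node.left, center, r, depth + 1, found, visited)
    if center[axis] + r >= split:   # search square reaches keys >= split
        kd_radius_search(node.right, center, r, depth + 1, found, visited)
    return found, visited


cities = [("Chicago", (35, 40)), ("Denver", (5, 45)), ("Mobile", (50, 10)),
          ("Toronto", (60, 75)), ("Miami", (90, 5)), ("Buffalo", (80, 65)),
          ("Omaha", (25, 35)), ("Atlanta", (85, 15))]
root = None
for name, point in cities:
    root = kd_insert(root, point, name)

found, visited = kd_radius_search(root, (83, 10), 10)
print(sorted(found), visited)
```

The left subtree of Chicago (Denver and Omaha) is never entered, because the leftmost x of the search square, 73, is to the right of Chicago's x coordinate.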

[Figure: the example k-d tree and its partition of the plane, used to illustrate the search.]

Deletion from k-d trees

• General recursive procedure:
• Want to delete the node containing coordinates (a,b) from the tree
• If (a,b) is a leaf, replace it with the empty tree
• Otherwise, have to find a “replacement” coordinate from one of the subtrees rooted at the descendants of (a,b). If (c,d) are the coordinates of the replacement point, we substitute (c,d) for (a,b) and then delete the node having coordinates (c,d) from the subtree.

Deletion from k-d trees
• Finding a replacement
• Suppose (a,b) is an x-discriminator
• Find the node in the right subtree of (a,b) having minimal x-coordinate. Its x-coordinate will be less than or equal to the x-coordinates of all other nodes in that subtree, but strictly greater than the x-coordinates of all points in the left subtree
• cannot choose the node with maximal x coordinate in the left subtree because it might not be a unique x value, and the definition of a k-d tree requires that all x coordinates in the left subtree of an x-discriminator be less than the x coordinate at the root of the subtree, so others would be lost to search
• Special case occurs when the right subtree is empty. In this case, we choose the node from the left subtree that has minimal x-coordinate as a replacement (c,d), switch the left subtree of (a,b) to be the right subtree of (c,d), and delete the original copy of (c,d) from the subtree.
• switching is required, again, because of the asymmetry between left and right subtrees

Deletion schema - non-empty right subtree
• Find the k-d tree node containing the key to be deleted (D)
• Find a replacement key in the right subtree
• If D is an x (y) discriminator, this is a node in the right subtree with minimal x (y) coordinate (R)
• this key has x (y) coordinate strictly greater than the x (y) coordinates of any key in the left subtree of D, so it can validly replace the key at D.
• Replace D with R, and apply the deletion algorithm to R
• When we replace a key with another key from a leaf, the deletion is completed

Deletion schema - empty right subtree
• Suppose the node, n1, containing D has an empty right subtree
• Find the replacement node in the left subtree, T
• if D is an x (y) discriminator, this is the node having minimal x (y) coordinate.
• this node may not be unique, because it is possible that several nodes in the left subtree have this minimum value as their x (y) coordinates
• Replace D with R, and make T the right subtree of n1
• Recursively, delete R from T

[Figure: node n1 holding D with left subtree T containing R, before and after the replacement.]

Example - delete Chicago
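The deletion schema can be sketched as follows. This is an illustrative implementation under the assumptions stated in the comments, with made-up names (`find_min`, `kd_delete`); it uses the coordinates from the city table, so Buffalo is (80,65) here.

```python
class KDNode:
    def __init__(self, point, name):
        self.point, self.name = point, name
        self.left = self.right = None


def kd_insert(node, point, name, depth=0):
    if node is None:
        return KDNode(point, name)
    axis = depth % 2
    if point[axis] < node.point[axis]:
        node.left = kd_insert(node.left, point, name, depth + 1)
    else:
        node.right = kd_insert(node.right, point, name, depth + 1)
    return node


def find_min(node, axis, depth):
    """Node with the minimal coordinate on `axis` in the subtree at `node`."""
    if node is None:
        return None
    if depth % 2 == axis:
        # Keys smaller on this axis can only be in the left subtree.
        candidates = [find_min(node.left, axis, depth + 1)]
    else:
        candidates = [find_min(node.left, axis, depth + 1),
                      find_min(node.right, axis, depth + 1)]
    best = node
    for c in candidates:
        if c is not None and c.point[axis] < best.point[axis]:
            best = c
    return best


def kd_delete(node, point, depth=0):
    """Delete `point`; assumes stored points are distinct."""
    if node is None:
        return None
    axis = depth % 2
    if node.point == point:
        if node.right is not None:
            r = find_min(node.right, axis, depth + 1)
            node.point, node.name = r.point, r.name
            node.right = kd_delete(node.right, r.point, depth + 1)
        elif node.left is not None:
            # Empty right subtree: the left subtree becomes the right one.
            r = find_min(node.left, axis, depth + 1)
            node.point, node.name = r.point, r.name
            node.right = kd_delete(node.left, r.point, depth + 1)
            node.left = None
        else:
            return None  # leaf: replace with the empty tree
    elif point[axis] < node.point[axis]:
        node.left = kd_delete(node.left, point, depth + 1)
    else:
        node.right = kd_delete(node.right, point, depth + 1)
    return node


cities = [("Chicago", (35, 40)), ("Denver", (5, 45)), ("Mobile", (50, 10)),
          ("Toronto", (60, 75)), ("Miami", (90, 5)), ("Buffalo", (80, 65)),
          ("Omaha", (25, 35)), ("Atlanta", (85, 15))]
root = None
for name, point in cities:
    root = kd_insert(root, point, name)

root = kd_delete(root, (35, 40))  # delete Chicago
print(root.name, root.right.name)
```

Deleting Chicago first pulls up Mobile (minimal x in the right subtree), and deleting Mobile from that subtree pulls up Atlanta (minimal y in Mobile's right subtree), matching the two replacement steps of the example.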

[Figure: the k-d tree before deletion, with Chicago(x) - 35 at the root, and the matching partition of the plane.]
• Replace Chicago with Mobile

Example

[Figure: the tree with Mobile(x) - 50 at the root; the original Mobile(y) - 10 node is still present in the right subtree.]
• Replace Mobile with Atlanta

Example

[Figure: the final tree, with Mobile(x) - 50 at the root and Atlanta - 15 in place of the old Mobile node.]

Review

• Want to investigate data structures that can store multidimensional data
• Simplest case is two-dimensional data, where keys are of the form (x,y)
• Call these data structures spatial data structures
• In contrast to simple one-dimensional keys, where the basic query is “lookup,” for two-dimensional data the basic queries to spatial data structures are
• Range queries: find all points in the data structure that fall inside a given region. Examples include all points inside the circle of radius r centered at some point, and all points with x coordinates in [a,b] and y coordinates in [c,d] (a rectangle).
• Nearest neighbor queries: both single nearest neighbors and multiple nearest neighbors

• There are two types of hierarchical spatial data structures
• Analogs to binary search trees, in which two-dimensional space is recursively partitioned at the locations of the points in the spatial data structure.
• Point quadtrees
• k-d trees
• Analogs to tries, in which there are fixed decompositions of space, and the ones needed to store a data set depend on the elements in the data set.
• Will get to these later.

• For tree-based spatial data structures
• Insertion is just like insertion for BSTs: multiple branches for point quadtrees and alternating tests on x and y coordinates for k-d trees.
• Insertion always occurs at a leaf
• Deletion is also like BSTs for k-d trees: look up the item to be deleted, replace it by an appropriate key in the subtree rooted at its location, and recursively delete the substitution key from the subtree rooted at that location.
• Slight complication introduced by the fact that there is an asymmetry between the left and right subtrees: equality on the tested coordinate has to be broken to either left or right.

Example of deletion

[Figure: deleting (20,20), an x-discriminator, from a k-d tree holding (20,20), (10,30), (25,50), (35,25), (30,45), (55,40), (45,35), (30,35), and (50,30). Find min x in the right subtree: (25,50).]

Deletion

[Figure: deleting the old (25,50), a y-discriminator. Find min y in the left subtree, (35,25), and switch.]

Deletion

[Figure: deleting the old (35,25), an x-discriminator. Find min x in the right subtree: (45,35). Then delete the old (45,35), an x-discriminator, by switching it with its one son, (50,30).]

Deletion

[Figure: the final tree, with (25,50) at the root.]

k-d tree problems to think about

• A k-d tree can become very unbalanced, even if all we do is insert keys into the tree. If you knew the set of keys to be stored in a k-d tree, could you determine the order of insertion that would yield a “most balanced” tree?
• The deletion algorithm involves finding, when an x-discriminator is deleted, the key in its right subtree having minimum x-coordinate to serve as the replacement key. Design an algorithm to efficiently find this replacement key.

Range trees and priority search trees

• Data structures that allow fast rectangular range searching in 2-D and higher
• Range trees
• faster search than k-d trees and point quadtrees
• higher storage requirements, since some points are duplicated in the tree
• Binary tree of binary trees

1-d range trees
• Balanced BST where the data are stored in the leaf nodes and nonleaf nodes contain midrange separators
• Leaf nodes are then linked in sorted order in a doubly linked list
• Range search [L:R] is conducted by searching for the smallest key >= L or the largest key <= L using the BST algorithm, and then following the linked list until encountering a key > R.
• For N points, the process takes O(log_2 N + F) time, where F is the number of points found

Example
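A small sketch of the 1-d range search, under a simplifying assumption: a sorted Python list stands in for the doubly linked list of leaves, and `bisect_left` stands in for the BST descent that locates the smallest key >= L. The leaf keys are those of the example tree; `range_search_1d` is a name made up for this sketch.

```python
from bisect import bisect_left


def range_search_1d(sorted_keys, lo, hi):
    """Report all keys in [lo, hi] from a sorted sequence of leaf keys."""
    i = bisect_left(sorted_keys, lo)   # O(log N): first key >= lo
    found = []
    while i < len(sorted_keys) and sorted_keys[i] <= hi:
        found.append(sorted_keys[i])   # O(F): walk the leaves in order
        i += 1
    return found


leaves = [4, 8, 10, 14, 20, 25, 32, 40]
print(range_search_1d(leaves, 11, 27))
```

The two phases of the loop correspond to the two terms of the O(log_2 N + F) bound.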

[Figure: 1-d range tree. Root: 16. Second level: 9, 28. Third level: 6, 12, 22, 36. Leaves: 4, 8, 10, 14, 20, 25, 32, 40.]

Range search [11,27]

[Figure: the same tree. The initial search finds the largest key < L.]

Range trees (2D)

• Sort all points by their x-coordinates
• Store these points in the leaf nodes of a balanced binary tree, using a “midrange” value to separate keys in left and right subtrees
• also store at each nonleaf node, N, the pair (xmin, xmax): the minimum and maximum x-coordinate of points stored in the subtree rooted at N
• Each nonleaf node contains a 1-D range tree for all of the points in its subtree. This is the source of the duplication of points. This range tree is sorted on y.

Range trees

Name      X    Y
Chicago   35   40
Mobile    50   10
Toronto   60   75
Buffalo   80   65
Denver     5   45
Omaha     25   35
Atlanta   85   15
Miami     90    5

Range tree

[Figure: range tree on x for the example cities. Each nonleaf node shows its midrange value, the points in its subtree, and (xmin, xmax).
Root: 55 {all} (5,90)
Second level: 30 {O,D,C,M} (5,50); 83 {T,B,A,Mi} (60,90)
Third level: 15 {D,O} (5,25); 42 {C,M} (35,50); 70 {T,B} (60,80); 87 {A,Mi} (85,90)
Leaves: D 5, O 25, C 35, M 50, T 60, B 80, A 85, Mi 90]

Range search algorithm

• Goal is to find all points (x,y) with x in [a,b] and y in [c,d]

2D-range-search (a,b,c,d,T)
Find all keys satisfying x in [a,b] and y in [c,d] in the range tree rooted at T. Node T stores xT, xmin, and xmax.
case:
1. [xmin, xmax] ⊆ [a,b]: return result of y-range query at T
2. xT ∈ [a,b]: Union(2D-range-search(a,b,c,d, T.left), 2D-range-search(a,b,c,d, T.right))
3. xT < a: 2D-range-search(a,b,c,d, T.right)
4. xT > b: 2D-range-search(a,b,c,d, T.left)

Example - range search on x interval [35,62], y interval [40,86]

[Figure: the range tree annotated with the case applied at each node visited: node 55 is Case 2, node 30 is Case 3, node 83 is Case 4, node 42 is Case 1, node 70 is Case 4.]

Range search algorithm: 2

• Let α and β be the first and last leaf nodes whose x coordinates satisfy the x component of the query
• Let Q be the lowest common ancestor of these two nodes. Notice that the y range tree stored at Q contains all of the points we need to examine, but it also contains points that we may not have to examine.
• Let {Li} and {Ri} denote the sequences of nodes from Q to α and β, excluding Q.

Example: x range search on [20,88]

[Figure: the range tree with Q at the root (55). L1 is node 30, L2 is node 15, and L3 is leaf O (25), which is α. R1 is node 83, R2 is node 87, and R3 is leaf A (85), which is β.]

Range search algorithm

• For every node P in {Li} whose left son is also in {Li}, perform a one-dimensional range search in the y-range tree associated with the right son of P.
• For every node P in {Ri} whose right son is also in {Ri}, perform a one-dimensional range search in the y-range tree associated with the left son of P.
• Also have to check whether the keys in nodes α and β fit into the range.
• Complexity of search is O((log_2 N)^2 + F)

Example

[Figure: the same range tree as in the previous example, with Q, L1 to L3, R1 to R3, α, and β marked.]

Finding the lowest common ancestor

T <-- root of tree
while not (Leaf(T)) do
begin
  if xu <= midrange(T) then T <-- Left(T)
  else if midrange(T) <= xl then T <-- Right(T)
  else exit loop
end

Example - range search ([25:85];[8:16])

[Figure: the range tree and the map of the example cities. Q, the lowest common ancestor, is the root (55); node α and node β are marked at the leaves.]

Priority search trees
• Search trees for answering upper semi-infinite queries ([xl: xu], [yl, ∞])
• Built as follows:
• Build a range tree for the x-coordinates
• Next, we will associate a point with each nonleaf node. At the root, choose the point having maximal y-coordinate from the entire set
• Moving down the tree, choose the point in the subtree rooted at each node that has the maximal y coordinate, chosen only from those points not assigned previously.
• If no point can be chosen, because all points in the subtree have been assigned to higher levels, then assign a null point.

What goes at the root?
Toronto, because it has the maximal y coordinate

[Figure: the x range tree over leaves 5, 25, 35, 50, 60, 80, 85, 90, holding (5,45) Denver, (25,35) Omaha, (35,40) Chicago, (50,10) Mobile, (60,75) Toronto, (80,65) Buffalo, (85,15) Atlanta, (90,5) Miami. The root is labeled Toronto, 75.]

Left son of the root?
Denver, because it has the largest y coordinate (45) of points in its subtree

Right son of the root?
Buffalo, because it has the largest y coordinate (65) of unused points in its subtree

Left son of Buffalo?
Null, because all of the points in its subtree have been selected

Priority search trees

[Figure: the completed priority search tree. Root: Toronto 75. Second level: Denver 45, Buffalo 65. Third level: Omaha 35, Chicago 40, null, Atlanta 15.]

Notice

• Not all points are chosen to be placed at internal nodes
• They CANNOT be: there is always a spare key at the bottom anyway.
• For you to think about: what is the maximal number of internal nodes that can be assigned null during the construction of a priority search tree?

Answering semi-infinite range queries
• Start at the root, T, of the tree and apply this recursive algorithm
• If T is null, then return
• If the y-coordinate associated with T is null, then return, since all of the points in the subtree rooted at T have been previously examined
• If the y-coordinate associated with T is < yl, then return, because we are guaranteed that all points in the subtree rooted at T will also have a y coordinate too small for the range.
• If all of these tests fail, then the y-coordinate of T is in the semi-infinite range, and we compare the x-coordinate of the point associated with T to the finite x range. If it falls in this range, we report this point as satisfying the query

The recursive step

• Three cases to consider:
• midrange(T) falls in the interval [xl: xu]. In this case, we apply the algorithm recursively to both the left and right subtrees of T.
• midrange(T) is greater than xu. In this case, we can ignore the right subtree of T but must apply the algorithm recursively to T’s left subtree.
• midrange(T) < xl. In this case, we can ignore the left subtree of T, but apply the algorithm recursively to T’s right subtree.

Example - ([35:80],[50:∞])
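The construction and the query can be reproduced with the sketch below. One detail is an assumption of this sketch, since the slides do not spell out leaf handling: a leaf keeps its point only if no internal node has claimed it, so no point is reported twice. `PSTNode`, `build_pst` and `pst_query` are hypothetical names.

```python
class PSTNode:
    def __init__(self, point, mid=None, left=None, right=None):
        self.point, self.mid = point, mid      # point may be None (null)
        self.left, self.right = left, right


def build_pst(pts, used=None):
    """pts: (x, y, name) tuples sorted by x. Parents choose before sons."""
    if used is None:
        used = set()
    free = [p for p in pts if p not in used]
    top = max(free, key=lambda p: p[1]) if free else None
    if len(pts) == 1:
        return PSTNode(top)                    # leaf
    if top is not None:
        used.add(top)
    half = len(pts) // 2
    mid = (pts[half - 1][0] + pts[half][0]) / 2
    return PSTNode(top, mid,
                   build_pst(pts[:half], used), build_pst(pts[half:], used))


def pst_query(t, xl, xu, yl, out):
    """Report points with x in [xl, xu] and y >= yl."""
    if t is None or t.point is None or t.point[1] < yl:
        return out                             # nothing below can qualify
    if xl <= t.point[0] <= xu:
        out.append(t.point[2])
    if t.left is None:                         # leaf
        return out
    if t.mid >= xl:
        pst_query(t.left, xl, xu, yl, out)
    if t.mid <= xu:
        pst_query(t.right, xl, xu, yl, out)
    return out


cities = sorted([(35, 40, "Chicago"), (50, 10, "Mobile"), (60, 75, "Toronto"),
                 (80, 65, "Buffalo"), (5, 45, "Denver"), (25, 35, "Omaha"),
                 (85, 15, "Atlanta"), (90, 5, "Miami")])
pst = build_pst(cities)

print(pst.point[2], pst.left.point[2], pst.right.point[2])
print(pst_query(pst, 35, 80, 50, []))
```

On the example query the search reports Toronto, abandons the Denver subtree because 45 < 50, reports Buffalo, and stops at Buffalo's left son because its point is null.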

[Figure: the priority search tree from the previous slides.]
• Output Toronto and descend both left and right

Example - ([35:80],[50:∞])

[Figure: the same tree, with mid = 83 marked at Buffalo.]
• Left subtree fails, because 45 < 50, and the algorithm terminates on that path
• Buffalo is selected, and its left subtree must be examined; the right subtree can be ignored since the midrange is > 80

Example

[Figure: the same tree, with the left son of Buffalo highlighted.]
• This node has a null y coordinate, so the algorithm terminates on this path.

Range priority trees

• An inverse priority search tree is a variant of a priority search tree in which we associate the minimum y-value of as-yet unselected coordinates with nodes in the tree.
• A range priority tree is a BST on the y-coordinates of the point set in which we associate priority search trees with some nonleaf nodes, and inverse priority search trees with others.
• Each nonleaf node which is a left son of its father is associated with a priority search tree of the points in its subtree
• Each nonleaf node which is a right son of its father is associated with an inverse priority search tree of the points in its subtree.

Range priority tree - example

Name      X    Y
Chicago   35   40
Mobile    50   10
Toronto   60   75
Buffalo   80   65
Denver     5   45
Omaha     25   35
Atlanta   85   15
Miami     90    5

[Figure: range priority tree on y. P marks a priority search tree and P-1 an inverse priority search tree.
Root: midrange = 38
Second level: P {Mi,M,A,O}; P-1 {C,D,B,T}
Third level: P {Mi,M}; P-1 {A,O}; P {C,D}; P-1 {B,T}
Leaves (y): Mi 5, M 10, A 15, O 35, C 40, D 45, B 65, T 75]

Range priority trees

• At each node of a range priority tree, the points in the leaves of the subtree rooted at that node are partitioned into two sets
• the points in the left subtree, which are all < the midrange value, are stored in a priority search tree at the left son
• the points in the right subtree, all larger than the midrange, are stored in an inverse priority search tree at the right son.

Range priority trees
• To solve a range query in a range priority tree, we start from the root and use the midrange values, ym, to find the first node satisfying:
• yl < ym < yu
• Its left and right subtrees partition all points having acceptable y coordinates into two sets.
• in the left subtree are all the points of interest whose y coordinates are strictly less than ym
• and the right subtree has all of the points of interest whose y coordinates are strictly greater than ym.

[Figure: the query interval [yl, yu] on the y axis, split at ym.]

Range priority trees

• Using the priority search tree at the left son, we can compute the query ([xl: xu], [yl, ∞]) in O(log N + F) time, which is equivalent to the query ([xl: xu], [yl, ym]) since all points in the left subtree have y coordinates < ym.
• Using the inverse priority search tree at the right son, we can compute the query ([xl: xu], [-∞, yu]) in O(log N + F) time. This is equivalent to the query ([xl: xu], [ym, yu]) since all of the points in the right subtree have y coordinates > ym.

Searching range priority trees
• Using the priority search tree at the left son, we can compute the query ([xl: xu], [yl, ∞]) in O(log N + F) time, but how does this help us?
• Notice that all of the points in the left subtree have y coordinate < yu. So, the points that satisfy the semi-infinite range query must also satisfy the range query.
• Similarly, all of the points in the right subtree have y coordinates > yl. So, the points that satisfy the semi-infinite range query on the inverse tree must also satisfy the original range query.

Example: ([25:60], [15:45])

[Figure: the range priority tree with Q at the root (midrange = 38). The query ([25:60],[15:∞]) is run at L1, the left son (P, {Mi,M,A,O}), and the query ([25:60],[-∞:45]) is run at R1, the right son (P-1, {C,D,B,T}). yl and yu are marked on the leaf level: Mi 5, M 10, A 15, O 35, C 40, D 45, B 65, T 75.]

Range priority tree – outer tree

Any upper semi-infinite range query run at the left son of the root is effectively a regular range query capped above by

Range priority tree – outer tree

Any lower semi-infinite range query run at the right son of the root is effectively a regular range query capped below by

Region-based quadtrees

• For point quadtrees and k-d trees, the data determines how the plane is subdivided.
• For a region quadtree, the decomposition lines are fixed in advance. There are two types of region quadtrees for points:
• MX quadtrees, in which all point data is found in the leaf nodes of the tree and all leaf nodes are at the same level
• PR quadtrees, in which a quadrant that contains more than one point is split, recursively, until each quadrant contains one point. Data points can occur at many levels of the tree

Region based quadtrees
• Quadtrees are really more like tries than trees
• Advantages:
• simpler algorithms for insertion, deletion and set operations
• Disadvantages:
• might need a large data structure to store only a small number of points

Quadtrees and tries
• Suppose data to be stored is of the form <x, y> where 0 <= x < 1 and 0 <= y < 1.
• Number the four children of a node 00, 01, 10, 11
• Let x = .x0 x1 x2 x3 ... xp (in binary)
• Let y = .y0 y1 y2 y3 ... yp (in binary)
• Then the node reached by the path x0y0, x1y1, ..., xpyp represents the set of all keys whose x-coordinates begin with x0x1...xp and whose y-coordinates begin with y0y1...yp.

Quadtrees and tries
• Example - the root represents all points; its 10 child represents all points whose x coordinate begins with 1 and y coordinate begins with 0.
• These are points that look like < .1...., .0....>
  1) 1/2 <= x < 1
  2) 0 <= y < 1/2
• These points fall in one quadrant of the unit square
• For PR quadtrees, if there is only one point in the data set in this quadrant, then the quadrant is a leaf; otherwise it has four children, each representing a square of side length 1/4

Quadtrees and tries
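The correspondence between a point and its trie path can be sketched by peeling off one binary digit of x and one of y per level. The function name `quad_path` is made up for this illustration, and the inputs are assumed to lie in [0, 1).

```python
def quad_path(x, y, depth):
    """Child labels x0y0, x1y1, ... followed from the root for point (x, y)."""
    path = []
    for _ in range(depth):
        x, y = 2 * x, 2 * y
        xb, yb = int(x), int(y)      # next binary digit of each coordinate
        path.append(f"{xb}{yb}")
        x, y = x - xb, y - yb
    return path


print(quad_path(0.75, 0.25, 2))
```

For (0.75, 0.25), that is <.11, .01> in binary, the first label is 10 (x begins with 1, y begins with 0) and the second is 11.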

[Figure: the unit square with corners (0,0) and (1,1), split at (.5, .5) into quadrants < .0..., .0...>, < .0..., .1...>, < .1..., .0...>, and < .1..., .1...>, with one quadrant split again to hold points a, b, c, d, next to the corresponding trie.]

Quadtrees
• We will assume that:
• all data points have integer coordinates
• the range of coordinate values is between 0 and 2^n.
• the width of the resulting quadtree is W = 2^(n-1).
• the center of the quadtree is then at coordinate (W,W)

[Figure: a square of side 2^n with its center at (W,W).]

Quadrant algorithm

quadrant procedure (x,y,W)
begin
  value integer x,y,W;
  if x < W then
    if y < W then SW else NW
  else
    if y < W then SE else NE
end

MX quadtrees

• All leaf nodes are found at the same depth
• Dimensions of the space from which points are drawn must be known in advance. Required precision then determines the maximum depth of the MX quadtree
• Quadtree contains two types of nodes:
• white nodes, which correspond to empty space
• black nodes, which correspond to data points
• When points are inserted, white nodes may be split many times.
• When points are deleted, white nodes may be merged many times.

MX quadtree example

[Figure: MX quadtree for the example cities on an 8 x 8 grid: Toronto (4,6), Buffalo (6,5), Denver (0,3), Chicago (2,3), Atlanta (6,1), Omaha (2,2), Mobile (4,0), Miami (7,0).]

Building the MX QT - insertion

[Figure: the MX quadtree after inserting Chicago (2,3), and then Mobile (4,0).]

Building the MX quadtree

[Figure: the MX quadtree after inserting Toronto (4,6), and then Buffalo (6,5).]

Insertion into MX quadtree

Procedure MXInsert (P, X, Y, R, W)
{R is root and W is width = 2^n; X, Y integers in range [0, 2^(n+1)]}
  if NULL(R) then R <-- create_node
  T <-- R
  Q <-- Quadrant(X,Y,W)
  while W > 1 do
  begin
    if NULL(SON(T,Q)) then SON(T,Q) <-- create_node
    T <-- SON(T,Q)
    X <-- X mod W
    Y <-- Y mod W
    W <-- W/2
    Q <-- Quadrant(X,Y,W)
  end
  SON(T,Q) <-- P

Insertion algorithm
• When inserting Buffalo:
• W = 4 and Buffalo is in the NE quadrant, (X,Y) = (6,5)
• (X,Y) becomes (2,1), W becomes 2, and Buffalo is in the SE quadrant
• (X,Y) becomes (0,1), W becomes 1, and P is inserted in the NW quadrant.

[Figure: the 8 x 8 grid holding Toronto (4,6), Buffalo (6,5), Chicago (2,3), and Mobile (4,0), with the successive quadrant centers (4,4), (2,2), and (1,1) marked.]

Deletion from MX quadtrees
• If deleting a node causes all of its father’s sons to be NIL, then we recursively delete the father also.
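The insertion procedure and the deletion rule can be sketched together. Gray nodes are represented here by a hypothetical `MXNode` class whose `sons` dictionary holds only non-nil sons, and a black leaf is stored simply as the point's name; both choices are assumptions of this sketch.

```python
class MXNode:
    def __init__(self):
        self.sons = {}   # quadrant label -> MXNode, or a name at the leaf level


def quadrant(x, y, w):
    if x < w:
        return "SW" if y < w else "NW"
    return "SE" if y < w else "NE"


def mx_insert(root, name, x, y, w):
    """Insert at (x, y); w is half the side of the square covered by root."""
    t, q = root, quadrant(x, y, w)
    while w > 1:
        if q not in t.sons:
            t.sons[q] = MXNode()
        t = t.sons[q]
        x, y, w = x % w, y % w, w // 2
        q = quadrant(x, y, w)
    t.sons[q] = name


def mx_delete(root, x, y, w):
    """Remove the point at (x, y); prune fathers left with no sons."""
    path, t, q = [], root, quadrant(x, y, w)
    while w > 1:
        if q not in t.sons:
            return False
        path.append((t, q))
        t = t.sons[q]
        x, y, w = x % w, y % w, w // 2
        q = quadrant(x, y, w)
    if q not in t.sons:
        return False
    del t.sons[q]
    while path and not t.sons:       # all sons NIL: delete the father too
        father, fq = path.pop()
        del father.sons[fq]
        t = father
    return True


root = MXNode()
for name, x, y in [("Chicago", 2, 3), ("Mobile", 4, 0),
                   ("Toronto", 4, 6), ("Buffalo", 6, 5)]:
    mx_insert(root, name, x, y, 4)

print(root.sons["NE"].sons["SE"].sons["NW"])
mx_delete(root, 4, 6, 4)             # delete Toronto
print(sorted(root.sons["NE"].sons))
```

Buffalo is reached through NE, SE, NW, as in the worked insertion. Deleting Toronto removes its now-empty father but keeps the NE node, which still leads to Buffalo.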

[Figure: the MX quadtree for the example cities: Toronto (4,6), Buffalo (6,5), Denver (0,3), Chicago (2,3), Atlanta (6,1), Omaha (2,2), Mobile (4,0), Miami (7,0). Consider deleting Toronto.]

PR Quadtree

• MX quadtree requires that the precision and range of the data be known in advance.
• A PR quadtree is a regular, recursive decomposition of the plane that guarantees that each node contains at most one data point
• so, nodes can be split arbitrarily many times to separate data points
• data points can occur at many different levels of the PR quadtree

Example
• Quadtree contains only a single black node
• That node contains Chicago

[Figure: PR quadtree holding only Chicago.]

Example

[Figure: the PR quadtree after inserting Mobile, then Toronto, then Buffalo, and finally the complete tree holding Toronto, Buffalo, Denver, Chicago, Omaha, Mobile, Atlanta, and Miami. Internal nodes are marked *.]

Insertion and deletion

• Insertion
  1) descend from root, comparing coordinates until encountering the node in which the point belongs
  2) If node is white, insert point and change to black
  3) If node is black, subdivide until the two points do not occupy the same quadrant
• Deletion
  1) Remove point from node and change node from black to white
  2) If one brother is black and the other two are white, shrink one level and recurse

Deletion

[Figure: the PR quadtree for the example cities. Delete Mobile.]

Deletion

[Figure: the PR quadtree after Mobile has been removed.]

Deletion

• When does deletion lead to collapsing?
• Deleted node has exactly one brother that is a black leaf node and two that are white nodes
• The black brother is moved up one level, and the condition is checked again

[Figure: the PR quadtree holding Buffalo, Atlanta, Chicago, Denver, Omaha, and Miami, before and after Buffalo is deleted.]

Storage efficiency

• Number of nodes in the quadtree can be large even when only a small number of data points are represented. • This is because the data points might be close to one another and require many levels of decomposition to be separated into distinct nodes
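The effect can be measured with a small PR quadtree sketch. The class `PRNode`, the 16 x 16 square, and the two pairs of points are assumptions chosen for illustration; only insertion is implemented.

```python
class PRNode:
    """PR quadtree node covering the square [x0, x0+size) x [y0, y0+size)."""

    def __init__(self, x0, y0, size):
        self.x0, self.y0, self.size = x0, y0, size
        self.point = None    # set for a black leaf
        self.sons = None     # list of four PRNodes for an internal node

    def _son_for(self, p):
        half = self.size / 2
        east = 1 if p[0] >= self.x0 + half else 0
        north = 2 if p[1] >= self.y0 + half else 0
        return self.sons[east + north]

    def insert(self, p):
        if self.sons is None:
            if self.point is None:           # white leaf: becomes black
                self.point = p
                return
            if self.point == p:              # already stored
                return
            # Black leaf: subdivide and push the old point down.
            old, self.point = self.point, None
            half = self.size / 2
            self.sons = [PRNode(self.x0 + dx * half, self.y0 + dy * half, half)
                         for dy in (0, 1) for dx in (0, 1)]
            self._son_for(old).insert(old)
        self._son_for(p).insert(p)


def count_nodes(t):
    return 1 + (sum(count_nodes(s) for s in t.sons) if t.sons else 0)


far = PRNode(0, 0, 16)
for p in [(1, 1), (12, 12)]:
    far.insert(p)

close = PRNode(0, 0, 16)
for p in [(0, 0), (1, 1)]:
    close.insert(p)

print(count_nodes(far), count_nodes(close))
```

Both trees hold two points, but the far-apart pair is separated by one split (5 nodes) while the adjacent pair needs four levels of splitting (17 nodes).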

[Figure: quadtree contains only two data points but many nodes]