Representation of multidimensional point data
• How can we store multidimensional data, such as points in the plane, so that it also allows fast retrieval, updates and queries?
• Quadtrees
• k-d trees
• Range trees and priority search trees

Queries to point databases
• Point query - determine whether a specific point is in the database (rarely used)
• Range query - identify the set of data points whose coordinate values fall within a certain range, e.g., all points within distance k of a given point, all points within a rectangular region, etc.
• Nearest neighbor query - find the nearest neighbor(s) of a given point, regardless of how far away they are

Range searching
• Find all records whose key values fall between two limiting values
• Easy to do for one-dimensional data using binary search trees
• Let L and U specify the lower and upper bounds for the search
• Start at the root of the BST, holding key K:
  1) If L <= K <= U, include K in the answer set
  2) If L <= K, search the left subtree
  3) If K <= U, search the right subtree

Range searching
procedure BSTRangeSearch (L, U, P, Op);
{Perform Op on each node, n, in the BST pointed at by P if L <= key(n) <= U}
  If P = nil return;
  K <-- Key(P);
  If L <= K <= U then Op(P);
  If L <= K then BSTRangeSearch(L, U, Left(P), Op);
  If K <= U then BSTRangeSearch(L, U, Right(P), Op)
[Figure: example BST with letter keys; range search from e to p]

Sets of points
• The goal is to develop data structures for representing sets of points in the plane, S = {(x_i, y_i)}.
• Simplest solution for bounded sets: an array of {0,1} values, with 1's at the locations of the points in S.
• Two parameters determine the size of the array:
  1) Range of the x and y coordinates. For example, we could have 0 <= x <= 1 and 0 <= y <= 1.
  2) Precision of the x and y coordinates.
  For example, x and y can be represented as 6-bit binary fractions .b1b2b3b4b5b6
• Combining the range and the precision determines the maximum number of points in any set drawn from that range at that precision, which is the size of the array.

Sets of points
• For example, if 0 <= x < 100 and 0 <= y < 100 and the precision of a coordinate is 1.0, then there are at most 100 × 100 = 10,000 distinct points.
• Insertion and deletion using this data structure are trivial
• But answering queries takes time proportional to the size of the array, rather than to the number of points stored
• And typically points are “sparse” - think restaurants, banks, …

Trees and tries (again)
• Suppose we want a data structure that stores one-dimensional points (locations on a line segment) for fast lookup.
• Let the points be 6-bit numbers (think of dividing the line segment into 64 bins).
• Consider the points 100010 (34), 110010 (50), 001001 (9) and 011100 (28), and insert them, in that order, into a BST and into a trie (for the trie, the final structure is independent of the order in which the points are inserted).
[Figures: the BST and the trie after each insertion, drawn over the interval 0 to 64; the final BST has 34 at the root, 9 and 50 as its sons, and 28 as the right son of 9]

BST and tries break up the search key space in different ways
• The BST breaks up space based on the VALUES of the keys, so the root can split space into an arbitrary pair of intervals.
• The trie breaks up space by regular decomposition of the interval: it splits a subinterval in half (bit 0 = lower half, bit 1 = upper half) whenever a newly inserted point must be separated from a point already there.
• We will use both of these methods for two-dimensional points.
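To make the range-search procedure and the BST/trie contrast concrete, here is a minimal Python sketch; it is an illustration, not code from these slides. The names `BSTNode`, `bst_insert`, `bst_range_search`, `TrieNode`, `trie_insert` and `trie_leaves` are hypothetical. `bst_range_search` follows BSTRangeSearch step by step. The trie assumes 6-bit integer keys and one reasonable reading of "regular decomposition": a leaf holds a single point and is halved only when a second point lands in it.

```python
from typing import Callable, List, Optional, Tuple

BITS = 6  # assumption: 6-bit keys, i.e. the segment is divided into 64 bins


class BSTNode:
    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["BSTNode"] = None
        self.right: Optional["BSTNode"] = None


def bst_insert(root: Optional[BSTNode], key: int) -> BSTNode:
    """Insert key: smaller keys go left, larger or equal keys go right."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    else:
        root.right = bst_insert(root.right, key)
    return root


def bst_range_search(lo: int, hi: int, p: Optional[BSTNode],
                     op: Callable[[BSTNode], None]) -> None:
    """Apply op to every node n with lo <= key(n) <= hi (BSTRangeSearch)."""
    if p is None:
        return
    k = p.key
    if lo <= k <= hi:
        op(p)
    if lo <= k:  # the left subtree may hold keys in [lo, k)
        bst_range_search(lo, hi, p.left, op)
    if k <= hi:  # the right subtree may hold keys in [k, hi]
        bst_range_search(lo, hi, p.right, op)


class TrieNode:
    def __init__(self) -> None:
        self.point: Optional[int] = None  # set only on a leaf holding a point
        self.children: Optional[List["TrieNode"]] = None  # [lower, upper] half


def trie_insert(node: TrieNode, key: int, depth: int = 0) -> None:
    """Insert key, halving an interval only when two points must be separated."""
    if node.children is None:
        if node.point is None or node.point == key:
            node.point = key
            return
        old, node.point = node.point, None  # split: push the old point down
        node.children = [TrieNode(), TrieNode()]
        bit = (old >> (BITS - 1 - depth)) & 1
        trie_insert(node.children[bit], old, depth + 1)
    bit = (key >> (BITS - 1 - depth)) & 1  # 0 = lower half, 1 = upper half
    trie_insert(node.children[bit], key, depth + 1)


def trie_leaves(node: TrieNode, prefix: str = "") -> List[Tuple[str, int]]:
    """List (bit prefix of the leaf's interval, point) pairs, left to right."""
    if node.children is None:
        return [] if node.point is None else [(prefix, node.point)]
    return (trie_leaves(node.children[0], prefix + "0")
            + trie_leaves(node.children[1], prefix + "1"))


root: Optional[BSTNode] = None
for k in (34, 50, 9, 28):
    root = bst_insert(root, k)
found: List[int] = []
bst_range_search(10, 40, root, lambda n: found.append(n.key))
print(sorted(found))  # [28, 34]

t1, t2 = TrieNode(), TrieNode()
for k in (34, 50, 9, 28):
    trie_insert(t1, k)
for k in (28, 9, 50, 34):
    trie_insert(t2, k)
print(trie_leaves(t1) == trie_leaves(t2))  # True: order does not matter
```

The BST's shape depends on the insertion order, while the trie's leaves are fixed by the bits of the keys, which is why the two orders above produce the same trie.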
Quadtrees
• Point quadtree - like a binary search tree, space is decomposed at the values of the coordinates.
• Each data point is represented as a node that contains seven fields:
  • four pointers to the sons (NW, NE, SW, SE)
  • two fields for the coordinates (x, y)
  • one field pointing to a record containing information about the point (name)

Example data for the quadtree
Name      X   Y
Chicago   35  40
Mobile    50  10
Toronto   60  75
Buffalo   80  65
Denver     5  45
Omaha     25  35
Atlanta   85  15
Miami     90   5

Point quadtree
[Figure: the point quadtree for these cities and the corresponding subdivision of the plane]

Insertion into a point quadtree
• Insertion is similar to insertion into a binary search tree
• Compare the (x,y) coordinates of the new point to those of the quadtree nodes, branching until a nil pointer is encountered

procedure which_quadrant (newp, quadp);
  return (if x(newp) < x(quadp) then {West}
            if y(newp) < y(quadp) then SW else NW
          else {East}
            if y(newp) < y(quadp) then SE else NE)

• This procedure is used to choose the appropriate son as we descend through the point quadtree

Example
[Figures: the quadtree after inserting, in turn, Chicago (35,40), Mobile (50,10), Toronto (60,75) and Buffalo (80,65)]

Proximity search
• It is all about pruning: ruling out entire subtrees that cannot possibly contain any points that fall in the query region
• In the BST case, if we are visiting a node N holding key k, our search interval is [l,u], and k < l, we can prune the left subtree, because by construction every key k' in the left subtree of N satisfies k' < k < l, so none of them can lie in the interval.
• In a point quadtree the pruning logic is slightly more complicated: it depends on which of the quadrants defined by the point at a node intersect the search region.
• So, if the search region lies completely within the NE quadrant, we only need to search the NE quadrant.
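The insertion procedure and the quadrant-based pruning described above can be sketched in Python as follows; this is a minimal illustration under stated assumptions, not the slides' own code. `QNode`, `quad_insert` and `quad_range` are hypothetical names. Ties follow which_quadrant (equal x goes East, equal y goes North), and the query region is assumed to be an axis-aligned rectangle [xlo, xhi] x [ylo, yhi].

```python
from typing import Dict, List, Optional


class QNode:
    """Point quadtree node: coordinates, a name, and four son pointers."""
    def __init__(self, name: str, x: float, y: float) -> None:
        self.name, self.x, self.y = name, x, y
        self.sons: Dict[str, Optional["QNode"]] = {
            "NW": None, "NE": None, "SW": None, "SE": None}


def which_quadrant(x: float, y: float, q: QNode) -> str:
    """Same tests as which_quadrant: ties on x go East, ties on y go North."""
    if x < q.x:
        return "SW" if y < q.y else "NW"
    return "SE" if y < q.y else "NE"


def quad_insert(root: Optional[QNode], name: str, x: float, y: float) -> QNode:
    """Descend with which_quadrant until a nil son pointer, then attach."""
    new = QNode(name, x, y)
    if root is None:
        return new
    q = root
    while True:
        quad = which_quadrant(x, y, q)
        son = q.sons[quad]
        if son is None:
            q.sons[quad] = new
            return root
        q = son


def quad_range(q: Optional[QNode], xlo: float, xhi: float,
               ylo: float, yhi: float, out: List[str]) -> List[str]:
    """Report names in [xlo, xhi] x [ylo, yhi], skipping every quadrant
    that cannot intersect the rectangle."""
    if q is None:
        return out
    if xlo <= q.x <= xhi and ylo <= q.y <= yhi:
        out.append(q.name)
    west, east = xlo < q.x, xhi >= q.x    # West sons hold x < q.x
    south, north = ylo < q.y, yhi >= q.y  # South sons hold y < q.y
    for quad, needed in (("NW", west and north), ("NE", east and north),
                         ("SW", west and south), ("SE", east and south)):
        if needed:
            quad_range(q.sons[quad], xlo, xhi, ylo, yhi, out)
    return out


CITIES = [("Chicago", 35, 40), ("Mobile", 50, 10), ("Toronto", 60, 75),
          ("Buffalo", 80, 65), ("Denver", 5, 45), ("Omaha", 25, 35),
          ("Atlanta", 85, 15), ("Miami", 90, 5)]

root: Optional[QNode] = None
for name, x, y in CITIES:
    root = quad_insert(root, name, x, y)
print(quad_range(root, 70, 95, 0, 20, []))  # ['Atlanta', 'Miami']
```

For the rectangle 70 <= x <= 95, 0 <= y <= 20, only Chicago's SE son (Mobile) is visited, and from Mobile only its NE and SE sons: the rest of the tree is pruned.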
Example
• The regions around the query rectangle are numbered 1 2 3 across the top, 4, 9 (the rectangle itself) and 5 across the middle, and 6 7 8 across the bottom.
• If the quadtree node falls in region i relative to the query rectangle, search the quadrants listed:
  1. SE
  2. SE, SW
  3. SW
  4. SE, NE
  5. SW, NW
  6. NE
  7. NE, NW
  8. NW
  9. All
[Figure: a query rectangle among Chicago (35,40), Mobile (50,10), Toronto (60,75) and Buffalo (80,65), with the surrounding regions numbered 1 to 9]

Proximity search
• Find all points in a point quadtree within distance r of a given point, A.
• The quadtree allows us to rule out large parts of the data structure from the search.

Which quadrants to search based on the location of the root of the quadtree
• Regions 1-8 surround the bounding box of the circle of radius r (1 2 3 above, 4 and 5 beside, 6 7 8 below); regions 9-12 are the corners of that box lying outside the circle, and region 13 is the inside of the circle.
• If the quadtree node falls in region i relative to the query point, search the quadrants listed:
  1. SE
  2. SE, SW
  3. SW
  4. SE, NE
  5. SW, NW
  6. NE
  7. NE, NW
  8. NW
  9. All but NW (top-left corner)
  10. All but NE (top-right corner)
  11. All but SW (bottom-left corner)
  12. All but SE (bottom-right corner)
  13. All
[Figure: the 13 regions around the query point]

Search algorithm
• Let (u,v) be the coordinates of the point whose fixed-radius neighbors we are searching for.
• The algorithm constructs a list, V, of quadtree nodes to visit and a list, N, of fixed-radius neighbors.
• Put the root, R, onto V and initialize N to empty.
• If V is empty, halt; otherwise remove a node, T, from V. Let (x,y) be the coordinates of T.
• Compute the region, i, that (x,y) falls in relative to (u,v), using a mask.
• Case on i:
  • i = 1: add SE(T) to V
  • …
  • i = 13: add T to N; add SE(T), SW(T), NE(T), NW(T) to V
• Repeat.

Example - find all cities within 10 units of (83,10)
• Look only in the SE quadrant of the root (Chicago), since Chicago is in region 1 relative to the search point.
• Look at the SE and NE quadrants of Mobile (the SE son of Chicago), since Mobile is in region 4 relative to the search point.
• Atlanta (85,15) and Miami (90,5), found in those quadrants, both fall in region 13 (inside the circle), so they are the cities reported.
[Figure: the cities, the circle of radius r around the search point, and the 13 regions]

Example
1. Chicago is in region 4: visit Toronto and Mobile
2. Mobile is in region 6: visit Atlanta
3. Toronto is in region 2: visit Buffalo
[Figure: the query region and the cities, with regions 1 to 9 numbered around it]
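The fixed-radius search algorithm above can be sketched in Python as follows; this is a minimal illustration, not the slides' implementation. `QNode`, `quad_insert`, `SEARCH`, `region` and `fixed_radius_neighbors` are hypothetical names. The sketch assumes the region numbering described above (regions 9 to 12 as the corners of the circle's bounding box that lie outside the circle), a closed disk (distance <= r counts as a neighbor), and the which_quadrant tie rule. Instead of a bit mask, it computes the region with plain comparisons.

```python
from typing import Dict, List, Optional, Tuple


class QNode:
    def __init__(self, name: str, x: float, y: float) -> None:
        self.name, self.x, self.y = name, x, y
        self.sons: Dict[str, Optional["QNode"]] = {
            "NW": None, "NE": None, "SW": None, "SE": None}


def quad_insert(root: Optional[QNode], name: str, x: float, y: float) -> QNode:
    """Point quadtree insertion; ties on x go East, ties on y go North."""
    new = QNode(name, x, y)
    if root is None:
        return new
    q = root
    while True:
        quad = ("S" if y < q.y else "N") + ("W" if x < q.x else "E")
        son = q.sons[quad]
        if son is None:
            q.sons[quad] = new
            return root
        q = son


# Quadrants of T to search when T lies in region i (the table of 13 regions).
SEARCH: Dict[int, Tuple[str, ...]] = {
    1: ("SE",), 2: ("SE", "SW"), 3: ("SW",),
    4: ("SE", "NE"), 5: ("SW", "NW"),
    6: ("NE",), 7: ("NE", "NW"), 8: ("NW",),
    9: ("NE", "SW", "SE"), 10: ("NW", "SW", "SE"),
    11: ("NW", "NE", "SE"), 12: ("NW", "NE", "SW"),
    13: ("NW", "NE", "SW", "SE"),
}


def region(x: float, y: float, u: float, v: float, r: float) -> int:
    """Region (1-13) that node (x, y) falls in relative to query point (u, v)."""
    col = 0 if x < u - r else (2 if x > u + r else 1)
    row = 0 if y > v + r else (2 if y < v - r else 1)
    if row == 0:
        return (1, 2, 3)[col]
    if row == 2:
        return (6, 7, 8)[col]
    if col != 1:
        return 4 if col == 0 else 5
    if (x - u) ** 2 + (y - v) ** 2 <= r * r:
        return 13  # inside the circle
    if y > v:  # a corner of the bounding box, outside the circle
        return 9 if x < u else 10
    return 11 if x < u else 12


def fixed_radius_neighbors(root: Optional[QNode], u: float, v: float,
                           r: float) -> List[str]:
    visit: List[QNode] = [root] if root is not None else []  # the list V
    found: List[str] = []                                     # the list N
    while visit:
        t = visit.pop()
        i = region(t.x, t.y, u, v, r)
        if i == 13:
            found.append(t.name)
        for quad in SEARCH[i]:
            son = t.sons[quad]
            if son is not None:
                visit.append(son)
    return found


CITIES = [("Chicago", 35, 40), ("Mobile", 50, 10), ("Toronto", 60, 75),
          ("Buffalo", 80, 65), ("Denver", 5, 45), ("Omaha", 25, 35),
          ("Atlanta", 85, 15), ("Miami", 90, 5)]

root: Optional[QNode] = None
for name, x, y in CITIES:
    root = quad_insert(root, name, x, y)
print(sorted(fixed_radius_neighbors(root, 83, 10, 10)))  # ['Atlanta', 'Miami']
```

The corner rules (9 to 12) are safe because a node in, say, the top-left corner outside the circle is already farther than r from (u,v), and every point in its NW quadrant is farther still in both coordinates.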
Deletion from point quadtrees
• Difficult to achieve efficiently
  • because deleting a node changes the partition lines for all of the descendants of that node
• Simple approach: when deleting a node, remove the subtree rooted at the node and reinsert all of the other nodes in that subtree
• This is an expensive process, and it often results in a deeper tree, yielding longer search times
• Samet describes a better, but more complicated, algorithm

Point quadtree summary
• Lookup and insertion are straightforward extensions of binary search tree search
  • we just have to make a four-way rather than a two-way decision
• Deletion is much more complicated and expensive
• For higher-dimensional data, storage requirements can be large:
  • (x1) - binary search tree; each node has 2 pointers to children
  • (x1, x2) - 2D point quadtree; each node has 4 pointers
  • (x1, x2, x3) - 3D point quadtree; each node has 8 pointers
  • (x1, …, xn) - nD point quadtree; each node has 2^n pointers
• This problem can be overcome by storing only the non-null pointers at each node, but that makes the implementation more complex
  • e.g., use a linked list of (pointer name, pointer value) pairs for the non-null pointers

k-d trees - 2 pointers per node
• k is the dimensionality of the data being represented
• 2-d tree - representation for points in the plane
• A k-d tree is a binary tree:
  1) at each stage, a different coordinate of the key is used to determine the branching
  2) first branch on the x coordinate of a key (ties break left)
  3) then branch on the y coordinate (ties break down)
  4) then alternate between the x and y coordinates
  5) at a branch, the left son will have all keys smaller than the tested attribute, and the right son will have all keys greater than or equal to the tested attribute

k-d tree example
[Figures: the k-d tree after inserting Chicago (35,40), which partitions on x, and then Denver (5,45), which becomes Chicago's left son and partitions on y]
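A k-d tree insertion and rectangular range search can be sketched in Python as follows; this is a minimal illustration, not code from the slides. `KDNode`, `kd_insert` and `kd_range` are hypothetical names. The sketch follows rule 5 above (keys smaller than the tested coordinate go left, greater or equal go right) and tests coordinate depth mod k at each level, so a 2-d tree alternates x, y, x, … starting with x.

```python
from typing import List, Optional, Sequence, Tuple


class KDNode:
    def __init__(self, name: str, point: Tuple[float, ...]) -> None:
        self.name = name
        self.point = point
        self.left: Optional["KDNode"] = None   # keys < tested coordinate
        self.right: Optional["KDNode"] = None  # keys >= tested coordinate


def kd_insert(root: Optional[KDNode], name: str,
              point: Tuple[float, ...]) -> KDNode:
    """Insert a point, testing coordinate depth % k at each level."""
    new = KDNode(name, point)
    if root is None:
        return new
    k = len(point)
    node, depth = root, 0
    while True:
        axis = depth % k  # 0 = x, 1 = y, then alternate
        if point[axis] < node.point[axis]:
            if node.left is None:
                node.left = new
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = new
                return root
            node = node.right
        depth += 1


def kd_range(node: Optional[KDNode], lo: Sequence[float], hi: Sequence[float],
             depth: int = 0, out: Optional[List[str]] = None) -> List[str]:
    """Report names of points p with lo <= p <= hi in every coordinate."""
    if out is None:
        out = []
    if node is None:
        return out
    p = node.point
    if all(lo_c <= c <= hi_c for lo_c, c, hi_c in zip(lo, p, hi)):
        out.append(node.name)
    axis = depth % len(p)
    if lo[axis] < p[axis]:   # only then can the left subtree reach the box
        kd_range(node.left, lo, hi, depth + 1, out)
    if hi[axis] >= p[axis]:  # only then can the right subtree reach the box
        kd_range(node.right, lo, hi, depth + 1, out)
    return out


CITIES = [("Chicago", 35, 40), ("Mobile", 50, 10), ("Toronto", 60, 75),
          ("Buffalo", 80, 65), ("Denver", 5, 45), ("Omaha", 25, 35),
          ("Atlanta", 85, 15), ("Miami", 90, 5)]

root: Optional[KDNode] = None
for name, x, y in CITIES:
    root = kd_insert(root, name, (x, y))
print(root.name, root.left.name)                  # Chicago Denver
print(sorted(kd_range(root, (70, 0), (95, 20))))  # ['Atlanta', 'Miami']
```

Each node still has only two pointers regardless of k; the price is that a query must track which coordinate each level tests.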