The Kd-Tree

GIS-E1070 Theories and Techniques in GIS L
Lecture 8: Vector Data Indexing
Jussi Nikander

The contents of this lecture
• The different problems in vector data indexing
  • Point location
  • Range search
  • Window search
• The data structures for indexing the data
  • Trapezoidal map
  • Kd-tree
  • Segment tree
  • Partition tree

Learning goals of this lecture
• Understand the different point location and windowing problems
• Know how they differ from each other and how these differences affect the data structures required for efficient problem solving
• Recall the different data structures and their most important details

Literature for this lecture
• De Berg et al.: Computational Geometry: Algorithms and Applications
  • Chapters 5-6, 10, 16

The vector data indexing problems

Different kinds of vector data
• Recall that vector data consists of points, polylines and polygons
• A point is a 0-dimensional coordinate pair
• A polyline consists of two or more points
• A polygon is bounded by polyline rings (first and last point are identical)

Different kinds of vector data searches
• In a search problem, the goal is to find data elements that fulfill a given search criterion
• Different search criteria include
  • In a polygon network, find the polygon that contains a given point
    • E.g. which map element is the mouse cursor inside?
  • For a set of points, find the points that are inside a given rectangular or arbitrary search window
    • Which elements need to be shown on-screen?
    • Which elements are inside a given polygon?
  • For a set of polylines or polygons, find the elements that overlap a given search window
    • Again, rectangular and arbitrary windows are separate cases

Different kinds of vector data searches
Figure: search windows for a set of points
• In breakout rooms, consider the following questions
  • Why do computers need different approaches to all these problems?
  • Try to put into words what the fundamental differences are
  • How could computers attempt to solve these problems?
Figure: a given point in a polygon network
Figure: search window for polylines and polygons

The point location problem
• Consider a polygon network depicting municipalities
• Consider a situation where the municipalities are visualized on a computer
• How can we efficiently find which municipality the cursor is over?
• Or, more generally, given a polygon network, which polygon contains the point p?
• The naive solution is to make a point-in-polygon calculation for each polygon
  • There are n polygons and one test takes O(log n) time, giving O(n log n) in total
• Thus, an efficient data structure for storing the relevant information is required

The range search problem
• Problem: given a set of points in 2D, we want to do queries on points that are inside a given window [x1, y1, x2, y2]
• For point sets, this is basically a range query
  • Report points that are in the range [x1, x2] × [y1, y2]
  • All points are either completely inside or completely outside the range

Simple solution to range searching
• Again, we could check all elements to see whether they are inside the window
  • This would take O(n) time
  • Checking whether a point is inside a rectangle can be done in O(1)
• However, good search structures typically have search times comparable to O(log n)
• Therefore, again, we can do better
• In the one-dimensional case, we can just use a regular search tree
  • A balanced BST or its variants, for example
• When looking through the nodes, make recursive calls to all subtrees that might contain nodes belonging to the area
  • Each recursive call divides the area into two halves

The window search problem
• Problem: select all map elements that intersect a given window [x1, y1, x2, y2]
• Now, map elements may contain points, lines and polygons
• Therefore, a more sophisticated approach is required than with point range queries
• Even more complicated approaches are required if the windows are not axis-parallel or rectangular
• Consider in what functionally different ways lines or polygons can overlap a rectangular search area
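The two simple approaches to range searching on points mentioned above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the window is treated as closed, and a sorted list with binary search stands in for the balanced BST named on the slide.

```python
from bisect import bisect_left, bisect_right


def in_window(p, x1, y1, x2, y2):
    """O(1) test: is point p = (x, y) inside the closed window?"""
    return x1 <= p[0] <= x2 and y1 <= p[1] <= y2


def brute_force_range_search(points, x1, y1, x2, y2):
    """Check every point against the window: O(n) per query."""
    return [p for p in points if in_window(p, x1, y1, x2, y2)]


def range_search_1d(sorted_values, lo, hi):
    """1D range query on a sorted list (assumed stand-in for a balanced BST).

    Two binary searches locate the ends of the range, so a query costs
    O(log n) plus the time to copy out the k reported values.
    """
    i = bisect_left(sorted_values, lo)    # first index with value >= lo
    j = bisect_right(sorted_values, hi)   # first index with value > hi
    return sorted_values[i:j]


# Illustrative data only; these values are not taken from the lecture.
points = [(1, 1), (2, 5), (4, 3), (6, 2), (7, 7)]
print(brute_force_range_search(points, 2, 1, 6, 4))

values = [3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80]
print(range_search_1d(values, 18, 77))
```

The brute-force version touches every point, while the 1D version only touches the two search paths and the reported values. That gap is what the multidimensional structures aim to carry over to 2D.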
Segment endpoint: inside and outside cases
• From an algorithmic point of view, there does not need to be a distinction between polylines and polygons
  • A polygon can be considered a set of polylines
  • Polygons that contain the whole search window require some extra work
• There are two different types of segments that intersect the window
  • Those that have at least one endpoint inside the window
  • Those that have both endpoints outside
• For segments that have an endpoint inside the window, we can use a range query
  • Kd-trees or range trees
• For segments that have both endpoints outside the window, we need a different structure
  • Capable of answering interval queries
  • Which line segments intersect the line segment [x, x] × [y1, y2] or [x1, x2] × [y, y]?
• We assume that the line segments do not intersect each other
  • Data stored in, for example, a DCEL

Windowing with polygon windows
• We have now covered
  • Windowing using rectangular windows
  • Both for point data (kd-trees and range trees) and line data (segment trees)
• Now, we generalize the situation to using polygonal windows with point data
  • Line data can be stored using a modification of this method

The point location problem
• Consider a polygon network depicting municipalities
• Consider a situation where the municipalities are visualized on a computer
• How can we efficiently find which municipality the cursor is over?
• Or, more generally, given a polygon network, which polygon contains the point p?
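As a baseline for the point location problem, the naive approach of testing the point against every polygon might look like the Python sketch below. The ray-casting test is a common choice but is an assumption here, since the lecture does not say which point-in-polygon test it has in mind; the function names and the dictionary of named rings are hypothetical. Note that this particular test visits every edge of a polygon.

```python
def point_in_polygon(p, ring):
    """Ray-casting test. ring is a list of (x, y) vertices; it may or may
    not repeat the first vertex at the end."""
    x, y = p
    inside = False
    n = len(ring)
    for i in range(n):
        xa, ya = ring[i]
        xb, yb = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through p?
        if (ya > y) != (yb > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = xa + (y - ya) * (xb - xa) / (yb - ya)
            if x < x_cross:
                inside = not inside  # ray to the right crosses this edge
    return inside


def locate_naive(p, polygons):
    """Return the name of the first polygon containing p, or None.

    polygons maps a name to its outer ring (holes are ignored here)."""
    for name, ring in polygons.items():
        if point_in_polygon(p, ring):
            return name
    return None


# Two adjacent square "municipalities" (illustrative data only).
municipalities = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2), (2, 0)],
}
print(locate_naive((1, 1), municipalities))
print(locate_naive((3, 1), municipalities))
print(locate_naive((5, 1), municipalities))
```

Every query walks through all polygons until one matches, which is the cost a dedicated search structure is meant to avoid.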
Searchable subdivisions
• A polygon network is a subdivision of an area
• Unfortunately, since polygons are arbitrary, it is not a very good subdivision for searching
• For searching, dividing the polygons into trapezoids works better
• Now it is possible to do comparisons such as "is the point to the left/right of, or above/below, a certain line?"
Image source: De Berg et al.: Computational Geometry

Search structure using trapezoidal map
• A trapezoidal map consists of vertical lines (created for the map) and non-vertical line segments (part of the original polygon network)
• For searching, these edges are arranged into a search structure
• The structure is "tree-like" and consists of inner nodes and leaf nodes
  • Leaf nodes represent trapezoids
• Each inner node represents a vertical line or a non-vertical line segment
• Each inner node has two child nodes
  • Vertical lines divide the area into parts on the left and right (x-nodes)
  • Non-vertical line segments divide the area into parts above and below (y-nodes)
  • Whenever a y-node is encountered, the corresponding line segment spans the x-values possible in that subset of the search space

Search structure using trapezoidal map
• There is a root node from which all searches start
• There may be several paths from the root node to a leaf node
• Thus, the resulting structure is a rooted, directed, acyclic graph
• In x-nodes the comparison is left/right (white)
• In y-nodes the comparison is above/below (gray)
Image source: De Berg et al.: Computational Geometry

Building a trapezoidal map
• A trapezoidal map works by limiting the search area using the x- and y-axes
• Each step should divide the remaining search area approximately in half
• Thus, the map must be structured so that each node divides the remaining area approximately in half
• Creating a structure that guarantees this is hard
• Creating a structure that practically always exhibits such behavior can be done through randomization
  • Line segments are added to the trapezoidal map in random order
  • After each line segment insertion the structure is updated to be a valid trapezoidal map
• The resulting structure has expected O(n) space and expected O(log n) search time

Line segment addition examples
Image source: De Berg et al.: Computational Geometry

Trapezoidal map search example
Figure sequence: a search traced step by step through the trapezoidal map
• The actual search path depends on
  • The structure of the trapezoidal map
  • The point location
• Thus, over all the possible trapezoidal map implementations for any polygon network, the number of possible search paths is very large
• The paths in any specific implementation are well-defined

Range searching

The range search problem
• Problem: given a set of points in 2D, we want to do queries on points that are inside a given window [x1, y1, x2, y2]
• For point sets, this is basically a range query
  • Report points that are in the range [x1, x2] × [y1, y2]
  • All points are either completely inside or completely outside the range

Recursive splitting of search area
• The 1-dimensional example shows how the search area is continually split into smaller pieces
• A similar approach can be applied in higher dimensions: at each node, the search area is split into two halves on one axis
• Different axes can be used on different levels in the tree
Figure: range search for the range [18, 77]. Light grey nodes are on the search path; dark grey nodes are known to be inside the range without having to check each element separately
Image source: De Berg et al.: Computational Geometry

Recursive splitting example

The kd-tree
• Recursive splitting can be implemented using a structure called a kd-tree
• A kd-tree is a multidimensional generalization of a binary search tree
  • Technically a 2d-tree, since the k stands for the number of dimensions
• At each level of the tree, elements are split into two groups according to one of the axes
• All data elements are in leaf nodes
• Internal nodes contain

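The recursive splitting idea behind the kd-tree can be illustrated with the minimal Python sketch below. It rests on several assumptions that do not come from the slides: each internal node stores only a split axis and a split coordinate, the axes alternate x, y, x, ... by depth, the split is at the median, and the names `Leaf`, `Inner`, `build` and `range_query` are hypothetical. For brevity the query only prunes subtrees that cannot overlap the window; it does not report whole subtrees whose region lies fully inside the window.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

Point = Tuple[float, float]


@dataclass
class Leaf:
    point: Point          # data elements live only in leaves


@dataclass
class Inner:
    axis: int             # 0 = split on x, 1 = split on y
    split: float          # left subtree: coordinate <= split
    left: "Node"
    right: "Node"         # right subtree: coordinate >= split


Node = Union[Leaf, Inner]


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Build a 2D kd-tree by median splits, alternating the axis per level."""
    if not points:
        return None
    if len(points) == 1:
        return Leaf(points[0])
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) - 1) // 2              # both halves are non-empty
    return Inner(axis, pts[mid][axis],
                 build(pts[:mid + 1], depth + 1),
                 build(pts[mid + 1:], depth + 1))


def range_query(node: Optional[Node], window, out=None) -> List[Point]:
    """Report all points inside the closed window (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = window
    if out is None:
        out = []
    if node is None:
        return out
    if isinstance(node, Leaf):
        x, y = node.point
        if x1 <= x <= x2 and y1 <= y <= y2:
            out.append(node.point)
        return out
    lo, hi = (x1, x2) if node.axis == 0 else (y1, y2)
    if lo <= node.split:                   # window reaches into the left part
        range_query(node.left, window, out)
    if hi >= node.split:                   # window reaches into the right part
        range_query(node.right, window, out)
    return out


# Illustrative data only.
tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(sorted(range_query(tree, (3, 1, 8, 5))))
```

Re-sorting at every level keeps the sketch short at the cost of a slower build; presorting the points once per axis is the usual way to avoid that.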