Range & windowing queries 1/180
Geometric Algorithms
Range & windowing queries (2 lectures) Database queries 2/180
G. Ometer born: Aug 16, 1954 salary salary: $3,500
A database query may ask for all employees with age between a1 and a2, and salary between s1 and s2
19,500,000 19,559,999 date of birth Data structures 3/180
Idea of data structures
I Representation of structure, for convenience (like DCEL)
I Preprocessing of data, to be able to solve future questions really fast (sub-linear time) A (search) data structure has a storage requirement, a query time, and a construction time (and an update time) 1D range query problem 4/180
1D range query problem: Preprocess a set of n points on the real line such that the ones inside a 1D query range (interval) can be answered fast
The points p1,..., pn are known beforehand, the query [x,x0] only later
A solution to a query problem is a data structure, a query algorithm, and a construction algorithm
Question: What are the most important factors for the efficiency of a solution? Balanced binary search trees 5/180
A balanced binary search tree with the points in the leaves
49
23 80 10 37 62 89
3 19 30 49 59 70 89 93
3 10 19 23 30 37 59 62 70 80 93 97 Balanced binary search trees 6/180
The search path for 25
49
23 80 10 37 62 89
3 19 30 49 59 70 89 93
3 10 19 23 30 37 59 62 70 80 93 97 Balanced binary search trees 7/180
The search paths for 25 and for 90
49
23 80 10 37 62 89
3 19 30 49 59 70 89 93
3 10 19 23 30 37 59 62 70 80 93 97 Example 1D range query 8/180
A 1-dimensional range query with [25, 90]
49
23 80 10 37 62 89
3 19 30 49 59 70 89 93
3 10 19 23 30 37 59 62 70 80 93 97 Node types for a query 9/180
Three types of nodes for a given query:
I White nodes: never visited by the query
I Grey nodes: visited by the query, unclear if they lead to output
I Black nodes: visited by the query, whole subtree is output
Question: What query time do we hope for? Node types for a query 10/180
The query algorithm comes down to what we do at each type of node
Grey nodes: use query range to decide how to proceed: to not visit a subtree (pruning), to report a complete subtree, or just continue
Black nodes: traverse and enumerate all points in the leaves Example 1D range query 11/180
A 1-dimensional range query with [61, 90]
49
23 80 10 37 62 89
3 19 30 49 59 70 89 93
3 10 19 23 30 37 59 62 70 80 93 97 Example 1D range query 12/180
A 1-dimensional range query with [61, 90]
49 split node 23 80 10 37 62 89
3 19 30 49 59 70 89 93
3 10 19 23 30 37 59 62 70 80 93 97 1D range query algorithm 13/180
Algorithm 1DRANGEQUERY(T,[x : x0]) 1. νsplit FINDSPLITNODE(T,x,x ) ← 0 2. if νsplit is a leaf 3. then Check if the point in νsplit must be reported. 4. else ν lc(νsplit) ← 5. while ν is not a leaf 6. do if x xν ≤ 7. then REPORTSUBTREE(rc(ν)) 8. ν lc(ν) ← 9. else ν rc(ν) ← 10. Check if the point stored in ν must be reported. 11. Similarly, follow the path to x0, and ... Query time analysis 14/180
The efficiency analysis is based on counting the numbers of nodes visited for each type
I White nodes: never visited by the query; no time spent
I Grey nodes: visited by the query, unclear if they lead to output; time determines dependency on n
I Black nodes: visited by the query, whole subtree is output; time determines dependency on k, the output size Query time analysis 15/180
Grey nodes: they occur on only two paths in the tree, and since the tree is balanced, its depth is O(logn)
Black nodes: a (sub)tree with m leaves has m 1 internal − nodes; traversal visits O(m) nodes and finds m points for the output
The time spent at each node is O(1) O(logn + k) query time ⇒ Storage requirement and preprocessing 16/180
A (balanced) binary search tree storing n points uses O(n) storage
A balanced binary search tree storing n points can be built in O(n) time after sorting Result 17/180
Theorem: A set of n points on the real line can be preprocessed in O(nlogn) time into a data structure of O(n) size so that any 1D range query can be answered in O(logn + k) time, where k is the number of answers reported quadtrees: good in practice, but not so good worst-case query time
Kd-trees: queries take O(√n + k) (Chapter 5.2) range trees: today
Range queries in 2D 18/180 Range queries in 2D 18/180
quadtrees: good in practice, but not so good worst-case query time
Kd-trees: queries take O(√n + k) (Chapter 5.2) range trees: today Back to 1D range queries 19/180
A 1-dimensional range query with [61, 90]
49 split node 23 80 10 37 62 89
3 19 30 49 59 70 89 93
3 10 19 23 30 37 59 62 70 80 93 97 Examining 1D range queries 20/180
Observation: Ignoring the search path leaves, all answers are jointly represented by the highest nodes strictly between the two search paths
Question: How many highest nodes between the search paths can there be? Examining 1D range queries 21/180
For any 1D range query, we can identify O(logn) nodes that together represent all answers to a 1D range query Toward 2D range queries 22/180
For any 2d range query, we can identify O(logn) nodes that together represent all points that have a correct first coordinate Toward 2D range queries 23/180
(1, 5) (3, 8) (4, 2) (5, 9) (6, 7) (7, 3) (8, 1) (9, 4) Toward 2D range queries 24/180
(1, 5) (3, 8) (4, 2) (5, 9) (6, 7) (7, 3) (8, 1) (9, 4) Toward 2D range queries 25/180
data structure for searching on y-coordinate
(1, 5) (3, 8) (4, 2) (5, 9) (6, 7) (7, 3) (8, 1) (9, 4) Toward 2D range queries 26/180
(5, 9) (3, 8) (6, 7) (1, 5) (9, 4) (7, 3) (4, 2) (8, 1)
(1, 5) (3, 8) (4, 2) (5, 9) (6, 7) (7, 3) (8, 1) (9, 4) 2D range trees 27/180
Every internal node stores a whole tree in an associated structure, on y-coordinate
Question: How much storage does this take? Storage of 2D range trees 28/180
To analyze storage, two arguments can be used:
I By level: On each level, any point is stored exactly once. So all associated trees on one level together have O(n) size
I By point: For any point, it is stored in the associated structures of its search path. So it is stored in O(logn) of them Construction algorithm 29/180
Algorithm BUILD2DRANGETREE(P) 1. Construct the associated structure: Build a binary search tree Tassoc on the set Py of y-coordinates in P 2. if P contains only one point 3. then Create a leaf ν storing this point, and make Tassoc the associated structure of ν. 4. else Split P into Pleft and Pright, the subsets and > ≤ the median x-coordinate xmid 5. νleft BUILD2DRANGETREE(Pleft) ← 6. νright BUILD2DRANGETREE(Pright) ← 7. Create a node ν storing xmid, make νleft the left child of ν, make νright the right child of ν, and make Tassoc the associated structure of ν 8. return ν Efficiency of construction 30/180
The construction algorithm takes O(nlog2 n) time
T(1) = O(1)
T(n) = 2 T(n/2) + O(nlogn) ·
which solves to O(nlog2 n) time Efficiency of construction 31/180
Suppose we pre-sort P on y-coordinate, and whenever we split P into Pleft and Pright, we keep the y-order in both subsets
For a sorted set, the associated structure can be built in linear time Efficiency of construction 32/180
The adapted construction algorithm takes O(nlogn) time
T(1) = O(1)
T(n) = 2 T(n/2) + O(n) ·
which solves to O(nlogn) time 2D range queries 33/180
How are queries performed and why are they correct?
I Are we sure that each answer is found?
I Are we sure that the same point is found only once? 2D range queries 34/180
ν p
p
µ µ0 p p Query algorithm 35/180
Algorithm 2DRANGEQUERY(T,[x : x ] [y : y ]) 0 × 0 1. νsplit FINDSPLITNODE(T,x,x ) ← 0 2. if νsplit is a leaf 3. then report the point stored at νsplit, if an answer 4. else ν lc(νsplit) ← 5. while ν is not a leaf 6. do if x xν ≤ 7. then 1DRANGEQ(Tassoc(rc(ν)),[y : y0]) 8. ν lc(ν) ← 9. else ν rc(ν) ← 10. Check if the point stored at ν must be reported. 11. Similarly, follow the path from rc(νsplit) to x0 ... 2D range query time 36/180
Question: How much time does a 2D range query take?
Subquestions: In how many associated structures do we search? How much time does each such search take? 2D range queries 37/180
ν
µ µ0 2D range query efficiency 38/180
We search in O(logn) associated structures to perform a 1D range query; at most two per level of the main tree
The query time is O(logn) O(logm + k ), or × 0
∑O(lognν + kν ) ν
where ∑kν = k the number of points reported 2D range query efficiency 39/180
Use the concept of grey and black nodes again: 2D range query efficiency 40/180
The number of grey nodes is O(log2 n)
The number of black nodes is O(k) if k points are reported
The query time is O(log2 n + k), where k is the size of the output Result 41/180
Theorem: A set of n points in the plane can be preprocessed in O(nlogn) time into a data structure of O(nlogn) size so that any 2D range query can be answered in O(log2 n + k) time, where k is the number of answers reported
In contrast, a kd-tree has O(n) size and answers queries in O(√n + k) time (Chapter 5.2). Higher dimensional range trees 42/180
A d-dimensional range tree has a main tree which is a one-dimensional balanced binary search tree on the first coordinate, where every node has a pointer to an associated structure that is a (d 1)-dimensional range tree on− the other coordinates Storage 43/180
The size Sd(n) of a d-dimensional range tree satisfies:
S1(n) = O(n) for all n
Sd(1) = O(1) for all d
Sd(n) 2 Sd(n/2) + Sd 1(n) for d 2 ≤ · − ≥
d This solves to Sd(n) = O(nlog n) Query time 44/180
The number of grey nodes Gd(n) satisfies:
G1(n) = O(logn) for all n
Gd(1) = O(1) for all d
Gd(n) 2 logn + 2 logn Gd 1(n) for d 2 ≤ · · · − ≥
d This solves to Gd(n) = O(log n) Result 45/180
Theorem: A set of n points in d-dimensional space can be d 1 preprocessed in O(nlog − n) time into a data structure of d 1 O(nlog − n) size so that any d-dimensional range query can be answered in O(logd n + k) time, where k is the number of answers reported Improving the query time 46/180
We can improve the query time of a 2D range tree from O(log2 n) to O(logn) by a technique called fractional cascading
This automatically lowers the query time in d dimensions to d 1 O(log − n) time Improving the query time 47/180
The idea illustrated best by a different query problem:
Suppose that we have a collection of sets S1,...,Sm, where S1 = n and where Si 1 Si | | + ⊆ We want a data structure that can report for a query number x, the smallest value x in all sets S1,...,Sm ≥ Improving the query time 48/180
1 2 3 5 8 13 21 34 55 S1
1 3 5 8 13 21 34 55 S2
1 3 13 21 34 55 S3
3 34 55 S4 Improving the query time 49/180
1 2 3 5 8 13 21 34 55 S1
1 3 5 8 13 21 34 55 S2
1 3 13 21 34 55 S3
3 34 55 S4 Improving the query time 50/180
1 2 3 5 8 13 21 34 55 S1
1 3 5 8 13 21 34 55 S2
1 3 13 21 34 55 S3
3 34 55 S4 Improving the query time 51/180
Suppose that we have a collection of sets S1,...,Sm, where S1 = n and where Si 1 Si | | + ⊆ We want a data structure that can report for a query number x, the smallest value x in all sets S1,...,Sm ≥ This query problem can be solved in O(logn +m) time instead of O(m logn) time · Improving the query time 52/180
Can we do something similar for m 1-dimensional range queries on m sets S1,...,Sm?
We hope to get a query time of O(logn + m + k) with k the total number of points reported Improving the query time 53/180
1 2 3 5 8 13 21 34 55 S1
1 3 5 8 13 21 34 55 S2
1 3 13 21 34 55 S3
3 34 55 S4 Improving the query time 54/180
1 2 3 5 8 13 21 34 55 S1
1 3 5 8 13 21 34 55 S2
1 3 13 21 34 55 S3
3 34 55 S4 Improving the query time 55/180
[6,35]
1 2 3 5 8 13 21 34 55 S1
1 3 5 8 13 21 34 55 S2
1 3 13 21 34 55 S3
3 34 55 S4 Fractional cascading 56/180
Now we do “the same” on the associated structures of a 2-dimensional range tree
Note that in every associated structure, we search with the same values y and y0
I Replace all associated structure except for the root by a linked list
I For every list element (and leaf of the associated structure of the root), store two pointers to the appropriate list elements in the lists of the left child and of the right child Fractional cascading 57/180 Fractional cascading 58/180 Fractional cascading 59/180
17
8 52 5 15 33 58
2 7 12 17 21 41 58 67
2 5 7 8 12 15 21 33 41 52 67 93 (2, 19) (7, 10) (12, 3) (17, 62) (21, 49) (41, 95) (58, 59) (93, 70) (5, 80) (8, 37) (15, 99) (33, 30) (52, 23) (67, 89) Fractional cascading 60/180
3 10 19 23 30 37 49 59 62 70 80 89 95 99
3 10 19 37 62 80 99 23 30 49 59 70 89 95
10 19 37 80 3 62 99 23 30 49 95 59 70 89
19 80 10 37 3 99 62 30 49 23 95 59 70 89
19 80 10 37 3 99 49 30 95 23 89 70 Fractional cascading 61/180
[4, 58] [19, 65] 17 ×
8 52 5 15 33 58
2 7 12 17 21 41 58 67
2 5 7 8 12 15 21 33 41 52 67 93 (2, 19) (7, 10) (12, 3) (17, 62) (21, 49) (41, 95) (58, 59) (93, 70) (5, 80) (8, 37) (15, 99) (33, 30) (52, 23) (67, 89) Fractional cascading 62/180
3 10 19 23 30 37 49 59 62 70 80 89 95 99
3 10 19 37 62 80 99 23 30 49 59 70 89 95
10 19 37 80 3 62 99 23 30 49 95 59 70 89
19 80 10 37 3 99 62 30 49 23 95 59 70 89
19 80 10 37 3 99 49 30 95 23 89 70 Fractional cascading 63/180
3 10 19 23 30 37 49 59 62 70 80 89 95 99
3 10 19 37 62 80 99 23 30 49 59 70 89 95
10 19 37 80 3 62 99 23 30 49 95 59 70 89
19 80 10 37 3 99 62 30 49 23 95 59 70 89
19 80 10 37 3 99 49 30 95 23 89 70 Fractional cascading 64/180
Instead of doing a 1D range query on the associated structure of some node ν, we find the leaf where the search to y would end in O(1) time via the direct pointer in the associated structure in the parent of ν
The number of grey nodes reduces to O(logn) Result 65/180
Theorem: A set of n points in d-dimensional space with d 2 d 1 ≥ can be preprocessed in O(nlog − n) time into a data d 1 structure of O(nlog − n) size so that any d-dimensional d 1 range query can be answered in O(log − n + k) time, where k is the number of answers reported.
Multiple points with the same x- or y-coordinate need to be handled with care. Windowing 66/180
Zoom in; re-center and zoom in; select by outlining Windowing 67/180 Windowing 68/180
Given a set of n axis-parallel line segments, preprocess them into a data structure so that the ones that intersect a query rectangle can be reported efficiently Windowing 69/180
How can a rectangle and an axis-parallel line segment intersect? Windowing 70/180
Essentially two types:
I Segments whose endpoint lies in the rectangle (or both endpoints)
I Segments with both endpoints outside the rectangle
Segments of the latter type always intersect the boundary of the rectangle (even the left and/or bottom side) Windowing 71/180
Instead of storing axis-parallel segments and searching with a rectangle, we will:
I store the segment endpoints and query with the rectangle
I store the segments and query with the left side and the bottom side of the rectangle
Note that the query problem is at least as hard as rectangular range searching in point sets Windowing 72/180
Instead of storing axis-parallel segments and searching with a rectangle, we will:
I store the segment endpoints and query with the rectangle
I store the segments and query with the left side and the bottom side of the rectangle
Question: How often might we report the same segment? Windowing 73/180
Instead of storing axis-parallel segments and searching with a rectangle, we will:
I store the segment endpoints and query with the rectangle use range tree
I store the segments and query with the left side and the bottom side of the rectangle need to develop data structure Windowing 74/180
Current problem of our interest: Given a set of horizontal (vertical) line segments, preprocess them into a data structure so that the ones intersecting a vertical (horizontal) query segment can be reported efficiently Windowing 75/180
Simpler query problem: What if the vertical query segment is a full line?
Then the problem is essentially 1-dimensional Interval querying 76/180
Given a set I of n intervals on the real line, preprocess them into a data structure so that the ones containing a query point (value) can be reported efficiently Splitting a set of intervals 77/180
The median x of the 2n endpoints partitions the intervals into three subsets:
I Intervals Ileft fully left of x
I Intervals Imid that contain (intersect) x
I Intervals Iright fully right of x
x Interval tree: recursive definition 78/180
The interval tree for I has a root node ν that contains x and
I the intervals Ileft are stored in the left subtree of ν
I the intervals Imid are stored with ν
I the intervals Iright are stored in the right subtree of ν
The left and right subtrees are proper interval trees for Ileft and Iright
How many intervals can be in Imid? How should we store Imid? Interval tree: left and right lists 79/180
How is Imid stored? x
Observe: If the query point is left of x, then only the left endpoint determines if an interval is an answer
Symmetrically: If the query point is right of x, then only the right endpoint determines if an interval is an answer Interval tree: left and right lists 80/180
x
Make a list Lleft using the left-to-right order of the left endpoints of Imid
Make a list Lright using the right-to-left order of the right endpoints of Imid
Store both lists as associated structures with ν Interval tree: example 81/180
s5, s6, s7 Lleft s7, s5, s6 Lright
s4, s3, s2 s4, s3, s2 s9, s10 s9, s10
s1 s8 s12, s11 s1 s8 s11, s12
s2 s6 s3 s7 s10 s12 s5
s1 s4 s8 s9 s11 Interval tree: storage 82/180
The main tree has O(n) nodes
The total length of all lists is 2n because each interval is stored exactly twice: in Lleft and Lright and only at one node
Consequently, the interval tree uses O(n) storage Interval querying 83/180
Algorithm QUERYINTERVALTREE(ν,qx) 1. if ν is not a leaf 2. then if qx < xmid(ν) 3. then Traverse list Lleft(ν), starting at the interval with the leftmost endpoint, reporting all the intervals that contain qx. Stop as soon as an interval does not contain qx. 4.Q UERYINTERVALTREE(lc(ν),qx) 5. else Traverse list Lright(ν), starting at the interval with the rightmost endpoint, reporting all the intervals that contain qx. Stop as soon as an interval does not contain qx. 6.Q UERYINTERVALTREE(rc(ν),qx) Interval tree: query example 84/180
s5, s6, s7 Lleft s7, s5, s6 Lright
s4, s3, s2 s4, s3, s2 s9, s10 s9, s10
s1 s8 s11, s12 s1 s8 s12, s11
s2 s6 s3 s7 s10 s12 s5
s1 s4 s8 s9 s11 Interval tree: query example 85/180
s5, s6, s7 Lleft s7, s5, s6 Lright
s4, s3, s2 s4, s3, s2 s9, s10 s9, s10
s1 s8 s11, s12 s1 s8 s12, s11
s2 s6 s3 s7 s10 s12 s5
s1 s4 s8 s9 s11 Interval tree: query example 86/180
s5, s6, s7 Lleft s7, s5, s6 Lright
s4, s3, s2 s4, s3, s2 s9, s10 s9, s10
s1 s8 s11, s12 s1 s8 s12, s11
s2 s6 s3 s7 s10 s12 s5
s1 s4 s8 s9 s11 Interval tree: query example 87/180
s5, s6, s7 Lleft s7, s5, s6 Lright
s4, s3, s2 s4, s3, s2 s9, s10 s9, s10
s1 s8 s11, s12 s1 s8 s12, s11
s2 s6 s3 s7 s10 s12 s5
s1 s4 s8 s9 s11 Interval tree: query example 88/180
s5, s6, s7 Lleft s7, s5, s6 Lright
s4, s3, s2 s4, s3, s2 s9, s10 s9, s10
s1 s8 s11, s12 s1 s8 s12, s11
s2 s6 s3 s7 s10 s12 s5
s1 s4 s8 s9 s11 Interval tree: query example 89/180
s5, s6, s7 Lleft s7, s5, s6 Lright
s4, s3, s2 s4, s3, s2 s9, s10 s9, s10
s1 s8 s11, s12 s1 s8 s12, s11
s2 s6 s3 s7 s10 s12 s5
s1 s4 s8 s9 s11 Interval tree: query example 90/180
s5, s6, s7 Lleft s7, s5, s6 Lright
s4, s3, s2 s4, s3, s2 s9, s10 s9, s10
s1 s8 s11, s12 s1 s8 s12, s11
s2 s6 s3 s7 s10 s12 s5
s1 s4 s8 s9 s11 Interval tree: query example 91/180
s5, s6, s7 Lleft s7, s5, s6 Lright
s4, s3, s2 s4, s3, s2 s9, s10 s9, s10
s1 s8 s11, s12 s1 s8 s12, s11
s2 s6 s3 s7 s10 s12 s5
s1 s4 s8 s9 s11 Interval tree: query example 92/180
s5, s6, s7 Lleft s7, s5, s6 Lright
s4, s3, s2 s4, s3, s2 s9, s10 s9, s10
s1 s8 s11, s12 s1 s8 s12, s11
s2 s6 s3 s7 s10 s12 s5
s1 s4 s8 s9 s11 Interval tree: query example 93/180
s5, s6, s7 Lleft s7, s5, s6 Lright
s4, s3, s2 s4, s3, s2 s9, s10 s9, s10
s1 s8 s11, s12 s1 s8 s12, s11
s2 s6 s3 s7 s10 s12 s5
s1 s4 s8 s9 s11 Interval tree: query example 94/180
s5, s6, s7 Lleft s7, s5, s6 Lright
s4, s3, s2 s4, s3, s2 s9, s10 s9, s10
s1 s8 s11, s12 s1 s8 s12, s11
s2 s6 s3 s7 s10 s12 s5
s1 s4 s8 s9 s11 Interval tree: query time 95/180
The query follows only one path in the tree, and that path has length O(logn)
The query traverses O(logn) lists. Traversing a list with k0 answers takes O(1 + k0) time
The total time for list traversal is therefore O(log+k), with the total number of answers reported (no answer is found more than once)
The query time is O(logn) + O(logn + k) = O(logn + k) Interval tree: query example 96/180
Algorithm CONSTRUCTINTERVALTREE(I) Input. A set I of intervals on the real line Output. The root of an interval tree for I 1. if I = /0 2. then return an empty leaf 3. else Create a node ν. Compute xmid, the median of the set of interval endpoints, and store xmid with ν 4. Compute Imid and construct two sorted lists for Imid: a list Lleft(ν) sorted on left endpoint and a list Lright(ν) sorted on right endpoint. Store these two lists at ν 5. lc(ν) CONSTRUCTINTERVALTREE(Ileft) ← 6. rc(ν) CONSTRUCTINTERVALTREE(Iright) ← 7. return ν Interval tree: result 97/180
Theorem: An interval tree for a set I of n intervals uses O(n) storage and can be built in O(nlogn) time. All intervals that contain a query point can be reported in O(logn + k) time, where k is the number of reported intervals. Back to the plane 98/180 Back to the plane 99/180
Suppose we use an interval tree on the x-intervals of the horizontal line segments?
Then the lists Lleft and Lright are not suitable anymore to solve the query problem for the segments corresponding to Imid Back to the plane 100/180
s5, s6, s7 s7, s5, s6
s4, s3, s2 s4, s3, s2 s9, s10 s9, s10
s1 s8 s11, s12 s1 s8 s12, s11
s2 s s6 8 s4 s1 s12 s s3 7 s9 s11 s5 s10 Back to the plane 101/180
s5, s6, s7 s7, s5, s6
s4, s3, s2 s4, s3, s2 s9, s10 s9, s10
s1 s8 s11, s12 s1 s8 s12, s11
s2 s s6 8 s4 s1 s12 q s s3 7 s9 s11 s5 s10 Back to the plane 102/180
s5, s6, s7 s7, s5, s6
s6 q s7
s5 Back to the plane 103/180
s5, s6, s7 s7, s5, s6
s6 q s7
s5 Back to the plane 104/180
s , s , s , s , s , s s , s , s , s , s , s { 2 5 6 7 9 22 } { 2 5 6 7 9 22 }
s2
s9 q s22 s6
s7
s5 Back to the plane 105/180
s , s , s , s , s , s s , s , s , s , s , s { 2 5 6 7 9 22 } { 2 5 6 7 9 22 }
s2
s9 q s22 s6
s7
s5 Back to the plane 106/180
s , s , s , s , s , s s , s , s , s , s , s { 2 5 6 7 9 22 } { 2 5 6 7 9 22 }
s2
s9 q s22 s 6 q
s7
s5 Segment intersection queries 107/180
We can use a range tree as the associated structure; we only need one that stores all of the endpoints, to replace Lleft and Lright
Instead of traversing Lleft or Lright, we perform a query with the region left or right, respectively, of q Segment intersection queries 108/180
all endpoints of s2 s , s , s , s , s , s { 2 5 6 7 9 22 } s9 q s22 s 6 q
s7
s5 Segment intersection queries 109/180
In total, there are O(n) range trees that together store 2n points, so the total storage needed by all associated structures is O(nlogn)
A query with a vertical segment leads to O(logn) range queries If fractional cascading is used in the associated structures, the overall query time is O(log2 n + k)
Question: How about the construction time? Theorem: A set of n axis-parallel line segments can be stored in a data structure of size O(nlogn) such that windowing queries can be performed in O(log2 n + k) time, where k is the number of segments reported. The data structure can be build in O(nlogn) time.
Result 110/180
Theorem: A set of n horizontal line segments can be stored in a data structure of size O(nlogn) such that intersection queries with a vertical line segment can be performed in O(log2 n + k) time, where k is the number of segments reported. The data structure can be build in O(nlogn) time. Result 110/180
Theorem: A set of n horizontal line segments can be stored in a data structure of size O(nlogn) such that intersection queries with a vertical line segment can be performed in O(log2 n + k) time, where k is the number of segments reported. The data structure can be build in O(nlogn) time.
Theorem: A set of n axis-parallel line segments can be stored in a data structure of size O(nlogn) such that windowing queries can be performed in O(log2 n + k) time, where k is the number of segments reported. The data structure can be build in O(nlogn) time. Result 110/180
Theorem: A set of n horizontal line segments can be stored in a data structure of size O(nlogn) such that intersection queries with a vertical line segment can be performed in O(log2 n + k) time, where k is the number of segments reported. The data structure can be build in O(nlogn) time.
Theorem: A set of n axis-parallel line segments can be stored in a data structure of size O(nlogn) such that windowing queries can be performed in O(log2 n + k) time, where k is the number of segments reported. The data structure can be build in O(nlogn) time. Outlook: 3- and 4-sided ranges 111/180
Considering the associated structure, we only need 3-sided range queries, whereas the range tree provides 4-sided range queries
Can the 3-sided range query problem be solved more efficiently than the 4-sided (rectangular) range query problem? Outlook: arbitrary line segments 112/180
Given a set of n arbitrary, non-crossing line segments, can we preprocess them into a data structure so that the ones that intersect a query rectangle can be reported efficiently? Scheme of structure 113/180
all left endpoints of all right endpoints of s , s , s , s , s , s s2 s , s , s , s , s , s { 2 5 6 7 9 22 } { 2 5 6 7 9 22 } s9 q s22 s 6 q
s7
s5 Heap and search tree 114/180
A priority search tree is like a heap on x-coordinate and binary search tree on y-coordinate at the same time
Recall the heap:
1
2 4
3 7 5 6
9 12 14 10 13 8 11 Heap and search tree 115/180
A priority search tree is like a heap on x-coordinate and binary search tree on y-coordinate at the same time
Recall the heap:
1
2 4
3 7 5 6
9 12 14 10 13 8 11
Report all values 4 ≤ Priority search tree 116/180
If P = /0, then a priority search tree is an empty leaf
Otherwise, let pmin be the leftmost point in P, and let ymid be the median y-coordinate of P pmin \{ } The priority search tree has a node ν that stores pmin and ymid, and a left subtree and right subtree for the points in P pmin with y-coordinate ymid and > ymid \{ } ≤
pmin ymid
ymid pmin Priority search tree 117/180
p6 p11 p4 p1 p p6 8 p11 p 5 p8 p4 p13 p13 p5 p1 p10 p10 p2 p14 p7 p7 p14 p12 p3 p9 p2 p12 p3 p9 Priority search tree 118/180
p6 p11 p4 p1 p p6 8 p11 p 5 p8 p4 p13 p13 p5 p1 p10 p10 p2 p14 p7 p7 p14 p12 p3 p9 p2 p12 p3 p9 Priority search tree 119/180
p6 p11 p4 p1 p p6 8 p11 p 5 p8 p4 p13 p13 p5 p1 p10 p10 p2 p14 p7 p7 p14 p12 p3 p9 p2 p12 p3 p9 Priority search tree 120/180
p6 p11 p4 p1 p p6 8 p11 p 5 p8 p4 p13 p13 p5 p1 p10 p10 p2 p14 p7 p7 p14 p12 p3 p9 p2 p12 p3 p9 Priority search tree 121/180
p6 p11 p4 p1 p p6 8 p11 p 5 p8 p4 p13 p13 p5 p1 p10 p10 p2 p14 p7 p7 p14 p12 p3 p9 p2 p12 p3 p9 Priority search tree 122/180
p6 p11 p4 p1 p p6 8 p11 p 5 p8 p4 p13 p13 p5 p1 p10 p10 p2 p14 p7 p7 p14 p12 p3 p9 p2 p12 p3 p9 Priority search tree 123/180
p6 p11 p4 p1 p p6 8 p11 p 5 p8 p4 p13 p13 p5 p1 p10 p10 p2 p14 p7 p7 p14 p12 p3 p9 p2 p12 p3 p9 Priority search tree 124/180
p6 p11 p4 p1 p p6 8 p11 p 5 p8 p4 p13 p13 p5 p1 p10 p10 p2 p14 p7 p7 p14 p12 p3 p9 p2 p12 p3 p9 Priority search tree 125/180
p6 p11 p4 p1 p p6 8 p11 p 5 p8 p4 p13 p13 p5 p1 p10 p10 p2 p14 p7 p7 p14 p12 p3 p9 p2 p12 p3 p9 Priority search tree 126/180
p6 p11 p4 p1 p p6 8 p11 p 5 p8 p4 p13 p13 p5 p1 p10 p10 p2 p14 p7 p7 p14 p12 p3 p9 p2 p12 p3 p9 Priority search tree 127/180
p6 p11 p4 p1 p p6 8 p11 p 5 p8 p4 p13 p13 p5 p1 p10 p10 p2 p14 p7 p7 p14 p12 p3 p9 p2 p12 p3 p9 Query algorithm 128/180
Algorithm QUERYPRIOSEARCHTREE(T,( ∞ : qx] [qy : q ]) − × y0 1. Search with qy and qy0 in T 2. Let νsplit be the node where the two search paths split 3. for each node ν on the search path of qy or qy0 4. do if p(ν) ( ∞ : qx] [qy : q ] then report p(ν) ∈ − × y0 5. for each node ν on the path of qy in the left subtree of νsplit 6. do if the search path goes left at ν 7. then REPORTINSUBTREE(rc(ν),qx) 8. for each node ν on the path of qy0 in the right subtree of νsplit 9. do if the search path goes right at ν 10. then REPORTINSUBTREE(lc(ν),qx) Structure of the query 129/180 Structure of the query 130/180 Query algorithm 131/180
REPORTINSUBTREE(ν,qx) Input. The root ν of a subtree of a priority search tree and a value qx Output. All points in the subtree with x-coordinate at most qx 1. if ν is not a leaf and (p(ν))x qx ≤ 2. then Report p(ν) 3.R EPORTINSUBTREE(lc(ν),qx) 4.R EPORTINSUBTREE(rc(ν),qx)
This subroutine takes O(1 + k) time, for k reported answers Query algorithm 132/180
The search paths to y and y0 have O(logn) nodes. At each node O(1) time is spent
No nodes outside the search paths are ever visited
Subtrees of nodes between the search paths are queried like a heap, and we spend O(1 + k0) time on each one
The total query time is O(logn + k), if k points are reported Priority search tree: result 133/180
Theorem: A priority search tree for a set P of n points uses O(n) storage and can be built in O(nlogn) time. All points that lie in a 3-sided query range can be reported in O(logn + k) time, where k is the number of reported points Scheme of structure 134/180
all left endpoints of all right endpoints of s , s , s , s , s , s s2 s , s , s , s , s , s { 2 5 6 7 9 22 } { 2 5 6 7 9 22 } s9 q s22 s 6 q
s7
s5 Storage of the structure 135/180
Question: What are the storage requirements of the structure for querying with a vertical segment in a set of horizontal segments? Query time of the structure 136/180
Question: What is the query time of the structure for querying with a vertical segment in a set of horizontal segments? Result 137/180
Theorem: A set of n horizontal line segments can be stored in a data structure with size O(n) such that intersection queries with a vertical line segment can be performed in O(log2 n + k) time, where k is the number of segments reported Result 138/180
Recall that the windowing problem is solved with a combination of a range tree and the structure just described
Theorem: A set of n axis-parallel line segments can be stored in a data structure with size O(nlogn) such that windowing queries can be performed in O(log2 n + k) time, where k is the number of segments reported Windowing 139/180
Given a set of n arbitrary, non-crossing line segments, preprocess them into a data structure so that the ones that intersect a query rectangle can be reported efficiently Windowing 140/180
Two cases of intersection:
I An endpoint lies inside the query window; solve with range trees
I The segment intersects a side of the query window; solve how? Using a bounding box? 141/180
If the query window intersects the line segment, then it also intersects the bounding box of the line segment (whose sides are axis-parallel segments)
So we could search in the 4n bounding box sides Using a bounding box? 142/180
But: if the query window intersects bounding box sides does not imply that it intersects the corresponding segments Windowing 143/180
Current problem of our interest: Given a set of arbitrarily oriented, non-crossing line segments, preprocess them into a data structure so that the ones intersecting a vertical (horizontal) query segment can be reported efficiently Using an interval tree? 144/180
q q Interval querying 145/180
Given a set I of n intervals on the real line, preprocess them into a data structure so that the ones containing a query point (value) can be reported efficiently
We have the interval tree, but we will develop an alternative solution Interval querying 146/180
Given a set S = s1,s2,...,sn of n segments on the real line, preprocess them{ into a data structure} so that the ones containing a query point (value) can be reported efficiently
s3 s6 s2 s7 s1 s8 s4 s5
The new structure is called the segment tree Locus approach 147/180
The locus approach is the idea to partition the solution space into parts with equal answer sets
s3 s6 s2 s7 s1 s8 s4 s5
For the set S of segments, we get different answer sets before and after every endpoint Locus approach 148/180
Let p1, p2,..., pm be the sorted set of unique endpoints of the intervals; m 2n ≤ s3 s6 s2 s7 s1 s8 s4 s5 p1 p2 p3 p4 p5 p6 p7 p8
The real line is partitioned into
( ∞, p1), [p1, p1],(p1, p2), [p2, p2], (p2, p3),..., (pm,+∞), − these are called the elementary intervals Locus approach 149/180
We could make a binary search tree that has a leaf for every elementary interval ( ∞, p1), [p1, p1],(p1, p2), [p2, p2], (p2, p3),..., (pm,+∞) − Each segment from the set S can be stored with all leaves whose elementary interval it contains: [pi, p j] is stored with
[pi, pi],(pi, pi+1),..., [p j, p j]
A stabbing query with point q is then solved by finding the unique leaf that contains q, and reporting all segments that it stores Locus approach 150/180
s3 s6 s2 s7 s1 s8 s4 s5 ( , p )(p , p )(p , p )(p , p )(p , p )(p , p )(p , p )(p , p )(p , + ) −∞ 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 ∞ [p1, p1] [p2, p2] [p3, p3] [p4, p4] [p5, p5] [p6, p6] [p7, p7] [p8, p8] Locus approach 151/180
p1 p2 p3 p4 p5 p6 p7 p8 s3 s6 s2 s7 s1 s8 s4 s5 Locus approach 152/180
Question: What are the storage requirements and what is the query time of this solution? Towards segment trees 153/180
(pi, pi+2] In the tree, the leaves store elementary intervals
But each internal node corresponds to an interval too: (pi, pi+1) (pi+1, pi+2) the interval that is the union of [pi+1, pi+1] [pi+2, pi+2] the elementary intervals of all leaves below it
pi pi+1 pi+2 Towards segment trees 154/180
(p , p ] (p , + ) 2 4 6 ∞
(p1, p2]
p1 p2 p3 p4 p5 p6 p7 p8 s3 s6 s2 s7 s1 s8 s4 s5 Towards segment trees 155/180
Let Int(ν) denote the interval of node ν
To avoid quadratic storage, we store any segment s j as high as possible in the tree whose leaves correspond to elementary intervals
More precisely: s j is stored with ν if and only if
Int(ν) s j but Int(parent(ν)) s j ⊆ 6⊆ Towards segment trees 156/180
(pi 2, pi+2] −
(pi, pi+2] ν
(pi, pi+1) (pi+1, pi+2) [pi+1, pi+1] [pi+2, pi+2] sj
(pi 2, pi+2] − Int(parent(ν)) (pi, pi+2] Int(ν)
pi 2 pi 1 pi pi+1 pi+2 − − Segment trees 157/180
A segment tree on a set S of segments is a balanced binary search tree on the elementary intervals defined by S, and each node stores its interval, and its canonical subset of S in a list (unsorted)
The canonical subset (of S) of a node ν is the subset of segments s j for which
Int(ν) s j but Int(parent(ν)) s j ⊆ 6⊆ Segment trees 158/180
s1, s3, s4 s1
s3, s5 s6, s7 s1, s2 s5 s6
p1 p2 p3 p4 p5 p6 p7 p8 s1, s2 s3, s4 s5 s6 s7 s8 s7, s8 s7, s8 s3 s6 s2 s7 s1 s8 s4 s5 Segment trees 159/180
s1, s3, s4 s1
s3, s5 s6, s7 s1, s2 s5 s6
p1 p2 p3 p4 p5 p6 p7 p8 s1, s2 s3, s4 s5 s6 s7 s8 s7, s8 s7, s8 s3 s6 s2 s7 s1 s8 s4 s5 Segment trees 160/180
Question: Why are no segments stored with nodes on the leftmost and rightmost paths of the segment tree? Query algorithm 161/180
The query algorithm is trivial:
For a query point q, follow the path down the tree to the elementary interval that contains q, and report all segments stored in the lists with the nodes on that path Example query 162/180
s1, s3, s4 s1
s , s s6, s7 s1, s2 s5 3 5 s6
p1 p2 p3 p4 p5 p6 p7 p8 s1, s2 s3, s4 s5 s6 s7 s8 s7, s8 s7, s8 s3 s6 s2 s7 s1 s8 s4 s5 Example query 163/180
s1, s3, s4 s1
s3, s5 s6, s7 s1, s2 s5 s6
p1 p2 p3 p4 p5 p6 p7 p8 s1, s2 s3, s4 s5 s6 s7 s8 s7, s8 s7, s8 s3 s6 s2 s7 s1 s8 s4 s5 Query time 164/180
The query time is O(logn + k), where k is the number of segments reported Segments stored at many nodes 165/180
A segment can be stored in several lists of nodes. How bad can the storage requirements get? Segments stored at many nodes 166/180
Lemma: Any segment can be stored at up to two nodes of the same depth
Proof: Suppose a segment si is stored at three nodes ν1, ν2, and ν3 at the same depth from the root
parent(ν2) si si si ν1 ν2 ν3
si Segments stored at many nodes 167/180
If a segment tree has depth O(logn), then any segment is stored in at most O(logn) lists the total size of all lists is ⇒ O(nlogn)
The main tree uses O(n) storage
The storage requirements of a segment tree on n segments is O(nlogn) Result 168/180
Theorem: A segment tree storing n segments (=intervals) on the real line uses O(nlogn) storage, can be built in O(nlogn) time, and stabbing queries can be answered in O(logn + k) time, where k is the number of segments reported
Property: For any query, all segments containing the query point are stored in the lists of O(logn) nodes Back to windowing 169/180
Problem arising from windowing: Given a set of arbitrarily oriented, non-crossing line segments, preprocess them into a data structure so that the ones intersecting a vertical (horizontal) query segment can be reported efficiently Idea for solution 170/180
The main idea is to build a segment tree on the x-projections of the 2D segments, and replace the associated lists with a more suitable data structure 171/180
s1, s3, s4 s1
s3, s5 s6, s7 s1, s2 s5 s6
p1 p2 p3 p4 p5 p6 p7 p8 s1, s2 s3, s4 s5 s6 s7 s8 s7, s8 s7, s8 s3 s6 s2 s4 s7 s8 s1 s5 172/180
s1, s3, s4 s1
s3, s5 s6, s7 s1, s2 s5 s6
p1 p2 p3 p4 p5 p6 p7 p8 s1, s2 s3, s4 s5 s6 s7 s8 s7, s8 s7, s8 s3 s6 s2 s4 s7 s8 s1 s5 173/180
Observe that nodes now correspond to vertical slabs of the plane (with or without left and right bounding lines), and:
I if a segment si is stored with a node ν, then it crosses the slab of ν completely, but not the slab of the parent of ν
I the segments crossing a slab have a well-defined top-to-bottom order
s j sj is stored at one or more nodes below ν Int(ν) 174/180
s1, s3, s4
s5
p3 p4 s5
s3
s4 s1 s5 175/180
s1, s3, s4
s5
p3 p4 s5
s3
s4 s1 s5 Querying 176/180
Recall that a query is done with a vertical line segment q
Only segments of S stored with nodes on the path down the tree using the q x-coordinate of q can be answers
At any such node, the query problem is: which of the segments (that cross the slab completely) intersects the vertical query segment q? Querying 177/180
s7 s7
s6 s6 s6 s5 We store the canonical s5 s subset of a node ν in a 5 q s4 balanced binary search tree s4 s4 that follows the s3 bottom-to-top order in its s3 s3 leaves s2 s2 s1 s2 s1 s1 Data structure 178/180
A query with q follows one path down the main tree, using the x-coordinate of q
At each node, the associated tree is queried using the endpoints of q, as if it is a 1-dimensional range query
The query time is O(log2 n + k) Data structure 179/180
The data structure for intersection queries with a vertical query segment in a set of non-crossing line segments is a segment tree where the associated structures are binary search trees on the bottom-to-top order of the segments in the corresponding slab
Since it is a segment tree with lists replaced by trees, the storage remains O(nlogn) Result 180/180
Theorem: A set of n non-crossing line segments can be stored in a data structure of size O(nlogn) so that intersection queries with a vertical query segment can be answered in O(log2 n + k) time, where k is the number of answers reported
Theorem: A set of n non-crossing line segments can be stored in a data structure of size O(nlogn) so that windowing queries can be answered in O(log2 n + k) time, where k is the number of answers reported