Range & windowing queries 1/180

Geometric Algorithms

Range & windowing queries (2 lectures) Database queries 2/180

G. Ometer born: Aug 16, 1954 salary salary: $3,500

A database query may ask for all employees with age between a1 and a2, and salary between s1 and s2

19,500,000 19,559,999 date of birth Data structures 3/180

Idea of data structures

I Representation of structure, for convenience (like DCEL)

I Preprocessing of data, to be able to solve future questions really fast (sub-linear time) A (search) has a storage requirement, a query time, and a construction time (and an update time) 1D range query problem 4/180

1D range query problem: Preprocess a of n points on the real line such that the ones inside a 1D query range (interval) can be answered fast

The points p1,..., pn are known beforehand, the query [x,x0] only later

A solution to a query problem is a data structure, a query algorithm, and a construction algorithm

Question: What are the most important factors for the efficiency of a solution? Balanced binary search trees 5/180

A balanced binary search with the points in the leaves

49

23 80 10 37 62 89

3 19 30 49 59 70 89 93

3 10 19 23 30 37 59 62 70 80 93 97 Balanced binary search trees 6/180

The search path for 25

49

23 80 10 37 62 89

3 19 30 49 59 70 89 93

3 10 19 23 30 37 59 62 70 80 93 97 Balanced binary search trees 7/180

The search paths for 25 and for 90

49

23 80 10 37 62 89

3 19 30 49 59 70 89 93

3 10 19 23 30 37 59 62 70 80 93 97 Example 1D range query 8/180

A 1-dimensional range query with [25, 90]

49

23 80 10 37 62 89

3 19 30 49 59 70 89 93

3 10 19 23 30 37 59 62 70 80 93 97 Node types for a query 9/180

Three types of nodes for a given query:

I White nodes: never visited by the query

I Grey nodes: visited by the query, unclear if they lead to output

I Black nodes: visited by the query, whole subtree is output

Question: What query time do we hope for? Node types for a query 10/180

The query algorithm comes down to what we do at each type of node

Grey nodes: use query range to decide how to proceed: to not visit a subtree (pruning), to report a complete subtree, or just continue

Black nodes: traverse and enumerate all points in the leaves Example 1D range query 11/180

A 1-dimensional range query with [61, 90]

49

23 80 10 37 62 89

3 19 30 49 59 70 89 93

3 10 19 23 30 37 59 62 70 80 93 97 Example 1D range query 12/180

A 1-dimensional range query with [61, 90]

49 split node 23 80 10 37 62 89

3 19 30 49 59 70 89 93

3 10 19 23 30 37 59 62 70 80 93 97 1D range query algorithm 13/180

Algorithm 1DRANGEQUERY(T,[x : x0]) 1. νsplit FINDSPLITNODE(T,x,x ) ← 0 2. if νsplit is a leaf 3. then Check if the point in νsplit must be reported. 4. else ν lc(νsplit) ← 5. while ν is not a leaf 6. do if x xν ≤ 7. then REPORTSUBTREE(rc(ν)) 8. ν lc(ν) ← 9. else ν rc(ν) ← 10. Check if the point stored in ν must be reported. 11. Similarly, follow the path to x0, and ... Query time analysis 14/180

The efficiency analysis is based on counting the numbers of nodes visited for each type

I White nodes: never visited by the query; no time spent

I Grey nodes: visited by the query, unclear if they lead to output; time determines dependency on n

I Black nodes: visited by the query, whole subtree is output; time determines dependency on k, the output size Query time analysis 15/180

Grey nodes: they occur on only two paths in the tree, and since the tree is balanced, its depth is O(logn)

Black nodes: a (sub)tree with m leaves has m 1 internal − nodes; traversal visits O(m) nodes and finds m points for the output

The time spent at each node is O(1) O(logn + k) query time ⇒ Storage requirement and preprocessing 16/180

A (balanced) binary storing n points uses O(n) storage

A balanced storing n points can be built in O(n) time after sorting Result 17/180

Theorem: A set of n points on the real line can be preprocessed in O(nlogn) time into a data structure of O(n) size so that any 1D range query can be answered in O(logn + k) time, where k is the number of answers reported : good in practice, but not so good worst-case query time

Kd-trees: queries take O(√n + k) (Chapter 5.2) range trees: today

Range queries in 2D 18/180 Range queries in 2D 18/180

quadtrees: good in practice, but not so good worst-case query time

Kd-trees: queries take O(√n + k) (Chapter 5.2) range trees: today Back to 1D range queries 19/180

A 1-dimensional range query with [61, 90]

49 split node 23 80 10 37 62 89

3 19 30 49 59 70 89 93

3 10 19 23 30 37 59 62 70 80 93 97 Examining 1D range queries 20/180

Observation: Ignoring the search path leaves, all answers are jointly represented by the highest nodes strictly between the two search paths

Question: How many highest nodes between the search paths can there be? Examining 1D range queries 21/180

For any 1D range query, we can identify O(logn) nodes that together represent all answers to a 1D range query Toward 2D range queries 22/180

For any 2d range query, we can identify O(logn) nodes that together represent all points that have a correct first coordinate Toward 2D range queries 23/180

(1, 5) (3, 8) (4, 2) (5, 9) (6, 7) (7, 3) (8, 1) (9, 4) Toward 2D range queries 24/180

(1, 5) (3, 8) (4, 2) (5, 9) (6, 7) (7, 3) (8, 1) (9, 4) Toward 2D range queries 25/180

data structure for searching on y-coordinate

(1, 5) (3, 8) (4, 2) (5, 9) (6, 7) (7, 3) (8, 1) (9, 4) Toward 2D range queries 26/180

(5, 9) (3, 8) (6, 7) (1, 5) (9, 4) (7, 3) (4, 2) (8, 1)

(1, 5) (3, 8) (4, 2) (5, 9) (6, 7) (7, 3) (8, 1) (9, 4) 2D range trees 27/180

Every internal node stores a whole tree in an associated structure, on y-coordinate

Question: How much storage does this take? Storage of 2D range trees 28/180

To analyze storage, two arguments can be used:

I By level: On each level, any point is stored exactly once. So all associated trees on one level together have O(n) size

I By point: For any point, it is stored in the associated structures of its search path. So it is stored in O(logn) of them Construction algorithm 29/180

Algorithm BUILD2DRANGETREE(P) 1. Construct the associated structure: Build a binary search tree Tassoc on the set Py of y-coordinates in P 2. if P contains only one point 3. then Create a leaf ν storing this point, and make Tassoc the associated structure of ν. 4. else Split P into Pleft and Pright, the subsets and > ≤ the median x-coordinate xmid 5. νleft BUILD2DRANGETREE(Pleft) ← 6. νright BUILD2DRANGETREE(Pright) ← 7. Create a node ν storing xmid, make νleft the left child of ν, make νright the right child of ν, and make Tassoc the associated structure of ν 8. return ν Efficiency of construction 30/180

The construction algorithm takes O(nlog2 n) time

T(1) = O(1)

T(n) = 2 T(n/2) + O(nlogn) ·

which solves to O(nlog2 n) time Efficiency of construction 31/180

Suppose we pre-sort P on y-coordinate, and whenever we split P into Pleft and Pright, we keep the y-order in both subsets

For a sorted set, the associated structure can be built in linear time Efficiency of construction 32/180

The adapted construction algorithm takes O(nlogn) time

T(1) = O(1)

T(n) = 2 T(n/2) + O(n) ·

which solves to O(nlogn) time 2D range queries 33/180

How are queries performed and why are they correct?

I Are we sure that each answer is found?

I Are we sure that the same point is found only once? 2D range queries 34/180

ν p

p

µ µ0 p p Query algorithm 35/180

Algorithm 2DRANGEQUERY(T,[x : x ] [y : y ]) 0 × 0 1. νsplit FINDSPLITNODE(T,x,x ) ← 0 2. if νsplit is a leaf 3. then report the point stored at νsplit, if an answer 4. else ν lc(νsplit) ← 5. while ν is not a leaf 6. do if x xν ≤ 7. then 1DRANGEQ(Tassoc(rc(ν)),[y : y0]) 8. ν lc(ν) ← 9. else ν rc(ν) ← 10. Check if the point stored at ν must be reported. 11. Similarly, follow the path from rc(νsplit) to x0 ... 2D range query time 36/180

Question: How much time does a 2D range query take?

Subquestions: In how many associated structures do we search? How much time does each such search take? 2D range queries 37/180

ν

µ µ0 2D range query efficiency 38/180

We search in O(logn) associated structures to perform a 1D range query; at most two per level of the main tree

The query time is O(logn) O(logm + k ), or × 0

∑O(lognν + kν ) ν

where ∑kν = k the number of points reported 2D range query efficiency 39/180

Use the concept of grey and black nodes again: 2D range query efficiency 40/180

The number of grey nodes is O(log2 n)

The number of black nodes is O(k) if k points are reported

The query time is O(log2 n + k), where k is the size of the output Result 41/180

Theorem: A set of n points in the plane can be preprocessed in O(nlogn) time into a data structure of O(nlogn) size so that any 2D range query can be answered in O(log2 n + k) time, where k is the number of answers reported

In contrast, a kd-tree has O(n) size and answers queries in O(√n + k) time (Chapter 5.2). Higher dimensional range trees 42/180

A d-dimensional has a main tree which is a one-dimensional balanced binary search tree on the first coordinate, where every node has a pointer to an associated structure that is a (d 1)-dimensional range tree on− the other coordinates Storage 43/180

The size Sd(n) of a d-dimensional range tree satisfies:

S1(n) = O(n) for all n

Sd(1) = O(1) for all d

Sd(n) 2 Sd(n/2) + Sd 1(n) for d 2 ≤ · − ≥

d This solves to Sd(n) = O(nlog n) Query time 44/180

The number of grey nodes Gd(n) satisfies:

G1(n) = O(logn) for all n

Gd(1) = O(1) for all d

Gd(n) 2 logn + 2 logn Gd 1(n) for d 2 ≤ · · · − ≥

d This solves to Gd(n) = O(log n) Result 45/180

Theorem: A set of n points in d-dimensional space can be d 1 preprocessed in O(nlog − n) time into a data structure of d 1 O(nlog − n) size so that any d-dimensional range query can be answered in O(logd n + k) time, where k is the number of answers reported Improving the query time 46/180

We can improve the query time of a 2D range tree from O(log2 n) to O(logn) by a technique called fractional cascading

This automatically lowers the query time in d dimensions to d 1 O(log − n) time Improving the query time 47/180

The idea illustrated best by a different query problem:

Suppose that we have a collection of sets S1,...,Sm, where S1 = n and where Si 1 Si | | + ⊆ We want a data structure that can report for a query number x, the smallest value x in all sets S1,...,Sm ≥ Improving the query time 48/180

1 2 3 5 8 13 21 34 55 S1

1 3 5 8 13 21 34 55 S2

1 3 13 21 34 55 S3

3 34 55 S4 Improving the query time 49/180

1 2 3 5 8 13 21 34 55 S1

1 3 5 8 13 21 34 55 S2

1 3 13 21 34 55 S3

3 34 55 S4 Improving the query time 50/180

1 2 3 5 8 13 21 34 55 S1

1 3 5 8 13 21 34 55 S2

1 3 13 21 34 55 S3

3 34 55 S4 Improving the query time 51/180

Suppose that we have a collection of sets S1,...,Sm, where S1 = n and where Si 1 Si | | + ⊆ We want a data structure that can report for a query number x, the smallest value x in all sets S1,...,Sm ≥ This query problem can be solved in O(logn +m) time instead of O(m logn) time · Improving the query time 52/180

Can we do something similar for m 1-dimensional range queries on m sets S1,...,Sm?

We hope to get a query time of O(logn + m + k) with k the total number of points reported Improving the query time 53/180

1 2 3 5 8 13 21 34 55 S1

1 3 5 8 13 21 34 55 S2

1 3 13 21 34 55 S3

3 34 55 S4 Improving the query time 54/180

1 2 3 5 8 13 21 34 55 S1

1 3 5 8 13 21 34 55 S2

1 3 13 21 34 55 S3

3 34 55 S4 Improving the query time 55/180

[6,35]

1 2 3 5 8 13 21 34 55 S1

1 3 5 8 13 21 34 55 S2

1 3 13 21 34 55 S3

3 34 55 S4 Fractional cascading 56/180

Now we do “the same” on the associated structures of a 2-dimensional range tree

Note that in every associated structure, we search with the same values y and y0

I Replace all associated structure except for the root by a linked list

I For every list element (and leaf of the associated structure of the root), store two pointers to the appropriate list elements in the lists of the left child and of the right child Fractional cascading 57/180 Fractional cascading 58/180 Fractional cascading 59/180

17

8 52 5 15 33 58

2 7 12 17 21 41 58 67

2 5 7 8 12 15 21 33 41 52 67 93 (2, 19) (7, 10) (12, 3) (17, 62) (21, 49) (41, 95) (58, 59) (93, 70) (5, 80) (8, 37) (15, 99) (33, 30) (52, 23) (67, 89) Fractional cascading 60/180

3 10 19 23 30 37 49 59 62 70 80 89 95 99

3 10 19 37 62 80 99 23 30 49 59 70 89 95

10 19 37 80 3 62 99 23 30 49 95 59 70 89

19 80 10 37 3 99 62 30 49 23 95 59 70 89

19 80 10 37 3 99 49 30 95 23 89 70 Fractional cascading 61/180

[4, 58] [19, 65] 17 ×

8 52 5 15 33 58

2 7 12 17 21 41 58 67

2 5 7 8 12 15 21 33 41 52 67 93 (2, 19) (7, 10) (12, 3) (17, 62) (21, 49) (41, 95) (58, 59) (93, 70) (5, 80) (8, 37) (15, 99) (33, 30) (52, 23) (67, 89) Fractional cascading 62/180

3 10 19 23 30 37 49 59 62 70 80 89 95 99

3 10 19 37 62 80 99 23 30 49 59 70 89 95

10 19 37 80 3 62 99 23 30 49 95 59 70 89

19 80 10 37 3 99 62 30 49 23 95 59 70 89

19 80 10 37 3 99 49 30 95 23 89 70 Fractional cascading 63/180

3 10 19 23 30 37 49 59 62 70 80 89 95 99

3 10 19 37 62 80 99 23 30 49 59 70 89 95

10 19 37 80 3 62 99 23 30 49 95 59 70 89

19 80 10 37 3 99 62 30 49 23 95 59 70 89

19 80 10 37 3 99 49 30 95 23 89 70 Fractional cascading 64/180

Instead of doing a 1D range query on the associated structure of some node ν, we find the leaf where the search to y would end in O(1) time via the direct pointer in the associated structure in the parent of ν

The number of grey nodes reduces to O(logn) Result 65/180

Theorem: A set of n points in d-dimensional space with d 2 d 1 ≥ can be preprocessed in O(nlog − n) time into a data d 1 structure of O(nlog − n) size so that any d-dimensional d 1 range query can be answered in O(log − n + k) time, where k is the number of answers reported.

Multiple points with the same x- or y-coordinate need to be handled with care. Windowing 66/180

Zoom in; re-center and zoom in; select by outlining Windowing 67/180 Windowing 68/180

Given a set of n axis-parallel line segments, preprocess them into a data structure so that the ones that intersect a query rectangle can be reported efficiently Windowing 69/180

How can a rectangle and an axis-parallel line segment intersect? Windowing 70/180

Essentially two types:

I Segments whose endpoint lies in the rectangle (or both endpoints)

I Segments with both endpoints outside the rectangle

Segments of the latter type always intersect the boundary of the rectangle (even the left and/or bottom side) Windowing 71/180

Instead of storing axis-parallel segments and searching with a rectangle, we will:

I store the segment endpoints and query with the rectangle

I store the segments and query with the left side and the bottom side of the rectangle

Note that the query problem is at least as hard as rectangular in point sets Windowing 72/180

Instead of storing axis-parallel segments and searching with a rectangle, we will:

I store the segment endpoints and query with the rectangle

I store the segments and query with the left side and the bottom side of the rectangle

Question: How often might we report the same segment? Windowing 73/180

Instead of storing axis-parallel segments and searching with a rectangle, we will:

I store the segment endpoints and query with the rectangle use range tree

I store the segments and query with the left side and the bottom side of the rectangle need to develop data structure Windowing 74/180

Current problem of our interest: Given a set of horizontal (vertical) line segments, preprocess them into a data structure so that the ones intersecting a vertical (horizontal) query segment can be reported efficiently Windowing 75/180

Simpler query problem: What if the vertical query segment is a full line?

Then the problem is essentially 1-dimensional Interval querying 76/180

Given a set I of n intervals on the real line, preprocess them into a data structure so that the ones containing a query point (value) can be reported efficiently Splitting a set of intervals 77/180

The median x of the 2n endpoints partitions the intervals into three subsets:

I Intervals Ileft fully left of x

I Intervals Imid that contain (intersect) x

I Intervals Iright fully right of x

x : recursive definition 78/180

The interval tree for I has a root node ν that contains x and

I the intervals Ileft are stored in the left subtree of ν

I the intervals Imid are stored with ν

I the intervals Iright are stored in the right subtree of ν

The left and right subtrees are proper interval trees for Ileft and Iright

How many intervals can be in Imid? How should we store Imid? Interval tree: left and right lists 79/180

How is Imid stored? x

Observe: If the query point is left of x, then only the left endpoint determines if an interval is an answer

Symmetrically: If the query point is right of x, then only the right endpoint determines if an interval is an answer Interval tree: left and right lists 80/180

x

Make a list Lleft using the left-to-right order of the left endpoints of Imid

Make a list Lright using the right-to-left order of the right endpoints of Imid

Store both lists as associated structures with ν Interval tree: example 81/180

s5, s6, s7 Lleft s7, s5, s6 Lright

s4, s3, s2 s4, s3, s2 s9, s10 s9, s10

s1 s8 s12, s11 s1 s8 s11, s12

s2 s6 s3 s7 s10 s12 s5

s1 s4 s8 s9 s11 Interval tree: storage 82/180

The main tree has O(n) nodes

The total length of all lists is 2n because each interval is stored exactly twice: in Lleft and Lright and only at one node

Consequently, the interval tree uses O(n) storage Interval querying 83/180

Algorithm QUERYINTERVALTREE(ν,qx) 1. if ν is not a leaf 2. then if qx < xmid(ν) 3. then Traverse list Lleft(ν), starting at the interval with the leftmost endpoint, reporting all the intervals that contain qx. Stop as soon as an interval does not contain qx. 4.Q UERYINTERVALTREE(lc(ν),qx) 5. else Traverse list Lright(ν), starting at the interval with the rightmost endpoint, reporting all the intervals that contain qx. Stop as soon as an interval does not contain qx. 6.Q UERYINTERVALTREE(rc(ν),qx) Interval tree: query example 84/180

s5, s6, s7 Lleft s7, s5, s6 Lright

s4, s3, s2 s4, s3, s2 s9, s10 s9, s10

s1 s8 s11, s12 s1 s8 s12, s11

s2 s6 s3 s7 s10 s12 s5

s1 s4 s8 s9 s11 Interval tree: query example 85/180

s5, s6, s7 Lleft s7, s5, s6 Lright

s4, s3, s2 s4, s3, s2 s9, s10 s9, s10

s1 s8 s11, s12 s1 s8 s12, s11

s2 s6 s3 s7 s10 s12 s5

s1 s4 s8 s9 s11 Interval tree: query example 86/180

s5, s6, s7 Lleft s7, s5, s6 Lright

s4, s3, s2 s4, s3, s2 s9, s10 s9, s10

s1 s8 s11, s12 s1 s8 s12, s11

s2 s6 s3 s7 s10 s12 s5

s1 s4 s8 s9 s11 Interval tree: query example 87/180

s5, s6, s7 Lleft s7, s5, s6 Lright

s4, s3, s2 s4, s3, s2 s9, s10 s9, s10

s1 s8 s11, s12 s1 s8 s12, s11

s2 s6 s3 s7 s10 s12 s5

s1 s4 s8 s9 s11 Interval tree: query example 88/180

s5, s6, s7 Lleft s7, s5, s6 Lright

s4, s3, s2 s4, s3, s2 s9, s10 s9, s10

s1 s8 s11, s12 s1 s8 s12, s11

s2 s6 s3 s7 s10 s12 s5

s1 s4 s8 s9 s11 Interval tree: query example 89/180

s5, s6, s7 Lleft s7, s5, s6 Lright

s4, s3, s2 s4, s3, s2 s9, s10 s9, s10

s1 s8 s11, s12 s1 s8 s12, s11

s2 s6 s3 s7 s10 s12 s5

s1 s4 s8 s9 s11 Interval tree: query example 90/180

s5, s6, s7 Lleft s7, s5, s6 Lright

s4, s3, s2 s4, s3, s2 s9, s10 s9, s10

s1 s8 s11, s12 s1 s8 s12, s11

s2 s6 s3 s7 s10 s12 s5

s1 s4 s8 s9 s11 Interval tree: query example 91/180

s5, s6, s7 Lleft s7, s5, s6 Lright

s4, s3, s2 s4, s3, s2 s9, s10 s9, s10

s1 s8 s11, s12 s1 s8 s12, s11

s2 s6 s3 s7 s10 s12 s5

s1 s4 s8 s9 s11 Interval tree: query example 92/180

s5, s6, s7 Lleft s7, s5, s6 Lright

s4, s3, s2 s4, s3, s2 s9, s10 s9, s10

s1 s8 s11, s12 s1 s8 s12, s11

s2 s6 s3 s7 s10 s12 s5

s1 s4 s8 s9 s11 Interval tree: query example 93/180

s5, s6, s7 Lleft s7, s5, s6 Lright

s4, s3, s2 s4, s3, s2 s9, s10 s9, s10

s1 s8 s11, s12 s1 s8 s12, s11

s2 s6 s3 s7 s10 s12 s5

s1 s4 s8 s9 s11 Interval tree: query example 94/180

s5, s6, s7 Lleft s7, s5, s6 Lright

s4, s3, s2 s4, s3, s2 s9, s10 s9, s10

s1 s8 s11, s12 s1 s8 s12, s11

s2 s6 s3 s7 s10 s12 s5

s1 s4 s8 s9 s11 Interval tree: query time 95/180

The query follows only one path in the tree, and that path has length O(logn)

The query traverses O(logn) lists. Traversing a list with k0 answers takes O(1 + k0) time

The total time for list traversal is therefore O(log+k), with the total number of answers reported (no answer is found more than once)

The query time is O(logn) + O(logn + k) = O(logn + k) Interval tree: query example 96/180

Algorithm CONSTRUCTINTERVALTREE(I) Input. A set I of intervals on the real line Output. The root of an interval tree for I 1. if I = /0 2. then return an empty leaf 3. else Create a node ν. Compute xmid, the median of the set of interval endpoints, and store xmid with ν 4. Compute Imid and construct two sorted lists for Imid: a list Lleft(ν) sorted on left endpoint and a list Lright(ν) sorted on right endpoint. Store these two lists at ν 5. lc(ν) CONSTRUCTINTERVALTREE(Ileft) ← 6. rc(ν) CONSTRUCTINTERVALTREE(Iright) ← 7. return ν Interval tree: result 97/180

Theorem: An interval tree for a set I of n intervals uses O(n) storage and can be built in O(nlogn) time. All intervals that contain a query point can be reported in O(logn + k) time, where k is the number of reported intervals. Back to the plane 98/180 Back to the plane 99/180

Suppose we use an interval tree on the x-intervals of the horizontal line segments?

Then the lists Lleft and Lright are not suitable anymore to solve the query problem for the segments corresponding to Imid Back to the plane 100/180

s5, s6, s7 s7, s5, s6

s4, s3, s2 s4, s3, s2 s9, s10 s9, s10

s1 s8 s11, s12 s1 s8 s12, s11

s2 s s6 8 s4 s1 s12 s s3 7 s9 s11 s5 s10 Back to the plane 101/180

s5, s6, s7 s7, s5, s6

s4, s3, s2 s4, s3, s2 s9, s10 s9, s10

s1 s8 s11, s12 s1 s8 s12, s11

s2 s s6 8 s4 s1 s12 q s s3 7 s9 s11 s5 s10 Back to the plane 102/180

s5, s6, s7 s7, s5, s6

s6 q s7

s5 Back to the plane 103/180

s5, s6, s7 s7, s5, s6

s6 q s7

s5 Back to the plane 104/180

s , s , s , s , s , s s , s , s , s , s , s { 2 5 6 7 9 22 } { 2 5 6 7 9 22 }

s2

s9 q s22 s6

s7

s5 Back to the plane 105/180

s , s , s , s , s , s s , s , s , s , s , s { 2 5 6 7 9 22 } { 2 5 6 7 9 22 }

s2

s9 q s22 s6

s7

s5 Back to the plane 106/180

s , s , s , s , s , s s , s , s , s , s , s { 2 5 6 7 9 22 } { 2 5 6 7 9 22 }

s2

s9 q s22 s 6 q

s7

s5 Segment intersection queries 107/180

We can use a range tree as the associated structure; we only need one that stores all of the endpoints, to replace Lleft and Lright

Instead of traversing Lleft or Lright, we perform a query with the region left or right, respectively, of q Segment intersection queries 108/180

all endpoints of s2 s , s , s , s , s , s { 2 5 6 7 9 22 } s9 q s22 s 6 q

s7

s5 Segment intersection queries 109/180

In total, there are O(n) range trees that together store 2n points, so the total storage needed by all associated structures is O(nlogn)

A query with a vertical segment leads to O(logn) range queries If fractional cascading is used in the associated structures, the overall query time is O(log2 n + k)

Question: How about the construction time? Theorem: A set of n axis-parallel line segments can be stored in a data structure of size O(nlogn) such that windowing queries can be performed in O(log2 n + k) time, where k is the number of segments reported. The data structure can be build in O(nlogn) time.

Result 110/180

Theorem: A set of n horizontal line segments can be stored in a data structure of size O(nlogn) such that intersection queries with a vertical line segment can be performed in O(log2 n + k) time, where k is the number of segments reported. The data structure can be build in O(nlogn) time. Result 110/180

Theorem: A set of n horizontal line segments can be stored in a data structure of size O(nlogn) such that intersection queries with a vertical line segment can be performed in O(log2 n + k) time, where k is the number of segments reported. The data structure can be build in O(nlogn) time.

Theorem: A set of n axis-parallel line segments can be stored in a data structure of size O(nlogn) such that windowing queries can be performed in O(log2 n + k) time, where k is the number of segments reported. The data structure can be build in O(nlogn) time. Result 110/180

Theorem: A set of n horizontal line segments can be stored in a data structure of size O(nlogn) such that intersection queries with a vertical line segment can be performed in O(log2 n + k) time, where k is the number of segments reported. The data structure can be build in O(nlogn) time.

Theorem: A set of n axis-parallel line segments can be stored in a data structure of size O(nlogn) such that windowing queries can be performed in O(log2 n + k) time, where k is the number of segments reported. The data structure can be build in O(nlogn) time. Outlook: 3- and 4-sided ranges 111/180

Considering the associated structure, we only need 3-sided range queries, whereas the range tree provides 4-sided range queries

Can the 3-sided range query problem be solved more efficiently than the 4-sided (rectangular) range query problem? Outlook: arbitrary line segments 112/180

Given a set of n arbitrary, non-crossing line segments, can we preprocess them into a data structure so that the ones that intersect a query rectangle can be reported efficiently? Scheme of structure 113/180

all left endpoints of all right endpoints of s , s , s , s , s , s s2 s , s , s , s , s , s { 2 5 6 7 9 22 } { 2 5 6 7 9 22 } s9 q s22 s 6 q

s7

s5 and search tree 114/180

A priority search tree is like a heap on x-coordinate and binary search tree on y-coordinate at the same time

Recall the heap:

1

2 4

3 7 5 6

9 12 14 10 13 8 11 Heap and search tree 115/180

A priority search tree is like a heap on x-coordinate and binary search tree on y-coordinate at the same time

Recall the heap:

1

2 4

3 7 5 6

9 12 14 10 13 8 11

Report all values 4 ≤ Priority search tree 116/180

If P = /0, then a priority search tree is an empty leaf

Otherwise, let pmin be the leftmost point in P, and let ymid be the median y-coordinate of P pmin \{ } The priority search tree has a node ν that stores pmin and ymid, and a left subtree and right subtree for the points in P pmin with y-coordinate ymid and > ymid \{ } ≤

pmin ymid

ymid pmin Priority search tree 117/180

p6 p11 p4 p1 p p6 8 p11 p 5 p8 p4 p13 p13 p5 p1 p10 p10 p2 p14 p7 p7 p14 p12 p3 p9 p2 p12 p3 p9 Priority search tree 118/180

p6 p11 p4 p1 p p6 8 p11 p 5 p8 p4 p13 p13 p5 p1 p10 p10 p2 p14 p7 p7 p14 p12 p3 p9 p2 p12 p3 p9 Priority search tree 119/180

p6 p11 p4 p1 p p6 8 p11 p 5 p8 p4 p13 p13 p5 p1 p10 p10 p2 p14 p7 p7 p14 p12 p3 p9 p2 p12 p3 p9 Priority search tree 120/180

p6 p11 p4 p1 p p6 8 p11 p 5 p8 p4 p13 p13 p5 p1 p10 p10 p2 p14 p7 p7 p14 p12 p3 p9 p2 p12 p3 p9 Priority search tree 121/180

p6 p11 p4 p1 p p6 8 p11 p 5 p8 p4 p13 p13 p5 p1 p10 p10 p2 p14 p7 p7 p14 p12 p3 p9 p2 p12 p3 p9 Priority search tree 122/180

p6 p11 p4 p1 p p6 8 p11 p 5 p8 p4 p13 p13 p5 p1 p10 p10 p2 p14 p7 p7 p14 p12 p3 p9 p2 p12 p3 p9 Priority search tree 123/180

p6 p11 p4 p1 p p6 8 p11 p 5 p8 p4 p13 p13 p5 p1 p10 p10 p2 p14 p7 p7 p14 p12 p3 p9 p2 p12 p3 p9 Priority search tree 124/180

p6 p11 p4 p1 p p6 8 p11 p 5 p8 p4 p13 p13 p5 p1 p10 p10 p2 p14 p7 p7 p14 p12 p3 p9 p2 p12 p3 p9 Priority search tree 125/180

p6 p11 p4 p1 p p6 8 p11 p 5 p8 p4 p13 p13 p5 p1 p10 p10 p2 p14 p7 p7 p14 p12 p3 p9 p2 p12 p3 p9 Priority search tree 126/180

p6 p11 p4 p1 p p6 8 p11 p 5 p8 p4 p13 p13 p5 p1 p10 p10 p2 p14 p7 p7 p14 p12 p3 p9 p2 p12 p3 p9 Priority search tree 127/180

p6 p11 p4 p1 p p6 8 p11 p 5 p8 p4 p13 p13 p5 p1 p10 p10 p2 p14 p7 p7 p14 p12 p3 p9 p2 p12 p3 p9 Query algorithm 128/180

Algorithm QUERYPRIOSEARCHTREE(T,( ∞ : qx] [qy : q ]) − × y0 1. Search with qy and qy0 in T 2. Let νsplit be the node where the two search paths split 3. for each node ν on the search path of qy or qy0 4. do if p(ν) ( ∞ : qx] [qy : q ] then report p(ν) ∈ − × y0 5. for each node ν on the path of qy in the left subtree of νsplit 6. do if the search path goes left at ν 7. then REPORTINSUBTREE(rc(ν),qx) 8. for each node ν on the path of qy0 in the right subtree of νsplit 9. do if the search path goes right at ν 10. then REPORTINSUBTREE(lc(ν),qx) Structure of the query 129/180 Structure of the query 130/180 Query algorithm 131/180

REPORTINSUBTREE(ν,qx) Input. The root ν of a subtree of a priority search tree and a value qx Output. All points in the subtree with x-coordinate at most qx 1. if ν is not a leaf and (p(ν))x qx ≤ 2. then Report p(ν) 3.R EPORTINSUBTREE(lc(ν),qx) 4.R EPORTINSUBTREE(rc(ν),qx)

This subroutine takes O(1 + k) time, for k reported answers Query algorithm 132/180

The search paths to y and y0 have O(logn) nodes. At each node O(1) time is spent

No nodes outside the search paths are ever visited

Subtrees of nodes between the search paths are queried like a heap, and we spend O(1 + k0) time on each one

The total query time is O(logn + k), if k points are reported Priority search tree: result 133/180

Theorem: A priority search tree for a set P of n points uses O(n) storage and can be built in O(nlogn) time. All points that lie in a 3-sided query range can be reported in O(logn + k) time, where k is the number of reported points Scheme of structure 134/180

all left endpoints of all right endpoints of s , s , s , s , s , s s2 s , s , s , s , s , s { 2 5 6 7 9 22 } { 2 5 6 7 9 22 } s9 q s22 s 6 q

s7

s5 Storage of the structure 135/180

Question: What are the storage requirements of the structure for querying with a vertical segment in a set of horizontal segments? Query time of the structure 136/180

Question: What is the query time of the structure for querying with a vertical segment in a set of horizontal segments? Result 137/180

Theorem: A set of n horizontal line segments can be stored in a data structure with size O(n) such that intersection queries with a vertical line segment can be performed in O(log2 n + k) time, where k is the number of segments reported Result 138/180

Recall that the windowing problem is solved with a combination of a range tree and the structure just described

Theorem: A set of n axis-parallel line segments can be stored in a data structure with size O(nlogn) such that windowing queries can be performed in O(log2 n + k) time, where k is the number of segments reported Windowing 139/180

Given a set of n arbitrary, non-crossing line segments, preprocess them into a data structure so that the ones that intersect a query rectangle can be reported efficiently Windowing 140/180

Two cases of intersection:

I An endpoint lies inside the query window; solve with range trees

I The segment intersects a side of the query window; solve how? Using a bounding box? 141/180

If the query window intersects the line segment, then it also intersects the bounding box of the line segment (whose sides are axis-parallel segments)

So we could search in the 4n bounding box sides Using a bounding box? 142/180

But: if the query window intersects bounding box sides does not imply that it intersects the corresponding segments Windowing 143/180

Current problem of our interest: Given a set of arbitrarily oriented, non-crossing line segments, preprocess them into a data structure so that the ones intersecting a vertical (horizontal) query segment can be reported efficiently Using an interval tree? 144/180

q q Interval querying 145/180

Given a set I of n intervals on the real line, preprocess them into a data structure so that the ones containing a query point (value) can be reported efficiently

We have the interval tree, but we will develop an alternative solution Interval querying 146/180

Given a set S = s1,s2,...,sn of n segments on the real line, preprocess them{ into a data structure} so that the ones containing a query point (value) can be reported efficiently

s3 s6 s2 s7 s1 s8 s4 s5

The new structure is called the Locus approach 147/180

The locus approach is the idea to partition the solution space into parts with equal answer sets

s3 s6 s2 s7 s1 s8 s4 s5

For the set S of segments, we get different answer sets before and after every endpoint Locus approach 148/180

Let p1, p2,..., pm be the sorted set of unique endpoints of the intervals; m 2n ≤ s3 s6 s2 s7 s1 s8 s4 s5 p1 p2 p3 p4 p5 p6 p7 p8

The real line is partitioned into

( ∞, p1), [p1, p1],(p1, p2), [p2, p2], (p2, p3),..., (pm,+∞), − these are called the elementary intervals Locus approach 149/180

We could make a binary search tree that has a leaf for every elementary interval ( ∞, p1), [p1, p1],(p1, p2), [p2, p2], (p2, p3),..., (pm,+∞) − Each segment from the set S can be stored with all leaves whose elementary interval it contains: [pi, p j] is stored with

[pi, pi],(pi, pi+1),..., [p j, p j]

A stabbing query with point q is then solved by finding the unique leaf that contains q, and reporting all segments that it stores Locus approach 150/180

s3 s6 s2 s7 s1 s8 s4 s5 ( , p )(p , p )(p , p )(p , p )(p , p )(p , p )(p , p )(p , p )(p , + ) −∞ 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 ∞ [p1, p1] [p2, p2] [p3, p3] [p4, p4] [p5, p5] [p6, p6] [p7, p7] [p8, p8] Locus approach 151/180

p1 p2 p3 p4 p5 p6 p7 p8 s3 s6 s2 s7 s1 s8 s4 s5 Locus approach 152/180

Question: What are the storage requirements and what is the query time of this solution? Towards segment trees 153/180

(pi, pi+2] In the tree, the leaves store elementary intervals

But each internal node corresponds to an interval too: (pi, pi+1) (pi+1, pi+2) the interval that is the union of [pi+1, pi+1] [pi+2, pi+2] the elementary intervals of all leaves below it

pi pi+1 pi+2 Towards segment trees 154/180

(p , p ] (p , + ) 2 4 6 ∞

(p1, p2]

p1 p2 p3 p4 p5 p6 p7 p8 s3 s6 s2 s7 s1 s8 s4 s5 Towards segment trees 155/180

Let Int(ν) denote the interval of node ν

To avoid quadratic storage, we store any segment s j as high as possible in the tree whose leaves correspond to elementary intervals

More precisely: s j is stored with ν if and only if

Int(ν) s j but Int(parent(ν)) s j ⊆ 6⊆ Towards segment trees 156/180

(pi 2, pi+2] −

(pi, pi+2] ν

(pi, pi+1) (pi+1, pi+2) [pi+1, pi+1] [pi+2, pi+2] sj

(pi 2, pi+2] − Int(parent(ν)) (pi, pi+2] Int(ν)

pi 2 pi 1 pi pi+1 pi+2 − − Segment trees 157/180

A segment tree on a set S of segments is a balanced binary search tree on the elementary intervals defined by S, and each node stores its interval, and its canonical subset of S in a list (unsorted)

The canonical subset (of S) of a node ν is the subset of segments s j for which

Int(ν) s j but Int(parent(ν)) s j ⊆ 6⊆ Segment trees 158/180

s1, s3, s4 s1

s3, s5 s6, s7 s1, s2 s5 s6

p1 p2 p3 p4 p5 p6 p7 p8 s1, s2 s3, s4 s5 s6 s7 s8 s7, s8 s7, s8 s3 s6 s2 s7 s1 s8 s4 s5 Segment trees 159/180

s1, s3, s4 s1

s3, s5 s6, s7 s1, s2 s5 s6

p1 p2 p3 p4 p5 p6 p7 p8 s1, s2 s3, s4 s5 s6 s7 s8 s7, s8 s7, s8 s3 s6 s2 s7 s1 s8 s4 s5 Segment trees 160/180

Question: Why are no segments stored with nodes on the leftmost and rightmost paths of the segment tree? Query algorithm 161/180

The query algorithm is trivial:

For a query point q, follow the path down the tree to the elementary interval that contains q, and report all segments stored in the lists with the nodes on that path Example query 162/180

s1, s3, s4 s1

s , s s6, s7 s1, s2 s5 3 5 s6

p1 p2 p3 p4 p5 p6 p7 p8 s1, s2 s3, s4 s5 s6 s7 s8 s7, s8 s7, s8 s3 s6 s2 s7 s1 s8 s4 s5 Example query 163/180

s1, s3, s4 s1

s3, s5 s6, s7 s1, s2 s5 s6

p1 p2 p3 p4 p5 p6 p7 p8 s1, s2 s3, s4 s5 s6 s7 s8 s7, s8 s7, s8 s3 s6 s2 s7 s1 s8 s4 s5 Query time 164/180

The query time is O(logn + k), where k is the number of segments reported Segments stored at many nodes 165/180

A segment can be stored in several lists of nodes. How bad can the storage requirements get? Segments stored at many nodes 166/180

Lemma: Any segment can be stored at up to two nodes of the same depth

Proof: Suppose a segment si is stored at three nodes ν1, ν2, and ν3 at the same depth from the root

parent(ν2) si si si ν1 ν2 ν3

si Segments stored at many nodes 167/180

If a segment tree has depth O(logn), then any segment is stored in at most O(logn) lists the total size of all lists is ⇒ O(nlogn)

The main tree uses O(n) storage

The storage requirements of a segment tree on n segments is O(nlogn) Result 168/180

Theorem: A segment tree storing n segments (=intervals) on the real line uses O(nlogn) storage, can be built in O(nlogn) time, and stabbing queries can be answered in O(logn + k) time, where k is the number of segments reported

Property: For any query, all segments containing the query point are stored in the lists of O(logn) nodes Back to windowing 169/180

Problem arising from windowing: Given a set of arbitrarily oriented, non-crossing line segments, preprocess them into a data structure so that the ones intersecting a vertical (horizontal) query segment can be reported efficiently Idea for solution 170/180

The main idea is to build a segment tree on the x-projections of the 2D segments, and replace the associated lists with a more suitable data structure 171/180

s1, s3, s4 s1

s3, s5 s6, s7 s1, s2 s5 s6

p1 p2 p3 p4 p5 p6 p7 p8 s1, s2 s3, s4 s5 s6 s7 s8 s7, s8 s7, s8 s3 s6 s2 s4 s7 s8 s1 s5 172/180

s1, s3, s4 s1

s3, s5 s6, s7 s1, s2 s5 s6

p1 p2 p3 p4 p5 p6 p7 p8 s1, s2 s3, s4 s5 s6 s7 s8 s7, s8 s7, s8 s3 s6 s2 s4 s7 s8 s1 s5 173/180

Observe that nodes now correspond to vertical slabs of the plane (with or without left and right bounding lines), and:

I if a segment si is stored with a node ν, then it crosses the slab of ν completely, but not the slab of the parent of ν

I the segments crossing a slab have a well-defined top-to-bottom order

s j sj is stored at one or more nodes below ν Int(ν) 174/180

s1, s3, s4

s5

p3 p4 s5

s3

s4 s1 s5 175/180

s1, s3, s4

s5

p3 p4 s5

s3

s4 s1 s5 Querying 176/180

Recall that a query is done with a vertical line segment q

Only segments of S stored with nodes on the path down the tree using the q x-coordinate of q can be answers

At any such node, the query problem is: which of the segments (that cross the slab completely) intersects the vertical query segment q? Querying 177/180

s7 s7

s6 s6 s6 s5 We store the canonical s5 s subset of a node ν in a 5 q s4 balanced binary search tree s4 s4 that follows the s3 bottom-to-top order in its s3 s3 leaves s2 s2 s1 s2 s1 s1 Data structure 178/180

A query with q follows one path down the main tree, using the x-coordinate of q

At each node, the associated tree is queried using the endpoints of q, as if it is a 1-dimensional range query

The query time is O(log2 n + k) Data structure 179/180

The data structure for intersection queries with a vertical query segment in a set of non-crossing line segments is a segment tree where the associated structures are binary search trees on the bottom-to-top order of the segments in the corresponding slab

Since it is a segment tree with lists replaced by trees, the storage remains O(nlogn) Result 180/180

Theorem: A set of n non-crossing line segments can be stored in a data structure of size O(nlogn) so that intersection queries with a vertical query segment can be answered in O(log2 n + k) time, where k is the number of answers reported

Theorem: A set of n non-crossing line segments can be stored in a data structure of size O(nlogn) so that windowing queries can be answered in O(log2 n + k) time, where k is the number of answers reported