Uncertainty Models in Computational Geometry

A thesis submitted to fulfil requirements for the degree of Doctor of Philosophy

Patrick Eades

Faculty of Engineering
The University of Sydney

June 30, 2020

Abstract

In recent years easily and cheaply available internet-connected devices have enabled the collection of vast amounts of data, which has driven a continued interest in efficient, elegant combinatorial algorithms with mathematical guarantees. Much of this data contains an inherent element of uncertainty; whether because of imperfect measurements, because the data contains predictions about the future, or because the data is derived from machine learning algorithms which are inherently probabilistic. There is therefore a need for algorithms which include uncertainty in their definition and give answers in terms of that uncertainty. Questions about the most likely solution, the solution with the lowest expected cost, or a solution which is correct with high probability are natural here.

Computational geometry is the subfield of theoretical computer science concerned with developing algorithms and data structures for geometric problems, that is, problems involving points, distances, angles and shapes. In computational geometry uncertainty is included in the location of the input points, or in which potential points are included in the input. The study of uncertainty in computational geometry is relatively recent. Earlier research concerned imprecise points, which are known to appear somewhere in a geometric region. More recently the focus has been on points whose location, or presence, is given by a probability distribution.

In this thesis we describe the most commonly used uncertainty models which are the subject of ongoing research in computational geometry. We present specific problems in those models and present new results, both positive and negative. In Chapter 3 we consider universal solutions, and show a new lower bound on the competitive ratio of the Universal Traveling Salesman Problem. In Chapter 4 we describe how to determine if two moving entities are ever mutually visible, and how data structures can be repeatedly queried to simulate uncertainty.
In Chapter 5 we describe how to construct a graph on uncertain points with high probability of being a geometric spanner, an example of redundancy protecting against uncertainty. In Chapter 6 we introduce the online ply maintenance problem, an online problem where uncertainty can be reduced at a cost, and give an optimal algorithm.

Acknowledgements

Acknowledgement and my gratitude are due first to my supervisor Julián Mestre, whose influence both as a mentor and as a coauthor cannot be overstated. This thesis would not have been possible without Julián, and his influence is apparent on every page of it. Likewise to my secondary supervisor Joachim Gudmundsson, who also served as my primary supervisor during Julián’s sabbatical. Joachim’s support has underpinned my entire time as a PhD student. Both personally and academically he has been an irreplaceable mentor.

To my colleagues at the University of Sydney, especially to the Algorithms group, which has grown into the most enjoyable and engaging place one could imagine doing a PhD. Special thanks are due to Vikrant Ashvinkumar, Mica Brankovic, Ralph Holz, John Pfeifer, André van Renssen, Martin Seybold, William Umboh and Sampson Wong.

To the members of the Geometric Computation Group of Utrecht University, who I was privileged to visit, and who strongly influenced me. Especially to my collaborators Ivor van der Hoog, Maarten Löffler and Frank Staals.

To the anonymous examiners of this thesis for their time, their kind words and their helpful suggestions.

To my parents for twenty-eight years of unwavering love and support, to my friends for their love, and to Daniel, always.

Patrick
Sydney, June 30, 2020

Statement of Originality

This is to certify that to the best of my knowledge, the content of this thesis is my own work. This thesis has not been submitted for any degree or other purposes. I certify that the intellectual content of this thesis is the product of my own work and that all the assistance received in preparing this thesis, and all sources used, have been acknowledged.

Patrick Eades
June 30, 2020

Authorship Attribution

Chapters 3–6 of this thesis contain work developed and written in collaboration with my coauthors. In each case I was one of the main contributors to the paper. Authors are listed alphabetically, as is conventional in theoretical computer science.

Chapter 3 will be published as: Patrick Eades and Julián Mestre. An Optimal Lower Bound for Hierarchical Universal Solutions for TSP on the Plane. In The 26th International Computing and Combinatorics Conference (COCOON), Atlanta, USA, 2020. I was corresponding author and will present the work at the conference.

Chapter 4 is published as: Patrick Eades, Ivor van der Hoog, Maarten Löffler and Frank Staals. Trajectory Visibility. In The 17th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT), Tórshavn, Faroe Islands, 2020. I presented the work at the conference.

Chapter 5 is being prepared for publication. It is joint work with Julián Mestre.

Chapter 6 is being prepared for publication. It is joint work with Vikrant Ashvinkumar, Maarten Löffler and Seeun William Umboh.

In addition to the statements above, in cases where I am not the corresponding author of a published item, permission to include the published material has been granted by the corresponding author.

Patrick Eades, June 30, 2020

As supervisor for the candidature upon which this thesis is based, I can confirm that the authorship attribution statements above are correct.

Julián Mestre, June 30, 2020

Contents

1 Introduction
  1.1 Computational Geometry
  1.2 Uncertainty
    1.2.1 The Minimum Expected Weight Spanning Tree
    1.2.2 Uncertainty Models
  1.3 Summary of Contributions

2 Related Work
  2.1 The Traveling Salesman Problem
  2.2 Range Searching
    2.2.1 Orthogonal Range Searching
    2.2.2 Simplex Range Searching
    2.2.3 Semi-algebraic Range Searching
    2.2.4 Multi-level Data Structures
  2.3 Coresets and ε-nets
  2.4 Visibility
  2.5 Trajectories
  2.6 Trajectories and Visibility
  2.7 Spanners
  2.8 Uncertainty
    2.8.1 Clustering
    2.8.2 Minimum Spanning Tree
    2.8.3 Nearest Neighbours and Voronoi Diagrams
    2.8.4 Closest Pair
    2.8.5 Convex Hull
    2.8.6 Distance and Shape
    2.8.7 Fréchet Distance
    2.8.8 Hyperplane Separability
    2.8.9 Range Queries
    2.8.10 ε-kernels
    2.8.11 Skylines
    2.8.12 Spanners
    2.8.13 Visibility
    2.8.14 Dealing with Uncertainty

3 The Universal Traveling Salesman
  3.1 Background
    3.1.1 Previous Lower Bounds
  3.2 Preliminaries and Notation
  3.3 Logarithmic Lower Bound
  3.4 Conclusion
    3.4.1 Equal Measure
    3.4.2 Convex
    3.4.3 α-fatness

4 Trajectory Visibility
  4.1 Background
  4.2 Introduction
    4.2.1 Results
  4.3 Algorithms for testing visibility
    4.3.1 An O(n log n) time algorithm
    4.3.2 An Ω(n log n) lower bound
    4.3.3 A linear-time algorithm
  4.4 Semi-algebraic range searching
    4.4.1 Intersecting line segments with a quadratic curve segment
  4.5 Intersecting a convex polygon with an algebraic curve
  4.6 A data structure for two entities moving inside a simple polygon
    4.6.1 Querying the data structure
  4.7 Two moving entities crossing edges in a polygonal domain
  4.8 A data structure for queries with one moving entity
    4.8.1 Entity r is contained in a simple polygon
    4.8.2 Entity r can cross a simple polygon
    4.8.3 Polygonal domains
  4.9 Conclusions

5 Uncertain Spanners
  5.1 Background
  5.2 Problem Definition
  5.3 Algorithm
    5.3.1 Uncertain Spanners in Higher Dimensions
    5.3.2 Other Cone-Based Spanners
  5.4 A Lower Bound
    5.4.1 Order Statistics and Spacings
    5.4.2 Fractional Coupon Collecting
    5.4.3 Simulating the Discrete Problem with the Continuous
    5.4.4 Extensions
  5.5 Conclusion

6 Maintaining the Ply of Unknown Trajectories
  6.1 Introduction
    6.1.1 Minimizing Ply
    6.1.2 Problem Statement
    6.1.3 Prior Work
  6.2 Optimal Ply is NP-hard
  6.3 An Optimal Algorithm
  6.4 Lower Bound
    6.4.1 Lower Bound for ∆ = 1
    6.4.2 General Lower Bound
  6.5 Conclusions

7 Conclusion

Chapter 1

Introduction

1.1 Computational Geometry

A defining trend of the last half-century has been incredible growth in the quantity of data collected, stored and analysed. This in turn has driven continued interest in efficient, elegant combinatorial algorithms with good asymptotic behavior and worst-case guarantees. Computational geometry is the field of computer science concerned with designing and evaluating algorithms for geometric problems. The geometry of a problem may come from the input objects (e.g. points, lines, regions) or from relationships between them (e.g. distance, angle, intersection). In the same way that a graph is the fundamental object of combinatorics, a set of points embedded in Euclidean space is the fundamental object of computational geometry.

Computational geometry is a relatively recent field within computer science. Some of the earliest progress in the field was motivated by subproblems arising from computer graphics. For example, to render the view of a camera in 3D space one must compute which objects are visible to the camera. However, computational geometry today finds applications in fields as diverse as social choice theory [110], computational ecology [52], wireless network design [107] and geographic information systems [109].

A common problem in computational geometry is to minimize a distance, subject to some constraints. A canonical problem of this type is the Euclidean traveling salesman problem. The Euclidean TSP takes as input a set of points in Euclidean space R^d, and must return a tour which visits each point exactly once before returning to where it started, and whose total length is as small as possible. This problem is known to be NP-hard, but it admits efficient approximation algorithms. The study of the Euclidean TSP has been a rich source of new theories and methods in computational geometry over the last century. The difference between the


Figure 1.1: Some examples of classical computational geometry problems: the convex hull, the Euclidean TSP, the Voronoi diagram, the art gallery problem, range searching, and the Delaunay triangulation.

Euclidean TSP (which can be approximated within a factor 1 + ε [26, 149]) and the general TSP, where the distance between two points can be any arbitrary value (which cannot [129]), highlights the difference between computational geometry and theoretical computer science in general.

Another common type of problem is to compute some geometric property of a set of points, for example the convex hull problem. The convex hull of a set of points is the smallest convex set containing the points, or equivalently the intersection of all convex sets containing the points. There exist many algorithms to compute convex hulls in different circumstances, and convex hulls and their related algorithms form core building-blocks of many problems and methods in computational geometry.

Many computational geometry problems can be stated in a direct algorithmic version, where some input is given and an output must be computed, or in a data structure version, where the input may be preprocessed and stored in memory and a sequence of queries must be answered using the precomputed data structure. For example, given a set of points one could directly compute the pair of points with the smallest distance between them, or one could preprocess the points into a Voronoi diagram [170] such that, given a query point, the closest input point to the query point could be quickly computed. There is vast scholarship on both types of problems, and both types will be discussed in this thesis.

There is considerable research on applied computational geometry. This field has close links to engineering and industrial design. However, while the algorithms presented in this thesis have broad application as subroutines, applied computational geometry will not be directly discussed.
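The convex hull's role as a building block is easy to see in code. The following is a minimal sketch of the classic monotone-chain algorithm (an illustrative implementation, not code from this thesis; function names are our own):

```python
def cross(o, a, b):
    # z-component of (a - o) x (b - o); positive for a counter-clockwise turn
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return the convex hull in counter-clockwise order, O(n log n) time."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:                      # build the lower chain left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):            # build the upper chain right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]     # drop duplicated endpoints
```

The initial sort dominates the running time, giving O(n log n) overall.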

1.2 Uncertainty

Alongside the great increase in data collection there has been a corresponding decrease in the quality of the data collected. The Internet of Things (IoT) and readily available, affordable sensors have enabled large-scale collection of measurement data. Most of the data thus collected has an element of noise, which can often be understood as randomly sampled from some probability distribution. LIDAR and GPS are examples of commonly used measurements with well-understood uncertainty models.

Another source of uncertainty derives from the increasing prevalence of data which is the output of a machine learning model, such as classification or regression. Since machine learning is probabilistic at its heart, the outcomes are often given as a probability distribution rather than a definite value.

A third major source of uncertainty is more traditional: the uncertainty of future data. When designing, for example, a wireless network, the demand may be estimated by statistical or machine learning methods, or by expert analysis of contributing factors. In each case there is a degree of uncertainty involved, which can be modeled in various ways.

A natural, and traditional, solution to uncertainty in the input data to an algorithm is to replace each uncertain value with a point estimate, such as the expected value or mode of the value’s probability distribution. Afterwards a conventional algorithm can be run on the now certain data. While this solution is simple, sensible, and can be justified for some kinds of input and some kinds of uncertainty, it can also lead to solutions which perform badly on every realization of the uncertainty, or which are very sensitive to small perturbations that should be expected from the uncertainty.

1.2.1 The Minimum Expected Weight Spanning Tree

Let us divert here with a concrete example to illustrate this point. Consider the minimum expected weight spanning tree problem. In this problem a set of n probability distributions P over R^d is given, such that a set of n points P ⊂ R^d may be sampled from P. The objective is to return a set E ⊂ P × P such that the graph G(P, E) is connected and the expected weight of E, E[Σ_{(u,v)∈E} |uv|], is minimized. That is, to return the spanning tree over the uncertain points which has the lowest expected weight.

Notice that by linearity of expectation E[Σ_{(u,v)∈E} |uv|] = Σ_{(u,v)∈E} E[|uv|], so this problem can be easily solved by computing E[|uv|] for each pair of points, then finding the MST of the complete graph on P with edge weights E[|uv|]. So long as the expectation of |uv| can be computed in constant time, this problem is not meaningfully more difficult than the regular MST.

However, replacing each point with a point estimate will not solve even this

Figure 1.2: The expected distance between p1 and p2 is always greater than the expected distance between p1 and q.

simple problem. Consider an input with n + 1 ≥ 7 points in R^2, where a special point q always appears at the origin, and the remaining points {p1, …, pn} always appear on the unit circle. Point pi has a 0.5 chance of appearing at an angle of i·π/n and a 0.5 chance of appearing opposite, at an angle of π + i·π/n. Hence the expected position of each pi is the origin, and the set of point estimates contains n + 1 copies of the origin and no helpful information: any tree over the points is a valid MST of the point estimates. However, it does not follow that any tree on the points will be the MST of any particular realization of the points; in fact this is far from true.

If the points are considered including their distributions, then it becomes clear that only the star graph rooted at q is an expected minimum spanning tree. Consider the expected distance between two points p1 and p2, neither of which is the centre point q. Suppose without loss of generality that p1 realizes at its first position p1,1. Then with a 0.5 chance the distance |p1p2| = |p1,1p2,1| and with a 0.5 chance it is equal to |p1,1p2,2|. Hence the expected value E[|p1p2|] = 0.5 · (|p1,1p2,1| + |p1,1p2,2|), which, by the triangle inequality, is greater than 0.5 · |p2,1p2,2|, which is equal to 1, since p2,1 and p2,2 are on opposite sides of the unit circle. On the other hand the expected distance E[|p1q|] is always exactly 1. Hence under expectation it is always better to connect p1 to q than to any other point. This argument can be repeated to show that only the star graph rooted at q is an expected minimum weight spanning tree. See Figure 1.2 for a graphical illustration of this example.

Since in many situations point estimates provide unsatisfactory results, there has been considerable interest in developing algorithms that take distributions

over possible locations as input, and output probabilistically best solutions.

Figure 1.3: The most-likely convex hull of a set of existentially uncertain points, labelled with their probabilities.

1.2.2 Uncertainty Models

There is increasing interest in the computational geometry community in combining uncertainty models and geometric problems to develop algorithms which are aware of the uncertainty in the input data. There are several ways of doing this, which have different advantages and disadvantages. Since including uncertainty in geometry is relatively new, there are a variety of names and definitions, often used differently by different authors. In this thesis we endeavour to use a consistent set of definitions that agree with the majority of contemporary authors.

A very straightforward uncertainty model is the existential model. In this model the input is given as a set of points {p1, …, pn} ⊂ R^d and a set of n probabilities {ρ1, …, ρn}, where pi is considered “active” or “present” with probability ρi. The uncertain problem is to solve some deterministic problem over the set of active points. For example, to find cluster centres that minimize the expected sum of distances between active points and their nearest cluster centre, or to find the set of points most likely to form the convex hull. A drawback of this model is that the number of active points is typically less than n, so the deterministic subproblem is not necessarily of any particular size. This often requires handling edge cases rather arbitrarily.

An alternative model is the locational model. This model guarantees realizations of size n, and satisfactorily captures many uncertainty types, especially those arising from noisy measurements. In this model the input consists of a set of n probability distributions over R^d. A realization is defined as a set of n points, one sampled from each distribution. Because each point is guaranteed to realize somewhere, concepts like the distance or angle between two points are always well-defined, and are in fact random variables themselves. It should, however, be noted that they are not statistically independent random variables.

Since allowing arbitrary probability distributions can make computation infeasible, a very common simplification of the locational model is the discrete locational model, where the location of each point is drawn from a discrete distribution of possible locations associated with that point. This allows computations like expectation to be performed in time proportional to the number of possible locations for any point, which is typically upper bounded by some constant. There is a close connection between the discrete locational model and the existential model, since they both result in a set of possible locations, each of which will contain a point with some fixed probability. The key difference is that the realization probabilities are not independent in the locational model: each point must realize in exactly one location. Sometimes an algorithm designed for one model can be easily adapted to the other, but this is not always true. Often special algorithms are required for each model.

This statistical dependence is also the distinction between the locational model and algorithms for random graphs. While a set of locationally uncertain points and their pairwise distances do indeed form a complete graph with computable, random edge weights, those edge weights are not independent. To properly understand how the edge weights interact, an understanding of the geometry is required. Treating the input as a random weighted graph is typically not viable for anything but the simplest problems. Because of its popularity we will refer to the discrete locational model simply as the locational model, and specify the continuous locational model where appropriate.

An alternative model is to disregard probability entirely, and consider a worst-case or adversarial model. This is the most traditional and well studied uncertainty model in computational geometry, often called the imprecise points model. In this model a set of points {p1, …, pn} is given as a set of regions {R1, …, Rn}, where each Ri ⊂ R^d. Once the algorithm has computed a solution, an adversary is allowed to place each point pi anywhere inside Ri and the quality of the resulting solution is evaluated. Typically the Ri are restricted to sensible geometric objects such as balls, axis-aligned boxes or line segments.

A final model is the black box model, common in many fields that involve random variables. In this model the distribution over possible inputs is completely hidden from the algorithm and can only be accessed through a sampling oracle. The algorithm may analyze as many sampled inputs from the oracle as it requires, then output a solution. The quality of the solution is evaluated by how well it performs on future sampled inputs.
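The contrast between the existential and discrete locational models can be made concrete with a small sampling sketch (illustrative code with hypothetical names, not from the thesis):

```python
import random

def sample_existential(points, probs, rng):
    """Existential model: point i is present independently with probability
    probs[i], so a realization may contain fewer than n points."""
    return [p for p, rho in zip(points, probs) if rng.random() < rho]

def sample_locational(distributions, rng):
    """Discrete locational model: each point realizes at exactly one of its
    candidate locations, so a realization always contains exactly n points."""
    return [locs[rng.choices(range(len(locs)), weights=ws)[0]]
            for locs, ws in distributions]

rng = random.Random(0)
dists = [([(0, 0), (1, 0)], [0.5, 0.5]),   # p1: two equally likely locations
         ([(2, 2), (3, 3)], [0.9, 0.1])]   # p2: a skewed two-point distribution
realization = sample_locational(dists, rng)
assert len(realization) == len(dists)      # always n points, one per distribution
```

Note the dependence structure: in the locational sample, the two candidate locations of a single point can never both appear, whereas existential points are fully independent.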

Figure 1.4: The expected minimum weight spanning tree for a set of locationally uncertain points, drawn for a specific realization. Notice that it is not a minimum weight spanning tree of the realized points.

Figure 1.5: The closest pair for a set of uncertainty regions. Even though the uncertainty regions for p2 and p3 are the closest, adversarially the closest pair is between p3 and p5 as shown.

Figure 1.6: A 3-center clustering computed by sampling a set of uncertain points from a black box many times, then computing a clustering on their union.

A natural question here is how many samples are sufficient to be confident the computed solution will work well for future inputs? This question is firmly within the realm of machine learning, and won’t be discussed further in this thesis. However, there are many other interesting geometric questions: for example, how solutions computed on individual samples can be combined into a solution for future samples. Getting reliable results normally requires computing solutions on huge numbers of samples. Often these samples are very similar to each other, so very similar computations are performed many times. To improve efficiency and allow more sampling (and thus better results) there is a need for computational geometry data structures which allow computation to be reused and each sample to be processed as quickly as possible.
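As a toy illustration of the black box model, the sketch below estimates by repeated sampling the probability that a realization contains two points within distance r. The oracle here is a hypothetical stand-in for whatever process generates the data; everything else about the distribution is hidden from the algorithm:

```python
import math
import random

def closest_pair_dist(points):
    """Brute-force closest-pair distance of one realization."""
    return min(math.dist(p, q) for i, p in enumerate(points)
               for q in points[i + 1:])

def prob_within(oracle, r, samples=1000):
    """Monte Carlo estimate of P(closest pair distance <= r)."""
    hits = sum(closest_pair_dist(oracle()) <= r for _ in range(samples))
    return hits / samples

rng = random.Random(42)
def oracle():
    # Hypothetical black box: three fixed sites perturbed by Gaussian noise.
    return [(x + rng.gauss(0, 0.1), y + rng.gauss(0, 0.1))
            for x, y in [(0, 0), (1, 0), (0, 1)]]

p = prob_within(oracle, 0.5)
assert 0.0 <= p <= 1.0
```

The quadratic closest-pair routine is recomputed from scratch on every sample; the data structures alluded to above exist precisely to avoid this kind of repeated work.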

1.3 Summary of Contributions

In this thesis we discuss a collection of results in each of the uncertainty models discussed above. We introduce several new problems and provide new algorithms, new data structures and new lower bounds. We also discuss many possible avenues for future research in this area. In Chapter 2 we review the existing relevant literature, covering both uncertainty in computational geometry and the methods and techniques used in this thesis.

Chapter 3

In Chapter 3 we will discuss a particular type of adversarial model known as universal solutions, and describe the Universal Traveling Salesman Problem (UTSP). The UTSP was formulated in 1989 by Platzman and Bartholdi, who were motivated by a Meals-on-Wheels driver who must deliver food daily to a fixed list of recipients; however, each day a subset of the clients do not require meal delivery and may be skipped. Platzman and Bartholdi wondered whether it was possible to order the complete list of clients in such a way that on any day those clients who did need service could be visited according to their order on the list, and regardless of which clients needed service the delivery driver’s path was never too far from optimal. Specifically they asked if, given n points P on the unit square [0, 1]^2, there existed a total ordering ≺ of P such that for each subset S ⊂ P the cost of visiting each point of S in the order given by ≺ is within a bounded factor of the shortest tour visiting each point of S in any order. Platzman and Bartholdi answered in the positive, giving an ordering induced by the Sierpinski curve which they proved to always be within a factor O(log n) of optimal. They further conjectured the bound could be improved by tighter analysis or an alternative ordering. This type of problem is now known as a universal solution; the name derives from one single solution (the total ordering) approximately solving an entire class of problems (the TSP on any subset S). Universal solutions, and the universal TSP in particular, have attracted considerable research, which will be discussed later.

In 2006 Hajiaghayi et al. showed a general lower bound of Ω((log n / log log n)^{1/6}) for the UTSP, regardless of which ordering is used. In this chapter we define a new family of total orders on the plane, which we call hierarchical total orders. Intuitively, a hierarchical order is one that can be derived by partitioning the unit square into fat, convex regions and ordering the regions, then recursing to order the points inside each region. Because of the good locality of hierarchical total orders, and because of the connection between total orders of the unit square and recursively defined space-filling curves, hierarchical total orders are particularly well suited for use in the UTSP.

In fact, hierarchical total orders capture all the total orders commonly used for the UTSP. We prove an Ω(log n) lower bound for any UTSP solution using a hierarchical total order, confirming the long-held conjecture that the Sierpinski and Hilbert total orders are optimal in this category. There remains a gap between the fully general lower bound of Hajiaghayi et al. and the O(log n) performance achieved by the best hierarchical orders; we are not aware of any non-hierarchical ordering that achieves O(log n) performance. To establish our bound we develop several new techniques and definitions relating to fat, convex partitions and random lines through the unit square.
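For intuition, the Hilbert ordering (one of the hierarchical orders covered by our lower bound) can be computed with the standard bit-manipulation routine for Hilbert curve indices. This is an illustrative sketch with our own function names, not code from the thesis:

```python
def hilbert_index(order, x, y):
    """Index of grid cell (x, y) along the Hilbert curve of a 2^order x 2^order grid
    (the standard iterative xy-to-index conversion)."""
    n = 2 ** order
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect the quadrant so the sub-curve is oriented consistently.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

def universal_order(points, order=16):
    """Sort points in the unit square by their Hilbert-curve index."""
    side = 2 ** order
    return sorted(points, key=lambda p: hilbert_index(
        order, int(p[0] * (side - 1)), int(p[1] * (side - 1))))
```

Visiting any subset of the points in this fixed order is exactly the kind of universal tour analysed in Chapter 3.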

Chapter 4

In Chapter 4 we will consider the trajectory visibility problem, that is, determining visibility between two entities moving along piecewise linear trajectories through an environment containing visibility-obstructing walls.

This problem is motivated by an application in computational ecology. Researchers tracking the movements of certain species of monkeys via special GPS-enabled collars periodically observe discrete changes in the monkeys’ behavior; the researchers believe this may have been caused by a monkey coming in visual range of a monkey from another group. The main obstruction to visibility is the jungle environment, which can reasonably be modelled as a collection of obstacles which block sight-lines, but can be freely moved through by the monkeys. Because the collars worn by the monkeys have limited battery life, GPS readings can only be taken at fairly infrequent intervals; hence the exact position of a monkey between two GPS readings is uncertain. Various statistical models can be used to describe uncertain interpolation between two fixed points; a Brownian bridge is commonly used in ecology, because it has been extensively studied in many contexts and has several convenient properties.

We treat the movement of each monkey as a black box, from which a piecewise linear, time-stamped trajectory can be sampled. This sampling can be the result of linearly interpolating between each GPS coordinate, simulating a Brownian motion, taking random walks on a grid graph, or any other method. Our problem then reduces to computing visibility between a pair of trajectories. Surprisingly, given the importance and popularity of both trajectory data and visibility queries, this is the first work to our knowledge addressing this problem.

We provide simple algorithms to compute visibility when the trajectories are inside a simple polygon, when the obstacle is a simple polygon but the trajectories are allowed to intersect the polygon’s boundary, and in full generality in a polygonal domain. We also include an argument that the running time of these algorithms is optimal.

While the monkeys may move in many different ways, the terrain through which they move is essentially fixed. Black box sampling two trajectories many times to estimate the probability that two monkeys saw each other involves solving many visibility problems in the same obstacle domain. It would be convenient, then, to preprocess the obstacles into a data structure such that querying visibility between two trajectories takes time sublinear in the complexity of the obstacle domain. We achieve this goal using a collection of fundamental geometric data structures, including range searching, shortest path and point location structures. We provide sublinear query algorithms for all variants of the trajectory visibility problem, and in doing so develop some new techniques of independent interest, involving intersections between convex polygons and second degree polynomial curves.
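Before any of these data structures, there is a naive baseline worth keeping in mind: discretize time and test the sight line against every obstacle edge. The sketch below (illustrative only; the exact algorithms of Chapter 4 avoid both the time discretization and the linear scan over obstacles) makes the problem statement concrete:

```python
def orient(a, b, c):
    # Twice the signed area of triangle abc; sign gives the turn direction.
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_cross(p, q, a, b):
    """True iff segments pq and ab properly intersect (cross in their interiors)."""
    d1, d2 = orient(a, b, p), orient(a, b, q)
    d3, d4 = orient(p, q, a), orient(p, q, b)
    return d1 * d2 < 0 and d3 * d4 < 0

def position(traj, t):
    """Linear interpolation on a time-stamped piecewise linear trajectory,
    given as a list of (time, point) pairs in increasing time order."""
    for (t0, p0), (t1, p1) in zip(traj, traj[1:]):
        if t0 <= t <= t1:
            u = (t - t0) / (t1 - t0)
            return (p0[0] + u * (p1[0] - p0[0]), p0[1] + u * (p1[1] - p0[1]))
    raise ValueError("t outside trajectory time range")

def ever_visible(traj_a, traj_b, obstacles, steps=1000):
    """Sample times on the common interval; the entities see each other at time t
    if their sight-line segment crosses no obstacle edge."""
    t_lo = max(traj_a[0][0], traj_b[0][0])
    t_hi = min(traj_a[-1][0], traj_b[-1][0])
    for i in range(steps + 1):
        t = t_lo + (t_hi - t_lo) * i / steps
        p, q = position(traj_a, t), position(traj_b, t)
        if not any(segments_cross(p, q, a, b) for a, b in obstacles):
            return True
    return False
```

Each query costs O(steps · m) for m obstacle edges, and discretization can miss brief moments of visibility, which is exactly why the exact, sublinear-query structures of Chapter 4 are worth the effort.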

Chapter 5

In Chapter 5 we will introduce the uncertain geometric spanner problem and present an algorithm, along with an argument that our algorithm is optimal among cone-based spanners.

Given a set of points P in Euclidean space, a geometric spanner is a graph G(P, E) such that for each pair of points u and v the dilation d_G(u, v)/|uv|, that is, the ratio of the graph distance to the Euclidean distance between the points, is at most some fixed τ. Since E = P × P would trivially satisfy this requirement, it is furthermore required that the number of edges |E| is small. Geometric spanners with only |E| = O(|P|) edges can be constructed in several ways, including the greedy spanner, the well-separated pair decomposition spanner, the Yao graph and the θ-graph.

The uncertain geometric spanner problem takes as input a set of locationally uncertain points P, a target spanning ratio τ and an acceptable failure probability ρ. The problem is to output a graph G(P, E), such that |E| is not too large and, when some P is sampled from P, the probability that the induced graph G(P, E) is a τ-spanner is at least 1 − ρ. This problem has some similarity to universal solutions: we are seeking a single solution which induces a whole class of good solutions. However, in this case the input is randomly chosen from a distribution, rather than adversarially, and we are allowed to fail on some possible realizations, so long as this happens with low probability.

We provide an algorithm using Yao graphs to build such a spanner using only O(n log n) edges, which is to our knowledge the first algorithm to solve this problem. We also give an argument that no cone-based spanner can solve this problem using only o(n log n) edges.
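The Yao graph construction underlying this result can be sketched in a few lines: around each point, partition the plane into k equal-angle cones and keep an edge to the nearest neighbour in each cone. This is an illustrative O(n^2) sketch on certain (realized) points, not thesis code:

```python
import math

def yao_graph(points, k=6):
    """Yao graph: for each point, an edge to its nearest neighbour in each of
    k equal-angle cones.  Returns undirected edges as index pairs."""
    edges = set()
    for i, p in enumerate(points):
        nearest = [None] * k                    # (distance, index) per cone
        for j, q in enumerate(points):
            if i == j:
                continue
            angle = math.atan2(q[1] - p[1], q[0] - p[0]) % (2 * math.pi)
            cone = int(angle / (2 * math.pi / k))
            d = math.dist(p, q)
            if nearest[cone] is None or d < nearest[cone][0]:
                nearest[cone] = (d, j)
        for entry in nearest:
            if entry is not None:
                edges.add((min(i, entry[1]), max(i, entry[1])))
    return edges
```

For large enough k this yields a τ-spanner with O(kn) edges on fixed points; Chapter 5 shows how to hedge the construction against locational uncertainty at the cost of an extra logarithmic factor.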

Chapter 6 In Chapter 6 we introduce the online ply maintenance problem, which models uncertainty adversarially as an online problem.

Given a set of geometric regions, the local ply at a given point is the number of regions intersecting that point, and the ply is the maximum local ply at any point. Ply is of interest in computational geometry because it is a rough measure of how intersecting a set of regions is. The ply of a set of imprecise points’ uncertainty regions represents a rough measure of the entropy of the uncertainty. Areas with high ply may have points appear in any configuration, whereas sets of regions with low ply are restricted in how they can be arranged. Many algorithms which process uncertainty regions have performance guarantees that depend on the ply, so it is preferable to have as low a ply as possible. In the online ply maintenance problem, the input consists of a set of entities, moving at a maximum speed of 1, and a maximum acceptable ply ∆. However, the exact trajectories of the points are not available to the algorithm; instead any point can be queried at any time to discover its exact location at that time. Otherwise all that is known is that the point is within a growing uncertainty region around its last known location. Making a single query has unit cost, and any number of queries can be made, instantaneously, at any time. The algorithm must decide which entities to query, at which times, to ensure the ply of the uncertainty regions is never more than ∆ for longer than an instant, while minimising the cost (that is, the total number of queries). To fairly evaluate the algorithm’s performance the cost is compared to the cost of an adversary playing the same game who has full knowledge of the points’ trajectories and unlimited, instantaneous computation at any time. We provide a simple algorithm that, at any point in time, never makes more than ∆ + 1 times the number of queries the adversary has made. We prove this is optimal for deterministic algorithms, and within a factor of 2 of optimal for randomized algorithms.
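In one dimension the ply of a set of regions is simply the maximum overlap of a set of intervals, which a sweep over the sorted endpoints computes in O(n log n) time; a minimal sketch (ours, for illustration):

```python
def ply(intervals):
    """Maximum number of intervals covering any single point,
    computed with a left-to-right sweep over endpoint events."""
    events = []
    for lo, hi in intervals:
        events.append((lo, +1))   # interval opens
        events.append((hi, -1))   # interval closes
    # Process closings before openings at equal coordinates, so that
    # touching intervals such as [a, b] and [b, c] do not count as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    depth = best = 0
    for _, delta in events:
        depth += delta
        best = max(best, depth)
    return best

print(ply([(0, 4), (1, 3), (2, 5), (6, 7)]))  # 3
```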
We also prove the adversary’s problem of deciding which points to query is NP-complete.

Chapter 2

Related Work

We will now take some time to survey the existing literature on uncertainty in computational geometry, and introduce the methods and theorems which will be required for the chapters that follow.


Figure 2.1: A traveling salesman tour of some locations in the plane

2.1 The Traveling Salesman Problem

Given a set of locations and their pairwise distances, the traveling salesman problem is to find the shortest tour that visits each location exactly once and returns to the starting location. The TSP is a fundamental problem in computer science that has been extensively studied. The exact origins of the traveling salesman problem are unknown, but the problem dates back to at least 1832, and there is evidence it was discussed amongst mathematicians even earlier. The first comprehensive treatment of the problem was developed by Karl Menger in the 1930s [169]. The TSP is known to be NP-complete [128] on general graphs. It remains NP-complete even when restricted to graphs embedded in the Euclidean plane with edge weights equal to their length [156]. The Eulerian tour of the minimum spanning tree of a graph is known to be a 2-approximation of the optimal TSP tour on that graph. A celebrated algorithm of Christofides [74] builds on this idea and improves the approximation ratio to 1.5 for general metric spaces, which remains the best upper bound in this setting. When restricted to a Euclidean metric Arora [26], and independently Mitchell [149], give a polynomial time approximation scheme (PTAS). Interestingly, the metric TSP cannot be approximated within a factor of 123/122 unless P = NP [129]; in contrast to the PTAS available for the Euclidean TSP, this suggests the metric TSP is a fundamentally more difficult problem. The Universal Traveling Salesman Problem (UTSP) was introduced in 1982 by Bartholdi and Platzman [30] and later formalized by the same authors [160] in 1989, in response to a need for faster performing TSP algorithms. The motivating application involved a food delivery service whose subset of clients requiring delivery changed every day, requiring the delivery tour to be efficiently recalculated. Platzman and Bartholdi used the ordering induced by the Sierpinski space-filling curve, and proved the competitive ratio for this ordering was O(log n).

They further conjectured that the bound could be tightened to O(1) by improving the analysis. This was refuted later the same year in a note by Bertsimas and Grigni [37], who provided Ω(log n) examples for the orderings induced by the Sierpinski, Hilbert and zig-zag curves. In their lower bound examples Bertsimas and Grigni carefully place a line across the unit square, and points evenly along the line, such that the UTSP must backtrack repeatedly over itself. They demonstrate this pattern can be found recursively on any section of the line so long as there are enough points on that section to form a backtrack, which leads to an Ω(log n) sum. This approach has inspired Hajiaghayi et al. [116] and our own lower bound. The first general lower bound for all total orderings of the plane was provided by Hajiaghayi et al. [116], who considered points on a grid-graph to show a lower bound of Ω((log n / log log n)^{1/6}). Hajiaghayi et al. adapt the method of Bertsimas and Grigni and place points along a line. To overcome not knowing the ordering in advance they employ the probabilistic method and show the expected backtracks along a random line are logarithmic. However the increased complexity of bounding the expectation in this setting results in losing some tightness in the lower bound. Hajiaghayi et al. further conjecture their method could be improved to provide a tight lower bound of Ω(log n). The UTSP has also been defined on a finite metric space with m elements, instead of Euclidean space. In this setting the UTSP problem is to find a master tour of the m elements, which induces a total order. Competitive ratios in this model are generally described in terms of m, the size of the metric space, rather than n, the number of points in the tour, which can be significant if n ≪ m, as in the case of the Euclidean plane. The first upper bound in the finite metric space setting is due to Jia et al. [122], who showed a competitive ratio of O(log^4 m / log log m).
An elegant construction of Schalekamp and Shmoys [168] shows the UTSP has O(1) competitive ratio on tree metrics and therefore, using Bartal’s tree embedding [29, 95], an expected O(log m) performance on finite metric spaces. The current best known deterministic upper bound in this setting is O(log^2 m), which follows from the well-padded tree cover of Gupta et al. [114]. For lower bounds in finite metric spaces Gorodezky et al. [106] construct a set with competitive ratio Ω(log m) by taking points from a random walk on a Ramanujan graph. Bhalgat et al. [38] show the same bound with a simpler construction using two random walks; their construction has some nice additional properties. A recent result of Christodoulou and Sgouritsa [73] provides an ordering of the m × m grid-graph with competitive ratio O(log m / log log m), disproving a conjecture of Bertsimas and Grigni [37]. Their result is interesting in the context of the

UTSP on finite metric spaces, but as their bound depends on m it cannot be extended to the Euclidean plane. The generalized Lebesgue orderings used by Christodoulou and Sgouritsa are a grid-specific version of a hierarchical ordering, as discussed in Chapter 3.
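The space-filling-curve heuristic at the heart of this line of work can be sketched in a few lines. The sketch below (ours, for illustration) uses a Hilbert curve rather than the Sierpinski curve of Platzman and Bartholdi; the index function is the standard rotate-and-reflect bit construction, and the grid resolution `order` is an illustrative parameter:

```python
def hilbert_index(n, x, y):
    """Index of cell (x, y) along the Hilbert curve filling an n x n
    grid (n a power of two); standard rotate-and-reflect construction."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:           # rotate/reflect the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def universal_tour(points, order=10):
    """Universal ordering heuristic: visit points in Hilbert-curve order.
    Points are assumed to lie in the unit square."""
    side = 1 << order
    def key(i):
        x, y = points[i]
        return hilbert_index(side, min(int(x * side), side - 1),
                             min(int(y * side), side - 1))
    return sorted(range(len(points)), key=key)
```

Any subset of the input is then served in the induced order, which is what makes the ordering "universal".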

Figure 2.2: Four examples of range searching problems: (top left) orthogonal, (top right) simplex, (bottom left) balls and (bottom right) semi-algebraic. In each case the orange query selects the same set of points.

2.2 Range Searching

In Chapter 4 we heavily employ the technique of range searching on semi-algebraic sets, introduced by Agarwal and Matoušek in 1994 [14]. Semi-algebraic range searching will be described in more complete technical detail in that chapter. But let us here introduce the history and background of range searching in general, and semi-algebraic range searching in particular.

The range searching problem is to preprocess a set of points, such that given a geometric query set known as a range, the points intersecting the query may be efficiently counted or reported. Range searching data structures are a core building block in computational geometry with broad applications, and can be seen as a geometric generalization of binary search trees.

Different types of ranges require different data structures; possible ranges include axis-aligned boxes, balls, half-spaces or more complicated algebraic sets. Data structures exist for the problem of reporting the points intersecting a range, reporting the number of points intersecting a range, reporting the total weight of points intersecting a range, or checking if the range intersects any points (range emptiness). We will include results of all varieties in this section. It should be noted that any data structure which reports the points intersecting a range cannot do so faster than the number of points it is reporting. Hence it is common to see running times of the type O(log n + k), where k is the number of points reported. In discussing the results in this section we omit the k and consider only the time to run the query. For a more comprehensive treatment of this vast topic the reader is directed to the surveys by Matoušek [148], by Agarwal and Erickson [9] and by Agarwal [5], which contain most of the important results in this area.

2.2.1 Orthogonal Range Searching A very natural variant of range searching is orthogonal range searching, where the query ranges are axis-aligned boxes. Questions of this kind arise very easily in practice, especially in databases. For example a bank may be interested in accounts containing between $10,000 and $50,000, opened between 01/01/2020 and 01/07/2020, where the account holder is between 30 and 50 years old. Each of these constraints asks if a value is within an interval, and so together they form a three-dimensional box in the value-date-age parameter space. Notice there is nothing especially geometric about this question. While bank accounts can certainly be modelled as points in R^3 and the query ranges are geometric objects, there is no interaction between the different dimensions, and no particular notion of distance, angle or shape. This problem has more in common with binary searching on a sorted list than it has with geometric problems. Nonetheless because of its connection to more geometric kinds of range searching, and its general usefulness, it still deserves a mention here. The most popular technique for orthogonal range searching is the range tree, published by Bentley in 1980 [34]. A d-dimensional range tree stores a set of points as the leaves of a balanced binary search tree sorted by the first coordinate, such that each non-leaf vertex is associated with the set of points corresponding to the leaves below that vertex in the tree. At each vertex a (d−1)-dimensional range tree is built recursively on the points associated with that vertex, now sorted by the second coordinate. Range trees require O(n log^{d−1} n) space and construction time, and answer queries in O(log^d n) time. Using Chazelle and Guibas’ fractional cascading [67, 68] to reuse some repeated computation the query time may be reduced to O(log^{d−1} n) [82, 188].
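The 1-dimensional base level of a range tree is simply binary search on a sorted array: after an O(n log n) sort, two binary searches answer an interval-counting query in O(log n) time. A minimal sketch (ours, for illustration):

```python
import bisect

def make_1d_range_counter(values):
    """Base level of a range tree: sort once, then count the values in
    a query interval [lo, hi] with two binary searches."""
    xs = sorted(values)
    def count(lo, hi):
        return bisect.bisect_right(xs, hi) - bisect.bisect_left(xs, lo)
    return count

count = make_1d_range_counter([5, 1, 9, 3, 7])
print(count(2, 8))  # 3  (counts 3, 5 and 7)
```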
Range trees are a simple instance of a multi-level data structure, a topic we will also survey in this chapter because of its importance in Chapter 4. Another option for orthogonal range searching are k-d trees, also proposed by Bentley [33]. In k-d trees, a median point in the first coordinate is chosen as the root of the tree, the points on each side of the median form the left and right children of the root, with a median now chosen in the second coordinate. The points are recursively split according to a median point, cycling through which dimension to split on, until each point has been chosen as a median. A k-d tree can be constructed in O(n log n) time and uses only O(n) space. The reduced space comes at a cost however, and query times are O(n^{1−1/d}). After the initial surge of interest following Bentley’s papers in the formative years of computational geometry, research on orthogonal range searching has from a theoretical perspective largely slowed down. There is still some active research, largely focused on improving logarithmic factors, alternative memory models and extensions to the problem [62, 63].
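A toy 2-dimensional k-d tree along these lines can be sketched as follows (ours, for illustration; no rebalancing or deletions, and the naive sort at every level gives O(n log^2 n) construction rather than the optimal bound):

```python
def build_kdtree(points, depth=0):
    """Recursively split on the median point, cycling coordinates."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid],
            "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def range_query(node, lo, hi, out):
    """Report the points p with lo[i] <= p[i] <= hi[i] in both coordinates."""
    if node is None:
        return
    p, a = node["point"], node["axis"]
    if all(lo[i] <= p[i] <= hi[i] for i in range(2)):
        out.append(p)
    if lo[a] <= p[a]:          # query box reaches the left half
        range_query(node["left"], lo, hi, out)
    if p[a] <= hi[a]:          # query box reaches the right half
        range_query(node["right"], lo, hi, out)

tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
found = []
range_query(tree, (3, 1), (8, 5), found)
print(sorted(found))  # [(5, 4), (7, 2), (8, 1)]
```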

2.2.2 Simplex Range Searching

In computational geometry a simplex in R^d is typically defined as the intersection of d+1 half-spaces. Unlike the definition common in other fields (e.g. in algebraic topology) this allows simplices which are unbounded, and most notably includes half-spaces in the definition. Simplex range searching is the problem of processing and storing a set of points such that, given a query simplex, the points intersecting the query can be efficiently counted or reported. Unlike orthogonal range searching, which is primarily concerned with order, simplex range searching is a genuinely geometric problem, and has been the subject of extensive research in computational geometry. The standard approaches involve building a tree by recursively dividing the space into regions, such that the boundary of a query does not intersect too many regions. There is a natural trade-off between low space and low query time data structures, and a lower bound showing this trade-off cannot be avoided. The best low space solution to simplex range searching uses the partition trees of Willard [187]. Matoušek uses these trees [145], later improved by the same author [147], to build a simplex range searching data structure with O(n) storage and O(n^{1+ε}) preprocessing time, which answers queries in O(n^{1−1/d}) time. A more recent paper by Chan [61] simplifies this result and reduces the preprocessing time to O(n log n). Chan’s method is randomized, so there are situations in which the earlier result may be preferable. If O(n^d) space is allowed then Matoušek [147] achieves O(log^{d+1} n) query time with O(n^d log n) preprocessing. Chazelle et al. [70] use O(n^{d+ε}) space but achieve O(log n) query time. These methods may be combined in arbitrary proportions, so that using m space (for m ∈ [n, n^d]) allows queries in O((n / m^{1/d}) log^{d+1}(m/n)) time [147].
Chazelle provides a lower bound of Ω(n / (m^{1/d} log n)) on any query on a data structure using m space [64]; this bound also holds with high probability when the points are randomly sampled. Chazelle’s lower bounds hold in the arithmetic model, which only counts the number of additions required to compute the range count, and thus under-estimates the total time needed. A significant special case of simplex range searching is half-space emptiness which, given a query half-space H, asks if the intersection of H and P is empty. In this setting Matoušek [146] gives a linear size data structure which answers queries in O(n^{1−1/⌊d/2⌋} polylog n); for even dimensions the polylog term can be removed [5].

2.2.3 Semi-algebraic Range Searching The ranges considered so far have all been polygonal, and it makes intuitive sense that ranges with straight edges would be easier to deal with. However there are many situations (including ours in Chapter 4) which require curved ranges. A common general model of such ranges is semi-algebraic sets, or Tarski cells. These are sets defined by constantly many unions, intersections and complements of polynomial inequalities of at most constant degree. Semi-algebraic range searching was first studied by Yao and Yao [194], who introduced linearization, a method by which a semi-algebraic query can be transformed into a half-space query, at the cost of increasing the dimension of the ambient space. This method is very general, but potentially increases the dimension so drastically that range searching methods become ineffective. Agarwal and Matoušek [14] instead view the problem as point location in an arrangement of algebraic surfaces, and use linear space to give O(n^{1−1/(2d−3)+γ}) query time. This problem has been revisited many times, notably by the original authors and Sharir [15]. The current best query time using linear space is O(n^{1−1/d} polylog n).
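Linearization can be illustrated with the classical lifting map: a disk range in the plane becomes a half-space range one dimension up, so a disk query on P can be answered as a half-space query on the lifted points. A sketch (function names are ours):

```python
def lift(p):
    """Lifting map R^2 -> R^3: (x, y) -> (x, y, x^2 + y^2)."""
    x, y = p
    return (x, y, x * x + y * y)

def disk_as_halfspace(center, r):
    """The disk (x-a)^2 + (y-b)^2 <= r^2 becomes the half-space
    -2a*x - 2b*y + z <= r^2 - a^2 - b^2 in the lifted space."""
    a, b = center
    return (-2 * a, -2 * b, 1), r * r - a * a - b * b  # (normal, offset)

def in_disk(p, center, r):
    return (p[0] - center[0]) ** 2 + (p[1] - center[1]) ** 2 <= r * r

def in_halfspace(q, normal, offset):
    return sum(n * c for n, c in zip(normal, q)) <= offset
```

Expanding the disk inequality shows the two predicates agree on every point, which is exactly the linearization step.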

2.2.4 Multi-level Data Structures The methods so far described (broadly) involve decomposing the point set into regions, and recursively decomposing each region. Then each internal node of the tree represents a subset of vertices, and the points selected by some query are the union of O(log n) subsets. It is possible to build another data structure for each internal node on the points associated with that node. Then queries can be chained together (e.g. “count all points inside this ball which are also inside this simplex”) and take time log n times the maximum of the two query times. This is known as a multi-level data structure. While this idea has been used implicitly many times it was first explicitly described by Dobkin and Edelsbrunner [85], and has been used extensively since then. This is exactly the method by which range trees can chain together 1-dimensional queries, and simplex range searching can be performed by repeated half-space queries.

2.3 Coresets and ε-nets

We will now briefly discuss the notion of coresets, which provide an efficient way of approximating many geometric questions on large point sets. The basic concept of this section is to replace a large set of points with a smaller set that has similar properties. This is a particularly attractive technique for uncertain point sets, where the set of possible realizations of the points has exponential size; if a small number of realizations could be used instead the problem often becomes tractable using traditional techniques. Such arguments have been used by Huang and Li [120], Huang et al. [121], Abdullah et al. [1] and others. The simplest version of this idea is ε-samples, also sometimes called ε-approximations. An ε-sample can be thought of as a smaller point set that approximates the range counting problem described in the previous section. Given a set of points P and a set of ranges R, Q ⊂ P is an ε-sample if for all r ∈ R:

| |Q ∩ r| / |Q| − |P ∩ r| / |P| | ≤ ε

An ε-sample can often be constructed if the ranges in question are not too complicated. The VC-dimension, named for Vapnik and Chervonenkis, is a measure of the complexity or expressiveness of a set of ranges. It has broad applications, particularly in machine learning, which are too numerous to include here. Vapnik and Chervonenkis [179] prove that ε-samples can be constructed by taking a random subset of size (c/ε^2)(d log(1/ε) + log(1/δ)), where d is the VC-dimension of R, δ is an acceptable probability of failure, and c is a constant. Anthony and Bartlett [23] give a slightly improved bound. A weaker, but often more useful notion is the ε-net, which approximates emptiness. Given P and R as before, Q ⊂ P is an ε-net if, for all r ∈ R, |r ∩ P| ≥ ε|P| implies that r ∩ Q ≠ ∅. That is, all ranges with many points stay non-empty. Haussler and Welzl [118] show an ε-net can be constructed with probability 1 − δ by randomly sampling max((4/ε) log(2/δ), (8d/ε) log(8d/ε)) points from P, where d is again the VC-dimension. The idea of spending near-linear time to reduce a large set of points to a smaller set which approximates some question within ε is called constructing a coreset, and has been applied to a wide variety of problems. More sophisticated problems than those discussed above require more sophisticated coresets. The ε-kernel of Agarwal et al. [11], with improvements by Chan [60], approximates the directional width of a set of points and hence approximates a range of geometric problems. It is often considered the definitive coreset. The results in this section are treated extensively in the book by Har-Peled [117], and surveys by Phillips [159] and Agarwal et al. [12].
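The random-sampling construction can be sketched for interval ranges on the line (VC-dimension 2). The sketch below (ours) uses an illustrative sample size rather than the theorem's exact bound, and checks the ε-sample property empirically on a few ranges:

```python
import random

def eps_sample(points, size, seed=0):
    """A uniform random subset; with high probability an eps-sample for
    ranges of bounded VC-dimension, with eps roughly 1/sqrt(size)."""
    rng = random.Random(seed)
    return rng.sample(points, size)

def interval_fraction(points, lo, hi):
    """Fraction of the points falling in the range [lo, hi]."""
    return sum(lo <= p <= hi for p in points) / len(points)

P = [i / 1000 for i in range(1000)]      # 1000 points on a segment
Q = eps_sample(P, 200)
err = max(abs(interval_fraction(P, lo, hi) - interval_fraction(Q, lo, hi))
          for lo, hi in [(0.0, 0.1), (0.2, 0.7), (0.4, 0.6), (0.0, 1.0)])
```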

Figure 2.3: The visibility polygon of a point in a simple polygon

2.4 Visibility

Visibility is a concept which is central to computational geometry, both practical and theoretical, and has featured extensively throughout every era of computational geometry research. The simplest visibility problem is this: given two entities (represented by points) in some environment containing a number of obstacles, is the straight line between the points intersected by any obstacle? If it is not, then we may say the two entities are mutually visible. Visibility problems have been studied in computational geometry [150, 161, 186] and in adjacent fields such as computer graphics [90], GIScience [100], and robotics [150]. Core visibility problems in computational geometry include ray shooting [13, 66, 81], guarding [75, 99, 166] and visibility graph recognition [58]. A visibility problem well-known outside computational geometry is the celebrated art gallery theorem of Chvátal [75], which proves that ⌊n/3⌋ guards are always sufficient and sometimes necessary to be able to see the entire interior of a polygonal art gallery. Visibility is closely related to the problem of shortest paths, since two objects are mutually visible if and only if the shortest obstacle-avoiding path between them is a straight line. Of particular importance to our work in Chapter 4 is a result of Guibas and Hershberger [112] which shows how a simple polygon may be preprocessed such that shortest paths between two query points can be found in O(log n) time. We discuss this result further in that chapter. The visibility polygon of some point p is the star polygon consisting of every point visible from p. Using similar techniques to Guibas and Hershberger, Aronov et al. [25] give a data structure which returns the visibility polygon of a query point, and a kinetic version which can be updated as the point moves. This work will also be considered in more detail in Chapter 4.

Comprehensive treatments of visibility in computational geometry include a book on art gallery theorems by O’Rourke [155] (especially chapter 8), a book on visibility in the plane by Ghosh [105] and survey papers by Durand [90] and Ghosh and Goswami [104].

Figure 2.4: Three trajectories, and a hotspot where they spend a lot of time

2.5 Trajectories

A recent and increasing trend in computational geometry is an interest in not just spatial data, but spatio-temporal data. In this model we are concerned not with points in space, but with entities which move through space over time. These movements are typically represented as a sequence of time-stamped points in R^d, referred to as trajectories. The advent of the Internet-of-Things, and the proliferation of GPS-enabled devices, has created a vast quantity of trajectory data, and thus demand for efficient algorithms and a solid theory of trajectories. Applications are found in GIScience and databases, and in fields further afield such as ecology and meteorology. Common sources of trajectory data include the movements of animals [45, 55, 115], hurricanes [173], traffic [140] and more [88]. Common trajectory problems in computational geometry relate to classifying trajectories, or extracting features from a set of trajectories. Clustering under various distance metrics, finding flocks of entities moving together [22, 32, 136], detecting frequently visited locations known as hotspots (see Figure 2.4), and inferring road maps based on traffic data (known as map construction) are all areas of ongoing research [47, 101, 102, 137, 181]. Particular mention should be made of the (discrete) Fréchet distance, often casually called the “dog walker’s distance”, which is the most commonly used similarity measure between pairs of trajectories. The Fréchet distance’s combination of usefulness and nice mathematical properties has made it the subject of a vast body of research. It is not possible to do this wide topic full justice here. As a starting point the reader is directed to the recent survey by Gudmundsson et al. [109].
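The discrete Fréchet distance admits a short O(nm) dynamic program over prefixes of the two curves; a sketch (ours, for illustration):

```python
import math

def discrete_frechet(p, q):
    """Discrete Frechet distance between polygonal curves p and q, via
    the standard O(n*m) dynamic program: dp[i][j] is the best achievable
    maximum leash length over couplings of the prefixes p[:i+1], q[:j+1]."""
    n, m = len(p), len(q)
    dp = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            c = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                dp[i][j] = c
            elif i == 0:
                dp[i][j] = max(dp[i][j - 1], c)
            elif j == 0:
                dp[i][j] = max(dp[i - 1][j], c)
            else:
                dp[i][j] = max(min(dp[i - 1][j], dp[i][j - 1],
                                   dp[i - 1][j - 1]), c)
    return dp[-1][-1]

# Two parallel "dog walking" paths one unit apart:
print(discrete_frechet([(0, 0), (1, 0), (2, 0)],
                       [(0, 1), (1, 1), (2, 1)]))  # 1.0
```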

2.6 Trajectories and Visibility

Despite the importance of both visibility and trajectories to computational geometry there does not exist much research on the combination of these two topics. In Chapter 4 we make some progress towards developing methods for computing visibility between two trajectories. In this section we will survey the existing research in this area. One reason for the limited current understanding of trajectory visibility is that the already developed tools for visibility and trajectory analysis cannot be combined in a straightforward manner. Consider two trajectories q and r within a simple polygon P: existing visibility tools allow us to easily check if there are subtrajectories of q and r which are mutually visible. However, the two moving entities see each other only if there is a time at which the two entities are simultaneously within two mutually visible subtrajectories. There could be quadratically many pairs of subtrajectories which are mutually visible; however, it could be that the two entities are never simultaneously within such a pair. To determine visibility between moving entities, one needs to incorporate the concept of time into pre-existing tools for visibility. The majority of existing research on trajectory visibility concerns kinetic data structures. A data structure representing the visibility polygon of a point p is created, and updated as p moves through the environment. Bern et al. [35], Mulmuley [151] and Aronov et al. [25] have produced research along this line. Results on maintaining the shortest path between two moving entities, such as by Diez et al. [84], can be used to track visibility between two moving entities. A core feature of these kinetic methods is that they are fundamentally event-based: the time taken to maintain the data structure as a point moves depends on the number of events that occur along the way. (Typically an event is a change in the combinatorial structure, or arrangement, of the points and their environment.)
This approach is beneficial if the number of events is small, but in some cases the number of events may be proportional to the complexity of the environment. In this case very little is gained by employing such a data structure, since computing the polygon from scratch at the new location may be quicker and easier. Considerable research exists on the related problem of trajectory planning under visibility constraints, see for example Shkurti and Dudek [171]. In their problem two collaborative robots must move through a terrain while maximizing the time they are mutually visible, subject to some constraints on their movement. This problem is closely related to pursuit-evasion games for robots, for example see Bhattacharya and Hutchinson [39].
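The naive baseline that the data structures of Chapter 4 improve on can be sketched directly: sample time densely and test the sight segment against every obstacle edge. This is ours for illustration only; it costs O(steps · |obstacles|) per trajectory pair and dense sampling carries no guarantee of catching every visibility interval:

```python
def orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a)."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def segments_cross(p1, p2, q1, q2):
    """True if the two segments properly cross (endpoints excluded)."""
    return (orient(p1, p2, q1) * orient(p1, p2, q2) < 0 and
            orient(q1, q2, p1) * orient(q1, q2, p2) < 0)

def ever_visible(p0, p1, q0, q1, obstacles, steps=1000):
    """Brute-force baseline: two entities move linearly from p0 to p1
    and q0 to q1 over the time interval [0, 1]; sample time densely and
    test the sight line against every obstacle segment."""
    lerp = lambda a, b, t: (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
    for k in range(steps + 1):
        t = k / steps
        p, q = lerp(p0, p1, t), lerp(q0, q1, t)
        if not any(segments_cross(p, q, u, v) for u, v in obstacles):
            return True
    return False
```

For example, with a short wall on the y-axis an entity sweeping past it can see a stationary partner at the start of the motion, while a long wall blocks the pair throughout.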

Figure 2.5: A geometric spanner of a set of points in the plane

2.7 Spanners

A key concept for our work in Chapter 5 is the geometric t-spanner. We will give a brief background of this rich area of research here. A geometric spanner, or t-spanner, of a set of points P in some metric space is a graph G(P,E) with edge weights equal to the metric distance, such that for each pair of points u and v the length of the shortest (u, v) path in G, denoted d_G(u, v), is no more than t times the metric distance between u and v, d(u, v). We call t the spanning ratio of G. Since the complete graph on P is trivially a 1-spanner we introduce some restrictions on E, typically that |E| = O(|P|), where the constant factor hidden by big O notation might depend on t. Research on spanners in computational geometry extends back to the second SoCG in 1986, where Chew [71] shows that the Delaunay triangulation in the L1 metric of a set of points in R^2 gives a planar graph with O(n) edges and spanning ratio √10 ≈ 3.16. Chew [72] similarly shows that the triangular-distance Delaunay triangulation is a 2-spanner.

Dobkin et al. [86] show that the Euclidean Delaunay triangulation is also a spanner, with a ratio no more than (1 + √5)π/2 ≈ 5.08. This ratio has been the source of substantial research, with Keil and Gutwin [132] reducing the upper bound to 2.42 and Xia [189] reducing it further to 1.998. Lower bounds include 1.5846 by Bose et al. [42] and 1.5932 by Xia and Zhang [190]. The greedy spanner algorithm incrementally builds a spanner by considering each pair of vertices in ascending order of distance and adding an edge between any pair (u, v) if there does not already exist a spanning path between u and v. The greedy spanner was introduced by Althöfer et al. [21] with an acknowledgement that it had been independently described by Bern in a private letter. It is known that greedy spanners have O(n) edges, O(1) maximum degree and total weight proportional to the Minimum Spanning Tree, all of which are asymptotically optimal. The current most efficient algorithm, due to Bose et al. [41], constructs the greedy spanner in O(n^2) space and O(n^2 log n) time, which is within a log factor of optimal. Alewijnse et al. [20] give an alternative construction in only O(n) space, and O(n^2 log^2 n) time. Another common construction is the so-called WSPD-spanner. A well-separated pair decomposition (WSPD) of a set of points P is a set of pairs (A_i, B_i), where A_i, B_i ⊂ P, A_i and B_i are far apart relative to their diameter, and each pair (p, q) ∈ P × P appears in exactly one pair (A_i, B_i). The WSPD was introduced by Callahan and Kosaraju [57], presenting material from the PhD thesis of Callahan [56]. The same authors show that constructing a WSPD immediately gives a t-spanner with O(n) edges. The fourth major kind of spanner, and the most relevant to our work in Chapter 5, is the cone-based spanner. In this construction, for each point p, the plane is partitioned into m equal-angular cones with vertex p, and an edge is added between p and the “closest” point in each cone.
Variants exist depending on which notion of “closest” is used. Yao [193] defined Yao graphs as connecting each point to the nearest neighbour in each of its cones. Clarkson [76] and Keil [131] define the Θ-graph by projecting the points in each cone onto the bisector of the cone, and connecting to the closest according to that projection. Both forms are known to be spanners, and have been studied extensively. The Yao graph has the advantage of a more natural definition, while the Θ-graph can compute the point projections once for each direction and reuse the sorted lists of points, allowing for faster computations. The study of spanners is too broad to satisfactorily survey here. See the book by Narasimhan and Smid [153] and surveys by Bose and Smid [44], Eppstein [93], Gudmundsson and Knauer [108], and Smid [172].
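A minimal Yao-graph construction can be sketched as follows (ours, for illustration; brute-force O(kn²) rather than the optimized constructions discussed above):

```python
import math

def yao_graph(points, k=8):
    """Directed Yao graph: from each point, an edge to the nearest other
    point in each of k equal-angle cones (brute force, O(k n^2))."""
    edges = set()
    for i, p in enumerate(points):
        nearest = [None] * k              # best (distance, index) per cone
        for j, q in enumerate(points):
            if i == j:
                continue
            angle = math.atan2(q[1] - p[1], q[0] - p[0]) % (2 * math.pi)
            cone = int(angle / (2 * math.pi / k)) % k
            d = math.dist(p, q)
            if nearest[cone] is None or d < nearest[cone][0]:
                nearest[cone] = (d, j)
        for best in nearest:
            if best is not None:
                edges.add((i, best[1]))
    return edges
```

Since each point adds at most one edge per cone, the graph has at most kn directed edges.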

2.8 Uncertainty

As discussed in the introduction, uncertainty is a relatively new topic in computational geometry. In this section we will make a survey of research in this area. Most problems involving uncertain points are adaptations of classical problems defined on deterministic points; we have organized the results in this section according to the deterministic problem being addressed. This field of research is still developing, and overarching methods, concepts and theorems are still being developed. As a consequence most papers solve one problem in one model, rather than advance a unified theory. This section is therefore similarly a catalogue of relevant results, rather than a cohesive historical narrative. The earliest treatment of uncertainty in computational geometry is the ε-geometry of Salesin et al. [167], which was motivated by the problem of dealing with limited precision arithmetic. This was followed by problems on imprecise points, which are constrained to appear somewhere in a geometric region. We will survey some results from this model in this section. A broad treatment of this topic can be found in the thesis of Löffler [141]. More recently there has been considerable interest in existentially uncertain points, which appear or do not appear according to a probability [98, 135, 182], and locationally uncertain points, whose position is drawn from a probability distribution [53, 142, 175]. Both kinds are surveyed extensively in the following. Naturally, uncertainty has also been studied in other research communities. A notable example is optimization, where stochastic optimization dates back at least as far as Dantzig [80]. Topics such as robust optimization are also actively researched, see the survey by Bertsimas et al. [36]. Optimization problems are also studied on random graphs, for example by Bansal et al. [27].
However, geometric problems typically require specialized algorithms, and so we will focus our attention on the computational geometry community.

2.8.1 Clustering

The three clustering problems of most interest in computational geometry are the k-means, k-medians and k-centers problems. In each case the input consists of a set of points, and a set of k cluster centers must be chosen minimizing, respectively: the sum of the squared distances, the sum of the distances, and the maximum distance between each point and its closest cluster center. The three problems have been studied extensively from both theoretical and practical perspectives. Because of clustering's importance in data mining and databases, the problem of clustering uncertain or unreliable data has been studied in those communities; for examples see the surveys by Aggarwal [16], and by Aggarwal and Yu [17].

The first research in computational geometry on uncertain clustering is by Cormode and McGregor [78], who use a combination model, where each point can appear in one of constantly many locations, or not appear at all. In this model they seek k-means, k-medians and k-centers solutions which minimize the expected cost. By linearity of expectation, the linear costs of k-means and k-medians can be handled similarly to the deterministic case. Cormode and McGregor are able to approximate k-centers within a factor of (1 + ε) using O(kε⁻¹ log² n) cluster centers, or within a constant factor using 2k cluster centers. Guha and Munagala [111] show how to construct a constant-factor approximation to k-centers without using more than k cluster centers. Munteanu et al. [152] restrict to the 1-center problem (also called the smallest enclosing ball) and provide a (1 + ε)-approximation. Wang and Zhang consider the k-center problem in some restricted settings and solve the 1-center problem exactly on a tree [184] and the k-center problem exactly in 1 dimension [183]. They also compute exactly the minimum number of centers needed to ensure a given k-centers cost, where the input is on a tree [185]. Huang and Li [120] provide a (1 + ε)-approximation to the k-center problem in full generality, in both the existential and locational models. Their method involves repeatedly reducing the size of the problem using specialized coresets, to the point where it can be brute-forced in polynomial time. While their algorithm is difficult to implement and has large constants, it remains the best theoretical result for this problem. Löffler and Phillips [142], also with Jørgensen [123], describe the related problem of shape fitting on locationally uncertain points. They demonstrate how to approximate the distribution of geometrically simple summary statistics of an uncertain point set, such as the perimeter of the smallest axis-aligned bounding box, or the radius of the smallest enclosing disk.
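The expected-cost objective itself is easy to sanity-check by Monte Carlo simulation. The sketch below estimates the expected k-centers cost of a fixed set of centers in the existential model; it is an illustration of the objective only, not the coreset machinery of Huang and Li:

```python
import math
import random

def expected_kcenter_cost(points, probs, centers, trials=2000, seed=0):
    """Monte Carlo estimate of the expected k-centers cost of a fixed
    set of centers, where point i appears independently with
    probability probs[i] (the existential model).  Realizations with
    no surviving points contribute zero cost."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [pt for pt, p in zip(points, probs) if rng.random() < p]
        if sample:
            # k-centers cost: the worst point's distance to its closest center
            total += max(min(math.dist(pt, c) for c in centers) for pt in sample)
    return total / trials
```

With all probabilities equal to 1 this reduces to the deterministic k-centers cost of the chosen centers.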

2.8.2 Minimum Spanning Tree

Kamousi et al. [126] give a fully polynomial randomized approximation scheme (FPRAS) for the expected cost of the minimum spanning tree under the existential model and show that computing it exactly in both models is #P-hard. They also give a constant-factor approximation for the locational model. Huang and Li [119] use an alternative technique to give an FPRAS for the minimum spanning tree cost under the locational model.
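For intuition, the expected MST cost in the existential model can be computed exactly by enumerating all 2ⁿ realizations, which is feasible only for tiny inputs; the exponential blow-up is consistent with the #P-hardness of the exact problem. A minimal sketch using Prim's algorithm on each realization:

```python
import math

def mst_cost(pts):
    """Prim's algorithm on the complete Euclidean graph over pts."""
    n = len(pts)
    in_tree = [False] * n
    in_tree[0] = True
    best = [math.dist(pts[0], p) for p in pts]  # cheapest edge into the tree
    cost = 0.0
    for _ in range(n - 1):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        cost += best[u]
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(pts[u], pts[v]))
    return cost

def expected_mst_cost(points, probs):
    """Exact expectation over all 2**n realizations (tiny n only):
    point i appears independently with probability probs[i]."""
    n = len(points)
    total = 0.0
    for mask in range(1 << n):
        pr = 1.0
        sub = []
        for i in range(n):
            if mask >> i & 1:
                pr *= probs[i]
                sub.append(points[i])
            else:
                pr *= 1 - probs[i]
        if len(sub) >= 2:
            total += pr * mst_cost(sub)
    return total
```

An FPRAS replaces this enumeration by careful sampling with a relative-error guarantee.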

2.8.3 Nearest Neighbours and Voronoi Diagrams

Given a set of points P in R^d, we may partition R^d into regions such that all the points in each region have the same nearest point in P; this partition is known as a Voronoi diagram. Voronoi diagrams are of interest in computational geometry because they represent a data structure suitable for nearest neighbour queries.

The most likely Voronoi diagram, defined by Suri and Verbeek [174] on a set of existentially uncertain points, partitions the plane into regions whose most likely nearest neighbour is the same. They show that even in 1 dimension the most likely Voronoi diagram can have Ω(n^2) complexity, but that this can be reduced under some smoothing assumptions. A follow-up paper, also with Kumar and Raichel [134], extends this lower bound to Ω(n^{2d}), which again can be improved if the input is random.

Agarwal et al. [8] consider points whose location is represented by a probability density function and study the problem of finding the point with the smallest expected distance to a query point, which they call the expected nearest neighbour. They study several variants of the problem, including different metrics, approximate or exact answers, and computing the kth expected nearest neighbour. In each case a near-linear size data structure is provided which allows for polylogarithmic query times. A later paper by Agarwal et al. [6] extends substantially on this topic and adds algorithms for finding all potential nearest neighbours, and the probability that a given point is the nearest neighbour. On a set of existentially uncertain points, Kamousi et al. [127] give a linear-space data structure with O(log n) query time that gives the expected distance of a query point to its (1 + ε)-approximate nearest neighbour, when the dimension is treated as a constant.
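In the existential model, the probability that each point is the nearest neighbour of a fixed query has a simple closed form under independence: point i is the nearest neighbour exactly when it appears and every strictly closer point does not. A sketch of this calculation (an illustration of the quantity, not the data structures cited above):

```python
import math

def nn_probabilities(points, probs, q):
    """probs[i] is the (independent) probability that point i exists.
    Returns, for each i, the probability that point i is the nearest
    existing neighbour of q.  Assumes distances to q are distinct."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], q))
    out = [0.0] * len(points)
    none_closer = 1.0  # probability that no strictly closer point appeared
    for i in order:
        out[i] = probs[i] * none_closer
        none_closer *= 1 - probs[i]
    return out
```

Note the probabilities sum to less than 1 in general, the deficit being the probability that no point appears at all.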

2.8.4 Closest Pair

Kamousi et al. [127] show that computing the probability that the closest pair are within a given distance is #P-hard in the existential model, even in R^2, or with the L∞ norm. Huang and Li [119] give an FPRAS for the expected distance between the closest pair in both models, and for the expected distance between the k-closest pair in the existential model. The expected distance between the k-closest pair in the locational model remains an open problem. They also give an FPRAS for computing the probability that the closest pair are within a given distance, and an inapproximability result for the probability that the closest pair are further apart than a given distance. These results can be used to give similar results for the diameter.

2.8.5 Convex Hull

Suri et al. [175] provide a dynamic programming method to compute the most likely convex hull of a set of uncertain points (that is, the convex polytope with the highest probability of being the convex hull) under both the locational and existential models. In R^2 they provide an O(n^3) exact algorithm for the existential case. For d ≥ 3, and for the locational model for d ≥ 2, they prove the problem is NP-hard.

Agarwal et al. [10] introduce an alternative formulation, where given a query point q ∈ R^2 the goal is to compute the probability that q lies in the convex hull of P. They study the problem under both uncertainty models and provide a variety of approaches, including partitioning R^d into convex regions such that each point in a region has the same probability of being contained in the convex hull, and a Monte Carlo based approach. Fink et al. [98] show that the probability of a query point lying in the convex hull under the existential model is equivalent to the (d − 1)-dimensional hyperplane separability problem, for which they have an exact algorithm. This is an improvement on [10] in the existential case. In the existential model, Xue et al. [191] give algorithms for the expected diameter, width and complexity of the convex hull. In the imprecise model, Löffler and Van Kreveld [143] give polynomial time algorithms or hardness results for finding the maximum and minimum possible convex hulls, when points are constrained to appear in squares, circles or on line segments.

Given a set of existentially uncertain points P and a query set of deterministic points Q, Kumar and Suri [135] study the problems of containment and evasion, that is, computing the probability that each point of Q lies in the convex hull of P and the probability that no point of Q lies in the convex hull of P. They compute the probabilities exactly in R^2 and show the problems are #P-hard in higher dimensions.

2.8.6 Distance and Shape

Löffler and Van Kreveld [144] study the largest and smallest bounding box and diameter of a set of imprecise points constrained to balls. Keikha et al. [130] do the same for imprecise points constrained to intervals. Knauer et al. [133] consider upper and lower bounds on the Hausdorff distance between two sets of imprecise points constrained to balls. They give algorithms, hardness results and approximation algorithms in various settings.

2.8.7 Fréchet Distance

Despite the Fréchet distance's wide interest and importance in the computational geometry community, there is a surprising dearth of research on the Fréchet distance under uncertainty. Buchin and Sijben [53] compute the discrete Fréchet distance between two polygonal curves under the locational model. They employ a dynamic programming approach to compute the coupling with the highest probability of realizing a given Fréchet distance in O(n^4) time. A recent publication by Buchin et al. [51] gives some algorithms and hardness results on the Fréchet distance between uncertain trajectories, as does the Masters thesis of Popov [162].

In the imprecise points model, Ahn et al. [19] study the Fréchet distances achievable when each vertex is constrained to lie inside some ball. They give an exact algorithm for the minimum Fréchet distance, and an approximation for the maximum. Fan and Zhu [96] show computing the maximum exactly is NP-hard.

2.8.8 Hyperplane Separability

Given two uncertain point sets P and Q in the existential model, Fink et al. [98] provide an exact O(n^d) algorithm to compute the probability that P and Q are linearly separable.

2.8.9 Range Queries

Given a set P of uncertain points in R under the continuous-location model, Agarwal et al. [7] provide a data structure to quickly query all the points which lie in a given interval with probability at least a given value. This model is extended to the multi-dimensional case by Tao et al. [176], who allow both rectangular and ball-shaped queries. Abdullah et al. [1] instead construct coresets for three range-counting problems: the expected number of points in a range, the number of points in a range with probability at least a given threshold, and the probability that fewer than a given number of points are in a given range.

2.8.10 ε-kernels

Huang et al. [121] show how to construct a set of deterministic points of size O(ε^{-(d-1)/2}) which approximates the expected width of a set of existentially or locationally uncertain points in any direction. They show how to construct an ε-kernel which is a subset of the original uncertain points in a restricted setting, and that this is not always possible in general. Huang et al. also show how to construct a small set of uncertain points which approximates the probability distribution of the width of a set of existentially or locationally uncertain points in any direction.

2.8.11 Skylines

As we have seen for several other problems, interest in uncertain skylines began with largely empirical research in the databases community. The first definitions of skylines on uncertain data are from Pei et al. [158]. Given a set of locationally uncertain points, Afshani et al. [2] provide an exact algorithm for determining the set of points likely to appear on the skyline, and a data structure to approximate the probability that a given point appears on the skyline.

In the existential model, Agrawal et al. [18] compute the set of points most likely to be the skyline in R^2 and show that even approximating the problem is NP-hard in higher dimensions.

2.8.12 Spanners Constructing graphs on locationally uncertain points which are very likely to be spanners is the topic of Chapter 5. Here we will survey the current literature on this topic.

Vondrák [182] shows how to construct a sparse subgraph G′ of a general (not necessarily embedded in a metric space) graph G, such that when some vertices or edges are randomly removed from G′ it still contains an approximate shortest path between each pair of vertices with high probability. This can be viewed as constructing a graph with a high probability of being a spanner in the existential model. Since Vondrák's model is general graphs, no geometry is used in this result.

A k-fault-tolerant spanner is defined by Levcopoulos et al. [138] as a graph which remains a spanner even if k vertices or edges are removed. Fault-tolerant spanners have been investigated in a wide range of settings. Bose et al. [43] extend this idea and define an f(k)-robust spanner as a graph such that, if any k vertices are removed, the remainder is still a spanner on all but at most f(k) of its vertices. Bose et al. [40] and Buchin et al. [48] provide sparse constructions with close to O(n log n) edges.

Poureidi and Farshi [163] consider the case when each point adversarially appears within some ball. They show that if the balls are all disjoint then a linear size WSPD can be constructed, and hence a linear size spanner.

2.8.13 Visibility

Visibility between uncertain points is a topic with very little existing research. To our knowledge the only paper to deal with this topic directly is Buchin et al. [49], who compute the probability that two uncertain points can see each other, where the points' locations are given by two continuous probability distributions. The key technique of this result is breaking the probability distributions into polygonal pieces between which the visibility question can be answered.

Our own results in Chapter 4, while deterministic, are intended to serve as a subroutine that could be used to approximate the probability of visibility between uncertain trajectories, as a natural extension of Buchin et al.

Otherwise this area appears ripe for further research. Different models of uncertainty, and uncertainty in the obstacles, provide worthwhile questions.

2.8.14 Dealing with Uncertainty

Van der Hoog et al. [178] give a measure of the total uncertainty of a collection of imprecise points, which they call ambiguity, and show how the running times of some simple algorithms depend on the ambiguity. A result by Evans et al. [94] considers the possible locations of entities which move along trajectories which are initially unknown but may be queried at a cost. Evans et al. show how to make queries such that the ply, the number of entities which could be in the same location, can be minimized. This result can be viewed as a preprocessing step, showing how a budget can be spent reducing the uncertainty of a problem before another algorithm is used to solve the problem. This topic is discussed further in Chapter 6, where we introduce another problem in a similar setting to Evans et al.

Chapter 3

The Universal Traveling Salesman

When faced with uncertainty in the input to an algorithm, a natural (and very conservative) goal is to compute an output which is a good solution for any realization of the input values. In reference to one solution solving many problems, this is known as a universal solution.

Solutions to many problems are described in terms of the input (e.g. a subset or an ordering of the input). Therefore it may not be possible to specify a solution for a collection of instances which directly makes sense for each instance. In this case we must instead talk about instance solutions induced by the universal solution. "Induced" here means "easily computed from" in some context-specific way.

Let us pause here to illustrate with a simple example. Suppose one was given a collection of points P ⊂ R^2 on which to build a minimum spanning tree, motivated perhaps by connecting a communications network. Suppose also that not every point needs to be connected to the network, but that exactly which points need to be connected is not known ahead of time. To simplify the presentation, let us say there is a point which will always be included and will be considered the root of the tree. A universal solution to the problem could then be specified as a tree T over all the possible points P. For a given instance P ⊆ P, a solution could be induced by taking an edge between each point present in P and its closest ancestor in T which is also in P. Since we require that the root is always present, this method will always transform T into a tree over the points of P. See Figure 3.1 for an illustration. It would be unrealistic to imagine that the induced tree will be the minimum spanning tree over every possible P, as in general one could not expect a single


Figure 3.1: (left) The minimum spanning tree on a set of points, and (right) the tree induced on a subset of points. The induced tree is close to an MST itself.

solution to exactly solve an entire class of non-trivial inputs. However, if it is possible to prove a satisfactory approximation ratio on the performance of the induced solution against the optimal solution on each instance, then the universal solution may be fairly described as solving the uncertain problem. Regardless of what input is sampled from the class, the induced solution will be approximately optimal.

Jia et al. [122] give a more formal treatment by imagining the input to a universal problem as a set of partial information. A universal solution is then a solution which is high quality on every possible completion of the partial information. They introduce some notation to better describe the metaproblem. Let Π denote some optimization problem, insts(Π) denote all instances of that problem, sols(I) denote all feasible solutions of I ∈ insts(Π), and cost(S) be the cost of some solution S ∈ sols(I).

They also require the notion of the sub-instance relation ⪯_Π, which forms a partial order on insts(Π), and the restriction function R, which takes as input I1, I2 ∈ insts(Π) such that I2 ⪯_Π I1, and S1 ∈ sols(I1), and outputs S2 ∈ sols(I2) such that S2 is the solution induced on I2 by S1, for some problem-specific meaning of induced. Then the performance of a solution S for an instance I1 and its sub-instances can be neatly defined.

max_{I2 ⪯_Π I1}  cost(R(I1, I2, S)) / min_{S′ ∈ sols(I2)} cost(S′)

This formula is nothing more than the maximum ratio of the cost of an induced solution on an instance to the cost of the optimal solution on that instance. Of course this framework can be made trivial by taking each collection of sub-instances to be very small or singletons, or by having R ignore the universal solution and compute an instance solution directly. The interesting case occurs when the sub-instances are as general as possible and R is as simple a function as possible.

In this chapter we describe the Universal Traveling Salesman Problem, which can be viewed in this framework, and discuss the upper and lower bounds on computing universal solutions to the traveling salesman problem. The main body of this chapter describes a new lower bound on the approximation ratio of a popular class of universal solutions to the traveling salesman problem, which we call hierarchical solutions.
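To make the notion of a restriction function concrete, here is a minimal sketch of R for the earlier minimum spanning tree example: each present point walks up the universal tree until it reaches another present point. The parent-array representation of T is an assumption made for illustration:

```python
def induce_tree(parent, present, root):
    """Induce a tree on a subset of the points.  parent[v] is v's
    parent in the universal tree T (with parent[root] == root), and
    `present` is the set of point indices in the instance, which is
    assumed to contain root.  Each present vertex is joined to its
    closest present ancestor."""
    edges = []
    for v in present:
        if v == root:
            continue
        u = parent[v]
        while u not in present:  # climb until a present ancestor is found
            u = parent[u]
        edges.append((v, u))
    return edges
```

The root requirement guarantees the climb always terminates, so the result is a tree on exactly the present points.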

3.1 Background

The literature on the universal traveling salesman problem is reviewed in Chapter 2. Here we give a quick reminder of the key concepts in this topic. The traveling salesman problem (TSP) is one of the most studied problems in theoretical computer science. For a given set of locations and their pairwise distances, the TSP is to find the shortest tour visiting all locations. In the case when the locations are points in R^2 with their Euclidean distances, this problem is known as the Planar TSP.

A space-filling curve is a surjective, continuous function φ : [0, 1] → [0, 1]^2. Space-filling curves were first discussed by Peano in 1890 [157] and have been an important mathematical concept since. Any space-filling curve φ naturally induces a total order ⪯_φ of [0, 1]^2 by letting a ⪯_φ b whenever min φ^{-1}(a) ≤ min φ^{-1}(b). Note the minimum in the formula is necessary, as there does not exist an injective space-filling curve.

The universal traveling salesman problem (UTSP) is to define a total order over a metric space, such that an approximate traveling salesman tour on any finite subset can be found by visiting each point of the subset in the induced order. Such a total order is known as a UTSP tour. A formal definition is given in the following section. For the UTSP in the plane the total order is generally chosen according to some space-filling curve, such as the Hilbert curve (for an example see Figure 3.2). The primary advantage of the UTSP as a TSP heuristic is its simplicity and speed: once a total order has been chosen, employing it only requires sorting.

The performance of a UTSP tour on a given set of points is given by the ratio between the length of the tour produced by the UTSP and the length of the optimal TSP solution on those points. The competitive ratio of a UTSP tour is the worst-case ratio on any set of size n. It is convenient to consider points in the unit square [0, 1]^2 rather than points in the entire plane R^2, a convention we will follow. The unit square can be scaled up as necessary to cover general point sets. It is known that there exist orderings of the unit square such that the competitive ratio is bounded by O(log n), where n is the size of the set of points we are to visit with the tour [160]. The known orderings with this property are all of the type we call hierarchical.
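Employing a space-filling-curve order really is just sorting. The sketch below computes each point's index along a discretized Hilbert curve using the standard rotate-and-flip construction, then sorts by it; the grid resolution `order` is an assumption of the discretization, not part of the formal definition:

```python
def hilbert_index(order, x, y):
    """Index of grid cell (x, y) along the Hilbert curve filling a
    2**order by 2**order grid (standard rotate-and-flip construction)."""
    d = 0
    s = 1 << (order - 1)
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/reflect the quadrant so curve pieces connect
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s >>= 1
    return d

def utsp_tour(points, order=10):
    """Visit points of the unit square in Hilbert-curve order."""
    side = 1 << order
    def key(p):
        cx = min(int(p[0] * side), side - 1)
        cy = min(int(p[1] * side), side - 1)
        return hilbert_index(order, cx, cy)
    return sorted(points, key=key)
```

Different reflections of the Hilbert curve permute the quadrant order; any of them yields a valid hierarchical ordering.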
At a very high level, a hierarchical ordering divides the plane into "fat" convex regions of equal measure, orders the regions to provide a partial ordering on the points, and then recurses on each piece until the ordering is total. We show that for any such hierarchical ordering and all large n, there exists a set of points of size n such that the competitive ratio is Ω(log n). Our approach employs the probabilistic method. We draw a random line through

Figure 3.2: The UTSP tour induced by the Hilbert space-filling curve.

the unit square and show there is a high probability that some points spaced (approximately) evenly along the line will be ordered by the UTSP substantially differently to the order they appear along the line. Hence the UTSP tour will backtrack over itself many times, while the TSP tour will not.

While our restriction to hierarchical orderings initially seems severe, it covers all the orderings typically used in the UTSP and similar applications. Non-hierarchical orderings (such as the lexicographical ordering) have very poor locality, making them unsuitable for the UTSP. This is discussed further in the conclusion.

3.1.1 Previous Lower Bounds

Bertsimas and Grigni [37] provide the first Ω(log n) lower bound on the UTSP for several specific space-filling curves. Their lower bound comes from treating the curves as recursive partitions and placing points along a carefully chosen line such that the induced tour "zig-zags" inefficiently over a cell in the partition. This method is then recursively applied to argue a logarithmic total "zig-zag" cost. All subsequent work on UTSP lower bounds, including our own, has employed a similar intuition about backtracking along a line.

Hajiaghayi et al. [116] provide the first general-purpose lower bound following the same basic structure. They restrict the points to a grid to simplify the argument, and choose the line used by Bertsimas and Grigni uniformly at random. This randomness removes the problem of finding the perfect line for every possible space-filling curve. They continue by employing the probabilistic method of Erdős: they compute the expected cost of backtracking along the line, and argue that there exists a line with cost at least the expected cost of their random line. Unfortunately the simplifications required by Hajiaghayi et al.'s proof result in a loss of tightness, and their lower bound of Ω((log n / log log n)^{1/6}) leaves a gap to the best known upper bound.

Figure 3.3: The lines used to construct counter-examples for the Hilbert and Sierpiński curves; note the self-similarity of their intersections with the subregions.

We provide an alternative method to that of Hajiaghayi et al. We remove the reliance on grid points, which Christodoulou and Sgouritsa [73] found are unlikely to be useful in proving an Ω(log n) lower bound. We restrict ourselves to hierarchical orders, and prove an Ω(log n) lower bound in that case.

3.2 Preliminaries and Notation

Given ⪯, a total ordering of [0, 1]^2, and S = {s_1, . . . , s_n} ⊂ [0, 1]^2 labeled such that s_1 ⪯ · · · ⪯ s_n, we define the performance of ⪯ on S and the optimal TSP value for S to be:

utsp_⪯(S) = Σ_{i=1}^n d(s_i, s_{i+1}),

tsp(S) = min_{π ∈ S_n} Σ_{i=1}^n d(s_{π(i)}, s_{π(i+1)}),

where s_{n+1} = s_1, and S_n is the symmetric group of order n. Namely, utsp_⪯(S) is the cost of visiting S in the order dictated by ⪯, while tsp(S) is the optimal cost of visiting S. The competitive ratio of ⪯, as a function of the number of points we are to visit, is

ρ(n) = sup_{S ⊂ [0,1]^2, |S| ≤ n}  utsp_⪯(S) / tsp(S).
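Both quantities are easy to evaluate for small point sets, which is useful when experimenting with candidate orderings: the universal cost is a sort followed by a sum, and the optimal cost can be brute-forced over permutations. A sketch, where the lexicographic key is just one example of a total order:

```python
import math
from itertools import permutations

def utsp_cost(S, key):
    """Cost of the closed tour visiting S in the order induced by
    `key`, a total order on the unit square."""
    t = sorted(S, key=key)
    return sum(math.dist(t[i], t[(i + 1) % len(t)]) for i in range(len(t)))

def tsp_cost(S):
    """Optimal closed-tour cost by brute force -- tiny |S| only."""
    return min(
        sum(math.dist(p[i], p[(i + 1) % len(p)]) for i in range(len(p)))
        for p in map(list, permutations(S)))
```

For the four corners of the unit square, the lexicographic order costs 2 + 2√2 against an optimal 4, a first hint that poor locality is punished.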

Throughout this chapter we will be working with decompositions into nicely behaved pieces. To quantify niceness we adapt the concept of α-fatness commonly used in computational geometry (for example by [92]).

Definition 1. Let R be a convex region in R^2, r_in be the radius of the largest ball contained in R and r_out be the radius of the smallest ball containing R. We say R is α-fat if r_in / r_out ≥ α.

Our result concerns the performance of a certain class of total orderings that we call hierarchical. These are orderings that can be constructed by partitioning the unit square into k nice regions, recursively ordering each region and then concatenating these partial orderings to get a total ordering of the whole unit square. Such a construction leads to a natural hierarchical decomposition of the unit square: level 0 is made up of a single region, the whole unit square; whereas in general level i in the decomposition is made up of k^i regions. Let us denote the regions in level i by Q_i. Figure 3.2 shows the order of the third-level regions of the decomposition associated with Hilbert's space-filling curve.

Definition 2. An ordering ⪯ is hierarchical if, for some fixed constants k and α, it can be constructed by recursively partitioning into regions such that for all regions R ∈ Q_j at level j we have:

1. R is convex,
2. area(R) = 1/k^j, and
3. R is α-fat.

It is worth noting that while seemingly restrictive at first sight, most classical orderings of the unit square, such as the Hilbert, Peano, Sierpiński and Lebesgue space-filling curves, fall within this framework. Our main result is a lower bound argument that shows that for any ordering constructed in this fashion there is a family of point sets for which the approximation ratio is at least logarithmic in the number of points. Throughout the chapter we will make use of some properties of the perimeter of the regions in the decomposition that stem from our convexity and α-fatness assumptions. Our first observation is that the perimeter is a polygon.

Observation 1. For any Q ∈ Q_i, the perimeter of Q is a polygon made up of straight edges, one for each region neighboring Q.

Proof. Let T be a region neighboring Q. Since both T and Q are convex, it follows that T and Q meet along a straight line segment.

Our second observation roughly states that up to α factors, the perimeter of such a region is similar to the perimeter of a ball with equal measure.

Observation 2. For any region Q ∈ Q_i, its perimeter P must be proportional to k^{-i/2}; more precisely,

2α √π k^{-i/2} ≤ P ≤ 2α^{-1} √π k^{-i/2}. (3.1)

Proof. Let r_in be the radius of the largest ball enclosed in Q and r_out be the radius of the smallest ball enclosing Q. By α-fatness we have r_in ≥ α r_out. The inner ball must have area no greater than that of Q, hence π r_in^2 ≤ k^{-i} and so r_in ≤ (π k^i)^{-1/2}. Similarly the outer ball must have area no less than that of Q, and so r_out ≥ (π k^i)^{-1/2}.

The circumference of the inner ball is 2π r_in ≥ 2π (α r_out) ≥ α (2√π k^{-i/2}). The circumference of the outer ball is 2π r_out ≤ 2π (α^{-1} r_in) ≤ α^{-1} (2√π k^{-i/2}). The perimeter of Q is, by convexity, bounded between the circumferences of the inner and outer balls, and hence must be between 2α √π k^{-i/2} and 2α^{-1} √π k^{-i/2}.
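As a concrete check, the bounds of Observation 2 can be verified numerically for the square decomposition underlying the Hilbert curve, where k = 4, a level-i cell is a square of side 2^{-i} with perimeter 4 · 2^{-i}, and a square is α-fat with α = r_in/r_out = 1/√2:

```python
import math

# Observation 2 for the Hilbert decomposition: k = 4, alpha = 1/sqrt(2).
k = 4
alpha = 1 / math.sqrt(2)
for i in range(8):
    side = k ** (-i / 2)          # equals 2**-i
    perimeter = 4 * side
    lower = 2 * alpha * math.sqrt(math.pi) * k ** (-i / 2)
    upper = (2 / alpha) * math.sqrt(math.pi) * k ** (-i / 2)
    # roughly 2.507 * 2**-i  <=  4 * 2**-i  <=  5.013 * 2**-i
    assert lower <= perimeter <= upper
```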

We say Q ∈ Q_i is an inner region if it does not touch the boundary of its parent; namely, if all its neighbors are sibling regions in the decomposition. The following lemma shows that for large enough k, every region has a child that is an inner region.

Lemma 1. If k > 16/α^4 then every region T in the decomposition has at least one child region Q that is an inner region.

Figure 3.4: All the non-inner children of T must fit inside a strip (denoted in gray) of width (2/α) (π k^{j+1})^{-1/2} around the border of T, which is not large enough to accommodate all the children of T. Hence T has an inner child.

Proof. Let T ∈ Q_j and R = {R ∈ Q_{j+1} | R is a child of T and R touches the boundary of T }. Recall that each R ∈ R has area(R) = 1/k^{j+1} and, because the region is fat, there exists a ball of area 1/(α^2 k^{j+1}) enclosing R. Therefore, as exemplified in Figure 3.4, there must be a strip of width (2/α) (π k^{j+1})^{-1/2} around the boundary of T that contains all the regions in R.

Now by Observation 2, T's perimeter cannot be larger than 2√π α^{-1} k^{-j/2}. Hence all of R must be contained in a strip of area less than (4/α^2) · 1/(k^j √k). It follows that |R| ≤ 4√k/α^2, and thus if k > 16/α^4 then |R| < k, and so there must exist at least one child of T that does not touch its boundary.
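The final counting step is elementary and easy to verify numerically: a strip of area (4/α²)/(k^j √k) holds at most 4√k/α² cells of area 1/k^{j+1}, and 4√k/α² < k exactly when k > 16/α⁴:

```python
import math

# Check the threshold in Lemma 1: once k exceeds 16/alpha**4, the number
# of boundary-touching children (at most 4*sqrt(k)/alpha**2) is strictly
# less than the total number of children k, so an inner child exists.
for alpha in (0.3, 0.5, 1 / math.sqrt(2), 1.0):
    k = int(16 / alpha ** 4) + 1   # smallest integer with k > 16/alpha**4
    assert 4 * math.sqrt(k) / alpha ** 2 < k
    # at k = 16/alpha**4 exactly, the two sides coincide
    assert math.isclose(4 * math.sqrt(16 / alpha ** 4) / alpha ** 2,
                        16 / alpha ** 4)
```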

Notice that the requirement that k is large is not a restrictive one. Given a decomposition with k ≤ 16/α^4 we can instead take a decomposition which uses a single level for multiple levels of our original decomposition. For example, a decomposition with k = 4 could be turned into one with k = 256 by merging four levels, which would accommodate any α greater than 0.5. Since k and α are constants, this is always achievable by merging a constant number of levels, so it does not impact our argument.

Our lower bound construction makes use of some properties of random lines intersecting the unit square. Let us denote by L the set of all lines which intersect the unit square. As Figure 3.5 shows, a line ℓ ∈ L is uniquely determined by the angle it makes with respect to the x-axis, and its x-intercept or similarly its y-intercept. We define a procedure Λ for sampling a line ℓ ∈ L as follows: pick uniformly at random β ∈ [0, π), consider the β-angled projection of the unit square on the x and y axes and pick an intercept uniformly at random within the shorter projection of the two; in other words, if β ∈ [π/4, 3π/4] then we pick an intercept along the x-axis, and if β ∉ [π/4, 3π/4] then we pick an intercept along the y-axis. Note that even though the probability distribution over lines is not uniform, the probability densities of any two lines in L differ by at most a factor of 2, since the length of the smaller projection is always in the range [1, 2].
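The sampling procedure Λ is straightforward to implement. For a line at angle β, the x-intercepts of lines hitting the square form an interval of length 1 + |cot β| (between 1 and 2 when β ∈ [π/4, 3π/4]), and symmetrically for y-intercepts. A sketch under those assumptions:

```python
import math
import random

def sample_line(rng):
    """Sample a line intersecting the unit square by procedure Lambda:
    a uniform angle beta in [0, pi), then a uniform intercept within
    the beta-angle projection of the square on the chosen axis."""
    beta = rng.uniform(0.0, math.pi)
    if math.pi / 4 <= beta <= 3 * math.pi / 4:
        c = math.cos(beta) / math.sin(beta)          # cot(beta), |c| <= 1
        b = rng.uniform(-max(0.0, c), 1.0 - min(0.0, c))
        return beta, 'x', b                          # x-intercept b
    t = math.sin(beta) / math.cos(beta)              # tan(beta), |t| <= 1
    b = rng.uniform(-max(0.0, t), 1.0 - min(0.0, t))
    return beta, 'y', b                              # y-intercept b
```

The intercept interval is chosen so that every sampled line actually crosses the square, and its length 1 + |cot β| (or 1 + |tan β|) is what bounds the density ratio by 2.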


Figure 3.5: The space L of lines intersecting the unit square is sampled by uniformly picking an angle β ∈ [0, π) and an x-axis (or y-axis) intercept b in the β-angle projection of the square onto the x-axis (y-axis).

3.3 Logarithmic Lower Bound

In this section we prove that any hierarchical ordering as in Definition 2 must have an asymptotic approximation ratio that grows logarithmically with the cardinality of the set we are to visit.

Theorem 1. For any hierarchical ordering ⪯ of [0, 1]^2, we have

ρ(n) = Ω(log n)

The key ingredient in our proof of this theorem is the existence of a probability distribution over families of subsets S_1, S_2, . . . ⊂ [0, 1]^2, with |S_i| ≤ k^i, such that:

1. tsp(Si) = O(1), and

2. E[utsp_⪯(S_i)] = Ω(i).

Defining this distribution is exceedingly simple: pick a random line ℓ according to the sampling procedure Λ, and construct S_i by picking one representative point on ℓ from each region in Q_i that ℓ intersects; that is, one point from each region in the ith level of the hierarchical decomposition associated with ⪯. Our goal for the rest of the section is to prove the two properties stated above.

Lemma 2. For any ≺ and any line ℓ ∈ L, E[tsp(Si) | ℓ] = O(1).

Proof. The cost of the optimal travelling salesman tour of any number of points in one dimension is upper bounded by twice the distance between the smallest and largest points, since there is a tour that visits each point in ascending order, then returns to the start. Since the points of Si lie on the intersection of ℓ and [0, 1]², tsp(Si) ≤ 2√2.

In order to prove the second property we need to first establish a few key facts and definitions about the random process for constructing the Si sets. Let A, B, C ∈ Qj be regions at level j ≤ i of the decomposition. We say these regions form an out-of-order triplet with respect to a line ℓ ∈ L if:

1. A, B, and C have the same parent region in the decomposition,

2. ℓ cuts through A, B, and C in that order (or the reverse order), and

3. either B ≺ A, C or A, C ≺ B.

If B is the central part of some out-of-order triplet (A, B, C), we define its share χ(B) to be ‖ℓ ∩ B‖; otherwise, χ(B) = 0. Figure 3.6 provides a pictorial representation of the share of a region. Our first observation is that these shares provide a lower bound on the cost of the universal tour on Si.

Figure 3.6: An out-of-order triplet (A, B, C) induces a share χ(B), shown by the thick line segment.

Lemma 3. For any given ℓ ∈ L and its associated set Si we have

utsp(Si) ≥ Σ_{j≤i} Σ_{Q∈Qj} χ(Q).

Proof. Let (A, B, C) be an out-of-order triplet at level j ≤ i whose parent is T. Assume without loss of generality that B ≺ A, C (the case A, C ≺ B is symmetric). Thus there must exist a ∈ Si ∩ A, b ∈ Si ∩ B, and c ∈ Si ∩ C such that b ≺ a, c. This in turn means that we must have two points u, v ∈ Si ∩ (T \ B) that are consecutive in ≺ such that the segment uv cuts through B. We charge χ(B) to the segment uv of the utsp solution. Note that the segment uv can only be charged by other children of T, which by definition are all disjoint, so the total charge to the segment cannot exceed ‖uv‖. Thus the lemma follows.

We will show that, in expectation, ℓ cuts through a substantial number of out-of-order triplets, each with a substantial share. But first, we need to introduce a few concepts. Let Q be a region at level j that does not touch the boundary of its parent region; by Lemma 1 we can assume without loss of generality that there are at least k^{j−1} such regions. Recall that by Observation 1, Q can only meet its neighbours along straight line segments. Call these the edges of Q. Note that Q can have at most k − 1 edges, since Q shares edges only with its siblings. Let P be the perimeter of Q. Call an edge short if its length is less than (α/10k)P and long otherwise. It follows that the total length of short edges is at most (α/10)P. Call the angle¹ between a pair of edges wide if it is greater than π − δ, where δ = α/5, and narrow otherwise; recall that since α ≤ 1, this means that δ ≤ 1/5. We will now introduce the concept of a meta-edge of Q. Let e be an edge of Q. The meta-edge ê induced by e consists of all the edges that form a wide angle with e, together with e itself. Note that the edges in ê form a contiguous section of Q's boundary about e.

¹The angle between a pair of edges is the angle at which the lines going through them meet.

Figure 3.7: A meta-edge with height H, width W and angle θ, and L the length of the solid chain.

Define W(ê), the width of ê, to be the distance between the first and last endpoints in ê. Define H(ê), the height of ê, to be the maximum distance between a point on any edge in ê and the line connecting the first and last endpoints of ê. Define θ(ê), the angle of ê, to be the angle between the first and last edges in ê. Finally, define L(ê), the length of ê, to be the sum of the lengths of the edges in ê. Figure 3.7 exemplifies these concepts. Observe that because all edges that make up ê form a wide angle with e, it must be the case that θ ≥ π − 2δ. Note that for a meta-edge with a given length, its height is maximized and its width minimized when it forms an isosceles triangle. Therefore,

H ≤ (L/2)·sin((π − θ)/2) ≤ (L/2)·sin δ ≤ Lδ/2,    (3.2)

W ≥ L·sin(θ/2) ≥ L·sin(π/2 − δ) ≥ L·sin(π/6) ≥ L/2.    (3.3)
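Inequalities (3.2) and (3.3) are easy to sanity-check numerically for the extremal isosceles configuration. The following snippet is our own check, not part of the thesis; it sweeps θ over [π − 2δ, π] for a sample value of δ ≤ 1/5.

```python
import math

def check_meta_edge_bounds(delta=0.2, L=1.0, steps=1000):
    """For an isosceles meta-edge of length L with angle θ ≥ π − 2δ,
    verify H ≤ Lδ/2 (eq. 3.2) and W ≥ L/2 (eq. 3.3)."""
    for i in range(steps):
        theta = (math.pi - 2 * delta) + 2 * delta * i / (steps - 1)
        H = (L / 2) * math.sin((math.pi - theta) / 2)  # height of the triangle
        W = L * math.sin(theta / 2)                    # width of the triangle
        assert H <= L * delta / 2 + 1e-12
        assert W >= L / 2
    return True
```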

Call a set of three edges of Q a fat triad for Q if each edge is long and the angle between any pair of the edges is narrow. First we will show that every inner region admits a fat triad, and then we will show that this leads to a large share value.

Lemma 4. Every inner region Q ∈ Qi has a fat triad.

Proof. Let us attempt to greedily build a fat triad by iteratively selecting arbitrary long edges that do not belong to the meta-edge of previously selected edges. Let e1, . . . , ep be the edges selected in this fashion.

Since the angle between any two edges in e1, . . . , ep must be narrow, any three edges form a fat triad. Assume then, for the sake of contradiction, that we select fewer than three edges. First, consider the case when we only select one edge. This could only happen if all edges outside of ê1 are short. Let P be the perimeter of Q and S be the total length of its short edges; recall that S ≤ (α/10)P. Note that since the endpoints of ê1 are connected with short edges, the width of ê1 is at most S; that is, W(ê1) ≤ (α/10)P. On the other hand, by (3.3), we have W(ê1) ≥ L(ê1)/2 ≥ (P − S)/2 ≥ P(1 − α/10)/2. This leads us to a contradiction, so we can rule out the case where we only pick one edge.

Figure 3.8: At least one of the three dashed lines must form an out-of-order triplet.

The second case to consider is what happens when we pick only two edges e1 and e2. Again, this means that edges outside the meta-edges ê1 and ê2 must be short. Let P be the perimeter of Q, and S be the total length of its short edges. It is not hard to see that the entirety of Q must lie in a strip whose width is no greater than H(ê1) + H(ê2) + S. Therefore, by (3.2) and the fact that S ≤ (α/10)P, the width of the strip is no greater than P(δ/2 + α/10).

Now let r_in be the radius of the largest ball enclosed in Q and r_out be the radius of the smallest ball enclosing Q. We know that 2·r_in ≤ P(δ/2 + α/10), since the strip contains the inner ball. Also, we know that 2π·r_out ≥ P, since the outer ball contains Q. Finally, we know that r_in ≥ α·r_out by α-fatness. Recalling that δ = α/5, combining these inequalities leads to a contradiction, so we can rule out the case where we only pick two edges. It follows that the greedy procedure always picks at least three edges, and so Q always admits a fat triad.
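For completeness, the arithmetic behind this contradiction (our own expansion, using δ = α/5) can be written out as:

```latex
2 r_{\mathrm{in}} \;\le\; P\left(\frac{\delta}{2} + \frac{\alpha}{10}\right)
 \;=\; P\left(\frac{\alpha}{10} + \frac{\alpha}{10}\right)
 \;=\; \frac{\alpha P}{5},
\qquad\text{while}\qquad
2 r_{\mathrm{in}} \;\ge\; 2\alpha\, r_{\mathrm{out}}
 \;\ge\; \frac{\alpha P}{\pi}.
```

Together these give αP/π ≤ αP/5, i.e. π ≥ 5, which is false.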

Lemma 5. If Q ∈ Qj is an inner region, then

E[χ(Q)] = Ω(k^{−j})

Proof. Let P be the perimeter of Q. Since Q is an inner region, by Lemma 4, it must have three long edges e1, e2 and e3 such that the angle between any pair of them is narrow.

Call the neighbours of Q on the other side of these edges N1, N2 and N3 respectively. Since Q is an inner region, N1, N2, and N3 are siblings in the decomposition. Notice that two of these regions must come either before or after Q in the ≺ ordering; these regions, together with Q, form an out-of-order triplet. Therefore, as depicted in Figure 3.8, at least one of (N1, Q, N2), (N2, Q, N3) or (N3, Q, N1) must form an out-of-order triplet. Without loss of generality, assume (N1, Q, N2) is the out-of-order triplet.

Let v be the point where the linear extensions of e1 and e2 meet. Assume, without loss of generality, that e2 is closer to v than e1. Let us trim the edges to get ẽ1 and ẽ2 so that they are both exactly (α/20k)P long and the distance between v and the furthest point in ẽ2 is less than the distance between v and the closest point in ẽ1. Since e1 and e2 each have length at least (α/10k)P, we can take an interval of length (α/20k)P from the end of e2 closest to v and from the end of e1 furthest from v to achieve this. To lower bound E[χ(Q)] we will consider the contribution to the expectation only when ℓ, the random line that is used to construct Si, hits both ẽ1 and ẽ2. To that end, for any point p in ẽ1, our first goal is to lower bound E[χ(Q) | p ∈ ℓ]. Let ϕ be the angle of the smallest cone centered at p that contains ẽ2. Since any line that passes through p and lies inside this cone intersects ẽ2, the probability of ℓ hitting ẽ2 given that p ∈ ℓ is at least ϕ/π. Finally, let R be the distance between p and the closest point in ẽ2. Let us assume for now that ϕ ≤ π/4. Note that in this range tan ϕ ≤ 2ϕ, and therefore

E[χ(Q) | p ∈ ℓ] ≥ R·ϕ/π ≥ R·tan ϕ/(2π).    (3.4)

Figure 3.9: The point in ẽ2 that is closest to p happens to be the endpoint closer to v.

Consider now the case where the point in ẽ2 that is closest to p happens to be the endpoint closer to v. Let L be the length of ẽ2 and Z = R·tan ϕ. Figure 3.9 shows how these quantities are related. Now consider the following inequalities:

R·tan ϕ = Z ≥ L·sin((π − θ)/2) ≥ L·sin δ ≥ L·δ/2 = L·α/10,

and so E[χ(Q) | p ∈ ℓ] = Ω(L).

Consider the case where the point in ẽ2 that is closest to p is the endpoint furthest from v. Again, let L be the length of ẽ2 and Z = R·tan ϕ. Consider the triangle defined by p and the endpoints of ẽ2, and let β be the angle at the endpoint of ẽ2 closest to v. Figure 3.10 shows how these quantities are related. Now consider the following inequality:

R·tan ϕ = Z ≥ L·sin β,


Figure 3.10: (left) The point in ẽ2 that is closest to p happens to be the endpoint furthest from v. (right) Q must be entirely contained within two cones.

Now if β is at least a constant it follows that E[χ(Q) | p ∈ ℓ] = Ω(L). Let us verify that β is indeed a constant. Observe that by convexity, no part of Q can lie to the left of the line through p and v, or to the right of the line through e2 (following the orientation of Figure 3.10). Also, since the perimeter of Q is P, Q cannot extend more than P/2 in any direction. Hence all of Q must be contained within the union of two cones, each of radius P/2: one with vertex at the inner endpoint of e2 with angle β, and one with vertex at p with angle β − θ; since θ > 0 we upper bound this angle by β. See Figure 3.10 (right) for an illustration of the cones. The total area enclosed by both cones is at most (1/4)βP². By α-fatness and Observation 2, the area of Q is Θ(P²), so P² = O(βP²) and hence β = Ω(1) as required.

Consider the case where the point in ẽ2 that is closest to p is in the middle of the segment. In this case we can further trim ẽ2 to at most half its length so that we are back in one of the two cases considered. So we again have that E[χ(Q) | p ∈ ℓ] = Ω(L). Finally, we need to handle the case ϕ ≥ π/4. Note that we always have ϕ ≤ π/2, since p is always further away from v than any point in ẽ2. Split ẽ2 into two halves of equal length, and let ϕ1 and ϕ2 be the angles described by each half. The half with the smaller angle has min(ϕ1, ϕ2) ≤ (ϕ1 + ϕ2)/2 = ϕ/2 ≤ π/4. Thus, using the segment with the smaller angle we default to the case previously handled, only at the expense of a constant factor in the length of the segment. Therefore, in every case we have E[χ(Q) | p ∈ ℓ] = Ω(L).

Notice that the probability of hitting ẽ1 is Ω(L). Hence,

E[χ(Q)] ≥ E[χ(Q) | ℓ ∩ ẽ2 ≠ ∅] · Pr[ℓ ∩ ẽ2 ≠ ∅] = Ω(L²) = Ω(P²) = Ω(k^{−j}),

where the first equality follows from the fact that we have independently lower bounded each term by Ω(L), the second equality from the fact that L ≥ (α/40k)P, and the last equality from Observation 2.

We now have all the tools to show that the expected cost of the universal TSP tour on Si is Ω(i).

Lemma 6. For any ≺, E[utsp(Si)] = Ω(i).

Proof. By Lemma 3 we know that the sum of the χ-values is a lower bound:

E[utsp(Si)] ≥ Σ_{j≤i} Σ_{Q∈Qj} E[χ(Q)].

By Lemma 5, we know that every inner region Q ∈ Qj has E[χ(Q)] = Ω(k^{−j}). If k > 16/α^4 then, by Lemma 1, every region in the previous level of the decomposition has at least one child that is an inner region. Therefore, we must have at least k^{j−1} inner regions in Qj, so Σ_{Q∈Qj} E[χ(Q)] = Ω(1), and the lemma follows.

Everything is in place to give the proof of Theorem 1.

Proof of Theorem 1. Let Ŝi be a realization of the random variable Si such that utsp(Ŝi) ≥ E[utsp(Si)]. By Lemma 6, it follows that utsp(Ŝi) = Ω(i). On the other hand, by Lemma 2, we know that tsp(Ŝi) = O(1). Finally, the cardinality of Ŝi is trivially bounded by k^i, so

ρ(k^i) ≥ utsp(Ŝi)/tsp(Ŝi) = Ω(i).

Since this holds for all i, in general we have

ρ(n) = Ω(log n).

Figure 3.11: If ℓ intersects a region R more than once, it will cause a jump by itself from p to q.

3.4 Conclusion

We have demonstrated that hierarchical orderings of the plane cannot achieve a competitive ratio better than the O(log n) proved by Platzman and Bartholdi when they introduced the UTSP problem. Included in our definition of hierarchical are the requirements that the regions are of equal measure, are convex and are α-fat. While these assumptions initially appear restrictive, they cover all the orderings typically used for the UTSP. We conclude by examining each of these assumptions in more detail and discussing how they could potentially be relaxed.

3.4.1 Equal Measure

The requirement that the regions are of equal measure is used to simplify the presentation of our arguments and could be relaxed to require regions of measure within some constant factor of each other. The only change this would make to our proofs is to include the constant factor in various expressions, which would not change the asymptotic bound.

3.4.2 Convex

Similarly, the requirement that the regions are convex does not seem to be integral to our methods. Since each region is α-fat it must contain a disk which covers most of its area, and must fit within an outer disk. Hence it must consist of a fat, convex core (which can create the jumps required for the lower bound) and a non-convex border area. Intuitively, having a non-convex border area can only increase the cost, since a line can intersect a region multiple times. By placing a point on the line each time the line intersects a region, that region can create additional jumps with itself, as in Figure 3.11.

3.4.3 α-fatness

The requirement that the regions are α-fat, however, is integral to our conception of a hierarchical ordering and new ideas are needed to lift this requirement.

Chapter 4

Trajectory Visibility

In this chapter we consider the following question: two entities q and r follow two different trajectories, with (possibly different) constant speed. Their trajectories lie in an environment with obstacles that block visibility. Can the two entities, at any time, see each other? This question combines two key concepts from computational geometry, namely trajectories and visibility.


4.1 Background

This problem is motivated by requirements from computational ecology. Increased availability of small GPS and internet-connected devices has allowed ecology researchers to develop tracking collars that can be fitted to a wild animal so that its movements can be tracked over time. The resulting time-stamped locational data forms, in computational geometry terms, a trajectory. Inferences about the social behaviour of animal species may be drawn from studying collections of trajectory data. Two trajectories passing close together represent an encounter, trajectories moving together for an extended period represent a social group, etc. The motivation for this section comes from a problem defined by Buchin et al. [52]. They considered vervet monkeys moving through a jungle terrain and their changes in behaviour when encountering another social group. In their paper Buchin et al. considered two monkeys to have encountered each other when they came within 100 meters of each other. This distance is an estimate of the average distance at which the monkeys can see each other. However, if the monkeys are moving through an irregular terrain, with visibility-blocking obstacles, a radius-based test like this may not be appropriate. Instead we define an encounter as happening when the line of visibility between the two monkeys is unobstructed by any obstacles. A reasonable model for this problem treats the monkeys as piecewise linear trajectories, and the terrain as a set of visibility-obstructing polygonal obstacles. In Section 4.3 we address detecting visibility in this setting. Observe that this problem is not equivalent to computing visibility between two piecewise linear curves, as the two moving entities must be simultaneously present along a sight-line to see each other. See Figure 4.1.

Figure 4.1: (left) a time at which the blue and green entities are mutually visible (right) although the blue and green paths are mutually visible, there is never a time at which both entities are able to see each other.

There is a complicating factor. Due to an inherent trade-off between battery-life and weight, it is not feasible for the tracking collars to continually transmit data. Hence the time intervals between data-points can be relatively long and the assumption of linear interpolation between data points is unjustified. In this setting the movement of the animals is typically considered random, according to some probability distribution, conditioned on being in certain locations at certain times. A Brownian bridge is the most common model because of its relative simplicity and elegance; it is the model used by Buchin et al. [52], for example. The natural question then is "what is the probability that the entities have come within visibility?" This question is answered by Buchin et al. [49] for stationary entities whose location is given by a probability distribution. An extension of their work to compute the probability of visibility between two entities which move along uncertain trajectories would be a very appealing goal. However a direct extension does not seem to be possible, and there are obstacles to computing any analytical solution. We instead propose approximating the probability by sampling many possible pairs of trajectories from an appropriate distribution and computing visibility between each sampled pair. The ratio of visible pairs to total pairs sampled gives an approximation of the probability that the entities see each other. This technique involves making visibility queries between different trajectories repeatedly on the same set of obstacles. This raises a natural data structures problem: can we preprocess the set of obstacles in such a way that visibility queries between trajectories can be efficiently answered? The answer to that question, in various settings, comprises the majority of this chapter.
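The proposed sampling scheme is a plain Monte-Carlo estimator. The sketch below is ours, not thesis code; `sample_pair` and `ever_visible` are hypothetical stand-ins for a trajectory sampler (e.g. a discretized Brownian bridge) and for the visibility test developed in the rest of this chapter.

```python
import random

def estimate_visibility_probability(sample_pair, ever_visible, m=2000):
    """Monte-Carlo estimate of Pr[the two entities ever see each other]:
    draw m pairs of candidate trajectories and return the visible fraction."""
    hits = sum(1 for _ in range(m) if ever_visible(*sample_pair()))
    return hits / m

# Toy stand-ins (not the monkey data): both entities sit at a random
# abscissa x on opposite sides of a wall over x ∈ [0, 1] at y = 0, with a
# gap at [0.4, 0.6]; the vertical sight-line is free iff x falls in the gap.
rng = random.Random(42)

def sample_pair():
    x = rng.random()
    return (x, -1.0), (x, 1.0)

def ever_visible(q, r):
    return 0.4 <= q[0] <= 0.6
```

In this toy setting the true probability is 0.2, and the estimate concentrates around that value as m grows.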

Figure 4.2: A pair of paths through a polygonal domain and the points at which they are sampled. Linearly interpolating between samples is not appropriate for visibility questions in this setting.

4.2 Introduction

In this chapter, we study the following fundamental question, which we refer to as trajectory visibility testing. Given a simple polygon, or a polygonal domain, P , and the trajectories of two moving entities q and r, is there a time t at which q and r can see each other? We assume that q and r move linearly with constant (but possibly different) speeds between trajectory vertices, and cannot see through the edges of P . We distinguish several variants depending on whether P is a simple polygon or a polygonal domain, and whether the trajectories are allowed to intersect P (e.g. vehicles moving through fog, animals moving through foliage) or not (e.g. pedestrians moving among buildings, ships moving on water bodies). These variants are illustrated in Figure 4.3. We further consider the same variants in the simpler scenario in which one of the entities is a point (e.g. a stationary guard and a moving intruder).

Note that we are interested only in whether there exists a time at which the two entities see each other. (More complex questions may be considered; we start with the most basic version. Refer also to Section 4.9.) This implies we can temporally decompose the problem: the answer is no if and only if the answer is no between every pair of consecutive time stamps. When considering this question, two fundamentally different approaches come to mind. On the one hand, when the number of trajectory vertices τ is small compared to the number of polygon vertices n, the best approach may be to simply solve the problem for each time interval separately. On the other hand, when τ is large compared to n, it may be more efficient to spend some time on preprocessing P first, if this allows us to spend less time per trajectory edge. We therefore distinguish between the algorithmic question and the data structure question.

Our results are discussed below and summarized in Table 4.1. When one of the entities is stationary and the other moves we can obtain better results; these are summarized in Table 4.2. When both points are stationary, existing methods are sufficient to compute visibility; see Table 4.3.

Setting    | Algorithm   | Space          | Preprocessing  | Query
Simple     | Θ(n)        | O(n log^5 n)   | O(n log^5 n)   | O(n^{3/4} log^3 n)
Intersect  | O(n log n)  | O(n^{3k})      | O(n^{3k})      | O(log^k n)
Domain     | Θ(n log n)  | O(n^{3k})      | O(n^{3k})      | O(log^k n)

Table 4.1: Results when both entities move along a line segment. The first column specifies whether the setting is a simple polygon, a simple polygon where the entities may move through obstacles, or a polygonal domain where the entities may move through obstacles. These results are discussed in Section 4.6. The variable k in these results is a large constant.

Figure 4.3: Different variations of the problem: two trajectories inside a simple polygon (top left), two trajectories in a polygonal domain (top right), two trajectories intersecting a simple polygon (bottom left) and two trajectories intersecting a polygonal domain (bottom right).

4.2.1 Results

We focus on trajectories of at most two vertices; any set of trajectories of τ vertices can be handled by applying our algorithms or queries τ times. In Section 4.3, we discuss our algorithmic results; we build on the structural geometric properties established in this section in the remainder of the chapter. In Section 4.5 we consider the sub-problem of preprocessing a convex polygon P′ for intersection queries with quadratic curve segments. We then extend the solution in Section 4.6 to a data structure for visibility testing in a simple polygon P using multi-level data structures. In Section 4.7, we investigate the degree to which our solution from Section 4.6 can be extended to the case where P is a polygonal domain. The techniques are similar but the resulting data structure requires much more space. In Section 4.8, we study the more restricted problem in which one of the entities is stationary. We show that these problems can be solved with a different approach that is more efficient.

Setting    | Algorithm   | Space          | Preprocessing  | Query
Simple     | Θ(n)        | O(n)           | O(n log n)     | Θ(log n)
Intersect  | O(n log n)  | O(n^2 log^3 n) | O(n^2 log^3 n) | O(n^{3/4+ε})
Domain     | Θ(n log n)  | O(n^4 log^3 n) | O(n^4 log^3 n) | O(n^{3/4+ε})

Table 4.2: Results for the restricted case when one entity remains stationary. The first column specifies whether the setting is a simple polygon, a simple polygon where the entities may move through obstacles, or a polygonal domain where the entities may move through obstacles. These results are discussed in Section 4.8.

Setting    | Algorithm   | Space   | Preprocessing | Query     | Reference
Simple     | Θ(n)        | O(n)    | Θ(n)          | Θ(log n)  | [112]
Domain     | Θ(n)        | O(n^2)  | O(n^2)        | Θ(log n)  | [161]

Table 4.3: Results when neither entity moves can be obtained using the existing literature, in the case the points are inside a simple polygon or in a polygonal domain.

4.3 Algorithms for testing visibility

Let P be a polygonal domain with n edges and let entities q and r each move at constant speed along a line segment in the plane. We first present an O(n log n) time algorithm to solve the visibility problem for q, r and P, and we show this running time is tight in the worst case. We then show how to solve the visibility problem in linear time in the case where P is a simple polygon and q and r are contained in P. All other sections depend on the notion of an hourglass, which is presented here.

4.3.1 An O(n log n) time algorithm

The entities q and r each move along a line segment with constant speed (equal to the length of the line segment) during the time interval [0, 1]. With a slight abuse of notation, we use q and r to refer to both the moving entities and their trajectories. Consider the line g(t) through both entities at time t. We dualize this line to a point using classical point-line dualization (i.e. we map the line y = ax + b to the point (a, −b)). This point γ(t) now traces a segment of a curve γ : [0, 1] → R² in the dual space. Throughout this chapter we follow the convention of using Latin letters for objects in the primal space and Greek letters for their duals.

Lemma 7. The segment γ is a segment of a quadratic curve with 5 degrees of freedom.

Proof. We say that entity q walks from (a1, a2) to (a1 + a3, a2 + a4) and that entity r walks from (a5, a6) to (a5 + a7, a6 + a8). Note that the speed of entity q is k(a3, a4)k and that the speed of entity r is k(a7, a8)k. We can parametrize the position of entity q and r on the time t ∈ [0, 1] as follows:

q(t) = (x_q(t), y_q(t)) = (a1 + a3·t, a2 + a4·t),    r(t) = (x_r(t), y_r(t)) = (a5 + a7·t, a6 + a8·t)

At all times, the line g(t) is the line through the points q(t) and r(t). We say that at all times, g(t) has slope and offset (α(t), β(t)). The parametrisation of g(t) then becomes:

g(t) = (α(t), β(t)) = ( (y_r(t) − y_q(t))/(x_r(t) − x_q(t)), α(t)·x_q(t) − y_q(t) ) = ( (a6 − a2 + (a8 − a4)t)/(a5 − a1 + (a7 − a3)t), α(t)(a1 − a3t) − a2 − a4t )

Observe that in this equation it is possible to divide by zero. Note that this occurs only if there is a moment in time where the line-of-sight between the two entities is a vertical segment. In this case, the dual of their line-of-sight is also not well defined. This is a common degeneracy with visibility queries and dualization algorithms in computational geometry and it can normally be solved by rotating the problem by a small angle. We can also split each trajectory into two pieces, before and after the vertical line-of-sight, and run our algorithm on the pieces separately. If the time t lies between 0 and 1, this parametric equation traces our curve segment γ and if we take t over all of R, the parametric equation traces a full curve which we denote by Γ. To show the degree of the curve Γ we rewrite the parametrized curve to a canonical form that drops the dependence on t. First we take the formula for the β-coordinate and isolate t:

t = (α(t)·a1 − a2 − β(t)) / (α(t)·a3 + a4)

We then take the formula for the α-coordinate and remove the fraction by multiplying both sides with ((a5 − a1) + (a7 − a3)t):

α(t)(a5 − a1) + α(t)(a7 − a3)t = (a6 − a2) + (a8 − a4)t

We substitute the value for t into this equation, and remove the fraction by multiplying both sides with (α(t)a3 + a4):

α(t)(α(t)a3 + a4)(a5 − a1) + α(t)(a7 − a3)(α(t)a1 − a2 − β(t))

= (a6 − a2)(α(t)a3 + a4) + (a8 − a4)(α(t)a1 − a2 − β(t))

Rearranging:

α(t)²·a3(a5 − a1) + α(t)·a4(a5 − a1) + α(t)²·a1(a7 − a3) − α(t)·a2(a7 − a3) − α(t)β(t)·(a7 − a3)

= α(t)·a3(a6 − a2) + a4(a6 − a2) + α(t)·a1(a8 − a4) − a2(a8 − a4) − β(t)·(a8 − a4)

Lastly we linearize this equation by separating polynomials based on α(t) and β(t) from polynomials based on a1 . . . a8.

0 = [α(t)²](a5a3 − 2a1a3 + a1a7)

+ [α(t)](a4a5 − a3a6 + a2(2a3 − a7) − a1a8)

+ [α(t)β(t)](−(a7 − a3)) + [β(t)](a8 − a4)

+ [1](a2(a8 − a4) − a4(a6 − a2)) (4.1)

Figure 4.4: The wedge Λe generated by some edge e (shaded in yellow), and the curve γ representing the line-of-visibility between q and r (in blue).

This is a degree 2 polynomial in α(t) and β(t) with 5 degrees of freedom. Notice in particular that the β(t)2 term is zero, so not every degree 2 polynomial can be realized this way.
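As a sanity check on the canonical form, one can pick trajectory coefficients, evaluate (α(t), β(t)) directly from their definitions, and confirm that the point satisfies a degree-2 relation of this shape. The snippet below is our own sketch, not thesis code; the signs of the β(t) and constant coefficients are the ones we obtain by moving the right-hand side of the rearranged equation to the left.

```python
def dual_curve_point(a, t):
    """(α(t), β(t)) for trajectory coefficients a = (a1, ..., a8),
    assuming the sight-line at time t is not vertical."""
    a1, a2, a3, a4, a5, a6, a7, a8 = a
    alpha = (a6 - a2 + (a8 - a4) * t) / (a5 - a1 + (a7 - a3) * t)
    beta = alpha * (a1 - a3 * t) - a2 - a4 * t
    return alpha, beta

def canonical_form(a, alpha, beta):
    """Evaluate the degree-2 polynomial; it vanishes on the curve Γ."""
    a1, a2, a3, a4, a5, a6, a7, a8 = a
    return (alpha ** 2 * (a5 * a3 - 2 * a1 * a3 + a1 * a7)
            + alpha * (a4 * a5 - a3 * a6 + a2 * (2 * a3 - a7) - a1 * a8)
            - alpha * beta * (a7 - a3)
            + beta * (a8 - a4)
            + a2 * (a8 - a4) - a4 * (a6 - a2))
```

For any admissible a and t, `canonical_form(a, *dual_curve_point(a, t))` evaluates to 0 up to floating-point error.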

Let e be an edge of P and denote by Le the set of lines intersecting e. The dual of Le is a wedge Λe [82]. If the segment between q and r is blocked by e at time t then g(t) must lie in Le. In the dual, this means that the curve segment γ must intersect Λe. There are at most two connected time intervals where a quadratic curve segment γ can intersect a wedge Λe; it follows that each edge e has at most two connected time intervals where it blocks the visibility between q and r. See Figure 4.4 for an illustration. This leads to a straightforward general algorithm to test if there is a time at which q can see r: for each edge e ∈ P , we compute the at most two time intervals where it blocks visibility between q and r in constant time. We sort these time intervals in O(n log n) time and check if their union covers the time interval [0, 1].
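The final step, checking whether the union of the blocking intervals covers [0, 1], is a standard sweep after sorting. A sketch of that step (our own illustration):

```python
def any_visible_moment(blocking_intervals):
    """Given the (at most two per edge) time intervals during which some
    edge blocks the sight-line, report whether some time in [0, 1] remains
    uncovered.  Sorting dominates the running time: O(n log n)."""
    covered_until = 0.0
    for start, end in sorted(blocking_intervals):
        if start > covered_until:   # a gap of visibility before this interval
            return True
        covered_until = max(covered_until, end)
    return covered_until < 1.0      # a gap of visibility at the end
```

For example, intervals [0, 0.5] and [0.4, 1] together cover [0, 1], so no visible moment exists; replacing the second with [0.5, 1] after a first interval [0, 0.3] leaves the gap (0.3, 0.5) of mutual visibility.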

Theorem 2. Given a polygonal domain P with n vertices and moving entities q and r, we can test trajectory visibility in O(n log n) time.

4.3.2 An Ω(n log n) lower bound

If P is a polygonal domain with Ω(n) holes, this result is tight. Suppose we are given a set of n numbers A and we want to test if A = B for a given arbitrary sorted set B = {x1, x2, . . . , xn}. Ben-Or [31] shows that this problem has an Ω(n log n) lower bound in the algebraic decision tree model. This leads to the following reduction (illustrated in Figure 4.5): we construct a set of n horizontal edges whose y-coordinates are 0 and whose x-coordinates span {(xi + ε, xi+1 − ε) | i ∈ [1, n − 1]}, where ε is smaller than half of the minimal difference between two consecutive numbers in B; a value for ε can be found in linear time since B is sorted. For each of the n numbers x ∈ A we construct a horizontal edge with y-coordinate 1 and x-coordinates from x − ε to x + ε. The entity q walks from the point (1, −1) to (n, −1) and entity r walks from (1, 2) to (n, 2). Suppose the number xj from B is not in A; then q can see r at the x-coordinate xj. Note that this construction also extends to the case where one of the two entities is stationary: consider the cone between the stationary entity q and a horizontal line segment trajectory r. We can transform the set B into a set of n horizontal edges that cut the cone between q and r. Each segment modelling a number a ∈ A gets stretched appropriately such that it intersects a ray from q to r.

Figure 4.5: An illustration of the reduction when A = {1, 2, 3, 6} and B = {1, 3, 4, 6}. The points can see each other when they reach x = 4.
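The reduction can be sketched directly. The code below is our illustration (with ε fixed to 0.45 times the minimal gap of B, one valid choice); it builds the two rows of blocking edges and tests the candidate sight-lines at the elements of B, which is where a gap opens whenever some xj ∈ B is missing from A.

```python
def build_reduction(A, B):
    """Blocking edges for the set-equality reduction: bottom edges (y = 0)
    cover the gaps of the sorted list B, top edges (y = 1) cover ±ε around
    each a ∈ A."""
    eps = 0.45 * min(b2 - b1 for b1, b2 in zip(B, B[1:]))  # ε < half min gap
    bottom = [(B[i] + eps, B[i + 1] - eps) for i in range(len(B) - 1)]
    top = [(a - eps, a + eps) for a in A]
    return bottom, top

def entities_see_each_other(A, B):
    """q and r move in lockstep, so the sight-line at abscissa x is
    vertical; it suffices to test the candidate abscissae x ∈ B."""
    bottom, top = build_reduction(A, B)

    def blocked(x):
        return (any(s < x < e for s, e in bottom)
                or any(s <= x <= e for s, e in top))

    return any(not blocked(x) for x in B)
```

With A = [1, 2, 3, 6] and B = [1, 3, 4, 6] (the sets of Figure 4.5) the entities see each other at x = 4; with A = B they never do.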

Theorem 3. There exists a polygonal domain P with n vertices, and entities q and r moving inside P for which testing trajectory visibility requires Ω(n log n) time.

4.3.3 A linear-time algorithm

When P is a simple polygon and q, r ⊂ P, any segment between q and r that is contained in P is a geodesic shortest path in P between a point on q and a point on r. Guibas and Hershberger [112] define, for any two segments q and r in a simple polygon P, the hourglass H(q, r) to be the union of all shortest paths between points on q and r. The hourglass H(q, r) is a subset of P, bounded by the segments q and r and by two shortest paths. The upper chain is the shortest path between the upper end points p_q^+ and p_r^+ of q and r, and the lower chain is the shortest path between p_q^− and p_r^− (refer to Figure 4.6). We use the names "upper" and "lower" since they intuitively correspond to our figures. If q and r are not vertical but their endpoints are in convex position, we rotate the plane until one of them is vertical. If the endpoints of q and r do not lie in convex position, the two chains share an endpoint, which is a simpler case. We define the visibility glass L(q, r) as the (possibly empty) union of line segments between q and r that are contained in P. Notice that L(q, r) is a subset of H(q, r).

Figure 4.6: The hourglass between two line segments in a simple polygon.

Figure 4.7: The visibility glass between two line segments in a simple polygon and the bitangents which define it.

Observation 3. For any two segments q, r ⊂ P either L(q, r) is empty or there exist segments q′ ⊂ q, r′ ⊂ r such that L(q, r) = H(q′, r′). Moreover, q′ and r′ are bounded by two bitangents on the shortest paths between the endpoints of q and r.

Proof. Suppose that the interiors of the upper and lower chains of H(q, r) intersect. Then the visibility glass L(q, r) is either a single segment or empty, and thus we can either find two points q′ and r′ on q and r whose line segment forms H(q′, r′) = L(q, r), or L(q, r) is empty.

If the interiors of the upper and lower chains are disjoint then these chains are semi-convex [112]. Consider the path from p_q^+ to p_r^-; this path has one edge (u, v) connecting the upper and lower chain. This is the unique edge for which the path makes a clockwise turn at u and a counterclockwise turn at v, or vice versa. Consider the corresponding edge that we obtain from the path between p_q^- and p_r^+. The extensions of these edges bound q′ and r′ [69].

Figure 4.8: The shortest path between p_q^+ and p_r^- with the points at which the path switches from the upper to the lower chain. These define one of the two bitangents.

Chazelle and Guibas [69] note that (the supporting lines of) all line segments in L(q, r) can be dualized into a convex polygon of linear complexity which we denote by Λ(q, r). If L(q, r) contains vertical lines, it will dualize into a pair of unbounded convex regions. For simplicity of exposition, we assume this is not the case; our results can be adapted to deal with degeneracies. A shortest path between two points in P can be computed in linear time [113]. Finding the bitangents also takes linear time. It follows that we can compute L(q, r) and its dual Λ(q, r) in linear time. Suppose that we are given two entities q and r contained in a simple polygon P . Recall that the line g(t) through q and r traces a quadratic segment γ in the dual.

Observation 4. Entities q and r are mutually visible at time t if γ(t) lies in Λ(q, r).

Proof. The entities can see one another at time t if and only if g(t) ∈ L(q, r).

We can derive γ in constant time, construct Λ(q, r) in linear time, and we can check if a quadratic curve intersects a convex polygon in linear time. Thus we conclude:

Theorem 4. Given a simple polygon P with n vertices and two entities q and r moving linearly inside P , we can test trajectory visibility in Θ(n) time.
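The curve γ used above can be obtained from the standard point–line duality, under which a non-vertical line y = mx + b maps to the dual point (m, −b). A minimal sketch of tracing γ for two linearly moving entities (helper names are ours, and we assume the sight line is never vertical):

```python
def dual_point(q, r):
    # dual of the (non-vertical) line through points q and r: (slope, -intercept)
    m = (r[1] - q[1]) / (r[0] - q[0])
    b = q[1] - m * q[0]
    return (m, -b)

def gamma(t, q0, vq, r0, vr):
    # positions of both entities at time t, then the dual of their sight line
    q = (q0[0] + t * vq[0], q0[1] + t * vq[1])
    r = (r0[0] + t * vr[0], r0[1] + t * vr[1])
    return dual_point(q, r)
```

Sampling `gamma` over t ∈ [0, 1] traces the dual curve whose intersection with Λ(q, r) witnesses mutual visibility.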

4.4 Semi-algebraic range searching

We must now make a diversion to discuss an important technique which we will use extensively throughout the remainder of this chapter, the semi-algebraic range searching (or linearization) technique of Agarwal et al. [15]. Broadly, this technique allows complicated geometric intersection queries to be computed using simple half-space range searching, at the cost of an increase in dimension.

Let X be a set of n geometric objects in R^d, where each object is parametrized by a vector ~x. If X is a set of points, then the most natural parametrization of an object is the vector of its coordinates. But the objects in X could be more complicated algebraic objects such as lines; in that case the most natural parametrization is a two-dimensional vector containing the slope and offset.

We denote by Γ the family of geometric regions (called semi-algebraic ranges) in R^d, where each region G ∈ Γ is bounded by an algebraic curve γ which is parametrized by a vector ~a. Two examples of such a family are the set of all disks and the set of all disks of radius 1. An arbitrary disk can be represented as a vector in many ways: three non-collinear points define a unique circle, so one could represent a circle as a six-dimensional vector which specifies the coordinates of these points. However, the larger the representation vector, the more complicated the linearization process becomes. The most efficient representation of a circle is by a three-dimensional vector specifying its center and radius. The family of disks of radius 1 has fewer degrees of freedom than the family of all disks, and thus can be represented more compactly (e.g. as a 2-dimensional vector specifying only the center).

Given X and Γ, we are interested in preprocessing X such that for any range G ∈ Γ, we can report which objects of X intersect G. To accomplish this, we first want to derive what we have dubbed a predicate function F(~x, ~a) ≤ 0. The predicate function F takes any instance of X (parametrized by ~x) and any instance of Γ (parametrized by ~a) and outputs a real number. The object intersects the range if and only if the output value is less than or equal to zero. At this point we present an example: let X be a set of two-dimensional points parametrized by ~x = (x1, x2) and Γ be the set of arbitrary disks, each parametrized by ~a = (a1, a2, r). The disk parametrized by ~a has center (a1, a2) and radius r. Any point (x1, x2) is contained in this disk if and only if F(~x, ~a) = (a1 − x1)² + (a2 − x2)² − r² ≤ 0, and thus we have found our predicate function. The predicate function F(~x, ~a) can be seen as a map from the parameter space of our intersection problem to the boolean space {0, 1}, and thus F partitions our parameter space into areas where the answer is yes and areas where the answer is no. The idea behind semi-algebraic range searching is that we search for an intersection in this parameter space, as opposed to searching in the space where our problem lives. However, the borders of these areas do not have to be particularly nice. In our example each of the yes areas is bounded by a quadratic surface. This is where linearization comes in. We transform the parameter space


Figure 4.9: Consider the family Γ of 1-dimensional disks (intervals). We can parametrize their border by supplying a 1-dimensional center point and a radius ~a = (a1, a2). Any 1-dimensional point ~x = (x1) is contained in a disk G ∈ Γ if and only if (a1 − x1)² − a2² ≤ 0 ⇒ [a1² − a2²] + [−2a1](x1) + (x1)² ≤ 0. If we linearize this predicate function, we get that (g0, g1, g2) = (a1² − a2², −2a1, 1) and (f1, f2) = (x1, x1²). It follows that any intersection query between a disk G and a set of points X can be answered using a halfspace emptiness query. In the figure we show an example for the points (−2, −1, 0, 1) and the disk G with center 1 and radius 2.

through a polynomial map into a k-dimensional space where the boundary of the yes spaces becomes a linear-complexity surface. From there on, we can solve our problem using halfspace range searching. Specifically, we rewrite the function F(~x, ~a) in the form F(~x, ~a) = g0(~a) + Σ_{i=1}^{k} gi(~a)·fi(~x), where the fi and gi are polynomials dependent only on ~x and ~a respectively. In our example we had F(~x, ~a) = (a1 − x1)² + (a2 − x2)² − r² ≤ 0. To linearize this function we first expand the squares: F(~x, ~a) = a1² − 2a1x1 + x1² + a2² − 2a2x2 + x2² − r². This immediately gives a straightforward linearization of seven terms. However, we can reduce the number of terms by grouping variables and writing F(~x, ~a) = [a1² + a2² − r²] + [−2a1](x1) + [−2a2](x2) + [1](x1² + x2²), and we obtain a linearization where (g0, g1, g2, g3) = (a1² + a2² − r², −2a1, −2a2, 1) and (f1, f2, f3) = (x1, x2, x1² + x2²). We get the term g0 (which is not attached to any polynomial dependent on ~x) “for free” and this thus becomes a three-dimensional linearization.

Agarwal et al. prove that you can map any d-dimensional point ~x to the k-dimensional point f(~x) = (f1(~x), f2(~x), . . . , fk(~x)), and any query range to the k-dimensional halfspace G(~a) = { ~y ∈ R^k | g0(~a) + Σ_i gi(~a)·yi ≤ 0 }, and that ~x intersects G if and only if f(~x) is contained in G(~a). Consider our example. The map f that we found is the well-known paraboloid projection: we map all the two-dimensional points onto a three-dimensional paraboloid, and they are contained in a circle G ∈ Γ if and only if the corresponding lifted points lie within a halfspace cutting the paraboloid. Refer to Figure 4.9 for an even shorter example.
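The three-dimensional linearization above is easy to check in code: the value computed from the lifted point and the halfspace coefficients must agree with the direct disk predicate. A sketch, using the (g, f) terms derived above (function names are ours):

```python
import random

def lift(p):
    # f(x) = (x1, x2, x1^2 + x2^2): lift the point onto the paraboloid
    x1, x2 = p
    return (x1, x2, x1 * x1 + x2 * x2)

def halfspace(disk):
    # (g0, g1, g2, g3) = (a1^2 + a2^2 - r^2, -2*a1, -2*a2, 1)
    a1, a2, r = disk
    return (a1 * a1 + a2 * a2 - r * r, -2 * a1, -2 * a2, 1.0)

def linearized_value(p, disk):
    g0, g1, g2, g3 = halfspace(disk)
    y1, y2, y3 = lift(p)
    return g0 + g1 * y1 + g2 * y2 + g3 * y3

def direct_value(p, disk):
    a1, a2, r = disk
    return (a1 - p[0]) ** 2 + (a2 - p[1]) ** 2 - r * r

# F(x, a) and its linearization agree (up to rounding) on random inputs
random.seed(0)
for _ in range(1000):
    p = (random.uniform(-5, 5), random.uniform(-5, 5))
    d = (random.uniform(-5, 5), random.uniform(-5, 5), random.uniform(0, 5))
    assert abs(linearized_value(p, d) - direct_value(p, d)) < 1e-9
```

The point is in the disk exactly when either value is at most zero, so a disk query becomes a halfspace query on the lifted points.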

4.4.1 Intersecting line segments with a quadratic curve segment

It may be tempting, having introduced such a powerful, general-purpose tool, to employ it immediately on our problem without any further insight or techniques. While this is certainly possible and will result in sublinear query times, the predicates required do not easily linearize and the resulting increase in dimension is substantial. This means that the running time obtained this way is only a very marginal improvement, and hides very large constants in the big-O notation. We illustrate this by working through a direct application of linearization to our problem. Let E be a set of n line segments. In this section we describe how to preprocess E such that for an arbitrary degree 2 curve segment γ we can test if γ intersects an edge of E in sublinear time. More specifically, we establish the following result:

Lemma 8. Let E be a set of n line segments. In O(n log n) time, we can construct a linear-size data structure that can test if an arbitrary degree 2 query curve segment γ intersects a segment in E (and if so report it) in O(n^{1−1/15000}) time.

As an object set we are given a set E of arbitrary edges with no restrictions. We say that an edge e ∈ E goes from the point (x1, x2) to the point (x1 + x3, x2 + x4) and thus parametrize each edge as a vector ~x = (x1, x2, x3, x4). As a query family, we are given the family Γ of degree 2 curve segments, which are parametrized according to Equation 4.1 with a parametrization vector ~a = (a1, . . . , a8). The goal is to design a predicate function F(~x, ~a) that takes any pair of edge and query curve and outputs a number; the query curve should intersect the edge if and only if the output number is less than zero. Specifically, we design four predicate functions F1, F2, F3, F4, and the query curve intersects the edge if and only if all four predicates output a number less than zero. This can be checked using four consecutive halfspace range queries. We detect an intersection with an arbitrary γ in three steps:

1. First we rotate and shear the plane, using the parameter vector ~x, such that the edge e starts at (0, 0) and ends at (0, ‖(x3, x4)‖). We now know that γ can only intersect the edge at a point where its x-coordinate is 0.

2. We algebraically compute the at most two times t, t′ where the x-coordinate of the full curve supporting γ is zero and we check if 0 ≤ t, t′ ≤ 1. This gives the first two predicate functions F1, F2. If this holds for neither time, then we know that γ cannot intersect the edge e.

3. As a third step, we fill in the times t, t′ where the first coordinate of the curve supporting γ is zero and we check if the second coordinate lies between 0 and ‖(x3, x4)‖. If this is not the case, then the curve goes over

or under the line segment. This gives the last two predicate functions F3 and F4.

Having computed the predicates F1, F2, F3, F4, we need to linearize them, and we linearize them using approximately 15,000 terms. We want to briefly mention why this number is so high. In its general form, the predicate has twelve unrestricted variables x1, . . . , x4 and a1, . . . , a8. One can imagine that twelve unrestricted variables lead to a formula with at least twelve linear terms. At step 2, we compute the time t for which the first coordinate of the degree 2 curve is zero. To do this, we need to make use of the quadratic formula, and that gives us a square root containing variables from both ~x and ~a. If we want to linearize, we need to get rid of the square root. That means squaring both sides and expanding the resulting equation, which implies quadratically many more linear terms. At step 3, a similar thing happens and we again have to resolve a square root. We started with 12 terms and we squared twice, which would naively give us 20,000 terms.

A proof of a more restricted case that linearizes to R³. We first want to show that the approach suggested above can be an efficient way to compute the intersection between algebraic objects. Consider the set E of arbitrary line segments, and the family of regions Γ′ where each range G ∈ Γ′ is a line segment (hence, a more restricted degree 2 curve). Agarwal notes [4] that the intersection between an edge e ∈ E and a segment G ∈ Γ′ can be written as a predicate with three linear terms. Most likely, this linearization uses the cross product between the segments, an approach that does not generalize to curves. We show that our approach also yields a linearization with three terms.

We assume that each edge e in E goes from the point (x1, x2) to (x1 + x3, x2 + x4) and that each query G in Γ′ is a segment from the point (a1, a2) to (a1 + a3, a2 + a4). The first step is to translate and shear the plane. First we translate the plane by the vector −(x1, x2) such that e starts at the origin. Then we shear the plane with the map (x, y) ↦ (x − (x3/x4)·y, y) such that e points upwards. If we apply our translation and shearing, we get a new segment G′(t) whose parametrization in t is:

G′(t) = R(x3, x4) · (G(t) − (x1, x2)), with components

x = (a1 − x1 + t·a3) − (x3/x4)·(a2 − x2 + t·a4)
y = a2 − x2 + t·a4

The first thing we do is compute the time t* for which the x-coordinate of this transformed query segment G′(t) is zero.

x = 0 ⇒
0 = (a1 − x1 + t*·a3) − (x3/x4)·(a2 − x2 + t*·a4) ⇒
−t*·(a3 − (x3/x4)·a4) = a1 − x1 − (x3/x4)·(a2 − x2) ⇒
t* = (a1 − x1 − (x3/x4)·(a2 − x2)) / (−a3 + (x3/x4)·a4)

The first predicate verifies whether or not t* is greater than or equal to 0:

0 ≤ t* ⇒
0 ≤ F1(~x, ~a) = a1 − x1 − (x3/x4)·(a2 − x2)
              = (a1) + (1)·[(x3/x4)·x2 − x1] + (a2)·[−x3/x4]

Hence we have a predicate F1 where:

A0(~a) = a1,  A1(~a) = 1,  A2(~a) = a2
f1(~x) = (x3/x4)·x2 − x1,  f2(~x) = −x3/x4

The second predicate verifies whether or not t* is less than or equal to 1:

t* ≤ 1 ⇒
a1 − x1 − (x3/x4)·(a2 − x2) ≤ −a3 + (x3/x4)·a4 ⇒
a1 − x1 − (x3/x4)·(a2 − x2) + a3 − (x3/x4)·a4 ≤ 0 ⇒
0 ≥ F2(~x, ~a) = (a1 + a3) + (1)·[(x3/x4)·x2 − x1] + (a2 + a4)·[−x3/x4]

Hence we have a predicate F2 where:

A0(~a) = a1 + a3,  A1(~a) = 1,  A2(~a) = a2 + a4
f1(~x) = (x3/x4)·x2 − x1,  f2(~x) = −x3/x4

The third predicate verifies whether or not the y-value of G′(t*) lies below √(x3² + x4²):

y_{G′(t*)} = a2 − x2 + t*·a4 ≤ √(x3² + x4²) ⇒

a2 − x2 + a4 · (a1 − x1 − (x3/x4)·(a2 − x2)) / (−a3 + (x3/x4)·a4) ≤ √(x3² + x4²)

We multiply both sides by −a3 + (x3/x4)·a4 and obtain:

(−a2a3) + (a2a4)·(x3/x4) + (a3)·[x2] + (a4)·[−(x3/x4)·x2] + (a4a1) + (a4)·[(x3/x4)·x2 − x1] − (a2a4)·(x3/x4)
≤ (−a3)·[√(x3² + x4²)] + (a4)·[(x3/x4)·√(x3² + x4²)]

Therefore

(a1a4 − a2a3) + (a3)·[x2 − √(x3² + x4²)] + (a4)·[−(x3/x4)·(x2 + √(x3² + x4²)) − x1] ≤ 0

Hence we have a predicate F3 where:

A0(~a) = a1a4 − a2a3,  A1(~a) = a3,  A2(~a) = a4
f1(~x) = x3/x4,  f2(~x) = x2 − √(x3² + x4²),
f3(~x) = −(x3/x4)·(x2 + √(x3² + x4²)) − x1

The predicate F4 checks if the y-value is above 0. It is the same predicate as F3, with the terms dependent on √(x3² + x4²) removed. It follows that we can verify if a query segment intersects an edge in E using four predicates which all have at most 3 terms.
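A numerical sanity check of the derivation: after translating by −(x1, x2) and shearing by (x, y) ↦ (x − (x3/x4)·y, y), the time t* computed from the closed form should indeed zero the x-coordinate of the transformed query segment. A small sketch (helper names are ours):

```python
def t_star(x, a):
    # x = (x1, x2, x3, x4): edge from (x1, x2) to (x1+x3, x2+x4), x4 != 0
    # a = (a1, a2, a3, a4): query segment from (a1, a2) to (a1+a3, a2+a4)
    x1, x2, x3, x4 = x
    a1, a2, a3, a4 = a
    return (a1 - x1 - (x3 / x4) * (a2 - x2)) / (-a3 + (x3 / x4) * a4)

def sheared_x(t, x, a):
    # x-coordinate of G'(t): translate by -(x1, x2), then shear
    x1, x2, x3, x4 = x
    a1, a2, a3, a4 = a
    return (a1 - x1 + t * a3) - (x3 / x4) * (a2 - x2 + t * a4)

# at time t*, the transformed query segment crosses the sheared edge's line
x, a = (1.0, 1.0, 1.0, 2.0), (0.0, 3.0, 4.0, -1.0)
assert abs(sheared_x(t_star(x, a), x, a)) < 1e-12
```

The predicates F1 and F2 then only check 0 ≤ t* ≤ 1 without ever dividing, which is what makes them linearizable.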

Proof of Lemma 8. Lastly, we provide the complete proof for Lemma 8. Denote by γ any segment parametrized by a vector ~a according to Equation 4.1. The algebraic formulations will get very long, so for succinctness we denote Ai = ai − ai−4. The formulation for γ dependent on t now becomes:

γ(t) = (x, y), where
x = (A6 + A8t)/(A5 + A7t)
y = ((A6 + A8t)/(A5 + A7t))·(a1 + a3t) − a2 − a4t

Removing brackets and applying the translation gives:

γ_T(t) = (x, y), where
x = (A6 + A8t)/(A5 + A7t) − x1
y = (A6a1 + A8a1t + A6a3t + A8a3t²)/(A5 + A7t) − a2 − a4t − x2

We then apply the shearing and obtain γ′(t) just as in the previous subsection:

γ′(t) = (x, y), where
x = (A6 + A8t)/(A5 + A7t) − x1 − (x3/x4)·((A6a1 + A8a1t + A6a3t + A8a3t²)/(A5 + A7t) − a2 − a4t − x2)
y = (A6a1 + A8a1t + A6a3t + A8a3t²)/(A5 + A7t) − a2 − a4t − x2

Verifying that 0 ≤ t* ≤ 1

The first thing we do is compute the time t* for which x = 0. We check if this value lies below 1, and this gives our first predicate F1; the formulation for F2 follows from F1. Below we set x equal to 0 and multiply each side of the equality by (A5 + A7t). Then we apply the quadratic formula to find the time(s) t* for which the x-coordinate is zero.

A6 + A8t − (x1 + a2 + x2 + a4t)·(A5 + A7t) − (x3/x4)·(A6a1 + A8a1t + A6a3t + A8a3t²) = 0

Grouping by powers of t, the quadratic has coefficients:

[t²]·(−a4A7 − a3A8·(x3/x4)) +
[t]·(−a4A5 − a2A7 + A8 − A7x1 − A7x2 − (a3A6 + a1A8)·(x3/x4)) +
[1]·(a2A5 + A6 − A5x1 − A5x2 − a1A6·(x3/x4))

We want to demand that t∗ ≤ 1. We first do this using just the variables a, b, c from the quadratic equation:

1 ≥ (−b + √(b² − 4ac)) / (2c) ⇒
2c ≥ −b + √(b² − 4ac) ⇒
4c² + 4bc + b² ≥ b² − 4ac ⇒
c² + bc + ac ≥ 0

Now we work out what the terms in this equation actually are:

Working out c²:

c² = (a2A5 + A6 − A5x1 − A5x2 − a1A6·(x3/x4))²
   = [(x1 + x2)·(x3/x4)]·(2a1A5A6)
   + [x1 + x2]·(−2a2A5² − 2A5A6)
   + [x1² + x2²]·(A5²) + [x1x2]·(2A5²)
   + [(x3/x4)²]·(a1²A6²)
   + [(x3/x4)]·(−2a1a2A5A6 − 2a1A6²)
   + [1]·(a2²A5² + 2a2A5A6 + A6²)

Working out ac:

ac = [(x1 + x2)·(x3/x4)]·(a3A5A8)
   + [x1 + x2]·(a4A5A7)
   + [(x3/x4)²]·(a1a3A6A8)
   + [(x3/x4)]·(a1a4A6A7 − a2a3A5A8 − a3A6A8)
   + [1]·(−a2a4A5A7 − a4A6A7)

Working out bc:

bc = [(x1 + x2)·(x3/x4)]·(a1A6A7 + a1A5A8 + a3A5A6)
   + [x1 + x2]·(a4A5² − A6A7 − A5A8)
   + [x1² + x2²]·(A5A7) + [x1x2]·(2A5A7)
   + [(x3/x4)²]·(a1²A6A8 + a1a3A6²)
   + [(x3/x4)]·(a1a4A5A6 + a1a2A6A7 − a1a2A5A8 − 2a1A6A8 − a3A6² − a2a3A5A6)
   + [1]·(A6A8 − a2a4A5² − a4A5A6 − a2²A5A7 − a2A6A7 + a2A5A8)

Concatenating these results gives the predicate F1. The predicate F2 is simpler than F1. In our formulation of c², ac and bc we already separated terms based on ~x from terms based on ~a and, surprisingly, we end up with only 10 non-separable terms. This means that we can check predicates F1 and F2 with a halfspace emptiness query in R¹⁰, which is not so bad. However, from this point on it is going to get worse. From the quadratic formula we know that t* = (−b ± √(b² − 4ac))/(2c), and with the previous predicates we can determine if t* lies within the relevant interval. The next step is to check if at time t* the y-value of the hyperbola is between 0 and C = √(x3² + x4²).

y_{γ′(t)} ≤ C ⇒
(A6a1 + A8a1t + A6a3t + A8a3t²)/(A5 + A7t) − a2 − a4t − x2 ≤ C

Multiplying both sides by (A5 + A7t) and grouping by powers of t:

[t²]·(A8a3 − a4A7) +
[t]·(A8a1 + A6a3 − a4A5 − a2A7 − A7x2 − A7C) +
[1]·(A6a1 − a2A5 − A5x2 − A5C) ≤ 0

At this point, we substitute the result of the quadratic formula into t. After this substitution we will have the term √D = √(b² − 4ac), whose variables depend on both ~x and ~a. To linearize this function, we have to isolate √D and square both sides. We substitute the value for t with the result of the quadratic formula and multiply both sides by 4c²:

[b² − 2b√D + D]·(A8a3 − a4A7) +
[−2bc − 2c√D]·(A8a1 + A6a3 − a4A5 − a2A7 − A7x2 − A7C) +
[4c²]·(A6a1 − a2A5 − A5x2 − A5C) ≤ 0

Consider the expressions b², c², bc and ac. At this point, a conservative estimate for the number of non-separable terms based on ~x and ~a in these expressions would be 12 terms. These expressions are multiplied with at least 7 non-separable terms. Now we take √D to one side (and with it, several polynomials based on both ~x and ~a) and we square both sides. This leads to around (12 · 7)² · 2 non-separable terms, approaching 15,000 non-separable terms. We ran this computation using algebraic-simplification software and it seems that the expressions cannot be grouped to reduce the total number of terms.

4.5 Intersecting a convex polygon with an algebraic curve

We now turn our attention to the data structure question: can we preprocess P such that trajectory visibility may be tested efficiently (i.e. in sublinear time) for a pair of query segments q, r? By Observation 4, we can phrase such a query as an intersection between a quadratic curve segment and a convex polygon. Note, however, that both the curve segment and the convex polygon depend on the query segments q and r. As an intermediate step, we study a simplified problem in which the convex polygon is independent of q and r. In particular, the question that we study is: let P′ be a convex polygon with n edges. Is it possible to preprocess P′ such that for any quadratic curve segment γ, we can quickly test if γ intersects P′? We believe this problem to be of independent interest. We answer this question in detail in this section and then use the solution as a subroutine in Section 4.6. Recall from the previous section that, given a family of geometric regions Γ where each range G ∈ Γ is parametrized by some vector ~a, and a set of geometric objects X, where each object is parametrized by some vector ~x, a predicate function is some F(~x, ~a) where ~x ∈ G if and only if F(~x, ~a) ≤ 0. Recall that if F can be written in the form F(~x, ~a) = g0(~a) + Σ_{i=1}^{k} gi(~a)·fi(~x) (where the fi and gi are polynomials whose variables are only in ~x and ~a, respectively) then we can map X to points in R^k, and Γ to halfspaces in R^k, such that we can query the objects intersecting G ∈ Γ by querying the points intersecting the corresponding halfspace in R^k. The resulting set of n points in R^k can be stored in a data structure of linear size such that the points in a query halfspace can be counted in O(n^{1−1/k}) time (with high probability) [61]. Testing if a query halfspace is empty can be done in expected O(n^{1−1/⌊k/2⌋}) time. Data structures with a slightly slower deterministic query time are also known [61].
If we are willing to use (much) more space, faster query times are also possible [61, 147]. The n edges of a convex polygon P′ are geometric objects. A natural parametrization for an edge e ∈ P′ is a 4-dimensional vector ~x_e specifying its start and end points. A quadratic curve segment γ is per definition a semi-algebraic range parametrized by its own parameters ~a_γ. If we want to apply semi-algebraic range searching, we need to design a predicate function F(~x_e, ~a_γ) that outputs a negative real number whenever the edge e is intersected by γ. It may be tempting to apply semi-algebraic range searching immediately. However, as we saw in the previous section, this can lead to incredibly large linearization constants. Instead, we employ an approach that we will use throughout this chapter. We make several observations about the ways in which a convex polygon P′ can be intersected by a degree 2 curve. Many types of intersections


Figure 4.10: The cases (a), (b) and (c) of intersection between γ and P′.


Figure 4.11: The right subspace bounded by three halfspaces.

can be detected using conventional geometric data structures. This leaves us with a more restricted case of an intersection between γ and P′ that conventional data structures cannot find. Since this is a more restricted problem, it can be linearized to a much lower dimension (R⁵). Let P′ be a convex polygon with n edges. Let γ be a quadratic curve segment ending in the points s and z, and let Γ denote the unique degree 2 curve Γ ⊃ γ given by a1x² + a2x + a3xy + a4y² + a5y + a6 = 0. We write ~a = (a1, . . . , a6). Observe (Figure 4.10) that if γ intersects P′, then either (a) an endpoint s or z lies in P′, or (b) γ cuts off a vertex, or (c) γ intersects only a single edge of P′ twice and has no endpoint in P′ (we call this dipping). Intersections of type (a) and (c) can be identified with a regular binary search on P′. An intersection of type (b) is detected using algebraic range searching. Since P′ is convex, we can test if an endpoint of γ lies inside P′ in O(log n) time. To detect an intersection of case (c), we store a Dobkin–Kirkpatrick hierarchy [87] of P′. This takes O(n) space and requires O(n) preprocessing time. Given γ, we detect an intersection of type (c) as follows (see Figure 4.11). Any node v in this decomposition represents a sub-polygon P″ of P′ and a triangle ∆v which splits P″ into a left and a right part. Consider the border of the right part R = (e1, e2, . . . , em). Note that if γ does not intersect ∆v, then γ can only dip an edge in R if it is contained in the union of the halfspaces that lie to the right of the lines supporting: (1) one edge of ∆v and (2) e1 and em (refer to the blue area in the figure). Given the node v we do three constant-time checks. First we check if γ intersects ∆v. If not, then we check for both the left and right sub-polygon whether γ is contained in the specified area.
If that is the case for both or for neither sub-polygon, then γ can never dip an edge of P′; otherwise we recurse in the appropriate subtree. It follows that we can detect case (c) in O(log n) time.
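The type-(a) test relies on the standard O(log n) point-in-convex-polygon query: a binary search over the fan of triangles anchored at one vertex. A self-contained sketch (function names are ours):

```python
def cross(o, a, b):
    # z-component of (a - o) x (b - o)
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_convex_polygon(poly, p):
    # poly: vertices in counterclockwise order; O(log n) membership test
    n = len(poly)
    # reject points outside the angular wedge at poly[0]
    if cross(poly[0], poly[1], p) < 0 or cross(poly[0], poly[n - 1], p) > 0:
        return False
    lo, hi = 1, n - 1
    while hi - lo > 1:  # binary search for the fan triangle containing p
        mid = (lo + hi) // 2
        if cross(poly[0], poly[mid], p) >= 0:
            lo = mid
        else:
            hi = mid
    return cross(poly[lo], poly[lo + 1], p) >= 0
```

Running this test on each endpoint s and z of γ detects case (a); the Dobkin–Kirkpatrick descent for case (c) has the same logarithmic flavour but walks the hierarchy instead of a fan.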

Lemma 9. We can preprocess a convex polygon P′ consisting of n edges in O(n) time and using O(n) space, such that for any degree 2 curve segment γ we can detect an intersection of type (a) or (c) in O(log n) time.

The curve Γ of which γ is a segment divides the plane into two areas, Γ− := {a1x² + a2x + a3xy + a4y² + a5y + a6 ≤ 0} and its complement Γ+. An edge ((x1, x2), (x3, x4)) of P′ is intersected by γ with an intersection of type (b) only if one endpoint of the edge lies in Γ− and the other in Γ+. If Γ is a curve with k + 1 ≤ 6 degrees of freedom then the formulation of Γ− and Γ+ is a predicate that specifies, using k linearized terms, whether a point ~x = (x1, x2) lies in Γ− or Γ+. Thus we can detect if an edge has two endpoints on opposite sides of Γ with two consecutive halfspace range queries in R^k. We build a three-level data structure where the first two levels are 5-dimensional partition trees [145, 61]. Alternatively, cutting trees [65] can be applied to obtain a faster query time at a larger space cost. On top of each node in the second level we build a binary tree on the clockwise ordering (with respect to P′) of the edges in that node. At query time, we transform the degree 2 curve Γ into two k-dimensional halfspaces g(Γ+) and g(Γ−). With two consecutive halfspace range queries we obtain the collection EΓ(P) of edges which have one endpoint on each side of Γ in O(n^{1−1/k}) time. Note that the set EΓ(P) does not have to be a connected set of edges of P (refer to Figure 4.10 (b)). However, the subset of EΓ(P) that is intersected by the curve segment γ is consecutive in the clockwise ordering of EΓ(P). The set EΓ(P) is returned as O(n^{1−1/k}) subtrees {T1, T2, . . . , Tm} of the secondary trees. Consider a subtree Ti and the associated binary search tree on its edges. Because of the property discussed earlier, the subset of EΓ(P) that is intersected by the segment γ must be a consecutive subset of the leaves of Ti.
Thus using Ti we can obtain these consecutive leaves in O(log n) time by testing whether the segment γ lies before or after the point of intersection between Γ and an edge in EΓ(P). The time and space needed for detecting case (b) dominate the time and space needed for cases (a) and (c), and we conclude:

Theorem 5. Let P′ be a convex polygon with n vertices. In O(n log² n) time we can build a data structure of size O(n log² n) with which we can test if an arbitrary degree 2 query curve segment γ with k + 1 ≤ 6 degrees of freedom intersects P′ in O(n^{1−1/k} log n) time.
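The type-(b) filter above only needs the sign of the conic at each edge endpoint: an edge is a candidate exactly when its two endpoints receive opposite signs. A sketch (function names are ours):

```python
def conic_value(a, p):
    # a = (a1, ..., a6) for the curve a1*x^2 + a2*x + a3*x*y + a4*y^2 + a5*y + a6 = 0
    a1, a2, a3, a4, a5, a6 = a
    x, y = p
    return a1 * x * x + a2 * x + a3 * x * y + a4 * y * y + a5 * y + a6

def candidate_type_b(a, edge):
    # one endpoint in Gamma-minus, the other in Gamma-plus
    p, q = edge
    return conic_value(a, p) * conic_value(a, q) < 0
```

In the data structure this sign test is not evaluated edge by edge, of course; it is linearized and answered for all edges at once by two halfspace range queries in R^k.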


Figure 4.12: A triangulated polygon P with the diagonals labelled d1 to d14. There are 14 diagonals between q and r. However, we have pre-stored hourglasses H(d1, d3), H(d3, d7), H(d7, d11) and H(d11, d14). At query time, we only have to concatenate these O(log n) hourglasses to get H(d1, d14).

4.6 A data structure for two entities moving inside a simple polygon

In this section we build a data structure to answer trajectory visibility queries in the case where both entities q and r move linearly, possibly at different speeds, inside a simple polygon P. Our main approach is the same as in our algorithm from Theorem 4: we obtain the convex polygon Λ(q, r) that is the dual of the visibility glass L(q, r), and test if the curve segment γ tracing the line through q and r in the dual space intersects Λ(q, r). By Observation 4 this allows us to answer trajectory visibility queries. The main challenge is that we cannot afford to construct Λ(q, r) explicitly. Instead, our data structure allows us to obtain a compact representation of Λ(q, r) that we can query for intersections with γ. To obtain Λ(q, r) we use a variation of the two-point shortest-path query data structure of Guibas and Hershberger [112]. Their data structure (Figure 4.12) compactly stores a collection of hourglasses that can be concatenated to obtain a shortest path between two arbitrary points p, p′ ∈ P. All shortest paths, and in particular the boundaries of the hourglasses, are represented using balanced binary search trees storing the vertices on the path. By reusing shared subtrees these O(n) hourglasses can be stored using only O(n) space. To report the shortest path between two query points their data structure concatenates O(log n) of these hourglasses. The result is again represented by a balanced binary tree. We now present a short overview of our data structure (refer to Figure 4.13). Unlike the Guibas and Hershberger structure, we store the hourglasses explicitly. In particular, the vertices on the boundary of an hourglass are stored in the leaves of a balanced binary search tree. The internal nodes of these trees correspond to semi-convex subchains. Let T denote the collection of all these nodes. Each

node v ∈ T stores its subchain Cv in an associated data structure. Specifically, we dualize the supporting-lines of the edges in Cv to points (refer to Figure 4.14). Two consecutive edges produce two points in the dual, which we again connect into semi-convex polygonal chains. So for every vertex in the sub-chain Cv the associated data structure ∆v actually stores a line-segment; together these segments again form a polygonal chain Ψv. The associated data structure will support intersection queries with a quadratic query segment γ; i.e. it will allow us to report the segments of Ψv intersected by γ. We implement ∆v using the data structure from Theorem 5.

Lemma 10. The total size of all chains Ψv over all nodes v is O(n log³ n).

Proof. The Guibas and Hershberger data structure is essentially a balanced hierarchical subdivision that recursively partitions the polygon into two roughly equal-size subpolygons. Every subpolygon has O(log n) diagonals [112], and thus stores at most O(log² n) hourglasses. Note that we use the simpler version of Guibas and Hershberger’s data structure that achieves only O(log² n) query time. It follows that all hourglasses of a subpolygon of size m use at most O(m log² n) space. The height of the balanced hierarchical subdivision is O(log n), and at every level the total size of the subpolygons is O(n). Therefore, the total size of all subpolygons is O(n log n). The lemma follows.

For a chain of size m = |Ψv| the data structure ∆v has size O(m log² m) and can be built in O(m log² m) time. It follows that our data structure uses O(n log⁵ n) space in total, and can be built in O(n log⁵ n) time.

4.6.1 Querying the data structure

When we get a trajectory visibility query with entities q and r we have to test if the curve γ traced by the point dual to the line through q and r intersects Λ(q, r). By Observation 3 the primal representation L(q, r) of Λ(q, r) is an hourglass H(q′, r′). We now argue that: (i) we can find the subsegments q′ and r′ in O(log n) time, (ii) that our data structure can report O(log^2 n) nodes from T that together represent the hourglass H(q′, r′), and (iii) that we can then test if γ intersects Λ(q, r) by using the associated data structures of these reported nodes. This will result in O(n^{3/4} log^3 n) query time.

Lemma 11. Given our data structure, we can detect if L(q, r) is empty, or compute the subsegments q′ ⊆ q and r′ ⊆ r such that L(q, r) = H(q′, r′), in O(log n) time.

Figure 4.13: (1) The base level of our data structure is a hierarchical triangulation. (2) Given q and r, we compute Λ(q, r) and the degree-2 curve segment γ. (3) We store the parameters of each edge of Λ(q, r). (4) Each parameter vector gets mapped to a point in R^4 and the query curve segment gets mapped to a 4-dimensional halfspace which is empty only if γ intersects no edge of Λ(q, r).

Proof. By Observation 3 the visibility glass L(q, r) is either empty or the hourglass H(q′, r′) for two subsegments q′ and r′, and these two subsegments are bounded by the two bitangents of H(q, r). These bitangents are the extensions of two edges from the shortest paths between the endpoints of q and r. We explained in the proof of Observation 3 that the hourglass H(q, r) has an upper and a lower semi-convex chain which may or may not share a point. The upper and lower chain are both a shortest path between endpoints of q and r. We can obtain them using the data structure D from [112] as balanced binary search trees, and we can verify if they share a point using these trees. If that is the case then L(q, r) is either empty or a single segment, and we can verify which using an additional O(log n) time. If the upper and lower chain do not share a point then we want to identify the subsegments q′ and r′ for which L(q, r) = H(q′, r′); recall that q′ and r′ are bounded by the bitangents of H(q, r). Such a bitangent is the extension of an edge (u, v) on the shortest path between two endpoints of q and r. The edge (u, v) is the unique edge on this path for which the path makes a clockwise turn at u and a counterclockwise turn at v, or vice versa. Using D we can obtain any such path as a balanced binary search tree. We perform a binary search on this tree to identify the edge (u, v) whose endpoints have this unique clockwise ordering.
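The turn-direction predicate behind this search can be made concrete. The sketch below uses our own function names and a linear scan purely for illustration; the text performs the same test by binary search on the balanced tree:

```python
def orient(p, q, r):
    # Sign of the cross product (q - p) x (r - p):
    # > 0 for a counterclockwise turn at q, < 0 for a clockwise turn.
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def inflection_edge(path):
    # Return the unique edge (u, v) where the shortest path turns
    # clockwise at u and counterclockwise at v (or vice versa),
    # or None if the turn direction never flips.
    turns = [orient(path[i - 1], path[i], path[i + 1])
             for i in range(1, len(path) - 1)]
    for i in range(len(turns) - 1):
        if turns[i] * turns[i + 1] < 0:   # turn direction flips here
            return path[i + 1], path[i + 2]
    return None
```

On the balanced tree of [112] the same sign test drives the O(log n) binary search, since the turn signs change at most once along the path.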

Figure 4.14: (top) An hourglass between q and r in orange. The lower chain consists of five chains which coincide with P, joined by outer tangents shown as dotted lines labelled τ1 ... τ5. (left) An example of an area bounded by the dualized chain C4. (right) A simplified version of Λ(q, r). Note that outer tangents could become vertices of Λ(q, r).

We use Lemma 11 to find the endpoints q1, q2 of q′ and r1, r2 of r′, respectively. We can obtain the shortest paths π(r1, q1) and π(r2, q2) bounding L(q, r) = H(q′, r′) by concatenating O(log n) of the pre-stored hourglasses. To concatenate two hourglasses we actually select two contiguous subchains in both hourglasses, and compute two bridge edges connecting them. Such a contiguous subchain can be represented by O(log n) nodes in the binary search trees representing the hourglass boundary. It follows that π(r1, q1) can be represented by O(log^2 n) nodes, each representing a pre-stored subchain in the data structure, together with O(log^2 n) line segments (the bridge segments). We now observe that the chains stored in the associated data structures of these nodes, together with O(log^2 n) line segments Ξ (the duals of the bridge segments), actually represent the dual Λ(q, r) of L(q, r). To check if the quadratic query segment γ intersects Λ(q, r) we check if one of the endpoints of γ lies in Λ(q, r) (in this case one of the paths π(r1, q1) or π(r2, q2) is actually a single segment), or if γ intersects the boundary of Λ(q, r). To this end, we query each of these associated data structures. Since γ has k + 1 = 5 degrees of freedom (Lemma 7) this takes O(n^{3/4} log^3 n) time. We test for intersection with the segments in Ξ separately. We thus obtain the following result.

Theorem 6. Let P be a simple polygon with n vertices. We can store P in a data structure of size O(n log^5 n) that allows us to answer trajectory visibility queries in O(n^{3/4} log^3 n) time. Building the data structure takes O(n log^5 n) time.

4.7 Two moving entities crossing edges in a polygonal domain

We investigate the variants where (1) the entities can walk through edges of P, (2) where P is a polygonal domain, and (3) where P is a polygonal domain and the entities can walk through the edges of P, simultaneously. In the previous section we showed that the visibility glass (the collection of straight-line shortest paths) between q and r dualized to a single convex, connected area. In all three scenarios in this section this is no longer true, which increases the difficulty of the problem. In this section, we consider the visibility glasses between all edges of P. We devise a generic dualization that dualizes a line segment to a four-dimensional point instead of a line to a two-dimensional point. The information that we gain by not discarding endpoint information can be leveraged to answer all three of these more complicated problem variants. Let k be a large, unspecified constant. We prove that it is possible to preprocess a polygonal domain P with n vertices in O(n^k) time, such that for any two entities q and r that each traverse a line segment, possibly through edges of P, we can determine if there is a moment when q and r are mutually visible in sub-linear time. We prove that it is possible to answer the visibility query using halfspace range queries in R^k among O(n^3) points. Using cutting trees [65] we can preprocess the O(n^3) k-dimensional points using O(n^{3k}) space and in O(n^{3k}) time such that halfspace range queries (and thus our visibility query) can be answered in O(log^k n) time. Our approach almost certainly does not generate an optimal solution with respect to its space requirement and query time. However, it is a non-trivial proof that sub-linear query times are achievable. We obtain these results by transforming the visibility query in the plane into an intersection query in R^4 and then immediately applying semi-algebraic range searching.
Given a polygonal domain P of n vertices, we construct two near-identical data structures that test if there is a line-of-sight between q and r with either positive or negative slope, respectively. Let t be a time when q and r are mutually visible and the line g(t) through q and r has positive slope. The segment between q and r must be contained in a visibility glass L(e1, e2) for two edges e1, e2 ∈ P. Consider the two points p1 and p2 which are the points of intersection between g(t) and the edges e1 and e2. If p1 has a lower y-coordinate than p2, and q has a lower y-coordinate than r, then (because g(t) has a positive slope) we must have that x_{p1} ≤ x_q ≤ x_r ≤ x_{p2}, where x_{p_i} is the x-coordinate of p_i.

Let e1 and e2 be two edges of P and denote their visibility glass by L(e1, e2). We say that e1 lies on the line y = x1x − x2 and e2 lies on the line y = x3x − x4.

Let ℓ : y = ax − b be a line through L(e1, e2) with positive slope and let e1 lie below e2 along ℓ. The x-coordinates of the intersections between ℓ and the two edges are given by F1(a, b) = (b − x2)/(a − x1) and F2(a, b) = (b − x4)/(a − x3).
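As a quick sanity check of these expressions (the function name is ours):

```python
def stab_x(a, b, x1, x2):
    # x-coordinate where y = a*x - b meets y = x1*x - x2:
    # a*x - b = x1*x - x2  =>  x = (b - x2) / (a - x1); assumes a != x1.
    return (b - x2) / (a - x1)
```

Here F1 is stab_x(a, b, x1, x2) and F2 is stab_x(a, b, x3, x4).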

Observation 5. Suppose we have a segment on ℓ that starts at the point q and ends at the point r; then this segment is contained in L(e1, e2) if and only if F1(a, b) ≤ x_q ≤ x_r ≤ F2(a, b).

This observation leads to the following approach for detecting if there is a line-of-sight between q and r: we construct, for each of the n^2 pairs of edges e1, e2, the two-dimensional area Λ^+(e1, e2) which is the dualization of all positive-slope lines through the visibility glass between e1 and e2. For reasons that will become apparent later, we triangulate Λ^+(e1, e2). Consider any triangle T^+(e1, e2) of this triangulation. It represents a collection of positive-slope lines that stab through the visibility glass L(e1, e2). We lift T^+(e1, e2) to a two-dimensional surface in R^4 with the map that takes a point (a, b) in T^+(e1, e2) and maps it to the point (a, b, F1(a, b), F2(a, b)). This creates a two-dimensional surface in R^4 which has a constant description size. Now consider the following cylinder-like volume in R^4:

T*(e1, e2) = {(a, b, c, d) ∈ R^4 | (a, b, c′, d′) lies on the lifted surface of T^+(e1, e2) ∧ c′ ≤ c ∧ c ≤ d ∧ d ≤ d′}.

Any point (a, b, c, d) ∈ T*(e1, e2) represents a line segment that lies on the line y = ax − b, whose start point lies below its end point, and whose start and end points lie between e1 and e2. Refer to Figure 4.15 for an example of this transformation in R^3. Let q and r be given as two line-segment trajectories that do not intersect (if they do intersect, we can always split the visibility query into constantly many visibility queries). Note that we can split q and r into two subsegments q′ and r′ where the entity q always has a lower y-coordinate than entity r and where the line through q and r has positive slope. We denote by γ′ the continuous dualization of q′ and r′ according to Equation 4.1. The two-dimensional curve segment γ′ can be mapped to a curve segment in R^4 with a mapping that is very similar to our earlier transformation.
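A minimal sketch of the lifting map and of the containment test from Observation 5, assuming e1 lies on y = x1·x − x2 and e2 on y = x3·x − x4 as above (function names are ours):

```python
def lift(a, b, x1, x2, x3, x4):
    # Map the dual point (a, b) of a positive-slope line to the point
    # (a, b, F1(a, b), F2(a, b)) on the two-dimensional surface in R^4.
    f1 = (b - x2) / (a - x1)
    f2 = (b - x4) / (a - x3)
    return (a, b, f1, f2)

def segment_in_glass(a, b, c, d, x1, x2, x3, x4):
    # Observation 5: the segment with x-extent [c, d] on the line
    # y = a*x - b lies in L(e1, e2) iff F1(a,b) <= c <= d <= F2(a,b).
    # This is exactly membership of (a, b, c, d) in the cylinder T*.
    _, _, f1, f2 = lift(a, b, x1, x2, x3, x4)
    return f1 <= c <= d <= f2
```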
Each point (a, b) ∈ γ′ represents a segment following the line y = ax − b between q and r where q must lie below r. We map the point (a, b) to the point (a, b, c, d) where c and d are the x-coordinates of the intersections of the line y = ax − b with the trajectories of q and r respectively. In other words, we are mapping a point (a, b) to (a, b, x_q, x_r). If γ′ intersects T*(e1, e2) then at the time of intersection the two entities realise a line segment that lies within the visibility glass L(e1, e2), and it follows that the entities are mutually visible. Both the volume T* and the query segment γ′ can be parametrized with a constant-length parameter, so the predicate that tests their intersection can be linearized to a constant number k of terms. It follows that we can create a data structure that stores O(n^3) of these volumes (one for each triangle in both the positive and negative visibility glasses), each represented by a point in R^k. Specifically, we build a cutting tree on these O(n^3) points in O(n^{3k}) time. A query supplied as two segments q and r can be cut into constantly many pairs of segments where for each pair of segments either q is above r or vice versa. For each pair of segments, we derive its corresponding k-dimensional halfspace in O(k) time and we query the cutting tree in O(log^k n) time to see if the halfspace is empty. There is no time when the two entities are mutually visible if and only if each of these queries reports an empty halfspace. Thus we conclude.

Figure 4.15: Unfortunately we cannot draw figures in four dimensions, so we illustrate the mapping from a visibility glass to a three-dimensional volume instead, using the map F1(a, b) based on only the edge e1 and not the edge e2.

We transform P into O(n^3) volumes in R^4 which all have a constant-length description. Any query pair q and r gets mapped to a four-dimensional curve segment γ′ with constant description length. It follows that the predicate function specifying whether the segment γ′ intersects one of these volumes has constant description length and hence has a linearization into k terms for some large constant k. There is a time when q and r are mutually visible if and only if there is an intersection. We conclude:

Theorem 7. Let P be a polygonal domain with n vertices. We can store P in a data structure of size O(n^{3k}), for some sufficiently large constant k, that allows us to answer trajectory visibility queries in O(log^k n) time. Building the data structure takes O(n^{3k}) time.

4.8 A data structure for queries with one moving entity

Finally, we consider the special case where q is a point and r is a segment. Clearly, our results for the more general scenario apply to this case as well. However, we can achieve better results using a different approach. We consider three variants of this setting: (i) P is a simple polygon and r is contained in P , (ii) P is a simple polygon but the trajectory of r may intersect edges of P , and (iii) P is a polygonal domain and r may intersect edges of P .

Figure 4.16: Three times a query pair (q, r) in a simple polygon. In the middle case, the paths π1, π2 share their first line segment but there still is a point on r which is visible from q.

4.8.1 Entity r is contained in a simple polygon

When P is a simple polygon and r is contained in P we build the shortest path data structure of Guibas and Hershberger [112] on the simple polygon P. At query time, we use this data structure to obtain the two shortest paths between the endpoints of r and the entity q. We show that we can inspect these paths in constant time each, to detect if there is a time when q and r are mutually visible.

Consider the shortest paths π1, π2 from q to the endpoints of r. Observe that if edges of π1 and π2 coincide, they coincide in a connected chain from q [112].

Moreover (Figure 4.16) if more than one line segment of π1 and π2 coincide, then any shortest path from q to a point on r cannot be a single line segment. If no edges of π1 and π2 coincide then there is at least one point on r, whose shortest path to q is a line segment. If exactly one line segment of π1 coincides with a segment of π2, then that segment must be connected to q and if there is a line-of-sight between q and r, it has to follow that line segment. This observation allows us to answer a visibility query by considering only the first three vertices of π1 and π2. These vertices can be found in O(log n) time using the two-point shortest path data structure of Guibas and Hershberger [112]. We conclude:

Theorem 8. Let P be a simple polygon with n vertices. We can store P in a data structure of size O(n) that allows us to answer trajectory visibility queries between a static and a linearly moving entity in O(log n) time. Building the data structure takes O(n) time.

When P is a simple polygon but the trajectory of r may intersect edges of P the problem becomes significantly more difficult. Let Vq denote the visibility polygon of point q: the set of all points visible from q. There is a time at which q can see r if and only if the trajectory of r intersects (the boundary of) Vq. We build a data structure to find such a point (if one exists). To this end we extend a result of Aronov et al. [25]. They developed an O(n^2) size data structure that can be built in O(n^2 log n) time and can report the visibility polygon of an arbitrary query point q ∈ P in O(log^2 n) time. The visibility polygon Vq is returned in its combinatorial representation, that is, as a (pointer to a) balanced binary search tree, storing the vertices of Vq in order along the boundary. This combinatorial representation does not explicitly store the locations of all vertices of Vq. Instead, a vertex v of Vq may be represented by a pair (e, w), indicating that v is the intersection point of polygon edge e and the line through vertex w ∈ P and the query point q. Computing the explicit location of all vertices of Vq thus takes O(|Vq|) time by traversing the tree. We extend the above results using semi-algebraic range searching so that we can efficiently test if a line segment intersects Vq without spending the O(|Vq|) time to compute the explicit locations. Suppose that the entity r cuts off a vertex v of Vq represented by a pair (e, w); then the vertex v and the entity q must lie on opposite sides of the line through r.
This observation leads to a predicate function F((e, w), (q, r)) that outputs a negative real number if and only if the entity q and the vertex v lie on opposite sides of the line through r. We prove several properties of Vq and its representation in the Aronov et al. data structure which allow us to express this predicate function with as few variables as possible. The result is that we can transform the question of whether r intersects Vq into a halfspace emptiness query in R^8. We therefore obtain the following result. When P is a polygonal domain and r may intersect edges of P we essentially use the same ideas as in the previous case. In this case, the base data structure to obtain the visibility polygon Vq uses O(n^4) rather than O(n^2) space.

4.8.2 Entity r can cross a simple polygon

If r is able to move through edges of P then its trajectory may intersect the boundary of P linearly often. Inspecting each of the resulting subsegments explicitly would thus require at least Ω(n) time. Hence, we use a different approach. Let Vq denote the visibility polygon of point q: the set of all points visible from q. There is a time at which q can see r if and only if the trajectory of r intersects (the boundary of) Vq. We build a data structure to find such a point (if one exists).

Aronov et al. [25] actually developed an O(n^2) size data structure that can be built in O(n^2 log n) time and can report the visibility polygon of an arbitrary query point q ∈ P in O(log^2 n) time. The visibility polygon Vq is returned in its combinatorial representation, that is, as a (pointer to a) balanced binary search tree, storing the vertices of Vq in order along the boundary. It is important to note that this combinatorial representation does not explicitly store the locations of all vertices of Vq. Instead, a vertex v of Vq may be represented by a pair (e, w), indicating that v is the intersection point of polygon edge e and the line through vertex w ∈ P and the query point q. Computing the explicit location of all vertices of Vq thus takes O(|Vq|) time, if so desired, by traversing the tree. We now extend the results of Aronov et al. in such a way that we can efficiently test if a line segment intersects Vq without spending the O(|Vq|) time to compute the explicit locations.

We briefly review the results of Aronov et al. first. They build a balanced hierarchical decomposition of P [69]. Each node v in the balanced hierarchical decomposition represents a subpolygon Pv of P (the root corresponds to P itself) and a diagonal of Pv that splits Pv into two roughly equal size subpolygons Pℓ and Pr. For subpolygon Pℓ the data structure stores a planar subdivision Sℓ (of the area outside Pℓ) such that for all points in a cell of Sℓ the part of the visibility polygon inside Pℓ has the same combinatorial representation. Moreover, for each cell it stores the corresponding combinatorial representation. These representations can be stored compactly by traversing Sℓ while maintaining the (representation of the) visibility polygon in Pℓ in a partially persistent red-black tree [25]. The data structure stores an analogous subdivision for Pr. The complete visibility polygon of q can be obtained by concatenating O(log n) subchains of these pre-stored combinatorial representations (one from every level of the hierarchical decomposition).

We use the same approach as Aronov et al. [25], but we use a different representation of Vq (refer to Figure 4.17). Our representation will be a weight-balanced binary search tree (BB[α]-tree [154]) whose leaves store the vertices of Vq in order along the boundary. An internal node of this tree corresponds to a subchain of vertices along Vq, which is stored in an associated data structure. We distinguish two types of vertices in such a chain: fixed vertices, for which we know the exact location, and variate vertices, which are represented by a polygon-edge, polygon-vertex pair (e, w). We store the fixed vertices in a linear-size dynamic data structure that supports halfspace emptiness queries, that is, a dynamic convex hull data structure [46]. This data structure uses O(m) space, and supports O(log m) time updates and queries, where m is the number of stored points. The variate vertices are mapped to a point in R^8 using a function f that is independent of q. We give the precise definition later. We store the resulting points in a dynamic data structure that can answer halfspace emptiness queries [3]. This data structure uses O(m log m) space, answers queries in O(m^{3/4+ε}) time and supports updates in O(log^2 m) time, where m is the number of points stored. It follows that our representation of Vq uses O(n log^2 n) space, and supports updates in amortized O(log^3 n) time. Since all nodes in the data structure have constant in-degree we can make it partially persistent at the cost of O(log^3 n) space per update [89]. It follows that we can represent the visibility polygons for all cells in Sℓ in O(n^2 log^3 n) space.

Figure 4.17: (left) A simple polygon split into O(n^2) cells. For each cell, there exists a red-black tree that represents a visibility polygon. (middle) Given Vq and r, r could intersect the explicit Vq depending on the location of q. We shoot two rays from q to r and find their intersection with Vq in the red-black tree. That gives us three leaves highlighted in orange. (right top) All the fixed edges in this node are stored in a ray-shooting data structure; (right bottom) all the variate edges have 8-dimensional points that are stored in an 8-dimensional partition tree.

Querying

Given a query q, r we test if the segment r intersects Vq. The main idea is to query our data structure for the part of Vq in the wedge defined by q and r. We then extend r into a line ρ, and test if this line separates a vertex of Vq in this wedge from q. The segment r intersects Vq, and thus there is a time at which r is visible from q, if and only if this is the case.

We can obtain the part of Vq that lies in the wedge defined by q and r, represented by O(log^2 n) BB[α]-tree nodes. For each of these nodes we query the associated data structures to test if the halfspace ρ^{¬q} not containing q is empty. We can directly query the data structure storing the fixed vertices with ρ^{¬q}. To test if there is a variate vertex that lies in ρ^{¬q} we map it to a halfspace in R^8 using a function g.
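The query to the fixed-vertex structure has simple semantics. A linear-scan stand-in makes this concrete (the dynamic convex hull structure of [46] answers the same question in O(log m) time; the function name is ours):

```python
def halfspace_empty(points, a, b, c):
    # True iff no stored point (x, y) lies in the open halfplane
    # {(x, y) : a*x + b*y > c}.  The halfplane rho^{not q} is given in
    # this form by the line rho and the side not containing q.
    return all(a * x + b * y <= c for (x, y) in points)
```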

Lemma 12. There are functions f and g such that f maps each variate vertex (e, w) to a point f(e, w) ∈ R^8 and g maps each ρ^{¬q} to a halfspace g(ρ^{¬q}) in R^8 such that f(e, w) ∈ g(ρ^{¬q}) if and only if the location of the variate vertex (e, w) in Vq lies in ρ^{¬q}.


Figure 4.18: (left) Two points in orange and blue which have the same implicit visibility polygon; however, there could be several placements of r such that r intersects only one of the two explicit visibility polygons. (middle) Entity r in green, q in orange and the chain of uncertain edges. (right) An illustration of the geometric argument. We compute the intersection between ℓ_e and qw and check if that point lies below or above ρ.

Proof. Let q = (a3, a4), and let ρ = {(x, y) | 0 = a1x − a2 − y} be the supporting line of the trajectory of r. We describe the construction for the case that q lies below ρ and ρ is non-vertical. The other cases can be handled analogously.

Refer to Figure 4.18 (right) for an illustration of the proof. For each variate vertex (e, w) in a chain we know that the line qw intersects the line ℓ_e supporting e on the domain of e (this property is guaranteed since (e, w) is a vertex of Vq). Moreover, it is guaranteed that the intersection point between qw and ρ lies on the trajectory of r. It follows that q can see r if and only if the intersection point (x, y) between qw and ℓ_e lies above ρ. Given ρ, q, w and ℓ_e, we can algebraically compute this intersection point (x, y). We then substitute the expression for (x, y) into the equation for ρ; the point (x, y) lies above this line if and only if the result is greater than 0:

wq := {(x, y) | 0 = ((x4 − a4)/(x3 − a3))·x − ((x4 − a4)/(x3 − a3))·x3 + x4 − y}

The lines wq and ℓ_e intersect at the point where their y-coordinates are equal, and therefore:

x1x − x2 = ((x4 − a4)/(x3 − a3))·x − ((x4 − a4)/(x3 − a3))·x3 + x4

(x3 − a3)(x1x − x2) = (x4 − a4)x − (x4 − a4)x3 + (x3 − a3)x4

(x3 − a3)x1x − (x4 − a4)x = x2(x3 − a3) − (x4 − a4)x3 + (x3 − a3)x4

From this equation we can extract the coordinates of the intersection point (x, y) between wq and ℓ_e:

x = (x2(x3 − a3) − (x4 − a4)x3 + (x3 − a3)x4) / ((x3 − a3)x1 − (x4 − a4))

y = x1 · (x2(x3 − a3) − (x4 − a4)x3 + (x3 − a3)x4) / ((x3 − a3)x1 − (x4 − a4)) − x2

Lastly we substitute the algebraic expression for (x, y) into the formula for ρ and we linearize the predicate:

0 ≥ a1(x2(x3 − a3) − (x4 − a4)x3 + (x3 − a3)x4) − x2 − x1(x2(x3 − a3) − (x4 − a4)x3 + (x3 − a3)x4) + x2

0 ≥ [−a1a3](x2) + [a3](x1x2) + [a1](x2x3) + [a1a4](x3) + [−a4](x1x3) + [−a1a3](x4) + [a3](x1x4) + [−1](x1x2x3)

Thus we have found a predicate F(~x, ~a) with:

(f1, f2, f3, f4, f5, f6, f7, f8) = (x2, x1x2, x2x3, x3, x1x3, x4, x1x4, x1x2x3)

(g0, g1, g2, g3, g4, g5, g6, g7, g8) = (0, −a1a3, a3, a1, a1a4, −a4, −a1a3, a3, −1)

It follows that we can map every variate vertex to a point in R^8 using the f-maps provided by the predicate. Any query consisting of the halfplane ρ^{¬q} defined by ρ and q gets mapped to a halfspace in R^8. The halfplane ρ^{¬q} contains the variate vertex defined by q, w, and e if and only if its representative point lies in this halfspace.
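The algebra above is easy to verify numerically. The sketch below (our own function names) computes the intersection point from the closed-form expressions, and checks that the linearized predicate g0 + Σ gi·fi equals (a1 − x1)·N, where N = x2(x3 − a3) − (x4 − a4)x3 + (x3 − a3)x4 is the shared numerator:

```python
def intersection(a3, a4, x1, x2, x3, x4):
    # Intersection of the line through q = (a3, a4) and w = (x3, x4)
    # with the line l_e : y = x1*x - x2, from the closed form above.
    num = x2 * (x3 - a3) - (x4 - a4) * x3 + (x3 - a3) * x4
    den = (x3 - a3) * x1 - (x4 - a4)
    return num / den, x1 * num / den - x2

def f_map(x1, x2, x3, x4):
    # The eight coordinates f_1..f_8 of a variate vertex (e, w).
    return (x2, x1 * x2, x2 * x3, x3, x1 * x3, x4, x1 * x4, x1 * x2 * x3)

def g_map(a1, a2, a3, a4):
    # The coefficients g_0..g_8 of the query halfspace (a2 drops out).
    return (0, -a1 * a3, a3, a1, a1 * a4, -a4, -a1 * a3, a3, -1)

def predicate(xs, qs):
    # Evaluate g_0 + sum_i g_i * f_i.
    f, g = f_map(*xs), g_map(*qs)
    return g[0] + sum(gi * fi for gi, fi in zip(g[1:], f))
```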

The following theorem now immediately follows:

Theorem 9. Let P be a simple polygon with n vertices. We can store P in a data structure of size O(n^2 log^3 n) that allows us to answer trajectory visibility queries between a static and a linearly moving entity that may cross P in O(n^{3/4+ε}) time. Building the data structure takes O(n^2 log^3 n) time.

4.8.3 Polygonal domains

In case P is a polygonal domain we use a similar approach; we build a subdivision S in which all points in a cell have a visibility polygon Vq with the same combinatorial structure, and then traverse S while maintaining Vq in a partially persistent data structure. To obtain S we simply take all O(n^2) lines defined by pairs of polygon vertices. The subdivision S is the arrangement of these lines and has O(n^4) complexity. We obtain a traversal of S by computing an Euler tour of a spanning tree of the dual of S. We conclude:

Theorem 10. Let P be a polygonal domain with n vertices. We can store P in a data structure of size O(n^4 log^3 n) that allows us to answer trajectory visibility queries between a static and a linearly moving entity that may cross P in O(n^{3/4+ε}) time. Building the data structure takes O(n^4 log^3 n) time.

4.9 Conclusions

We have introduced the trajectory visibility problem, which combines two active areas of research in computational geometry: trajectories and visibility. We have given algorithms for this problem in a polygonal domain and in a simple polygon, and given matching lower bounds. We have given an efficient data structure for trajectory visibility in a simple polygon, and a different data structure for trajectories in a polygonal domain which achieves sublinear query time but has impractically large constants. Notably, neither of these data structures is event-based, a necessary restriction to achieve sublinear query time for this problem. We also provide alternative data structures for the special case where one entity remains stationary, which give better query times and use less space. This is especially notable in the polygonal domain case. As a subproblem we describe how to preprocess a convex polygon such that intersections with a query degree-2 curve can be efficiently detected. We believe this problem is of independent interest and potentially has other applications. As this is an early treatment of this problem (indeed, we believe the first) there are many questions left open for further work. An immediate question is whether our data structures can be made more efficient, and whether lower bounds can be proven on the space/time trade-off of any data structure for this problem. This is especially pertinent for the polygonal domain case. Our data structures are quite complicated and have many interlocking pieces which make implementation difficult. Is there an alternative data structure for any version of this problem which is easier to implement and has good running time on experimental data? In this work we consider the most fundamental visibility question: is there ever a time at which the entities can see each other? This question has been very effective in revealing interesting features of the problem, but there are many other questions that could be asked.
For example: at what time do the entities first see each other? At which time intervals can the entities see each other? For how long can the entities see each other, either in total or in the longest contiguous interval? The applications discussed in the introduction to this chapter could provide any number of further questions.

Chapter 5

Uncertain Spanners

We now return to a similar notion of uncertainty as found in Chapter 3. In that chapter we considered points whose location is unknown to us and discussed universal solutions: solutions for entire classes of instances which, regardless of where exactly the unknown points are located, give an approximation of the optimal solution on those points. In this chapter we consider uncertain points whose location is given by a probability distribution. Now instead of worst-case analysis we can discuss probabilities, and so instead of a solution which is good on all possible realizations of the uncertain points we must give a solution which is good with high probability. We introduce the uncertain spanner problem, which is an illustrative example of this model. In this problem one is given a set of locationally uncertain points 𝒫 with their probability distributions and a target spanning ratio τ; the problem is to define a graph G(𝒫, E) with the probability distributions from 𝒫 as vertices, such that when a set of deterministic points P is sampled from 𝒫 the resulting induced graph G(P, E) is a τ-spanner with high probability. We provide a general algorithm for constructing uncertain spanners using cone-based spanners (Yao graphs, Θ-graphs or similar) and a matching lower bound for any cone-based technique.


Figure 5.1: The Yao graph for m = 8 on a set of points. The cones of one of the points are shown in green.

5.1 Background

A geometric t-spanner is a graph embedded in Euclidean space such that the graph distance dG(u, v) between two vertices u and v is not more than t times the Euclidean distance d(u, v). Naturally the complete graph on a set of points is always a 1-spanner; however, geometric t-spanners on n points for constant t > 1 can be constructed with only O(n) edges. The construction and properties of different types of geometric spanners is a rich topic with a detailed history; in Chapter 2 we give a summary of some of the key results from that history. A geometric spanner can be thought of as a distance-preserving sparse subgraph of the complete graph on a set of points. Hence spanners are frequently used in computational geometry as a pre-processing step to reduce the size of a problem at the cost of approximating distances. They are also useful in their own right in designing communication networks or compressing graph data structures. The Yao graph is a type of cone-based spanner introduced by, and named after, Andrew Yao [193]. This graph is constructed for some fixed constant m by considering each point p in turn; the plane is partitioned into m equal-angle cones centered at p and an edge is added between p and the closest point in each cone; once this has been done for every point the graph is returned. Intuitively one can imagine this process as adding an edge between each point and its nearest neighbour in each direction. See Figure 5.1 for an example of this construction. Notice that since each point can add at most m new edges, the total number of edges is no more than nm, which is O(n) so long as m is a constant. Yao graphs have many interesting properties, and many variants such as Θ-graphs [76, 131], Yao-Yao graphs [107] and semi-Yao graphs [165]. For our purposes it is sufficient to say that for m ≥ 6 the Yao graph is a spanner with spanning ratio 1 + O(1/m).
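The construction just described can be sketched in a few lines. The brute-force version below runs in quadratic time (the literature gives faster algorithms); the function name is ours:

```python
import math

def yao_graph(points, m=8):
    # For each point p, split the plane into m equal-angle cones around p
    # and connect p to the closest other point inside each cone.
    edges = set()
    for i, p in enumerate(points):
        nearest = {}                       # cone index -> (distance, j)
        for j, q in enumerate(points):
            if i == j:
                continue
            angle = math.atan2(q[1] - p[1], q[0] - p[0]) % (2 * math.pi)
            cone = int(angle / (2 * math.pi / m))
            d = math.dist(p, q)
            if cone not in nearest or d < nearest[cone][0]:
                nearest[cone] = (d, j)
        for _, j in nearest.values():
            edges.add((min(i, j), max(i, j)))
    return edges
```

Each point contributes at most m edges, so the returned set has at most nm edges, as noted above.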
This chapter introduces the notion of a geometric spanner for locationally uncertain points and provides a construction based on the Yao graph. In our construction an edge is added between each uncertain point and a number of points which are likely to be the nearest point in each cone. Connecting to multiple points in each cone is a similar concept to the k-semi Yao graphs used by Rahmati et al. [165] in a different context. Their construction does not deal with uncertainty or probability and instead seeks to maintain the k-nearest neighbours graph of moving points.


Figure 5.2: A spanner on a set of locations which does not induce an uncertain spanner on the locations’ points.

5.2 Problem Definition

An instance of the uncertain spanner problem consists of a target spanning ratio τ, an acceptable failure rate ρ and a set of n uncertain points P, where each point can appear in k possible locations in R2. We will denote this set of nk possible locations L. For simplicity of exposition we will assume the probability of a particular point realizing in a particular location is uniform (i.e. 1/k), although this restriction can be relaxed. The uncertain spanner problem is to produce a graph G(P, E) with vertices in P and a small number of edges, such that when a set of deterministic points P is sampled from P the probability that G(P, E), the induced graph embedded in R2, is a τ-spanner is at least 1 − ρ. We provide an efficient algorithm to compute an uncertain spanner with O(k2n log n) edges based on the Yao graph, and a lower bound which shows that no cone-based uncertain spanner can have fewer than Ω(nk log n) edges. This is not the same problem as computing a spanner G′ on L and adding an edge between (u, v) in G if at least one pair of their locations share an edge in G′. Consider the situation in Figure 5.2. In this instance there are four points, each of which can appear in two locations with equal probability. For some small ε > 0, point a appears at (0, ε) or (0, −ε), b at (−2, ε) or (−2, −ε), c at (2, ε) or (2, −ε) and x at (−1, 0) or (1, 0). The graph shown in the figure has a spanning ratio of 1 + O(ε) when considered over L; however the induced uncertain spanner G({a, b, c, x}, {(a, x), (b, x), (c, x)}) has spanning ratio of almost 2 on any realization of the points. This is because the point x cannot simultaneously be part of the spanning path between a and b and the path between a and c. It may be tempting to try and repair this argument by building robust [43] or fault-tolerant [79] spanners on L and considering locations where no point has realized to have failed, in the language of those models. Unfortunately this is also not possible in our model.
Firstly, the failures are not statistically independent, since a point appearing in one location guarantees it will not appear elsewhere, and independence is a requirement for probabilistic faults. Secondly, in our model exactly n(k − 1) locations will not receive points; this is a catastrophically high number of failures for robustness to deal with. The problem of building a spanner on points which can probabilistically move rather than fail is not covered by existing approaches and needs specialized methods.

5.3 Algorithm

We begin by computing the value of some constants we will need to use in our algorithm. Firstly m, the number of cones in our Yao-style construction, which is a function of τ, the target spanning ratio. Let m ≥ 6 be the smallest integer such that:

\[ \frac{1}{\cos\left(\frac{2\pi}{m}\right) - \sin\left(\frac{2\pi}{m}\right)} \le \tau \]

That is, the smallest number m such that a Yao graph with m cones has a spanning ratio of at most τ [193]. Notice m does not depend on either k or n; since we consider τ a constant, m is also a (small) constant. For example m = 22 is large enough for τ = 1.5. Recent research, for example by Barba et al. [28], gives tighter bounds on the spanning ratio using fewer cones. Such results can be used here to achieve smaller values of m but are not necessary for our arguments. Similarly we will fix φ, the amount of redundancy built into our graph to ensure a failure chance of no more than ρ. Let φ = k log(mn/ρ) = O(k log n). The reason for this choice will become clear in proving correctness; for now it can be treated as a fixed parameter. For each location ℓ ∈ L, divide the plane into m equal-angled cones centered at ℓ. These cones partition L \ ℓ into m disjoint sets Sℓ,1, . . . , Sℓ,m. Sort each Sℓ,i according to Euclidean distance from ℓ and create an edge between ℓ and the closest φ locations in each Sℓ,i. This results in a graph G′ on L with at most nkmφ edges, which is a supergraph of the Yao graph on L. We now lift G′ to give a graph G on P. For each edge (u, v) in G′ we add an edge between the distribution that includes u and the distribution that includes v, unless this edge already exists. Thus we have a graph G on P with no more than nkmφ edges. We will argue that when a set of points P is sampled from P the probability that the graph induced by G is a supergraph of the Yao graph on P, and hence a τ-spanner on P, is at least 1 − ρ.
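The whole construction can be sketched end to end: compute m from τ, compute φ, connect each location to the φ closest locations in each of its cones, then lift to the uncertain points. This is a brute-force O((nk)²)-time sketch under our own input convention (each uncertain point given as an explicit list of its k locations), not the selection-based procedure used in the running-time analysis.

```python
import math
from itertools import count

def smallest_m(tau):
    """Smallest m >= 6 with 1 / (cos(2*pi/m) - sin(2*pi/m)) <= tau."""
    for m in count(6):
        denom = math.cos(2 * math.pi / m) - math.sin(2 * math.pi / m)
        if denom > 0 and 1 / denom <= tau:
            return m

def uncertain_yao(locations, tau, rho):
    """locations[i] is the list of k possible (x, y) locations of uncertain
    point i. Returns edges between uncertain points (pairs of indices)."""
    n, k = len(locations), len(locations[0])
    m = smallest_m(tau)
    phi = math.ceil(k * math.log(m * n / rho))  # redundancy per cone
    # flatten to (owner index, x, y) for every location in L
    L = [(i, x, y) for i, locs in enumerate(locations) for (x, y) in locs]
    edges = set()
    for (pi, px, py) in L:
        cones = [[] for _ in range(m)]
        for (qi, qx, qy) in L:
            if (qi, qx, qy) == (pi, px, py):
                continue
            ang = math.atan2(qy - py, qx - px) % (2 * math.pi)
            c = min(int(ang / (2 * math.pi / m)), m - 1)
            cones[c].append((math.hypot(qx - px, qy - py), qi))
        for cone in cones:
            for _, qi in sorted(cone)[:phi]:  # phi closest locations
                if qi != pi:                  # lifting discards self-loops
                    edges.add((min(pi, qi), max(pi, qi)))
    return edges
```

As a sanity check, `smallest_m(1.5)` returns 22, matching the example value of m quoted in the text.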

Lemma 13. The probability that the random graph G(P,E) is a supergraph of the Yao graph with m cones on P is at least 1 − ρ.

Proof. Consider each location ℓ ∈ P, and each cone C radiating from ℓ. If for every point ℓ and every cone C there is an edge in G between ℓ and the closest point in P to ℓ in C, then G contains the Yao graph on P as a subgraph.

Label the cones C1, . . . , Cnm. For some cone Ci with vertex ℓi let ei be the event that none of the φ possible locations closest to ℓi are realized by P, i.e. every one of the φ points with which ℓi shares an edge because of Ci appears in a different location in P. If ei does not occur then the closest point in Ci ∩ P to


Figure 5.3: Connecting a point to the nearest two points in each of its cones

ℓi is one of the φ locations that ℓi shares an edge with; hence ℓi is connected to the closest point in Ci.

Therefore if none of the nm events ei occur then each point shares an edge with the closest point in each of its cones and therefore G(P, E) is a supergraph of the Yao graph. It remains to prove that the probability of this happening is high, that is $\Pr\left(\bigvee_{i=1}^{mn} e_i\right) \le \rho$.

In order for ei to occur, each of the φ connected points must not appear in the first φ locations in Ci. In the worst case each of the points has k − 1 other places it can appear and hence Pr(ei) ≤ ((k − 1)/k)^φ; the locations of two different points are statistically independent so multiplying the probabilities like this is justified.

The events ei are certainly not mutually independent, but we can use Boole's inequality (the union bound) to upper bound the probability of their union.

\[ \Pr\left(\bigvee_{i=1}^{mn} e_i\right) \le \sum_{i=1}^{mn} \Pr(e_i) = mn \left(\frac{k-1}{k}\right)^{\phi} \]

Now recall that φ = k log(mn/ρ), and that (1 − 1/k)^k ≤ 1/e.

\[ \Pr\left(\bigvee_{i=1}^{mn} e_i\right) \le mn\left(\frac{k-1}{k}\right)^{\phi} = mn\left(\left(1 - \frac{1}{k}\right)^{k}\right)^{\log\left(\frac{mn}{\rho}\right)} \le mn\left(\frac{1}{e}\right)^{\log\left(\frac{mn}{\rho}\right)} = \rho \]

Hence the probability that no ei occurs, and thus the probability that G(P, E) is a supergraph of the Yao graph, is at least 1 − ρ, and the lemma follows.
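The chain of inequalities in the proof can be checked numerically. A minimal sketch, where log is the natural logarithm (which is what makes the (1 − 1/k)^k ≤ 1/e step go through) and φ is left unrounded:

```python
import math

def failure_bound(n, k, m, rho):
    """Union-bound failure probability from the proof of Lemma 13:
    mn * ((k-1)/k)^phi with phi = k * ln(mn / rho)."""
    phi = k * math.log(m * n / rho)
    return m * n * ((k - 1) / k) ** phi
```

For any n, k ≥ 2, m ≥ 1 and 0 < ρ < 1 the returned value is at most ρ, matching the final equality in the proof; rounding φ up to an integer in the actual construction only decreases the failure probability.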


Figure 5.4: A cone with points from P in red and locations from L \ P in gray. The point ℓ has edges to the 3 closest locations a, b and c. Points a and b have realized elsewhere but c has realized in the expected location, so ℓ has an edge to the closest point in P in the cone.

If G(P,E) is a supergraph of the Yao graph with m cones on P , then there exists a spanning path in G of ratio at most τ between every pair of points in P and hence G is a τ-spanner. Thus the theorem follows.

Theorem 11. Given n uncertain points which may each appear uniformly in one of k locations, a graph G can be constructed with O(nk² log(n/ρ)) edges in O(n²k²) time, such that the probability that G is a τ-spanner is at least 1 − ρ.

Proof. For each location in L we consider a constant number of cones. In each cone we can find the φth closest location in O(nk) time and then add edges to each location closer than it in another O(nk) time. Lifting the edges to be between uncertain points in P can be done in time proportional to the number of edges, O(nk² log n). Hence the running time follows.

5.3.1 Uncertain Spanners in Higher Dimensions

While we have thus far only considered uncertain points in R2, our algorithm can be adapted to any (constant) higher dimension d without difficulty. Instead of partitioning R2 into 2-dimensional cones centered at ℓ, the algorithm must instead partition Rd into d-dimensional cones centered at ℓ, sort the locations in each cone according to distance from ℓ and proceed as in the 2-dimensional case. The drawback of this method is that the number of cones required to achieve a spanning ratio of τ is exponential in d, so as d increases the constructed graph rapidly approaches the complete graph. This is, however, a fundamental consequence of the curse of dimensionality, and spanners on deterministic points suffer the same problem.

5.3.2 Other Cone-Based Spanners

Notice that there is nothing in our construction which requires that we use the classical Yao graph as our target subgraph. Any other cone-based spanner could be constructed with the same method. Our construction connects each location to the φ closest locations in each cone, but "closest" can be adapted in several different ways. For example, by projecting the points in each cone onto some ray and considering the closest φ points in the order they appear along the ray we could instead construct an uncertain spanner based on the Θ-graph. Other, more exotic uncertain cone-based spanners could similarly be constructed. Our choice to base this chapter on the Yao graph is purely to simplify the exposition.


Figure 5.5: The construction we will use in this lower bound. It can be considered in R or along a line in a higher dimension.


Figure 5.6: Suppose each point has edges to the next two points, i.e. φ = 2. If p5 and p8 realize in this block, but p6 and p7 do not, then p5 will not have an edge to the first point in its forward cone.

5.4 A Lower Bound

We have shown in the previous section that φ = k log(mn/ρ) edges per cone are enough to guarantee G(P, E) is a supergraph of the Yao graph with high probability. In this section we will prove this is tight, and argue that adding any smaller number of edges will result in a graph which with high probability is not a supergraph of the Yao graph. Afterwards we will discuss how this argument can be generalized to other constructions.

We will consider n points, each with k possible locations in R. Label the points {p1, . . . , pn}; point pi has locations {i, n + i, 2n + i, . . . , (k − 1)n + i}, i.e. L consists of k blocks of n locations at integer values, one for each point in lexicographical order. As all the locations are along a line the choice of m does not matter; only the cone going forwards along the line and the cone going backwards will be nonempty. We will consider m = 2, but any larger choice will have the same result. For simplicity we will only concern ourselves with ensuring each point has an edge to the closest point in the forward cone; the backwards cone we will consider always satisfied. Instead of sampling a location for each point we will use an alternative but equivalent random process to sample a realization of points. First the number of points that appear in each block will be sampled from the appropriate distribution. Then, starting with the block which receives the fewest points, points will be sampled without replacement to give each block the correct number of points. Consider B, the block with the smallest number of points. By the pigeonhole principle it has at most n/k points; we will assume it has exactly n/k points, since fewer points only benefits our argument. As this is the first block to sample the identities of its points, we may consider these n/k points to be chosen uniformly at random from {p1, . . . , pn} without replacement.

By construction each point has edges to the next φ points in order, i.e. pi has edges to pi+1, pi+2, . . . , pi+φ. If there is a gap greater than φ between adjacent points in B then the point on the left of that gap will not have an edge to the first point in its forward cone, and hence the graph will not be a spanner. We will show that M, the size of the maximum gap between points in B, is Ω(k log n) with high probability. If this is true then φ must be at least Ω(k log n) to have high probability that G(P, E) is a supergraph of the Yao graph.

We will now reword this question to make it easier to deal with. Given n/k points sampled uniformly at random without replacement from {1, 2, . . . , n}, what can we say about the length of the longest gap M between two adjacent points? This question will occupy the remainder of this section. Before we can complete our argument we must first establish two results we will need to use: a description of the largest gap between two adjacent points when n points are sampled uniformly from the real interval [0, 1], and a description of how many coupons we must sample uniformly with replacement before we have seen a 1/k fraction of the total number of coupons.
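The discrete question is easy to probe empirically before the formal argument. A minimal sketch (the function name and parameters are our own; the specific numbers in the usage below are illustrative only):

```python
import random

def max_gap(n, k, rng=random):
    """Sample n // k distinct integers uniformly from {1, ..., n} and
    return the largest gap between consecutive sampled values."""
    pts = sorted(rng.sample(range(1, n + 1), n // k))
    return max(b - a for a, b in zip(pts, pts[1:]))
```

For n = 1000 and k = 10 the average maximum gap over repeated trials is a few multiples of k, consistent with the Ω(k log n) behaviour proved below.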

5.4.1 Order Statistics and Spacings

Values sampled from some common oracle and then indexed in ascending order are known in the literature as order statistics, the distances between adjacent order statistics are called spacings, and the largest spacing is known as the maximum order spacing. Order statistics and spacings have been extensively studied in the statistics literature; see the introductory book by Arnold et al. [24] and Pyke's seminal survey [164] respectively. Spacings of points sampled from the continuous uniform distribution on [0, 1] are well understood in the literature. Order statistics and spacings of discrete points sampled without replacement are also described in the literature, but we are not aware of any closed-form expression which can be used to analyse the distribution of the maximum order spacing of discrete points sampled without replacement, or of any research which addresses this problem directly. Therefore we will use a continuous model to simulate the discrete model and hence to lower bound the probability that M = Ω(k log n). We will need a result introduced by Levy [139] and refined by Devroye [83]:

Theorem 12. Let Gn be the maximum order spacing of n points sampled uniformly from [0, 1]. Then, as n → ∞,

\[ E(nG_n - \log n) \to \gamma \]

where γ ≈ 0.58 is the Euler–Mascheroni constant, and

\[ \mathrm{Var}(nG_n) \to \frac{\pi^2}{6} \]

As a corollary, for large n we may use Chebyshev's inequality to show that nG_n = Ω(log n) with high probability.
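Theorem 12 is easy to check empirically. A quick Monte Carlo sketch (here the two boundary gaps are counted as spacings; conventions in the literature differ by lower-order terms, which does not affect the limit):

```python
import math
import random

def max_spacing(n, rng=random):
    """Maximum spacing G_n of n points drawn uniformly from [0, 1],
    counting the two boundary gaps as spacings."""
    pts = sorted(rng.random() for _ in range(n))
    gaps = [pts[0]] + [b - a for a, b in zip(pts, pts[1:])] + [1 - pts[-1]]
    return max(gaps)

# empirical check that E(n * G_n - log n) is near gamma ~ 0.58
rng = random.Random(1)
trials = 300
avg = sum(1000 * max_spacing(1000, rng) - math.log(1000)
          for _ in range(trials)) / trials
```

Averaged over a few hundred trials, `avg` lands near γ, as Theorem 12 predicts.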

5.4.2 Fractional Coupon Collecting

As a subproblem we define the fractional coupon collector's problem and compute its expectation and variance. This section adapts well-known arguments about the classical coupon collector's problem, which appear in Feller's book [97].

Definition 3 (The 1/k-fractional coupon collector's problem). Given an urn containing n coupons, and a fixed parameter k, how many coupons must be drawn uniformly with replacement from the urn before at least n/k distinct coupons have been drawn? We will call this random variable T and compute its expectation and variance. Intuitively, since discovering new coupons is very likely at the beginning and gets progressively less likely, we should expect that collecting the first n/k coupons has a much lower expectation and variance than collecting all n.
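The random variable T is straightforward to simulate, which gives a useful check on the closed forms derived below. A minimal sketch with our own naming:

```python
import random

def fractional_collect(n, k, rng=random):
    """Draw coupons uniformly with replacement from n coupons until
    n // k distinct coupons have been seen; return the number of draws T."""
    seen, draws = set(), 0
    while len(seen) < n // k:
        seen.add(rng.randrange(n))
        draws += 1
    return draws
```

For n = 1000 and k = 10, Lemma 14 gives E(T) = 1000 log(10/9) ≈ 105.4, and the empirical average over repeated trials sits close to that value.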

Lemma 14. E(T) = n log(k/(k−1)) and Var(T) = O(n/k).

Proof. Let pi be the probability of drawing a new coupon after i − 1 distinct coupons have already been drawn; clearly pi = (n − i + 1)/n. Let ti be the number of coupons drawn between the (i − 1)th new coupon and the ith new coupon. We observe that $\sum_{i=1}^{n/k} t_i = T$ and that the ti are statistically independent. We first consider the expectation of T.

\[ E(T) = E\left(\sum_{i=1}^{n/k} t_i\right) = \sum_{i=1}^{n/k} E(t_i) \]

The number of draws before a new coupon is drawn follows a geometric distribution, and therefore E(ti) = 1/pi, hence:

\[ E(T) = \sum_{i=1}^{n/k} \frac{1}{p_i} = \sum_{i=1}^{n/k} \frac{n}{n-i+1} = n \sum_{i=1}^{n/k} \frac{1}{n-i+1} = n\left(H_n - H_{n-n/k}\right) \approx n\left(\log n - \log\left(n - \frac{n}{k}\right)\right) = n \log\left(\frac{k}{k-1}\right) \]

where $H_n = \sum_{i=1}^{n} \frac{1}{i}$ is the nth harmonic number; it is well known (e.g. see [77]) that H_n ≈ log n for large n.
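The harmonic-number approximation in the derivation above is easy to verify directly; a small sketch comparing the exact expectation n(H_n − H_{n−n/k}) with the closed form n log(k/(k−1)):

```python
import math

def expected_draws(n, k):
    """Exact expectation n * (H_n - H_{n - n/k}) from the proof of Lemma 14."""
    return n * sum(1 / i for i in range(n - n // k + 1, n + 1))
```

For n = 1000 and k = 10 the exact value and the approximation n log(k/(k−1)) differ by well under one draw.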

The variance can be similarly computed for each ti; again since it is a geometric distribution we have Var(ti) = (1 − pi)/pi².

\[ \mathrm{Var}(T) = \sum_{i=1}^{n/k} \mathrm{Var}(t_i) = \sum_{i=1}^{n/k} \frac{1-p_i}{p_i^2} = n^2 \sum_{i=1}^{n/k} \frac{1 - \frac{n-i+1}{n}}{(n-i+1)^2} = n \sum_{i=1}^{n/k} \frac{i-1}{(n-(i-1))^2} \]

The sum $\sum_{i=1}^{n/k} \frac{i-1}{(n-(i-1))^2}$ may be approximated using the Euler–Maclaurin formula as $\int_0^{n/k} \frac{x}{(n-x)^2}\,dx = \frac{1}{k-1} + \log\left(\frac{k-1}{k}\right)$. As the higher derivatives of $\frac{x}{(n-x)^2}$ fall off quickly, this approximation via Euler–Maclaurin is justified.

Therefore Var(T) = O(n/k).

Again we have upper bounded the variance, so we may apply Chebyshev's inequality to show T = Ω(n log(k/(k−1))) with high probability.

5.4.3 Simulating the Discrete Problem with the Continuous

We may now combine the results of the previous two subsections to prove that M = Ω(k log n) with high probability, and hence our lower bound on φ.

Recall M is the maximum order spacing when sampling n/k points from {1, . . . , n} without replacement. We introduce a different random process which we will use to lower bound M. In this process we initialize Q = ∅ and h = 0 and sample points uniformly from the real interval [0, n]. As we sample each point q we calculate its ceiling ⌈q⌉. If ⌈q⌉ ∈ Q then we add q to Q, else we add ⌈q⌉ to Q and increase h by one. When h = n/k the process terminates. At the end of choosing points Q contains n/k points chosen uniformly without replacement from {1, . . . , n} and a random number of additional points. All the points in Q were sampled uniformly from [0, n] and then increased by at most 1. Hence the spacings between the points in Q are within 1 of the spacings of points sampled uniformly from [0, n], or within 1 of n times the spacings of points sampled uniformly from [0, 1].
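The coupling process above can be sketched directly; this simplified version records the distinct integer ceilings and the raw real samples separately rather than mixing both in one set Q, which is enough to see the coupling at work:

```python
import math
import random

def ceiling_process(n, k, rng=random):
    """Sample reals uniformly from [0, n], recording each new integer
    ceiling, until n // k distinct ceilings have been seen.
    Returns (set of distinct ceilings, list of all sampled reals)."""
    ceilings, samples = set(), []
    while len(ceilings) < n // k:
        q = rng.uniform(0, n)
        samples.append(q)
        ceilings.add(math.ceil(q))
    return ceilings, samples
```

The distinct ceilings are the n/k points "chosen uniformly without replacement from {1, . . . , n}", and the number of samples drawn is exactly the random variable T of the fractional coupon collector's problem.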

How many samples are required to get n/k distinct integer ceilings is an instance of the 1/k-fractional coupon collector's problem described in Subsection 5.4.2. Applying the bounds we derived there, we know that |Q| = Θ(n log(k/(k−1))) with high probability.

Now we apply the results from Subsection 5.4.1: if n log(k/(k−1)) points are sampled uniformly from [0, 1], then with high probability:

\[ n \log\left(\frac{k}{k-1}\right) G_{n \log\left(\frac{k}{k-1}\right)} = \log\left(n \log\left(\frac{k}{k-1}\right)\right) \]

\[ n G_{n \log\left(\frac{k}{k-1}\right)} = \frac{\log(n) + \log\log\left(\frac{k}{k-1}\right)}{\log\left(\frac{k}{k-1}\right)} \]

The additive factor log log(k/(k−1)) is a small constant which can be absorbed into the asymptotic notation. Recall log(1 + x) < x and hence log(k/(k−1)) = log(1 + 1/(k−1)) < 1/(k−1), therefore with high probability:

\[ n G_{n \log\left(\frac{k}{k-1}\right)} > (k-1) \log n \]

But this is exactly n times the maximum spacing in Q (i.e. M), and hence the theorem follows.

Theorem 13. When building an uncertain Yao graph G(P, E) as described in Section 5.3, if φ = o(k log n) then with high probability G(P, E) is not a supergraph of the Yao graph on P.

5.4.4 Extensions

The previous section lower bounds φ under the assumption that each point has edges to the first φ points in each cone. In this section we look at some of these assumptions and how they may be relaxed. Firstly, consider the assumption that we connect to the closest φ locations in each cone rather than some other locations further along the cone. Replacing an edge with an edge further away will only decrease the chance of having an edge to the closest point in that cone. So any cone-based uncertain spanner should always connect to the closest locations in each cone. Secondly, consider the assumption that each cone gets exactly φ edges, rather than distributing them unevenly. Since in our lower bound the cones are indistinguishable this would have to be done arbitrarily. Imagine we remove an edge from one cone and add it to another. The probability that a cone fails to have an edge to the nearest point decreases exponentially in the number of its edges, so there are diminishing returns in including more edges in a cone. Hence distributing the edges evenly will always give a higher probability of every cone succeeding. Finally, there is a difference of k between our upper and lower bounds. The lower bound can be modified to introduce another factor of k. Instead of each block containing {p1, . . . , pn} in order, we have each block contain the same points in a random order. Now each point has k different cones to connect to, and since the large gap could appear in any block with equal probability any cone-based approach must connect to the next φ locations in each cone. Because the points in each block are randomly ordered, there is a very low probability that the next φ locations contain the same points in each block. The bound follows.

5.5 Conclusion

We have defined the uncertain spanner problem and given a graph with O(nk² log n) edges which is a supergraph of the Yao graph (and hence a τ-spanner) with high probability. This is, to our knowledge, the first result on the problem of building a spanner in the locationally uncertain points model. We have argued that our results can be adapted to use any cone-based spanner, not only the Yao graph, and that our number of edges is asymptotically optimal for any cone-based approach. An immediate problem for future work would be to prove a general Ω(n log n) lower bound on the number of edges in any uncertain spanner. It seems intuitively clear that no uncertain spanner exists with O(n) edges, but extensions to our arguments are required to prove this rigorously. It would be interesting to investigate how other varieties of spanners may be extended to work with locationally uncertain points, especially the greedy spanner. It is not immediately clear how the Delaunay triangulation could be meaningfully extended to uncertain data, but it is worth considering. Finally, a similar approach to constructing an uncertain well-separated pair decomposition initially seems very attractive. Given a set P of locationally uncertain points, an uncertain well-separated pair decomposition is a set of pairs (Ai, Bi) such that for all p and q ∈ P with high probability there is some j such that (Aj, Bj) is well-separated and p ∈ Aj and q ∈ Bj. Intuitively it seems unlikely that an uncertain well-separated pair decomposition with o(n²) pairs could be constructed, since the probability of any pair being well-separated can be exponentially small in its size. However it would be worthwhile to investigate this further and see what constructions are possible.

Chapter 6

Maintaining the Ply of Unknown Trajectories

6.1 Introduction

In this chapter we present an alternative formulation of uncertainty as a kind of online problem. In this model a problem instance has deterministic data but that data is only available to the algorithm through some specified queries, which have an associated cost. The algorithm must then decide how many, and which, queries to make in order to get enough knowledge about the underlying data to make a decision. In this chapter we will describe this model more formally. We then introduce a new problem in this model, the online ply maintenance problem, and provide an online algorithm, a matching lower bound on the competitive ratio and a proof that perfect play is NP-hard, even with full information. We conclude with some comments about future work in this model.

6.1.1 Minimizing Ply

We must here make a brief detour to discuss the ply of a set of regions, and its connection to uncertainty in computational geometry. Given a set of regions R = {R1, . . . , Rn} in Rd, the local ply of a point p ∈ Rd is defined as δp = |{Ri | p ∈ Ri}|, that is, the number of regions containing that point. The ply of R is δ = max_{p∈Rd} δp, the maximum local ply of any point. In the context of computational geometry on imprecise points, or uncertain points with bounded support, the ply of their uncertainty regions gives a measure of how much is unknown about the geometry of the set of points. If each point is known to appear somewhere in a disk, and the set of disks is pairwise disjoint (i.e. ply 1), then the relative positions of the points are reasonably well known and some geometric results can be obtained. For example Poureidi and Farshi [163]

obtain a linear size well-separated pair decomposition (and thus a linear size uncertain spanner) in this setting. Conversely, if a set of points are only known to appear inside [0, 1]² (i.e. ply n) then almost nothing can be inferred about their geometry. To continue the example, any pair of points could form the closest pair and hence any well-separated pair decomposition must have size Ω(n²). If the ply can be upper bounded by some constant ∆ then many algorithms become possible, often with running times depending on ∆. For example, Buchin et al. [50] show how a data structure can be built on a set of disks of ply ∆, such that given a point in each disk a Delaunay triangulation of those points can be found in O(n log ∆) time. Since it is a global maximum, the ply is a fairly rough measure of the uncertainty of a set of imprecise points. There have been recent attempts to define and compute more useful and precise measures, such as the ambiguity used by Van der Hoog et al. [178], which is closely related to the graph entropy of the intersection graph. However, because of its simplicity the ply remains a useful consideration. To conclude, the ply of a set of uncertainty regions is a very useful, if rough, measure of their uncertainty, and has implications for many further computations that might be needed on those points. This motivates our focus on maintaining acceptable ply for the remainder of this chapter.
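To make the definition of ply concrete, here is the one-dimensional special case, where the regions are closed intervals on the line and the ply can be computed exactly with an endpoint sweep (the chapter's regions are balls in Rd; this 1D sketch and its names are our own illustration of the same definition):

```python
def ply(intervals):
    """Maximum number of overlapping closed intervals (the ply) via an
    endpoint sweep; starts sort before ends at equal coordinates so that
    intervals touching at a point are counted as overlapping."""
    events = []
    for a, b in intervals:
        events.append((a, 0))  # 0 = start event
        events.append((b, 1))  # 1 = end event
    depth = best = 0
    for _, kind in sorted(events):
        if kind == 0:
            depth += 1
            best = max(best, depth)
        else:
            depth -= 1
    return best
```

For example, the intervals [0, 2], [1, 3] and [2, 4] all contain the point 2, so their ply is 3, while two disjoint intervals have ply 1.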

6.1.2 Problem Statement

We define our problem as a game played by a player and an adversary over continuous, unbounded time. Let e1, . . . , en be a set of entities moving in Rd, where each ei is associated with a trajectory fi : R≥0 → Rd, fixed but unknown to the player, with a maximum speed of 1 per unit of time. The initial location fi(0) of each trajectory is known to the player. The adversary meanwhile has full knowledge of each fi and has unlimited computational capabilities.

At any time t the player may query an entity ei to (instantaneously) learn its exact location fi(t). Let τ be the last time at which ei was queried, or 0 if ei has never been queried. At time t we can infer ei must be located in a closed ball of radius t − τ centered at fi(τ). We call this ball Bi(t) the uncertainty region of ei at time t. We can now formulate the online ply maintenance problem. Given a ply bound ∆, a starting location fi(0) and a location oracle fi for each entity, the online ply maintenance problem is to determine which points the player should query at which times, such that the ply of the uncertainty regions never exceeds ∆ and the total number of queries is minimized. For simplicity we require that all problem instances are feasible; that is, there is never a time when more than ∆ of the entities are present at the same point. This is an online problem; the player is able to use information gained from past queries to decide which queries to make next. There is also no finishing time for the problem; the player must provide a strategy that works for any time, and the number of queries must be minimized at all times. Since a large number of queries may be required regardless of strategy, the player's performance is evaluated against the adversary, who plays a copy of the same game except with full knowledge of the fi. The adversary must still contend with growing uncertainty regions and must make queries to maintain low ply, but knows the result of every query beforehand and may plan accordingly. Define cost(t) as the number of queries required by an algorithm for the player by time t and cost*(t) the number of queries required by the adversary. We say the algorithm has competitive ratio α if cost(t) ≤ α · cost*(t) for all t > 0. For simplicity of exposition, we will allow the ply to exceed ∆ instantaneously, so long as queries are made at the same instant which return the ply to ∆ or less.
This allows queries to be made at events in time rather than ε time before them, and removes the difficulty of dealing with open sets. We also observe that these are the only kinds of queries made by any efficient algorithm. We observe that an entity should never be queried unless it is currently contributing to a point of ply greater than ∆. Any set of queries that results in ply at most ∆ can be modified by delaying every query to an entity until that entity's uncertainty region contributes to a point of ply at least ∆ + 1, as long as such a delay causes no violation of the ply bound before the query, since this will make the subsequent uncertainty region smaller.

Observation 6. Any query made on an entity at a time when it is not con- tributing to the ply can be replaced with a query on the same entity at a later time when it is contributing to the ply, without requiring that any additional queries are made. As a corollary, any efficient algorithm will wait until the ply instantaneously exceeds ∆ and make a sequence of queries to reduce the ply, then wait until the ply exceeds ∆ again. Thus the continuous problem is best understood as a series of events and the queries which are made at each event, which is how we will discuss the problem from now on. It is worth noting however that two different algorithms, or an algorithm and the adversary’s algorithm, will not necessarily have the same events, or events at the same times.

6.1.3 Prior Work

This problem statement is directly inspired by Evans et al. [94], who study a problem with a similar definition but quite different structure. Their problem has the same setup: a set of entities with unknown trajectories and bounded speed, a player who must make queries against an adversary who has full knowledge of the trajectories, and a focus on the ply of the uncertainty regions. In Evans et al.'s problem, queries have no cost but each query takes a unit of time to process, and no other queries can be made during this time. Instead of minimizing the number of queries, the ply must be minimized using one query per unit of time. Unfortunately it is not feasible to maintain optimal ply at all times; hence Evans et al. must fix a target time τ and minimize the ply at that time, either once or repeating every τ time interval. Evans et al. give an O(1)-approximation for large τ, an approximation for small τ, and matching lower bounds. They also show the adversary's problem is NP-hard. Busto et al. [54] give an extension to Evans et al.'s result, and also consider minimizing the ply (and the maximum degree and chromatic number of the intersection graph of the uncertainty regions) using one query per time step. Since Evans et al. demonstrated it is not possible to be competitive at every time step, Busto et al. instead consider minimizing the maximum over a given time interval. In this model they provide an algorithm which is competitive to within a constant factor. We consider our result a natural pair to those of Evans et al. and Busto et al. We are able to guarantee a maximum ply (and thus bound the performance of any subsequent ply-dependent algorithm), and significantly our guarantee holds at any (continuous) point in time, not just at fixed time intervals or considered over an interval. On the other hand we may use many more queries to do so, especially given the lower bounds in Section 6.4.2.
In some situations the lower bounds in our model can force a brute-force "query everything" approach, which may be unacceptable if queries are expensive. We believe both versions of the problem have significant unexplored usefulness as a first step for a wide range of geometric problems in this model of uncertainty.

Acknowledgement must also be made to the PhD thesis of Kahan [125], also presented as a conference paper [124]. Kahan introduces a model called data in motion which is similar to our own, although it is aimed at a different application. Like us, Kahan considers entities moving along unknown, bounded-speed trajectories, whose locations may be queried at a cost (Kahan calls these queries "updates"). However, while we are interested in maintaining an invariant (i.e. bounded ply) at minimum cost, Kahan considers the computational time and number of "updates" required to exactly compute some geometric property (e.g. the closest pair) of the entities at various points in time (Kahan calls these computations "queries"). Kahan also provides a notion of competitive ratio, which he calls "luckiness".

Figure 6.1: Some pennies arranged along a grid and the resulting penny graph. Note the slight divergence from the grid, which is sometimes required by Cerioli et al. for parity reasons.

6.2 Optimal Play is NP-hard

In this section we will show that the adversary's problem of deciding which entities to query is NP-hard, even given full knowledge of the trajectories beforehand. To see this behaviour we need only consider the problem of deciding which queries to make at the very first time the ply exceeds ∆ = 1. We will perform a reduction from vertex cover, via some theorems about penny graphs. We will prove the following lemma:

Lemma 15. Given a set of n equal-radius disks in the plane, possibly intersecting at tangent points but not overlapping, the problem of determining whether it is possible to remove k disks such that the remaining disks are pairwise disjoint is NP-complete.

The problem is clearly in NP: given a set of disks to remove, intersections between the remaining disks can be checked in quadratic time. Given a set of regions in the plane, their intersection graph has a vertex for each region and an edge between two vertices if the regions intersect. A penny graph, sometimes called a unit coin graph, is a graph which can be realized as the intersection graph of a set of unit disks in the plane which pairwise intersect only at tangent points. Penny graphs have several interesting properties: they are planar and each vertex has degree at most 6, for example. Cerioli et al. [59] show that vertex cover is NP-hard even on penny graphs.

We must take some care with Cerioli et al.'s result, since our penny graph will be defined in terms of disk centres, not edges and vertices. While a graph representation can easily be computed from the disk centres [180], the converse is not true, and even determining whether a given graph is a penny graph is NP-hard [91]. Vertex cover is NP-hard even for planar graphs of degree at most 3 [103], and such graphs can be embedded on a polynomial-sized grid with rectilinear edges [177]. Given such a graph G, embedded onto a grid, Cerioli et al. show how pennies can be laid out on the grid such that solving vertex cover on the resulting penny graph solves vertex cover on G. Hence vertex cover on penny graphs is NP-hard even if the disk centres are known. Choosing which disks to remove such that no two disks intersect is exactly the problem of computing a vertex cover of the resulting intersection graph, and hence Lemma 15 follows.
Given an instance of the disk-removal problem from Lemma 15 we can construct an instance of ply maintenance by starting an entity at the centre of each disk and having the entities remain stationary. After one unit of time the disks will have intersected at tangent points, and the omniscient player must choose a minimum set of disks to query to remove the intersections. By Lemma 15 even this first decision is NP-hard. The theorem follows.
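The NP membership claim in Lemma 15 is easy to realize in code. The sketch below is our own illustration, not from the thesis; the function name and the example layout are hypothetical. It checks in quadratic time whether a proposed removal set leaves the remaining equal-radius disks pairwise disjoint, counting tangency (centre distance exactly twice the radius) as an intersection:

```python
from itertools import combinations

def disks_disjoint_after_removal(centres, radius, removed):
    """Check in O(n^2) time that, after deleting the disks indexed by
    `removed`, the remaining equal-radius disks are pairwise disjoint.
    Tangent disks (centre distance exactly 2 * radius) count as
    intersecting, matching the statement of Lemma 15."""
    keep = [c for i, c in enumerate(centres) if i not in removed]
    for (x1, y1), (x2, y2) in combinations(keep, 2):
        # Two equal-radius disks intersect iff their centre distance
        # is at most 2 * radius (compare squared distances).
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= (2 * radius) ** 2:
            return False
    return True

# Three unit disks in a row, each tangent to the next: removing the
# middle disk leaves the outer pair disjoint.
centres = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
print(disks_disjoint_after_removal(centres, 1.0, set()))  # False
print(disks_disjoint_after_removal(centres, 1.0, {1}))    # True
```

The hard part, of course, is choosing the removal set; this check only certifies a given one.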

Theorem 14. Making the minimum number of queries to ensure the ply never exceeds ∆ is NP-hard, even if the exact trajectories are known ahead of time, and even for only the first set of required queries.

6.3 An Optimal Algorithm

We now describe a simple algorithm which achieves a competitive ratio of ∆ + 1, which we will later show to be optimal for deterministic algorithms. Our algorithm waits until there is a point with ply at least ∆ + 1 and then queries every entity whose uncertainty region contains that point. If there are multiple points with ply at least ∆ + 1, say p and q, they are resolved in sequence; if querying all the entities intersecting p reduces the ply of q to ∆ or less, then it is not necessary to query all the entities intersecting q.

We will prove our algorithm has a competitive ratio of ∆ + 1 using a charging argument: we will show that for every ∆ + 1 queries made by our algorithm the adversary must make at least one query. We do this by charging every query we make to a query made by the adversary, ensuring we never charge the same query twice.

Consider a point in time when our algorithm has ply Λ > ∆. If the adversary must make any queries at the same time, we imagine the adversary makes them first and our algorithm makes its queries second. This is purely a convenience to ensure our algorithm always charges to an earlier query by the adversary. Consider the first point of ply Λ that our algorithm must resolve. At least Λ − ∆ of the uncertainty regions that contribute to the ply of this point must be strictly smaller for the adversary than they are for our algorithm; otherwise the point would have ply more than ∆ for the adversary, and since the adversary has already made its queries it cannot have any point of ply more than ∆. We will call this set of queries made by the adversary Q. We charge the Λ queries our algorithm makes to the adversary's queries Q. Note that Λ/(Λ − ∆) ≤ ∆ + 1, so no query in Q has more than ∆ + 1 queries charged to it.

Finally, observe that any query the adversary makes can be charged at most once. A query can only be charged by a later query on an entity for which the adversary has a strictly larger uncertainty region than our algorithm.
When we query an entity its uncertainty region is reset to a single point, so its region becomes smaller than or equal to the adversary's region for that entity. Only when the adversary queries that entity again will it be eligible to receive a charge, and then there is a new adversary query to pay for the charge. Since every ∆ + 1 of our algorithm's queries are charged to an earlier query made by the adversary, and we have ensured there is no double charging, at no time has our algorithm made more than ∆ + 1 times the number of queries made by the adversary. Since our algorithm trivially always maintains the ply bound, the theorem follows.

Theorem 15. The algorithm which waits until there is a point of ply greater than ∆ and then queries every entity which is part of that ply is (∆ + 1)-competitive.

The exact running time of this algorithm depends on the underlying computational model, since it involves computing geometric intersections and continuous time. However, since our algorithm only needs to identify the times when a query must be made and the entities participating in a ply, and requires no additional computation, it can reasonably be considered optimal for this problem.

It is interesting to note that no particular reference to the geometry of the problem is needed here. It is not required that the uncertainty regions be balls, or even that they be geometric regions: the above argument holds for any uncertainty regions which grow in a monotone way. In the context of this thesis ply is most interesting as the ply of well-behaved geometric regions, but it would be interesting to consider other applications of this argument.
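As a concrete illustration of the algorithm of Theorem 15, the following sketch simulates it for point entities moving on the real line, where uncertainty regions are closed intervals growing at unit speed from the last queried position. The discrete time grid, the function names, and the interval-sweep helper are our own assumptions for this illustration, not an implementation from the thesis:

```python
def max_ply_point(intervals):
    """Return (ply, point) maximising the number of closed intervals
    covering a point, via a sweep over endpoints (an opening endpoint
    is processed before a closing one at the same coordinate, so
    tangent intervals count as overlapping)."""
    events = []
    for lo, hi in intervals:
        events.append((lo, 0))  # 0 = open
        events.append((hi, 1))  # 1 = close
    events.sort()
    best, cur, best_pt = 0, 0, None
    for x, kind in events:
        cur += 1 if kind == 0 else -1
        if cur > best:
            best, best_pt = cur, x
    return best, best_pt

def maintain_ply(trajectories, speed, delta, horizon, dt=0.5):
    """Greedy algorithm of Theorem 15 on the line: whenever some point
    has ply > delta, query every entity whose uncertainty interval
    contains it.  trajectories[i](t) gives entity i's true position
    and is consulted only to answer queries."""
    n = len(trajectories)
    last_q = [(trajectories[i](0.0), 0.0) for i in range(n)]  # (pos, time)
    queries = 0
    t = 0.0
    while t <= horizon:
        while True:
            regions = [(p - speed * (t - t0), p + speed * (t - t0))
                       for p, t0 in last_q]
            ply, pt = max_ply_point(regions)
            if ply <= delta:
                break
            for i, (lo, hi) in enumerate(regions):
                if lo <= pt <= hi:
                    last_q[i] = (trajectories[i](t), t)
                    queries += 1
        t += dt
    return queries

# Two stationary entities at distance 2 with Delta = 1: their intervals
# touch at every integer time, forcing two queries per unit of time.
print(maintain_ply([lambda t: 0.0, lambda t: 2.0], 1.0, 1, 3.0))  # 6
```

The inner loop re-checks the ply after each batch of queries, mirroring the sequential resolution of multiple violating points described above.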

6.4 Lower Bound

In this section we will construct a lower bound instance showing that the competitive ratio of our algorithm from the previous section is optimal for deterministic algorithms, and within a factor of 2 of optimal for randomized algorithms. We will start by showing how the lower bound is constructed for ∆ = 1 and then show how the same idea extends to ∆ ≥ 1.

6.4.1 Lower Bound for ∆ = 1

Consider an instance in R with two entities, p and q, initially located at −1 and 1. The instance also includes an arbitrary sequence of moves m1, m2, . . ., where mi ∈ {left, right}. These moves are known ahead of time by the adversary, but not revealed to the player except through entity location queries. Between t = i and t = i + 1 both points move at maximum (unit) speed in direction mi. Because the points are always distance 2 apart, queries will be needed at least once per unit of time. We will show that in the worst case the player must make exactly two queries per unit of time, while the adversary must make only one.

Let us first consider the player. At t = 1 the uncertainty regions will intersect and the player must make a query to resolve this. Recall from Observation 6 that it is never beneficial to query an entity which is not currently part of a violating intersection, so the player will not query either point before t = 1. Without loss of generality imagine both points have moved left. The player has no way to distinguish the two entities and must choose arbitrarily; in the worst case they will choose q. Since q started at 1 and moved left it is now at 0, inside the uncertainty region of p, so the ply remains 2. Hence the player must also query p, which is now at −2, to reduce the ply to 1. Now each entity is represented by a single point, at distance 2 apart. This is the same situation we began with (shifted sideways), so the same argument can be repeated at t = 2 and so on. Therefore, in the worst case, the player must make two queries per unit of time.

We now show the adversary need make only one query per unit of time. Unlike the player, the adversary will not query every point every unit of time, so uncertainty regions may grow arbitrarily large.
We say the problem is in a neutral state at some time if one entity is represented by a single point and the other uncertainty region is at distance 2; for convenience say p is represented by a point at 1 and the uncertainty region for q extends from −1 in the negative direction. Clearly the state at t = 0 is neutral. One unit of time after a neutral state the two uncertainty regions intersect at 0 and a query must be made. The entity p is now either at 0 or at 2. If it is at 2 then the adversary queries p and reduces it to a single point at 2; since the uncertainty region for q ends at 0, this is a neutral state. If p is at 0 then by construction q must be at −2. The adversary queries q; since the uncertainty region for p is [0, 2], this is a neutral state. Therefore one unit of time after a neutral state the adversary can spend one query to return to a neutral state, and so the adversary need only make one query per unit of time.


Figure 6.2: In the worst case the player needs two queries per unit of time, while the adversary never makes a wasted query and only ever needs one.

Together these statements give a lower bound of 2 on the competitive ratio of any algorithm for the case ∆ = 1. See Figure 6.2 for an illustration.

As an extension, we consider allowing the player to make a randomized decision about which entities to query. We use Yao's principle [192] to instead consider the best expected cost of any deterministic algorithm on a random instance. We define our random instance almost identically to the deterministic instance described above: instead of the deterministic but arbitrary sequence of left and right moves, the randomized instance chooses the sequence randomly, with each move having an equal chance of being left or right. As above, any deterministic algorithm has no information to distinguish the entities and must choose one to query arbitrarily. Half the time it will choose correctly; the other half of the time it will choose incorrectly and must immediately make a second query. Therefore on average the deterministic algorithm must make 1.5 queries per unit of time. Yao's principle shows that this is then a lower bound on the competitive ratio of any randomized algorithm for this problem.
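The ∆ = 1 lower bound can be checked with a small simulation. The code below is our own hypothetical reconstruction of the instance; the `simulate` function and its strategy switch are illustrative, not from the thesis. It replays an arbitrary move sequence and counts queries made by the worst-case player and by the omniscient adversary:

```python
def simulate(moves, adversary):
    """Replay the Delta = 1 instance: entities start at -1 and 1 and
    both move one unit per step in direction moves[i] (+1 = right,
    -1 = left).  At each step, queries are made until the two closed
    uncertainty intervals (radius = time since last query) no longer
    meet.  adversary=True queries the resolving entity first (the
    omniscient strategy); adversary=False queries the unhelpful
    entity first (the player's worst case)."""
    true = [-1.0, 1.0]   # actual positions (hidden from the player)
    known = [-1.0, 1.0]  # last answered positions
    tq = [0, 0]          # times of those answers
    queries = 0
    for t, m in enumerate(moves, start=1):
        true = [x + m for x in true]

        def region(i):
            r = t - tq[i]
            return known[i] - r, known[i] + r

        def overlap():
            (a, b), (c, d) = region(0), region(1)
            return max(a, c) <= min(b, d)

        def resolves(i):  # would querying i remove the overlap?
            saved = known[i], tq[i]
            known[i], tq[i] = true[i], t
            ok = not overlap()
            known[i], tq[i] = saved
            return ok

        done = set()
        while overlap():
            cand = [i for i in (0, 1) if i not in done]
            good = [i for i in cand if resolves(i)]
            bad = [i for i in cand if not resolves(i)]
            i = (good + bad)[0] if adversary else (bad + good)[0]
            known[i], tq[i] = true[i], t
            done.add(i)
            queries += 1
    return queries

moves = [1, 1, -1, 1, -1, -1]
print(simulate(moves, adversary=False))  # worst-case player: 12
print(simulate(moves, adversary=True))   # omniscient adversary: 6
```

The two-to-one ratio is independent of the particular move sequence, matching the argument above.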

6.4.2 General Lower Bound

We will now provide a lower bound for any ∆ ≥ 1, using the ideas developed in the previous section.

In this construction we have ∆ + 1 entities in R. Initially ∆ of those entities are at −1, call this set Q0, and the remaining entity p0 is at 1. Let r0 be an arbitrary entity from Q0 ∪ {p0}. For the first unit of time, every entity in (Q0 ∪ {p0}) − r0 moves towards 0, reaching it at t = 1. If r0 = p0 then p0 moves to 2 over this time interval, otherwise r0 moves to −2. Either way we now have ∆ entities at 0 and one entity at either −2 or 2; this is our initial situation either shifted sideways, or shifted and mirrored. The process continues symmetrically with p1 = r0, Q1 = (Q0 ∪ {p0}) − r0 and a new entity chosen from Q1 ∪ {p1} as r1.

We will show that the player must make ∆ + 1 queries at each unit of time, while the adversary need only make one. At t = 1, the uncertainty region of p0 will intersect with the ∆ uncertainty regions of Q0 and create a ply of ∆ + 1, which the player must immediately resolve. Querying any entity except r0 will resolve it to this point of intersection and not reduce the ply; only querying r0 will reduce the ply. However the player gains no useful information from querying entities other than r0: any entity could be r0, and any non-r0 entity will be at 0. Hence, in the worst case, the player will have to query all ∆ + 1 entities before the ply is reduced to ∆. After the player has made all their queries what remains is ∆ entities at one point and one entity at distance 2, a (mirrored, shifted) copy of the starting situation. Hence the argument repeats, and the player must make ∆ + 1 queries per unit of time.

We will now show the adversary need only query ri at each unit of time i. As before we introduce a neutral state and show that it can be maintained. We call a state neutral at time t if one entity, denoted pt, is a point, one entity, denoted pt−1, has its uncertainty region at distance 2 from pt, and the remaining uncertainty regions are within 1 of the half-way point between pt and pt−1, which we will call 0. Since the uncertainty regions of pt and pt−1 do not intersect, there is no area of ply more than ∆. After one unit of time every uncertainty region intersects at 0 and a query must be made. In fact, every entity will actually be at the point 0, except for rt, which will be at 2 or −2. If we now query rt there are two cases, and in each we will be in a neutral state. In the first case pt = rt; then rt is a single point, the uncertainty region for pt−1 is at distance 2 from rt, and since every other point was at 0, their uncertainty regions are within 1 of the half-way point between rt and the region of pt−1. The other case is that rt ≠ pt; in this case rt is a single point at distance 2 from the uncertainty region of pt, and similarly all the remaining uncertainty regions are near the midpoint. Figure 6.3 gives an illustration of both situations. Thus with one query the adversary can return to a neutral state, and so the adversary need only make one query per unit of time.

Concluding as in the previous section, these two statements mean that no algorithm without knowledge of the entities' trajectories can achieve a competitive ratio better than ∆ + 1 in general, and our result from Section 6.3 is tight. As in the previous section a randomized algorithm can perform a little better by choosing which entity to query uniformly at random. We apply Yao's principle in a similar way and choose ri uniformly at random instead of arbitrarily. Following the same argument, the lower bound on the expected competitive ratio for this instance is ∆/2 + 1.
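The key step of the general construction, that at t = 1 only querying r0 lowers the ply, can be verified directly. The snippet below is our own illustration; `ply_at` is a hypothetical helper, not from the thesis. Uncertainty regions are modelled as closed intervals on the line:

```python
def ply_at(regions, x):
    """Number of closed intervals containing the point x."""
    return sum(lo <= x <= hi for lo, hi in regions)

delta = 3
# Construction at t = 1: delta entities started at -1, one entity (p0)
# at 1; nothing has been queried for one unit of time, so every
# uncertainty interval has radius 1.
regions = [(-2.0, 0.0)] * delta + [(0.0, 2.0)]
assert ply_at(regions, 0.0) == delta + 1

# Querying any entity other than r0 reveals the point 0, which still
# covers the violating point, so the ply is unchanged:
regions_bad = [(0.0, 0.0)] + [(-2.0, 0.0)] * (delta - 1) + [(0.0, 2.0)]
assert ply_at(regions_bad, 0.0) == delta + 1

# Only querying r0 (here r0 = p0, revealed at 2) lowers the ply:
regions_good = [(-2.0, 0.0)] * delta + [(2.0, 2.0)]
assert ply_at(regions_good, 0.0) == delta
```

Since the player cannot tell which entity is r0, in the worst case all ∆ + 1 entities must be queried, exactly as argued above.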

Figure 6.3: The two cases: Case 1 (pt = rt) and Case 2 (rt ≠ pt).

6.5 Conclusions

We have presented a new problem, the online ply maintenance problem, in the model defined by Evans et al. We have given a complete description of the problem, including an optimal algorithm, a lower bound and a hardness result. We conclude with some remarks about possible future development in this model.

Both our problem and Evans et al.'s concern unknown trajectories. It would be worthwhile to consider what properties could be computed for unknown but queryable trajectories. An immediate problem would be to make a small number of queries in order to approximate the distance between two entities, or to detect whether the distance exceeds a given bound. This could also be considered under the Fréchet distance between the trajectories.

This model could also be applied to imprecise points. Given a set of imprecise points and an allowable number of queries which return a point's exact location, how can those queries be best spent to improve the results of a subsequent algorithm on the imprecise points? Visibility problems could prove particularly interesting: points in the uncertainty region of one entity may be able to see some, all or none of the points in the uncertainty region of another. Querying a small number of points to determine the visibility graph seems NP-hard via vertex cover, but approximation algorithms and other questions could be developed. Because visibility questions are very sensitive to small movements, uncertainty and visibility is a difficult combination, especially for locationally uncertain points. Allowing the more problematic entities to be queried for their exact location at a cost is an interesting way around some of this difficulty. It would also be worthwhile to revisit the body of work on simple geometric problems (e.g. convex hull) on imprecise points and see whether better results can be achieved if a small number of exact-location queries are allowed.
Chapter 7

Conclusion

Uncertainty in computational geometry is an exciting topic which is currently undergoing rapid change and development. Uncertainty is a fundamental part of the modern world; more and more uncertain data is being collected and analysed every day. Cheap measurement devices (mobile phones, GPS, Internet-of-Things) have become widespread, forecasting models have become more complicated and more important for business, and probabilistic data created by machine learning models is increasingly common. Traditionally in algorithms research any uncertainty is implicitly removed by taking point estimates of the data, using the expectation or the mode. As we have shown, this can give very misleading results which are not correct for any realization of the uncertain data. It is therefore necessary to develop specialized algorithms which are uncertainty-aware and give answers in terms of the uncertainty of the input. Uncertainty-aware algorithms may return the expected best solution, a solution which is correct with high probability, a solution with bounded cost regardless of the uncertainty, or similar.

In computational geometry there are several models for dealing with uncertainty. The more traditional imprecise points model specifies the input points as a set of uncertainty regions; once the algorithm has made some choices, a location is adversarially chosen from each region as the "true location" of that point. The existential uncertainty model associates each input point with an existence probability ρi; with probability 1 − ρi the point does not appear and should not be considered in the solution. The locational uncertainty model specifies each input point as a probability distribution; the algorithm only has access to the distributions, but the solution is evaluated on a set of random points, consisting of one point sampled from each of the input distributions.
In the black box model nothing is known about the distribution or possible values of the points, but rather examples can be sampled or information can be queried at a cost.


In Chapter 2 we surveyed results from the literature which were important to our work in later chapters or to uncertainty in general. We also surveyed some of the many papers which have been written on computational geometry with uncertainty, focusing especially on results from the last decade involving probabilistic data.

In Chapter 3 we discussed the concept of a universal solution and the universal traveling salesman problem specifically. We defined a class of total orders of the plane which we call hierarchical orders, a broad class capturing the intuition of a locality-preserving order. We proved that any solution to the UTSP based on a hierarchical order cannot achieve a competitive ratio better than Ω(log n), meeting the existing upper bound. We also gave some notes on how our restrictions on the order could be relaxed.

In Chapter 4 we defined the trajectory visibility problem, which combines two central topics in computational geometry. This problem is simply to detect whether entities, moving along piecewise-linear trajectories in a domain of obstacles, are ever able to see each other. We gave algorithms and data structures for this problem in a variety of settings: inside a simple polygon, intersecting a simple polygon, and in a polygonal domain. We provided matching lower bounds for our algorithms. In the special case when one entity remains stationary we were able to improve our results. This problem was motivated by the need to answer visibility questions between animals moving along uncertain trajectories. The probability that the animals see each other can be estimated by black-box sampling many pairs of trajectories and computing trajectory visibility between them; for this to be performed effectively, efficient data structures for trajectory visibility are required.

In Chapter 5 we introduced the problem of building a spanner on locationally uncertain points.
In the uncertain spanner problem the algorithm must build a graph G on a set of uncertain points, without knowing their exact locations, such that when an exact location is sampled for each uncertain point, the probability that G is a spanner is high. We gave a construction based on the Yao graph that builds an uncertain spanner with O(n log n) edges, and we argued that similar constructions are available for any cone-based spanner. We also gave a lower bound on the number of edges used by a cone-based construction of this type.

In Chapter 6 we discussed a model where uncertainty may be reduced by querying the exact locations of points at a cost, and introduced the online ply maintenance problem. In this model entities move along unknown trajectories at bounded speed; each entity is known to be somewhere in an uncertainty region which grows over time, and at any time an entity may be queried to learn its exact location. The online ply maintenance problem is to decide which entities to query at which times, such that the number of uncertainty regions intersecting any point never exceeds a fixed ∆, while minimizing the total number of queries. We give an algorithm which never uses more than ∆ + 1 times the number of queries used by an adversary solving the same problem with full knowledge of the trajectories and unlimited computation. We provide a lower bound to show this is optimal

for deterministic algorithms and within a factor of 2 of optimal for randomized algorithms. We also prove that the adversary's problem of choosing which queries to make with full knowledge is NP-complete.

Each chapter in this thesis has many possible extensions, which are discussed in the conclusion of that chapter. More broadly, in uncertainty in computational geometry as a whole there is still a wide landscape of potential future work. Many results mentioned in the literature review, and also some of our own, do not have lower bounds, or have substantial gaps between the known upper and lower bounds. There is always value in closing these gaps with better algorithms or better lower bounds.

One can look at uncertainty in computational geometry as a project of converting existing, deterministic, problems and algorithms to work with uncertain data. Sometimes a problem can be easily adapted and existing algorithms can be used without much difficulty; sometimes introducing uncertainty completely transforms the problem and entirely new algorithms are required. There are many important problems in computational geometry whose uncertain versions have not yet been considered. Picking a classical problem and an uncertainty model and studying their combination will be a source of interesting problems for a long time yet.

Finally, a serious hole in the literature is a unified theory of uncertainty in computational geometry. As we commented in Chapter 2, nearly all problems in this field are studied in isolation. Most of the techniques used on these problems are problem-specific, and common themes between problems are only recently starting to emerge. The key models and the key questions in those models are becoming standardized, a trend we hope this thesis has contributed to.
However, theorems and techniques with broad application, a complete description of how geometry and uncertainty interact, and a method of classifying which uncertain problems are fundamentally different from their deterministic versions all seem still some distance in the future. We hope that with further research on specific problems in uncertain computational geometry these underlying structures will become clearer, and a unified theory will be possible.

Bibliography

[1] Amirali Abdullah, Samira Daruki, and Jeff M. Phillips. Range counting coresets for uncertain data. In Guilherme Dias da Fonseca, Thomas Lewiner, Luís Mariano Peñaranda, Timothy M. Chan, and Rolf Klein, editors, Symposium on Computational Geometry 2013, SoCG '13, Rio de Janeiro, Brazil, June 17-20, 2013, pages 223–232. ACM, 2013.
[2] Peyman Afshani, Pankaj K. Agarwal, Lars Arge, Kasper Green Larsen, and Jeff M. Phillips. (Approximate) uncertain skylines. Theory Comput. Syst., 52(3):342–366, 2013.
[3] P. K. Agarwal and J. Matoušek. Dynamic half-space range reporting and its applications. Algorithmica, 13(4):325–345, Apr 1995.
[4] Pankaj K. Agarwal. Range searching. In Jacob E. Goodman and Joseph O'Rourke, editors, Handbook of Discrete and Computational Geometry, Second Edition, pages 809–837. Chapman and Hall/CRC, 2004.
[5] Pankaj K. Agarwal. Simplex range searching and its variants: A review. In A Journey Through Discrete Mathematics, pages 1–30. Springer, 2017.
[6] Pankaj K. Agarwal, Boris Aronov, Sariel Har-Peled, Jeff M. Phillips, Ke Yi, and Wuzhou Zhang. Nearest-neighbor searching under uncertainty II. ACM Trans. Algorithms, 13(1):3:1–3:25, 2016.
[7] Pankaj K. Agarwal, Siu-Wing Cheng, and Ke Yi. Range searching on uncertain data. ACM Trans. Algorithms, 8(4):43:1–43:17, 2012.
[8] Pankaj K. Agarwal, Alon Efrat, Swaminathan Sankararaman, and Wuzhou Zhang. Nearest-neighbor searching under uncertainty I. Discret. Comput. Geom., 58(3):705–745, 2017.
[9] Pankaj K. Agarwal and Jeff Erickson. Geometric range searching and its relatives. Contemporary Mathematics, 223:1–56, 1999.
[10] Pankaj K. Agarwal, Sariel Har-Peled, Subhash Suri, Hakan Yildiz, and Wuzhou Zhang. Convex hulls under uncertainty. Algorithmica, 79(2):340–367, 2017.


[11] Pankaj K. Agarwal, Sariel Har-Peled, and Kasturi R. Varadarajan. Approximating extent measures of points. J. ACM, 51(4):606–635, 2004.
[12] Pankaj K. Agarwal, Sariel Har-Peled, and Kasturi R. Varadarajan. Geometric approximation via coresets. Combinatorial and Computational Geometry, 52:1–30, 2005.
[13] Pankaj K. Agarwal and Jiří Matoušek. Ray shooting and parametric search. SIAM Journal on Computing, 22(4):794–806, 1993.
[14] Pankaj K. Agarwal and Jiří Matoušek. On range searching with semialgebraic sets. Discret. Comput. Geom., 11:393–418, 1994.
[15] Pankaj K. Agarwal, Jiří Matoušek, and Micha Sharir. On range searching with semialgebraic sets. II. SIAM J. Comput., 42(6):2039–2062, 2013.
[16] Charu C. Aggarwal. A survey of uncertain data clustering algorithms. In Charu C. Aggarwal and Chandan K. Reddy, editors, Data Clustering: Algorithms and Applications, pages 457–482. CRC Press, 2013.
[17] Charu C. Aggarwal and Philip S. Yu. A survey of uncertain data algorithms and applications. IEEE Trans. Knowl. Data Eng., 21(5):609–623, 2009.
[18] Akash Agrawal, Yuan Li, Jie Xue, and Ravi Janardan. The most-likely skyline problem for stochastic points. Comput. Geom., 88:101609, 2020.
[19] Hee-Kap Ahn, Christian Knauer, Marc Scherfenberg, Lena Schlipf, and Antoine Vigneron. Computing the discrete Fréchet distance with imprecise input. Int. J. Comput. Geometry Appl., 22(1):27–44, 2012.
[20] Sander P. A. Alewijnse, Quirijn W. Bouts, Alex P. ten Brink, and Kevin Buchin. Computing the greedy spanner in linear space. Algorithmica, 73(3):589–606, 2015.
[21] Ingo Althöfer, Gautam Das, David P. Dobkin, Deborah Joseph, and José Soares. On sparse spanners of weighted graphs. Discret. Comput. Geom., 9:81–100, 1993.
[22] Mattias Andersson, Joachim Gudmundsson, Patrick Laube, and Thomas Wolle. Reporting leadership patterns among trajectories. In Proceedings of the 2007 ACM Symposium on Applied Computing (SAC), Seoul, Korea, March 11-15, 2007, pages 3–7, 2007.
[23] Martin Anthony and Peter L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 2002.
[24] Barry C. Arnold, Narayanaswamy Balakrishnan, and Haikady Navada Nagaraja. A First Course in Order Statistics. SIAM, 2008.
[25] Boris Aronov, Leonidas J. Guibas, Marek Teichmann, and Li Zhang. Visibility queries and maintenance in simple polygons. Discrete & Computational Geometry, 27(4):461–483, 2002.

[26] Sanjeev Arora. Polynomial time approximation schemes for Euclidean traveling salesman and other geometric problems. J. ACM, 45(5):753–782, 1998.
[27] Nikhil Bansal, Anupam Gupta, Jian Li, Julián Mestre, Viswanath Nagarajan, and Atri Rudra. When LP is the cure for your matching woes: Improved bounds for stochastic matchings. Algorithmica, 63(4):733–762, 2012.
[28] Luis Barba, Prosenjit Bose, Mirela Damian, Rolf Fagerberg, Wah Loon Keng, Joseph O'Rourke, André van Renssen, Perouz Taslakian, Sander Verdonschot, and Ge Xia. New and improved spanning ratios for Yao graphs. JoCG, 6(2):19–53, 2015.
[29] Yair Bartal. Probabilistic approximations of metric spaces and its algorithmic applications. In Proc. of the 37th Annual IEEE Symposium on Foundations of Computer Science, pages 184–193, 1996.
[30] J. J. Bartholdi and L. K. Platzman. An O(n log n) planar travelling salesman heuristic based on spacefilling curves. Operations Research Letters, 1(4):121–125, 1982.
[31] Michael Ben-Or. Lower bounds for algebraic computation trees. In Proceedings of the Fifteenth Annual ACM Symposium on Theory of Computing, pages 80–86. ACM, 1983.

[32] Marc Benkert, Joachim Gudmundsson, Florian Hübner, and Thomas Wolle. Reporting flock patterns. Comput. Geom., 41(3):111–125, 2008.
[33] Jon Louis Bentley. Multidimensional binary search trees used for associative searching. Commun. ACM, 18(9):509–517, 1975.

[34] Jon Louis Bentley. Multidimensional divide-and-conquer. Commun. ACM, 23(4):214–229, 1980.
[35] Marshall Bern, David Dobkin, David Eppstein, and Robert Grossman. Visibility with a moving point of view. Algorithmica, 11(4):360–378, 1994.
[36] Dimitris Bertsimas, David B. Brown, and Constantine Caramanis. Theory and applications of robust optimization. SIAM Review, 53(3):464–501, 2011.
[37] Dimitris Bertsimas and Michelangelo Grigni. Worst-case examples for the spacefilling curve heuristic for the Euclidean traveling salesman problem. Operations Research Letters, 8(5):241–244, 1989.

[38] Anand Bhalgat, Deeparnab Chakrabarty, and Sanjeev Khanna. Optimal lower bounds for universal and differentially private Steiner trees and TSPs. In Proc. of the 14th Int. Workshop on Approximation, Randomization, and Combinatorial Optimization, pages 75–86, 2011.

[39] Sourabh Bhattacharya and Seth Hutchinson. Approximation schemes for two-player pursuit evasion games with visibility constraints. In Oliver Brock, Jeff Trinkle, and Fabio Ramos, editors, Robotics: Science and Systems IV, Eidgenössische Technische Hochschule Zürich, Zurich, Switzerland, June 25-28, 2008. The MIT Press, 2008.
[40] Prosenjit Bose, Paz Carmi, Vida Dujmovic, and Pat Morin. Near-optimal O(k)-robust geometric spanners. CoRR, abs/1812.09913, 2018.
[41] Prosenjit Bose, Paz Carmi, Mohammad Farshi, Anil Maheshwari, and Michiel H. M. Smid. Computing the greedy spanner in near-quadratic time. Algorithmica, 58(3):711–729, 2010.
[42] Prosenjit Bose, Luc Devroye, Maarten Löffler, Jack Snoeyink, and Vishal Verma. Almost all Delaunay triangulations have stretch factor greater than π/2. Comput. Geom., 44(2):121–127, 2011.
[43] Prosenjit Bose, Vida Dujmovic, Pat Morin, and Michiel H. M. Smid. Robust geometric spanners. SIAM J. Comput., 42(4):1720–1736, 2013.
[44] Prosenjit Bose and Michiel H. M. Smid. On plane geometric spanners: A survey and open problems. Comput. Geom., 46(7):818–830, 2013.
[45] P. Bovet and S. Benhamou. Spatial analysis of animals' movements using a correlated random walk model. J. Theoretical Biology, 131(4):419–433, 1988.
[46] Gerth Stølting Brodal and Riko Jacob. Dynamic planar convex hull. In The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings., pages 617–626. IEEE, 2002.
[47] Kevin Buchin, Maike Buchin, Joachim Gudmundsson, Maarten Löffler, and Jun Luo. Detecting commuting patterns by clustering subtrajectories. International Journal of Computational Geometry & Applications, 21(03):253–282, 2011.
[48] Kevin Buchin, Sariel Har-Peled, and Dániel Oláh. A spanner for the day after. In Gill Barequet and Yusu Wang, editors, 35th International Symposium on Computational Geometry, SoCG 2019, June 18-21, 2019, Portland, Oregon, USA, volume 129 of LIPIcs, pages 19:1–19:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
[49] Kevin Buchin, Irina Kostitsyna, Maarten Löffler, and Rodrigo I. Silveira. Region-based approximation of probability distributions (for visibility between imprecise points among obstacles). Algorithmica, 81(7):2682–2715, 2019.
[50] Kevin Buchin, Maarten Löffler, Pat Morin, and Wolfgang Mulzer. Preprocessing imprecise points for Delaunay triangulation: Simplified and extended. Algorithmica, 61(3):674–693, 2011.

[51] Kevin Buchin, Maarten Löffler, Aleksandr Popov, and Marcel Roeloffzen. Fréchet distance between uncertain trajectories: Computing expected value and upper bound. 36th European Workshop on Computational Geometry (EuroCG 2020), 2020.

[52] Kevin Buchin, Stef Sijben, T. Jean Marie Arseneau, and Erik P. Willems. Detecting movement patterns using Brownian bridges. In Isabel F. Cruz, Craig A. Knoblock, Peer Kröger, Egemen Tanin, and Peter Widmayer, editors, SIGSPATIAL 2012 International Conference on Advances in Geographic Information Systems (formerly known as GIS), SIGSPATIAL'12, Redondo Beach, CA, USA, November 7-9, 2012, pages 119–128. ACM, 2012.

[53] Maike Buchin and Stef Sijben. Discrete Fréchet distance for uncertain points. In Proceedings of the 32nd European Workshop on Computational Geometry (EuroCG), 2016.

[54] Daniel Busto, William S. Evans, and David G. Kirkpatrick. Minimizing interference potential among moving entities. In Timothy M. Chan, editor, Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, pages 2400–2418. SIAM, 2019.

[55] Clément Calenge, Stéphane Dray, and Manuela Royer-Carenzi. The concept of animals' trajectories from a data analysis perspective. Ecological Informatics, 4(1):34–41, 2009.

[56] Paul B. Callahan. Dealing with higher dimensions: the well-separated pair decomposition and its applications. PhD thesis, The Johns Hopkins University, 1995.

[57] Paul B. Callahan and S. Rao Kosaraju. A decomposition of multidimensional point sets with applications to k-nearest-neighbors and n-body potential fields. J. ACM, 42(1):67–90, 1995.

[58] Jean Cardinal and Udo Hoffmann. Recognition and complexity of point visibility graphs. Discrete & Computational Geometry, 57(1):164–178, 2017.

[59] Márcia R. Cerioli, Luerbio Faria, Talita O. Ferreira, and Fábio Protti. On minimum clique partition and maximum independent set on unit disk graphs and penny graphs: complexity and approximation. Electronic Notes in Discrete Mathematics, 18:73–79, 2004.

[60] Timothy M. Chan. Faster core-set constructions and data-stream algorithms in fixed dimensions. Comput. Geom., 35(1-2):20–35, 2006.

[61] Timothy M. Chan. Optimal partition trees. Discret. Comput. Geom., 47(4):661–690, 2012.

[62] Timothy M. Chan. Orthogonal range searching in moderate dimensions: k-d trees and range trees strike back. Discret. Comput. Geom., 61(4):899–922, 2019.
[63] Timothy M. Chan, Kasper Green Larsen, and Mihai Patrascu. Orthogonal range searching on the RAM, revisited. In Ferran Hurtado and Marc J. van Kreveld, editors, Proceedings of the 27th ACM Symposium on Computational Geometry, Paris, France, June 13-15, 2011, pages 1–10. ACM, 2011.
[64] Bernard Chazelle. Lower bounds on the complexity of polytope range searching. Journal of the American Mathematical Society, 2(4):637–666, 1989.
[65] Bernard Chazelle. Cutting hyperplanes for divide-and-conquer. Discrete & Computational Geometry, 9(2):145–158, 1993.
[66] Bernard Chazelle, Herbert Edelsbrunner, Michelangelo Grigni, Leonidas Guibas, John Hershberger, Micha Sharir, and Jack Snoeyink. Ray shooting in polygons using geodesic triangulations. Algorithmica, 12(1):54–68, 1994.
[67] Bernard Chazelle and Leonidas J. Guibas. Fractional cascading: I. A data structuring technique. Algorithmica, 1(2):133–162, 1986.
[68] Bernard Chazelle and Leonidas J. Guibas. Fractional cascading: II. Applications. Algorithmica, 1(2):163–191, 1986.
[69] Bernard Chazelle and Leonidas J. Guibas. Visibility and intersection problems in plane geometry. Discrete & Computational Geometry, 4:551–581, 1989.
[70] Bernard Chazelle, Micha Sharir, and Emo Welzl. Quasi-optimal upper bounds for simplex range searching and new zone theorems. Algorithmica, 8(5&6):407–429, 1992.
[71] Paul Chew. There is a planar graph almost as good as the complete graph. In Proceedings of the Second Annual ACM SIGACT/SIGGRAPH Symposium on Computational Geometry, Yorktown Heights, NY, USA, June 2-4, 1986, pages 169–177, 1986.
[72] Paul Chew. There are planar graphs almost as good as the complete graph. J. Comput. Syst. Sci., 39(2):205–219, 1989.
[73] George Christodoulou and Alkmini Sgouritsa. An improved upper bound for the universal TSP on the grid. In Proc. of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1006–1021, 2017.
[74] Nicos Christofides. Worst-case analysis of a new heuristic for the travelling salesman problem. Technical report, CMU Technical Report, 1976.
[75] Vasek Chvátal. A combinatorial theorem in plane geometry. Journal of Combinatorial Theory, Series B, 18(1):39–41, 1975.

[76] Kenneth L. Clarkson. Approximation algorithms for shortest path motion planning (extended abstract). In Alfred V. Aho, editor, Proceedings of the 19th Annual ACM Symposium on Theory of Computing, 1987, New York, New York, USA, pages 56–65. ACM, 1987.
[77] John H. Conway and Richard Guy. The book of numbers. Springer Science & Business Media, 2012.
[78] Graham Cormode and Andrew McGregor. Approximation algorithms for clustering uncertain data. In Maurizio Lenzerini and Domenico Lembo, editors, Proceedings of the Twenty-Seventh ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2008, June 9-11, 2008, Vancouver, BC, Canada, pages 191–200. ACM, 2008.
[79] Artur Czumaj and Hairong Zhao. Fault-tolerant geometric spanners. Discret. Comput. Geom., 32(2):207–230, 2004.
[80] George B. Dantzig. Linear programming under uncertainty. Management Science, 1, 1955.

[81] Mark de Berg. Ray shooting, depth orders and hidden surface removal, volume 703. Springer Science & Business Media, 1993.
[82] Mark de Berg, Marc van Kreveld, Mark Overmars, and Otfried Schwarzkopf. Computational geometry. Springer, 1997.

[83] Luc Devroye. Laws of the iterated logarithm for order statistics of uniform spacings. The Annals of Probability, pages 860–867, 1981.
[84] Yago Diez, Matias Korman, André van Renssen, Marcel Roeloffzen, and Frank Staals. Kinetic all-pairs shortest path in a simple polygon. 33rd European Workshop on Computational Geometry (EuroCG 2017), pages 21–24, 2017. Abstract.
[85] David P. Dobkin and Herbert Edelsbrunner. Space searching for intersecting objects. J. Algorithms, 8(3):348–361, 1987.
[86] David P. Dobkin, Steven J. Friedman, and Kenneth J. Supowit. Delaunay graphs are almost as good as complete graphs. Discret. Comput. Geom., 5:399–407, 1990.
[87] David P. Dobkin and David G. Kirkpatrick. Fast detection of polyhedral intersection. Theoretical Computer Science, 27(3):241–253, 1983. Special Issue: Ninth International Colloquium on Automata, Languages and Programming (ICALP), Aarhus, Summer 1982.

[88] Somayeh Dodge, Robert Weibel, and Ehsan Forootan. Revealing the physics of movement: Comparing the similarity of movement characteristics of different types of moving objects. Computers, Environment and Urban Systems, 33(6):419–434, 2009.

[89] James R. Driscoll, Neil Sarnak, Daniel D. Sleator, and Robert E. Tarjan. Making data structures persistent. Journal of Computer and System Sciences, 38(1):86–124, 1989.
[90] Frédo Durand. A multidisciplinary survey of visibility. ACM Siggraph course notes: Visibility, Problems, Techniques, and Applications, 2000.

[91] Peter Eades and Sue Whitesides. The logic engine and the realization problem for nearest neighbor graphs. Theor. Comput. Sci., 169(1):23–37, 1996.
[92] Alon Efrat, Matthew J. Katz, Frank Nielsen, and Micha Sharir. Dynamic data structures for fat objects and their applications. Comput. Geom., 15(4):215–227, 2000.
[93] David Eppstein. Spanning trees and spanners. In Jörg-Rüdiger Sack and Jorge Urrutia, editors, Handbook of Computational Geometry, pages 425–461. North Holland / Elsevier, 2000.

[94] William S. Evans, David G. Kirkpatrick, Maarten Löffler, and Frank Staals. Competitive query strategies for minimising the ply of the potential locations of moving points. In Guilherme Dias da Fonseca, Thomas Lewiner, Luis Mariano Peñaranda, Timothy M. Chan, and Rolf Klein, editors, Symposium on Computational Geometry 2013, SoCG '13, Rio de Janeiro, Brazil, June 17-20, 2013, pages 155–164. ACM, 2013.

[95] Jittat Fakcharoenphol, Satish Rao, and Kunal Talwar. A tight bound on approximating arbitrary metrics by tree metrics. J. Comput. Syst. Sci., 69(3):485–497, 2004.
[96] Chenglin Fan and Binhai Zhu. Complexity and algorithms for the discrete Fréchet distance upper bound with imprecise input. CoRR, abs/1509.02576, 2015.
[97] William Feller. An introduction to probability theory and its applications. Wiley, 1957.
[98] Martin Fink, John Hershberger, Nirman Kumar, and Subhash Suri. Hyperplane separability and convexity of probabilistic point sets. JoCG, 8(2):32–57, 2017.
[99] Steve Fisk. A short proof of Chvátal's watchman theorem. Journal of Combinatorial Theory, Series B, 24(3):374, 1978.
[100] Leila De Floriani and Paola Magillo. Algorithms for visibility computation on terrains: a survey. Environ. Plann. B, 30(5):709–728, 2003.
[101] S. Gaffney, A. Robertson, P. Smyth, S. Camargo, and M. Ghil. Probabilistic clustering of extratropical cyclones using regression mixture models. Climate Dynamics, 29(4):423–440, 2007.

[102] S. Gaffney and P. Smyth. Trajectory clustering with mixtures of regression models. In Proc. 5th ACM SIGKDD Internat. Conf. Knowledge Discovery and Data Mining, pages 63–72, 1999.
[103] Michael R. Garey and David S. Johnson. The rectilinear Steiner tree problem is NP-complete. SIAM Journal on Applied Mathematics, 32(4):826–834, 1977.
[104] Subir K. Ghosh and Partha P. Goswami. Unsolved problems in visibility graphs of points, segments, and polygons. ACM Comput. Surv., 46(2):22:1–22:29, December 2013.
[105] Subir Kumar Ghosh. Visibility algorithms in the plane. Cambridge University Press, 2007.
[106] Igor Gorodezky, Robert D. Kleinberg, David B. Shmoys, and Gwen Spencer. Improved lower bounds for the universal and a priori TSP. In Proc. of the 13th Int. Workshop on Approximation, Randomization, and Combinatorial Optimization, pages 178–191, 2010.
[107] Matthias Grünewald, Tamás Lukovszki, Christian Schindelhauer, and Klaus Volbert. Distributed maintenance of resource efficient wireless network topologies (distinguished paper). In Burkhard Monien and Rainer Feldmann, editors, Euro-Par 2002, Parallel Processing, 8th International Euro-Par Conference, Paderborn, Germany, August 27-30, 2002, Proceedings, volume 2400 of Lecture Notes in Computer Science, pages 935–946. Springer, 2002.
[108] Joachim Gudmundsson and Christian Knauer. Dilation and detours in geometric networks. In Teofilo F. Gonzalez, editor, Handbook of Approximation Algorithms and Metaheuristics, Second Edition, Volume 2: Contemporary and Emerging Applications. Chapman and Hall/CRC, 2018.
[109] Joachim Gudmundsson, Patrick Laube, and Thomas Wolle. Movement patterns in spatio-temporal data. In Shashi Shekhar, Hui Xiong, and Xun Zhou, editors, Encyclopedia of GIS, pages 1362–1370. Springer, 2017.
[110] Joachim Gudmundsson and Sampson Wong. Computing the yolk in spatial voting games without computing median lines. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, pages 2012–2019. AAAI Press, 2019.
[111] Sudipto Guha and Kamesh Munagala. Exceeding expectations and clustering uncertain data. In Jan Paredaens and Jianwen Su, editors, Proceedings of the Twenty-Eighth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2009, June 19 - July 1, 2009, Providence, Rhode Island, USA, pages 269–278. ACM, 2009.

[112] Leonidas J. Guibas and John Hershberger. Optimal shortest path queries in a simple polygon. J. Comput. Syst. Sci., 39(2):126–152, 1989.
[113] Leonidas J. Guibas, John Hershberger, Daniel Leven, Micha Sharir, and Robert Endre Tarjan. Linear-time algorithms for visibility and shortest path problems inside triangulated simple polygons. Algorithmica, 2:209–233, 1987.
[114] Anupam Gupta, Mohammad Taghi Hajiaghayi, and Harald Räcke. Oblivious network design. In Proc. of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 970–979, 2006.
[115] Eliezer Gurarie, Russel D. Andrews, and Kristin L. Laidre. A novel method for identifying behavioural changes in animal movement data. Ecology Letters, 12(5):395–408, 2009.
[116] Mohammad Taghi Hajiaghayi, Robert D. Kleinberg, and Frank Thomson Leighton. Improved lower and upper bounds for universal TSP in planar metrics. In Proc. of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 649–658, 2006.
[117] Sariel Har-Peled. Geometric approximation algorithms. Number 173. American Mathematical Soc., 2011.
[118] David Haussler and Emo Welzl. Epsilon-nets and simplex range queries. Discret. Comput. Geom., 2:127–151, 1987.
[119] Lingxiao Huang and Jian Li. Approximating the expected values for combinatorial optimization problems over stochastic points. In Magnús M. Halldórsson, Kazuo Iwama, Naoki Kobayashi, and Bettina Speckmann, editors, Automata, Languages, and Programming - 42nd International Colloquium, ICALP 2015, Kyoto, Japan, July 6-10, 2015, Proceedings, Part I, volume 9134 of Lecture Notes in Computer Science, pages 910–921. Springer, 2015.
[120] Lingxiao Huang and Jian Li. Stochastic k-center and j-flat-center problems. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16-19, pages 110–129, 2017.
[121] Lingxiao Huang, Jian Li, Jeff M. Phillips, and Haitao Wang. Epsilon-kernel coresets for stochastic points. In Piotr Sankowski and Christos D. Zaroliagis, editors, 24th Annual European Symposium on Algorithms, ESA 2016, August 22-24, 2016, Aarhus, Denmark, volume 57 of LIPIcs, pages 50:1–50:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016.
[122] Lujun Jia, Guolong Lin, Guevara Noubir, Rajmohan Rajaraman, and Ravi Sundaram. Universal approximations for TSP, Steiner tree, and set cover. In Proc. of the 37th Annual ACM Symposium on Theory of Computing, pages 386–395, 2005.

[123] Allan Jørgensen, Maarten Löffler, and Jeff M. Phillips. Geometric computations on indecisive and uncertain points. CoRR, abs/1205.0273, 2012.
[124] Simon Kahan. A model for data in motion. In Cris Koutsougeras and Jeffrey Scott Vitter, editors, Proceedings of the 23rd Annual ACM Symposium on Theory of Computing, May 5-8, 1991, New Orleans, Louisiana, USA, pages 267–277. ACM, 1991.
[125] Simon H. Kahan. Real-time processing of moving data. PhD thesis, University of Washington, 1991.
[126] Pegah Kamousi, Timothy M. Chan, and Subhash Suri. Stochastic minimum spanning trees in Euclidean spaces. In Ferran Hurtado and Marc J. van Kreveld, editors, Proceedings of the 27th ACM Symposium on Computational Geometry, Paris, France, June 13-15, 2011, pages 65–74. ACM, 2011.
[127] Pegah Kamousi, Timothy M. Chan, and Subhash Suri. Closest pair and the post office problem for stochastic points. Comput. Geom., 47(2):214–223, 2014.
[128] Richard M. Karp. Reducibility among combinatorial problems. In Proc. of the Symposium on the Complexity of Computer Computations, pages 85–103, 1972.
[129] Marek Karpinski, Michael Lampis, and Richard Schmied. New inapproximability bounds for TSP. J. Comput. Syst. Sci., 81(8):1665–1677, 2015.
[130] Vahideh Keikha, Maarten Löffler, Ali Mohades, and Zahed Rahmati. Width and bounding box of imprecise points. In Stephane Durocher and Shahin Kamali, editors, Proceedings of the 30th Canadian Conference on Computational Geometry, CCCG 2018, August 8-10, 2018, University of Manitoba, Winnipeg, Manitoba, Canada, pages 142–148, 2018.
[131] J. Mark Keil. Approximating the complete Euclidean graph. In Rolf G. Karlsson and Andrzej Lingas, editors, SWAT 88, 1st Scandinavian Workshop on Algorithm Theory, Halmstad, Sweden, July 5-8, 1988, Proceedings, volume 318 of Lecture Notes in Computer Science, pages 208–213. Springer, 1988.
[132] J. Mark Keil and Carl A. Gutwin. Classes of graphs which approximate the complete Euclidean graph. Discret. Comput. Geom., 7:13–28, 1992.
[133] Christian Knauer, Maarten Löffler, Marc Scherfenberg, and Thomas Wolle. The directed Hausdorff distance between imprecise point sets. Theor. Comput. Sci., 412(32):4173–4186, 2011.
[134] Nirman Kumar, Benjamin Raichel, Subhash Suri, and Kevin Verbeek. Most likely Voronoi diagrams in higher dimensions. In Akash Lal, S. Akshay, Saket Saurabh, and Sandeep Sen, editors, 36th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2016, December 13-15, 2016, Chennai, India, volume 65 of LIPIcs, pages 31:1–31:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016.
[135] Nirman Kumar and Subhash Suri. Containment and evasion in stochastic point data. In Evangelos Kranakis, Gonzalo Navarro, and Edgar Chávez, editors, LATIN 2016: Theoretical Informatics - 12th Latin American Symposium, Ensenada, Mexico, April 11-15, 2016, Proceedings, volume 9644 of Lecture Notes in Computer Science, pages 576–589. Springer, 2016.
[136] Patrick Laube, Marc J. van Kreveld, and Stephan Imfeld. Finding REMO - detecting relative motion patterns in geospatial lifelines. In Developments in Spatial Data Handling, 11th International Symposium on Spatial Data Handling, Leicester, UK, August 23-25, 2004, pages 201–215, 2004.
[137] J. G. Lee, J. Han, and K. Y. Whang. Trajectory clustering: a partition-and-group framework. In Proc. ACM SIGMOD International Conference on Management of Data, pages 593–604, 2007.
[138] Christos Levcopoulos, Giri Narasimhan, and Michiel H. M. Smid. Efficient algorithms for constructing fault-tolerant geometric spanners. In Jeffrey Scott Vitter, editor, Proceedings of the Thirtieth Annual ACM Symposium on the Theory of Computing, Dallas, Texas, USA, May 23-26, 1998, pages 186–195. ACM, 1998.
[139] Paul Lévy. Sur la division d'un segment par des points choisis au hasard. CR Acad. Sci. Paris, 208:147–149, 1939.
[140] Xiaojie Li, Xiang Li, Daimin Tang, and Xianrui Xu. Deriving features of traffic flow around an intersection from trajectories of vehicles. In Proc. 18th International Conference on Geoinformatics, pages 1–5. IEEE, 2010.
[141] Maarten Löffler. Data Imprecision in Computational Geometry. PhD thesis, Utrecht University, Netherlands, 2009.
[142] Maarten Löffler and Jeff M. Phillips. Shape fitting on point sets with probability distributions.
In Amos Fiat and Peter Sanders, editors, Algorithms - ESA 2009, 17th Annual European Symposium, Copenhagen, Denmark, September 7-9, 2009. Proceedings, volume 5757 of Lecture Notes in Computer Science, pages 313–324. Springer, 2009.
[143] Maarten Löffler and Marc J. van Kreveld. Largest and smallest convex hulls for imprecise points. Algorithmica, 56(2):235–269, 2010.
[144] Maarten Löffler and Marc J. van Kreveld. Largest bounding box, smallest diameter, and related problems on imprecise points. Comput. Geom., 43(4):419–433, 2010.
[145] Jiří Matoušek. Efficient partition trees. Discret. Comput. Geom., 8:315–334, 1992.
[146] Jiří Matoušek. Reporting points in halfspaces. Comput. Geom., 2:169–186, 1992.

[147] Jiří Matoušek. Range searching with efficient hierarchical cuttings. Discrete & Computational Geometry, 10(2):157–182, Aug 1993.
[148] Jiří Matoušek. Geometric range searching. ACM Comput. Surv., 26(4):421–461, 1994.
[149] Joseph S. B. Mitchell. Guillotine subdivisions approximate polygonal subdivisions: A simple polynomial-time approximation scheme for geometric TSP, k-MST, and related problems. SIAM J. Comput., 28(4):1298–1309, 1999.
[150] Esther Moet. Computation and complexity of visibility in geometric environments. PhD thesis, Utrecht University, 2008.
[151] Ketan Mulmuley. Hidden surface removal with respect to a moving view point. In Proceedings of the Twenty-Third Annual ACM Symposium on Theory of Computing, pages 512–522. ACM, 1991.
[152] Alexander Munteanu, Christian Sohler, and Dan Feldman. Smallest enclosing ball for probabilistic data. In Siu-Wing Cheng and Olivier Devillers, editors, 30th Annual Symposium on Computational Geometry, SOCG'14, Kyoto, Japan, June 08 - 11, 2014, page 214. ACM, 2014.
[153] Giri Narasimhan and Michiel H. M. Smid. Geometric spanner networks. Cambridge University Press, 2007.
[154] J. Nievergelt and E. M. Reingold. Binary Search Trees of Bounded Balance. SIAM Journal on Computing (SICOMP), 2(1):33–43, 1973.
[155] Joseph O'Rourke. Art Gallery Theorems and Algorithms. Oxford University Press, Inc., New York, NY, USA, 1987.
[156] Christos H. Papadimitriou. The Euclidean traveling salesman problem is NP-complete. Theor. Comput. Sci., 4(3):237–244, 1977.
[157] Giuseppe Peano. Sur une courbe, qui remplit toute une aire plane. Mathematische Annalen, 36(1):157–160, 1890.
[158] Jian Pei, Bin Jiang, Xuemin Lin, and Yidong Yuan. Probabilistic skylines on uncertain data. In Christoph Koch, Johannes Gehrke, Minos N. Garofalakis, Divesh Srivastava, Karl Aberer, Anand Deshpande, Daniela Florescu, Chee Yong Chan, Venkatesh Ganti, Carl-Christian Kanne, Wolfgang Klas, and Erich J. Neuhold, editors, Proceedings of the 33rd International Conference on Very Large Data Bases, University of Vienna, Austria, September 23-27, 2007, pages 15–26. ACM, 2007.
[159] Jeff M. Phillips. Coresets and sketches. CoRR, abs/1601.00617, 2016.
[160] Loren K. Platzman and John J. Bartholdi III. Spacefilling curves and the planar travelling salesman problem. J. ACM, 36(4):719–737, 1989.
[161] Michel Pocchiola and Gert Vegter. The visibility complex. International Journal of Computational Geometry and Applications, 6(3):279–308, 1996.

[162] Aleksandr Popov. Similarity of uncertain trajectories. Master's thesis, Eindhoven University of Technology, 2019.
[163] Abolfazl Poureidi and Mohammad Farshi. The well-separated pair decomposition for balls. CoRR, abs/1706.06287, 2017.
[164] Ronald Pyke. Spacings. Journal of the Royal Statistical Society: Series B (Methodological), 27(3):395–436, 1965.
[165] Zahed Rahmati, Mohammad Ali Abam, Valerie King, and Sue Whitesides. Kinetic k-semi-Yao graph and its applications. Comput. Geom., 77:10–26, 2019.
[166] Görkem Safak. The Art-Gallery Problem: A Survey and an Extension. Skolan för datavetenskap och kommunikation, Kungliga Tekniska högskolan, 2009.
[167] David Salesin, Jorge Stolfi, and Leonidas J. Guibas. Epsilon geometry: Building robust algorithms from imprecise computations. In Kurt Mehlhorn, editor, Proceedings of the Fifth Annual Symposium on Computational Geometry, Saarbrücken, Germany, June 5-7, 1989, pages 208–217. ACM, 1989.
[168] Frans Schalekamp and David B. Shmoys. Algorithms for the universal and a priori TSP. Oper. Res. Lett., 36(1):1–3, 2008.
[169] Alexander Schrijver. On the history of combinatorial optimization (till 1960). Handbooks in Operations Research and Management Science, 12:1–68, 2005.
[170] Michael Ian Shamos and Dan Hoey. Closest-point problems. In 16th Annual Symposium on Foundations of Computer Science, Berkeley, California, USA, October 13-15, 1975, pages 151–162. IEEE Computer Society, 1975.
[171] F. Shkurti and G. Dudek. Maximizing visibility in collaborative trajectory planning. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pages 3771–3776, May 2014.
[172] Michiel H. M. Smid. Closest-point problems in computational geometry. In Jörg-Rüdiger Sack and Jorge Urrutia, editors, Handbook of Computational Geometry, pages 877–935. North Holland / Elsevier, 2000.
[173] Andreas Stohl. Computation, accuracy and applications of trajectories - a review and bibliography. Atmospheric Environment, 32(6):947–966, 1998.
[174] Subhash Suri and Kevin Verbeek. On the most likely Voronoi diagram and nearest neighbor searching. Int. J. Comput. Geometry Appl., 26(3-4):151–166, 2016.
[175] Subhash Suri, Kevin Verbeek, and Hakan Yildiz. On the most likely convex hull of uncertain points. In Hans L. Bodlaender and Giuseppe F. Italiano, editors, Algorithms - ESA 2013 - 21st Annual European Symposium, Sophia Antipolis, France, September 2-4, 2013. Proceedings, volume 8125 of Lecture Notes in Computer Science, pages 791–802. Springer, 2013.
[176] Yufei Tao, Xiaokui Xiao, and Reynold Cheng. Range search on multidimensional uncertain data. ACM Trans. Database Syst., 32(3):15, 2007.
[177] Leslie G. Valiant. Universality considerations in VLSI circuits. IEEE Transactions on Computers, 100(2):135–140, 1981.
[178] Ivor van der Hoog, Irina Kostitsyna, Maarten Löffler, and Bettina Speckmann. Preprocessing ambiguous imprecise points. In Gill Barequet and Yusu Wang, editors, 35th International Symposium on Computational Geometry, SoCG 2019, June 18-21, 2019, Portland, Oregon, USA, volume 129 of LIPIcs, pages 42:1–42:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
[179] Vladimir N. Vapnik and A. Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. In Measures of Complexity, pages 11–30. Springer, 2015.
[180] Remco Veltkamp. Closed object boundaries from scattered points, volume 885. Springer Science & Business Media, 1994.
[181] M. Vlachos, D. Gunopulos, and G. Kollios. Discovering similar multidimensional trajectories. In Proc. 18th Internat. Conf. Data Engineering, pages 673–684, 2002.
[182] Jan Vondrák. Shortest-path metric approximation for random subgraphs. Random Struct. Algorithms, 30(1-2):95–104, 2007.
[183] Haitao Wang and Jingru Zhang. One-dimensional k-center on uncertain data. Theor. Comput. Sci., 602:114–124, 2015.
[184] Haitao Wang and Jingru Zhang. Computing the center of uncertain points on tree networks. Algorithmica, 78(1):232–254, 2017.
[185] Haitao Wang and Jingru Zhang. Covering uncertain points in a tree. Algorithmica, 81(6):2346–2376, 2019.
[186] Emo Welzl. Constructing the visibility graph for n line segments in O(n²) time. Information Processing Letters, 20(4):167–171, 1985.
[187] Dan E. Willard. Polygon retrieval. SIAM J. Comput., 11(1):149–165, 1982.
[188] Dan E. Willard.
New data structures for orthogonal range queries. SIAM J. Comput., 14(1):232–253, 1985.
[189] Ge Xia. Improved upper bound on the stretch factor of Delaunay triangulations. In Ferran Hurtado and Marc J. van Kreveld, editors, Proceedings of the 27th ACM Symposium on Computational Geometry, Paris, France, June 13-15, 2011, pages 264–273. ACM, 2011.

[190] Ge Xia and Liang Zhang. Toward the tight bound of the stretch factor of Delaunay triangulations. In Proceedings of the 23rd Annual Canadian Conference on Computational Geometry, Toronto, Ontario, Canada, August 10-12, 2011, 2011.
[191] Jie Xue, Yuan Li, and Ravi Janardan. On the expected diameter, width, and complexity of a stochastic convex hull. Comput. Geom., 82:16–31, 2019.
[192] Andrew Chi-Chih Yao. Probabilistic computations: Toward a unified measure of complexity (extended abstract). In 18th Annual Symposium on Foundations of Computer Science, Providence, Rhode Island, USA, 31 October - 1 November 1977, pages 222–227. IEEE Computer Society, 1977.
[193] Andrew Chi-Chih Yao. On constructing minimum spanning trees in k-dimensional spaces and related problems. SIAM J. Comput., 11(4):721–736, 1982.
[194] Andrew Chi-Chih Yao and F. Frances Yao. A general approach to d-dimensional geometric queries (extended abstract). In Robert Sedgewick, editor, Proceedings of the 17th Annual ACM Symposium on Theory of Computing, May 6-8, 1985, Providence, Rhode Island, USA, pages 163–168. ACM, 1985.