© 2015 by Benjamin Adam Raichel. All rights reserved.

IN PURSUIT OF LINEAR COMPLEXITY IN DISCRETE AND COMPUTATIONAL GEOMETRY
BY
BENJAMIN ADAM RAICHEL
DISSERTATION
Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate College of the University of Illinois at Urbana-Champaign, 2015
Urbana, Illinois
Doctoral Committee:
Professor Sariel Har-Peled, Chair
Professor Chandra Chekuri
Dr. Kenneth Clarkson, IBM Almaden
Professor Jeff Erickson

Abstract
Many computational problems arise naturally from geometric data. In this thesis, we consider three such problems: (i) distance optimization problems over point sets, (ii) computing contour trees over simplicial meshes, and (iii) bounding the expected complexity of weighted Voronoi diagrams. While these topics are broad, here the focus is on identifying structure which implies linear (or near linear) algorithmic and descriptive complexity.

The first topic we consider is in geometric optimization. More specifically, we define a large class of distance problems, for which we provide linear time exact or approximate solutions. Roughly speaking, this class of problems facilitates either clustering together close points (i.e., netting) or throwing out outliers (i.e., pruning), allowing for successively smaller summaries of the relevant information in the input. A surprising number of classical geometric optimization problems are unified under this framework, including finding the optimal k-center clustering, the kth ranked distance, the kth heaviest edge of the MST, the minimum radius ball enclosing k points, and many others. In several cases we get the first known linear time approximation algorithm for a given problem, where our approximation ratio matches that of previous work.

The second topic we investigate is contour trees, a fundamental structure in computational topology. Contour trees give a compact summary of the evolution of level sets on a mesh, and are typically used on massive data sets. Previous algorithms for computing contour trees took Θ(n log n) time and were worst-case optimal. Here we provide an algorithm whose running time lies between Θ(nα(n)) and Θ(n log n), and varies depending on the shape of the tree, where α(n) is the inverse Ackermann function. In particular, this is the first algorithm with O(nα(n)) running time on instances with balanced contour trees. Our algorithmic results are complemented by lower bounds indicating that, up to a factor of α(n), our algorithm performs optimally on all instance types.

For the final topic, we consider the descriptive complexity of weighted Voronoi diagrams. Such diagrams have quadratic (or higher) worst-case complexity; however, as was the case for contour trees, here we push beyond worst-case analysis. A new diagram, called the candidate diagram, is introduced, which allows us to bound the complexity of weighted Voronoi diagrams arising from a particular probabilistic input model. Specifically, we assume weights are randomly permuted among fixed Voronoi sites, an assumption which is weaker than the more typical assumption of randomly sampled site locations. Under this assumption, the expected complexity is shown to be near linear.
To my family, and to all those who supported me, in ways both big and small.
Acknowledgments
First and foremost, I would like to thank my advisor, Sariel Har-Peled. At an uncertain time, after having just received my bachelor’s degree, Sariel helped me find a career path forward. Even though when we first met I had little knowledge of theoretical computer science, he believed in me, and continued to do so over the years, even at times when I was uncertain of myself. Undoubtedly, without his support, this thesis, and indeed my chosen research path, would not have been possible.

I would also like to thank Seshadhri Comandur for taking me on as an intern the past two summers at Sandia National Labs, and showing me a research world outside of UIUC. In particular, the middle portion of this thesis is the result of the work we started together at Sandia.

Also, thank you to all my committee members. First, thanks to Chandra Chekuri and Jeff Erickson, from whom I have learned a great deal during my time as a grad student. Their doors have always been open to me, and each has uniquely provided me with a significant amount of valuable and well-thought-out advice. Also, thank you to Ken Clarkson for graciously agreeing to serve on my committee, despite us not having worked together previously, and for twice traveling a long distance to listen to me speak.

I am also very grateful to all my co-authors. Specifically, thank you to Hsien-Chih Chang, Seshadhri Comandur, and Sariel Har-Peled, who worked with me on results contained in this thesis, and also to Anne Driemel, Alina Ene, Nirman Kumar, David Mount, and Amir Nayyeri for their work with me on other results.

In my time as a graduate student I have also been fortunate enough to be a part of the UIUC theory group. The group has provided me with a sense of community right from the start. The faculty deeply care about the success of their students, and the students support one another to succeed. So thank you to the entire theory group, both past and present, for taking me in.

I would also like to thank all those who gave advice and support, both directly and indirectly, on research in and outside of this thesis. This includes, but is not limited to: Pankaj Agarwal, Janine Bennett, Sergio Cabello, Timothy Chan, Esther Ezra, Haim Kaplan, Steve LaValle, Joseph O’Rourke, Ali Pinar, Manoj Prabhakaran, Raimund Seidel, Micha Sharir, Jessica Sherette, Anastasios Sidiropoulos, Yusu Wang, and Carola Wenk. More generally, I would like to thank all the mentors I have had over the years, for pushing me to succeed and for believing in me. Also, thanks to all my classmates who struggled together with me in our quest for understanding.

Last, but certainly not least, thank you to all my friends and family. First, thank you to my parents, to whom I owe the most, and who have always supported me and pushed me to succeed in whatever path I chose in life. Next, I thank my brothers for always being there for me. We complement each other, and I am very proud of both of them. Finally, my love and thanks go to my fiancée, Aubrey, whose love and support have been immeasurable, and to whom I will forever be grateful as we continue through life together.
Table of Contents
List of Figures

Chapter 1  Introduction
1.1 Geometric optimization via Net and Prune
1.2 Computing contour trees in shape sensitive time
1.3 Bounding probabilistic inputs for weighted diagrams
1.3.1 Related work
1.4 Remarks

Chapter 2  Net and Prune
2.1 Introduction
2.2 Preliminaries
2.2.1 Basic definitions
2.2.2 Computing nets quickly for a point set in IR^d
2.2.3 Identifying close and far points
2.3 Approximating nice distance problems
2.3.1 Problem definition and an example
2.3.2 A linear time approximation algorithm
2.3.3 The result
2.4 Applications
2.4.1 k-center clustering
2.4.2 The kth smallest distance
2.4.3 The kth smallest m-nearest neighbor distance
2.4.4 The spanning forest partitions, and the kth longest MST edge
2.4.5 Computing the minimum cluster
2.4.6 Clustering for monotone properties
2.4.7 Smallest non-zero distance
2.5 Linear time with high probability
2.5.1 Preliminaries
2.5.2 The algorithm
2.5.3 Algorithm analysis
2.5.4 The result
2.6 Conclusions

Chapter 3  Contour Trees
3.1 Introduction
3.1.1 Previous work
3.2 Contour tree basics
3.2.1 Some technical remarks
3.3 A tour of the new contour tree algorithm
3.3.1 Breaking M into simpler pieces
3.3.2 Join trees from painted mountaintops
3.3.3 The lower bound
3.4 Divide and conquer through contour surgery
3.5 Raining to partition M
3.6 Contour trees of extremum dominant manifolds
3.7 Painting to compute contour trees
3.7.1 The data structures
3.7.2 The algorithm
3.7.3 Proving correctness
3.7.4 Running time
3.8 Leaf assignments and path decompositions
3.8.1 Shrubs, tall paths, and short paths
3.8.2 Proving Theorem 3.8.1
3.8.3 Proving Lemma 3.8.6: the root is everything in U
3.8.4 Our main result
3.9 Lower bound by path decomposition
3.9.1 Join trees
3.9.2 Contour trees

Chapter 4  Weighted Voronoi Diagrams
4.1 Introduction
4.2 Problem definition and preliminaries
4.2.1 Formal definition of the candidate diagram
4.2.2 Sampling model
4.2.3 A short detour into backward analysis
4.3 The proxy set
4.3.1 Definitions
4.3.2 Bounding the size of the proxy set
4.3.3 The proxy set contains the candidate set
4.4 Bounding the complexity of the kth order proxy diagram
4.4.1 Preliminaries
4.4.2 Bounding the size of the below conflict-lists
4.4.3 Putting it all together
4.5 On the expected size of the staircase
4.5.1 Number of staircase points
4.5.2 Bounding the size of the candidate set
4.6 The main result
4.7 Backward analysis with high probability
4.8 Basic facts
4.8.1 Arrangements of lines
4.8.2 An integral calculation

References
List of Figures
2.1 The approximation algorithm. The implicit target value being approximated is f = f(W, Γ), where f is a ϕ-NDP. That is, there is a ϕ-decider for f, denoted by decider, and the only access to the function f is via this procedure.
2.2 Algorithm for computing a constant factor approximation to a value in the interval [d_δ(P), d_{1−δ}(P)], for some fixed constant δ ∈ (0, 1).
2.3 Roughly estimating d_{O(log n)}(P). Here c_5 and c_6 are sufficiently large constants.
3.1 Two surfaces with different orderings of the maxima, but the same contour tree.
3.2 On left, a surface with a balanced contour tree, but whose join and split trees have long tails. On right (from left to right), the contour, join and split trees.
3.3 On left, downward rain spilling only (each shade of gray represents a piece created by each different spilling), producing a grid. Note we are assuming raining was done in reverse sorted order of maxima. On right, flipping the direction of rain spilling.
3.4 On the left, red and blue merge to make purple, followed by the contour tree with initial colors. On the right, additional maxima and the resulting contour tree.
3.5 Left: angled view of a diamond. Right: a parent and child diamond put together.
3.6 A child diamond attached to a parent diamond with opposite orientation.
Chapter 1
Introduction
Understanding the world around us, and the physical phenomena it contains, is the basis of much of modern science. This broad area of study typically involves producing representative data sets, either by direct physical measurements or through simulations. Specifically, these data sets consist of (or are represented by) points in low dimensional Euclidean space. Due to the strength of modern computing, producing data is “cheap”, leading to massive data sets. The challenge is thus understanding the structure of this data and exploiting that structure to handle it efficiently. Motivated by this fundamental challenge, we investigate the underlying geometric properties that lead to linear time solutions and near linear complexity structures, as linear (or near linear) complexity is required when handling massive data sets.

Specifically, this thesis has three main parts. In the first part, we discuss the Net and Prune technique, which gives linear time algorithms for a large and well defined class of classical geometric optimization problems. In the second part, we provide an algorithm for computing the contour tree, a fundamental structure in computational topology. While it is not always possible to compute this structure in linear time, by taking into account the shape of the tree, we provide the first algorithm which (up to a factor of the inverse Ackermann function) is instance class optimal. In the final part, we consider various weighted Voronoi diagrams which have high worst-case complexity. Here we introduce the candidate diagram and use it to prove that, by assuming that weights are randomly permuted among the sites, these diagrams instead have near linear expected complexity.

The first two parts are connected by the goal of obtaining linear time algorithms. In particular, for Net and Prune, the problems covered include basic distance queries which are prerequisites for many other computational tasks. For contour trees, the need for linear time is even more immediate, as one of their primary uses is to analyze the output of scientific simulations run on supercomputers. Getting this improved running time requires moving beyond worst-case analysis, and this is how the last two parts of the thesis connect. Specifically, we give the first linear time algorithm for balanced trees, and more generally relate the shape of the tree to the hardness of its computation. For weighted Voronoi diagrams, we identify a probabilistic input structure which leads to near linear expected complexity. As our analysis is via randomized incremental construction, a common technique for computing Voronoi diagrams, we get similar near linear bounds on the time to compute these weighted diagrams. Below we provide a high level sketch of the important aspects of each part.
1.1 Geometric optimization via Net and Prune
We now overview Chapter 2, which is largely based on the paper [HR15a].
Setup. A set of n points in IR^d determines a set of n(n−1)/2 inter-point distances. Many classical geometric optimization problems, such as finding the minimum cost k-center clustering or the longest edge in the Euclidean MST, concern finding such an inter-point distance. The Net and Prune framework provides linear time approximation algorithms for a large subset of problems which can be phrased as searching for a particular inter-point distance. This subset is defined by two key properties. First, the problem must be efficiently decidable, i.e., in linear time, given a query value, we must be able to decide whether the query is larger, smaller, or approximately equal to the optimum. Second, the problem must be “shrinkable”, that is, given that we know our query was too small or too large, we must be able to use that information to throw out irrelevant input points, producing a smaller subset of the data which (approximately) preserves the optimum value.
A concrete example. Consider the problem of computing the kth smallest inter-point distance, often referred to as the distance selection problem, for which the Net and Prune framework provides a linear time algorithm. In particular, when k = 1 this is the well studied closest pair problem, which already had a linear time solution [Rab76]. However, for larger values of k, we provide the first (1 + ε)-approximation algorithm running in O(n/ε^d) time. One can draw an analogy between this and the well known selection problem, where the goal is to find the value of rank k in an array of n values. Conceptually, our algorithm for distance selection on Euclidean points mimics the standard Θ(n) time randomized selection algorithm for an array with n values. However, in selection the goal is to find the kth ranked value in a set of n values, whereas for distance selection one works with a much larger, Θ(n²) sized (implicit) set of distances, yet our algorithm runs in Θ(n/ε^d) time.
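To ground the analogy, here is the standard randomized selection (quickselect) routine referred to above, as a minimal self-contained sketch (illustrative code, not taken from the thesis); it finds the value of rank k among n numbers in expected linear time by partitioning around a random pivot:

```python
import random

def quickselect(values, k):
    """Return the kth smallest element of values (1-indexed), in expected O(n) time."""
    pivot = random.choice(values)
    smaller = [x for x in values if x < pivot]
    equal = [x for x in values if x == pivot]
    if k <= len(smaller):                      # answer lies among the smaller values
        return quickselect(smaller, k)
    if k <= len(smaller) + len(equal):         # the pivot itself has rank k
        return pivot
    larger = [x for x in values if x > pivot]  # otherwise recurse on the larger values
    return quickselect(larger, k - len(smaller) - len(equal))
```

The distance selection algorithm plays the same game one level up: each round either prunes points away or nets them together, so the instance shrinks geometrically, even though the Θ(n²) implicit distances are never enumerated.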
Contributions. The main contribution of the Net and Prune framework is the identification of a simple pair of conditions which describe a large class of problems for which a linear time decision procedure can be converted into a linear time optimization procedure. These conditions are simple to check for, yet general enough to capture many previously studied geometric optimization problems, such as k-center clustering, distance selection, finding the kth longest MST edge, and more. For many of these problems, Net and Prune provides the first known linear time algorithm. However, even for problems where linear time algorithms were previously known, Net and Prune unifies them under one framework. Conceptually, they are all described by the problem of looking for a particular inter-point distance, and they are all solvable by the same searching technique. Additionally, the Net and Prune framework is simple, both at a conceptual level and from an implementation standpoint (requiring only well optimized basic primitives such as hashing and counting).
Significance of linear time. For the problems handled by Net and Prune, previously O(n log n) time approximate solutions were known (specifically via well-separated pairs decompositions [CK95, Har11]). However, removing the O(log n) here is significant for several reasons. First, with our above motivation of massive data sets, an algorithm which takes longer than the time for a few linear scans of the data may be prohibitively expensive. Second, our approach is simpler than the previous O(n log n) time algorithms. While in general one may argue that removing an O(log n) factor is not exciting, note that here we are removing the last O(log n) factor. In particular, for the problems we consider our running time is optimal. Note that more generally, problems may admit sublinear time solutions, and there has been significant success in identifying and solving such problems via sampling. However, for the problems we consider linear
time will be a lower bound. Specifically, these problems are akin to searching for a needle in a haystack, for which sampling fails. As a simple example, consider finding the closest pair distance. For this problem, either the removal or addition of a single point can dramatically change the value of the solution, and hence all points must be seen.
Approximation quality. The approximation quality produced by Net and Prune will be the same as the approximation quality of the best (linear time) decision procedure. For most of the problems we discuss, this will be a (1 + ε)-approximation, with a couple of notable exceptions. First, for k-center clustering, our algorithm yields a 2-approximation; however, such a constant approximation is necessary, as k-center clustering in the plane is known to be hard to approximate within 1.8, unless P = NP [FG88]. Second, consider the set of nearest neighbor distances (i.e., for each input point this set contains the distance to its nearest distinct input point). Surprisingly, for any rank k, we can compute the value of rank k in this set exactly in linear time. In particular, for the simple case of computing the closest pair distance, our approach matches previous results.
Expected running time. Finally, note that the Net and Prune framework provides a Las Vegas algorithm, i.e. it is always correct, but the running time is only linear in expectation. However, we also show how to turn this expected time algorithm into one which runs in linear time with polynomially high probability. Doing so is technically challenging, and while interesting from a theoretical standpoint, practically, the simpler expected linear time algorithm suffices as it quickly reduces the input to a small fraction of its initial size.
1.2 Computing contour trees in shape sensitive time
We now overview Chapter 3, which is largely based on the paper [RS14].
Setup. Consider the manifold in IR^{d+1} induced by interpolating a height function f defined over the vertices of a simplicial mesh in IR^d. A contour of this manifold is defined as a connected component of a level set of f at some height h. As an everyday example, a contour in a barometric pressure map may represent a storm front. The contour tree is a compact structure which tracks how contours are born, split, merge, and die as h varies, and is an indispensable tool in exploring and visualizing scientific simulation data. More specifically, the set of leaves in the contour tree is in one-to-one correspondence with the set of maxima and minima in the manifold, while internal vertices represent locations where contours merge together or split apart. Returning to the example of barometric pressure maps, intuitively, when viewing the pressure map as a planar graph, the contour tree is similar to the dual graph.
Sorting. Along any maxima leaf to minima leaf path in the contour tree, vertices are in sorted height ordering. Therefore, as a natural first step, previous algorithms for computing the contour tree sorted the input, leading to a Θ(n log n) running time. Imagine any partition of the contour tree into a set P of disjoint and maximal length descending paths. Specifically, identify any maximal length descending path, remove it, and recur on the remaining subtrees (i.e., intuitively break the tree into the trunk, limbs, and branches). For any path p ∈ P, the vertices along p must appear in sorted order, and so intuitively Σ_{p∈P} |p| log |p|, where |p| is the length of p, is a lower bound on the time to compute the tree. However, observe that n = Σ_{p∈P} |p|, and moreover for a balanced binary tree Σ_{p∈P} |p| log |p| = Θ(n). Therefore, there is a gap between the cost of a global sort and our lower bound on the amount of sorting required by a specific tree structure.
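To see concretely why balance helps (a back-of-the-envelope calculation, not taken from the thesis): peeling maximal descending paths off a perfectly balanced binary tree on n vertices yields roughly n/2^{ℓ+1} paths of length ℓ, for ℓ = 1, ..., log n. The resulting bound is

Σ_{p∈P} |p| log |p| ≈ Σ_ℓ (n/2^{ℓ+1}) · ℓ log ℓ ≤ n · Σ_{ℓ≥1} (ℓ log ℓ)/2^{ℓ+1} = O(n),

since the last series converges to a constant. At the other extreme, a decomposition consisting of a single descending path of length n forces n log n, recovering the worst case.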
Join and split trees. There are two sister structures related to the contour tree, called the join and split trees. As one varies the height h, the contour tree tracks the components of the level set (i.e., at exactly height h), while the join tree and split tree track the components of the super-level set (i.e., at h and higher) and sub-level set (i.e., at h and lower), respectively. Conceptually, since the manifold is connected, the join tree looks like a rooted tree, where the root is the global minimum (and similarly for the split tree with the global maximum). The standard algorithm for computing the contour tree first computes the join and split trees, and then interleaves them to produce the contour tree. However, we show that there are inputs for which computing the join and split trees takes Θ(n log n) time, while the contour tree can be computed in O(nα(n)) time, where α(n) is the inverse Ackermann function.
Contributions. There are two main contributions. The first is a new algorithm for computing the contour tree, which (up to a factor of α(n)) performs optimally with respect to a given input class, and in particular is the first O(nα(n)) time algorithm for computing balanced contour trees. This new algorithm consists of two main pieces. The first piece is a new linear time algorithm to partition the input complex into subcomplexes, such that the contour tree of each subcomplex is the same as its join tree. Naturally, given this partitioning scheme, the second (and more significant) piece is a new algorithm for computing the join tree, whose running time is (near) optimal with respect to the shape of the join tree. Additionally, both the new partitioning and the new join tree algorithm are simple, mainly consisting of several DFS calls. (To optimize the running time, the join tree algorithm further requires a data structure such as binomial heaps.) The second contribution is a characterization of the hardness of computing the contour tree in terms of path decompositions. Specifically, given a path decomposition P (as described above), we describe a family of Π_{p∈P} |p|! height functions defined over a single simplicial complex such that each function has a distinct contour tree. This implies a comparison lower bound of Ω(C(P)), where C(P) = Σ_{p∈P} |p| log |p|. This lower bound is more input aware than the previous worst-case Ω(n log n) bound. Finally, to prove that our algorithm is (near) optimal, we describe a particular canonical path decomposition P′, and prove that our algorithm requires Θ(C(P′)) comparisons. This last step involves a sophisticated charging argument, using a variant of heavy path decompositions [ST83].
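To unpack the counting behind this lower bound: the vertices along each path p ∈ P can be ordered independently of every other path, giving Π_{p∈P} |p|! inputs with pairwise distinct contour trees. Any comparison-based algorithm must therefore perform at least

log₂ Π_{p∈P} |p|! = Σ_{p∈P} log(|p|!) = Θ(Σ_{p∈P} |p| log |p|) = Θ(C(P))

comparisons to distinguish them, by Stirling’s approximation.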
1.3 Bounding probabilistic inputs for weighted diagrams
We now overview Chapter 4, which is largely based on the paper [CHR15].
Setup. Given a set of n point sites in the plane, the standard Voronoi diagram is the partition of the plane into maximal cells such that every point in a cell has the same nearest site. In other words, which cell you are contained in determines your “best site” under the Euclidean distance. However, often when determining a best site, there are other relevant features. Specifically, consider the more general setting where additionally each site is given a cost vector. For example, if the sites represent stores, then the entries in the vector might represent cost or quality of products at a given store. Here we consider weighted diagrams which take into account both the Euclidean distance as well as the cost vector when determining the best site.
Worst-case versus expected complexity. The standard Voronoi diagram in the plane has many nice features, such as Θ(n) worst-case complexity, convex cells, and straight line boundary pieces. However, weighted generalizations of this diagram are often not so well behaved. For example, the multiplicative diagram, where the distance to a site is the Euclidean distance times some (site dependent) positive weight, has Θ(n²) worst-case complexity, disconnected cells, and curved boundary pieces. Since the worst-case complexity of weighted diagrams is prohibitively expensive, here we focus on expected complexity. Previous papers considered the expected complexity of various geometric structures under sampled site locations; however, this appears to be a stronger assumption than we require. On the other hand, significantly weaker models, like smoothed analysis, appear to be insufficient (in particular, the standard quadratic lower bound example for multiplicative diagrams appears stable under such perturbations). Instead, here we assume that, given any set of fixed weights and site locations, the weights are randomly permuted among the sites (for weight vectors, this assumption is coordinate-wise).
Candidate diagram. As noted above, it is not immediately clear how to use our probabilistic input assumption, since weighted diagrams are difficult to reason about directly (i.e., poorly behaved cell boundaries and shapes). Therefore, we instead introduce a new and more well behaved diagram, called the candidate diagram, which is interesting in its own right. Again, suppose we are given a set of sites, where the ith site s_i has an associated weight vector v_i. For a given location q in the plane, we say a site s_i is a candidate for q if there is no other site which is better than s_i in every way, i.e., there is no site s_j which is both closer to q and satisfies v_j ≤ v_i coordinate-wise. The candidate diagram is then the partition of the plane into maximal cells of points with the same set of candidates.
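As a minimal illustration of this definition (hypothetical code, not from the thesis), the candidate set of a single location q can be computed by discarding dominated sites by brute force, in O(n²) time per query:

```python
import math

def candidates(q, sites):
    """Candidate sites for a location q in the plane.

    `sites` is a list of (point, costs) pairs, where point = (x, y) and costs
    is a tuple of cost coordinates. Site i is dominated if some other site j
    is strictly closer to q and no more costly in every coordinate."""
    def dist(p):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    result = []
    for i, (pi, vi) in enumerate(sites):
        dominated = any(
            dist(pj) < dist(pi) and all(a <= b for a, b in zip(vj, vi))
            for j, (pj, vj) in enumerate(sites)
            if j != i
        )
        if not dominated:
            result.append(i)   # no site beats s_i in both distance and cost
    return result
```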
Contributions. There are two main contributions. The first is the introduction of the candidate diagram.
Specifically, consider any weighted distance function, with the natural restriction that a site s_j cannot be the nearest weighted site if there is another site s_i which is better in every way (i.e., nearest sites must be candidates). Observe that the candidate diagram is a single structure which conveys information about all such functions simultaneously. Specifically, your location in the diagram tells you all relevant sites under any such distance function; that is, as you change your preferences over time (i.e., vary your distance function), the candidate diagram remains fixed. As a consequence, given any such weighted distance function, the complexity of the candidate diagram can be used to bound the complexity of the weighted diagram. Moreover, the candidate diagram has convex cells and polygonal boundaries, making it significantly easier to reason about. As the candidate diagram is more expressive, in general it may have high complexity (i.e., potentially higher than the weighted diagrams mentioned above). Our second contribution is showing that our probabilistic input assumption leads to an expected O(n polylog n) bound on the complexity of such diagrams. In the process, several results are shown which are of independent interest, including: (i) with high probability, the number of staircase points when sampling from the unit hypercube does not (asymptotically) exceed its expectation, (ii) a general lemma giving high probability bounds for various instances of backward analysis, and (iii) the expected complexity of the overlay of cells during a randomized incremental construction of the kth order Voronoi diagram is bounded by O(k⁴ n log n).
1.3.1 Related work
While the presentation in this thesis on weighted diagrams largely follows that in [CHR15], the author’s work on weighted Voronoi diagrams began with the paper “On the Expected Complexity of Randomly Weighted Multiplicative Voronoi Diagrams” [HR15b], discussed below.
Previous results on weighted diagrams. In [HR15b] the simpler case of scalar weights, rather than weight vectors, was considered. Therefore, for point sites in the plane, [CHR15] largely subsumes this work. However, unlike [CHR15], in [HR15b] a more general class of allowable sites was considered. This includes not only point sites, but also sites which are interior-disjoint convex sets, and more generally sites whose bisectors respect (a superset of) the conditions for abstract Voronoi diagrams [KLN09], a class defined to capture the minimal properties required for linear complexity unweighted diagrams. Specifically, for a set of n such sites, an O(n polylog n) bound on the expected complexity of the randomly weighted multiplicative Voronoi diagram was shown. As a consequence, this yields an alternative proof to that of Agarwal et al. [AHKS14] of the near linear complexity of the union of randomly expanded disjoint segments or convex sets (with an improved bound on the latter).
The author has also worked on bounding the expected complexity of other types of Voronoi diagrams, such as in the paper “On the Expected Complexity of Voronoi Diagrams on Terrains” [DHR12], discussed below.
Voronoi diagrams on terrains. In [DHR12], the combinatorial complexity of geodesic Voronoi diagrams on polyhedral terrains was investigated using a probabilistic analysis. Aronov et al. [ABT08] proved that, if one makes certain realistic input assumptions on the terrain, this complexity is Θ(n + m√n) in the worst case, where n denotes the number of triangles that define the terrain and m denotes the number of Voronoi sites. In [DHR12] it was shown that under a relaxed set of assumptions the Voronoi diagram has expected complexity O(n + m), given that the sites are sampled uniformly at random from the domain of the terrain (or the surface of the terrain). Furthermore, a construction was presented of a terrain which implies a lower bound of Ω(nm^{2/3}) on the expected worst-case complexity if these assumptions on the terrain are dropped.
1.4 Remarks
Computation model. Throughout this thesis the model of computation will be the real-RAM model. This is the standard model in Computational Geometry, and allows for storing arbitrary real numbers, along with unit cost arithmetic operations and comparisons. Typically, this model does not include a unit cost floor function (i.e., conversion between real numbers and integers); however, in Chapter 2 a unit cost floor function is assumed, as this is required for efficient grid computations. Subtleties about this assumption are discussed further in Chapter 2.
A note on dimension. In this thesis, the Euclidean dimension, d, is assumed to be a small fixed constant, and hence constant factors involving d are ignored. This assumption is required, as many standard techniques used in this thesis are exponential in the dimension. However, as discussed above, there are many problems related to the physical world for which this is a natural assumption.
More generally, in low dimensional settings, the set of available tools is much broader, leading to a larger class of problems which can be efficiently handled (i.e. the types of problems considered in this thesis). Indeed, even in high dimensional settings, in order to make the problem tractable, often one assumes the input has some “hidden” lower-dimensional structure.
Chapter 2
Net and Prune
2.1 Introduction
In many optimization problems one is given a geometric input and the goal is to compute some real valued function defined over it. For example, for a finite point set, the function might be the distance between the closest pair of points. For such a function, there is a search space associated with the function, encoding possible candidate values, and the function returns the minimum value over all possible solutions. For the closest pair problem, the search space is the set of all pairs of input points and their associated distances, and the target function computes the minimum pairwise distance in this space. As such, computing the value of such a function requires searching for the optimal solution, and returning its value; that is, optimization. Other examples of such functions include:
(A) The minimum cost of a clustering of the input. Here, the search space is the possible partitions of the data into clusters, together with their associated prices.
(B) The price of a minimum spanning tree. Here, the search space is the set of all possible spanning trees of the input (and their associated prices).
(C) The radius of the smallest enclosing ball. Here, the search space is the set of all possible balls that enclose the input and their radii (i.e., it is not necessarily finite).
Naturally, many other problems can be described in this way.

It is often possible to construct a decision procedure for such an optimization problem, which, given a query value, can decide whether the query is smaller or larger than the value of the function, without computing the function explicitly. Naturally, one would like to use this decider to compute the function itself by (say) performing a binary search, to compute the minimum value which is still feasible (which is the value of the function). However, often this is inherently not possible, as the set of possible query values is a real interval (which is not finite). Instead, one first identifies a finite set of critical values which must contain the optimum value, and then searches over these values. Naturally, searching over these values directly can be costly, as often the number of such critical values is much larger than the desired running time. Instead one attempts to perform an implicit search.

One of the most powerful techniques to solve optimization problems efficiently in Computational Geometry, using such an implicit search, is the parametric search technique of Megiddo [Meg83]. It is relatively complicated, as it involves implicitly extracting values from a simulation of a parallel decision procedure (often a parallel sorting algorithm). For this reason it is inherently not possible for parametric search to lead to algorithms which run faster than O(n log n) time. Nevertheless, it is widely used in designing efficient geometric optimization algorithms, see [AST94, Sal97]. Luckily, in many cases one can replace parametric search by simpler techniques (see prune-and-search below for example) and in particular, it can be replaced
by randomization, see the work by van Oostrum and Veltkamp [vOV04]. Another example of replacing parametric search by randomization is the new simplified algorithm for the Fréchet distance [HR11]. Surprisingly, sometimes these alternative techniques can actually lead to linear time algorithms.
Linear time algorithms. There seem to be three main ways to get linear time algorithms for geometric optimization problems (exact or approximate):
(A) Coreset/sketch. One can quickly extract a compact sketch of the input that contains the desired quantity (either exactly or approximately). As an easy example, consider the problem of computing the axis parallel bounding box of a set of points – an easy linear scan suffices. There is by now a robust theory of what quantities one can extract a coreset of small size for, such that one can do the (approximate) calculation on the coreset, where usually the coreset size depends only on the desired approximation quality. This leads to many linear time algorithms, from shape fitting [AHV04], to (1 + ε)-approximate k-center/median/mean clustering [Har01, Har04a, HM04, HK07] in constant dimension, and many other problems [AHV05]. The running times of the resulting algorithms are usually O(n + func(sketch size)). The limitation of this technique is that there are problems for which there is no small sketch, from clustering when the number of clusters is large, to problems where there is no sketch at all [Har04b] – for example, for finding the closest pair of points one needs all the given input and no sketching is possible.
(B) Prune and search. Here one prunes away a constant fraction of the input, and continues the search recursively on the remaining input. The paramount example of such an algorithm is the linear time median finding algorithm; however, one can interpret many of the randomized algorithms in Computational Geometry as pruning algorithms [Cla88, CS89, Mul94]. For example, linear programming in constant dimension in linear time [Meg84], and its extension to LP-type problems [SW92, MSW96]. Intuitively, LP-type problems include low dimensional convex programming (a standard example is the smallest enclosing ball of a point set in constant dimension). However, surprisingly, such problems also include problems that are not convex in nature – for example, deciding if a set of (axis parallel) rectangles can be pierced by three points is an LP-type problem. Other examples of prune-and-search algorithms that work in linear time include (i) computing an ear in a triangulation of a polygon [EET93], (ii) searching in sorted matrices [FJ84], and (iii) ham-sandwich cuts in two dimensions [LMS94]. Of course, there are many other examples of using prune and search with running time that is super-linear.
(C) Grids. Rabin [Rab76] used randomization, the floor function, and hashing to compute the closest pair of a set of points in the plane, in expected linear time. Golin et al. [GRSS95] presented a simplified version of this algorithm, and Smid provides a survey of algorithms on closest pair problems [Smi00]. A prune and search variant of this algorithm was suggested by Khuller and Matias [KM95]. Of course, the usage of grids and hashing to perform approximate point-location is quite common in practice. By itself, this is already sufficient to break lower bounds in the comparison model, for example for k-center clustering [Har04a]. The only direct extension of Rabin’s algorithm the author is aware of is the work by Har-Peled and Mazumdar [HM05] showing a linear time 2-approximation to the smallest ball containing k points (out of n given points).
There is some skepticism of algorithms using the floor function, since Schönhage [Sch79] showed how to solve a PSPACE complete problem, in polynomial time, using the floor function in the real RAM model – the trick being packing many numbers into a single word (which can be arbitrarily long
in the RAM model, and still each operation on it takes only constant time). Note that, as Rabin’s algorithm does not do any such packing of numbers (i.e., its computation model is considerably more restricted), this criticism does not seem to be relevant in the case of this algorithm and its relatives.
In this chapter, we present a new technique that combines all of the above approaches to yield linear time approximation algorithms.
Nets. Given a point set P, an r-net N of P is a subset of P that represents well the structure of P at resolution r. Formally, we require that for any point in P there is a net point in distance at most r from it, and no two net points are closer than r to each other; see Section 2.2.2 for a formal definition. Thus, nets provide a sketch of the point set for distances that are r or larger. Nets are a useful tool in representing point sets hierarchically. In particular, computing nets of different resolutions and linking between different levels leads to a tree like data-structure that can be used to facilitate many tasks; see for example the net-tree [KL04, HM06] for such a data-structure for doubling metrics. Nets can be defined in any metric space, but in Euclidean space a grid can sometimes provide an equivalent representation. In particular, net-trees can be interpreted as an extension of (compressed) quadtrees to more abstract settings. Computing nets is closely related to k-center clustering. Specifically, Gonzalez [Gon85] shows how to compute an approximate net that has k points in O(nk) time, which is also a 2-approximation to the k-center clustering. This was later improved to O(n) time, for low dimensional Euclidean space [Har04a], if k is sufficiently small (using grids and hashing). Har-Peled and Mendel [HM06] showed how to preprocess a point set in a metric space with constant doubling dimension, in O(n log n) time, such that an (approximate) r-net can be extracted in (roughly) linear time in the size of the net.
Our contribution
We consider problems of the following form: given a set P of weighted points in IR^d, one wishes to solve an optimization problem whose solution is one of the pairwise distances of P (or “close” to one of these values). Problems of this kind include computing the optimal k-center clustering, or the length of the kth edge in the MST of P, and many others. Specifically, we are interested in problems for which there is a fast approximate decider. That is, given a value r > 0, we can, in linear time, decide if the desired value is (approximately) smaller than r or larger than r. The goal is then to use this decider to approximate the optimum solution in linear time. As a first step towards a linear time approximation algorithm for such problems, we point out that one can compute nets in linear time in IR^d; see Section 2.2.2. However, even if we could implicitly search over the critical values (which we cannot), we would still require a logarithmic number of calls to the decider, which would yield a running time of O(n log n), as the number of critical values is at least linear (and usually polynomial). So instead we use the return values of the decision procedure as we search to thin out the data, so that future calls to the decision procedure become successively cheaper. However, we still cannot search over the critical values (since there are too many of them), and so we also introduce random sampling in order to overcome this.
Outline of the new technique. The new algorithm works by randomly sampling a point and computing the distance to its nearest neighbor. Let this distance be r. Next, we use the decision procedure to decide if we are in one of the following three cases.
(A) Net. If r is too small, then we zoom out to a resolution of r by computing an r-net and continuing the computation on the net instead of on the original point set. That is, we net the point set into a smaller point set, such that one can solve the original problem (approximately) on this smaller sketch of the input.
(B) Prune. If r is too large, then we remove all points whose nearest neighbor is further than r away (of course, this implies we should only consider problems for which such pruning does not affect the solution). That is, we isolate the optimal solution by pruning away irrelevant data – this is similar in nature to what is being done by prune-and-search algorithms.
(C) Done. The value of r is the desired approximation.
We then continue recursively on the remaining data. In either case, the number of points being handled (in expectation) goes down by a constant factor and thus the overall expected running time is linear. In particular, getting this constant factor decrease is the reason we chose to sample from the set of nearest neighbor distances rather than from the set of all inter-point distances.
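The outline above can be condensed into the following schematic (a sketch under simplifying assumptions, not the thesis’s implementation: decider stands for the problem-specific decision procedure, weights are elided, points are assumed distinct, and the linear time net and delFar primitives of Section 2.2 are replaced by naive quadratic stand-ins):

```python
import math, random

def dist(p, q):
    return math.dist(p, q)

def naive_net(r, P):
    """Greedy r-net; a quadratic stand-in for the linear time net() of Section 2.2.2."""
    centers = []
    for p in P:
        if all(dist(p, c) >= r for c in centers):
            centers.append(p)
    return centers

def naive_del_far(r, P):
    """Keep points whose nearest neighbor is closer than r; stand-in for delFar()."""
    return [p for p in P if any(q != p and dist(p, q) < r for q in P)]

def net_and_prune(P, decider, base_size=32):
    """Schematic main loop. decider(r, P) returns ('low',) if the optimum
    exceeds r, ('high',) if it is below r, and ('approx', value) otherwise."""
    P = list(P)
    while len(P) > base_size:
        p = random.choice(P)
        r = min(dist(p, q) for q in P if q != p)  # nearest neighbor distance of p
        verdict = decider(r, P)
        if verdict[0] == 'low':         # r too small: zoom out to an r-net
            P = naive_net(r, P)
        elif verdict[0] == 'high':      # r too large: prune away r-far points
            P = naive_del_far(r, P)
        else:
            return verdict[1]           # r lands in the approximation window
    return None  # instance is now tiny; solve it directly by brute force
```

The one design decision worth highlighting is that r is sampled from the nearest neighbor distances: whichever branch is taken, a constant fraction of the points is expected to disappear, which is what drives the linear expected running time.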
Significance of results. Our basic framework is presented in a general enough manner to cover, and in many cases greatly simplify, many problems for which linear time algorithms had already been discovered. At the same time the framework provides new linear time algorithms for a large collection of problems, for which previously no linear time algorithm was known. The framework should also lead to algorithms for many other problems which are not mentioned. At a conceptual level the basic algorithm is simple enough (with its basic building blocks already having efficient implementations) to be highly practical from an implementation standpoint. Perhaps more importantly, with increasing shifts toward large data sets, algorithms with super-linear running time can be impractical. Additionally, our framework seems amenable to distributed implementation in frameworks like MapReduce. Indeed, every iteration of our algorithm breaks the data into grid cells, a step that is similar to the map phase. In addition, the aggressive thinning of the data by the algorithm guarantees that after the first few iterations the algorithm is resigned to working on only a tiny fraction of the data.
Significance of the netting step. Without the net stage, our framework is already sufficient to solve the closest pair problem, and in this specific case, the algorithm boils down to the one provided by Khuller and Matias [KM95]. This seems to be the only application of the new framework where the netting is not necessary. In particular, the wide applicability of the framework seems to be the result of the netting step.
Framework and results. We provide a framework that classifies which optimization problems can be solved using the new algorithm. We get the following new algorithms (all of them have an expected linear running time, for any fixed ε):
(A) k-center clustering (Section 2.4.1). We provide an algorithm that 2-approximates the optimal k-center clustering of a point set in IR^d. Unlike the previous algorithm [Har04a] that was restricted to k = O(n^{1/6}), the new algorithm works for any value of k. This new algorithm is also simpler.
(B) kth smallest distance (Section 2.4.2). In the distance selection problem, given a set P of points in IR^d, one would like to compute the kth smallest distance defined by a pair of points of P. It is believed that such exact distance selection requires Ω(n^{4/3}) time in the worst case [Eri95], even in the plane (in higher dimensions the bound deteriorates). We present an O(n/ε^d) time algorithm that (1+ε)-approximates the kth smallest distance. Previously, Bespamyatnikh and Segal [BS02] presented an O(n log n + n/ε^d) time algorithm using a well-separated pairs decomposition (see also [DHW12]). Given two sets of points P and W with a total of n points, using the same approach, we can (1+ε)-approximate the kth smallest distance in the bichromatic set of distances X = {d(p, q) | p ∈ P, q ∈ W}. Intuitively, the mechanism behind distance selection underlies many optimization problems, as it (essentially) performs a binary search over the distances induced by a set of points. As such, being able to do approximate distance selection in linear time should lead to faster approximation algorithms that perform a similar search over such distances.
(C) The kth smallest m-nearest neighbor distance (Section 2.4.3). For a set P = {p1, ..., pn} ⊆ IR^d of n points, and a point p ∈ P, its mth nearest neighbor in P is the mth closest point to p in P \ {p}. In particular, let d_m(p, P) denote this distance. Here, consider the set of these distances defined for each point of P; that is, X = {d_m(p1, P), ..., d_m(pn, P)}. We can approximate the kth smallest number in this set in linear time.
(D) Exact nearest neighbor distances (Section 2.4.3). For the special case where m = 1, one can turn the above approximation into an exact computation of the kth nearest neighbor distance with only a minor post processing grid computation. As an application, when k = n, one can compute in linear time, exactly, the furthest nearest neighbor distance; that is, the nearest neighbor distance of the point whose nearest neighbor is furthest away. This measure can be useful, for example, in meshing applications, where such a point is a candidate for a region where the local feature size is large and further refinement is needed. We are unaware of any previous work directly on this problem, although one can compute this quantity by solving the all-nearest-neighbor problem, which can be done in O(n log n) time [Cla83]. This is to some extent the “antithesis” to Rabin’s algorithm for the closest pair problem, and it is somewhat surprising that it can also be solved in linear time.
(E) The kth longest MST edge (Section 2.4.4). Given a set P of n points in IR^d, we can (1+ε)-approximate, in O(n/ε^d) time, the kth longest edge in the MST of P.
(F) Smallest ball with a monotone property (Section 2.4.5). Consider a property defined over a set of points, P, that is monotone; that is, if W ⊆ Q ⊆ P has this property then Q must also have this property. Consider such a property that can be easily checked, for example, (i) whether the set contains k points, or (ii) if the points are colored, whether all colors are present in the given point set. Given a point set P, one can (1+ε)-approximate, in O(n/ε^d) time, the smallest radius ball b, such that b ∩ P has the desired property. For example, we get a linear time algorithm to approximate the smallest ball enclosing k points of P. The previous algorithm for this problem [HM05] was significantly more complicated. Furthermore, we can approximate the smallest ball such that all colors appear in it (if the points are colored), or the smallest ball such that at least t different colors appear in it, etc. More generally, the kind of monotone properties supported are sketchable; that is, properties for which there is a small summary of a point set that enables one to decide if the property holds, and furthermore, given summaries of two disjoint point sets, the summary of the union point set can be computed in constant time. We believe that formalizing this notion of sketchability is a useful abstraction. See Section 2.4.5 for details.
(G) Smallest connected component with a monotone property (Section 2.4.5). Consider the connected components of the graph where two points are connected if they are in distance at most r from each other. Using our techniques, one can approximate, in linear time, the smallest r such that there is a connected component of this graph for which a required sketchable property holds for the points in this connected component. As an application, consider ad hoc wireless networks. Here, we have a set P of n nodes and their locations (say in the plane), and each node can broadcast in a certain radius r (the larger the r, the higher the energy required, so naturally we would like to minimize it). Assume there are two special nodes. It is natural to ask for the minimum r, such that there is a connected component of the above graph that contains both nodes. That is, these two special nodes can send messages to each other by message hopping (with distance at most r at each hop). We can approximate this connectivity radius in linear time.
(H) Clustering for a monotone property (Section 2.4.6). Imagine that we want to break the given point set into clusters, such that the maximum diameter of a cluster is minimized (as in k-center clustering), and furthermore, the points assigned to each cluster comply with some sketchable monotone property. We present a (4+ε)-approximation algorithm for these types of problems, which runs in O(n/ε^d) time. This includes lower bounded clustering (i.e., every cluster must contain at least α points), for which recently the author and his co-authors provided an O(n log(n/ε)) time (4+ε)-approximation algorithm [ERH12]. One can get a 2-approximation using network flow, but the running time is significantly worse [APF+10]. See Section 2.4.6 for examples of clustering problems that can be approximated using this algorithm.
(I) Connectivity clustering for a monotone property (Section 2.4.6). Consider the problem of computing the minimum r, such that each connected component (of the graph where points in distance at most r from each other are adjacent) has some sketchable monotone property. We approximate the minimum r for which this holds in linear time. An application of this for ad hoc networks is the following – we have a set P of n wireless clients, and some of them are base stations; that is, they are connected to the outside world. We would like to find the minimum r, such that each connected component of this graph contains a base station.
(J) Closest pair and smallest non-zero distance (Section 2.4.7). Given a set of points in IR^d, consider the problem of finding the smallest non-zero distance defined by these points. This problem is an extension of the closest pair distance, as there might be many identical points in the given point set. We provide a linear time algorithm for computing this distance exactly, which follows easily from our framework.
High probability. Finally, we show in Section 2.5 how to modify our framework such that the linear running time holds with high probability. Since there is very little elbow room in the running time when committed to linear running time, this extension is quite challenging, and requires several new ideas and insights. See Section 2.5 for details.
What can be done without the net & prune framework. For the problems studied in this chapter, an approximation algorithm with running time O(n log n) follows by using a (1 + ε)-WSPD (well-separated pairs decomposition [CK95, Har11]), for some fixed ε, to compute O(n) relevant resolutions, and then using binary search with the decider to find the right resolution. However, even if the WSPD is given, it is not clear how to get a linear time algorithm for the problems studied without using the net & prune framework. In particular, if the input has bounded spread, one can compute an O(1)-WSPD in linear time, and then use the WSPD inside the net & prune framework to get a deterministic linear time algorithm; see Section 2.3.3.
2.2 Preliminaries
2.2.1 Basic definitions
The spread of an interval I = [x, y] ⊆ IR+ is sprd(I) = y/x.
Definition 2.2.1. For a point set P in a metric space with a metric d, and a parameter r > 0, an r-net of P is a subset C ⊆ P, such that (i) for every p, q ∈ C, p 6= q, we have that d(p, q) ≥ r, and (ii) for all p ∈ P,
we have that min_{q ∈ C} d(p, q) < r.
Intuitively, an r-net represents P in resolution r.
Definition 2.2.2. For a real positive number ∆ and a point p = (p1, ..., pd) ∈ IR^d, define G_∆(p) to be the grid point (⌊p1/∆⌋∆, ..., ⌊pd/∆⌋∆).
The quantity ∆ is the width or sidelength of the grid G_∆. Observe that G_∆ partitions IR^d into cubes, which are grid cells. The grid cell of p is uniquely identified by the integer point id(p) = (⌊p1/∆⌋, ..., ⌊pd/∆⌋). For a number r ≥ 0, let N≤r(p) denote the set of grid cells in distance ≤ r from p, which is the neighborhood of p. Note that the neighborhood also includes the grid cell containing p itself, and if ∆ = Θ(r) then |N≤r(p)| = Θ((2 + ⌈2r/∆⌉)^d) = Θ(1).
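In code, the grid machinery of Definition 2.2.2 amounts to integer rounding; the following helper (a sketch with illustrative names, not from the text) computes id(p) and conservatively enumerates the cell ids of the neighborhood N≤r(p):

```python
import itertools
import math

def grid_id(p, delta):
    """The id of the grid cell of sidelength delta containing the point p."""
    return tuple(math.floor(x / delta) for x in p)

def neighborhood_ids(p, r, delta):
    """Ids of all cells that can lie within distance <= r of p; when
    delta = Theta(r) this is Theta((2 + ceil(2r/delta))^d) = O(1) cells."""
    steps = math.ceil(r / delta) + 1   # conservative number of cells per axis
    center = grid_id(p, delta)
    return [tuple(c + o for c, o in zip(center, off))
            for off in itertools.product(range(-steps, steps + 1), repeat=len(p))]
```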
2.2.2 Computing nets quickly for a point set in IR^d
There is a simple algorithm for computing r-nets. Namely, let all the points in P be initially unmarked. While there remains an unmarked point, p, add p to C, and mark it and all other points in distance < r from p (i.e. we are scooping away balls of radius r). By using grids and hashing one can modify this algorithm to run in linear time. The following is implicit in previous work [Har04a], and we include it here for the sake of completeness∗ – it was also described by the author and his co-authors in [ERH12].
Lemma 2.2.3. Given a point set P ⊆ IR^d of size n and a parameter r > 0, one can compute an r-net for P in O(n) time.

Proof: Let G denote the grid in IR^d with side length ∆ = r/(2√d). First compute for every point p ∈ P the grid cell in G that contains p; that is, id(p). Also compute the set of non-empty grid cells of G (i.e., those containing points of P), and, for every such cell, the set of points of P it contains. This task can be performed in linear time using hashing and bucketing, assuming the floor function can be computed in constant time. Specifically, store the id(·) values in a hash table, and in constant time hash each point into its appropriate bin. Scan the points of P one at a time, and let p be the current point. If p is marked then move on to the next point. Otherwise, add p to the set of net points, C, and mark it and each point q ∈ P such that d(p, q) < r. Since the cells of N≤r(p) contain all such points, we only need to check the lists of points stored in these grid cells. At the end of this procedure every point is marked. Since a point can only be marked if it is in distance < r from some net point, and a net point is only created if it is unmarked when visited, this implies that C is an r-net.
∗Specifically, the algorithm of Har-Peled [Har04a] is considerably more complicated than Lemma 2.2.3, and does not work in this setting, as the number of clusters it can handle is limited to O(n^{1/6}). Lemma 2.2.3 has no such restriction.
As for the running time, observe that a grid cell, c, has its list scanned only if c is in the neighborhood of some created net point. As ∆ = Θ(r), there are only O(1) cells which could contain a net point p such that c ∈ N_≤r(p). Furthermore, at most one net point lies in a single cell, since the diameter of a grid cell is strictly smaller than r. Therefore, each grid cell has its list scanned O(1) times. Since the only real work done is in scanning the cell lists, and since the cell lists are disjoint, this implies an O(n) running time overall.
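The following Python sketch mirrors the proof of Lemma 2.2.3. It is a minimal illustration, not the implementation used here: points are assumed to be d-dimensional tuples, all names are ours, and the dimension d is assumed fixed so that the neighborhood scan is O(1).

    import itertools
    import math
    from collections import defaultdict

    def r_net(points, r):
        # Compute an r-net of `points` by the grid algorithm of Lemma 2.2.3.
        d = len(points[0])
        delta = r / (2 * math.sqrt(d))  # grid cells then have diameter r/2 < r

        # Bucket the points by grid cell id, via hashing.
        ids = [tuple(math.floor(c / delta) for c in p) for p in points]
        cells = defaultdict(list)
        for i, pid in enumerate(ids):
            cells[pid].append(i)

        # Cells holding points within distance r of p differ from id(p)
        # by at most k in each coordinate.
        k = math.ceil(r / delta)

        marked = [False] * len(points)
        net = []
        for i, p in enumerate(points):
            if marked[i]:
                continue
            net.append(p)  # p becomes a net point...
            # ...and every point within distance < r of it is marked.
            for off in itertools.product(range(-k, k + 1), repeat=d):
                cell = tuple(ids[i][j] + off[j] for j in range(d))
                for idx in cells.get(cell, ()):
                    if not marked[idx] and math.dist(p, points[idx]) < r:
                        marked[idx] = True
        return net

As in the proof, each cell list is scanned O(1) times for fixed d, since k = ⌈2√d⌉ is a constant.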
Observe that for a point p ∈ P, its closest net point must lie in one of the grid cells of its neighborhood. Since every grid cell contains at most a single net point, it follows that, in constant time per point of P, one can compute each point's nearest net point. We thus have the following.
Corollary 2.2.4. For a set P ⊆ IR^d of n points, and a parameter r > 0, one can compute, in linear time, an r-net of P, and furthermore, for each net point, the set of points of P for which it is the nearest net point.
In the following, a weighted point is a point that is assigned a positive integer weight. For any subset S of a weighted point set P, let |S| denote the number of points in S, and let ω(S) = Σ_{p∈S} ω(p) denote the total weight of S. In particular, Corollary 2.2.4 implies that for a weighted point set one can compute the following net in linear time.
Algorithm 2.2.5 (net). Given a weighted point set P ⊆ IR^d, let net(r, P) denote an r-net of P, where the weight of each net point p is the total sum of the weights of the points assigned to it. We slightly abuse notation, and also use net(r, P) to designate the algorithm (of Corollary 2.2.4) for computing this net, which has linear running time.
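Building on the r_net sketch above, net(r, P) can be illustrated as follows; this is a hedged sketch of the weight aggregation via Corollary 2.2.4's nearest-net-point assignment, with names of our choosing.

    import itertools
    import math

    def weighted_net(points, weights, r):
        # net(r, P): an r-net in which each net point accumulates the
        # weight of the input points for which it is the nearest net point.
        d = len(points[0])
        net = r_net(points, r)  # from the sketch following Lemma 2.2.3

        # Bucket the net points; each grid cell holds at most one of them,
        # since the cell diameter r/2 is smaller than the net separation r.
        delta = r / (2 * math.sqrt(d))
        cell_of = {tuple(math.floor(c / delta) for c in q): j
                   for j, q in enumerate(net)}
        k = math.ceil(r / delta)

        net_weight = [0] * len(net)
        for p, w in zip(points, weights):
            pid = tuple(math.floor(c / delta) for c in p)
            # The nearest net point is within distance < r, hence inside
            # the O(1) neighborhood cells of p (Corollary 2.2.4).
            best = None
            for off in itertools.product(range(-k, k + 1), repeat=d):
                j = cell_of.get(tuple(pid[i] + off[i] for i in range(d)))
                if j is not None and (best is None or
                        math.dist(p, net[j]) < math.dist(p, net[best])):
                    best = j
            net_weight[best] += w
        return net, net_weight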
2.2.3 Identifying close and far points
For a given point p ∈ P, let

ν(p, P) = arg min_{q∈P\{p}} d(p, q)   and   d(p, P) = min_{q∈P\{p}} d(p, q),   (2.1)
denote the nearest neighbor of p in P \ {p}, and the distance to it, respectively. The quantity d(p, P) can be computed (naively) in linear time by scanning the points. For a set of points P, and a parameter r, let P_≥r denote the set of r-far points; that is, the set of all points p ∈ P, such that the nearest neighbor of p in P \ {p} is at least distance r away (i.e., d(p, P) ≥ r). Similarly, let P_<r = P \ P_≥r denote the set of r-close points.

Lemma 2.2.6. Given a weighted set P ⊆ IR^d of n points, and a distance ℓ > 0, in O(n) time, one can compute the sets P_<ℓ and P_≥ℓ. Let delFar(ℓ, P) denote the algorithm which computes these sets and returns P_<ℓ.

Proof: Build a grid where every cell has diameter ℓ/c, for some constant c > 1. Clearly, any point p ∈ P such that d(p, P) ≥ ℓ (i.e., a far point) must be in a grid cell by itself. Therefore, to determine all such "far" points, only grid cells with singleton points in them need to be considered. For such a point q, to determine if d(q, P) ≥ ℓ, one checks the points stored in all grid cells in distance ≤ ℓ from it. If q has no such close neighbor, we mark q for inclusion in P_≥ℓ. By the same arguments as in Lemma 2.2.3, the number of such cells is O(1). Again by the arguments of Lemma 2.2.3, every non-empty grid cell gets scanned O(1) times overall, and so the running time is O(n). Finally, we copy all the marked (resp. unmarked) points to P_≥ℓ (resp. P_<ℓ). The algorithm delFar(ℓ, P) then simply returns P_<ℓ.
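A hedged Python sketch of delFar, mirroring the proof of Lemma 2.2.6 (unweighted tuples for simplicity; names ours):

    import itertools
    import math
    from collections import defaultdict

    def del_far(points, ell, c=2.0):
        # delFar(ell, P): split P into the ell-close points P_<ell and the
        # ell-far points P_>=ell, using a grid of cell diameter ell/c.
        d = len(points[0])
        delta = (ell / c) / math.sqrt(d)  # side length giving diameter ell/c
        ids = [tuple(math.floor(x / delta) for x in p) for p in points]
        cells = defaultdict(list)
        for i, pid in enumerate(ids):
            cells[pid].append(i)

        k = math.ceil(ell / delta)
        close, far = [], []
        for i, p in enumerate(points):
            if len(cells[ids[i]]) > 1:
                close.append(p)  # shares a cell, so d(p, P) <= ell/c < ell
                continue
            # Singleton cell: scan the cells within distance <= ell of p.
            is_far = True
            for off in itertools.product(range(-k, k + 1), repeat=d):
                cell = tuple(ids[i][m] + off[m] for m in range(d))
                if any(j != i and math.dist(p, points[j]) < ell
                       for j in cells.get(cell, ())):
                    is_far = False
                    break
            (far if is_far else close).append(p)
        return close, far  # P_<ell, P_>=ell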
2.3 Approximating nice distance problems

2.3.1 Problem definition and an example

Definition 2.3.1. Let P and Q be two sets of weighted points in IR^d (of the same total weight). The set Q is a ∆-drift of P, if Q can be constructed by moving each point of P by distance at most ∆ (and not altering its weight). Formally, there is an onto mapping f : P → Q, such that (i) for p ∈ P, we have that d(p, f(p)) ≤ ∆, and (ii) for any q ∈ Q, we have that ω(q) is the sum of the weights of all points p ∈ P such that f(p) = q. Note that for a (potentially weighted) point set W, net(∆, W) is a ∆-drift of W.

Definition 2.3.2. Given a set X and a function f : X → IR, a procedure decider is a ϕ-decider for f, if for any x ∈ X and r > 0, decider(r, x) returns one of the following: (i) "f(x) ∈ [α, ϕα]", where α is some real number, (ii) "f(x) < r", or (iii) "f(x) > r".

Definition 2.3.3 (ϕ-NDP). An instance of a ϕ-NiceDistanceProblem consists of a pair (W, Γ), where W ⊆ IR^d is a set of n distinct weighted points†, and Γ is the context of the given instance (of size O(n)), consisting of the relevant parameters for the problem‡. For a fixed ϕ-NDP, the task is to evaluate a function f(W, Γ) → IR^+, defined over such input pairs, that has the following properties:
• Decider: There exists an O(n) time ϕ-decider for f, for some constant ϕ ≥ 1.
• Lipschitz: Let Q be any ∆-drift of W. Then |f(W, Γ) − f(Q, Γ)| ≤ 2∆.
• Prune: If f(W, Γ) < r, then f(W, Γ) = f(W_<r, Γ′), where Γ′ is an appropriately updated context, computable in linear time by a procedure contextUpdate.

Remark 2.3.4 ((1 + ε)-NDP). If we are interested in a (1 + ε)-approximation, then ϕ = 1 + ε. In this case, we require that the running time of the provided (1 + ε)-decider (i.e., property Decider above) is O(n/ε^c), for some constant c ≥ 1. This covers all the applications presented in this chapter. Our analysis still holds even if the decider has a different running time, but the overall running time might increase by a factor of log(1/ε), see Lemma 2.3.18p21 for details.

An example – k-center clustering

As a concrete example of an NDP, consider the problem of k-center clustering.

Problem 2.3.5 (kCenter). Let W be a set of points in IR^d, and let k > 0 be an integer parameter. Find a set of centers C ⊆ W, with |C| = k, such that the maximum distance of a point of W to its nearest center in C is minimized. Specifically, the function of interest, denoted by fcen(W, k), is the radius of the optimal k-center clustering of W. We now show that kCenter satisfies the properties of an NDP.

†The case when W is a multiset can also be handled. See Remark 2.3.10.
‡In almost all cases the context is a set of integers, usually of constant size.

Remark 2.3.6. Usually one defines the input for kCenter to be a set of unweighted points. However, since the definition of NDP requires a weighted point set, we naturally convert the initial input into a set of unit weight points, W. Also, for simplicity we assume fcen(W, k) ≠ 0, i.e., |W| > k (which can easily be checked). Both assumptions are used implicitly throughout this chapter.

Lemma 2.3.7. The following relations between |net(r, W)| and fcen(W, k) hold: (A) If |net(r, W)| ≤ k then fcen(W, k) < r. (B) If |net(r, W)| > k then r ≤ 2fcen(W, k).

Proof: (A) Observe that if |net(r, W)| ≤ k then, by definition, the points of net(r, W) form a set of at most k centers such that all the points of W are in distance strictly less than r from these centers. (B) If |net(r, W)| > k then W must contain a set of k + 1 points whose pairwise distances are all at least r. In particular, any solution to kCenter with radius < r/2 would not be able to cover all these k + 1 points with only k centers.

Lemma 2.3.8. For any instance (W, k) of kCenter, fcen(W, k) satisfies the properties of Definition 2.3.3; that is, kCenter is a (4 + ε)-NDP, for any ε > 0.

Proof: We need to verify the required properties, see Definition 2.3.3p16.

Decider: We need to describe a decision procedure for kCenter clustering. Given a distance r, the decider first calls net(r, W), see Algorithm 2.2.5p15. If |net(r, W)| ≤ k, then by Lemma 2.3.7 the answer "fcen(W, k) < r" can be returned. Otherwise, call net((2 + ε/2)r, W). If |net((2 + ε/2)r, W)| ≤ k, then, by Lemma 2.3.7, we have r/2 ≤ fcen(W, k) < (2 + ε/2)r, and "fcen ∈ [r/2, (2 + ε/2)r]" is returned (an interval of spread 4 + ε). Otherwise, |net((2 + ε/2)r, W)| > k, and Lemma 2.3.7 implies that the answer "r < fcen(W, k)" can be returned by the decider.

Lipschitz: Observe that if Q is a ∆-drift of W, then a point and its respective center in a k-center clustering of W each move by distance at most ∆ in the transition from W to Q. As such, the distance between a point and its center changes by at most 2∆ by this process. This argument also works in the other direction, implying that the k-center clustering radii of W and Q are the same, up to an additive error of 2∆.

Prune: If fcen(W, k) < r, then any point of W whose neighbors are all at distance ≥ r must be a center by itself in the optimal k-center solution, as otherwise it would be assigned to a center at distance ≥ r > fcen(W, k). Similarly, any point assigned to it would be assigned to a point at distance ≥ r > fcen(W, k). Therefore, for any point p ∈ W whose nearest neighbor is at least distance r away, fcen(W, k) = fcen(W \ {p}, k − 1). Repeating this observation implies the desired result. The context update (and computing W_<r) can thus be done in linear time; the updated context is simply k minus the number of far points removed.
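Since the weights of a point set do not affect its k-center clustering radius, the decider of Lemma 2.3.8 reduces to two net computations. Here is a hedged sketch reusing r_net from above, with return-value conventions of our choosing.

    def k_center_decider(points, k, r, eps=1.0):
        # (4+eps)-decider for kCenter, following the proof of Lemma 2.3.8.
        if len(r_net(points, r)) <= k:
            return ("less", r)  # f_cen(W, k) < r, by Lemma 2.3.7(A)
        if len(r_net(points, (2 + eps / 2) * r)) <= k:
            # r/2 <= f_cen < (2 + eps/2)r: an interval of spread 4 + eps.
            return ("interval", r / 2, (2 + eps / 2) * r)
        return ("greater", r)  # r < f_cen(W, k), by Lemma 2.3.7(B)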
2.3.2 A linear time approximation algorithm

We now describe the general algorithm which, given an instance (W, Γ) of a ϕ-NDP with associated target function f, computes in linear time an interval of spread O(ϕ) containing f(W, Γ). In the following, let decider denote the given ϕ-decider, where ϕ ≥ 1 is some parameter. Also, let contextUpdate denote the context updater associated with the given NDP, see Definition 2.3.3. Both decider and contextUpdate run in linear time. The algorithm for bounding the optimum value of an NDP is shown in Figure 2.1.

ndpAlg(W, Γ):
    Let W_0 = W, Γ_0 = Γ, and i = 1.
    while TRUE do
        1: Randomly pick a point p from W_{i−1}.
        2: ℓ_i ← d(p, W_{i−1}).
        // Next, estimate the value of f_{i−1} = f(W_{i−1}, Γ_{i−1}).
        res_> = decider(ℓ_i, W_{i−1}, Γ_{i−1})
        res_< = decider(c_net ℓ_i, W_{i−1}, Γ_{i−1})    // c_net = 37, see Corollary 2.3.16
        3: if res_> = "f_{i−1} ∈ [x, y]" then return "f(W, Γ) ∈ [x/2, 2y]".
        4: if res_< = "f_{i−1} ∈ [x′, y′]" then return "f(W, Γ) ∈ [x′/2, 2y′]".
        5: if res_> = "ℓ_i < f_{i−1}" and res_< = "f_{i−1} < c_net ℓ_i" then
               return "f(W, Γ) ∈ [ℓ_i/2, 2 c_net ℓ_i]".
        6: if res_> = "f_{i−1} < ℓ_i" then
               W_i ← delFar(ℓ_i, W_{i−1}) and Γ_i ← contextUpdate(ℓ_i, W_{i−1}, Γ_{i−1}).
        7: else // here res_< = "c_net ℓ_i < f_{i−1}"
               W_i ← net(3ℓ_i, W_{i−1}) and Γ_i ← Γ_{i−1}.
        i ← i + 1.

Figure 2.1: The approximation algorithm. The implicit target value being approximated is f = f(W, Γ), where f is a ϕ-NDP. That is, there is a ϕ-decider for f, denoted by decider, and the only access to the function f is via this procedure.
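The following is a hedged Python transliteration of Figure 2.1, reusing the r_net and del_far sketches above; for simplicity it treats all points as unit weight (the actual algorithm carries weights through the net calls), and the decider protocol is of our choosing.

    import math
    import random

    def ndp_alg(points, ctx, decider, context_update, c_net=37):
        # decider(r, W, ctx) returns ("interval", x, y), ("less", r) meaning
        # f < r, or ("greater", r) meaning f > r.
        W = list(points)
        while True:
            p = random.choice(W)                                  # Line 1
            # Line 2: d(p, W) by a naive linear scan (points are distinct).
            ell = min(math.dist(p, q) for q in W if q != p)
            res_gt = decider(ell, W, ctx)
            res_lt = decider(c_net * ell, W, ctx)
            if res_gt[0] == "interval":                           # Line 3
                return res_gt[1] / 2, 2 * res_gt[2]
            if res_lt[0] == "interval":                           # Line 4
                return res_lt[1] / 2, 2 * res_lt[2]
            if res_gt[0] == "greater" and res_lt[0] == "less":    # Line 5
                return ell / 2, 2 * c_net * ell
            if res_gt[0] == "less":                               # Line 6
                ctx = context_update(ell, W, ctx)  # prune the far points
                W, _ = del_far(W, ell)
            else:                                                 # Line 7
                W = r_net(W, 3 * ell)  # f > c_net * ell: netting is safe

For kCenter, for instance, one would take ctx = k, decider = lambda r, W, k: k_center_decider(W, k, r), and a context_update that decreases k by the number of far points pruned.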
Remark 2.3.9. For the sake of simplicity of exposition we assume that f(W, Γ) ≠ 0. Since all our applications have f(W, Γ) ≠ 0, we choose to make this simplifying assumption. In particular, we later show in Corollary 2.3.16 that if initially f(W, Γ) ≠ 0, then ndpAlg preserves this property.

Remark 2.3.10. Note that the algorithm of Figure 2.1 can be modified to handle inputs where W is a multiset (namely, two points can occupy the same location), where we still assume (as done above) that f(W, Γ) ≠ 0. Specifically, it must be ensured that the distance computed in Line 2 is not zero, as this is required for the call to decider. This can be remedied (in linear time) by first grouping all the duplicates of the point sampled in Line 1 into a single weighted point (with the sum of the weights) before executing Line 2. This can be done by computing a net for a radius that is smaller than (say) half the smallest non-zero distance. How to compute this distance is described in Section 2.4.7.

Analysis

A net iteration of the algorithm is an iteration where net gets called. A prune iteration is one where delFar gets called. Note that the only other type of iteration is one where the algorithm returns. In the following, P_≤ℓ is defined analogously to P_<ℓ (see Section 2.2.3).

Lemma 2.3.11. Let P be a point set. A (2 + ε)ℓ-net of P, for any ε > 0, can contain at most half the points of P_≤ℓ.

Proof: Consider any point p in P_≤ℓ which became one of the net points. Since p ∈ P_≤ℓ, a ball of radius ℓ centered at p must contain another point q from P_≤ℓ (indeed, p ∈ P_≤ℓ only if its distance to its nearest neighbor in P is at most ℓ). Moreover, q cannot become a net point, since it is too close to p. Now, if we place a ball of radius ℓ centered at each point of P_≤ℓ which became a net point, then these balls are disjoint, because the pairwise distance between net points is ≥ (2 + ε)ℓ. Therefore, each point of P_≤ℓ which became a net point can charge to at least one point of P_≤ℓ which did not make it into the net, such that no point gets charged twice.

Lemma 2.3.12. Given an instance (W, Γ) of an NDP, the algorithm ndpAlg(W, Γ) runs in expected O(n) time.

Proof: In each iteration of the while loop, the only non-trivial work done is in computing ℓ_i, the two calls to decider, and the one call to either net or delFar. It has already been shown that all of these can be computed in O(|W_{i−1}|) time. Hence, the total running time of the algorithm is O(Σ_{i=0}^{k−1} |W_i|), where k denotes the last (incomplete) iteration of the while loop.

So consider the beginning of iteration i < k of the while loop. Let the points of W_{i−1} be labeled p_1, p_2, ..., p_m in increasing order of their nearest neighbor distance in W_{i−1}. Let j be the index of the point chosen in Line 1, and let (W_{i−1})_{≥j} and (W_{i−1})_{≤j} be the subsets of the points with index ≥ j and index ≤ j, respectively. Since a point is picked uniformly at random in Line 1, with probability ≥ 1/2 we have j ∈ [m/4, 3m/4]. Let us call this event a successful iteration. For a successful iteration, min(|(W_{i−1})_{≥j}|, |(W_{i−1})_{≤j}|) ≥ |W_{i−1}|/4.

Since i < k is not the last iteration of the while loop, either delFar or net must get called. If delFar(ℓ_i, W_{i−1}) gets called (i.e., Line 6), then by Lemma 2.2.6 all of (W_{i−1})_{≥j} gets removed. So suppose net gets called (i.e., Line 7). In this case, Lemma 2.3.11 implies that the call to net removes at least |(W_{i−1})_{≤j}|/2 points.

Therefore, for any iteration i < k, at least ν_i = min(|(W_{i−1})_{≥j}|, |(W_{i−1})_{≤j}|/2) points get removed. If an iteration is successful, then ν_i ≥ |W_{i−1}|/8. In particular, since an iteration is successful with probability ≥ 1/2, we have E[ν_i | |W_{i−1}|] ≥ |W_{i−1}|/16. Now, |W_i| ≤ |W_{i−1}| − ν_i, and as such E[|W_i| | |W_{i−1}|] ≤ (15/16)|W_{i−1}|. Therefore, for 0 < i < k,

E[|W_i|] = E[E[|W_i| | W_{i−1}]] ≤ E[(15/16)|W_{i−1}|] = (15/16) E[|W_{i−1}|].

Hence, by induction on i, E[|W_i|] ≤ (15/16)^i |W_0|, and so, in expectation, the running time is bounded by

O(E[Σ_{i=0}^{k−1} |W_i|]) = O(Σ_{i=0}^{k−1} E[|W_i|]) = O(Σ_{i=0}^{k−1} (15/16)^i |W_0|) = O(|W_0|).

Correctness. The formal proof of correctness is somewhat tedious, but here is the basic idea: at every iteration, either far points are being thrown away (and this does not affect the optimal value), or we net the points. However, the net radius being used is always significantly smaller than the optimal value, and throughout the algorithm's execution the radii of the nets being used grow exponentially. As such, the accumulated error in the end is proportional to the radius of the last net computation before termination, which itself is much smaller than the optimal value.

Before proving that ndpAlg returns a bounded spread interval containing f(W_0, Γ_0), several helper lemmas will be needed. For notational ease, in the rest of this section, we use f(W_i) as shorthand for f(W_i, Γ_i).

Lemma 2.3.13. Suppose that net is called in iteration i of the while loop (i.e., Line 7 in Figure 2.1p18). Then for any iteration j > i we have ℓ_j ≥ 3ℓ_i.

Proof: Consider the beginning of iteration j of the while loop. The current set of points, W_{j−1}, is a subset of the net points of a 3ℓ_i-net (it is a subset since Line 6 and Line 7 might have been executed in between iterations i and j). Therefore, being a net, the distance between any two points of W_{j−1} is ≥ 3ℓ_i, see Definition 2.2.1p14. In particular, this means that for any point p of W_{j−1}, we have d(p, W_{j−1}) ≥ 3ℓ_i.

Lemma 2.3.14. For i = 1, ..., k, we have |f(W_i) − f(W_0)| ≤ 9ℓ_i.

Proof: Let I be the set of indices of the net iterations up to (and including) the ith iteration, and let Ī be the set of iterations in which delFar gets called. If net was called in the jth iteration, then W_j is a 3ℓ_j-drift of W_{j−1}, and so by the Lipschitz property |f(W_j) − f(W_{j−1})| ≤ 6ℓ_j. On the other hand, if delFar gets called in the jth iteration, then f(W_j) = f(W_{j−1}) by the Prune property. Letting m = max I, we have

|f(W_i) − f(W_0)| ≤ Σ_{j=1}^{i} |f(W_j) − f(W_{j−1})| = Σ_{j∈I} |f(W_j) − f(W_{j−1})| + Σ_{j∈Ī} |f(W_j) − f(W_{j−1})| ≤ Σ_{j∈I} 6ℓ_j + Σ_{j∈Ī} 0 ≤ 6ℓ_m Σ_{j=0}^{∞} 1/3^j ≤ 9ℓ_m ≤ 9ℓ_i,

since by Lemma 2.3.13 the radii of successive net iterations grow by a factor of at least 3.
The following lemma testifies that the radii of the nets computed by the algorithm are always significantly smaller than the value being approximated.

Lemma 2.3.15. For any iteration i of the while loop in which net gets called, we have ℓ_i ≤ f(W_0)/η, where 0 < η = c_net − 9.

Proof: The proof is by induction. Let m_1, ..., m_t be the indices of the iterations of the while loop in which net gets called. For the base case, observe that in order for net to get called, we must have η ℓ_{m_1} < c_net ℓ_{m_1} < f(W_{m_1−1}). However, since this is the first iteration in which net is called, it must be that f(W_0) = f(W_{m_1−1}) (since in all previous iterations delFar was called). So now suppose that ℓ_{m_j} ≤ f(W_0)/η for all m_j < m_i. If a call to net is made in iteration m_i, then again c_net ℓ_{m_i} < f(W_{m_i−1}) = f(W_{m_{i−1}}). Thus, by Lemma 2.3.14 and induction, we have