ON THE GEOMETRIC SEPARABILITY OF BICHROMATIC POINT SETS

by

Bogdan Andrei Armaselu

APPROVED BY SUPERVISORY COMMITTEE:

Ovidiu Daescu, Chair

Benjamin Raichel

B. Prabhakaran

Xiaohu Guo

Copyright © 2017

Bogdan Andrei Armaselu

All rights reserved

I dedicate this dissertation to my family

ON THE GEOMETRIC SEPARABILITY OF BICHROMATIC POINT SETS

by

BOGDAN ANDREI ARMASELU, BS, MS

DISSERTATION

Presented to the Faculty of

The University of Texas at Dallas

in Partial Fulfillment

of the Requirements

for the Degree of

DOCTOR OF PHILOSOPHY IN

COMPUTER SCIENCE

THE UNIVERSITY OF TEXAS AT DALLAS

August 2017

ACKNOWLEDGMENTS

I would first like to thank my advisor, Dr. Ovidiu Daescu, for helping me towards my goal of completing my PhD, and also for his advice in writing this dissertation. Also, I would like to thank my family and friends for supporting me and making me believe in my dream of earning my PhD.

June 2017

ON THE GEOMETRIC SEPARABILITY OF BICHROMATIC POINT SETS

Bogdan Andrei Armaselu, PhD

The University of Texas at Dallas, 2017

Supervising Professor: Ovidiu Daescu, Chair

Consider two sets of points in the two- or three-dimensional space, namely, a set R of n

“red” points and a set B of m “blue” points. A separator of the point sets R and B is a geometric locus that encloses all red points and contains the fewest possible blue points. In this dissertation, we study the separability of these two point sets using various separators, such as circles, axis-aligned rectangles, or arbitrarily oriented rectangles. If there are infinitely many such separators, we consider optimality criteria such as minimizing the radius (for circles) or maximizing the area (for rectangles). We first give an overview of geometric separability and the related work. Then, we study the circular separation problem and present three dynamic data structures that allow insertions and deletions of blue points, as well as reporting an optimal circle after such an insertion or deletion. The

first is a unified data structure that supports both insertions and deletions and has near-linear query and update time. The other two data structures have logarithmic query time and near-quadratic update time. One of them allows only insertions and the other supports only deletions of blue points. These are the first algorithms for the dynamic circular separation problem. After that, we introduce the rectangular separation problem and focus on the axis-aligned case (that is, the target rectangle has to be axis-aligned). We prove that the number of optimal solutions can be Ω(m) in the worst case, present an algorithm to find one optimal solution that has near-linear running time, and then prove a matching lower bound

for finding one optimal solution. We also introduce a number of extensions of the rectangular separation problem. Specifically, we consider the case when a fixed number of blue points are allowed inside the separating rectangle and the case where the blue points are replaced by axis-aligned rectangles. Finally, we conclude by discussing ongoing work and giving possible future directions for geometric separability.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1  INTRODUCTION
    1.1 Related work on geometric separability
    1.2 Our contributions
CHAPTER 2  PRELIMINARIES
    2.1 Convex hulls
    2.2 Voronoi diagrams
        2.2.1 Farthest-point Voronoi diagrams
    2.3 Monotone matrices
CHAPTER 3  MINIMUM BICHROMATIC SEPARATING CIRCLE PROBLEM
    3.1 Introduction
        3.1.1 Our results
    3.2 Preliminaries
    3.3 Unified data structure for insertions and deletions
    3.4 Logarithmic query time for insertions
    3.5 Logarithmic query time for deletions
    3.6 Implementation and Experiments
    3.7 Conclusion and future work
CHAPTER 4  MAXIMUM AREA BICHROMATIC SEPARATING RECTANGLE PROBLEM
    4.1 Introduction
        4.1.1 Related work
        4.1.2 Our Results
    4.2 Preliminaries
    4.3 Finding all optimal solutions
    4.4 Finding one optimal solution
        4.4.1 Case 1
        4.4.2 Case 2
        4.4.3 Case 3
        4.4.4 Algorithm
    4.5 Lower bound
    4.6 Conclusion and future work
CHAPTER 5  EXTENSIONS OF THE MAXIMUM SEPARATING RECTANGLE PROBLEM
    5.1 Introduction
        5.1.1 Related work
        5.1.2 Our contributions
        5.1.3 Preliminaries
    5.2 Blue rectangles version
    5.3 Outliers version
    5.4 Conclusion and future work
CHAPTER 6  CONCLUSIONS
REFERENCES
BIOGRAPHICAL SKETCH
CURRICULUM VITAE

LIST OF FIGURES

1.1 A triangulation of a simple polygon is shown.
1.2 The closest pair of points from a given planar set are shown linked together.
1.3 The Delaunay triangulation of a simple polygon is shown.
1.4 An example of a Google map is shown, with marked points of interest such as cities, universities, and (main) road junctions. Between any two points of interest, an edge is shown only if there is a direct route between them (that does not cross other edges).
1.5 The graph corresponding to the road network in Figure 1.4 is shown.
1.6 Tissue containing tumor cells (marked in red) as well as healthy cells (marked in blue). The smallest circle separating the red cells from the blue cells is computed for surgical or radiational treatment purposes.
1.7 An arbitrary separator of a red dataset and a blue dataset is illustrated, with red points shown in solid dots and blue points shown in empty dots.
1.8 A linear SVM classifier between two data sets is shown. The positive instances are denoted by solid red dots and the negative instances are denoted by empty blue dots.
1.9 In this digital circuit, components already on the board are marked by blue dots. The goal is to find the largest empty rectangle in order to place a new rectangular component that needs as much space as possible (such as a processor).
2.1 The convex hull diagram CH(P) of a planar point set P is shown.
2.2 The Voronoi diagram VD(P) of a planar point set P is shown.
2.3 The farthest-point Voronoi diagram FVD(P) of a planar point set P is shown in solid lines. The convex hull CH(P) is shown in dashed lines.
2.4 Double-staircase matrix, with the defined region denoted by shading.
2.5 Inverse double-staircase matrix, with the defined region denoted by shading.
2.6 Rising single-staircase matrix.
2.7 Falling single-staircase matrix.
3.1 Minimum separating circle of a set R of red points and a set B of blue points.
3.2 The farthest-point Voronoi diagram FVD(R) of the red points is shown, along with minimum separating circle C, which is centered at O, and the minimum enclosing circle MEC, centered at O_C. Blue point p is inside MEC but not inside C.
3.3 Left: The blue point p displayed as an empty circle is inserted; Right: The blue point p is deleted. In both cases, the old minimum separating circle C and the new minimum separating circle C′ are displayed.
3.4 e is an edge of FVD(R) defined by r_i, r_j ∈ R. q ∈ e is an enter event point and s ∈ e is an exit event point.
3.5 The data structure for insertion and deletion.
3.6 Left: Insertion Case 2: s < q_e^* and s is an exit event point; Right: Insertion Case 3: q_e^* < s and s is an enter event point.
3.7 Left: Insertion Case 4: s < q_e^* and s is an enter event point; Right: Insertion Case 5: q_e^* < s and s is an exit event point.
3.8 Left: Deletion Case 2: q_e^* < s and s is an exit event point; Right: Deletion Case 3: s < q_e^* and s is an enter event point.
3.9 Left: Deletion Case 4: q < s and s is an enter event point; Right: Deletion Case 5: s < q and s is an exit event point.
3.10 The data structure T.
3.11 The Java Applet user interface for computing the minimum separating circle.
3.12 An example of output for points specified by user.
3.13 An example of output for a random dataset.
3.14 User inserts the indicated blue point, and the MBSC is updated dynamically (in this case it stays the same).
3.15 User deletes the indicated blue point, and the MBSC is re-computed using the dynamic data structure (in this case it changes).
3.16 The running time to compute FVD(R) for n between 10 and 100000.
3.17 The running time to compute an MBSC using the static algorithm, for n = 1000 and m between 10 and 1000000.
3.18 The running time to compute an MBSC using the static algorithm, for m = 1000 and n between 10 and 100000.
3.19 The running time to compute an MBSC after a query, for n = 1000 and m between 10 and 1000000.
3.20 The running time to compute an MBSC after a query, for m = 1000 and n between 10 and 100000.
3.21 The running time to update the data structure, for n = 1000 and m between 10 and 1000000.
3.22 The running time to update the data structure, for m = 1000 and n between 10 and 100000.
4.1 Red points in R are shown in solid circles and blue points in B are shown in empty circles. The minimum enclosing rectangle Smin of all red points and the maximum separating rectangle S* are displayed.
4.2 For a set of points P and a bounding rectangle A, the largest P-empty axis-aligned rectangle contained in A is shown, denoted by S*.
4.3 Largest P-empty rectangle of arbitrary orientation bounded by a rectangle A is shown, for a set of points P.
4.4 For a set P of planar points and a bounding rectangle A, the largest P-empty rectangle S* inside A that contains only query point q is displayed.
4.5 Given rectilinear polygon P and a point p ∈ P, the shaded square S* is the largest square inside P containing only p.
4.6 Given bounding box R and its R-tree hierarchical representation with smaller boxes R1, R2, R3, R4, with input points being corners of R1, R2, R3, R4, the largest empty rectangle containing only query point q, denoted S*, is shown with interrupted lines.
4.7 Given 4 sets of points, each denoted by a different kind of filled or empty circles or squares, the rectangle S* shown is the smallest that contains points from all sets, and the strip W* shown is the narrowest that contains points from all sets.
4.8 The minimum enclosing rectangle Smin, the rectangle Smax which bounds the solution space, and the subsets BNE, BNW, BSW, BSE.
4.9 The set of candidate points defining a solution. The sets BNE, BNW, BSW, BSE are ordered by X, then by Y, and form a staircase.
4.10 There are 4 red points directly above, below, to the right and to the left of the origin O. The blue points are p, q in BNE, and m − 2 other points in BSW. All rectangles enclose R and have the same area x0y0, thus giving Ω(m) maximum rectangles.
4.11 The top, right, and bottom supports lie to the right of Smin and the left support lies to the left of Smin. For any such top-right pair (topk, rightk), there is a unique bottom support bottomk to the right of Smin and a unique left support leftk to the left of Smin.
4.12 Each support is from a different quadrant, with topk ∈ BNE. For any such top-right pair (topk, rightk), there is a unique bottom support bottomk ∈ BSW and a unique left support leftk ∈ BNW.
4.13 For each top-right pair (topk, rightk) ∈ BNE^2, there are multiple bottom-left pairs (bottomk, leftk) ∈ BSW^2. However, they have to lie above p (if p exists), and to the right of q (if q exists). For the next top-right pair (topk' = rightk, rightk') ∈ BNE^2, the bottom-left pairs have to lie above p' (if p' exists) and to the right of q' (if q' exists). Note that p' is after p in BSE and q' is before q in BNW.
4.14 Staircase matrix M. The defined portion is marked by X's.
4.15 M is padded on the left of the defined portion with 0's, and on the right of the defined portion with negative numbers in decreasing order.
4.16 The pointers above, below, left and right.
4.17 R consists of the origin O(0, 0) and four points sE, sN, sW, sS. For each ai ∈ A, there are two blue points pi, qi of opposite coordinates. Each tuple (pi, pj, qi, qj), 1 ≤ i, j ≤ m, defines a candidate rectangle.
5.1 The lines defining Smin partition the plane into 9 regions: Smin, 4 quadrants NE, NW, SW, SE, and 4 side regions E, N, W, S. By sliding Smin outwards in each side region until it hits a blue point, we obtain a rectangle Smax.
5.2 Largest rectangle S* containing all red points and avoiding all blue rectangles is shown.
5.3 For every rectangle r intersecting a side region, consider a blue point on the edge of r that is closest to Smin. For every rectangle r contained in a quadrant, consider a blue point as the corner of r that is closest to Smin. Denote the set of resulting points by B' and solve the original problem on R and B'.
5.4 Given red and blue points, and an integer k, the goal is to find the largest rectangle that contains all red points and at most k "outlier" blue points.
5.5 The staircase ST3(NE) for an example where kNE = 3.
5.6 If there is no point in T above p and below qt, then p is added to STt(NE) and qt is set to p.
5.7 If there is a point q ∈ T above p and below qt, then r is added to STt(NE) and qt is set to r.
5.8 The set P with STt(NE) = ST0(P).

LIST OF TABLES

3.1 Running time of our MBSC implementation, for the static version, insertion query, deletion query, and updates, for n = 10 and different values of m.
3.2 Running time of our MBSC implementation for n = 100 and various m.
3.3 Running time of our MBSC implementation for n = 1000 and various m.
3.4 Running time of our MBSC implementation for m = 1000 and various n.

CHAPTER 1

INTRODUCTION

The notion of computational geometry was introduced by Shamos in 1975 (Shamos, 1975). In computational geometry, the focus is on efficiently processing and solving geometric optimization problems on large sets of geometric objects, such as points, lines, circles, and polygons. For instance, one is interested in triangulating a polygon of n vertices, that is, partitioning the polygon into non-overlapping triangles (see Figure 1.1 for an illustration). Many algorithms have been proposed for polygon triangulation, including ear clipping in O(n^2) time (ElGindy, 1993), monotone polygon decomposition in O(n log n) time (deBerg, 2000), and a more complex O(n) time algorithm by Chazelle (Chazelle, 1991). Since Chazelle's algorithm is a lot more complex and harder to implement, monotone polygon decomposition is more commonly implemented. Another important problem is computing the convex hull of a point set, which is the smallest convex set enclosing all points in the set. The best-known algorithms for the planar case are Graham's Scan (Graham, 1972) and the divide-and-conquer approach (Hong, 1977), which both run in O(n log n) time. When the size h of the hull is small relative to n, there are better algorithms by Jarvis, with O(nh) time (Jarvis, 1973), and Chan, with O(n log h) time (Chan, 1996), for the planar case. In addition, Chan also has an O(n log h) time algorithm for the 3D case (Chan, 1996). For higher dimensions, Chazelle proved that d-dimensional convex hulls of n points have complexity O(n^{⌊d/2⌋}). He also gave an algorithm to compute the convex hull of n given points in O(n log n + n^{⌊d/2⌋}) time (Chazelle, 1993).

Figure 1.1. A triangulation of a simple polygon is shown.

The concept of Voronoi diagram is of particular interest to us in this dissertation. The standard Voronoi diagram of a point set partitions the plane (or space) into regions, each corresponding to its closest given point, according to the Euclidean distance measure. Several other variations exist for higher dimensions, for various distance metrics, and for the k-th closest point(s), as well as the farthest point. In this dissertation, in particular, we are using the concept of Farthest-Point Voronoi diagram, which divides the plane (space) into regions corresponding to its farthest given point, according to the Euclidean distance. A well-known geometric optimization problem is finding the closest pair of points among a given set of n points (see

Figure 1.2). Shamos and Hoey presented an algorithm to find the closest pair of points, given a set of n planar points, which runs in O(n log n) time (Hoey, 1975). Later, Khuller and

Matias proposed a randomized algorithm for the d-dimensional case whose running time is linear in n but exponential in d (Khuller, 1995). An optimization problem involving triangulations is computing the Delaunay triangulation, in which no vertex is inside the circumcircle of any triangle, and which maximizes the smallest angle over all triangles (as shown in Figure 1.3).

The currently best-known algorithm for Delaunay triangulation is by Guibas (Guibas, 1985) and runs in O(n log n) time.
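For readers who want to experiment with these structures, the short sketch below (not part of the dissertation) computes a Delaunay triangulation of a few arbitrary points using SciPy's wrapper around the Qhull library; the point coordinates are made up.

    # Minimal illustration (not from the dissertation): a Delaunay triangulation
    # of random planar points via SciPy's Qhull wrapper.
    import numpy as np
    from scipy.spatial import Delaunay

    rng = np.random.default_rng(0)
    points = rng.random((20, 2))          # 20 arbitrary points in the unit square

    tri = Delaunay(points)
    print("number of triangles:", len(tri.simplices))
    for a, b, c in tri.simplices[:3]:     # each row holds the indices of one triangle
        print(points[a], points[b], points[c])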

The importance of convex hulls and Voronoi diagrams will be revealed in the next chapter, along with more details.

Computational geometry has applications in quite a few important fields, such as Geospatial Information Systems (GIS), data science, machine learning, medical imaging, and digital circuit design. In GIS, one is interested in data structures that store relevant information about maps or overlays of multiple maps. For instance, given a map of a city, which provides information about its streets and points of interest, one could use GIS to store the map as a graph in which edges are streets and vertices are intersections and/or points of interest. See Figures 1.4 and 1.5 for an illustration of how GIS would be used.

Figure 1.2. The closest pair of points from a given planar set are shown linked together.

Figure 1.3. The Delaunay triangulation of a simple polygon is shown.

Medical imaging is a very interesting application of computational geometry. When looking at medical images for the purpose of diagnosis, pathologists consider certain aspects of an image (such as color, shape, texture, density, etc.), which indicate the presence of elements that help establish the diagnosis. For example, an osteosarcoma (bone cancer) pathologist may look at an H&E-stained whole slide image and identify cancer cells based on their color, shape, or clustering pattern. Based on features such as cell clustering or density, a pathologist may decide the diagnosis (whether it is cancer or not). Oftentimes, it is necessary to separate a tumor from the healthy tissue, for the purpose of surgery or radiation treatment.

Figure 1.4. An example of a Google map is shown, with marked points of interest such as cities, universities, and (main) road junctions. Between any two points of interest, an edge is shown only if there is a direct route between them (that does not cross other edges).

Figure 1.5. The graph corresponding to the road network in Figure 1.4 is shown.

Figure 1.6. Tissue containing tumor cells (marked in red) as well as healthy cells (marked in blue). The smallest circle separating the red cells from the blue cells is computed for surgical or radiational treatment purposes.

In the whole-slide image showing the tumor, the pathologist may mark the tumor cells with dots of a certain color (e.g., red) and the healthy cells with another color (e.g., blue). The problem of extracting such a tumor leads us to geometric optimization problems, such as computing the smallest enclosing circle of the red points, or the smallest circle separating the red points from the blue points. An example of a tumor with red and blue annotations is presented in Figure 1.6.
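To make the smallest-enclosing-circle computation mentioned above concrete, here is a brute-force sketch (not the method used in this dissertation, and only practical for small inputs): it uses the fact that the minimum enclosing circle is determined by two or three of the input points. The sample coordinates are hypothetical.

    # Brute-force illustration: the minimum enclosing circle of a small point set
    # is determined by 2 points (as a diameter) or by 3 points (circumcircle).
    from itertools import combinations
    from math import hypot

    def circle_two(p, q):
        cx, cy = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
        return (cx, cy), hypot(p[0] - cx, p[1] - cy)

    def circle_three(p, q, r):
        # Circumcircle; returns None for (near-)collinear triples.
        ax, ay, bx, by, cx, cy = *p, *q, *r
        d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
        if abs(d) < 1e-12:
            return None
        ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
              + (cx**2 + cy**2) * (ay - by)) / d
        uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
              + (cx**2 + cy**2) * (bx - ax)) / d
        return (ux, uy), hypot(ax - ux, ay - uy)

    def encloses(center, radius, pts, eps=1e-9):
        return all(hypot(x - center[0], y - center[1]) <= radius + eps for x, y in pts)

    def min_enclosing_circle(pts):
        candidates = [circle_two(p, q) for p, q in combinations(pts, 2)]
        candidates += [c for t in combinations(pts, 3) if (c := circle_three(*t))]
        best = None
        for center, radius in candidates:
            if encloses(center, radius, pts) and (best is None or radius < best[1]):
                best = (center, radius)
        return best

    red = [(0, 0), (4, 0), (1, 3), (2, 1)]     # hypothetical "red" cells
    print(min_enclosing_circle(red))            # center (2.0, 1.0), radius ~2.236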

In data science and machine learning, one could be interested in separating two or more given datasets (represented by sets of colored points), for the purpose of classification, or in efficiently calculating statistics over the data (e.g. mean, variance). See Figures 1.7 and 1.8 for examples of data science and machine learning applications.

In digital circuit design, an important aspect is finding the largest rectangular circuit portion that is free of circuit elements, typically for the goal of placing other rectangular components (Figure 1.9). This leads to the geometric optimization topic of finding the largest empty rectangle, which could be among points, line segments, polygons, or other shapes. If we require the circuit to contain specified points, then our circuit design problem reduces to

finding the largest rectangle separating the red points from the blue points.

Figure 1.7. An arbitrary separator of a red dataset and a blue dataset is illustrated, with red points shown in solid dots and blue points shown in empty dots.

Figure 1.8. A linear SVM classifier between two data sets is shown. The positive instances are denoted by solid red dots and the negative instances are denoted by empty blue dots.

Figure 1.9. In this digital circuit, components already on the board are marked by blue dots. The goal is to find the largest empty rectangle in order to place a new rectangular component that needs as much space as possible (such as a processor).

An important topic in computational geometry is the separability of point sets, which we focus on in this dissertation. Consider two finite sets of points in R^d, a red set R and a blue set B, of size |R| = n and |B| = m, respectively. For a family F of curves in R^d, we say that R and B are F-separable if there exists a curve f ∈ F such that each connected component into which f partitions the space R^d contains only red points or only blue points, but not both. The curve f is called a separator for the sets R and B. As mentioned earlier, geometric separability has applications in many fields, such as tumor extraction based on medical imaging, machine learning, data science, and circuit design.
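For the simplest family F, that of lines and hyperplanes, the separability test can be phrased as a linear-programming feasibility problem, as discussed in the next section. The sketch below is only an illustration of that reduction with made-up sample points; it is not the formulation used in the cited algorithms.

    # Illustrative check: strict linear separability of two point sets phrased as
    # an LP feasibility problem and solved with SciPy.  Sample coordinates are
    # made up; this is not the dissertation's code.
    import numpy as np
    from scipy.optimize import linprog

    def linearly_separable(red, blue):
        """Return (separable, (w, b)) with w.x + b <= -1 on red and >= +1 on blue."""
        red, blue = np.asarray(red, float), np.asarray(blue, float)
        d = red.shape[1]
        # Variables: x = (w_1, ..., w_d, b).  Each point contributes one inequality.
        A_red = np.hstack([red, np.ones((len(red), 1))])        #  w.r + b <= -1
        A_blue = -np.hstack([blue, np.ones((len(blue), 1))])    # -(w.q + b) <= -1
        A_ub = np.vstack([A_red, A_blue])
        b_ub = -np.ones(len(red) + len(blue))
        res = linprog(c=np.zeros(d + 1), A_ub=A_ub, b_ub=b_ub,
                      bounds=[(None, None)] * (d + 1))
        if not res.success:
            return False, None
        return True, (res.x[:d], res.x[d])

    red = [(0, 0), (1, 0), (0, 1)]
    blue = [(3, 3), (4, 2), (2, 4)]
    print(linearly_separable(red, blue))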

1.1 Related work on geometric separability

Separability problems in computational geometry and related fields have been extensively studied in the past few decades. Various types of separators have been considered, such as lines (Aronov, 2012; Chan, 2005; Demaine, 2005; Dobkin, 1985; Hurtado, 2004; Megiddo, 1983), circles (Bereg, 2015; Bitner, 2010; Urrutia, 1995; Boissonnat, 2000; Cheung, 2010; Zivanic, 2011; Fisk, 1986; Kosaraju, 1986), wedges (Arkin, 2006; Demaine, 2005; Hurtado, 2001, 2005), strips (Agarwal, 2006; Arkin, 2006; Demaine, 2005; Hurtado, 2001, 2004), and even polygons (Fekete, 1992). Denote by n the number of red points and by m the number of blue points. From now on, unless otherwise specified, we will assume that the input sets for all separability problems that we mention are comprised of points. The simplest and best-known type of geometric separability is linear separability, in which the separator is a line (or a hyperplane, in higher-dimensional spaces). Linear separability is known to be reducible to linear programming, which can be solved using Megiddo's algorithms in linear time for fixed dimension (Megiddo, 1983, 1984). There is also work done on linear separability with violations (or misclassified points), to handle the case when the sets of points are not linearly separable. For point sets of total size

N = m + n in two and three dimensions, with k misclassifications allowed, Chan (Chan, 2005) gave a solution relying on linear programming with violations, which has O((N + k^2) log n)

expected space and O(N + k^{11/4} N^{1/4}) expected time requirements.

Alternatively, if minimizing the number of outliers is sought, one could use one of Aronov's algorithms (Aronov, 2012). Aronov et al. defined four metrics for classification error and provided algorithms for computing the optimal separator for each metric (that is, the separator minimizing the classification error under the specified metric). For the case when the error metric is the number of outliers, which is most related to our problem, Aronov's solution runs in O(N^d) time, where d ≥ 2 is the dimension of the space.

For linear separability of polyhedra, Dobkin et al (Dobkin, 1985) presented a linear-time

solution based on a hierarchical data structure that stores the polyhedra.

In the decision version of the circular separability problem, the aim is to decide whether

or not the two point sets can be exactly separated by a circle (with no misclassifications

allowed). This version was first solved by Megiddo et al. (Kosaraju, 1986), who gave an

algorithm that runs in time O(N) by using linear programming. They also proved that a

smallest exact separating circle can be found in O(N) time, while the largest separating

circle can be computed in O(N log N) optimal time.

In the optimization version, the goal is to find the separating circle having the smallest radius, with no misclassifications allowed. This optimal circular separability problem was first studied by Fisk (Fisk, 1986), who provided an algorithmic solution that is based on nearest-point and farthest-point Voronoi diagrams and takes quadratic time and space.

Later, Kosaraju et al. improved this result to linear time and space (Kosaraju, 1986), by reducing the R^2 circular separability problem to a linear separability problem in three dimensions.

Boissonnat and Urrutia et al. (Urrutia, 1995) considered the circular separability of sets of line segments, and gave an O(Nα(N) log N) time algorithm to compute the largest

separating circle. Here α(N) is the very slowly growing inverse of the Ackermann function. For circular separability of polygons, Boissonnat also gave a linear time solution to decide whether a separating circle exists (Boissonnat, 2000). The minimum bi-chromatic separating circle problem (MBSC) was introduced by Bitner, Cheung, and Daescu (Bitner, 2010). In the MBSC problem, the goal is to compute the smallest circle containing all the red points and the minimum number of blue points (called the smallest separating circle). Bitner, Cheung, and Daescu presented two algorithms for finding the smallest separating circle, based on the farthest-point Voronoi diagram of the red point set (FVD(R)). The first algorithm relies on sweeping FVD(R) and requires O(mn log m + n log n) time and O(m + n) space. The second algorithm uses circular range queries and takes O((m + n) log n + m^{1.5} log^{O(1)} m) time and O(m^{1.5} log^{O(1)} m) space. They also show how to find the largest bi-chromatic separating circle in O(m(m + n) log(m + n)) time and O(m + n) space. Cheung and Daescu (Cheung, 2010) presented a linear programming solution for the problem, which takes O(N + k^{11/4} N^{1/4} log^{O(1)} N) time in expectation, where k is the number of blue points in the optimal solution. In the Kinetic MBSC problem, some points can move along linear trajectories, and the goal is to find the locus of the optimal solution over time. The Kinetic MBSC problem was introduced and solved by Cheung, Zivanic, and Daescu (Zivanic, 2011). They consider the cases with only one mobile red point, as well as with only one mobile blue point. For the former case, they prove that the geometric locus of the center of the minimum separating circle has complexity O(m^2 n) and provide an O(m^2 n log m) time algorithm to compute it. For the latter, they show that the geometric locus of the center has complexity O(m^2 n^{2+ε}) and can be computed in O(m^2 n^{2+ε} log(mn) + m n^{3+ε}) time, where ε > 0 is an arbitrarily small constant. Barbay et al. (Barbay, 2014) considered the following related problem, called the maximum-weight box (MWB) problem. Given a set of points in d-dimensional space, each with a real-valued weight, the goal is to find an axis-aligned box maximizing the total weight of the

points it contains. They first give an algorithm for the d = 2 case, which runs in O(N^2) time, and then prove that MWB can be solved in O(N^d) time for any d ≥ 2. In addition, they show that, if there are n points with positive weights and m points with negative weights, MWB can be solved in O(n min{m, n}) time. Recently, Daescu and Bereg et al. (Bereg, 2015) studied a problem similar to the MWB problem for d = 2, in which the goal is to find the circle of smallest radius that maximizes the total weight of the points it contains. The authors describe an algorithm that runs in polynomial time O(N^p), where p depends on the number of points with positive and negative weight. They also consider a version in which the circle is restricted to be centered on a given line and provide an algorithm with running time and space bounds of O(n(m + n) log(m + n)) and O(m + n), respectively. Finally, they prove that, if the target circle must contain all points with positive weights, it can be computed in O((m + n) log(m + n)) time. There are also results on bi-chromatic separability for more general types of separators. Hurtado et al. showed that separability of two planar point sets by means of a strip, wedge, or double wedge can be decided in O(N log N) time (Hurtado, 2001, 2004). In a follow-up paper, they studied slice separability, wedge separability, and double wedge separability for three-dimensional point sets (Hurtado, 2005). They also considered diwedge separability and showed how to solve the decision problem in O(N^4) time and actually compute a separator in O(N^4 log N) time. For prismatic, pyramidal, and dipyramidal separability, they prove that the decision problems can be solved, respectively, in O(N^3), O(N^7), and O(N^8 log N) time. Later, Arkin et al. (Arkin, 2006) proved a lower bound of Ω(N log N) for strip and wedge separability, which makes Hurtado's algorithms optimal within a multiplicative constant. Fekete (Fekete, 1992) proved that it is NP-hard to find a convex polygon that separates two planar point sets and has the minimum number of vertices. Agarwal et al. (Agarwal, 2006) described a near-linear time algorithm to decide whether two sets of points are separable by a prism, as well as near-quadratic time solutions for separability by a slab or wedge.

Demaine et al. (Demaine, 2005) studied the separability of sets of points that lie inside a simple polygon, called the bounding polygon. They consider different objects to separate the point sets, such as line segments, chords, and multi-chords. They show that separability by means of line segments and chords can be decided in O(N log N) time, and also prove matching lower bounds. For multi-chord separability, they provide an algorithm that runs in O(N^5) time. They also show that minimizing the number of chords needed to separate point sets inside a polygon is NP-hard.

Recently, Bandyapadhyay et al. considered a more general class of problems, called “bi-chromatic problems”, which includes any problem involving two-color point sets (Bandyapadhyay, 2017). They design efficient polynomial-time algorithms for a number of these problems. Specifically, they consider the maximum red rectangle problem, which asks for the arbitrarily-oriented rectangle containing no blue points and the maximum number of red points. For the maximum red rectangle problem, they give an algorithm that runs in

O(m^2 (m + n) log(m + n) + n^2) time and O(n^2 + m^2) space. They also study the maximum coloring problem, in which we are given a set of pairs of points, and the goal is to color every point such that

(1) each pair consists of one red and one blue point and

(2) the maximum number of red points inside a halfplane, for all halfplanes that contain no blue points, is maximized.

For the maximum coloring problem, the authors provide an algorithm that takes O(n^{4/3+ε} log n) time.

Problems related to geometric separability include finding the largest geometric locus

that is empty with respect to the input set, as well as the smallest geometric locus that

contains all or part of the input set. The smallest enclosing circle problem, which asks for

the smallest circle containing all the n given points, was solved by Megiddo (Megiddo, 1983)

in O(N) time using linear programming.

The query version, in which the center of the circle must lie on a query line segment, was solved by Nandy et al. (Das, 2009). They get O(log^2 N) query time with O(N log N) time, O(N) space pre-processing. Currently, the best solution is the one by Bose (Bose, 2008), namely O(log N) query time with O(N log N) time, O(N) space pre-processing. Another related problem is the k-enclosing circle, introduced by Matousek (Matousek, 1995), in which the aim is to find the smallest circle containing at least k out of N input points. Matousek provided an algorithm that takes O(N log N + Nk) time and O(Nk) space (Matousek, 1995). To date, the best known result for the k-enclosing circle problem is by Har-Peled et al. (Har-Peled, 1985), namely O(Nk) time and O(N + k^2) space. The problem of computing the largest empty circle (or sphere) has also attracted a great deal of interest in the literature. The planar case was solved by Toussaint (Toussaint, 1983), who gave an approach that takes O(N log N) optimal time to find the largest empty circle C among n points, with the restriction that C has to be centered inside the convex hull of the points. Toussaint also addressed the case where C is constrained to be centered inside an arbitrary simple polygon P, and solved it in O(N log N + k log N) time, where k = O(N^2) is the number of intersections between P and the Voronoi diagram of the vertices of P (Toussaint, 1983). Preparata and Shamos independently discovered an O(N log N) time algorithm for the version in which the target circle is centered inside the convex hull of the points (Preparata, 1985). More recently, Augustine et al. (Augustine, 2010) solved the query version, in which the center of the circle has to lie on a query line, and designed a data structure that allows answering queries in O(log N) time with O(N^3 log n) time, O(N^3) space for pre-processing. They also consider a more restricted version, in which the query line has to pass through a fixed point, and design a data structure to handle queries in O(log N) time after O(N α(N)^{O(α(N))} log N) pre-processing time and space. Finally, for the case when the query line is restricted to be horizontal, their data structure can be used to answer queries in O(log N) time after O(N α(N) log N) pre-processing time and space. Here α(N) denotes the very slow-growing inverse of the Ackermann function.
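As a rough illustration of the largest-empty-circle problem (restricted to centers inside the convex hull), the sketch below evaluates only the Voronoi vertices of the input as candidate centers; the full algorithm also considers intersections of Voronoi edges with the hull boundary, which this simplification omits. The sample points are arbitrary.

    # Simplified sketch (illustration only): candidate centers for the largest
    # empty circle, restricted to the convex hull, include the Voronoi vertices
    # of the input points.  Edge/hull-boundary intersections are omitted here.
    import numpy as np
    from scipy.spatial import Voronoi, Delaunay

    def largest_empty_circle_sketch(points):
        points = np.asarray(points, float)
        hull_test = Delaunay(points)             # find_simplex >= 0  <=>  inside hull
        vor = Voronoi(points)
        best = None
        for v in vor.vertices:
            if hull_test.find_simplex(v) < 0:    # skip centers outside the hull
                continue
            r = np.min(np.linalg.norm(points - v, axis=1))
            if best is None or r > best[1]:
                best = (tuple(v), r)
        return best

    pts = np.random.default_rng(1).random((30, 2))   # arbitrary sample points
    print(largest_empty_circle_sketch(pts))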

In addition to the largest empty circle, there is significant work done on finding the largest empty rectangle among points (Hsu, 1984; Chazelle, 1986; Aggarwal, 1987; Mukhopadhyay, 2003; Chaudhuri, 2003; Nandy, 1998; Ray, 1990; Datta, 2000) and line segments (Sinha, 1994), as well as inside convex polygons (Snoeyink, 1995) and rectilinear polygons (McKenna, 1985). These results will be discussed in Chapter 4. Finally, there is work done on different versions of the problem of computing the largest empty polygon, such as orthoconvex polygons (Nandy, 2008; Ramkumar, 1990). We will discuss these results in Chapter 5.

1.2 Our contributions

First, we study the dynamic version of the minimum bi-chromatic separating circle problem, in which blue points can be dynamically inserted or removed, and we need data structures to efficiently report a new optimal solution after each operation. We present three data structures, each having a different trade-off, depending on the values of m and n. Namely, our first data structure supports both insertions and deletions of blue points and allows reporting an optimal circle after an insertion in O(n + log m) time, and after a deletion in O(n log m) time. The data structure can be updated in O((m + n) log m) time and the space requirement for this data structure is O(m + n). Our second data structure supports only insertions. It allows finding an optimal solution in O(log(mn)) time after an insertion, can be updated in O(mn log(mn)) time, and uses O(mn log(mn)) space. Finally, the third data structure allows only deletions. It needs O(mn log(mn)) space, allows finding an optimal solution in O(log^2(mn)) time after a deletion, and can be updated in O(mn log(mn)) time. Second, we consider the maximum area bi-chromatic separating rectangle problem, in which the goal is to find the largest axis-aligned rectangle that contains all red points and the fewest blue points. We first show that there are Ω(m) optimal solutions in the worst case. Then, we show how to find all optimal solutions in O(m^2) time. After that, we give an

algorithm to find one optimal solution in O(m log m + n) time and O(m + n) space, which reduces to O(m + n) time if the blue points are pre-sorted either by their X or by their Y coordinates. Finally, we prove a matching lower bound of Ω(m log m + n) time to find one optimal solution, provided that the points are not pre-sorted. Last, we consider extensions of the maximum separating rectangle problem. For the blue rectangles version, in which the blue points are replaced by blue rectangles, we give an O(m log m + n) time, O(m + n) space algorithm. After that, we study the outliers version, in which there is a fixed number k of “outliers” (that is, blue points that are allowed to be inside the target rectangle). For the outliers version, our algorithm takes O(k^7 m log m + n) time and O(m + n) space. The dissertation is organized as follows. In Chapter 2, we introduce some computational geometry concepts that we consider to be useful for our algorithms. In Chapter 3, we study the Dynamic MBSC problem and present our three dynamic data structures. In Chapter 4, we consider the maximum separating rectangle problem, and present our algorithms and the lower bounds. In Chapter 5, we consider extensions to the maximum separating rectangle problem, and describe our algorithms. Finally, in Chapter 6, we discuss our results, draw some conclusions and remarks, and list a few possible future directions.

CHAPTER 2

PRELIMINARIES

We discuss some important concepts in computational geometry, which we are going to use throughout the dissertation.

2.1 Convex hulls

Given a set of planar points P, the convex hull of P (denoted CH(P)) is the smallest convex set that contains all the points in P. It is also the intersection of all convex sets containing all the points in P. That is, CH(P) = ∩{π ∈ 2^{R^2} : π is convex, P ⊂ π}.

Figure 2.1 shows the convex hull of a planar point set.

For a polygon Q, we define CH(Q) to be the smallest convex set that encloses Q. Alternatively, we can define CH(Q) = CH(vertices(Q)), where vertices(Q) is the set of vertices of polygon Q.

The notion of convex hull can be generalized to higher dimensions. That is, for a set of points P in d dimensions, CH(P) = ∩{π ∈ 2^{R^d} : π is convex, P ⊂ π}.

By complexity of CH(P ) we mean the total number of vertices, edges, faces, etc. It is easy to see that the convex hull CH(P ) of a planar set P of size n has complexity O(n)

(that is, it has O(n) edges). However, the complexity increases with dimension. In the d-dimensional space, CH(P) is known to have complexity O(n^{⌊d/2⌋}) (Chazelle, 1993), where ⌊x⌋ denotes the floor function of x. That is, a d-dimensional CH(P) is a polytope with O(n^{⌊d/2⌋}) hyper-faces. In (Chazelle, 1993), an algorithm to compute the d-dimensional convex hull of n points is also given, and it takes O(n log n + n^{⌊d/2⌋}) time. It is also shown in (Chazelle, 1993) that Ω(n^{⌊d/2⌋}) is a lower bound for computing the convex hull in d dimensions.

Note that there is a lower bound of Ω(n log n) time for computing the convex hull, even for planar datasets (McMullen, 1970). However, optimal algorithms are known for computing

the convex hull of a planar point set, such as Graham's scan (Graham, 1972) and the divide-and-conquer approach (Hong, 1977). When h = |CH(P)| is much less than n, one may use Jarvis' march for O(nh) time (Jarvis, 1973), or Chan's algorithm for O(n log h) time (Chan, 1996). For the 3D case, Chan also has an algorithm that takes O(n log h) time (Chan, 1996).

Figure 2.1. The convex hull diagram CH(P) of a planar point set P is shown.

We are going to use convex hulls for finding the minimum bi-chromatic separating circle.
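As a concrete reference, the following is a compact O(n log n) planar convex hull sketch using Andrew's monotone chain, a variant of the Graham scan mentioned above; it is an illustration only, not code from the dissertation.

    # Compact O(n log n) convex hull sketch (Andrew's monotone chain).
    def cross(o, a, b):
        """Z-component of (a - o) x (b - o); > 0 means a left turn."""
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def convex_hull(points):
        pts = sorted(set(map(tuple, points)))
        if len(pts) <= 2:
            return pts
        lower, upper = [], []
        for p in pts:                       # build the lower hull, left to right
            while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
                lower.pop()
            lower.append(p)
        for p in reversed(pts):             # build the upper hull, right to left
            while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
                upper.pop()
            upper.append(p)
        return lower[:-1] + upper[:-1]      # counter-clockwise, no repeated endpoint

    print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]))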

2.2 Voronoi diagrams

Given a set of planar points P = (p1, . . . , pn), the closest-point Voronoi diagram of P is a partition of the plane into regions VD(P ) = (R1,...,Rn) such that all points from some region Ri have the same point pi as closest neighbor. That is, ∀pi, pj ∈ P, q ∈ Ri, we have d(q, pi) ≤ d(q, pj), where d(p, q) denotes the euclidean distance between points p, q. This is also called the first-order Voronoi diagram or simply the Voronoi diagram.

An illustration of the first-order Voronoi diagram of a planar point set is given in Figure

2.2. Note that some of the regions are unbounded.

An optimal O(n log n) time algorithm to compute the Voronoi diagram of a planar point set was given by Fortune (Fortune, 1986) (widely known as “Fortune’s algorithm”).
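As a small illustration (not from the dissertation), the sketch below builds VD(P) with SciPy and then uses the fact that locating the Voronoi cell of a query point is simply a nearest-neighbor query; the site coordinates are made up.

    # Illustration: the Voronoi cell containing a query point is the cell of its
    # nearest site, so point location in VD(P) is a nearest-neighbor query.
    import numpy as np
    from scipy.spatial import Voronoi, KDTree

    sites = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0], [5.0, 4.0]])
    vor = Voronoi(sites)                     # explicit diagram: vertices, ridges, regions
    print("Voronoi vertices:\n", vor.vertices)

    tree = KDTree(sites)
    queries = np.array([[1.0, 1.0], [4.5, 3.0]])
    _, owner = tree.query(queries)           # index of the nearest site = owning cell
    print("query points belong to the cells of sites", owner)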

16 Figure 2.2. The Voronoi diagram VD(P ) of a planar point set P is shown.

There are also generalizations of Voronoi diagrams, namely k-order Voronoi diagrams, for any 1 ≤ k ≤ n − 1. The k-th order Voronoi diagram of P is a partition R of the plane into regions, each corresponding to its closest k points from P .

Note that all these variations of the Voronoi diagram can be easily generalized to higher dimensions. However, their complexity increases with dimensionality, much like the complexity of higher-dimensional convex hulls. For higher dimensions, Lloyd's algorithm can be used to compute the Voronoi diagram of a d-dimensional point set (Lloyd, 1982). The drawback of Lloyd's algorithm is that it is based on least squares quantization and it runs until convergence is attained, so the running time depends on the value of the input (e.g., the volume of the point set). Lloyd's algorithm can also be used to compute the K-means clustering of an arbitrary point set. Bowyer's (Bowyer, 1981) and Watson's (Watson, 1983) algorithms can also be used, for a running time of O(n^2) in the worst case, and O(n log n) in the average

case.

There also exist variations of Voronoi diagrams for non-Euclidean metrics. Any algorithm

mentioned above can be easily adapted to work with alternative metrics. The most popular

are the Manhattan and the Mahalanobis distances.

17 Figure 2.3. The farthest-point Voronoi diagram FVD(P ) of a planar point set P is shown in solid lines. The convex hull CH(P ) is shown in dashed lines.

Edelsbrunner presented the so-called “Power” diagrams, which are essentially weighted Voronoi diagrams (Edelsbrunner, 1987).

2.2.1 Farthest-point Voronoi diagrams

The farthest-point Voronoi diagram of P, denoted FVD(P), partitions the plane into regions such that points within the same region have the same point in P as their farthest neighbor. See Figure 2.3 for an illustration of the farthest-point Voronoi diagram. It is worth noting that all regions in FVD(P) are unbounded and that FVD(P) has a tree-like structure that is rooted at one of its vertices. Moreover, only points in CH(P) can affect FVD(P), since only these points can be the farthest from any given point. The complexity is O(h), where h is the number of vertices in CH(P). Every edge of FVD(P) is perpendicular to the line between two points of P. The h edges of FVD(P) defined by edges of CH(P) are unbounded, while the rest of them are bounded. Skyum presented an optimal O(n log n) time algorithm to compute the farthest-point Voronoi diagram for a planar set of points (Skyum, 1990). For higher dimensions, Goel et al. (Goel, 2001) show how to approximately compute the farthest-point Voronoi diagram of a d-dimensional point set in subquadratic time, using reductions to Nearest Neighbor queries. In addition, they also approximate the diameter of

a point set, and show how to approximately solve the metric facility location problem. Specifically, for FVD(P) and for the diameter problem, they show how to obtain a (1 + ε)-approximation with a running time of O(dn + dn^{1+1/(1+ε)} log^{O(1)} n), as well as a √2-approximation with a running time of O(dn log^{O(1)} n). For facility location, they get a 3(1 + ε)-approximation with a running time of O(n^{1+1/(1+ε)} log^{O(1)} n).

Bespamyatnikh (Bespamyatnikh, 1996) presented dynamic data structures of size O(n)

that support the following queries, all of them at the expense of O(log n) update time:

(1) the (1 + ε)-approximate closest neighbor in O((1 + 1/ε)^{d−1} + log n) time,

(2) the (1 + ε)-approximate range count in O((1 + 1/ε)^d) time,

(3) the (1 + ε)-approximate furthest neighbor in O((1 + 1/ε)^d) time, and

(4) the (1 + ε)-approximate diameter in O((1 + 1/ε)^{2d−2}) time.

We will later reveal how to use the farthest-point Voronoi diagram to compute the minimum bi-chromatic separating circle.
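A small brute-force illustration of these facts (made-up coordinates, not Skyum's algorithm): the FVD(P) cell containing a query point is owned by the query's farthest neighbor in P, and a point interior to CH(P) never owns a cell.

    # Brute-force illustration: the cell of FVD(P) that a query point q falls in
    # is owned by q's farthest neighbor in P, and only vertices of CH(P) can own
    # a nonempty cell.
    import numpy as np

    P = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0], [2.0, 1.0]])  # last point is interior

    def farthest_owner(q, pts):
        d = np.linalg.norm(pts - np.asarray(q, float), axis=1)
        return int(np.argmax(d))

    queries = [(-1.0, -1.0), (5.0, 0.5), (2.0, 5.0), (2.0, 1.2)]
    owners = {farthest_owner(q, P) for q in queries}
    print("owners seen:", owners)        # the interior point (index 3) never appears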

2.3 Monotone matrices

A matrix M of size m × n is said to be monotone if, for every i, j, k, l such that i < k, j < l

and all entries M(i, j),M(i, l),M(k, j),M(k, l) are defined, we have M(i, j) ≤ M(i, l) implies

M(k, j) ≤ M(k, l).

A matrix M is totally monotone if:

(1) M is monotone and

(2) M is totally defined (that is, M(i, j) is defined for every i, j).

Alternatively, M is called inverse monotone if, for every i, j, k, l such that i < k, j < l and all entries M(i, j),M(i, l),M(k, j),M(k, l) are defined, we have M(k, j) ≤ M(k, l) implies

M(i, j) ≤ M(i, l).

Also, M is totally inverse monotone if M is inverse monotone and completely defined.

19 A totally defined matrix M is said to be Monge if, for every i, j, k, l such that i < k, j < l and M(i, j),M(i, l),M(k, j),M(k, l) are all defined, we have M(i, j) + M(k, l) ≤ M(k, j) +

M(i, l).

Similarly as before, we call a totally defined (inverse) Monge matrix a totally (inverse)

Monge matrix.
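As a small illustration, the checker below verifies the Monge condition on a totally defined matrix; it relies on the standard fact that checking every adjacent 2 × 2 submatrix suffices. The example matrix is our own.

    # Small checker: the Monge condition M[i][j] + M[k][l] <= M[i][l] + M[k][j]
    # for all i < k, j < l holds iff it holds for every adjacent 2 x 2 submatrix.
    def is_monge(M):
        rows, cols = len(M), len(M[0])
        return all(
            M[i][j] + M[i + 1][j + 1] <= M[i][j + 1] + M[i + 1][j]
            for i in range(rows - 1)
            for j in range(cols - 1)
        )

    # Example: M[i][j] = (i - j) ** 2 is a classic Monge matrix.
    M = [[(i - j) ** 2 for j in range(5)] for i in range(4)]
    print(is_monge(M))   # True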

The concept of matrix monotonicity has been studied by quite a few computer scientists

(Smawk, 1987; Kleitman, 1990; Klawe, 1990; Sharir, 2011; Mozes, 2017). They all consider various matrix search problems, such as finding maxima or minima per matrix, per row or per column, on monotone matrices. They also describe applications of matrix searching in computational geometry, along with efficient matrix search algorithms relying on monotonicity.

Aggarwal, Klawe, Moran, Shor, and Wilber studied totally monotone matrices and gave an algorithm to compute all row-maxima in totally monotone matrices of size n × m (Smawk, 1987), which runs in O(m(1 + log(n/m))) time. Their algorithm is known as the SMAWK algorithm. They also show how to find all row maxima in arbitrary monotone matrices in O(m log n) time. Finally, they list some applications of the SMAWK algorithm, including finding all-pairs farthest neighbors in convex polygons.
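The sketch below is not SMAWK itself, but the simpler divide-and-conquer row-maxima procedure that exploits the same property of totally monotone matrices, namely that the (rightmost) column achieving a row maximum never moves left as the row index increases; it runs in roughly O((n + m) log n) time and is meant only as an illustration.

    # Divide-and-conquer row maxima for a totally monotone matrix given as a
    # function M(i, j); simpler than SMAWK, O((n + m) log n) evaluations.
    def row_maxima(n, m, M):
        """Return cols[i] = a column achieving the maximum of row i."""
        cols = [0] * n

        def solve(top, bottom, lo, hi):
            if top > bottom:
                return
            mid = (top + bottom) // 2
            best = lo
            for j in range(lo, hi + 1):          # scan the allowed column range
                if M(mid, j) >= M(mid, best):    # >= keeps the rightmost maximum
                    best = j
            cols[mid] = best
            solve(top, mid - 1, lo, best)        # rows above: maxima at columns <= best
            solve(mid + 1, bottom, best, hi)     # rows below: maxima at columns >= best

        solve(0, n - 1, 0, m - 1)
        return cols

    # Example: M(i, j) = -(j - i) ** 2 is totally monotone; row i peaks at column i.
    print(row_maxima(5, 5, lambda i, j: -(j - i) ** 2))   # [0, 1, 2, 3, 4]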

Sharir and Kaplan et al. (Sharir, 2011; Mozes, 2017) designed dynamic data structures to answer queries of the following form on a totally Monge matrix M of size n × n: given a contiguous submatrix A of M, what is the maximum element of A? Their data structure requires O(n log n) space and O(n log^2 n) pre-processing time and can answer queries in

O(log^2 n) time. For partially Monge matrices, their pre-processing time and query time grow by a factor of α(n), where α(n) is the inverse of the Ackermann function. They show how to use this data structure to find the largest rectangle containing only a query point among a set of n planar points in O(log^4 n) time, with O(n log^4 n α(n)) time, O(n log^3 n α(n))

space for pre-processing.

20 Figure 2.4. Double-staircase matrix, with the defined region denoted by shading.

Figure 2.5. Inverse double-staircase matrix, with the defined region denoted by shading.

Among partially defined matrices, the concept of double-staircase matrix has attracted great interest (Aggarwal, 1987; Klawe, 1990; Kleitman, 1990; Sharir, 2011; Mozes, 2017). A double-staircase matrix is a matrix M such that, for any two rows i, k with i ≤ k, we have l(i) ≥ l(k) and r(i) ≥ r(k), where l(i), r(i) denote the column indices of the left-most and the right-most defined entry of row i (Aggarwal, 1987). Informally, matrix M contains two sets of undefined entries whose peripheries form two staircases inside the matrix. An inverse double-staircase matrix M satisfies the following condition. For any two rows i, k with i ≤ k, we have l(i) ≤ l(k) and r(i) ≤ r(k).

Figures 2.4 and 2.5 illustrate the concepts of double-staircase matrix and inverse double- staircase matrix, respectively.

21 Figure 2.6. Rising single-staircase matrix.

Figure 2.7. Falling single-staircase matrix.

A single-staircase matrix is a matrix that contains a single set of undefined entries that forms a staircase inside the matrix. We say that a single-staircase matrix is falling if, for all rows i, 1 ≤ i < n, we have f(i) ≤ f(i + 1), where f(i) is the column index of the right-most defined entry of row i (Kleitman, 1990). Similarly, we say that a single-staircase matrix is rising if, for all rows i, 1 ≤ i < n, we have f(i) ≥ f(i + 1).

Figures 2.6 and 2.7 illustrate the rising and falling single-staircase matrices.

Klawe and Kleitman considered single-staircase matrices and provided a near-linear time algorithm to find the row-minima of falling single-staircase matrices of size n × m. It runs in O(mα(n)) time (Kleitman, 1990). However, they show that finding the row-minima of rising single-staircase matrices can be done in O(m + n) time.

In general, matrix searching cannot be done in linear time, as is shown by Klawe in (Klawe, 1990). In (Klawe, 1990), the author proves that the lower bound for finding row minima (maxima) on partial monotone matrices of size 2n × n is Ω(nα(n)). The notions of (inverse) double-staircase matrices and totally (inverse) monotone matrices will be used in our dissertation for computing the maximum bi-color separating rectangle. Throughout this dissertation, whenever understood, we are going to refer to a double-staircase matrix as simply a staircase matrix.

CHAPTER 3

MINIMUM BICHROMATIC SEPARATING CIRCLE PROBLEM

3.1 Introduction

Consider a set of n red points R and a set of m blue points B in the plane. In the Minimum Bichromatic Separating Circle (MBSC) problem, the goal is to find the smallest circle that contains all red points and the smallest possible number of blue points. In the figures throughout this chapter, we will denote the red points as solid dots and the blue points as empty circles. Figure 3.1 illustrates the MBSC problem. Note that the optimal solution may not be unique. As mentioned in Chapter 1, the applications include medical imaging, data science, and machine learning problems. For machine learning, one could be interested in classifying the data using elliptical classifiers with perfect sensitivity and maximal specificity. Other motivations could arise from military missions. For instance, the red points may denote enemy targets and the goal would be to eliminate all targets while minimizing unneeded damage (buildings, infrastructure, etc., marked as blue points). During mission times, targets may change locations, raising the need for fast queries and updates on the optimal solution. In this dissertation, we consider the dynamic version of the MBSC problem, which we call the Dynamic Minimum Bichromatic Separating Circle (DMBSC) problem. In the DMBSC problem, the goal is to report the optimal solution efficiently after insertions and removals of red and blue points. In this work, we consider the red points to be fixed in time and address only the insertion and deletion of blue points. Our results are non-trivial and are based on insightful observations.

When it comes to dynamically updating the optimal solution, there are major challenges involved. It is known (Bitner, 2010) that the minimum enclosing circle of the red points is centered at a vertex of the farthest-point Voronoi diagram FVD(R) (see Figure 3.2 for an illustration). Virtually all known static algorithms rely on FVD(R), which has complexity O(n). We also know from (Demaine, 2006) that an insertion or removal of a single red point may trigger Ω(n) changes to FVD(R). Hence, any operations involving red points would be inherently affected by this lower bound. Similarly, update operations involving blue points can trigger events on many edges of FVD(R), and these events can affect the current optimal solution. It follows from (Bitner, 2010) that a blue point may define Ω(n) events. Thus, a blue point update operation may require inspecting Ω(n) edges of FVD(R), implying any FVD-based data structure would have a lower bound of Ω(n) time for updates. A naive approach that would identify and go through all such events would result in slow reporting of the new optimal solution. To avoid that, we design data structures that do not need to check all events triggered by insertion and removal of blue points, at least when it comes to reporting the optimal solution. For reporting an optimal solution after insertion or removal of blue points, we conjecture a lower bound of Ω(log n) time.

Figure 3.1. Minimum separating circle of a set R of red points and a set B of blue points.

Figure 3.2. The farthest-point Voronoi diagram FVD(R) of the red points is shown, along with the minimum separating circle C, which is centered at O, and the minimum enclosing circle MEC, centered at O_C. Blue point p is inside MEC but not inside C.

3.1.1 Our results

Our first solution is a unified, O(m + n) size data structure that allows both insertion and removal of blue points. Our data structure relies on the farthest-point Voronoi diagram of the red points, FVD(R). It can handle updates in O((m + n) log m) time and can be used to report an optimal solution on insertion and removal of blue points in O(n + log m) and O(n log m) time, respectively. After that, we present a data structure of size O(mn log m) that allows reporting of an optimal circle in O(log(mn)) time when only insertions are allowed, at the expense of a higher time for updating the data structure. Specifically, the update time is O(mn log(mn)). This makes sense since the update can be done as a “backstage” operation, assuming that queries arise at a reasonable rate in practice. Our data structure builds upon a special binary search tree-like data structure that supports so-called “off-line ball exclusion search” (OLBES) queries, which will be described later. Finally, we describe a data structure of size O(mn log(mn)) that allows O(log^2(mn)) time for reporting when only deletions are allowed, also with O(mn log(mn)) update time. The data structure is based on a binary search tree-like data structure that allows “off-line ball inclusion search” (OLBIS) queries, which will be revealed later. The preprocessing time in each case above is O(mn log m + n log n), due to running the algorithm in (Bitner, 2010) on the initial input of n red points and m blue points.

Figure 3.3. Left: The blue point p displayed as an empty circle is inserted; Right: The blue point p is deleted. In both cases, the old minimum separating circle C and the new minimum separating circle C′ are displayed.

Our algorithms are the first reported for the DMBSC problem, and these results were published in (Armaselu, 2015), as well as (Armaselu, November 2016).

3.2 Preliminaries

Let CH(R) denote the convex hull of R. Note that only vertices of CH(R) contribute to

FVD(R), so from now on we assume that the points in R are in convex position. We further

discard all blue points inside CH(R), as they are contained in any enclosing circle of R.

We assume that, having initially available n red points and m blue points, the O(mn log m+

n log n) time and O(m + n) space static algorithm of (Bitner, 2010) has been executed and

the set M of O(n) minimum separating circles is available. Figure 3.2 illustrates the static

version of the problem and Figure 3.3 illustrates how the minimum separating circle changes

after an insertion (resp. deletion) of a blue point.

27 Figure 3.4. e is an edge of FVD(R) defined by ri, rj ∈ R. q ∈ e is an enter event point and s ∈ e is an exit event point.

Definition 3.2.1. (Bitner, 2010). Let e be an edge of FVD(R). A point q ∈ e is an enter (resp. exit) event point if a blue point is included (resp. excluded) from the circle centered at q and passing through the two red points defining e, as we sweep along e in increasing order of circle radius. See Figure 3.4 for an illustration.

Definition 3.2.2. Let ei,j be an edge of FVD(R) defined by two red points pi and pj.

For any q, r ∈ ei,j, we say that q is to the left of r, and we write q < r, if the circle centered at q and defined by pi and pj has a smaller radius than the one centered at r and defined by pi and pj.

Observation 3.2.1 (Bitner, 2010). Any minimum separating circle must be centered on an edge of FVD(R) (see Figure 3.2).

Observation 3.2.2 (Bitner, 2010). A blue point has an exit event point on at most one edge of FVD(R). As a consequence, there are O(m) exit event points.

Observation 3.2.3 (Bitner, 2010). A minimum separating circle is either the minimum enclosing circle (MEC) of R or it is centered at an exit event point.

From now on, we treat the center of the MEC as an exit event point.

Notations

1. Let e be an edge of FVD(R). For any point q ∈ e, we denote by C(q) the circle centered at q and passing through the two red points defining e.

2. For any circle C, denote by rad(C) the radius of C and by m(C) the number of blue points inside C.

3. For any event point q, we denote the radius of C(q) by rad(q).

4. For any event point q, we denote by m_q the number of blue points included in C(q) (which is initially computed by the algorithm in (Bitner, 2010)).

3.3 Unified data structure for insertions and deletions

In this section, we present a data structure that allows efficient insertion and deletion of blue points, as well as efficient reporting of a (possibly new) optimal circle. Let k be the current number of blue points in an MBSC. Let p be the point to be inserted or deleted. We call p the query point. The edges of FVD(R) are stored in an array A, which is pre-computed using the algorithm in (Bitner, 2010) and does not change after an insertion or deletion of a blue point.

For any edge e, denote by q*_e the leftmost exit event point on e such that m_{q*_e} is minimum over all event points q ∈ e. If one of the current circles in the MBSC set M is centered on e, then obviously m_{q*_e} = k; otherwise m_{q*_e} > k.

For the event point s ∈ e associated with p, denote by l(s) the leftmost exit event point q < s such that m_q = m_{q*_e} + 1. Note that l(s) is undefined if m_q > m_{q*_e} + 1, ∀q < s. Also, denote by r(s) the leftmost exit event point q ≥ s such that m_q = m_{q*_e}. Note that r(s) is undefined if m_q > m_{q*_e}, ∀q ≥ s.

Figure 3.5. The data structure for insertion and deletion.

Each event point q ∈ e is associated with a pair of values, m_q and rad(q). We actually do not need m_s if s is an enter event point for e. It is only necessary to maintain the correct count for exit event points.

For every edge e of FVD(R), we store the following in A(e):

1. The red points ri and rj of R defining e, where 1 ≤ i < j ≤ n;

2. q*_e;

3. l(q*_e);

4. M_e, the circle centered at q*_e;

5. A pointer to a balanced binary search tree XEP(e) of all exit event points q ∈ e, indexed first by m_q, then (to break ties) by rad(q).

Figure 3.5 illustrates the data structure for an edge e.
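As an illustration only, the per-edge record and the (m_q, rad) ordering of XEP(e) can be sketched as follows in Java; the type and field names are ours and are not part of the original implementation.

import java.util.TreeSet;
import java.util.Comparator;

// Sketch of the record stored in A(e) for one edge e of FVD(R).
// Point2D, Circle and EventPoint are assumed helper types, not from the text.
class EdgeRecord {
    Point2D ri, rj;            // the two red points of R defining e
    EventPoint qStar;          // q*_e: leftmost exit event of minimum count on e
    EventPoint lOfQStar;       // l(q*_e); null when undefined
    Circle me;                 // M_e, the circle centered at q*_e
    // XEP(e): exit event points ordered first by m_q, ties broken by rad(q)
    TreeSet<EventPoint> xep = new TreeSet<>(
        Comparator.comparingInt((EventPoint q) -> q.m)
                  .thenComparingDouble(q -> q.rad));
}
class Point2D { double x, y; }
class Circle  { Point2D center; double radius; }
class EventPoint {
    int m;        // number of blue points inside C(q)
    double rad;   // radius of C(q)
    boolean exit; // true if exit event, false if enter event
}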

We next describe the reporting and update procedures for insertion and deletion of blue points.

Insertion query (Algorithm Insert − Blue − Query(A, k, p)). Suppose we are given a new blue point p. The goal is to report a (possibly new) MBSC. First, we check whether p ∈ CH(R). If so, then the current optimal solution does not change; we report it as the new optimal solution and we are done.

Now suppose p ∉ CH(R). Note that p defines an exit event point on at most one edge of FVD(R) (Bitner, 2010), but it may define an enter event point on many edges of FVD(R), and edges where p defines an enter event point may give a new optimal solution. For each edge e of FVD(R) we do the following. Let s be the event point defined by p on edge e, if it exists (it is possible that p generates no event on e). We compute a possible new optimal circle M'_e after the insertion of p. Based on the location of s, there are only five possible cases to consider.

(1) p generates no event point on e. In this case, M_e remains the same, and we set M'_e = M_e. Note that either all circles centered on e and passing through the red points defining e contain p or none of them does. If they do, we also increment m_{q*_e}.

(2) s < q*_e and s is an exit event point (see Figure 3.6, left). That is, only circles centered on e at some t < q*_e will contain the new point p. We set M'_e = M_e; in this case m_{q*_e} is unchanged.

(3) q*_e < s and s is an enter event point (see Figure 3.6, right). That is, only circles centered on e at some t > q*_e will contain the new point p. We set M'_e = M_e and leave m_{q*_e} unchanged.

Figure 3.6. Left: Insertion Case 2: s < q*_e and s is an exit event point; Right: Insertion Case 3: q*_e < s and s is an enter event point.

(4) s < q*_e and s is an enter event point (see Figure 3.7, left). In this case, the circle centered at q*_e will have m_{q*_e} + 1 blue points. On the other hand, circles centered on e at some t with l(s) ≤ t ≤ s may have m_{q*_e} + 1 points, and thus only the circle centered at l(s) may be optimal for edge e. If l(s) exists, then we set M'_e = C(l(s)) and q*_e = l(s). Otherwise, we set M'_e = M_e. In both cases we also update m_{q*_e}.

(5) q*_e < s and s is an exit event point (see Figure 3.7, right). In this case, the circle centered at q*_e will contain m_{q*_e} + 1 blue points. On the other hand, circles centered on e at some t ≥ r(s) may have m_{q*_e} points, and thus only the circle centered at r(s) may be optimal for edge e. If r(s) exists, then we set M'_e = C(r(s)) and q*_e = r(s). Otherwise, we set M'_e = M_e and increment m_{q*_e}.

Figure 3.7. Left: Insertion Case 4: s < q*_e and s is an enter event point; Right: Insertion Case 5: q*_e < s and s is an exit event point.

After all edges e are treated, we select the circles M'_e for which m(M'_e) is minimum and, among those, rad(M'_e) is minimum. Those circles form the new set of optimal solutions. We now claim the running time bound for insertion queries in the following lemma.

Lemma 3.3.1. Given a new blue point p, the MBSC’s can be reported in O(n + log m) time using algorithm Insert − Blue − Query.

Proof. Computing the event point on each edge e ∈ FVD(R) corresponding to p, as well as the type of the event (enter or exit) can be done in O(1) time (Bitner, 2010). We treat cases (1), (2) and (3) in O(1) time, each occurring O(n) times for all edges. Case (4) occurs O(n)

times for all edges, and is treated in O(1) time for each edge e ∈ FVD(R), as l(s) = l(q*_e), if l(q*_e) is defined. In case (5), we find r(s) using a binary search on XEP(e), first by m_{q*_e} and then (for equal point count) by rad(s). This takes O(log m) time. Recall that s can be an exit event point on only one edge of FVD(R), so case (5) happens on only one edge of FVD(R). Hence, the total time required to report an optimal solution is O(n + log m).
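The binary search for r(s) used in case (5) amounts to a ceiling query on XEP(e) with the key (m_{q*_e}, rad(s)). A hedged, self-contained sketch of this lookup (with a minimal EventPoint type standing in for the one above):

import java.util.Comparator;
import java.util.TreeSet;

class ExitEventSearch {
    static class EventPoint { int m; double rad; }

    // XEP(e) ordered first by m_q, then by rad(q), as in the data structure above.
    static TreeSet<EventPoint> newXep() {
        return new TreeSet<>(Comparator.comparingInt((EventPoint q) -> q.m)
                                       .thenComparingDouble(q -> q.rad));
    }

    // r(s): the leftmost exit event point q >= s with m_q = targetM (i.e., m_{q*_e}),
    // or null if no such point exists (r(s) undefined).
    static EventPoint findR(TreeSet<EventPoint> xep, int targetM, double radOfS) {
        EventPoint probe = new EventPoint();
        probe.m = targetM;
        probe.rad = radOfS;
        EventPoint cand = xep.ceiling(probe);   // smallest entry >= (targetM, rad(s))
        return (cand != null && cand.m == targetM) ? cand : null;
    }
}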

Insertion update (Algorithm Insert − Blue − Update(A, k, p)).

To update the data structure after inserting point p, we do the following. For each edge

e ∈ FVD(R), let s ∈ e be the event point corresponding to p. We treat each case defined in

the insertion query algorithm as follows.

- In case (1), if p is contained within the circles centered on e, traverse XEP(e) and increment the count m_q for each exit event point q.

- In cases (2) and (5), we traverse XEP(e) and, for all exit event points q < s, we increment m_q and then relocate q in XEP(e). We then insert s into XEP(e). In case (2), we also recompute and store the new l(q*_e) if needed.

- In cases (3) and (4), we do the same for all exit event points q > s.

- In case (4), if l(s) was defined (l(s) = l(q*_e) in this case), we set q*_e = l(s); M_e is set accordingly. Either way, we find and update the new l(q*_e).

- In case (5), we may have to set q*_e = r(s), if r(s) is defined, and set l(q*_e) = q*_e; M_e is set accordingly.

Finally, we set k to the number of blue points of any current MBSC.

Lemma 3.3.2. After inserting a blue point p, the data structure can be updated in

O((m + n) log m) time using algorithm Insert − Blue − Update.

Proof. Computing the new l(q*_e) in cases (2) and (4) takes O(log m) time per edge, for a total of O(n log m) time (case (2) happens on only one edge). Re-computing q*_e in case (5) takes another O(log m) time, as described in the query procedure above.

To update the location in XEP(e) of all exit event points q ∈ e for which m_q changes, over all edges, we spend O(n + m log m) time in total. Summing up, we get a total running

time of O((m + n) log m).

Deletion query (Algorithm Delete − Blue − Query(A, k, p)). Suppose we are given

a blue point p to remove. Similarly as for insertion of a blue point, we check if p ∈ CH(R).

Figure 3.8. Left: Deletion Case 2: q*_e < s and s is an exit event point; Right: Deletion Case 3: s < q*_e and s is an enter event point.

If so, we report the old optimal solution as the new optimal solution. Otherwise, we treat

the 5 possible cases for the event point s generated by p on e.

(1) s does not exist. In this case, M_e remains the same and we set M'_e = M_e. If the circles centered on e enclose p, then we also update m_{q*_e}.

(2) q*_e < s and s is an exit event point (see Figure 3.8, left). That is, only circles centered on e at some t < s contain the point p to be deleted, including M_e. We set M'_e = M_e and also update m_{q*_e}.

(3) s < q*_e and s is an enter event point (see Figure 3.8, right). That is, only circles centered on e at some t > s contain the point p to be deleted, including M_e. We set M'_e = M_e and also update m_{q*_e}.

(4) q*_e < s and s is an enter event point (see Figure 3.9, left). In this case, the circle centered at q*_e will have m_{q*_e} blue points. On the other hand, circles centered on e at some t ≥ r(s) may have m_{q*_e} − 1 points, and thus only the circle centered at r(s) may be optimal for edge e. If r(s) exists, then we set M'_e = C(r(s)), set q*_e = r(s), and decrement m_{q*_e}. Otherwise, we set M'_e = M_e.

(5) s < q*_e and s is an exit event point (see Figure 3.9, right). In this case, the circle centered at q*_e will still have m_{q*_e} blue points. On the other hand, circles centered on e at some t with l(q*_e) = l(s) < t < s may now have m_{q*_e} points, and thus only the circle centered at l(s) may be optimal for edge e. If l(s) exists, then we set M'_e = C(l(s)) and q*_e = l(s). Otherwise, we set M'_e = M_e. m_{q*_e} remains unmodified.

Figure 3.9. Left: Deletion Case 4: q*_e < s and s is an enter event point; Right: Deletion Case 5: s < q*_e and s is an exit event point.

As for insertion, after all edges e are treated, we select the circles M'_e for which m(M'_e) is minimum and, among those, rad(M'_e) is minimum. Those circles form the new set of optimal solutions.

The next lemma gives a bound on the deletion query time.

Lemma 3.3.3. Given a point p to be deleted, the MBSC's can be reported in O(n log m) time using algorithm Delete − Blue − Query.

Proof. Computing the event point on each edge e ∈ FVD(R) corresponding to p, as well as the type of the event (enter or exit) can be done in O(1) time (Bitner, 2010). We treat cases (1), (2) and (3) in O(1) time, each occurring O(n) times for all edges. Case (4) occurs O(n)

times for all edges. Each time, we need to find r(s). Since m_{q*_e} and rad(s) are available, we can locate r(s) by binary search on XEP(e), first by m_{q*_e} and then (for equal point count) by rad(s). This takes O(log m) time per edge. Case (5) occurs on only one edge e. Since l(s) = l(q*_e), this case requires O(1) time. Hence, the total time required to find an optimal solution is O(n log m).

Deletion update (Algorithm Delete − Blue − Update(A, k, p)). To update the data structure after deleting point p, we do the following. For each edge e ∈ FVD(R), let s ∈ e be the event point corresponding to p. We treat each case defined in the delete query procedure.

- In case (1), if needed, we update m_{q*_e} and each entry in XEP(e).

- In cases (2) and (5), for all exit event points q < s, we remove q from XEP(e), set m_q = m_q − 1, and then re-insert q into XEP(e).

- In cases (3) and (4), we do the same for all exit event points q > s.

- In case (3), we update l(q*_e) (it is possible that it becomes undefined).

- In case (4), if r(s) is defined, updates are done as described in the query procedure above.

- In case (5), we make the updates for M_e and q*_e as described in the query procedure above. We also compute the new l(q*_e).

Finally, we set k to the number of blue points of any current MBSC.

Lemma 3.3.4. After removing a blue point p, the data structure can be updated in O((m + n) log m) time using algorithm Delete − Blue − Update.

Proof. Finding l(q*_e) in case (3) takes O(log m) time, by searching XEP(e), for a total of O(n log m) time. Re-computing q*_e in case (4) takes O(log m) time per edge e (to find r(s)), by searching XEP(e). In case (5), the new l(q*_e) can be found in O(log m) time by a search on XEP(e). To remove and re-insert all affected exit event points in XEP(e), we spend O(log m) time for each q, or O(m log m) time in total. Summing up, we get a total running time of O((m + n) log m).

Theorem 3.3.5. The red and blue points can be preprocessed in O(n log n + mn log m) time into a data structure of size O(m + n) that supports insertions and removals of blue points in O((m + n) log m) time and can be used to report an MBSC in O(n + log m) time at insertion of a blue point and O(n log m) time at removal of a blue point.

Proof. We pre-process the data structure using the approach in (Bitner, 2010) for the static version of the problem, using O(n log n + mn log m) time and O(m + n) storage. We then associate with each edge e of FVD(R) the data structure described earlier. The size of the

data structure is O(m + n) (the total size of the XEP(e) trees is O(m), the number of all exit event

points). The running times for insertions, removals, as well as reporting MBSC’s, follow

from Lemmas 3.3.1 - 3.3.4.

Notice that our update solution is polynomially better than recomputing the optimal

solution from scratch, using the algorithms in (Bitner, 2010).

3.4 Logarithmic query time for insertions

In this section, we present a data structure of size O(mn log(mn)) that allows reporting of an optimal circle in logarithmic time when only insertions are allowed, at the expense of

O(mn log(mn)) time for updating the data structure. This makes sense in practice, since the update can be done as a “backstage” operation, assuming that queries arise at a reasonable rate. The time to construct the data structure is O(mn log(mn)), given an initial input with

n red points and m blue points.

The data structure. We maintain all enclosing circles defined by event points on the

edges of FVD(R) in a balanced binary search tree data structure, T , with O(m) nodes.

The keys of T are the number of blue points i, 1 ≤ i ≤ m, inside a circle, and the values

are lists T_i of enclosing circles with i blue points. Each T_i is stored in a balanced binary search tree-like data structure that is an extension of the Offline Ball Exclusion Search Data Structure (OLBES) in (Chen, 2005). Specifically, the circles are stored at the leaves of T_i, in sorted order by radius, along with the edge of FVD(R) that they are centered on. Each T_i contains O(mn) circles at the leaves and there are a total of O(mn) circles in T. Each inner

node stores the intersection of circles that are leaf descendants of that node. Also, with each

circle, we store the type of event it is centered at (enter or exit).

The data structure T is illustrated in Figure 3.10.

Figure 3.10. The data structure T.
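As a rough illustration of T (not the actual OLBES-based structure), the sketch below keys circles by their blue-point count and keeps each list sorted by radius; the exclusion query is answered here by a linear scan, whereas the OLBES structure of (Chen, 2005) stores circle intersections at inner nodes to achieve the O(log c) query bound. The Circle type and method names are ours.

import java.util.List;
import java.util.TreeMap;

// Simplified sketch of T: key = number of blue points inside a circle,
// value = the circles with that count, kept sorted by radius.
class SeparatingCircleIndex {
    static class Circle {
        double cx, cy, r;
        boolean contains(double px, double py) {
            double dx = px - cx, dy = py - cy;
            return dx * dx + dy * dy <= r * r;
        }
    }

    TreeMap<Integer, List<Circle>> t = new TreeMap<>();

    // Exclusion query on T_k: smallest circle with count k not containing p
    // (a linear scan stands in for the OLBES query of the text).
    Circle smallestNotContaining(int k, double px, double py) {
        List<Circle> tk = t.get(k);
        if (tk == null) return null;
        for (Circle c : tk)              // circles assumed sorted by radius
            if (!c.contains(px, py)) return c;
        return null;                     // scenario II: every circle in T_k contains p
    }
}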

Algorithm Insert − Blue − Query − Logtime(T, p). Let k be the smallest key in T. We perform an off-line ball exclusion search (OLBES query) on T_k with query point p, to obtain the smallest circle in T_k not containing p. We consider two scenarios: I. there exists a circle C*_k ∈ T_k that does not contain p, which is reported by the OLBES query mentioned above; II. all circles in T_k contain p, which is shown by the fact that the OLBES query is unsuccessful.

In scenario I the optimal circle must be in T_k, so we do the following. Let q be the center of C*_k and e be the edge containing q. Also, let s be the event point induced by p on e. We find an optimal circle C* and report it as MBSC, as described next. According to the types of s and q, we consider the following cases: 1) q is an enter event point and s is an enter event point, 2) q is an enter event point and s is an exit event point, 3) q is an exit event point and s is an enter event point, 4) q is an exit event point and s is an exit event point. Note that, in cases 1 and 2, q must be preceded by an exit event point r < q (or a vertex of FVD(R)). It is easy to see that case 1 may not happen, as C*_k would be centered at some exit event point r' < q instead of q. In case 2 we must have r < s < q, so we set C* = C(s). In case 3, we must have q < s, so we set C* = C*_k. In case 4, s must immediately precede q, so again we set C* = C*_k. We can also report all circles in T_k of equal radii, in postorder following C*_k, if all optimal solutions are needed.

Now we consider scenario II. The optimal circle may be in T_{k+1}. We perform an OLBES query on T_{k+1} with query point p. If the query in T_{k+1} successfully returns a circle C_{k+1}, then we consider the 4 cases in scenario I and find the optimal circle in T_{k+1}, say C'*. We then report the one of C'* and C_k with smaller radius, where C_k is the smallest circle in T_k. If the query in T_{k+1} was unsuccessful, we report the smallest circle C* ∈ T_k as MBSC. This completes the description of the algorithm.

To update the data structure, we do the following.

Algorithm Insert − Blue − Update − Logtime(T, p)
1. Let l be the largest number of blue points of any circle in T.
2. For each i from l downto k do:
   2.1. For all circles C ∈ T_i such that p ∈ C, move C from T_i to T_{i+1} (create node T_{i+1} if needed).
   2.2. If node T_i becomes empty, delete it from T.
3. For all edges e of FVD(R) do:
   3.1. Let s be the event point defined by p on e and C be the circle centered at s.
   3.2. Store the type of s (enter or exit).
   3.3. Count the number k' of blue points inside C.
   3.4. Add C to T_{k'} (create node T_{k'} if needed).
4. For each i from l downto k do: update the OLBES data structure of T_i.
End.

Theorem 3.4. With O(mn log(mn)) preprocessing time and space, insertion of a blue

point can be performed with O(log(mn)) reporting time and O(mn log(mn)) update time

using procedures Insert − Blue − Query − Logtime and Insert − Blue − Update − Logtime.

Proof. An OLBES query on a set of O(c) circles can be done in O(log c) time with O(c log c) preprocessing time and space (Chen, 2005). In our case, each T_i may have O(mn) circles, so c = mn. Hence, reporting an MBSC takes O(log(mn)) time, after O(mn log(mn)) preprocessing time. As for updating the data structure, there may be O(mn) circles containing p in T. For each i, suppose there are n_i circles in T_i containing p, with Σ_{i=k}^{l} n_i = O(mn). For each of them, we spend O(log n_{i+1}) time to re-insert it into T_{i+1} in proper order. We also need O(mn + n log(mn)) time to add the new circles centered at the events defined by p. Finally, we spend O(n_i log n_{i+1}) time to recompute the OLBES structure of each T_i. Hence, the update time is O(Σ_{i=k}^{l} n_i log n_{i+1}) + O(mn log(mn)) = O(mn log(mn)).

3.5 Logarithmic query time for deletions

In this section, we present a data structure that allows reporting of minimum separating

circles in O(log2(mn)) time when only deletions are allowed. The time required to update

the data structure is O(mn log(mn)). The preprocessing time and the space required by the

data structure are O(mn log(mn)). The deletion procedure is very similar to the insertion, with a few exceptions.

The data structure. We maintain a data structure U, which has the same balanced binary search tree structure as T in the previous section, except that each U_i is stored as an Offline Ball Inclusion Search Data Structure (OLBIS) (Kurdia, 2008). Specifically, the circles are stored at the leaves of U_i, in sorted order by radius, along with the edge of FVD(R) that they are centered on. For each circle C, lifting C to the 3D unit paraboloid defines a plane such that a point p is inside C iff the image of p on the unit paraboloid lies below the plane. C is associated with the halfspace above the plane, which is also stored at the leaf of U_i containing C. The inner nodes store the common intersection of the halfspaces stored at their descendants. Each U_i contains O(mn) circles at the leaves.

Algorithm Delete − Blue − Query − Logtime(U, p)
1. Let k be the smallest key in U.
2. Perform an off-line ball inclusion search (OLBIS) on U_k with query point p, to obtain the smallest circle in U_k containing p.
3. If the OLBIS query successfully returns a circle C*_k, report it as MBSC (along with other circles in U_k of equal radii, if all solutions are needed).
4. Otherwise (no circle in U_k contains p, so they would still have k blue points after the removal of p):
   4.1. Perform an OLBIS query on U_{k+1} with query point p.
   4.2. If the query successfully returns a circle C_{k+1}, then report the one of C_{k+1} and C_k with smaller radius, where C_k is the smallest circle in U_k (along with other circles in U_k of equal radii, if needed).
   4.3. Otherwise, return the circle in U_k having the smallest radius.
End.
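The lifting argument can be checked directly: the circle with center (a, b) and radius r corresponds to the plane z = 2ax + 2by + r² − a² − b², and a point is inside the circle exactly when its lift (x, y, x² + y²) lies on or below that plane. A small self-contained check (illustrative only, not part of the original implementation):

public class LiftingDemo {
    // True iff the lift of (x, y) onto the unit paraboloid lies on or below
    // the plane associated with the circle of center (a, b) and radius r.
    static boolean belowPlane(double x, double y, double a, double b, double r) {
        double lift = x * x + y * y;                          // z-coordinate of the lifted point
        double plane = 2 * a * x + 2 * b * y + r * r - a * a - b * b;
        return lift <= plane;
    }

    static boolean insideCircle(double x, double y, double a, double b, double r) {
        double dx = x - a, dy = y - b;
        return dx * dx + dy * dy <= r * r;
    }

    public static void main(String[] args) {
        double a = 1.5, b = -2.0, r = 3.0;
        for (double x = -6; x <= 6; x += 0.5)
            for (double y = -6; y <= 6; y += 0.5)
                if (insideCircle(x, y, a, b, r) != belowPlane(x, y, a, b, r)) {
                    System.out.println("mismatch at (" + x + ", " + y + ")");
                    return;
                }
        System.out.println("inside-circle and below-plane tests agree");
    }
}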

To update the data structure, we do the following.

Algorithm Delete − Blue − Update − Logtime(U, p)
1. Let l be the largest number of blue points of any circle in U.
2. For each i from l downto k do:
   2.1. For all circles C ∈ U_i such that p ∈ C, move C from U_i to U_{i-1} (create a node U_{i-1} if needed).
   2.2. Remove all circles C ∈ U_i that pass through p.
   2.3. If node U_i becomes empty, delete it from U.
3. For each i from l downto k do: update the OLBIS data structure of U_i.

End.

Theorem 3.5. With O(mn log(mn)) preprocessing time and space, deletion of a blue point can be performed with O(log2(mn)) reporting time and O(mn log(mn)) update time using procedures Delete − Blue − Query − Logtime, Delete − Blue − Update − Logtime.

Proof. We perform an OLBIS query on a set of O(c) circles in O(log² c) time, after spending O(c log c) preprocessing time and space (Kurdia, 2008). Each U_i can have O(mn) circles, so c = mn in our case. Therefore, reporting an MBSC takes O(log²(mn)) time, with O(mn log(mn)) pre-processing time and O(mn log(mn)) space. Denote by n_i the number of circles in U_i containing p, with Σ_{i=k}^{l} n_i = O(mn). For each such circle, we spend O(log n_{i-1}) time to re-insert it into U_{i-1} in proper order. Also, since there are O(n) circles passing through p, we require O(n log(mn)) time to remove them from U. After that, we spend O(n_{i-1} log n_{i-1}) time to update the OLBIS structure of each U_i (Kurdia, 2008). Therefore, the update time is O(Σ_{i=k}^{l} n_i log n_{i-1}) + O(mn log(mn)) = O(mn log(mn)).

3.6 Implementation and Experiments

We implemented the unified data structure in Java, and we designed a Java Applet interface that allows the user to place the red and blue points and compute the minimum bichromatic separating circle for the given point sets.

The user options are:

• Select point color (red or blue)

• Place a point by clicking on the canvas

Figure 3.11. The Java Applet user interface for computing the minimum separating circle.

• Generate random dataset of red and blue points

• Specify n and m (for generating random datasets)

• Insert a blue point into the data structure

• Remove a blue point from the data structure

• Compute CH(R) and display it in red

• Compute FVD(R) and display it in red. CH(R) needs to be computed before FVD(R)

can be computed.

• (Re-)Compute MBSC, the minimum bichromatic separating circle and display it in

blue. FVD(R) needs to be computed before MBSC can be computed.

Figure 3.11 shows the initial user interface described above.

An output example for a user-specified instance is shown in Figure 3.12. The user specifies the red and blue points by clicking on the canvas, and then presses the "CH(R)", "FVD(R)", and "MBSC" buttons (in this order). The MBSC for the specified dataset is computed and shown in purple. FVD(R) is shown in black.

Figure 3.12. An example of output for points specified by the user.

Figure 3.13. An example of output for a random dataset.

For an example of output for a random dataset, see also Figure 3.13. The user enters the number of red points as n = 10 and blue points as m = 10, and then presses the "Generate random red & blue points" button. A random dataset is then generated. The red points are sampled uniformly at random from a circle CR((x0, y0), r0) (i.e., points (r, θ), with r and θ sampled uniformly at random from [0, 200] and [0, 2π], respectively). The blue points are sampled uniformly at random from a circle CB((x0, y0), 4r0). The user then clicks the "CH(R)", "FVD(R)", and "MBSC" buttons (in this order). The MBSC is computed and shown in purple and FVD(R) is shown in black.

Figure 3.14. The user inserts the indicated blue point, and the MBSC is updated dynamically (in this case it stays the same).
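A sketch of this random generator is shown below, assuming r0 = 200 and an arbitrary center (x0, y0); the class and method names are ours.

import java.util.Random;

public class RandomDataset {
    // Samples 'count' points (r, theta) with r uniform in [0, maxR] and
    // theta uniform in [0, 2*pi), centered at (x0, y0), as described above.
    static double[][] samplePolar(int count, double x0, double y0, double maxR, Random rng) {
        double[][] pts = new double[count][2];
        for (int i = 0; i < count; i++) {
            double r = rng.nextDouble() * maxR;
            double theta = rng.nextDouble() * 2 * Math.PI;
            pts[i][0] = x0 + r * Math.cos(theta);
            pts[i][1] = y0 + r * Math.sin(theta);
        }
        return pts;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double x0 = 400, y0 = 300, r0 = 200;
        double[][] red  = samplePolar(10, x0, y0, r0, rng);      // red points inside C_R
        double[][] blue = samplePolar(10, x0, y0, 4 * r0, rng);  // blue points inside C_B
        System.out.println(red.length + " red, " + blue.length + " blue points generated");
    }
}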

Figure 3.14 shows how the MBSC in Figure 3.13 is updated after inserting a blue point.

Figure 3.15 shows the how the MBSC in Figure 3.14 is updated after deleting a blue point.

In addition, we re-implemented the static algorithm to compute the minimum separating circle. Note that the previous implementation does not allow the user to specify the number of random red and blue points to be placed (it always places 18 red and 18 blue random points). In our implementation, we allow the user to specify any number of random blue and red points to be placed. We tested our implementation on a randomly generated dataset, in which the values of n and m ranged from 10 to 1000000. We show 7 graphs. The first graph (Figure 3.16) shows the running time to compute FVD(R) for different values of n. The next two graphs (Figures 3.17 and 3.18) show the running time to compute an MBSC using the static algorithm for n = 1000 and different values of m (respectively, for m = 1000 and different values of n). For all values of n and m shown, the running time is averaged over three runs. For different values of m and n, we run the queries on three random instances. In each instance the MBSC is first computed statically, and then 5 insertions and 5 deletions are performed. The running times shown are the overall averages over all such instances. The graphs in Figures 3.19 and 3.20 show, respectively, the running time for queries, for n = 1000 and different values of m (respectively, for m = 1000 and different values of n).

Figure 3.15. The user deletes the indicated blue point, and the MBSC is re-computed using the dynamic data structure (in this case it changes).

Figure 3.16. The running time to compute FVD(R) for n between 10 and 100000.

Figure 3.17. The running time to compute an MBSC using the static algorithm, for n = 1000 and m between 10 and 1000000.

Figure 3.18. The running time to compute an MBSC using the static algorithm, for m = 1000 and n between 10 and 100000.

Figure 3.19. The running time to compute an MBSC after a query, for n = 1000 and m between 10 and 1000000.

Figure 3.20. The running time to compute an MBSC after a query, for m = 1000 and n between 10 and 100000.

The graphs in Figures 3.21 and 3.22 display the running time for updates, for n = 1000 and different values of m (respectively, for m = 1000 and different values of n). Again, for every value of m and n, three random instances are run, with 5 insertions and 5 deletions performed in each instance.

All instances were run on a desktop computer with 16 GB of memory and a 3.4 GHz processor, under

Windows 7.

Figure 3.21. The running time to update the data structure, for n = 1000 and m between 10 and 1000000.

Figure 3.22. The running time to update the data structure, for m = 1000 and n between 10 and 100000.

Tables 3.1, 3.2, 3.3, and 3.4 show our experimental results in more detail. For various values of m, n, we display the running time for the static version (MBSC), insertion query, deletion query, and update. We tested all pairs (m, n) with:

(1) m, n powers of 10 up to 1000,

(2) n = 1000 and m a power of 10 up to 1 million, and

(3) m = 1000 and n a power of 10 up to 1 million.

In Table 3.1, we show the running time (in milliseconds) of our MBSC implementation,

for the static version, insertion query, deletion query, and updates, for n = 10 and different

values of m up to 1000000.

Table 3.1. Running time of our MBSC implementation, for the static version, insertion query, deletion query, and updates, for n = 10 and different values of m (times in milliseconds).

n    m        MBSC time   Insert query time   Delete query time   Update time
10   10       0.33        0.0133              0.01                0.0167
10   100      0.33        0.0166              0.01                0.0167
10   1000     14          0.0166              0.0133              0.0233
10   2000     63          0.0166              0.0133              0.08
10   3000     93.33       0.0166              0.0133              0.1333
10   4000     125         0.0166              0.0133              0.1466
10   5000     175.33      0.0166              0.0133              0.1633
10   10000    336         0.0166              0.02                0.2266
10   20000    714.33      0.02                0.02                0.4066
10   30000    993.33      0.01                0.01                0.05
10   40000    1313.33     0.02                0.02                0.5266
10   50000    1662.33     0.02                0.02                0.5366
10   100000   3342        0.02                0.0266              1.9233
10   1000000  36616       0.03                0.03                40.75

Table 3.2. Running time of our MBSC implementation for n = 100 and various m.

n     m        MBSC time   Insert query time   Delete query time   Update time
100   10       0.33        0.0133              0.01                0.01
100   100      0.33        0.02                0.02                0.0233
100   1000     18.33       0.02                0.02                0.023
100   2000     78          0.0266              0.02                0.05
100   3000     109.33      0.0266              0.0233              0.0933
100   4000     133.33      0.0266              0.0233              0.11
100   5000     183         0.0266              0.0233              0.1033
100   10000    348         0.0266              0.0233              0.2466
100   20000    512.67      0.0266              0.0233              0.3733
100   30000    1070.33     0.0266              0.2666              1.0733
100   40000    1380.33     0.0266              0.03                1.35
100   50000    1800.33     0.0266              0.03                1.45
100   100000   3623        0.0266              0.0333              2.76
100   1000000  37537       0.047               0.034               42.7

In Table 3.2, we show the running time of our MBSC implementation for n = 100 and different values of m up to 1000000.

Table 3.3. Running time of our MBSC implementation for n = 1000 and various m.

n      m        MBSC time   Insert query time   Delete query time   Update time
1000   10       5.55        0.0133              0.0166              0.0133
1000   100      6           0.03                0.02                0.0133
1000   1000     26.67       0.0267              0.02                0.0267
1000   2000     70.67       0.03                0.0233              0.0633
1000   3000     125         0.03                0.0233              0.07
1000   4000     161.33      0.03                0.0233              0.07
1000   5000     213.67      0.03                0.0233              0.1033
1000   10000    387         0.0333              0.03                0.2733
1000   20000    799.67      0.0333              0.03                0.5766
1000   30000    1181        0.0333              0.0336              0.7433
1000   40000    1655        0.0333              0.0336              0.7766
1000   50000    1996        0.04                0.0336              1.1533
1000   100000   3994        0.06                0.04                2.9266
1000   1000000  44864       0.09                0.06                49.5

Table 3.4. Running time of our MBSC implementation for m = 1000 and various n.

n        m     MBSC time   Insert query time   Delete query time   Update time
2000     1000  27          0.0266              0.0233              0.0266
3000     1000  36.3333     0.0296              0.0233              0.0266
4000     1000  36.6666     0.0296              0.03                0.0333
5000     1000  38.6666     0.0296              0.03                0.0333
10000    1000  57.3333     0.0333              0.0366              0.0533
20000    1000  62.3333     0.0333              0.0366              0.0533
30000    1000  88.6666     0.0366              0.0433              0.0666
40000    1000  109.6666    0.0433              0.0433              0.0666
50000    1000  114.6666    0.0433              0.0433              0.0666
100000   1000  197.3333    0.0733              0.0633              0.1833
1000000  1000  1149        0.0767              0.1111              0.2067

In Table 3.3, we show the running time of our MBSC implementation for n = 1000 and different values of m up to 1000000.

In Table 3.4, we show the running time of our MBSC implementation for m = 1000 and different values of n up to 1000000.

3.7 Conclusion and future work

We presented a unified data structure for efficient insertions and deletions of blue points. We also presented data structures that allow logarithmic query time for insertions, as well as for deletions, of blue points. We leave open a few problems: (1) efficient data structures for insertion and deletion of red points; note that, since virtually all known algorithms for finding the MBSC rely on FVD(R), it seems that solutions for insertion/deletion of red points would rely on the dynamic FVD, which is known to have Ω(n) worst-case update time; (2) unified data structures for insertion and deletion of both blue and red points; (3) algorithms for computing the minimum separating sphere for red and blue points in higher dimensions.

CHAPTER 4

MAXIMUM AREA BICHROMATIC SEPARATING

RECTANGLE PROBLEM

4.1 Introduction

Consider two sets of points, R and B, in the plane. R contains n points, called red points, and B contains m points, called blue points. We say that a rectangle r separates the sets R and B if it contains all the red points and the minimum number of blue points. Rectangle r is called a separating rectangle for R and B. The problem we study in this chapter, called the Maximum Area Bichromatic Separating Rectangle problem, is to find the separating rectangle for R, B that has the maximum area. We call such a rectangle a maximum separating rectangle. An example of a maximum separating rectangle is illustrated in Figure 4.1.

Applications that require the separation of bi-colored points could benefit from efficient solutions to this problem. As mentioned in Chapter 1, such applications arise in digital circuit design and tumor treatment planning. For instance, suppose we are given a tissue containing a tumor, where the tumor cells are specified by their coordinates. The coordinates of the healthy cells are also given. The goal is to separate the tumor cells from the healthy cells, for surgical removal or radiation treatment. Another application is in city planning, where blue points represent buildings, red points represent monuments, and the goal is to build a park that contains the monuments and as few buildings as possible.

4.1.1 Related work

The problem of finding the largest P-empty axis-aligned rectangle contained in a given rectangle A, among a set P of n points, was introduced by Hsu et al. (Hsu, 1984). An illustration of the largest P-empty rectangle for a point set P is given in Figure 4.2. They consider the axis-aligned version and show how to find all optimal solutions in O(n²) worst-case, O(n log² n) expected time. Later, Chazelle et al. showed how to find one optimal solution in O(n log³ n) time in the worst case (Chazelle, 1986). They also prove that the largest empty square can be found optimally in O(n log n) time using Voronoi diagrams in the L1 metric. Currently, the best-known result for this problem is by Aggarwal and Suri (Aggarwal, 1987), namely O(n log² n) time to find an optimal solution.

Figure 4.1. Red points in R are shown as solid circles and blue points in B are shown as empty circles. The minimum enclosing rectangle Smin of all red points and the maximum separating rectangle S* are displayed.

Mukhopadhyay et al. (Mukhopadhyay, 2003) studied the problem of finding the largest empty arbitrarily oriented rectangle among a planar point set bounded by a given axis-aligned rectangle. Figure 4.3 illustrates this problem. They give an O(n³)-time, O(n²)-space algorithm to find all such maximum empty rectangles. Chaudhuri et al. (Chaudhuri, 2003) independently gave a different solution for the same problem, also with O(n³) time and O(n²) space. For the 3D axis-aligned case, Nandy et al. (Nandy, 1998) give an algorithm to compute all the maximal empty boxes contained in a given box, i.e., boxes that cannot be extended

in any direction without introducing a point. The algorithm runs in O(n² log n + M) time and uses O(n) space, where M is the number of maximal empty boxes. They also prove that M = O(n³) in the worst case, giving a running time of O(n³). Datta et al. (Datta, 2000) also solved the 3D axis-aligned case. They presented an output-sensitive algorithm that reports all maximum-volume empty boxes in O(n² log² n + M) time and O(n² log n) space, and also proved that M = O(n log⁴ n) in the expected case.

Figure 4.2. For a set of points P and a bounding rectangle A, the largest P-empty axis-aligned rectangle contained in A is shown, denoted by S*.

Dumitrescu and Jiang (Dumitrescu, 2013) considered the problem of computing the maximum-volume axis-aligned d-dimensional box that is empty with respect to a given point set P in d dimensions. The target box is restricted to be contained within a given axis-aligned box R. For this problem, they give the first known FPTAS, which computes a box of volume at least (1 − ε)·OPT, for an arbitrary ε > 0, where OPT is the volume of the optimal box. Their algorithm runs in O((2^{2d} ε^{−2})^d · n log^d n) time.

Figure 4.3. The largest P-empty rectangle of arbitrary orientation bounded by a rectangle A is shown, for a set of points P.

In a different paper, they consider the same problem (Dumitrescu, 2012), but with P consisting of random points.

They show that the expected number of maximal boxes in the unit hypercube [0, 1]^d in R^d is (1 ± o(1)) · ((2d − 2)!/(d − 1)!) · n ln^{d−1} n, for any fixed dimension d. They also prove a matching lower bound of Ω(n log^{d−1} n), where previously only the upper bound of O(n log^{d−1} n) was known. More recently, they show that the number of empty boxes of maximum area among n points is N = Ω(n) and N = O(n log n · 2^{α(n)}) for the 2D case (d = 2) (Jiang, 2016). They also prove N = O(n^d) and N = Ω(2^{α^{d−2}(n)} n) for any fixed dimension d.

Kaplan and Sharir study the problem of finding the largest empty axis-aligned rectangle

inside a given bounding rectangle A and containing a query point (Sharir, 2011). See Figure

4.4 for an illustration of this problem. They design an algorithm to answer queries in

O(log⁴ n) time, with O(nα(n) log⁴ n) time and O(nα(n) log³ n) space for preprocessing. Here α(n) is the inverse Ackermann function. It is worth noting that, when an "origin" point O and staircases of points around O are given, they can find the largest rectangle containing only O in O(nα(n)) time (Sharir, 2011; Mozes, 2017). In a different paper, they also solve the disk version of the problem (finding the maximal empty disk containing a query point) (Kaplan, 2012). Their approach takes O(log² n) query time and O(n log² n) preprocessing time, using an O(n log n)-space data structure.

Figure 4.4. For a set P of planar points and a bounding rectangle A, the largest P-empty rectangle S* inside A that contains only the query point q is displayed.

Recently, Gester et al. (Gester, 2015) considered the query version of finding the largest empty axis-aligned square inside a rectilinear polygon P. In this version, the query is a point p located inside the polygon, and the goal is to find the largest axis-aligned square containing only p, as shown in Figure 4.5. They describe a data structure of size O(n) that can be used to answer queries in O(log n) time with O(n log n) time for pre-processing. Gutierrez et al. (Gutierrez, 2012) consider another query version of the largest empty box, in which queries are points in multi-dimensional databases. They present an algorithm

that is based on the assumption that the points are stored in an R-tree, and run experiments with it, which show that it runs 3 to 140 times faster and uses 3-4% as much memory as the best algorithm not using this assumption. See Figure 4.6 for an example.

Figure 4.5. Given a rectilinear polygon P and a point p ∈ P, the shaded square S* is the largest square inside P containing only p.

Abellanas et al. (Abellanas, 2001) solved the following problem: given k sets of up to n points, compute the smallest rectangle containing a point from each set (see Figure 4.7 for an illustration). The authors give an algorithm to solve this problem in O(n(n − k) log² k) time. They also show how to compute the narrowest strip defined by two parallel lines that covers all k sets (also shown in Figure 4.7) in O(n² α(k) log k) time.

4.1.2 Our Results

In this work, we study the 2D axis-aligned version of the problem. That is, the given point sets are in the plane, and the target rectangle has to be axis-aligned. We first prove that the number of optimal solutions is Ω(m) in the worst case and give an algorithm to compute all optimal solutions. After that, we give an O(m log m + n) time, O(m + n) space algorithm to find one optimal solution, based on a staircase approach. The running time reduces to O(m + n) if the points are pre-sorted by one of the coordinates. Finally, we prove a matching lower bound for the problem, by reducing from a known "Ω(n log n)-hard" problem. We begin by giving some definitions and making some important observations in Section 4.2. In Section 4.3, we prove the lower bound of Ω(m) on the number of optimal solutions and present our algorithm to find all of them. Then, in Section 4.4, we discuss our algorithm to find one optimal solution. After that, in Section 4.5, we prove the lower bound for the problem. Finally, in Section 4.6, we draw the conclusions for this work and give some future directions. These contributions have been published in (Armaselu, August 2016) and submitted for publication in (Armaselu, 2017).

Figure 4.6. Given a bounding box R and its R-tree hierarchical representation with smaller boxes R1, R2, R3, R4, with the input points being corners of R1, R2, R3, R4, the largest empty rectangle containing only the query point q, denoted S*, is shown with interrupted lines.

Figure 4.7. Given 4 sets of points, each denoted by a different kind of filled or empty circles or squares, the rectangle S* shown is the smallest that contains points from all sets, and the strip W* shown is the narrowest that contains points from all sets.

4.2 Preliminaries

We begin by describing some properties of the bounded optimal solution.

Observation 4.2.1. The maximum axis-aligned separating rectangle S must contain at least one blue point on each of its sides.

Consider a quad Q of 4 non-collinear blue points q1, q2, q3, q4 in this order. Two vertical lines going through two of these points and two horizontal lines going through the other two points define a rectangle S. We say that Q defines S.

Definition 4.2.1. Consider a vertical strip formed by the two vertical lines bounding R to the left and right, as well as a horizontal strip formed by the parallel lines bounding R above and below. The minimum R-enclosing rectangle Smin is the intersection between the vertical and horizontal strips (refer to Figure 4.8).

Definition 4.2.2. A candidate rectangle is an R-enclosing rectangle that contains the minimum number of blue points and cannot be extended in any direction without introducing a blue point.

We start with the minimum R-enclosing rectangle Smin. For each side of Smin, we slide it outwards parallel to itself until it hits a blue point (if no such point exists, then the solution is unbounded). Denote by Smax the resulting rectangle (shown in Figure 4.1). Unbounded solutions can be easily determined in linear time, so from now on we assume bounded solutions. If the interior of Smax \ Smin does not contain blue points, then Smax is the optimal solution, and we are done. We discard the blue points contained in Smin, as well as the blue points outside of Smax, from B. The set of remaining blue points is partitioned into 4 disjoint subsets (quadrants) BNE,BNW ,BSW ,BSE. Each quadrant contains points that are located in a rectangle formed by right upper (resp. left upper, left lower, right lower) corners of Smin and Smax (see Figure 4.8 for details).
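A minimal linear-time sketch of this step follows; it computes Smin as the bounding box of R and obtains Smax by sliding each side of Smin independently outward to the nearest blue point within Smin's extent (our reading of the construction), leaving a side at infinity when the solution is unbounded in that direction. Names are illustrative.

public class BoundingRects {
    // Axis-aligned rectangle [xmin, xmax] x [ymin, ymax].
    static class Rect { double xmin, xmax, ymin, ymax; }

    // Smin: the minimum axis-aligned rectangle enclosing all red points.
    static Rect computeSmin(double[][] red) {
        Rect s = new Rect();
        s.xmin = s.ymin = Double.POSITIVE_INFINITY;
        s.xmax = s.ymax = Double.NEGATIVE_INFINITY;
        for (double[] p : red) {
            s.xmin = Math.min(s.xmin, p[0]); s.xmax = Math.max(s.xmax, p[0]);
            s.ymin = Math.min(s.ymin, p[1]); s.ymax = Math.max(s.ymax, p[1]);
        }
        return s;
    }

    // Smax: each side of Smin slid outward until it first hits a blue point
    // (sides slid independently here; an unbounded side stays infinite).
    static Rect computeSmax(Rect smin, double[][] blue) {
        Rect s = new Rect();
        s.xmin = Double.NEGATIVE_INFINITY; s.xmax = Double.POSITIVE_INFINITY;
        s.ymin = Double.NEGATIVE_INFINITY; s.ymax = Double.POSITIVE_INFINITY;
        for (double[] b : blue) {
            double x = b[0], y = b[1];
            if (y >= smin.ymin && y <= smin.ymax) {   // candidate for left/right side
                if (x < smin.xmin) s.xmin = Math.max(s.xmin, x);
                if (x > smin.xmax) s.xmax = Math.min(s.xmax, x);
            }
            if (x >= smin.xmin && x <= smin.xmax) {   // candidate for bottom/top side
                if (y < smin.ymin) s.ymin = Math.max(s.ymin, y);
                if (y > smin.ymax) s.ymax = Math.min(s.ymax, y);
            }
        }
        return s;
    }
}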

Definition 4.2.3. For two blue points p(xp, yp), q(xq, yq) ∈ BNE, we say that p dominates q if xp > xq and yp > yq. By flipping inequalities, we extend this definition for any two points p, q that are both located in BNW ,BSW , or BSE.

Consider the points of BNE and sort them by X coordinate. A point p ∈ BNE is a candidate to be a part of an optimal solution only if there is no point q ∈ BNE such that q dominates p. We only leave such possible candidate blue points in BNE and discard the rest.

The elements of BNE form a staircase sequence, denoted STNE, which is ordered non-decreasingly by X coordinate and non-increasingly by Y coordinate, as shown in Figure 4.9.

The sets BNW, BSW, BSE are treated in a similar way and form the staircases STNW, STSW, STSE, which are ordered non-decreasingly by X coordinate. The 4 staircases can be found in O(m log m) time (Mukhopadhyay, 2003) and do not change in the axis-aligned cases. Thus, abusing notation, we will refer to STq simply as Bq for each quadrant q.

Figure 4.8. The minimum enclosing rectangle Smin, the rectangle Smax which bounds the solution space, and the subsets BNE, BNW, BSW, BSE.

While the staircase construction approach has been used before (Mukhopadhyay, 2003), there are differences between how we use it in this paper and how it was used in (Mukhopadhyay, 2003). Specifically, note that a maximum B-empty rectangle for a given orientation, computed as in (Mukhopadhyay, 2003), may not contain all red points.
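For one quadrant, the staircase can be extracted with a sort followed by a right-to-left scan that keeps exactly the non-dominated points; a sketch (assuming distinct X coordinates, with names of our choosing) is given below.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class Staircase {
    // Returns the NE staircase: points of bne not dominated by another point of
    // bne (p dominates q if p.x > q.x and p.y > q.y), in increasing X order.
    static List<double[]> northEast(double[][] bne) {
        double[][] pts = bne.clone();
        Arrays.sort(pts, Comparator.comparingDouble(p -> p[0]));   // sort by X
        ArrayList<double[]> stair = new ArrayList<>();
        double maxY = Double.NEGATIVE_INFINITY;
        for (int i = pts.length - 1; i >= 0; i--) {   // scan right to left
            if (pts[i][1] > maxY) {                   // no point to the right is higher
                stair.add(pts[i]);
                maxY = pts[i][1];
            }
        }
        java.util.Collections.reverse(stair);         // report in increasing X order
        return stair;
    }
}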

4.3 Finding all optimal solutions

We first prove that we can have Ω(m) maximum separating rectangles in the worst case. To do that, we use a construction similar to the one in (Jiang, 2016), except that we have to ensure that all rectangles will contain Smin.

Theorem 4.3.1. In the worst case, there are Ω(m) maximum separating rectangles.

Figure 4.9. The set of candidate points defining a solution. The sets BNE, BNW, BSW, BSE are ordered by X, then by Y, and form a staircase.

Figure 4.10. There are 4 red points directly above, below, to the right and to the left of the origin O. The blue points are p, q, in BNE, and m − 2 other points in BSW . All rectangles enclose R and have the same area of x0y0, thus giving Ω(m) maximum rectangles.

Proof. Consider that all blue points are in BNE ∪ BSW and refer to Figure 4.10. Let BNE be composed of two points p, q and BSW of a sequence of points r1, . . . , rm−2, with the following coordinates.

1. p(x0/4, y0/2), with x0 > 0, y0 > 0;

2. q(x0/2, y0/4);

3. ri(xi, yi), ∀i = 1, . . . , m − 2;

4. −3x0/2 < xi < xj < 0, ∀i, j : 1 ≤ i < j ≤ m − 2;

5. yi = y0/2 − x0y0/(x0/2 − xi−1), ∀i > 1.

In addition, let R contain 4 red points sE(x0/2, 0), sN(0, y0/2), sW(xm−3, 0), sS(0, y2).

It is easy to check that all rectangles passing through p, q, ri−1, ri, for some i : 1 < i ≤ m − 2, enclose R. Moreover, all these rectangles have an area equal to (yp − yi) · (xq − xi−1) = (x0y0/(x0/2 − xi−1)) · (x0/2 − xi−1) = x0y0. All larger rectangles either contain a blue point or do not contain all red points. Thus, there are Ω(m) maximum separating rectangles.
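Under the coordinates reconstructed above, the equal-area claim is easy to confirm numerically; the following throwaway sketch (with arbitrarily chosen x0, y0 and xi values) prints the area (yp − yi)(xq − xi−1) for each i and always obtains x0·y0.

public class LowerBoundConstruction {
    public static void main(String[] args) {
        double x0 = 8, y0 = 8;
        int m = 12;                       // number of blue points
        int k = m - 2;                    // points r_1, ..., r_{m-2} in B_SW
        double[] xs = new double[k + 1];  // xs[i] = x_i, 1-indexed
        double[] ys = new double[k + 1];
        for (int i = 1; i <= k; i++)      // any increasing sequence in (-3*x0/2, 0)
            xs[i] = -1.4 * x0 + i * (1.3 * x0 / k);
        for (int i = 2; i <= k; i++)      // y_i = y0/2 - x0*y0 / (x0/2 - x_{i-1})
            ys[i] = y0 / 2 - x0 * y0 / (x0 / 2 - xs[i - 1]);

        double yp = y0 / 2;               // p = (x0/4, y0/2), top support
        double xq = x0 / 2;               // q = (x0/2, y0/4), right support
        for (int i = 2; i <= k; i++) {
            double area = (yp - ys[i]) * (xq - xs[i - 1]);
            System.out.printf("i=%d area=%.6f%n", i, area);   // always x0*y0 = 64
        }
    }
}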

To compute all maximum separating rectangles, we do the following.

We first compute Smin,Smax in O(m + n) time. Then, we find all B-empty rectangles

bounded by Smax using the approach in (Hsu, 1984) in O(m²) time. For each such rectangle, we check whether it contains Smin and, if it does not, we discard it. Finally, we report the remaining rectangles that have the maximum area. We have proved the following result.

Theorem 4.3.2. All axis-aligned maximum-area rectangles separating n red points and m blue points can be found in O(m² + n) time.

4.4 Finding one optimal solution

Suppose Smin, Smax, and the staircases Bi, i = NE, NW, SW, SE, have already been computed. We describe an algorithm to find only one optimal solution based on these inputs. We call this problem the staircase problem.

Observe that all maximal rectangles containing Smin are defined by tuples of four points from Bi, i = NE, NW, SW, SE (called support points). Each of these points supports an edge of the rectangle. For the k-th candidate rectangle, denote the top support by topk, the left support by leftk, the right support by rightk and the bottom support by bottomk. The problem that we solve is essentially the one of finding the maximal B-empty rectangle containing a given "origin" point, considered in (Sharir, 2011; Mozes, 2017). However, in our case, the target rectangle has to contain Smin, rather than a single origin point. Based on the position of each support of a candidate rectangle, three cases may arise (Sharir, 2011; Mozes, 2017). Note that their solution takes O(m log m) time in cases 1 and 2 and O(mα(m)) time in case 3. The following is the list of the three cases.

Case 1. Three supports are on the same side of Smin and the fourth support is on the opposite side of Smin.
Case 2. Each support is from a different quadrant.
Case 3. Two supports are from one quadrant and the other two are from the opposite quadrant.

We now show how to solve this problem in O(m) time in every case.

4.4.1 Case 1

Without loss of generality, suppose that the top, right, and bottom supports lie to the right

of Smin and the left one lies to the left of Smin (as in Figure 4.11). The other four cases are treated in a fully symmetric manner. Note that, for each top-right tuple with the top support

in BNE, there is a unique bottom support to the right of Smin, which is in BSE. Thus, the left support is also unique. Also, the top and the right supports must be consecutive in X order, so there are O(m) top-right tuples. Similarly, for each top-left tuple with the top

support in BNW , we get a unique bottom-right tuple. This gives us a total of O(m) candidate rectangles in case 1.

Note that case 1 corresponds to the case (i) from (Sharir, 2011; Mozes, 2017), where

three points defining a maximal empty rectangle are in a halfplane defined by one of the

axes. Suppose this is the halfplane to the left of the Y axis and denote by Pl the subset of

points in this halfplane. For each point p ∈ Pl, there is at most one rectangle with three

defining points on its left half that is supported by p on its left side. Such a rectangle is

obtained by connecting p to the Y axis by a horizontal segment, then sliding it vertically in

both directions until it hits a point in Pl in both directions, and then sliding the right edge

of the rectangle from the Y axis to the right until it hits a point. Thus, if there are m input points, there are O(m) rectangles in case (i) (Sharir, 2011; Mozes, 2017).

Later on, we will show how to find the optimal solution in case 1.

Figure 4.11. The top, right, and bottom supports lie to the right of Smin and the left support lies to the left of Smin. For any such top-right pair (topk, rightk), there is a unique bottom support bottomk to the right of Smin and a unique left support leftk to the left of Smin.

4.4.2 Case 2

Suppose, without loss of generality, that topk ∈ BNE, which implies rightk ∈ BSE, bottomk ∈

BSW and leftk ∈ BNW (see Figure 4.12). For each top-right tuple satisfying these conditions, there is a unique bottom support from BSW and a unique left support from BNW . Again, the top and the right supports are consecutive in X order, yielding O(m) top-right tuples.

Similarly, for each top-left tuple with topk ∈ BNW , the bottom and right supports satisfying the condition are unique. Thus, there are O(m) candidate rectangles in Case 2. Case 2 corresponds to the case (ii) from (Sharir, 2011; Mozes, 2017), where each defining point of a maximal empty rectangle is located in a different quadrant. Suppose that the

first quadrant contains the right support pr of the rectangle, the second quadrant contains the top support pt, the third quadrant contains the left support pl, and the fourth quadrant contains the bottom support pb. In (Sharir, 2011; Mozes, 2017) it is shown that pr can be the right support of at most one rectangle whose supports are in four different quadrants. Thus, there are O(m) rectangles in case (ii) (Sharir, 2011; Mozes, 2017). The algorithm to find an optimal solution in case 2 will be revealed later on.

4.4.3 Case 3

Suppose that topk, rightk ∈ BNE and bottomk, leftk ∈ BSW (refer to Figure 4.13). For each such top-right pair, there are multiple choices of bottom-left pairs formed by adjacent points from BSW . However, the bottom support has to be above or equal to the last point p ∈ BSE to the left of rightk, if p exists, and the left support has to be to the right or equal to the last point q ∈ BNW below topk, if q exists.

Consider two functions, f and l, that assign, to every top-right pair (topk, rightk), the index in BSW of the first (resp., last) bottom support that occurs with (topk, rightk), denoted by f(k) and l(k), respectively. Note that f and l are monotonically decreasing functions (as shown in Figure 4.13).

67 Figure 4.12. Each support is from a different quadrant, with topk ∈ BNE. For any such top-right pair (topk, rightk), there is a unique bottom support bottomk ∈ BSW and a unique left support leftk ∈ BNW .

Figure 4.13. For each top-right pair (topk, rightk) ∈ BNE², there are multiple bottom-left pairs (bottomk, leftk) ∈ BSW². However, they have to lie above p (if p exists), and to the right of q (if q exists). For the next top-right pair (topk′ = rightk, rightk′) ∈ BNE², the bottom-left pairs have to lie above p′ (if p′ exists) and to the right of q′ (if q′ exists). Note that p′ is after p in BSE and q′ is before q in BNW.

Case 3 gives O(m²) tuples so, from now on, we focus on case 3.

Note that all supporting points are in BNE ∪ BSW . Candidate rectangles are defined by two pairs of adjacent points in BNE and BSW respectively. Denote by (pi, qi), the i-th pair

of adjacent points in BNE, and by (rj, sj), the j-th pair of adjacent points in BSW . Consider a matrix M such that M(i, j) denotes the area of the rectangle supported on the top-right

by (pi, qi) and on the bottom-left by (rj, sj). Some entries (i, j) of M correspond to cases

where (pi, qi) and (rj, sj) do not define a B-empty rectangle (e.g. rj is below p or sj is to the left of q), and are therefore set to “undefined”. The goal is to compute the maximum of

each row i, along with the column j(i) where it occurs. To break ties, we always take the

rightmost index.

It is easy to see that the defined portion of each row of M is contiguous. Since the

functions f and l are monotonically decreasing, the defined portion of M has a staircase

structure (as shown in Figure 4.14). We say that M is staircase-defined by f and l. Moreover,

it turns out that M is a partially defined inverse Monge matrix (Sharir, 2011) of size O(m) ·

O(m), and thus all row-maxima can be found in O(mα(m)) time using the algorithm in

(Mozes, 2017).

Figure 4.14. Staircase matrix M. The defined portion is marked by X’s.

To eliminate the α(m) factor, we extend M to a totally inverse monotone matrix, so that all row maxima can be found in O(m) time using the SMAWK algorithm (Smawk, 1987). Recall that M is totally inverse monotone iff ∀i, j, k, l such that 1 ≤ i < k ≤ m, 1 ≤ j < l ≤ m, we have M(k, j) ≤ M(k, l) ⇒ M(i, j) ≤ M(i, l). To do that, we could use the approach in (Sinha, 1994) to make M totally inverse monotone, by filling it with 0's on both sides of the defined portion. However, it does not work in our case. Note that it may happen that there exist indices i, j, k, l : 1 ≤ i < k ≤ m, 1 ≤ j < l ≤ m such that M(i, j) > 0 and M(i, l) = M(k, j) = M(k, l) = 0, so M is not totally inverse monotone. Moreover, it follows by a similar argument that M is not totally monotone either. Therefore, we resort to a different filling scheme, which is similar to the one in (Kleitman, 1990). Specifically, we fill only the left undefined portion of each row with 0. That is, if the defined portion of row i starts at j1(i), then M(i, j) = 0, ∀1 ≤ i ≤ m, 1 ≤ j < j1(i). We also fill the right undefined portion of each row with negative numbers such that, if the defined

portion of row i ends at j2(i), then M(i, j) = j2(i) − j, ∀1 ≤ i ≤ m, j2(i) < j ≤ m, that is, negative numbers in decreasing order (see Figure 4.15).

Lemma 4.4.1. M is a totally inverse monotone matrix.

Proof. Suppose this is not the case. Then there exist i, j, k, l, 1 ≤ i < k ≤ m, 1 ≤ j < l ≤ m such that M(i, j) > M(i, l) and M(k, j) ≤ M(k, l). If M(i, l) < 0, then by construction M(k, l) < 0, so M(k, j) > M(k, l), contradiction. Thus, M(i, l) ≥ 0. If M(i, l) = 0, then by construction M(i, j) = 0, again contradiction. Hence, M(i, j) > M(i, l) > 0, which entails M(k, j) > M(k, l) or M(k, j) > 0,M(k, l) > 0. The first choice gives a contradiction, so the only remaining possibility is M(i, j) > M(i, l) > 0 and 0 < M(k, j) ≤ M(k, l). But this contradicts the total inverse monotonicity of the defined (positive) portion of M.

As a side note, if the functions f and l were monotonically increasing, rather than decreasing, we could make M totally monotone instead. That is, M(i, j) < M(i, l) ⇒ M(k, j) < M(k, l). To do that, we fill the "undefined" portion of M with zeros on the right side, and negative numbers in increasing order on the left side of the defined portion of M. By a similar argument as in Lemma 4.4.1, it follows that M is totally monotone.

Figure 4.15. M is padded on the left of the defined portion with 0's, and on the right of the defined portion with negative numbers in decreasing order.

Note that computing M explicitly would take Ω(m²) time. To avoid that, we only store the pairs from BSW that may define optimal solutions in a list, as in (Smawk, 1987), and evaluate M(i, j) only when needed. Thus, we only require O(m) time and space.
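The padded matrix never needs to be built explicitly. Given the row boundaries j1(i) and j2(i) (the arrays F and L) and a callback that computes the area of a defined entry, M(i, j) can be evaluated on demand exactly as the filling scheme prescribes, and SMAWK then probes only O(m) entries. A sketch under these assumptions, with names of our choosing:

import java.util.function.BiFunction;

public class PaddedStaircaseMatrix {
    final int[] j1, j2;                              // defined portion of row i is [j1[i], j2[i]]
    final BiFunction<Integer, Integer, Double> area; // area of the rectangle for a defined (i, j)

    PaddedStaircaseMatrix(int[] j1, int[] j2, BiFunction<Integer, Integer, Double> area) {
        this.j1 = j1; this.j2 = j2; this.area = area;
    }

    // M(i, j) evaluated lazily: 0 to the left of the defined portion, the
    // rectangle area inside it, and j2(i) - j (negative, decreasing) to the right.
    double value(int i, int j) {
        if (j < j1[i]) return 0.0;
        if (j > j2[i]) return j2[i] - j;
        return area.apply(i, j);
    }
}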

4.4.4 Algorithm

In order to find the optimal solution, after having computed the staircases, we do the following.

Suppose we have fixed a top-right pair (pk, qk), with pk ∈ BNE (resp., BNW) and qk ∈ BNE ∪ BSE (resp., BNE ∪ BSW), in X order. The leftmost possible support is the highest point in BNW below pk, denoted belowNW(pk), and the lowest possible support is the rightmost point in BSE to the left of qk, denoted leftSE(qk). Both such extremal supports can be found in O(1) time if we store, with each point p in some staircase and each quadrant q, the pointers belowq(p) and leftq(p) (see Figure 4.16).

Figure 4.16. The pointers above, below, left and right.

In addition, we consider, for each point p, the pointers aboveq(p) to the lowest point in quadrant q above p, and rightq(p) to the leftmost point in quadrant q to the right of p. These pointers can be precomputed in O(m) time for all p ∈ B through a scan in X order. Also, the staircases are stored as doubly-linked lists with pointers prev(p), next(p) to the point before (resp., after) p in the respective staircase, plus the pointers mentioned above whenever needed.
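As an illustration, here is a minimal sketch of one right-to-left scan that fills the rightq pointers; the point representation and the function name are assumptions, and the remaining pointers (leftq, aboveq, belowq) would be obtained by symmetric scans.

```python
def precompute_right_pointers(points):
    """For each point, find the leftmost point of each quadrant strictly to its
    right, using a single right-to-left scan.  `points` is a list of
    (x, y, quadrant) tuples presorted by x; the representation and the
    function name are assumptions, not the dissertation's code."""
    quadrants = {q for (_, _, q) in points}
    right = {q: [None] * len(points) for q in quadrants}
    nearest = {q: None for q in quadrants}   # nearest point to the right, per quadrant
    for i in range(len(points) - 1, -1, -1):
        for q in quadrants:
            right[q][i] = nearest[q]
        nearest[points[i][2]] = i            # the current point becomes the nearest one for its quadrant
    return right
```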

We then consider all bottom-left pairs occurring with (pk, qk) and assume wlog that pk ∈ BNE. If pk, qk and one of leftSE(qk), belowNW(pk) are on the same side of Smin, say leftSE(qk), then we are in case 1 with a rectangle defined by pk, qk, leftSE(qk), above(leftSE(qk)). If

pk, qk, leftSW(qk), and belowNW(pk) are in different quadrants, then we are in case 2 with

a rectangle defined by pk, qk, leftSW (qk), belowNW (pk). Thus, cases 1 and 2 require O(m) time

and space in total. Otherwise (i.e., we are in case 3), we store f(k) = rightSW (belowNW (pk)), l(k) =

aboveSW(leftSE(qk)), respectively, in two arrays, F and L. After all top-right and top-left pairs are treated, we run the SMAWK algorithm as described earlier, in order to compute all row-maxima of M in O(m) time, where M is staircase-defined by F and L, and M(i, j) is the area of the rectangle defined by the i-th pair from one quadrant and the j-th pair from the opposite quadrant in case 3. We then report the (last index) rectangle corresponding to the maximum among all row-maxima of M and all maximum-area rectangles obtained in cases 1 and 2, along with its area.

Thus, we have proved the following result.

Theorem 4.4.2. The axis-aligned version of the maximum-area separating rectangle

problem can be solved in O(m log m + n) time and O(m + n) space. The running time

reduces to O(m + n) if the blue points are presorted by their X coordinates.

4.5 Lower bound

In this section, we prove that Ω(m log m + n) steps are sometimes needed in order to compute a maximum axis-aligned separating rectangle, provided that the blue points are not presorted.

To do that, we give a reduction from the 1D-Furthest-Adjacent-Pair problem, which is known to have a lower bound of Ω(m log m) for a set of m numbers.

In the 1D-Furthest-Adjacent-Pair problem, we are given a set A of m real numbers, and the goal is to find the two numbers a, b ∈ A, a < b, for which the quantity b − a is maximum among all adjacent pairs in the sorted order of A (denoted by A′). Wlog assume that all these numbers are in the interval [0, 1].

The reduction to our problem is as follows. For each ai ∈ A, we consider two points, pi(ai, 1/(1 + ai)) and qi(−ai, −1/(1 + ai)). The set of blue points B is the set of all such pi's and qi's. The red point set R consists of the origin O, together with four special points, sE(a2, 0), sN(0, 1/(1 + am−1)), sW(−a2, 0), and sS(0, −1/(1 + am−1)). See Figure 4.17 for details.

Now we prove that the reduction works.
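As a concrete illustration of the construction (independent of the proof that follows), the following minimal Python sketch builds the two point sets; the function name is hypothetical, and it interprets a2 and am−1 as the second-smallest and second-largest values of A, which is our reading of the construction above.

```python
def build_reduction_instance(A):
    """Build the blue and red point sets of the reduction, given the numbers A
    (assumed to lie in [0, 1], with |A| >= 2).  Hypothetical helper; a2 and
    a_{m-1} are taken to be the second-smallest and second-largest values."""
    a = sorted(A)
    blue = []
    for ai in A:
        blue.append((ai, 1.0 / (1.0 + ai)))      # the point p_i
        blue.append((-ai, -1.0 / (1.0 + ai)))    # the point q_i
    y_special = 1.0 / (1.0 + a[-2])
    red = [(0.0, 0.0),                           # the origin O
           (a[1], 0.0), (0.0, y_special),        # s_E and s_N
           (-a[1], 0.0), (0.0, -y_special)]      # s_W and s_S
    return red, blue
```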

Figure 4.17. R consists of the origin O(0, 0) and four points sE, sN , sW , sS. For each ai ∈ A, there are two blue points pi, qi of opposite coordinates. Each tuple (pi, pj, qi, qj), 1 ≤ i, j ≤ m defines a candidate rectangle.

Lemma 4.5.1. Two adjacent numbers ai, aj ∈ A form the furthest adjacent pair of numbers if and only if the rectangle S bounded by pi, pj, qi and qj is one of the separating rectangles of maximum area.

Proof. Let ai, aj be adjacent in A′ with aj − ai ≥ |ak − al|, ∀ak, al ∈ A that are adjacent in A′.

Note that y(pi) decreases while x(pi) increases (same for qi, pj, qj), and y(pi) ≥ y(sN), x(pj) ≥ x(sE), y(qi) ≤ y(sS), x(qj) ≤ x(sW), ∀1 ≤ i < j ≤ m, so S encloses R. We need to show that the rectangle S has the following properties: (1) it does not contain any blue point, and (2) it has the maximum area among all such rectangles.

If S contained a point pk (similarly, qk), then we would have x(pk) = ak with ai < ak < aj, a contradiction. So property (1) is satisfied. We have the points pi(ai, 1/(1 + ai)), qi(−ai, −1/(1 + ai)), pj(aj, 1/(1 + aj)), and qj(−aj, −1/(1 + aj)). This means that area(S) = 4aj/(1 + ai). Let aj − ai = d > 0 and f(d) = area(S). We have f(d) = 4(ai + d)/(1 + ai), which is monotonically increasing in d. Therefore, S has the highest area among all rectangles with property (1).

Now let S be a largest separating rectangle bounded by (pi, pj, qj, qi), for some ai, aj. Since S does not contain any blue points, there cannot exist any ak ∈ A : ai < ak < aj, so ai, aj are adjacent in A′. Moreover, since area(S) is monotonically increasing in aj − ai, it follows that aj − ai ≥ |ak − al|, ∀ak, al ∈ A that are adjacent in A′.

This gives us the following result.

Theorem 4.5.2. Ω(m log m + n) steps are needed in order to compute the maximum axis-aligned rectangle separating R and B.

Proof. First, Ω(n) steps are needed in order to compute Smin, as one cannot compute the

optimal solution without knowing Smin. The Ω(m log m) term follows from the reduction from the 1D-Furthest-Adjacent-Pair problem.

4.6 Conclusion and future work

We addressed the problem of finding the maximum area separating rectangle of red and blue

points and solved the 2D axis-aligned case. We first proved that there are Ω(m) optimal solutions in the worst case, and provided an O(m² + n) time algorithm to find all optimal solutions. After that, we presented an algorithm to compute one optimal solution in O(m log m + n) time, which becomes O(m + n) if the blue points are pre-sorted. We also proved this bound to be optimal within a multiplicative constant, by proving a matching lower bound of Ω(m log m + n) for the problem, using a reduction from the 1D-Furthest-Adjacent-Pair problem.

We leave open a few topics, including:

(1) improving the running time bounds in (Armaselu, August 2016) for the 3D case, currently O(m²(m + n)), and for the 2D arbitrary orientation case, currently O(m³ + n log n). Note that the algorithms presented in (Armaselu, August 2016) for those two cases compute all optimal solutions. It would be interesting to find algorithms that compute only one optimal solution, but much faster than O(m²(m + n)) and O(m³ + n log n) time for those two cases, respectively.

(2) efficient approximation algorithms for the versions mentioned above.

(3) either proving or disproving that the largest empty rectangle among n points in the plane can be computed in o(n log² n) time. Note that any largest empty rectangle algorithm running in o(n log² n) time would be an improvement over Aggarwal and Suri's algorithm, which has O(n log² n) running time.

(4) data structures that allow one to efficiently update the maximum separating rectangle when the red points are given dynamically as query points. It would also be interesting to see if one could adapt the data structures in (Sharir, 2011; Mozes, 2017) to solve this dynamic version.

CHAPTER 5

EXTENSIONS OF THE MAXIMUM SEPARATING

RECTANGLE PROBLEM

5.1 Introduction

In the previous chapter, we argued that computing the maximum separating rectangle is

motivated by applications, such as tumor separation in cancer treatment, or component

placement in circuit design. In practice, it is often possible that the entities that we aim

to separate are “large” and cannot be simply considered as points. For example, in tumor

detection, tissue cells can be represented as small circles, and tumor extraction would reduce

to separating two sets of circles (or points from circles). Similarly, in circuit design, circuit

components that are to be avoided can be considered as rectangles, so component placement

reduces to finding the optimal rectangle separating points from rectangles. Moreover, there

may be situations in which one has to separate uncertain data. Depending on the type of

uncertainty, the uncertain data points can be modeled as circles, squares, rectangles, or other

shapes.

5.1.1 Related work

The problem of computing a maximal empty axis-aligned rectangle among n axis-aligned

non-intersecting rectangles was solved by Nandy, Bhattacharya, and Ray (Ray, 1990). They gave an algorithm that runs in O(n log n + r) time to find all r maximal empty rectangles.

Nandy, Bhattacharya, and Sinha (Sinha, 1994) solved the problem of finding the largest

empty axis-aligned rectangle among n arbitrary obstacles such as line segments and polygons.

Their solution takes O(n log² n) time for both types of obstacles and computes one optimal

solution. They also consider the problem of finding the largest empty rectangle inside a

polygon in O(n log² n) time.

McKenna et al. considered the problem of finding the largest empty rectangle inside

a rectilinear polygon of size n (McKenna, 1985). They provide an algorithm that runs in

O(n log⁵ n) time. Snoeyink et al. worked on computing the largest isothetic rectangle located inside a convex n-gon and provided an O(log n)-time algorithm (Snoeyink, 1995).

The largest empty isothetic orthoconvex polygon problem, which asks for the largest

axis-aligned orthoconvex polygon not containing any of the given n points, was studied by

Datta and Ramkumar (Ramkumar, 1990). They gave an O(n³) worst-case, O(n² log n) expected-time algorithm for the case of L-shaped polygons, an O(n³) time algorithm for cross-shaped polygons, and an O(n²) time algorithm for edge-visible orthoconvex polygons. Nandy, Bhattacharya, and Mukhopadhyay solved the general case using an O(n³) time, O(n²) space algorithm (Nandy, 2008).

5.1.2 Our contributions

In this chapter, we consider extensions of the maximum separating rectangle problem, in

which the target rectangle must be axis-aligned and must avoid various geometric objects

other than points. That is, we are given a set of n red points R and a set of m blue shapes B,

which can be points, rectangles, or other shapes. We first consider the blue rectangles version, in

which B consists of possibly intersecting blue axis-aligned rectangles. After that, we consider

the outlier version, in which B consists of points, and we allow up to k blue points (called

“outliers”) to be part of a solution, where k is a given integer.

This chapter is organized as follows. First, we give the notations and preliminary results

that will be used for both versions. Then, in Section 5.2, we study the blue rectangles version

and give an algorithm that runs in O(m log m + n) time. After that, in Section 5.3, we study

the outlier version and give an algorithm that runs in O(k⁷m log m + n) time. Finally, in

Section 5.4, we draw the conclusions and mention the problems that we leave as future work.

This work has been published in (Raichel, October 2016).

5.1.3 Preliminaries

Figure 5.1. The lines defining Smin partition the plane into 9 regions: Smin, 4 quadrants NE,NW,SW,SE, and 4 side regions E, N, W, S. By sliding Smin outwards in each side region until it hits a blue point, we obtain a rectangle Smax.

We are going to reuse the notations from Chapter 4. Specifically, we denote by Smin the smallest rectangle enclosing the red points. The lines defining Smin partition the plane into

9 regions, namely Smin itself, 4 quadrants NE,NW,SW,SE, and 4 side regions E, N, W, S.

By sliding Smin outwards in each side region until it hits a blue point, we obtain a rectangle

Smax. See Figure 5.1 for an illustration.

Since the blue points inside Smin cannot be avoided, we can assume w.l.o.g. that there are no blue points inside Smin. So we can restate the goal as finding the largest rectangle containing all red points and avoiding all blue points (respectively, blue rectangles).

We now give some definitions.

Definition 5.1.1. For any quadrant Q and integer k, the k-th level staircase of Q, denoted STk(Q), is the boundary of the subset of points in R² that dominate at most k points in Q.

Note that ST0(Q) = BQ for any quadrant Q. Recall from Chapter 4 that the staircase

problem, i.e., finding the maximum separating rectangle given Smin,Smax, and the staircases

BQ, can be solved in O(m) time.

5.2 Blue rectangles version

Figure 5.2. Largest rectangle S∗ containing all red points and avoiding all blue rectangles is shown.

Given a set of n red points R and a set of m blue axis-aligned rectangles B, the goal is to find the largest rectangle that contains all red points and avoids all blue rectangles, called the maximum separating rectangle of R and B. An example is shown in Figure 5.2. To solve this problem, we reduce it to the staircase problem, by constructing a blue point set B′ as follows.

For the side region E, for each rectangle r that intersects E, consider a point pr on the left edge of r, and add it to B′. Note that a rectangle containing Smin intersects or contains r if and only if it contains pr. The other three side regions are treated in a similar fashion.

For the quadrant NE, for each rectangle r that is contained in NE, consider the corner

qr of r closest to Smin, and add it to B′. Note that a rectangle containing Smin intersects or contains r if and only if it contains qr. Similarly, we treat the other three quadrants.

Using the process above, we end up with a set of blue points B′ such that the maximum rectangle separating R and B′ is the same as the maximum separating rectangle of R and B. Therefore, we solve the problem of finding the maximum rectangle separating R and B′ as described in Chapter 4 in O(|B′| log |B′| + n) time. Since for each blue rectangle in B we have introduced exactly one blue point in B′, we have |B′| = m, which gives us the following result.

Theorem 5.2. The maximum area rectangle separating a set of n red points R and a set of m blue axis-aligned rectangles B can be computed in O(m log m + n) time.

An illustration of this process is shown in Figure 5.3.
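For concreteness, the point-construction step can be sketched as follows; the function name is hypothetical, rectangles are assumed to be given as (x1, y1, x2, y2) tuples with x1 < x2 and y1 < y2, and blue rectangles intersecting Smin are assumed to have been ruled out beforehand.

```python
def rectangles_to_points(smin, rects):
    """Replace each blue rectangle by one representative point, following the
    reduction above.  smin = (x1, y1, x2, y2) is the bounding box of the red
    points; rectangles intersecting Smin are assumed to be absent."""
    x1, y1, x2, y2 = smin
    pts = []
    for rx1, ry1, rx2, ry2 in rects:
        if ry1 < y2 and ry2 > y1:              # y-ranges overlap: r lies in side region E or W
            x = rx1 if rx1 >= x2 else rx2      # edge of r facing Smin
            pts.append((x, max(ry1, y1)))
        elif rx1 < x2 and rx2 > x1:            # x-ranges overlap: r lies in side region N or S
            y = ry1 if ry1 >= y2 else ry2
            pts.append((max(rx1, x1), y))
        else:                                  # r lies inside a quadrant: corner closest to Smin
            pts.append((rx1 if rx1 >= x2 else rx2, ry1 if ry1 >= y2 else ry2))
    return pts
```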

5.3 Outliers version

Given a set of n red points R, a set of m blue points B, and an integer k, the goal is to find the largest rectangle containing all red points that contains at most k blue points. We call this rectangle the maximum rectangle separating R and B with k outliers (see Figure

5.4).

Our approach is as follows.

In each region Q outside Smin, we “guess” a maximum number of outliers kQ, such that kE + kNE + kN + kNW + kW + kSW + kS + kSE = k. Once 7 of these integers are fixed, the 8th is computed by subtracting the first 7 from k. Thus, we only need to consider all tuples (kE, kNE, kN, kNW, kW, kSW, kS) of non-negative integers such that kE + kNE + kN + kNW + kW + kSW + kS ≤ k. There are O(k⁷) such tuples.
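A brute-force enumeration of these tuples can be sketched as follows; the function name and the dictionary representation of a tuple are assumptions made for illustration.

```python
from itertools import product

def outlier_tuples(k):
    """Enumerate the ways to distribute at most k outliers over the eight regions
    outside Smin: seven counts are chosen freely with total at most k, and the
    eighth (here the SE count) is the remainder.  O(k^7) tuples are produced."""
    regions = ("E", "NE", "N", "NW", "W", "SW", "S")
    for counts in product(range(k + 1), repeat=7):
        if sum(counts) <= k:
            yield dict(zip(regions, counts), SE=k - sum(counts))
```

Each yielded dictionary fixes one admissible tuple, and the rest of the algorithm described below is run once per tuple.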

Figure 5.3. For every rectangle r intersecting a side region, consider a blue point on the edge of r that is closest to Smin. For every rectangle r contained in a quadrant, consider a blue point as the corner of r that is closest to Smin. Denote the set of resulting points by B′ and solve the original problem on R and B′.

Figure 5.4. Given red and blue points, and an integer k, the goal is to find the largest rectangle that contains all red points and at most k “outlier” blue points.

Figure 5.5. The staircase ST3(NE) for an example where kNE = 3.

For each such tuple, we reduce the outlier problem to the staircase problem in Chapter 4 as follows. The goal is to find the largest rectangle containing at most kNE points from BNE, kE points from BE, and so on. We first construct the rectangle Smax. From each side region, e.g. E, we consider the (kE + 1)-th leftmost point in BE, say r. The right edge of Smax goes through r. We do the same for the other three side regions in order to determine Smax. Then, for each quadrant, e.g. NE, we compute the kNE-th level staircase STkNE(NE) (Figure 5.5). The other three staircases are computed similarly. We then solve the staircase problem on Smin, Smax, and the staircases STkQ(Q) for every quadrant Q. Finally, we report the largest rectangle obtained over all such tuples.

The running time of this approach is O(n + k⁷(C + S)), where C is the time needed to compute STkQ(Q), and S is the size of STkQ(Q). Now we show that C = O(m log m) and S = O(m).

Lemma 5.3.1. For any quadrant Q and integer t, STt(Q) can be computed in O(m log m) time. Moreover, we have |STt(Q)| = O(m).

Proof. First we describe how to compute STt(Q) with a simple sweep line algorithm. Sort and label the points in BQ by increasing x-coordinate, p1, . . . , pm. Sweep a vertical line l from x = −∞ to x = ∞. For a given position of l, let Q(l) = {p1, . . . , pi} be the set of points to the left of l. We maintain a balanced binary search tree T over Q(l), indexed by y-coordinate. The intersection of the t-th level staircase with l is a single point, namely the highest point on l dominating at most t points of Q(l). This is given by the (t + 1)-th smallest entry in T. Keep a pointer qt to this entry in T. As we move l from left to right, qt can only change when l crosses a point of BQ. When we reach the point p = pi+1, we insert it into T, and then we have three cases.

Case 1. y(p) > y(qt). In this case STt(Q) does not change height, so we do nothing else.

Figure 5.6. If there is no point in T above p and below qt, then p is added to STt(NE) and qt is set to p.

Case 2. y(p) ≤ y(qt) and there is no point q ∈ T such that y(p) ≤ y(q) ≤ y(qt) (Figure

5.6). In this case, p is added to STt(Q) and qt is set to p.

Case 3. There exists q ∈ T such that y(p) ≤ y(q) ≤ y(qt) (Figure 5.7). To break ties, we take the highest q with this property. In this case, the intersection r between the horizontal through q and the vertical through p is added to STt(Q) and qt is set to r.

Figure 5.7. If there is a point q ∈ T above p and below qt, then r is added to STt(NE) and qt is set to r.

Figure 5.8. The set P with STt(NE) = ST0(P).

Let P be the set of points in STt(Q) (Figure 5.8). It is not hard to argue that STt(Q) = ST0(P). Moreover, |P| ≤ m, as a point is added to P only when the sweep line crosses a blue point in Q. Finally, the running time bound follows since we sorted BQ and then performed O(m) balanced binary search tree operations. This completes the proof.
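A minimal sketch of this sweep is given below; for simplicity a plain sorted list maintained with bisect stands in for the balanced binary search tree, so the sketch takes O(m²) time instead of O(m log m), and the function name is hypothetical.

```python
import bisect

def kth_level_staircase(points, t):
    """Compute the point set P with ST_t = ST_0(P) for one quadrant (NE shown),
    following the sweep above.  `points` are the (x, y) blue points of the
    quadrant; a sorted list replaces the balanced BST of the proof."""
    pts = sorted(points)          # sweep the points in increasing x
    ys = []                       # y-coordinates of the points left of the sweep line
    P = []
    q_t = None                    # current staircase height: the (t+1)-th smallest y so far
    for x, y in pts:
        bisect.insort(ys, y)
        if len(ys) < t + 1:
            continue              # fewer than t+1 points swept: staircase not yet defined
        new_qt = ys[t]            # (t+1)-th smallest y after inserting the new point
        if q_t is None or new_qt < q_t:
            P.append((x, new_qt)) # cases 2 and 3: the staircase drops at this x
            q_t = new_qt
        # case 1: the new point lies above q_t and the staircase is unchanged
    return P
```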

We are now in a position to state the following result.

Theorem 5.3.2. The maximum area rectangle separating a set of n red points R and a set of m blue points B with k outliers can be computed in O(k⁷m log m + n) time.
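Putting the pieces together, a high-level driver for the outlier version could look as follows; build_smax, kth_level_staircase, and solve_staircase_problem are hypothetical helpers standing in for the steps described above (the last one being the O(m)-time staircase procedure of Chapter 4), and outlier_tuples is the enumeration sketched earlier.

```python
def max_separating_rectangle_with_outliers(smin, side_points, quadrant_points, k,
                                           build_smax, kth_level_staircase,
                                           solve_staircase_problem):
    """High-level driver sketch for the outlier version.  smin is the bounding
    box of the red points, side_points / quadrant_points hold the blue points
    split by region, and the three callables are hypothetical helpers:
    build_smax trims Smax using the (k_side + 1)-th closest blue point of each
    side region, kth_level_staircase computes ST_t(Q), and
    solve_staircase_problem is assumed to return a (area, rectangle) pair."""
    best = None
    for counts in outlier_tuples(k):                       # enumeration sketched earlier
        smax = build_smax(smin, side_points, counts)
        staircases = {q: kth_level_staircase(quadrant_points[q], counts[q])
                      for q in ("NE", "NW", "SW", "SE")}
        cand = solve_staircase_problem(smin, smax, staircases)
        if best is None or cand[0] > best[0]:
            best = cand
    return best
```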

5.4 Conclusion and future work

We considered extensions of the maximum separating rectangle problem in Chapter 4. We first considered the blue rectangles version, where we have blue axis-aligned rectangles instead of blue points, for which we presented an O(m log m + n) time algorithm. We also studied the outlier version, in which we are also given an integer k denoting the number of blue points (called outliers) allowed inside the optimal solution. For the outlier version, we presented an algorithm that runs in O(k⁷m log m + n) time.

We leave for future consideration finding efficient algorithms for a few other problems of computing the largest shape enclosing red points while avoiding blue obstacles, namely computing the (1) largest rectangle avoiding circles, (2) largest circle avoiding rectangles, and (3) largest circle avoiding circles. We also leave for future work improving the running time bounds for the outlier version (that is, reducing the k⁷ factor), as well as finding approximations for the aforementioned problems.

CHAPTER 6

CONCLUSIONS

We studied the geometric separability of bichromatic point sets, and we addressed three problems on this topic.

First, we addressed the dynamic minimum separating circle problem, for which we gave dynamic data structures to report the smallest separating circle when only blue points can be inserted or removed. The first data structure can be used for both insertions and deletions and has fast update time, O((m + n) log m). The second data structure can handle only insertions and supports fast queries, in O(log(mn)) time, but the update is slower, taking O(mn log(mn)) time. The third data structure supports only deletions, with O(log²(mn)) query time and O(mn log(mn)) update time. We also implemented our first data structure and tested it on a randomly generated dataset.

Second, we studied the problem of computing the maximum separating rectangle of red and blue point sets, and we gave algorithms for finding all optimal solutions in O(m² + n) time, as well as for finding one optimal solution in O(m log m + n) time. Further, we proved that, if the points are pre-sorted by one of the coordinates, one can find an optimal solution in O(m + n) time. We also proved a lower bound of Ω(m) on the number of optimal solutions, as well as a lower bound of Ω(m log m + n) on the running time to find one optimal solution, provided that the points are not sorted.

Finally, we considered extensions of the maximum separating rectangle problem. Specifically, we considered the blue rectangles version, where the blue points are replaced by blue rectangles, as well as the outlier version, where the target rectangle is allowed to contain a fixed number of blue points. We proved that the blue rectangles version can be solved in O(m log m + n) time and the outlier version can be solved in O(k⁷m log m + n) time.

For each of these problems, we also listed a few open problems and future directions. These include finding fast approximation algorithms, improving the existing time bounds, or studying extensions of the minimum separating circle problem, as well as other extensions of the maximum separating rectangle problem.

REFERENCES

M. Abellanas, F. Hurtado, C. Icking, R. Klein, E. Langetepe, L. Ma, and B. Palop. Smallest color-spanning objects. In Proc. Annual European Symposium on Algorithms, 2001: 278– 289.

P. Aggarwal and B. Aronov and V. Koltun, Efficient algorithms for bichromatic separability, ACM Transactions on Algorithms, 2006, issn: 1549-6325, pp. 209–227, http://doi.acm.org/10.1145/1150334.1150338, ACM, New York, NY, USA.

A. Aggarwal and S. Suri, Fast algorithms for computing the largest empty rectangle, Symposium on Computational Geometry, 1987: 278–290.

A. Aggarwal, M. Klawe, S. Moran, P. Shor and R. Wilber, Geometric Applications of a Matrix Search Algorithm, Algorithmica, vol. 2 (2), 1987: pp. 195–208.

H. Alt, D. Hsu, and J. Snoeyink: Computing the largest inscribed isothetic rectangle. CCCG 1995: 67–72

E. M. Arkin, F. Hurtado, J. S. B. Mitchell, C. Seara, and S. Skiena, Some Lower Bounds on Geometric Separability Problems, International Journal of Computational Geometry and Applications, vol. 16, pp. 1–26, 2006, http://dx.doi.org/10.1142/S0218195906001902.

B. Armaselu and O. Daescu, Dynamic Minimum Bichromatic Separating Circle, Conference on Combinatorial Optimization and Applications, 2015: 688–697, DOI: 10.1007/978-3-319-26626-8 50.

B. Armaselu and O. Daescu, Maximum Area Rectangle Separating Red and Blue Points, Canadian Conference on Computational Geometry, August 2016: 244–251.

B. Armaselu, O. Daescu, C. Fan, and B. Raichel, Largest Red Blue Rectangles Revisited, Fall Workshop on Computational Geometry, October 2016.

B. Armaselu and O. Daescu, Dynamic Minimum Bichromatic Separating Circle, Journal of Theoretical Computer Science, DOI: 10.1016/j.tcs.2016.11.036, Available online November 2016.

B. Armaselu and O. Daescu, Maximum Area Rectangle Separating Red and Blue Points, Journal of Computational Geometry (submitted manuscript), 2017.

B. Aronov, D. Garijo, Y. Núñez, D. Rappaport, C. Seara, and J. Urrutia, Measuring the error of linear separators on linearly inseparable data, Discrete Applied Mathematics, vol. 160(10–11), 2012, pp. 1441–1452.

B. Aronov, P. Bose, E. D. Demaine, J. Gudmundsson, J. Iacono, S. Langerman, and M. H. M. Smid, Data Structures for Halfplane Proximity Queries and Incremental Voronoi Diagrams, Latin American Theoretical Informatics Symposium, 2006: 80–92.

J. Augustine, B. Putnam, and S. Roy, Largest empty circle centered on a query line, Journal of Discrete Algorithms, vol. 8(2), 2010, pp. 143–153.

Bandyapadhyay S., Banik A. (2017) Polynomial Time Algorithms for Bichromatic Problems. In: Gaur D., Narayanaswamy N. (eds) Algorithms and Discrete Applied Mathematics. CALDAM 2017. Lecture Notes in Computer Science, vol 10156.

J. Barbay, T. M. Chan, G. Navarro, and P. Pérez-Lantero, Maximum-weight planar boxes in O(n²) time (and better), Information Processing Letters, vol. 114(8), 2014, pp. 437–445.

S. Bereg, O. Daescu, M. Zivanik, and T. Rozario, Smallest Maximum-Weight Circle for Weighted Points in the Plane, International Conference on Computational Science and Its Applications, 2015: 244–253.

M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf, Computational Geometry (2nd revised edition), Springer-Verlag, ISBN 3-540-65620-0, Chapter 3: Polygon Triangulation: pp. 45–61.

S. Bespamyatnikh: Dynamic Algorithms for Approximate Neighbor Searching. CCCG 1996: 252-257.

S. Bitner, Y. K. Cheung, and O. Daescu, Minimum separating circle for bichromatic points in the plane, International Symposium on Voronoi Diagrams in Science and Engineering, 2010, 50–55.

J. Boissonnat, J. Czyzowicz, O. Devillers, J. Urrutia, M. Yvinec, and F. Preparata, Computing Largest Circles Separating Two Sets of Segments, International Journal of Computational Geometry and Applications, isbn: 0-89871-349-8, pp. 273–281, 1995, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA.

J. Boissonnat, J. Czyzowicz, O. Devillers, and M. Yvinec, Circular separability of polygons, 6th annual ACM-SIAM Symposium on Discrete Algorithms, vol. 10, 41–53, 2000.

P. Bose, S. Langerman, and S. Roy. Smallest enclosing circle centered on a query line segment. In Proc. Canadian Conference on Computational Geometry, 2008.

A. Bowyer: Computing Dirichlet Tessellations. The Computer Journal 24(2): pp. 162–166 (1981).

T. M. Chan, Low-Dimensional Linear Programming with Violations, SIAM Journal of Computing, vol. 34, pp. 879–893, 2005, issn: 0097-5397, http://dx.doi.org/10.1137/S0097539703439404, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA.

T. M. Chan, Optimal Output-Sensitive Convex Hull Algorithms in Two and Three Dimensions, Journal of Discrete and Computational Geometry, vol. 16(4), pp. 361–368, 1996.

J. Chaudhuri, S. C. Nandy and S. Das, Largest empty rectangle among a point set, Journal of Algorithms, 1(46): pp. 54–78, 2003.

B. Chazelle, R.L. Drysdale III, D.T. Lee, Computing the largest empty rectangle, SIAM Journal of Computing, 15: pp. 300–315, 1986.

B. Chazelle, Triangulating a simple polygon in linear time, Journal of Discrete and Computational Geometry, 6(3): pp. 485–524, 1991.

B. Chazelle, An optimal convex hull algorithm in any fixed dimension, Journal of Discrete and Computational Geometry, vol. 10(1): pp. 377–409, 1993, doi:10.1007/BF02573985.

D.Z. Chen, O. Daescu, J. Hershberger, P.M. Kogge, N. Mi, and J. Snoeyink, Polygonal Path Simplification with angle constraints, International Journal of Computational Geometry and Applications, 32(3), pp. 173–187, 2005.

Y. K. Cheung and O. Daescu, Minimum separating circle for bichromatic points by linear programming, 20th Annual Fall Workshop on Computational Geometry, 2010.

Y. K. Cheung, O. Daescu, and M. Zivanic, Kinetic Red-Blue Minimum Separating Circle, 5th Annual International Conference on Combinatorial Optimization and Applications 2011, pp. 448–463.

A. Datta, G. D. S. Ramkumar, On some largest empty orthoconvex polygons in a point set, In Proc. Foundations of Software Technology and Theoretical Computer Science, 1990, 270–285

A. Datta and S. Soundaralakshmi, An efficient algorithm for computing the maximum empty rectangle in three dimensions, Journal of Information Sciences, 2000, vol. 128 (1-2), pp. 43–65.

O. Daescu and A. Kurdia, Polygonal Chain Simplification with Small Angle Constraints, Canadian Conference on Computational Geometry, 2008, 191–194.

S. Das, S. Roy, A. Karmakar, and S. Nandy. Constrained minimum enclosing circle with center on a query line segment. Computational Geometry: Theory and Applications, 42(6-7): pp. 632–638, 2009.

E. Demaine, J. Erickson, F. Hurtado, J. Iacono, S. Langerman, H. Meijer, M. Overmars, and S. Whitesides, Separating Point Sets in Polygonal Environments, International Journal of Computational Geometry and Applications, vol. 15, 2005, pp. 403-420.

D. Dobkin and D. Kirkpatrick, A Linear Algorithm for Determining the Separation of Convex Polyhedra, Journal of Algorithms, vol. 6, 1985, pp. 381-392.

A. Dumitrescu and M. Jiang, On the largest empty axis-parallel box amidst n points, Algorithmica, 2013, vol. 66(2), pp. 225–248.

A. Dumitrescu and M. Jiang (2012) Maximal Empty Boxes Amidst Random Points. In: Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques: 529–540. Lecture Notes in Computer Science, vol 7408. Springer, Berlin, Heidelberg.

A. Dumitrescu and M. Jiang, On the Number of Maximum Empty Boxes Amidst n Points. Symposium on Computational Geometry, 2016, 36:1–36:13.

H. Edelsbrunner, (1987), “13.6 Power Diagrams”, Algorithms in Combinatorial Geometry, EATCS Monographs on Theoretical Computer Science, vol. 10, Springer-Verlag, pp. 327- 328.

S. Fekete, On the complexity of min-link red-blue separation, Manuscript, 1992.

S. Fisk, Separating Point Sets by Circles, and the Recognition of Digital Disks, IEEE Transactions on Pattern Analysis and Machine Intelligence, July 1986, pp. 554–556.

S. Fortune. A sweepline algorithm for Voronoi diagrams. Proceedings of the second annual symposium on Computational geometry. Yorktown Heights, New York, United States, 313-322, 1986.

M. Gester, N. Hahnle, J. Schneider (2015) Largest Empty Square Queries in Rectilinear Polygons. In: Computational Science and Its Applications – ICCSA 2015: 267–282. Lecture Notes in Computer Science, vol 9155. Springer, Cham.

H. ElGindy, H. Everett, and G.T. Toussaint, Slicing an ear using prune-and-search, Pattern Recognition Letters, 1993, vol. 14 (9), pp. 719–722, doi:10.1016/0167-8655(93)90141-y.

A. Goel, P. Indyk, and K. R. Varadarajan: Reductions among high dimensional proximity problems. In Proc. Symposium On Discrete Algorithms 2001: 769–778.

R. L. Graham, An Efficient Algorithm for Determining the Convex Hull of a Finite Planar Set, Information Processing Letters, 1972, vol. 1 (4), pp. 132–133.

L. Guibas and J. Stolfi, Primitives for the Manipulation of General Subdivisions and the Computation of Voronoi Diagrams, ACM Transactions on Graphics, Vol. 4(2), April 1985, pp. 74–123.

G. Gutierrez and J.R. Parama, Finding the largest empty rectangle containing only a query point in large multidimensional databases, In Proc. International Conference on Scientific and Statistical Database Management, 2012: 316–333.

S. Har-Peled and S. Mazumdar, Fast algorithms for computing the smallest k-enclosing circle, Algorithmica, vol. 41(3): pp. 147–157, 2005.

W.L. Hsu, D.T. Lee and A. Namaad, On the maximum empty rectangle problem, Discrete Applied Mathematics, 1984, vol. 8, pp. 267–277.

D. Hoey, M. I. Shamos, Closest-point problems, Symposium on Foundations of Computer Science, 1975, 151–162.

S. J. Hong, F. P. Preparata, Convex Hulls of Finite Sets of Points in Two and Three Dimensions, Communications of ACM, 1977, vol. 20(2), pp. 87–93.

F. Hurtado, M. Noy, P. A. Ramos, and C. Seara, Separating objects in the plane by wedges and strips, Discrete Applied Mathematics, vol. 109, 2001, pp. 109–138.

F. Hurtado, M. Mora, P. A. Ramos, and C. Seara, Separability by two lines and by nearly straight polygonal chains, Discrete Applied Mathematics, 2004, vol. 144, pp. 110–122, http://dx.doi.org/10.1016/j.dam.2003.11.014.

F. Hurtado, C. Seara, and S. Sethia, Red-Blue Separability Problems in 3D, International Journal of Computational Geometry and Applications, vol. 15, 2005, pp. 167–192, DOI:10.1142/S021819591360008X.

R. A. Jarvis, On the Identification of the Convex Hull of a Finite Set of Points in the Plane, Information Processing Letters, 1973, vol. 2(1), pp. 18–21.

H. Kaplan and M. Sharir, Finding the Maximal Empty Disk Containing a Query Point, Symposium on Computational Geometry, 2012, pp. 287–292.

H. Kaplan and M. Sharir, Finding the Maximal Empty Rectangle Containing a Query Point, CoRR abs/1106.3628 (2011).

H. Kaplan, S. Mozes, Y. Nussbaum and M. Sharir, Submatrix maximum queries in Monge matrices and partial Monge matrices, and their applications, ACM Transactions on Algorithms, 2017, vol. 13(2), doi: 10.1145/3039873.

S. Khuller, Y. Matias, A Simple Randomized Sieve Algorithm for the Closest-Pair Problem, Journal on Information and Computation, 1995, vol. 118(1), pp. 34–37.

M. Klawe, Superlinear bounds for matrix searching problems, Journal of Algorithms vol. 13(1): pp. 55–78 (1992).

M. Klawe and D.J. Kleitman, An almost linear time algorithm for generalized matrix searching, SIAM Journal of Discrete Math, 1990, vol. 3, pp. 81–97.

S. Kosaraju, J. O’Rourke and N. Megiddo, Computing circular separability, Journal of Discrete and Computational Geometry, 1986, vol. 1, pp. 105–113.

S. P. Lloyd, (1982), “Least squares quantization in PCM”, IEEE Transactions on Information Theory, vol. 28 (2): pp. 129–137.

J. Matousek. On enclosing k points by a circle. Information Processing Letters, vol. 53(4): pp. 217–221, 1995.

M. McKenna, J. O’Rourke and S. Suri, Finding the largest rectangle in an orthogonal polygon, in: Proc. 23rd Allerton Conference on Communication, Control and Computing, 1985, 486–495.

P. McMullen, The maximum numbers of faces of a convex polytope, Mathematika, vol. 17 (1970), pp. 179–184.

N. Megiddo, Linear-Time Algorithms for Linear Programming in R³ and Related Problems, SIAM Journal of Computing, SIAM, vol. 12, 1983, pp. 759–776, http://link.aip.org/link/?SMJ/12/759/1, doi: 10.1137/0212052.

N. Megiddo: Linear Programming in Linear Time When the Dimension Is Fixed. Journal of ACM vol. 31(1): pp. 114–127 (1984).

N. Megiddo, On the complexity of Polyhedral Separability, Journal of Discrete and Computational Geometry, 1988, vol. 3, pp. 325–337.

A. Mukhopadhyay and S.V. Rao, Computing a Largest Empty Arbitrary Oriented Rectangle. Theory and Implementation, International Journal of Computational Geometry and Applications, 2003, vol. 13 (3), pp. 257–271.

S.C. Nandy, B. B. Bhattacharya and Sibabrata Ray, Efficient algorithms for Identifying All Maximal Isothetic Empty Rectangles in VLSI Layout Design, Conference on Foundations of Software Technology and Theoretical Computer Science, 1990, pp. 255–269.

S.C. Nandy, A. Sinha and B. B. Bhattacharya, Location of the Largest Empty Rectangle among Arbitrary Obstacles, Conference on Foundations of Software Technology and Theoretical Computer Science, 1994, pp. 159–170.

S.C. Nandy and B.B. Bhattacharya, Maximal empty cuboids among points and blocks, Journal of Computers and Mathematics with Applications, 1998 vol. 36 (3), pp. 11–20.

S.C. Nandy, B.B. Bhattacharya, and K. Mukhopadhyay, Recognition of largest empty orthoconvex polygon in a point set, Information Processing Letters vol. 110(17): pp. 746–752, 2008.

F.P. Preparata and M.I. Shamos, “Computational Geometry”, Springer-Verlag, 1985.

M. I. Shamos, Problems in Computational Geometry, unpublished manuscript, 1975.

M. I. Shamos, Computational Geometry, PhD Dissertation, Yale University, 1978.

S. Skyum, A simple algorithm for computing the smallest enclosing circle, Information Processing Letters vol. 37 (1991), pp. 121–125.

G. Toussaint, Computing largest empty circles with location constraints, International Journal of Parallel Programming, vol. 12(5): pp. 347–358, 1983.

D. F. Watson, (1981). “Computing the n-dimensional Delaunay tessellation with ap- plication to Voronoi polytopes”. The Computer Journal vol. 24 (2): pp. 167–172. doi:10.1093/comjnl/24.2.167.

BIOGRAPHICAL SKETCH

Bogdan Armaselu graduated with a Bachelor's degree in Computer Science from Politehnica University of Bucharest in July 2012. In August 2012, he joined the PhD program in Computer Science at The University of Texas at Dallas. After completing his coursework in the Intelligent Systems track, he defended his master's thesis on Fair Partitioning of Convex Polygons in Fall 2014 and graduated with a master's degree in Computer Science in May 2015. Since 2015, Armaselu has been doing research in Computational Geometry, where he studied the geometric separability of bichromatic point sets, as well as in Medical Imaging and Combinatorial Optimization. In 2015, he received a CPRIT award to develop image processing algorithms for whole-slide image-based bone tumor identification and classification, in order to aid pathologists in identifying osteosarcoma, a form of bone cancer.

CURRICULUM VITAE

Bogdan Armaselu

Contact Information:
Department of Computer Science
The University of Texas at Dallas
800 W. Campbell Rd.
Richardson, TX 75080-3021, U.S.A.
Email: [email protected]

Educational History:
B.S., Computer Science, Politehnica University of Bucharest, 2012
M.S., Computer Science, The University of Texas at Dallas, 2015

Fair Partitioning of Convex Polygons
Master's Thesis
Computer Science Department, The University of Texas at Dallas
Advisor: Dr. Ovidiu Daescu

Intelligent Tutoring Systems
Bachelor's Thesis
Department of Computer Science, Politehnica University of Bucharest, Romania
Advisor: Dr. Stefan Trausan

Publications:
1. B. Armaselu and O. Daescu. Maximum Area Rectangle Separating Red and Blue Points. Journal of Computational Geometry (submitted manuscript)
2. H.B. Arunachalam, R. Mishra, B. Armaselu, M. Martinez, O. Daescu, K. Cederberg, D. Rakheja, M. Ni'suilleabhain, A. Sengupta, and P. Leavey. Computer Aided Image Segmentation and Classification for Viable and Non-Viable Tumor Identification in Osteosarcoma. PSB 22: 195-206 (2017).
3. B. Armaselu and O. Daescu. Dynamic Minimum Bichromatic Separating Circle. Theoretical Computer Science, In Press, Available online 30 November 2016.
4. B. Armaselu, O. Daescu, C. Fan, and B. Raichel. Largest Red Blue Separating Rectangles Revisited. In FWCG 2016.
5. B. Armaselu and O. Daescu. Maximum Area Rectangle Separating Red and Blue Points. CCCG 2016: 244-251.
6. B. Armaselu, H.B. Arunachalam, O. Daescu, J.P. Bach, K. Cederberg, S. Glick, D. Rakheja, A. Sengupta, S. Skapek, and P. Leavey. Large Scale SVS Images Stitching for Osteosarcoma Identification. In BIOCOMP 2016.
7. B. Armaselu and O. Daescu. Dynamic Minimum Bichromatic Separating Circle. COCOA 2015: 688-697, DOI: 10.1007/978-3-319-26626-8 50.
8. B. Armaselu, H.B. Arunachalam, O. Daescu, J.P. Bach, K. Cederberg, D. Rakheja, A. Sengupta, S. Skapek, and P. Leavey. Whole slide images stitching for osteosarcoma detection. ICCABS 2015: 1-5.
9. B. Armaselu and O. Daescu. Algorithms for fair partitioning of convex polygons. Theoretical Computer Science 607: 351-362 (2015).

Employment History:
Software Engineer Intern, Spectral MD, May 2017 - Present
Research Assistant, The University of Texas at Dallas, May 2015 – May 2017
Teaching Assistant, The University of Texas at Dallas, August 2012 – May 2015
Research Intern - Modeling and Simulation, Institute for Research in Informatics, Romania, June - August 2011

Professional Memberships:
Institute of Electrical and Electronics Engineers (IEEE), 2012–present
Association for Computing Machinery (ACM), 2013–present