Constraint Clustering and Parity Games

Constrained Clustering Problems and Parity Games Clemens Rösner geboren in Ulm Dissertation zur Erlangung des Doktorgrades (Dr. rer. nat.) der Mathematisch-Naturwissenschaftlichen Fakultät der Rheinischen Friedrich-Wilhelms-Universität Bonn Bonn 2019 1. Gutachter: Prof. Dr. Heiko Röglin 2. Gutachterin: Prof. Dr. Anne Driemel Tag der mündlichen Prüfung: 05. September 2019 Erscheinungsjahr: 2019 Angefertigt mit Genehmigung der Mathematisch-Naturwissenschaftlichen Fakultät der Rheinischen Friedrich-Wilhelms-Universität Bonn Abstract Clustering is a fundamental tool in data mining. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. We study several clustering objectives. We begin with studying the Euclidean k-center problem. The k-center problem is a classical combinatorial optimization problem which asks to select k centers and assign each input point in a set P to one of the centers, such that the maximum distance of any input point to its assigned center is minimized. The Euclidean k-center problem assumes that the input set P is a subset of a Euclidean space and that each location in the Euclidean space can be chosen as a center. We focus on the special case with k = 1, the smallest enclosing ball problem: given a set of points in m-dimensional Euclidean space, find the smallest sphere enclosing all the points. We combine known results about convex optimization with structural properties of the smallest enclosing ball to create a new algorithm. We show that on instances with rational coefficients our new algorithm computes the exact center of the optimal solutions and has a worst-case run time that is polynomial in the size of the input. We use the new algorithm to show that we can solve the Euclidean k-center problem in polynomial time for constant k and dimension m. The general unconstrained clustering problems are mostly very well studied. The k-center problem for example allows for elegant 2-approximation algorithms [47, 53]. However, the situation becomes significantly more difficult when constraints are added to the problem. We first look at the fair clustering. The fairness constraint is motivated by the fact that the general process of computing a clustering may harm protected (minority) classes if the clustering algorithm does not adequately represent them in desirable clusters – especially if the data is already biased. At NIPS 2017, Chierichetti et al. [29] proposed a model for fair clustering requiring the representation in each cluster to (approximately) preserve the global fraction of each protected class. Restricting to two protected classes, they developed both a 4-approximation algorithm for the fair k-center problem and an O(t)-approximation algorithm for the fair k-median problem, where t is a parameter for the fairness model. For multiple protected classes, the best known result is a 14-approximation algorithm for fair k-center [90]. We extend and improve the known results. Firstly, we give a 5-approximation algorithm for the fair k-center problem with multiple protected classes. Secondly, we propose a relaxed fairness notion under which we can give bicriteria constant-factor approximation algorithms for the fair version of all of the classical clustering objectives (k-center, k- supplier, k-median, k-means and facility location). The latter approximation algorithms are achieved by a framework that takes an arbitrary existing unfair (integral) solution and a fair (fractional) LP solution and combines them into an essentially fair clustering with a weakly supervised rounding scheme. In this way, a fair clustering can be established belatedly, in a situation where for example the centers are already fixed. ii The second clustering constraint we study is privacy: Here, we are asked to only open a center when at least ` points will be assigned to it. We raise the question whether a general method can be derived to turn an approximation algorithm for a clustering problem with some constraints into an approximation algorithm that additionally respects privacy. We show how to combine privacy with several other constraints and obtain approximation algorithms for the k-center problem with several combinations of constraints. In this dissertation we also study parity games, a two player game played on a directed graph. We study the case in which one of the two players controls only a small number k of nodes and the other player controls the n − k other nodes of the game. Our main result is√ a fixed-parameter-tractable algorithm that solves bipartite√ parity games in time kO( k) · O(n3), and general parity games in time (p + k)O( k) · O(pnm), where p is the number of distinct priorities and m is the number of edges. For all games with k = o(n) this improves the previously fastest algorithm by Jurdziński, Paterson, and Zwick [64]. We also obtain novel kernelization results and an improved deterministic algorithm for parity games on graphs with small average node-degree. Zusammenfassung Clustering ist ein grundlegendes Werkzeug im Data Mining. Es unterteilt Punkte in Gruppen (Cluster) und kann verwendet werden, um Entscheidungen für jeden Punkt basierend auf seiner Gruppe zu treffen. Wir untersuchen verschiedene Clustering Modelle. Als erstes untersuchen wir das euklidische k-Center Problem. Das k-Center Problem ist ein klassisches kombinatorisches Optimierungsproblem, bei dem wir k Zentren auswählen und jeden Punkt aus der Eingabemenge P einem der Zentren zuweisen sollen, sodass die maximale Entfernung eines Punktes zu seinem zugewiesenen Zentrum minimiert wird. Das euklidische k-Center Problem geht davon aus, dass die Eingabemenge P eine Teilmenge eines euklidischen Raums ist und dass jeder mögliche Ort im euklidischen Raum als Zentrum gewählt werden darf. Wir konzentrieren uns auf den Sonderfall mit k = 1, das Smallest Enclosing Ball Problem: Gegeben sei eine Menge von Punkten im m-dimensionalen euklidischen Raum, finde die kleinste Kugel, die alle diese Punkte enthält. Wir kombinieren bekannte Ergebnisse der konvexen Optimierung mit strukturellen Eigenschaften des Smallest Enclosing Balls, um einen neuen Algorithmus zu entwerfen. Wir zeigen, dass unser neuer Algorithmus bei Instanzen mit rationalen Koeffizienten das exakte Zentrum der optimalen Lösungen berechnet und eine Worst-Case-Laufzeit aufweist, die polynomiell in der Eingabegröße ist. Wir verwenden den neuen Algorithmus, um zu zeigen, dass wir das euklidische k-Center Problem in Polynomialzeit für konstante k und Dimension m lösen können. Ohne Nebenbedingungen sind die allgemeinen Clustering-Probleme zumeist sehr gut untersucht. Für das k-Center Problem gibt es beispielsweise elegante 2-Approxima- tionsalgorithmen [47, 53]. Die Situation wird jedoch erheblich schwieriger, wenn wir fordern, dass zusätzliche Nebenbedingungen erfüllt werden. Als erstes betrachten wir faires Clustering. Die Fairness-Nebenbedingung wird durch die Tatsache motiviert, dass die allgemeine Berechnung guter Clusterings geschützte (Minderheits-) Klassen schädigen kann, wenn das Ergebnis sie nicht in den bevorzugten Clustern angemessen repräsentiert, insbesondere wenn die Daten bereits voreingenommen sind. Auf der NIPS 2017 haben Chierichetti et al. [29] ein Modell für faires Clustering vorgeschlagen, das einfordert, dass die Repräsentation jedes Clusters (ungefähr) den globalen Anteil jeder geschützten Klasse widerspiegelt. Sie beschränkten sich auf zwei geschützte Klassen und entwickelten sowohl einen 4-Approximationsalgorithmus für das faire k-Center Problem als auch einen O(t)-Approximationsalgorithmus für das faire k-Median Problem, wobei t ein Parameter des Fairness-Modells ist. Bei mehreren geschützten Klassen ist das beste bekannte Ergebnis ein 14-Approximationsalgorithmus für das faire k-Center Problem [90]. Wir erweitern und verbessern die bekannten Ergebnisse. Zum einen geben wir einen 5-Approximationsalgorithmus für das faire k-Center Problem mit mehreren geschützten Klassen an. Zum anderen schlagen wir eine abgeschwächte Fairness Bedingung vor, iv unter der wir für die faire Version von allen klassischen Clustering-Problemen (k- Center, k-Supplier, k-Median, k-Means und Facility Location) einen bikriteriellen Approximationsalgorithmus mit konstantem Approximationsfaktor angeben können. Die letzteren Approximationsalgorithmen werden durch ein Rahmenwerk erreicht, das eine beliebige gegebene unfaire (ganzzahlige) Lösung und eine faire (fraktionelle) LP- Lösung verwendet und diese mittels eines schwach überwachten Rundungsschema zu einem im Wesentlichen fairen Clustering kombiniert. Auf diese Weise kann Fairness nachträglich zu einem Clustering, bei dem zum Beispiel die Zentren bereits festgelegt sind, hinzugefügt werden. Die zweite untersuchte Nebenbedingung für Clustering-Probleme ist Privacy: Hier werden wir aufgefordert, ein Zentrum nur dann zu öffnen, wenn ihm mindestens ` Punkte zugewiesen werden. Wir stellen die Frage, ob es eine allgemeine Methode gibt, um einen Approximationsalgorithmus für ein Clustering Problem mit einigen Nebenbedingungen in einen Approximationsalgorithmus umzuwandeln, der zusätzlich Privacy erfüllt. Wir zeigen, wie man Privacy mit mehreren anderen Nebenbedingungen kombinieren kann und erhalten Approximationsalgorithmen für das k-Center Problem mit mehreren Kombinationen von Nebenbedingungen. In dieser Dissertation untersuchen wir auch Paritätsspiele, ein 2-Spieler-Spiel auf einem gerichteten Graph. Wir untersuchen die eingeschränkte Variante, bei der einer der beiden Spieler nur eine kleine Anzahl

Constraint Clustering and Parity Games

FOCS 2005 Program SUNDAY October 23, 2005

Distance-Sensitive Hashing∗

Arxiv:2102.08942V1 [Cs.DB]

Hierarchical Clustering with Global Objectives: Approximation Algorithms and Hardness Results

SIGMOD Flyer

Lower Bounds on Lattice Sieving and Information Set Decoding

Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality

Implementation of Locality Sensitive Hashing Techniques

Model Checking Large Design Spaces: Theory, Tools, and Experiments

Detecting Near-Duplicates for Web Crawling

Scalable Nearest Neighbor Search for Optimal Transport∗

SETH-Based Lower Bounds for Subset Sum and Bicriteria Path∗