From Practice to Theory, Separation Yields Approximation Schemes for Clustering and Network Design
Total Page:16
File Type:pdf, Size:1020Kb
UNIVERSITE PARIS.DIDEROT (Paris 7) SORBONNE PARIS CITE Ecole´ doctorale 386: Sciences mathematiques´ de Paris Centre THÈSE PRÉPARÉE AU LABORATOIRE D'INFORMATIQUE DE L'ÉCOLE NORMALE SUPÉRIEURE THESE` DE DOCTORAT Mention Informatique present´ ee´ et soutenue publiquement le 15 septembre 2016 en vue de l’obtention du grade de Docteur en sciences par VINCENT COHEN-ADDAD From Practice to Theory Approximation schemes for clustering and network design These` dirigee´ par Mme Claire MATHIEU Membres du jury: M. Philippe JACQUET (Directeur de recherche, Nokia Bell Labs, France) examinateur M. Zhentao LI (Maˆıtre de conference,´ ENS, France) examinateur Mme. Claire MATHIEU (Directrice de recherche, CNRS, ENS, France) directrice de these` M. Yuval RABANI (Professor, The Hebrew University of Jerusalem, Israel)¨ rapporteur M. Andras´ SEBO˝ (Directeur de recherche, CNRS, G-SCOP, France) rapporteur M. Ola SVENSSON (Assistant professor, EPFL, Suisse) examinateur M. Stephan´ THOMASSE´ (Professeur, ENS Lyon, France) examinateur To my Mother, my Father, Grand-Ma and Papi – my sources of inspiration. Contents Acknowledgments1 General Introduction3 Publications of the Author 15 I Separation and Clustering 17 1 Introduction to Clustering 19 2 Local Search for Clustering: General Instances 33 3 Local Search for Clustering: Well-Separated Instances 39 3.1 Properties of Well-Separated Instances.................... 40 3.1.1 r-Division in Graph with Small Separators............. 40 3.1.2 r-Division in Euclidean Space.................... 42 3.1.3 Contraction of Voronoi cells and Properties of the r-Divisions... 43 3.1.4 Tightness of the Analysis....................... 44 3.2 Isolation.................................... 44 3.3 A Structure Theorem............................. 46 3.4 Clustering in Graphs with Small Separators................. 50 3.4.1 Local Search for Facility Location.................. 50 3.4.2 Local Search for the k-Clustering Problem............. 53 3.5 Clustering in the Euclidean Setting...................... 60 4 Beyond Separators: Local Search for Clustering in Stable Instances 63 4.1 α-Perturbation-Resilient Instances...................... 64 4.2 Distribution Stability............................. 65 vi Contents 4.3 Euclidean Distribution Stability....................... 73 4.4 Spectral Separability............................. 75 5 Beyond Separators: Clustering in Data Streams 83 5.1 The Diameter Problem............................ 84 5.2 The k-Center Problem............................ 87 5.3 Lower bounds................................. 90 6 Implementations and Experiments 95 6.1 The Questions................................. 95 6.2 Experimental Results............................. 97 6.2.1 Real Data............................... 98 6.2.2 Data generated from a mixture of k Gaussians............ 100 6.2.3 Experimental Analysis of Local Search on Mixtures of Gaussians. 102 6.3 Interpretation................................. 105 II Separation and Network Design 111 7 Introduction to Network Design 113 8 Local Search for Random Traveling Salesman and Steiner Tree 123 9 Computing More Structured Separators 129 9.1 A New Framework.............................. 132 9.1.1 A Separator Theorem......................... 133 9.1.2 Planarization Theorem........................ 141 9.1.3 Branchwidth reduction........................ 145 9.2 Algorithmic Implications........................... 148 Concluding Remarks and Open Questions 153 A Notations and Definitions 161 A.1 Problem Definitions.............................. 161 A.1.1 Clustering Problems......................... 162 A.1.2 Network Design Problems...................... 162 A.2 Separators................................... 163 A.2.1 The Separation Property in Graphs.................. 164 A.2.2 The Separation Property in Euclidean Space............. 165 A.3 Graph Theory................................. 167 A.3.1 Graph Classes............................. 168 A.4 Computational Complexity.......................... 173 A.5 Hardness Result for an FPTAS for the Planar k-Clustering Problem..... 174 Acknowledgments Several people have made the last three years an intense, rewarding and exciting experience. I am extremely grateful to my advisors Claire Mathieu and Zhentao Li. Claire and Zhentao have been – and continue to be – very inspiring for my future academic life and research. Thank you Claire and Zhentao for your patience, your support, giving me your insight into theoretical computer science. Thank you Claire for having given me the opportunity to do my Ph.D under your guidance and so many opportunities to learn and to discover research during those three years. I thank Andras´ Sebo˝ and Yuval Rabani for having accepted to review this manuscript and to be on my thesis jury. Thank you very much for the interest you had in my work, your time and patience. I am also extremely grateful to Philippe Jacquet, Ola Svensson and Stephan´ Thomasse´ for having accepted to be on my thesis jury; I am very honored. It was a great pleasure to spend three years in the Talgo group at ENS. I was very happy to work with and learn from Eric´ Colin de Verdiere.` Thank you Eric´ for sharing your knowledge on embedded graphs, and providing me with your advises for my future academic life. I thank all the people in the Talgo group for having made every day spent at ENS very exciting. Thank you Frederik, Hang, and Victor! I thank Amos Fiat for having introduced me to algorithmic game theory, to the ways of sailing, and with the excellent restaurants in Tel-Aviv. I will keep a great memory of my two visits at Tel-Aviv and hope to make many more. Those visits would not have been the same without Alon Eden and Orit Raz – Thanks to both of you for sharing great ideas, talking about our experiences in research and showing me the bars of Tel-Aviv, I hope we can meet again soon! Many thanks to Michal Feldman for her patience and collaboration. I also had the great chance to benefit from the wisdom and advises of Philip N. Klein. Thank you Phil for your patience and sharing your knowledge on algorithms and planar graphs. I hope I’ll have the chance to learn from you again in the future. I thank Christian Sohler for hosting me in TU Dortmund and guiding me through clus- tering problems in dynamic data streams. I had a very rewarding experience in the group. I was also very lucky to work with Chris Schwiegelshohn in Dortmund. Chris is a great team mate; thanks Chris for sharing your view and your expertise! 2 Acknowledgments I thank Evripidis Bampis, Christoph Durr,¨ and Ioannis Millis for collaborating with me at the beginning of those three years and sharing their knowledge on scheduling problems. I thank Nabil Mustafa for sharing his knowledge on computational geometry and his advices. I thank Daniel Kral´ for his fruitful collaboration. My academic life would not have been the same without the support and trust of all my professors at ENS Lyon. Thank you Eric´ Thierry and Daniel Hirschkoff for your training and for having given me the opportunity to join ENS Lyon. I thank Michel Habib and Fabien de Montgolfier for advising me during my master thesis. I also thank Arnaud de Mesmay for being my big brother in research. Thanks Arnaud for giving me advises, sharing your experience and knowledge, and cheering me up. I thank Varun Kanade for sharing his views on research and theoretical computer sci- ence, for introducing me to learning theory and machine learning. Thank you Varun for your great support. This thesis has also been heavily supported by several (more or less extra-academic) friends and family members. Thank you Lucca and Marie-Charlotte for all this time spent together! Thank you Simon for all the evenings we spent talking about life, research, and art. I also thank Bruno, Thomas, and Mathieu for all the discussions about research and life we had during those three years. Thank you Jean-Florent for sharing all your journeys (in research but not only). I also thank my friends of always, Antoine, Basile, Manu, Nicolas, and Sebastien.´ Thank you for everything we lived together during the last 10 years (actually 23 for some of you). Thank you very much Sophie for your strong support, cheering me up and recalling me that research is a strict subset of life. Finally, thank you to all my family for taking care of me for so long and making me reach that point. I thank very much my father and my mother, Jacques, Jade, Sylvie, Reinhard, Charlotte, and Elisabeth for having closely supported me and having listened to me during those three years. Thank you maman, papa for having inspired me and made me as I am. General Introduction How to solve this problem? How fast can it be solved? Answering those questions rigor- ously for a computational problem often entails the design and analysis of an algorithm – a step-by-step method that, applied to an instance of the problem, produces a solution to that instance. In 2013, Frey and Osborne [87] estimated that about 47 percent of the total U.S. employment could be concerned by job automation in roughly two decades, showing that algorithms are taking a preponderant role in our lives. Thus, proceeding na¨ıvely by, for example, designing algorithms that achieve good enough performances on a benchmark or in a very specific context is often unsatisfactory when not risky. How can one rely on an algorithm whose efficiency and correctness are not established for solving critical tasks? 0 (km) 250 51° 30' 41° Figure 1: A (traveling salesman) tour through several french cities using the highway net- work. 4 General Introduction A systematic approach consists in proceeding at a more abstract level by: (1) distilling a problem faced in practice by proposing a model (2) designing and analyzing an algorithm for the problem in that model and (3) applying this algorithm to practical instances. However, the distilling step sometimes introduces a gap between theory and practice. If the model is too general it induces a two-fold gap: on the one hand no guarantee on the algorithms that are efficient in practice can be proven, on the other hand the best theoretical algorithms turn out to be noncompetitive in practice.