Distributed Kernelization for Independent Sets Tom George

Bachelor thesis Distributed Kernelization for Independent Sets Tom George Date: 29.11.2018 Supervisors: Prof. Dr. Peter Sanders M.Sc. Demian Hespe M.Sc. Sebastian Lamm Institute of Theoretical Informatics, Algorithmics Department of Informatics Karlsruhe Institute of Technology Abstract Graphs are mathematical structures used to model networks. Entities in the network are associated with vertices in a graph. Relationships between entities are described by edges connecting vertices. A lot of different problems from various domains can be transformed to problems on graphs. They can then be solved using graph theory. In this thesis we tackle the NP-hard maximum independent set problem. An independent set is a set of vertices such that pairwise vertices in it are not connected by an edge. The maximum independent set for a graph looks for the largest cardinality independent set. It is used to solve problems from various domains, like biology or computer vision. Kernelization is a technique to reduce the complexity of a problem by computing a kernel on it. A kernel is exact if solving the problem on the original instance yields the same result as solving it on the kernel. Exact algorithms for the maximum independent set problem use the branch-and-reduce paradigm. They shrink the problem instance by computing its kernel and branch on vertices, if the kernel is minimal. Successive branching and reducing eventually leads to a base case and the independent set size can be determined. Branching- and-reducing an instance can be represented as a rooted tree, where a leaf is the base case. There exists one path from leaf to root, that results in a independent set with the maximum cardinality. Related is the maximal independent set problem. A maximal independent set for a graph is an independent set, whose cardinality can not be increased by adding further vertices to it. Due to the complexity of the maximum independent set problem it is intractable for most instances. The computation of a maximal independent set can be used as an approximation for the maximum independent set. Different approaches have been developed to improve that approximation. A technique called local search transforms a maximal independent set into a different maximal independent set with larger cardinality. Growing problem instances add another problem: They might be too big to fit onto one machine. Parallelism and distributed computing can be used to tackle this problem. In this thesis we propose a distributed algorithm using exact kernelization before computing a high quality maximal independent set. Our kernelization removes degree one and two vertices to compute a kernel. We prove the exactness of the kernelization and evaluate the impact on the quality of the maximal independent set. Our results indicate that it is a promising technique increasing the cardinality of the maximal independent sets up to 12 percent. We achieve relative speedup up to 30 for the kernelization and maximal independent set algorithm on 64 processor elements. Zusammenfassung Graphen sind mathematische Strukturen zur Modellierung von Netzwerken. Objekte in Netzwerken werden mit Knoten in einem Graph assoziiert. Beziehungen zwischen Objek- ten im Netzwerk werden durch Kanten zwischen den Knoten beschrieben. Viele Probleme aus diversen Domänen lassen sich auf Probleme auf Graphen transformieren. Graphenthe- orie kann dann benutzt werden, um diese zu lösen. In dieser Arbeit behandeln wird das NP-harte Problem der größten unabhängigen Menge. Eine unabhängige Menge ist eine Menge von Knoten, sodass paarweise Elemente daraus nicht mit einer Kante verbunden sind. Die größte unabhängige Menge für einen Graphen ist die unabhängige Menge mit der größten Mächtigkeit. Das Problem löst diverse Probleme unterschiedlicher Domänen, beispielsweise in der Biologie oder Computer Vision. Kernfindung ist eine Technik zur Komplexitätsreduktion einer Probleminstanz. Dazu wird der Kern der Probleminstanz berechnet. Ein Kern ist exakt, falls die Lösung des Problem auf Kern und ursprünglicher Probleminstanz dasselbe Resultat ergeben. Exakte Algorith- men zur Berechnung der größten unabhängigen Menge benutzen das branch-and-reduce Paradigma. Sie verkleinern die Probleminstanz durch Kernfindung und verzweigen an Knoten wenn der Kern minimal ist. Wiederholtes Verkleinern und Verzweigen führt zu einem Basisfall, der es ermöglicht die Mächtigkeit der unabhängigen Menge zu ermitteln. Das Verzweigen-und-Verkleinern der Probleminstanz kann als gewurzelter Baum repräsen- tiert werden. Ein Blatt repräsentiert den Basisfall. Es existiert ein Pfad von Blatt zu Wurzel, der in einer unabhängigen Menge mit größter Mächtigkeit resultiert. Verwandt ist das Problem der maximalen unabhängigen Menge. Die Mächtigkeit einer maximalen unabhängigen Menge kann nicht durch das Hinzufügen weiterer Knoten ver- größert werden. Aufgrund der Komplexität des größte unabhängige Menge Problem ist dieses unlösbar auf den meisten Instanzen. Die Berechnung einer maximalen unabhängi- gen Menge kann als Approximation für die größte unabhängige Menge verwendet werden. Verschiedene Ansätze verbessern diese Approximation. Eine Technik namens lokale Suche transformiert eine maximal unabhängige Menge in eine andere maximal unabhängige Menge mit größer Mächtigkeit. Wachsende Probleminstanzen erzeugen ein neues Problem: Sie passen eventuell nicht in den Speicher einer Maschine. Parallelismus und verteiltes Rechnen kann zur Lösung dieses Problems verwendet werden. In dieser Thesis stellen wir einen verteilten Algorithmus vor, der exakte Kernfindung benutzt um hochqualitative, maximale unabhängige Mengen zu berechnen. Unsere Kernfind- ung entfernt Knoten mit Grad eins und zwei, um einen Kern zu finden. Wir zeigen, dass unsere Kernfindung exakt ist und evaluieren ihren Einfluss auf die Qualität unserer maximalen unabhängigen Menge. Unsere Ergebnisse zeigen, dass Kernfindung eine vielversprechende Technik ist, welche die Größe der maximalen unabhängigen Menge um bis zu 10 % erhöht. Wir erreichen relativen Speedup von bis zu 30 für die Kernfindung und den Algorithmus zur Berechnung der maximalen unabhängigen Menge auf 64 Prozessorelementen. Acknowledgments I’d like to thank my supervisors Demian Hespe and Sebastian Lamm for their guidance during my thesis. Also I’d like to thank Prof. Sanders for the opportunity to work on this thesis. Last, I’d like to thank my friends and family for their love and support. Hiermit versichere ich, dass ich diese Arbeit selbständig verfasst und keine anderen, als die angegebenen Quellen und Hilfsmittel benutzt, die wörtlich oder inhaltlich übernommenen Stellen als solche kenntlich gemacht und die Satzung des Karlsruher Instituts für Technolo- gie zur Sicherung guter wissenschaftlicher Praxis in der jeweils gültigen Fassung beachtet habe. Ort, den Datum Contents Abstract iii Zusammenfassung v 1 Introduction 1 1.1 Motivation ................................... 1 1.2 Contribution .................................. 2 1.3 Structure of Thesis .............................. 2 2 Fundamentals and Related Work 3 2.1 Basic Definitions ............................... 3 2.1.1 Related Problems ........................... 4 2.2 Distributed Memory and Graphs ....................... 4 2.2.1 Degree two paths spreading multiple PEs .............. 5 2.2.2 Graph Partitioning .......................... 6 2.3 Related Work ................................. 6 2.3.1 Exact Algorithms ........................... 7 2.3.2 Approximation for Maximum Independent Set ........... 7 2.3.3 Parallel Algorithms for Maximal Independent Set .......... 8 3 Reduction Rules 9 4 The Algorithm 15 4.1 Computing the Kernel ............................. 15 4.1.1 Local Work .............................. 15 4.1.2 Communication Between PEs .................... 19 4.1.3 Putting it All Together ........................ 24 4.2 Computing the maximal independent set ................... 25 4.3 Implementation Details ............................ 27 5 Experimental Evaluation 29 5.1 Experimental setup .............................. 29 5.2 Running Time and Scalability ........................ 30 5.3 Impact of Partitioning ............................. 34 5.4 The Effect of Kernelization .......................... 35 ix 6 Discussion 39 6.1 Conclusion .................................. 39 6.2 Future Work .................................. 39 A Results of Experiments 41 Bibliography 43 1 Introduction 1.1 Motivation The maximum independent set problem is a well studied NP-hard problem [19]. For a given graph the maximum independent set problem looks for the largest cardinality subset of vertices such that elements are pairwise non-adjacent. A related problem is the maximal independent set (MIS) problem, which looks for an independent set whose cardinality can not be enhanced by adding further vertices to it. The maximal independent set problem can be solved in polynomial time. Its applications cover biology [10], coding theory [11], computer vision [14], route plan- ing [27] and analysis of social networks [31]. To give a concrete example we look at the protein docking problem. For two proteins it finds out whether they interact to form a sta- ble complex and if so how. Graph theory is used to solve this problem [18]. The proteins can be represented as a set of potential hydrogen bond donors and acceptors. A clique- detection (equivalent to the independent set problem) algorithm is used to find maximally complementary sets of donor/ acceptor pairs. Due to the complexity of the maximum independent

Load more