Finding Optimal Tree Decompositions

MSc thesis Computer Science Finding Optimal Tree Decompositions Tuukka Korhonen June 4, 2020 Faculty of Science University of Helsinki Supervisor(s) Assoc. Prof. Matti Järvisalo,Dr. Jeremias Berg Examiner(s) Assoc. Prof. Matti Järvisalo,Dr. Jeremias Berg Contact information P. O. Box 68 (Pietari Kalmin katu 5) 00014 University of Helsinki,Finland Email address: [email protected].fi URL: http://www.cs.helsinki.fi/ HELSINGIN YLIOPISTO – HELSINGFORS UNIVERSITET – UNIVERSITY OF HELSINKI Tiedekunta — Fakultet — Faculty Koulutusohjelma — Utbildningsprogram — Study programme Faculty of Science Computer Science Tekijä— Författare— Author Tuukka Korhonen Työnnimi — Arbetets titel — Title Finding Optimal Tree Decompositions Ohjaajat — Handledare — Supervisors Assoc. Prof. Matti Järvisalo, Dr. Jeremias Berg Työnlaji — Arbetets art — Level Aika — Datum — Month and year Sivumäärä— Sidoantal — Number of pages MSc thesis June 4, 2020 94 pages, 1 appendix page Tiivistelmä— Referat — Abstract The task of organizing a given graph into a structure called a tree decomposition is relevant in multiple areas of computer science. In particular, many NP-hard problems can be solved in polynomial time if a suitable tree decomposition of a graph describing the problem instance is given as a part of the input. This motivates the task of finding as good tree decompositions as possible, or ideally, optimal tree decompositions. This thesis is about finding optimal tree decompositions of graphs with respect to several notions of optimality. Each of the considered notions measures the quality of a tree decomposition in the context of an application. In particular, we consider a total of seven problems that are formulated as finding optimal tree decompositions: treewidth, minimum fill-in, generalized and fractional hypertreewidth, total table size, phylogenetic character compatibility, and treelength. For each of these problems we consider the BT algorithm of Bouchittéand Todinca as the method of finding optimal tree decompositions. The BT algorithm is well-known on the theoretical side, but to our knowledge the first time it was implemented was only recently for the 2nd Parameterized Algorithms and Computational Experiments Challenge (PACE 2017). The author’s implementation of the BT algorithm took the second place in the minimum fill-in track of PACE 2017. In this thesis we review and extend the BT algorithm and our implementation. In particular, we improve the efficiency of the algorithm in terms of both theory and practice. We also implement the algorithm for each of the seven problems considered, introducing a novel adaptation of the algorithm for the maximum compatibility problem of phylogenetic characters. Our implementation outperforms alternative state-of-the-art approaches in terms of numbers of test instances solved on well-known benchmarks on minimum fill-in, generalized hypertreewidth, fractional hypertreewidth, total table size, and the maximum compatibility problem of phylogenetic characters. Furthermore, to our understanding the implementation is the first exact approach for the treelength problem. ACM Computing Classification System (CCS) Theory of computation → Design and analysis of algorithms → Graph algorithms analysis Avainsanat — Nyckelord — Keywords tree decompositions, triangulations, Bouchitté–Todinca algorithm, potential maximal cliques Säilytyspaikka — Förvaringsställe— Where deposited Helsinki University Library Muita tietoja — övrigauppgifter — Additional information Algorithms specialisation line Contents 1 Introduction1 1.1 Contributions.................................. 3 1.2 Organization of the Thesis........................... 4 2 Preliminaries5 2.1 Graph Theory and Notations ......................... 5 2.2 Tree Decompositions.............................. 5 2.3 Triangulations.................................. 7 2.4 From Tree Decompositions to Triangulations................. 9 2.5 Cost Functions on Minimal Triangulations.................. 11 3 Optimal Tree Decompositions 13 3.1 Treewidth.................................... 13 3.2 Minimum Fill-In ................................ 14 3.3 Generalized and Fractional Hypertreewidth.................. 14 3.3.1 Generalized Hypertreewidth...................... 15 3.3.2 Fractional Hypertreewidth....................... 16 3.3.3 On the Notion of Hypertreewidth................... 17 3.4 Total Table Size................................. 18 3.5 Phylogenetic Character Compatibility..................... 19 3.5.1 Phylogenetic Trees ........................... 20 3.5.2 Formulation by Triangulations..................... 21 3.6 Treelength.................................... 23 4 The Bouchitté–Todinca Algorithm 25 4.1 Combinatorial Objects............................. 25 4.1.1 Minimal Separators........................... 25 4.1.2 Potential Maximal Cliques....................... 27 4.2 Enumeration Phase............................... 28 4.2.1 Enumerating Minimal Separators................... 28 4.2.2 Enumerating Potential Maximal Cliques ............... 30 4.3 Dynamic Programming Phase......................... 35 4.3.1 Characterization of Minimal Triangulations ............. 35 4.3.2 Finding Optimal Minimal Triangulations............... 38 4.3.3 The BT-DP Algorithm......................... 39 4.4 Adaptation to Maximum Compatibility.................... 42 5 Implementation 44 5.1 Overview..................................... 44 5.2 Preprocessing.................................. 46 5.2.1 General Techniques........................... 46 5.2.2 Treewidth................................ 47 5.2.3 Minimum Fill-In ............................ 48 5.2.4 Phylogenetic Character Compatibility................. 49 5.2.5 Other Problems............................. 49 5.3 Optimizing PMC-ENUM............................ 49 5.4 Computing Edge Covers ............................ 51 5.5 Low-level Implementation Details....................... 52 6 Empirical Comparison 53 6.1 Empirical Setup................................. 53 6.2 Benchmarks................................... 53 6.3 Treewidth.................................... 54 6.4 Minimum Fill-In ................................ 56 6.5 Generalized and Fractional Hypertreewidth.................. 57 6.5.1 Generalized Hypertreewidth...................... 59 6.5.2 Fractional Hypertreewidth....................... 60 6.6 Total Table Size................................. 61 6.7 Phylogenetic Character Compatibility..................... 61 6.7.1 Perfect Phylogeny............................ 62 6.7.2 Binary Maximum Compatibility.................... 63 6.7.3 Multi-state Maximum Compatibility ................. 65 6.8 Treelength.................................... 67 7 Analysis of the Implementation 68 7.1 Treewidth.................................... 68 7.2 Minimum Fill-In ................................ 71 7.3 Generalized Hypertreewidth.......................... 73 7.4 Fractional Hypertreewidth........................... 76 7.5 Perfect Phylogeny................................ 78 7.6 Multi-State Maximum Compatibility ..................... 80 8 Conclusions 84 Bibliography 87 A Summary of Empirical Comparison 1 Introduction Graphs and trees are ubiquitous objects in computer science. Graphs are used to represent any data with pairwise relations, such as flights between cities, connections in social net- works, or dependencies between variables. On the other hand, trees are used to represent more structured data, such as elements in a web page or indexed databases. Graphs allow the representation of more general structures than trees. However, this comes with the cost that many algorithmic problems solved efficiently in trees are intractable in graphs in general. For example, the graph problems of maximum independent set and minimum dominating set are NP-hard in general but can be solved in linear time if the input graph is a tree [5]. The desire to attain the best of both worlds motivates the task of organizing arbitrary graphs into tree-like structures. A tree decomposition of a graph is a tree whose nodes correspond to bags that contain vertices of the graph and whose structure maintains certain connectivity properties of the graph [15]. Tree decompositions have been rediscovered multiple times [14, 57, 91] and are also known as join trees [8], clique trees [46], and junction trees [62] in different contexts. It is trivial to obtain a tree decomposition of a graph in the form of a single bag containing all vertices of the graph. However, such a tree decomposition does not give any insight on the structure of the graph, and is thus not useful. To quantify the usefulness of a tree decomposition, multiple different cost functions on the quality of tree decompositions have been introduced, each with the associated goal to find tree decompositions with as small cost as possible, or in other words, optimal tree decompositions. Perhaps the most well-known cost function associated with tree decompositions is the width of a tree decomposition, measuring the number of vertices in the largest bag of the decomposition [15]. The graph parameter of treewidth denotes the smallest width of a tree decomposition of a graph [15]. The graphs with small treewidth generalize trees, trees having treewidth one. Many classical NP-hard graph problems, such as maximum independent set, minimum dominating set, chromatic number, and Hamiltonicity can be solved with dynamic programming over a tree decomposition, with a running time exponential in the width of the decomposition but linear in the size

Load more