The Power of Algorithmic Approaches to the Graph Isomorphism Problem
Dissertation approved by the Faculty of Mathematics, Computer Science and Natural Sciences of RWTH Aachen University for the award of the academic degree of Doctor of Natural Sciences
submitted by
Daniel Neuen, Master of Science, from Dormagen
Reviewers: Prof. Dr. Martin Grohe, Prof. Dr. Pascal Schweitzer, Prof. Dr. László Babai
Date of the oral examination: 17 December 2019
This dissertation is available online on the website of the university library.
Abstract
The Graph Isomorphism Problem asks, given two input graphs, whether they are structurally the same, that is, whether the vertices of the first graph can be renamed so as to transform it into the second graph. By a recent breakthrough result of Babai (STOC 2016), this problem can be solved in quasipolynomial time. However, despite extensive research efforts, it remains one of only a few natural problems in NP that are neither known to be solvable in polynomial time nor known to be NP-complete. Over the past five decades, several powerful techniques tackling the Graph Isomorphism Problem have been investigated, uncovering various surprising links between different approaches. The situation has also led to a number of algorithms solving the isomorphism problem on restricted classes of input graphs. In this thesis, we continue the investigation of various standard approaches to the Graph Isomorphism Problem to further broaden our understanding of the power and limits of such approaches. In particular, this leads to several improved algorithms solving the isomorphism problem for important restricted classes of graphs. One of the most fundamental methods in the context of graph isomorphism testing is the Weisfeiler-Leman algorithm, which iteratively computes an isomorphism-invariant coloring of vertex-tuples. While the algorithm is unable to decide the isomorphism problem by itself, it is commonly used as a subroutine and, for various restricted graph classes, it already serves as a complete isomorphism test. In the latter direction, we prove for example that the Weisfeiler-Leman dimension of graph classes of bounded rank-width is bounded. It was already known that the isomorphism problem for graphs of rank-width at most k is polynomial-time solvable (Grohe and Schweitzer, FOCS 2015). However, the previous best algorithm is complicated, and the exponent of its running time depends non-elementarily on k.
In contrast, our analysis of the Weisfeiler-Leman algorithm yields a simple isomorphism test running in time $n^{O(k)}$. A framework that is closely related to the Weisfeiler-Leman algorithm and works particularly well in practice is the Individualization-Refinement paradigm. Extending our understanding of the limits of combinatorial approaches, we provide the first exponential lower bounds on the worst-case complexity of a large and natural class of algorithms within this framework. In particular, this class includes all practical state-of-the-art isomorphism tools, which answers an open question from Babai (STOC 2016) on the worst-case complexity of such solvers. A second crucial approach to the Graph Isomorphism Problem is based on group-theoretic techniques. In this direction, one of the algorithmic cornerstones is Luks’s polynomial time algorithm for testing isomorphism of bounded degree graphs (JCSS 1982). Adapting the novel group-theoretic methods developed by Babai for his quasipolynomial time isomorphism test (STOC 2016), we give an isomorphism test for graphs of maximum degree d running in time $n^{\mathrm{polylog}(d)}$. This significantly improves over the previous best isomorphism test for graphs of maximum degree d, which runs in time $n^{O(d/\log d)}$ (Babai, Kantor and Luks, FOCS 1983). Since Luks’s algorithm is used as a subroutine in a number of other algorithms, it is natural to ask for the consequences of this improvement. Besides simple applications regarding structures of small degree, we present an isomorphism test for graphs of tree-width k running in time $2^{k\,\mathrm{polylog}(k)}\mathrm{poly}(n)$, improving the fixed-parameter tractable algorithm of Lokshtanov et al. (FOCS 2014) running in time $2^{O(k^5\log k)}\mathrm{poly}(n)$.
Summary
The Graph Isomorphism Problem asks, given two graphs, whether they are structurally identical, i.e., whether the first graph can be transformed into the second by renaming its vertices. With the recent breakthrough by Babai (STOC 2016), this problem can be solved in quasipolynomial time. However, despite intensive research, it remains one of only a few natural problems in NP for which neither a polynomial-time algorithm nor NP-hardness is known. Over the past five decades, a wide range of algorithmic techniques for the Graph Isomorphism Problem has been investigated, revealing surprising connections between different approaches. This has also led to numerous algorithms that solve the isomorphism problem on restricted classes of input graphs. This dissertation aims to extend our understanding of the strengths and limits of several important approaches to the Graph Isomorphism Problem. In particular, this results in several improved algorithms that solve the isomorphism problem for important classes of graphs. One of the most fundamental methods for isomorphism testing is the Weisfeiler-Leman algorithm, which iteratively computes an isomorphism-invariant coloring of vertex tuples. Although this algorithm cannot solve the isomorphism problem on its own, it is regularly used as a subroutine and serves as a complete isomorphism test for several restricted classes of graphs. In this context, we show that the Weisfeiler-Leman dimension of graphs of bounded rank-width is bounded. While a polynomial-time algorithm for the isomorphism problem of graphs of rank-width at most k was already known (Grohe and Schweitzer, FOCS 2015), previous algorithms are complicated and the exponent of their running time depends non-elementarily on k.
In contrast, our analysis of the Weisfeiler-Leman algorithm yields a simple isomorphism test with running time $n^{O(k)}$. An approach that is closely related to the Weisfeiler-Leman algorithm and works particularly well in practice is the individualization-refinement paradigm. To extend our understanding of the limits of combinatorial approaches, we give the first exponential lower bounds on the running time of a large and natural class of algorithms within this paradigm. In particular, we obtain exponential lower bounds on the running time of all modern practical isomorphism solvers, thereby answering an open question of Babai (STOC 2016). A second fundamental approach to the Graph Isomorphism Problem is the use of group-theoretic methods. In this respect, Luks’s algorithm for testing isomorphism of graphs of bounded degree is one of the cornerstones of the algorithmic theory of the Graph Isomorphism Problem. By adapting the group-theoretic methods that Babai developed for his quasipolynomial-time algorithm, we obtain an isomorphism test for graphs of maximum degree at most d with running time $n^{\mathrm{polylog}(d)}$. This is a significant improvement over the previously fastest algorithm for this problem, which requires $n^{O(d/\log d)}$ steps (Babai, Kantor and Luks, FOCS 1983). Since Luks’s algorithm serves as a subroutine for a number of further algorithms, the question arises which consequences follow from this improvement. Besides simple applications to the isomorphism problem for relational structures of small degree, we show that isomorphism of graphs of tree-width at most k can be tested in time $2^{k\,\mathrm{polylog}(k)}\mathrm{poly}(n)$, thereby improving the FPT algorithm of Lokshtanov et al. (FOCS 2014) with running time $2^{O(k^5\log k)}\mathrm{poly}(n)$.
Acknowledgments
First and foremost, I am grateful to my supervisor Martin Grohe for his guidance and support during the time I spent in his group researching this thesis. I also want to thank Pascal Schweitzer for his continued support and for always having an open door while he was still at RWTH Aachen University. Moreover, I would like to thank my colleagues Sandra Kiefer and Daniel Wiebking for our joint work and for proof-reading parts of this thesis. I am particularly thankful to Sandra Kiefer for sharing an office all these years and making the time in our group more joyful. Finally, I am grateful to my family, and in particular my parents, for their endless support throughout my time as an undergraduate and graduate student.

Contents
1 Introduction
2 Isomorphism and Combinatorial Algorithms
2.1 Graphs and Isomorphism
2.1.1 Graphs and Notation
2.1.2 Tree Decompositions and Tree-Width
2.1.3 Isomorphisms
2.1.4 Isomorphism Invariance and Canonization
2.2 The Weisfeiler-Leman Algorithm
2.2.1 The Algorithm
2.2.2 Pebble Games
2.2.3 Connection to Logic
2.3 Individualization-Refinement
2.3.1 The Basic Paradigm
2.3.2 Pruning with Invariants
2.3.3 Pruning with Automorphisms
3 Upper Bounds on the WL Dimension
3.1 Tree-Width
3.2 Rank-Width
3.2.1 Definition and Properties
3.2.2 Split Pairs and Flip Functions
3.2.3 A Recursive Strategy for Spoiler
3.3 Further Results
4 Lower Bounds
4.1 Weisfeiler-Leman Algorithm
4.2 The I/R-Method in Theory
4.2.1 A Framework for a Lower Bound
4.2.2 The Multipede Construction
4.2.3 The Weisfeiler-Leman Refinement and Closure Operators
4.2.4 Meager Graphs
4.2.5 Lower Bounds for I/R-Algorithms
4.3 The I/R-Method in Practice
5 Group Theory
5.1 Permutation Groups
5.1.1 Basics
5.1.2 Algorithms for Permutation Groups
5.1.3 Groups with Restricted Composition Factors
5.2 String Isomorphism
5.2.1 Graphs of Bounded Color Class Size
5.2.2 String Isomorphism Problem
5.2.3 Recursion Mechanisms
5.3 Bounded Degree Graphs and Group Theory
5.4 Primitive Groups
5.4.1 The O’Nan Scott Theorem
5.4.2 Affine Groups
5.4.3 Non-Affine Groups
5.4.4 A Characterization Theorem
5.5 String Isomorphism in Quasipolynomial Time
6 Isomorphism for Bounded Degree Graphs
6.1 Structure Trees
6.1.1 Sequences of Partitions and Structure Trees
6.1.2 Structure Graphs and Tree Unfoldings
6.1.3 Normalizing the Action
6.2 Affected Orbits
6.3 Recursion
6.4 Local Certificates
6.4.1 The Algorithm
6.4.2 Comparing Local Certificates
6.4.3 Aggregating Local Certificates
6.5 String Isomorphism
6.6 Applications
6.6.1 Isomorphism for Structures of Bounded Degree
6.6.2 Coset-Labeled Hypergraphs
7 Isomorphism for Bounded Tree-Width Graphs
7.1 Isomorphism-Invariant Decompositions
7.1.1 Idea
7.1.2 Clique Separators
7.1.3 Decomposition of Basic Graphs
7.2 Isomorphism Testing using Dynamic Programming
8 Discussion

Chapter 1
Introduction
Two graphs G and H are called isomorphic if they are structurally the same, i.e., if there is a bijection ϕ: V(G) → V(H) which preserves the edge relation, meaning that vw ∈ E(G) if and only if ϕ(v)ϕ(w) ∈ E(H) for all vertices v, w ∈ V(G). The Graph Isomorphism Problem asks, given two input graphs G and H, to decide whether G is isomorphic to H. This problem is one of only a few natural problems which are contained in the complexity class NP and which are neither known to be contained in PTIME nor known to be NP-complete. Indeed, together with the Factorization Problem, the Graph Isomorphism Problem is one of the most prominent examples of such a computational problem. While the Factorization Problem is generally believed to be difficult to solve, with important applications in cryptography, the status of the Graph Isomorphism Problem is still wide open despite extensive efforts to solve the problem over the past four decades (see [4, 95] for surveys). In fact, already in 1977, the phenomenon of extensively researching the problem was referred to as the Graph Isomorphism Disease in a survey article of Read and Corneil [133], and interest in the problem has not declined in the following decades, underlining its significance in theoretical computer science. In 2015, the interest in the Graph Isomorphism Problem was further highlighted by a Dagstuhl Seminar solely devoted to this problem [16]. Already early on in the study of the Graph Isomorphism Problem, there had been evidence suggesting that the problem may not be NP-hard. For example, Babai [6] and Mathon [110] independently proved that the counting version of the problem is equivalent to its decision version, which cannot be observed for any other known NP-complete problem (cf. [144]). Also, the Graph Isomorphism Problem is contained in the complexity class co-AM [59, 60], which implies that the problem is not NP-complete unless the polynomial hierarchy collapses to its second level [29].
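To make the definition concrete, the following Python sketch tests it literally by trying every bijection ϕ: V(G) → V(H). It is only an illustration and makes a few assumptions: graphs are given as a vertex list and an edge list of 2-element tuples, and the function name `find_isomorphism` is hypothetical. Its n! running time is exactly what the algorithms discussed in this thesis avoid.

```python
from itertools import permutations


def find_isomorphism(vertices_g, edges_g, vertices_h, edges_h):
    """Return an isomorphism phi: V(G) -> V(H) as a dict, or None.

    Brute force over all n! bijections, so only usable for tiny graphs.
    Edges are 2-element tuples of vertices; loops are not supported.
    """
    vg, vh = list(vertices_g), list(vertices_h)
    eg = {frozenset(e) for e in edges_g}
    eh = {frozenset(e) for e in edges_h}
    if len(vg) != len(vh) or len(eg) != len(eh):
        return None
    for image in permutations(vh):
        phi = dict(zip(vg, image))
        # phi is a bijection and |E(G)| = |E(H)|, so mapping every edge of G
        # onto an edge of H already forces non-edges onto non-edges.
        if all(frozenset(phi[v] for v in e) in eh for e in eg):
            return phi
    return None


# The path 0-1-2 versus the path a-c-b.
print(find_isomorphism([0, 1, 2], [(0, 1), (1, 2)],
                       ["a", "b", "c"], [("a", "c"), ("c", "b")]))
# {0: 'a', 1: 'c', 2: 'b'}
```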
But of course the most striking evidence is provided by Babai’s recent breakthrough result [11], which gives a quasipolynomial time algorithm for the Graph Isomorphism Problem (i.e., the problem can be solved in time $n^{O((\log n)^c)}$ for some absolute constant c). This significantly improves on the previous best isomorphism test running in time $2^{O(\sqrt{n \log n})}$ [19] and implies that the Graph Isomorphism Problem is not NP-complete unless every problem in NP can be solved in quasipolynomial time (which would, for example, refute the Exponential Time Hypothesis [83]). However, despite the major progress made by Babai’s result, the question whether graph isomorphism is in PTIME remains wide open. With a solution to the general problem seemingly out of reach, a lot of attention has been put into investigating the complexity of isomorphism testing for restricted classes of input graphs. One of the first important results in this direction was obtained by Hopcroft and Tarjan, who proved that the isomorphism problem for planar graphs can be solved in quasilinear time [78, 79, 80]. This was later improved to linear time by Hopcroft and Wong [81]. Further examples include
polynomial time algorithms for all graph classes of bounded tree-width [27] and graph classes of bounded Euler genus [52, 116]. More generally, Ponomarenko proved that every graph class that excludes some fixed graph as a minor admits a polynomial time isomorphism test [131]. In 1979, Babai proved that the isomorphism problem for graphs of bounded color class size can be solved in polynomial time. In doing so, he employed group-theoretic algorithmic methods for the first time. The use of group-theoretic techniques was further extended by Luks in his seminal paper [106], which gives the first polynomial-time isomorphism test for graphs of bounded degree. In combination with a combinatorial trick due to Zemlyachenko [151], this also led to a $2^{O(\sqrt{n \log n})}$-time isomorphism test for general graphs [19], which remained the best-known algorithm for the general problem for over three decades. Moreover, the methods developed by Luks also form the basis for Babai’s recent quasipolynomial time algorithm [11]. One of the most general results concerning the tractability of the isomorphism problem for restricted graph classes has been given by Grohe and Marx, who presented polynomial-time isomorphism tests for all graph classes excluding a fixed graph as a topological subgraph [67]. In particular, this includes all graph classes excluding some minor and all graph classes of bounded degree. Note that, up to this point, all graph classes considered contain only sparse graphs. Graph classes that admit polynomial-time isomorphism tests and also contain dense graphs include interval graphs [36], unit square graphs [122] and, maybe most notably, graph classes of bounded rank-width [73]. Finally, as a last example, the isomorphism problem for graphs with bounded eigenvalue multiplicity can also be solved in polynomial time [18, 56].
On the other hand, for graph classes such as bipartite graphs, chordal graphs, or k-degenerate graphs, it is known that the isomorphism problem is as difficult as the isomorphism problem for the class of all graphs (see, e.g., [28]). In this case, the isomorphism problem for such a graph class is called GI-complete. Indeed, for most natural graph classes it is known that the isomorphism problem is either solvable in polynomial time or GI-complete. However, assuming that the Graph Isomorphism Problem is not contained in PTIME, there also exist graph classes for which the isomorphism problem is neither solvable in polynomial time nor GI-complete [127]. Besides classifying graph classes into tractable and non-tractable classes with respect to the isomorphism problem and optimizing running times for the tractable cases, another line of research investigates the complexity with respect to different cost measures such as space complexity. In this direction, Datta et al. proved that the isomorphism problem for planar graphs can be solved in LOGSPACE [42], which was later generalized to graph classes of bounded genus by Elberfeld and Kawarabayashi [47]. Moreover, the same holds for all graph classes of bounded tree-width [48]. Already the isomorphism problem for trees is hard for the complexity class LOGSPACE (under many-one $\mathrm{AC}^0$-reductions) [84], so the isomorphism problem for the aforementioned classes is in fact LOGSPACE-complete. The algorithms for isomorphism testing mentioned above have something in common: many of them use similar subroutines. Indeed, two fundamental algorithmic approaches to the Graph Isomorphism Problem are exploited by a number of algorithms tackling the problem. These are combinatorial partition-refinement techniques and group-theoretic approaches.
In the context of partition-refinement techniques, one of the most important algorithms is the Weisfeiler-Leman algorithm (see, e.g., [82, 148, 149]). It is a heuristic algorithm that tries to distinguish between graphs based on certain combinatorial properties. For group-theoretic techniques tackling the isomorphism problem, an important algorithmic milestone is Luks’s algorithm [106], which solves the isomorphism problem for graphs of bounded degree in polynomial time. Both approaches have been studied extensively in the past decades. This has led to a variety of results that also connect the Graph Isomorphism Problem to other areas of computer science. In combination, these two approaches also lay the foundation of Babai’s quasipolynomial time isomorphism test. That test builds on novel group-theoretic subroutines as well as insights on the power of combinatorial partition-refinement methods such as the Weisfeiler-Leman algorithm. The main purpose of this thesis is to further expand our understanding of the power and the limits of these fundamental approaches. This leads to various improved algorithms for testing isomorphism on important classes of graphs.
Weisfeiler-Leman Algorithm and Related Combinatorial Approaches
One of the most fundamental subroutines in the context of the Graph Isomorphism Problem is the Weisfeiler-Leman algorithm. For every k ≥ 1 there is a k-dimensional variant of the algorithm. Given a graph G, it colors the k-tuples of vertices of G and iteratively refines the coloring in an isomorphism-invariant way. Originally, the algorithm was introduced only for dimension two by Weisfeiler and Leman [149]. The k-ary version, k ≥ 1, was introduced by Babai and Mathon [7] and independently by Immerman and Lander [82]. Already the 1-dimensional Weisfeiler-Leman algorithm, also referred to as the Color Refinement algorithm, is a quite powerful tool in the context of the Graph Isomorphism Problem. It iteratively refines an initially uniform coloring by counting, for each vertex, the number of neighbors of each color, and it is used as a subroutine in a number of algorithms (see, e.g., [14, 21, 23, 112, 113, 142]). In particular, the Color Refinement algorithm already solves the isomorphism problem for random graphs asymptotically almost surely [17]. The Color Refinement algorithm fails to distinguish any two regular graphs with the same number of vertices and the same degree. In contrast, the 2-dimensional Weisfeiler-Leman algorithm can be used to solve the problem for random regular graphs asymptotically almost surely [98]. Also, the 2-dimensional Weisfeiler-Leman algorithm is closely tied to coherent configurations, which are studied, for example, in algebraic combinatorics (see, e.g., [34]). For higher dimensions, the algorithm is prominently applied, for example, in Babai’s quasipolynomial time isomorphism test, with dimension k = O(log n) (where n denotes the number of vertices of the input graphs). There are several characterizations of the Weisfeiler-Leman algorithm that connect it to other areas of theoretical computer science.
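The Color Refinement procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation analyzed in this thesis. Graphs are adjacency dictionaries, and the names `color_refinement` and `cr_distinguishes` are hypothetical. New color names are obtained by sorting the signatures (old color, multiset of neighbor colors), so they do not depend on how the vertices are named. To compare two graphs, the sketch uses one common approach: it refines their disjoint union and compares the color histograms of the two sides.

```python
from collections import Counter


def color_refinement(adj):
    """Stable coloring computed by Color Refinement (1-dimensional WL).

    adj maps every vertex to a list of its neighbors. Starting from the
    uniform coloring, each round replaces the color of v by the signature
    (old color of v, sorted multiset of neighbor colors). Signatures are
    renamed to integers in sorted order, so color names do not depend on
    vertex names. The loop stops once the number of color classes stops growing.
    """
    colors = {v: 0 for v in adj}
    while True:
        sig = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
               for v in adj}
        palette = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new = {v: palette[sig[v]] for v in adj}
        if len(set(new.values())) == len(set(colors.values())):
            return new
        colors = new


def cr_distinguishes(adj_g, adj_h):
    """True if Color Refinement distinguishes G and H (disjoint-union test)."""
    union = {("G", v): [("G", u) for u in nbrs] for v, nbrs in adj_g.items()}
    union.update({("H", v): [("H", u) for u in nbrs] for v, nbrs in adj_h.items()})
    stable = color_refinement(union)
    hist_g = Counter(c for (side, _), c in stable.items() if side == "G")
    hist_h = Counter(c for (side, _), c in stable.items() if side == "H")
    return hist_g != hist_h


c6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(cr_distinguishes(c6, two_triangles))  # False: both graphs are 2-regular
```

On the 6-cycle and two disjoint triangles, every vertex keeps the same color. This matches the failure of Color Refinement on regular graphs noted above.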
First, the expressive power of the algorithm can be characterized in terms of bounded-variable fragments of first-order logic with counting [82]. This connects the Weisfeiler-Leman algorithm to finite model theory and descriptive complexity theory (see, e.g., [64]). Also, this connection can be used to capture the power of the Weisfeiler-Leman algorithm in terms of certain pebble games [76]. More recently, it has been observed that the power of the algorithm also corresponds to Sherali-Adams relaxations of the natural linear programs for the Graph Isomorphism Problem [5, 71]. This result inspired further work relating the Weisfeiler-Leman algorithm to semi-definite programming [126, 140] and to algebraic approaches (e.g., Gröbner bases) [25, 26, 72]. Furthermore, the power of the Weisfeiler-Leman algorithm can be characterized by certain homomorphism counts from graphs of bounded tree-width [44]. Finally, in recent years, the Weisfeiler-Leman algorithm has also been exploited in a machine learning context for graph classification problems [139] (see also [97, 125]). In this direction, the power of the Color Refinement algorithm corresponds to that of certain graph neural networks [120]. A common way to investigate the strength of the Weisfeiler-Leman algorithm is to determine the dimension of the algorithm that is required to build a complete isomorphism test for certain graphs G. In this sense, following Grohe [64], the Weisfeiler-Leman dimension of a graph G is the minimal number k such that the k-dimensional Weisfeiler-Leman algorithm identifies G, i.e., distinguishes G from every graph not isomorphic to G. By the seminal paper of Cai, Fürer and Immerman [31], there is no fixed dimension for which the Weisfeiler-Leman algorithm solves the Graph Isomorphism Problem on its own. Indeed, there are non-isomorphic n-vertex graphs G and H which the k-dimensional Weisfeiler-Leman algorithm cannot distinguish unless k = Ω(n). However, when focusing on a particular class of graphs, the Weisfeiler-Leman algorithm often serves as a complete isomorphism test for some fixed dimension k. The Weisfeiler-Leman algorithm can be implemented in polynomial time for every fixed dimension k (see, e.g., [82]). This immediately gives a polynomial time isomorphism test for the graph class in question. Indeed, for several of the classes with polynomial time isomorphism tests mentioned above, the Weisfeiler-Leman algorithm yields a complete isomorphism test. This gives a unifying method to tackle the isomorphism problem for these classes. For example, this includes planar graphs [90], graphs of bounded tree-width [66], graphs of bounded genus [62, 65], and, more generally, all classes that exclude a fixed graph as a minor [63]. Also, the class of interval graphs has finite Weisfeiler-Leman dimension [51]. In this thesis we investigate the Weisfeiler-Leman dimension of graphs of bounded tree-width and rank-width. For the case of tree-width, Grohe and Mariño [66] proved that the Weisfeiler-Leman dimension of the class of graphs of tree-width at most k is upper-bounded by k + 2. In this thesis we present a more careful implementation of their general strategy. This results in an improved upper bound of k and confirms a conjecture of Grohe [64]. More importantly, we extend the high-level strategy for bounding the Weisfeiler-Leman dimension of graphs of bounded tree-width to graphs of bounded rank-width. Rank-width was originally introduced by Oum and Seymour [130] and is another graph parameter measuring the width of a certain style of hierarchical decomposition. However, tree-width measures the complexity of a separation in the hierarchical decomposition in terms of connectivity. In contrast, rank-width measures this complexity in terms of the rank of the adjacency matrix of the edges between the two sides of the separation.
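The quantity underlying rank-width can be illustrated directly. In the definition of Oum and Seymour, the cut-rank of a vertex set X is the rank, over the two-element field GF(2), of the adjacency submatrix with rows X and columns V(G) \ X. Rank-width then minimizes, over all decompositions into subcubic trees whose leaves are the vertices, the largest cut-rank of a separation induced by a tree edge. The Python sketch below computes only the cut-rank of a single separation; finding an optimal decomposition is a different matter. The function names are hypothetical, and graphs are assumed to be adjacency dictionaries.

```python
def gf2_rank(rows):
    """Rank over GF(2) of a 0/1 matrix whose rows are given as bitmask ints."""
    rank = 0
    rows = [r for r in rows if r]
    while rows:
        pivot = rows.pop()
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        # Eliminate this bit from all remaining rows (addition over GF(2) is XOR).
        rows = [r ^ pivot if r & low else r for r in rows]
        rows = [r for r in rows if r]
    return rank


def cut_rank(adj, side):
    """GF(2)-rank of the adjacency submatrix with rows `side`, columns V \\ side."""
    others = [v for v in adj if v not in side]
    col = {v: i for i, v in enumerate(others)}
    rows = []
    for v in side:
        mask = 0
        for u in adj[v]:
            if u in col:
                mask |= 1 << col[u]
        rows.append(mask)
    return gf2_rank(rows)


k5 = {v: [u for u in range(5) if u != v] for v in range(5)}
print(cut_rank(k5, {0, 1}))  # 1
```

In a complete graph, every nontrivial separation yields an all-ones matrix of rank 1. This is why dense graphs such as cliques have rank-width 1.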
This makes rank-width almost closed under complementation and also allows dense graphs to have small rank-width (in particular, every complete graph has rank-width 1). Rank-width is closely related to clique-width, another graph parameter measuring the structural difficulty of a graph. For every graph G it holds that $\mathrm{rw}(G) \le \mathrm{cw}(G) \le 2^{\mathrm{rw}(G)+1} - 1$ [130]. In particular, a class of graphs has bounded rank-width if and only if it has bounded clique-width. This means that many NP-hard problems can be solved efficiently for graphs of bounded rank-width [40, 49]. For the Graph Isomorphism Problem, the first polynomial time algorithm for graphs of bounded rank-width was presented by Grohe and Schweitzer [73]. However, the running time of their algorithm is $n^{f(k)}$, where n denotes the number of vertices, k is the rank-width of the input graphs, and f is a non-elementary function. Besides its unsatisfactory running time, the algorithm is also rather complicated, since it builds on advanced results from structural graph theory [74] and computational group theory [106]. In this thesis we prove that the Weisfeiler-Leman dimension of graphs of rank-width k is at most 3k + 4. This adds a rich family of dense graph classes to the picture of graph classes of bounded Weisfeiler-Leman dimension. It also yields a simple isomorphism test for graphs of bounded rank-width which, maybe surprisingly, is also significantly faster than the isomorphism test of Grohe and Schweitzer. On top of that, we can use this result to give a generic polynomial time canonization algorithm for graphs of bounded rank-width. A canonization algorithm A for a class C maps a graph G to a graph A(G) ≅ G that solely depends on the isomorphism type of G and not on G itself. More formally, it holds that A(G) = A(H) if and only if G ≅ H for all graphs G, H ∈ C. Clearly, the isomorphism problem reduces to the problem of computing a canonization; the converse is not known.
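The definition of a canonization algorithm can be met trivially, though in exponential time, by choosing the lexicographically smallest relabeling. The Python sketch below assumes graphs on the vertex set {0, ..., n−1} given by edge lists without duplicates, and the name `canonical_form` is hypothetical. Since A(G) is itself a relabeling of G, it satisfies A(G) ≅ G, and it satisfies A(G) = A(H) if and only if G ≅ H. The results above are about achieving this in polynomial time for bounded rank-width.

```python
from itertools import permutations


def canonical_form(n, edges):
    """Brute-force canonization for graphs on vertex set {0, ..., n-1}.

    Returns (n, smallest sorted edge list over all n! relabelings).
    Two graphs receive the same form if and only if they are isomorphic.
    """
    best = None
    for perm in permutations(range(n)):
        relabeled = tuple(sorted(tuple(sorted((perm[u], perm[v])))
                                 for u, v in edges))
        if best is None or relabeled < best:
            best = relabeled
    return (n, best)


print(canonical_form(3, [(0, 1), (1, 2)]))  # (3, ((0, 1), (0, 2)))
```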
Previously, it was unknown whether such an algorithm running in polynomial time exists for graphs of rank-width at most k. Having provided upper bounds on the Weisfeiler-Leman dimension of graphs of bounded tree-width and rank-width, a natural next step is to ask for lower bounds on the Weisfeiler-Leman dimension of these two classes. From the result of Cai, Fürer and Immerman [31], it is not difficult to see that both upper bounds are tight up to a constant factor. However, to determine the Weisfeiler-Leman dimension of these graph classes exactly, it would be desirable to obtain lower bounds that are as close as possible to the upper bounds described above. In this direction, Dawar and Richerby [43] showed that the Weisfeiler-Leman dimension of the Cai-Fürer-Immerman graphs is closely tied to the tree-width of the base graphs. Exploiting this connection, we present a more refined analysis of the tree-width of certain Cai-Fürer-Immerman graphs. This leads to improved lower bounds on the Weisfeiler-Leman dimension that are only a small constant factor away from the upper bounds for graph classes of bounded tree-width and rank-width. Another, closely related, combinatorial approach to the Graph Isomorphism Problem, which works particularly well in practice, is the Individualization-Refinement paradigm (I/R paradigm). In a nutshell, an I/R algorithm first refines an initially uniform coloring of the vertices of the input graphs in an isomorphism-invariant manner. A typical choice for this subroutine is the Color Refinement algorithm. If the produced coloring is discrete (i.e., every color class contains only a single vertex), the isomorphism problem can be solved in a straightforward way. Otherwise, vertices in a chosen color class are individualized one by one in a backtracking manner in order to artificially distinguish them from the other vertices.
This process yields a backtracking tree that is traversed in order to explore the structure of the input graphs. Additional pruning techniques, for example based on automorphisms of the input graphs, make this framework feasible in practice. The paradigm is also regularly exploited in theoretical work on the isomorphism problem (see, e.g., [9, 14, 141, 142]), including Babai’s quasipolynomial time algorithm [11]. However, the I/R framework is most comprehensively used in practical software tools tackling the problem. The I/R paradigm was first implemented by Brendan McKay in the early 1980s in his software package Nauty [112], which is still one of the most efficient tools for the purpose of isomorphism testing today. In the last decade, several variants of the algorithm have been implemented in other software packages such as Nauty/Traces [113], Bliss [85, 86], Conauto [105] and Saucy [35]. This has led to a variety of solvers that perform extremely well on an abundance of different types of instances (see, e.g., [113]). In fact, given the lack of instances that challenge these solvers, the isomorphism problem already seems to be solved from a practical point of view. On the other hand, very little is known about the worst-case complexity of the algorithms implemented by the solvers. Indeed, in his breakthrough paper [11], Babai explicitly asks for the worst-case complexity of algorithms purely based on the Individualization-Refinement paradigm. In 1995, Miyazaki proved that the then-current version of Nauty [112] has exponential worst-case complexity [118]. For the proof, Miyazaki designed a family of graphs that specifically targets the cell selection implemented in Nauty (i.e., the choice of color classes for individualization), fooling the algorithm into exponential behavior. However, as Miyazaki also proves, the constructed graphs can be solved in polynomial time using a slightly different cell selection.
Indeed, as the heuristics for cell selection and other tasks have become more and more refined, most of the practical tools developed in the last decade perform efficiently on the graphs constructed by Miyazaki (see, e.g., [113]). In this thesis, we analyze the power of algorithms within the I/R paradigm in a much broader setting. We provide exponential lower bounds on the worst-case complexity of a large class of I/R algorithms that includes all current state-of-the-art isomorphism tools. More precisely, we present a construction of graphs, which we call multipede graphs, that yields an exponential-size search tree for all I/R algorithms whose refinement operator, cell selection and invariants are not stronger than the k-dimensional Weisfeiler-Leman algorithm for some fixed number k. In particular, there is no restriction on the automorphism pruning performed by the algorithm, and one may even assume perfect automorphism pruning. It should be pointed out that this makes the I/R approach stronger than the k-dimensional Weisfeiler-Leman algorithm for any fixed number k. For example, isomorphism of Cai-Fürer-Immerman graphs, which cannot be distinguished by the k-dimensional Weisfeiler-Leman algorithm unless k = Ω(n), can be tested in polynomial time within the described class of I/R algorithms [118]. For the construction of the multipede graphs, we utilize a construction of Gurevich and Shelah [75] that yields rigid structures (i.e., structures without non-trivial automorphisms) with arbitrarily large Weisfeiler-Leman dimension. To be more precise, we start by constructing a bipartite base graph that is obtained from a random process. With high probability, this base graph has strong expansion properties guaranteeing a suitable variant of the meagerness property already exploited in [75]. Additionally, almost surely, the neighborhoods of the vertices of one partition class of the bipartite graph are almost disjoint.
Applying a suitable variant of the Cai-Fürer-Immerman construction [31] for bipartite graphs gives the desired multipede graphs. In order to analyze the size of the search tree constructed by I/R algorithms, we first define a closure operator that bounds the effect of the Weisfeiler-Leman algorithm by exploiting the almost-disjointness of the neighborhoods. The closure operator defines a subgraph which, by the meagerness property, has an exponential number of automorphisms. This gives an exponential number of colorings that cannot be distinguished by the Weisfeiler-Leman algorithm. Combining these statements gives the desired exponential lower bound on the size of the search tree of I/R algorithms. Since the I/R framework is used in all state-of-the-art isomorphism tools, the above results raise the question whether the constructed graphs are also difficult in practice. Towards this end, we introduce a variant of our construction that creates the most difficult benchmark graphs for isomorphism testing available today.
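To make the search tree of the I/R paradigm concrete, the following Python sketch builds the unpruned search tree of a small graph and counts its leaves. It is a minimal illustration under stated assumptions, not the procedure of Nauty or any other solver: the refinement operator is plain color refinement (the 1-dimensional Weisfeiler-Leman algorithm), the cell selection picks the non-singleton color class with the smallest color (a hypothetical rule chosen for simplicity), and no invariants or automorphism pruning are used. Graphs are adjacency lists over vertices 0, ..., n−1, and the names `refine` and `count_leaves` are illustrative.

```python
def refine(adj, colors):
    """Color refinement: split color classes by the multiset of neighbor
    colors until the partition into color classes no longer changes."""
    n = len(adj)
    while True:
        signatures = [
            (colors[v], tuple(sorted(colors[w] for w in adj[v]))) for v in range(n)
        ]
        # Rename signatures to small integers in a canonical (sorted) order.
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures)))}
        new_colors = [palette[sig] for sig in signatures]
        # The new partition refines the old one, so equal class counts mean stable.
        if len(set(new_colors)) == len(set(colors)):
            return new_colors
        colors = new_colors


def count_leaves(adj, colors):
    """Number of leaves of the unpruned I/R search tree below `colors`."""
    colors = refine(adj, colors)
    cells = {}
    for v, c in enumerate(colors):
        cells.setdefault(c, []).append(v)
    non_singleton = [cells[c] for c in sorted(cells) if len(cells[c]) > 1]
    if not non_singleton:
        return 1  # discrete coloring: this node is a leaf
    target_cell = non_singleton[0]  # hypothetical cell selector
    fresh = max(colors) + 1
    leaves = 0
    for v in target_cell:
        # Individualize v by giving it a color no other vertex has.
        child = list(colors)
        child[v] = fresh
        leaves += count_leaves(adj, child)
    return leaves


# Usage: the 4-cycle 0-1-2-3-0.
cycle4 = [[1, 3], [0, 2], [1, 3], [2, 0]]
print(count_leaves(cycle4, [0, 0, 0, 0]))  # 8
```

Since automorphisms act freely on the leaves of such a tree, an unpruned tree has at least |Aut(G)| leaves; for the 4-cycle both numbers equal 8. This is why automorphism pruning matters in practice, and why the lower bounds above are stated even for perfect pruning.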
Group-Theoretic Approaches
A second fundamental approach to the Graph Isomorphism Problem is based on group-theoretic techniques. Early on in the study of the isomorphism problem, a close connection to the structure of the automorphism group of the input graphs was observed. In this direction, Babai [6] and Mathon [110] independently proved that the Graph Isomorphism Problem is polynomial-time equivalent to computing a generating set for the automorphism group and also to computing the size of the automorphism group. From an algorithmic point of view, techniques from group theory were first exploited by Babai [8] in 1979 to give an isomorphism test for graphs of bounded color class size. Already this simple algorithm demonstrates the remarkable power of group-theoretic approaches, since graph classes of bounded color class size include the Cai-Fürer-Immerman graphs as well as the multipede graphs, which prove to be extremely difficult for purely combinatorial approaches. While Babai’s original algorithm is a randomized Las Vegas algorithm, it was derandomized shortly afterwards by Furst, Hopcroft and Luks, who provided a basic polynomial-time library for computing with permutation groups [57]. Besides its significance for the Graph Isomorphism Problem, this line of work also initiated research in Computational Group Theory (see [77, 138]). The striking usefulness of group-theoretic techniques was further demonstrated by Luks with his polynomial-time isomorphism test for graphs of bounded degree [106]. With a slight improvement given later [19], it tests in time $n^{O(d/\log d)}$ whether two graphs of maximum degree d are isomorphic (where n denotes the number of vertices of the input graphs). For his algorithm, Luks first introduces a more general problem in order to build a recursive algorithm along the structure of the permutation groups involved.
The String Isomorphism Problem takes as input two strings x, y: Ω → Σ, where Ω is a finite set and Σ a finite alphabet, and a permutation group Γ ≤ Sym(Ω) (given by a generating set), and asks whether there is some γ ∈ Γ that maps x to y. For graphs of maximum degree d, the isomorphism problem can be reduced in polynomial time [106, 21] to the String Isomorphism Problem where the input group Γ is contained in the class $\widehat{\Gamma}_d$¹ containing all groups all of whose composition factors are isomorphic to subgroups of $S_d$. Then, the String Isomorphism Problem for $\widehat{\Gamma}_d$-groups is solved by recursively processing the input group Γ along Γ-invariant partitions. Luks’s algorithm quickly developed into one of the cornerstones of the algorithmic theory of the Graph Isomorphism Problem. In combination with a combinatorial method due to Zemlyachenko [151], it results in an isomorphism test for general graphs running in time $2^{O(\sqrt{n \log n})}$ [19], which was the best known algorithm for over three decades. Also, Luks’s algorithm forms an important building block for various other algorithms tackling the isomorphism problem (see, e.g., [14, 67, 96, 117, 122]). But most notably, the recursion mechanisms introduced by Luks form the basis for Babai’s quasipolynomial time isomorphism test [11]. Indeed, for his algorithm, Babai follows Luks’s algorithm for the String Isomorphism Problem and attacks the obstacle cases where the recursion performed by Luks’s algorithm does not lead to the desired running time. In order to handle these obstacle cases, Babai introduces various new techniques, both group-theoretic methods and new analyses of combinatorial methods such as the Weisfeiler-Leman algorithm. In this context, it seems natural to ask whether the techniques developed by Babai for his quasipolynomial time algorithm can also be extended to Luks’s algorithm in order to give a faster isomorphism test for graphs of small degree. Indeed, graphs of polylogarithmic degree are not a critical case for Babai’s algorithm, as their automorphism groups do not contain large alternating or symmetric groups, and yet graphs of polylogarithmic degree form one of the obstacle cases where Babai’s algorithm still runs in quasipolynomial time. This gives a strong motivation to investigate the above question. In this thesis, we provide a positive answer and prove that the isomorphism problem for graphs of maximum degree d can be solved in time $n^{\operatorname{polylog}(d)}$. Actually, following the standard route of considering the String Isomorphism Problem, we present an algorithm that solves the String Isomorphism Problem for $\widehat{\Gamma}_d$-groups in time $n^{\operatorname{polylog}(d)}$.
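The definition of the String Isomorphism Problem can be checked by brute force on toy instances. The following Python sketch illustrates only the problem statement, not Luks’s recursive algorithm: it enumerates the whole group Γ from its generators, which takes time proportional to |Γ| and is therefore exponential in general. Permutations of Ω = {0, ..., n−1} are encoded as tuples g with g[α] the image of α, strings as sequences indexed by Ω, and γ is taken to map x to y when y(γ(α)) = x(α) for all α ∈ Ω; these encodings and the helper names are assumptions made for the sketch.

```python
def compose(p, q):
    """Left-to-right composition: apply p first, then q."""
    return tuple(q[p[a]] for a in range(len(p)))


def generate_group(generators, n):
    """All elements of the permutation group on {0, ..., n-1} generated by
    `generators`; in a finite group, closing under products suffices."""
    identity = tuple(range(n))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def string_isomorphisms(x, y, generators):
    """All gamma in the group with x^gamma = y, i.e. y[gamma[a]] == x[a] for all a."""
    n = len(x)
    return [
        g for g in generate_group(generators, n)
        if all(y[g[a]] == x[a] for a in range(n))
    ]


# Usage: Omega = {0, 1, 2}, Gamma = cyclic group generated by 0 -> 1 -> 2 -> 0.
print(string_isomorphisms("aab", "baa", [(1, 2, 0)]))  # [(1, 2, 0)]
print(string_isomorphisms("abc", "acb", [(1, 2, 0)]))  # []
```

The second instance has no solution in the cyclic group, although the full symmetric group (generated, e.g., by (1, 0, 2) and (1, 2, 0)) contains the transposition (0, 2, 1) that maps "abc" to "acb". The choice of Γ is exactly what makes the problem non-trivial.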
For designing the algorithm, a main hurdle is to adapt the group-theoretic techniques developed in [11] to the setting of $\widehat{\Gamma}_d$-groups. Towards this end, we introduce the notion of an almost d-ary sequence of partitions for a permutation group Γ ≤ Sym(Ω). Consider a sequence of Γ-invariant partitions $\mathfrak{B}_0 = \{\Omega\} \succ \mathfrak{B}_1 \succ \dots \succ \mathfrak{B}_m = \{\{\alpha\} \mid \alpha \in \Omega\}$, where $\mathfrak{B}_i \prec \mathfrak{B}_{i-1}$ means that the partition $\mathfrak{B}_i$ strictly refines $\mathfrak{B}_{i-1}$. Such a sequence is almost d-ary if, for every i ∈ [m] and $B \in \mathfrak{B}_{i-1}$, after stabilizing B setwise, the induced action of Γ on the classes of $\mathfrak{B}_i$ contained in B is permutationally isomorphic to a subgroup of $S_d$ or semi-regular (i.e., only the identity element has fixed points). For permutation groups with an almost d-ary sequence of partitions, there is a natural adaptation of Babai’s Unaffected Stabilizers Theorem, which lays the foundation for the group-theoretic techniques developed in [11]. With this, it is possible to give a variant of the Local Certificates Routine which, by a more refined analysis of the running time, allows the efficient construction of relational structures defined on at most d points capturing sufficient structural information of the input strings. Computing isomorphisms between these relational structures using Babai’s algorithm [11] as a black box allows us to make sufficient progress to build a recursive algorithm with the desired running time. However, not every permutation group in the class $\widehat{\Gamma}_d$ has an almost d-ary sequence of partitions as required by the approach described above. To remedy this, our algorithm first normalizes the input data by modifying the action of the input group while preserving string isomorphisms. This normalization process is based on some heavy group theory.
The first step is to classify large primitive $\widehat{\Gamma}_d$-groups via the O’Nan-Scott Theorem, exploiting several group-theoretic structure theorems on primitive groups in $\widehat{\Gamma}_d$ and showing that such groups are necessarily composed of Johnson schemes in a well-defined manner. Based on this classification, we are able to construct graphs of small degree describing the structure of the input permutation group in a suitable way. Finally, unfolding these graphs yields the desired normalized group operation.
¹ In Luks’s original work [106] this class is called $\Gamma_d$. However, in the more recent literature [13, 58], the class $\Gamma_d$ typically refers to a larger class of groups.
With Luks’s algorithm for the String Isomorphism Problem being used as a subroutine in various other algorithms, one can ask about the impact of the above improvement in the context of the Graph Isomorphism Problem. Of course, the first application is an improved isomorphism test for graphs of maximum degree d running in time $n^{\operatorname{polylog}(d)}$. Moreover, one can also give better isomorphism tests for relational structures and hypergraphs of small degree. These results are not only interesting when the degree is small; they even improve on existing algorithms for isomorphism testing of relational structures and hypergraphs in general (cf. [15]).
For a deeper application of the above results, we consider the isomorphism problem for graphs of bounded tree-width. The first polynomial-time algorithm for graph classes of bounded tree-width was given by Bodlaender [27], using dynamic programming on the set of all k-tuples of vertices separating the graph and resulting in a running time of $n^{O(k)}$. This roughly matches the running time of an isomorphism test based on the Weisfeiler-Leman algorithm using a result of Grohe and Mariño [66] upper-bounding the Weisfeiler-Leman dimension of such graphs. Only recently, Lokshtanov, Pilipczuk, Pilipczuk, and Saurabh [104] designed the first fixed-parameter tractable isomorphism test parameterized by the tree-width of the input graphs, running in time $2^{O(k^5 \log k)} n^c$ for some constant c. The algorithm of Lokshtanov et al. first improves the input graphs and decomposes the improved graphs along clique separators into so-called basic parts in an isomorphism-invariant manner.
After fixing a vertex of small degree, the basic parts can be decomposed further into an isomorphism-invariant tree decomposition of width exponential in k and adhesion width $O(k^3)$. Using dynamic programming, this suffices to compute a graph canonization in the desired time frame (see also [128]). In this thesis, we give an improved isomorphism test running in time $2^{O(k \operatorname{polylog}(k))} n^c$ for some constant c. The main addition in comparison to the algorithm of Lokshtanov et al. is that each bag of the decomposition of the basic graphs is labeled with an auxiliary graph capturing structural information obtained during the decomposition. Crucially, these structure graphs can be designed to have small degree, which allows us to apply the methods described above for the isomorphism problem of graphs of bounded degree. In combination with other modifications, this enables us to improve the running time as indicated above.
Scientific Contribution
This section describes the scientific contributions of the author to the results presented in this thesis. The upper and lower bounds on the Weisfeiler-Leman dimension for graph classes of bounded tree-width were obtained in joint work with Sandra Kiefer, “The Power of the Weisfeiler-Leman Algorithm to Decompose Graphs”, published at the 44th International Symposium on Mathematical Foundations of Computer Science (MFCS 2019) [89]. The main technical result of this paper is that the 2-dimensional Weisfeiler-Leman algorithm distinguishes 2-separators from other pairs of vertices. The bounds on the Weisfeiler-Leman dimension for graphs of bounded tree-width, which partly follow from the above result, were to a large part obtained and formalized by the present author. The upper bound on the Weisfeiler-Leman dimension of graphs of bounded rank-width was obtained solely by the author of this thesis and presented in “Canonisation and Definability for Graphs of Bounded Rank Width”, published in the proceedings of the Thirty-Fourth Annual ACM/IEEE Symposium on Logic in Computer Science (LICS 2019) [68]. This paper is joint work with Martin Grohe and also features a second definability result that was obtained together with Martin Grohe.
The lower bounds on the running time of I/R algorithms were obtained in a joint project with Pascal Schweitzer [123, 124]. The theoretical bounds on the worst-case complexity of I/R algorithms appeared as “An Exponential Lower Bound for Individualization-Refinement Algorithms for Graph Isomorphism” in the proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing (STOC 2018). On the other hand, practical benchmark graphs are constructed and evaluated in “Benchmark Graphs for Practical Graph Isomorphism”, presented at the 25th Annual European Symposium on Algorithms (ESA 2017). For the practical part, the main contribution of this author lies in evaluating the performance of practical solvers on a number of different constructions based on Cai-Fürer-Immerman graphs [31] and multipede graphs [75], and in tuning the constructions in order to obtain graphs that are extremely difficult in practice. The success on the practical side motivated research on the theoretical worst-case complexity of algorithms within the I/R paradigm. The main insights that allowed us to provide theoretical lower bounds were obtained jointly with Pascal Schweitzer in various discussions regarding this problem. The faster algorithm for the String Isomorphism Problem for $\widehat{\Gamma}_d$-groups and, consequently, for graphs of bounded degree was published as “A Faster Isomorphism Test for Graphs of Small Degree” in the proceedings of the 59th IEEE Annual Symposium on Foundations of Computer Science (FOCS 2018) [69]. For this result, the adaptation of Babai’s methods to $\widehat{\Gamma}_d$-groups equipped with an almost d-ary sequence of partitions is entirely due to this author. The ideas for the normalization procedure were developed mostly together with Martin Grohe, while the technical write-up was done by the present author.
I remark that the normalization procedure given in this thesis differs from the original one in that it is less technically involved and, to some degree, provides a more general framework for modifying the action of a permutation group in a certain way. The normalization is based on a characterization theorem for primitive $\widehat{\Gamma}_d$-groups, which mostly relies on the available literature on primitive groups. The literature search was again mostly done by this author, whereas the write-up and various technical details were resolved jointly with Pascal Schweitzer. Finally, the improved fixed-parameter tractable isomorphism test for graphs parameterized by tree-width is joint work with Martin Grohe, Pascal Schweitzer and Daniel Wiebking [70]. The paper appeared as “An Improved Isomorphism Test for Bounded-Tree-Width Graphs” in the proceedings of the 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). For this work, the main contribution of the present author lies in constructing and exploiting the structure graphs with which each bag is labeled and which capture the bounded-degree structures present in the bag, in order to apply the novel methods developed for graphs of bounded degree. I remark that the observation that the automorphism group of a k-basic graph, after fixing a vertex of degree at most k, lies in the class $\widehat{\Gamma}_{k+1}$ is due to Martin Grohe, Pascal Schweitzer and Daniel Wiebking; this observation alone, however, is not sufficient to apply the methods developed for graphs of bounded degree.
Structure of the Thesis
The remainder of this thesis is structured as follows. In Chapter 2 we introduce the basic terminology for this work, including a complete description of the Weisfeiler-Leman algorithm and the Individualization-Refinement paradigm. The upper bounds on the Weisfeiler-Leman dimension of graphs of bounded tree-width and rank-width are presented in Chapter 3. These results are complemented by the corresponding lower bounds in Section 4.1. Moreover, the lower bounds on the worst-case complexity of algorithms within the I/R framework are given in Sections 4.2 and 4.3. This concludes the first part of this thesis on the power of purely combinatorial approaches to the Graph Isomorphism Problem.
For the second part, which deals with the power of group-theoretic approaches, Chapter 5 first introduces the necessary background on group theory as well as Luks’s algorithm, and gives a short introduction to some aspects of Babai’s quasipolynomial time algorithm. In Chapter 6 we present a faster isomorphism test for graphs of bounded degree by adapting the group-theoretic techniques of Babai’s algorithm. In Chapter 7 these results are further applied to obtain an improved fixed-parameter tractable isomorphism test parameterized by the tree-width of the input graphs. Finally, this thesis concludes with a discussion of the results and open research directions in Chapter 8.
Chapter 2
Isomorphism and Combinatorial Algorithms
2.1 Graphs and Isomorphism
2.1.1 Graphs and Notation
Notation for Numbers and Sets. The set of natural numbers is $\mathbb{N} = \{1, 2, 3, 4, \dots\}$ and $[n] := \{1, \dots, n\}$ denotes the initial segment of the natural numbers up to n. Also, $\mathbb{Z} = \{0, 1, -1, 2, -2, \dots\}$ denotes the set of integers and $\mathbb{Z}_{\geq 0} = \{0, 1, 2, 3, \dots\}$ denotes the set of non-negative integers. For a finite set X, the power set of X is denoted by $2^X := \{Y \mid Y \subseteq X\}$. Note that $|2^X| = 2^{|X|}$. Also, for $t \leq |X|$, the set of all t-element subsets of X is denoted by $\binom{X}{t} := \{Y \subseteq X \mid |Y| = t\}$. Again, observe that $\left|\binom{X}{t}\right| = \binom{|X|}{t}$. Similarly, $\binom{X}{\leq t} := \{Y \subseteq X \mid |Y| \leq t\}$ denotes the set of all subsets of X of size at most t. For three sets $X, X_1, X_2$, the set X is the disjoint union of $X_1$ and $X_2$, denoted by $X = X_1 \uplus X_2$, if $X = X_1 \cup X_2$ and $X_1 \cap X_2 = \emptyset$. For a finite set X, a partition of X is a collection $\mathfrak{B}$ of subsets of X such that $B_1 \cap B_2 = \emptyset$ for all distinct $B_1, B_2 \in \mathfrak{B}$ and $\bigcup_{B \in \mathfrak{B}} B = X$. A partition $\mathfrak{B}$ is called an equipartition if $|B_1| = |B_2|$ for all $B_1, B_2 \in \mathfrak{B}$. For $S \subseteq X$ define $\mathfrak{B}[S] := \{B \cap S \mid B \in \mathfrak{B}, B \cap S \neq \emptyset\}$ to be the induced partition on S. A partition $\mathfrak{B}_1$ of a set X refines another partition $\mathfrak{B}_2$ of X, denoted $\mathfrak{B}_1 \preceq \mathfrak{B}_2$, if for every $B_1 \in \mathfrak{B}_1$ there is some $B_2 \in \mathfrak{B}_2$ such that $B_1 \subseteq B_2$. If additionally $\mathfrak{B}_1 \neq \mathfrak{B}_2$, we say that $\mathfrak{B}_1$ strictly refines $\mathfrak{B}_2$, which is denoted by $\mathfrak{B}_1 \prec \mathfrak{B}_2$.
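The set and partition notions above translate directly into code. The following Python sketch checks the partition axioms, the refinement relations ⪯ and ≺, the induced partition $\mathfrak{B}[S]$, and the identity $|\binom{X}{t}| = \binom{|X|}{t}$ on a small example; partitions are encoded as lists of Python sets, and the helper names are illustrative assumptions.

```python
from itertools import combinations
from math import comb


def is_partition(blocks, X):
    """Blocks are pairwise disjoint and their union is X."""
    blocks = [frozenset(b) for b in blocks]
    union = frozenset().union(*blocks)
    # Pairwise disjoint iff the block sizes add up to the size of the union.
    return union == frozenset(X) and sum(len(b) for b in blocks) == len(union)


def refines(B1, B2):
    """B1 refines B2: every block of B1 lies inside some block of B2."""
    return all(any(set(b1) <= set(b2) for b2 in B2) for b1 in B1)


def strictly_refines(B1, B2):
    """B1 refines B2 and the two partitions are different."""
    return refines(B1, B2) and {frozenset(b) for b in B1} != {frozenset(b) for b in B2}


def induced_partition(B, S):
    """B[S]: the non-empty intersections of the blocks of B with S."""
    S = set(S)
    return [set(b) & S for b in B if set(b) & S]


X = {1, 2, 3, 4}
fine, coarse = [{1}, {2}, {3, 4}], [{1, 2}, {3, 4}]
print(refines(fine, coarse), strictly_refines(fine, coarse))  # True True
print(induced_partition(coarse, {2, 3}))  # [{2}, {3}]
print(len(list(combinations(X, 2))) == comb(len(X), 2))  # True
```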
Graphs. A graph is a pair G = (V(G), E(G)) with vertex set V(G) and edge set E(G). Unless stated otherwise, all graphs are undirected and simple, i.e., there are no loops or multiedges. In this setting an edge is denoted as vw where v, w ∈ V(G). The (open) neighborhood of a vertex v is denoted $N_G(v) := \{w \in V(G) \mid vw \in E(G)\}$. The closed neighborhood is $N_G[v] := N_G(v) \cup \{v\}$. Also, for a set of vertices X ⊆ V(G), the neighborhood of X is defined as $N_G(X) := \left(\bigcup_{v \in X} N_G(v)\right) \setminus X$. The degree of a vertex v ∈ V(G), denoted $\deg_G(v)$, is the size of its neighborhood. Usually, we omit the index G if it is clear from the context and simply write N(v), N[v], N(X) and deg(v). Let v, w ∈ V(G). A path from v to w is a sequence of pairwise distinct vertices $v = u_0, u_1, \dots, u_{\ell-1}, u_\ell = w$ such that $u_{i-1}u_i \in E(G)$ for all $i \in [\ell]$. In this case ℓ is the length of the path. The distance between v and w, denoted $\mathrm{dist}_G(v, w)$, is the length of a shortest path from v to w. As before, the index G is usually omitted. For X ⊆ V(G) the induced subgraph on X is $G[X] := (X, \{vw \mid v, w \in X, vw \in E(G)\})$. Also, $G - X := G[V(G) \setminus X]$ denotes the induced subgraph on the complement of X.
A (vertex-)colored graph is a tuple $G = (V(G), E(G), \chi_G)$ where $\chi_G\colon V(G) \to C$ is a mapping and C is some finite set of colors. For ease of notation I also often write $(G, \chi_G)$ in order to explicitly refer to the coloring $\chi_G$. In this thesis, all graphs may be seen as colored graphs. Note that an uncolored graph can be interpreted as a colored graph where each vertex is assigned the same color. The color classes of a (colored) graph are the sets $\chi_G^{-1}(c)$ where c ∈ C. Note that the non-empty color classes form a partition of the vertex set. A coloring $\chi_G$ is discrete if all color classes are singletons, i.e., $\chi_G(v) \neq \chi_G(w)$ for all distinct v, w ∈ V(G).
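A minimal Python sketch of the neighborhood, distance and discreteness notions above, assuming graphs stored as a dictionary mapping each vertex to the set of its neighbors and colorings as dictionaries (encodings chosen for the sketch). The distance is computed by breadth-first search, and `None` is returned when no path exists, a case the definition above leaves open.

```python
from collections import deque


def set_neighborhood(adj, X):
    """N(X): all neighbors of vertices in X, excluding X itself."""
    X = set(X)
    return set().union(*(adj[v] for v in X)) - X


def distance(adj, v, w):
    """Length of a shortest v-w path via breadth-first search, or None if none exists."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if u == w:
            return dist[u]
        for x in adj[u]:
            if x not in dist:
                dist[x] = dist[u] + 1
                queue.append(x)
    return None


def is_discrete(coloring):
    """All color classes are singletons, i.e. no two vertices share a color."""
    return len(set(coloring.values())) == len(coloring)


# Usage: the path 1-2-3-4.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(set_neighborhood(path, {2, 3}), distance(path, 1, 4))  # {1, 4} 3
```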
Relational Structures. A (relational) vocabulary τ is a set of relation symbols $R_1, \dots, R_k$ where each symbol $R_i$ is equipped with an arity $r_i \in \mathbb{N}$. A relational structure (over vocabulary τ) is a tuple $A = (V, R_1(A), \dots, R_k(A))$ with a (finite) universe V and relations $R_i(A) \subseteq V^{r_i}$, i ∈ [k]. Usually, I omit the vocabulary and simply write $A = (V, R_1, \dots, R_k)$ for a relational structure over an implicitly given vocabulary. The structure A is t-ary if every relation symbol has arity at most t, i.e., $r_i \leq t$ for all i ∈ [k].
2.1.2 Tree Decompositions and Tree-Width
Throughout this thesis, tree decompositions and the tree-width of a graph repeatedly play a role. In the following, these concepts are formally defined and some very basic properties used in the course of this thesis are stated (for a more complete introduction to tree-width I refer to [94]).
Definition 2.1.1 (Tree Decomposition and Tree-Width). Let G be a graph. A tree decomposition is a pair (T, β) where T is a tree and $\beta\colon V(T) \to 2^{V(G)}$ such that
(T.1) for every v ∈ V(G) the set $\{t \in V(T) \mid v \in \beta(t)\}$ is non-empty and induces a connected subgraph in T, and
(T.2) for every e ∈ E(G) there is some t ∈ V(T) such that e ⊆ β(t).
The sets β(t), t ∈ V(T), are called the bags of the decomposition. The width of a decomposition (T, β) is $\mathrm{width}(T, \beta) := \max_{t \in V(T)} |\beta(t)| - 1$. The tree-width of G is the minimum width among all tree decompositions of G, i.e.,
$\mathrm{tw}(G) := \min\{\mathrm{width}(T, \beta) \mid (T, \beta) \text{ is a tree decomposition of } G\}$.
The adhesion sets of a tree decomposition (T, β) are the sets β(s) ∩ β(t) for edges st ∈ E(T). The adhesion-width of a decomposition is the maximum size of an adhesion set, i.e., $\max_{st \in E(T)} |\beta(s) \cap \beta(t)|$.
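Conditions (T.1) and (T.2), the width and the adhesion-width can be checked mechanically. The Python sketch below assumes that the tree T is given as an adjacency dictionary that is already known to be a tree (this is not verified) and that the bags are Python sets; the function names are hypothetical.

```python
def is_tree_decomposition(vertices, edges, tree_adj, bags):
    """Check (T.1) and (T.2); tree_adj is assumed to describe a tree."""
    for v in vertices:
        nodes = {t for t in bags if v in bags[t]}
        if not nodes:
            return False  # (T.1) violated: v lies in no bag
        # Search only through tree nodes whose bag contains v.
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            t = stack.pop()
            for s in tree_adj[t]:
                if s in nodes and s not in seen:
                    seen.add(s)
                    stack.append(s)
        if seen != nodes:
            return False  # (T.1) violated: bags containing v are disconnected in T
    # (T.2): every edge is contained in some bag.
    return all(any({v, w} <= bags[t] for t in bags) for v, w in edges)


def width(bags):
    return max(len(b) for b in bags.values()) - 1


def adhesion_width(tree_adj, bags):
    return max((len(bags[s] & bags[t]) for s in tree_adj for t in tree_adj[s]), default=0)


# Usage: the 4-cycle 1-2-3-4-1 with a two-node decomposition a-b.
cycle_edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
tree = {"a": {"b"}, "b": {"a"}}
bags = {"a": {1, 2, 3}, "b": {1, 3, 4}}
print(is_tree_decomposition([1, 2, 3, 4], cycle_edges, tree, bags))  # True
print(width(bags), adhesion_width(tree, bags))  # 2 2
```

Here the single adhesion set {1, 3} is a separator of the cycle, in line with the observation about non-nested bags made below.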
The following very basic properties of tree decompositions are well-known.
Lemma 2.1.2. Let G be a graph and let C ⊆ V(G) be a clique in G (i.e., vw ∈ E(G) for all distinct v, w ∈ C). Also let (T, β) be a tree decomposition of G. Then there exists t ∈ V(T) such that C ⊆ β(t).
Let $k \in \mathbb{N}$. A graph G is k-degenerate if every subgraph of G contains a vertex of degree at most k.
Lemma 2.1.3. Let G be a graph such that tw(G) ≤ k. Then G is k-degenerate. In particular, there exists a vertex v ∈ V(G) such that deg(v) ≤ k.
Let G be a graph. A set S ⊆ V(G) is a separator of G if G − S has more connected components than the graph G. In particular, if G is a connected graph, then S is a separator if and only if G − S is disconnected. Now let (T, β) be a tree decomposition of the graph G and suppose G is connected. Then the adhesion set β(s) ∩ β(t) is a separator of G for every edge st ∈ E(T) such that $\beta(s) \not\subseteq \beta(t)$ and $\beta(t) \not\subseteq \beta(s)$. (Observe that every tree decomposition (T, β) of a graph G can easily be turned into a tree decomposition (T′, β′) of the same width such that $\beta'(s) \not\subseteq \beta'(t)$ and $\beta'(t) \not\subseteq \beta'(s)$ for all st ∈ E(T′).)
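Lemma 2.1.3 says that the degeneracy of a graph (the least k for which it is k-degenerate) is a lower bound on its tree-width. The following Python sketch computes the degeneracy with the standard greedy procedure that repeatedly deletes a vertex of minimum degree and records the largest such degree; it is an illustration only, with graphs encoded as neighbor dictionaries (an assumption of the sketch).

```python
def degeneracy(adj):
    """Smallest k such that the graph is k-degenerate, computed by repeatedly
    deleting a vertex of minimum degree and recording the largest degree seen."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    k = 0
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))
        k = max(k, len(adj[v]))
        for w in adj[v]:
            adj[w].discard(v)
        del adj[v]
    return k


# Usage: the complete graph K4 and the 4-cycle.
k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
print(degeneracy(k4), degeneracy(cycle))  # 3 2
```

For these two graphs the bound is tight, since tw(K4) = 3 and the 4-cycle has tree-width 2. The converse of the lemma fails, however: square grids are 2-degenerate, yet the n × n grid has tree-width n.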
2.1.3 Isomorphisms
Two graphs G and H are isomorphic, denoted by $G \cong H$, if there is a bijective mapping $\varphi\colon V(G) \to V(H)$ such that vw ∈ E(G) if and only if $\varphi(v)\varphi(w) \in E(H)$ for all v, w ∈ V(G). In this case φ is an isomorphism from G to H, which is denoted by $\varphi\colon G \cong H$. For two graphs G and H, let $\mathrm{Iso}(G, H) := \{\varphi\colon V(G) \to V(H) \mid \varphi\colon G \cong H\}$ denote the set of all isomorphisms from G to H. Also, for every set $\Lambda \subseteq \{\varphi\colon V(G) \to V(H) \mid \varphi \text{ bijective}\}$ define $\mathrm{Iso}_\Lambda(G, H) := \{\varphi \in \Lambda \mid \varphi\colon G \cong H\}$. The Graph Isomorphism Problem asks, given two graphs G and H, whether they are isomorphic, i.e., whether $\mathrm{Iso}(G, H) \neq \emptyset$. While the Graph Isomorphism Problem is typically defined for uncolored graphs, there is a natural variant for colored graphs. Two colored graphs $(G, \chi_G)$ and $(H, \chi_H)$ are isomorphic if there is an isomorphism $\varphi\colon G \cong H$ (between the uncolored graphs) such that additionally $\chi_G(v) = \chi_H(\varphi(v))$ for all v ∈ V(G). The Graph Isomorphism Problem for colored graphs asks, given two colored graphs $(G, \chi_G)$ and $(H, \chi_H)$, whether they are isomorphic. Since these two problems are polynomial-time equivalent under many-one reductions (see, e.g., [28]), I do not distinguish them in the remainder of this thesis and, consistent with the convention that all graphs may be seen as colored graphs, typically refer to the isomorphism problem for colored graphs. We regularly also encounter graphs $(G, \chi_G, v_1, \dots, v_k)$ which are additionally equipped with a sequence of vertices $v_1, \dots, v_k \in V(G)$. In this context we say that $v_1, \dots, v_k$ are individualized. Two such graphs $(G, \chi_G, v_1, \dots, v_k)$ and $(H, \chi_H, w_1, \dots, w_\ell)$ are isomorphic if there is an isomorphism $\varphi\colon (G, \chi_G) \cong (H, \chi_H)$ such that additionally k = ℓ and $\varphi(v_i) = w_i$ for all i ∈ [k]. In fact, in the context of isomorphism testing, a graph $(G, \chi_G, v_1, \dots, v_k)$ may be interpreted as a colored graph where the coloring $\chi_G$ is modified in such a way that each $v_i$, i ∈ [k], forms a singleton color class.
More formally, suppose $\chi_G\colon V(G) \to C$ and define $\chi_G^*\colon V(G) \to C \times 2^{[k]}\colon v \mapsto (\chi_G(v), \{i \in [k] \mid v = v_i\})$. Then $(G, \chi_G, v_1, \dots, v_k)$ may be interpreted as the colored graph $(G, \chi_G^*)$. In particular, this allows us to apply methods defined for colored graphs also to graphs with a sequence of individualized vertices. Note that $(G, \chi_G, v_1, \dots, v_k) \cong (H, \chi_H, w_1, \dots, w_\ell)$ if and only if $(G, \chi_G^*) \cong (H, \chi_H^*)$. Two relational structures $A_1 = (V_1, R_1, \dots, R_k)$ and $A_2 = (V_2, S_1, \dots, S_k)$ (over the same vocabulary) are isomorphic if there is a bijective mapping $\varphi\colon V_1 \to V_2$ such that $(v_1, \dots, v_{r_i}) \in R_i$ if and only if $(\varphi(v_1), \dots, \varphi(v_{r_i})) \in S_i$ for all i ∈ [k] and all $v_1, \dots, v_{r_i} \in V_1$ (where $r_i$ denotes the arity of $R_i$ and $S_i$). As before, $\varphi\colon A_1 \cong A_2$ denotes that φ is an isomorphism from $A_1$ to $A_2$. In the course of this thesis we repeatedly need to deal with more general types of objects for which we are interested in the set of isomorphisms. While the type of considered objects typically depends on the specific application, one can still define isomorphisms within a more general framework¹. Let V be a finite set of elements (e.g., the vertices of a graph). The set of hereditarily finite objects over ground set V is inductively defined as follows. Each v ∈ V is an atom, which in particular is a hereditarily finite object. Also, for hereditarily finite objects $X_1, \dots, X_k$ (over ground set V), the set $\{X_1, \dots, X_k\}$ as well as the tuple $(X_1, \dots, X_k)$ is a hereditarily finite object. Observe that every type of object described above is a hereditarily finite object over a suitable ground set V. In order to define isomorphisms for hereditarily finite objects, let X be a hereditarily finite object over ground set V and Y be a hereditarily finite object over ground set W. An isomorphism from X to Y is a bijective mapping $\varphi\colon V \to W$ such that $X^\varphi = Y$, where $X^\varphi$ is inductively defined as follows. For every atom v ∈ V define $v^\varphi := \varphi(v)$. For $X = \{X_1, \dots, X_k\}$ let $X^\varphi := \{X_1^\varphi, \dots, X_k^\varphi\}$ and, similarly, for $X = (X_1, \dots, X_k)$ let $X^\varphi := (X_1^\varphi, \dots, X_k^\varphi)$. Note that the definition of isomorphisms for hereditarily finite objects is consistent with the definitions given before.
¹ This framework is, for example, also used in [137] in order to describe a general method for canonizing different types of objects.
Let G be a (colored) graph. An automorphism of G is an isomorphism from G to itself. Let Aut(G) denote the automorphism group of G, i.e., the set of all automorphisms of G together with the composition operation. Observe that, for a second (colored) graph H, either $\mathrm{Iso}(G, H) = \emptyset$ or $\mathrm{Iso}(G, H) = \{\gamma\sigma \mid \gamma \in \mathrm{Aut}(G)\} =: \mathrm{Aut}(G)\sigma$, where $\sigma \in \mathrm{Iso}(G, H)$ is an arbitrary isomorphism (compositions of functions are applied from left to right). A graph is rigid if its automorphism group is trivial, i.e., the only automorphism is the identity mapping. Of course, these definitions naturally lift to relational structures and, more generally, hereditarily finite objects.
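For very small graphs, Iso(G, H), Aut(G), individualization via $\chi_G^*$ and the coset description $\mathrm{Iso}(G, H) = \mathrm{Aut}(G)\sigma$ can all be checked by exhaustive search. The Python sketch below tries all |V(H)|! bijections, so it only illustrates the definitions; the encoding of a colored graph as a triple (vertex list, set of 2-element frozensets, color dictionary) and all function names are assumptions of the sketch.

```python
from itertools import permutations


def isomorphisms(G, H):
    """All isomorphisms between colored graphs G = (V, E, chi), by brute force."""
    (VG, EG, cG), (VH, EH, cH) = G, H
    if len(VG) != len(VH) or len(EG) != len(EH):
        return []
    result = []
    for image in permutations(VH):
        phi = dict(zip(VG, image))
        # phi is a bijection and |E(G)| = |E(H)|, so mapping every edge of G onto
        # an edge of H already gives "vw in E(G) iff phi(v)phi(w) in E(H)".
        if all(cG[v] == cH[phi[v]] for v in VG) and all(
            frozenset(phi[u] for u in e) in EH for e in EG
        ):
            result.append(phi)
    return result


def individualize(G, seq):
    """The colored graph (G, chi*) with chi*(v) = (chi(v), {i : v = v_i})."""
    V, E, chi = G
    star = {v: (chi[v], frozenset(i for i, u in enumerate(seq, 1) if u == v)) for v in V}
    return (V, E, star)


def is_coset(G, H):
    """Check Iso(G, H) = Aut(G) sigma for one sigma (apply gamma first, then sigma)."""
    isos = isomorphisms(G, H)
    if not isos:
        return True
    sigma = isos[0]
    coset = {frozenset((v, sigma[g[v]]) for v in g) for g in isomorphisms(G, G)}
    return coset == {frozenset(phi.items()) for phi in isos}


# Usage: the path 0-1-2 and a relabeled copy with center "a".
path = ([0, 1, 2], {frozenset({0, 1}), frozenset({1, 2})}, {0: 0, 1: 0, 2: 0})
other = (["a", "b", "c"], {frozenset({"a", "b"}), frozenset({"a", "c"})},
         {"a": 0, "b": 0, "c": 0})
fixed = individualize(path, [0])
print(len(isomorphisms(path, path)), len(isomorphisms(fixed, fixed)))  # 2 1
print(is_coset(path, other))  # True
```

Individualizing an endpoint of the path destroys its only non-trivial automorphism, which is exactly the effect the I/R paradigm relies on.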
2.1.4 Isomorphism Invariance and Canonization
A typical situation in the design of algorithms for the Graph Isomorphism Problem is that, in intermediate steps of the algorithm, more complicated structures are computed which are then used to decide the isomorphism problem for the original input. For this approach to be correct, it is often crucial that the constructed objects are defined in an isomorphism-invariant way, i.e., each isomorphism between the input structures naturally extends to an isomorphism between the structures constructed in intermediate steps. Since the notion of isomorphism-invariance is used for various types of structures in this thesis, it is formally defined based on hereditarily finite objects. Let $\mathcal{X}$ be a class (closed under isomorphisms) of pairs (V, X) where X is a hereditarily finite object over ground set V. Similarly, let $\mathcal{Y}$ be another class of pairs (W, Y) where again Y is a hereditarily finite object over ground set W. A function $f\colon \mathcal{X} \to \mathcal{Y}$ is isomorphism-invariant if for every two $(V_1, X_1), (V_2, X_2) \in \mathcal{X}$ and every isomorphism $\varphi \in \mathrm{Iso}((V_1, X_1), (V_2, X_2))$ there is an isomorphism ψ from $(W_1, Y_1) := f(V_1, X_1)$ to $(W_2, Y_2) := f(V_2, X_2)$ such that $\psi(v) = \varphi(v)$ for every $v \in V_1 \cap W_1$ and $(V_1 \cap W_1)^\varphi = V_2 \cap W_2$.
Example 2.1.4. Let G be a graph and suppose (T, β) is a tree decomposition for G. Without loss of generality assume that V(G) ∩ V(T) = ∅. Then G together with its tree decomposition may be viewed as a hereditarily finite object
$X = (V(G), E(G), V(T), E(T), \{(t, \beta(t)) \mid t \in V(T)\})$ over ground set $W = V(G) \uplus V(T)$. Also, a function that associates a tree decomposition with each graph is isomorphism-invariant if for all graphs $G_1$ and $G_2$ with associated tree decompositions $(T_1, \beta_1)$ and $(T_2, \beta_2)$, and every $\varphi \in \mathrm{Iso}(G_1, G_2)$, there is a bijection $\psi\colon V(G_1) \uplus V(T_1) \to V(G_2) \uplus V(T_2)$ such that $\psi(v) = \varphi(v)$ for all $v \in V(G_1)$, the restriction $\psi|_{V(T_1)}$ is an isomorphism from $T_1$ to $T_2$, and $\beta_2(\psi(t)) = \psi(\beta_1(t))$ for all $t \in V(T_1)$. Let $X_1$ and $X_2$ be the hereditarily finite objects associated with $G_1$ and $G_2$ together with their respective tree decompositions. Then the above conditions translate to ψ being an isomorphism from $X_1$ to $X_2$.
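The inductive definition of $X^\varphi$ and the encoding used in Example 2.1.4 can be mirrored directly in Python. In this sketch atoms are ints or strings, sets are frozensets and tuples are tuples; this representation and the helper names are chosen for illustration and are not taken from the thesis or from [137].

```python
def apply(X, phi):
    """X^phi for a hereditarily finite object X: atoms are mapped by phi,
    frozensets and tuples are mapped element-wise (recursively)."""
    if isinstance(X, frozenset):
        return frozenset(apply(Y, phi) for Y in X)
    if isinstance(X, tuple):
        return tuple(apply(Y, phi) for Y in X)
    return phi[X]  # atom


def is_hf_isomorphism(phi, X, Y):
    """phi is injective on the ground set and maps X onto Y."""
    return len(set(phi.values())) == len(phi) and apply(X, phi) == Y


def graph_with_decomposition(V, E, tree_nodes, tree_edges, bags):
    """The object (V(G), E(G), V(T), E(T), {(t, beta(t))}) of Example 2.1.4."""
    fs = frozenset
    return (
        fs(V),
        fs(fs(e) for e in E),
        fs(tree_nodes),
        fs(fs(e) for e in tree_edges),
        fs((t, fs(bags[t])) for t in tree_nodes),
    )


# Usage: two paths on three vertices, each with a two-bag decomposition.
X1 = graph_with_decomposition([1, 2, 3], [(1, 2), (2, 3)], ["s", "t"], [("s", "t")],
                              {"s": {1, 2}, "t": {2, 3}})
X2 = graph_with_decomposition(["a", "b", "c"], [("a", "b"), ("b", "c")], ["p", "q"],
                              [("p", "q")], {"p": {"b", "c"}, "q": {"a", "b"}})
psi = {1: "c", 2: "b", 3: "a", "s": "p", "t": "q"}
print(is_hf_isomorphism(psi, X1, X2))  # True
```

Note that ψ must move graph vertices and tree nodes simultaneously: the graph isomorphism 1 ↦ a, 2 ↦ b, 3 ↦ c combined with s ↦ p, t ↦ q is not an isomorphism of these objects, because it sends the bag {1, 2} to {a, b}, which is the bag of q rather than p.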
An important special case of isomorphism-invariant functions are graph invariants. Indeed, a simple but typical approach to the Graph Isomorphism Problem is to extract certain structural information from a graph. If the extracted information differs for two given input graphs, they must be non-isomorphic. This approach is formalized in terms of graph invariants. Formally, let $\mathcal{C}$ be a class of graphs (i.e., a collection of graphs that is closed under isomorphisms). A graph invariant for $\mathcal{C}$ is a function I with domain $\mathcal{C}$ such that I(G) = I(H) for all graphs $G, H \in \mathcal{C}$ such that $G \cong H$. In particular, the function I is isomorphism-invariant. For two given graphs G and H, graph invariants can be used as a heuristic for isomorphism testing by computing I(G) and I(H) and comparing the results for equality. If $I(G) \neq I(H)$, then the input graphs must be non-isomorphic (otherwise, one cannot deduce any information about whether the graphs are isomorphic). An important example of a graph invariant builds on the k-dimensional Weisfeiler-Leman algorithm, which is introduced in the next section. A graph invariant I for $\mathcal{C}$ is complete if I(G) = I(H) if and only if $G \cong H$, i.e., the above heuristic serves as a complete isomorphism test. An important special case of complete graph invariants are graph canonizations, where the output of the function is again a graph with an ordered vertex set. Towards this end, let $\mathcal{G}_{\mathbb{N}}$ denote the class of graphs whose vertex set is an initial segment of the natural numbers.
Definition 2.1.5. A function $\kappa\colon \mathcal{C} \to \mathcal{G}_{\mathbb{N}}$ canonizes a graph class $\mathcal{C}$ (resp. κ is a graph canonization for $\mathcal{C}$) if
(C.1) $\kappa(G) \cong G$ for all $G \in \mathcal{C}$, and
(C.2) for all $G, H \in \mathcal{C}$ it holds that
$G \cong H \Rightarrow \kappa(G) = \kappa(H)$.
Note that the backward direction of the implication in (C.2) also holds by Property (C.1). Hence, the Graph Isomorphism Problem reduces to the problem of computing a graph canonization. It is unknown whether the converse also holds, i.e., whether the problem of computing a graph canonization polynomial-time reduces to the Graph Isomorphism Problem.
Example 2.1.6 (Lexicographically Smallest Representation). For a graph $G \in \mathcal{G}_\mathbb{N}$ with $G = ([n], E(G))$ consider the string $s_G\colon \{(i, j) \in [n]^2 \mid i > j\} \to \{0, 1\}$ defined by $s_G(i, j) = 1$ if and only if $ij \in E(G)$. There is a natural order on the positions $\{(i, j) \in [n]^2 \mid i > j\}$ obtained by first comparing the first entries of the tuples and afterwards the second entries. This allows one to compare graphs $G, H \in \mathcal{G}_\mathbb{N}$ by comparing the strings $s_G$ and $s_H$ using the standard lexicographic order $\le_{\mathrm{lex}}$ on strings. Consider the function $\kappa_{\mathrm{lex}}\colon \mathcal{G} \to \mathcal{G}_\mathbb{N}$ defined by $\kappa_{\mathrm{lex}}(G) = F$ for the graph $F \in \mathcal{G}_\mathbb{N}$ such that $F \cong G$ and $s_F \le_{\mathrm{lex}} s_H$ for all $H \in \mathcal{G}_\mathbb{N}$ with $G \cong H$. It is easy to show that $\kappa_{\mathrm{lex}}$ canonizes the class $\mathcal{G}$ of all graphs. However, the function $\kappa_{\mathrm{lex}}$ cannot be computed in polynomial time (unless PTIME = NP). Indeed, in order to find a numbering of the vertices of $G$ that minimizes the string representation, an algorithm needs to compute a maximum independent set of the graph. More precisely, consider the graph $D_{n,\ell} = ([n], E_{n,\ell})$ with
$$E_{n,\ell} = \{ij \mid i > \ell \lor j > \ell\}.$$
Then $s_{\kappa_{\mathrm{lex}}(G)} \le_{\mathrm{lex}} s_{D_{n,\ell}}$ if and only if $G$ has an independent set of size $\ell$ (where $n$ is the number of vertices of $G$). To see this, note that the first $\binom{\ell}{2}$ positions are exactly the pairs within $[\ell]$; on these positions $s_{D_{n,\ell}}$ is $0$, and all later entries of $s_{D_{n,\ell}}$ are $1$.
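The following brute-force sketch (Python, usable for tiny graphs only) computes the string $s_{\kappa_{\mathrm{lex}}(G)}$ by trying all $n!$ renumberings and compares it with $s_{D_{n,\ell}}$. Since $s_F$ determines $F$, returning the string suffices to represent $\kappa_{\mathrm{lex}}(G)$. The vertex set $\{1, \dots, n\}$, the edge-list encoding and the helper names are assumptions of this sketch; its exponential running time is in line with the hardness discussed above.

```python
from itertools import combinations, permutations


def string_of(n, edges):
    """s_G for a graph on {1, ..., n}: one bit per position (i, j) with i > j,
    positions ordered by first entry, then second entry."""
    e = {frozenset(x) for x in edges}
    return tuple(int(frozenset((i, j)) in e)
                 for i in range(1, n + 1) for j in range(1, i))


def kappa_lex(n, edges):
    """Brute-force kappa_lex: lexicographically smallest s_F over all
    renumberings F of G (Python compares tuples lexicographically)."""
    best = None
    for perm in permutations(range(1, n + 1)):
        relabel = dict(zip(range(1, n + 1), perm))
        s = string_of(n, [(relabel[u], relabel[v]) for u, v in edges])
        if best is None or s < best:
            best = s
    return best


def d_graph(n, l):
    """Edge list of D_{n,l}: all pairs ij except those with i, j <= l."""
    return [(i, j) for i, j in combinations(range(1, n + 1), 2) if i > l or j > l]


p4 = [(1, 2), (2, 3), (3, 4)]
canon = kappa_lex(4, p4)
print(canon <= string_of(4, d_graph(4, 2)))  # True: P4 has an independent set of size 2
print(canon <= string_of(4, d_graph(4, 3)))  # False: but none of size 3
```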
Of course, this is only one example of a graph canonization, and there may be other mappings canonizing the class of all graphs that can be evaluated more efficiently. Indeed, in very recent work, Babai gives a graph canonization that can be computed in quasipolynomial time [12], generalizing his previous result [11], an isomorphism test with the same time bound.
2.2 The Weisfeiler-Leman Algorithm
One of the most fundamental subroutines in the context of the Graph Isomorphism Problem is the Weisfeiler-Leman algorithm. Originally, the algorithm was introduced in its 2-dimensional variant in 1968 by Weisfeiler and Leman [149] (see also [148]). The generalized version for arbitrary dimension $k$ was independently defined by Babai and Mathon [7] and Immerman and Lander [82]. While the algorithm itself cannot be used to decide graph isomorphism for the class of all graphs [31], the method is commonly used as a subroutine for designing isomorphism tests. A prominent example is Babai's quasipolynomial-time algorithm [11], which crucially employs the Weisfeiler-Leman algorithm for dimension $k = O(\log n)$ (where $n$ denotes the number of vertices of the input graphs).
2.2.1 The Algorithm
Let $G$ be a graph with vertex coloring $\chi_G\colon V(G) \to [\ell]$ and let $k \in \mathbb{N}$. The $k$-dimensional Weisfeiler-Leman algorithm is a procedure that, given a colored graph $G$, first computes an isomorphism-invariant initial coloring of the $k$-tuples of vertices and then iteratively refines this coloring in an isomorphism-invariant way.

Let $\chi_1, \chi_2\colon V^k \to C$ be colorings of $k$-tuples of vertices, where $C$ is some finite set of colors. The coloring $\chi_1$ refines $\chi_2$, denoted $\chi_1 \preceq \chi_2$, if $\chi_1(\bar v) = \chi_1(\bar w)$ implies $\chi_2(\bar v) = \chi_2(\bar w)$ for all $\bar v, \bar w \in V^k$. Observe that $\chi_1 \preceq \chi_2$ if and only if the partition into color classes of $\chi_1$ refines the corresponding partition into color classes of $\chi_2$. The colorings $\chi_1$ and $\chi_2$ are equivalent, denoted $\chi_1 \equiv \chi_2$, if $\chi_1 \preceq \chi_2$ and $\chi_2 \preceq \chi_1$.

For the description of the Weisfeiler-Leman algorithm fix $k \ge 1$. The initial coloring $\chi^k_{(0)}[G]$ computed by the Weisfeiler-Leman algorithm determines for each $k$-tuple $\bar v \in V(G)^k$ the isomorphism type of the underlying ordered induced subgraph. More precisely, $\chi^k_{(0)}[G](v_1, \dots, v_k) = \chi^k_{(0)}[G](w_1, \dots, w_k)$ if and only if for all $i, j \in [k]$ it holds that $\chi_G(v_i) = \chi_G(w_i)$, $v_i = v_j \Leftrightarrow w_i = w_j$, and $v_iv_j \in E(G) \Leftrightarrow w_iw_j \in E(G)$. The initial coloring is refined by iteratively computing colorings $\chi^k_{(i)}[G]$ for $i > 0$. For $\bar v = (v_1, \dots, v_k)$ and $w \in V(G)$ let $\bar v[j/w] := (v_1, \dots, v_{j-1}, w, v_{j+1}, \dots, v_k)$ be the tuple obtained from $\bar v$ by replacing the $j$-th entry with vertex $w$. For $k > 1$ define $\chi^k_{(i)}[G](\bar v) := (\chi^k_{(i-1)}[G](\bar v), M)$, where
$$M := \big\{\!\!\big\{ \big(\chi^k_{(i-1)}[G](\bar v[1/w]), \dots, \chi^k_{(i-1)}[G](\bar v[k/w])\big) \;\big|\; w \in V(G) \big\}\!\!\big\}.$$
For $k = 1$ the definition is essentially the same, but the multiset is taken only over the neighbors of $v_1$, i.e., $M := \{\!\{\chi^1_{(i-1)}[G](w) \mid w \in N_G(v_1)\}\!\}$. From the definition of the colorings it is immediately clear that $\chi^k_{(i+1)}[G] \preceq \chi^k_{(i)}[G]$. Now let $i \in \mathbb{N}$ be the minimal number such that $\chi^k_{(i)}[G] \equiv \chi^k_{(i+1)}[G]$. For this $i$, the coloring $\chi^k_{(i)}[G]$ is called the stable coloring of $G$ and is denoted by $\chi^k_{\mathrm{WL}}[G]$.
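A direct and deliberately naive Python transcription of this definition may help. It is a sketch, not an efficient implementation: it stores one color per $k$-tuple and does not achieve the bound of Theorem 2.2.1. After each round, colors are renamed by their rank in the sorted set of signatures; this is an implementation choice assumed here so that color names stay small and isomorphism-invariant. Graphs have vertex set $\{0, \dots, n-1\}$ and are given as edge lists, and the function name is hypothetical.

```python
from itertools import product


def wl_coloring(n, edges, k, vertex_colors=None):
    """Naive k-dimensional Weisfeiler-Leman on vertices 0..n-1.

    Returns a dict mapping every k-tuple of vertices to an integer color of
    (a coloring equivalent to) the stable coloring chi^k_WL[G].
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    col = vertex_colors if vertex_colors is not None else [0] * n
    tuples = list(product(range(n), repeat=k))

    def atomic_type(t):
        # vertex colors, equality pattern and adjacency pattern of the ordered tuple
        return (tuple(col[x] for x in t),
                tuple(t[i] == t[j] for i in range(k) for j in range(k)),
                tuple(t[j] in adj[t[i]] for i in range(k) for j in range(k)))

    def compress(sig):
        # rename signatures by their rank in sorted order; isomorphisms preserve
        # the set of signatures, so the new names are isomorphism-invariant
        rank = {s: r for r, s in enumerate(sorted(set(sig.values())))}
        return {t: rank[s] for t, s in sig.items()}

    chi = compress({t: atomic_type(t) for t in tuples})  # chi^k_(0)[G]
    while True:
        sig = {}
        for t in tuples:
            if k == 1:
                # multiset of colors of the neighbors of v_1
                m = sorted(chi[(w,)] for w in adj[t[0]])
            else:
                # multiset over all w of (chi(t[1/w]), ..., chi(t[k/w]))
                m = sorted(tuple(chi[t[:j] + (w,) + t[j + 1:]] for j in range(k))
                           for w in range(n))
            sig[t] = (chi[t], tuple(m))
        new = compress(sig)
        # new refines chi, so equally many color classes means chi and new are equivalent
        if len(set(new.values())) == len(set(chi.values())):
            return new
        chi = new
```

For example, on the 6-cycle the sketch yields a single color for $k = 1$, and four colors for $k = 2$ (the diagonal pairs and the pairs at distance 1, 2 and 3).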
The $k$-dimensional Weisfeiler-Leman algorithm takes as input a (colored) graph $G$ and computes (a coloring that is equivalent to) $\chi^k_{\mathrm{WL}}[G]$. For every fixed $k \in \mathbb{N}$ this can be done in polynomial time.
Theorem 2.2.1 (see [82]). Let $G$ be a graph. Then an isomorphism-invariant coloring that is equivalent to $\chi^k_{\mathrm{WL}}[G]$ can be computed in time $O(n^{k+1} \log n)$.
For the 1-dimensional Weisfeiler-Leman algorithm, which is also referred to as the Color Refinement algorithm, the running time can be improved to almost linear in the number of vertices and edges (see, e.g., [24]).
Theorem 2.2.2. Let $G$ be a graph. Then an isomorphism-invariant coloring that is equivalent to $\chi^1_{\mathrm{WL}}[G]$ can be computed in time $O((n + m) \log n)$, where $n$ denotes the number of vertices and $m$ the number of edges of $G$.
A common application is to use the Weisfeiler-Leman algorithm as an (incomplete) isomorphism test. The $k$-dimensional Weisfeiler-Leman algorithm distinguishes two graphs $G$ and $H$ if there is a color $c$ such that
$$\big|\{\bar v \in V(G)^k \mid \chi^k_{\mathrm{WL}}[G \uplus H](\bar v) = c\}\big| \neq \big|\{\bar w \in V(H)^k \mid \chi^k_{\mathrm{WL}}[G \uplus H](\bar w) = c\}\big|,$$
where $G \uplus H$ denotes the disjoint union of the graphs $G$ and $H$. If the $k$-dimensional Weisfeiler-Leman algorithm distinguishes $G$ and $H$, then $G \not\cong H$. Two graphs $G$ and $H$ are equivalent with respect to $k$-dimensional Weisfeiler-Leman, denoted $G \simeq_k H$, if they are not distinguished by the $k$-dimensional Weisfeiler-Leman algorithm. Note that, in general, one cannot conclude that $G \cong H$ in this case. A graph $G$ is identified by the $k$-dimensional Weisfeiler-Leman algorithm if, for all graphs $H$, it holds that $G \simeq_k H$ if and only if $G \cong H$. Following Grohe [64], the Weisfeiler-Leman dimension of a graph $G$, denoted $\dim_{\mathrm{WL}}(G)$, is the smallest number $k \in \mathbb{N}$ such that the $k$-dimensional Weisfeiler-Leman algorithm identifies $G$.
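As an illustration of this histogram criterion, the Python sketch below runs Color Refinement ($k = 1$) on the disjoint union $G \uplus H$ and compares the color counts on the two sides. It is naive (it does not implement the $O((n+m)\log n)$ refinement of Theorem 2.2.2) and handles uncolored graphs only; the names and the encoding of a graph as (number of vertices, edge list) are assumptions of this sketch.

```python
from collections import Counter


def color_refinement(n, adj):
    """Naive Color Refinement on vertices 0..n-1, starting from a uniform coloring."""
    chi = [0] * n
    while True:
        sig = [(chi[v], tuple(sorted(chi[w] for w in adj[v]))) for v in range(n)]
        rank = {s: r for r, s in enumerate(sorted(set(sig)))}
        new = [rank[s] for s in sig]
        if len(set(new)) == len(set(chi)):  # no class was split: stable
            return new
        chi = new


def cr_distinguishes(g, h):
    """Compare the color histograms of G and H in the stable coloring of G + H
    (disjoint union; the vertices of H are shifted by |V(G)|)."""
    (n1, e1), (n2, e2) = g, h
    adj = [[] for _ in range(n1 + n2)]
    for u, v in list(e1) + [(u + n1, v + n1) for u, v in e2]:
        adj[u].append(v)
        adj[v].append(u)
    chi = color_refinement(n1 + n2, adj)
    return Counter(chi[:n1]) != Counter(chi[n1:])
```

For instance, the classical pair consisting of a 6-cycle and two disjoint triangles is not distinguished (both graphs are 2-regular), whereas a path and a star on four vertices are.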
Observation 2.2.3. For every graph $G$ it holds that $\dim_{\mathrm{WL}}(G) \le |V(G)| - 1$.
Let $\mathcal{C}$ be a class of graphs. The Weisfeiler-Leman dimension of $\mathcal{C}$ is the smallest number $\ell \in \mathbb{N} \cup \{\infty\}$ such that $\dim_{\mathrm{WL}}(G, \chi_G) \le \ell$ for all colored graphs $(G, \chi_G)$ with $G \in \mathcal{C}$.² Note that if $\dim_{\mathrm{WL}}(\mathcal{C})$ is finite, then the Graph Isomorphism Problem for $\mathcal{C}$ can be solved in polynomial time. For many important graph classes the Weisfeiler-Leman dimension is known to be finite. This includes, for example, planar graphs [61, 90], graphs of bounded tree-width [66], and, more generally, every graph class that excludes a fixed graph as a minor [63] (see also [64]). On the other hand, it is also known that the class of all graphs has infinite Weisfeiler-Leman dimension [31]. Both topics are further discussed in Chapters 3 and 4.

The Weisfeiler-Leman algorithm can also be used to tackle the Canonization Problem. While the coloring computed by the Weisfeiler-Leman algorithm cannot be used directly to compute a graph canonization, the algorithm can be used as a subroutine in a canonization algorithm. The basic idea is to utilize an ordering on the colors computed by the algorithm. To formalize this idea we first need to introduce another concept. Let $G$ be a graph. The $k$-dimensional Weisfeiler-Leman algorithm determines orbits of $G$ if for every $v \in V(G)$, every graph $H$ and every $w \in V(H)$ such that $\chi^k_{\mathrm{WL}}[G](v, \dots, v) = \chi^k_{\mathrm{WL}}[H](w, \dots, w)$ there is an isomorphism $\varphi\colon G \cong H$ such that $\varphi(v) = w$.
Theorem 2.2.4. Let C be a class of graphs of Weisfeiler-Leman dimension at most k. Then the (k + 1)-dimensional Weisfeiler-Leman algorithm determines orbits of all graphs G ∈ C.
² This definition slightly deviates from the standard definition, which usually considers only uncolored graphs. However, the present definition is often more convenient. For example, there is a generic polynomial-time canonization algorithm for graph classes of bounded Weisfeiler-Leman dimension.
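Proof. Let $G \in \mathcal{C}$ and let $H$ be a second graph. Also let $v \in V(G)$ and $w \in V(H)$ such that $\chi^{k+1}_{\mathrm{WL}}[G](v, \dots, v) = \chi^{k+1}_{\mathrm{WL}}[H](w, \dots, w)$. Then $(G, v) \simeq_k (H, w)$ (e.g., via Theorem 2.2.6: in the corresponding pebble game, Duplicator can keep one pair of pebbles on $v$ and $w$ for the whole play). Since the $k$-dimensional Weisfeiler-Leman algorithm identifies all graphs in $\mathcal{C}$, this implies that $(G, v) \cong (H, w)$. So there is an isomorphism $\varphi\colon G \cong H$ such that $\varphi(v) = w$.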
Theorem 2.2.5. Let $\mathcal{C}$ be a class of graphs of Weisfeiler-Leman dimension at most $k$. Then there is an algorithm canonizing graphs $G \in \mathcal{C}$ in time $O(n^{k+3} \log n)$.
Algorithm 1: Canonization Algorithm for graph class $\mathcal{C}$
Input: Graph $G \in \mathcal{C}$ with vertex coloring $\chi_G$
Output: $\kappa(G)$
    $n := |V(G)|$
    for $i = 1, \dots, n$ do
        compute $\chi_{G,i} := \chi^{k+1}_{\mathrm{WL}}[G, \chi_G, v_1, \dots, v_{i-1}]$
        $v_i := \arg\min_{v \in V(G) \setminus \{v_1, \dots, v_{i-1}\}} \chi_{G,i}(v)$
    end
    return $([n], \{ij \mid v_iv_j \in E(G)\}, i \mapsto \chi_G(v_i))$
Proof. Let $\kappa\colon \mathcal{C} \to \mathcal{G}_\mathbb{N}$ be the function computed by Algorithm 1. We first argue that $\kappa$ canonizes the graph class $\mathcal{C}$. Let $G \in \mathcal{C}$. Clearly, $\varphi\colon V(G) \to [n]\colon v_i \mapsto i$ is an isomorphism from $G$ to $\kappa(G)$. So let $H \in \mathcal{C}$ be a second graph such that $G \cong H$. Also let $v_1, \dots, v_n$ be the sequence of vertices computed by Algorithm 1 for the graph $G$ and let $w_1, \dots, w_n$ be the corresponding sequence for $H$. We prove by induction on $i \in \{0, \dots, n\}$ that $(G, v_1, \dots, v_i) \cong (H, w_1, \dots, w_i)$. The base case $i = 0$ is exactly the assumption $G \cong H$. So let $i \ge 1$ and let $\varphi\colon (G, v_1, \dots, v_{i-1}) \cong (H, w_1, \dots, w_{i-1})$. Then $(G, \chi_{G,i}) \cong (H, \chi_{H,i})$ and, since both colorings are isomorphism-invariant and $v_i$ and $w_i$ are chosen with minimal color, $\chi_{G,i}(v_i) = \chi_{H,i}(w_i)$. Since the $(k+1)$-dimensional Weisfeiler-Leman algorithm determines orbits of all graphs in $\mathcal{C}$ by Theorem 2.2.4, there is an isomorphism $\psi\colon (G, \chi_{G,i}) \cong (H, \chi_{H,i})$ such that $\psi(v_i) = w_i$. But this means $(G, v_1, \dots, v_i) \cong (H, w_1, \dots, w_i)$. By the induction principle, $v_i \mapsto w_i$ defines an isomorphism from $G$ to $H$. Thus, $\kappa(G) = \kappa(H)$. The bound on the running time is immediately clear, as the algorithm performs $n$ calls to the $(k+1)$-dimensional Weisfeiler-Leman algorithm, each of which runs in time $O(n^{k+2} \log n)$ (see Theorem 2.2.1).
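The following Python sketch follows Algorithm 1 for the special case $k + 1 = 2$, i.e., with a naive 2-dimensional Weisfeiler-Leman refinement; by Theorems 2.2.4 and 2.2.5 this suffices for a class of Weisfeiler-Leman dimension 1 such as the forests (Theorem 3.1.1). Individualization is modelled by extending each vertex color with the position of the vertex in the sequence $v_1, \dots, v_{i-1}$ (or $-1$), and $\chi_{G,i}(v)$ is read off the diagonal tuple $(v, v)$. These encodings and all names are assumptions of the sketch.

```python
from itertools import product


def wl_coloring(n, adj, col, dim=2):
    """Naive dim-dimensional WL (dim >= 2) with initial vertex colors col."""
    tuples = list(product(range(n), repeat=dim))

    def compress(sig):
        # isomorphism-invariant renaming: rank in the sorted set of signatures
        rank = {s: r for r, s in enumerate(sorted(set(sig.values())))}
        return {t: rank[s] for t, s in sig.items()}

    # initial coloring: colors, equality and adjacency pattern of each tuple
    chi = compress({t: (tuple(col[x] for x in t),
                        tuple(a == b for a in t for b in t),
                        tuple(b in adj[a] for a in t for b in t)) for t in tuples})
    while True:
        sig = {t: (chi[t], tuple(sorted(
                   tuple(chi[t[:j] + (w,) + t[j + 1:]] for j in range(dim))
                   for w in range(n)))) for t in tuples}
        new = compress(sig)
        if len(set(new.values())) == len(set(chi.values())):
            return new
        chi = new


def canonize(n, edges, colors, dim=2):
    """Algorithm 1: kappa(G) as (sorted edge list on 1..n, colors of v_1..v_n)."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    order = []  # v_1, ..., v_{i-1}
    for _ in range(n):
        # individualize v_1, ..., v_{i-1} by appending their position to the color
        col = [(colors[v], order.index(v) if v in order else -1) for v in range(n)]
        chi = wl_coloring(n, adj, col, dim)
        rest = [v for v in range(n) if v not in order]
        order.append(min(rest, key=lambda v: chi[(v,) * dim]))  # arg min
    pos = {v: i + 1 for i, v in enumerate(order)}
    new_edges = sorted(tuple(sorted((pos[u], pos[v]))) for u, v in edges)
    return new_edges, tuple(colors[v] for v in order)
```

For two differently labeled copies of the same tree, the two results coincide; for non-isomorphic inputs they necessarily differ, since $\kappa(G) \cong G$.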
The Weisfeiler-Leman procedure has various equivalent characterizations that connect the algorithm to other areas of computer science. The expressive power of the algorithm can be characterized in terms of bounded-variable fragments of first-order logic with counting quantifiers and in terms of certain pebble games (see [82, 31]). Also, more recently, it has been observed that there are correspondences to Sherali-Adams relaxations of certain linear programs [5, 71], and that the strength of the algorithm can be characterized by homomorphism counts from graphs of bounded tree-width [44]. Moreover, in the last couple of years, the algorithm has been exploited in a graph learning context [139] (see also [97, 125]). In this context, the power of the Color Refinement algorithm can be characterized by the power of certain graph neural networks [120], and extensions of such networks based on higher dimensions of the Weisfeiler-Leman algorithm have been proposed (see, e.g., [119, 120]).
2.2.2 Pebble Games
For this thesis, the characterization of the Weisfeiler-Leman algorithm in terms of pebble games is the most relevant. Indeed, in many cases it is much easier to argue that two graphs are distinguished by the Weisfeiler-Leman algorithm by exploiting the characterization via pebble games. Let $k \in \mathbb{N}$. For two colored graphs $(G, \chi_G)$ and $(H, \chi_H)$ on the same number of vertices, the bijective $k$-pebble game $\mathrm{BP}_k(G, \chi_G, H, \chi_H)$ is defined as follows:
• The game has two players called Spoiler and Duplicator.
• The game proceeds in rounds. Each round is associated with a position $(\bar v, \bar w)$, a pair of tuples $\bar v \in V(G)^\ell$ and $\bar w \in V(H)^\ell$ where $0 \le \ell \le k$.
• The initial position of the game is $((), ())$ (the pair of empty tuples).
• Each round consists of the following steps. Suppose the current position of the game is $(\bar v, \bar w) = ((v_1, \dots, v_\ell), (w_1, \dots, w_\ell))$. First, Spoiler chooses whether to remove a pair of pebbles or to play a new pair of pebbles. The first option is only possible if $\ell > 0$ and the latter option is only possible if $\ell < k$. If Spoiler wishes to remove a pair of pebbles, he picks some $i \in [\ell]$ and the game moves to position $(\bar v \setminus i, \bar w \setminus i)$, where $\bar v \setminus i := (v_1, \dots, v_{i-1}, v_{i+1}, \dots, v_\ell)$ ($\bar w \setminus i$ is defined in the same way). Otherwise the following steps are performed.
  (D) Duplicator picks a bijection $f\colon V(G) \to V(H)$.
  (S) Spoiler chooses $v \in V(G)$ and sets $w := f(v)$.
  The new position is then $((v_1, \dots, v_\ell, v), (w_1, \dots, w_\ell, w))$.
Spoiler wins the play if, for the current position $((v_1, \dots, v_\ell), (w_1, \dots, w_\ell))$, the mapping $v_i \mapsto w_i$ does not define an isomorphism between the induced colored subgraphs. More precisely, Spoiler wins if there is an $i \in [\ell]$ such that $\chi_G(v_i) \neq \chi_H(w_i)$, or there are $i, j \in [\ell]$ such that $(v_i = v_j) \not\Leftrightarrow (w_i = w_j)$ or $(v_iv_j \in E(G)) \not\Leftrightarrow (w_iw_j \in E(H))$. If the play never ends, Duplicator wins.
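The winning condition can be checked mechanically. The following Python sketch (hypothetical helper name; adjacency given as lists of neighbor sets, vertex colors as lists) tests whether Spoiler has already won at a given position.

```python
def spoiler_wins_position(adj_g, col_g, adj_h, col_h, vs, ws):
    """True if v_i -> w_i is not an isomorphism between the induced colored
    subgraphs on (v_1, ..., v_l) and (w_1, ..., w_l)."""
    length = len(vs)
    for i in range(length):
        if col_g[vs[i]] != col_h[ws[i]]:          # colors differ
            return True
        for j in range(length):
            if (vs[i] == vs[j]) != (ws[i] == ws[j]):  # equality pattern differs
                return True
            if (vs[j] in adj_g[vs[i]]) != (ws[j] in adj_h[ws[i]]):  # adjacency differs
                return True
    return False
```

For a path $0, 1, 2$ and a triangle on $\{0, 1, 2\}$, for instance, Spoiler wins at position $((0, 2), (0, 2))$, since $0$ and $2$ are adjacent only in the triangle.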
We say that Spoiler (resp. Duplicator) wins the bijective $k$-pebble game $\mathrm{BP}_k(G, \chi_G, H, \chi_H)$ if Spoiler (resp. Duplicator) has a winning strategy for the game. Also, if $G$ and $H$ have a different number of vertices, Spoiler immediately wins the game. Moreover, for positions $(\bar v, \bar w) \in V(G)^\ell \times V(H)^\ell$, $\ell \le k$, Spoiler (resp. Duplicator) wins the game $\mathrm{BP}_k(G, \chi_G, H, \chi_H)$ from position $(\bar v, \bar w)$ if Spoiler (resp. Duplicator) has a winning strategy in the game $\mathrm{BP}_k(G, \chi_G, H, \chi_H)$ started from the initial position $(\bar v, \bar w)$.

I remark that the definition of the pebble game provided above does not match the standard definition of the game, which combines removing a pebble and placing a new pebble into a single move. The standard variant has the advantage that the number of moves required for Spoiler to win the game (provided he has a winning strategy) exactly corresponds to the number of iterations the Weisfeiler-Leman algorithm performs until stabilization. However, for the purpose of providing winning strategies for explicit families of input graphs (which is the main focus of this thesis), the above variant is more convenient. The next theorem connects the Weisfeiler-Leman algorithm and bijective pebble games.

Theorem 2.2.6 ([31, 76]). Let $G, H$ be two graphs and let $\bar v \in V(G)^k$ and $\bar w \in V(H)^k$. Then $\chi^k_{\mathrm{WL}}[G](\bar v) = \chi^k_{\mathrm{WL}}[H](\bar w)$ if and only if Duplicator wins the pebble game $\mathrm{BP}_{k+1}(G, H)$ from the position $(\bar v, \bar w)$.
Corollary 2.2.7. Let $G, H$ be two graphs. Then $G \simeq_k H$ if and only if Duplicator wins the pebble game $\mathrm{BP}_{k+1}(G, H)$.
2.2.3 Connection to Logic
Another characterization of the Weisfeiler-Leman algorithm can be formulated in terms of first-order logic with counting quantifiers. As usual, first-order logic (FO) is built inductively from the atomic formulas. As this thesis is primarily concerned with graphs, we restrict ourselves to the case where the vocabulary consists of a single binary relation symbol $E$. In this case the atomic FO-formulas are of the form $x = y$ and $Exy$. First-order formulas are built from atomic formulas in an inductive fashion using the Boolean connectives $\wedge$, $\vee$ and $\neg$, existential quantifiers $\exists x\,\varphi(x)$ and universal quantifiers $\forall x\,\varphi(x)$. First-order logic with counting quantifiers, denoted by $\mathsf{C}$, is the extension of first-order logic by counting quantifiers of the form $\exists^{\ge \ell} x\,\varphi(x)$. A graph $G$ satisfies such a formula if there are at least $\ell$ distinct $v \in V(G)$ such that $G \models \varphi(v)$. Note that first-order logic with counting quantifiers has the same expressive power as first-order logic, since each counting quantifier $\exists^{\ge \ell} x\,\varphi(x)$ can be replaced by
$$\exists x_1 \dots \exists x_\ell \Big( \bigwedge_{i \neq j \in [\ell]} x_i \neq x_j \wedge \bigwedge_{i \in [\ell]} \varphi(x_i) \Big).$$
Let $\mathsf{L}^k$ be the restriction of FO to formulas that use at most $k$ distinct variables (each variable may be requantified multiple times). Also, let $\mathsf{C}^k$ be the restriction of $\mathsf{C}$ to formulas that use at most $k$ distinct variables. Note that, while FO and $\mathsf{C}$ have the same expressive power, this is not the case for $\mathsf{L}^k$ and $\mathsf{C}^k$. Before connecting the logic $\mathsf{C}^k$ to the Weisfeiler-Leman algorithm, we first provide some examples for the logic $\mathsf{C}^k$.
Example 2.2.8. Let $G$ be a graph and $v, w \in V(G)$. A walk from $v$ to $w$ is a sequence $v = u_0, u_1, \dots, u_\ell = w$ such that $u_{i-1}u_i \in E(G)$ for all $i \in [\ell]$ and $u_i \neq w$ for all $i \in [\ell - 1]$.³ In this case $\ell$ is the length of the walk. Consider the formulas $\mathrm{walk}_0(x, y) := x = y$, $\mathrm{walk}_1(x, y) := Exy$, and
$$\mathrm{walk}_\ell(x, y) := \exists z\,\big(z \neq y \wedge E(x, z) \wedge \mathrm{walk}_{\ell-1}(z, y)\big)$$
for $\ell \ge 2$. Then $G \models \mathrm{walk}_\ell(v, w)$ if and only if there is a walk of length $\ell$ from $v$ to $w$. By reusing variables, $\mathrm{walk}_\ell(x, y) \in \mathsf{C}^3$. Moreover, the formula
$$\mathrm{walk}_G(x, y) := \bigvee_{\ell \in [|V(G)|]} \mathrm{walk}_\ell(x, y)$$
states that there is a walk from $x$ to $y$ in $G$. For later reference, also consider the following generalization. Let $s_1, \dots, s_k \in V(G)$ be $k$ additional vertices. Then there exists a formula $\mathrm{walk}^k_G(x, y, z_1, \dots, z_k) \in \mathsf{C}^{k+3}$ such that $G \models \mathrm{walk}^k_G(v, w, s_1, \dots, s_k)$ if and only if there is a walk from $v$ to $w$ in $G - \{s_1, \dots, s_k\}$.
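Under the modified notion of walks from Example 2.2.8 (intermediate vertices avoid the endpoint), walks of length $\ell$ can be counted by the same recursion that defines $\mathrm{walk}_\ell$: a first step to a neighbor $z \neq y$, followed by a walk of length $\ell - 1$ from $z$ to $y$. The Python sketch below only illustrates this recursion; the function names and the adjacency-set encoding are assumptions.

```python
from functools import lru_cache


def make_walk_counter(adj):
    """Return count(l, x, y): the number of walks u_0, ..., u_l from x to y
    with u_i != y for 0 < i < l (the walks of Example 2.2.8)."""
    @lru_cache(maxsize=None)
    def count(l, x, y):
        if l == 0:
            return int(x == y)          # mirrors walk_0(x, y) := x = y
        if l == 1:
            return int(y in adj[x])     # mirrors walk_1(x, y) := Exy
        # mirrors walk_l: step to a neighbor z != y, then a walk of length l - 1
        return sum(count(l - 1, z, y) for z in adj[x] if z != y)
    return count
```

On the 4-cycle with adjacency `[{1, 3}, {0, 2}, {1, 3}, {0, 2}]`, there are exactly two such walks of length 3 from vertex 0 to vertex 1 (namely 0, 3, 2, 1 and 0, 3, 0, 1), although there are four walks of length 3 in the usual sense.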
The last example does not use any counting quantifiers. It turns out that by utilizing the counting quantifiers one can not only express whether there is a walk of a certain length between two vertices, but one can also count the number of such walks.
Example 2.2.9. As in the last example, let $G$ be a graph and $v, w \in V(G)$. Consider the formulas $\mathrm{walk}^{n,1}_0(x, y) := x = y$, $\mathrm{walk}^{n,0}_0(x, y) := x \neq y$ and $\mathrm{walk}^{n,r}_0(x, y) := x \neq x$ for all $r \ge 2$, and $\mathrm{walk}^{n,1}_1(x, y) := E(x, y)$, $\mathrm{walk}^{n,0}_1(x, y) := \neg E(x, y)$ and $\mathrm{walk}^{n,r}_1(x, y) := x \neq x$ for all $r \ge 2$, and
$$\mathrm{walk}^{n,r}_\ell(x, y) := \bigvee_{d \in [n]} \Big( \exists^{=d} z\,\big(z \neq y \wedge E(x, z)\big) \wedge \bigvee_{\substack{(s_j, r_j)_j \colon \sum_j s_j = d,\\ \sum_j s_j r_j = r}} \bigwedge_j \exists^{=s_j} z\,\big(E(x, z) \wedge \mathrm{walk}^{n,r_j}_{\ell-1}(z, y)\big) \Big)$$
for $\ell \ge 2$ (here $\exists^{=d} z\,\psi$ abbreviates $\exists^{\ge d} z\,\psi \wedge \neg \exists^{\ge d+1} z\,\psi$). Let $n := |V(G)|$. Then $G \models \mathrm{walk}^{n,r}_\ell(v, w)$ if and only if there are exactly $r$ walks of length $\ell$ from $v$ to $w$. By reusing variables, $\mathrm{walk}^{n,r}_\ell(x, y) \in \mathsf{C}^3$. Additionally, given vertices $s_1, \dots, s_k$, one can also count walks of a certain length in the graph $G - \{s_1, \dots, s_k\}$ in the logic $\mathsf{C}^{k+3}$. The following theorem connects the bijective $k$-pebble game to the logic $\mathsf{C}^k$.

³ The last condition is usually not part of the definition of a walk. This part is added mainly for technical reasons.
Theorem 2.2.10 (Hella [76]). Let $G, H$ be two graphs and suppose $k \ge \ell$. Also let $v_1, \dots, v_\ell \in V(G)$ and $w_1, \dots, w_\ell \in V(H)$. Then Spoiler wins the game $\mathrm{BP}_k(G, H)$ starting from position $((v_1, \dots, v_\ell), (w_1, \dots, w_\ell))$ if and only if there is a formula $\varphi(x_1, \dots, x_\ell) \in \mathsf{C}^k$ such that $G \models \varphi(v_1, \dots, v_\ell)$ and $H \not\models \varphi(w_1, \dots, w_\ell)$.
For two graphs $G, H$ we define $G \equiv_{\mathsf{C}^k} H$ if for every sentence $\varphi \in \mathsf{C}^k$ it holds that $G \models \varphi$ if and only if $H \models \varphi$.
Corollary 2.2.11. Let $G, H$ be two graphs. Then Duplicator wins the game $\mathrm{BP}_k(G, H)$ if and only if $G \equiv_{\mathsf{C}^k} H$.
2.3 Individualization-Refinement
2.3.1 The Basic Paradigm
The $k$-dimensional Weisfeiler-Leman algorithm provides a polynomial-time algorithm for every fixed number $k$. However, from the perspective of designing practically efficient algorithms for the Graph Isomorphism Problem, there are several downsides. First, for every constant $k$, the $k$-dimensional Weisfeiler-Leman algorithm fails to decide isomorphism on its own. Actually, there are graphs whose Weisfeiler-Leman dimension is linear in their number of vertices [31]. But more importantly, while it runs in polynomial time for constant $k$, the Weisfeiler-Leman algorithm requires a prohibitive amount of memory, as it maintains a color for each of the $n^k$ tuples of vertices. This makes the algorithm rather inefficient in practice, even for small values of $k$.

Another approach to the Graph Isomorphism Problem, which works particularly well in practice, is the Individualization-Refinement paradigm (I/R paradigm). This paradigm is implemented by all current state-of-the-art isomorphism solvers, including the software packages Nauty/Traces [112, 113], Bliss [85, 86], Conauto [105] and Saucy [35]. Also, the I/R paradigm is commonly used in a theoretical context (see, e.g., [9, 14, 23, 141, 142]), which includes Babai's quasipolynomial-time algorithm for graph isomorphism [11]. An extensive description of the paradigm of individualization-refinement algorithms can be found in [113]. The basic strategy of these algorithms is to capture information about the structure of a graph by coloring the vertices. An initially uniform coloring is refined in an isomorphism-invariant manner whenever feasible, followed by artificially distinguishing vertices in a form of backtracking. This approach can be used to decide isomorphism of two graphs, but also to compute the automorphism group or a canonization of a single graph. A refinement operator is a function ref that takes a colored graph $G$ and refines the coloring in an isomorphism-invariant fashion.
More formally, a refinement operator ref takes a pair $(G, \chi_G)$ and outputs a coloring $\mathrm{ref}(G, \chi_G) \preceq \chi_G$. Also, ref is isomorphism-invariant, meaning that if $\varphi\colon (G, \chi_G) \cong (H, \chi_H)$ is an isomorphism, then also $\varphi\colon (G, \mathrm{ref}(G, \chi_G)) \cong (H, \mathrm{ref}(H, \chi_H))$. A typical choice for such a refinement is the 1-dimensional Weisfeiler-Leman algorithm. In order to obtain a practically efficient algorithm, it is vital that the refinement operator can be evaluated quickly, since this subroutine is called at every node of the backtracking tree. Recall that for the 1-dimensional Weisfeiler-Leman algorithm this can be done in almost linear time (see Theorem 2.2.2). In case the refinement operator ref produces a discrete coloring on the graph $G$, it is trivial to check whether this graph is isomorphic to another graph $H$. Indeed, if $G \cong H$, then the refinement of $H$ must also be discrete, and there is at most one color-preserving bijection between the vertex sets, which can be trivially checked for being an isomorphism. When choosing the Color Refinement algorithm as the refinement operator, the coloring of a random graph is already discrete asymptotically almost surely [17]. However, if the coloring produced by ref is not discrete, one needs to do more work. In this case the strategy is to select a color class of the coloring produced by ref, usually called a cell, and to individualize a single vertex from that class. As before, individualization means to refine the coloring by giving the vertex a new singleton color. Since such an operation is not necessarily isomorphism-invariant, we branch over all choices of this vertex within the chosen cell. To the updated coloring with the newly individualized vertex, the algorithm applies the refinement operator again and proceeds in a recursive fashion.

More formally, an algorithm implementing the I/R paradigm works as follows. Let $G$ be a graph and $\chi_G\colon V(G) \to C$ a coloring of the vertices. A cell selector is an isomorphism-invariant function sel which maps $(G, \chi_G)$ to a color $\mathrm{sel}(G, \chi_G) \in C$ with $|\chi_G^{-1}(\mathrm{sel}(G, \chi_G))| \ge 2$ if such a color exists. If the coloring $\chi_G$ is discrete, the cell selector returns $\mathrm{sel}(G, \chi_G) = \bot$. Here, isomorphism-invariant means that if $(G, \chi_G) \cong (H, \chi_H)$, then the cell selector chooses the same color, i.e., $\mathrm{sel}(G, \chi_G) = \mathrm{sel}(H, \chi_H)$. The performance of an individualization-refinement algorithm can depend drastically on the cell selection strategy (see, e.g., [113, 118]).
Let $G$ be a graph with an initial coloring $\chi_G\colon V(G) \to C$. Let sel be a cell selector and ref a refinement operator. With this, a backtracking tree $T^{\mathrm{ref},\mathrm{sel}}[G, \chi_G]$ is defined as follows. The root of the tree is labeled with the empty sequence $\varepsilon$. Let $\bar v = (v_1, \dots, v_\ell)$ be a node of the search tree. Let $\chi := \mathrm{ref}(G, \chi_G, \bar v)$ be the coloring computed by the refinement operator after individualizing the vertices from the current sequence, and let $c := \mathrm{sel}(G, \chi)$ be the color selected by the cell selector. If $c = \bot$, then $\bar v$ is a leaf of the search tree and the coloring $\chi$ is discrete. Otherwise, for each $w \in \chi^{-1}(c)$, there is a child node labeled with $(v_1, \dots, v_\ell, w)$. The vertices of the search tree are referred to as nodes, and they are identified with the sequence of vertices they are labeled with.

Together, a cell selector and a refinement operator are sufficient to build a correct isomorphism test. Indeed, two graphs are isomorphic if and only if they have isomorphic leaves in their search trees. More precisely, $(G, \chi_G) \cong (H, \chi_H)$ if and only if there are leaves $\bar v \in V(T^{\mathrm{ref},\mathrm{sel}}[G, \chi_G])$ and $\bar w \in V(T^{\mathrm{ref},\mathrm{sel}}[H, \chi_H])$ such that $(G, \chi_G, \bar v) \cong (H, \chi_H, \bar w)$. Since $\mathrm{ref}(G, \chi_G, \bar v)$ and $\mathrm{ref}(H, \chi_H, \bar w)$ are discrete colorings, it is easy to check whether $(G, \chi_G, \bar v) \cong (H, \chi_H, \bar w)$. Actually, for an isomorphism $\varphi\colon (G, \chi_G) \cong (H, \chi_H)$, if $\bar v \in V(T^{\mathrm{ref},\mathrm{sel}}[G, \chi_G])$ is a leaf, then $\varphi(\bar v)$ is a leaf of $T^{\mathrm{ref},\mathrm{sel}}[H, \chi_H]$. Moreover, $\varphi$ is the unique isomorphism from $(G, \chi_G, \bar v)$ to $(H, \chi_H, \varphi(\bar v))$. This way, one can also compute the automorphism group of $(G, \chi_G)$ by comparing all pairs of leaves of $T^{\mathrm{ref},\mathrm{sel}}[G, \chi_G]$. Finally, the I/R paradigm can also be used to build a canonization algorithm. Towards this end, the only thing that is required is a graph invariant inv that maps colored graphs $(G, \chi_G)$ to a totally ordered set $X$ such that for all discretely colored graphs $(G, \chi_G)$ and $(H, \chi_H)$ it holds that $(G, \chi_G) \cong (H, \chi_H)$ if and only if $\mathrm{inv}(G, \chi_G) = \mathrm{inv}(H, \chi_H)$. A complete invariant for discretely colored graphs is easy to obtain using the total order on the vertices coming from the vertex colors (see, e.g., Example 2.1.6).
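A compact Python sketch of the basic paradigm for uncolored graphs: Color Refinement as the refinement operator, a cell selector that picks a smallest non-singleton cell (ties broken by the smaller color), and a canonical form obtained as the minimum, over all leaves, of the edge set relabeled by the discrete coloring. No pruning by invariants or automorphisms is performed, so the whole tree is explored. The cell selector, the encoding of individualized vertices and all names are assumptions of this sketch, not the choices of any particular solver.

```python
def refine(n, adj, col):
    """Color Refinement started from the vertex coloring col."""
    chi = list(col)
    while True:
        sig = [(chi[v], tuple(sorted(chi[w] for w in adj[v]))) for v in range(n)]
        rank = {s: r for r, s in enumerate(sorted(set(sig)))}
        new = [rank[s] for s in sig]
        if len(set(new)) == len(set(chi)):
            return new
        chi = new


def leaves(n, adj, seq=()):
    """Yield (node, discrete coloring) for every leaf of the tree below node seq."""
    # individualize the vertices of seq: color (0, position) instead of (0, -1)
    col = [(0, seq.index(v)) if v in seq else (0, -1) for v in range(n)]
    chi = refine(n, adj, col)
    cells = {}
    for v in range(n):
        cells.setdefault(chi[v], []).append(v)
    candidates = [c for c in cells if len(cells[c]) >= 2]
    if not candidates:  # sel returns "bottom": chi is discrete, seq is a leaf
        yield seq, chi
        return
    cell = min(candidates, key=lambda c: (len(cells[c]), c))  # cell selector
    for w in cells[cell]:
        yield from leaves(n, adj, seq + (w,))


def canonical_form(n, edges):
    """Minimum over all leaves of the edge set relabeled by the discrete coloring."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return min(tuple(sorted(tuple(sorted((chi[u], chi[v]))) for u, v in edges))
               for _, chi in leaves(n, adj))
```

For the 4-cycle, this tree has $8 = |\mathrm{Aut}(C_4)|$ leaves: individualizing any vertex leaves its two neighbors in one cell, and individualizing one of them makes the coloring discrete.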
2.3.2 Pruning with Invariants
While this basic framework already works quite well for a number of graphs, there are two further ingredients that are crucial for the efficiency of practical individualization-refinement algorithms. These are the use of node invariants and the exploitation of automorphisms. Let $X$ be a totally ordered set. A node invariant is an isomorphism-invariant function inv taking a colored graph $(G, \chi_G)$ and a sequence $\bar v = (v_1, \dots, v_\ell) \in V(G)^\ell$ to an element $\mathrm{inv}(G, \chi_G, \bar v) \in X$ such that for all vertex sequences $\bar v \in V(G)^\ell$, all colored graphs $(H, \chi_H)$ and all $\bar w \in V(H)^\ell$
(i) if $\mathrm{inv}(G, \chi_G, (v_1, \dots, v_\ell)) < \mathrm{inv}(H, \chi_H, (w_1, \dots, w_\ell))$, then it also holds for all $v \in V(G)$, $w \in V(H)$ that $\mathrm{inv}(G, \chi_G, (v_1, \dots, v_\ell, v)) < \mathrm{inv}(H, \chi_H, (w_1, \dots, w_\ell, w))$, and
(ii) if $\mathrm{ref}(G, \chi_G, \bar v)$ is discrete and $\mathrm{inv}(G, \chi_G, \bar v) = \mathrm{inv}(H, \chi_H, \bar w)$, then $(G, \chi_G, \bar v) \cong (H, \chi_H, \bar w)$.
Note that, since the values of inv are elements of the totally ordered set $X$, isomorphism invariance here means that inv is a graph invariant, i.e., $(G, \chi_G, \bar v) \cong (H, \chi_H, \bar w)$ implies $\mathrm{inv}(G, \chi_G, \bar v) = \mathrm{inv}(H, \chi_H, \bar w)$. Now let inv be a node invariant and define
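$$I := \big\{\bar v \in V(T^{\mathrm{ref},\mathrm{sel}}[G, \chi_G]) \;\big|\; \nexists \bar w \in V(T^{\mathrm{ref},\mathrm{sel}}[G, \chi_G])\colon |\bar v| = |\bar w| \wedge \mathrm{inv}(G, \chi_G, \bar w) < \mathrm{inv}(G, \chi_G, \bar v)\big\}.$$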
Finally, define the search tree $T^{\mathrm{ref},\mathrm{sel}}_{\mathrm{inv}}[G, \chi_G] := (T^{\mathrm{ref},\mathrm{sel}}[G, \chi_G])[I]$ as the subtree induced by the node set $I$. Observe that Property (i) implies that $T^{\mathrm{ref},\mathrm{sel}}_{\mathrm{inv}}[G, \chi_G]$ is indeed a tree. By using the invariant, the algorithm thus may cut off the parts of the search tree that do not contain nodes that are minimal among all nodes on their level. However, due to isomorphism invariance, the property that two graphs are isomorphic if and only if they have isomorphic leaves remains. The use of a node invariant also makes it easy to define a canonization, as it defines a complete invariant for discretely colored graphs. Thus, for a graph canonization, the algorithm may just pick the leaf with the minimal node invariant. When an individualization-refinement algorithm is used as an isomorphism test on two input graphs $(G, \chi_G)$ and $(H, \chi_H)$, the invariant can be used across both graphs and only the minimum among all nodes is maintained. Specifically, in this case one can define
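$$\begin{aligned} I := \big\{\bar v \in V(T^{\mathrm{ref},\mathrm{sel}}[G, \chi_G]) \;\big|\; & \nexists \bar w \in V(T^{\mathrm{ref},\mathrm{sel}}[G, \chi_G])\colon |\bar v| = |\bar w| \wedge \mathrm{inv}(G, \chi_G, \bar w) < \mathrm{inv}(G, \chi_G, \bar v) \ \wedge \\ & \nexists \bar w \in V(T^{\mathrm{ref},\mathrm{sel}}[H, \chi_H])\colon |\bar v| = |\bar w| \wedge \mathrm{inv}(H, \chi_H, \bar w) < \mathrm{inv}(G, \chi_G, \bar v)\big\}. \end{aligned}$$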
This gives a pair of search trees $T^{\mathrm{ref},\mathrm{sel}}_{\mathrm{inv}}[G, \chi_G, H, \chi_H]$ for which each tree may be significantly smaller than when the algorithm is executed separately. Still, the graphs are non-isomorphic if the search trees are structurally different, which can be detected in various ways. A concrete method that is often used is to compare whether the smallest leaf of each tree corresponds to the same discretely colored graph.
2.3.3 Pruning with Automorphisms
The second essential ingredient required for the practicality of I/R algorithms is the exploitation of automorphisms. Let $\bar v, \bar w \in V(T^{\mathrm{ref},\mathrm{sel}}[G, \chi_G])$ such that $\varphi\colon (G, \chi_G, \bar v) \cong (G, \chi_G, \bar w)$. Then the isomorphism extends to the subtrees of $T^{\mathrm{ref},\mathrm{sel}}[G, \chi_G]$ rooted at $\bar v$ and $\bar w$ in the natural way. In particular, the set of all elements $\mathrm{inv}(\bar v')$, where $\bar v'$ ranges over all leaves in the subtree rooted at $\bar v$, equals the set of all $\mathrm{inv}(\bar w')$, where $\bar w'$ ranges over all leaves in the subtree rooted at $\bar w$ (see, e.g., [113] for details). Hence, for the purpose of isomorphism testing, automorphism group computation and computing a graph canonization, it suffices to traverse only one of the subtrees. This means that automorphisms detected by an I/R algorithm can be used to cut off further parts of the search tree. An efficient strategy for the detection of automorphisms is an essential part of individualization-refinement algorithms. For example, a very simple strategy is to maintain a list of visited leaves and, whenever the next leaf is visited, to compare this leaf with all the leaves in the list, trying to find automorphisms. Besides this simple strategy, practical implementations of the I/R paradigm often utilize additional heuristics for automorphism detection. Since this thesis is only concerned with providing lower bounds on the complexity of I/R algorithms, we do not require any specific implementation details on automorphism detection. Instead, we rely on the following generic lower bound on the running time of such algorithms.

Proposition 2.3.1. The running time of an individualization-refinement algorithm with cell selector sel, refinement operator ref and node invariant inv on a graph $G$ is bounded from below by $|T^{\mathrm{ref},\mathrm{sel}}_{\mathrm{inv}}[G]| / |\mathrm{Aut}(G)|$.
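The argument for the correctness of this proposition is simply that, during its execution on $G$, the individualization-refinement algorithm touches, for each node $\bar v \in T^{\mathrm{ref},\mathrm{sel}}_{\mathrm{inv}}[G]$, at least one node equivalent to $\bar v$ under the automorphism group $\mathrm{Aut}(G)$ (see [112]). Since each such equivalence class contains at most $|\mathrm{Aut}(G)|$ nodes, at least $|T^{\mathrm{ref},\mathrm{sel}}_{\mathrm{inv}}[G]| / |\mathrm{Aut}(G)|$ nodes are touched.

Chapter 3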
Upper Bounds on the Weisfeiler-Leman Dimension
The Weisfeiler-Leman algorithm is one of the most fundamental subroutines in the context of the Graph Isomorphism Problem and, over the past decades, its power has been intensively studied in various contexts. In the area of graph isomorphism testing, one of the most prominent examples is Babai's quasipolynomial-time algorithm [11], which employs the algorithm for dimension $k = O(\log n)$. Moreover, more recently, the Weisfeiler-Leman algorithm has also found applications in areas such as machine learning (see, e.g., [139, 120]). By a famous result of Cai, Fürer and Immerman [31], it is well known that, for any fixed dimension, the Weisfeiler-Leman algorithm itself fails to decide the Graph Isomorphism Problem. However, for several restricted classes of graphs it has been shown that the $k$-dimensional Weisfeiler-Leman algorithm serves as a complete isomorphism test. In the following we prove such a statement for graphs of bounded tree-width and, more generally, graphs of bounded rank-width. For graphs of tree-width at most $k$, such a statement was first proved by Grohe and Mariño [66], providing an upper bound of $k + 2$ on the Weisfeiler-Leman dimension of graphs of tree-width at most $k$. In this chapter, we improve on this result, giving an upper bound of $k$ on the Weisfeiler-Leman dimension of graphs of tree-width at most $k$. On the other hand, for graphs of bounded rank-width, it was previously unknown whether the Weisfeiler-Leman algorithm serves as a complete isomorphism test. Indeed, in [74], Grohe and Schweitzer explicitly ask whether the Weisfeiler-Leman dimension of graphs of rank-width $k$, $k \in \mathbb{N}$, is bounded by some function of $k$.
3.1 Tree-Width
For analyzing the Weisfeiler-Leman dimension of the class of graphs of tree-width at most $k$, recall the basic definitions around tree decompositions presented in Subsection 2.1.2. In the most basic case, it is well known that the class of forests (i.e., graphs of tree-width at most 1) has Weisfeiler-Leman dimension 1.
Theorem 3.1.1 (see [82]). The Color Refinement algorithm identifies all forests.
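As an illustration of Theorem 3.1.1, the following Python sketch runs Color Refinement on the disjoint union of two graphs and compares the resulting color histograms. It is a minimal sketch under stated assumptions (graphs given as edge lists without isolated vertices; the helper names `refine` and `cr_distinguishes` are hypothetical), not an efficient implementation.

```python
from collections import Counter


def refine(adj, colors):
    """Color Refinement: iterate until no colour class is split any more."""
    while True:
        # signature = own colour + multiset of neighbour colours
        sig = {v: (colors[v], tuple(sorted(Counter(colors[u] for u in adj[v]).items())))
               for v in adj}
        # rename signatures canonically (via sorting) to small integers
        palette = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new = {v: palette[sig[v]] for v in adj}
        if len(set(new.values())) == len(set(colors.values())):
            return new  # same number of classes: the partition is stable
        colors = new


def cr_distinguishes(g_edges, h_edges):
    """True iff Color Refinement distinguishes the graphs given as edge lists."""
    adj = {}
    for side, edges in ((0, g_edges), (1, h_edges)):
        for u, w in edges:  # tag vertices with their side: disjoint union
            adj.setdefault((side, u), set()).add((side, w))
            adj.setdefault((side, w), set()).add((side, u))
    stable = refine(adj, {v: 0 for v in adj})
    hist = [Counter(c for (s, _), c in stable.items() if s == side) for side in (0, 1)]
    return hist[0] != hist[1]


# two trees with the same degree sequence are still told apart
spider_a = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 5)]
spider_b = [(0, 1), (0, 3), (0, 4), (1, 2), (2, 5)]
print(cr_distinguishes(spider_a, spider_b))  # True
# the 6-cycle and two triangles (not forests) are not distinguished
cycle6 = [(i, (i + 1) % 6) for i in range(6)]
print(cr_distinguishes(cycle6, [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]))  # False
```

Refining the disjoint union rather than each graph separately makes the integer color names comparable across the two graphs.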
For the class of graphs of tree-width at most $k$, $k \geq 2$, the first bound on the Weisfeiler-Leman dimension was obtained by Grohe and Mariño [66], who gave an upper bound of $k + 2$. The proof is essentially based on a winning strategy for Spoiler in the $(k + 3)$-bijective pebble game $\mathrm{BP}_{k+3}(G, H)$ where $G$ and $H$ are non-isomorphic graphs such that $\mathrm{tw}(G) \leq k$. In this strategy, Spoiler basically plays along a tree decomposition of the graph $G$ in a top-down fashion, always moving into a part of the tree decomposition that is non-isomorphic to the corresponding part in the second graph. An important ingredient of this strategy is Spoiler's ability to recognize separators of the graph, meaning that Spoiler has a winning strategy if the pebbled vertices in the first graph form a separator whereas the same does not hold in the second graph. Actually, we shall require the following slightly stronger property. Let $G$ be a graph. For a $(k + 1)$-tuple $(v_1, \dots, v_k, v_{k+1}) \in V(G)^{k+1}$ we define
$s_G(v_1, \dots, v_k, v_{k+1}) := |C|$, where $C$ is the unique component of $G - \{v_1, \dots, v_k\}$ such that $v_{k+1} \in C$ (if $v_{k+1} \in \{v_1, \dots, v_k\}$, then $s_G(v_1, \dots, v_k, v_{k+1}) := 0$).
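The quantity $s_G$ is simply the size of one connected component after deleting the first $k$ vertices of the tuple. A minimal Python sketch using breadth-first search, assuming the graph is a dict mapping each vertex to its set of neighbors (the name `component_size` is hypothetical):

```python
from collections import deque


def component_size(adj, seps, v):
    """s_G(v_1, ..., v_k, v): size of the component of G - {v_1..v_k} containing v."""
    removed = set(seps)
    if v in removed:
        return 0  # convention from the definition
    seen, queue = {v}, deque([v])
    while queue:  # breadth-first search that never enters removed vertices
        u = queue.popleft()
        for w in adj[u]:
            if w not in removed and w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen)


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(component_size(path, (2,), 0))  # 2, the component {0, 1}
```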
Definition 3.1.2. Let $k, \ell \in \mathbb{N}$ such that $\ell \geq k$. The $\ell$-dimensional Weisfeiler-Leman algorithm is aware of $k$-separators if, for all graphs $G, H$, Spoiler wins the game $\mathrm{BP}_{\ell+1}(G, H)$ from initial position $((v_1, \dots, v_{k+1}), (w_1, \dots, w_{k+1}))$ for all vertices $v_1, \dots, v_{k+1} \in V(G)$ and $w_1, \dots, w_{k+1} \in V(H)$ such that $s_G(v_1, \dots, v_k, v_{k+1}) \neq s_H(w_1, \dots, w_k, w_{k+1})$.

The terminology chosen for this definition relates to Corollary 2.2.7, stating that the $\ell$-dimensional Weisfeiler-Leman algorithm is equivalent in its power to the $(\ell + 1)$-bijective pebble game. Our approach is first to prove that, assuming the $\ell$-dimensional Weisfeiler-Leman algorithm is aware of $k$-separators (where $\ell \geq k$), it identifies every graph of tree-width at most $k$. This is achieved by providing a winning strategy for Spoiler in the game $\mathrm{BP}_{\ell+1}(G, H)$ where $G$ and $H$ are non-isomorphic graphs such that $\mathrm{tw}(G) \leq k$. Afterwards, we argue that the $\ell$-dimensional Weisfeiler-Leman algorithm is aware of $k$-separators, gradually improving on the value of $\ell$ by considering more and more complex strategies for Spoiler.

For the first step we require the following characterization of tree-width. Let $G$ be a graph. For a $k$-element separator $S \subseteq V(G)$ and a connected component $C$ of $G - S$, we define $G(S, C)$ to be the graph obtained from the induced subgraph $G[S \cup C]$ by adding all edges between distinct vertices of $S$ (i.e., by turning $S$ into a clique).
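The construction of $G(S, C)$ takes only a few lines. The following Python sketch assumes the graph is a dict mapping each vertex to its set of neighbors; the name `completed_part` is hypothetical.

```python
from itertools import combinations


def completed_part(adj, S, C):
    """G(S, C): the subgraph induced by S ∪ C with S turned into a clique."""
    keep = set(S) | set(C)
    out = {v: {u for u in adj[v] if u in keep} for v in keep}
    for a, b in combinations(S, 2):  # add all edges inside the separator
        out[a].add(b)
        out[b].add(a)
    return out


# path 0-1-2-3-4, separator {1, 3}, component {2}: the result is a triangle
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(completed_part(path, (1, 3), {2}))
```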
Lemma 3.1.3 (Arnborg et al. [2]). Suppose $G(S, C)$ has at least $k + 2$ vertices. Then $G(S, C)$ has tree-width at most $k$ if and only if there exists $v \in C$ such that for every connected component $A$ of $G[C \setminus \{v\}]$ there is a $k$-element separator $S_A \subseteq S \cup \{v\}$ such that
1. no vertex in $A$ is adjacent to the unique element from $S \setminus S_A$, and
2. $G(S_A, A)$ has tree-width at most $k$.
Suppose $G(S, C)$ has tree-width at most $k$. In this case, let $D_G(S, C)$ denote the set of all vertices $v \in C$ that satisfy the condition of the lemma above.
Theorem 3.1.4. Suppose $k \geq 2$ and let $\ell \geq k$ such that the $\ell$-dimensional Weisfeiler-Leman algorithm is aware of $k$-separators. Let $G$ be a graph of tree-width at most $k$. Then the $\ell$-dimensional Weisfeiler-Leman algorithm identifies $G$.
Proof. Let $G$ be a connected graph of tree-width $k$ and suppose $H$ is a second connected graph such that $G \not\cong H$. Let $(T, \beta)$ be a tree decomposition of the graph $G$ of width $k$. For a separator $S \subseteq V(G)$ and an integer $m \in \mathbb{N}$ we define
$$\mathcal{C}_G(S, m) := \{C \subseteq V(G) \mid C \text{ is a connected component of } G - S \text{ of size } m\}.$$
Moreover, $$G(S, m) := G\Big[S \cup \bigcup_{C \in \mathcal{C}_G(S, m)} C\Big].$$
An ordered separator is a tuple $\bar a = (a_1, \dots, a_k)$ such that the underlying set $\{a_1, \dots, a_k\}$ is a separator. In this proof, slightly abusing notation, we do not distinguish between ordered separators and the underlying unordered separators. For two ordered separators $\bar a \in V(G)^k$ and $\bar b \in V(H)^k$ we define $m(\bar a, \bar b)$ to be the minimal number $m \geq 1$ such that $(G(\bar a, m), \bar a) \not\cong (H(\bar b, m), \bar b)$. We now argue that Spoiler wins the game $\mathrm{BP}_{\ell+1}(G, H)$. Suppose the game is in a position $(\bar a, \bar b) \in V(G)^k \times V(H)^k$ where $\bar a \supseteq \beta(s) \cap \beta(t)$ for an edge $st \in E(T)$. We shall prove by induction on $m := m(\bar a, \bar b)$ that Spoiler wins the game from position $(\bar a, \bar b)$. In each case Spoiler wishes to play another pebble. Let $f\colon V(G) \to V(H)$ be the bijection chosen by Duplicator. Since the $\ell$-dimensional Weisfeiler-Leman algorithm is aware of $k$-separators, it can be assumed that $f$ maps the vertex set of $G(\bar a, m)$ to the vertex set of $H(\bar b, m)$. Now let $C \in \mathcal{C}_G(\bar a, m)$ such that
$$|\{C' \in \mathcal{C}_G(\bar a, m) \mid (G(\bar a, C'), \bar a) \cong (G(\bar a, C), \bar a)\}| > |\{C' \in \mathcal{C}_H(\bar b, m) \mid (H(\bar b, C'), \bar b) \cong (G(\bar a, C), \bar a)\}|.$$
Also let $$D := \{v \in D_G(\bar a, C') \mid C' \in \mathcal{C}_G(\bar a, m) \wedge (G(\bar a, C'), \bar a) \cong (G(\bar a, C), \bar a)\}.$$
Then there exists some $v \in D$ such that the following holds. Let $C_G \in \mathcal{C}_G(\bar a, m)$ such that $v \in C_G$, and let $C_H \in \mathcal{C}_H(\bar b, m)$ such that $f(v) \in C_H$. Then
$$(G[C_G \cup \bar a], \bar a, v) \not\cong (H[C_H \cup \bar b], \bar b, f(v)). \tag{3.1}$$
Now Spoiler places pebbles on $(v, w)$ where $w = f(v)$. For the base case of the induction suppose $m = 1$. This means $C_G = \{v\}$ and $C_H = \{w\}$ and thus Spoiler wins immediately. So assume $m > 1$. Let $A_1, \dots, A_\ell \subseteq C_G$ be the connected components of $G[C_G \setminus \{v\}]$. Note that $|A_i| \leq m - 1$ for every $i \in [\ell]$. Also let $B_1, \dots, B_{\ell'} \subseteq C_H$ be the connected components of $H[C_H \setminus \{w\}]$. Because of Equation (3.1) there is some $A \in \{A_1, \dots, A_\ell\}$ such that
$$|\{i \in [\ell] \mid (G[A_i \cup \bar a \cup \{v\}], \bar a, v) \cong (G[A \cup \bar a \cup \{v\}], \bar a, v)\}| > |\{i \in [\ell'] \mid (H[B_i \cup \bar b \cup \{w\}], \bar b, w) \cong (G[A \cup \bar a \cup \{v\}], \bar a, v)\}|.$$
We pick such a set $A \in \{A_1, \dots, A_\ell\}$ with minimal cardinality (i.e., there is no set $A' \in \{A_1, \dots, A_\ell\}$ satisfying the above property which is strictly smaller than $A$). Now suppose $\bar a = (a_1, \dots, a_k)$ and $\bar b = (b_1, \dots, b_k)$. Pick $i \in [k]$ such that no vertex in $A$ is adjacent to $a_i$ (cf. Lemma 3.1.3). Now Spoiler removes the pair of pebbles $(a_i, b_i)$. Let $\bar a' = (a_1, \dots, a_{i-1}, a_{i+1}, \dots, a_k, v)$ and $\bar b' = (b_1, \dots, b_{i-1}, b_{i+1}, \dots, b_k, w)$. Observe that $(\bar a', \bar b')$ is the current position of the game. Now let $m' := |A| < m$. Note that $A \in \mathcal{C}_G(\bar a', m')$.

Claim 1. $(G(\bar a', m'), \bar a') \not\cong (H(\bar b', m'), \bar b')$.

Proof. To prove the claim it suffices to argue that $|\mathcal{A}| > |\mathcal{B}|$ where
$$\mathcal{A} := \{A' \in \mathcal{C}_G(\bar a', m') \mid (G(\bar a', A'), \bar a') \cong (G(\bar a', A), \bar a')\} \quad\text{and}\quad \mathcal{B} := \{B' \in \mathcal{C}_H(\bar b', m') \mid (H(\bar b', B'), \bar b') \cong (G(\bar a', A), \bar a')\}.$$
Let $\mathcal{A}' = \{A' \in \mathcal{A} \mid A' \subseteq C_G\}$ and $\mathcal{A}'' = \mathcal{A} \setminus \mathcal{A}'$. Similarly define $\mathcal{B}' = \{B' \in \mathcal{B} \mid B' \subseteq C_H\}$ and $\mathcal{B}'' = \mathcal{B} \setminus \mathcal{B}'$. By the definition of the set $A$ it follows that $|\mathcal{A}'| > |\mathcal{B}'|$. Now define $G'$ as the subgraph of $G$ induced by $\bar a \cup \{v\}$, the components in $\mathcal{C}_G(\bar a, m'')$ for $m'' \leq m'$, and the sets $A_i$, $i \in [\ell]$, with $|A_i| \ldots$

Lemma 3.1.5. The $(k + 2)$-dimensional Weisfeiler-Leman algorithm is aware of $k$-separators.

Proof. Let $G, H$ be two graphs, $\bar v = (v_1, \dots, v_{k+1}) \in V(G)^{k+1}$ and $\bar w = (w_1, \dots, w_{k+1}) \in V(H)^{k+1}$ such that $s_G(\bar v) \neq s_H(\bar w)$. Without loss of generality assume $s_G(\bar v) > s_H(\bar w)$. Let $C_G$ be the connected component of $G - \{v_1, \dots, v_k\}$ such that $v_{k+1} \in C_G$ and, similarly, let $C_H$ be the connected component of $H - \{w_1, \dots, w_k\}$ such that $w_{k+1} \in C_H$. To prove the lemma it needs to be argued that Spoiler wins the game $\mathrm{BP}_{k+3}(G, H)$ from position $(\bar v, \bar w)$. First, Spoiler plays an additional pair of pebbles. Let $f\colon V(G) \to V(H)$ be the bijection chosen by Duplicator. Since $|C_G| > |C_H|$, there is some $v \in C_G$ such that $w := f(v) \notin C_H$. Spoiler places pebbles on $(v, w)$. Then there is a walk from $v_{k+1}$ to $v$ in $G - \{v_1, \dots, v_k\}$, but there is no walk from $w_{k+1}$ to $w$ in $H - \{w_1, \dots, w_k\}$. So Spoiler wins from the current position by Theorem 2.2.10 and Example 2.2.8.

Corollary 3.1.6 (Grohe and Mariño [66]). The $(k + 2)$-dimensional Weisfeiler-Leman algorithm identifies every graph of tree-width at most $k$.

Proof. This follows from Lemma 3.1.5 and Theorem 3.1.4.

The strategy presented in the last lemma is extremely simple and indeed, from the logical point of view, it does not even require the use of counting quantifiers (cf. Example 2.2.8 and Theorem 2.2.10). Using the ability of the 2-dimensional Weisfeiler-Leman algorithm to count the number of walks of a certain length between two given vertices, the bound on $\ell$ can be slightly improved. A very similar argument is also used in [90] in order to show that the 2-dimensional Weisfeiler-Leman algorithm can detect cut vertices¹.

¹A vertex $v \in V(G)$ is a cut vertex if the singleton set $\{v\}$ is a separator of $G$.

Lemma 3.1.7.
For $k \geq 1$, the $(k + 1)$-dimensional Weisfeiler-Leman algorithm is aware of $k$-separators.

Proof. The basic approach is very similar to the proof of Lemma 3.1.5. Let $G, H$ be two graphs, $\bar v = (v_1, \dots, v_{k+1}) \in V(G)^{k+1}$ and $\bar w = (w_1, \dots, w_{k+1}) \in V(H)^{k+1}$ such that $s_G(\bar v) \neq s_H(\bar w)$. Without loss of generality assume $s_G(\bar v) > s_H(\bar w)$. Let $C_G$ be the connected component of $G - \{v_1, \dots, v_k\}$ such that $v_{k+1} \in C_G$ and, similarly, let $C_H$ be the connected component of $H - \{w_1, \dots, w_k\}$ such that $w_{k+1} \in C_H$. It needs to be proved that Spoiler wins the game $\mathrm{BP}_{k+2}(G, H)$ from position $(\bar v, \bar w)$. First, Spoiler plays an additional pair of pebbles. Let $f\colon V(G) \to V(H)$ be the bijection chosen by Duplicator. Since $|C_G| > |C_H|$, there is some $v_{k+2} \in C_G$ such that $w_{k+2} := f(v_{k+2}) \notin C_H$. Spoiler places pebbles on $(v_{k+2}, w_{k+2})$.

Now let $G' = G - \{v_1, \dots, v_{k-1}\}$ and $H' = H - \{w_1, \dots, w_{k-1}\}$. For $v, w \in V(G')$ define $W^{G'}_\ell(v, w)$ to be the number of walks of length $\ell$ from $v$ to $w$ in $G'$ (see Example 2.2.8 for the definition of a walk), and define $W^{H'}_\ell$ analogously. By Theorem 2.2.10 and Example 2.2.9 it suffices to prove that there are $i, j \in \{k, k+1, k+2\}$ and $\ell \leq |V(G)|$ such that $W^{G'}_\ell(v_i, v_j) \neq W^{H'}_\ell(w_i, w_j)$. Towards this end, suppose that $W^{G'}_\ell(v_k, v_i) = W^{H'}_\ell(w_k, w_i)$ for all $i \in \{k+1, k+2\}$ and $\ell \leq |V(G)|$. Since every walk from $w_{k+1}$ to $w_{k+2}$ in the graph $H'$ has to pass $w_k$, it holds that
$$W^{H'}_\ell(w_{k+1}, w_{k+2}) = \sum_{i=1}^{\ell-1} W^{H'}_i(w_{k+1}, w_k) \cdot W^{H'}_{\ell-i}(w_k, w_{k+2}).$$
Note that each walk from $w_{k+1}$ to $w_{k+2}$ uniquely decomposes into two parts by splitting the walk at its first occurrence of $w_k$, since the end vertex of a walk may not appear multiple times. However, in $G'$ there is at least one walk from $v_{k+1}$ to $v_{k+2}$ that does not pass through $v_k$. Let $d$ be its length. Then
$$W^{G'}_d(v_{k+1}, v_{k+2}) \geq 1 + \sum_{i=1}^{d-1} W^{G'}_i(v_{k+1}, v_k) \cdot W^{G'}_{d-i}(v_k, v_{k+2}) > W^{H'}_d(w_{k+1}, w_{k+2}).$$

Corollary 3.1.8. The $(k + 1)$-dimensional Weisfeiler-Leman algorithm identifies every graph of tree-width at most $k$.
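The walk decomposition in the proof of Lemma 3.1.7 can be sanity-checked numerically. The Python sketch below assumes, as the decomposition argument suggests, that a counted walk visits its end vertex only at its last step (Example 2.2.8 gives the definition actually used); the function names and example graphs are hypothetical. In a "bowtie" of two triangles sharing the cut vertex 0, the number of walks from 1 to 3 equals the sum from the proof, whereas adding the edge 23, which creates a walk avoiding vertex 0, makes the left-hand side strictly larger.

```python
def end_avoiding_walks(adj, start, end, length):
    """Count walks start = x_0, ..., x_length = end in which `end` occurs only as x_length."""
    counts = {start: 1}  # number of admissible walk prefixes ending in each vertex
    for _ in range(length):
        nxt = {}
        for x, c in counts.items():
            if x == end:  # a prefix may not pass through `end` before the last step
                continue
            for y in adj[x]:
                nxt[y] = nxt.get(y, 0) + c
        counts = nxt
    return counts.get(end, 0)


def split_sum(adj, s, c, t, length):
    """Right-hand side of the decomposition: split each walk at its first visit to c."""
    return sum(end_avoiding_walks(adj, s, c, i) * end_avoiding_walks(adj, c, t, length - i)
               for i in range(1, length))


def from_edges(edges):
    adj = {}
    for u, w in edges:
        adj.setdefault(u, set()).add(w)
        adj.setdefault(w, set()).add(u)
    return adj


bowtie = from_edges([(0, 1), (1, 2), (2, 0), (0, 3), (3, 4), (4, 0)])
# vertex 0 separates 1 from 3, so the decomposition is an identity
print(all(end_avoiding_walks(bowtie, 1, 3, L) == split_sum(bowtie, 1, 0, 3, L)
          for L in range(1, 9)))  # True
# with the extra edge 23 the walk 1-2-3 avoids vertex 0
bridged = from_edges([(0, 1), (1, 2), (2, 0), (0, 3), (3, 4), (4, 0), (2, 3)])
print(end_avoiding_walks(bridged, 1, 3, 2), split_sum(bridged, 1, 0, 3, 2))  # 2 1
```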
The strategy given in the last lemma is still fairly simple. Indeed, it is possible to further improve the bound on $\ell$, although at the price of a much more complicated strategy.

Proposition 3.1.9 (Kiefer, N. [89]). For $k \geq 2$, the $k$-dimensional Weisfeiler-Leman algorithm is aware of $k$-separators.

Since the proof of this proposition is very complicated and lengthy, I omit it in this thesis and only refer to the original work [89].

Corollary 3.1.10. The $k$-dimensional Weisfeiler-Leman algorithm identifies every graph of tree-width at most $k$.

From an algorithmic point of view, this result implies that the Graph Isomorphism Problem for graphs of tree-width at most $k$ can be solved in time $O(n^{k+1} \log n)$ by Theorem 2.2.1. This slightly improves on the running time of the first polynomial-time isomorphism test for graphs of bounded tree-width due to Bodlaender [27]. Moreover, the Graph Canonization Problem for graphs of tree-width $k$ can be solved in time $O(n^{k+3} \log n)$ by Theorem 2.2.5. However, from the algorithmic perspective, both algorithms are far from optimal: both problems are fixed-parameter tractable, where the parameter is the tree-width of the input graphs [104]. Actually, in Chapter 7, we present an algorithm solving the Graph Isomorphism Problem in time $2^{k\,\mathrm{polylog}(k)} \mathrm{poly}(n)$, where $k$ denotes the tree-width of the input graphs.

3.2 Rank-Width

Over the past decades, it has been proved for many graph classes that their Weisfeiler-Leman dimension is finite. Besides graph classes of bounded tree-width, this also includes, for example, planar graphs [61], graph classes of bounded genus [62] and, more generally, graph classes that exclude a fixed graph as a minor [63] (more details are given in Section 3.3). However, most graph classes for which such results are known contain only sparse graphs.
In contrast, an important collection of graph classes that also contain dense graphs are the classes of bounded rank-width (in particular, every complete graph has rank-width 1). The graph parameter rank-width was first defined by Oum and Seymour [130] in connection with graphs of bounded clique-width, another graph parameter which is closely related to rank-width. The first polynomial-time algorithm for the Graph Isomorphism Problem on graph classes of bounded rank-width was given by Grohe and Schweitzer [73]. However, their algorithm is rather complicated, using both group-theoretic techniques and advanced results from structural graph theory [74]. In this thesis, we show that the Weisfeiler-Leman dimension of graphs of rank-width $k$ is at most $3k + 4$, which results in a simple polynomial-time isomorphism test that, maybe surprisingly, is also significantly faster than the algorithm from [73]. The results given in this section can also be found in [68].

3.2.1 Definition and Properties

We start by defining the graph parameters rank-width and clique-width and stating some basic properties about them, including their relation to tree-width. Rank-width, similar to tree-width, measures the width of a certain style of hierarchical decomposition of a graph. Intuitively, the aim is to repeatedly split the vertex set of the graph along cuts of low complexity. For rank-width, the complexity of a cut is measured in terms of the rank, over the 2-element field $\mathbb{F}_2$, of the matrix capturing the adjacencies between the two sides of the cut.

Let $G$ be a graph and let $A_G \in \mathbb{F}_2^{n \times n}$ denote the adjacency matrix of $G$. For $X, Y \subseteq V(G)$ also define the submatrix $A_G[X, Y] \in \mathbb{F}_2^{X \times Y}$ where $A_G[X, Y]_{x,y} := 1$ if and only if $xy \in E(G)$. For $X \subseteq V(G)$ the complexity of cutting the graph along $X$ can now be measured by $\rho_G(X) := \mathrm{rk}_2(A_G[X, \overline{X}])$, where $\overline{X} := V(G) \setminus X$ and $\mathrm{rk}_2(A)$ denotes the $\mathbb{F}_2$-rank of a matrix $A$.
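The cut-rank $\rho_G(X)$ is a plain rank computation over $\mathbb{F}_2$. The following Python sketch (hypothetical names; graphs as dicts of neighbor sets) encodes each row of $A_G[X, \overline{X}]$ as an integer bitmask and performs Gaussian elimination with XOR.

```python
def f2_rank(rows):
    """Rank over F_2 of row vectors encoded as integer bitmasks."""
    pivots = {}  # leading bit -> stored row having exactly that leading bit
    for r in rows:
        while r:
            lead = r.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = r
                break
            r ^= pivots[lead]  # cancel the leading bit and keep reducing
    return len(pivots)


def cut_rank(adj, X):
    """rho_G(X): F_2-rank of the bipartite adjacency matrix A_G[X, complement of X]."""
    X = set(X)
    col = {v: i for i, v in enumerate(v for v in adj if v not in X)}  # columns: X-bar
    rows = []
    for x in X:
        mask = 0
        for u in adj[x]:
            if u in col:
                mask |= 1 << col[u]
        rows.append(mask)
    return f2_rank(rows)


k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
print(cut_rank(k4, {0, 1}))  # 1: every cut of a complete graph has rank at most 1
```

This is consistent with the remark that complete graphs have rank-width 1.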
Definition 3.2.1 (Rank Decomposition and Rank-Width). A rank decomposition of $G$ is a tuple $(T, \gamma)$ consisting of a binary directed tree $T$ and a mapping $\gamma\colon V(T) \to 2^{V(G)}$ such that
(R.1) $\gamma(r) = V(G)$ where $r$ is the root of $T$,
(R.2) $\gamma(t) = \gamma(s_1) \cup \gamma(s_2)$ and $\gamma(s_1) \cap \gamma(s_2) = \emptyset$ for all internal nodes $t \in V(T)$ with children $s_1$ and $s_2$, and
(R.3) $|\gamma(t)| = 1$ for all $t \in L(T)$, where $L(T)$ denotes the set of leaves of the tree $T$.
Instead of giving $\gamma$, one can equivalently specify a bijection $f\colon L(T) \to V(G)$; this completely determines $\gamma$ by Condition (R.2). The width of a rank decomposition $(T, \gamma)$ is $\mathrm{width}(T, \gamma) := \max\{\rho_G(\gamma(t)) \mid t \in V(T)\}$. The rank-width of a graph $G$ is $\mathrm{rw}(G) := \min\{\mathrm{width}(T, \gamma) \mid (T, \gamma) \text{ is a rank decomposition of } G\}$.

Another graph invariant that is closely related to rank-width is clique-width [41]. It is also a measure of a graph's structural complexity but, unlike rank-width, it considers the complexity of an algebraic expression defining the graph. For a natural number $k \in \mathbb{N}$, a $k$-graph is a pair $(G, \mathrm{lab})$ where $G$ is a graph and $\mathrm{lab}\colon V(G) \to [k]$ is a labeling of the vertices. In order to define the clique-width of a graph, consider the following four operations for $k$-graphs:
1. for $i \in [k]$ let $\cdot_i$ denote an isolated vertex with label $i$,
2. for $i, j \in [k]$ with $i \neq j$ define the $k$-graph $\eta_{i,j}(G, \mathrm{lab}) := (G', \mathrm{lab})$ where $V(G') := V(G)$ and $E(G') := E(G) \cup \{vw \mid \mathrm{lab}(v) = i \wedge \mathrm{lab}(w) = j\}$,
3. for $i, j \in [k]$ define $\rho_{i \to j}(G, \mathrm{lab}) := (G, \mathrm{lab}')$ where $\mathrm{lab}'(v) := j$ if $\mathrm{lab}(v) = i$ and $\mathrm{lab}'(v) := \mathrm{lab}(v)$ otherwise, and
4. for two $k$-graphs $(G, \mathrm{lab})$ and $(G', \mathrm{lab}')$ define $(G, \mathrm{lab}) \oplus (G', \mathrm{lab}')$ to be the disjoint union of the two $k$-graphs.
A $k$-expression $t$ is a well-formed expression in these symbols and defines a $k$-graph $(G, \mathrm{lab})$; in this case $t$ is a $k$-expression for $G$. The clique-width of a graph $G$, denoted by $\mathrm{cw}(G)$, is the minimum $k \in \mathbb{N}$ such that there is a $k$-expression for $G$.
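To see the four clique-width operations at work, here is a small Python sketch that evaluates them on $k$-graphs represented as a pair (label map, edge set). All function names are hypothetical; `relabel` stands for $\rho_{i \to j}$ and is named differently only to avoid confusion with the cut-rank $\rho_G$. The helper `complete_graph` builds $K_n$ using only the labels 1 and 2.

```python
from itertools import count

_fresh = count()  # global supply of new vertex names


def vertex(i):
    """The k-graph consisting of one isolated vertex with label i."""
    return {next(_fresh): i}, frozenset()


def eta(i, j, g):
    """eta_{i,j}: add all edges between label-i and label-j vertices (i != j)."""
    assert i != j
    lab, edges = g
    new = {frozenset((v, w)) for v in lab for w in lab if lab[v] == i and lab[w] == j}
    return lab, edges | new


def relabel(i, j, g):
    """rho_{i->j}: every vertex with label i receives label j."""
    lab, edges = g
    return {v: (j if l == i else l) for v, l in lab.items()}, edges


def oplus(g, h):
    """Disjoint union (vertex names are globally fresh, so a merge suffices)."""
    return {**g[0], **h[0]}, g[1] | h[1]


def complete_graph(n):
    """A 2-expression for K_n: add a label-2 vertex, join it to all, relabel to 1."""
    g = vertex(1)
    for _ in range(n - 1):
        g = relabel(2, 1, eta(1, 2, oplus(g, vertex(2))))
    return g


lab, edges = complete_graph(5)
print(len(lab), len(edges))  # 5 10
```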
Although rank-width and clique-width seem to be quite different measures of a graph's structural complexity, they are actually closely related: each of the two parameters is bounded in terms of the other.

Theorem 3.2.2 (Oum, Seymour [130]). For every graph $G$ it holds that $\mathrm{rw}(G) \leq \mathrm{cw}(G) \leq 2^{\mathrm{rw}(G)+1} - 1$.

Also, there is the following connection to tree-width showing that (up to an additive constant of one) rank-width is a more general graph measure than tree-width.

Theorem 3.2.3 (Oum [129]). For every graph $G$ it holds that $\mathrm{rw}(G) \leq \mathrm{tw}(G) + 1$.

Note that the tree-width of a graph cannot be bounded in terms of its rank-width. For example, the complete graph $K_n$ on $n \geq 2$ vertices has rank-width $\mathrm{rw}(K_n) = 1$ and tree-width $\mathrm{tw}(K_n) = n - 1$. Also, it has clique-width $\mathrm{cw}(K_n) = 2$ (observe that the operation $\eta_{i,j}$ can only be performed for distinct $i, j \in [k]$).

Similar to tree-width, rank-width is important algorithmically since many NP-hard problems can be solved in polynomial time on graph classes of bounded rank-width (or, equivalently, graph classes of bounded clique-width). For example, this includes all problems definable in monadic second-order logic with quantification over sets of vertices. Actually, all these problems can even be solved in linear time [40]. Further algorithmic results in this direction can be found in [49]. For the Graph Isomorphism Problem, there is also a polynomial-time algorithm for graph classes of bounded rank-width.

Proposition 3.2.4 (Grohe, Schweitzer [73]). For every fixed $k \in \mathbb{N}$ the Graph Isomorphism Problem for graphs of rank-width at most $k$ can be solved in polynomial time.

However, while the algorithm runs in polynomial time for every fixed $k$, the exponent of the polynomial depends on $k$ in a non-elementary fashion. This is of course quite unsatisfactory.
On top of this, the algorithm by Grohe and Schweitzer is very complicated, using both group-theoretic techniques (which are considered in Chapter 5 of this thesis) and advanced results on graph decompositions and tangles [74]. In this thesis, we improve on this result by showing that the $\ell$-dimensional Weisfeiler-Leman algorithm identifies every graph of rank-width at most $k$ for $\ell = O(k)$. This immediately results in a simple isomorphism test for graphs of rank-width $k$ running in time $n^{O(k)}$. Moreover, this also results in a polynomial-time canonization algorithm for graphs of bounded rank-width by Theorem 2.2.5.

3.2.2 Split Pairs and Flip Functions

Let $G$ be a graph of rank-width $k$. On an abstract level, the approach to bound the Weisfeiler-Leman dimension of graphs of bounded rank-width is similar to the corresponding argument for graphs of bounded tree-width (see Corollary 3.1.10). For a set $X \subseteq V(G)$ such that $\rho_G(X) \leq k$, Spoiler's goal is to pebble a small set of vertices that splits off the set $X$. Then, playing along a rank decomposition, Spoiler continues to reduce the size of the relevant set $X$ until eventually it is sufficiently small for Spoiler to win the game. For tree-width, there is a natural way to split the graph into multiple independent parts by pebbling separators of the graph (i.e., the adhesion sets of a tree decomposition). However, for graphs of bounded rank-width, it is far less clear how the graph can be split in a meaningful way. In particular, since there may be many edges going from $X$ to $\overline{X}$, one cannot simply remove a few vertices in order to separate $X$ from $\overline{X}$. In order to solve this problem, we introduce the notions of split pairs and flip functions. Intuitively, split pairs take the role that separators play for graphs of bounded tree-width, whereas flip functions are used to make independent parts visible when pebbling a split pair. Let $G$ be a graph and $X \subseteq V(G)$.
Two vertices $v, w \in V(G)$ are $X$-equivalent, denoted $v \sim_X w$, if they have the same neighbors in $X$, i.e., $N(v) \cap X = N(w) \cap X$. Also, for $v \in V(G)$, define the vector $\mathrm{vec}_X(v) := (a_{v,w})_{w \in X} \in \mathbb{F}_2^X$ where $a_{v,w} = 1$ if and only if $vw \in E(G)$. Observe that $v \sim_X w$ if and only if $\mathrm{vec}_X(v) = \mathrm{vec}_X(w)$. Moreover, for $S \subseteq V(G)$ let $\mathrm{vec}_X(S) := \{\mathrm{vec}_X(v) \mid v \in S\}$. In order to analyze the split of the vertex set into $X$ and $\overline{X}$ we are typically interested in the vectors from the sets $\mathrm{vec}_{\overline{X}}(X)$ and $\mathrm{vec}_X(\overline{X})$. Observe that these sets of vectors correspond to the rows and the columns, respectively, of the matrix $A_G[X, \overline{X}]$.

Observation 3.2.5. Let $X \subseteq Y \subseteq V(G)$ and suppose $T \subseteq S \subseteq V(G)$ such that $\mathrm{vec}_X(S)$ is linearly independent. Then $\mathrm{vec}_Y(T)$ is linearly independent.

For a set of vectors $S \subseteq \mathbb{F}_2^n$ we denote by $\langle S \rangle$ the linear space spanned by $S$. A set $B \subseteq \mathbb{F}_2^n$ is a linear basis for $\langle S \rangle$ if $B$ is linearly independent and $\langle B \rangle = \langle S \rangle$.

Definition 3.2.6 (Split Pairs). Let $G$ be a graph and $X \subseteq V(G)$. A pair $(A, B)$ is a split pair for $X$ if
1. $A \subseteq X$ and $B \subseteq \overline{X}$,
2. $\mathrm{vec}_{\overline{X}}(A)$ forms a linear basis for $\langle \mathrm{vec}_{\overline{X}}(X) \rangle$, and
3. $\mathrm{vec}_X(B)$ forms a linear basis for $\langle \mathrm{vec}_X(\overline{X}) \rangle$.
Note that $|A| = \rho_G(X)$ and $|B| = \rho_G(\overline{X})$ since $\mathrm{vec}_{\overline{X}}(X)$ is exactly the set of rows of $A_G[X, \overline{X}]$ and $\mathrm{vec}_X(\overline{X})$ is the set of columns of this matrix. Also observe that if $(A, B)$ is a split pair for $X$ then $(B, A)$ is a split pair for $\overline{X}$. As a special case the pair $(\emptyset, \emptyset)$ is defined to be a split pair for $X = V(G)$.

An ordered split pair for $X$ is a pair $(\bar a, \bar b) = ((a_1, \dots, a_q), (b_1, \dots, b_p))$ such that the corresponding sets form a split pair, i.e., $(\{a_1, \dots, a_q\}, \{b_1, \dots, b_p\})$ is a split pair for $X$.

Lemma 3.2.7. Let $G$ be a graph, $X \subseteq V(G)$ and suppose $(A, B)$ is a split pair for $X$. Then $v \sim_X w \Leftrightarrow v \sim_A w$ for all $v, w \in \overline{X}$. Similarly, $v \sim_{\overline{X}} w \Leftrightarrow v \sim_B w$ for all $v, w \in X$.

Proof. Let $v, w \in \overline{X}$. The forward direction of the equivalence directly follows from the fact that $A \subseteq X$. So suppose that $v \sim_A w$ and assume $A = \{a_1, \dots, a_q\}$.
Then, for all $i \in [q]$, it holds that $va_i \in E(G)$ if and only if $wa_i \in E(G)$. Thus $(\mathrm{vec}_{\overline{X}}(a_i))_v = (\mathrm{vec}_{\overline{X}}(a_i))_w$, that is, the $v$-entry of the vector $\mathrm{vec}_{\overline{X}}(a_i)$ coincides with its $w$-entry. Since $\mathrm{vec}_{\overline{X}}(A)$ forms a linear basis for $\langle \mathrm{vec}_{\overline{X}}(X) \rangle$, it follows that $(\mathrm{vec}_{\overline{X}}(v'))_v = (\mathrm{vec}_{\overline{X}}(v'))_w$ for all $v' \in X$. But this means $N(v) \cap X = N(w) \cap X$ and therefore $v \sim_X w$. The second statement is proved analogously.

We shall use the last lemma in the following way. In order to split a graph $G$ along a given set $X$, Spoiler may play pebbles on all vertices of an ordered split pair $(\bar a, \bar b)$ for $X$. Now consider the coloring obtained from applying the Color Refinement algorithm to the graph $G$ after individualizing all vertices from $\bar a$ and $\bar b$, i.e., the coloring $\chi^1_{\mathrm{WL}}[G, \bar a, \bar b]$.

Corollary 3.2.8. Let $G$ be a graph, $X \subseteq V(G)$ and suppose $(\bar a, \bar b)$ is an ordered split pair for $X$. Then $v \sim_X w$ for all $v, w \in \overline{X}$ with $\chi^1_{\mathrm{WL}}[G, \bar a, \bar b](v) = \chi^1_{\mathrm{WL}}[G, \bar a, \bar b](w)$. Similarly, $v \sim_{\overline{X}} w$ for all $v, w \in X$ with $\chi^1_{\mathrm{WL}}[G, \bar a, \bar b](v) = \chi^1_{\mathrm{WL}}[G, \bar a, \bar b](w)$.

The main observation is that the colored graph $(G, \chi^1_{\mathrm{WL}}[G, \bar a, \bar b])$ consists of multiple “independent parts”, each of which is either completely contained in $X$ or completely contained in $\overline{X}$. In order to make these parts visible we consider the concept of a flip function.

Definition 3.2.9 (Flip Functions and Flipped Graphs). Let $G = (V, E, \chi)$ be a vertex-colored graph where $\chi\colon V \to C$ and $C$ is some finite set of colors. A flip function for $G$ is a mapping $f\colon C \times C \to \{0, 1\}$ such that $f(c, c') = f(c', c)$ for all $c, c' \in C$. Moreover, for a graph $G = (V, E, \chi)$ and a flip function $f$ define the flipped graph $G^f := (V, E^f, \chi)$ where
$$E^f := \{vw \mid vw \in E \wedge f(\chi(v), \chi(w)) = 0\} \cup \{vw \mid v \neq w \wedge vw \notin E \wedge f(\chi(v), \chi(w)) = 1\}.$$
For a colored graph $G$ and a flip function $f$ we denote by $\mathrm{Comp}(G, f) \subseteq 2^{V(G)}$ the set of vertex sets of the connected components of $G^f$. Observe that $\mathrm{Comp}(G, f)$ partitions the set $V(G)$.
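Definition 3.2.9 can be made concrete with a short Python sketch that builds the flipped graph $G^f$ and the partition $\mathrm{Comp}(G, f)$. It assumes that a graph is a dict mapping each vertex to its set of neighbors, that `color` maps vertices to colors, and that `f` is a symmetric Python function returning 0 or 1; all names are hypothetical. The example flips all pairs of distinct colors in $K_4$, which splits this dense graph into its two color classes.

```python
from collections import deque


def flipped_graph(adj, color, f):
    """G^f: for v != w, vw is an edge iff (vw in E) XOR f(colour(v), colour(w)) = 1."""
    return {v: {w for w in adj
                if w != v and ((w in adj[v]) != bool(f(color[v], color[w])))}
            for v in adj}


def comp(adj, color, f):
    """Comp(G, f): vertex sets of the connected components of G^f."""
    g = flipped_graph(adj, color, f)
    seen, parts = set(), []
    for s in g:
        if s in seen:
            continue
        seen.add(s)
        part, queue = {s}, deque([s])
        while queue:  # breadth-first search inside the flipped graph
            u = queue.popleft()
            for w in g[u]:
                if w not in seen:
                    seen.add(w)
                    part.add(w)
                    queue.append(w)
        parts.append(frozenset(part))
    return parts


k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
color = {0: "a", 1: "a", 2: "b", 3: "b"}
print(comp(k4, color, lambda c, d: int(c != d)))  # [frozenset({0, 1}), frozenset({2, 3})]
```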
The next lemma forms the first of two main building blocks for describing a winning strategy for Spoiler in a recursive fashion along a given rank decomposition of a graph $G$.

Figure 3.1: Visualization of the sets $P$, $\overline{P}$, $Q$ and $\overline{Q}$ from the proof of Lemma 3.2.10.

Lemma 3.2.10. Let $G$ be a graph, $X \subseteq V(G)$ and suppose $(\bar a, \bar b)$ is an ordered split pair for $X$. Then there is a flip function $f$ for the colored graph $G' := (G, \chi^1_{\mathrm{WL}}[G, \bar a, \bar b])$ such that for every $C \in \mathrm{Comp}(G', f)$ it holds that $C \subseteq X$ or $C \subseteq \overline{X}$.

Proof. Let $\chi^* := \chi^1_{\mathrm{WL}}[G, \bar a, \bar b]$. The flip function $f$ is defined by $f(c, c') = 1$ if there are $v \in X$ and $w \in \overline{X}$ such that $vw \in E(G)$ and $\{\chi^*(v), \chi^*(w)\} = \{c, c'\}$, and $f(c, c') = 0$ otherwise. We need to argue that there are no $v \in X$ and $w \in \overline{X}$ such that $vw$ is an edge in the flipped graph $G^f$. Suppose towards a contradiction that this statement does not hold, that is, there are $v \in X$ and $w \in \overline{X}$ such that $vw \in E(G^f)$. Let $c := \chi^*(v)$ and $c' := \chi^*(w)$. Then $vw \notin E(G)$ (if $vw \in E(G)$ then $f(c, c') = 1$ and thus $vw \notin E(G^f)$) and therefore $f(c, c') = 1$. This means there are $v' \in X$ and $w' \in \overline{X}$ such that $v'w' \in E(G)$ and $\{\chi^*(v'), \chi^*(w')\} = \{c, c'\}$. We distinguish two cases.

Case $\chi^*(v') = c$ and $\chi^*(w') = c'$: Then $v \sim_{\overline{X}} v'$ and $w \sim_X w'$ by Corollary 3.2.8, which implies $vw \in E(G) \Leftrightarrow vw' \in E(G) \Leftrightarrow v'w' \in E(G)$. This is a contradiction.

Case $\chi^*(v') = c'$ and $\chi^*(w') = c$: Let $P = (\chi^*)^{-1}(c) \cap X$, $\overline{P} = (\chi^*)^{-1}(c) \cap \overline{X}$, $Q = (\chi^*)^{-1}(c') \cap X$ and $\overline{Q} = (\chi^*)^{-1}(c') \cap \overline{X}$. So $v \in P$, $v' \in Q$, $w \in \overline{Q}$ and $w' \in \overline{P}$ (see Figure 3.1).

Claim 1. Let $y \in P$ and $z \in \overline{Q}$. Then $yz \notin E(G)$.

Proof. We have $v \sim_{\overline{X}} y$ and $w \sim_X z$ by Corollary 3.2.8. Hence, $vw \in E(G) \Leftrightarrow vz \in E(G) \Leftrightarrow yz \in E(G)$, and $vw \notin E(G)$.

Claim 2. Let $y \in Q$ and $z \in \overline{P}$. Then $yz \in E(G)$.

Proof. We have $v' \sim_{\overline{X}} y$ and $w' \sim_X z$ by Corollary 3.2.8. Hence, $v'w' \in E(G) \Leftrightarrow v'z \in E(G) \Leftrightarrow yz \in E(G)$, and $v'w' \in E(G)$.

Now $|N(v) \cap Q| = |N(v) \cap (Q \cup \overline{Q})| = |N(w') \cap (Q \cup \overline{Q})| \geq |Q|$ by Claims 1 and 2 (the middle equality holds since $\chi^*(v) = \chi^*(w') = c$ and $Q \cup \overline{Q}$ is the color class of $c'$).
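This implies $Q \subseteq N(v)$ and, in particular, $v \in N(v')$. Also, $\overline{P} \subseteq N(v')$ by Claim 2. Thus $|N(v') \cap (P \cup \overline{P})| \geq |\overline{P}| + 1$. Since $\chi^*(v') = \chi^*(w) = c'$, we conclude that $|N(w) \cap (P \cup \overline{P})| \geq |\overline{P}| + 1$. But on the other hand, $|N(w) \cap (P \cup \overline{P})| = |N(w) \cap \overline{P}| \leq |\overline{P}|$ by Claim 1. This is again a contradiction.

In order to be able to treat the connected components of the flipped graph independently, we also need to argue that applying a flip function to two graphs changes neither the isomorphism problem nor the outcome of the Weisfeiler-Leman algorithm.

Lemma 3.2.11. Let $(G, \chi_G)$, $(H, \chi_H)$ be two colored graphs and let $f$ be a flip function for $G$ and $H$. Also let $\varphi\colon V(G) \to V(H)$ be a bijection. Then $\varphi\colon G \cong H$ if and only if $\varphi\colon G^f \cong H^f$.

Proof. This follows immediately from the definition, since isomorphisms of colored graphs preserve colors and whether a pair of vertices is flipped only depends on the colors of its endpoints.

Lemma 3.2.12. Let $G = (V_G, E_G, \chi_G)$, $H = (V_H, E_H, \chi_H)$ be two colored graphs and let $f$ be a flip function for $G$ and $H$. Also let $(\bar v, \bar w) \in V_G^\ell \times V_H^\ell$ be a position in the $k$-bijective pebble game $\mathrm{BP}_k(G, H)$ for $\ell \leq k$. Then Spoiler wins from the position $(\bar v, \bar w)$ in the game $\mathrm{BP}_k(G, H)$ if and only if Spoiler wins from $(\bar v, \bar w)$ in $\mathrm{BP}_k(G^f, H^f)$.

Proof. A position $(\bar v, \bar w)$ in the pebble game $\mathrm{BP}_k(G, H)$ is an immediate win for Spoiler (i.e., $v_i \mapsto w_i$ does not define a partial isomorphism) if and only if it is an immediate win for Spoiler in the game $\mathrm{BP}_k(G^f, H^f)$, again because partial isomorphisms preserve colors and flips only depend on colors. Since the available moves are the same in both games, the claim follows.

Recall that two colorings $\chi, \chi'\colon V \to C$ are equivalent, denoted $\chi \equiv \chi'$, if the partition induced by the color classes is the same for both colorings.

Corollary 3.2.13. Let $G = (V, E, \chi)$ be a colored graph and let $f$ be a flip function for $G$. Then $\chi^k_{\mathrm{WL}}[G] \equiv \chi^k_{\mathrm{WL}}[G^f]$.

Proof. This follows from Theorem 2.2.6 and Lemma 3.2.12.

For $\bar v = (v_1, \dots, v_k) \in V^k$ and $C \subseteq V$ we define the tuple $\bar v \cap C = (v_i)_{i \in I}$ where $I = \{i \in [k] \mid v_i \in C\}$. Also, for a second tuple $\bar w = (w_1, \dots, w_\ell) \in V^\ell$, we write $\bar v \subseteq \bar w$ if $\{v_1, \dots, v_k\} \subseteq \{w_1, \dots, w_\ell\}$.

Corollary 3.2.14. Let $G = (V_G, E_G, \chi_G)$, $H = (V_H, E_H, \chi_H)$ be two colored graphs and let $f$ be a flip function for $G$ and $H$. Let $\bar v \in V_G^k$ and $\bar w \in V_H^k$.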
Let $C_G$ be a connected component of $G^f$ such that $\chi_G(x) \neq \chi_G(y)$ for all $x \in C_G$, $y \in V_G \setminus C_G$, and similarly let $C_H$ be a connected component of $H^f$ such that $\chi_H(x) \neq \chi_H(y)$ for all $x \in C_H$, $y \in V_H \setminus C_H$. Suppose that
$$(G[C_G], \chi^1_{\mathrm{WL}}[G, \bar v]) \not\cong (H[C_H], \chi^1_{\mathrm{WL}}[H, \bar w])$$
(where both colorings are restricted to the vertex sets of the induced subgraphs). Let $\bar v' = \bar v \cap C_G$ and $\bar w' = \bar w \cap C_H$. Then
$$(G[C_G], \chi^1_{\mathrm{WL}}[G, \bar v']) \not\cong (H[C_H], \chi^1_{\mathrm{WL}}[H, \bar w']) \quad\text{or}\quad (G, \bar v) \not\simeq_1 (H, \bar w).$$

Proof. Suppose $\bar v = (v_1, \dots, v_k)$ and $\bar w = (w_1, \dots, w_k)$. Let $I_G = \{i \in [k] \mid v_i \in C_G\}$ and $I_H = \{i \in [k] \mid w_i \in C_H\}$. Also assume $(G, \bar v) \simeq_1 (H, \bar w)$. Then $(G^f, \bar v) \simeq_1 (H^f, \bar w)$ by Lemma 3.2.12 and Corollary 2.2.7, and thus $I_G = I_H$. Now suppose
$$\varphi\colon (G[C_G], \chi^1_{\mathrm{WL}}[G, \bar v']) \cong (H[C_H], \chi^1_{\mathrm{WL}}[H, \bar w']).$$
Since $I_G = I_H$, it follows that $\varphi\colon (G[C_G], \bar v) \cong (H[C_H], \bar w)$ and, by Lemma 3.2.11, $\varphi\colon (G^f[C_G], \bar v) \cong (H^f[C_H], \bar w)$. Now a simple inductive argument implies
$$\varphi\colon (G^f[C_G], \chi^1_{\mathrm{WL}}[G, \bar v]) \cong (H^f[C_H], \chi^1_{\mathrm{WL}}[H, \bar w])$$
since, in each iteration, the Color Refinement algorithm only takes colors of neighbors into account. Applying Lemma 3.2.11 once again gives the desired statement.

3.2.3 A Recursive Strategy for Spoiler

Recall that the goal of this section is to prove that the Weisfeiler-Leman dimension of graphs of rank-width at most $k$ is bounded by a function of $k$. More precisely, given two non-isomorphic graphs $G$ and $H$, where $G$ has rank-width at most $k$, we give a winning strategy for Spoiler in the game $\mathrm{BP}_\ell(G, H)$ for $\ell = 3k + 5$. Spoiler's strategy in the game is to play along a rank decomposition $(T, \gamma)$ for the graph $G$. At a specific node $t \in V(T)$ of the rank decomposition, Spoiler plays an ordered split pair $(\bar a, \bar b)$ for the set $\gamma(t)$ and identifies a component $C$ (with respect to a suitable flip function) that is different from the corresponding component (specified by the bijection played by Duplicator) in the second graph.
In order to distinguish these components, Spoiler continues to play along the rank decomposition in a recursive fashion, going down the tree. The main problem that remains to be solved to realize this strategy is to ensure that Spoiler can remove the pebbles from an ordered split pair of $t$ once Spoiler has pebbled ordered split pairs of the children of $t$. This problem is already partly solved by Corollary 3.2.14, stating that pebbles outside the components we are interested in can be removed without a problem. In order to ensure that no other pebbles need to be removed, we introduce the notion of nice (triples of) split pairs. Recall that for sets $X, X_1, X_2$ we write $X = X_1 \uplus X_2$ to denote that $X$ is the disjoint union of $X_1$ and $X_2$, that is, $X = X_1 \cup X_2$ and $X_1 \cap X_2 = \emptyset$.

Definition 3.2.15 (Nice Triples of Split Pairs). Let $G$ be a graph and $X, X_1, X_2 \subseteq V(G)$ such that $X = X_1 \uplus X_2$. Let $(A, B)$ be a split pair of $X$ and let $(A_i, B_i)$ be split pairs for $X_i$, $i \in \{1, 2\}$. The triple of split pairs $(A, B)$, $(A_1, B_1)$ and $(A_2, B_2)$ is nice if
(N.1) $A \cap X_i \subseteq A_i$,
(N.2) $B_i \cap \overline{X} \subseteq B$, and
(N.3) $B_i \cap X_{3-i} \subseteq A_{3-i}$
for both $i \in \{1, 2\}$. Naturally, a triple of ordered split pairs is nice if the underlying unordered triple of split pairs is nice.

Figure 3.2: Visualization for the proof of Lemma 3.2.16.

Lemma 3.2.16. Let $G$ be a graph and $X, X_1, X_2 \subseteq V(G)$ such that $X = X_1 \uplus X_2$. Let $(A, B)$ be a split pair of $X$. Then there are split pairs $(A_i, B_i)$ for $X_i$, $i \in \{1, 2\}$, such that the triple $(A, B)$, $(A_1, B_1)$ and $(A_2, B_2)$ is nice.

For the proof recall the definition of split pairs (see Definition 3.2.6). A visualization of the situation is also given in Figure 3.2.

Proof. First define the sets $A_i$ for both $i \in \{1, 2\}$. Since $\overline{X} \subseteq \overline{X_i}$, the set $\mathrm{vec}_{\overline{X_i}}(A \cap X_i)$ is linearly independent by Observation 3.2.5. Hence, there is a set $A_i$ with $A \cap X_i \subseteq A_i \subseteq X_i$ such that $\mathrm{vec}_{\overline{X_i}}(A_i)$ is a linear basis for $\langle \mathrm{vec}_{\overline{X_i}}(X_i) \rangle$. So Property (N.1) is satisfied. It remains to define the sets $B_i$ for both $i \in \{1, 2\}$.
Without loss of generality consider the case $i = 1$. The set $\mathrm{vec}_{\overline{X_2}}(A_2)$ spans every element in the set $\mathrm{vec}_{\overline{X_2}}(X_2) \subseteq \mathbb{F}_2^{X_1 \cup \overline{X}}$. Hence, restricting to the coordinates in $X_1$, the set $\mathrm{vec}_{X_1}(A_2)$ spans every element in the set $\mathrm{vec}_{X_1}(X_2) \subseteq \mathbb{F}_2^{X_1}$. Moreover, the set $\mathrm{vec}_X(B)$ spans every element in the set $\mathrm{vec}_X(\overline{X}) \subseteq \mathbb{F}_2^{X}$. So $\mathrm{vec}_{X_1}(B)$ spans every element in the set $\mathrm{vec}_{X_1}(\overline{X}) \subseteq \mathbb{F}_2^{X_1}$. Since $\overline{X_1} = X_2 \cup \overline{X}$, together this means that $\mathrm{vec}_{X_1}(B \cup A_2)$ spans every element in the set $\mathrm{vec}_{X_1}(\overline{X_1}) \subseteq \mathbb{F}_2^{X_1}$. So there exists a set $B_1 \subseteq B \cup A_2$ such that $\mathrm{vec}_{X_1}(B_1)$ is linearly independent and spans every element in the set $\mathrm{vec}_{X_1}(\overline{X_1})$. In particular, Properties (N.2) and (N.3) are satisfied for $i = 1$.

Finally, we shall also need the following simple observation.

Observation 3.2.17. Let $G$, $H$ be two non-isomorphic graphs and let $\sigma\colon V(G) \to V(H)$ be any bijection. Then there is some $v \in V(G)$ such that $G[A] \not\cong H[B]$, where $A$ is the vertex set of the connected component of $G$ such that $v \in A$ and $B$ is the vertex set of the connected component of $H$ such that $\sigma(v) \in B$.

Theorem 3.2.18. The $(3k + 4)$-dimensional Weisfeiler-Leman algorithm identifies every graph of rank-width at most $k$.

Proof. Let $G = (V_G, E_G, \chi_G)$ and $H = (V_H, E_H, \chi_H)$ be two colored graphs such that $\mathrm{rw}(G) \le k$ and $G \not\cong H$. Also let $(T, \gamma)$ be a rank decomposition of $G$ of width at most $k$. We prove that Spoiler wins the bijective $\ell$-pebble game played over the graphs $G$ and $H$ where $\ell = 3k + 5$. Together with Corollary 2.2.7 this implies the statement of the theorem. To be more precise, we first argue that Spoiler has a winning strategy that requires $\ell = 6k + 5$ pebbles. Afterwards we explain how to realize this strategy using only $3k + 5$ pebbles, making use of properties of nice split pairs.

Let $t \in V(T)$ be a node of the rank decomposition. A tuple $(\bar a, \bar b)$ is called an ordered split pair for $t$ if $(\bar a, \bar b)$ is an ordered split pair for $\gamma(t)$. We describe Spoiler's winning strategy in an inductive fashion.
Throughout the play we assume that Spoiler preserves the following invariant at positions $((\bar a, \bar b, v), (\bar a', \bar b', v'))$:
(I.1) There is a node $t \in V(T)$ such that $(\bar a, \bar b)$ is an ordered split pair for $t$.
(I.2) $v \in \gamma(t)$.
(I.3) Let $f$ be the flip function obtained from Lemma 3.2.10 with respect to $X = \gamma(t)$. Let $C \in \mathrm{Comp}((G, \chi^1_{\mathrm{WL}}[G, \bar a, \bar b]), f)$ such that $v \in C$. Similarly, let $C' \in \mathrm{Comp}((H, \chi^1_{\mathrm{WL}}[H, \bar a', \bar b']), f)$ such that $v' \in C'$. Then $(G[C], \chi^1_{\mathrm{WL}}[G, \bar a, \bar b]) \not\cong (H[C'], \chi^1_{\mathrm{WL}}[H, \bar a', \bar b'])$ (where, as before, the colorings are restricted to the vertex set of the respective graph).

Note that initially it is easy for Spoiler to reach such a position for the root node $r$ of $T$. Indeed, $(\emptyset, \emptyset)$ is a split pair for $\gamma(r) = V(G)$ and the flip function $f$ obtained from Lemma 3.2.10 always evaluates to zero (i.e., no edges are flipped). So Spoiler may simply choose a pair of vertices $(v, v')$ satisfying Property (I.3) using Observation 3.2.17. Also observe that in a position as described above the number of pebbles is at most $2k + 1$.

We now prove by induction on $|\gamma(t)|$ that Spoiler wins from such a position. In the base step $|\gamma(t)| = 1$. This means $|C| = 1$ and Spoiler easily wins using two additional pebbles. Indeed, if $|C'| = 1$ then Spoiler wins immediately. Otherwise $|C'| > 1$. In this case Spoiler wins (using two additional pebbles) since the sets $C$ and $C'$ can be recognized by the Color Refinement algorithm because one of the vertices in each set is individualized (cf. Corollary 3.2.13).

For the inductive step assume $|\gamma(t)| > 1$ and let $t_1, t_2$ be the children of $t$ in the rooted tree $T$. Let $X = \gamma(t)$, $X_1 = \gamma(t_1)$ and $X_2 = \gamma(t_2)$. Note that $X = X_1 \uplus X_2$. By Lemma 3.2.16 there are ordered split pairs $(\bar a_i, \bar b_i)$ for $X_i$ such that the triple $(\bar a, \bar b)$, $(\bar a_1, \bar b_1)$ and $(\bar a_2, \bar b_2)$ is nice.
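Invariant (I.3) and the case analysis below work with the components $\mathrm{Comp}((G, \chi), f)$ of a flipped graph. Lemma 3.2.10 is not reproduced here, so the following Python sketch assumes the usual notion of a flip: for distinct $u, w$, the pair $uw$ is an edge of $G^f$ if and only if exactly one of "$uw \in E(G)$" and "$f(\chi(u), \chi(w)) = 1$" holds, where $f$ is symmetric. The function name `flipped_components` is ours.

```python
def flipped_components(vertices, edges, color, flip):
    """Vertex sets of the connected components of the flipped graph G^f.

    vertices: iterable of vertices; edges: set of frozenset({u, w});
    color: dict vertex -> colour; flip: symmetric function (c, d) -> 0 or 1.
    Assumed convention: u != w are adjacent in G^f iff (uw in E(G)) XOR flip(...) == 1.
    """
    vs = list(vertices)

    def adjacent(u, w):
        return (frozenset({u, w}) in edges) != bool(flip(color[u], color[w]))

    seen, components = set(), []
    for s in vs:
        if s in seen:
            continue
        seen.add(s)
        comp, stack = {s}, [s]
        while stack:
            x = stack.pop()
            for y in vs:  # quadratic scan, since flipped adjacency is implicit
                if y not in seen and adjacent(x, y):
                    seen.add(y)
                    comp.add(y)
                    stack.append(y)
        components.append(comp)
    return components
```

For instance, flipping the pair of colors of the two sides of a complete bipartite graph deletes all of its edges, so every vertex becomes its own component.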
On an intuitive level, the central advantage of pebbling nice triples of ordered split pairs is that, for $i \in \{1, 2\}$, Spoiler can remove the pebbles $(\bar a, \bar b)$ and $(\bar a_{3-i}, \bar b_{3-i})$ without unpebbling any element of $X_i$. Also let $f_i$ be the flip function obtained from Lemma 3.2.10 with respect to the ordered split pair $(\bar a_i, \bar b_i)$ and the set $X_i$.

Now Spoiler plays pebbles on $(\bar a_1, \bar b_1, \bar a_2, \bar b_2)$ and let $(\bar a'_1, \bar b'_1, \bar a'_2, \bar b'_2)$ be Duplicator's answer. Let $\bar\alpha = (\bar a, \bar b, \bar a_1, \bar b_1, \bar a_2, \bar b_2, v)$ and $\bar\alpha' = (\bar a', \bar b', \bar a'_1, \bar b'_1, \bar a'_2, \bar b'_2, v')$ be the lists of all vertices pebbled at this point in the two graphs $G$ and $H$. In the next step, Spoiler wishes to play another pebble. Let $\sigma\colon V(G) \to V(H)$ be the bijection chosen by Duplicator. Without loss of generality we can assume that (a) $\sigma(\bar\alpha) = \bar\alpha'$, and (b) $\sigma(C) = C'$ (if (a) is violated Spoiler can win immediately; if (b) is violated Spoiler also wins easily using two additional pebbles since the Color Refinement algorithm recognizes $C$ and $C'$). Additionally, without loss of generality suppose that $v \in X_1$ (otherwise we swap the roles of $X_1$ and $X_2$).

Let $G' = (G[C], \chi^1_{\mathrm{WL}}[G, \bar\alpha])$ and $H' = (H[C'], \chi^1_{\mathrm{WL}}[H, \bar\alpha'])$. Observe that $G' \not\cong H'$ and $\sigma$ induces a bijection from $V(G') = C$ to $V(H') = C'$. First consider the flip function $f_1$. By Observation 3.2.17 and Lemma 3.2.11 there is some $w \in C$ such that $G'[M] \not\cong H'[M']$ where $M \in \mathrm{Comp}(G', f_1)$ such that $w \in M$ and $M' \in \mathrm{Comp}(H', f_1)$ such that $\sigma(w) \in M'$. Observe that, formally, it is not possible to apply the flip function $f_1$ to the graph $G'$ since the colorings do not match. However, $\chi^1_{\mathrm{WL}}[G, \bar\alpha] \preceq \chi^1_{\mathrm{WL}}[G, \bar a_1, \bar b_1]$ and thus, the flip function $f_1$ naturally translates to a flip function for $G'$ resulting in the same flipped graph. Note that $M \subseteq C \subseteq X$. Also note that $M \subseteq X_1$ or $M \cap X_1 = \emptyset$ by Lemma 3.2.10.

Case 1: $M \subseteq X_1$. Let $C_1 \in \mathrm{Comp}((G, \chi^1_{\mathrm{WL}}[G, \bar a_1, \bar b_1]), f_1)$ be the unique set such that $M = C_1 \cap C$.
Similarly, let $C'_1 \in \mathrm{Comp}((H, \chi^1_{\mathrm{WL}}[H, \bar a'_1, \bar b'_1]), f_1)$ such that $M' = C'_1 \cap C'$ be the corresponding component in the second graph $H$. Note that $\chi^1_{\mathrm{WL}}[G, \bar\alpha](u) \neq \chi^1_{\mathrm{WL}}[G, \bar\alpha](u')$ for all $u \in C$ and $u' \in V(G) \setminus C$. This is clear for the graph $G^f$ since $v \in C$ and $C$ forms a connected component in $G^f$, and thus it also holds for $G$ by Corollary 3.2.13. Hence, it follows that
$(G[C_1], \chi^1_{\mathrm{WL}}[G, \bar\alpha]) \not\cong (H[C'_1], \chi^1_{\mathrm{WL}}[H, \bar\alpha'])$.

Now Spoiler plays the next pebble as follows: if $v \in C_1$ and $v' \in C'_1$ then he plays $z = v$ and $z' = v'$; otherwise Spoiler plays $z = w$ and $z' = \sigma(w)$. Clearly,
$(G[C_1], \chi^1_{\mathrm{WL}}[G, \bar\alpha, z]) \not\cong (H[C'_1], \chi^1_{\mathrm{WL}}[H, \bar\alpha', z'])$.

Now consider again the flip function $f_1$. In $(G, \chi^1_{\mathrm{WL}}[G, \bar a_1, \bar b_1])^{f_1}$ the set $C_1$ forms a connected component and similarly, in $(H, \chi^1_{\mathrm{WL}}[H, \bar a'_1, \bar b'_1])^{f_1}$ the set $C'_1$ forms a connected component. By Corollary 3.2.14 Spoiler can remove every pebble occupying vertices outside $C_1$ (resp. $C'_1$) while maintaining the fact that the corresponding subgraphs are non-isomorphic. Also, there is clearly no need to pebble any vertex multiple times. More formally, since $\bar\alpha \cap C_1 \subseteq (\bar a_1, \bar b_1, z)$, it holds that
$(G[C_1], \chi^1_{\mathrm{WL}}[G, \bar a_1, \bar b_1, z]) \not\cong (H[C'_1], \chi^1_{\mathrm{WL}}[H, \bar a'_1, \bar b'_1, z'])$
or Spoiler wins the game using two additional pebbles by Corollary 3.2.14. Hence, Spoiler's next move is to remove all pebbles $(\bar a, \bar b, \bar a_2, \bar b_2, v)$ and $(\bar a', \bar b', \bar a'_2, \bar b'_2, v')$. But now the invariant holds for the node $t_1$, i.e., (I.1), (I.2) and (I.3) are satisfied for the node $t_1$. Hence, by the induction hypothesis, Spoiler wins the game from the current position.

Case 2: $M \cap X_1 = \emptyset$, i.e., $M \subseteq X_2$. This case is slightly more complicated since the set $M$ is defined with respect to the flip function $f_1$, but is contained in the set $X_2$, which is split from the rest of the graph using the flip function $f_2$. In this case Spoiler first plays the next pair of pebbles on the vertices $w$ and $w' = \sigma(w)$.
Note that $\chi^1_{\mathrm{WL}}[G, \bar\alpha, w](u) \neq \chi^1_{\mathrm{WL}}[G, \bar\alpha, w](u')$ for all $u \in M$ and $u' \in V(G) \setminus M$ using Corollary 3.2.13. Now Spoiler plays another pebble. Let $\sigma'\colon V(G) \to V(H)$ be the bijection chosen by Duplicator. Without loss of generality suppose that (a) $\sigma'(\bar\alpha) = \bar\alpha'$, (b) $\sigma'(w) = w'$, and (c) $\sigma'(M) = M'$ (if one of the conditions is violated Spoiler wins the game using similar arguments as before). Let $G'' = (G[M], \chi^1_{\mathrm{WL}}[G, \bar\alpha, w])$ and $H'' = (H[M'], \chi^1_{\mathrm{WL}}[H, \bar\alpha', w'])$. Observe that $G'' \not\cong H''$ and $\sigma'$ induces a bijection from $V(G'') = M$ to $V(H'') = M'$. Consider the flip function $f_2$. By Observation 3.2.17 and Lemma 3.2.11 there is some $z \in M$ such that $G''[N] \not\cong H''[N']$ where $N \in \mathrm{Comp}(G'', f_2)$ such that $z \in N$ and $N' \in \mathrm{Comp}(H'', f_2)$ such that $\sigma'(z) \in N'$.

Observe that $N \subseteq M \subseteq X_2$. Let $C_2 \in \mathrm{Comp}((G, \chi^1_{\mathrm{WL}}[G, \bar a_2, \bar b_2]), f_2)$ such that $N = C_2 \cap M$ and let $C'_2 \in \mathrm{Comp}((H, \chi^1_{\mathrm{WL}}[H, \bar a'_2, \bar b'_2]), f_2)$ such that $N' = C'_2 \cap M'$. Then
$(G[C_2], \chi^1_{\mathrm{WL}}[G, \bar\alpha, w]) \not\cong (H[C'_2], \chi^1_{\mathrm{WL}}[H, \bar\alpha', w'])$.

Now Spoiler plays the next pebble as follows: if $w \in C_2$ and $w' \in C'_2$ then he plays $x = w$ and $x' = w'$; otherwise Spoiler plays $x = z$ and $x' = \sigma'(z)$. Clearly,
$(G[C_2], \chi^1_{\mathrm{WL}}[G, \bar\alpha, w, x]) \not\cong (H[C'_2], \chi^1_{\mathrm{WL}}[H, \bar\alpha', w', x'])$.

Now the argument is similar to the previous case. Consider again the flip function $f_2$. In $(G, \chi^1_{\mathrm{WL}}[G, \bar a_2, \bar b_2])^{f_2}$ the set $C_2$ forms a connected component and similarly, in $(H, \chi^1_{\mathrm{WL}}[H, \bar a'_2, \bar b'_2])^{f_2}$ the set $C'_2$ forms a connected component. By Corollary 3.2.14 Spoiler can remove every pebble occupying vertices outside $C_2$ (resp. $C'_2$) while maintaining the fact that the corresponding subgraphs are non-isomorphic. Also, there is clearly no need to pebble any vertex multiple times.
More formally, since $(\bar\alpha, w) \cap C_2 \subseteq (\bar a_2, \bar b_2, x)$ (recall that $v \in X_1$ and therefore $v \notin C_2$), it holds that
$(G[C_2], \chi^1_{\mathrm{WL}}[G, \bar a_2, \bar b_2, x]) \not\cong (H[C'_2], \chi^1_{\mathrm{WL}}[H, \bar a'_2, \bar b'_2, x'])$
or Spoiler wins the game using two additional pebbles by Corollary 3.2.14. Hence, Spoiler's next move is to remove all pebbles $(\bar a, \bar b, \bar a_1, \bar b_1, v, w)$ and $(\bar a', \bar b', \bar a'_1, \bar b'_1, v', w')$. But now the invariant holds for the node $t_2$, i.e., (I.1), (I.2) and (I.3) are satisfied for the node $t_2$. Hence, by the induction hypothesis, Spoiler wins the game from the current position.

Overall, by the induction principle, this results in a winning strategy for Spoiler in the pebble game played over the graphs $G$ and $H$.

It remains to analyze the number of pebbles required to implement this strategy. Looking at Spoiler's strategy, it is not difficult to see that it requires at most $6k + 5$ pebbles. More precisely, Spoiler needs $6k$ pebbles to pebble the three ordered split pairs $(\bar a, \bar b)$ and $(\bar a_i, \bar b_i)$ for $i \in \{1, 2\}$. The base step requires three additional pebbles. In the inductive step, five additional pebbles suffice: three for pebbling $v$, $w$ and $x$, and two to simulate the Color Refinement algorithm in case the bijections chosen by Duplicator do not match up. However, taking a closer look, some vertices are always pebbled multiple times due to the nice ordered split pairs. More precisely, Condition (N.1) implies that $\bar a \subseteq \bar a_1 \cup \bar a_2$. Also, Conditions (N.2) and (N.3) imply that $\bar b_1 \cup \bar b_2 \subseteq \bar b \cup \bar a_1 \cup \bar a_2$. Since each of $\bar b$, $\bar a_1$ and $\bar a_2$ has at most $k$ entries, the split pairs pebble at most $3k$ different vertices overall. Since there is no need to pebble any vertex multiple times, the strategy described above can actually be implemented using only $3k + 5$ pebbles.

The next two corollaries state the main algorithmic consequences of Theorem 3.2.18 for the isomorphism and canonization problem for graphs of bounded rank-width.

Corollary 3.2.19.
The Graph Isomorphism Problem for graphs of rank-width at most $k$ can be solved in time $O(n^{3k+5} \log n)$.

Proof. This follows from Theorems 2.2.1 and 3.2.18.

Corollary 3.2.20. There is an algorithm canonizing graphs of rank-width at most $k$ in time $O(n^{3k+7} \log n)$.

Proof. This follows from Theorems 2.2.5 and 3.2.18.

Finally, observe that the same results also hold for graphs of clique-width at most $k$ by Theorem 3.2.2.

3.3 Further Results

Recall that the goal of this chapter is to identify graph classes whose Weisfeiler-Leman dimension is finite. Up to this point we have proved this for all graph classes of bounded tree-width and, more generally, for all graph classes of bounded rank-width. I finish this chapter by stating further important results in this direction.

One of the most well-studied graph classes is the class of planar graphs. A graph $G$ is planar if it can be drawn in the plane without edge crossings. The first proof that the Weisfeiler-Leman dimension of planar graphs is finite was given by Grohe [61], yielding an upper bound of 14 [134]; only recently, Kiefer, Ponomarenko and Schweitzer presented an improved upper bound that is almost optimal.

Proposition 3.3.1 (Kiefer, Ponomarenko, Schweitzer [90]). The 3-dimensional Weisfeiler-Leman algorithm identifies every planar graph.

Indeed, it is well known that the Color Refinement algorithm fails to identify some planar graphs, which means that the Weisfeiler-Leman dimension of the class of planar graphs is either two or three. The last result is generalized by Grohe and Kiefer to graphs of bounded genus. The Euler genus of a graph $G$ is the smallest number $g$ such that $G$ is embeddable on a surface of Euler genus $g$. Note that planar graphs have Euler genus 0.

Proposition 3.3.2 (Grohe, Kiefer [65]). The $(4g + 3)$-dimensional Weisfeiler-Leman algorithm identifies every graph of Euler genus at most $g$.

Let $G$ be a graph.
A graph $H$ is a minor of $G$ if $H$ can be obtained from $G$ by deleting vertices, deleting edges, and contracting edges. A graph class $\mathcal{C}$ excludes $H$ as a minor if no graph $G \in \mathcal{C}$ has $H$ as a minor. A graph class $\mathcal{C}$ is closed under minors if for every $G \in \mathcal{C}$ and every minor $H$ of $G$ it holds that $H \in \mathcal{C}$. Observe that every graph class that is closed under minors and does not contain every graph excludes some fixed graph as a minor. For example, this includes planar graphs and, more generally, graph classes of bounded genus. Moreover, this also includes every graph class of bounded tree-width.

Proposition 3.3.3 (Grohe [63]). Let $\mathcal{C}$ be a graph class that excludes a fixed graph as a minor. Then there exists some $k \in \mathbb{N}$ such that the $k$-dimensional Weisfeiler-Leman algorithm identifies every graph $G \in \mathcal{C}$.

In particular, every graph class that excludes a fixed graph as a minor admits a polynomial-time graph isomorphism test. The last proposition provides a large collection of graph classes that have finite Weisfeiler-Leman dimension. However, every graph class that excludes a fixed graph as a minor only contains graphs that have a linear number of edges.

A collection of dense classes that have finite Weisfeiler-Leman dimension are classes of bounded rank-width (see Theorem 3.2.18). Besides this result, there are only few examples of graph classes that also contain dense graphs and have finite Weisfeiler-Leman dimension. One such example is the class of interval graphs, i.e., intersection graphs of intervals on the real line. More formally, a graph $G$ is an interval graph if for every $v \in V(G)$ there is an interval $I_v = [i, j] := \{i, \dots, j\} \subseteq \mathbb{N}$ such that, for distinct $v, w \in V(G)$, $vw \in E(G)$ if and only if $I_v \cap I_w \neq \emptyset$.

Proposition 3.3.4 (Evdokimov, Ponomarenko, Tinhofer [51, 50]). The 2-dimensional Weisfeiler-Leman algorithm identifies every interval graph.
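The definition of interval graphs is constructive, so it can be sketched directly. The Python below (the function name `interval_graph` is hypothetical) builds the graph from integer intervals $[i, j] = \{i, \dots, j\}$, using the fact that two integer intervals $[a, b]$ and $[c, d]$ intersect exactly when $\max(a, c) \le \min(b, d)$.

```python
from itertools import combinations


def interval_graph(intervals):
    """Edge set of the interval graph given by a dict vertex -> (i, j), i <= j.

    Each vertex v stands for the integer interval [i, j] = {i, ..., j};
    two distinct vertices are adjacent iff their intervals share an integer.
    Edges are returned as pairs (v, w) with v < w.
    """
    edges = set()
    for v, w in combinations(sorted(intervals), 2):
        (a, b), (c, d) = intervals[v], intervals[w]
        if max(a, c) <= min(b, d):  # [a, b] and [c, d] intersect
            edges.add((v, w))
    return edges
```

Giving all vertices intervals that contain a common point yields a clique, which illustrates why interval graphs can be dense.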
Chapter 4
Lower Bounds

In the previous chapter several positive results have been presented bounding the Weisfeiler-Leman dimension of certain graph classes. In this chapter, these results are complemented by analyzing the limits of purely combinatorial approaches to the Graph Isomorphism Problem. Towards this end, we first analyze the Weisfeiler-Leman algorithm and give lower bounds on the Weisfeiler-Leman dimension for graphs of bounded tree-width and rank-width that are only a small constant factor away from the upper bounds presented before. Both lower bounds are based on a well-known construction of Cai, Fürer and Immerman [31] showing that the Weisfeiler-Leman algorithm fails to decide the Graph Isomorphism Problem unless its dimension is linear in the number of vertices of the input graphs. Moreover, we also present exponential lower bounds on the worst-case complexity of algorithms within the individualization-refinement framework.

4.1 Weisfeiler-Leman Algorithm

We start by presenting lower bounds on the Weisfeiler-Leman dimension. More precisely, we continue the analysis of the Weisfeiler-Leman dimension of graphs of bounded tree-width and rank-width, which we started in the previous chapter by presenting meaningful upper bounds. Towards this end, we start by reviewing the celebrated result of Cai, Fürer and Immerman stating that the $k$-dimensional Weisfeiler-Leman algorithm fails to decide isomorphism of $n$-vertex graphs unless $k = \Omega(n)$.

Theorem 4.1.1 (Cai, Fürer, Immerman [31]). For every natural number $k \ge 1$ there are non-isomorphic 3-regular graphs $G_k$ and $H_k$ such that $|V(G_k)| = |V(H_k)| = O(k)$ and $G_k \simeq_k H_k$.

Since the tree-width of every $n$-vertex graph is upper bounded by $n$, this result already implies that the Weisfeiler-Leman dimension of the class of graphs of tree-width at most $k$ is in $\Omega(k)$. Although this result is optimal up to a constant factor, it does not give us a meaningful bound on the constants involved.
Indeed, the tree-width of a graph is often much smaller than its number of vertices, which indicates that more meaningful bounds might be possible. Towards this end, we first review the Cai-Fürer-Immerman construction, which forms the basis of Theorem 4.1.1.

The Cai-Fürer-Immerman Gadget. For a non-empty finite set $S$ we define the CFI gadget $X_S$ to be the following graph. For each $w \in S$ there are vertices $a(w)$ and $b(w)$, and for every $A \subseteq S$ with $|A|$ even there is a vertex $m_A$. For every $A \subseteq S$ with $|A|$ even there are edges $\{a(w), m_A\} \in E(X_S)$ for all $w \in A$ and $\{b(w), m_A\} \in E(X_S)$ for all $w \in S \setminus A$. As an example the graph $X_3 := X_{[3]}$ is depicted in Figure 4.1. The graph is colored so that $\{a(w), b(w)\}$ forms a color class for each $w$ and so that $\{m_A \mid A \subseteq S, |A| \text{ even}\}$ forms a color class.

Figure 4.1: Cai-Fürer-Immerman gadget $X_3$.

Let $X \subseteq S$ and $\gamma \in \mathrm{Aut}(X_S)$. We say that $\gamma$ swaps exactly the pairs of $X$ if $\gamma(a(w)) = b(w)$ for $w \in X$ and $\gamma(a(w)) = a(w)$ for $w \in S \setminus X$.

Lemma 4.1.2 ([31]). Let $X \subseteq S$. Then there is an automorphism $\gamma \in \mathrm{Aut}(X_S)$ swapping exactly the pairs of $X$ if and only if $|X|$ is even. Additionally, if such an automorphism exists, it is unique.

The Cai-Fürer-Immerman Graphs. Let $G$ be a connected graph of minimum degree two, i.e., $\deg(v) \ge 2$ for all $v \in V(G)$. For $T \subseteq E(G)$ we define the graph $\mathrm{CFI}_T(G)$ to be the graph obtained from $G$ in the following way. Each $v \in V(G)$ is replaced by a gadget $X_{E(v)}$ where $E(v) := \{(v, w) \mid vw \in E(G)\}$ denotes the set of (directed) edges incident to $v$. Additionally, the following edges are added between the gadgets. For each $vw \in E(G) \setminus T$ there are edges from $a(v, w)$ to $a(w, v)$ and from $b(v, w)$ to $b(w, v)$. Also, for every $vw \in T$ there are edges from $a(v, w)$ to $b(w, v)$ and from $b(v, w)$ to $a(w, v)$.

Lemma 4.1.3 ([31]). Let $G$ be a connected graph of minimum degree two and $S, T \subseteq E(G)$. Then $\mathrm{CFI}_S(G) \cong \mathrm{CFI}_T(G)$ if and only if $|S| \equiv |T| \bmod 2$.
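Lemma 4.1.2 can be illustrated on small gadgets. The Python sketch below (helper names are ours) builds $X_S$ and tests the natural candidate map for a set $X \subseteq S$: swap $a(w)$ and $b(w)$ for $w \in X$ and send $m_A$ to $m_{A \triangle X}$. This candidate respects the coloring, and it is an automorphism exactly when $|X|$ is even, because otherwise $A \triangle X$ has odd size and $m_{A \triangle X}$ does not exist. The check only covers this candidate map, so it illustrates the lemma rather than proving it.

```python
from itertools import combinations


def cfi_gadget(S):
    """Vertex list and edge set of the CFI gadget X_S.

    Vertices are ('a', w), ('b', w) for w in S and ('m', A) for every
    even-size A subset of S (stored as a frozenset).
    """
    S = sorted(S)
    middles = [frozenset(c) for r in range(0, len(S) + 1, 2)
               for c in combinations(S, r)]
    vertices = [('a', w) for w in S] + [('b', w) for w in S] + [('m', A) for A in middles]
    edges = set()
    for A in middles:
        for w in S:  # m_A sees a(w) for w in A and b(w) otherwise
            end = ('a', w) if w in A else ('b', w)
            edges.add(frozenset({end, ('m', A)}))
    return vertices, edges


def swap_map(X):
    """Candidate map: swap a(w) and b(w) for w in X, send m_A to m_{A xor X}."""
    X = frozenset(X)

    def gamma(v):
        kind, x = v
        if kind == 'm':
            return ('m', x ^ X)  # symmetric difference A xor X
        if x in X:
            return ('b' if kind == 'a' else 'a', x)
        return v
    return gamma


def is_automorphism(vertices, edges, gamma):
    """True iff gamma permutes the vertex set and maps the edge set onto itself."""
    if {gamma(v) for v in vertices} != set(vertices):
        return False
    return {frozenset(gamma(u) for u in e) for e in edges} == edges
```

For $|S| = 3$ the gadget has $3 + 3 + 4 = 10$ vertices and, since every $m_A$ has exactly $|S|$ neighbors, $4 \cdot 3 = 12$ edges.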
Hence, applying the above construction to a specific graph $G$ yields a pair of non-isomorphic graphs $\mathrm{CFI}(G) := \mathrm{CFI}_\emptyset(G)$ and $\widetilde{\mathrm{CFI}}(G) := \mathrm{CFI}_{\{e\}}(G)$ for some $e \in E(G)$. It is this pair of non-isomorphic graphs which cannot be distinguished by the $k$-dimensional Weisfeiler-Leman algorithm for a suitable choice of the base graph $G$. More precisely, in [31] the authors prove that if $G$ has no separator $S$ of size $k + 1$ such that every component of $G - S$ has at most $|V(G)|/2$ vertices, then $\mathrm{CFI}(G) \simeq_k \widetilde{\mathrm{CFI}}(G)$. The existence of 3-regular graphs of this type with a linear number of vertices follows from the existence of 3-regular expander graphs (see, e.g., [1]). In combination, this proves Theorem 4.1.1.

The above already provides a useful sufficient condition on the base graph $G$ to ensure $\mathrm{CFI}(G) \simeq_k \widetilde{\mathrm{CFI}}(G)$. However, a more rigorous analysis reveals that it actually suffices for the graph $G$ to have tree-width at least $k + 1$.

Theorem 4.1.4 (Dawar, Richerby [43]). Let $G$ be a connected graph such that $\mathrm{tw}(G) \ge k + 1$ and $\deg(v) \ge 2$ for all $v \in V(G)$. Then $\mathrm{CFI}(G) \simeq_k \widetilde{\mathrm{CFI}}(G)$.

This theorem immediately implies that the Weisfeiler-Leman dimension of the class of graphs of tree-width at most $k$ is strictly greater than $\frac{k}{10} - 1$. Indeed, for a graph $G$ of tree-width $k$ and maximum degree $d$ it is easy to see that $\mathrm{tw}(\mathrm{CFI}(G)) \le \mathrm{tw}(G) \cdot (2d + 2^{d-1})$ by replacing every vertex $v \in V(G)$ in a tree decomposition of $G$ by the vertices from the gadget $X_{E(v)}$ (see also [43, Lemma 5]). Since there exist connected 3-regular graphs of arbitrary tree-width, this implies the stated lower bound. However, this lower bound is still far from the upper bound derived in the previous chapter. In the following, an improved bound is presented by providing a better analysis of the tree-width of $\mathrm{CFI}(G)$ for certain base graphs $G$.
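The smallest instance of this discussion can be run directly. The triangle $K_3$ is connected, has minimum degree two and tree-width 2, so Theorem 4.1.4 applies with $k = 1$; assuming, as usual, that the 1-dimensional algorithm is Color Refinement, $\mathrm{CFI}(K_3)$ and its twisted version should receive identical color histograms. The Python sketch below (helper names are ours) builds both graphs, confirms that they are non-isomorphic (two 9-cycles versus one 18-cycle) and runs Color Refinement jointly on them.

```python
from collections import Counter
from itertools import combinations


def cfi_graph(base_edges, twisted=()):
    """CFI_T(G) for a base graph G given by its edge list, with T = `twisted`.

    Vertices: ('a', v, w), ('b', v, w) for every directed edge (v, w), and
    ('m', v, A) for every even-size set A of neighbours of v (A encodes the
    corresponding subset of E(v)). Returns (adjacency sets, initial colouring).
    """
    nbrs = {}
    for u, v in base_edges:
        nbrs.setdefault(u, []).append(v)
        nbrs.setdefault(v, []).append(u)
    adj, color = {}, {}

    def add_edge(x, y):
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)

    for v, ws in nbrs.items():
        for w in ws:  # {a(v,w), b(v,w)} forms one colour class
            color[('a', v, w)] = color[('b', v, w)] = ('pair', v, w)
        for r in range(0, len(ws) + 1, 2):
            for A in combinations(ws, r):
                m = ('m', v, frozenset(A))
                color[m] = ('middle', v)
                for w in ws:
                    add_edge(m, ('a', v, w) if w in A else ('b', v, w))
    twisted = {frozenset(e) for e in twisted}
    for v, w in base_edges:
        if frozenset((v, w)) in twisted:
            add_edge(('a', v, w), ('b', w, v))
            add_edge(('b', v, w), ('a', w, v))
        else:
            add_edge(('a', v, w), ('a', w, v))
            add_edge(('b', v, w), ('b', w, v))
    return adj, color


def color_refinement(graphs):
    """Colour Refinement run jointly on several (adj, colour) pairs.

    One shared renaming table makes colour names comparable across graphs.
    """
    table = {}
    cols = [{v: table.setdefault(c, len(table)) for v, c in color.items()}
            for _, color in graphs]
    while True:
        table = {}
        new_cols = [{v: table.setdefault((col[v], tuple(sorted(col[u] for u in adj[v]))),
                                         len(table))
                     for v in col}
                    for (adj, _), col in zip(graphs, cols)]
        if len(table) == len({c for col in cols for c in col.values()}):
            return new_cols  # the partition did not get finer: it is stable
        cols = new_cols


def num_components(adj):
    """Number of connected components, by iterative depth-first search."""
    seen, count = set(), 0
    for s in adj:
        if s not in seen:
            count += 1
            seen.add(s)
            stack = [s]
            while stack:
                for y in adj[stack.pop()] - seen:
                    seen.add(y)
                    stack.append(y)
    return count
```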
Indeed, based on the last theorem, it suffices to find graphs $G$ of tree-width $k + 1$ for which one can find a good upper bound on the tree-width of $\mathrm{CFI}(G)$ and $\widetilde{\mathrm{CFI}}(G)$. A natural and well-known candidate for a graph of tree-width $k$ is the $k \times k$ grid.

For $k \ge 2$ let $G_{k,k}$ denote the $k \times k$ grid. Moreover, let $G^+_{k,k}$ be the $k \times k$ grid where each edge is subdivided twice. Formally, $V(G_{k,k}) := [k] \times [k]$ and
$E(G_{k,k}) := \{(i, j)(i', j') \mid (i = i' \wedge |j - j'| = 1) \vee (j = j' \wedge |i - i'| = 1)\}$.
Moreover,
$V(G^+_{k,k}) := V(G_{k,k}) \cup \{(v, w) \mid vw \in E(G_{k,k})\}$ and
$E(G^+_{k,k}) := \{v(v, w) \mid v \in V(G_{k,k}), vw \in E(G_{k,k})\} \cup \{(v, w)(w, v) \mid vw \in E(G_{k,k})\}$.

It is well known that the tree-width of the $k \times k$ grid is $\mathrm{tw}(G_{k,k}) = k$. Also, using essentially the same tree decomposition, it holds that $\mathrm{tw}(G^+_{k,k}) = k$. However, for the aim of bounding the tree-width of the graphs $\mathrm{CFI}(G_{k,k})$ and $\widetilde{\mathrm{CFI}}(G_{k,k})$, the first step is to construct a tree decomposition for $G^+_{k,k}$ satisfying some additional properties. The main intuition is that each subdivision vertex (i.e., each vertex that is added for subdividing edges) is replaced by two vertices in the CFI construction. On the other hand, the original vertices of the grid are typically replaced by eight vertices (assuming the vertex has degree 4). Hence, the goal is to find a tree decomposition of $G^+_{k,k}$ where the large bags only contain subdivision vertices. This way, when building a tree decomposition for $\mathrm{CFI}(G)$ in the natural way from a tree decomposition for $G$, the size of the largest bag only increases by a factor of two.

Lemma 4.1.5. Let $k \ge 2$. Then there is a tree decomposition $(T, \beta)$ of $G^+_{k,k}$ of width at most $k + 2$ such that
1. $|\beta(t) \cap V(G_{k,k})| \le 1$ for every $t \in V(T)$, and
2. if $|\beta(t) \cap V(G_{k,k})| = 1$ then $\beta(t) = E(v) \cup \{v\}$ for some $v \in V(G_{k,k})$ where $E(v) = \{(v, w) \mid vw \in E(G_{k,k})\}$. In this case $t$ is a leaf of $T$ and $\beta(s) \cap V(G_{k,k}) = \emptyset$ for the unique $s \in V(T)$ with $st \in E(T)$.
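Since the definitions of $G_{k,k}$ and $G^+_{k,k}$ are fully formal, they translate directly into code. The following Python sketch (function names are ours) builds both graphs; counting confirms that $G^+_{k,k}$ has $k^2 + 2|E(G_{k,k})|$ vertices and $3|E(G_{k,k})|$ edges, where $|E(G_{k,k})| = 2k(k-1)$.

```python
from itertools import product


def grid(k):
    """G_{k,k}: vertex set [k] x [k]; two vertices are adjacent iff they
    agree in one coordinate and differ by 1 in the other."""
    V = list(product(range(1, k + 1), repeat=2))
    E = {frozenset({v, w}) for v in V for w in V
         if (v[0] == w[0] and abs(v[1] - w[1]) == 1)
         or (v[1] == w[1] and abs(v[0] - w[0]) == 1)}
    return V, E


def subdivided_grid(k):
    """G^+_{k,k}: add (v, w) and (w, v) for every grid edge vw, together
    with the edges v(v,w) and (v,w)(w,v), following the formal definition."""
    V, E = grid(k)
    directed = [(v, w) for e in E for v, w in (tuple(e), tuple(e)[::-1])]
    plus_V = V + directed
    plus_E = ({frozenset({v, (v, w)}) for v, w in directed}
              | {frozenset({(v, w), (w, v)}) for v, w in directed})
    return plus_V, plus_E
```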
Figure 4.2: Visualization of the sets $A_{i,j}$, $B_{i,j}$ and $C_{i,j}$ constructed in the proof of Lemma 4.1.5.

Proof. In order to describe the bags of the tree decomposition, several sets $A_{i,j}, B_{i,j}, C_{i,j} \subseteq V(G^+_{k,k})$ for $i, j \in [k]$ are defined first (see also Figure 4.2). Let
$A_{i,j} := \{((i', j), (i', j + 1)) \mid 1 \le i' \le i\} \cup \{((i', j), (i', j - 1)) \mid i \le i' \le k\} \cup \{((i, j), (i + 1, j)), ((i, j), (i - 1, j))\}$,
$B_{i,j} := \{((i', j), (i', j + 1)) \mid 1 \le i' \le i\} \cup \{((i', j), (i', j - 1)) \mid i < i' \le k\} \cup \{((i, j), (i + 1, j)), ((i + 1, j), (i, j))\}$ and
$C_{i,j} := \{((i', j), (i', j - 1)) \mid 1 \le i' \le i\} \cup \{((i', j - 1), (i', j)) \mid i \le i' \le k\}$.
(Formally, the sets defined above may also contain elements outside of $V(G^+_{k,k})$ if some index is not contained in the set $[k]$. In this case, the corresponding element is simply not part of the set.) Now define
$V(T) := \{t^A_{i,j}, t^B_{i,j}, t^C_{i,j}, t^D_{i,j} \mid i, j \in [k]\}$.
Also set
$\beta(t^A_{i,j}) := A_{i,j}$, $\beta(t^B_{i,j}) := B_{i,j}$, $\beta(t^C_{i,j}) := C_{i,j}$, $\beta(t^D_{i,j}) := E(i, j) \cup \{(i, j)\}$.
Observe that each bag contains at most $k + 3$ elements. It remains to define the edges of the tree $T$. The following edges are added to the set $E(T)$:
• $t^C_{i,j} t^C_{i+1,j}$ for all $i \in [k - 1]$, $j \in [k]$,
• $t^C_{k,j} t^A_{1,j}$ for all $j \in [k]$,
• $t^A_{i,j} t^B_{i,j}$ for all $i, j \in [k]$,
• $t^A_{i,j} t^D_{i,j}$ for all $i, j \in [k]$,
• $t^B_{i,j} t^A_{i+1,j}$ for all $i \in [k - 1]$, $j \in [k]$, and
• $t^B_{k,j} t^C_{1,j+1}$ for all $j \in [k - 1]$.
It can be easily verified that $(T, \beta)$ defines a tree decomposition of $G^+_{k,k}$ with the desired properties.

Lemma 4.1.6. For $k \ge 2$ it holds that $\mathrm{tw}(\mathrm{CFI}(G_{k,k})) \le 2k + 5$ and $\mathrm{tw}(\widetilde{\mathrm{CFI}}(G_{k,k})) \le 2k + 5$.

Proof. Fix $k \ge 2$ and let $(T, \beta)$ be the tree decomposition described in Lemma 4.1.5 for the graph $G^+_{k,k}$. Now a tree decomposition $(T', \beta')$ for the graphs $\mathrm{CFI}(G_{k,k})$ and $\widetilde{\mathrm{CFI}}(G_{k,k})$ can be obtained as follows.
For each $t \in V(T)$ such that $\beta(t) \cap V(G_{k,k}) = \emptyset$ it also holds that $t \in V(T')$ and
$\beta'(t) = \{a(v, w), b(v, w) \mid (v, w) \in \beta(t)\}$.
Note that $|\beta'(t)| = 2 \cdot |\beta(t)|$. Also, for $t_1 t_2 \in E(T)$ where $\beta(t_i) \cap V(G_{k,k}) = \emptyset$ there is an edge $t_1 t_2 \in E(T')$.

Otherwise $|\beta(t) \cap V(G_{k,k})| = 1$ and $\beta(t) = E(v) \cup \{v\}$ for some $v \in V(G_{k,k})$. Also $t$ is a leaf of $T$ and $\beta(s) \cap V(G_{k,k}) = \emptyset$ for the unique $s \in V(T)$ with $st \in E(T)$. For every $A \subseteq E(v)$ with even cardinality $|A|$ there is a node $t_A \in V(T')$. We define
$\beta'(t_A) = \{m_A\} \cup \{a(v, w), b(v, w) \mid vw \in E(G_{k,k})\}$.
Note that $|\beta'(t_A)| \le 9$ since $\deg_{G_{k,k}}(v) \le 4$ for every $v \in V(G_{k,k})$. Also, there are edges $t_A s \in E(T')$ for every $A \subseteq E(v)$ with even cardinality $|A|$. It is easy to check that $(T', \beta')$ is a tree decomposition for the graphs $\mathrm{CFI}(G_{k,k})$ and $\widetilde{\mathrm{CFI}}(G_{k,k})$. Also,
$\mathrm{width}(T', \beta') \le \max\{9, 2(\mathrm{width}(T, \beta) + 1)\} - 1 \le \max\{9, 2(k + 3)\} - 1 = 2k + 5$.

Theorem 4.1.7. For every $k \ge 2$ there are non-isomorphic graphs $G_k$ and $H_k$ of tree-width at most $2k + 7$ such that $G_k \simeq_k H_k$.

Proof. Let $G_k = \mathrm{CFI}(G_{k+1,k+1})$ and $H_k = \widetilde{\mathrm{CFI}}(G_{k+1,k+1})$. Then the statement follows from Theorem 4.1.4 and Lemma 4.1.6.

In combination with Theorem 3.2.3 this result also implies a similar statement for graphs of bounded rank-width.

Corollary 4.1.8. For every $k \ge 2$ there are non-isomorphic graphs $G_k$ and $H_k$ of rank-width at most $2k + 8$ such that $G_k \simeq_k H_k$.

graph class            | WL dimension: lower bound | WL dimension: upper bound
trees                  | 1                         | 1
planar graphs          | 2                         | 3
tree-width k           | k/2 − 3                   | k
genus g                | Ω(g)                      | 4g + 3
excluded minor H       | Ω(|V(H)|)                 | f(H)
interval graphs        | 2                         | 2
clique-width k         | k/2 − 6                   | 3k + 4
rank-width k           | k/2 − 4                   | 3k + 4

Table 4.1: Upper and lower bounds on the Weisfeiler-Leman dimension of certain graph classes.

Another graph measure that is briefly mentioned in the previous chapter is clique-width.
Since the rank-width of a graph is bounded by its clique-width (see Theorem 3.2.2), all upper bounds stated in the previous chapter for rank-width immediately translate to graphs of bounded clique-width. Of course, this raises the question whether a lower bound similar to the ones given above can also be obtained for clique-width. First, it can be observed that such a result cannot be obtained directly. Indeed, while the clique-width of a graph is bounded in terms of its tree-width, the clique-width may be exponentially larger than the tree-width of a graph [39]. However, an analysis of the above graphs reveals that similar bounds can still be proven with respect to clique-width.

Proposition 4.1.9. For $k \ge 2$ it holds that $\mathrm{cw}(\mathrm{CFI}(G_{k,k})) \le 2k + 11$ and $\mathrm{cw}(\widetilde{\mathrm{CFI}}(G_{k,k})) \le 2k + 11$.

Proof Idea. The strategy is to build $t$-expressions for the graphs $\mathrm{CFI}(G_{k,k})$ and $\widetilde{\mathrm{CFI}}(G_{k,k})$ from "left to right", similar to the tree decompositions constructed before. All vertices on the "border" of the current step in the construction get distinct colors assigned, whereas all remaining vertices are assigned the same color. The number of vertices on the "border" is at most $2k + 2 + 8$, which means that in total $2k + 11$ colors suffice.

Corollary 4.1.10. For every $k \ge 2$ there are non-isomorphic graphs $G_k$ and $H_k$ of clique-width at most $2k + 13$ such that $G_k \simeq_k H_k$.

An overview of the upper and lower bounds discussed in this thesis is given in Table 4.1. Observe that the lower bounds not stated above follow either directly from Theorem 4.1.1 or from the fact that the 6-cycle and the disjoint union of two triangles cannot be distinguished by the Color Refinement algorithm. Of course, as a natural open problem, it would be desirable to further close the gaps between upper and lower bounds for any of these graph classes.
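The tree decomposition from the proof of Lemma 4.1.5 can be verified mechanically for small $k$. The following Python sketch (helper names are ours) transcribes the sets $A_{i,j}$, $B_{i,j}$, $C_{i,j}$, the bags $E(i, j) \cup \{(i, j)\}$ and the tree edges from that proof, and checks the tree-decomposition conditions, Property 1 of the lemma and the width bound. For $k = 2$ the largest bag has only three elements, so the bound $k + 2$ is attained only for larger $k$.

```python
from itertools import product


def subdivided_grid(k):
    """G^+_{k,k} as (vertex list, edge set); subdivision vertices are pairs (v, w)."""
    V = list(product(range(1, k + 1), repeat=2))
    S, E = [], set()
    for v in V:
        for w in [(v[0] + 1, v[1]), (v[0], v[1] + 1)]:
            if w[0] <= k and w[1] <= k:  # grid edge v--w becomes v-(v,w)-(w,v)-w
                S += [(v, w), (w, v)]
                E |= {frozenset({v, (v, w)}), frozenset({(v, w), (w, v)}),
                      frozenset({(w, v), w})}
    return V + S, E


def lemma_4_1_5_decomposition(k):
    """Bags and tree edges transcribed from the proof of Lemma 4.1.5."""
    def keep(pairs):  # drop elements that mention an index outside [k]
        return {p for p in pairs if all(1 <= c <= k for c in p[0] + p[1])}
    bags, tree = {}, set()
    for i, j in product(range(1, k + 1), repeat=2):
        right = [((r, j), (r, j + 1)) for r in range(1, i + 1)]
        bags['A', i, j] = keep(right + [((r, j), (r, j - 1)) for r in range(i, k + 1)]
                               + [((i, j), (i + 1, j)), ((i, j), (i - 1, j))])
        bags['B', i, j] = keep(right + [((r, j), (r, j - 1)) for r in range(i + 1, k + 1)]
                               + [((i, j), (i + 1, j)), ((i + 1, j), (i, j))])
        bags['C', i, j] = keep([((r, j), (r, j - 1)) for r in range(1, i + 1)]
                               + [((r, j - 1), (r, j)) for r in range(i, k + 1)])
        nbrs = [(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]
        bags['D', i, j] = keep([((i, j), w) for w in nbrs]) | {(i, j)}
        tree |= {(('A', i, j), ('B', i, j)), (('A', i, j), ('D', i, j))}
        if i < k:
            tree |= {(('C', i, j), ('C', i + 1, j)), (('B', i, j), ('A', i + 1, j))}
    for j in range(1, k + 1):
        tree.add((('C', k, j), ('A', 1, j)))
        if j < k:
            tree.add((('B', k, j), ('C', 1, j + 1)))
    return bags, tree


def connected(nodes, adj):
    """Whether `nodes` induces a connected subgraph of the graph `adj`."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for y in (adj[stack.pop()] & nodes) - seen:
            seen.add(y)
            stack.append(y)
    return seen == nodes


def is_tree_decomposition(vertices, edges, bags, tree):
    adj = {t: set() for t in bags}
    for s, t in tree:
        adj[s].add(t)
        adj[t].add(s)
    if len(tree) != len(bags) - 1 or not connected(bags, adj):
        return False  # T is not a tree
    if not all(any(e <= bag for bag in bags.values()) for e in edges):
        return False  # some edge is not covered by a bag
    for x in vertices:  # the bags containing x must form a non-empty subtree
        holders = [t for t in bags if x in bags[t]]
        if not holders or not connected(holders, adj):
            return False
    return True
```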
4.2 The I/R-Method in Theory

Another combinatorial approach to graph isomorphism testing, which works extremely well in practice (see, e.g., [113]), is provided by the individualization-refinement paradigm (see Section 2.3). The goal of this section is to provide a theoretical analysis of this paradigm, proving that algorithms in this framework have exponential worst-case complexity. The first analysis of algorithms in this framework was given by Miyazaki [118] in 1995, showing that the then-current version of Nauty [112] has exponential worst-case complexity. However, for his analysis, Miyazaki exploited specific implementation details, for example regarding the choice of the cell selector implemented in Nauty. Indeed, as Miyazaki also argues, the graphs constructed to prove his lower bound can be canonized in polynomial time using the I/R framework.

Compared to the results of the last section, a main obstacle for proving lower bounds within the I/R paradigm is the possibility for the algorithms to prune the search tree using automorphisms of the input graphs. Indeed, the Cai-Fürer-Immerman graphs considered above have an exponential number of automorphisms, allowing I/R algorithms to prune large parts of the search tree. As a result, I/R algorithms perform reasonably well on Cai-Fürer-Immerman graphs (see [113, 118]). In order to circumvent this problem, the proofs of this section are based on a construction of Gurevich and Shelah [75] yielding rigid graphs (i.e., graphs without non-trivial automorphisms) with similar properties. The results presented in this section are also given in [124].

4.2.1 A Framework for a Lower Bound

The goal is to make a comprehensive statement about the worst-case complexity of individualization-refinement algorithms.
In particular, the goal is to provide lower bounds that do not depend on the specific choices of the cell selector, the refinement operator, and the node invariant implemented in an I/R algorithm (see Section 2.3). However, there is an intrinsic limitation here. A complete node invariant that distinguishes any two non-isomorphic graphs would yield a polynomial-size search tree. Similarly, a refinement operator that refines every coloring into the orbit partition under the automorphism group of the graph also yields polynomial-size search trees. However, it is not difficult to show that computing either of these functions is at least as hard as the Graph Isomorphism Problem itself. Of course, it is nonsensical to allow an individualization-refinement algorithm to use a subroutine that already solves the Graph Isomorphism Problem. Thus, it becomes apparent that we need to restrict the power of the operators involved in building an I/R algorithm.

One possible way to achieve such a restriction is by using the Weisfeiler-Leman algorithm. From a theoretical perspective, the Weisfeiler-Leman algorithm provides a powerful tool which already identifies many different types of graphs (see Chapter 3), while at the same time it fails to decide the Graph Isomorphism Problem in general (see Theorem 4.1.1). Moreover, from a practical point of view, all operators used in practical implementations (e.g., Nauty/Traces [112, 113], Bliss [85, 86], Conauto [105], etc.) are based on the Weisfeiler-Leman algorithm (actually, in most cases, they are based on the Color Refinement algorithm). Hence, any lower bound on the worst-case complexity of I/R algorithms based on operators using the Weisfeiler-Leman algorithm in particular implies the same bound for all the state-of-the-art tools used in practice.
These observations make the Weisfeiler-Leman algorithm a natural choice for limiting the power of the cell selector, the refinement operator, and the node invariant implemented in an I/R algorithm. To formalize the restriction on the operators we introduce the notion of $k$-realizability. Let $(G,\chi_G)$ and $(H,\chi_H)$ be two colored graphs. A cell selector $\mathrm{sel}$ is $k$-realizable if $\mathrm{sel}(G,\chi_G) = \mathrm{sel}(H,\chi_H)$ whenever $(G,\chi_G) \simeq_k (H,\chi_H)$. Also let $\bar v \in V(G)^\ell$ and $\bar w \in V(H)^\ell$. A node invariant $\mathrm{inv}$ is $k$-realizable if $\mathrm{inv}(G,\chi_G,\bar v) = \mathrm{inv}(H,\chi_H,\bar w)$ whenever $(G,\chi_G,\bar v) \simeq_k (H,\chi_H,\bar w)$. Intuitively, this means that whenever the $k$-dimensional Weisfeiler-Leman algorithm cannot distinguish between the graphs associated with two nodes of the refinement tree, the cell selector and the node invariant have to behave in the same way on both nodes. Finally, a refinement operator $\mathrm{ref}$ is $k$-realizable¹ if, for all colored graphs $(G,\chi_G)$ and $(H,\chi_H)$ and all $v \in V(G)$, $w \in V(H)$, it holds that
$$\chi^k_{\mathrm{WL}}[G,\chi_G](v,\dots,v) = \chi^k_{\mathrm{WL}}[H,\chi_H](w,\dots,w) \;\Longrightarrow\; \chi^k_{\mathrm{WL}}[G,\mathrm{ref}(G,\chi_G)](v,\dots,v) = \chi^k_{\mathrm{WL}}[H,\mathrm{ref}(H,\chi_H)](w,\dots,w).$$
Observe that, in particular, this implies that $(G,\mathrm{ref}(G,\chi_G)) \simeq_k (H,\mathrm{ref}(H,\chi_H))$ if $(G,\chi_G) \simeq_k (H,\chi_H)$. For the remainder of this section we restrict our attention to I/R algorithms implementing $k$-realizable operators for some fixed number $k$. Actually, we only use the $k$-realizability of the operators through the following lemma.

Lemma 4.2.1. Suppose $k \in \mathbb N$ and let $\mathrm{sel}$ be a $k$-realizable cell selector, $\mathrm{inv}$ a $k$-realizable node invariant and $\mathrm{ref}$ a $k$-realizable refinement operator. Furthermore, let $(G,\chi_G)$ be a colored graph and suppose $\bar v \in V(T^{\mathrm{ref},\mathrm{sel}}_{\mathrm{inv}}[G,\chi_G])$. Let $m = |\bar v|$. Then $\bar w \in V(T^{\mathrm{ref},\mathrm{sel}}_{\mathrm{inv}}[G,\chi_G])$ for every $\bar w \in V(G)^m$ such that $(G,\chi_G,\bar v) \simeq_k (G,\chi_G,\bar w)$.

Proof. Suppose $\bar v = (v_1,\dots,v_m)$ and $\bar w = (w_1,\dots,w_m)$. The statement is proved by induction on $m \in \mathbb N$. For $m = 0$ the statement trivially holds since $\bar v = \bar w = \varepsilon$.
So suppose $m > 0$. Let $\bar v' := (v_1,\dots,v_{m-1})$ be the tuple obtained from $\bar v$ by deleting the last entry, and similarly define $\bar w' := (w_1,\dots,w_{m-1})$. Clearly, $(G,\chi_G,\bar v') \simeq_k (G,\chi_G,\bar w')$ and $\bar v' \in V(T^{\mathrm{ref},\mathrm{sel}}_{\mathrm{inv}}[G,\chi_G])$. So by the induction hypothesis it follows that $\bar w' \in V(T^{\mathrm{ref},\mathrm{sel}}_{\mathrm{inv}}[G,\chi_G])$. Let $\chi_1 := \mathrm{ref}(G,\chi_G,\bar v')$ and $\chi_2 := \mathrm{ref}(G,\chi_G,\bar w')$. Then $(G,\chi_1) \simeq_k (G,\chi_2)$ since $\mathrm{ref}$ is $k$-realizable. This implies that $\mathrm{sel}(G,\chi_1) = \mathrm{sel}(G,\chi_2) =: i$ because $\mathrm{sel}$ is $k$-realizable. Note that $v_m \in \chi_1^{-1}(i)$. Since $(G,\chi_G,\bar v) \simeq_k (G,\chi_G,\bar w)$, it holds that $\chi^k_{\mathrm{WL}}[G,\chi_G,\bar v'](v_m,\dots,v_m) = \chi^k_{\mathrm{WL}}[G,\chi_G,\bar w'](w_m,\dots,w_m)$. Hence, $i = \chi_1(v_m) = \chi_2(w_m)$ because $\mathrm{ref}$ is $k$-realizable. Thus, $\bar w \in V(T^{\mathrm{ref},\mathrm{sel}}[G,\chi_G])$. Furthermore, $\mathrm{inv}(G,\chi_G,\bar v) = \mathrm{inv}(G,\chi_G,\bar w)$, which implies that $\bar w \in V(T^{\mathrm{ref},\mathrm{sel}}_{\mathrm{inv}}[G,\chi_G])$.

In order to provide lower bounds on the worst-case complexity of I/R algorithms using $k$-realizable operators, we construct rigid colored graphs $(G,\chi_G)$ whose search tree $T^{\mathrm{ref},\mathrm{sel}}_{\mathrm{inv}}[G,\chi_G]$ is exponentially large. Since the graphs are rigid, no automorphism pruning can speed up the algorithm (see Proposition 2.3.1). In light of the above lemma, to prove that the tree $T^{\mathrm{ref},\mathrm{sel}}_{\mathrm{inv}}[G,\chi_G]$ is exponentially large, it suffices to find a node $\bar v \in V(T^{\mathrm{ref},\mathrm{sel}}_{\mathrm{inv}}[G,\chi_G])$ with an exponential number of equivalent tuples. To argue the existence of these equivalent tuples we roughly proceed in two steps. First, we show that the depth of the search tree is linear, i.e., to obtain a discrete partition one has to individualize a linear number of vertices. This means there is a node $\bar v$ in the search tree $T^{\mathrm{ref},\mathrm{sel}}_{\mathrm{inv}}[G,\chi_G]$ such that $|\bar v|$ is linear in the number of vertices of $G$. Then, in a second step, we show that if $|\bar v|$ is sufficiently large, there are exponentially many equivalent tuples. To find such equivalent tuples we prove a limitation of the effect of the $k$-dimensional Weisfeiler-Leman algorithm after individualizing the vertices from $\bar v$.
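The role that automorphisms and equivalent tuples play in the size of the search tree can be seen in a deliberately naive I/R search. The Python sketch below is hypothetical: it uses a compact Color Refinement loop as the refinement operator, picks the non-singleton class with the smallest color as the cell selector, and uses neither a node invariant nor automorphism pruning, so it simply counts the leaves of the resulting tree. These choices are illustrative assumptions, not the operators of any particular tool. Since the rank-based color names and the selection rule depend only on the colored graph up to isomorphism, the operators are in the spirit of the realizability conditions above (with $k = 1$), although the sketch does not verify this formally.

```python
def refine(adj, colors):
    """Color Refinement with canonical, rank-based color names."""
    while True:
        sig = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        rank = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new = {v: rank[sig[v]] for v in adj}
        if len(set(new.values())) == len(set(colors.values())):
            return new  # the partition is stable
        colors = new


def select_cell(colors):
    """Hypothetical cell selector: smallest color whose class is not a singleton."""
    sizes = {}
    for c in colors.values():
        sizes[c] = sizes.get(c, 0) + 1
    candidates = [c for c, size in sizes.items() if size > 1]
    return min(candidates) if candidates else None


def count_leaves(adj, colors=None):
    """Number of leaves of the I/R search tree, with no invariant and no pruning."""
    if colors is None:
        colors = {v: 0 for v in adj}
    colors = refine(adj, colors)
    cell = select_cell(colors)
    if cell is None:  # discrete coloring: this node is a leaf
        return 1
    fresh = max(colors.values()) + 1
    total = 0
    for v in [u for u in adj if colors[u] == cell]:
        child = dict(colors)
        child[v] = fresh  # individualize v; the recursive call refines again
        total += count_leaves(adj, child)
    return total


c4 = {i: [(i - 1) % 4, (i + 1) % 4] for i in range(4)}
print(count_leaves(c4))
```

On the 4-cycle the tree has 8 leaves, as many as $|\mathrm{Aut}(C_4)| = 8$, and an implementation with automorphism pruning would explore only a fraction of them. For the rigid graphs constructed in this section no such pruning is possible, and Lemma 4.2.1 instead guarantees that every tuple equivalent to a node of the tree is itself a node.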
Intuitively, we identify a subgraph containing $\bar v$ which encapsulates the effect of the Weisfeiler-Leman algorithm. This subgraph has an exponential number of automorphisms, which enables us to find the desired number of equivalent tuples.

¹ The definition provided in the original work [124] is not sufficient to prove exponential lower bounds on the size of the search tree of an I/R algorithm. To resolve this error, a slightly stronger notion of $k$-realizability is required.

4.2.2 The Multipede Construction

The proof of the lower bound is based on the construction of graphs $R(G)$ defined for a bipartite base graph $G$. This construction is a combination of the Cai-Fürer-Immerman construction and a related construction of multipedes by Gurevich and Shelah [75], which yields rigid structures with similar properties.

Figure 4.3: The figure depicts a base graph $G$ on the top (with $W = \{w_1,\dots,w_6\}$ and $V = \{v_1,v_2,v_3\}$) and the corresponding multipede graph $R(G)$ on the bottom (with vertices $a(w_i)$, $b(w_i)$ for $i \in [6]$).

Let $G = (V,W,E)$ be a bipartite graph where $\deg(v) \ge 2$ for all $v \in V$. The multipede graph $R(G)$ is defined as follows. Each vertex $w \in W$ is replaced by two vertices $a(w)$ and $b(w)$. Also, each $v \in V$ is replaced by the CFI gadget $X_{N(v)}$ (see Section 4.1). The middle vertices of the CFI gadget are denoted by $m_A(v)$ for $A \subseteq N(v)$ with $|A|$ even. More formally,
$$V(R(G)) := \{a(w), b(w) \mid w \in W\} \cup \{m_A(v) \mid v \in V, A \subseteq N(v), |A| \text{ even}\}$$
and $E(R(G)) := \{a(w)m_A(v) \mid w \in A\} \cup \{b(w)m_A(v) \mid w \in N(v)\setminus A\}$.²

For each $w \in W$ denote $F(w) := \{a(w), b(w)\}$, and for $X \subseteq W$ define $F(X) := \bigcup_{w \in X} F(w)$. Also, for $v \in V$ denote $M(v) := \{m_A(v) \mid A \subseteq N(v), |A| \text{ even}\}$, and for $Y \subseteq V$ define $M(Y) := \bigcup_{v \in Y} M(v)$. The vertices of the graph $R(G)$ are colored in such a way that $F(w)$ forms a color class for every $w \in W$ and, moreover, $M(v)$ forms a color class for every $v \in V$.
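To make the construction concrete, the following Python sketch builds the colored graph $R(G)$ from the neighborhoods $N(v)$, $v \in V$, and also evaluates the automorphism count $2^{|W| - \mathrm{rk}_2(A^*_G)}$ stated in Lemma 4.2.5 below, using Gaussian elimination over $\mathbb F_2$. The vertex encoding `('a', w)`, `('b', w)`, `('m', v, A)` and all function names are hypothetical conventions. The sketch assumes that every $w \in W$ has at least one neighbor (so that $W$ can be read off the neighborhoods) and that the elements of $W$ are mutually comparable.

```python
from itertools import combinations


def multipede(N):
    """Colored multipede graph R(G) for G given by N: v -> set N(v) of W-neighbours.

    Returns (edges, color): edges is a set of 2-element frozensets and color maps
    each vertex to its class, so the classes are exactly the sets F(w) and M(v).
    """
    color, edges = {}, set()
    for v, nbrs in N.items():
        for w in nbrs:
            color[('a', w)] = color[('b', w)] = ('F', w)  # the feet a(w), b(w)
        ordered = sorted(nbrs)
        for size in range(0, len(ordered) + 1, 2):  # only subsets A with |A| even
            for A in map(frozenset, combinations(ordered, size)):
                m = ('m', v, A)  # middle vertex m_A(v)
                color[m] = ('M', v)
                for w in ordered:
                    # m_A(v) ~ a(w) if w in A, and m_A(v) ~ b(w) if w in N(v) \ A
                    edges.add(frozenset({m, ('a', w) if w in A else ('b', w)}))
    return edges, color


def f2_rank(rows):
    """Rank over F_2 of a 0/1 matrix whose rows are given as int bitmasks."""
    pivots = {}  # leading bit -> reduced row with that leading bit
    for r in rows:
        while r:
            top = r.bit_length() - 1
            if top not in pivots:
                pivots[top] = r
                break
            r ^= pivots[top]  # eliminate the leading bit
    return len(pivots)


def aut_count(N):
    """|Aut(R(G))| = 2^(|W| - rk_2(A*_G)), the formula of Lemma 4.2.5."""
    W = sorted(set().union(*N.values()))
    idx = {w: i for i, w in enumerate(W)}
    rows = [sum(1 << idx[w] for w in nbrs) for nbrs in N.values()]  # rows of A*_G
    return 2 ** (len(W) - f2_rank(rows))


tri = {'v1': {'w1', 'w2'}, 'v2': {'w2', 'w3'}, 'v3': {'w1', 'w3'}}
edges, color = multipede(tri)
print(len(color), len(edges), aut_count(tri))
```

The subdivided triangle `tri` is the base graph of a Cai-Fürer-Immerman graph in the sense of Remark 4.2.4 below. It is not odd, and accordingly the computed count is 2 rather than 1.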
Formally, the coloring $\chi_{R(G)}\colon V(R(G)) \to C$ of the graph $R(G)$ can be defined as
$$\chi_{R(G)}(u) := \begin{cases} w & \text{if } u \in F(w),\\ v & \text{if } u \in M(v).\end{cases}$$
An example of this construction is shown in Figure 4.3. For $I \subseteq W$ we further define the graph $R^I(G)$ similarly to $R(G)$, but refine the coloring so that for each $w \in I$ both $\{a(w)\}$ and $\{b(w)\}$ form a color class. Hence, $R(G) = R^\emptyset(G)$.

² This notation is inspired by the work of Gurevich and Shelah [75], where $a(w)$ and $b(w)$ are called the feet of $w$.

Note that the multipede construction is closely related to the Cai-Fürer-Immerman construction discussed in the previous section. Indeed, let $H$ be an arbitrary graph and let $G$ be the bipartite graph obtained from $H$ by subdividing each edge exactly once (i.e., each edge $e \in E(H)$ is replaced by a new vertex that is connected to the two endpoints of the edge $e$). Then the only difference between $\mathrm{CFI}(H)$ and $R(G)$ is that in $\mathrm{CFI}(H)$ there are two pairs of vertices $(a(v,w), b(v,w))$ and $(a(w,v), b(w,v))$, whereas in $R(G)$ there is only one pair $(a(vw), b(vw))$. However, this does not change any of the relevant properties of the graph $\mathrm{CFI}(H)$. In this way, the multipede construction can be seen as a generalization of the Cai-Fürer-Immerman construction.

Recall that, in order for the I/R algorithm to be unable to exploit automorphisms of the input graphs, the graph $R(G)$ is supposed to be rigid. We start by identifying properties of $G$ that correspond to $R(G)$ having few automorphisms.

Definition 4.2.2 (Odd Graphs, Gurevich and Shelah [75]). Let $G = (V,W,E)$ be a bipartite graph. We say $G$ is odd if for every $\emptyset \ne X \subseteq W$ there exists some $v \in V$ such that $|N(v) \cap X|$ is odd.

Lemma 4.2.3. Let $G = (V,W,E)$ be an odd bipartite graph. Then $R(G)$ is rigid.

Proof. Let $\gamma \in \mathrm{Aut}(R(G))$ be an automorphism of the graph $R(G)$. Due to the coloring of the vertices, the permutation $\gamma$ maps every set $F(w)$ for $w \in W$ and every set $M(v)$ for $v \in V$ to itself. Consider the set $X := \{w \in W \mid \gamma(a(w)) = b(w)\}$.
Suppose towards a contradiction that $X \ne \emptyset$. Since $G$ is odd, there is some $v \in V$ such that $|N(v) \cap X|$ is odd. Then $\gamma$ restricts to an automorphism of the gadget $X_{N(v)}$ swapping an odd number of the outer pairs. This contradicts the properties of CFI gadgets (cf. Lemma 4.1.2). So $X = \emptyset$, and thus $\gamma(a(w)) = a(w)$ for all $w \in W$. From this it easily follows that $\gamma$ is the identity mapping (cf. Lemma 4.1.2).

Remark 4.2.4. As indicated above, for each Cai-Fürer-Immerman graph there is a corresponding multipede graph $R(G)$ for a base graph $G = (V,W,E)$ where $\deg(w) = 2$ for every $w \in W$. Observe that such a graph $G$ cannot be odd as soon as the underlying graph contains a cycle. Indeed, viewing the set $W$ as the edge set of a graph $H$ defined on vertex set $V$, the edge set $X \subseteq W$ of an arbitrary cycle of $H$ provides a witness for $G$ not being odd, since every $v \in V$ is incident to either zero or two edges of the cycle. Hence, the generalization from Cai-Fürer-Immerman graphs to multipede graphs in particular allows for the construction of rigid graphs.

Actually, in our proof, we consider graphs $G$ such that $R(G)$ is not rigid but only has few automorphisms. In order to turn $R(G)$ into a rigid graph, we individualize a small set of vertices by considering the graph $R^I(G)$ for a small set $I$. It turns out that both the number of automorphisms of $R(G)$ and the number of vertices that need to be individualized can be computed from the rank of the adjacency matrix of $G$.

Recall that for a graph $G$ we denote by $A_G \in \mathbb F_2^{n\times n}$ its adjacency matrix. For a bipartite graph $G = (V,W,E)$ let $A^*_G := A_G[V,W] \in \mathbb F_2^{V\times W}$ be the submatrix with rows from $V$ and columns from $W$. Also recall that $\mathrm{rk}_2(A)$ denotes the $\mathbb F_2$-rank of a matrix $A$. Finally, we denote by $A^T$ the transpose of a matrix $A$.

Lemma 4.2.5. Let $G$ be a bipartite graph. Then $|\mathrm{Aut}(R(G))| = 2^{|W| - \mathrm{rk}_2(A^*_G)}$.

Proof. Suppose $G = (V,W,E)$. For a matrix $A \in \mathbb F_2^{m\times n}$ denote $\mathrm{Sol}(A) := \{x \in \mathbb F_2^n \mid Ax = 0\}$. Since $|\mathrm{Sol}(A^*_G)| = 2^{|W| - \mathrm{rk}_2(A^*_G)}$, to show the lemma it suffices to argue that $|\mathrm{Aut}(R(G))| = |\mathrm{Sol}(A^*_G)|$. For $\gamma \in \mathrm{Aut}(R(G))$ we define the vector $x_\gamma \in \mathbb F_2^W$ by setting $(x_\gamma)_w = 1$ if and only if $\gamma(a(w)) = b(w)$.
Observe that the mapping $\gamma \mapsto x_\gamma$ is injective. Furthermore, $A^*_G x_\gamma = 0$ since for each $v \in V$ the automorphism $\gamma$ swaps an even number of neighbors of $v$. For the backward direction let $x \in \mathrm{Sol}(A^*_G)$. Then, for each $v \in V$, the set $\{w \in N(v) \mid x_w = 1\}$ has even cardinality. Thus, by the properties of the CFI gadgets (cf. Lemma 4.1.2), there is a unique automorphism $\gamma \in \mathrm{Aut}(R(G))$ that swaps exactly those pairs $(a(w), b(w))$ for which $x_w = 1$.

In particular, the arguments show that a bipartite graph $G = (V,W,E)$ is odd if and only if $\mathrm{rk}_2(A^*_G) = |W|$.

Corollary 4.2.6. Let $G = (V,W,E)$ be an odd bipartite graph. Then there is some $V' \subseteq V$ with $|V'| \le |W|$ such that the induced subgraph $G[V' \cup W]$ is odd.

Lemma 4.2.7. Let $G = (V,W,E)$ be a bipartite graph. Then there is $I \subseteq W$ with $|I| \le |W| - \mathrm{rk}_2(A^*_G)$ such that $R^I(G)$ is rigid.

Proof. Let $B = \{e_w \in \mathbb F_2^W \mid w \in W\}$ be the standard basis for $\mathbb F_2^W$ (that is, $(e_w)_u = 1$ if and only if $w = u$). Furthermore, for $v \in V$, let $(A^*_G)_v$ be the $v$-th row of $A^*_G$, and let $B_I \subseteq B$ be a minimal subset of $B$ such that $B_I \cup \{((A^*_G)_v)^T \mid v \in V\}$ spans the entire space $\mathbb F_2^W$. Finally, let $I = \{w \in W \mid e_w \in B_I\}$. Clearly, $|I| \le |W| - \mathrm{rk}_2(A^*_G)$.

It remains to argue that $R^I(G)$ is rigid. Let $\gamma \in \mathrm{Aut}(R^I(G))$ and let $x_\gamma \in \mathbb F_2^W$ be the vector obtained by setting $(x_\gamma)_w = 1$ if and only if $\gamma(a(w)) = b(w)$. Then $(e_w)^T x_\gamma = 0$ for all $w \in I$. Furthermore, $(A^*_G)_v x_\gamma = 0$ for all $v \in V$ by the same argument as in the proof of Lemma 4.2.5. Since $B_I \cup \{((A^*_G)_v)^T \mid v \in V\}$ spans the entire space $\mathbb F_2^W$, it follows by standard linear algebra arguments that $x_\gamma = 0$. Thus, $\gamma$ is the identity mapping.

4.2.3 The Weisfeiler-Leman Refinement and Closure Operators

Recall that, following the strategy outlined below Lemma 4.2.1, the goal is to find many tuples of vertices $\bar v \in V(R(G))^m$ such that individualizing these vertices results in equivalent graphs (with respect to the $k$-dimensional Weisfeiler-Leman algorithm).
Towards this end, we define a closure operator that bounds the effect of the Weisfeiler-Leman refinement after individualizing the vertices from $\bar v$.

Definition 4.2.8 ($d$-Closure). Let $d \in \mathbb N$ and let $G = (V,W,E)$ be a bipartite graph. For $X \subseteq W$ define the $d$-attractor of $X$ as
$$\mathrm{attr}_d(X) := X \cup \bigcup_{v \in V\colon |N(v)\setminus X| \le d} N(v).$$
A set $X \subseteq W$ is $d$-closed if $X = \mathrm{attr}_d(X)$. The $d$-closure of $X$ is the unique minimal superset of $X$ which is $d$-closed, that is,
$$\mathrm{cl}^d_G(X) := \bigcap_{X' \supseteq X,\ X' \text{ is } d\text{-closed}} X'.$$
As observed in [75], the 1-closure describes the information captured by the 1-dimensional Weisfeiler-Leman algorithm.

Lemma 4.2.9 (Gurevich, Shelah [75]). Let $G = (V,W,E)$ be a bipartite graph and suppose $I \subseteq W$. Then
$$\chi^1_{\mathrm{WL}}[R^I(G)](a(w)) \ne \chi^1_{\mathrm{WL}}[R^I(G)](b(w)) \iff w \in \mathrm{cl}^1_G(I)$$
for all $w \in W$.

Let $G = (V,W,E)$ be a bipartite graph. Slightly abusing notation, for a set $X \subseteq W$ define $N^{-1}(X) := \{v \in V \mid N(v) \subseteq X\}$. For $X \subseteq W$ define $R(G)[[X]] := R(G)[F(X) \cup M(N^{-1}(X))]$. The last lemma can be used to show that, for a 1-closed set $X \subseteq W$ and a sequence of vertices $\bar x = (x_1,\dots,x_m) \in F(X)^m$, for every automorphism $\varphi \in \mathrm{Aut}(R(G)[[X]])$ it holds that $(R(G),\bar x) \simeq_1 (R(G),\varphi(\bar x))$. The 1-closure thus gives us a method to find tuples that cannot be distinguished by the 1-dimensional Weisfeiler-Leman algorithm. However, we require such a statement also for higher dimensions. Obtaining a similar statement characterizing the effect of the $k$-dimensional Weisfeiler-Leman algorithm seems to be much more intricate, and it is easy to see that the $d$-closure does not achieve this. However, under some additional assumptions, it still allows us to bound the effect of the $k$-dimensional Weisfeiler-Leman algorithm, which is sufficient for our purposes.

Lemma 4.2.10. Let $k, d \in \mathbb N$ and suppose $d \ge k$. Let $G = (V,W,E)$ be a bipartite graph and let $X = \{w_1,\dots,w_m\} \subseteq W$ be a $d$-closed set. Furthermore, suppose that
$$\Bigl|N(v) \cap \bigcup_{i \in [k]} N(v_i)\Bigr| \le d - k$$
for all distinct $v, v_1, \dots, v_k \in V$. Let $\bar x = (x_1,\dots,x_m)$ be a sequence of vertices with $x_i \in F(w_i)$ and let $\varphi \in \mathrm{Aut}(R(G)[[X]])$. Then $(R(G),\bar x) \simeq_k (R(G),\varphi(\bar x))$.

Proof. By Corollary 2.2.7 it suffices to prove that Duplicator has a winning strategy in the bijective $(k+1)$-pebble game $\mathrm{BP}_{k+1}((R(G),\bar x),(R(G),\varphi(\bar x)))$ played on $(R(G),\bar x)$ and $(R(G),\varphi(\bar x))$. Towards this end we say a vertex $v \in V$ (respectively $w \in W$) is pebbled if there exists $a \in M(v)$ (respectively $a \in F(w)$) which is pebbled. Furthermore, we say that a vertex $w \in W$ is fixed if there is some pebbled $v \in N(w)$ (note that $v \in V$ in this case). For a tuple $\bar a \in V(R(G))^{\le k+1}$ of pebbled vertices of length at most $k+1$ let
$$\mathrm{cl}(\bar a) := F(X) \cup M(N^{-1}(X)) \cup \bigcup\{M(v) \mid v \in V \text{ is pebbled}\} \cup \bigcup\{F(w) \mid w \in W \text{ is pebbled or fixed}\}.$$
Now, during the play, Duplicator preserves the following invariant for positions $(\bar a, \bar b) \in V(R(G))^\ell \times V(R(G))^\ell$ where $\ell \le k+1$.

(I) There is an isomorphism $\alpha\colon R(G)[\mathrm{cl}(\bar a)] \cong R(G)[\mathrm{cl}(\bar b)]$ such that $\alpha(\bar x) = \varphi(\bar x)$ and $\alpha(\bar a) = \bar b$.

Moreover, the isomorphisms $\alpha$ constructed below always extend $\varphi$, that is, $\alpha(u) = \varphi(u)$ for all $u \in V(R(G)[[X]])$. Initially, the invariant holds for $\ell = 0$ by choosing $\alpha = \varphi$. So suppose the current position of the game is $(\bar a, \bar b) \in V(R(G))^\ell \times V(R(G))^\ell$ where $\ell \le k+1$ such that the invariant (I) is satisfied. If Spoiler decides to remove a pair of pebbles, the invariant is clearly preserved by simply restricting the mapping $\alpha$ accordingly. So assume Spoiler wishes to play a new pair of pebbles. In this case $\ell \le k$.

Claim 1. For every unpebbled $v \in V$ with $N(v) \not\subseteq X$ there is some $w \in N(v)\setminus X$ which is neither pebbled nor fixed.

Proof. Consider the set $N(v)\setminus X$. Since $X$ is $d$-closed, it holds that $|N(v)\setminus X| \ge d+1$. By the assumption of the lemma, there are at most $d-k$ elements in $N(v)$ that are fixed. Thus, $N(v)\setminus X$ contains at least $k+1$ elements which are not fixed. Furthermore, there are at most $k$ vertices in $N(v)$ that are pebbled. Thus, there is at least one element that is neither pebbled nor fixed. ⌟

For each unpebbled $v \in V$ with $N(v) \not\subseteq X$ choose a vertex $w_v \in N(v)\setminus X$ that is neither pebbled nor fixed. Furthermore, let $T = \{w \in W \mid F(w) \subseteq \mathrm{cl}(\bar a) \wedge \alpha(a(w)) = b(w)\}$. For every $a \in M(V)\setminus \mathrm{cl}(\bar a)$ define
$$B_a := \begin{cases} A \triangle (T \cap N(v)) & \text{if } |T \cap N(v)| \text{ is even},\\ A \triangle (T \cap N(v)) \triangle \{w_v\} & \text{otherwise},\end{cases}$$
where $v \in V$ and $A \subseteq N(v)$ are such that $m_A(v) = a$, and $\triangle$ denotes the symmetric difference. Now Duplicator plays the bijection
$$f\colon V(R(G)) \to V(R(G))\colon a \mapsto \begin{cases} \alpha(a) & \text{if } a \in \mathrm{cl}(\bar a),\\ a & \text{if } a \in F(W)\setminus \mathrm{cl}(\bar a),\\ m_{B_a}(v) & \text{if } a \in M(V)\setminus \mathrm{cl}(\bar a).\end{cases}$$
Let $a \in V(R(G))$ be the vertex chosen by Spoiler.

Claim 2. There is an isomorphism $\alpha'\colon R(G)[\mathrm{cl}(\bar a, a)] \cong R(G)[\mathrm{cl}(\bar b, f(a))]$ such that $\alpha'(\bar x) = \varphi(\bar x)$ and $\alpha'(\bar a, a) = (\bar b, f(a))$.

Proof. For every $u \in \mathrm{cl}(\bar a)$ define $\alpha'(u) := \alpha(u)$. Consider the following distinction into three cases.

Case $a \in \mathrm{cl}(\bar a)$: In this case the claim immediately follows from Condition (I) since $f(a) = \alpha(a)$.

Case $a \in F(W)\setminus \mathrm{cl}(\bar a)$: Let $w \in W$ such that $a \in F(w)$. Then $\mathrm{cl}(\bar a, a) = \mathrm{cl}(\bar a) \cup \{a(w), b(w)\}$. In order to complete the definition of the function $\alpha'$, set $\alpha'(a(w)) = a(w)$ and $\alpha'(b(w)) = b(w)$. Since $F(w) \cap \mathrm{cl}(\bar a) = \emptyset$, it follows that $\mathrm{cl}(\bar a) \cap M(v) = \emptyset$ for all $v \in N(w)$ (otherwise $w$ would be fixed). Hence, $\alpha'\colon R(G)[\mathrm{cl}(\bar a, a)] \cong R(G)[\mathrm{cl}(\bar b, f(a))]$. The other two conditions are clearly satisfied.

Case $a \in M(V)\setminus \mathrm{cl}(\bar a)$: Let $v \in V$ such that $a \in M(v)$. Then
$$\mathrm{cl}(\bar a, a) = \mathrm{cl}(\bar a) \cup M(v) \cup \bigcup_{w \in N(v)} F(w).$$
In order to complete the definition of $\alpha'$, set

(i) $\alpha'(a(w)) = a(w)$ and $\alpha'(b(w)) = b(w)$ for all $w \in N(v)$ such that $w \ne w_v$ and $F(w) \cap \mathrm{cl}(\bar a) = \emptyset$,

(ii) $\alpha'(m_A(v)) = m_B(v)$, where
$$B = \begin{cases} A \triangle (T \cap N(v)) & \text{if } |T \cap N(v)| \text{ is even},\\ A \triangle (T \cap N(v)) \triangle \{w_v\} & \text{otherwise},\end{cases}$$
and

(iii) $\alpha'(a(w_v)) = a(w_v)$ and $\alpha'(b(w_v)) = b(w_v)$ if $|T \cap N(v)|$ is even, and $\alpha'(a(w_v)) = b(w_v)$ and $\alpha'(b(w_v)) = a(w_v)$ otherwise.

Note that $\alpha'$ is defined consistently because $w_v$ is neither pebbled nor fixed and therefore $F(w_v) \cap \mathrm{cl}(\bar a) = \emptyset$. It is easy to verify that $\alpha'\colon R(G)[\mathrm{cl}(\bar a, a)] \cong R(G)[\mathrm{cl}(\bar b, f(a))]$.
The other two conditions are clearly satisfied. ⌟

This completes the proof, since Duplicator does not lose the game in a position that satisfies (I).

4.2.4 Meager Graphs

Searching for graphs where applying the last lemma gives the desired results, we generalize the notion of an $\ell$-meager graph from [75].

Definition 4.2.11 (Meager Graphs). Let $G = (V,W,E)$ be a bipartite graph and let $0 < \alpha < 1$. The graph $G$ is $(\ell,\alpha)$-meager if for every $\emptyset \ne X \subseteq W$ with $|X| \le \ell$ it holds that $|N^{-1}(X)| < \alpha|X|$.

Meager graphs have two advantageous properties. The first property is that for sufficiently small $X \subseteq W$ the graph $R(G)[[X]]$ has many automorphisms. In combination with Lemma 4.2.10 this translates into finding many equivalent tuples, as desired.

Lemma 4.2.12. Let $G = (V,W,E)$ be $(\ell,\alpha)$-meager and $X \subseteq W$ with $|X| \le \ell$. Then $|\mathrm{Aut}(R(G)[[X]])| \ge 2^{(1-\alpha)|X|}$.

Proof. By Lemma 4.2.5, for $X \subseteq W$ with $|X| \le \ell$ we have that
$$|\mathrm{Aut}(R(G)[[X]])| \ge 2^{|X| - |N^{-1}(X)|} \ge 2^{(1-\alpha)|X|}.$$

The second advantageous property is that in a meager graph the size of the $d$-closure of a set $X$ is only by a constant factor larger than $|X|$ itself.

Lemma 4.2.13. Suppose $d \in \mathbb N$ and $d\alpha < 1$. Let $G = (V,W,E)$ be $(\ell,\alpha)$-meager and suppose $\emptyset \ne X \subseteq W$ with $|X| \le \ell(1-d\alpha) - d + 1$. Then $|\mathrm{cl}^d_G(X)| < \frac{1}{1-d\alpha}|X|$.

Proof. Let $X_0 \subsetneq \dots \subsetneq X_m$ be a sequence of sets such that $X_0 = X$ and $X_{i+1} = X_i \cup N(v_i)$ for some $v_i \in V$ with $|N(v_i)\setminus X_i| \le d$, and such that $X_m$ is $d$-closed. Clearly, for every $i \in [m]$ it holds that $|N^{-1}(X_i)| \ge i$. Suppose that $m \ge \frac{\alpha}{1-d\alpha}|X|$ and set $j = \lceil \frac{\alpha}{1-d\alpha}|X| \rceil$. Then
$$|X_j| \le |X| + dj \le \lfloor \ell(1-d\alpha)\rfloor - d + 1 + d\Bigl\lceil \tfrac{\alpha}{1-d\alpha}\,\ell(1-d\alpha)\Bigr\rceil \le \lfloor \ell(1-d\alpha)\rfloor - d + 1 + d\lceil \ell\alpha\rceil \le \ell + \lfloor -\ell d\alpha\rfloor - d + 1 + \lceil \ell d\alpha\rceil + d - 1 = \ell.$$
Hence, the meagerness is applicable to $X_j$. This means
$$j \le |N^{-1}(X_j)| < \alpha|X_j| \le \alpha(|X| + dj),$$
implying $j < \frac{\alpha}{1-d\alpha}|X|$. But this contradicts the definition of $j$. So $m < \frac{\alpha}{1-d\alpha}|X|$ and thus
$$|\mathrm{cl}^d_G(X)| = |X_m| \le |X| + dm < \frac{1}{1-d\alpha}|X|.$$
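The $d$-closure of Definition 4.2.8 can be computed as in the proof of Lemma 4.2.13: starting from $X$, keep adding neighborhoods $N(v)$ that have at most $d$ elements outside the current set until the set is $d$-closed. The Python sketch below adds all such neighborhoods of a round at once, which yields the same least $d$-closed superset because the attractor is monotone. It assumes a hypothetical encoding of $G$ as a dict mapping each $v \in V$ to the set $N(v)$, and it also includes a brute-force test of Definition 4.2.11 that enumerates all subsets of size at most $\ell$, so it is only meant for toy examples.

```python
from itertools import combinations


def attractor(N, X, d):
    """attr_d(X): X together with every N(v) having at most d elements outside X."""
    result = set(X)
    for nbrs in N.values():
        if len(nbrs - X) <= d:
            result |= nbrs
    return result


def d_closure(N, X, d):
    """cl^d_G(X): iterate the attractor from X until a d-closed set is reached."""
    X = set(X)
    while True:
        Y = attractor(N, X, d)
        if Y == X:  # X = attr_d(X), i.e. X is d-closed
            return X
        X = Y


def is_meager(N, W, ell, alpha):
    """Brute-force check of (ell, alpha)-meagerness; exponential in ell."""
    W = list(W)
    for size in range(1, ell + 1):
        for X in map(set, combinations(W, size)):
            inverse = sum(1 for nbrs in N.values() if nbrs <= X)  # |N^{-1}(X)|
            if inverse >= alpha * size:
                return False
    return True


N = {'v1': {1, 2, 3}, 'v2': {3, 4, 5}, 'v3': {5, 6, 1}}
print(d_closure(N, {1}, 1), d_closure(N, {1}, 2))
```

In this toy graph every $v$ has three neighbors, so from $X = \{1\}$ the 1-closure adds nothing while the 2-closure already covers all of $W$. Lemma 4.2.13 shows that, for meager graphs with $d\alpha < 1$, the closure of a small set can exceed it only by a constant factor.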
We now concern ourselves with the existence of meager graphs. However, we require several additional properties. Indeed, in light of Lemma 4.2.10, certain neighborhoods should be almost disjoint. Also, the graph $R(G)$ is supposed to have only few automorphisms, which by Lemma 4.2.7 translates into the matrix $A^*_G$ having large rank.

Theorem 4.2.14. There exists $r_0 \in \mathbb N$ such that for every $r \in \mathbb N$ with $r \ge r_0$ and every sufficiently large $n \in \mathbb N$ there is an $\bigl(\frac{n}{10r}, \frac{3}{r}\bigr)$-meager graph $G = (V,W,E)$ with

(I) $|V| = |W| = n$,

(II) $\deg(v) = r$ for all $v \in V$,

(III) $|N(v_1) \cap N(v_2)| < 3$ for all distinct $v_1, v_2 \in V$, and

(IV) $\mathrm{rk}_2(A^*_G) \ge (1 - 2^{-r})n$.

The proof of the theorem is based on the fact that bipartite expander graphs are meager.

Definition 4.2.15. Let $G = (V,W,E)$ be a bipartite graph with $|V| \ge |W|$. We call $G$ a $(\gamma,\beta)$-expander if for every $Y \subseteq V$ with $|Y| \le \gamma|V|$ it holds that $|N(Y)| \ge \beta|Y|$.

A typical method to obtain bipartite expanders is to consider the following random process. Let $r \in \mathbb N$ be a fixed number such that $r \ge 3$. Given (disjoint) vertex sets $V$ and $W$ with $|W| \ge 4r$ and $n := |V| = |W|$, one obtains a bipartite graph $G = (V,W,E)$ by choosing independently and uniformly at random, for every $v \in V$, a set of $r$ distinct neighbors in $W$. I refer to [146, Section 4] and [121, Chapter 5.3] for background on expander graphs, including variants of the following lemma.

Lemma 4.2.16. For $r$ sufficiently large it holds that
$$\Pr\Bigl(G \text{ is a } \bigl(\tfrac{1}{10r}, \tfrac{r}{2}\bigr)\text{-expander}\Bigr) \ge \frac{8}{9}.$$

A complete proof of this lemma is given in [124]. For the proof of Theorem 4.2.14 it remains to argue that the graphs obtained from the random process described above also satisfy Conditions (III) and (IV) with high probability.

Lemma 4.2.17. It holds that
$$\lim_{n\to\infty} \Pr\bigl(\exists v_1, v_2 \in V\colon v_1 \ne v_2 \wedge |N(v_1) \cap N(v_2)| \ge 3\bigr) = 0.$$

Proof. Let $r \ge 3$ be a fixed constant.
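Let $G = (V,W,E)$ be a bipartite graph obtained from the random process described above and let $n := |V| = |W|$. Let $p_n$ denote the probability that there are distinct $v_1, v_2 \in V$ such that $|N(v_1) \cap N(v_2)| \ge 3$. Since $r$ is a fixed number, there exists a constant $c_1 > 0$ such that
$$\binom{n}{r-3} \le c_1 \cdot n^{-3} \cdot \binom{n}{r}$$
for all natural numbers $n \ge r$. With this, the probabilities $p_n$ can be estimated by
$$\begin{aligned}
p_n &\le |V|^2 \sum_{s=0}^{r-3} \frac{\binom{|W|-r}{s}\binom{r}{r-s}}{\binom{|W|}{r}}
\;\overset{|W| \ge 4r}{\le}\; n^2 \sum_{s=0}^{r-3} \frac{\binom{n}{r-3}\binom{r}{r-s}}{\binom{n}{r}}\\
&\le n^2 \sum_{s=0}^{r-3} \frac{c_1 n^{-3}\binom{n}{r}\binom{r}{r-s}}{\binom{n}{r}}
= \frac{c_1}{n} \sum_{s=0}^{r-3} \binom{r}{r-s}
= \frac{c_2}{n}
\end{aligned}$$
for some constant $c_2 > 0$. Here, for fixed $v_1 \ne v_2$, the set $N(v_2)$ is a uniformly random $r$-subset of $W$ and $s$ counts its elements outside $N(v_1)$, and the second inequality uses $\binom{|W|-r}{s} \le \binom{n}{s} \le \binom{n}{r-3}$, which holds since $s \le r-3 \le n/2$. Hence, $p_n \to 0$ as $n \to \infty$.

Theorem 4.2.18 (cf. [32, Theorem 1.1]). For $n \ge k$ let $S_{n,k} = \{v \in \mathbb F_2^n \mid |\{i \in [n] \mid v_i = 1\}| = k\}$. Furthermore, let $A \in \mathbb F_2^{n\times n}$ be a random matrix whose rows are drawn uniformly and independently from $S_{n,k}$. There is a $K \in \mathbb N$ such that for every fixed $k \ge K$ it holds that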