Research Collection

Doctoral Thesis

Colourings of Graphs and Words

Author(s): Kamčev, Nina

Publication Date: 2018

Permanent Link: https://doi.org/10.3929/ethz-b-000282692

Rights / License: In Copyright - Non-Commercial Use Permitted


COLOURINGS OF GRAPHS AND WORDS

DISS. ETH No. 25147

Nina Kamčev


A dissertation submitted to attain the degree of Doctor of Sciences of ETH Zurich (Dr. sc. ETH Zurich)

presented by Nina Kamčev, MMath (Cantab), born on 7 March 1991, citizen of Croatia

accepted on the recommendation of Prof. Dr. B. Sudakov, examiner Prof. Dr. M. Axenovich, co-examiner

2018

Nina Kamčev: Colourings of Graphs and Words, © 2018

doi:

To my family and my Father.

soli deo gloria

ABSTRACT

Extremal graph theory is concerned with the extreme values of a graph parameter over various classes of graphs. Randomised constructions have played a major role in extremal combinatorics. This phenomenon acted as a catalyst for the development of probabilistic combinatorics and the theory of random graphs as independent research areas.

In the present thesis, we consider three graph parameters – anagram-chromatic number, rainbow connectivity and zero forcing number. Each of them is lower-bounded by a corresponding well-studied parameter (the Thue-chromatic number, diameter and minimum rank respectively). Our aim is to understand the hierarchy of graphs delineated by the parameter in consideration, and to highlight the role of random graphs as surprising or close-to-optimal examples.

Furthermore, we analyse a random graph process constrained to the König property. A graph is called König if the size of its maximum matching is equal to the order of its minimal vertex cover. We answer questions about the evolution and the final outcome of the process.

A common feature of our proofs is that parts of them parallel classical problems concerning the existence of relevant substructures in random graphs (Hamilton cycle, spanning tree, independent set, perfect matching). This allows us to rely on some well-known approaches.

The second part of the thesis contains two Ramsey-type results. A catchphrase of Ramsey theory is that ‘any sufficiently large structure contains a well-organised substructure’.

Let c be an edge-colouring of the complete n-vertex graph $K_n$. The problem of finding properly coloured and rainbow Hamilton cycles in c was initiated in 1976 by Bollobás and Erdős and has been extensively studied since then. We study the problem in a more general setting, giving sufficient local (resp. global) restrictions on the colourings which guarantee a properly coloured (resp. rainbow) copy of a given hypergraph G. We also look at multipartite analogues of these questions.

In our final chapter, the ‘well-organised substructure’ of interest is a combinatorial line in $[m]^n$. The Hales–Jewett theorem states that for any m and r there exists an n such that any r-colouring of the elements of $[m]^n$ contains a monochromatic combinatorial line. We look more closely at the obtained combinatorial line and prove tight results about the structure of its active coordinate set.

SUMMARY (ZUSAMMENFASSUNG)

Extremal graph theory deals with the extreme values of a graph parameter over various classes of graphs. Randomised constructions play an important role in extremal combinatorics. This phenomenon acted as a catalyst for the development of probabilistic combinatorics and the theory of random graphs as independent research areas.

In the present thesis we consider three graph parameters – the anagram-chromatic number, the rainbow diameter and the zero forcing number. Each of them has a corresponding, already well-studied parameter (the Thue-chromatic number, the diameter and the minimum rank) as a lower bound. Our aim is to understand the hierarchy of graphs described by the parameter under consideration, and to highlight the role of random graphs as surprising or nearly optimal examples.

In addition, we analyse a random graph process restricted to the König property. A graph is called a König graph if the size of its maximum matching equals the cardinality of its minimal vertex cover. We answer questions about the evolution and the final outcome of the process.

Our proofs have in common that parts of them are variants of classical problems concerning the existence of relevant substructures in random graphs (Hamilton cycle, spanning tree, independent set, perfect matching). This allows us to draw on some well-known approaches.

The second part of the thesis contains two Ramsey-theoretic results. A catchphrase of Ramsey theory is that ‘every sufficiently large system contains a well-structured subsystem’.

Let c be an edge-colouring of the n-clique $K_n$. The problem of finding properly coloured and rainbow Hamilton cycles in c was initiated in 1976 by Bollobás and Erdős and has been studied extensively since. We study the problem in a more general hypergraph setting. We find sufficient local (resp. global) conditions on the colourings that guarantee a properly coloured (resp. rainbow) copy of a given hypergraph G. In addition, analogous questions for multipartite graphs are discussed.

In the final chapter, the ‘well-structured subsystem’ of interest is a combinatorial line in $[m]^n$. The Hales–Jewett theorem states that for every m and every r there exists an n such that every r-colouring of the elements of $[m]^n$ contains a monochromatic combinatorial line. We examine the obtained combinatorial line more closely and obtain optimal results on the structure of its set of active coordinates.

ACKNOWLEDGEMENTS

My family and my grandparents have provided unfailing support throughout my education and time abroad. I would like to thank Benny Sudakov for the opportunities and guidance during my doctoral studies.

My PhD siblings and colleagues – Dániel Korándi, Pedro Vieira, Matthew Kwan, Roman Glebov, Jan Volec, Igor Balla, Felix Dräxler, Alexey Pokrovskiy, Jon Noel, Matija Bucić, Shoham Letzter, Tuan Tran, Domagoj Ćevid, Seraina Wachter, Ana Canas da Silva, Meike Akveld, Cornelia Busch, Christa Lachmuth, Maria Hempel, Zur Luria, Frank Mousset, Rajko Nenadov, Mohsen Ghaffari, Alex Puttick and a number of others – have made my time at ETH enjoyable.

A number of people from the combinatorics community have inspired, encouraged and supported me in my research, including László Babai, Béla Bollobás, David Conlon, Jacob Fox, Stefanie Gerke, Thomas Kalinowski, Michael Krivelevich, Imre Leader, Anita Liebenau, Tomasz Łuczak, Natasha Morrison, Christoph Spiegel, Simone Severini, Oriol Serra and Goran Žužić.

The template was created by Tino Wagner, saving me a lot of time and frustration.

Last, but not least, I have always been able to rely on my friends – Hannah and Sven Eggimann, Sarah Gales, Zofia Jackson, Samuel Leder, Hannelore Leder, Steve Matsumoto, Sophia Michael, Yining Nie, Eva Waldispühl, Remi Tobler and the iCafé Zurich group. This list is bound to be incomplete. I extend my gratitude to those omitted.

CONTENTS

1 Introduction

2 Anagram-free graph colourings
  2.1 Specific families of graphs
  2.2 Bounded-degree graphs

3 Rainbow connectivity
  3.1 Edge rainbow connectivity
  3.2 Vertex rainbow connectivity

4 Zero forcing number
  4.1 Graphs with forbidden subgraphs
  4.2 The random graph
  4.3 Spectral bounds

5 The König graph process
  5.1 Overview of Proof
  5.2 Preliminaries and Probabilistic Tools
  5.3 Forming a large matching
  5.4 The structure of $G_N$
  5.5 Rigidity and uniqueness of an optimal cover
  5.6 Delayed perfect matching threshold
  5.7 Open Questions

6 Bounded colourings of multipartite graphs and hypergraphs
  6.1 Lovász local lemma and the Lu-Székely framework for random injections
  6.2 Embedding m-partite graphs
  6.3 Embedding bounded-degree hypergraphs

7 Intervals in the Hales–Jewett theorem

A Multidimensional Lu-Székely

Bibliography

NOTATION

frequently used symbols

symbol          meaning
$\mathbb{N}$    natural numbers
$[n]$           the set $\{1, 2, \dots, n\}$
$S^{(k)}$       family of subsets of S of order k
$\log$          logarithm base e
$V(G)$          vertex set of G
$E(G)$          edge set of G
$\delta(G)$     minimum degree
$\Delta(G)$     maximum degree
$\deg_G(v)$     degree of v
$N_G(v)$        neighbours of v (excluding v)
$N_G(S)$        $\left(\bigcup_{v \in S} N_G(v)\right) \setminus S$
$G[S]$          the subgraph of G induced on S
$u \sim v$      u and v are adjacent
$E_G(S)$        edges of G with both endpoints in S
$e_G(S)$        $|E_G(S)|$
$E_G(S, T)$     edges of G with one endpoint in S and one in T
$e_G(S, T)$     $|E_G(S, T)|$
$K_n^{(r)}$     the complete r-uniform graph on n vertices; r = 2 if suppressed
$K_{a,b}$       the complete bipartite graph with parts of order a and b
$C_n$           the n-vertex cycle
$G(n, m)$       random graph with m edges
$G(n, p)$       binomial random graph with edge probability p
$G_{n,d}$       random d-regular graph

1 INTRODUCTION

Extremal graph theory in its broad sense studies the relations between different graph properties, often with the objective of maximising a specific graph parameter over an appropriate class of graphs. Properties which are often considered are connectivity, (minimum, maximum or average) degree, clique number, chromatic number, diameter, Hamiltonicity etc.

A common theme across several chapters of this thesis is the crucial role played by random graphs and related randomised constructions in extremal problems. Hence we start by reviewing some classical results in this direction, a brief history of random graph models and our contribution to the developments (Chapters 2–5). Then we introduce our final two results and put them in the context of Ramsey theory. Many terms in the introduction are left undefined to retain the flow. We hope that they are either commonly known or inessential to the main discussion, and introduce them in the relevant chapters. The graphs in the thesis are simple and undirected.

The chromatic number of a graph is defined to be the minimum number of colours needed to colour the vertices of the graph in such a way that no two adjacent vertices are assigned the same colour. This problem has a simple real-world counterpart – the vertices of the graph can represent different examinations, and adjacent examinations should not clash. The chromatic number is then just the number of time slots needed to schedule all the examinations, avoiding the prescribed clashes.

A very basic extremal question is: what is the largest possible chromatic number over all graphs with maximum degree ∆? A moment’s thought reveals the answer ∆ + 1 and triggers a number of further questions. As a next step, we might ask what the maximum chromatic number is over all graphs not containing a clique (complete r-vertex graph) $K_r$. After all, containing an r-clique clearly requires us to use at least r colours. The answer is, in a sense, ‘infinity’.
Using a probabilistic construction, Erdős [65] showed that for any r, there are in fact graphs with an arbitrarily large chromatic number which contain no cycles shorter than r. This is surprising since a graph with no short cycles locally looks like a tree and is therefore locally two-colourable.

Ramsey’s theorem is another fundamental result, saying that for any s and t, there exists a number n such that any n-vertex graph contains a clique of order s or an independent set of order t. The minimum number n for which the conclusion holds is called the Ramsey number, denoted r(s, t). A simple lower-bound ‘example’ for the diagonal case r(s, s), given by Erdős [64], is a graph G chosen uniformly at random among all n-vertex graphs. Namely, with positive probability (in fact, probability approaching 1), the sampled graph G contains no clique or independent set of order $2\lceil \log_2 n \rceil$. Although the quest of improving this lower (as well as the upper) bound has sparked a number of ingenious ideas, the constant next to the logarithm has not changed since 1947.

These are just two examples of a remarkable phenomenon – randomised objects often exhibit surprising and counter-intuitive properties. The phenomenon was also observed in the geometry of Banach spaces, Fourier analysis and number theory, to name but a few [27]. Another common feature is that explicit constructions are often complex and found significantly later, if at all. The first probabilistic argument in combinatorics is attributed to Szele [141]. Triggered by the work of Erdős, probabilistic constructions, algorithms and methods have shaped the emerging areas of discrete mathematics and theoretical computer science.

The systematic modern study of random graphs began in 1959 with two seminal papers by Erdős and Rényi [59] and Gilbert [89]. This work has since grown into a well-established research area with many important applications in theoretical computer science, statistical physics, and other branches of mathematics [28, 82, 103]. Erdős and Rényi introduced G(n, m), a graph chosen uniformly at random among all n-vertex graphs with m edges. One is usually interested in the existence of a specific substructure in G(n, m). Connectivity [59, 60, 89] and perfect matchings [61–63] were the first properties of interest.
Erdős and Rényi have shown that with high probability,¹ if G(n, m) has no isolated vertices, then it is connected and has a perfect matching. This surprising pattern is a recurring feature of many graph properties (e.g. Hamiltonicity), with the necessary minimum-degree requirement. For an overview of these early developments, we refer the reader to [106]. Other classical properties are containing a fixed subgraph H, diameter, order of the largest independent set, chromatic number etc.

¹ We say that an event in the probability space G(n, m) holds with high probability if its probability tends to 1 as n tends to infinity.

Another model of interest, especially when it comes to global properties such as connectivity, is the random regular graph $G_{n,d}$, chosen uniformly at random out of all n-vertex d-regular graphs. In the sparse case, when d is a constant, $G_{n,d}$ behaves rather differently from G(n, m). For instance, $G_{n,3}$ is already connected, has diameter logarithmic in n and contains a Hamilton cycle (a cycle spanning all n vertices) with high probability. We reiterate that this is in contrast with G(n, m), which becomes connected at average degree asymptotic to log n, as that is where the isolated vertices vanish with high probability. A major feature of $G_{n,d}$ is that it is an expander. Expanders are ‘highly-connected’ bounded-degree graphs with a number of applications in computer science and pure mathematics [126]. Pinsker [132] showed that for a large constant d, almost all n-vertex d-regular graphs are expanders. Lubotzky [126] reports that the first explicit construction [127] utilised state-of-the-art tools from representation theory and that most contemporary constructions still come from group theory and analysis, affirming the aforementioned limited supply of constructive examples. In Chapters 2 and 3, we use random regular graphs precisely as examples of bounded-degree graphs with strong expansion properties.

A notable variation on the Erdős–Rényi random graph is the class of constrained graph processes. Given a decreasing graph property P, one greedily constructs a random graph satisfying P. Motivation for studying these processes comes from both theoretical considerations and potential applications in extremal combinatorics. For instance, the bounded-degree process was introduced in [135] with the aim of defining and analysing a viable probability measure on regular graphs. On the other hand, the triangle-free process and other randomised constructions have provided a series of improvements on the lower bound for the Ramsey number r(3, n) [20, 23, 66, 93, 110].

Another class of graphs we investigate are pseudorandom graphs, graphs with a homogeneous edge distribution resembling the random graph. A recent trend in combinatorics is to transfer results for random graphs to pseudorandom graphs.
The survey [113] contains many examples of this type. Among several possible definitions, we choose the one based on spectral properties of a graph, which allows us to use another powerful set of tools – algebraic methods. It turns out that algebraic properties of the adjacency matrix of a graph, and in particular its spectrum, give us information on its combinatorial properties such as connectivity and edge distribution.

extremal problems in random graphs

We turn to the results of this thesis. Chapter 2 is a study of anagram-free graph colourings, motivated by a question of Alon et al. [6] and open problems in combinatorics on words. In 1906, Thue constructed an infinite sequence on the alphabet {1, 2, 3} which contains no repetitions – consecutive coinciding segments such as 11321132. This surprising property can be seen as a negative Ramsey-type result. In light of the previous discussion, it is not surprising that the existence of such a sequence on 10 digits can also be proved using probabilistic tools. In fact, Alon et al. have used the Local lemma to show that in graphs of maximum degree ∆, repetitions on any simple path can be avoided using $O(\Delta^2)$ colours.

A sequence S is called anagram-free if it contains no consecutive letters $r_1 r_2 \dots r_k r_{k+1} \dots r_{2k}$ such that $r_{k+1} \dots r_{2k}$ is a permutation of $r_1 \dots r_k$. Answering a question of Brown and Erdős, Keränen constructed an anagram-free sequence on four digits [109]. This definition is analogously extended to graph colourings – given a graph G, we define its anagram-chromatic number $\pi_\alpha(G)$ as the minimal number of colours needed to colour the vertices of G, avoiding an anagram on any simple path. In Chapter 2, we obtain bounds on $\pi_\alpha(G)$ for several interesting classes of graphs, all of them in stark contrast with the corresponding bounds when only repetitions are forbidden. The most striking result is that for the random d-regular graph, $\pi_\alpha(G_{n,d}) \sim n$ with high probability, contrasting the $O(d^2)$-bound of Alon et al. The basic idea for this result comes down to finding Hamilton cycles in induced subgraphs of $G_{n,d}$ of order as small as $O(n \log d / d)$. In realising this plan, we combine a variety of tricks pioneered in the above-mentioned classical problems, for instance different ways of generating $G_{n,d}$, its expansion properties, Pósa’s rotation-extension technique and various probabilistic tools.

Rainbow (edge- or vertex-) connectivity is a graph invariant intimately related to the diameter of the graph and motivated by problems in communication.
In Chapter 3, we propose a new and rather simple approach to studying rainbow connectivity. Using this idea, we give a unified proof of several results, including upper bounds for expanders and the random regular graph.

Chapter 4 concerns the zero forcing process, a propagation process on a graph. A subset S of initially infected vertices of a graph G is called zero forcing if we can infect the entire graph by iteratively applying the following propagation rule. At each step, any infected vertex which has a unique uninfected neighbour infects this neighbour. The zero forcing number of G is the minimal order of a zero forcing set in G. It was introduced independently as a bound for the minimum rank of a graph [2], and as a criterion for controllability of quantum spin systems [36]. The zero forcing number has turned out to be very useful as a bound on the minimum rank for specific graph classes since it is a purely combinatorial parameter. One result of Chapter 4 is a precise computation of the zero forcing number of the random graph G(n, p), implying that for most graphs it is actually asymptotic to n and far from the minimum rank. This refutes the intuition gained by studying specific graph classes (e.g. in [34], [2]).

In Chapter 5, we consider a random graph process constrained to the König property. Say that a graph G is a König graph if the size of its maximum matching is equal to the order of a minimal vertex cover. We analyse the behaviour of this process, focusing mainly on two questions: what can be said about the structure of the final graph, and for which m will $G_m$ contain a perfect matching?

ramsey-type problems

In the final two chapters, we go back to Ramsey theory. Its roots are in several diffuse results such as theorems of Hilbert, Schur and van der Waerden concerning colourings of the integers, and Ramsey’s theorem concerning edge-colourings of hypergraphs [91]. They follow the common theme that every sufficiently large system contains a large subsystem possessing additional structure. In the late 20th century, Ramsey theory emerged as a cohesive subdiscipline of discrete mathematics. Graham, Rothschild and Spencer [91] emphasise the central role of the Hales-Jewett theorem in this development. Our research can be traced back to the Canonical Ramsey theorem and the Hales-Jewett theorem respectively.

The guiding question of Chapter 6 is, given an n-vertex graph G and an edge-colouring c of $K_n$, under what conditions (on G and the colouring) can we find a rainbow or properly-coloured copy of G in c? Note that we are looking to embed a spanning subgraph G, which is a well-known and difficult class of problems. The question was first considered when G is a Hamilton cycle by Bollobás and Erdős [29], and later extended to general graphs G, as well as hypergraphs. However, if the host graph is $K_{n,n}$, the complete bipartite graph, then finding a rainbow matching in a given edge-colouring is equivalent to a much older problem of finding a Latin transversal in a given Latin square. This motivated us to look at the multipartite version of the original problem, where both the host graph and the graph G to be embedded are m-partite. We find optimal conditions under which the embedding is possible. Our bounds exhibit a surprising transition in the rate of growth, showing that the problem is fundamentally different in the regimes $\Delta(G) \ll m$ and $\Delta(G) \gg m$. The general method also allows us to take on the hypergraph analogue of the problem, which was so far only studied when G is an r-uniform cycle.

Using the Lu-Székely framework for the Lovász local lemma, we are able to show that actually a random embedding of the vertices of G into the host graph is rainbow / properly-coloured with positive probability. Incidentally, the Lopsided local lemma was first formulated exactly in the context of Latin transversals in Latin squares where no symbol appears too often [70]. It is a tool which allows one to consider a class of bad events $B_i$ in a probability space and conclude that if those events are ‘close to independent’, they can be avoided simultaneously, i.e. $\mathbb{P}\left[\bigwedge_i \overline{B_i}\right] > 0$. What is remarkable about the Local lemma is that it can be used to detect an event with exponentially small probability, ‘a needle in a haystack’. A series of results have even provided algorithmic versions of increasing strength, culminating in the recent breakthrough of Moser and Tardos [129]. An indication of the ubiquity of the lemma is that it also features as a tool in Chapters 2 and 3.

The Hales–Jewett theorem, the subject of Chapter 7, says very informally that the n-dimensional, multi-player, m-in-a-row generalisation of a game of tic-tac-toe cannot end in a draw, provided that the dimension n of the board is sufficiently high. In other words, for any m and r there exists an n such that any r-colouring of the elements of $[m]^n$ contains a monochromatic combinatorial line. It is often referred to as the ‘pure version’ of van der Waerden’s theorem, which asserts that any r-colouring of the integers contains a monochromatic arithmetic progression of arbitrary length m. Shelah’s proof of the Hales-Jewett theorem [139] was a breakthrough, giving the first primitive recursive bound, meaning that it avoids double induction on m and r.
As a step towards understanding how much room for improvement this proof strategy leaves, we look more closely at the structure of the obtained monochromatic combinatorial line. We exhibit the colourings certifying that, in our sense, the structure given by Shelah’s proof is optimal.
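The statement of the Hales–Jewett theorem can be explored by brute force on very small boards. The following is a minimal sketch, not taken from the thesis. It assumes the standard definition of a combinatorial line: fix a template in $([m] \cup \{*\})^n$ with at least one star and substitute each value of [m] for all stars simultaneously; the star positions form the active coordinate set. The function names are illustrative.

```python
from itertools import product


def combinatorial_lines(m, n):
    """Yield every combinatorial line of [m]^n as a tuple of m points.

    A line comes from a template over [m] plus the wildcard "*" that
    contains at least one wildcard (the active coordinates).
    """
    symbols = list(range(1, m + 1)) + ["*"]
    for template in product(symbols, repeat=n):
        if "*" not in template:
            continue
        yield tuple(
            tuple(x if t == "*" else t for t in template)
            for x in range(1, m + 1)
        )


def every_colouring_has_line(m, n, r):
    """Check by exhaustive search whether every r-colouring of [m]^n
    contains a monochromatic combinatorial line (tiny cases only)."""
    points = list(product(range(1, m + 1), repeat=n))
    lines = list(combinatorial_lines(m, n))
    for colours in product(range(r), repeat=len(points)):
        colour = dict(zip(points, colours))
        if not any(len({colour[p] for p in line}) == 1 for line in lines):
            return False  # this colouring avoids all monochromatic lines
    return True
```

For m = r = 2 the search confirms that dimension 1 is too small while dimension 2 suffices, and that the ordinary 3 × 3 board (m = 3, n = 2) still admits a ‘draw’. The running time is exponential in $m^n$, so the sketch is only meant for such toy cases.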

notation, conventions and publications

The chapters of this cumulative thesis are based on the papers listed on page 149. Standard graph theoretic notation is listed in the front matter. Recall that an event $A_n$ in a sequence of probability spaces $\Omega_n$ holds with high probability if $\mathbb{P}[A_n] \to 1$ as $n \to \infty$. In a slight abuse of notation, we sometimes use, say, G(n, m) to denote the probability space, as well as a random graph sampled from the space. We use standard asymptotic notation (see, e.g., [12]) for the asymptotic behaviour of the relative order of magnitude of two sequences, depending on a parameter $n \to \infty$. Floor and ceiling signs are mostly omitted for the sake of clarity of presentation. We write log for the base-e logarithm unless the base is specified.

The preliminaries are laid out in the respective chapters, as most of the existing results we use are rather specific to the problem. The only common preliminary results are concentration inequalities, the Local lemma and results on random regular graphs presented in Section 2.2.2.1.

2 ANAGRAM-FREE GRAPH COLOURINGS

The study of non-repetitive colourings was initiated by a famous result of Thue [143] from 1906. He showed that there exists an infinite sequence S on an alphabet of three symbols in which no two adjacent blocks (of any length) are the same. In other words, S contains no sequence of consecutive symbols $r_1 r_2 \dots r_{2n}$ with $r_i = r_{i+n}$ for all $i \le n$. Note that it is not a priori obvious that the minimal size of the alphabet necessary for an infinite non-repetitive sequence is even finite. Thue’s result is interesting in its own right, but it also has influential and surprising applications, the most famous one probably occurring in a solution to the Burnside problem for groups by Novikov and Adjan [130]. Thue-type problems led to the development of combinatorics on words, a new area of research with many interesting connections and applications.

Generalisations of Thue’s result occurred in two directions. Firstly, the setting has been changed from sequences to, e.g., the real line, the lattice $\mathbb{Z}^n$, or graphs. Secondly, repetitions as a forbidden structure can be replaced by anagrams, sums, patterns etc. For a formal treatment and references to these problems, we refer the reader to the survey of Grytczuk [95].

Here we focus on graph colourings, and the structures we are avoiding are anagrams. Recall that an anagram is a sequence $r_1 r_2 \dots r_n r_{n+1} \dots r_{2n}$ whose second block, $r_{n+1} \dots r_{2n}$, is a permutation of $r_1 r_2 \dots r_n$. A long-standing open question of Erdős [67] and Brown [33] was whether there exists a sequence on {0, 1, 2, 3} containing no anagrams. We call such sequences anagram-free. It is easy to check that no such sequence on three symbols exists. In 1968 Evdokimov [73] showed that the goal can be achieved with 25 symbols, which was the first finite upper bound. Later Pleasants [133] and Dekking [52] lowered this number to five. Finally, Keränen [109] constructed arbitrarily long anagram-free sequences on four symbols using Thue’s technique of iterated substitutions – given a finite anagram-free sequence S on symbols {0, 1, 2, 3}, we can replace each symbol by a longer word on the same alphabet in a way that yields a new, longer anagram-free sequence. This answered the question of Erdős and Brown, but at the same time opened new avenues for further studies; some of them can be found in [95].
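The remark that three symbols do not suffice can be confirmed by a short exhaustive search. The following is a minimal sketch rather than anything from the thesis; the function names and the `limit` safeguard are illustrative choices. It grows words one symbol at a time and discards an extension as soon as it ends in an anagram.

```python
def ends_with_anagram(word):
    """Return True if some suffix of `word` splits into two adjacent blocks
    that are permutations of each other."""
    n = len(word)
    for half in range(1, n // 2 + 1):
        first = word[n - 2 * half:n - half]
        second = word[n - half:]
        if sorted(first) == sorted(second):
            return True
    return False


def longest_anagram_free(alphabet_size, limit=30):
    """Length of the longest anagram-free word found by depth-first search.

    The search never looks beyond `limit` symbols. This is only a safeguard:
    on four or more symbols arbitrarily long anagram-free words exist, so
    the search should not be run there.
    """
    best = 0
    stack = [()]
    while stack:
        word = stack.pop()
        best = max(best, len(word))
        if len(word) == limit:
            continue
        for symbol in range(alphabet_size):
            extended = word + (symbol,)
            # `word` is anagram-free, so only suffixes of the extension
            # can contain a new anagram.
            if not ends_with_anagram(extended):
                stack.append(extended)
    return best
```

On two symbols the search stops at length 3, and on three symbols at length 7 (a word such as 0102010 is anagram-free, but it cannot be extended), so every sufficiently long ternary sequence contains an anagram.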

Bean, Ehrenfeucht and McNulty [18] have studied the problem of non-repetitive colourings in a continuous setting. A colouring of the real line is called square-free if no two adjacent intervals of the same length are coloured in the same way. More precisely, for any intervals I = [a, b] and J = [b, c] of the same length L > 0, there exists a point x ∈ I whose colour is different from that of x + L. In [18], they showed that there exist square-free two-colourings of the real line. The problem of avoiding anagrams also has a continuous variant. Alon, Grytczuk, Lasoń and Michałek [7] have proved that there exists a measurable 4-colouring of the real line such that no two adjacent segments contain equal measure of every colour.

Alon et al. [6] proposed another variation on the non-repetitive theme. Let G be a graph. A vertex colouring c : V(G) → C is called non-repetitive if any path in G induces a non-repetitive sequence. Define the Thue number π(G) as the minimal number of colours in a non-repetitive colouring of G. It is easy to see that this number is a strengthening of the classical chromatic number, as well as the star-chromatic number. It turns out that the Thue number is bounded for several interesting classes of graphs, e.g. $\pi(P_n) \le 3$ for a path $P_n$ of length n (directly from Thue’s Theorem), and $\pi(T) \le 4$ for any tree T. Using the Lovász local lemma, the authors of [6] showed that $\pi(G) \le c\,\Delta(G)^2$, where c is a constant and ∆(G) denotes the maximum degree of G. They also found a graph G with $\pi(G) \ge \frac{c'\Delta^2}{\log \Delta}$. Closing the above gap remains an intriguing open question. Another interesting problem is to decide if the Thue number of planar graphs is finite. A survey of Grytczuk [94] lays out some progress in this direction, as well as numerous related questions on non-repetitive graph colourings.

The investigation of anagram-free colourings of graphs, which we do here, was suggested in [6]. Let c : V(G) → C be a vertex colouring of a graph G. Two vertex sets $V_1$ and $V_2$ have the same colouring if they have the same number of occurrences of each colour, i.e. $|c^{-1}(a) \cap V_1| = |c^{-1}(a) \cap V_2|$ for each a ∈ C. An anagram is a path $v_1 v_2 \dots v_{2n}$ in G whose two segments $v_1 \dots v_n$ and $v_{n+1} \dots v_{2n}$ have the same colouring. We denote the minimum number of colours in an anagram-free colouring of G by $\pi_\alpha(G)$, and call it the anagram-chromatic number of G. Clearly $\pi_\alpha(G) \le n$ for any n-vertex graph G.

The result of Keränen [109] states that $\pi_\alpha(P_n) \le 4$ for a path $P_n$ of length n, so it is only natural to ask what $\pi_\alpha$ is for other families of graphs. It turns out that as soon as we move on from paths, the situation becomes very different. We first show that the anagram-chromatic number of a binary tree already increases with the number of vertices.
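For very small graphs the definition of $\pi_\alpha(G)$ can be evaluated directly. The sketch below is an illustration under our own naming (`has_anagram_path`, `anagram_chromatic_number`) and is not part of the thesis. It enumerates all simple paths by depth-first search, compares the colour counts of the two halves of every path with an even number of vertices, and then tries colourings with an increasing number of colours.

```python
from collections import Counter
from itertools import product


def has_anagram_path(adj, colour):
    """Return True if some simple path v1 ... v2n in the graph `adj`
    (a dict mapping each vertex to its neighbours) has two halves with
    the same number of occurrences of each colour."""

    def extend(path, on_path):
        if len(path) % 2 == 0:
            half = len(path) // 2
            left = Counter(colour[v] for v in path[:half])
            right = Counter(colour[v] for v in path[half:])
            if left == right:
                return True
        for w in adj[path[-1]]:
            if w not in on_path:
                path.append(w)
                on_path.add(w)
                found = extend(path, on_path)
                path.pop()
                on_path.remove(w)
                if found:
                    return True
        return False

    # Every simple path is reached from its first vertex.
    return any(extend([v], {v}) for v in adj)


def anagram_chromatic_number(adj):
    """Smallest number of colours in an anagram-free colouring, found by
    exhaustive search (exponential time, tiny graphs only)."""
    vertices = list(adj)
    for k in range(1, len(vertices) + 1):
        for colours in product(range(k), repeat=len(vertices)):
            if not has_anagram_path(adj, dict(zip(vertices, colours))):
                return k
    return len(vertices)
```

As a usage example, take the 4-cycle `{0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}`. Its proper 2-colouring 0, 1, 0, 1 contains an anagram on the path through all four vertices, whereas the colouring 0, 1, 0, 2 is anagram-free, so the search returns 3.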

Proposition 2.1. Let $T_h$ be a perfect binary tree of depth h, i.e. every non-leaf has two children and there are $2^h$ leaves, all at distance h from the root. Then
$$\sqrt{\frac{h}{\log_2 h}} \le \pi_\alpha(T_h) \le h + 1.$$

It follows that the anagram-chromatic number of planar graphs is also unbounded, but it is still interesting to determine how quickly it increases with the number of vertices. We observe that in dealing with a family of graphs which admits small separators (such as H-minor-free graphs), this fact can be used to bound $\pi_\alpha(G)$ from above.

Proposition 2.2. Let h ≥ 1 be an integer, and let H be a graph on h vertices. Any n-vertex graph G with no H-minor satisfies $\pi_\alpha(G) \le 10\, h^{3/2} n^{1/2}$.

We are particularly interested in anagram-free colourings of graphs of bounded degree. By considering the random regular graph $G_{n,d}$ we show that, surprisingly, there are graphs of bounded degree such that to avoid anagrams we essentially need to give every vertex a separate colour. We say that an event in $G_{n,d}$ holds with high probability (whp) if its probability tends to 1 as n tends to infinity over the values of n for which nd is even (so that $G_{n,d}$ is non-empty). Our main result can be stated as follows.

Theorem 2.3. There exists a constant C such that for sufficiently large d, with high probability, the random regular graph $G_{n,d}$ satisfies
$$\left(1 - \frac{C \log d}{d}\right) n \le \pi_\alpha(G_{n,d}) \le \left(1 - \frac{\log d}{d}\right) n.$$

We start this chapter with some observations on the anagram-chromatic number for trees and minor-free graphs. Then we give the proof of Theorem 2.3. We conclude with some open questions.

2.1 specific families of graphs

2.1.1 Bounds for trees

A binary tree is a tree in which every vertex has at most two children. Let Th be a perfect binary tree of depth h ≥ 1, that is to say that every non-leaf has two children and there are $2^h$ leaves, all at distance h from the root. The root is taken to be at depth 0, so a tree consisting of one vertex has depth 0. Colouring each vertex of Th by its distance from the root shows that πα(Th) ≤ h + 1. In the following section, we will argue that actually any n-vertex tree can be anagram-free coloured with 2 log n colours. Proposition 2.1 asserts the lower bound $\pi_\alpha(T_h) \ge \sqrt{h / \log_2 h}$, which will be proven in this section.

Let T be a vertex-coloured rooted binary tree and let U be a subtree of T. The effective vertices of U are its root (i.e. the vertex of U of the smallest depth), leaves, and vertices of degree three. The effective depth of U is set to h1, where h1 + 1 is the minimum number of effective vertices on any path from the root to a leaf. In other words, if we contract all the internal degree-two vertices of U, h1 is the minimal distance from the root to a leaf of the resulting tree. Note that if U has effective depth h1, then it has at least $2^{h_1}$ leaves. We say that U is essentially monochromatic if all its effective vertices carry the same colour.

We will use a Ramsey-type argument to find a large essentially monochromatic subtree of a given tree. In the statement below H(a1, a2, ..., ad) denotes the minimal number h for which any perfect binary tree T of depth h whose vertices are coloured using colours 1, 2, ..., d contains an essentially i-coloured subtree of effective depth ai, for some i ∈ [d].

Lemma 2.4. H(a1, a2, ..., ad) ≤ a1 + ··· + ad.

Proof. We use induction on $\sum_{i=1}^{d} a_i$. The base case is a1 = ··· = ad = 0, for which the claim clearly holds. Let T be a perfect binary tree of depth a1 + ··· + ad. Suppose that its root v has the colour 1, and call its children vL and vR. Consider the subtrees TL and TR of depth at least a1 + ··· + ad − 1 rooted at vL and vR respectively. If for some i ≥ 2, TL contains an essentially i-coloured subtree of effective depth ai, we are done. The same holds for TR. Otherwise, using the induction hypothesis, TL and TR contain essentially 1-coloured subtrees of effective depth a1 − 1. Those two subtrees, together with the root v, form an essentially 1-coloured subtree of T, as required.

Proof of Proposition 2.1. Let Th be coloured using $d < \sqrt{h / \log_2 h}$ colours. By Lemma 2.4, it contains an essentially monochromatic subtree U of effective depth h/d. Let u be the root of U, and suppose U is essentially red. There are at least $2^{h/d}$ paths from u to the leaves, and the colouring of each path is a multiset of order at most h + 1. On the other hand, there are at most $h^d$ such multisets. Since $h^d < 2^{h/d}$ for our choice of d, there is a multiset which occurs on two different paths, say P1 and P2. Let v be the lowest common vertex of P1 and P2, and let ℓ1 and ℓ2 be their respective leaves. By construction of U, the vertices v, ℓ1 and ℓ2 are red. Hence the segments from ℓ1 to v, excluding v, and from v to ℓ2, excluding ℓ2, have the same colouring. We conclude that the given colouring of Th, even restricted to U, contains an anagram.
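The upper bound πα(Th) ≤ h + 1 from the beginning of this section can be confirmed by brute force for small h. The sketch below is an illustration under our own conventions (heap indexing of the tree, with vertex v having children 2v and 2v + 1; none of the names come from the thesis). It enumerates every path of Th and checks that colouring by depth leaves no anagram.

```python
from collections import Counter
from itertools import combinations


def tree_path(u, v):
    """Vertices on the unique u-v path of the heap-indexed binary tree."""
    up_from_u, up_from_v = [], []
    while u != v:
        # The larger index is never closer to the root, so lift that one.
        if u > v:
            up_from_u.append(u)
            u //= 2
        else:
            up_from_v.append(v)
            v //= 2
    return up_from_u + [u] + up_from_v[::-1]


def is_anagram(colours):
    """True if the colour sequence has even length and equal halves as multisets."""
    n = len(colours)
    return n % 2 == 0 and Counter(colours[:n // 2]) == Counter(colours[n // 2:])


def depth_colouring_is_anagram_free(h):
    """Check the colouring 'colour = depth' on the perfect binary tree T_h."""
    num_vertices = 2 ** (h + 1) - 1
    for u, v in combinations(range(1, num_vertices + 1), 2):
        # The depth of heap index w is w.bit_length() - 1.
        colours = [w.bit_length() - 1 for w in tree_path(u, v)]
        if is_anagram(colours):
            return False
    return True
```

The check succeeds because the vertex of smallest depth on any path is unique, so its colour appears in exactly one of the two halves.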

2.1.2 Graphs with an excluded minor

Planar graphs are of special interest when it comes to colouring problems. The Four colour theorem is one of the most celebrated results in graph theory. Moreover, the question of whether the Thue-chromatic number of planar graphs is finite has attracted a lot of attention and is still open. We use separator sets to show that for a large class of minor-free graphs the anagram-chromatic number is of order $O(\sqrt{n})$. The crucial ingredient of our argument is the separator theorem, proved by Alon, Seymour and Thomas [11]. It states that for a given h-vertex graph H, in any graph G with n vertices and no H-minor, one can find a set S ⊂ V(G) of order $|S| \le h^{3/2} n^{1/2}$, whose removal partitions G into disjoint subgraphs each of which has at most $\frac{2n}{3}$ vertices. Such a set S is called a separator in G. Using this theorem, we construct a colouring of any proper minor-closed family of graphs. For the convenience of the reader, we restate Proposition 2.2.

Proposition 2.5. Let h ≥ 1 be an integer, and let H be a graph on h vertices. Any n-vertex graph G with no H-minor satisfies $\pi_\alpha(G) \le 10h^{3/2}n^{1/2}$.

Proof. The colouring is inductive: suppose the claim holds for graphs on at most n − 1 vertices. Let G be as in the statement, and let S be a separating set of vertices in G of order at most $h^{3/2}n^{1/2}$ given by the Separator Theorem. Then G − S consists of two vertex-disjoint subgraphs spanned by A1 ⊂ V(G) and A2 ⊂ V(G), with $|A_i| \le \frac{2n}{3}$. The induced subgraphs G[Ai] do not contain H as a minor, so by the inductive hypothesis, we can colour them using $k = 10h^{3/2}\sqrt{2n/3}$ colours a1, a2, ..., ak. Note that the two subgraphs receive colours from the same set. This colouring guarantees that any path containing only vertices from A1 or A2 is anagram-free. Furthermore, we assign to each vertex vi ∈ S a separate colour bi, making any path passing through S anagram-free. Hence the colouring is indeed anagram-free. As intended, the number of colours used is at most
\[ h^{3/2}n^{1/2}\left(10\sqrt{\tfrac{2}{3}} + 1\right) \le 10h^{3/2}n^{1/2}. \]

Since planar graphs are characterised as graphs containing neither K5 nor K3,3 as a minor, we arrive at the following consequence of the above result (note that the constant 150 can be replaced by 19 if we use the fact that each planar graph has a separator of order $1.84\sqrt{n}$).

Corollary 2.6. Let G be an n-vertex planar graph. Then $\pi_\alpha(G) \le 150\sqrt{n}$.

In fact, any hereditary family of graphs with small separators can be coloured using the argument from Proposition 2.2. For example, it is easy to see that an n-vertex forest F contains a single vertex which separates it into several forests on at most n/2 vertices. The same inductive argument implies $\pi_\alpha(F) \le \lceil \log_2 n \rceil$. As for the lower bound for planar graphs, we only have the following modification of the argument we gave for trees (Proposition 2.1).

Proposition 2.7. There is an n-vertex planar graph Fn with $\pi_\alpha(F_n) \ge \lceil \frac{1}{4} \log_2 n \rceil$.

Proof. Let Fn be a perfect binary tree with n leaves, plus extra edges between any two vertices on the same level having the same parent. Suppose it is coloured in $k = \lceil \frac{1}{4} \log_2 n \rceil$ colours. The number of shortest paths from the root to the vertices corresponding to leaves is n, whereas the number of possible colourings of these paths is $\binom{\log_2 n + k}{k - 1} < n$; hence some two paths have the same colouring. These two paths, minus the shared initial segment, can be made into an anagram.

2.2 bounded-degree graphs

2.2.1 A four-regular graph with a large anagram-chromatic number

In this section we study the number of colours needed to colour a bounded-degree graph on n vertices so as to avoid all anagrams. The trivial upper bound is n, so we will mainly be interested in lower bounds for the anagram-chromatic number. Keränen’s result implies that graphs of maximum degree two satisfy πα(G) ≤ 5. It turns out that there are already 4-regular graphs G for which πα(G) grows rather quickly with the size of the graph.

Proposition 2.8. For infinitely many values of n, there exists a 4-regular n-vertex graph H with $\pi_\alpha(H) \ge \frac{\sqrt{n}}{\log_2 n}$.

Proof. Note that for each even k ≥ 4, there exists a 3-regular k-vertex graph G which is Hamilton-connected, which means that any two vertices of G are joined by a Hamilton path. Indeed, it can be easily checked that for any m ≥ 1, the Cayley graph of C2 × C2m+1 with canonical generators is Hamilton-connected. For a self-contained proof, we refer the reader to [48].

Let n = (k + 1)k. Take k + 1 copies of such G on vertex sets V1, V2, ..., Vk+1 with |Vi| = k. Furthermore, take a perfect matching M on V1 ∪ ··· ∪ Vk+1 such that there exists exactly one edge between any two Vi and Vj, for i ≠ j. To see that such a matching exists, denote $V_i = \{v_{ij} : j \in [k+1] \setminus \{i\}\}$, and take $M = \{\{v_{ij}, v_{ji}\} : 1 \le i < j \le k+1\}$. Call the resulting graph H. The graph H is 4-regular: any vertex has three adjacent edges belonging to its copy of G and one edge belonging to M.

Suppose that the vertices of H are coloured with $\left\lfloor \frac{\sqrt{n}}{\log_2 n} \right\rfloor$ colours. Consider the subsets of the form $\bigcup_{i \in S} V_i$ for any S ⊂ [k + 1]. There are $2^{k+1}$ such subsets. The colouring of each $\bigcup_{i \in S} V_i$ defines a multiset of order at most n. Given $\left\lfloor \frac{\sqrt{n}}{\log_2 n} \right\rfloor$ colours, the number of such multisets is at most
\[ n^{\frac{\sqrt{n}}{\log_2 n}} = 2^{\sqrt{n}} < 2^{k+1}. \]
Thus, by the pigeonhole principle, there are two distinct sets S, T ⊂ [k + 1] such that $\bigcup_{i \in S} V_i$ and $\bigcup_{i \in T} V_i$ have the same number of occurrences of each colour. The same holds for sets S′ = S \ T and T′ = T \ S, which are in addition disjoint. Without loss of generality assume S′ = {V1, ..., Vs} and T′ = {Vs+1, ..., V2s}. By the choice of M, we can find vertices v1, u1, v2, u2, ..., v2s, u2s such that vi, ui ∈ Vi for i ∈ [2s], and uivi+1 are edges in M for i ∈ [2s − 1]. Moreover, we can find a Hamilton path in each H[Vi] between vi and ui, using Hamilton-connectedness of G. Concatenating these 2s paths gives us a path in H which traverses V1 ∪ V2 ∪ ··· ∪ V2s in order. This path forms an anagram in H, so $\pi_\alpha(H) > \left\lfloor \frac{\sqrt{n}}{\log_2 n} \right\rfloor$.
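The construction of H can be made concrete in a few lines. The sketch below is an illustration under one simplifying assumption: it takes k = 4 and uses K4 as the 3-regular Hamilton-connected graph G, rather than the Cayley graphs mentioned in the proof. The function name `build_h` and the encoding of v_ij as the pair (i, j) are ours.

```python
from itertools import combinations


def build_h(k=4):
    """Glue k + 1 cliques K_k by the matching M = {v_ij v_ji : i < j}.

    The cliques are 3-regular only for k = 4, which is the case used here.
    Vertex v_ij is encoded as the pair (i, j) with i != j.
    """
    classes = {i: [(i, j) for j in range(k + 1) if j != i]
               for i in range(k + 1)}
    adj = {v: set() for cls in classes.values() for v in cls}
    # Edges inside each copy of G (here a clique on the class V_i).
    for cls in classes.values():
        for u, v in combinations(cls, 2):
            adj[u].add(v)
            adj[v].add(u)
    # The perfect matching M: exactly one edge between V_i and V_j.
    matching = [((i, j), (j, i)) for i, j in combinations(range(k + 1), 2)]
    for u, v in matching:
        adj[u].add(v)
        adj[v].add(u)
    return classes, adj, matching
```

For k = 4 this gives a 4-regular graph on n = k(k + 1) = 20 vertices in which every pair of classes is joined by exactly one matching edge.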

2.2.2 Random regular graphs

Let us start with a simple observation which slightly improves the trivial upper bound n for the anagram-chromatic number of a graph.

Proposition 2.9. Let G be an n-vertex graph with an independent set of order m. Then πα(G) ≤ n − m + 1. Proof. Let S be an independent set inside G of order m. Give each vertex of S the same colour, and each vertex of V(G) \ S its own colour. Any path in G contains at least one vertex of V(G) \ S, so it cannot contain an anagram. This means that our colouring is indeed anagram-free.
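The colouring from Proposition 2.9 is easy to test on a small example. The following sketch is illustrative only (the helper names are ours): it searches all simple paths of a graph for an anagram and applies the independent-set colouring to the cycle C6.

```python
from collections import Counter


def find_anagram(adj, colour):
    """Return a path whose two halves have equal colour multisets, or None.

    Exhaustive depth-first search over simple paths; only for small graphs.
    """
    def extend(path, seen):
        if len(path) % 2 == 0:
            half = len(path) // 2
            left = Counter(colour[v] for v in path[:half])
            right = Counter(colour[v] for v in path[half:])
            if left == right:
                return list(path)
        for w in adj[path[-1]]:
            if w not in seen:
                path.append(w)
                seen.add(w)
                found = extend(path, seen)
                path.pop()
                seen.remove(w)
                if found:
                    return found
        return None

    for start in adj:
        found = extend([start], {start})
        if found:
            return found
    return None


def independent_set_colouring(vertices, independent):
    """Colour 0 on the independent set, a fresh colour on every other vertex."""
    colour, fresh = {}, 1
    for v in vertices:
        if v in independent:
            colour[v] = 0
        else:
            colour[v] = fresh
            fresh += 1
    return colour
```

On C6 with the independent set {0, 2, 4} this uses n − m + 1 = 4 colours and no anagram is found, whereas the proper 2-colouring of C6 already contains one.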

The above bound is essentially optimal for the random regular graph Gn,d. To recapitulate, Theorem 2.3 states that for sufficiently large d, with high probability, Gn,d satisfies

\[ \left(1 - \frac{2 \cdot 10^5 \log d}{d}\right) n \le \pi_\alpha(G_{n,d}) \le \left(1 - \frac{\log d}{d}\right) n. \]

The upper bound is an immediate consequence of Proposition 2.9, and the fact that with high probability, Gn,d contains an independent set of order asymptotic to $\frac{2 \log d}{d} n > \frac{\log d}{d} n$ (see, for instance, Frieze and Łuczak [80]).

We will now outline the proof of the lower bound on πα(Gn,d), which comprises the remainder of the section. Instead of studying the random d-regular graph Gn,d, we will consider the union of two random graphs $G_{n,d_1}$ and $G_{n,d_2}$ with d = d1 + d2. The asymptotic properties of Gn,d are contiguous with such a model (see the following subsection for details). Let $G_1 = G_{n,d_1}$, $G_2 = G_{n,d_2}$ and c be a given vertex-colouring of G = G1 ∪ G2. The first step is to find two vertex subsets V1 and V2 with the same colouring such that G1[V1] and G1[V2] have good expansion properties. Then we use the edges of G2 to extend paths on V1 and V2, eventually building Hamilton cycles C1 in G[V1] and C2 in G[V2]. Finally, we can find an edge v1v2 ∈ G with vi ∈ Vi and use it to build a single path S which traverses first the vertices of C1 and then the vertices of C2. The segments S[V1] and S[V2] give an anagram in c.

Before proceeding, let us introduce some notation, which is also listed in the front matter. For a graph G and v ∈ V(G), we denote the neighbourhood of v in G by NG(v). For a vertex set U ⊂ V(G), $N_G(U) = \bigcup_{v \in U} N_G(v) \setminus U$. The graph induced on U is G[U], and its edge set is denoted by EG(U) = E(G[U]). For disjoint sets U and T, EG(U, T) is the set of edges with one endpoint in U and one in T. Finally, the corresponding counts are eG(U) = |EG(U)| and eG(U, T) = |EG(U, T)|. We denote the uniform probability measure on the space of random regular graphs Gn,d by P, suppressing the indices. All the inequalities below are supposed to hold only for n large enough.

2.2.2.1 Preliminaries on random regular graphs

Before proceeding with the exposition, we collect some models and properties of random regular graphs that we will be using in Chapters 2 and 3. Recall that Gn,d is by definition chosen uniformly at random from all n-vertex d-regular graphs whenever nd is even.

However, it is often convenient to generate a d-regular graph using one of the superposition models. The idea is that the union of a d-regular and a d′-regular graph on the same vertex set [n] is a (d + d′)-regular multigraph. This observation, as well as some involved results going back to the work of Janson [102], Robinson and Wormald [134], lead to a very elegant arithmetic of superposition models.

Let us give the relevant definitions. For models of random graphs $\mathcal{G}$ and $\mathcal{G}'$ on the same vertex set, we construct a new random graph G = G1 ∪ G2 by taking the union of independently sampled graphs $G_1 \in \mathcal{G}$ and $G_2 \in \mathcal{G}'$, conditional on the event E(G1) ∩ E(G2) = ∅. The probability space of such disjoint unions is denoted by $\mathcal{G} \oplus \mathcal{G}'$. Two sequences of probability spaces Fn and Gn on the same underlying measurable spaces are called contiguous, written Fn ≈ Gn, if a sequence of events (An) occurs with high probability in Fn whenever it occurs with high probability in Gn. For constant d ≥ 3, Gn,d is contiguous with any other model which builds a d-regular graph as an edge-disjoint union of random regular graphs and Hamiltonian cycles [148, Theorem 4.15]. The specific results we use in proving Theorem 2.3 and Theorem 3.2 are $G_{n,d+d'} \approx G_{n,d} \oplus G_{n,d'}$ and $G_{n,d+2} \approx G_{n,d} \oplus H_n$, where Hn is a random Hamiltonian cycle on [n]. The power of this framework is reflected in its numerous interesting and difficult consequences. For instance, Gn,2d for d ≥ 2 can be decomposed into d edge-disjoint Hamilton cycles.

Edge distribution in the configuration model. In analysing Gn,d, we pass to the configuration model of random regular graphs.
For nd even, we take a set of nd points partitioned into n cells v1, v2, ..., vn, each cell containing d points. A perfect matching P on [nd] induces a multigraph M(P) in which the cells are regarded as vertices and pairs in P as edges. For a fixed degree d and P chosen uniformly from the set of perfect matchings $\mathcal{P}_{n,d}$, the probability that M(P) is a simple graph is bounded away from zero, and each simple graph occurs with equal probability. Therefore, if an event holds with high probability in M(P), then it holds with high probability even when we condition on the event that M(P) is a simple graph, and therefore it holds whp in Gn,d. For a formal description of the configuration model and its basic properties, see, for instance, [103, Chapter 9].

We use the configuration model to get a bound on the edge distribution in Gn,d analogous to the Erdős-Rényi model. The uniform probability measure on $\mathcal{P}_{n,d}$ is denoted by $\mathbb{P}_{\mathcal{P}}$. Both indices n and d are kept so that each perfect matching P corresponds to a unique d-regular multigraph M(P).

Lemma 2.10. Let V1 ⊂ [n], and let B be a set of pairs of vertices from V1. Let $E \subset [n]^{(2)}$ with $|E| \le \min\left\{\frac{1}{4|V_1|}|B|d, \frac{nd}{20}\right\}$. For a fixed positive integer d and $P \in \mathcal{P}_{n,d}$ chosen uniformly at random,
\[ \mathbb{P}_{\mathcal{P}}\left[\mathcal{M}(P) \supset E \text{ and } \mathcal{M}(P) \cap B = \emptyset\right] \le \left(\frac{2d}{n}\right)^{|E|} e^{-\frac{2|B|d}{5n}}. \]

The lemma also holds for more general configurations of B and E, but we state it in the form which is fit for our purpose. The proof is given after Lemma 2.12, which is a bound on the probability that Gn,d does not intersect a given set of edges. A crucial ingredient is the following estimate of Alon and Friedland [5], which is a simple corollary of the Brègman bound on the permanent of a (0, 1)-matrix.
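The configuration model described above translates directly into a sampling procedure. The following Python sketch is an illustration only: the function names are ours, points are numbered 0, ..., nd − 1, and point x is assumed to lie in cell x // d. Rejecting non-simple outcomes corresponds to conditioning on M(P) being simple.

```python
import random


def configuration_multigraph(n, d, rng):
    """Sample M(P): pair up the n*d points uniformly, then project to cells."""
    assert (n * d) % 2 == 0
    points = list(range(n * d))
    rng.shuffle(points)
    # Consecutive entries of a uniform shuffle form a uniform perfect matching.
    pairs = [(points[i], points[i + 1]) for i in range(0, n * d, 2)]
    return [(x // d, y // d) for x, y in pairs]


def is_simple(edges):
    """True if the multigraph has no loops and no repeated edges."""
    seen = set()
    for u, v in edges:
        key = (min(u, v), max(u, v))
        if u == v or key in seen:
            return False
        seen.add(key)
    return True


def sample_regular_graph(n, d, rng, max_tries=10_000):
    """Rejection-sample a simple d-regular graph via the configuration model."""
    for _ in range(max_tries):
        edges = configuration_multigraph(n, d, rng)
        if is_simple(edges):
            return edges
    raise RuntimeError("no simple graph found within max_tries")
```

Since the probability of a simple outcome is bounded away from zero for fixed d, the expected number of attempts stays bounded as n grows.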

Theorem 2.11. Let H be a graph on [N]. Let r1, r2, ..., rN be the degrees of the vertices in H. Furthermore, denote $r = \frac{1}{N} \sum_{i=1}^{N} r_i$. Then the number of perfect matchings in H is at most
\[ \prod_{i=1}^{N} (r_i!)^{\frac{1}{2r_i}} \le (r!)^{\frac{N}{2r}}. \]

Lemma 2.12. For each even number N, let F = F(N) be a graph on [N] consisting of at least $\beta N^2$ edges. Let $G_{N,1}$ denote a random matching on [N]. Then
\[ \mathbb{P}\left[G_{N,1} \cap F = \emptyset\right] \le e^{-\frac{8\beta}{9} N}. \]

Proof. Let $\mathcal{P}(F)$ be the set of perfect matchings on [N] which do not intersect our graph F. Since the number of perfect matchings on [N] is exactly $\frac{N!}{2^{N/2} \left(\frac{N}{2}\right)!}$, we need to show that
\[ |\mathcal{P}(F)| \le e^{-\frac{8\beta}{9} N} \, \frac{N!}{2^{N/2} \left(\frac{N}{2}\right)!}. \]

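Consider the complement $\overline{F}$ of F. The matchings in $\mathcal{P}(F)$ are exactly perfect matchings in $\overline{F}$. We apply Theorem 2.11 directly to the graph $\overline{F}$ with r = N − 2βN, and use Stirling’s formula to reach the final result. Indeed, we have
\[ |\mathcal{P}(F)| \le \left((N - 2\beta N)!\right)^{\frac{1}{2(1 - 2\beta)}}, \]
\[ \mathbb{P}\left[G_{N,1} \cap F = \emptyset\right] \le \left((N - 2\beta N)!\right)^{\frac{1}{2(1 - 2\beta)}} \cdot \frac{\left(\frac{N}{2}\right)! \cdot 2^{N/2}}{N!} = O\left(\sqrt{N}\right) \left(\frac{(1 - 2\beta)N}{e}\right)^{N/2} \left(\frac{e}{N}\right)^{N/2} = O\left(\sqrt{N}\right) e^{-\beta N}. \]
Here we use the fact that $\frac{N!}{\left(\frac{N}{2}\right)! \cdot 2^{N/2}} = \Theta(1) \left(\frac{N}{e}\right)^{N/2}$, as well as the inequality $1 - 2\beta \le e^{-2\beta}$. Absorbing the error term into the constant, we get, for N large enough,
\[ \mathbb{P}\left[G_{N,1} \cap F = \emptyset\right] \le e^{-\frac{8\beta}{9} N}. \]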
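The estimate of Theorem 2.11 can be compared with exact counts on small graphs. The sketch below is an illustration only (function names are ours, and it assumes every vertex has degree at least 1 so that the product is defined). It counts perfect matchings by brute force and evaluates the product bound.

```python
from math import factorial, prod


def count_perfect_matchings(adj):
    """Count perfect matchings by always matching the smallest free vertex."""
    def count(unmatched):
        if not unmatched:
            return 1
        v = min(unmatched)
        rest = unmatched - {v}
        return sum(count(rest - {w}) for w in adj[v] if w in rest)

    return count(frozenset(adj))


def degree_product_bound(adj):
    """Evaluate prod_i (r_i!)^(1/(2 r_i)); assumes all degrees r_i >= 1."""
    return prod(factorial(len(nbrs)) ** (1 / (2 * len(nbrs)))
                for nbrs in adj.values())
```

For example, removing a perfect matching F from K6 leaves 8 of the 15 perfect matchings, against a bound of $24^{3/4} \approx 10.8$; for K3,3 the bound is attained with equality.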

Proof of Lemma 2.10. We will restate the event {M(P) ⊃ E and M(P) ∩ B = ∅} in terms of P, rather than M(P). For a matching $M \subset [nd]^{(2)}$, we denote the induced multigraph on $V = \{v_i\}_{i \in [n]}$ by $\mathcal{M}(M)$. To save on notation, we write $\mathcal{M}(M)$ for both the graph and its edge set. Conversely, if e = {vi, vj} is a pair of vertices from V, we denote its corresponding pairs in $[nd]^{(2)}$ by $\tilde{e} = \{\{x, y\} : x \in v_i, y \in v_j\}$. Finally, for a set $E \subset V^{(2)}$, we put $\tilde{E} = \bigcup_{e \in E} \tilde{e}$.

Assume that M(P) ⊃ E. Then we can find a matching M ⊂ P such that |M| = |E| = m and $\mathcal{M}(M) = E$. Conditioning over the possible choices of M, we have
\[ \mathbb{P}_{\mathcal{P}}\left[\mathcal{M}(P) \supset E \wedge \mathcal{M}(P) \cap B = \emptyset\right] \le \sum_{M} \mathbb{P}_{\mathcal{P}}\left[P \supset M\right] \, \mathbb{P}_{\mathcal{P}}\left[P \cap \tilde{B} = \emptyset \mid P \supset M\right]. \]

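We bound the two probabilities separately. Fix a choice M = {{xi, yi} : i ∈ [m]}, and let W = [nd] \ {x1, ..., xm, y1, ..., ym}.

Claim 1. $\mathbb{P}_{\mathcal{P}}[P \supset M] \le \frac{2}{(nd - 2m)^m}$.

To show this, we just count perfect matchings. The total number of perfect matchings P is $\frac{(nd)!}{\left(\frac{nd}{2}\right)! \, 2^{nd/2}}$. The points from W can be paired in $\frac{(nd - 2m)!}{\left(\frac{nd}{2} - m\right)! \, 2^{nd/2 - m}}$ ways. Altogether, using Stirling’s formula, we get
\[ \mathbb{P}_{\mathcal{P}}[M \subset P] \le \frac{(nd - 2m)! \, \left(\frac{nd}{2}\right)!}{(nd)! \, \left(\frac{nd}{2} - m\right)! \, 2^{-m}} \]
\[ = (1 + o(1)) \left(\frac{nd - 2m}{nd}\right)^{nd} \left(\frac{nd - 2m}{e}\right)^{-2m} \left(\frac{nd}{nd - 2m}\right)^{nd/2} \left(\frac{nd - 2m}{e}\right)^{m} \]
\[ = (1 + o(1)) \left(1 - \frac{2m}{nd}\right)^{nd/2} \left(\frac{e}{nd - 2m}\right)^{m} \le \frac{2}{(nd - 2m)^m}. \]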

Here we used the fact that since $1 - x \le e^{-x}$, we have $\left(1 - \frac{2m}{nd}\right)^{nd/2} \le e^{-m}$.

Claim 2. For $|B| = \beta n^2$, $\mathbb{P}_{\mathcal{P}}\left[P \cap \tilde{B} = \emptyset \mid P \supset M\right] \le e^{-\frac{2\beta}{5} nd}$.

Let W be as before, and denote N = |W|. Using the assumption $m \le \frac{nd}{20}$, we get $N = nd - 2m \ge \frac{9nd}{10}$. Let $B_W = \tilde{B}[W]$, that is, the set of pairs contained in W which would induce B. By putting the matching M aside, we have lost some pairs from $\tilde{B}$, namely those touching the vertices of M. Each vertex of M is contained in at most |V1|d pairs from $\tilde{B}$, so the hypothesis $m \le \frac{1}{4|V_1|} |B| d$ implies

\[ |B_W| \ge |B| d^2 - 2m |V_1| d = \beta n^2 d^2 \left(1 - \frac{2m |V_1|}{\beta n^2 d}\right) \ge \frac{1}{2} \beta n^2 d^2 \ge \frac{1}{2} \beta N^2. \]

A random matching P conditioned on P ⊃ M corresponds to a random matching on W, i.e. an element of $G_{N,1}$. Hence we can apply Lemma 2.12 with $|B_W| \ge \frac{1}{2} \beta N^2$, and $N = |W| \ge \frac{9nd}{10}$.

\[ \mathbb{P}_{\mathcal{P}}\left[P \cap B_W = \emptyset \mid P \supset M\right] \le \mathbb{P}\left[E(G_{N,1}) \cap B_W = \emptyset\right] \le e^{-\frac{8}{9} \cdot \frac{1}{2} \beta N} \le e^{-\frac{2}{5} \beta nd}. \]

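Claims 1 and 2 hold for any choice of the matching M with $\mathcal{M}(M) = E$. Putting them together, and using the fact that there are at most $d^{2m}$ such matchings M, we get

\[ \mathbb{P}_{\mathcal{P}}\left[\mathcal{M}(P) \supset E \text{ and } \mathcal{M}(P) \cap B = \emptyset\right] \le d^{2m} \cdot \frac{2}{(nd - 2m)^m} \cdot e^{-\frac{2\beta}{5} nd}. \]

Using $m \le \frac{nd}{20}$,
\[ \mathbb{P}_{\mathcal{P}}\left[\mathcal{M}(P) \supset E \text{ and } \mathcal{M}(P) \cap B = \emptyset\right] \le d^{2m} \left(\frac{2}{nd}\right)^{m} e^{-\frac{2\beta}{5} nd} = \left(\frac{2d}{n}\right)^{m} e^{-\frac{2\beta}{5} nd}. \]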
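Claim 1 in the proof above can also be checked without Stirling's formula, since the probability that a uniform perfect matching on N = nd points contains a fixed matching of m pairs is an exact ratio of two counts. The sketch below is an illustration only; the function names are ours.

```python
from fractions import Fraction
from math import factorial


def num_perfect_matchings(N):
    """Number of perfect matchings on N points (N even): N! / ((N/2)! 2^(N/2))."""
    return factorial(N) // (factorial(N // 2) * 2 ** (N // 2))


def prob_contains(N, m):
    """Exact probability that a uniform perfect matching on N points
    contains a fixed matching of m pairs."""
    return Fraction(num_perfect_matchings(N - 2 * m), num_perfect_matchings(N))


def claim1_bound(N, m):
    """The upper bound 2 / (N - 2m)^m from Claim 1, with N = nd."""
    return Fraction(2, (N - 2 * m) ** m)
```

The exact value equals the product of 1/(N − 2i − 1) over i = 0, ..., m − 1, which lies below the bound of Claim 1 for every admissible pair (N, m).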

The following lemma says that the edges of the random regular graph are approximately homogeneously distributed. Bounds of this sort are very common ingredients for expansion properties of random graphs. Since the edge distribution of Gn,d is used in several arguments, we state four slightly different variants. In the present chapter on anagrams, properties (P1) and (P2) will imply expansion properties of G1 sampled from Gn,d, which will allow us to do rotations in G1. The aim is to identify large sets of vertex pairs, called ‘boosters’, which could increase the length of the longest path in G1.

Lemma 2.13. G = Gn,d has the following properties with high probability.

(P1) For sufficiently large d, for any two disjoint vertex subsets T and U with $|T| \ge \frac{10 \log d}{d} n$ and $|U| \ge \frac{100 \log d}{d} n$, we have
\[ e_G(T, U) \ge |T||U| \frac{d}{20n}. \tag{2.1} \]

(P2) For sufficiently large d, any vertex set S with $|S| \le \frac{30 \log d}{d} n$ satisfies $e_G(S) < 100 |S| \log d$.

(P3) Fix d and γ′ with γ′d ≥ 3. There is an absolute constant α > 0 such that any vertex set S of order up to αn satisfies $e_G(S) \le \frac{|S| \gamma' d}{2}$.

(P4) There is a constant η > 0 such that any vertex set S of order up to $\frac{\eta n}{d}$ satisfies $e_G(S) \le 3|S|$.

Proof. We prove Lemma 2.13 for G sampled according to the configuration model, i.e. take G = M(P), where P is a random element of $\mathcal{P}_{n,d}$. We start with (P1). Take vertex sets T and U in G with |T| = t and |U| = u. We need to bound the probability of the event
\[ D_{T,U} = \left\{ e_G(T, U) < \frac{d}{20n} tu \right\}. \]

For a fixed set of m edges E with $m \le \frac{d}{20n} tu$, the probability that EG(T, U) = E is at most
\[ \left(\frac{2d}{n}\right)^{m} e^{-\frac{2d}{5n}(tu - m)}. \]

This bound is a direct application of Lemma 2.10 to the edge set E and its bipartite complement (T × U) \ E. Taking the union bound over all sets E, we get

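\[ \mathbb{P}_{\mathcal{P}}[D_{T,U}] \le \sum_{m=0}^{\frac{d}{20n} tu} \binom{tu}{m} \left(\frac{2d}{n}\right)^{m} e^{-\frac{2d}{5n}(tu - m)} \le \sum_{m=0}^{\frac{d}{20n} tu} \left(\frac{etu}{m}\right)^{m} \left(\frac{2d}{n}\right)^{m} e^{-\frac{2d}{5n}(tu - m)}. \]
We now show that the largest term on the right-hand side corresponds to $m = M = \frac{dtu}{20n}$.

Claim 3. The function $m \mapsto \left(\frac{etu}{m} \cdot \frac{2d}{n}\right)^{m}$ is increasing in m for $m \in \left(0, \frac{2dtu}{n}\right)$.

Proof. Denote f(m) = m(log C − log m) for an arbitrary constant C > 0. For $m \in \left(0, \frac{C}{e}\right)$, the derivative f′(m) = log C − log m − 1 is positive, so f is increasing. It follows that $e^{f(m)} = \left(\frac{C}{m}\right)^{m}$ is increasing on the interval in question. Taking $C = \frac{2edtu}{n}$, we get the required claim.

Bounding each summand by the term with m = M, we get

\[ \mathbb{P}_{\mathcal{P}}[D_{T,U}] \le M \cdot \left(\frac{etu \cdot 20n}{dtu} \cdot \frac{2d}{n}\right)^{\frac{d}{20n} tu} e^{-\frac{2d}{5n}(tu - M)} = M \cdot (40e)^{\frac{dtu}{20n}} e^{-\frac{2d}{5n}(tu - M)} \le M e^{\frac{dtu}{n}\left(\frac{5}{20} - \frac{2}{5} + \frac{d}{n}\right)} \le e^{-\frac{dtu}{8n}}. \]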

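Finally, we take the union bound over all sets T and U of order at least $t_0 = \frac{10 \log d}{d} n$ and $u_0 = \frac{100 \log d}{d} n$ respectively.
\[ \mathbb{P}_{\mathcal{P}}[G \text{ violates (P1)}] \le \sum_{t=t_0}^{n} \sum_{u=u_0}^{n} \binom{n}{t} \binom{n}{u} e^{-\frac{dtu}{8n}}. \]

We use the bound $\binom{n}{t} \le d^{t} = e^{t \log d}$, valid for t ≥ t0 and large enough d.
\[ \mathbb{P}_{\mathcal{P}}[G \text{ violates (P1)}] \le \sum_{t=t_0}^{n} \sum_{u=u_0}^{n} e^{t \log d + u \log d} e^{-\frac{dtu}{8n}}. \]

For $t \ge \frac{10 \log d}{d} n$, we get $u \log d \le \frac{dtu}{10n}$. Similarly, since $u \ge \frac{100 \log d}{d} n$, it holds that $t \log d \le \frac{dtu}{100n}$.

\[ \mathbb{P}_{\mathcal{P}}[G \text{ violates (P1)}] \le O(n^2) \, e^{\left(\frac{1}{100} + \frac{1}{10} - \frac{1}{8}\right) \frac{n \log^2 d}{d}} = o(1). \]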

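To prove (P2)–(P4), we fix an integer f and a set S of order s. By Lemma 2.10, the probability of S spanning at least $\frac{sf}{2}$ edges is at most

\[ \mathbb{P}_{\mathcal{P}}\left[\bigvee_{E_0 \subset S^{(2)}, \, |E_0| = \frac{sf}{2}} E_0 \subset E(G)\right] \le \binom{s^2/2}{sf/2} \left(\frac{2d}{n}\right)^{sf/2} \le \left(\frac{2sed}{nf}\right)^{sf/2}. \]
Let Ds denote the event that $e_G(S) \ge \frac{sf}{2}$ for some subset S with |S| = s, where f is a parameter to be fixed. Taking the union bound over all vertex sets of order s gives
\[ \mathbb{P}_{\mathcal{P}}[D_s] \le \binom{n}{s} \left(\frac{2sed}{nf}\right)^{sf/2} \le \left[\frac{ne}{s} \cdot \left(\frac{s}{n} \cdot \frac{2ed}{f}\right)^{f/2}\right]^{s}. \tag{2.2} \]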

To show (P2), set f = 200 log d. The term in square brackets is increasing in s (see Claim 3), so for $s \le \frac{30 \log d}{d} n$,
\[ \mathbb{P}_{\mathcal{P}}[D_s] \le \left[\frac{ed}{30 \log d} \left(\frac{e \cdot 30}{100}\right)^{100 \log d}\right]^{s} \le \left[\frac{ed}{30 \log d} \, e^{-2 \log d}\right]^{s} < d^{-s}. \]
Here we used the condition $\left(\frac{30e}{100}\right)^{100} \le e^{-2}$. For $s \le \sqrt{n}$ we use (2.2) to get a stronger bound
\[ \mathbb{P}_{\mathcal{P}}[D_s] \le O\left(n^{\frac{1}{2}(1 - 100 \log d)}\right) < n^{-1}, \]
valid for large d. Putting the two bounds together,

\[ \mathbb{P}_{\mathcal{P}}[G \text{ violates (P2)}] \le \sum_{s=1}^{\sqrt{n}} \mathbb{P}_{\mathcal{P}}[D_s] + \sum_{s=\sqrt{n}}^{\frac{30 \log d}{d} n} \mathbb{P}_{\mathcal{P}}[D_s] \le \sqrt{n} \cdot n^{-1} + \sum_{s=\sqrt{n}}^{\frac{30 \log d}{d} n} d^{-s} = o(1), \]
completing the proof of (P2).

For (P3) set f = γ′d ≥ 3 and choose α so that the term in square brackets is less than $\frac{1}{2}$ for s = αn (note that this term is increasing in s). We split the range of s into $\gamma' d \le s \le n^{1/4}$ and $n^{1/4} < s \le \alpha n$ to get
\[ \mathbb{P}_{\mathcal{P}}\left[\bigvee_{s} D_s\right] \le n^{1/4} \cdot O\left(n^{-3/8}\right) + \sum_{s \ge n^{1/4}} 2^{-s+1} = o(1), \]
as required.

For (P4), set f = 6. Take η such that $\frac{s}{n} = \frac{\eta}{d}$ again makes the term in brackets at most $\frac{1}{2}$. The same calculation gives the result.

2.2.2.2 Expansion properties and Pósa rotations

We now go back to the proof of Theorem 2.3. Recall that we will be working with the union of random graphs G1 and G2. The next step is to build subsets of [n] which will later give us the required anagram. In everything that follows, take $\alpha = 10^5$. Given a d-regular graph G, we say that a subset V1 ⊂ [n] is G-dense if $\frac{\alpha \log d}{2d} n \ge |V_1| \ge \frac{\alpha \log d}{4d} n$, and any vertex v ∈ V1 has at least $\frac{\alpha}{160} \log d$ neighbours in V1.

Lemma 2.14. Suppose we are given a d-regular graph G on [n] with properties (P1) and (P2), and a vertex colouring c : [n] → C with $|C| = \left(1 - \frac{\alpha \log d}{d}\right) n$ colours. For sufficiently large d and n, there exist two disjoint G-dense sets of vertices V1, V2 ⊂ [n] which have the same colouring.

Proof. Let c be a colouring of the vertices of G into $\left(1 - \frac{\alpha \log d}{d}\right) n$ colours.

Claim 4. There exists a subset Z ⊂ V(G) satisfying $\frac{\alpha \log d}{d} n \ge |Z| \ge \frac{\alpha \log d}{2d} n$ such that each colour appears in Z an even number of times, and for all v ∈ Z, $|N_G(v) \cap Z| \ge \frac{\alpha}{40} \log d$.

We construct Z using the following algorithm. Let V(G) = [n], and denote $\delta = \frac{\alpha}{40} \log d$. Let X contain one vertex from each colour class with an odd number of vertices, so $|X| \le \left(1 - \frac{\alpha \log d}{d}\right) n$. We assume $|X| = \left(1 - \frac{\alpha \log d}{d}\right) n$, by discarding more pairs of vertices of the same colour if necessary. Furthermore, set $R_0 = \hat{R}_0 = \emptyset$. Note that from this step onwards, all colour classes in $[n] \setminus (X \cup R_i \cup \hat{R}_i)$ will have even order. For i ≥ 0, we form $R_{i+1} := R_i \cup \{v\}$, $\hat{R}_{i+1} = \hat{R}_i \cup \{w\}$, where v is the smallest vertex with fewer than δ neighbours in $[n] \setminus (X \cup R_i \cup \hat{R}_i)$, and w is the smallest vertex with c(v) = c(w). When there are no such vertices v, set $Z = [n] \setminus (X \cup R_i \cup \hat{R}_i)$ and terminate the algorithm. We claim that this occurs after at most $\frac{10 \log d}{d} n$ steps.

Suppose that it is not the case and G satisfies (P1) and (P2), but the algorithm continues beyond $t = \frac{10 \log d}{d} n$ steps. Look at the sets Rt and $Z_t = [n] \setminus (X \cup R_t \cup \hat{R}_t)$. Each vertex from Rt has fewer than δ neighbours in Zt, so
\[ e_G(R_t, Z_t) < \delta |R_t| = \frac{\alpha}{40} |R_t| \log d. \]
On the other hand, since $|R_t| = t = \frac{10 \log d}{d} n$ and $|Z_t| = \frac{\alpha \log d}{d} n - 2t \ge \frac{\alpha \log d}{2d} n$, the property (P1) gives
\[ e_G(R_t, Z_t) \ge |R_t||Z_t| \frac{d}{20n} \ge |R_t| \cdot \frac{\alpha \log d}{2d} n \cdot \frac{d}{20n} = \frac{\alpha}{40} |R_t| \log d. \]
We reached a contradiction, so indeed we have the desired set Z with $|Z| \ge \frac{\alpha \log d}{2d} n$.

We show the existence of the required partition of Z into sets V1 and V2 using a probabilistic argument. Partition each colour class $c^{-1}(a)$ into ordered pairs arbitrarily, and denote the collection of pairs by Q. For each pair (v, w) ∈ Q, randomly and independently put v into V1 and w into V2 or vice versa, with probability $\frac{1}{2}$. This guarantees that V1 and V2 have the same colouring.

Claim 5. With positive probability, the partition satisfies $\deg_{G[V_i]}(v) \ge \frac{\alpha}{160} \log d$ for all v ∈ Vi and i ∈ {1, 2}.

We use the Local lemma. Fix a vertex v, wlog v ∈ V1. Let Bv be the event that fewer than $\frac{\alpha}{160} \log d$ neighbours of v in Z have ended up in V1. Let S be a set of $\frac{\alpha}{40} \log d$ neighbours of v in Z, and let T ⊂ S be the set of vertices whose match according to Q does not lie in S. Note that S contains exactly $y = \frac{1}{2}(|S| - |T|)$ pairs of Q. If Bv occurs, then $|S \cap V_1| \le \frac{\alpha}{160} \log d$, and therefore $\frac{1}{2}|S| - |S \cap V_1| \ge \frac{\alpha}{160} \log d$. But this implies
\[ \frac{1}{2}|T| - |T \cap V_1| = \frac{1}{2}(|S| - 2y) - (|S \cap V_1| - y) \ge \frac{\alpha}{160} \log d. \]
$|T \cap V_1|$ is a random variable with distribution $B\left(|T|, \frac{1}{2}\right)$, so Chernoff bounds (as stated in [103, Remark 2.5]) give

\[ \mathbb{P}[B_v] \le \mathbb{P}\left[\frac{1}{2}|T| - |T \cap V_1| \ge \frac{\alpha}{160} \log d\right] \le e^{-\frac{2}{|T|}\left(\frac{\alpha}{160} \log d\right)^2} \le e^{-3 \log d} = d^{-3}. \]
Here we used $|T| \le \frac{\alpha}{40} \log d$ and $\alpha = 10^5$.

Two events Bv and Bw are dependent only if v and w share a neighbour, or if some two neighbours of v and w are paired. In such a dependency graph, the event Bv has degree at most $2d^2$. Since for sufficiently large d,
\[ e(2d^2 + 1)d^{-3} < 1, \]
the Lovász local lemma grants that there is a splitting avoiding all the bad events Bv. This is exactly the required splitting. It concludes the proof of Claim 5 and Lemma 2.14.
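The random splitting used in the proof of Lemma 2.14 is simple to simulate. The sketch below is an illustration only (the function name and the way pairs are formed inside each colour class are our choices). It pairs up each colour class of Z and sends the two members of every pair to different sides by a fair coin flip, so the two sides always receive the same colouring; the degree condition of Claim 5 is not checked here.

```python
import random
from collections import defaultdict


def split_pairs(Z, colour, rng):
    """Split Z into V1 and V2 with the same colouring.

    Assumes every colour occurs an even number of times in Z.
    """
    by_colour = defaultdict(list)
    for v in Z:
        by_colour[colour[v]].append(v)
    V1, V2 = [], []
    for cls in by_colour.values():
        assert len(cls) % 2 == 0, "colour class must have even order"
        # Pair consecutive vertices of the class, then flip a fair coin.
        for v, w in zip(cls[0::2], cls[1::2]):
            if rng.random() < 0.5:
                v, w = w, v
            V1.append(v)
            V2.append(w)
    return V1, V2
```

Whatever the coin flips are, V1 and V2 are disjoint, cover Z, and contain each colour equally often.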

We say a graph G is a p-expander if it is connected, and |NG(U)| ≥ 2|U| for vertex sets U with |U| ≤ p.

Lemma 2.15. Let G be a d-regular graph on vertex set [n] satisfying (P1) and (P2), and let V1 ⊂ [n] be a G-dense subset of [n]. Then G[V1] is a $\frac{|V_1|}{4}$-expander.

Proof. Denote H = G[V1]. To show expansion, suppose for the sake of contradiction that |NH(U)| < 2|U|, and first assume that $|U| \le \frac{10 \log d}{d} n$. We can apply (P2) to T = U ∪ NH(U) ⊂ V1, using the assumption $|T| \le \frac{30 \log d}{d} n$. This gives e(G[T]) ≤ 100|T| log d. Counting all the edges with an endpoint in U, which certainly lie inside T, we get $\frac{1}{2} \cdot \frac{\alpha}{160} |U| \log d \le e(G[T])$. The two inequalities imply $|T| \ge \frac{1}{100} \cdot \frac{\alpha}{320} |U| = \frac{10^5}{32000} |U| > 3|U|$, which contradicts our assumption.

Secondly, in case $\frac{|V_1|}{4} \ge |U| \ge \frac{10 \log d}{d} n$ and |NH(U)| < 2|U|, we have $|V_1 \setminus (U \cup N_H(U))| \ge \frac{|V_1|}{4} \ge \frac{10 \log d}{d} n$. This puts us in the position to apply (P1) and claim that G contains edges between U and V1 \ (U ∪ NH(U)), contradicting the definition of NH(U). Hence sets of order up to $\frac{|V_1|}{4}$ indeed expand in H.

Finally, assume that H is not connected. Let its smallest component be spanned by S ⊂ V1, i.e. $|S| \le \frac{|V_1|}{2}$ and NH(S) = ∅. We already showed that certainly $|S| > \frac{|V_1|}{4}$. But then the fact that EG(S, V1 \ S) = ∅ contradicts (P1).

We will use these expansion properties to build long paths and ultimately a Hamilton cycle in G[Vi]. Our approach is based on the rotation-extension technique originally developed by Pósa. Given a graph G, denote the length (number of edges) of the longest path in G by ℓ(G). We say that a non-edge {u, v} ∉ E(G) is a booster with respect to G if G + {u, v} is Hamiltonian or ℓ(G + {u, v}) > ℓ(G). We denote the set of boosters in G by B(G). Pósa’s rotation technique guarantees that there exist plenty of boosters in G (see, for instance, Corollary 2.10 from [115]).

Lemma 2.16. Let p be a positive integer. Let G = (V, E) be a p-expander. Then $|B(G)| \ge \frac{p^2}{2}$.

2.2.2.3 Using G2 to hit boosters in G1

Now we move on to $G_2 = G_{n,d_2}$. Recall that we would like to add its edges to G1[V1] and complete a cycle on V1. However, we have to argue carefully because the choice of a G1-dense set V1 will depend on the given vertex colouring.

Lemma 2.17. Let G1 be a d1-regular graph on [n] with properties (P1) and (P2), for sufficiently large d1, and let $\frac{d_1}{150} \le d_2 \le \frac{d_1}{100}$. With high probability, $G_2 = G_{n,d_2}$ has the property that for any G1-dense subset V1 ⊂ [n], (G1 ∪ G2)[V1] is Hamiltonian.

The proof of Lemma 2.17 consists of two parts. First we identify a deterministic property that is sufficient to make (G1 ∪ G2)[V1] Hamiltonian, and then, using the configuration model, we show that $G_{n,d_2}$ possesses this property with high probability.

Lemma 2.18. Let H1 and H2 be graphs on vertex set V1. Suppose that for any edge set E′ ⊂ E(H2) with |E′| ≤ |V1|,

H1 ∪ E′ is Hamiltonian, or B(H1 ∪ E′) ∩ E(H2) ≠ ∅.

Then the graph H1 ∪ H2 is Hamiltonian.

Proof. We will build a subset of E(H2) such that its addition to H1 creates a Hamiltonian graph. Start with E0 = ∅. Assume that Ei is a subset of i edges in E(H2). If the graph H1 ∪ Ei is Hamiltonian, we are done. Otherwise, by hypothesis, E(H2) ∩ B(H1 ∪ Ei) contains an edge e, so we set Ei+1 = Ei ∪ {e}. In each step i, we have ℓ(H1 ∪ Ei+1) > ℓ(H1 ∪ Ei), so the process terminates after at most |V1| steps, with a Hamiltonian graph H1 ∪ Ei.

Lemma 2.19. Let G1 be a d1-regular graph on V with properties (P1) and (P2), where |V| = n and d1 is sufficiently large. Let $G_2 = G_{n,d_2}$ for $\frac{d_1}{150} \le d_2 \le \frac{d_1}{100}$. We say that $G_2 \in A_{G_1}$ (or $A_{G_1}$ occurs) if there exists a G1-dense subset V1 ⊂ V, and an edge set $E \subset \binom{V_1}{2}$, |E| ≤ |V1|, such that G2 contains E and does not intersect B((G1 + E)[V1]). It holds that $\mathbb{P}\left[A_{G_1}\right] = o(1)$.

Proof. We will prove the claim for G2 sampled according to the configura- tion model, which is contiguous to the uniform model Gn,d2 . This allows us to apply Lemma 2.10, which gives us a precise estimate on the prob- ∈ P ability of (non-)occurrence of certain edge sets. Let P n,d2 be chosen 28 anagram-free graph colourings

uniformly at random. We will actually bound the probability that the in- M( ) P   duced multigraph P is in AG1 , denoted by P AG1 , with a slight

abuse of notation for not renaming the event AG1 itself. V1 Fix a G1-dense subset V1 ⊂ V with |V1| = ξn, and E ⊂ ( 2 ) with |E| = 5 α log d1 10 log d1 m ≤ |V1|. Recall that since V1 is G1-dense, ξn ≥ 4d n = 4d n. Note   1 1 |V1| that the graph G1 + E is a 4 -expander, so we apply Lemma 2.16, which −5 2 2 says that the set of boosters B = B((G1 + E)[V1]) contains at least 2 ξ n edges. Applying Lemma 2.10 to E and B, we get  m 2d2 − 2 ξ2nd P [M(P) ⊃ E and M(P) ∩ B = ∅] ≤ e 5·25 2 . P n

Now we can take the union bound over all choices of E and V1. We crudely bound the number of ways to choose V1 by $n\binom{n}{\xi n}$.

$$\mathbb{P}_P\left[A_{G_1}\right] \le n \binom{n}{\xi n} \sum_{m=1}^{\xi n} \binom{\xi^2 n^2 / 2}{m} \left(\frac{2d_2}{n}\right)^m e^{-\frac{1}{5 \cdot 2^4} \xi^2 n d_2}.$$

The term $\binom{\xi^2 n^2/2}{m} \left(\frac{2d_2}{n}\right)^m \le \left(\frac{e \xi^2 n d_2}{m}\right)^m$ is increasing in m in the given range, and hence

$$\mathbb{P}_P\left[A_{G_1}\right] \le n \cdot \xi n \cdot \left(e \xi^{-1} \cdot e \xi d_2 \cdot e^{-\frac{1}{5 \cdot 2^4} \xi d_2}\right)^{\xi n}.$$

Introducing the value of ξ, the term in brackets is at most

$$e^2 d_2 \, e^{-\frac{10^5 d_2 \log d_1}{4 \cdot 5 \cdot 2^4 d_1}} \le e^2 d_2 \, d_1^{-\frac{300 d_2}{d_1}}.$$

For $d_2 \in \left[\frac{d_1}{150}, \frac{d_1}{100}\right]$ the term above is upper-bounded by $d_1^{1 - \frac{300}{150}}$, so

$$\mathbb{P}_P\left[A_{G_1}\right] \le \xi n^2 e^{-\Omega(\xi n)} = e^{-\Omega(\xi n)},$$

as claimed.

Proof of Lemma 2.17. Since G1 satisfies (P1) and (P2), for G2 = Gn,d2 it holds with high probability that $G_2 \notin A_{G_1}$. Hence, given a G1-dense set V1 ⊂ V we can apply Lemma 2.18 to G1[V1] and G2[V1] to find a Hamilton cycle in (G1 ∪ G2)[V1], as required.
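The last numerical step in the proof of Lemma 2.19 can be spelled out as follows. This is only a sketch of one way to verify it, using the stated range of d2 and the density bound on ξ; the explicit prefactor n² is our own crude simplification, obtained from ξ ≤ 1.

```latex
% Sketch: bounding the bracketed term for d_1/150 <= d_2 <= d_1/100.
\begin{align*}
e^{2} d_2 \, d_1^{-300 d_2 / d_1}
  &\le e^{2} \cdot \frac{d_1}{100} \cdot d_1^{-300/150}
     && \text{(using } d_2 \le d_1/100 \text{ and } d_2/d_1 \ge 1/150\text{)} \\
  &= \frac{e^{2}}{100} \, d_1^{-1} \;\le\; d_1^{-1}
     && \text{(since } e^{2} < 100\text{)}.
\end{align*}
% Raising to the power \xi n and using \xi \le 1 in the prefactor:
\[
\mathbb{P}_P\left[A_{G_1}\right] \le n \cdot \xi n \cdot d_1^{-\xi n}
  \le n^{2} e^{-\xi n \log d_1},
\]
% which tends to zero for fixed d_1, because
% \xi n \ge \frac{10^5 \log d_1}{4 d_1} n grows linearly in n.
```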

We are now ready to put together the proof.

Proof of Theorem 2.3. For a given d, set $d_2 = 2\left\lceil \frac{d}{300} \right\rceil$ and d1 = d − d2. Let d be large enough so that $d_2 \le \frac{1}{100} d_1$, and for Lemma 2.13 and Lemma 2.17 to hold. Moreover, by choosing d2 to be even, we ensured that whenever nd is even (so that Gn,d is non-empty), nd1 and nd2 are also even.

Generate G1 = Gn,d1 and G2 = Gn,d2 on vertex set V. Suppose that G1 has properties (P1) and (P2), and G2 satisfies the conclusion of Lemma 2.17. By Lemma 2.13 and Lemma 2.17, this holds with high probability. We claim that in this case $\pi_\alpha(G_1 \cup G_2) > \left(1 - \frac{\alpha \log d_1}{d_1}\right) n$, where $\alpha = 10^5$ as before.

Let $c : V \to \left[\left(1 - \frac{\alpha \log d_1}{d_1}\right) n\right]$ be a given colouring. We first use Lemma 2.14 to find G1-dense sets V1, V2 ⊂ V with the same colouring. Then by Lemma 2.17, we conclude that the graphs (G1 ∪ G2)[Vi] are Hamiltonian, for i = 1, 2. Let C1 and C2 be Hamilton cycles on V1 and V2. G1 satisfies (P1), which implies that it contains an edge between some two vertices v1 ∈ V1 and v2 ∈ V2. We form the required path S by going along C1, using v1v2 to skip to V2 and then going along C2. The segments S[V1] and S[V2] give an anagram in c, as required.

It remains to express the bound in terms of d. Note that d1 lies between $\left(1 - \frac{1}{100}\right) d$ and d, so

$$\frac{\alpha \log d_1}{d_1} \le \frac{10^5 \log d}{d_1} \le \frac{2 \cdot 10^5 \log d}{d}.$$

Hence $\pi_\alpha(G_1 \cup G_2) > \left(1 - \frac{2 \cdot 10^5 \log d}{d}\right) n$, and by contiguity, the same holds for Gn,d with high probability.

concluding remarks

We have studied anagram-free colourings of graphs, and showed that there are very sparse graphs in which anagrams cannot be avoided unless we basically give each vertex a separate colour. Interestingly, a couple of days after submitting the paper, we learned that Wilson and Wood [145] independently studied anagram-free graph colourings. In particular, our Proposition 2.1, which shows that complete binary trees have an unbounded anagram-free chromatic number, answers one of their questions. Wilson and Wood have later obtained some positive results, showing that every graph has an anagram-free 8-colourable subdivision [146].

Our research suggests several interesting questions, some of which we mention here.

The first question concerns the lower bound on the anagram-chromatic number for trees. Is there a family of trees T(n) on n vertices for which πα(T(n)) ≥ ε log n for some positive constant ε > 0? We remark that this is the case for the analogous problem of finding the anagram-chromatic index of a tree. Indeed, a simple counting argument (cf. Proposition 2.7) shows that if instead of vertex colourings, we colour edges of a graph, then to avoid anagrams in the complete binary tree of depth h, we need to use at least $\left\lceil \frac{1}{4} h \right\rceil$ colours.

In estimating the anagram-chromatic number of planar graphs we relied only on the fact that they have small separators. It would be interesting to know a better lower bound on πα(G) for such graphs. In particular, we wonder if there exists a family Hn of planar graphs on n vertices such that $\pi_\alpha(H_n) \ge n^{\varepsilon}$ for some absolute constant ε > 0.

Let G(n, d) denote the graph with the largest anagram-chromatic number among all graphs G on n vertices with ∆(G) ≤ d. Our main result shows that if d is large enough then $\pi_\alpha(G(n, d)) \ge n\left(1 - C\frac{\log d}{d}\right)$, while for d = 4 we can only provide a construction which gives $\pi_\alpha(G(n, 4)) \ge \frac{\sqrt{n}}{2 \log n}$. We believe that there exist cubic graphs for which the anagram-chromatic number grows linearly with the order of the graph.

It would be nice to know how fast the function $f(d) = 1 - \limsup_{n \to \infty} \pi_\alpha(G(n, d))/n$ decreases with d. Let us recall that from Proposition 2.9 and Theorem 2.3 it follows that

$$\frac{1}{d} \le f(d) = O\left(\frac{\log d}{d}\right).$$

We do not know the correct bound, but we have good reasons to believe that the upper bound can be improved. Indeed, consider a graph which is a union of 2n/d cliques of size d/2 and a random n-vertex d/2-regular graph. We think that using such a construction one can show that $f(d) \le (\log d)^{1/2 + o(1)}/d$, but the proof looks quite involved and would probably not be worth the effort since it is by no means clear whether it would give the right order of f(d).

Moreover, Lemma 2.17 motivates questions on Hamiltonicity of small induced subgraphs of Gn,d. Pursuing our proof outline, we can prove the following.

Claim 6. There is a constant C such that with high probability, G = Gn,d has the following property. For any vertex set V1 ⊂ [n] of order at least $C\sqrt{\frac{\log d}{d}}\, n$, if the graph H = G[V1] has minimum degree at least $\frac{d}{10n}|V_1|$, then H is Hamiltonian.

To see this, take G2 = Gn,d2 for $d_2 = \frac{C}{20}\sqrt{d \log d}$, and G1 = Gn,d1 for d1 = d − d2. Consider G = Gn,d1 ∪ Gn,d2. Let V1 and H = G[V1] satisfy the hypothesis, and denote |V1| = ξn with $\xi \ge C\sqrt{\frac{\log d}{d}}$. Since the graph G[V1] has minimum degree at least $\frac{\xi d}{10}$, and we ensured $d_2 \le \frac{\xi d}{20}$, G1[V1] has minimum degree at least $\frac{\xi d}{20}$. This guarantees that G1[V1], as well as any graph on V1 containing it, has $\Theta(\xi^2 n^2)$ boosters. On the other hand, the condition $d_2 e^{-\Omega(\xi d_2)} < 1$ implies that G2 hits those boosters with high probability (see the calculation in Lemma 2.19). Hence G[V1] is Hamiltonian for any such V1, and by contiguity, Gn,d satisfies the claim.

The above discussion leads to the natural question: what is the smallest possible lower bound on |V1| in Claim 6? Note that $|V_1| = C\sqrt{\frac{\log d}{d}}\, n$ is the best we can get from our approach. Namely, the above-mentioned conditions require $\Omega\left(\frac{1}{\xi}\log\frac{1}{\xi}\right) = d_2 \le \frac{\xi d}{20}$, i.e. $\xi = \Omega\left(\sqrt{\frac{\log d}{d}}\right)$.

We also give a lower bound on |V1|. Using independent sets in Gn,d, one can find an induced unbalanced bipartite subgraph of order $\frac{\log d}{d} n$ with high minimum degree, which is obviously non-Hamiltonian. This observation implies that we need at least $|V_1| \ge \frac{\log d}{d} n$. We wonder how tight this estimate is.

Combinatorics-of-words questions that motivated our research remain open. Firstly, the list-colouring version of Keränen's result is still open: can an infinite anagram-free sequence be constructed from lists of bounded size? Our work on graphs suggests that this is much more difficult than avoiding repetitions. A positive answer to this question would invariably contain new ideas since so far, the only method we know of generating anagram-free sequences is repeated substitutions (or word morphisms) originating in Thue's work. New constructions are also likely to improve on the results of Wilson and Wood [146].

Secondly, a further weakening of an anagram would be consecutive segments of equal length and equal sums. With a slight reformulation, is it true that for any non-decreasing sequence $(a_n)_{n \in \mathbb{N}}$ with bounded differences, the sequence $(n, a_n)$ in $\mathbb{Z}^2$ contains a three-term arithmetic progression? There are surprising related negative results obtained by the method of repeated substitutions. Dekking [52] has found a sequence satisfying $a_{n+1} - a_n \le 1$ containing no five-term arithmetic progressions, and Cassaigne et al. [42] have managed to avoid four-term arithmetic progressions with $a_{n+1} - a_n \le 4$.

3 RAINBOW CONNECTIVITY

An edge-coloured graph is rainbow-connected if there is a rainbow path between any two vertices, i.e. a path all of whose edges carry distinct colours. Any connected graph G of order n can be made rainbow-connected using n − 1 colours by choosing a spanning tree and giving each edge of the spanning tree a different colour. Hence we can define rainbow connectivity, rc(G), as the minimal number of colours needed to make G rainbow-connected. Rainbow connectivity was introduced in 2008 by Chartrand et al. [46] as a way of strengthening the notion of connectivity, see for example [40], [45], [55], [84], [120], and the survey [122]. The concept has attracted a considerable amount of attention in recent years. It is also of interest in applied settings, such as securing sensitive information transfer and networking. For instance, [43] describes the following setting in networking: we want to route messages in a cellular network such that each link on the route between two vertices is assigned a distinct channel. Then, the minimum number of channels to use is equal to the rainbow connectivity of the underlying network.

We are interested in upper bounds on rainbow connectivity, first studied by Caro et al. [40]. The trivial lower bound is rc(G) ≥ diam(G), and it turns out that for many classes of graphs, this is a reasonable guess for the value of rainbow connectivity. Caro et al. [40] showed that a connected graph of order n and minimum degree δ ≥ 3 has rainbow connectivity at most $\frac{5n}{6}$. Since the diameter of such a graph is at most $\frac{3n}{\delta + 1}$ (see, e.g., [68]), it is natural to ask whether the rainbow connectivity of G is of the same order. Krivelevich and Yuster [120] showed that indeed $rc(G) \le \frac{20n}{\delta}$. Then Chandran et al. [45] settled this question by proving $rc(G) \le \frac{3n}{\delta + 1} + 3$, which is asymptotically tight.

Rainbow connectivity of the random regular graph was first studied by Frieze and Tsourakakis [84]. The question is natural seeing as Gn,d has rather strong connectivity properties; for example, the diameter of Gn,d is with high probability asymptotic to $\frac{\log n}{\log(d - 1)}$, see [25]. In [84] it was shown that with high probability $rc(G_{n,d}) = O(\log^{\phi_d} n)$, for a constant $\phi_d > 2$. Dudek, Frieze and Tsourakakis [55] improved this bound to $rc(G_{n,d}) = O(\log n)$, which is the correct dependence on n. We will return to this result later.

The aim of this chapter is to present a simple approach which immediately implies results on rainbow connectivity of several classes of graphs. It provides a unified approach to various settings, yields new theorems, strengthens some of the earlier results and simplifies the proofs. It is based on edge- and vertex-splitting. The main idea of the edge-splitting lemma is simple: we decompose G into two edge-disjoint spanning trees T1 and T2 with a common root vertex and small diameters. We use different palettes for edges of T1 and T2, ensuring that each tree contains a rainbow path from any vertex to the root. Hence if we can get the diameters of T1 and T2 'close' to the diameter of G (say within a constant factor), then we have obtained a strong result.

In Section 3.1, we exhibit a few applications of the lemma. First we use it to give a straightforward proof of the result of Krivelevich and Yuster [120], that is

Theorem 3.1. For a connected n-vertex graph G of minimum degree δ ≥ 4, $rc(G) \le \frac{16n}{\delta}$.

Next we turn to random regular graphs for constant d and n → ∞. The colouring of Gn,d of Dudek et al. [55] typically uses Ω(d log n) colours, which for large d is significantly bigger than the diameter of Gn,d. Using our splitting lemma we can improve it to an asymptotically tight bound.

Theorem 3.2. There is an absolute constant c such that for d ≥ 5, $rc(G_{n,d}) \le \frac{c \log n}{\log d}$ with high probability.

For d ≥ 6, the theorem is an immediate consequence of the contiguity of different models of random regular graphs. With little extra work, our approach also works for 5-regular graphs. We would like to point out that the proof of Dudek et al. works starting from d = 4.

The question of which characteristics of Gn,d ensure small rainbow connectivity arises naturally. Recalling that expander graphs also have diameter logarithmic in n, it makes sense to look at expansion properties. The following theorem can be viewed as a generalisation of the previous result on Gn,d.

Theorem 3.3. Let ε > 0. Let G be an n-vertex d-regular graph whose edge expansion is at least εd. If $d \ge \max\left\{64\varepsilon^{-1}\log\left(64\varepsilon^{-1}\right), 324\right\}$, then $rc(G) = O\left(\varepsilon^{-1}\log n\right)$.

In particular, this theorem applies to (n, d, λ)-graphs with λ ≤ d(1 − 2ε), i.e. n-vertex d-regular graphs all of whose eigenvalues except the largest one are at most λ in absolute value. We discuss this class of graphs in more detail in Section 4.3.

Krivelevich and Yuster [120] have introduced the corresponding concept of rainbow vertex connectivity rvc(G), the minimal number of colours in a vertex colouring which contains a rainbow path between any two vertices. As before, a path is said to be rainbow if its internal vertices carry distinct colours. The easy bounds diam(G) − 1 ≤ rvc(G) ≤ n also hold in this setting. Krivelevich and Yuster have demonstrated that it is impossible to bound the rainbow connectivity of G in terms of its vertex rainbow connectivity, or the other way around. They also bound rvc(G) in terms of the minimal degree.

Our approach essentially works for vertex colouring as well. In Section 3.2 we present the vertex-splitting lemma. It is then used to prove the vertex-colouring analogue of Theorem 3.2 on random regular graphs.

Theorem 3.4. There is an absolute constant c such that with high probability $rvc(G_{n,d}) \le \frac{c \log n}{\log d}$ for all d ≥ 28.

3.1 edge rainbow connectivity

3.1.1 The edge-splitting lemma

We state and prove the main lemma. The rest of the section uses the same notation for spanning subgraphs G1 and G2.

Lemma 3.5. Let G = (V, E) be a graph. Suppose G has two connected spanning subgraphs G1 = (V, E1) and G2 = (V, E2) such that |E1 ∩ E2| ≤ c. Then rc(G) ≤ diam(G1) + diam(G2) + c.

Proof. Let B = E1 ∩ E2. Colour the edges of B in distinct colours. These colours will remain unchanged, and the remaining edges get coloured according to graph distances in G1 and G2, denoted by d1 and d2. Choose an arbitrary v ∈ V and define distance sets Uj = {u ∈ V : d1(v, u) = j} and Wj = {u ∈ V : d2(v, u) = j}. For 1 ≤ j ≤ diam(G1), colour the edges between Uj−1 and Uj with colour aj. Similarly, using a new palette (bj), colour the edges between Wj−1 and Wj with colour bj for each 1 ≤ j ≤ diam(G2). Finally, we give the edges which do not belong to G1 or G2 the colour a1. The colouring indeed uses at most diam(G1) + diam(G2) + c colours.

It remains to exhibit a rainbow path between two arbitrary vertices x1 and x2 in V. Let Pi be a shortest path in Gi from xi to v. By our definition of colouring on distance sets, both paths P1 and P2 are rainbow. If they are edge-disjoint, the concatenation is a rainbow path between x1 and x2. Otherwise, P1 and P2 can only intersect in edges of B. If this occurs, we walk from x1 along P1 to the earliest common edge. We use this edge to switch to P2 and walk to x2.
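The colouring from this proof is easy to compute with two breadth-first searches. Below is a minimal sketch under our own conventions: the function names and the colour labels ("a", j), ("b", j) and ("shared", e) are hypothetical, and edges of G1 or G2 lying inside a single distance layer, which the proof does not need, are given the colour a1 as an assumption. The rainbow-connectivity check is brute force and only meant for small examples.

```python
from collections import deque
from itertools import combinations


def bfs_layers(vertices, edges, root):
    """Graph distances from root in the graph (vertices, edges)."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def edge_splitting_colouring(vertices, edges, e1, e2, root):
    """Colour the edges of G as in the proof of Lemma 3.5 (root plays the role of v)."""
    e1 = {tuple(sorted(e)) for e in e1}
    e2 = {tuple(sorted(e)) for e in e2}
    d1 = bfs_layers(vertices, e1, root)
    d2 = bfs_layers(vertices, e2, root)
    colour = {}
    for e in (tuple(sorted(e)) for e in edges):
        u, v = e
        if e in e1 and e in e2:
            colour[e] = ("shared", e)               # edges of B get private colours
        elif e in e1 and d1[u] != d1[v]:
            colour[e] = ("a", max(d1[u], d1[v]))    # between U_{j-1} and U_j
        elif e in e2 and d2[u] != d2[v]:
            colour[e] = ("b", max(d2[u], d2[v]))    # between W_{j-1} and W_j
        else:
            colour[e] = ("a", 1)                    # remaining edges (assumption, see lead-in)
    return colour


def is_rainbow_connected(vertices, colour):
    """Brute-force check that every pair of vertices is joined by a rainbow path."""
    adj = {v: [] for v in vertices}
    for (u, v), c in colour.items():
        adj[u].append((v, c))
        adj[v].append((u, c))

    def reach(u, target, seen, used):
        if u == target:
            return True
        return any(reach(w, target, seen | {w}, used | {c})
                   for w, c in adj[u] if w not in seen and c not in used)

    return all(reach(x, y, frozenset({x}), frozenset())
               for x, y in combinations(vertices, 2))


# Toy instance: two edge-disjoint spanning paths on five vertices, rooted at 2.
vertices = list(range(5))
tree1 = [(0, 1), (1, 2), (2, 3), (3, 4)]
tree2 = [(0, 2), (2, 4), (1, 4), (1, 3)]
colouring = edge_splitting_colouring(vertices, tree1 + tree2, tree1, tree2, root=2)
print(len(set(colouring.values())), is_rainbow_connected(vertices, colouring))
```

Here both trees have diameter 4, so the lemma guarantees at most 8 colours; rooting at the middle vertex of the first path keeps the layers shallow and the sketch uses fewer.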

3.1.2 Rainbow connectivity and minimum degree

In this setting, the best possible result has been shown by Chandran et al. [45]. Namely, a connected graph G of order n and minimum degree δ satisfies $rc(G) \le \frac{3n}{\delta + 1} + 3$. We show how the splitting lemma can be used with basic graph theory to obtain a good upper bound, $rc(G) \le \frac{16n}{\delta}$.

Proof of Theorem 3.1. Let G = (V, E) be as in the statement. We split G into two spanning subgraphs of minimum degree at least $\frac{\delta - 1}{2}$. First assume that all vertices of G have even degree. Then, using connectedness of G, order its edges along an Eulerian cycle e1, e2, ..., em, and define

F1 = {ej : j ∈ [m] even} and F2 = {ej : j ∈ [m] odd}.

Edges around each vertex are coupled into adjacent pairs ejej+1, so this is indeed a balanced split. Let Hi = (V, Fi) be the associated graphs. To apply this splitting to general G, note that the number of vertices of odd degree is even, so we can add a matching M between those vertices. Even if G′ = (V, E ∪ M) contains double edges, it still has an Eulerian cycle. We apply the above procedure to G′, and then remove the auxiliary edges M. The end result is that a vertex of odd degree d in G has degree $\frac{d \pm 1}{2}$ in Hi, so indeed subgraphs Hi have minimum degree at least $\frac{\delta - 1}{2}$.

The graph H1 may not be connected. But since the minimum degree of this graph is at least $\frac{\delta - 1}{2}$, each connected component has order at least $\frac{\delta}{2}$. Hence the number of components of H1 is at most $\frac{2n}{\delta}$, so we can add a set B1 ⊂ E such that G1 = (V, F1 ∪ B1) is connected, and $|B_1| \le \frac{2n}{\delta}$. We define the set B2 analogously. An elementary graph-theoretic result (mentioned in the introduction, see also [68]) shows that subgraphs G1 and G2 of G have diameters at most $\frac{3n}{\frac{\delta - 1}{2} + 1} \le \frac{6n}{\delta}$. Applying the edge-splitting lemma to G1 and G2 gives $rc(G) \le \frac{6n}{\delta} + \frac{6n}{\delta} + \frac{4n}{\delta} \le \frac{16n}{\delta}$.
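The Eulerian splitting used in this proof can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the construction verbatim: the names `euler_split` and `degrees` are hypothetical, the auxiliary matching simply pairs the odd-degree vertices in index order, the input graph is assumed connected, and the tour is found with Hierholzer's algorithm. Note that the pair formed by the last and the first edge of the tour at its starting vertex falls into the same class when the tour has odd length, so the examples are chosen with an even number of tour edges.

```python
from itertools import combinations


def degrees(n, edges):
    """Degree sequence of the graph on 0..n-1 with the given edge list."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def euler_split(n, edges):
    """Split the edges of a connected graph alternately along an Eulerian tour.

    Odd-degree vertices are first paired by auxiliary edges (the matching M),
    which are dropped again at the end. Returns (F1, F2).
    """
    deg = degrees(n, edges)
    odd = [v for v in range(n) if deg[v] % 2 == 1]
    matching = list(zip(odd[0::2], odd[1::2]))
    multi = [(u, v, False) for u, v in edges] + [(u, v, True) for u, v in matching]

    adj = [[] for _ in range(n)]
    for idx, (u, v, _) in enumerate(multi):
        adj[u].append((v, idx))
        adj[v].append((u, idx))

    # Hierholzer's algorithm, recording the edge used to enter each vertex.
    used = [False] * len(multi)
    tour = []
    stack = [(edges[0][0], None)]
    while stack:
        v, arrived_by = stack[-1]
        while adj[v] and used[adj[v][-1][1]]:
            adj[v].pop()
        if adj[v]:
            w, idx = adj[v].pop()
            used[idx] = True
            stack.append((w, idx))
        else:
            stack.pop()
            if arrived_by is not None:
                tour.append(arrived_by)

    f1 = [multi[i][:2] for pos, i in enumerate(tour) if pos % 2 == 0 and not multi[i][2]]
    f2 = [multi[i][:2] for pos, i in enumerate(tour) if pos % 2 == 1 and not multi[i][2]]
    return f1, f2


k5 = list(combinations(range(5), 2))   # 4-regular, 10 edges
k4 = list(combinations(range(4), 2))   # 3-regular, needs the auxiliary matching
print(euler_split(5, k5))
print(euler_split(4, k4))
```

For K5 every vertex keeps exactly two edges in each class; for K4 every vertex keeps one edge in one class and two in the other, in line with the (d ± 1)/2 count in the proof.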

3.1.3 Expanders

We adopt a weak definition of an expander. As before, G = (V, E), the degree d is fixed and the order n tends to infinity. For S ⊂ V, we define out(S) to be the set of edges with exactly one endpoint in S. A graph G has edge expansion Φ if every set S ⊂ V with $|S| \le \frac{n}{2}$ satisfies |out(S)| ≥ Φ|S|. Frieze and Molloy [81] have shown using the Lovász local lemma that the natural random k-splitting of E gives k expander graphs with positive probability. We state their theorem for k = 2.

Theorem 3.6. Let d be a natural number, λ > 0 a real number, and G = (V, E) a d-regular graph with edge expansion Φ. Suppose

$$\frac{\Phi}{\log d} \ge 8\lambda^{-2} \quad \text{and} \quad \frac{d}{\log d} \ge 14\lambda^{-2}.$$

Then there is a partition E = E1 ∪ E2 such that both subgraphs Gi = (V, Ei) have edge expansion at least $(1 - \lambda)\frac{\Phi}{2}$.

Under stronger conditions on expansion, they also give a randomised polynomial-time algorithm for the splitting, which we can use to give a constructive upper bound on the rainbow connectivity of G.

Proof of Theorem 3.3. Let G be a d-regular graph with edge expansion εd. We will apply Theorem 3.6 with $\lambda = \frac{1}{2}$. The hypothesis $d \ge 64\varepsilon^{-1}\log\left(64\varepsilon^{-1}\right)$ ensures that $\frac{\varepsilon d}{\log d} \ge 32$, and the second inequality follows from d ≥ 324. We get a partition E = E1 ∪ E2 such that each graph Gi = (V, Ei) has edge expansion at least $\frac{\varepsilon d}{4}$. The maximum degree of Gi is at most d, so every set S of order $|S| \le \frac{n}{2}$ has a neighbourhood Γ(S) of order $|\Gamma(S)| \ge \left(1 + \frac{\varepsilon}{4}\right)|S|$. Thus the number of vertices within distance at most ℓ from any vertex in Gi is at least $\min\left\{(1 + \varepsilon/4)^{\ell}, n/2\right\}$ and therefore $diam(G_i) = O\left(\varepsilon^{-1}\log n\right)$. Applying Lemma 3.5 gives $rc(G) \le diam(G_1) + diam(G_2) = O\left(\varepsilon^{-1}\log n\right)$.
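The passage from ball growth to the diameter bound in this proof can be made explicit as follows. This is one possible way to fill in the step, not necessarily the intended one; the notation $N_\ell(v)$ for the set of vertices within distance ℓ of v is ours, and we use ε ≤ 1, which holds because a single vertex has exactly d outgoing edges.

```latex
% Sketch: a graph G_i with |\Gamma(S)| \ge (1 + \varepsilon/4)|S| for all |S| \le n/2.
\[
|N_\ell(v)| \ge \min\left\{(1 + \varepsilon/4)^{\ell},\, n/2\right\},
\qquad
\ell_0 := \left\lceil \frac{\log(n/2)}{\log(1 + \varepsilon/4)} \right\rceil
\;\Longrightarrow\; |N_{\ell_0}(v)| \ge n/2 .
\]
% Since \log(1 + x) \ge x/2 for 0 \le x \le 1 and \varepsilon \le 1,
\[
\ell_0 \le \frac{8 \log n}{\varepsilon} + 1 .
\]
% For two vertices u, v: if N_{\ell_0}(u) and N_{\ell_0}(v) are disjoint, then each
% has exactly n/2 vertices, and positive edge expansion yields an edge between them.
\[
\operatorname{dist}(u, v) \le 2\ell_0 + 1 = O\left(\varepsilon^{-1} \log n\right).
\]
```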

3.1.4 Random regular graphs

In Section 2.2.2.1, we discussed the contiguity of Gn,d with the superposition models for d-regular graphs. The specific results we use in proving Theorem 3.2 are $G_{n,d+d'} \approx G_{n,d} \oplus G_{n,d'}$ and $G_{n,d+2} \approx G_{n,d} \oplus H_n$, where Hn is a random Hamiltonian cycle on [n]. Recall that Theorem 3.2 says that for d ≥ 5, $rc(G_{n,d}) \le \frac{c \log n}{\log d}$ with high probability.

Proof of Theorem 3.2 for d ≥ 6. As usual, we assume that dn is even, and define di so that Gn,di are non-empty for i = 1, 2. If d is odd, then n is even and we can set $d_i = \frac{d \pm 1}{2}$. Otherwise, we set $d_1 = d_2 = \frac{d}{2}$ or $d_i = \frac{d}{2} \pm 1$ as appropriate. The observation at the end of the proof resolves the case d = 6. Let Gi be a random di-regular graph, di ≥ 3. Then with high probability $diam(G_i) \le \frac{(1 + o(1))\log n}{\log(d_i - 1)} \le \frac{c \log n}{2 \log d}$, where c is a suitable constant. Let G be the union of two such edge-disjoint graphs G1 and G2. The splitting lemma gives $rc(G) \le \frac{c \log n}{\log d}$. Since G was sampled from $G_{n,d_1} \oplus G_{n,d_2}$, the random d-regular graph has the same property with high probability.

For d = 6 and odd n, we take G to be sampled from Hn ⊕ Hn ⊕ Hn. The first two Hamiltonian cycles belong to G1, resp. G2. We split the edges of the third Hamiltonian cycle Hn alternately, so that $\frac{n - 1}{2}$ edges are assigned to G1 and $\frac{n + 1}{2}$ to G2. Then we can cite Proposition 3.8, a result of Bollobás and Chung which says that the union of a Hamiltonian cycle and a random perfect matching has logarithmic diameter [24] with high probability.
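The choice of d1 and d2 "as appropriate" amounts to a small parity case analysis, which can be sketched as follows. The function name `split_degree` is hypothetical; the only requirements encoded are d1 + d2 = d and that n·d1 and n·d2 are both even. The sketch does not enforce di ≥ 3, which fails exactly in the case d = 6 with n odd that the proof treats separately.

```python
def split_degree(d, n):
    """Return (d1, d2) with d1 + d2 = d such that n*d1 and n*d2 are both even."""
    if (d * n) % 2 == 1:
        raise ValueError("no d-regular graph on n vertices when d*n is odd")
    if d % 2 == 1:
        # d odd forces n even, so any split works; take the balanced one.
        return (d - 1) // 2, (d + 1) // 2
    half = d // 2
    if (n * half) % 2 == 0:
        return half, half
    # half and n are both odd, so half - 1 and half + 1 are even.
    return half - 1, half + 1


for d, n in [(7, 10), (8, 9), (10, 9), (6, 9)]:
    print(d, n, split_degree(d, n))
```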

The remainder of the section deals with the case d = 5. Since $G_{n,5} \approx G_{n,1} \oplus H_n \oplus H_n$, we can model our 5-regular graph as a union of two random graphs G1 and G2, where each Gi is an edge-disjoint union of a Hamiltonian cycle and a matching of size $\left\lfloor \frac{n}{4} \right\rfloor$. The following theorem says that with high probability each Gi has diameter O(log n), so rc(G) = O(log n) follows from the splitting lemma.

Theorem 3.7. Let G be a random graph on [n] generated as the union of the cycle (1, 2, . . . , n, 1) and a random matching on [n] consisting of $\left\lfloor \frac{n}{4} \right\rfloor$ edges. Then G has diameter O(log n) with high probability.

The theorem can be proved by adapting the argument of Krivelevich et al. [117], who showed that starting from a connected n-vertex graph C and in addition, turning each pair of vertices into an edge with probability $\frac{\varepsilon}{n}$, the resulting graph typically has logarithmic diameter. This is very similar to what we need when C is a Hamiltonian cycle. However, since we are adding a random matching rather than independent edges, our model is slightly different. Instead of reproving the result of [117] in our setting, we decided to give a different (very short) proof relying on the following proposition (see [148]), which by contiguity simply says that Gn,3 has logarithmic diameter with high probability. Without assuming that the cycle and matching are edge disjoint this was proved earlier by Bollobás and Chung [24].

Proposition 3.8. Let H be a graph formed by taking a disjoint union of a random matching of size $\left\lfloor \frac{n}{2} \right\rfloor$ and an n-cycle. With high probability, the diameter of H is $(1 + o(1))\log_2 n$.

Denote $m = \left\lfloor \frac{n}{4} \right\rfloor$. Note that G in Theorem 3.7 can be built in two steps as follows. First we select a random subset B = {b1, b2, ..., b2m} ⊂ [n] of order 2m, and then independently a random perfect matching on {b1, b2, ..., b2m}. Throughout the proof we identify the vertices of G with natural numbers up to n and assume b1 < b2 < ··· < b2m. Given a subset B, define variables Yi = bi+1 − bi for i = 1, ..., 2m − 1. Moreover, we define Y0 = b1 and Y2m = n − b2m to record the positions of the first and the last vertex in B. An important observation is that a random set B of order 2m induces a random sequence (Y0, Y1, ..., Y2m) with Yi ≥ 1 for i < 2m, Y2m ≥ 0 and $\sum_{i=0}^{2m} Y_i = n$ and, vice versa, given such a random sequence, we can uniquely reproduce a corresponding set B, which is uniformly distributed over all subsets of [n] of order 2m. To complete the proof, we need the following simple lemma about (Yi).

Lemma 3.9. Let (Y0, Y1, ..., Y2m) be a random sequence as defined above. Fix a set of indices 0 ≤ i1 < i2 < ··· < is < 2m. Then P[Y2m > log n] = o(1) and

$$\mathbb{P}\left[\sum_{j=1}^{s} Y_{i_j} > 10s\right] \le e^{-2s}.$$

Proof of Lemma 3.9. Since permuting the variables Yi (where i < 2m) does not change the probability space, without loss of generality we may assume (i1, ..., is) = (0, ..., s − 1). Recall that Yi were defined by Yi = bi+1 − bi, so that $\sum_{i=0}^{s-1} Y_i > 10s$ means exactly that there are at most s − 1 vertices of B among the first 10s vertices. On the other hand, |B ∩ [10s]| is a hypergeometric random variable with mean $\frac{2m}{n} \cdot 10s$. Therefore, by the standard tail bounds (see, e.g., Theorem 2.10 in [103]),

$$\mathbb{P}\left[\sum_{i=0}^{s-1} Y_i > 10s\right] = \mathbb{P}\left[|B \cap [10s]| \le s - 1\right] \le e^{-\frac{2\left(\frac{20m}{n} - 1\right)^2 s^2}{10s}} \le e^{-2s}.$$

Similarly, Y2m > log n means that no vertex of B is in the interval [n − log n, n]. The probability of this event is $\binom{n - \log n}{2m} / \binom{n}{2m} = o(1)$.
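The final inequality in the tail bound of Lemma 3.9 can be checked directly. The sketch below writes X = |B ∩ [10s]| and uses the Hoeffding-type bound for the lower tail of a hypergeometric variable with 10s draws; the assumptions m ≥ n/4 − 1 and n ≥ 24 are ours, made only to obtain explicit constants.

```latex
% Sketch: X is hypergeometric with 10s draws and mean \mu = \frac{2m}{n} \cdot 10s = \frac{20ms}{n}.
% Tail bound used: P[X \le \mu - t] \le \exp(-2t^2 / (10s)) for t \ge 0.
\[
t = \mu - s = \left(\tfrac{20m}{n} - 1\right) s
\quad\Longrightarrow\quad
\mathbb{P}[X \le s - 1] \le \mathbb{P}[X \le \mu - t]
  \le \exp\left(-\tfrac{1}{5}\left(\tfrac{20m}{n} - 1\right)^{2} s\right).
\]
% Assumption: m \ge n/4 - 1, so 20m/n \ge 5 - 20/n, and for n \ge 24
\[
\left(\tfrac{20m}{n} - 1\right)^{2} \ge \left(4 - \tfrac{20}{n}\right)^{2} \ge 10,
\qquad \text{hence} \qquad
\mathbb{P}[X \le s - 1] \le e^{-2s}.
\]
```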

Proof of Theorem 3.7. As we explained, our G can be constructed as follows. Start with a cycle b1b2 ... b2mb1. Pick a random perfect matching M on B = {b1, b2, ..., b2m} whose edges do not coincide with any edges of the cycle. Let H = H(M) be the graph on B formed as the union of the cycle b1b2 ... b2mb1 and the matching M. Choose a random sequence (Y0, Y1, ..., Y2m) as above. The graph G on [n] is obtained by subdividing each edge bibi+1 into Yi edges. The exception is the edge b2mb1, which is subdivided into Y2m + Y0 edges. Note that M and (Yi) are chosen independently.

Since M is random, by Proposition 3.8 H(M) has diameter at most $(1 + o(1))\log_2(2m) \le 1.5 \log n - 1$ with high probability. Condition on this event, and fix an arbitrary M which satisfies the condition. We will show that for random (Yi), with high probability G will have small diameter. We further condition on the event that Y2m ≤ log n, which by the previous lemma holds with high probability. Let s = 1.5 log n.

Take the vertices u and v in [n], and single out the segments to which they belong, bi ≤ u < bi+1 and bj ≤ v < bj+1 (i and j are possibly 0 or 2m − 1). H contains a path P between bi and bj of length at most s − 1, which we turn into a path in G as follows. If an edge on P belongs to the matching M, then it is also an edge of G. Otherwise, if the edge has form bkbk+1, we replace it by the segment bk, bk + 1, bk + 2, . . . , bk+1 in G, whose length is Yk. If P contains the edge b2mb1, the corresponding segment has length Y2m + Y0. At the ends of the path, we walk from u to bi and from bj to v. Denote by U the set of indices k < 2m such that P contains a vertex bk. Since Yi ≥ 1 for i < 2m, the distance between u and v in G is at most $Y_{2m} + 1 + \sum_{k \in U} \max\{1, Y_k\} < s + \sum_{k \in U} Y_k$. Note also that |U| = |P| + 1 ≤ s and that P, U do not depend on variables (Yk). Thus, by Lemma 3.9, the probability that this distance exceeds 11s is at most $e^{-2s} = n^{-3}$.

Taking the union bound over all pairs of vertices, $\mathbb{P}[diam(G) > 11s \mid M] = O\left(n^{-1}\right)$. Since we conditioned on the event with probability 1 − o(1), the probability that diam(G) > 11s is at most o(1), completing the proof.

3.2 vertex rainbow connectivity

We now state the vertex-colouring analogue of Lemma 3.5.

Lemma 3.10. Let G = (V, E) be a graph. Suppose that V1, V2 ⊂ V satisfy: 1) V1 ∪ V2 = V; 2) |V1 ∩ V2| ≤ c; 3) every vertex v ∈ V1 has a neighbour in V2 and vice versa; 4) G[Vi] is connected, for i = 1, 2. Then

rvc(G) ≤ diam(G[V1]) + diam(G[V2]) + c + 2.

Proof. Let B = V1 ∩ V2. Colour the vertices of B in distinct colours. These colours will remain unchanged, and the remaining vertices get coloured according to graph distances di in Gi = G[Vi]. Choose root vertices vi ∈ Vi such that v1v2 is an edge of G. Give each distance set {u ∈ V1 : d1(v1, u) = j} the colour aj, for 0 ≤ j ≤ diam(G1). Similarly, each set {u ∈ V2 : d2(v2, u) = j} gets colour bj.

Given vertices x1 ∈ V1 and x2 in V, we will find a rainbow path between them. Suppose first that x2 lies in V2, and let Pi be a shortest path in Gi from xi to vi. By our definition of colouring on distance sets, both paths P1 and P2 are rainbow. If they are vertex-disjoint, the concatenation P1 − v1v2 − P2 is a rainbow path between x1 and x2. Otherwise, P1 and P2 can only intersect in vertices of B. If this occurs, we walk from x1 along P1 to the earliest common vertex. We use this vertex to switch to P2 and walk to x2. If x2 does not lie in V2, we replace it with its neighbour in V2, which exists by hypothesis, and then proceed with the argument. The case where x1, x2 ∉ V1 is treated similarly.

3.2.1 Random regular graphs

We now apply the lemma to the random d-regular graph. Preliminaries on the edge distribution in Gn,d are proven in Section 2.2.2.1.

Lemma 3.11. Let G be a d-regular graph, d ≥ 28. Then the vertices of G can be partitioned as V = U1 ∪ U2 so that each v ∈ V has at least 0.11d neighbours in both U1 and U2.

Proof. This is a standard application of the Lovász local lemma. Denote γ = 0.11 for the rest of the section. For each vertex v, put it into U1 randomly and independently with probability 1/2. Let Ev be the event that v does not satisfy the statement of the lemma. By the standard Chernoff bounds (Theorem 5.10) the probability of this event is at most $2e^{-2\left(\frac{1}{2} - \gamma\right)^2 d}$. Two events Ev and Eu are adjacent in the dependency graph if u and v are at distance at most 2 from each other, and otherwise they are independent. Hence, each event has degree at most ∆ = d² in the dependency graph. Then for γ = 0.11 and d ≥ 28, the condition

$$(\Delta + 1)\, e\, \mathbb{P}[E_v] \le \left(d^2 + 1\right) \cdot 2e^{1 - 2\left(\frac{1}{2} - \gamma\right)^2 d} < 1$$

is satisfied. Therefore, by the Local lemma, with positive probability no event Ev occurs.
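The numerical condition in the proof of Lemma 3.11 is easy to check by machine. The following sketch evaluates the right-hand side of the local lemma condition, e(∆ + 1) times the Chernoff bound with ∆ = d², for γ = 0.11; the function name `lll_bound` is hypothetical and the upper end of the scanned range is an arbitrary choice of ours.

```python
import math


def lll_bound(d, gamma=0.11):
    """Value of e * (d^2 + 1) * 2 * exp(-2 * (1/2 - gamma)^2 * d)."""
    p_bad = 2 * math.exp(-2 * (0.5 - gamma) ** 2 * d)
    return math.e * (d * d + 1) * p_bad


# Smallest d from which the condition holds throughout the scanned range.
upper = 2000
threshold = next(d for d in range(1, upper)
                 if all(lll_bound(k) < 1 for k in range(d, upper)))
print(threshold, lll_bound(27), lll_bound(28))
```

With γ = 0.11 the bound first drops below 1 at d = 28, which is consistent with the threshold stated in the lemma.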

To use such a partition, we need an estimate on the number of edges spanned by subsets of vertices of Gn,d. Similar results have appeared for instance in [19], but for our purposes we need a more explicit dependence on the degree d. This is granted by Lemma 2.13, proved using the configu- ration model. Hence we have all the ingredients for the main result of this section,  log n  rvc(Gn,d) = O log d with high probability for d ≥ 28. Proof of Theorem 3.4. Let G be a random d-regular graph, γ = 0.11. Use Lemma 3.11 to obtain a partition V = U1 ∪ U2 such that each v ∈ V has at least γr neighbours in each part. All statements about G from now on will hold with high probability. In particular, we assume that G has properties (P3) and (P4) from Lemma 2.13 0 γ γd with γ = 1+ε , where ε = 0.02 is chosen so that 1+ε > 3. We only need the extra (1 + ε)−1 factor later, for Claim 3. Such edge distribution implies that each connected component of G[Ui] contains at least αn vertices, where α is the constant from Lemma 2.13. To see this, note that if A is the vertex set of a component of G with |A| ≤ αn, then the number of edges with both |A|γ(1+ε)d endpoints in A is at most 2 , giving an average degree of at most γ(1 + ε)d inside A and contradicting the fact that G is d-regular. Claim 1. For i ∈ {1, 2}, we can find Wi ⊂ V such that Wi = O(1) and G[Ui ∪ Wi] is connected. j For a set of vertices A ⊂ V, denote Γ (A) = {v ∈ V : dG(v, A) ≤ j}. It is well-known that a random regular graph has good expansion properties (see [26]), i.e. there is a constant ϕ > 0 such that whp |Γ(A)| ≥ (1 + ϕ)|A| n whenever |A| ≤ 2 . Now suppose that A has linear order, |A| ≥ αn, and −1 ` > log α −log 2 take an integer log(1+ϕ) . Iterating the expansion property gives that ` n |Γ (A)| > 2 . To prove Claim 1, suppose A and B are vertex sets of two connected components of G[Ui]. Recall that by (P3), A and B have order at least αn. 
We just showed that Γ`(A) ∩ Γ`(B) 6= ∅, so there is a path of length at most 2` from A to B in G. Adding the vertices of this path to Wi reduces the number of connected components by one, so repeating −1 this step α times ensures that Vi = Ui ∪ Wi spans a connected graph Gi = G[Vi]. Choose a large integer a such that |Wi| ≤ a for all n and d. The vertex sets V1 and V2 now satisfy |V1 ∩ V2| ≤ 2a, so we turn to the diameters of G1 and G2. Claim 2. For d ≥ 112 (so that γd ≥ 12), every T ⊂ Vi of order at most   ηn | (T)| ≥ + γd |T| γd2 satisfies ΓGi 1 12 . 3.2 vertex rainbow connectivity 43

Suppose T does not satisfy the claim, and let $S = \Gamma_{G_i}(T)$. Since all the edges in $G_i$ with an endpoint in T lie in $G_i[S]$, we get that S spans at least

\[ \frac{\gamma d|T|}{2} \ge \frac{\gamma d|S|}{2\left(1 + \frac{\gamma d}{12}\right)} \ge \frac{3\gamma d|S|}{\gamma d} = 3|S| \]

edges. Note that by the hypothesis $|S| \le \left(1 + \frac{\gamma d}{12}\right) \cdot \frac{\eta n}{\gamma d^2} < \frac{\eta n}{d}$. Hence we can deduce from Lemma 2.13 (P4) that S spans fewer than 3|S| edges, which is a contradiction.

Claim 3. Let α be the constant from Lemma 2.13 (P3) and ε > 0 as above. Every subset $T \subset V_i$ of order at most $\frac{\alpha n}{1+\varepsilon}$ satisfies $|\Gamma_{G_i}(T)| \ge (1+\varepsilon)|T|$.

Assume that T does not expand, and use Lemma 2.13 for $S = \Gamma_{G_i}(T)$, $\gamma' = \frac{\gamma}{1+\varepsilon} > \frac{3}{d}$. Since all the edges of $G_i$ with an endpoint in T lie in $G_i[S]$, we get that S spans at least

\[ \frac{\gamma d|T|}{2} \ge \frac{\gamma d|S|}{2(1+\varepsilon)} = \frac{\gamma' d|S|}{2} \]

edges. This contradicts property (P3) of Lemma 2.13.

For d ≥ 112, Claim 2 implies that starting from any vertex $v \in V_i$, we can expand in $G_i$ to a set of order $\frac{\eta n}{\gamma d^2}$ in $\frac{c_1 \log n}{\log d}$ steps, where $c_1$ is a constant independent of d and n. Further O(log d) steps give a set of order $\frac{\alpha n}{1+\varepsilon}$, by Claim 3. For d < 112, we directly apply Claim 3 O(log n) times (thus avoiding Claim 2) to expand to a set of order $\frac{\alpha n}{1+\varepsilon}$. In this range, log d < log(112) and hence $O(\log n) = O\left(\frac{\log n}{\log d}\right)$.

Denote $k = \frac{c \log n}{\log d}$, where $c > c_1$ is sufficiently large for the described expansion to go through. Suppose the diameter of $G_i$ is larger than $\frac{4k}{\alpha}$, and take $x_0$ and $x_R$ such that the shortest path $x_0 x_1 \ldots x_R$ is longer than $\frac{4k}{\alpha}$ (such a path exists since $G_i$ is connected). Then we can use the procedure above to expand from vertices $x_0, x_{3k}, x_{6k}, \ldots$ in k steps to get $\frac{4}{3\alpha}$ disjoint (by the choice of the path) neighbourhoods, each of order $\frac{\alpha n}{1+\varepsilon}$, which is a contradiction. Thus applying Lemma 3.10 to subsets $V_1$ and $V_2$ gives $rvc(G) \le \frac{9c \log n}{\alpha \log d}$, as required.

Remark 1. The constants γ = 0.11 and ε = 0.02 are chosen so that Theorem 3.4 holds for d ≥ 28. If we are only interested in large values of d, we can set γ arbitrarily close to 0.5 and, say, ε = 0.25.

concluding remarks

In this chapter we proposed a simple approach to studying rainbow connectivity and rainbow vertex connectivity in graphs. Using it we gave a unified proof of several known results, as well as of some new ones. Two obvious interesting questions which remain open are to show that rainbow edge connectivity and rainbow vertex connectivity of random 3-regular graphs on n vertices are logarithmic in n.

4 ZERO FORCING NUMBER

The zero forcing process on a graph G is defined as follows. Initially, there is a subset S of black vertices, while all other vertices are said to be white. At each time step, a black vertex with exactly one white neighbour will force its white neighbour to become black. The set S is said to be a zero forcing set if, by iteratively applying the forcing step, all of V becomes black. The zero forcing number of G is the minimum cardinality of a zero forcing set in G, denoted by Z(G). Note that given an initial set of black vertices, the set of black vertices obtained by applying the forcing rule until no more changes are possible is unique. We will often use the adjective ‘forcing’ instead of ‘zero forcing’.

The forcing process is an instance of a propagation process on graphs (in particular, it is a cellular automaton). Such processes are a common topic across mathematics and computer science (see, e.g., [16, 49, 76, 108]). In other fields (statistical mechanics [44], physics [14], social network analysis [92]), diverse graph processes are used to model technical or societal processes. For an overview of the different models and applications, refer to the book [17].

The zero forcing process was proposed in [37] and used in [35] as a criterion for quantum controllability of a system. Independently, the zero forcing number was introduced in [2] as a bound for the minimum rank, or equivalently, the maximum nullity of a graph G. Given an n-vertex graph G, let M(G) denote the maximum nullity over all symmetric real-valued matrices A whose zero-nonzero pattern of the off-diagonal entries is described by the graph G. This means that for i ≠ j, the entry $A_{ij}$ is non-zero if and only if ij is an edge in G, whereas the diagonal entries are chosen freely. The minimum rank of G is n − M(G).
This parameter has been extensively studied in the last fifteen years, largely due to its connection to inverse eigenvalue problems for graphs, singular graphs, biclique partitions and other problems. Among several tools introduced to study the minimum rank, the zero forcing number has the advantage that its definition is purely combinatorial.

In [2], it was shown that Z(G) ≥ M(G) for all graphs G. We include the proof since it conveys the intuition behind the zero forcing process. Suppose that A is a matrix whose off-diagonal zero pattern is described by G, $S_0$ is a zero forcing set in G and $S_{i+1}$ is the set obtained by applying the forcing rule to $S_i$ for i ≥ 0. If the cardinality $|S_0|$ is smaller than the nullity of A, we can construct a vector x ≠ 0 such that Ax = 0 and $x|_{S_0} = 0$. But then consider a vertex $u' \in S_1 \setminus S_0$. By the forcing rule, there is a vertex $u \in S_0$ which forces $u'$, and which has no other neighbours outside $S_0$. The component of Ax corresponding to u is $\sum_{v \sim u} A_{uv} x_v$ (the diagonal term $A_{uu} x_u$ vanishes because $u \in S_0$). Since $x|_{S_0} = 0$, the only term which potentially does not vanish is $A_{uu'} x_{u'}$. Hence $x_{u'} = 0$, and more generally, $x|_{S_1} = 0$. Applying the same reasoning, we can iteratively deduce that $x|_{S_i} = 0$ for all i. As $S_0$ is zero forcing, we eventually get x = 0, contradicting the choice of x. We conclude that any zero forcing set has order at least M(G).

The minimum rank and forcing number of some specific families of graphs have also been computed in [2]. As a simple example, the complete graph $K_n$ on n vertices has $Z(K_n) = M(K_n) = n - 1$, whereas the n-vertex path $P_n$ has $Z(P_n) = M(P_n) = 1$. More results on this topic can be found in [1] and [74].

Recently, there has been a lot of interest in studying the forcing number of graphs for its own sake, and its relation to other graph parameters, such as the path cover number [86], connected domination number [13], and the chromatic number [142]. Among others, [41] and [87] contain upper bounds on the zero forcing number of a graph in terms of its degrees. It is easy to see that a trivial lower bound on the zero forcing number of a graph is Z(G) ≥ δ, where δ is the minimum degree of G. This bound cannot be improved without additional assumptions. Indeed, a graph $G_\delta$ which consists of r cliques $K_1, K_2, \ldots, K_r$ of order δ and a matching between each $K_i$ and $K_{i+1}$ has minimum degree δ and $Z(G_\delta) = \delta$. However, when G has girth g ≥ 3 and minimum degree δ ≥ 2, Davila, Kalinowski and Stephen [50] showed that Z(G) ≥ δ + (δ − 2)(g − 3), confirming the earlier conjecture from [51].
The girth is the length of the shortest cycle in a graph. Our first result substantially improves on this bound, with the exception of very small values of δ.
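The forcing rule defined at the start of this chapter is easy to simulate on small graphs. The following is a minimal sketch, not taken from the thesis: it assumes a graph is given as a dictionary mapping each vertex to its set of neighbours, and the helper names (`forcing_closure`, `zero_forcing_number`, `complete_graph`, `path_graph`) are hypothetical. The brute-force search is exponential in the number of vertices, so it only serves to check tiny examples such as $K_n$ and $P_n$.

```python
from itertools import combinations


def forcing_closure(adj, black):
    """Apply the forcing rule until no more changes are possible."""
    black = set(black)
    changed = True
    while changed:
        changed = False
        for u in list(black):
            white = [v for v in adj[u] if v not in black]
            # A black vertex with exactly one white neighbour forces it.
            if len(white) == 1:
                black.add(white[0])
                changed = True
    return black


def zero_forcing_number(adj):
    """Brute-force Z(G): try vertex subsets in order of increasing size."""
    vertices = list(adj)
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            if len(forcing_closure(adj, subset)) == len(vertices):
                return size


def complete_graph(n):
    """Adjacency dictionary of K_n on vertices 0, ..., n-1 (illustrative helper)."""
    return {u: {v for v in range(n) if v != u} for u in range(n)}


def path_graph(n):
    """Adjacency dictionary of the path P_n on vertices 0, ..., n-1 (illustrative helper)."""
    return {u: {v for v in (u - 1, u + 1) if 0 <= v < n} for u in range(n)}
```

For instance, `zero_forcing_number(complete_graph(5))` and `zero_forcing_number(path_graph(6))` can be compared with the values $Z(K_n) = n - 1$ and $Z(P_n) = 1$ quoted above.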

Theorem 4.1. Let G be a graph of girth g with minimum degree δ.

(i) If g = 2k + 1 for k ∈ N, then $Z(G) \ge e^{-1}\left(\frac{\delta^k}{k+1} - \delta^{k-1}\right)$.

(ii) If g = 2k + 2 for k ∈ N, then $Z(G) \ge 2e^{-1}\left(\frac{\delta^k}{k+1} - \delta^{k-1}\right)$.

The crucial ingredient of the proof is an upper bound on the density of a graph which contains no cycles $C_3, C_4, \ldots, C_{g-1}$. This is an instance of the so-called Turán problem (see, e.g., [85]). Given a graph H, we define the Turán number ex(n, H) to be the maximum number of edges e(G) over all the n-vertex graphs G not containing a subgraph isomorphic to H. In general, if a graph G does not contain H as a subgraph, we refer to it as H-free. The Turán numbers of graphs have been extensively studied, and the asymptotic value of ex(n, H) is known for all non-bipartite graphs H as a consequence of the Erdős-Stone-Simonovits Theorem [71]. Denote the complete bipartite graph with vertex classes of order a and b by $K_{a,b}$. A celebrated theorem of Kövari, Sós and Turán [112] says that for a ≤ b, $ex(n, K_{a,b}) = O(n^{2-1/a})$. This implies that for every bipartite graph H, there exists c = c(H) < 1 such that $ex(n, H) = O(n^{1+c})$. Using our approach based on the Turán numbers, we can extend Theorem 4.1 to H-free graphs G, improving the trivial bound Z(G) ≥ δ − 1 by a power of δ whenever H is bipartite.

Theorem 4.2. Fix a graph H, and let c = c(H), β = β(H) and η = η(H) be constants such that $ex(n, H) \le \beta n^{1+c}$ whenever n ≥ η. If G is an H-free graph of minimum degree δ ≥ 2η, then

\[ Z(G) \ge 2^{-1-2/c} \left(\frac{\delta}{\beta}\right)^{1/c}. \]

We remark that in this theorem it is not assumed that H is bipartite, and since $ex(n, H) \le \binom{n}{2}$ for all n and H, it is always possible to choose constants c, β and η with the required property. But only if H is bipartite can we take c < 1, and thus obtain a lower bound on Z(G) which improves on the trivial bound.

The authors of [2] report that somewhat surprisingly, M(G) = Z(G) for many graphs for which M(G) was known. Our next theorem shows that for most graphs, M(G) and Z(G) are actually far apart. We consider the random graph model G(n, p). This is an n-vertex graph in which every pair of vertices is adjacent randomly and independently with probability p. With an abuse of notation, we write G(n, p) for the sampled graph, as well as the underlying probability space. The model $G_{n,\frac12}$ is particularly interesting since it assigns the same probability to all the $2^{\binom{n}{2}}$ graphs, thus allowing us to make statements about a typical graph. Hall et al. [98] have shown that with high probability, the maximum nullity of a random graph $G_{n,\frac12}$ lies between 0.49n and 0.86n. On the other hand, we will show that the zero forcing number of a typical graph is almost as high as n.

Theorem 4.3. Let p = p(n) satisfy $\frac{1}{n} \ll p \le 2/3$. With high probability

\[ Z(G(n, p)) = n - \left(2 + \sqrt{2} + o(1)\right) \cdot \frac{\log(np)}{-\log(1-p)}. \]

In particular, for p = 1/2 we have $Z\left(G_{n,\frac12}\right) = n - \left(2 + \sqrt{2} + o(1)\right) \log_2 n$, whereas for p = o(1) the formula simplifies to

\[ Z(G(n, p)) = n - \left(2 + \sqrt{2} + o(1)\right) p^{-1} \log(np). \]
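To get a feeling for the size of the correction term in Theorem 4.3, one can evaluate its leading part $(2 + \sqrt{2}) \log(np) / (-\log(1-p))$ numerically. The snippet below is only an illustrative sketch: the function name `witness_order_leading_term` is hypothetical, the o(1) term is simply dropped, and since the theorem is asymptotic the returned value is indicative rather than a bound for any fixed n.

```python
import math


def witness_order_leading_term(n, p):
    """Leading term (2 + sqrt 2) * log(np) / (-log(1 - p)) of n - Z(G(n, p)).

    The o(1) correction from Theorem 4.3 is ignored (an assumption of this sketch).
    """
    return (2 + math.sqrt(2)) * math.log(n * p) / (-math.log(1 - p))
```

For p = 1/2 the ratio of logarithms equals $\log_2(n/2) = \log_2 n - 1$, in line with the specialisation to $G_{n,\frac12}$ stated in the theorem.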

There is a natural trend in probabilistic combinatorics to explore the possible extensions of results about random graphs to the pseudorandom setting. A graph is pseudorandom if its edge distribution resembles the one of G(n, p). There are several formal approaches to pseudorandomness. Here, we will use the one based on the spectral properties of the graph. The adjacency matrix of a graph G = (V, E) with vertex set V = [n] is an n × n matrix whose entry aij is 1 if {i, j} ∈ E, and 0 otherwise. The eigenvalues of a graph G are the eigenvalues of its adjacency matrix. An (n, d, λ) graph is a d-regular n-vertex graph in which all eigenvalues but the largest one are at most λ in absolute value. If G is an (n, d, λ) graph, its largest eigenvalue is λ1 = d, and the difference d − λ is called the spectral gap. It is well known (see, e.g., [118]) that the larger this gap is, the more closely the edge distribution of a regular graph G approaches that of the random graph with the corresponding edge density. We prove a theorem which provides spectral bounds on the zero forcing number of a graph. The lower bound is given in terms of the smallest eigenvalue of G, akin to the celebrated result of Hoffman on the independence number [101]. Note that λmin is negative, and not necessarily minimal in absolute value as its sign is taken into account. The previously defined parameter λ is used for the upper bound.

Theorem 4.4. Let G be an (n, d, λ)-graph with smallest eigenvalue $\lambda_{\min}$. Then

(i) $Z(G) \ge n\left(1 + \frac{2\lambda_{\min}}{d - \lambda_{\min}}\right)$, and

(ii) $Z(G) \le n\left(1 - \frac{1}{2(d-\lambda)} \log\left(\frac{d-\lambda}{2\lambda+1}\right)\right)$.

The bound on n − Z(G) implied by inequality (i) is tight, whereas the one implied by (ii) is tight up to a constant factor.
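As a quick sanity check of the lower bound $Z(G) \ge n\left(1 + \frac{2\lambda_{\min}}{d - \lambda_{\min}}\right)$, consider the complete graph $K_n$. The following worked instance is an illustration added here, not part of the original argument; it uses only the standard fact that the adjacency matrix of $K_n$ has eigenvalues n − 1 (once) and −1 (with multiplicity n − 1).

```latex
% K_n is (n-1)-regular with \lambda_{\min} = -1, so inequality (i) reads
\[
  Z(K_n) \;\ge\; n\left(1 + \frac{2\cdot(-1)}{(n-1) - (-1)}\right)
         \;=\; n\left(1 - \frac{2}{n}\right) \;=\; n - 2 ,
\]
% which is consistent with the exact value Z(K_n) = n - 1.
```

In this example the spectral bound is within one of the true value.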

The rest of the chapter is organised as follows. Section 4.1 contains results on H-free graphs with a forbidden bipartite graph H. In Section 4.2, we asymptotically determine the zero forcing number of G(n, p). Section 4.3 contains bounds based on spectral properties of a graph.

4.1 graphs with forbidden subgraphs

In this section, we bound the forcing number of graphs with a forbidden bipartite subgraph. We start by restating and proving Theorem 4.1. Namely, if G is a graph with girth g and minimum degree δ, then

(i) $Z(G) \ge e^{-1}\left(\frac{\delta^k}{k+1} - \delta^{k-1}\right)$ for g = 2k + 1, k ∈ N, and

(ii) $Z(G) \ge 2e^{-1}\left(\frac{\delta^k}{k+1} - \delta^{k-1}\right)$ for g = 2k + 2, k ∈ N.

Proof of Theorem 4.1. We will use a slightly weakened version of a result due to Alon, Hoory and Linial [8]: a graph $G_1$ of girth g and average degree d satisfies

\[ |V(G_1)| \ge \begin{cases} (d-1)^k, & g = 2k+1, \\ 2(d-1)^k, & g = 2k+2. \end{cases} \tag{4.1} \]

Let G be an n-vertex graph with girth g = 2k + 1 and minimum degree δ. The proof for the case of even girth is the same. Let S be a zero forcing set in G of order s. It suffices to show that

\[ (k+1)s \ge \left(\frac{k\delta}{k+1} - 1\right)^k. \tag{4.2} \]

Indeed, using the inequalities $(k/(k+1))^k \ge e^{-1}$ and $(1-\alpha)^k \ge 1 - k\alpha$ for α < 1, k ∈ N, the bound (4.2) implies that

\[ (k+1)s \ge \left(\frac{k\delta}{k+1} - 1\right)^k = \delta^k \left(\frac{k}{k+1}\right)^k \left(1 - \frac{k+1}{k\delta}\right)^k \ge \delta^k e^{-1}\left(1 - \frac{k(k+1)}{k\delta}\right) = e^{-1}\left(\delta^k - (k+1)\delta^{k-1}\right), \]

which is the required result.

The proof of (4.2) splits into two cases. Firstly, if s > n/(k + 1), we apply (4.1) to the entire graph G and obtain $(k+1)s > n \ge (\delta - 1)^k > \left(k\delta/(k+1) - 1\right)^k$, so we are done. Hence we assume that s ≤ n/(k + 1). Starting from a set S of black vertices, we run an implementation of the zero forcing process, forcing the vertices of V(G) \ S one by one to become black in an arbitrary order. To be more precise, in each step, we choose one specific black vertex u which has a unique white neighbour v. We say that u forces v and as a result, v becomes black. The process is stopped once we have reached a set of black vertices T with |T| = (k + 1)s. Let U be the set containing all vertices u which forced some vertex of T \ S during this implementation of the process. Then, since each vertex can force only one of its neighbours, |U| ≥ ks. Moreover, by the forcing rule, all the edges with an endpoint in U lie inside T. Denoting the number of edges with both endpoints in T by e(T), we have e(T) ≥ |U|δ/2 ≥ k|S|δ/2. In other words, recalling that |T| = (k + 1)s, the graph G[T] has average degree at least kδ/(k + 1). Applying (4.1) to the graph G[T] gives precisely inequality (4.2), which completes the proof.
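Inequality (4.2) can be compared numerically with the bound δ + (δ − 2)(g − 3) of [50]. The sketch below is an illustration only, with hypothetical function names; it uses exact rational arithmetic and evaluates the bound $Z(G) \ge \frac{1}{k+1}\left(\frac{k\delta}{k+1} - 1\right)^k$ obtained by dividing (4.2) by k + 1, for odd girth g = 2k + 1.

```python
from fractions import Fraction


def girth_bound(delta, k):
    """Lower bound on Z(G) implied by (4.2): ((k*delta/(k+1) - 1)^k) / (k+1)."""
    return (Fraction(k * delta, k + 1) - 1) ** k / (k + 1)


def linear_girth_bound(delta, g):
    """The bound delta + (delta - 2)(g - 3) of Davila, Kalinowski and Stephen."""
    return delta + (delta - 2) * (g - 3)


def first_delta_beating_linear_bound(k=2, limit=1000):
    """Smallest delta >= 2 (below `limit`) at which (4.2) is at least the linear bound."""
    g = 2 * k + 1
    for delta in range(2, limit):
        if girth_bound(delta, k) >= linear_girth_bound(delta, g):
            return delta
    return None
```

For girth five (k = 2), calling `first_delta_beating_linear_bound()` locates the minimum degree from which the quadratic bound takes over from the linear one.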

It is worth mentioning that already for rather small values of δ, our lower bound exceeds the value δ + (δ − 2)(g − 3) conjectured in [51]. Even for girth five, by taking k = 2, we obtain $Z(G) \ge \frac{1}{3}\left(2\delta/3 - 1\right)^2$, which implies the conjecture of Davila and Kenter for δ ≥ 22.

The previous approach will now be used to establish a bound which applies to H-free graphs for any bipartite graph H. We do not try to optimise the constants in this proof. In particular, when the result is applied to a specific graph H, an improvement can be reached by adjusting how ‘long’ we run the zero forcing process. The proof of Theorem 4.1 is an example of this optimisation. The condition ‘d ≥ η’ is just a minor technicality which allows us to rearrange any statement on ex(n, H) into a statement about n, as we do in the first paragraph of the proof.

Proof of Theorem 4.2. It will be useful to rearrange our hypothesis on the Turán number of H, which is that for n ≥ η, $ex(n, H) \le \beta n^{1+c}$, where η, β and c are constants depending on H. Suppose that $G_1$ is an H-free graph with average degree d ≥ η. In particular, $G_1$ has at least η vertices. Denoting $n_1 = |V(G_1)|$, the hypothesis gives $n_1 d/2 = e(G_1) \le \beta n_1^{1+c}$, hence $n_1 \ge (d/(2\beta))^{1/c}$. The proof reduces to the following claim, which we state formally because we will use it to prove Corollary 4.5.

Claim 7. Suppose that any H-free graph $G_1$ with average degree d ≥ η satisfies

\[ |V(G_1)| \ge \left(\frac{d}{2\beta}\right)^{1/c}. \tag{4.3} \]

Let G be an H-free graph with minimum degree δ ≥ 2η. Then $Z(G) \ge \frac{1}{2}\left(\frac{\delta}{4\beta}\right)^{1/c}$.

To see this, let S be a zero forcing set in G of order s. Assume that s < n/2, since otherwise we can apply (4.3) to the entire graph G to get

\[ s \ge \frac{n}{2} \ge \frac{1}{2}\left(\frac{\delta}{2\beta}\right)^{1/c} \ge \frac{1}{2}\left(\frac{\delta}{4\beta}\right)^{1/c}. \]

As in the previous proof, starting from S, we run the zero forcing process vertex by vertex until we have reached a set of black vertices T with |T| = 2s. Let U be the set containing all vertices u which forced some vertex of T \ S during our process. Since each vertex can force only one of its neighbours, |U| = s. Moreover, all the edges with an endpoint in U lie inside T. Hence e(T) ≥ |U|δ/2 = sδ/2, and we conclude that the average degree in G[T] is at least 2(sδ/2)/|T| = δ/2. Now we can apply (4.3) to G[T], and obtain

\[ |T| = 2s \ge \left(\frac{\delta}{4\beta}\right)^{1/c}, \]

as required. This proves Claim 7, and Theorem 4.2 follows since $\frac{1}{2}\left(\frac{\delta}{4\beta}\right)^{1/c} = 2^{-1-2/c}\left(\frac{\delta}{\beta}\right)^{1/c}$.

Recall that Ka,b denotes the complete bipartite graph with parts of order a and b. Using the well-known result of [112], we give explicit bounds for Ka,b-free graphs.

Corollary 4.5. Let G be a $K_{a,b}$-free graph with minimum degree δ ≥ 4a − 4. Then

\[ Z(G) \ge \frac{1}{2}\left(\frac{\delta}{4(b-1)^{1/a}}\right)^{a/(a-1)}. \]

Proof. We use a statement of the aforementioned Kövari-Sós-Turán theorem proved in [85, Theorem 2.22]. The original proof in [112] was for the case a = b, but it immediately implies that for an n-vertex $K_{a,b}$-free graph with average degree d,

\[ d \le (b-1)^{1/a} n^{1-1/a} + (a-1). \tag{4.4} \]

It follows that $H = K_{a,b}$ satisfies the hypothesis of Claim 7 with c = (a − 1)/a, $\beta = (b-1)^{1/a}$ and η = 2a − 2. Indeed, let $G_1$ be an n-vertex $K_{a,b}$-free graph with average degree d ≥ 2a − 2. This implies that the first summand in the right-hand side of (4.4) is at least a − 1, and therefore

\[ d \le (b-1)^{1/a} n^{1-1/a} + (a-1) \le 2(b-1)^{1/a} n^{1-1/a}. \]

Rearranging, we get

\[ n \ge \left(\frac{d}{2(b-1)^{1/a}}\right)^{a/(a-1)}. \]

Our bound on the zero forcing number of Ka,b-free graphs follows from Claim 7.

Since for b > (a − 1)! and a constant $c_a$, there are constructions of $K_{a,b}$-free graphs on only $c_a \delta^{a/(a-1)}$ vertices with minimum degree δ (see, e.g., [10]), the result of Corollary 4.5 is tight up to a constant factor.
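To make the statement of Corollary 4.5 concrete, here is the smallest case a = b = 2, where $K_{2,2}$ is the four-cycle $C_4$. This worked instance is an added illustration obtained by direct substitution into the corollary, namely $Z(G) \ge \frac{1}{2}\left(\delta / (4(b-1)^{1/a})\right)^{a/(a-1)}$ for δ ≥ 4a − 4.

```latex
% a = b = 2: (b-1)^{1/a} = 1 and a/(a-1) = 2, so for every C_4-free graph G
% with minimum degree \delta \ge 4,
\[
  Z(G) \;\ge\; \frac{1}{2}\left(\frac{\delta}{4}\right)^{2} \;=\; \frac{\delta^{2}}{32}.
\]
```

Thus forbidding four-cycles already forces the zero forcing number to grow quadratically in the minimum degree.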

4.2 the random graph

Here we prove that for $(\log^2 n)/\sqrt{n} \le p = o(1)$, with high probability

\[ Z(G(n, p)) = n - \left(2 + \sqrt{2} + o(1)\right) \cdot \frac{\log(np)}{p}. \]

For p = o(1), this establishes the bound of Theorem 4.3 as − log(1 − p) = (1 + o(1))p. Restricting to p = o(1) in this section keeps the calculations clearer. The case of constant p is easier (as we explain below) and can be proved similarly. The case of sparse graphs is slightly more difficult, and is treated at the end of the section. All the inequalities will hold for large enough n. In a graph G, we write u ∼ v if the vertices u and v are adjacent in G, and u ≁ v otherwise.

Our approach combines the first and the second moment method, and is somewhat similar to the argument used to determine the independence number of G(n, p) (see, e.g. [103, Chapter 7, Section 1]). However, the proof contains some delicate points, which we try to explain before giving the formal argument. We start this discussion by considering the special case, $G_{n,\frac12}$, in which the constant edge density allows for a simple proof.

4.2.1 Outline of the proof

The zero forcing number of the random graph is governed by the occurrence of a specific substructure called a witness. In a graph G on the vertex set V, a k-witness (or a witness of order k) is a pair of ordered vertex k-tuples $\left((s_i)_{i\in[k]}, (t_i)_{i\in[k]}\right)$ such that $s_i, t_i \in V$, $s_i \sim t_i$ for each i, and $s_i \nsim t_j$ for i < j. The definition requires $s_i \ne s_j$ and $t_i \ne t_j$ for i ≠ j, but it might happen that $s_i = t_j$ for some i > j. The adjacency matrix of a k-witness, where the rows and columns are indexed by $(s_i)$ and $(t_i)$ respectively, can be found in Figure 4.1a. For any pair of k-tuples $\left((s_i)_{i\in[k]}, (t_i)_{i\in[k]}\right)$, we define the set of superdiagonal pairs to be $\{(s_i, t_j) : i, j \in [k], i < j\}$, and the set of diagonal pairs to be $\{(s_i, t_i) : i \in [k]\}$.

It is easy to see that if G has a forcing set S of order at most n − k, then it contains a k-witness (Lemma 4.6). Therefore, our aim is to find the order of the largest witness in $G_{n,\frac12}$. The first subtlety arises in trying to use the first moment computation to guess the forcing number of $G_{n,\frac12}$. Let $R_k$ be the random variable counting the k-witnesses in $G_{n,\frac12}$. For fixed k-tuples $(s_i)$ and $(t_i)$, the probability that they form a witness is $2^{-\binom{k+1}{2}}$ since they determine $\binom{k+1}{2}$ superdiagonal and diagonal pairs. Using linearity of expectation, we can multiply this probability with the number of choices for $(s_i)$ and $(t_i)$ to obtain

\[ \mathbb{E}[R_k] \le \left(\frac{n!}{(n-k)!}\right)^2 \cdot 2^{-\binom{k+1}{2}} \le n^{2k} 2^{-k^2/2} = \left(n^2 2^{-k/2}\right)^k. \]

For $k \ge (4 + \varepsilon) \log_2 n$, it holds that $\mathbb{E}[R_k] = o(1)$, so Markov’s inequality implies that with high probability $G_{n,\frac12}$ contains no such k-witness. On the other hand, if $k \le (4 - \varepsilon) \log_2 n$ the expected number of k-witnesses tends to infinity with n, so it is reasonable to take $(4 + o(1)) \log_2 n$ as the first guess for the order of the largest witness. For example, in finding the independence number of G(n, p), the first moment computation gives the correct value throughout the range of p. However, in our case, the actual asymptotic order of the largest witness, $k_c := \left(2 + \sqrt{2}\right) \log_2 n$, is smaller than $4 \log_2 n$.

The reason for this is that a substructure of a k-witness $\left((s_i)_{i\in[k]}, (t_i)_{i\in[k]}\right)$ obtained by discarding a final segment of $(s_i)$ and an initial segment of $(t_i)$ has a lower expected number of copies inside $G_{n,\frac12}$ than the witness itself, and therefore gives a stronger bound on k. Indeed, the discarded vertices contribute a factor of order n each to the number of choices for a potential k-witness, but pose few restrictions on the adjacencies. The correct asymptotic value $k_c$ is obtained by taking $\ell = \ell(k) = \sqrt{2}k/2$, and counting substructures called k-subwitnesses. The adjacency matrix of a subwitness is depicted in Figure 4.1b. For integers a ≤ b, define the interval [a .. b] = {a, a + 1, . . . , b}. A k-subwitness in G is a pair of ℓ-tuples $\left((s'_i)_{i\in[\ell]}, (t'_i)_{i\in[k-\ell+1\,..\,k]}\right)$ such that $s'_i, t'_j \in V$, $s'_i \sim t'_i$ for each $i \in [k-\ell+1\,..\,\ell]$, and $s'_i \nsim t'_j$ for 1 ≤ i < j ≤ k. Clearly, if $\left((s_i)_{i\in[k]}, (t_i)_{i\in[k]}\right)$ is a witness in G, then the restrictions $(s_1, \ldots, s_\ell)$ and $(t_{k-\ell+1}, \ldots, t_k)$ form a subwitness.

Another piece of intuition on why we can discard those segments of $(s_i)$ and $(t_i)$ is that if $k = (1-\varepsilon)k_c$, the number of discarded columns $t_1, \ldots, t_{k-\ell}$ is $q = (1-\varepsilon) \log_2 n$. Therefore, given a $(1-\varepsilon)k_c$-subwitness in $G_{n,\frac12}$, we can extend it to a $(1-\varepsilon)k_c$-witness with high probability. Indeed, to find the remaining columns $t_1, \ldots, t_q$, we only need to consider their adjacencies with $s_1, \ldots, s_q$. But a short computation shows that for $q = (1-\varepsilon) \log_2 n$, $G_{n,\frac12}$ has the following extension property with high probability. For any $s_1, s_2, \ldots, s_q$, there will be at least $n^{\varepsilon/2}$ vertices in $G_{n,\frac12}$ satisfying any given adjacency restriction with $s_1, \ldots, s_q$. This extension property of $G_{n,\frac12}$ allows us to find the missing columns $t_1, \ldots, t_q$, as well as the rows $s_{\ell+1}, \ldots, s_{\ell+q}$.

The argument of Lemma 4.7 for p = 1/2 would amount to counting k-subwitnesses for $k = (1+\varepsilon)k_c = (1+\varepsilon)\left(2 + \sqrt{2}\right) \log_2 n$, and showing that with high probability, $G_{n,\frac12}$ contains no such k-subwitness. This intuition gives the correct answer for $G_{n,\frac12}$, but when p = o(1) the computation of the expected number of k-subwitnesses contains another caveat. In a subwitness, $(s'_i)_{i\in[\ell]}$ and $(t'_i)_{i\in[k-\ell+1\,..\,k]}$ are orderings of the corresponding vertex sets. In the first moment computation, the ordering contributes a factor of $(\ell!)^2 = k^{2\ell+o(k)}$.
This factor was negligible when p = 1/2 and $k_c = \Theta(\log n)$, but for p = o(1), G(n, p) contains witnesses whose order is polynomial in n, so we need to be more careful.

Next, we explain how to shave off a factor of $k^{k+o(k)}$. A subwitness is modified so that the adjacency matrix is invariant under reordering large subsets of the vertices, at the price of discarding a small number of its zero entries. Figure 4.1c illustrates this trade-off. In a graph G, we define a loose k-subwitness (or just loose subwitness) to be a substructure labelled by sets $S_0, S_1, \ldots, S_m, T_0, T_1, \ldots, T_m \subseteq V$ and bijections $f_i : S_i \to T_i$ for i = 1, . . . , m, where $\ell = \frac{\sqrt{2}}{2}k$, $m = (2\ell - k)p = \left(\sqrt{2} - 1\right)kp$ and the following conditions are satisfied.

Figure 4.1: Adjacency matrices of (a) a k-witness, (b) a k-subwitness and (c) a loose k-subwitness. The regions which are required to contain only zeros are shaded. In panel (c), the stars mark the entries which are superdiagonal in this particular ordering of the vertices, but not required to contain zeros.

(i) The sets $S_i$ are pairwise disjoint for i = 0, . . . , m, as well as the sets $T_i$. Denoting $S = \bigcup_{i=0}^m S_i$ and $T = \bigcup_{i=0}^m T_i$, we have |S| = |T| = ℓ and $|S_0| = |T_0| = k - \ell = \left(1 - \sqrt{2}/2\right)k$. The sets $S \setminus S_0$ and $T \setminus T_0$ are partitioned equitably into $S_i$ and $T_i$, that is, $0 \le |S_i| - |S_j| \le 1$ for 1 ≤ i < j ≤ m. Therefore, $1/p - 1 \le |S_i| = |T_i| \le 1/p + 1$ and the orders of $S_i$ are non-increasing for i ∈ [m].

(ii) For the edges between S and T, we require $E(T_0, S) = \emptyset$, and $E(S_i, T_j) = \emptyset$ whenever 0 ≤ i < j ≤ m. For every i ∈ [m] and $v \in S_i$, there is an edge in G between v and $f_i(v)$. In other words, for every i ∈ [m], the bijection $f_i$ determines a matching between $S_i$ and $T_i$.

The key point is that any graph that contains a k-witness also contains a loose k-subwitness, which we use to get an improved bound. Since the sets $S_i$ and $T_i$ are not ordered, but only paired by the bijections $f_i$ for i ∈ [m], we gain a factor of $k^{-k+o(k)}$ in the first moment computation. On the other hand, for i ∈ [m], we do not care about the adjacency relation between $S_i$ and $T_i$ apart from the diagonal vertex pairs, but that costs us a much less significant factor $(1-p)^{|S_i|^2 m} = (1-p)^{(2\ell-k)/p}$. In words, the definition of a loose subwitness allows a large number of vertex permutations. However, since those permutations preserve most of the superdiagonal pairs, the adjacency matrix of a loose subwitness still contains most of the zeros which were previously required.

Estimating the expected number of loose subwitnesses in G(n, p), we can match the bound obtained from the second moment method (Lemma 4.8). This computation requires some additional understanding of how two witnesses can interact, but the ideas are explicit in the argument.
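The heuristic for $p = 1/2$ described in this outline can be made explicit with a short first moment estimate for k-subwitnesses. The computation below is a sketch added for illustration, not the formal argument: it writes $r = 2\ell - k$, absorbs the contribution of the diagonal pairs and all lower-order terms into an $O(k)$ error in the exponent, and uses the fact that a subwitness with $\ell$ rows and $\ell$ columns prescribes roughly $\ell^2 - r^2/2$ zero entries.

```latex
% At most n^{2\ell} choices of vertices; each prescribed entry costs a factor 1/2.
\[
  \mathbb{E}[\#\,k\text{-subwitnesses}]
  \;\le\; n^{2\ell}\, 2^{-(\ell^2 - r^2/2) + O(k)} .
\]
% With \ell = k/\sqrt{2} we have r = (\sqrt{2}-1)k and \ell^2 - r^2/2 = (\sqrt{2}-1)k^2, so
\[
  \log_2 \mathbb{E}[\#\,k\text{-subwitnesses}]
  \;\le\; \sqrt{2}\,k\log_2 n - (\sqrt{2}-1)k^2 + O(k)
  \;=\; -(\sqrt{2}-1)\,k\left(k - (2+\sqrt{2})\log_2 n\right) + O(k),
\]
% which tends to -\infty as soon as k \ge (1+\varepsilon)(2+\sqrt{2})\log_2 n.
```

The threshold $\sqrt{2}/(\sqrt{2}-1) = 2 + \sqrt{2}$ obtained this way is the constant appearing in $k_c$.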

4.2.2 Random graphs of moderate density

We are now ready to present the formal proof. The second moment method used in Lemma 4.8 does not give a sufficiently strong bound for the sparse case, $p < \frac{\log^2 n}{\sqrt{n}}$, so we cover this case with an ingenious argument of Frieze [79] based on Talagrand’s inequality in Section 4.2.3. Lemma 4.7, the lower bound on Z(G), holds for the entire range of interest.

We first establish the relationship between the zero forcing number of a graph and the occurrence of a witness. We use the shortened notation $s = (s_i)_{i\in[k]}$, and denote the image of this k-tuple by $s[k] = \{s_i : i \in [k]\}$.

Lemma 4.6. Let G be an n-vertex graph and k ∈ N. If Z(G) ≤ n − k, then G contains a k-witness. Moreover, if G contains a k-witness (s, t) with s[k] ∩ t[k] = ∅, then Z(G) ≤ n − k.

Proof. Assume that Z(G) ≤ n − k, that is, G has a forcing set S with n − |S| ≥ k. Index the vertices of V(G) \ S according to the order in which they were forced, so $t_1$ is the first forced vertex, $t_2$ the second, and so on, up to $t_k$. For 1 ≤ i ≤ k, let $s_i$ be a vertex which forced $t_i$. Then by the definition of a zero forcing set, $\left((s_i)_{i\in[k]}, (t_i)_{i\in[k]}\right)$ is a witness. Conversely, let (s, t) be a k-witness with s[k] ∩ t[k] = ∅. Then V(G) \ t[k] is a forcing set in G, since the vertices $t_1, \ldots, t_k$ can be forced by the vertices $s_1, \ldots, s_k$ respectively.

We now formalise the ideas outlined in the previous discussion.
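The first direction of Lemma 4.6 is constructive, and can be illustrated on small graphs. The sketch below is an added example with hypothetical helper names: it assumes the graph is a dictionary mapping integer vertices to neighbour sets, runs the forcing process one vertex at a time from a given black set, records each forcing vertex $s_i$ and forced vertex $t_i$, and then checks the witness conditions ($s_i \sim t_i$, and $s_i \nsim t_j$ for i < j).

```python
def witness_from_forcing_set(adj, forcing_set):
    """Run the forcing process one step at a time; return the tuples (s, t)."""
    black = set(forcing_set)
    s, t = [], []
    progress = True
    while progress:
        progress = False
        for u in sorted(black):
            white = [v for v in adj[u] if v not in black]
            if len(white) == 1:
                # u forces its unique white neighbour.
                s.append(u)
                t.append(white[0])
                black.add(white[0])
                progress = True
                break
    return s, t


def is_witness(adj, s, t):
    """Check the definition of a k-witness for the ordered tuples s and t."""
    k = len(s)
    if len(t) != k or len(set(s)) != k or len(set(t)) != k:
        return False
    diagonal_ok = all(t[i] in adj[s[i]] for i in range(k))
    superdiagonal_ok = all(
        t[j] not in adj[s[i]] for i in range(k) for j in range(i + 1, k)
    )
    return diagonal_ok and superdiagonal_ok
```

On the five-cycle with forcing set {0, 1}, for instance, the recorded tuples have length 3 = n − |S|, and a vertex that was forced early may reappear later as a forcing vertex, matching the remark that $s_i = t_j$ can occur for i > j.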

Lemma 4.7. Let C/n < p(n) < 1 for a large constant C, and define

\[ k = \left(2 + \sqrt{2}\right) p^{-1} \left(\log(np) + \log\log(np)\right). \]

With high probability, G(n, p) contains no k-witness, and therefore Z (G(n, p)) ≥ n − k.

Proof. For a graph G on the vertex set V, a k-witness and a loose k-subwitness have been defined in the previous section. We define k as in the statement, $\ell = \frac{\sqrt{2}}{2}k$ and

\[ r = 2\ell - k = \frac{\sqrt{2} + o(1)}{p} \log(np), \qquad m = pr = \left(\sqrt{2} + o(1)\right) \log(np). \]

The crucial fact is that if G contains a witness $\left((s_i)_{i\in[k]}, (t_i)_{i\in[k]}\right)$, then a loose subwitness can be found as follows. $S_0$ consists of the first k − ℓ rows of the witness, $S_0 := \{s_1, \ldots, s_{k-\ell}\}$, and $T_0$ of the last k − ℓ columns, $T_0 := \{t_{\ell+1}, \ldots, t_k\}$. The sets $S_1, \ldots, S_m$ are constructed by ordering the vertices $s_{k-\ell+1}, s_{k-\ell+2}, \ldots, s_\ell$ and partitioning them into m equitable intervals. Naturally, $T_1, \ldots, T_m$ are the corresponding columns, $T_j = \{t_i : s_i \in S_j\}$, and the bijections $f_j$ map $s_i$ to $t_i$ for $i \in [k-\ell+1\,..\,\ell]$.

Let $Y_k$ denote the number of loose k-subwitnesses in G(n, p). Our aim is to show $\mathbb{E}[Y_k] \to 0$. Fix the sets $S_j$, $T_j$ and bijections $f_j$ which satisfy (i). In particular, $|S_1 \cup \cdots \cup S_m| = |T_1 \cup \cdots \cup T_m| = 2\ell - k = r$ and $|S_i| = |T_i| \in \left[\frac{1}{p} - 1, \frac{1}{p} + 1\right]$ for i ∈ [m]. For $S_j$, $T_j$ and $f_j$ to span a loose subwitness, i.e. for (ii) to be satisfied, we require $\ell^2 - r^2 + \binom{r}{2} - m \cdot \binom{r/m}{2}$ pairs to be non-edges in G(n, p). The first summand, $\ell^2 - r^2 = 2(k-\ell)\ell - (k-\ell)^2$, comes from $E(S_0, T) = E(T_0, S) = \emptyset$, and the second from $E(S_i, T_j) = \emptyset$ for i < j. The $m \cdot \binom{r/m}{2}$ pairs have been subtracted since we do not impose any restrictions on $E(S_i, T_i)$, and they will turn out to be negligible. Moreover, (ii) requires r diagonal edges to be present in G(n, p), so the probability that our fixed $S_j$, $T_j$ and bijections $f_j$ satisfy (ii) is at most

\[ p^r (1-p)^{\ell^2 - r^2/2 - r^2/(2m)}. \]

Now we will take the union bound over all the potential loose subwitnesses. There are at most $\binom{n}{\ell - r}^2$ choices for $S_0$, $T_0$, and $\binom{n}{r/m}^m$ choices for the sets $S_j$, j ∈ [m]. Each vertex $v \in S_j$ gets assigned a vertex $f_j(v)$, which can be done in at most $n^r$ ways. This assignment also determines the sets $T_j = f_j(S_j)$ for j ∈ [m], and we get

\[ \mathbb{E}[Y_k] \le \binom{n}{\ell - r}^2 \binom{n}{r/m}^m n^r p^r (1-p)^{\ell^2 - r^2/2 - r^2/(2m)} \le \left(\frac{en}{\ell - r}\right)^{2\ell - 2r} \left(\frac{enm}{r}\right)^r n^r p^r e^{-p\left(\ell^2 - r^2/2 - r^2/(2m)\right)}. \]

We use the inequalities ℓ − r ≥ k/4 and r ≥ k/4 to deduce

\[ \mathbb{E}[Y_k] \le \left(\frac{4en}{k}\right)^{2\ell - 2r} \left(\frac{4en}{k}\right)^r m^r n^r p^r e^{-p\left(\ell^2 - r^2/2 - r^2/(2m)\right)} \le C_1^k (np)^{2\ell} (kp)^{-2\ell + r} m^r e^{-p\left(\ell^2 - r^2/2\right)}, \]

where $C_1 = 4e^2$. For the second inequality, we used m = pr, so that $e^{(pr^2)/(2m)} = e^{r/2} \le e^k$. Finally, substituting $\ell = \sqrt{2}k/2$, $\ell^2 - r^2/2 = \left(\sqrt{2} - 1\right)k^2$, and noting that $(kp)^{-2\ell + r} m^r \le (kp)^{-2\ell + 2r} < 1$, we obtain

\[ \mathbb{E}[Y_k] < C_1^k (np)^{\sqrt{2}k} e^{-pk^2\left(\sqrt{2} - 1\right)}. \]

Taking np sufficiently large and recalling that $\left(\sqrt{2} - 1\right)pk = \sqrt{2}\left(\log(np) + \log\log(np)\right)$, we get

\[ \mathbb{E}[Y_k] \le C_1^k \left((np)^{\sqrt{2}} e^{-\sqrt{2}\left(\log(np) + \log\log(np)\right)}\right)^k < 2^{-k}. \]

With high probability, G(n, p) contains no loose k-subwitnesses, and therefore no k-witnesses. By Lemma 4.6, Z(G) ≥ n − k with high probability.

For an upper bound on the zero forcing number of G(n, p), we set $k = (1-\varepsilon)k_c$ and show that with high probability, G(n, p) contains such a k-witness (s, t) with s[k] ∩ t[k] = ∅ using a well-known consequence of Chebyshev’s inequality, proved for instance in [12, Corollary 4.3.2]. Let $X_n$ be a sequence of non-negative random variables indexed by some parameter n going to infinity. If $\mathbb{E}[X_n] \to \infty$ and $\mathrm{Var}[X_n] / (\mathbb{E}[X_n])^2 \to 0$, then with high probability $X_n > 0$. The proof can be found in [12, Chapter 4].

Lemma 4.8. Let p = p(n) satisfy $(\log^2 n)/\sqrt{n} < p = o(1)$ and ε > 0. With high probability

\[ Z(G(n, p)) \le n - (1-\varepsilon) \cdot \frac{\left(2 + \sqrt{2}\right) \log(np)}{p}. \]

Proof. Partition the vertex set V of G(n, p) into $V_1$ and $V_2$ with $|V_1| = \lfloor n/2 \rfloor$ and $|V_2| = \lceil n/2 \rceil$. Fix $k = (1-\varepsilon) p^{-1} \left(2 + \sqrt{2}\right) \log(np)$. We say that a pair (s, t) (or the corresponding k-witness) is divided if $s_i \in V_1$ and $t_i \in V_2$ for i ∈ [k]. The set of all divided pairs is denoted by D. We will show that with high probability, G(n, p) contains a divided k-witness. Let $X_k$ denote the number of such k-witnesses in G(n, p). Furthermore, for a pair of k-tuples (s, t), we denote the event that (s, t) is a k-witness by $W_{s,t}$.

Let us first estimate the expectation of $X_k$. We will denote $n' = n/2$, and define the falling factorial power $(n')_k = n'!/(n'-k)!$. We crudely bound $(n')_k \ge (n'/2)^k$ for $k \le n'/2$. We also use $1 - p \ge e^{-(p+p^2)} \ge e^{-1.1p}$ for 0 ≤ p ≤ 0.1. In the following equation, the sum runs over all divided pairs of k-tuples (s, t).

\[ \mathbb{E}[X_k] = \sum_{(s,t) \in D} \mathbb{P}[W_{s,t}] = (n')_k^2\, p^k (1-p)^{\binom{k}{2}} \ge \left(\frac{n}{4}\right)^{2k} p^k e^{-1.1pk^2/2} \]
\[ \ge \left(\frac{1}{16} n^2 p\, e^{-1.1\left(1 + \sqrt{2}/2\right) \log(np)}\right)^k > \left(\frac{1}{16} n^2 p\, (np)^{-1.9}\right)^k, \]

where in the second line, we used $pk \le \left(2 + \sqrt{2}\right) \log(np)$. It follows that $\mathbb{E}[X_k] \ge n^{0.1k} \to \infty$.

To use Chebyshev’s inequality, we will need second moment estimates. Fix a specific divided pair (s, t). The events $W_{s',t'}$ are symmetric over all the divided pairs (s′, t′), so a standard computation (see, e.g. [12, Section 4.3]) gives

\[ \mathrm{Var}[X_k] \le \mathbb{E}[X_k] \sum_{\substack{(s',t') \in D \\ s[k] \cap s'[k] \ne \emptyset \\ t[k] \cap t'[k] \ne \emptyset}} \mathbb{P}\left[W_{s',t'} \mid W_{s,t}\right]. \]

We remark that as soon as $s'[k] \cap s[k] = \emptyset$ or $t'[k] \cap t[k] = \emptyset$, the events $W_{s,t}$ and $W_{s',t'}$ are independent, so such pairs $(s',t')$ do not contribute to the second moment. Moreover, the sum includes the case $(s',t') = (s,t)$, so we do not need the additional summand $E[X_k]$ which often appears in the formula. For any pair of $k$-tuples $(s',t')$, we have defined the set of superdiagonal pairs to be $\{(s'_i, t'_j) : i,j \in [k],\ i<j\}$, and the set of diagonal pairs to be $\{(s'_i, t'_i) : i \in [k]\}$. We now partition the pairs $(s',t')$. Let $P_{a,b,d}$ denote the set of divided pairs $(s',t')$ such that

• $|s[k] \cap s'[k]| = a$, $|t[k] \cap t'[k]| = b$, and

• the number of vertex pairs which are diagonal in both $(s,t)$ and $(s',t')$ is $d$.

Moreover, we define the term
\[ T_{a,b,d} = E[X_k]^{-1} \sum_{(s',t')\in P_{a,b,d}} P\left[W_{s',t'} \mid W_{s,t}\right]. \]
If $d > a$ or $d > b$, then $T_{a,b,d} = 0$, so we let $d$ run up to $a$ for simplicity. Using this partition, our sum can be written as

\[ \frac{\mathrm{Var}[X_k]}{(E[X_k])^2} \le \sum_{a,b=1}^{k} \sum_{d=0}^{a} T_{a,b,d}. \]

We now analyse the term $T_{a,b,d}$. We start by counting the divided pairs in $P_{a,b,d}$. There are at most $(n'-k)_{k-a}\,(n'-k)_{k-b}$ ways of selecting and indexing the vertices of $s'[k] \setminus s[k]$ and $t'[k] \setminus t[k]$. Then we select $d$ distinct pairs $(s_i,t_i)$ which are diagonal in $(s,t)$ and will also be diagonal in $(s',t')$, which can be done in at most $\binom{k}{d}$ ways. The number of ways to place those pairs into $(s',t')$, i.e. to choose the index $j$ such that $s'_j = s_i$ and $t'_j = t_i$, is at most $(k)_d$. Similarly, we choose $a-d$ of the remaining vertices in $s[k]$, and assign them any preimage under $s'$, which gives an additional factor of $\binom{k-d}{a-d}(k-d)_{a-d}$. Finally, we do the same for $t$ and $t'$. Altogether,
\begin{equation*}
|P_{a,b,d}| \le (n'-k)_{k-a}\,(n'-k)_{k-b} \cdot \binom{k}{d} (k)_d \cdot \binom{k-d}{a-d}(k-d)_{a-d} \binom{k-d}{b-d}(k-d)_{b-d}. \tag{4.5}
\end{equation*}

To bound the probability of $(s',t') \in P_{a,b,d}$ being a $k$-witness, we first bound the number of pairs which are superdiagonal for both $(s,t)$ and $(s',t')$. We can even forget about how the overlap vertices are placed in $(s',t')$, that is, the number of common superdiagonal pairs is bounded above by $\Phi(a,b)$, where

\[ \Phi(a,b) = \max_{\substack{A,B\subseteq[k] \\ |A|=a,\ |B|=b}} \left|\{(i,j) \in A\times B : i<j\}\right|. \]
Note that trivially $\Phi(a,b) \le ab$. By definition of $\Phi$, $P\left[W_{s',t'} \mid W_{s,t}\right] \le p^{k-d}(1-p)^{\binom{k}{2}-\Phi(a,b)}$. Dividing by $E[X_k] = \left((n')_k\right)^2 p^k (1-p)^{\binom{k}{2}}$, summing over $(s',t') \in P_{a,b,d}$, and using (4.5), we get

\begin{align*}
T_{a,b,d} &\le E[X_k]^{-1}\, |P_{a,b,d}|\, p^{k-d}(1-p)^{\binom{k}{2}-\Phi(a,b)} \\
&\le \frac{(n'-k)_{k-a}}{(n')_k} \cdot \frac{(n'-k)_{k-b}}{(n')_k}\; k^{a+b-d} \binom{k}{d}\binom{k-d}{a-d}\binom{k-d}{b-d}\, p^{-d}(1-p)^{-\Phi(a,b)} \\
&\le \left(\frac{4}{n}\right)^{a+b} k^{a+b-d} \binom{k}{d}\binom{k}{a-d}\binom{k}{b-d}\, p^{-d}(1-p)^{-\Phi(a,b)}, \tag{4.6}
\end{align*}
where the last inequality is a consequence of
\[ \frac{(n'-k)_{k-a}}{(n')_k} \le \frac{(n'-a)_{k-a}}{(n')_k} = \frac{(n'-a)!\,(n'-k)!}{(n'-k)!\,n'!} = \frac{1}{(n')_a} \le \left(\frac{4}{n}\right)^{a} \]
(and similarly with $b$ instead of $a$), which in turn is valid since $a, b \ll n$. The following lemma is essential to our argument.

Lemma 4.9. For any $k \in \mathbb{N}$ and $a, b \in [k]$,
\[ \frac{\Phi(a,b)}{k(a+b)} \le 1 - \frac{\sqrt{2}}{2}. \]
From its proof, which we defer to the end of this section, it will be clear that the maximum value of $T_{a,b,d}$ is achieved at $a = b = \sqrt{2}k/2$.

We now split into two cases according to the value of $a+b$. Firstly, assume that $a + b \ge 24/(\varepsilon p)$. In this case, we trivially bound the product of the three remaining binomial coefficients in (4.6) by $2^{3k}$. The whole purpose of analysing the term $T_{a,b,d}$ was to gain an extra factor of $k^{-d}$ compared to the trivial bound on the number of vertex orderings in $(s,t)$. This gives

\[ T_{a,b,d} \le 4^{a+b}\, 2^{3k}\, (np)^{-(a+b)}\, (kp)^{a+b-d}\, e^{(p+p^2)\Phi(a,b)}, \]

and summing over $d$ and using $\sum_{d=0}^{a} (kp)^{-d} \le \sum_{d=0}^{a} (2\log(np))^{-d} < 4$, we obtain
\[ \sum_{d=0}^{a} T_{a,b,d} \le 4^{a+b+1} \cdot 2^{3k}\, (np)^{-(a+b)}\, (kp)^{a+b}\, e^{(p+p^2)\Phi(a,b)}. \]

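With Lemma 4.9 and $k/(a+b) \le \varepsilon\log(np)/6$, we get

\[ \sum_{d=0}^{a} T_{a,b,d} \le \left( 8kp \cdot 2^{\varepsilon\log(np)/2}\, (np)^{-1}\, e^{(1-\sqrt{2}/2)k(p+p^2)} \right)^{a+b}. \]
Let $n$ be large enough so that $p \le 0.1\varepsilon$. Now $kp = (1-\varepsilon)(2+\sqrt{2})\log(np)$ gives $e^{(1-\sqrt{2}/2)k(p+p^2)} \le e^{(1-\varepsilon)(1+0.1\varepsilon)\log(np)} \le (np)^{1-0.9\varepsilon}$, and hence

\begin{align*}
\sum_{d=0}^{a} T_{a,b,d} &\le \left( 8(1-\varepsilon)(2+\sqrt{2})\log(np)\,(np)^{((\log 2)/2-0.9)\varepsilon} \right)^{a+b} \\
&\le (np)^{-\varepsilon(a+b)/4}
\end{align*}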

for large enough n. Summing over a, b, we get

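\begin{equation*}
\sum_{\substack{a,b\in[k] \\ a+b \ge 24/(\varepsilon p)}} \sum_{d=0}^{a} T_{a,b,d} \le \sum_{a=1}^{k}\sum_{b=1}^{k} (np)^{-\varepsilon(a+b)/4} = \left( \sum_{a=1}^{k} (np)^{-\varepsilon a/4} \right)^{2} \longrightarrow 0. \tag{4.7}
\end{equation*}
In the second case, $a + b < 24/(\varepsilon p)$, we bound the binomial coefficients by $\binom{k}{i} \le k^i$, $i \in \{d, a-d, b-d\}$. Then

\[ T_{a,b,d} \le 4^{a+b}\, n^{-(a+b)}\, k^{2a+2b-2d}\, p^{-d}\, (1-p)^{-ab}. \]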

Given that $p < 1/2$ for large $n$, we use the inequality $1-p \ge e^{-p-p^2} \ge e^{-2p}$, which implies $(1-p)^{-ab} \le e^{2pab} \le e^{p(a+b)^2/2}$. Furthermore, $kp > 1$, so

\begin{equation*}
T_{a,b,d} \le \left( 4n^{-1}k^2 e^{p(a+b)/2} \right)^{a+b} k^{-d}(kp)^{-d} \le \left( 4n^{-1}k^2 e^{p(a+b)/2} \right)^{a+b}. \tag{4.8}
\end{equation*}
Using the assumption that $p \ge (\log^2 n)/\sqrt{n}$, it follows that

\[ n^{-1}k^2 \le n^{-1} \cdot \frac{(2+\sqrt{2})^2}{\log^4 n} \cdot n\log^2(np) \le \frac{16}{\log^2 n}. \]

Moreover, $e^{p(a+b)/2} \le e^{12/\varepsilon}$, so altogether, $T_{a,b,d} \le (1/\log n)^{a+b}$ for large enough $n$. Summing up,

\begin{align*}
\sum_{\substack{a,b\in[k] \\ a+b < 24/(\varepsilon p)}} \sum_{d=0}^{a} T_{a,b,d} &\le \sum_{a=1}^{k}\sum_{b=1}^{k} (a+1)(\log n)^{-(a+b)} \\
&= \left( \sum_{a=1}^{k} (a+1)(\log n)^{-a} \right) \left( \sum_{b=1}^{k} (\log n)^{-b} \right) \longrightarrow 0.
\end{align*}
We conclude that $\mathrm{Var}[X_k]/(E[X_k])^2 = o(1)$, and hence $X_k > 0$ with high probability. On this event, Lemma 4.6 implies $Z(G) \le n - k$.

Now we prove Lemma 4.9, which essentially amounts to finding the induced subgraph of a $k$-witness with a minimum expected number of copies in $G(n,p)$.

Proof of Lemma 4.9. Let $k$ be fixed. Recall that we need to prove an upper bound on $\Phi(a,b)/(k(a+b))$ for $(a,b) \in [k]^2$, where $\Phi(a,b) = \max_{A,B\subseteq[k],\ |A|=a,\ |B|=b} |\{(i,j) \in A\times B : i<j\}|$. For $a+b \le k$, the claimed bound follows immediately from the trivial bound $\Phi(a,b) \le ab$ and the AM-GM inequality. So we may assume $a+b > k$. We first fix $|A| = a$, $|B| = b$ and $|A\cap B| = g$. If $A$ and $B$ are a selection of rows and columns of a $k\times k$ matrix, then $g$ denotes the number of diagonal entries at the intersection of selected rows and columns. Define $\varphi(A,B) = |\{(i,j) \in A\times B : i<j\}|$. We claim that $\varphi(A,B) \le ab - (g+1)g/2$. To see this, note that each element $(c_1,c_2)$ with $c_1, c_2 \in A\cap B$ and $c_1 \ge c_2$ is contained in $A\times B$, but not counted by $\varphi$. There are $(g+1)g/2$ such pairs (in fact, such $(c_1,c_2)$ are indices of the subdiagonal matrix entries selected by $A$ and $B$). Now we minimise $g$ for fixed $a$ and $b$. From the identity $|A\cup B| + |A\cap B| = |A| + |B|$ and $A\cup B \subseteq [k]$, we get $g \ge a+b-k$, so $\varphi(A,B) \le ab - (a+b-k+1)(a+b-k)/2$. Taking the maximum over $A$ and $B$ and using the AM-GM inequality, we get
\begin{align*}
\frac{\Phi(a,b)}{k(a+b)} &\le \frac{1}{k(a+b)} \left( ab - \frac{(a+b-k+1)(a+b-k)}{2} \right) \\
&\le \frac{1}{k(a+b)} \left( \left(\frac{a+b}{2}\right)^2 - \frac{(a+b-k)^2}{2} \right).
\end{align*}

We substitute $a+b = 2\alpha k$, so that the problem reduces to maximising the function $f(\alpha) = \left(2\alpha^2 - (2\alpha-1)^2\right)/(4\alpha)$ for $1/2 < \alpha \le 1$. This is a simple calculus exercise, but we provide details for the sake of transparency.

We have $f'(\alpha) = \left(-2\alpha^2+1\right)/(4\alpha^2)$, so $f$ attains its local maximum at $\alpha_0 = \sqrt{2}/2$. Evaluating $f$ at $\alpha_0$, we get
\[ \frac{\Phi(a,b)}{k(a+b)} \le \max_{\alpha} f(\alpha) = f(\alpha_0) = 1 - \sqrt{2}/2 \]
for all $a, b \in [k]$. Note that equality is asymptotically attained when $A = [\ell]$, $B = [k-\ell+1, k]$ with $\ell = \lfloor \sqrt{2}k/2 \rfloor$, which we have used in Lemma 4.7 (the 0-statement of Theorem 4.3).
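As a sanity check of Lemma 4.9, the following minimal Python sketch computes $\Phi(a,b)$ by exhaustive search for small $k$ and compares the ratio $\Phi(a,b)/(k(a+b))$ with $1-\sqrt{2}/2$. The function names `phi` and `worst_ratio` are illustrative and not part of the thesis; the search is exponential in $k$ and only meant for very small values.

```python
import itertools
import math


def phi(k, a, b):
    """Brute-force Phi(a, b): the maximum, over A, B subsets of {1..k} with
    |A| = a and |B| = b, of the number of pairs (i, j) in A x B with i < j."""
    ground = range(1, k + 1)
    best = 0
    for A in itertools.combinations(ground, a):
        for B in itertools.combinations(ground, b):
            count = sum(1 for i in A for j in B if i < j)
            best = max(best, count)
    return best


def worst_ratio(k):
    """Largest value of Phi(a, b) / (k (a + b)) over all a, b in {1..k}."""
    return max(
        phi(k, a, b) / (k * (a + b))
        for a in range(1, k + 1)
        for b in range(1, k + 1)
    )


if __name__ == "__main__":
    bound = 1 - math.sqrt(2) / 2
    for k in range(1, 7):
        print(k, round(worst_ratio(k), 4), "<=", round(bound, 4))
```

For these small $k$ the ratio stays below the bound of the lemma; the bound is only approached as $k$ grows, in line with the remark that equality is attained asymptotically.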

4.2.3 The random graph of low density

We now turn to the case $np < \sqrt{n}\log^2 n$. Lemma 4.7 gives the 0-statement. For the 1-statement, we remark that the second moment estimate (4.8) gives a probability bound which tends to 0 faster than $e^{-n^{1/3}}$. As noticed by [103], it is not hard to understand why the second moment utterly fails in this case. If, say, $p = n^{-3/4}$, then the largest witnesses are larger than $n^{3/4}$, and the majority of pairs of such sets share a substantial number of elements, making $E[X_k^2]$ much larger than $(E[X_k])^2$. Surprisingly, Frieze [79] has noticed that (4.8) can still be useful if supplemented by an appropriate large deviation inequality. We use Talagrand's inequality (see, e.g., [103, Theorem 2.29]) to extend Lemma 4.8 to small $p$.

Lemma 4.10. Assume that $1 \ll np < \sqrt{n}\log^2 n$. Let the vertex set $V$ of $G(n,p)$ be partitioned into $V_1$ and $V_2$ with $|V_1| = \lfloor n/2 \rfloor$ and $|V_2| = \lceil n/2 \rceil$, and let $k_{-\varepsilon} = (1-\varepsilon)(2+\sqrt{2})p^{-1}\log(np)$ with $0 < \varepsilon < 1/2$. With high probability, $G(n,p)$ contains a divided $k_{-\varepsilon}$-witness.

Proof. Denote by $w(G)$ the order of the largest divided witness. We will actually show that $w(G(n,p)) \ge k_{-2\varepsilon} = (1-2\varepsilon)(2+\sqrt{2})p^{-1}\log(np)$ whp, which is sufficient as $\varepsilon$ is arbitrary. The first step is a second moment lower bound on the probability $P[w(G(n,p)) \ge k_{-\varepsilon}]$. For now, we write $k = k_{-\varepsilon}$. The proof is identical to the proof of Lemma 4.8 down to inequality (4.7). In particular, the case $a+b \ge 24/(\varepsilon p)$ remains unchanged (and that is the case which determines $k$). Hence we assume $a+b < 24/(\varepsilon p)$. Inequality (4.6) from the proof of Lemma 4.8 implies
\begin{align*}
T_{a,b,d} &\le \left(\frac{4}{n}\right)^{a+b} k^{a+b-d} \binom{k}{d}\binom{k}{a-d}\binom{k}{b-d}\, p^{-d}(1-p)^{-ab} \\
&\le \left(\frac{4k}{n}\right)^{a+b} \binom{k}{d}\binom{k}{a}\binom{k}{b}\, e^{2pab}.
\end{align*}

Denote $a+b = 2u$. Using the inequality $\binom{k}{d} \le \binom{k}{u}$, which follows from $d \le a \ll k$, as well as $\binom{k}{a}\binom{k}{b} \le \binom{k}{u}^2$ and $ab \le u^2$, we have

\[ T_{a,b,d} \le \left(\frac{4k}{n}\right)^{2u} \binom{k}{u}^{3} e^{2pu^2} \le \left( \frac{16k^2}{n^2} \cdot \frac{e^3k^3}{u^3} \right)^{u} \cdot e^{2pu^2}. \]

Define $\xi = 16e^3k^5/n^2$ and $g(u) = u\log(\xi u^{-3}) + 2pu^2$, so that $\log T_{a,b,d} \le g(u)$ for all $a, b, d$. We will now use basic calculus to bound the function $g$ on $[1, 12/(\varepsilon p)]$, so a first-time reader may find it helpful to skip over the proof of the claim. The main point is that the function $g$ has a global maximum $u_0 \ll k$.

Claim 8. For $u \in [1, 12/(\varepsilon p)]$, $g(u) = u\log\xi - 3u\log u + 2pu^2 \le k(np)^{-1/2}$.

Proof. Let $u_2 = 12/(\varepsilon p)$ and $u_1 = 4ek(k/n)^{2/3}$, or equivalently $(ek/u_1)^3 = (n/k)^2/64$. We compute $g'(u) = \log\xi - 3\log u - 3 + 4pu = \log(\xi u^{-3}) - 3 + 4pu$ and claim that

\[ g'(1) > 0, \quad g'(u_1) < 0 \quad \text{and} \quad g'(u_2) < 0. \]
To see the first inequality, recall that $k > 1/p > \sqrt{n}/\log^2 n$ and therefore $\xi \gg 1$. It follows that $g'(1) = \log\xi - 3 + 4p > 0$. Evaluating $g'$ at $u_1$ and using $pu_1 = 4pek^{5/3}n^{-2/3} \le 100\,(np)^{-2/3}\log^{5/3}(np)$, we get

\[ g'(u_1) = \log(\xi u_1^{-3}) - 3 + 4pu_1 = \log(1/4) - 3 + o(1) < 0. \]

Finally, $\xi u_2^{-3} = O(\xi p^3) = O\left(\log^5(np)/(np)^2\right) = o(1)$, so $g'(u_2) = \log(\xi u_2^{-3}) - 3 + 48\varepsilon^{-1} < 0$.

From $g'(1) > 0$ and $g'(u_1) < 0$ it follows that there is a $u_0 \in [1, u_1]$ with $g'(u_0) = 0$. We will show that $g(u_0) \le 3u_0 \le k(np)^{-1/2}$ and moreover, that $u_0$ is the global maximum of $g$ on $[1, u_2]$.

To see the first claim, note that $g'(u_0) = \log(\xi u_0^{-3}) - 3 + 4pu_0 = 0$ and hence

\[ g(u_0) = u_0(3 - 4pu_0) + 2pu_0^2 \le 3u_0 \le 3u_1 = 12ek(k/n)^{2/3} < k(np)^{-1/2}. \]
It remains to show that $g$ has no other extrema. This follows from the fact that $g''(u) = -3/u + 4p$ is increasing in $u$. Therefore $g'$ has exactly one local minimum, at $u = 3/(4p) > u_1$. If $g'$ vanishes at more than one point in $(1, u_2)$, then either $g'(1)g'(u_2) > 0$ or $g'$ has multiple local extrema, which is not the case. Hence $u_0$ is the unique local extremum of $g$, and it is indeed the global maximum since the sign of $g'$ changes from positive to negative at $u_0$.

We conclude that for all $a, b, d$ such that $a+b < 24/(\varepsilon p)$,

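\[ T_{a,b,d} \le e^{g((a+b)/2)} \le e^{k(np)^{-1/2}}. \]
Using $p < \log^2 n/\sqrt{n}$ we bound the exponent from below by
\[ k(np)^{-1/2} = \frac{(1-\varepsilon)(2+\sqrt{2})\,p^{-3/2}\log(np)}{\sqrt{n}} \ge \frac{n^{3/4}}{\sqrt{n}\,\log^3 n} \ge n^{1/8}. \]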

Summing the bound for Ta,b,d over all a, b, d and using (4.7) we get

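\begin{align*}
\frac{\mathrm{Var}[X_k]}{(E[X_k])^2} &\le \sum_{\substack{a,b\in[k] \\ a+b \ge 24/(\varepsilon p)}} \sum_{d=0}^{a} T_{a,b,d} + \sum_{\substack{a,b\in[k] \\ a+b < 24/(\varepsilon p)}} \sum_{d=0}^{a} T_{a,b,d} \\
&\le o(1) + k^3 e^{k(np)^{-1/2}} \le e^{2k(np)^{-1/2}}.
\end{align*}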

We will apply a stronger form of Chebyshev's inequality, which reads $P[X_k > 0] \ge (E[X_k])^2/E[X_k^2]$ (see [103, Remark 3.1] for details). Using

\[ \frac{E[X_k^2]}{(E[X_k])^2} = \frac{\mathrm{Var}[X_k]}{(E[X_k])^2} + 1 \le e^{4k(np)^{-1/2}}, \]
we obtain
\begin{equation*}
P[X_k > 0] \ge e^{-4k(np)^{-1/2}}. \tag{4.9}
\end{equation*}
To show concentration of the order of the largest divided witness $w(G(n,p))$, we apply Talagrand's inequality. The random graph $G(n,p)$ is modelled using vertex exposure. Formally, we fix an ordering of the vertices $v_1, v_2, \dots, v_n$, and define mutually independent random variables $(Z_i)_{i\in[n]}$, where $Z_i$ exposes the backward edges from the vertex $v_i$. Then $w(G(n,p))$ is a function of $Z_1, \dots, Z_n$. This function is 1-Lipschitz, that is, if graphs $G$ and $G'$ differ only at the vertex $v_i$, then $|w(G) - w(G')| \le 1$. Moreover, whenever $w(G(n,p)) \ge k$, there exist $2k$ certificate vertices, namely the vertices of a divided $k$-witness, which are responsible for the fact that $w(G(n,p)) \ge k$.

Hence we may apply [103, Theorem 2.29] with $\sigma(k) = 2k$ in their notation. Recalling that $k_{-\varepsilon} = (1-\varepsilon)(2+\sqrt{2})p^{-1}\log(np)$, we have

\[ P[w(G) \le k_{-2\varepsilon}]\; P[w(G) \ge k_{-\varepsilon}] \le e^{-(k_{-\varepsilon}-k_{-2\varepsilon})^2/(8k_{-\varepsilon})} \le e^{-\varepsilon^2 k_{-\varepsilon}/8}. \]

Inequality (4.9) says $P[w(G) \ge k_{-\varepsilon}] \ge e^{-4k(np)^{-1/2}}$, and taking $np > (10/\varepsilon)^4$, we conclude
\[ P[w(G) \le k_{-2\varepsilon}] \le e^{-\varepsilon^2 k_{-\varepsilon}/8 + 4k_{-\varepsilon}(np)^{-1/2}} \le e^{-\varepsilon^2 k_{-\varepsilon}/16} \longrightarrow 0, \]
as required.

4.3 spectral bounds

In this section we discuss the bounds on the zero forcing number in terms of the graph eigenvalues. The study of spectral properties and their relation to other graph parameters is an established area of research with many diverse techniques and applications, surveyed for example in the monograph of Godsil and Royle [90]. One of the earliest results of this type is Hoffman's bound on the independence number of a graph [101]. Namely, let $G$ be an $n$-vertex $d$-regular graph, and let $\lambda_{\min}$ denote its smallest eigenvalue. Hoffman proved that $G$ then contains no independent set of order larger than $-\lambda_{\min}n/(d-\lambda_{\min})$. Note that since the trace of the adjacency matrix of a graph is zero, $\lambda_{\min}$ is negative. There are many examples showing the bound to be tight.

We establish an analogue of Hoffman's bound for the zero forcing number by showing that $Z(G) \ge n\left(1 + 2\lambda_{\min}/(d-\lambda_{\min})\right)$. To prove this result we use the following well-known estimate on the edge distribution of a graph in terms of its eigenvalues. Part (ii) of Theorem 4.11 is provided in, e.g., [118], whereas the variant (i) follows from the same proof. For a graph $G = (V,E)$ and two sets $U, W \subseteq V$, let $\tilde{e}(U,W)$ denote the number of edges with one endpoint in $U$ and the other one in $W$, where any edge with both endpoints in $U\cap W$ is counted twice. This counting convention is often useful for algebraic methods and is only used in this section. Recall that an $(n,d,\lambda)$-graph is a $d$-regular $n$-vertex graph in which all eigenvalues but the largest one are at most $\lambda$ in absolute value.

Theorem 4.11. Let $G$ be an $(n,d,\lambda)$-graph, and denote its smallest eigenvalue by $\lambda_{\min}$. Then for any two vertex subsets $U, W$ of $G$,

(i) $\dfrac{d|U||W|}{n} - \tilde{e}(U,W) \le -\lambda_{\min}\sqrt{|U||W|\left(1-\dfrac{|U|}{n}\right)\left(1-\dfrac{|W|}{n}\right)}$.

(ii) $\left|\dfrac{d|U||W|}{n} - \tilde{e}(U,W)\right| \le \lambda\sqrt{|U||W|\left(1-\dfrac{|U|}{n}\right)\left(1-\dfrac{|W|}{n}\right)}$.

Proof of Theorem 4.4 (i). Let $(s,t)$ be a $k$-witness in $G$, and $k = 2\mu n$. By definition of a witness, there are no edges between the sets $U = \{s_1, s_2, \dots, s_{\mu n}\}$ and $W = \{t_{\mu n+1}, \dots, t_k\}$. Hence, using Theorem 4.11(i),

\begin{align*}
0 = \tilde{e}(U,W) &\ge d\mu^2 n + \lambda_{\min}\mu n(1-\mu) = \mu n\left(d\mu + \lambda_{\min}(1-\mu)\right) \\
&= \mu n\left(\mu(d-\lambda_{\min}) + \lambda_{\min}\right),
\end{align*}

which implies $\mu \le -\lambda_{\min}/(d-\lambda_{\min})$. Hence the largest witness in $G$ has order at most $-2\lambda_{\min}n/(d-\lambda_{\min})$, which by Lemma 4.6 implies
\[ Z(G) \ge n\left(1 + \frac{2\lambda_{\min}}{d-\lambda_{\min}}\right). \]
Surprisingly, the additional factor of two in the above bound, which looks like an artefact of the proof, turns out to be necessary, and the result of Theorem 4.4(i) is shown to be tight by the following example.

Proposition 4.12. For any even $D \ge 2$, and for infinitely many values of $N$, there exists an $N$-vertex $D$-regular graph $G^*$ whose smallest eigenvalue is $\lambda_{\min} = -2$, and which satisfies $N - Z(G^*) \ge 4N/(D+2) - 2$.

Proof. Let $d = D/2 + 1$, and let $G$ be an $n$-vertex $d$-regular graph which contains a Hamilton cycle consisting of edges $e_1, e_2, \dots, e_n$ in this order. Clearly, such graphs exist for all $d$ and infinitely many $n$. Let $G^*$ be the line graph of $G$, that is, $G^*$ has the vertex set $E(G)$ with two vertices adjacent if the corresponding edges in $G$ share a vertex. Then $G^*$ has $N = nd/2$ vertices and is $D$-regular with $D = 2d-2$. Moreover, Hoffman [101] has observed that the smallest eigenvalue of $G^*$ is $-2$.

Note that the vertices in $G^*$ corresponding to $e_1, e_2, \dots, e_n$ form an induced cycle. This implies that $V(G^*) \setminus \{e_3, e_4, \dots, e_n\}$ is a zero forcing set in $G^*$. Namely, the vertex $e_i$ forces $e_{i+1}$ for $i = 2, 3, \dots, n-1$. This zero forcing set has order $N - (n-2)$. Finally, notice that in $G^*$, we have
\[ -\frac{2\lambda_{\min}N}{D-\lambda_{\min}} = \frac{4\cdot(nd/2)}{2d-2+2} = n. \]

Next, we turn our attention to the second part of Theorem 4.4, which says that any $(n,d,\lambda)$-graph $G$ satisfies
\[ Z(G) \le n\left(1 - \frac{1}{2(d-\lambda)}\log\left(\frac{d-\lambda}{2\lambda+1}\right)\right). \]

In particular, if $\lambda = d^{1-\varepsilon}$ for some $\varepsilon > 0$, then $n - Z(G) = \Omega((n\log d)/d)$.
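The forcing argument in the proof of Proposition 4.12 can be replayed on a small instance. The Python sketch below is only an illustration under stated assumptions: it uses the standard colour-change rule (a black vertex with exactly one white neighbour forces that neighbour), and the test graph, a cycle with antipodal chords built by the hypothetical helper `cycle_with_chords`, is our choice of a 3-regular graph with a Hamilton cycle and does not come from the thesis.

```python
def cycle_with_chords(n):
    """Illustrative 3-regular graph on n vertices (n even, n >= 6):
    a Hamilton cycle e_1, ..., e_n followed by the chords i ~ i + n/2."""
    cycle = [(i, (i + 1) % n) for i in range(n)]        # e_1, ..., e_n in order
    chords = [(i, i + n // 2) for i in range(n // 2)]   # remaining edges
    return cycle + chords


def line_graph(edges):
    """Adjacency sets of the line graph: two edge indices are adjacent
    iff the corresponding edges share an endpoint."""
    adj = {i: set() for i in range(len(edges))}
    for i, e in enumerate(edges):
        for j, f in enumerate(edges):
            if i != j and set(e) & set(f):
                adj[i].add(j)
    return adj


def forcing_closure(adj, black):
    """Apply the colour-change rule until no further vertex can be forced."""
    black = set(black)
    progress = True
    while progress:
        progress = False
        for v in list(black):
            white = adj[v] - black
            if len(white) == 1:      # v has exactly one white neighbour
                black |= white
                progress = True
    return black


if __name__ == "__main__":
    n = 8
    edges = cycle_with_chords(n)
    g_star = line_graph(edges)
    # Indices 2..n-1 correspond to e_3, ..., e_n; everything else starts black.
    start = set(g_star) - set(range(2, n))
    print(len(start), forcing_closure(g_star, start) == set(g_star))
```

Here $N = 12$ and the starting set has $N - (n-2) = 6$ vertices, matching the count in the proof.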

Proof of Theorem 4.4 (ii). We greedily construct a witness. In each step $i$, we will select vertices $s_i, t_i \in U_{i-1}$ and a set $U_i \subseteq U_{i-1}$. Start with $U_0 = V$, the vertex set of $G$. Assuming that the steps $1, \dots, i-1$ were executed, let $s_i$ be any vertex in $U_{i-1}$ satisfying $1 \le \deg_{G[U_{i-1}]}(s_i) \le (d-\lambda)|U_{i-1}|/n + \lambda$. As usual, $N_G(s_i)$ denotes the neighbourhood of $s_i$ in $G$. We fix any $t_i \in N_G(s_i) \cap U_{i-1}$, and set $U_i = U_{i-1} \setminus (N_G(s_i) \cup \{s_i\})$. The algorithm continues as long as $|U_i| > \lambda n/(d+\lambda)$. Denote the total number of steps by $k$. By construction, the pair $(s,t)$ is a witness. We will show that there is a choice for $s_i$ throughout the algorithm, and that

\[ k \ge \frac{n}{2(d-\lambda)}\log\left(\frac{d-\lambda}{2\lambda+1}\right). \]

Claim 9. If $|U_i| > \lambda n/(d+\lambda)$, then the induced subgraph $G[U_i]$ contains a vertex $u$ satisfying $1 \le \deg_{G[U_i]}(u) \le (d-\lambda)|U_i|/n + \lambda$.

Proof. Suppose that some set $U_i$ does not satisfy the claim. Since $|U_i| > \lambda n/(d+\lambda)$, it is not an independent set by Theorem 4.11(ii). Therefore, removing the isolated vertices in $G[U_i]$, we get a non-empty set $W \subseteq U_i$ in which every vertex $u$ satisfies $\deg_{G[W]}(u) > (d-\lambda)|U_i|/n + \lambda$. In particular,
\[ \tilde{e}(W,W) = \sum_{u\in W} \deg_{G[W]}(u) > |W|\left((d-\lambda)\frac{|U_i|}{n} + \lambda\right), \]
recalling that each edge in $E(W,W)$ is counted twice in $\tilde{e}(W,W)$. On the other hand, Theorem 4.11(ii) implies

\[ \tilde{e}(W,W) \le |W|\left(\frac{d|W|}{n} + \lambda\left(1-\frac{|W|}{n}\right)\right) \le |W|\left((d-\lambda)\frac{|U_i|}{n} + \lambda\right). \]

We reached a contradiction, which completes the proof of the claim.

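Now denote $a_i = |U_i|/n$. By construction, $a_0 = 1$ and, for $i \ge 1$,
\begin{align*}
a_i = \frac{|U_{i-1}| - \deg_{G[U_{i-1}]}(s_i) - 1}{n} &\ge \frac{1}{n}\left(|U_{i-1}| - \frac{(d-\lambda)|U_{i-1}|}{n} - \lambda - 1\right) \\
&= \left(1 - \frac{d-\lambda}{n}\right)a_{i-1} - \frac{\lambda+1}{n}.
\end{align*}

By induction on $i$, this implies that for all $i$,

\begin{equation*}
a_i \ge \left(1 + \frac{\lambda+1}{d-\lambda}\right)\left(1 - \frac{d-\lambda}{n}\right)^{i} - \frac{\lambda+1}{d-\lambda}. \tag{4.10}
\end{equation*}

Claim 10. For $i \le \frac{n}{2(d-\lambda)}\log\left(\frac{d-\lambda}{2\lambda+1}\right)$, $a_i \ge \lambda/(d+\lambda)$.

Proof. We use (4.10) to estimate $a_i$ for $i \le \frac{n}{2(d-\lambda)}\log\left(\frac{d-\lambda}{2\lambda+1}\right)$, ignoring the constant $\left(1 + (\lambda+1)/(d-\lambda)\right)$ and using the inequality $1 - (d-\lambda)/n \ge e^{-2(d-\lambda)/n}$ for $(d-\lambda)/n < 1/2$. This gives
\begin{align*}
a_i &\ge \exp\left(-\frac{2(d-\lambda)}{n} \cdot \frac{n}{2(d-\lambda)}\log\left(\frac{d-\lambda}{2\lambda+1}\right)\right) - \frac{\lambda+1}{d-\lambda} \\
&= \frac{2\lambda+1}{d-\lambda} - \frac{\lambda+1}{d-\lambda} = \frac{\lambda}{d-\lambda} \ge \frac{\lambda}{d+\lambda},
\end{align*}
as required.

We conclude that the algorithm continues for at least $k = \frac{n}{2(d-\lambda)}\log\left(\frac{d-\lambda}{2\lambda+1}\right)$ steps, so
\[ n - Z(G) \ge \frac{n}{2(d-\lambda)}\log\left(\frac{d-\lambda}{2\lambda+1}\right). \]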
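The greedy construction above can be sketched in a few lines of Python. This is a simplified illustration rather than the procedure of the proof: instead of the spectral degree condition it always picks a vertex of minimum positive degree in $G[U]$ (which satisfies the condition whenever some vertex does), and it runs until $G[U]$ has no edges instead of stopping at $|U_i| \le \lambda n/(d+\lambda)$. The witness conditions checked by `is_witness` ($s_i t_i$ is an edge and $s_i t_j$ is a non-edge for $i < j$) are read off from the construction; all function names are hypothetical.

```python
def cycle_graph(n):
    """Adjacency sets of the cycle on vertices 0..n-1 (a small test graph)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def greedy_witness(adj):
    """Greedily build (s, t): pick s_i in U with a neighbour in U, pick t_i
    among those neighbours, then delete s_i and all its neighbours from U."""
    U = set(adj)
    s, t = [], []
    while True:
        # Vertices of U with at least one neighbour inside U, keyed by degree in G[U].
        candidates = [(len(adj[v] & U), v) for v in U if adj[v] & U]
        if not candidates:
            break
        _, s_i = min(candidates)      # simplified rule: minimum positive degree
        t_i = min(adj[s_i] & U)       # any neighbour in U works; take the smallest
        s.append(s_i)
        t.append(t_i)
        U -= adj[s_i] | {s_i}
    return s, t


def is_witness(adj, s, t):
    """Check: s_i ~ t_i for all i, and s_i not adjacent to t_j for i < j."""
    k = len(s)
    diagonal_ok = all(t[i] in adj[s[i]] for i in range(k))
    superdiagonal_ok = all(
        t[j] not in adj[s[i]] for i in range(k) for j in range(i + 1, k)
    )
    return diagonal_ok and superdiagonal_ok


if __name__ == "__main__":
    adj = cycle_graph(8)
    s, t = greedy_witness(adj)
    print(s, t, is_witness(adj, s, t))
```

Removing the whole neighbourhood of $s_i$ from $U$ is what guarantees the superdiagonal non-edges, since every later $t_j$ is chosen from the reduced set.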

To show that for $\lambda \le d^{1-\Omega(1)}$, the bound on $n - Z(G)$ is tight up to a constant factor, we exhibit a sequence of $(n,d,\lambda)$-graphs $G_m$ with $\lambda = O(\sqrt{d})$ satisfying $n - Z(G_m) \le n\left((\log_2 d)/(2d) + o(1)\right)$. We use the following construction from [118, Section 3]. For an odd integer $m$, the vertices of $G_m$ are all binary vectors of length $m$ with an odd number of ones except for the all-one vector. Two distinct vertices are adjacent iff the inner product of the corresponding vectors is 1 modulo 2. This graph has $n_m = 2^{m-1} - 1$ vertices, degree $d_m = (n_m - 3)/2$, and second largest eigenvalue $\lambda(G_m) = 1 + 2^{(m-3)/2} = O(\sqrt{d_m})$. It is easy to check that if $(s,t)$ is a $k$-witness in $G_m$, then the vectors corresponding to $t_1, t_2, \dots, t_k$ are linearly independent, and therefore $k \le m = (1+o(1))\log_2 n_m = (1+o(1))\,\frac{n_m}{2d_m}\log_2 d_m$. This gives the required bound on $n - Z(G_m)$.

concluding remarks

• We were wondering whether a linear lower bound on the minimum rank of a random graph from [98], $\mathrm{mr}\left(G(n,\tfrac12)\right) \ge 0.14n$, can be extended to $(n,d,\lambda)$-graphs $G$ with $d$ linear in $n$ and $\lambda < d^{1-\Omega(1)}$. A negative answer was reached in conversation with Babai [15]. We considered the graph $H_t$ whose vertex set corresponds to $t$-element subsets of $[t^2]$, and two vertices are adjacent if and only if the corresponding sets intersect (the complement of a Kneser graph, described for instance in [90]). The minimum rank of $H_t$ is $\mathrm{mr}(H_t) = o(\log^2(n_t))$, where $n_t = \binom{t^2}{t}$ is the corresponding number of vertices. It would be interesting to see if there are such $(n,d,\lambda)$-graphs with minimum rank as low as $O(\log n)$, which is the lower bound implied by Theorem 4.4 (ii).

• What is the asymptotic value of $n - Z(G_{n,d})$, where $d$ is a large constant and $G_{n,d}$ is a graph chosen uniformly at random from all $n$-vertex $d$-regular graphs? A greedy argument (see, e.g. [13]) shows that $Z(G_{n,d}) \le n\left(1 - 1/(d-1)\right)$ deterministically, whereas Theorem 4.4 (ii) implies that for large $d$, with high probability, $Z(G_{n,d}) \le n\left(1 - \log d/(4d)\right)$. This follows from the fact that with high probability, $G_{n,d}$ is an $(n,d,\lambda)$-graph with $\lambda \le 3\sqrt{d}$ (see, e.g., [77]). The lower bound, $Z(G_{n,d}) \ge n\left(1 - 40\log d/d\right)$, is an immediate consequence of the fact that with high probability, $G_{n,d}$ contains edges between any two sets $S, T$ with $|S|, |T| \ge 20n\log d/d$ (see, e.g., [104, Lemma 3.6]).

5 THE KÖNIG GRAPH PROCESS

The modern study of random graph processes began in 1959 with the inaugural papers of Erdős and Rényi [59, 60]. Given a uniformly random permutation $e_1, \dots, e_N$ of $E(K_n)$, they studied the evolution and properties of the graph $G(n,m)$ with edge set $\{e_1, \dots, e_m\}$, which is now known as the Erdős-Rényi random graph. This work has since grown into a well-established research area with many important applications in theoretical computer science, statistical physics, and other branches of mathematics [28, 82, 103].

An important variant of the standard Erdős-Rényi process, often referred to as the random greedy process, is the following. Given a graph property $P$, preserved by the removal of edges, begin with an empty $n$-vertex graph and at each step add an edge chosen uniformly at random from those that do not violate property $P$. The random greedy process was first considered by Ruciński and Wormald [135] (in the case of bounded degree) and, following discussions of Bollobás and Erdős, by Erdős, Suen and Winkler in 1995 [72] (in the case of triangle-freeness). Their motivation was defining and analysing a natural probability measure on the set of $P$-maximal graphs.

A particularly well studied property is that of being $H$-free for a general graph $H$. In many cases, the final graph obtained at the end of the $H$-free process has been used to give constructions of interest in extremal combinatorics. In particular, such constructions have been found to improve lower bounds on Turán numbers (see [22, 147]) and on off-diagonal Ramsey numbers (for example [20, 22, 23, 93]).

In addition to looking at the structure and properties of the final graph, one often asks questions about the evolution of the process itself (see, e.g., [22, 119]). The properties mentioned so far are decreasing (closed under removal of edges) and local. Monotonicity of $P$ guarantees that the final graph $G_N$ is maximal in $P$ and facilitates the use of some common techniques such as coupling with a modified process. So far, global properties (for instance planarity [88], $r$-colourability [119] and $k$-matching-freeness [114]) are far less well understood and there is no standard approach to analysing these processes.

In this chapter we consider a global non-monotone property of a graph $G$, namely that the size of a maximum matching $\nu(G)$ is equal to the cardinality of an optimal vertex cover $\tau(G)$. A vertex cover in $G$ is a set of vertices incident to every edge of $G$. Equivalently, the complement of a vertex cover contains no edges of $G$ and is thus an independent set. We say that the vertex cover $C$ is optimal if there is no vertex cover of cardinality less than $|C|$. It is easy to see that in general $\nu(G) \le \tau(G) \le 2\nu(G)$. We say that $G$ is a König graph (or has property $K$) if $\nu(G) = \tau(G)$.

The parameters $\nu(G)$ and $\tau(G)$ and the relationship between them have been studied in many contexts. A foundational theorem of König and independently Egerváry [57, 111] says that bipartite graphs have the König property. The problem of finding an optimal vertex cover is NP-hard, but it can be solved in polynomial time in König graphs via the maximum matching. However, most graphs are closer to the other end of the spectrum, where $\tau(G) \sim 2\nu(G)$. Indeed, with high probability, $G(n,m)$ with $m = \frac12 n(\log n + \omega(1))$ has a perfect matching [62], whereas $\tau(G(n,m)) \sim n$ for $m \gg n$ [103].¹ In light of this, we are interested in the evolution of a random graph process constrained to the König property, defined as follows.

Definition 5.1. Let $G_0$ be the empty graph on vertex set $V$, where $|V| = n$, and set $N := \binom{n}{2}$. Let $e_1, e_2, \dots, e_N$ be a uniformly random ordering of the edges of the complete graph $K_n$ on $V$. At each step $m \ge 0$, the edge $e_{m+1}$ is offered to $G_m$. Say that a vertex pair $f$ is acceptable for $G_m$ if $f \notin \{e_1, \dots, e_m\}$ and $G_m + f$ has property $K$. If $e_{m+1}$ is acceptable for $G_m$, we set $G_{m+1} := G_m + e_{m+1}$ and say that the edge $e_{m+1}$ is accepted. Otherwise we say that $e_{m+1}$ is rejected and set $G_{m+1} := G_m$.

In the remainder of the chapter, we assume that the number of vertices $n$ is even. All the proofs translate to odd $n$ if we define a perfect matching to be a matching of order $\lfloor n/2 \rfloor$.

We remark that it is also natural to analyse an alternative process in which $\tau(G) = \nu(G)$ is maintained not just for $G_m$, but for every subgraph of it. However, this condition is equivalent to bipartiteness and yields precisely the bipartite graph process considered by Erdős, Suen and Winkler [72]. Our process and the resulting graph are rather different.

Let us turn back to the König process and consider what can be said about the structure of $G_N$. We start with a simple proposition which is proved in Section 5.3.

¹ As usual, we say an event occurs with high probability if it occurs with probability tending to 1 as $n \to \infty$. We write $f \sim g$ when $f$ is asymptotic to $g$, that is, $f(n) = (1+o(1))g(n)$.

Proposition 5.2. For all m ≥ 1, the graph Gm has a maximum matching incident to e1,..., em.

Since all the edges of $K_n$ have been offered, it follows that, deterministically, the final graph $G_N$ has a perfect matching. This settles the value of $\nu(G_N)$ and raises a number of further questions about the typical structure of $G_N$ and the evolution of the process (in particular, occurrence of a perfect matching). Seeing as $G_N$ has a vertex cover $C$ of order $\frac{n}{2}$, it is a subgraph of the classical Erdős-Gallai graph $G_n^*$, which consists of a clique on $\frac{n}{2}$ vertices that is completely joined to an independent set on the other $\frac{n}{2}$ vertices. Given this, we wonder how close the graph $G_N$ is to $G_n^*$, or in other words, how many vertex pairs incident to $C$ are missing? How 'volatile' is the optimal cover typically in the initial stages of the process, and at which point does it become 'rigid' or unique? Our first main result is the following.

Theorem 5.3. Let $\varepsilon > 0$. With high probability, the König process satisfies the following properties.

(i) $G_{(1+\varepsilon)n\log n}$ contains a perfect matching.

(ii) $G_{4n\log n}$ has a unique optimal vertex cover $C$.

(iii) There are $O(n)$ vertex pairs incident to $C$ that are not present in $G_N$.

Furthermore, we analyse in more detail the occurrence of a perfect matching in $G_m$. In the Erdős-Rényi process $(G(n,m))$, Bollobás and Thomason [31] proved that the very edge that links the last isolated vertex to another vertex makes the graph connected and completes a perfect matching with high probability if $n$ is even. In fact, they showed that if $m \ge \frac14 n(\log n + \omega(1))$, then $G(n,m)$ contains a matching covering all but at most one of the non-isolated vertices with high probability. For $m < \frac14 n\log n$, there are other structural obstructions to containing a perfect matching – in this regime, $G(n,m)$ is likely to contain vertices with two neighbours of degree one, only one of which can be contained in a perfect matching. However, quantitatively, the isolated vertices are the main obstruction throughout the evolution of the process, as shown by Frieze [78].

Our second main result is an analogue of [31] for our process. We show that, with high probability, a perfect matching in $G_m$ occurs significantly later than in $G(n,m)$.

Theorem 5.4. Let $m := \left(\frac12 + \frac1{65}\right) n\log n$. With high probability, $G_m$ contains isolated vertices.

The constant $\frac12 + \frac1{65}$ reflects the fact that the number of isolated vertices decays at a slower rate than in $G(n,m)$. This is precisely the reason we find the delay in $G_m$ surprising – one might guess that most vertex pairs containing isolated vertices are acceptable for $G_m$. However, it turns out that often, pairs containing isolated vertices in $G_m$ are not acceptable as they do not extend the maximum matching in $G_m$, implying the statement of Theorem 5.4.

The delay is, in spirit, similar to Achlioptas processes, as the latter are conceived for the sake of influencing the typical appearance of graph theoretic properties. Initiating a fruitful line of research, Bohman and Frieze have exhibited a process with a delayed phase transition [21]. Besides the phase transition, several other graph properties were considered. For instance, it is known that connectivity and occurrence of a Hamilton cycle (and hence a perfect matching) can be accelerated [105, 116].

As property $K$ is non-monotone and global, many of our arguments, to our knowledge, involve novel ideas and could be of independent interest or be adapted to study other global properties. Our approach is to combine probabilistic arguments with known maximum-matching methods, resulting in an intuitive and conceptually simple proof.

The chapter is structured as follows. In the next section, we introduce standard notation that will be used throughout and outline in detail how the proof will proceed. Then in Section 5.2 we recap some probabilistic tools and prove some preliminary results. In Section 5.3 we use standard techniques to find bounds on $\nu(G_m)$ for various time steps $m$. Theorem 5.3 is proved in Section 5.4. We then show in Section 5.5 that, at almost all time steps from $c_1 n\log n$ to $2n\log n$, the number of vertices contained in an optimal cover is close to $\frac{n}{2}$. This will be used in Section 5.6 to prove Theorem 5.4. We conclude in Section 5.7 by mentioning some related open problems.
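To make Definition 5.1 concrete, the process can be simulated for very small $n$. The Python sketch below is an illustration only: it computes $\nu$ and $\tau$ by brute force (exponential time, so it is meant for $n$ up to about 8), and the function names and the use of a seeded `random.Random` are our own choices, not part of the thesis.

```python
import itertools
import random


def max_matching_size(edges):
    """Brute-force nu(G): largest number of pairwise disjoint edges."""
    edges = list(edges)
    if not edges:
        return 0
    (u, v), rest = edges[0], edges[1:]
    skip = max_matching_size(rest)
    take = 1 + max_matching_size([e for e in rest if u not in e and v not in e])
    return max(skip, take)


def min_vertex_cover_size(n, edges):
    """Brute-force tau(G): smallest vertex set meeting every edge."""
    for size in range(n + 1):
        for cover in itertools.combinations(range(n), size):
            chosen = set(cover)
            if all(u in chosen or v in chosen for u, v in edges):
                return size
    return n  # not reached: the full vertex set always covers


def konig_process(n, seed=0):
    """Offer the edges of K_n in uniformly random order and accept an edge
    iff the graph keeps the Konig property nu = tau after adding it."""
    rng = random.Random(seed)
    order = list(itertools.combinations(range(n), 2))
    rng.shuffle(order)
    accepted = []
    for e in order:
        candidate = accepted + [e]
        if max_matching_size(candidate) == min_vertex_cover_size(n, candidate):
            accepted = candidate
    return accepted


if __name__ == "__main__":
    final = konig_process(6, seed=1)
    print(len(final), max_matching_size(final), min_vertex_cover_size(6, final))
```

For even $n$, the final graph should have $\nu = \tau = n/2$, in agreement with the observation that $G_N$ deterministically contains a perfect matching.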

5.1 overview of proof

In this section we give a brief overview of the arguments, introduce our main lemmas and define some notation that will be used throughout. Throughout the proof n will be taken to be sufficiently large when needed. All the statements about Gm in this section will hold with high probability without commenting on it each time. Of course, this is made explicit in the formal statements.

Our first goal, in Section 5.3, is to show that when $m$ is not too small, $G_m$ contains a matching of size $\frac{n}{2}(1-o(1))$ with high probability. More precisely, we will prove the following.

Lemma 5.5. Let $\gamma > 0$ be a sufficiently large constant and let $\gamma n \le m \le 2n\log n$. With probability $1 - O(n^{-2})$,
\[ \nu(G_m) \ge \frac{n}{2}\left(1 - e^{-3m/(10n)}\right). \]
This lemma will follow from the fact that each vertex is in linearly many acceptable pairs and fairly standard arguments (see, for example, [82, Section 6.1]).

The same reasoning yields that, with high probability, after $m := (1+\varepsilon)n\log n$ steps, each vertex has degree at least $\Omega(\log n)$ and $G_m$ has a perfect matching. This proves statement (i) of Theorem 5.3.

Section 5.4 concerns the structure of the final graph (Theorem 5.3). Let $C_N$ be an optimal cover in $G_N$ and let $D_m \subseteq V(G_m)$ be the set of vertices contained in some optimal cover of $G_m$. How can an edge $e$ incident to $C_N$ be rejected during the process? This is only possible if $e$ is not incident to a current optimal cover at the time step $m$ when it is offered (i.e. $e \cap D_m = \emptyset$). The key idea in controlling those rejected edges is to show that for $m' = 1.1n\log n$, most vertices in an optimal vertex cover of $G_{m'}$ have been in an optimal cover for most of the previous $m'$ time steps. Assume that $G_{m'}$ has a perfect matching, which holds with high probability. Then deterministically, no edges incident to $C_N$ will be rejected after the time step $m'$, seeing as $C_N$ in $G_N$ is also an optimal cover in $G_{m'}$.

The question in Section 5.5 is how 'rigid' an optimal cover of $G_m$ is during the earlier evolution of the process. We will see that if $|D_m|$ is significantly larger than $\frac{n}{2}$ in a positive proportion of steps $m$, we are accepting too many edges into our graph to maintain an independent set of order $\frac{n}{2}$. Let us state the lemma formally. For $m_0, t \ge 0$, let $T(m_0,t)$ be the time period of length $t$ beginning at $m_0$, i.e. $T(m_0,t) := \{m_0, \dots, m_0+t-1\}$.

Lemma 5.6. Let $m_0 := n\sqrt{\log n}$. For $\gamma > 0$, let $T := T(m_0, \gamma n\log n)$ and let $\varepsilon = (\log n)^{-1/10}$. With high probability, there are $o(n\log n)$ values of $m \in T$ such that $|D_m| > \left(\frac12 + \varepsilon\right)n$.

The lemma will be useful in proving Theorem 5.4, as it gives us an upper bound on the number of acceptable pairs for $G_m$. We conclude Section 5.5 by showing that after $4n\log n$ steps, the optimal cover in $G_m$ is unique, which proves Theorem 5.3(ii).

The proof of Theorem 5.4 is given in Section 5.6. As the details are fairly technical, let us discuss roughly how the number of isolated vertices changes after $\nu(G_m) \ge \left(\frac{1}{2} + o(1)\right) n$, ignoring many events which are less likely to happen. Let $I_m$ be the set of isolated vertices in $G_m$ and let $S_m = V \setminus I_m$. Let $C_m$ be any optimal cover and $B_m$ the vertices matched to $C_m$ by some maximum matching. In this simplified evolution, most of the time the process alternates between a state where $|S_m|$ is even and the probability of losing an isolated vertex is $\frac{|I_m|}{n}(1 + o(1))$, and a state where $|S_m|$ is odd and this transition probability is $\frac{2|I_m|}{n}(1 + o(1))$.

Suppose that at some step $m = m_1$ there is a matching covering $S_m$, so $S_m = C_m \cup B_m$. Then any isolated vertex $v$ is in $\left(\frac{1}{2} + o(1)\right) n$ acceptable pairs, precisely the pairs whose other endpoint is in $C_m \cup I_m$. Indeed, any pair between $v$ and $B_m$ will not extend the current maximum matching and is thus not acceptable. Hence
$$P_m\left[|I_{m+1}| = |I_m| - 1\right] = (1 + o(1))\,|I_m| \cdot \frac{n}{2} \cdot \binom{n}{2}^{-1} = \frac{|I_m|}{n}(1 + o(1)).$$
Let $m_2 > m_1$ be the step in which a previously isolated vertex $v$ is added to $S_{m_1}$. By expansion properties of $G_{m_2}$, it is likely that the edge $e_{m_2}$ has created $\left(\frac{1}{2} - o(1)\right) n$ alternating paths between $v$ and $B_{m_2}$. Hence in this step, almost all pairs incident to an isolated vertex are acceptable. This holds until the step $m_3$ in which an isolated vertex $v_3$ is attached and $|S_{m_3}|$ is even. If $v_3$ is attached to $B_{m_3 - 1}$, the maximum matching of $G_{m_3}$ is augmented and we are back at the situation of $m_1$. But even if $v_3$ is attached to $C_{m_3 - 1}$, and thus $C_{m_3} = C_{m_3 - 1}$, this creates quadratically many pairs lying in $B_{m_3}$ which would extend the maximum matching in $G_{m_3}$. Therefore, after constantly many steps, there will be a matching covering $S_{m_3}$, and we will be back in the situation of $m_1$.

It is clear that there are many difficulties in making this heuristic rigorous. To start with, vertices newly attached to an optimal cover may not be contained in many alternating paths, and hence we cannot claim that there is a matching (almost) covering $S_m$. We will find an exact criterion for such vertices (Subsection 5.6.1) and call them quasi-isolates. This recovers some of the described even-odd oscillation, so we can show that in a constant proportion of steps, we are in a position where isolated vertices are lost at a slower rate (Lemma 5.29). It follows that the number of isolated vertices decays more slowly than in $G(n, m)$, implying that with high probability, $G_{(1/2 + C) n \log n}$ with $C > 0$ does not have a perfect matching (Theorem 5.4). Since our argument does not give a sharp constant $C$, we do not attempt to make marginal improvements. However, we do give a well-grounded heuristic for the value $C = \frac{1}{4}$ in Section 5.7.

Let us recapitulate and introduce some notation.

Definition 5.7. Let $G^{\mathrm{all}}_m$ be the graph whose edge set is $\{e_1, \dots, e_m\}$. Note that $G^{\mathrm{all}}_m$ is distributed as $G(n, m)$. The probability measure conditional on $\{e_1, \dots, e_m\}$ is $P_m$, and the corresponding expectation is $E_m$.

In a slight abuse of notation, we use (Gm) for the probability space as well as the sampled process. The logarithm to base e is denoted by log.

5.2 preliminaries and probabilistic tools

5.2.1 The relationship between G(n, m) and G(n, p)

Let $G(n, p)$ denote the $n$-vertex random graph in which every possible edge is present independently with probability $p$. The following lemma allows us to prove that $G^{\mathrm{all}}_m$ has certain properties, by considering the properties of $G(n, m/N)$. A graph property $\mathcal{P}$ is said to be monotone increasing if $G \in \mathcal{P}$ implies that $G + e \in \mathcal{P}$.

Lemma 5.8 ([82], Lemma 1.3). Let $\mathcal{P}$ be any graph property and let $p = p(n)$ satisfy $n^2 p,\ n(1 - p)p^{-1/2} \to \infty$. If $m = p\binom{n}{2}$ and $n$ is sufficiently large, then
$$P[G(n, m) \in \mathcal{P}] \le 10 m^{1/2}\, P[G(n, p) \in \mathcal{P}].$$
Moreover, if $\mathcal{P}$ is monotone increasing, then $P[G(n, m) \in \mathcal{P}] \le 3\, P[G(n, p) \in \mathcal{P}]$.

As a consequence, we immediately get a bound on the maximum degree in $G^{\mathrm{all}}_{10 n \log n}$.

Claim 5.9. For $m = 10n \log n$, the graph $G^{\mathrm{all}}_m$ has maximum degree at most $200 \log n$ with probability $1 - O(n^{-2})$.

Proof. Let $p := \frac{10 n \log n}{\binom{n}{2}}$. Using the Chernoff bound (the third formulation in Theorem 5.10 stated below), the probability that a single vertex has degree exceeding $200 \log n$ in $G(n, p)$ is at most $2^{-200 \log n}$. Taking the union bound over all $n$ vertices and applying Lemma 5.8 gives the required result.
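The arithmetic behind this proof can be checked numerically. The following is a minimal sketch, not part of the original argument: the function name `log_failure_bound` is hypothetical, the factor 3 is assumed to come from the monotone case of Lemma 5.8, and the computation is done in the log domain because $2^{-200 \log n}$ underflows ordinary floating point for moderate $n$.

```python
import math


def log_failure_bound(n: int) -> float:
    """Natural log of 3 * n * 2^(-200 log n), the union bound sketched above.

    Hypothetical helper for illustration; log is the natural logarithm,
    as in the text.
    """
    log_n = math.log(n)
    # Expected degree in G(n, p): (n - 1) * p = 20 log n for p = 10 n log n / C(n, 2).
    mean_degree = 20 * log_n
    threshold = 200 * log_n
    # The third Chernoff formulation requires threshold > 2e * E[X].
    assert threshold > 2 * math.e * mean_degree
    # log 3: assumed transfer factor from G(n, p) to G(n, m) (monotone case of Lemma 5.8).
    # log n: union bound over the n vertices.
    return math.log(3) + log_n - threshold * math.log(2)
```

Comparing the returned value with $-2 \log n$ shows how much room the benchmark $O(n^{-2})$ leaves.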

5.2.2 Standard Estimates and Probabilistic Tools

Here we collect together the standard probabilistic tools we will use during the proof. The first is a version of the Chernoff Bound taken from [53].

Theorem 5.10 (The Chernoff Bound). Let $X_1, \dots, X_n$ be a sequence of independent $[0, 1]$-valued random variables and let $X = \sum_{i=1}^{n} X_i$. Then, for $0 < \varepsilon < 1$,
$$P\left(X < (1 - \varepsilon)E(X)\right) \le e^{-\frac{\varepsilon^2}{2} E(X)},$$
$$P\left(X > (1 + \varepsilon)E(X)\right) \le e^{-\frac{\varepsilon^2}{3} E(X)}.$$
Moreover, if $t > 2e\,E[X]$, then
$$P[X > t] \le 2^{-t}.$$

We will use a well-known result about the edge distribution in the random graph. As usual, $E_G(U, W)$ denotes the set of edges of a graph $G$ with one endpoint in $U$ and one in $W$, and $E_G(U)$ the set of edges with both endpoints in $U$. We omit the index $G$ when the graph is clear from the context. The number of edges in $G$ is denoted by $e(G)$. This particular form is stated in [113] for $G(n, p)$, but Lemma 5.8 implies that it also holds for $G(n, m)$.

Theorem 5.11. Let $m = m(n) \le 0.9\binom{n}{2}$. There exists a constant $\lambda > 0$ such that, with high probability, in $G(n, m)$ every two disjoint sets $U, W \subset V$ of cardinality $|U| = u$, $|W| = w$ satisfy
$$|E(U, W)| \in \frac{2muw}{n^2} \pm \lambda\sqrt{\frac{uwm}{n}} \quad\text{and}\quad |E(U)| \in \frac{mu^2}{n^2} \pm O\left(u\sqrt{\frac{m}{n}}\right).$$
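As a worked instance of the second estimate, take for illustration $m = \frac{n}{4}\log n$ and $u = \frac{n}{2}$ (values of the kind that appear in the proof of Theorem 5.3(ii)). The following sketch only substitutes these values; it is not an additional claim of the text.

```latex
\[
\frac{m u^2}{n^2} = \frac{n\log n}{4}\cdot\frac{1}{4} = \frac{n}{16}\log n,
\qquad
u\sqrt{\frac{m}{n}} = \frac{n}{2}\sqrt{\frac{\log n}{4}} = \frac{n}{4}\sqrt{\log n},
\]
\[
\text{so}\quad |E(U)| = \frac{n}{16}\log n\,\Bigl(1 \pm O\bigl((\log n)^{-1/2}\bigr)\Bigr).
\]
```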

We need another claim on the edge distribution in G(n, m). Although similar results are available in literature, we need an explicit probability bound.

Lemma 5.12. Let $\beta \le 0.1$ and $m \ge 10\beta^{-2} n$. Let $G := G(n, m)$. With probability $1 - O(n^{-2})$, every set $U$ of cardinality at most $\frac{\beta n}{2e}$ satisfies
$$|E_G(U)| \le \frac{\beta m |U|}{n}.$$

Proof. We say that $G \in D_u$ if some set $U \subset V(G)$ of cardinality $u$ spans more than $\frac{\beta m u}{n}$ edges in $G$, and set $D = \bigcup_{u=1}^{\beta n/(2e)} D_u$. We will show that $P[G(n, p) \in D] = O(n^{-2})$ in $G(n, p)$ with $p = \frac{2m}{n^2}$. As $D$ is an increasing property, Lemma 5.8 implies that the same bound holds in $G(n, m)$.

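We estimate the probability of $D_u$ by taking the union bound over all the vertex sets of cardinality $u$.
$$P[G(n, p) \in D_u] \le \binom{n}{u}\binom{\frac{u(u-1)}{2}}{\frac{1}{2}\beta p n u}\, p^{\frac{1}{2}\beta p n u} \le \left(\frac{ne}{u}\left(\frac{eu}{\beta n}\right)^{\frac{1}{2}\beta p n}\right)^{u}.$$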

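We proceed by splitting the range for $u$. First note that for $u < \beta p n$, $D_u$ is empty, as a set of cardinality $u$ cannot span $\frac{\beta p n u}{2}$ edges. In the remaining two cases, we use the fact that the term in brackets is increasing in $u$ (for the calculation, see Claim 3), as well as the hypothesis that $\beta p n \ge 20\beta^{-1} \ge 200$. If $u \le \sqrt{n}$, then
$$P[G(n, p) \in D_u] \le \left(n^{\frac{1}{2}} \cdot n^{-\frac{1}{3}\cdot\frac{1}{2}\beta p n}\right)^{u} \le \left(n^{\frac{1}{2} - \frac{1}{6}\beta p n}\right)^{u} < n^{-2u}.$$
For $\sqrt{n} < u \le \frac{\beta n}{2e}$, we recall that $\frac{1}{2}\beta p n \ge 10\beta^{-1}$ and deduce
$$P[G(n, p) \in D_u] \le \left(\frac{ne}{u}\left(\frac{eu}{\beta n}\right)^{\frac{1}{2}\beta p n}\right)^{u} \le \left(2e^2\beta^{-1} \cdot 2^{-10\beta^{-1}}\right)^{u} < 2^{-u}.$$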

Combining these estimates, we get
$$P[G(n, p) \in D] \le \sum_{u=\beta p n}^{\sqrt{n}} n^{-u} + \sum_{u=\sqrt{n}}^{\frac{\beta n}{2e}} 2^{-u} = O\left(n^{-\beta p n}\right) + O\left(2^{-\sqrt{n}}\right) \ll n^{-2}.$$
As remarked, it follows that $P[G(n, m) \in D] = O(n^{-2})$.

We move on to the probabilistic preliminaries. The following lemma is a bound for the lower tail of the binomial distribution. If we are looking to control very large deviations, elementary computations give a stronger estimate than the usual statements of Chernoff's bounds.

Lemma 5.13. Let $X \sim \mathrm{Bin}(k, p)$ with $\mu = kp \to \infty$ as $k \to \infty$. Given $\eta > 0$ and a constant $\delta > 0$ satisfying $\delta(2 + \log(1/\delta)) + (\log(\delta\mu))/\mu < \eta$ for sufficiently large $k$, we have $P[X \le \delta\mu] \le e^{-(1 - \eta)\mu}$. In particular, such a constant $\delta > 0$ can be chosen for an arbitrary $\eta > 0$.

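Proof. By definition of $\mathrm{Bin}(k, p)$, $P[X \le \delta\mu] \le \sum_{s=0}^{\delta\mu} \binom{k}{s} p^s (1 - p)^{k - s}$. Standard inequalities give
$$P[X \le \delta\mu] \le \sum_{s=0}^{\delta\mu} \left(\frac{ekp}{s}\right)^{s} e^{-(k - s)p} = \sum_{s=0}^{\delta\mu} \left(\frac{e^{1+p}\mu}{s}\right)^{s} e^{-kp}.$$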

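We can bound each summand by the final term, $s = \delta\mu$, seeing as the summand is increasing in $s$. It follows that
$$P[X \le \delta\mu] \le \delta\mu\left(e^2\delta^{-1}\right)^{\delta\mu} e^{-\mu}.$$
The hypothesis on $\delta$ is equivalent to $\log(\delta\mu) + \delta\mu(2 + \log(1/\delta)) \le \eta\mu$, and hence $P[X \le \delta\mu] \le e^{\eta\mu}e^{-\mu}$.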
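The lemma can be sanity-checked against an exact binomial tail for concrete parameters. The sketch below is an illustration under assumptions: the helper names `hypothesis_holds` and `lower_tail` are hypothetical, and a single finite choice of $k$ and $p$ cannot verify an asymptotic statement, it only shows the inequality in one instance.

```python
import math


def hypothesis_holds(delta: float, mu: float, eta: float) -> bool:
    """Check delta * (2 + log(1/delta)) + log(delta * mu) / mu < eta."""
    return delta * (2 + math.log(1 / delta)) + math.log(delta * mu) / mu < eta


def lower_tail(k: int, p: float, x: float) -> float:
    """Exact P[X <= x] for X ~ Bin(k, p), summing the probability mass function."""
    return sum(
        math.comb(k, s) * p**s * (1 - p) ** (k - s)
        for s in range(math.floor(x) + 1)
    )
```

For example, with $k = 2000$, $p = 0.05$ (so $\mu = 100$), $\delta = 1/60$ and $\eta = 1/6$, one can confirm that `hypothesis_holds(1/60, 100, 1/6)` is true and that `lower_tail(2000, 0.05, 100/60)` is at most `math.exp(-(1 - 1/6) * 100)`.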

5.2.3 Martingale concentration inequalities

Recall that a sequence of random variables $X_0, X_1, \dots$ is called a martingale if for each $i \ge 1$, we have $E[X_i \mid X_0, \dots, X_{i-1}] = X_{i-1}$. We will use two standard martingale concentration results in our proof. The first is Azuma's Inequality. The version we present here was taken from [53].

Theorem 5.14 (Azuma's Inequality). Let $X_0, X_1, \dots$ be a martingale and suppose that there exists a constant $c \ge 0$ such that $|X_i - X_{i-1}| \le c$ for each $i \ge 1$. Then,
$$P\left[|X_m - X_0| > \lambda\right] \le 2\exp\left(-\frac{\lambda^2}{2c^2 m}\right).$$

The second is Freedman’s Inequality. It gives a stronger concentration result than Azuma’s inequality when the average differences Xm+1 − Xm are much smaller. To avoid working with filtrations and the corresponding notation, we state it in our specific context. The general statement can be found for instance in [144].

Theorem 5.15 (Freedman's inequality). Consider a real-valued martingale $\left(\sum_{i=1}^{m} Y_i : m \in \mathbb{N}\right)$, where $Y_i$ depends only on the first $i$ steps of our process. Assume that the sequence is uniformly bounded, that is, $|Y_i| \le 1$ for $i \ge 1$. Define the predictable quadratic variation process of the martingale, $W_m = \sum_{i=1}^{m} E_{i-1}\left[Y_i^2\right]$. Then, for all $\ell > 0$ and $\sigma^2 > 0$,
$$P\left[\exists m : \sum_{i=1}^{m} Y_i \ge \ell \text{ and } W_m \le \sigma^2\right] \le 2\exp\left(-\frac{\ell^2}{2\sigma^2 + 4\ell/3}\right).$$

The following special case for indicator random variables will be useful in our applications.

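Corollary 5.16. Let $X_m$ be an indicator random variable depending only on $G_m$ and $q_m := P_{m-1}[X_m = 1]$. For any $m \in \mathbb{N}$, any $L > 0$ and $n$ sufficiently large,
$$P\left[\left(\sum_{i=1}^{m} (X_i - q_i) \ge Ln\right) \wedge \left(\sum_{i=1}^{m} q_i \le 4Ln\right)\right] \le e^{-Ln/10}.$$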

Proof. Fix $m$, and define $T := \{1, \dots, m\}$. Let $Y_i := X_i - q_i$ for $i \ge 1$. By definition, $E_{i-1}[Y_i] = 0$, so $\left(\sum_{i=1}^{m'} Y_i\right)_{m'}$ is a martingale with $|Y_i| \le 1$. Moreover, $\sum_{i \in T} E_{i-1}\left[Y_i^2\right] = \sum_{i \in T} q_i(1 - q_i)$, so our event implies the event stated in Freedman's inequality with $\sigma^2 = 4Ln$. Applying Freedman's inequality gives precisely
$$P\left[\left(\sum_{i \in T} (X_i - q_i) \ge Ln\right) \wedge \left(\sum_{i \in T} q_i \le 4Ln\right)\right] \le 2\exp\left(-\frac{Ln}{8 + 4/3}\right) \le e^{-Ln/10}.$$
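For readers who wish to trace the constants, the substitution $\ell = Ln$, $\sigma^2 = 4Ln$ in Freedman's inequality can be written out as follows. This is a sketch of the arithmetic only; the explicit threshold $Ln \ge 140\log 2$ is derived here and is not stated in the original.

```latex
\[
\frac{\ell^2}{2\sigma^2 + 4\ell/3}
= \frac{L^2 n^2}{8Ln + \tfrac{4}{3}Ln}
= \frac{Ln}{8 + \tfrac{4}{3}}
= \frac{3Ln}{28},
\]
\[
2\exp\Bigl(-\frac{3Ln}{28}\Bigr) \le \exp\Bigl(-\frac{Ln}{10}\Bigr)
\iff \log 2 \le Ln\Bigl(\frac{3}{28} - \frac{1}{10}\Bigr) = \frac{Ln}{140}
\iff Ln \ge 140\log 2 .
\]
```

So the final inequality holds once $n$ is sufficiently large relative to $1/L$.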

5.3 forming a large matching

Our goal in this section is to prove Lemma 5.5. It can be viewed as an analogue of Frieze's result on $(G(n, m))$: he showed that for $m = \Theta(n)$, $G(n, m)$ contains a matching covering almost all non-isolated vertices with high probability. We start by proving Proposition 5.2, a deterministic property of the König process. It will easily follow that after $O(n)$ steps we have a matching of linear size.

Proof of Proposition 5.2. By induction on $m$. The statement is trivial for $m = 0$. Suppose that it holds for the first $m - 1$ steps and consider the edge $e_m$. Let $M$ be a matching incident to $\{e_1, e_2, \dots, e_{m-1}\}$. If $e_m$ is disjoint from $M$, we can take $M + e_m$ to be the required optimal matching. If $e_m$ is incident to $M$ and $\nu(G_m) = \nu(G_{m-1})$, then $M$ still satisfies the condition. Finally, if $e_m$ is incident to $M$ and $\nu(G_m) = \nu(G_{m-1}) + 1$, let $M'$ be a matching of size $|M| + 1$ in $G_m$. The union of $M$ and $M'$ consists of cycles and paths alternating between $M$ and $M'$, where one path $P$ containing $e_m$ has odd length, contains one more edge of $M'$ than of $M$, and has two endpoints outside $M$; all the remaining paths have even length. Then the matching $M_1$, created by replacing the edges of $M$ in $P$ by those of $M'$, is a maximum matching in $G_m$ covering all vertices of $M$. Hence $M_1$ is incident to $\{e_1, \dots, e_m\}$, as required.

Corollary 5.17. Let $\varepsilon > 0$ and $m_0 := \varepsilon^{-2} n$. With probability $1 - O(n^{-2})$,

(i) $\nu(G_{m_0}) \ge \frac{1 - \varepsilon}{2} \cdot n$, and

(ii) whenever $m_0 \le m \le 2n \log n$, each vertex is contained in at least $\left(\frac{1}{2} - \varepsilon\right) n$ pairs which are acceptable for $G_m$.

Proof. For every $m$, $G^{\mathrm{all}}_m$ has an independent set of order $n - 2\nu(G_m)$, namely the complement of $V(M)$, where $M$ is the maximum matching granted by Proposition 5.2. Denoting the order of a maximum independent set in $G$ by $\alpha(G)$, standard first-moment computations (see, e.g., [103, Section 7]) yield
$$P\left[\alpha\left(G^{\mathrm{all}}_{m_0}\right) \ge \varepsilon n\right] \le e^{-\varepsilon n} = O\left(n^{-2}\right).$$
It follows that with probability $1 - O(n^{-2})$, $n - 2\nu(G_{m_0}) < \varepsilon n$, as required for (i).

This tells us that with high probability $\tau(G_m) \ge \left(\frac{1}{2} - \frac{\varepsilon}{2}\right) n$ for $m \ge m_0$. Any pair containing $v$ and a vertex in an optimal vertex cover $C_m$ of $G_m$ is acceptable for $G_m$. By Claim 5.9, with probability $1 - O(n^{-2})$, $G^{\mathrm{all}}_m$ has maximum degree $O(\log n)$, which implies that $o(n)$ pairs incident to $v$ have been offered so far. Thus the number of acceptable pairs incident to $v$ is at least $\left(\frac{1}{2} - \varepsilon\right) n$, as required for (ii).

To avoid confusion, we remark that the bound in (i) is rather crude since the independence number of $G(n, m)$ with $m = \varepsilon^{-2} n$ is actually $\Theta\left(\varepsilon^2 n \log \varepsilon^{-1}\right)$. The probability 'benchmark' $O(n^{-2})$ across this section is also arbitrarily chosen: all the probability bounds are significantly stronger. Even though we need a stronger bound on $\nu(G_m)$, Corollary 5.17 is a very useful tool, providing a lower bound on the probability that $e_{m+1}$ is acceptable for $G_m$.

As usual, we let $\delta(G)$ denote the minimum degree of $G$. The neighbourhood of a vertex set $S$ in a graph $G$, excluding $S$, is denoted by $N_G(S)$. We may omit the subscript when it is clear which graph plays the role of $G$. The following facts about the edge distribution of $G_m$ will be used for our expansion arguments.

Lemma 5.18. Let $\beta := 10^{-3}$. There exists $\gamma > 0$ such that for all $\gamma n \le m \le 2n \log n$, with probability at least $1 - O(n^{-2})$, $G_m$ has the following properties.

(i) $G_m$ has a subgraph $H$ such that $|V(H)| \ge n\left(1 - e^{-m/(3n)}\right)$ and $\delta(H) \ge \frac{10\beta m}{n}$.

(ii) For any set $S \subseteq V(H)$ with $|S| \le \frac{\beta n}{16}$, $|N_H(S) \cup S| \ge 2|S|$.

Proof. First consider (i). We will use the following claim.

Claim 5.19. Let $\alpha := e^{-m/(3n)}$ and let $A$ be the event that there exists a set $S$ of order $\alpha n$ such that $|E_{G_m}(S, V \setminus S)| \le 10\alpha\beta m$. Then $P[A] = O(n^{-2})$.

Proof. Let $B$ be the event that statements (i) and (ii) of Corollary 5.17 occur and $G^{\mathrm{all}}_m$ has maximum degree at most $200 \log n$. By Claim 5.9 and Corollary 5.17, we can pick $\gamma$ large enough to ensure that $B$ occurs with probability at least $1 - O(n^{-2})$. We will show that

$$P[A \mid B] \le n^{-2}, \tag{5.1}$$
from which the claim will follow. Let $S$ be a set of order $\alpha n$. As we condition on $B$ occurring, we can choose $\gamma/10$ to be sufficiently large such that for all $m/10 \le m' \le 2n \log n$, each vertex in $G_{m'}$ is contained in at least $\frac{9}{20} n$ pairs acceptable for $G_{m'}$. Also, by choosing $\gamma$ to be sufficiently large, we can ensure that $|S| \le \frac{n}{100}$. Combining these two facts with the maximum-degree bound gives that for all $m/10 \le m' \le 2n \log n$, each $v \in S$ is in at least $\frac{2n}{5}$ acceptable pairs $(v, w)$ for $G_{m'}$ such that $w \notin S$. So the probability that $e_{m'+1}$ is acceptable and has exactly one endpoint in $S$ is at least
$$\frac{\frac{2n}{5}|S|}{\binom{n}{2}} \ge \frac{4\alpha}{5}.$$
Hence
$$P\left[|E_{G_m}(S, V \setminus S)| \le 10\alpha\beta m \mid B\right] \le P\left[\mathrm{Bin}\left(\frac{9m}{10}, \frac{4\alpha}{5}\right) \le 10\alpha\beta m\right] \le e^{-\frac{1}{2}\alpha m}.$$
The last inequality follows from Lemma 5.13 (applied with $\mu = 3\alpha m/5$, $\eta = 1/6$ and $\delta = 1/60$). Taking the union bound over all sets of order $\alpha n$ gives
$$P[A \mid B] \le \binom{n}{\alpha n} \cdot e^{-\frac{1}{2}\alpha m} \le \left(e\alpha^{-1} e^{-\frac{m}{2n}}\right)^{\alpha n} \le e^{-\frac{\alpha m}{10}}.$$
Since $m \le 2n \log n$, a very crude computation gives $\alpha m = e^{-m/(3n)} m \ge e^{-2\log n/3} m \ge n^{1/3}$, so $P[A \mid B] \le e^{-\alpha m/10} \le n^{-2}$, completing the proof of (5.1) and hence the proof of the claim.

Now consider applying the following algorithm to $G_m$. Let $H_0 := G_m$ and $R_0 := \emptyset$. Now for each $i \ge 0$, if there exists some $v \in H_i$ such that $\deg_{H_i}(v) < \frac{10\beta m}{n}$, then define $H_{i+1} := H_i \setminus \{v\}$ and $R_{i+1} := R_i \cup \{v\}$. We terminate the algorithm when no such $v$ exists, and denote the final step by $j$. Claim 5.19 implies that $P[j \ge \alpha n] = O(n^{-2})$. Indeed, if $R_{\alpha n}$ is defined, then $|E_{G_m}(R_{\alpha n}, V \setminus R_{\alpha n})| \le 10\alpha\beta m$ by construction. This completes the proof of (i).

To show (ii), assume that part (i), as well as the conclusion of Lemma 5.12, hold. Let $H$ be the subgraph given by (i), and let $S \subseteq V(H)$ be a set of cardinality $|S| \le \frac{\beta}{16} n$. Let $T = N_H(S) \cup S$ and suppose that $|T| \le 2|S| \le \frac{\beta}{8} n$. Every edge of $H$ incident to $S$ lies inside $T$ and is counted at most twice in the degree sum of $S$, so by Lemma 5.12 and the minimum-degree assumption about $H$, we have
$$\frac{\beta m}{n}|T| \ge |E_{G_m}(T)| \ge \frac{1}{2} \cdot \frac{10\beta m}{n}|S|,$$
and so $|T| \ge 5|S|$, a contradiction. So $|T| > 2|S|$, as required for (ii).
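The vertex-removal procedure used for part (i) is a standard low-degree peeling. A minimal sketch follows, assuming the graph is given as a dictionary of neighbour sets; the function name `peel_low_degree` and this representation are illustrative choices, not taken from the text.

```python
def peel_low_degree(adjacency, threshold):
    """Repeatedly delete a vertex whose current degree is below `threshold`.

    `adjacency` maps each vertex to the set of its neighbours (hypothetical
    input format). Returns (kept, removed): the vertex set of the final
    subgraph H and the list R of deleted vertices in deletion order.
    """
    alive = {v: set(nbrs) for v, nbrs in adjacency.items()}
    removed = []
    # Vertices that currently violate the degree condition.
    pending = [v for v in alive if len(alive[v]) < threshold]
    while pending:
        v = pending.pop()
        if v not in alive:  # already deleted via an earlier entry
            continue
        for w in alive.pop(v):
            alive[w].discard(v)
            if len(alive[w]) < threshold:
                pending.append(w)
        removed.append(v)
    return set(alive), removed
```

On termination every remaining vertex has degree at least `threshold` in the remaining graph, which mirrors the role of the bound $\frac{10\beta m}{n}$ above. For instance, a copy of $K_4$ with a pendant path of length two attached loses exactly the two path vertices when `threshold = 2`.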

We are now ready to present the proof of Lemma 5.5.

Proof of Lemma 5.5. Set $\beta := 10^{-3}$ and $\zeta := 2^{10}\beta^{-2}$. Define $m' = m - \zeta n \ge \frac{9m}{10}$. Let $B$ be the event that statements (i) and (ii) of Lemma 5.18 hold for $m'$ and let $\gamma$ be large enough such that $P(B) \ge 1 - O(n^{-2})$ (possible by Lemma 5.18). We show that
$$P\left[\nu(G_m) < \frac{n}{2}\left(1 - e^{-3m/(10n)}\right) \,\Big|\, B\right] = O\left(n^{-2}\right), \tag{5.2}$$
which implies the lemma. In what follows, we condition on $B$ occurring. Hence there exists a set $L$ of vertices which span a subgraph of minimum degree at least $\frac{10\beta m'}{n}$ in $G_{m'}$, such that $n - |L| \le n e^{-m'/(3n)} \le n e^{-3m/(10n)}$. We may ensure $L$ has an even number of vertices by removing an arbitrary vertex if $|L|$ is odd. Define $H_i := G_i[L]$ for each $i \ge m'$. We use an expansion argument due to Bollobás and Frieze [30] to show that $H_m$ contains a perfect matching.

Claim 5.20 ([82]). Let $H$ be an $n$-vertex graph in which every $S \subseteq V(H)$ with $|S| \le k$ satisfies $|N_H(S)| > |S|$. If $H$ does not have a perfect matching, then there are at least $\binom{k}{2}$ vertex pairs $f$ such that $\nu(H + f) > \nu(H)$.

Proof. See, for example, Frieze and Karoński [82, p. 86–87].

Consider any step $i$ with $m' < i \le m$ and $\nu(G_i) < \frac{n}{2}$. As we condition on $B$, we may apply Claim 5.20 with $k = \frac{\beta}{16} n$ to get that there are at least $\binom{\beta n/16}{2}$ vertex pairs $f$ such that $\nu(G_i + f) > \nu(G_i)$. As $i \le 2n \log n$, only $o(n^2)$ vertex pairs have been offered so far. Therefore the probability that $\nu(G_{i+1}) > \nu(G_i)$ is at least the probability that such an $f$ is offered, which is
$$\frac{\binom{\beta n/16}{2} - i}{\binom{n}{2} - i} \ge \frac{\beta^2}{2^9}. \tag{5.3}$$
Let $Y \sim \mathrm{Bin}\left(\zeta n, \frac{\beta^2}{2^9}\right)$. We have
$$P[H_m \text{ has no perfect matching} \mid B] \le P\left[Y < \frac{n}{2}\right] = e^{-\Omega(n)} \le n^{-2}.$$
This completes the proof of the lemma.

We remark that our proof actually gives something stronger, which we will use to prove Theorem 5.3((i)).

Corollary 5.21. Let $\beta > 0$, $m = \omega(n)$. If $L$ is a vertex subset of even order and $G_{m - \omega(n)}[L]$ has minimum degree at least $\frac{\beta m}{2n}$, then $G_m[L]$ has a perfect matching with high probability.

To conclude the section, we show that $G_m$ has a perfect matching for $m = (1 + \varepsilon)n \log n$ to prove Theorem 5.3(i). It is not difficult to show the claim for $m = (1 + o(1))n \log n$ by controlling distances between 'low-degree' vertices in $G_m$ (see, e.g., [82]), but we chose to include the slightly weaker statement, which is restated here for the benefit of the reader.

Theorem 5.3(i). Let $\varepsilon > 0$ and $m = (1 + \varepsilon)n \log n$. With high probability, $G_m$ has a perfect matching.

Proof. Let $t := \frac{\varepsilon}{4} n \log n$, and let $\delta$ be small enough for Lemma 5.13 to hold with $\eta = \frac{\varepsilon}{20}$. We will first show that with high probability, $G_{m-t}$ has minimum degree at least $\delta \log n$. By Corollary 5.17, if $t \le m' \le 2n \log n$, each vertex is in at least $\left(\frac{1}{2} - \frac{\varepsilon}{40}\right) n$ vertex pairs which are acceptable for $G_{m'}$. Therefore, for all $k \ge 0$, we have
$$P\left[\deg_{G_{m-t}}(v) \le k\right] \le P[X \le k],$$
where $X \sim \mathrm{Bin}\left(m - 2t, \frac{1}{n} - \frac{\varepsilon}{20n}\right)$. By the choice of $\delta$ and applying Lemma 5.13, we have
$$P\left[\deg_{G_{m-t}}(v) < \delta \log n\right] \le \exp\left(-\left(1 - \frac{\varepsilon}{20}\right)(m - 2t)\left(\frac{1}{n} - \frac{\varepsilon}{20n}\right)\right) \le n^{-1 - \frac{\varepsilon}{10}}.$$
Taking the union bound over all $n$ vertices gives that with high probability, $G_{m-t}$ has minimum degree at least $\delta \log n$. As $t = \omega(n)$, we can apply Corollary 5.21 to $G_{m-t}$ with $L = [n]$ and deduce that $G_m$ has a perfect matching.

5.4 the structure of GN

Let $C_N$ be an optimal cover of our final graph. In this section, we bound the number of rejected vertex pairs incident to $C_N$. In fact, it suffices to control the edges of $G^{\mathrm{all}}_m$ rejected from an optimal cover $C_m$ of $G_m$ for $m = 1.1n \log n$. Indeed, as soon as $G_m$ has a perfect matching, which holds with high probability by Theorem 5.3(i), $C_N$ will also be an optimal cover in $G_m$ and any edge incident to it will be accepted. In particular, we do not rely on uniqueness of $C_N$ (Theorem 5.3(ii)).

We start by introducing some concepts that will be used throughout this section. For a time period $T := T(m_1, t)$ and vertex $v \in V$, define the weight $W_T(v)$ of a vertex as

WT(v) := |{m ∈ T : v ∈ Dm}|.

Note that $W_T(v)$ is a function of our random process. For a set $S \subseteq V$, define the average weight of $S$ in $T$ as
$$W_T(S) := \frac{1}{|T|}\sum_{v \in S} W_T(v).$$
The main ingredient in the proof of Theorem 5.3 is the following lemma. The proof uses a martingale trick similar to one that will be used in Lemma 5.25. The main difference is that here we apply Freedman's inequality (Corollary 5.16), whereas there we apply Azuma's inequality. Azuma's inequality considers the worst-case change of a martingale $(X_m)$, while in our case the typical changes are much smaller. Therefore Freedman's inequality gives a stronger bound, which is also necessary for the computations.
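The two weight definitions above translate directly into code. In the following minimal sketch the sets $D_m$ are supplied as a dictionary `cover_unions` keyed by the time step; this input format and all function names are assumptions made for illustration, since computing $D_m$ itself requires the process.

```python
def time_period(m0, t):
    """T(m0, t) = {m0, ..., m0 + t - 1}."""
    return range(m0, m0 + t)


def weight(v, cover_unions, period):
    """W_T(v): number of steps m in T at which v lies in D_m."""
    return sum(1 for m in period if v in cover_unions[m])


def average_weight(vertices, cover_unions, period):
    """W_T(S): sum of W_T(v) over v in S, divided by |T|."""
    total = sum(weight(v, cover_unions, period) for v in vertices)
    return total / len(period)
```

With this normalisation, the sum over $m \in T$ of $|S \cap D_m|$ equals $|T| \cdot W_T(S)$, which is the identity used later in the proof of Claim 5.23.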

Lemma 5.22. For $\gamma > 0$, let $m_1 \ll t := \gamma n \log n$ and let $T := T(m_1, t)$. The following holds with high probability.

(i) No set $S$ of order at least $\frac{n}{3}$ with $W_T(S) \ge \frac{150n^2}{t}$ is independent in $G_{m_1 + t}$.

(ii) Let $a \ge 10$. For any set $U$ of order $\frac{n}{2}$ with $W_T(U) \ge \frac{n}{2} - \frac{an^2}{t}$, the number of edges incident to $U$ which were rejected during $T + 1 := T(m_1 + 1, t)$ is at most $5an$.

Proof of Lemma 5.22. Let $m \ge m_1$ and let $S \subseteq V$ be a set of order at least $\frac{n}{3}$. For each $m$, define $Q'_{m+1}$ to be the set of vertex pairs in $S^{(2)}$ which are acceptable for $G_m$. Let $s$ be the maximum integer such that
$$\sum_{i = m_1 + 1}^{s} \frac{|Q'_i|}{\binom{n}{2} - i + 1} \le 79n. \tag{5.4}$$
For $m_1 + 1 \le j \le s$, define $Q_j := Q'_j$, and for $j > s$ define $Q_j := \emptyset$. The reason we truncate the sequence $(Q'_j)_{j \ge m_1 + 1}$ in this manner is to deal with a technicality in our application of Freedman's inequality. Let $X_m$ be the indicator random variable of the event that $e_m \in Q_m$. Moreover, define

$$q_{m+1} := P_m[X_{m+1} = 1] = \frac{|Q_{m+1}|}{\binom{n}{2} - m},$$
so that by definition we have
$$\sum_{m \in T} q_{m+1} \le 80n.$$
Given this, by applying Corollary 5.16 with $L = 20$, we see that
$$P\left[\sum_{m \in T} (X_{m+1} - q_{m+1}) \ge 20n\right] \le e^{-2n}. \tag{5.5}$$

Let $A$ be the event that $W_T(S) \ge \frac{150n^2}{t}$. Let $B$ be the event that $X_{m+1} = 0$ for all $m \in T$. We will show that $P[A \wedge B] \le e^{-2n}$. As the event that $S$ is independent is contained in the event $B$, by taking the union bound over all choices of $S$ this suffices to prove (i). To use the condition on $W_T(S)$, we will need a simple relation between $q_m$ and $W_T(S)$.

Claim 5.23. If $A$ occurs, then $\sum_{m \in T} q_{m+1} \ge 40n$.

Proof. Any vertex pair not contained in $G^{\mathrm{all}}_m$ but intersecting $S \cap D_m$ is acceptable for $G_m$. Therefore, we have $|Q_{m+1}| \ge \frac{1}{2}|S \cap D_m|(|S| - 1) - m$. Summing over $m$ and using $\sum_{m \in T} |S \cap D_m| = tW_T(S)$, which is a restatement of the definition of $W_T(S)$, we get
$$\sum_{m \in T} q_{m+1} \ge \sum_{m \in T} \frac{2|Q_{m+1}|}{n^2} \ge \frac{tW_T(S)|S|}{n^2} - O\left(\frac{6t^2}{n^2}\right).$$
By the claim assumption and the definition of $t$, this is at least $50n - O(\log^2 n) \ge 40n$, and the claim follows.

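By Claim 5.23, if both $A$ and $B$ occur, then
$$\sum_{m \in T} (q_{m+1} - X_{m+1}) = \sum_{m \in T} q_{m+1} \ge 40n.$$
So by (5.5), applied to the increments $q_{m+1} - X_{m+1}$ (Freedman's inequality applies equally to $-Y_i$), $P[A \wedge B] \le e^{-2n}$ as required. This completes the proof of (i).

The proof of (ii) follows very similarly. Let $U$ be a vertex set of order $\frac{n}{2}$. Now, for $m \ge m_1$, let $R_{m+1}$ be the set of vertex pairs intersecting $U \setminus D_m$ and let $r_{m+1} := \frac{|R_{m+1}|}{\binom{n}{2} - m}$. If $W_T(U) \ge \frac{n}{2} - \frac{an^2}{t}$, then
$$\sum_{m \in T} r_{m+1} \le \sum_{m \in T} \frac{4|R_{m+1}|}{n^2} \le \sum_{m \in T} \frac{4n\left(|U| - |U \cap D_m|\right)}{n^2} = \frac{4t}{n}\left(\frac{n}{2} - W_T(U)\right) \le 4an.$$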

Let $Z_m$ be the indicator random variable of the event $e_m \in R_m$. Analogously to part (i), applying Corollary 5.16 to $r_m$ with the constant $L = a$ gives
$$P\left[\left(\sum_{m \in T} Z_{m+1} \ge 5an\right) \wedge \left(W_T(U) \ge \frac{n}{2} - \frac{an^2}{t}\right)\right] \le e^{-an/10}.$$
Taking the union bound over all possible $U$ and recalling the hypothesis $a \ge 10$, we get that (ii) holds with probability at least $1 - 2^n e^{-n} = 1 - o(1)$.

We now apply Lemma 5.22 to control rejected edges incident to an optimal cover. We showed that it suffices to consider the process up to m = 1.1n log n.

Lemma 5.24. Let $m := 1.1n \log n$. With high probability, if $C$ is any optimal cover in $G_m$, then $O(n)$ edges of $G^{\mathrm{all}}_m$ incident to $C$ have been rejected.

Proof. Set $t = m - \gamma_0 n$ and $T := T(\gamma_0 n, t)$, where $\gamma_0$ is a large constant chosen so that the conclusion of Lemma 5.5 holds. Namely, with probability $1 - O\left(\frac{\log n}{n}\right)$,
$$\nu(G_m) \ge \frac{n}{2}\left(1 - e^{-3m/(10n)}\right) \quad\text{for } m \in T. \tag{5.6}$$
We have $|D_m| \ge \nu(G_m)$, so (5.6) implies
$$W_T(V) \ge \frac{n}{2} - \frac{n}{2t}\sum_{m \in T} e^{-3m/(10n)} \ge \frac{n}{2} - \frac{n e^{-3\gamma_0 n/(10n)}}{2t\left(1 - e^{-3/(10n)}\right)} \ge \frac{n}{2} - \frac{20n^2}{t}. \tag{5.7}$$
As $V \setminus C$ is an independent set in $G_m$, Lemma 5.22(i) implies that with high probability $W_T(V \setminus C) \le \frac{150n}{\log n}$. Combining this with (5.7) gives that $W_T(C) \ge \frac{n}{2} - \frac{200n}{\log n}$ with high probability. Now applying Lemma 5.22(ii) to $C$ gives that, with high probability, the number of edges incident to $C$ rejected during $T$ is $O(n)$. Clearly at most $\gamma_0 n$ vertex pairs are rejected before time $\gamma_0 n$. This completes the proof of the lemma.

5.5 rigidity and uniqueness of an optimal cover

Our next aim is to show that, in almost all time steps, the union of all optimal covers $D_m$ contains only $\frac{n}{2}(1 + o(1))$ vertices. Note that it is not clear how to do this using structural arguments for $G_m$ along the lines of the proof of Theorem 5.3(ii), since a priori it might be possible to modify a specific vertex cover $C_m$ by very small vertex sets. However, we can give a very simple proof relying on the statistics of our process.

We start with an elementary observation. For $m \ge 0$, conditioned on $(e_i)_{i \le m}$, let $p_{m+1}$ be the probability that $e_{m+1}$ is acceptable for $G_m$. Note that, for $0 \le m \le 2n \log n$, if $G_m$ contains a matching of size at least $n\left(\frac{1}{2} - r(n)\right)$, then
$$p_{m+1} \ge \frac{\binom{n}{2} - \binom{\left(\frac{1}{2} + r\right) n}{2} - m}{\binom{n}{2} - m} \ge \frac{n^2 - n - \frac{1}{4}(1 + 2r)^2 n^2 - 4n \log n}{n^2} \tag{5.8}$$
$$\ge \frac{3}{4} - 5r - \frac{5 \log n}{n}. \tag{5.9}$$
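The step from the ratio of binomial coefficients to the explicit lower bound on $p_{m+1}$ can be spelled out as follows. This is a sketch under the assumptions $0 \le r \le 1$, $m \le 2n\log n$ and $n \ge 3$; the intermediate estimates are elementary and are supplied here for illustration only.

```latex
\[
\binom{n}{2} = \frac{n^2 - n}{2}, \qquad
\binom{(\frac12 + r)n}{2} \le \frac{(1 + 2r)^2 n^2}{8}, \qquad
\binom{n}{2} - m \le \frac{n^2}{2},
\]
\[
p_{m+1} \;\ge\; \frac{\frac{n^2 - n}{2} - \frac{(1+2r)^2 n^2}{8} - 2n\log n}{\frac{n^2}{2}}
= 1 - \frac{1}{n} - \frac{(1 + 2r)^2}{4} - \frac{4\log n}{n}
\]
\[
= \frac{3}{4} - r - r^2 - \frac{1}{n} - \frac{4\log n}{n}
\;\ge\; \frac{3}{4} - 5r - \frac{5\log n}{n}.
\]
```

The last inequality uses $r^2 \le r$ and $\frac{1}{n} \le \frac{\log n}{n}$.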

A key ingredient is the following lower bound on the number of edges accepted into our graph during a certain time period.

Lemma 5.25. For $\gamma > 0$, let $T := T(m_0, \gamma n \log n)$. Let $G$ be the graph consisting of all edges accepted into $(G_m)$ during the period $T$. With high probability,
$$|E(G)| \ge \sum_{m \in T} p_m - n\sqrt{\log n}. \tag{5.10}$$

Proof. For $m \ge 0$, define $X_m$ to be the indicator random variable of the event that $e_m$ is accepted and $Y_m := X_m - p_m$. By definition of $p_m$, we have $E_{m-1}[Y_m] = 0$, so $\left(\sum_{i = m_0}^{m'} Y_i\right)_{m'}$ is a martingale. Set $Y := \sum_{m \in T} Y_m$. Moreover, $|Y_m| \le 1$ for each $m$, so we can apply Azuma's inequality (Theorem 5.14) with $\lambda = n\sqrt{\log n}$. We get
$$P\left[Y < -n\sqrt{\log n}\right] \le \exp\left(-\frac{n^2 \log n}{2\gamma n \log n}\right).$$
It follows that, with probability $1 - e^{-\Omega(n)}$,
$$|E(G)| = \sum_{m \in T} X_m \ge \sum_{m \in T} p_m - n\sqrt{\log n}.$$
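The exponent in this application of Azuma's inequality simplifies as shown below. The sketch assumes increments bounded by $c = 1$ and a martingale of $|T| = \gamma n \log n$ steps, with $\lambda = n\sqrt{\log n}$.

```latex
\[
\frac{\lambda^2}{2 c^2 |T|}
= \frac{\bigl(n\sqrt{\log n}\bigr)^2}{2\gamma n\log n}
= \frac{n^2\log n}{2\gamma n\log n}
= \frac{n}{2\gamma},
\qquad\text{so}\qquad
P\bigl[\,Y < -n\sqrt{\log n}\,\bigr] \le \exp\Bigl(-\frac{n}{2\gamma}\Bigr) = e^{-\Omega(n)}
\]
for any fixed $\gamma > 0$.
```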

Call a time step $m$ $\varepsilon$-flexible if $|D_m| \ge \left(\frac{1}{2} + \varepsilon\right) n$. We now prove Lemma 5.6.

Proof of Lemma 5.6. Recall that $m_0 = n\sqrt{\log n}$, $T = T(m_0, \gamma n \log n)$ and $\varepsilon = (\log n)^{-1/10}$. For ease of notation, let $t := |T| = \gamma n \log n$. Let us assume, in order to obtain a contradiction, that the number of $\varepsilon$-flexible steps is greater than $\varepsilon t$. Let $G$ be the graph consisting of edges accepted during the interval $T$, and $H := G^{\mathrm{all}}_{m_0 + t}$. Note that $H$ is distributed as $G_{n, m_0 + t}$. By Lemma 5.5, we know that with high probability $G_{m_0}$ contains a matching of size $\frac{n}{2}(1 - r)$, where $r := r(n) = e^{-3m_0/(10n)} = o(\varepsilon)$. As this property is increasing, $G_m$ with $m \ge m_0$ also contains a matching of at least this size. Thus, using (5.8), with high probability for all $m \ge m_0$, we have $p_m \ge \frac{3}{4} - 6r$. Moreover, if the step $m$ is flexible, we have a stronger bound $p_m \ge \frac{3}{4} + \frac{\varepsilon}{2}$. By applying Lemma 5.25 and the above analysis, with high probability we have
$$|E(G)| + n\sqrt{\log n} \ge \sum_{m \in T} p_m \ge \frac{3t}{4} + \frac{\varepsilon}{2} \cdot \varepsilon t - 6r(1 - \varepsilon)t \ge \left(\frac{3}{4} + \frac{\varepsilon^2}{4}\right) t. \tag{5.11}$$

Let $m_1 := m_0 + t$. Let $C$ be an optimal vertex cover in $G_{m_1}$. As $V \setminus C$ is an independent set, by applying Theorem 5.11 we obtain
$$|E(G_{m_1})| \le |E_H(C)| + |E_H(V \setminus C, C)| \le \left(\frac{3}{4} + O\left((\log n)^{-1/2}\right)\right) m_1 = \left(\frac{3}{4} + O\left((\log n)^{-1/2}\right)\right) t.$$
As $G_{m_1} \supseteq G$, this contradicts (5.11). This completes the proof of Lemma 5.6.

We finish the section by proving Theorem 5.3((ii)), which will be restated here to aid the reader.

Theorem 5.3(ii). With high probability $G_{4n \log n}$ has a unique optimal vertex cover.

Proof. Set $m_0 := \frac{5}{4} n \log n$, $m_1 := \frac{3}{2} n \log n$ and $m_2 := 4n \log n$. By Theorem 5.3(i), with high probability $G_{m_0}$ has a perfect matching and hence an optimal cover of cardinality $n/2$. For each $i \ge 0$, let $\mathcal{C}_i$ be the set of optimal covers of $G_i$. Observe that for all $i \ge m_0$, we have $\mathcal{C}_{i+1} \subseteq \mathcal{C}_i$, as adding edges can only eliminate optimal covers.

Let $E_0 := \{e_1, \dots, e_{m_0}\}$ and $E_1 := \{e_{m_0 + 1}, \dots, e_{m_1}\}$. Note that $G' := G^{\mathrm{all}}_{m_1} \setminus E_0 \sim G_{n, m'}$, where $m' = \frac{n}{4} \log n$. So by Theorem 5.11, with high probability, in $G_{n, m'}$ every set of cardinality $n/2$ contains at least $\left(1 - O((\log n)^{-1/2})\right)\frac{n}{16} \log n$ edges. However, $G_{m_1}$ contains an independent set of cardinality at least $n/2$. So, with high probability, at least $\left(1 - O((\log n)^{-1/2})\right)\frac{n}{16} \log n$ pairs of $E_1$ are rejected. Let $U$ be the set of vertices incident to these rejected pairs.

We will next show that $|U| \ge \left(1 - (\log n)^{-1/4}\right)\frac{n}{2}$. Suppose, for a contradiction, that $|U| < \left(1 - (\log n)^{-1/4}\right)\frac{n}{2}$. Then applying Theorem 5.11 shows that in $G'$, with high probability we have
$$|E(U)| \le \frac{1}{16} n \log n - \Omega\left(n (\log n)^{3/4}\right) < \left(1 - O((\log n)^{-1/2})\right)\frac{1}{16} n \log n,$$
a contradiction for $n$ sufficiently large.

Now observe that if $(u, v) \in E_1$ is rejected at step $i$, then there exists no $C \in \mathcal{C}_i$ such that $u \in C$ or $v \in C$. This together with our observation from the first paragraph shows that, in fact, every cover in $\mathcal{C}_{m_1}$ contains neither $u$ nor $v$.

So in particular, no vertex of $U$ is contained in a cover in $\mathcal{C}_{m_1}$ (or $\mathcal{C}_{m_2}$). Now let $E_2 := \{e_{m_1 + 1}, \dots, e_{m_2}\}$ and consider $e := (u, v) \in E_2$ such that $u \in U$ and $v \notin U$. As in the previous paragraph, if $e$ is rejected then no cover in $\mathcal{C}_{m_2}$ contains $u$ or $v$. However, if $e$ is accepted, since (by the previous paragraph) $u$ is in no cover of $\mathcal{C}_{m_2}$, then every cover of $\mathcal{C}_{m_2}$ contains $v$. So for each $v \in V \setminus U$, let $\mathcal{E}_v$ be the event that $E_2$ contains a pair $(u, v)$ for some $u \in U$. If $\mathcal{E}_v$ occurs, then we know that either $v$ is contained in every cover or in no cover of $\mathcal{C}_{m_2}$. So if $A := \bigcap_{v \in V \setminus U} \mathcal{E}_v$ occurs, then $\mathcal{C}_{m_2}$ contains a unique optimal cover.

It remains to show that $A$ occurs with high probability. For a particular $v \in V$, say that $e_i$ is good for $v$ if $e_i = (u, v)$ for some $u \in U$. Let $d_i(v)$ be the degree of $v$ in $G^{\mathrm{all}}_i$. For $i \le m_2$, the probability that $e_i$ is good for $v$ is at least
$$\frac{|U| - d_i(v)}{\binom{n}{2} - i + 1} \ge \frac{1 - o(1)}{n},$$
as $|U| \ge \left(1 - (\log n)^{-1/4}\right) n/2$ and, by Claim 5.9, we have $d_i(v) = O(\log n)$, as $G^{\mathrm{all}}_i \subseteq G^{\mathrm{all}}_{10 n \log n}$. So the number of pairs in $E_2$ that are good for $v$ is at least $X$, where $X \sim \mathrm{Bin}\left(\frac{5}{2} n \log n, \frac{1 - o(1)}{n}\right)$. We have
$$P(X = 0) = \left(1 - \frac{1 - o(1)}{n}\right)^{\frac{5}{2} n \log n} \le e^{-2 \log n}.$$
Hence the probability that $\mathcal{E}_v$ does not occur is at most $\frac{1}{n^2}$. Applying the union bound over the vertices in $V \setminus U$ gives that $A$ occurs with high probability, as required.
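The final estimate rests on the elementary inequality $1 - x \le e^{-x}$. A sketch of the computation, with $X \sim \mathrm{Bin}\left(\frac{5}{2} n \log n, \frac{1 - o(1)}{n}\right)$ as above and the union bound taken over at most $n$ vertices:

```latex
\[
P(X = 0) = \Bigl(1 - \frac{1 - o(1)}{n}\Bigr)^{\frac{5}{2} n\log n}
\le \exp\Bigl(-\bigl(1 - o(1)\bigr)\tfrac{5}{2}\log n\Bigr)
\le \exp(-2\log n) = \frac{1}{n^2},
\]
\[
P[\text{some } v \in V\setminus U \text{ has no good pair in } E_2]
\le n\cdot\frac{1}{n^2} = \frac{1}{n} = o(1).
\]
```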

5.6 delayed perfect matching threshold

The focus of this section is to prove Theorem 5.4(iii), which says that for $m = \left(\frac{1}{2} + \frac{1}{65}\right) n \log n$, $G_m$ still does not have a perfect matching. Throughout this section, we set $m_2 := \frac{3}{8} n \log n$ and $m_3 := 2n \log n$. We assume that
$$\nu(G_{m_2}) \ge \frac{n}{2} - n e^{-\frac{3m_2}{10n}} \ge \frac{n}{2}\left(1 - n^{-\frac{1}{20}}\right) \tag{5.12}$$
as this event occurs with high probability by Lemma 5.5.

Our first challenge is to consider vertices $v$ of positive degree which are not in a specific maximum matching $M_m$, and create a dichotomy between those vertices which create many options for extending $M_m$, and those that do not. This requires some information on separation of small-degree vertices in $G_m$, a common feature of results on matchings in random graphs. For $u, v \in V$, let $d_m(u, v)$ be the distance from $u$ to $v$ in $G_m$ and let $d^{\mathrm{all}}_m(u, v)$ be the distance from $u$ to $v$ in $G^{\mathrm{all}}_m$.

Lemma 5.26. There exists δ > 0 such that with high probability the following statement holds. For all $m_2 \le m \le m_3$, there are no three distinct vertices $u_1, u_2, u_3$ such that, for all i, j ∈ [3], we have $\deg_{G_m}(u_i) < \delta \log n$ and $d_m(u_i, u_j) \le 10$.

Proof. Let A be the event that there exist $u_1, u_2, u_3$ such that for all i, j ∈ [3], we have $\deg_{G_{m_2}}(u_i) \le \delta \log n$ and $d_{m_3}^{\mathrm{all}}(u_i, u_j) \le 10$. As for any $m_2 \le m \le m_3$ and any i, j ∈ [3], we have $\deg_{G_m}(u_i) \ge \deg_{G_{m_2}}(u_i)$ and $d_m(u_i, u_j) \ge d_{m_3}(u_i, u_j) \ge d_{m_3}^{\mathrm{all}}(u_i, u_j)$, it suffices to show that $P[A] = o(1)$. Let $U := \{u_1, u_2, u_3\} \subseteq V$ and define

$$d := \sum_{i=1}^{3} \deg_{G_{m_2}}(u_i).$$

If $d_{m_3}^{\mathrm{all}}(u_i, u_j) \le 10$ for all i, j ∈ [3], then there is a connected subgraph of $G_{m_3}^{\mathrm{all}}$ on at most 20 vertices containing U. In particular, this subgraph has a spanning tree. Let $T_F$ be the event that $G_{m_3}^{\mathrm{all}}$ contains a fixed labelled tree F with U ⊆ V(F) and $f := |V(F)| \le 20$.

We turn to an upper bound on $P[d < 3\delta \log n \mid T_F]$. Let $T' := \{m : e_m \in E(F)\}$ and $T := T(\gamma n, m_2 - \gamma n)$. Let $Y_m$ be the indicator random variable of the event that $e_m$ has one endpoint in U and the other in V \ F. Using Corollary 5.17 (ii), for γ sufficiently large and any $\gamma n + 1 \le m \le m_2$ such that $m \notin T'$ we have

$$P_{m-1}[Y_m = 1] \ge 3 \cdot \frac{99}{100 n}.$$

Thus, letting $X \sim \mathrm{Bin}\left(m_2(1 - o(1)),\, 3 \cdot \frac{99}{100 n}\right)$, we have

$$P\left[\sum_{m \in T \setminus T'} Y_m < k\right] \le P[X < k].$$

Noting that $E[X] = 3 \cdot \frac{99}{100 n} \cdot m_2 (1 - o(1)) > \frac{102}{100} \cdot \log n$ and applying Lemma 5.13 with δ sufficiently small, we get

$$P[d < 3\delta \log n \mid T_F] \le P\left[\sum_{m \le m_2} Y_m \le \delta \cdot 3 \log n \,\Big|\, T_F\right] \le e^{-\left(1 + \frac{1}{100}\right) \log n}.$$

For a fixed tree F we now show that $P[T_F] = O\left(\left(\frac{\log n}{n}\right)^{f-1}\right)$. To see this, note that the probability that the random graph G(n, p) with $p = m_3 \binom{n}{2}^{-1}$ contains F is precisely $p^{f-1}$. Using Lemma 5.8, the fact that $T_F$ is a monotone increasing event and a crude estimate $p < \frac{5 \log n}{n}$, we get

$$P[T_F] \le 3 \left(\frac{5 \log n}{n}\right)^{f-1} = O\left(\left(\frac{\log n}{n}\right)^{f-1}\right).$$

Putting this together with the previous bound, we get

$$P[(d < 3\delta \log n) \wedge T_F] = n^{-1 - \frac{1}{100}} \cdot O\left(\left(\frac{\log n}{n}\right)^{f-1}\right) = o\left(n^{-f}\right).$$

As there are $O(n^f)$ choices for F and U ⊆ V(F), we can take the union bound over U and F to get $P[A] = o(1)$.

For the main part of our argument, it is helpful to introduce some more definitions. We say that a vertex is an isolate or is isolated in $G_m$ if it has no neighbours in $G_m$. If there exist two vertices $v_1, v_2$ of degree one in $G_m$ that share a neighbour, then we arbitrarily choose one to be a quasi-isolate and the other to be its partner. The partner is not a quasi-isolate. Let $J_m$ denote the set of vertices that are either isolates or quasi-isolates in $G_m$. Define $M_m$ to be a maximum matching in $G_m$ chosen subject to the following:

(M1) V(Mm) := Cm ∪ Bm, where Cm is an optimal vertex cover in Gm.

(M2) No vertex $v \in C_m$ matched to $u \in B_m$ has a neighbour $w \notin M_m$ such that d(w) > d(u).

(M3) $M_m$ contains no quasi-isolate.

Achieving (M2) is possible by swapping u for w if this is not the case (this will give another matching of the same size). Similarly, by swapping a quasi-isolate with its partner, (M3) is achievable. Note that there may be many choices for such an $M_m$. By definition of $M_m$, the set $J_m$ is disjoint from $M_m$. Define the set of helpers to be $H_m := V \setminus (M_m \cup J_m)$ and observe that the number of helpers $|H_m|$ is independent of the particular choice of $M_m$. Intuitively, the helpers are vertices attached to $M_m$ in a way that enables $M_m$ to be extended with the addition of $e_{m+1}$.

Our aim is to show that in a constant proportion of steps, $G_m$ contains no helpers. In other words, there is a matching covering all vertices apart from isolates and quasi-isolates. At such steps m, the rate of losing isolated vertices is slower than in $G_m^{\mathrm{all}}$, so Theorem 5.4 will follow. The argument consists of two main lemmas.

Lemma 5.27. The number of steps $m \ge m_2$ at which $|H_m| \ge 2$ is $O(n^{-1/20})$.

Lemma 5.28. There exists C > 0 such that with high probability, $|J_m| \ne 0$ and is even for at least Cn log n steps m after $m_2$.

Let us first show how the two lemmas imply the desired result.

Lemma 5.29. For at least (C + o(1)) n log n steps $m \ge m_2$, we have $H_m = \emptyset$ and $J_m \ne \emptyset$.

Proof. Since our number of vertices n is even, $|J_m| + |H_m| = n - |V(M_m)|$ is always even. In other words, $|J_m|$ and $|H_m|$ have the same parity, so Lemma 5.28 implies that for Cn log n steps $|H_m|$ is even and $J_m \ne \emptyset$. However, Lemma 5.27 shows that $|H_m| \ge 2$ for o(n log n) steps. The result follows.

5.6.1 Extending maximum matchings in Gm

Now we move on to Lemma 5.27. It is an analogue of the hitting time result of Bollobás and Thomason [31] for our setting. The aim is to show that, for $m_2 \le m \le m_3$, if we have at least two helpers in $G_m$, then there are many choices of $e_{m+1}$ whose addition would increase the size of a maximum matching (see Lemma 5.31 for the precise statement). Before doing this, we need an expansion property stronger than Lemma 5.18 (ii). For the rest of the section, fix δ to be the constant from Lemma 5.26.

Lemma 5.30. With high probability the following statement holds. There exists ζ > 0 such that for all $m \ge m_2$ and for every set S ⊆ V of cardinality at most $\frac{n}{4 \log n}$ where every vertex in S has at least δ log n neighbours outside of S, we have

$$|N_{G_m}(S)| \ge \zeta \log n \, |S|.$$

Proof. By Theorem 5.11, with high probability every set S with $|S| \le \frac{n}{4 \log n}$ satisfies

$$E_{G_m}(S, N(S)) \le E_{G_{m_3}}(S, N(S)) \le \frac{2m|S||N(S)|}{n^2} + \lambda \sqrt{\frac{m|S||N(S)|}{n}}.$$

If for some S, we have $|N(S)| < \zeta \log n \, |S|$, then this is at most

$$|S| \log n \left(\zeta + \lambda \sqrt{2\zeta}\right) < \frac{\delta}{2} |S| \log n,$$

when ζ is chosen to be sufficiently small. But by hypothesis, $E_{G_m}(S, N(S)) \ge \delta |S| \log n$, a contradiction.

For the rest of the section we will assume that our process throughout steps $m_2 \le m \le m_3$ satisfies the properties granted by Lemma 5.26 and Lemma 5.30, without always referring to the probabilistic statements explicitly. The following properties are easy consequences of our definitions and Lemma 5.26.

(P1) For any v, $N^5[v]$ contains at most three small vertices.

(P2) If a helper has degree one, then its neighbour in $C_m$ is matched in $M_m$ to a vertex of degree at least two.

We will now introduce some more definitions. Say that a path $P := u_1 \dots u_k \in G_m$ is alternating if there exists a matching M in $G_m$ such that the edges of P alternate between being in and out of M. Say that P augments M if $u_1, u_k \notin M$. Observe that if P augments M, then $M' := (M \setminus P) \cup (P \setminus M)$ is a larger matching in $G_m$ than M. Our next lemma will show that if $|H_m| \ge 2$, then there are $\Omega(n^2)$ vertex pairs that have not yet been offered that will create a path that augments a maximum matching in $G_m$.
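The augmentation step defined above, replacing M by the symmetric difference of M and the edge set of P, can be illustrated on a small example. This is a minimal sketch; the helper names `augment` and `is_matching` are hypothetical, and the path is assumed to be alternating with both endpoints unmatched.

```python
def augment(matching, path):
    """Return (M \\ P) ∪ (P \\ M) for a matching M and the edges of a path P.

    `matching` is an iterable of 2-element edges, `path` a list of vertices.
    """
    m = {frozenset(e) for e in matching}
    p = {frozenset(e) for e in zip(path, path[1:])}
    return m ^ p  # symmetric difference


def is_matching(edges):
    """Check that no vertex is covered by two edges."""
    seen = set()
    for e in edges:
        if seen & e:
            return False
        seen |= e
    return True


if __name__ == "__main__":
    # Path 1-2-3-4 alternates with respect to M = {23}; endpoints 1 and 4 are unmatched.
    print(augment([(2, 3)], [1, 2, 3, 4]))
```

On the path 1-2-3-4 with M = {23}, the result is the matching {12, 34}, which is one edge larger.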

Lemma 5.31. Let $m_2 \le m < m_3$. If $|H_m| \ge 2$, then for some η > 0, the probability that $\nu(G_{m+1}) > \nu(G_m)$ is at least η.

Proof. Let $u_1, v_1$ be distinct vertices in $H_m$. We will find $2\eta n^2$ vertex pairs (x, y) such that $x, y \in B_m$ (and hence $xy \notin E(G_m)$), but $G_m \cup \{xy\}$ contains a path from $u_1$ to $v_1$ that augments $M_m$. As $m = o(n^2)$, at least $1.1\eta n^2$ of these pairs are not contained in $e_1, \dots, e_m$ and hence $P[\nu(G_{m+1}) > \nu(G_m)] \ge \eta$.

For a vertex $u \in M_m$, let g(u) denote the vertex that is matched to u in $M_m$. Similarly for a set $S \subseteq M_m$, let $g(S) := \{g(u) : u \in S\}$. Say that a vertex is large if it has degree at least δ log n and small otherwise.

We will first find disjoint paths from $u_1$ and $v_1$ to large vertices in $B_m$. If $u_1$ and $v_1$ share a neighbour then they do not both have degree one, otherwise $u_1$ or $v_1$ would be a quasi-isolate. So we are able to pick $u_2 \ne v_2 \in C_m$ such that $u_1 u_2$ and $v_1 v_2$ are edges of $G_m$. If $u_3 := g(u_2)$ and $v_3 := g(v_2)$ are both large, set $P_u := u_1 u_2 u_3$ and $P_v := v_1 v_2 v_3$. Otherwise, we may assume that $u_3$ is small. It follows that $u_1$ is also small, seeing as $\deg(u_1) \le \deg(u_3)$ by (M2). By (P2), $\deg(u_3) \ge 2$ and so pick $u_4 \in C_m$ to be a neighbour of $u_3$ distinct from $u_2$. Define $u_5 := g(u_4)$. By Lemma 5.26, $N^2[u_2]$ contains at most two small vertices, so $u_5$ is large. Set $P_u := u_1 \dots u_5$.

We are left with two cases for $v_3$. If $v_3$ is small, then $N^3[u_2] \cap N^3[v_2] = \emptyset$ since otherwise we could find small vertices $u_1, u_3, v_3$ (and even $v_1$) at mutual distances at most 10. Therefore we can find $v_5$ analogously to $u_5$ and set $P_v := v_1 \dots v_5$. If $v_3$ is large and $u_4 \ne v_2$, we may set $P_v = v_1 v_2 v_3$. Finally, let $v_3$ be large and $u_4 = v_2$. Then $v_1$ is large since otherwise $N^3[u_4]$ contains three small vertices. Hence $v_1$ has another neighbour $v_2' \notin \{u_2, v_2\}$, so set $v_3' := g(v_2')$. The vertex $v_3'$ is also contained in $N^3[u_4]$ (recalling that $u_4 = v_2$). Hence we may set $P_v = v_1 v_2' v_3'$.

We now extend $P_u$ and $P_v$ to many disjoint paths $P_u' := u_1 \dots u_s$ and $P_v' := v_1 \dots v_t$ such that $P_u' \cup \{u_s v_t\} \cup P_v'$ augments $M_m$. Let $c := \frac{1}{4} \min\{\delta, 2\zeta\}$, where ζ is the constant from Lemma 5.30. We will perform the following procedure.

(i) Set $U_0 := \{u^*\}$, where $u^*$ is the endpoint of $P_u$ that is not $u_1$. Similarly, define $V_0$. By definition, $u^*, v^* \in B_m$.

(ii) Having defined $U_i$ and $V_i$, we will find sets $U_{i+1} \subseteq B_m$ and $V_{i+1} \subseteq B_m$ and define a constant c such that:

a) $U_{i+1}$ is disjoint from $P_u \cup \bigcup_{j=1}^{i} U_j \cup P_v \cup \bigcup_{j=1}^{i+1} V_j$ and $V_{i+1}$ is disjoint from $P_u \cup \bigcup_{j=1}^{i+1} U_j \cup P_v \cup \bigcup_{j=1}^{i} V_j$ (note that $U_0 \subseteq P_u$ and $V_0 \subseteq P_v$).

b) Ui+1 ⊆ g(N(Ui)) and Vi+1 ⊆ g(N(Vi)).

c) Every vertex in $g(U_{i+1}) \cup g(V_{i+1})$ is large.

d) If $|U_i| \le \frac{n}{4 \log n}$, then $|U_{i+1}| = |V_{i+1}| \ge c|U_i|$. Else, $|U_{i+1}| = |V_{i+1}| \ge \frac{c}{4} n$.

We iterate this procedure until we reach sets $U_k$ and $V_k$ such that $|U_k| = |V_k| = \frac{c}{4} n$. So the process runs for $O\left(\frac{\log n}{\log \log n}\right)$ steps.

Before constructing these sets, let us show how their existence implies the statement of the lemma. It follows from (a) and (b) that for each $x_k \in U_k$ and $y_k \in V_k$ there exist sequences $x_1, \dots, x_{k-1}$ and $y_1, \dots, y_{k-1}$, where for $1 \le i \le k - 1$ we have $x_i \in U_i$ and $y_i \in V_i$, such that $P_u' := P_u \, g(x_1) \, x_1 \, g(x_2) \, x_2 \dots g(x_k) x_k$ and $P_v' := P_v \, g(y_1) \, y_1 \, g(y_2) \, y_2 \dots g(y_k) y_k$ are paths in $G_m$. Moreover $P_u'$ and $P_v'$ are disjoint.

Using (d) gives that there are at least $\frac{c^2 n^2}{16}$ such pairs of paths where the addition of $x_k y_k$ would create a path that augments $M_m$ in $G_{m+1}$. We reiterate that at most m = O(n log n) of these pairs appear in $e_1, \dots, e_m$, so this would imply the statement of the lemma with $\eta := \frac{c^2}{32}$.

It suffices to construct such sets $U_1, \dots, U_k$ and $V_1, \dots, V_k$. Suppose that $U_i$ and $V_i$ have been defined for $0 \le i < k$. If i = 0, set $X = N(u^*)$ and $Y = N(v^*)$. For i > 0 such that $|U_i| = |V_i| \le \frac{n}{4 \log n}$, let $X := N(U_i)$ and $Y := N(V_i)$. For i such that $|U_i| = |V_i| > \frac{n}{4 \log n}$, pick $U' \subseteq U_i$ and $V' \subseteq V_i$ each of cardinality $\frac{n}{4 \log n}$ and set $X := N(U')$ and $Y := N(V')$. As $U_i \cup V_i \subseteq B_m$ and $B_m$ is an independent set, using (c) we may apply Lemma 5.30 to each of $U_i$ and $V_i$. This gives that, for all i, we have $|X|, |Y| \ge 4c \log n \, |U_i|$. As $U_i, V_i \subseteq B_m$, we have $X \cup Y \subseteq C_m$. Therefore we can choose disjoint sets $X'$ and $Y'$, each of cardinality $2c \log n \, |U_i|$, such that $X' \subseteq X$ and $Y' \subseteq Y$. Observe that $|P_u|, |P_v| \le 5$ and

$$\sum_{j=1}^{i} |U_j| = \sum_{j=1}^{i} (c \log n)^j \le i \cdot (c \log n)^i = O\left(\frac{(\log n)^{i+1}}{\log \log n}\right) = o\left((\log n)^{i+1}\right).$$

Also note that, by Lemma 5.26 and construction, $g(X')$ and $g(Y')$ contain $O(|U_i|)$ small vertices. So it is possible to pick $U_{i+1} \subseteq g(X')$ and $V_{i+1} \subseteq g(Y')$ satisfying (a)-(d), as required. This completes the proof of the lemma.
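To get a feel for the number of rounds in the expansion procedure, the following sketch counts how many multiplications by c log n are needed to grow a set from a single vertex to size cn/4. It assumes the growth factor c log n suggested by the sum in the proof, and the value c = 1 in the usage example is purely illustrative (the constant in the proof is much smaller); the function name is hypothetical.

```python
import math


def expansion_rounds(n: int, c: float) -> int:
    """Rounds needed to grow from size 1 to size c*n/4 with factor c*log(n) per round."""
    factor = c * math.log(n)
    if factor <= 1:
        raise ValueError("growth factor c*log(n) must exceed 1 for the sets to expand")
    size, target, rounds = 1.0, c * n / 4, 0
    while size < target:
        size *= factor
        rounds += 1
    return rounds


if __name__ == "__main__":
    n = 10**6
    print(expansion_rounds(n, 1.0), math.log(n) / math.log(math.log(n)))
```

For n = 10^6 and c = 1 the loop stops after a handful of rounds, in line with the O(log n / log log n) estimate.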

So whenever $|H_m| \ge 2$ there is a positive probability that $e_{m+1}$ extends a current maximum matching. We will use this fact to deduce that there cannot be a large number of steps where $|H_m| \ge 2$.

Proof of Lemma 5.27. Let η be the constant from the statement of Lemma 5.31 and $f(n) := n^{-1/20}$. By our assumptions on the process, $G_{m_2}$ contains a matching of size $\frac{n}{2} - f(n)$. By Lemma 5.31, in a step with $|H_m| \ge 2$ the probability that the matching number increases is at least η. Note that from step $m_2$, the matching number cannot increase more than f times during the process. Let $X \sim \mathrm{Bin}(10 f / \eta, \eta)$. By the Chernoff bound (Lemma 5.10),

$$P[X < f] \le e^{-\frac{10 f}{\eta} \cdot \eta \cdot \frac{1}{8}} = o(1).$$

Thus with high probability, after 10f/η steps with $|H_m| \ge 2$ a perfect matching is reached. As once a perfect matching is reached, there are no helpers, the result follows.
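The binomial tail used in this proof can be evaluated exactly for concrete parameters. The sketch below computes P[X < f] for X ~ Bin(10f/η, η) in log-space and compares it with the bound exp(-10f/8) from the display; the values f = 20 and η = 0.2 are hypothetical and chosen only for illustration.

```python
import math


def binom_lower_tail(trials: int, p: float, k: int) -> float:
    """P[X < k] for X ~ Bin(trials, p), summed term by term in log-space."""
    total = 0.0
    for j in range(k):
        log_pmf = (
            math.lgamma(trials + 1)
            - math.lgamma(j + 1)
            - math.lgamma(trials - j + 1)
            + j * math.log(p)
            + (trials - j) * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return total


if __name__ == "__main__":
    f, eta = 20, 0.2  # illustrative values, not taken from the text
    trials = round(10 * f / eta)
    print(binom_lower_tail(trials, eta, f), math.exp(-10 * f / 8))
```

For these parameters the exact tail is far below the stated bound, as expected from a Chernoff-type estimate.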

5.6.2 States with zero or one helpers

We now present the proof of Lemma 5.28. This would be very simple in an idealised version of the process; if we knew that $|J_m|$ decreases by at most one vertex in each step, and if the transition probability $P_m[|J_{m+1}| = |J_m| - 1]$ was precisely $\frac{|J_m|}{n}$ when $|J_m|$ is even, and $\frac{2|J_m|}{n}$ if $|J_m|$ is odd, we could deduce the result from a simple second-moment argument (see Claim 5.35). However, in our setting we have to make a series of amendments which comprise most of this subsection. Since we are taking into account worst-case probabilities, a factor of four is lost on the constant C in this proof.
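The idealised process described above is easy to simulate. The following sketch assumes exactly the transition probabilities stated in the paragraph (j/n in even states, 2j/n in odd states, decrements of size one) and requires 2·start ≤ n so that these are valid probabilities; the function name and the seed parameter are hypothetical.

```python
import random


def simulate_idealised(n: int, start: int, seed: int = 0):
    """Simulate the idealised death chain; return (steps in even states, steps in odd states)."""
    rng = random.Random(seed)
    j, even_steps, odd_steps = start, 0, 0
    while j > 0:
        if j % 2 == 0:
            even_steps += 1
            rate = j / n
        else:
            odd_steps += 1
            rate = 2 * j / n
        if rng.random() < rate:  # lose one vertex with probability `rate`
            j -= 1
    return even_steps, odd_steps


if __name__ == "__main__":
    print(simulate_idealised(10_000, 100, seed=1))
```

Since the chain leaves even states half as fast as odd ones, a typical run spends roughly twice as many steps in even states.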

Proof of Lemma 5.28. Let $X_m = |J_m|$. Note that by Theorem 5.3 (i), with high probability, $G_{2n \log n}$ has a perfect matching. So any step m with $J_m \ne \emptyset$ occurs before 2n log n. Therefore we restrict our attention to the time period before 2n log n. We also need a simple lower bound on the number of isolated vertices in $G_{m_2}$ which is based on G(n, p).

Claim 5.32. With high probability, $X_{m_2} \ge \frac{1}{2} n^{1/4} + 1$.

Proof. Let us first bound the number of isolated vertices $\chi_{m_2}$ in G(n, p) with $p = m_2 / \binom{n}{2}$. The expectation and variance of $\chi_{m_2}$ are $E[\chi_{m_2}] \sim n e^{-2 m_2 / n} \sim n^{1/4}$ and $\mathrm{Var}[\chi_{m_2}] \sim E[\chi_{m_2}]$. Chebyshev's inequality gives $P\left[\chi_{m_2} \le \frac{1}{2} n^{1/4}\right] = o(1)$. Since this is a decreasing event, the same bound holds in $G_{m_2}^{\mathrm{all}}$ by Lemma 5.8. As $J_{m_2}$ contains isolated vertices in $G_{m_2}^{\mathrm{all}}$, we have $X_{m_2} = |J_{m_2}| \ge \frac{1}{2} n^{1/4} + 1$ with high probability.
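The first-moment computation in this proof can be reproduced directly. The sketch below evaluates the exact expected number of isolated vertices n(1 - p)^{n-1} in G(n, p) with p = m_2 / C(n, 2) and m_2 = (3/8) n log n, treating m_2 as a real number (an assumption made for simplicity); the function name is hypothetical.

```python
import math


def expected_isolated(n: int) -> float:
    """Exact expected number of isolated vertices in G(n, p) with p = m2 / C(n, 2)."""
    m2 = 3 * n * math.log(n) / 8  # m2 := (3/8) n log n, not rounded
    p = m2 / math.comb(n, 2)
    return n * (1 - p) ** (n - 1)


if __name__ == "__main__":
    n = 10**6
    print(expected_isolated(n), n ** 0.25)
```

The two printed values agree to several digits, matching the asymptotic $n e^{-2m_2/n} = n^{1/4}$.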

Henceforth we assume this lower bound on $X_{m_2}$. We analyse how $(X_m)$ changes, paying special attention to the quasi-isolates.

Claim 5.33. Let $\alpha := n^{-1/30}$. At each step $m \in [m_2, 2n \log n]$,

$$\frac{X_m}{n}(1 - \alpha) \le P_m[X_{m+1} < X_m] \le \frac{4 X_m}{n}\left(1 + \frac{8 \log n}{n}\right).$$

Proof. Any vertex pair with one point in $J_m$ and another in a current optimal cover is acceptable for $G_m$ if it has not yet been offered. The condition $\nu(G_m) \ge \frac{n}{2} - \frac{\alpha n}{4}$ and Claim 5.9 imply the lower bound

$$P_m[X_{m+1} = X_m - 1] \ge \frac{X_m}{\binom{n}{2}}\left(\frac{n}{2} - \frac{\alpha n}{4} - O(\log n)\right) \ge \frac{X_m}{n}(1 - \alpha).$$

For the upper bound, note that $v \in J_m$ is not in $J_{m+1}$ only if the edge $e_{m+1}$ is incident to v, or, in the case v is a quasi-isolate, to its partner. Since there are at most $2X_m$ vertices that $e_{m+1}$ would have to intersect ($J_m$ and the partners of quasi-isolates in $J_m$) for some v to not be present in $J_{m+1}$, we have

$$P_m[X_{m+1} < X_m] \le \frac{2 X_m n}{\binom{n}{2} - 2n \log n} \le \frac{4 X_m}{n}\left(1 + \frac{8 \log n}{n}\right),$$

as required.

In this proof, we will use $X = (X_m)$ to denote the sequence $X_{m_2}, X_{m_2+1}, \dots, X_N$. To control X, we will analyse an auxiliary random process $Y = (Y_m)$, which we define now. For $m_2 \le m \le 2n \log n$, set

$$Y_m := \begin{cases} X_{m_2} & m = m_2, \\ Y_{m-1} - Z_m & m > m_2, \end{cases}$$

where the random variables $(Z_m)$ are mutually independent and defined as follows. Each $Z_m$ takes values in {0, 1, 2} and depends on the parity of $Y_{m-1}$. For ease of notation, let

$$\sigma_0 := \frac{4}{n}\left(1 + \frac{8 \log n}{n}\right) \quad \text{and} \quad \sigma_1 := \frac{9}{10 n}. \tag{5.13}$$

If $Y_m$ is even:

$$Z_{m+1} := \begin{cases} 0 & \text{with probability } 1 - Y_m \sigma_0, \\ 1 & \text{with probability } Y_m \sigma_0. \end{cases}$$

Note that, in this case, $Z_{m+1}$ never takes the value 2. Now, if $Y_m$ is odd:

$$Z_{m+1} := \begin{cases} 0 & \text{with probability } 1 - Y_m \sigma_1 - \frac{8 Y_m^2}{n^2}, \\ 1 & \text{with probability } Y_m \sigma_1, \\ 2 & \text{with probability } \frac{8 Y_m^2}{n^2}. \end{cases}$$

We remark that taking $\sigma_1 \sim \frac{1}{n}$ might seem more intuitive, but we decided to use an even smaller value to offset some lower-order terms.

Set $\ell := \left\lfloor \frac{1}{4} n^{1/4} \right\rfloor$ and let $S := \{2, 4, \dots, 2\ell\}$, and define

$$T_X := |\{m \ge m_2 : X_m \in S\}| \quad \text{and} \quad T_Y := |\{m \ge m_2 : Y_m \in S\}|.$$

Note that the statement of the lemma can be rephrased as saying that TX ≥ Cn log n.
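One step of the auxiliary process $(Y_m)$ defined above can be driven by a single uniform random number. The sketch below is one way to do this; the layout of the sub-intervals of [0, 1) (the jump by two first, then the jump by one) is an assumption of this sketch, and the function names are hypothetical.

```python
import math


def sigma0(n: int) -> float:
    """sigma_0 from (5.13)."""
    return (4 / n) * (1 + 8 * math.log(n) / n)


def sigma1(n: int) -> float:
    """sigma_1 from (5.13)."""
    return 9 / (10 * n)


def step_y(y: int, n: int, psi: float) -> int:
    """Return Y_{m+1} given Y_m = y and a uniform sample psi in [0, 1)."""
    if y % 2 == 0:
        # Even state: decrease by one with probability y * sigma_0, never by two.
        return y - 1 if psi < y * sigma0(n) else y
    jump2 = 8 * y * y / n**2  # probability of decreasing by two in an odd state
    if psi < jump2:
        return y - 2
    if psi < jump2 + y * sigma1(n):
        return y - 1
    return y


if __name__ == "__main__":
    print(step_y(2, 100, 0.05), step_y(3, 100, 0.001), step_y(3, 100, 0.01))
```

Feeding independent uniform samples into `step_y` produces a trajectory of $(Y_m)$; counting the steps spent in S gives a sample of $T_Y$.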

Claim 5.34. For any a, P [TX ≥ a] ≥ P [TY ≥ a] .

Proof. We will exhibit a form of coupling between $(X_m)$ and $(Y_m)$ such that $(X_m)$ stays longer than $(Y_m)$ in each even state.

The coupling is technical because of the steps m in which $X_{m+1} = X_m - 2$, so before giving explicit details we first describe the idea. We construct processes $X^* = (X^*_m)$ and $Y^* = (Y^*_m)$ which we run simultaneously, ensuring that $|Y^*_m - X^*_m| \le 1$. This ensures that whenever $X^*_m \ne Y^*_m$, $X^*_m$ is actually even and $Y^*_m$ is odd. It also means, in particular, that if one variable changes by two, the other one has to change by one. When they differ, the smaller variable is paused and 'waits' for the larger one. Later, these indices at which $X^*$ or $Y^*$ is paused will be ignored to obtain the original distribution of $(X_m)$ and $(Y_m)$ respectively.

We will now define $X^*$ and $Y^*$.

(i) If $X^*_m = Y^*_m$, we say both $X^*$ and $Y^*$ are active at m.

(ii) If $X^*_m < Y^*_m$, we say $X^*$ is paused at m and $Y^*$ is active at m.

(iii) Similarly, if $Y^*_m < X^*_m$, we say that $Y^*$ is paused at m and $X^*$ is active at m.

Note that in the background of $X^*$ we are running a modified graph process $(G^*_m)$. If $X^*$ is paused at m, the graph process is also paused and $G^*_{m+1} := G^*_m$. Otherwise the graph process evolves as usual. The probability measure $P_m$ which governs $X^*_{m+1}$ comes from the graph process. However, in the coupling, we run the graph process in the background and often avoid referring to it explicitly.

From m = 0 to $m = m_2$, $X^*_m$ is simply the number of quasi-isolates in the graph $G_m$. We start the coupling with $Y^*_{m_2} := X^*_{m_2}$. At each step $m > m_2$, we sample a random variable $\Psi_{m+1} \sim \mathrm{Unif}[0, 1]$ which will be our 'common source of randomness'. The update rules will differ depending on whether $X^*$ and $Y^*$ are active or paused. For ease of notation, for i ∈ {1, 2} define

$$\pi_{i,m} := P_m[X^*_{m+1} = X^*_m - i].$$

If $X^*$ is active at step m, we update as follows.

$$X^*_{m+1} := \begin{cases} X^*_m - 1 & \text{if } \Psi_{m+1} \in [\pi_{2,m}, \pi_{2,m} + \pi_{1,m}), \\ X^*_m - 2 & \text{if } \Psi_{m+1} \in [0, \pi_{2,m}), \\ X^*_m & \text{otherwise.} \end{cases}$$

Moreover, if $X^*$ is active at m and $\Psi_{m+1} \in [\pi_{2,m}, \pi_{2,m} + \pi_{1,m})$, we update the graph process $G^*_m$ by offering the edge $e_{m+1}$ selected uniformly at random among the vertex pairs which imply the event $\{X^*_{m+1} = X^*_m - 1\}$. The same is done in the remaining two cases. Notice that when $X^*$ is active, $X^*_{m+1}$ has the same distribution as $X_{m+1}$.

If $Y^*$ is active at step m, we update as follows (recall the definition of $\sigma_0$ and $\sigma_1$ from (5.13)).

$$Y^*_{m+1} := \begin{cases} Y^*_m - 1 & \text{if } \Psi_{m+1} \in [0, \sigma_0 Y^*_m) \text{ and } Y^*_m \text{ is even}, \\ Y^*_m - 1 & \text{if } \Psi_{m+1} \in \left[\frac{8 (Y^*_m)^2}{n^2}, \frac{8 (Y^*_m)^2}{n^2} + \sigma_1 Y^*_m\right) \text{ and } Y^*_m \text{ is odd}, \\ Y^*_m - 2 & \text{if } \Psi_{m+1} \in \left[0, \frac{8 (Y^*_m)^2}{n^2}\right) \text{ and } Y^*_m \text{ is odd}, \\ Y^*_m & \text{otherwise.} \end{cases}$$

When $Y^*$ is active, $Y^*_{m+1}$ has the same distribution as $Y_{m+1}$. Hence by definition of $Y^*$, the sequence obtained by suppressing the time steps when $Y^*$ is paused has probability distribution identical to that of the original process $(Y_m)$. The same holds for $X^*$ and $(X_m)$.

If $X^*$ is paused at step m, we define $X^*_{m+1} := X^*_m$, independently of $\Psi_{m+1}$. Analogously define $Y^*_{m+1}$, if $Y^*$ is paused at step m.

The distribution of $Y^*_m$ is defined so that (by Claim 5.33) the transition probabilities are ordered as in the figure below.

Figure 5.1: Ranges for $\Psi_{m+1}$. Assuming that $X^*$ and $Y^*$ are active at m, the striped intervals denote the range for $\Psi_{m+1}$ for which $X_{m+1} \ne Y_{m+1}$, and the dashed lines denote the range in which $X^*$ and $Y^*$ change simultaneously.

Let us now check that:

(a) $|Y^*_m - X^*_m| \le 1$, for each m, and

(b) whenever $Y^*_m$ is even, $X^*_m = Y^*_m$ (and so by definition both processes are active).

We prove (a) and (b) by induction on m. The base case m = 0 follows from the definition of $X^*_0$ and $Y^*_0$. So now suppose that (a) and (b) hold for $X^*_m$ and $Y^*_m$. We consider several cases which are illustrated in Figure 5.1.

Case 1 ($X^*_m = Y^*_m = k$ and k is odd): This is the trickiest case (see Figure 5.1). If $X^*_{m+1} = Y^*_{m+1}$, there is nothing to prove. Otherwise, either $\Psi_{m+1} \in \left[\pi_{2,m}, \frac{8 (Y^*_m)^2}{n^2}\right)$ or $\Psi_{m+1} \in [\sigma_1 Y^*_m, \pi_{2,m} + \pi_{1,m})$; this corresponds to the red ranges in Figure 5.1. In the former case, we have $Y^*_{m+1} = Y^*_m - 2$ and $X^*_{m+1} = X^*_m - 1$ and $Y^*_{m+1}$ is odd. In the latter case, $X^*_{m+1} = X^*_m - 1$ and $Y^*_{m+1} = Y^*_m$. Either way, (a) and (b) hold.

Case 2 ($X^*_m = Y^*_m = k$ and k is even): There are two ways in which $Y^*_{m+1}$ and $X^*_{m+1}$ can differ. In either case, $Y^*_{m+1} = Y^*_m - 1$, and $X^*_{m+1} - X^*_m \in \{-2, 0\}$, so (a) and (b) are satisfied.

Case 3 ($X^*_m = Y^*_m - 1$): In this case, $X^*_m$ is even by hypothesis, and $X^*_{m+1} = X^*_m$. If $Y^*_m - Y^*_{m+1} = 1$, then $X^*$ and $Y^*$ are both active at m + 1. Otherwise $Y^*_{m+1}$ is again odd, so (a) and (b) hold.

Now let us complete the proof of the claim. Suppose $(y^*_m)$, a specific instance of $Y^*$, and $(x^*_m)$, a specific instance of $X^*$, are sampled by our coupling. By (a) and (b), whenever $y^*_m$ is even, $x^*_m = y^*_m$ and both variables are active at m. Let $(y_m)$ denote the sequence obtained by suppressing the steps when $(y^*_m)$ is paused. Similarly, define $(x_m)$. For any k, we have

$$|\{m : y_m = 2k\}| = |\{m : y^*_m = 2k\}| \le |\{m : (x^*_m) \text{ active and } x^*_m = 2k\}| = |\{m : x_m = 2k\}|.$$

As (xm) and (ym) are drawn according to the probability distribution of (Xm) and (Ym), we have that for each a,

P [TX ≥ a] ≥ P [TY ≥ a] .

Claim 5.35. With high probability $T_Y \ge Cn \log n$.

Proof. The analysis is a variant of the classical coupon collector problem, which is described for instance in [75]. Let $T_k$ be $|\{m \ge m_2 : Y_m = k\}|$. Note that the variables $\{T_{2k} : k \in [n/2]\}$ are mutually independent. This is because each $T_k$ is determined by a subset of variables $\{Z_m : m \in \mathbb{N}\}$, and the sets of variables determining $T_{2k}$ for each k are pairwise disjoint. Recall that $\ell = \left\lfloor \frac{1}{4} n^{1/4} \right\rfloor$. We have $T_Y := \sum_{j=1}^{\ell} T_{2j}$. We remark that in Claim 5.32 we assumed that $X_{m_2} = Y_{m_2} \ge 2\ell + 1$, so $(Y_m)$ indeed attains the values 2, 4, ..., 2ℓ after the time $m_2$.

For $1 \le k \le \ell$, let us analyse $T_{2k}$. If $T_{2k} = 0$, then there is an m such that $Y_m = 2k + 1$ and $Y_{m+1} = 2k - 1$. If $m'$ is the first time step with $Y_{m'} = 2k + 1$, then $P[T_{2k} = 0]$ is precisely the probability that, for $m > m'$, the event $Z_m = 2$ occurs before $Z_m = 1$. Hence $P[T_{2k} = 0] = O(k/n)$. Recall the definition of $\sigma_0$ from (5.13). Conditioned on $T_{2k} \ge 1$, $T_{2k}$ is distributed geometrically with probability of success $2k\sigma_0$. Therefore $E[T_{2k}] = (1 + O(k/n)) \frac{1}{2k\sigma_0} = \frac{n}{8k} + O(1)$ and $\mathrm{Var}[T_{2k}] \sim \frac{n(n - 8k)}{2^6 k^2}$. By linearity of expectation we have

$$E[T_Y] = \sum_{k=1}^{\ell} \left(\frac{n}{8k} + O(1)\right) = \frac{n}{8}\left(\frac{1}{1} + \frac{1}{2} + \dots + \frac{1}{\ell}\right) + O(\ell) \sim \frac{n}{8} \cdot \log \ell \sim \frac{n}{32} \log n.$$

We now calculate the variance of $T_Y$ in order to use Chebyshev's inequality to show that $T_Y$ does not deviate too far from its expectation. As the variables $T_{2k}$ are independent, we have

$$\mathrm{Var}[T_Y] = \sum_{k=1}^{\ell} \mathrm{Var}[T_{2k}] = \sum_{k=1}^{\ell} O\left(\frac{n^2}{k^2}\right) = O\left(n^2 \sum_{k=1}^{\infty} \frac{1}{k^2}\right) = O\left(n^2\right).$$

So by Chebyshev's inequality, we have

$$P\left[|T_Y - E(T_Y)| \ge n \sqrt{\log n}\right] = O\left(\frac{1}{\log n}\right) = o(1).$$

1 It follows that TY ≥ Cn log n with C := 32 .
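The leading term of $E[T_Y]$ can be evaluated for concrete n. The sketch below sums the geometric means $1/(2k\sigma_0)$ over k = 1, ..., ℓ, ignoring the $(1 + O(k/n))$ correction factor (an assumption of this sketch); the function names are hypothetical.

```python
import math


def ell(n: int) -> int:
    """floor(n^(1/4) / 4), computed with integer square roots to avoid rounding issues."""
    return math.isqrt(math.isqrt(n)) // 4


def expected_TY(n: int) -> float:
    """Sum of 1 / (2 k sigma_0) over k = 1..ell, the leading term of E[T_Y]."""
    s0 = (4 / n) * (1 + 8 * math.log(n) / n)
    return sum(1 / (2 * k * s0) for k in range(1, ell(n) + 1))


if __name__ == "__main__":
    n = 10**8
    print(ell(n), expected_TY(n), n * math.log(n) / 32)
```

The sum equals (n/8) times a harmonic number up to a negligible factor; the convergence of the harmonic number to (1/4) log n is slow, so for moderate n the value is noticeably below (n/32) log n.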

Seeing as $T_X \ge T_Y$, we have that with high probability

$$|\{m \ge m_2 : X_m \in 2[\ell]\}| \ge Cn \log n.$$

5.6.3 Decay of the number of isolated vertices

We complete the proof of Theorem 5.4 by tracking the number of isolated vertices (not quasi-isolates). Denote the number of isolated vertices in $G_m$ by $I_m$.

Proof of Theorem 5.4. Define

$$r := \min\{i : I_i \le \lfloor \log n \rfloor\}. \tag{5.14}$$

For technical reasons, we work with the truncated variable $\hat{I}_m = \max\{I_m, I_r\}$. We continue conditioning on the events given by Claim 5.32 and (5.12). Define $\varepsilon := (\log n)^{-1/10}$. Say that a time step m is unhelpful if $|H_m| = 0$ and m is not ε-flexible (that is, $|D_m| < \left(\frac{1}{2} + \varepsilon\right) n$). Let

$$U_m := |\{m_2 \le i \le m : \text{step } i \text{ is unhelpful}\}|.$$

In $G_m^{\mathrm{all}}$, the number of isolated vertices is concentrated around $n e^{-2m/n}$ for $m < \frac{n}{2} \log n$. The following claim says that the number of isolated vertices in our process decays slower than in the corresponding $G_m^{\mathrm{all}}$.

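Claim 5.36. Let $m_2 \le m \le 2n \log n$. With high probability,

$$\log\left(\frac{\hat{I}_{m_2 + m}}{\hat{I}_{m_2}}\right) \ge -\frac{2m}{n} + \frac{U_m}{n} - \frac{6 \varepsilon m}{n}. \tag{5.15}$$

Proof. For $i \ge m_2$, define random variables $(Y_i)$ by $Y_{i+1} := \log\left(\frac{\hat{I}_{i+1}}{\hat{I}_i}\right) + Z_i$, where

$$Z_i = \begin{cases} \frac{1 + 4\varepsilon}{n}, & |H_i| = 0 \text{ and } |D_i| < \left(\frac{1}{2} + \varepsilon\right) n, \\ \frac{2 + \varepsilon}{n}, & \text{otherwise.} \end{cases}$$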

We have truncated the process at $I_i \approx \log n$ for the ensuing asymptotic bounds to hold. The definition of $Z_i$ has been chosen to ensure that $\sum_{i=m_2}^{m} Y_i$ is a submartingale, which we prove now. To see this, we estimate the probability that $\hat{I}_{i+1} = \hat{I}_i - 1$.

First consider the case where i < r, where r is defined in (5.14), and $Z_i = \frac{1 + 4\varepsilon}{n}$, so i is not ε-flexible and $|H_i| = 0$. As i < r, $\hat{I}_i = I_i$. In this case, a pair uv with v isolated is acceptable for $G_i$ only if $u \in D_i \cup J_i$, or if u is the partner of a quasi-isolate. Since $|D_i| < \left(\frac{1}{2} + \varepsilon\right) n$ and $|J_{m_4}| < n^{0.98}$ by 5.32, we have

$$P_i[I_{i+1} = I_i - 1] \le \frac{I_i(|D_i| + 2|J_i|)}{\binom{n}{2} - i} \le \frac{(1 + 3\varepsilon) I_i}{n}.$$

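Now we compute $E_i[Y_{i+1}]$. The first term, corresponding to $\{I_{i+1} = I_i - 2\}$, is of a lower order. We also use the inequality $\log(1 - x) \ge -x - x^2$ for x < 0.1 to obtain the bound

$$E_i[Y_{i+1}] \ge \frac{I_i^2}{2} \cdot \frac{2}{n^2} \cdot \log\left(1 - \frac{2}{I_i}\right) + \frac{(1 + 3\varepsilon) I_i}{n} \log\left(1 - \frac{1}{I_i}\right) + Z_i \ge O\left(\frac{I_i}{n^2}\right) + \frac{(1 + 3\varepsilon) I_i}{n}\left(-\frac{1}{I_i} - \frac{1}{I_i^2}\right) + \frac{1 + 4\varepsilon}{n} \ge 0.$$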

In the case i < r and $Z_i = \frac{2 + \varepsilon}{n}$, we do a similar computation to see that $E_i[Y_{i+1}] \ge 0$. The only change is in the second term, since now any pair containing an isolated vertex v can be acceptable and therefore the probability that v is not isolated in $G_{i+1}$ is at most $\frac{2}{n - 5 \log n}$. The term log n stands for the edges which have been offered so far. This gives

$$E_i[Y_{i+1}] \ge \frac{I_i^2}{2} \cdot \frac{2}{n^2} \cdot \log\left(1 - \frac{2}{I_i}\right) + \frac{2 I_i}{n - 5 \log n} \log\left(1 - \frac{1}{I_i}\right) + Z_i \ge 0.$$

In the case $i \ge r$, we have $E_i[Y_{i+1}] = Z_i > 0$.

To control $\sum_{i=m_2}^{m} Y_i$ we will use Freedman's Inequality (Theorem 5.15), so let us bound the predictable quadratic variation of the process. The first two summands in the following inequality correspond to the event $\{\hat{I}_{i+1} < \hat{I}_i\}$, which only occurs for i < r. The third term corresponds to $\{\hat{I}_{i+1} = \hat{I}_i\}$. Once again, we use the inequality $\log(1 - x) \ge -2x$ to conclude that

$$E_i\left[Y_{i+1}^2\right] \le \frac{\hat{I}_i^2}{n^2} \cdot \frac{8}{\hat{I}_i^2} + \frac{2 \hat{I}_i}{n} \cdot \frac{2}{\hat{I}_i^2} + 1 \cdot Z_i^2 \le \frac{10}{n \hat{I}_i}.$$

Summing over i and using $\frac{1}{\hat{I}_i} \le \frac{1}{\log n}$, we get

$$W_m = \sum_{i=m_2}^{m_2 + m} E_i\left[Y_{i+1}^2\right] \le \frac{10 m}{n \log n} \le 20.$$

Applying Theorem 5.15 with $\ell = \sqrt{\log n}$, we get that with high probability

$$\sum_{i=m_2+1}^{m_2+m} (Y_i - Z_{i-1}) \ge -\sqrt{\log n} - \frac{2 + \varepsilon}{n} \cdot (m - U_m) - \frac{1 + 4\varepsilon}{n} \cdot U_m \ge -\frac{2m}{n} + \frac{U_m}{n} - \frac{6 \varepsilon m}{n}.$$

Let $C := \frac{1}{32}$ and $m := \left(\frac{1}{8} + \frac{49C}{100}\right) n \log n$. Corollary 5.29 implies that $|H_i| = 0$ and $|J_i| > 0$ for at least Cn log n steps $i \ge m_2$. If one of those steps satisfies $i > m + m_2$, then $|J_{m + m_2}| > 0$, so we are done. Therefore we may assume that all those steps i occur before $m + m_2$. This, along with the fact that the number of ε-flexible steps is o(n log n) by Lemma 5.6, implies that $U_m \ge Cn \log n (1 - o(1))$. By substituting in our value of m, we have

$$\frac{2m}{n} - \frac{U_m}{n} + \frac{6 \varepsilon m}{n} < \left(\frac{1}{4} + \frac{49C}{50} - C + o(1)\right) \log n = \left(\frac{1}{4} - \frac{C}{50} + o(1)\right) \log n.$$

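This along with Claim 5.36 implies that

$$\log\left(\frac{\hat{I}_{m_2 + m}}{\hat{I}_{m_2}}\right) \ge -\left(\frac{1}{4} - \frac{C}{100}\right) \log n.$$

Since $\hat{I}_{m_2} = I_{m_2} > \frac{1}{2} n^{1/4}$, we get that $\hat{I}_{m_2 + m} \ge \frac{1}{4} n^{\frac{1}{4} - \frac{1}{4} + \frac{C}{100}} > n^{C/200} > \lfloor \log n \rfloor$, and therefore $I_{m + m_2} > 0$. It remains to verify that the constant is as stated in Theorem 5.4, which amounts to checking that $\frac{49C}{100} > \frac{32C}{65} = \frac{1}{65}$.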

5.7 open questions

• We conjecture that the threshold for existence of a perfect matching in $G_m$ is $m_{PM} := \frac{3}{4} n \log n$. For, suppose that one can find a rigorous definition of the set of unhelpful vertices $U_m$, those that are not well-connected with the rest of the graph. Then it is likely that the probability of losing an unhelpful vertex is $\frac{|U_m|}{n}(1 - o(1))$ when $|U_m|$ is even, and $\frac{2|U_m|}{n}(1 - o(1))$ when $|U_m|$ is odd. This would imply that the process spends roughly $\frac{n \log n}{2}$ time steps in the even states, and $\frac{n \log n}{4}$ steps in the odd states.

• Is the optimal cover in Gm typically unique as soon as Gm has a perfect matching?

• Is it true that with high probability for all $m \ge cn \log n$ for a small constant c, the union of optimal covers $D_m$ contains $\frac{n}{2}(1 + o(1))$ vertices? Lemma 5.6 says that this is true most of the time. We have already argued that this seems hard to deduce from the structure of $G_m$.

• A question about G(n, m): Is it true that between $\frac{1}{6} n \log n$ and $\frac{1}{4} n \log n$, G(n, m) has a matching covering all vertices but the isolates and the quasi-isolates? This would be an extension of the hitting time result of [31], who found a matching covering vertices of positive degree in G(n, m) for $m \ge \frac{1}{4} n \log n$.
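The heuristic behind the conjectured threshold in the first question can be made concrete. The sketch below assumes a death chain started from about n unhelpful vertices that leaves state k with probability k/n when k is even and 2k/n when k is odd, ignoring the (1 - o(1)) factors, and sums the expected holding times; the function name is hypothetical.

```python
import math


def heuristic_times(n: int):
    """Expected time in even and in odd states for the heuristic chain started near n."""
    even = sum(n / k for k in range(2, n + 1, 2))        # holding time n/k in even state k
    odd = sum(n / (2 * k) for k in range(1, n + 1, 2))   # holding time n/(2k) in odd state k
    return even, odd


if __name__ == "__main__":
    n = 10**6
    even, odd = heuristic_times(n)
    print(even / (n * math.log(n)), odd / (n * math.log(n)))
```

The two printed ratios are close to 1/2 and 1/4 respectively, so the total is close to (3/4) n log n, with lower-order corrections that vanish slowly.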

Consider the analogous hypergraph process. Similar questions can be asked in this setting: What will the final graph look like and when will a perfect matching be obtained? The hypergraph case should be very different as there is no analogous concept to alternating paths.

The Random König Hypergraph Process. Fix r ≥ 2 and n such that r | n. Let $N := \binom{n}{r}$ and let $e_1, \dots, e_N$ be a uniformly random ordering of the edges of the complete r-uniform hypergraph on n vertices. Let $G_0$ be an independent set of n vertices. For i = 0, ..., N − 1 we define $G_{i+1} := G_i + e_i$ if $\tau(G_i + e_i) = \nu(G_i + e_i)$, and $G_{i+1} := G_i$ otherwise.

As in the graph process, we expect to reach a graph G, where $\tau(G) = \nu(G) = \frac{n}{r}$. We expect V(G) = T ∪ I, where $|T| = \frac{n}{r}$, nearly all edges touching T are present and I is independent.

It is also interesting to study random graph processes preserving other global properties, for example the property of being a perfect graph.

6 BOUNDED COLOURINGS OF MULTIPARTITE GRAPHS AND HYPERGRAPHS

Our central problem is finding a copy of a given graph/hypergraph G in another edge-coloured graph/hypergraph, such that the colours of the edges of G satisfy certain restrictions. A very general result of this type is the canonical Ramsey theorem [69].

Theorem 6.1. For every graph G, there exists an integer n such that any colouring of the edges of the complete graph $K_n$ contains at least one of the following copies of G: (i) a monochromatic copy, in which all the edges have the same colour, (ii) a rainbow copy, in which no two edges have the same colour, or (iii) a lexicographic copy, i.e. a copy whose vertices can be ordered in such a way that the colour of any edge is entirely determined by the smaller endpoint.

If we restrict the number of colours, the conclusion reduces to (i), i.e. we find a monochromatic copy of a graph G. It is natural to ask what are the possible restrictions that guarantee (ii). To tackle this question, we give the following definitions.

Let H be a graph. Throughout the chapter, by a colouring of H, we mean an edge-colouring $c : E(H) \to \mathbb{N}$. A colouring c is locally k-bounded if each vertex in V is incident to at most k edges of any given colour. Furthermore, a colouring c is globally k-bounded if it has at most k edges of the same colour. In both cases, k gives a lower bound on the number of colours used by c, so our restrictions are in a sense reciprocal to the number of colours. Note that a locally 1-bounded colouring of H is exactly a proper colouring (where any two incident edges carry distinct colours), and a globally 1-bounded colouring is a rainbow colouring of H (in which all edges receive distinct colours). For a given graph G, we say a colouring c of H is G-proper if it contains a properly coloured copy of G, and G-rainbow if it contains a rainbow copy of G.

A conjecture of Bollobás and Erdős from [29] states that any locally $\lfloor \frac{n}{2} \rfloor$-bounded colouring of $K_n$ contains a properly coloured Hamilton cycle (denoted by $C_n$). In [29] they proved a weaker result, that for $\alpha = \frac{1}{69}$, any locally αn-bounded colouring is $C_n$-proper. After several improvements of the constant α (see [47], [138] and [4]), the asymptotic variant of the conjecture was proved by Lo [123], that is, for $\alpha < \frac{1}{2}$, any locally αn-bounded colouring is $C_n$-proper.

For general graphs G, it is natural to ask what is the minimum value of k for which any locally k-bounded colouring is G-proper. In particular, how does this k depend on the maximum degree of G? The intuition behind this question is that the easiest way to avoid a properly coloured copy of G is to forbid an embedding of its vertex of maximum degree. Alon et al. [9] have shown that the colouring is certainly G-proper for $k = \frac{\sqrt{n}}{\Delta(G)^{13.5}}$. This has been significantly improved by Böttcher, Kohayakawa and Procacci [32].

Theorem 6.2. If G is an n-vertex graph with maximum degree ∆, then any locally $\frac{n}{22.4\Delta^2}$-bounded colouring of $K_n$ is G-proper.

We have already hinted that global bounds on colourings yield rainbow copies of G, so all the stated results have parallels in this setting. An analogue of the Bollobás-Erdős conjecture was proposed in 1986 by Hahn and Thomassen [96]. They conjectured that there is a constant α such that any globally $\alpha n$-bounded colouring of $K_n$ is $C_n$-rainbow, which was proved by Albert, Frieze and Reed [3] for $\alpha = \frac{1}{64}$. Using the same technique as for Theorem 6.2, Böttcher, Kohayakawa and Procacci have translated this result to any bounded-degree graph. This confirms a conjecture of Frieze and Krivelevich [83], which was originally only stated for G being a tree.

Theorem 6.3. If G is an n-vertex graph with maximum degree ∆, then any globally $\frac{n}{51\Delta^2}$-bounded colouring of $K_n$ is G-rainbow.

We generalise Theorems 6.2 and 6.3 in two different directions.

Multipartite graphs

Motivated by a question of Oriol Serra [137] asking how Theorems 6.2 and 6.3 adapt to the bipartite setting, in Section 6.2 we study coloured subgraphs in bipartite and, more generally, multipartite graphs. We consider k-bounded colourings of the complete m-partite graph with n vertices in each class, which we denote by $K_{m \otimes n}$, and investigate which subgraphs they contain. In analogy with Theorems 6.2 and 6.3, we focus on properly coloured and rainbow subgraphs with maximum degree ∆. Our results show that for a fixed value of m, the dependency between k and ∆ exhibits a surprising discontinuous behaviour when ∆ = Θ(m). The bipartite case (that is, when m = 2) of these questions is of particular interest due to the following relation to Latin transversals.

A Latin square L is an n × n matrix with entries in [n] such that each row and each column contains each symbol exactly once. A Latin transversal in L is a transversal (a set of n cells, one in each row and each column) whose cells contain n different symbols. Notice that an n × n Latin square corresponds to a proper edge colouring of $K_{n,n}$ with n colours, and a Latin transversal in it corresponds to a rainbow perfect matching. A famous conjecture of Ryser [136] from 1967 states that every n × n Latin square for odd n contains a Latin transversal. As a step towards this conjecture, Erdős and Spencer [70] showed that any globally $\frac{n-1}{4e}$-bounded colouring of $K_{n,n}$ contains a rainbow perfect matching. In [131], Perarnau and Serra studied the Latin-transversals problem, using the framework of Lu and Székely for applying the Local lemma to random injections [125]. One of their results gives an asymptotic count of rainbow matchings in globally bounded colourings of $K_{n,n} = K_{2 \otimes n}$.

Here we study the problem of finding rainbow or properly coloured copies of various graphs in edge-coloured $K_{m \otimes n}$ for m ≥ 2. Our first result in this direction is the following.

Theorem 6.4. Suppose c is a globally $\frac{n}{110\Delta^2}$-bounded colouring of $K_{m \otimes n}$, and let G be an m-partite graph with a partition $V(G) = U_1 \cup U_2 \cup \dots \cup U_m$ satisfying $|U_i| \le n$, and maximum degree ∆. Then c is G-rainbow.

We also prove a similar result for properly coloured copies of G in locally bounded colourings of $K_{m \otimes n}$.

Theorem 6.5. Suppose c is a locally $\frac{n}{48\Delta^2}$-bounded colouring of $K_{m \otimes n}$, and let G be an m-partite graph with a partition $V(G) = U_1 \cup U_2 \cup \dots \cup U_m$ satisfying $|U_i| \le n$, and maximum degree ∆. Then c is G-proper.

Conversely, we give graphs G and a colouring c simultaneously showing that both statements are optimal up to a constant factor: the colouring is globally bounded, and it still does not contain even a properly coloured copy of G. Notice that the order of G in the following proposition is fixed (independent of n).

Proposition 6.6. Suppose m ≥ 2, q is a prime power, and $n \ge 3q^2$. There exists an m-partite graph G with at most $3q^2$ vertices in each part, maximum degree $\Delta \le q + 2m$, and a globally $\lceil \frac{n}{q^2+q} \rceil$-bounded colouring of $K_{m \otimes n}$ which does not contain a properly coloured copy of G.
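Returning to the bipartite case, the correspondence between Latin squares and proper colourings of $K_{n,n}$, and between Latin transversals and rainbow perfect matchings, can be checked on small instances. The sketch below is an illustration only: the cyclic Latin square $L(i, j) = (i + j) \bmod n$ and all function names are assumptions chosen for the example, not constructions from the text.

```python
from itertools import permutations

def cyclic_latin_square(n):
    """Latin square with entry (i + j) mod n; row i and column j index the two sides of K_{n,n}."""
    return [[(i + j) % n for j in range(n)] for i in range(n)]

def is_proper_colouring(L):
    """Each symbol once per row and per column, i.e. a proper edge colouring of K_{n,n}."""
    n = len(L)
    rows_ok = all(len(set(row)) == n for row in L)
    cols_ok = all(len({L[i][j] for i in range(n)}) == n for j in range(n))
    return rows_ok and cols_ok

def is_rainbow_matching(L, sigma):
    """sigma[i] is the column matched to row i; rainbow means all n symbols differ."""
    n = len(L)
    is_perfect_matching = sorted(sigma) == list(range(n))
    return is_perfect_matching and len({L[i][sigma[i]] for i in range(n)}) == n

def has_transversal(L):
    """Brute-force search over all perfect matchings (feasible only for small n)."""
    return any(is_rainbow_matching(L, p) for p in permutations(range(len(L))))
```

For odd n the main diagonal of the cyclic square is a transversal, whereas for n = 4 the brute-force search finds none, which is consistent with the restriction to odd n in Ryser's conjecture.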

When $\Delta \gg m$, the colouring is $O\left(\frac{n}{\Delta^2}\right)$-bounded, matching the bounds of Theorems 6.4 and 6.5. On the other hand, for $\Delta \ll m$ the problem becomes fundamentally different. By viewing $K_{m \otimes n}$ as an almost complete graph, we show that even $O\left(\frac{n}{\Delta}\right)$-bounded colourings contain the required copies of G. This embedding result is also matched by the corresponding construction, and it follows from the results of Sudakov and Volec [140].

Theorem 6.7. There exist constants α and β such that the following holds. Let G be an N-vertex graph of maximum degree ∆, and K an N-vertex graph of minimum degree $N \cdot \left(1 - O(\Delta^{-1})\right)$.
(i) Any locally $\frac{N}{\alpha\Delta^2}$-bounded colouring of K is G-proper.
(ii) Any globally $\frac{N}{\beta\Delta^2}$-bounded colouring of K is G-rainbow.

By applying (i) with ∆ ≤ δm (where δ is a small constant), N = mn and $K = K_{m \otimes n}$, we get that any locally $\frac{mn}{\alpha\Delta^2}$-bounded edge-colouring of $K_{m \otimes n}$ is G-proper. We reiterate that in this regime, $\frac{mn}{\alpha\Delta^2} \ge \frac{n}{\alpha\Delta}$, implying that any locally $\frac{n}{\alpha\Delta}$-bounded colouring is G-proper. The analogue is true for rainbow copies of G in globally bounded colourings of $K_{m \otimes n}$, using (ii). It is shown in [140] that the bounds are optimal up to a constant factor.

We emphasise that the definitions and hypotheses in all the theorems do not fix any particular partition or ordering of the parts of G. Recently, Cano, Perarnau and Serra [39] have extended Theorem 6.7 in a different direction: they showed that any globally $c_0 n$-bounded edge-colouring of a host graph with minimum degree at least $\frac{n}{2}(1 + \varepsilon)$ contains a rainbow Hamilton cycle.

Properly coloured and rainbow copies of hypergraphs

The problem of Bollobás and Erdős on finding properly coloured Hamilton cycles in locally bounded colourings extends naturally to hypergraphs and has recently been studied in [56] and [54]. We will be looking at edge colourings of r-uniform hypergraphs H, that is, assignments $c : E(H) \to \mathbb{N}$. A subhypergraph G of H is said to be properly coloured if any two overlapping edges of G receive different colours. Furthermore, if every edge of G receives a different colour, the subhypergraph G is rainbow. We impose the same type of restrictions on the colourings c. A colouring c is locally k-bounded if the hypergraph induced by a single colour has maximum degree at most k, which means that each vertex is contained in at most k edges of any particular colour. Formally, $\Delta\left(H[c^{-1}(i)]\right) \le k$ for all $i \in \mathbb{N}$. We say that c is globally k-bounded if each colour is used at most k times.

Dudek, Frieze and Ruciński [56] have studied the existence of properly coloured and rainbow Hamilton cycles in coloured complete r-uniform hypergraphs. There are several different notions of hypergraph cycles. For $\ell \in [r-1]$, an $\ell$-overlapping cycle $C_n^{(r)}(\ell)$ is an n-vertex hypergraph in which, for some cyclic ordering of its vertices, the edges consist of r consecutive vertices, and each two consecutive edges share exactly $\ell$ vertices. For r = 2 and $\ell = 1$, this reduces to the graph cycle. The two extreme cases, $\ell = 1$ and $\ell = r - 1$, are usually referred to as loose and tight cycles respectively. Given an n-vertex r-graph H, any subgraph of H isomorphic to $C_n^{(r)}(\ell)$ is called an $\ell$-overlapping Hamilton cycle. It is easy to show that $C_n^{(r)}(\ell)$ has precisely $\frac{n}{r-\ell}$ edges and therefore we cannot expect H to have one unless $r - \ell$ divides n. Generalising the result of Bollobás and Erdős from [29], Dudek, Frieze and Ruciński [56] have shown the following (see also [54] for some further results).

Theorem 6.8. For every $\ell \in [r-1]$ there is a constant α (resp. β) such that if n is sufficiently large and divisible by $r - \ell$, then any locally $\alpha n^{r-\ell}$-bounded (resp. globally $\beta n^{r-\ell}$-bounded) colouring of $K_n^{(r)}$ is $C_n^{(r)}(\ell)$-proper (resp. -rainbow).

Their proof relies heavily on the cyclic structure of $C_n^{(r)}(\ell)$. We show that, as in the result of Alon et al. [9], it is actually sufficient to impose restrictions on the degrees in G. For a set S ⊂ V(H), we say the degree, or more accurately, |S|-degree of S is the number of edges of H containing S, denoted by d(S) or $\deg_H(S)$. For a given $\ell \in \{0, 1, \dots, r-1\}$, the maximum $\ell$-degree of H is the maximum degree over all vertex-subsets of order $\ell$, that is, $\Delta_\ell(H) := \max\{\deg_H(S) : S \subset V(H), |S| = \ell\}$. For example, the $\ell$-overlapping r-uniform cycle has $\Delta_\ell = 2$ and $\Delta_{\ell+1} = 1$. Note that $\Delta_0(H)$ is just the number of edges of H. We apply the Lovász Local Lemma, or more specifically, the corresponding framework of Lu and Székely [125] in order to generalise Theorem 6.8 to hypergraphs G with bounded maximum $\ell$-degrees.

Theorem 6.9. For a given uniformity r and $\ell \in [r-1]$, there exist positive constants $c_1$ and $c_2$ such that the following holds. Let G be an r-uniform hypergraph on at most n vertices satisfying $\Delta_{\ell+1}(G) = 1$.
(i) Any locally $\frac{c_1 n^{r-\ell}}{\Delta_1(G)\Delta_\ell(G)}$-bounded colouring of $K_n^{(r)}$ is G-proper.
(ii) Any globally $\frac{c_2 n^{r-\ell}}{\Delta_1(G)\Delta_\ell(G)}$-bounded colouring of $K_n^{(r)}$ is G-rainbow.
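The notions of $\ell$-overlapping cycle and maximum $\ell$-degree defined above lend themselves to a direct computational check. The following Python sketch is illustrative only; the function names `overlapping_cycle` and `max_degree`, and the choice of vertices 0, ..., n−1 in their natural cyclic order, are assumptions made for the example.

```python
from itertools import combinations
from collections import Counter

def overlapping_cycle(n, r, ell):
    """Edges of the ell-overlapping r-uniform cycle on vertices 0..n-1 (cyclic order)."""
    step = r - ell
    if n % step != 0:
        raise ValueError("r - ell must divide n")
    # Consecutive edges start `step` apart, so they share exactly ell vertices.
    return [frozenset((start + j) % n for j in range(r))
            for start in range(0, n, step)]

def max_degree(edges, size):
    """Maximum number of edges containing a common vertex set of the given size."""
    if size == 0:
        return len(edges)  # the empty set lies in every edge
    counts = Counter()
    for e in edges:
        for subset in combinations(sorted(e), size):
            counts[subset] += 1
    return max(counts.values()) if counts else 0
```

For the tight cycle with n = 7, r = 3 and the loose cycle with n = 8, r = 3, the sketch reproduces the edge count $n/(r-\ell)$ and the values $\Delta_\ell = 2$, $\Delta_{\ell+1} = 1$ stated in the text.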

The proof of Theorem 6.9 is given in Section 6.3, where we also show that the dependencies on n for both parts of the theorem are the best possible. For $\ell = r - 1$, we were also able to show that the dependence on the maximum degrees of G is best possible.

6.1 lovász local lemma and the lu-székely framework for random injections

We hope that the utility of probabilistic methods for constructing combinatorial objects satisfying specific properties is already manifest. The idea is to show that an object chosen at random satisfies the properties in question with positive probability. This is exactly the conclusion of the Lovász local lemma, and its sufficient conditions restrict the mutual correlations between the desired properties. The lemma is usually formulated in terms of bad events $B_1, B_2, \dots, B_N$, which correspond to the undesired properties of our object. We say that a graph D with the vertex set [N] is a dependency graph for a family of events $\mathcal{B} = \{B_1, \dots, B_N\}$ if for every i ∈ [N], the event $B_i$ is mutually independent of all the events $B_j \ne B_i$ such that $ij \notin E(D)$. More generally, D is a negative dependency graph for $\mathcal{B}$ if for every i ∈ [N] and every set $J \subset \{j : ij \notin E(D)\}$, it holds that $P\left[B_i \mid \bigwedge_{j \in J} \overline{B_j}\right] \le P[B_i]$.

The original version of the Local lemma, due to Erdős and Lovász [58], used a dependency graph for the set of bad events in order to control the correlations. It was then observed by Erdős and Spencer [70] that in fact the same proof applies when we capture the correlations using a negative dependency graph. They called this variant the Lopsided Lovász local lemma. We use the following version of the lemma, often called the Asymmetric local lemma. It is proved e.g. in [128, Chapter 19.3] in the non-lopsided form.

Lemma 6.10 (Asymmetric lopsided Lovász local lemma). Let $\mathcal{B} = \{B_1, \dots, B_N\}$ be a set of bad events with a negative dependency graph D = ([N], E). If for all i ∈ [N], $P[B_i] \le \frac{1}{4}$ and $\sum_{j :\, ij \in E} P[B_j] \le \frac{1}{4}$, then
\[ P\Big[\bigwedge_{i \in [N]} \overline{B_i}\Big] > 0. \]
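The two hypotheses of Lemma 6.10 are simple to verify mechanically once the probabilities and the negative dependency graph are given. The sketch below is a hedged illustration: the function name `lll_condition_holds` and the input format (a dictionary of probabilities and a dictionary of neighbour sets) are assumptions, and the code only checks the hypotheses, it does not establish that a given graph really is a negative dependency graph.

```python
from fractions import Fraction

def lll_condition_holds(prob, neighbours):
    """Check both hypotheses of the asymmetric lopsided local lemma.

    prob[i] is P[B_i]; neighbours[i] is the set of j with ij in E(D).
    """
    quarter = Fraction(1, 4)
    for i, p in prob.items():
        if p > quarter:
            return False  # P[B_i] <= 1/4 fails
        if sum(prob[j] for j in neighbours[i]) > quarter:
            return False  # the sum over neighbours of i exceeds 1/4
    return True

# Hypothetical example: three events whose dependency graph is the path 0 - 1 - 2.
path = {0: {1}, 1: {0, 2}, 2: {1}}
small = {i: Fraction(1, 10) for i in range(3)}
large = {i: Fraction(1, 5) for i in range(3)}
```

With probabilities 1/10 the conditions hold, while with probabilities 1/5 the middle event has neighbour sum 2/5 and the check fails.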

We will be using a type of negative dependency graph which is specific to the probability spaces of random injections. It is a slight generalisation of a dependency graph first constructed by Lu and Székely [125]. However, we cannot cite their results directly because our probability space is slightly more general. Let X = X1 × · · · × Xm and Y = Y1 × · · · × Ym, where the parts Xi and Yi satisfy |Xi| ≤ |Yi|. We call an injection σ : X → Y part-respecting if σ(x) ∈ Yi for any x ∈ Xi and i ∈ [m], and define S to be the set of all part-respecting injections from X to Y. Consider the probability space Ω on S generated by picking uniformly random injections σi : Xi → Yi for i ∈ [m], and setting σ = (σ1, ..., σm) to be the induced injection between X and Y. Let τ : T → U be a given bijection between T ⊂ X and U ⊂ Y. The corresponding canonical event B consists of all part-respecting injections X → Y which extend τ, that is,

B = Ω(T, U, τ) := {σ ∈ S : σ(x) = τ(x) for all x ∈ T}.

Two events Ω(T1, U1, τ1) and Ω(T2, U2, τ2) S-intersect if the sets T1 and T2 intersect or U1 and U2 intersect. A result of Lu and Székely [125] implies that for a set $\mathcal{B}$ of 'bad' canonical events, the graph whose vertices are the events in $\mathcal{B}$ and whose edges connect exactly the S-intersecting events is a negative dependency graph. Although Lu and Székely considered the case m = 1, their result can be generalised to arbitrary m.

Theorem 6.11. Let Ω be the probability space generated by picking an injection from S uniformly at random, with the notation defined above. Furthermore, let $\mathcal{B} = \{B_1, B_2, \dots, B_N\}$ be some family of canonical events in Ω and let D be the graph on vertex set [N] with ij ∈ E(D) if and only if the events $B_i$ and $B_j$ S-intersect. Then D is a negative dependency graph.

For the sake of completeness, we present a proof of Theorem 6.11 in the Appendix. The dependency graph treated there and originally proposed in [125] is even a subgraph of D, that is, the actual theorem is slightly more general. Closely related generalisations of the original results of Lu and Székely formulated in the language of hypergraph matchings have been proven in [124] and [38]. We would also like to mention that generalisations analogous to Theorem 6.11 were recently studied from an algorithmic point of view by Harris and Srinivasan [99], and by Harvey and Vondrák [100].
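The probability space of part-respecting injections, canonical events and S-intersection can be explored by exhaustive enumeration on tiny examples. The following Python sketch is an illustration under stated assumptions: parts are given as lists of labels, a bijection τ is a dictionary, and all function names are hypothetical. It computes probabilities exactly but is only feasible for very small parts.

```python
from itertools import permutations, product
from fractions import Fraction

def part_respecting_injections(X_parts, Y_parts):
    """Yield every part-respecting injection as a dict x -> y."""
    per_part = [
        [dict(zip(Xi, image)) for image in permutations(Yi, len(Xi))]
        for Xi, Yi in zip(X_parts, Y_parts)
    ]
    for choice in product(*per_part):
        sigma = {}
        for piece in choice:
            sigma.update(piece)
        yield sigma

def canonical_event_probability(X_parts, Y_parts, tau):
    """Exact probability that a uniformly random part-respecting injection extends tau."""
    total = hits = 0
    for sigma in part_respecting_injections(X_parts, Y_parts):
        total += 1
        hits += all(sigma[x] == y for x, y in tau.items())
    return Fraction(hits, total)

def s_intersect(tau1, tau2):
    """Canonical events S-intersect if their domains meet or their images meet."""
    domains_meet = bool(set(tau1) & set(tau2))
    images_meet = bool(set(tau1.values()) & set(tau2.values()))
    return domains_meet or images_meet
```

For instance, with parts X1 = {a, b}, X2 = {c} and Y1 = {1, 2, 3}, Y2 = {4, 5}, the canonical event fixing a ↦ 1 and c ↦ 4 has probability (1/3)·(1/2) = 1/6, since the injections on different parts are chosen independently.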

6.2 embedding m-partite graphs

We start our exposition by studying bounded colourings of $K_{m \otimes n}$, where we seek coloured copies of m-partite graphs with maximum degree ∆ = Ω(m). For an analysis of how the optimal bounds depend on m and ∆ and why a transition occurs at ∆ = Θ(m), we refer the reader to the introductory section of this chapter. In the present regime, we heavily rely on the m-partite structure of G and $K_{m \otimes n}$, and as the main tool apply a multidimensional version of the framework of Lu and Székely. Throughout the section, we omit the floor and ceiling signs whenever it is not critical. For convenience, we deal first with properly coloured subgraphs in locally bounded colourings.

Proof of Theorem 6.5. Let G be an m-partite graph on vertex set $U = U_1 \cup U_2 \cup \dots \cup U_m$ such that $|U_i| \le n$ and no part $U_i$ contains an edge. Let ∆ be the maximum degree of G, and c a locally k-bounded edge-colouring of $K_{m \otimes n}$, where $k = \frac{n}{48\Delta^2}$. We take the vertex set of $K_{m \otimes n}$ to be $V = V_1 \cup V_2 \cup \dots \cup V_m$ with $|V_i| = n$ for all i. We claim that there exists a properly coloured copy of G in $K_{m \otimes n}$, even with the additional constraint that each part $U_i$ is mapped into $V_i$.

Let S be the set of injections $f' : U \to V$ satisfying $f'(U_i) \subset V_i$ for all i ∈ [m], Ω the uniform probability space on S, and f a random injection drawn from Ω. To index the bad events, we set up some notation. First, fix an ordering of U. A triple $u_1$-$u_2$-$u_3$ of distinct vertices of G denotes a cherry in G, that is, a path of length two with the middle vertex $u_2$. We will only be considering cherries $u_1$-$u_2$-$u_3$ with $u_1 < u_3$. Similarly, $[v_1v_2v_3]$ denotes a monochromatic cherry in the coloured $K_{m \otimes n}$, that is, a triple for which $v_1v_2$ and $v_2v_3$ are edges that satisfy $c(v_1v_2) = c(v_2v_3)$. Note that we only require $v_1 \ne v_3$ and not an ordering between them, so that the bijection that maps $u_i$ to $v_i$ for i = 1, 2, 3 is counted exactly once. The bad events will be all events of the form

\[ B^{[v_1v_2v_3]}_{u_1\text{-}u_2\text{-}u_3} = \{ f \in S : f(u_i) = v_i \text{ for } i = 1, 2, 3 \}, \]

for all choices of cherries $u_1$-$u_2$-$u_3$ and $[v_1v_2v_3]$ satisfying the conditions above. The set of bad events is denoted by $\mathcal{B}$. As granted by Theorem 6.11, the graph on vertex set $\mathcal{B}$ with edges between $B^{[v_1v_2v_3]}_{u_1\text{-}u_2\text{-}u_3}$ and $B^{[v_1'v_2'v_3']}_{u_1'\text{-}u_2'\text{-}u_3'}$ whenever the two events S-intersect is a negative dependency graph. By definition, this occurs only when the corresponding cherries $\{u_1, u_2, u_3\}$ and $\{u_1', u_2', u_3'\}$, or $\{v_1, v_2, v_3\}$ and $\{v_1', v_2', v_3'\}$ intersect. If the former occurs, we call the events G-intersecting, and otherwise we call them K-intersecting.

Each bad event $B = B^{[v_1v_2v_3]}_{u_1\text{-}u_2\text{-}u_3}$ satisfies $P[B] \le \frac{1}{n^2(n-1)} < \frac{1}{4}$, since there are always n possibilities for choosing the image of $u_2$, and at least n(n − 1) possibilities for embedding the leaves $u_1$ and $u_3$ (as their images might lie in the same part $V_i$). By Lemma 6.10 it remains to prove for $B \in \mathcal{B}$

\[ \sum_{\substack{B' \in \mathcal{B} \\ B'\ S\text{-intersects}\ B}} P[B'] \le \frac{1}{4}. \tag{6.1} \]

Once this is shown, with positive probability none of the bad events occurs, in which case f yields a properly coloured embedding of G into $K_{m \otimes n}$.

It remains to count the S-intersecting events. Fix an event $B = B^{[v_1v_2v_3]}_{u_1\text{-}u_2\text{-}u_3}$. First we count the number $I_G(B)$ of events $B' = B^{[v_1'v_2'v_3']}_{u_1'\text{-}u_2'\text{-}u_3'}$ which are G-intersecting with B. Without loss of generality, $u_1 \in \{u_1', u_2', u_3'\}$ (note that we allow more than one vertex in the intersection). We have two cases:
(i) If $u_1$ is a leaf of $u_1'$-$u_2'$-$u_3'$, then we have ∆ choices for the apex $u_2'$ and ∆ − 1 choices for the second leaf as a neighbour of $u_2'$ in G. The ordering of the cherry $u_1'$-$u_2'$-$u_3'$ is then fixed by the requirement $u_1' < u_3'$.
(ii) If $u_1$ is the apex $u_2'$, then there are $\frac{\Delta(\Delta-1)}{2}$ choices for the two leaves, whose ordering is again predetermined.

Altogether, this gives $\frac{3}{2}\Delta(\Delta-1)$ choices for $u_1'$-$u_2'$-$u_3'$. Each cherry in G can be mapped to at most $n^2k$ monochromatic cherries in $K_{m \otimes n}$ in a part-respecting manner: there are $n^2$ ways to choose $v_1'$ and $v_2'$ inside the parts corresponding to $u_1'$ and $u_2'$, and then further k choices of $v_3'$ satisfying $c(v_1'v_2') = c(v_2'v_3')$. Summing up and multiplying by 3 to account for the fact that the intersection may occur at $u_1$, $u_2$ or $u_3$, we conclude

\[ I_G(B) \le \frac{9}{2}\Delta(\Delta-1)n^2k. \tag{6.2} \]

Next, denote the number of events which K-intersect B by $I_K(B)$. As before, let $B' = B^{[v_1'v_2'v_3']}_{u_1'\text{-}u_2'\text{-}u_3'}$ be such an event and suppose $v_1 \in \{v_1', v_2', v_3'\}$. There are n ways of choosing the preimage of $v_1$ in the corresponding part of G. Again, we distinguish two cases.
(i) If $v_1 = v_2'$ is the apex of $[v_1'v_2'v_3']$, then there are $\frac{\Delta(\Delta-1)}{2}$ ways of choosing the leaves $u_1'$ and $u_3'$.
(ii) If $v_1$ is a leaf of $[v_1'v_2'v_3']$, then there are ∆(∆ − 1) ways to complete the preimage of $v_1$ into a cherry $u_1'$-$u_2'$-$u_3'$ in G and the condition $u_1' < u_3'$ determines whether $v_1 = v_1'$ or $v_1 = v_3'$.

Having chosen $u_1'$-$u_2'$-$u_3'$, there are nk ways to complete the monochromatic cherry $[v_1'v_2'v_3']$ in $K_{m \otimes n}$, which gives at most $\frac{3}{2}\Delta(\Delta-1)n^2k$ bad events S-intersecting B at $v_1$. Since the intersection can also occur at $v_2$ and $v_3$, the bound is

\[ I_K(B) \le \frac{9}{2}\Delta(\Delta-1)n^2k. \tag{6.3} \]

Introducing these bounds, using $\frac{n}{n-1} \le \frac{4}{3}$ for n ≥ 4 and $k \le \frac{n}{48\Delta^2}$, we get

\[ \sum_{\substack{B' \in \mathcal{B} \\ B'\ S\text{-intersects}\ B}} P[B'] \le 2 \cdot \frac{9}{2}\Delta(\Delta-1)n^2k \cdot \frac{1}{n^2(n-1)} < \frac{12\Delta^2k}{n} \le \frac{1}{4}, \]

which proves (6.1).
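The arithmetic behind the constant 48 can be double-checked with exact rational arithmetic. The sketch below is only a sanity check of the final chain of inequalities, under the assumption that k is taken as the exact rational $n/(48\Delta^2)$ (floors are omitted, as in the text); the function names are hypothetical.

```python
from fractions import Fraction

def sum_bound_theorem_6_5(n, delta, k):
    """(I_G + I_K) times the event probability bound, using (6.2) and (6.3)."""
    intersecting = 2 * Fraction(9, 2) * delta * (delta - 1) * n**2 * k
    return intersecting / (n**2 * (n - 1))

def condition_holds(n, delta):
    # Assumption: k is the exact rational n / (48 * delta^2), without rounding.
    k = Fraction(n, 48 * delta**2)
    return sum_bound_theorem_6_5(n, delta, k) <= Fraction(1, 4)
```

For example, n = 4 and ∆ = 2 give k = 1/48 and a bound of 6k = 1/8, comfortably below the threshold 1/4 required by Lemma 6.10.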

We continue with the proof of our result on rainbow subgraphs in glob- ally bounded colourings.

Proof of Theorem 6.4. We follow the outline of the previous proof, but we now have to avoid the events that any two edges in our embedding of G carry the same colour. Recall that G is an m-partite graph on vertex set $U = U_1 \cup U_2 \cup \dots \cup U_m$, that is, the parts $U_i$ are independent sets with $|U_i| \le n$. The maximum degree of G is denoted by ∆. Let c be a globally k-bounded edge-colouring of $K_{m \otimes n}$, where $k = \frac{n}{110\Delta^2}$. We take the vertex set of $K_{m \otimes n}$ to be $V = V_1 \cup V_2 \cup \dots \cup V_m$ with $|V_i| = n$ for all i. We claim that there exists a rainbow copy of G in $K_{m \otimes n}$, even with the additional constraint that each part $U_i$ is mapped into $V_i$.

As before, let S denote the set of injections $f' : U \to V$ satisfying $f'(U_i) \subset V_i$ for all i. Let f be an injection chosen uniformly at random from S. The set of bad events is now extended to $\mathcal{B} \cup \mathcal{C}$, where $\mathcal{B}$ and $\mathcal{C}$ are as follows. As before, $\mathcal{B}$ is the set of events of the form $B^{[v_1v_2v_3]}_{u_1\text{-}u_2\text{-}u_3}$. Recall that $u_1$-$u_2$-$u_3$ is a cherry in G with apex $u_2$ and leaves satisfying $u_1 < u_3$, and $[v_1v_2v_3]$ is a monochromatic cherry in $K_{m \otimes n}$ with no specified ordering between the leaves.

Similarly, a quadruple $(u_1u_2)(u_3u_4)$ in G denotes two disjoint edges $u_1u_2$ and $u_3u_4$ in G such that $u_1 < u_2$, $u_3 < u_4$ and $u_1 < u_3$. A monochromatic quadruple in $K_{m \otimes n}$ is a 4-tuple $[v_1v_2v_3v_4]$ of distinct vertices $v_i \in V$ satisfying $c(v_1v_2) = c(v_3v_4)$. Subject to such choices of $u_i$ and $v_i$, we define $\mathcal{C}$ to be the set of events of the form

\[ B^{[v_1v_2v_3v_4]}_{(u_1u_2)(u_3u_4)} = \{ f \in S : f(u_i) = v_i \text{ for } i = 1, \dots, 4 \}. \]

As granted by Theorem 6.11, the graph on vertex set $\mathcal{B} \cup \mathcal{C}$ with edges between B and B′ whenever the two events S-intersect is a negative dependency graph. By definition, this occurs only when the cherries or quadruples corresponding to B and B′ intersect.

Just like before, each event $B \in \mathcal{B}$ satisfies $P[B] \le \frac{1}{n^2(n-1)} < \frac{1}{4}$, whereas events $B \in \mathcal{C}$ satisfy a stronger inequality $P[B] \le \frac{1}{n^2(n-1)^2} < \frac{1}{4}$. Equality is attained when two pairs of vertices lie in the same part of G, say $u_1, u_3 \in U_i$ and $u_2, u_4 \in U_j$, and otherwise the probability is strictly smaller. By Lemma 6.10 it remains to prove for $B \in \mathcal{B} \cup \mathcal{C}$,

\[ \sum_{\substack{B' \in \mathcal{B} \cup \mathcal{C} \\ B'\ S\text{-intersects}\ B}} P[B'] \le \frac{1}{4}. \tag{6.4} \]

Once this is shown, with positive probability none of the bad events occurs, in which case f yields a rainbow embedding of G into $K_{m \otimes n}$.

Consider a bad event $B \in \mathcal{B} \cup \mathcal{C}$. We denote the number of events in $\mathcal{B}$ which G-intersect (resp. K-intersect) B by $I_G(B)$ (resp. $I_K(B)$). Analogously, $J_G(B)$ (resp. $J_K(B)$) denotes the number of events in $\mathcal{C}$ which G-intersect (resp. K-intersect) B. If B has the form $B^{[v_1v_2v_3]}_{u_1\text{-}u_2\text{-}u_3}$, fix $u \in \{u_1, u_2, u_3\}$ and $v \in \{v_1, v_2, v_3\}$. Otherwise, if $B = B^{[v_1v_2v_3v_4]}_{(u_1u_2)(u_3u_4)}$, fix $u \in \{u_1, u_2, u_3, u_4\}$ and $v \in \{v_1, v_2, v_3, v_4\}$. Either way, each of the two vertices can be chosen in at most 4 ways (as opposed to 3 in the previous proof). We will be counting events that G-intersect or K-intersect B at u or v respectively, and then multiply the result by 4 to take into account all the possibilities. The bounds we obtain are valid in both cases, when $B \in \mathcal{B}$ as well as $B \in \mathcal{C}$.

For the events $B' \in \mathcal{B}$ of the form $B^{[v_1'v_2'v_3']}_{u_1'\text{-}u_2'\text{-}u_3'}$ S-intersecting B, we have the same count as in equations (6.2) and (6.3), with the increase by a factor of $\frac{4}{3}$ as explained above. It follows that

\[ I_G(B) \le 4 \cdot \frac{3}{2}\Delta(\Delta-1)n^2k = 6\Delta(\Delta-1)n^2k \quad \text{and} \tag{6.5} \]
\[ I_K(B) \le 6\Delta(\Delta-1)n^2k. \tag{6.6} \]

To bound $J_G(B)$, fix an event $B' = B^{[v_1'v_2'v_3'v_4']}_{(u_1'u_2')(u_3'u_4')}$ with $u \in \{u_1', u_2', u_3', u_4'\}$. In counting the choices for vertices $u_i'$ and $v_i'$, we switch back and forth between G and $K_{m \otimes n}$ to get the best bounds. There are ∆ possible choices for the vertex u′ neighbouring u, and $n^2$ choices for the images v and v′ of u and u′ in the corresponding parts. Then we fix a pair $\{\tilde{v}, \tilde{v}'\}$ such that $c(\tilde{v}\tilde{v}') = c(vv')$, which can be done in k ways, since the colouring is globally k-bounded. The preimage of $\tilde{v}$ and $\tilde{v}'$ can again be chosen in at most n∆ ways. The ordering of vertices in G now uniquely determines the event $B^{[v_1'v_2'v_3'v_4']}_{(u_1'u_2')(u_3'u_4')}$. Putting the numbers together, and taking into account at most four choices of u from $B = B^{[v_1v_2v_3]}_{u_1\text{-}u_2\text{-}u_3}$ or $B = B^{[v_1v_2v_3v_4]}_{(u_1u_2)(u_3u_4)}$ gives

\[ J_G(B) \le 4n^2\Delta k n\Delta = 4\Delta^2n^3k. \tag{6.7} \]

To control $J_K(B)$, once again fix an event $B' = B^{[v_1'v_2'v_3'v_4']}_{(u_1'u_2')(u_3'u_4')}$ satisfying $v \in \{v_1', v_2', v_3', v_4'\}$. We start by choosing the preimage u of v among the n possible vertices. From there, the argument is the same as for $J_G(B)$, implying $J_K(B) \le 4\Delta^2n^3k$.

Summing up and using $k \le \frac{n}{110\Delta^2}$, we get for n ≥ 5

\[ \sum_{\substack{B' \in \mathcal{B} \cup \mathcal{C} \\ B'\ S\text{-intersects}\ B}} P[B'] \le \frac{12\Delta(\Delta-1)n^2k}{n^2(n-1)} + \frac{8\Delta^2n^3k}{n^2(n-1)^2} < \left(12 \cdot \frac{5}{4} + 8 \cdot \frac{25}{16}\right)\frac{\Delta^2k}{n} \le \frac{1}{4}. \]
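As with the previous proof, the role of the constant 110 can be checked numerically: $12 \cdot \frac{5}{4} + 8 \cdot \frac{25}{16} = 27.5$ and $27.5/110 = \frac{1}{4}$. The following sketch evaluates the two summands exactly; it assumes k equals the rational $n/(110\Delta^2)$ without rounding, and the function names are illustrative.

```python
from fractions import Fraction

def sum_bound_theorem_6_4(n, delta, k):
    """Cherry events (6.5)-(6.6) plus quadruple events (6.7), weighted by their probabilities."""
    cherries = Fraction(12 * delta * (delta - 1) * n**2) * k / (n**2 * (n - 1))
    quadruples = Fraction(8 * delta**2 * n**3) * k / (n**2 * (n - 1)**2)
    return cherries + quadruples

def condition_holds(n, delta):
    # Assumption: k is the exact rational n / (110 * delta^2), without rounding.
    k = Fraction(n, 110 * delta**2)
    return sum_bound_theorem_6_4(n, delta, k) <= Fraction(1, 4)
```

Plugging in the constant 48 from the locally bounded case instead of 110 makes the same expression exceed 1/4 (for instance at n = 100, ∆ = 10), so with this counting the additional quadruple events account for the weaker constant.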

To conclude this section, we show that Theorems 6.4 and 6.5 are the best we can hope for up to a constant factor. The proof relies on some ideas from [140].

Proof of Proposition 6.6. As stated, let q ≥ m be a prime power, and let $V(G) = P \cup L \cup T_1 \cup T_2 \cup \dots \cup T_{q^2+q} \cup S_1 \cup S_2 \cup \dots \cup S_{q^2+q}$, where $P = \{p_0, p_1, \dots, p_{q^2+q}\}$ and $L = \{\ell_0, \ell_1, \dots, \ell_{q^2+q}\}$ correspond to the points and the lines of the finite projective plane PG(2, q), respectively. Moreover, for any $i \in [q^2+q]$, both the set $T_i$ and the set $S_i$ induce a clique of order m − 1. We join $p_0$ and $p_1$ to all the vertices of $T_1$, $p_1$ and $p_2$ to all the vertices of $T_2$, and so on, up to $p_{q^2+q-1}$, $p_{q^2+q}$ and $T_{q^2+q}$. Analogously, we connect the lines $\ell_{j-1}$ and $\ell_j$ to the vertices of the clique $S_j$ for every $j \in [q^2+q]$. Finally, we join $p_i$ to $\ell_j$ when the point $p_i$ belongs to the line $\ell_j$.

Note that our construction forces any embedding of G into $K_{m \otimes n}$ (regardless of the colouring) to place all points into the same part of $K_{m \otimes n}$, and all lines into the same part. Indeed, suppose there are two points $p_i$, $p_{i+1}$ which are placed into two different parts of $K_{m \otimes n}$. Then, since they are both connected to a clique $T_{i+1}$ of size m − 1, the vertices of this clique cannot be placed into the remaining m − 2 parts.

We now show that G is indeed m-partite, i.e. we can partition the vertices of G into independent sets $U_1, \dots, U_m$. To split $P \cup T_1 \cup T_2 \cup \dots \cup T_{q^2+q}$, we set $P \subset U_1$, and let each of the parts $U_2, \dots, U_m$ contain one vertex of each clique $T_i$. Vertices in $L \cup S_1 \cup S_2 \cup \dots \cup S_{q^2+q}$ are split in the analogous way, with say $L \subset U_2$. And indeed, each part of G has at most $2(q^2+q+1) \le 3q^2$ vertices. Finally, the maximum degree of G is ∆(G) = (q + 1) + (2m − 2) ≤ q + 2m, as stated.

The locally $\lceil \frac{n}{q^2+q} \rceil$-bounded colouring of $K_{m \otimes n}$ which we now describe is an analogue of the colouring from [140, Lemma 29], motivated by the canonical colourings of $K_n$. Let $V_1, V_2, \dots, V_m$ be the parts of $K_{m \otimes n}$, and let each part $V_i$ be split into $q^2+q$ clusters $V_{ij}$ as evenly as possible. If x and y are vertices of $K_{m \otimes n}$ with $x \in V_{ij}$, $y \in V_{i'j'}$ and i < i′, then the edge xy gets colour $(x, V_{i'j'})$. In other words, if we order the parts $V_1, \dots, V_m$ vertically downward, the colours are indexed by $(x, V_{i'j'})$, where x lies strictly above $V_{i'}$ and each downward fan from x to a cluster $V_{i'j'}$ gets its own unique colour. The resulting colouring c is globally bounded by the order of the clusters, that is by $\frac{n}{q^2+q}$.

Suppose there is a properly coloured embedding $f : V(G) \to V(K_{m \otimes n})$. Then $f(P) \subset V_i$ and $f(L) \subset V_j$, so, without loss of generality, i < j. By the pigeonhole principle, two lines, say $\ell_0$ and $\ell_1$, lie inside the same cluster. But then the cherry formed by $\ell_0$, $\ell_1$ and the representative of their intersection in PG(2, q) ⊂ G is monochromatic by construction of c.
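The cluster colouring used in this proof is easy to build explicitly for small parameters. The sketch below is an illustration under assumptions that are not in the text: vertices are encoded as pairs (part index, position), the cluster of a vertex is its position modulo the number of clusters (one way of splitting "as evenly as possible"), and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cluster_colouring(m, n, clusters):
    """Colour K_{m (x) n}: an edge xy with x in the earlier part gets colour (x, cluster of y)."""
    colouring = {}
    for i, i2 in combinations(range(m), 2):  # pairs of parts with i < i2
        for t in range(n):
            for t2 in range(n):
                x, y = (i, t), (i2, t2)
                # The colour records x together with the part and cluster index of y.
                colouring[(x, y)] = (x, i2, t2 % clusters)
    return colouring

def global_bound(colouring):
    """Largest number of edges sharing one colour."""
    return max(Counter(colouring.values()).values())
```

With q = 2 there are $q^2 + q = 6$ clusters per part; for m = 3 and n = 14 the largest colour class has $\lceil 14/6 \rceil = 3$ edges, and any two vertices in the same cluster form a monochromatic cherry with every vertex in an earlier part, which is exactly the obstruction used in the final paragraph of the proof.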

6.3 embedding bounded-degree hypergraphs

This section is concerned with bounded edge-colourings of r-uniform hypergraphs. As in Theorems 6.2, 6.3, 6.4 and 6.5, we establish upper bounds on k so that any locally (globally) k-bounded colouring of $K_n^{(r)}$ contains a properly coloured (rainbow) copy of a given hypergraph G. This result is given in Theorem 6.9. We also complement this result by showing that the dependence on n in the bounds on k is asymptotically the best possible.

Proof of Theorem 6.9. Let G be as in the statement, an r-uniform hypergraph satisfying $\Delta_{\ell+1}(G) = 1$ and $\Delta_i(G) = \Delta_i$ for $0 \le i \le \ell$. The vertex set of G is U, of order at most n, and the vertex set of the complete r-graph $K_n^{(r)}$ is V. We consider a colouring c of $K_n^{(r)}$, on which we impose different local or global restrictions.

We set up some notation. In the hypergraph setting, it is more convenient for us to index the bad events in terms of edges. Firstly, fix a total ordering on the subsets of U of order r, which induces an ordering on the edges of G. For $i \in \{0, 1, \dots, \ell\}$, a cherry in G of overlap i is an ordered pair of edges $(e_1, e_2)$ satisfying $e_1 < e_2$ and $|e_1 \cap e_2| = i$. A vertex $u \in e_1 \cap e_2$ is called an apex vertex. Let S be the set of all injections from U to V. We will now define the canonical bad events in the uniform probability space on S. Let $(e_1, e_2)$ be a cherry of overlap i and $\tau : e_1 \cup e_2 \to V$ an injection satisfying $c(\tau(e_1)) = c(\tau(e_2))$. Notice that since τ is injective, the images $\tau(e_1) = \{\tau(v) : v \in e_1\}$ and $\tau(e_2)$ are edges of $K_n^{(r)}$, and the colours $c(\tau(e_1))$, $c(\tau(e_2))$ are well-defined. Moreover, $|\tau(e_1) \cap \tau(e_2)| = i$ is preserved. We define the corresponding canonical bad event (of overlap i)

\[ B^{(e_1,e_2)}_{\tau} = \{ f \in S : f(u) = \tau(u) \text{ for all } u \in e_1 \cup e_2 \}. \]

We denote the set of all bad events of overlap i by $\mathcal{B}_i$. Note that in all of the definitions above, we also allow i = 0, corresponding to the case of disjoint edges.

We first consider a locally k-bounded colouring c and prove statement (i), namely that c is G-proper for $k = O\left(\frac{n^{r-\ell}}{\Delta_1\Delta_\ell}\right)$. Let $\mathcal{B} = \mathcal{B}_1 \cup \dots \cup \mathcal{B}_\ell$ be the set of bad events. Theorem 6.11 grants that the graph on vertex set $\mathcal{B}$ with edges between $B^{(e_1,e_2)}_{\tau}$ and $B^{(e_1',e_2')}_{\tau'}$ whenever the two events S-intersect is a negative dependency graph. By definition, this occurs only when $e_1' \cup e_2'$ intersects $e_1 \cup e_2$, or $\tau'(e_1') \cup \tau'(e_2')$ intersects $\tau(e_1) \cup \tau(e_2)$. If the former occurs, we call the events G-intersecting, and otherwise we call them K-intersecting. In particular, if u lies in $(e_1' \cup e_2') \cap (e_1 \cup e_2)$, we say the two events G-intersect at u, and analogously we say the two events K-intersect at v, if v lies in the intersection of $\tau'(e_1') \cup \tau'(e_2')$ and $\tau(e_1) \cup \tau(e_2)$.

Each bad event $B = B^{(e_1,e_2)}_{\tau}$ satisfies $P[B] \le \frac{1}{n(n-1)\cdots(n-2r+\ell+1)} < \frac{1}{4}$. Equality holds when B is of overlap $\ell$, and otherwise the probability is strictly smaller. By Lemma 6.10, if every $B \in \mathcal{B}$ satisfies

\[ \sum_{\substack{B' \in \mathcal{B} \\ B'\ S\text{-intersects}\ B}} P[B'] \le \frac{1}{4}, \tag{6.8} \]

then with positive probability all events in $\mathcal{B}$ are avoided, i.e. f(G) is a properly coloured copy of G in $K_n^{(r)}$.

Fix an event $B = B^{(e_1,e_2)}_{\tau}$ and vertices $u \in e_1 \cup e_2$, $v \in \tau(e_1) \cup \tau(e_2)$. We define $I_G(B, i, u)$ to be the number of events B′ of overlap i satisfying $u \in e_1' \cup e_2'$. Similarly, $I_K(B, i, v)$ is the number of events B′ of overlap i satisfying $v \in \tau'(e_1') \cup \tau'(e_2')$. Note that there are at most 2r − i ≤ 2r choices of the vertex u and similarly 2r − i ≤ 2r choices of v.

Claim 1. For all $i \in [\ell]$,

\[ I_G(B, i, u) = O\left(n^r\Delta_1\Delta_ik\right) \quad \text{and} \quad I_K(B, i, v) = O\left(n^r\Delta_1\Delta_ik\right). \]

To see the first inequality, there are at most $\Delta_1$ ways of choosing an edge e ∈ E(G) containing u, and at most $\binom{r}{i}$ ways to select the apex vertices from e. The vertices of e can be mapped to any r-tuple of vertices in V, for which there are $n(n-1)\cdots(n-r+1) \le n^r$ choices. From there, there are at most $\Delta_i$ ways to choose the other edge in G forming a cherry of overlap i with e, and whether $e = e_1'$ or $e = e_2'$ is determined by the ordering of edges. Without loss of generality, $e = e_1'$. Finally, as $\tau'(e_1')$ is fixed, there are at most k choices for $\tau'(e_2')$ which form a monochromatic cherry in $K_n^{(r)}$ with it. The number of orderings of $\tau'(e_2')$ is absorbed into the constant, so altogether $I_G(B, i, u) = O\left(n^r\Delta_1\Delta_ik\right)$.

For the second inequality, there are n ways to select the preimage $\tau^{-1}(v) \in U$, and $\Delta_1$ ways to extend it into an edge e of G. Then we fix the image τ(w) for all vertices $w \in e \setminus \{\tau^{-1}(v)\}$, for which there are at most $(n-1)(n-2)\cdots(n-r+1) \le n^{r-1}$ ways. Now we are in the same position as above, so there are further $O(\Delta_ik)$ ways to completely determine B′. Multiplying the different counts, we get $I_K(B, i, v) = O\left(n^r\Delta_1\Delta_ik\right)$.

Claim 2. For all $i \in [\ell]$,

\[ \sum_{\substack{B' \in \mathcal{B}_i \\ B'\ S\text{-intersects}\ B}} P[B'] = O\left(n^{\ell-r}\Delta_1\Delta_\ell k\right). \tag{6.9} \]

Indeed, it is easy to see that $\Delta_{i-1}(G) \le (n - i + 1)\Delta_i(G)$ for every $i$ and every hypergraph $G$. Namely, for a vertex set $A \subset U$ with $|A| = i - 1$ and any vertex $v \in U \setminus A$, $\deg_G(A \cup \{v\}) \le \Delta_i(G)$ holds by the definition of $\Delta_i$. Summing up over all the choices of $v$ gives $\deg_G(A) \le (n - i + 1)\Delta_i(G)$. Iterating the inequality yields $\Delta_i(G) = O(n^{\ell - i} \Delta_\ell)$ for $i \le \ell$. An event of overlap $i$ has probability exactly $\frac{1}{n(n-1)\cdots(n-2r+i+1)} = O(n^{-2r+i})$. Summing up $I_G(B, i, u)$ and $I_K(B, i, v)$ over all $i \in [\ell]$ and vertices $u \in U$ and $v \in V$, we get

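$$\sum_{\substack{B' \in \mathcal{B} \\ B' \text{ $S$-intersects } B}} \mathbb{P}[B'] = \sum_{i \in [\ell]} O\left(n^r k \Delta_1 n^{\ell - i} \Delta_\ell\right) \cdot O\left(n^{-2r+i}\right) = O\left(n^{\ell - r} \Delta_1 \Delta_\ell k\right),$$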

as required. Therefore, setting $k = \frac{c_1 n^{r-\ell}}{\Delta_1 \Delta_\ell}$ for $c_1 > 0$ sufficiently small, we get (6.8), which completes the proof of part (i).

For part (ii), suppose $c$ is globally $k$-bounded, where $k = \frac{c_2 n^{r-\ell}}{\Delta_1 \Delta_\ell}$ and $c_2 < c_1$ is a positive real we determine later. Note that since $c_2 < c_1$, the equation (6.9) still holds. It is therefore enough to prove a bound analogous to the one in Claim 1 for the number of bad events $B'$ of overlap zero. To be precise, $f : U \to V$ is again a random injection, and the set of bad events is now $\mathcal{B} \cup \mathcal{B}_0$, where $\mathcal{B}_0$ contains the events of overlap zero. The negative dependency graph has edges exactly between the pairs of $S$-intersecting events in $\mathcal{B} \cup \mathcal{B}_0$. Fix an event $B = B_\tau^{(e_1, e_2)}$, vertices $u \in e_1 \cup e_2$ and $v \in \tau(e_1) \cup \tau(e_2)$, and define $I_G(B, i, u)$ and $I_K(B, i, v)$ just like before.

Claim 3. $I_G(B, 0, u) + I_K(B, 0, v) = O(n^r \Delta_0 \Delta_1 k)$.

Recall that $\Delta_0$ is just the number of edges of $G$. The count is exactly like before. For $I_G(B, 0, u)$, we select an edge $e$ of $G$ containing $u$ and an injection $\tau' : e \to V$ in $O(\Delta_1 n^r)$ ways, and then another edge and its image which matches $c(\tau'(e))$ in $O(\Delta_0 k)$ ways. Multiplying gives $I_G(B, 0, u) = O(n^r \Delta_0 \Delta_1 k)$. The same holds for $I_K(B, 0, v)$: we can select an edge $e$ and an injection $\tau' : e \to V$ so that $v \in \tau'(e)$ in $O(\Delta_1 n^r)$ ways, and then we are in the same position as above. Summing up completes the proof of Claim 3.

Substituting $\Delta_0 = O(n^\ell \Delta_\ell)$ and $\mathbb{P}[B'] = O(n^{-2r})$ for events $B'$ of overlap 0 gives

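$$\sum_{\substack{B' \in \mathcal{B} \cup \mathcal{B}_0 \\ B' \text{ $S$-intersects } B}} \mathbb{P}[B'] = O\left(n^{\ell - r} \Delta_1 \Delta_\ell k\right). \qquad (6.10)$$

We set $k = O\left(\frac{n^{r-\ell}}{\Delta_1 \Delta_\ell}\right)$. Lemma 6.10 implies that then there exists an embedding $f$ avoiding all events in $\mathcal{B} \cup \mathcal{B}_0$, i.e. a rainbow embedding of $G$.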

We conclude the section with two constructions. Firstly, we show that the power $k = O(n^{r-\ell})$ is the highest power of $n$ for which we can guarantee an embedding. Secondly, for $\ell = r - 1$ and $\Delta_1(G)\Delta_{r-1}(G) = \Theta(n)$, we cannot hope for a stronger embedding result than Theorem 6.9.

For the first construction, let $D_\ell^{(r)}(m)$ be an $m$-vertex $r$-uniform hypergraph such that each set of $\ell + 1$ vertices is contained in exactly one edge. Note that a recent result of Keevash on the existence of designs [107] grants existence of such hypergraphs whenever $m$ is sufficiently large and the parameters $r$, $\ell$ and $m$ satisfy all the necessary divisibility conditions. Also note that each $\ell$-subset of the vertices of $D_\ell^{(r)}(m)$ is contained in $(m - \ell)/(r - \ell)$ edges, since each of the $m - \ell$ remaining vertices extends it to an $(\ell + 1)$-set lying in exactly one edge, and each such edge is counted $r - \ell$ times. In fact, we do not need a design, but only a hypergraph in which all the vertex subsets of order $\ell$ are contained in at least two edges, but each $(\ell + 1)$-subset is contained in at most one edge. Such hypergraphs can be constructed probabilistically using standard nibble techniques.
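As a small sanity check of the degree conditions just described, the following sketch counts codegrees in the Fano plane, which is a hypergraph of the required type with $r = 3$, $\ell = 1$ and $m = 7$. The choice of the Fano plane and the helper name `codegrees` are illustrative assumptions, not part of the thesis.

```python
from itertools import combinations

# Fano plane: a 3-uniform hypergraph on 7 vertices in which every pair of
# vertices lies in exactly one edge (so r = 3, l = 1, m = 7).
# Illustrative choice; the thesis only needs some hypergraph of this type.
FANO = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]

def codegrees(edges, m, size):
    """Map each vertex subset of the given size to the number of edges containing it."""
    return {
        subset: sum(1 for e in edges if set(subset) <= set(e))
        for subset in combinations(range(m), size)
    }

pair_degrees = codegrees(FANO, 7, 2)    # degrees of (l + 1)-subsets
vertex_degrees = codegrees(FANO, 7, 1)  # degrees of l-subsets

print(set(pair_degrees.values()), set(vertex_degrees.values()))
```

Here every pair lies in exactly one edge and every vertex lies in three edges, so each $\ell$-subset is in at least two edges, which is the only property the construction below uses.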

Proposition 6.12. Let $r$, $\ell$ and $m$ be integers such that there exists a hypergraph $D_\ell^{(r)}(m)$. For any $n \ge m$ there exists a globally $n^{r-\ell}$-bounded colouring of $K_n^{(r)}$ which contains no properly coloured copy of $D_\ell^{(r)}(m)$.

Proof. Let $V$ be the vertex set of $K_n^{(r)}$ with a given ordering, and let $c$ be the following colouring of the edges of $K_n^{(r)}$ using $\binom{n}{\ell}$ colours. For each $v_1 < v_2 < \cdots < v_r \in V$, set

$$c(v_1, v_2, \ldots, v_r) = \{v_1, v_2, \ldots, v_\ell\}.$$

That is, the edges of $K_n^{(r)}$ are coloured so that the colour of each edge is uniquely determined by the first $\ell$ vertices. The colouring $c$ is globally $n^{r-\ell}$-bounded. Suppose there is a properly coloured copy of $D_\ell^{(r)}(m)$ in $c$, and let $v_1, v_2, \ldots, v_\ell$ be the minimal vertices in this copy. But then there are two edges $e_1$ and $e_2$ in this embedding of $G$ containing $v_1, v_2, \ldots, v_\ell$, so $c(e_1) = c(e_2) = \{v_1, v_2, \ldots, v_\ell\}$, which is a contradiction.
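The argument can be checked exhaustively in a tiny case. The sketch below assumes $n = m = 7$, $r = 3$, $\ell = 1$ and uses the Fano plane as the design; the function names are hypothetical. It verifies the global bound on colour classes and then tries every bijection of the Fano plane into $[n]$.

```python
from itertools import combinations, permutations
from collections import Counter

FANO = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]

def colour(edge, l):
    """Colour of an edge: the tuple of its l smallest vertices."""
    return tuple(sorted(edge))[:l]

def is_properly_coloured(edges, l):
    """True if no two intersecting edges receive the same colour."""
    return not any(
        set(e1) & set(e2) and colour(e1, l) == colour(e2, l)
        for e1, e2 in combinations(edges, 2)
    )

n, r, l = 7, 3, 1

# Size of the largest colour class over all r-subsets of range(n).
class_sizes = Counter(colour(e, l) for e in combinations(range(n), r))
largest_class = max(class_sizes.values())

# Count the bijections f under which the image of the Fano plane is properly coloured.
proper_copies = sum(
    is_properly_coloured([tuple(f[v] for v in e) for e in FANO], l)
    for f in permutations(range(n))
)
print(largest_class, n ** (r - l), proper_copies)
```

In this small case no bijection yields a properly coloured copy, because the smallest image vertex always lies in three edges of the same colour.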

The second construction shows that the dependence on the degrees of $G$ in Theorem 6.9 is tight, at least for $\ell = r - 1$.

Proposition 6.13. For any $r \ge 2$, there is an $r$-uniform hypergraph $G$ on $n$ vertices satisfying $\Delta_1(G)\Delta_{r-1}(G) = \Theta(n)$, and a globally $O(1)$-bounded colouring of $K_n^{(r)}$ which is not $G$-proper.

Proof. Let $n_1$ be a natural number and $n = 1 + n_1 + \binom{n_1}{r-1} n_1 = \Theta(n_1^r)$. Let $G$ have vertex set $V = \{v\} \cup L_1 \cup L_2$. Here the $|L_1| = n_1$ vertices of $L_1$ form the first level, which means that each $(r-1)$-tuple in $L_1$ forms an edge with the root $v$. Furthermore, each $(r-1)$-tuple $S \subset L_1$ has its own $n_1$ children belonging to $L_2$. In other words, for each $u \in L_2$, there is a unique $(r-1)$-tuple $S \subset L_1$ such that $S \cup \{u\}$ is an edge. Indeed, $G$ has the 1-degree $\Delta_1(G) = \Theta(n_1^{r-1})$ attained by $v$ and the vertices in the first level. Moreover, $\Delta_{r-1}(G) \le n_1 + 1$ by looking at any $r - 1$ vertices in $L_1 \cup \{v\}$: an $(r-1)$-tuple $S \subset L_1$ lies in the edge $S \cup \{v\}$ and in one edge for each of its $n_1$ children.

Next, we give the promised colouring $c$ of $K_n^{(r)}$. Let $[n]$ be partitioned into sets $S_1, S_2, \ldots, S_{n/(r+1)}$ of order $r + 1$. An edge $\{u_1, u_2, \ldots, u_r\}$ satisfying $u_1 \in S_{i_1}$, $u_2 \in S_{i_2}$, up to $u_r \in S_{i_r}$ gets the colour $\{i_1, i_2, \ldots, i_r\}$ (viewed as a multiset). The colouring is globally $(r+1)^r$-bounded. Suppose that the bijection $f : V \to [n]$ induces a properly coloured copy of $G$, and that, without loss of generality, the image of $v$ lies in $S_1$. If the other $r$ members of $S_1$ lie in $f(L_1)$, then together with $f(v)$ they span $r$ edges of colour $\{1, 1, \ldots, 1\}$. Otherwise, let $w \in L_2$ satisfy $f(w) \in S_1$. There is an $(r-1)$-tuple $S \subset L_1$ which forms an edge in $G$ with $w$, and an edge with $v$. The edges $f(S \cup \{v\})$ and $f(S \cup \{w\})$ have the same colour in the embedding given by $f$.
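The global bound on the block colouring is easy to confirm numerically. The sketch below assumes $r = 3$ and $n = 12$ (three blocks of size $r + 1 = 4$) and represents a multiset colour as a sorted tuple; the function name `block_colouring` is hypothetical.

```python
from itertools import combinations
from collections import Counter

def block_colouring(n, r):
    """Colour each r-subset of range(n) by the multiset of block indices of its
    vertices, where block k consists of vertices k*(r+1), ..., k*(r+1) + r."""
    assert n % (r + 1) == 0, "n must be divisible by the block size r + 1"
    return {
        edge: tuple(sorted(v // (r + 1) for v in edge))
        for edge in combinations(range(n), r)
    }

r, n = 3, 12
sizes = Counter(block_colouring(n, r).values())
max_class = max(sizes.values())
print(max_class, (r + 1) ** r)
```

The largest class is attained by colours with $r$ distinct block indices, where each vertex can be chosen independently from its block.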

concluding remarks

We have studied the problem of finding a rainbow and properly coloured copy of a graph/hypergraph $G$ with some degree restrictions in a bounded $k$-colouring of the complete multipartite graph/hypergraph. We obtained upper bounds on $k$, in terms of the maximum degree/$\ell$-degree of $G$, which guarantee that locally and globally $k$-bounded colourings are $G$-proper and $G$-rainbow, respectively. Moreover, for multipartite graphs, the dependence of $k$ on other parameters in our bounds is the best possible up to a constant factor. However, there are several natural questions which remain open. Here we mention the two that we find the most interesting. We state them only for properly coloured copies of hypergraphs in locally bounded colourings, but the analogues for globally bounded colourings seeking rainbow copies are just as interesting.

The first question asks for the correct asymptotics of $k$ that guarantee a properly coloured copy of a tight Hamilton cycle in locally $k$-bounded colourings of $K_n^{(r)}$. What is the largest possible $k$ such that any locally $k$-bounded colouring of $K_n^{(r)}$ is $C_n^{r}(r-1)$-proper? In particular, for $r = 3$, is there an $\varepsilon > 0$ such that any locally $O(n^{1+\varepsilon})$-bounded colouring of $K_n^{(3)}$ contains a properly coloured tight Hamilton cycle?

We have shown that the dependence on $n$ in Theorem 6.9 is the best possible. However, apart from the case $\ell = r - 1$, we do not know if the dependence on the maximum degrees $\Delta_1, \Delta_2, \ldots, \Delta_\ell$ of $G$ is the correct one or not. The first unresolved case is that of 3-uniform linear hypergraphs, i.e. hypergraphs $G$ satisfying $\Delta_2(G) = 1$. Are there 3-uniform $n$-vertex linear hypergraphs $G$ and $O\left(\frac{n^2}{\Delta_1(G)^2}\right)$-bounded colourings which are not $G$-proper?

7 INTERVALS IN THE HALES–JEWETT THEOREM

The Hales–Jewett theorem [97] is one of the central results in Ramsey theory. Quoting Graham, Rothschild and Spencer [91], it “strips van der Waerden’s theorem of its unessential elements and reveals the heart of Ramsey theory. It provides a focal point from which many results can be derived and acts as a cornerstone for much of the more advanced work.”

To state the theorem requires some notation. Given natural numbers $m$ and $n$, let $[m]^n$ be the collection of all $n$-letter words, where each letter is taken from the alphabet $[m] = \{1, 2, \ldots, m\}$. Given a word $w$ from $[m]^n$, a subset $S$ of $[n]$ and an element $i$ of $[m]$, let $w(S, i)$ be the word obtained from $w$ by replacing the $j$th letter with $i$ for all $j$ in $S$. A combinatorial line in $[m]^n$ with active set $S \ne \emptyset$ is then a subset of the form $\{w(S, 1), w(S, 2), \ldots, w(S, m)\}$.

Theorem 7.1 (Hales–Jewett theorem). For any natural numbers $m$ and $r$, there exists a natural number $n$ such that any $r$-colouring of the elements of $[m]^n$ contains a monochromatic combinatorial line.

For $m = 2$, the Hales–Jewett theorem is simple to prove. Consider all sequences of length $r$ of the form $11 \ldots 122 \ldots 2$, that is, a string of 1s followed by a string of 2s. Since there are $r + 1$ different sequences, the pigeonhole principle implies that two of them must receive the same colour. If the first of these sequences switches from 1s to 2s after the $i$th letter and the second switches after the $j$th letter with $j > i$, then these two sequences form a monochromatic combinatorial line whose active set is the interval $[i + 1, j]$.

Given a word $w$ from $[m]^n$, disjoint subsets $S_1, \ldots, S_q$ of $[n]$ and elements $i_1, \ldots, i_q$ of $[m]$, let $w(S_1, i_1; \ldots; S_q, i_q)$ be the word obtained from $w$ by replacing the $j$th letter with $i_k$ if $j$ is in $S_k$. For $m = 3$, the first step in proving the Hales–Jewett theorem is to show that for $n$ sufficiently large there is an $r$-subcube in which the colours are indifferent to the replacement of 2s by 3s.
That is, there is a word $w \in [3]^n$ and disjoint intervals $S_1, \ldots, S_r$ of $[n]$ such that for any $T \subseteq [r]$ and any $i_1, \ldots, i_r \in [3]$, the word $w(S_1, j_1'; \ldots; S_r, j_r')$ obtained by letting $j_t' = 2$ for all $t \in T$ and $j_t' = i_t$ for all $t \notin T$ has the same colour as the word $w(S_1, j_1''; \ldots; S_r, j_r'')$ defined analogously by letting $j_t'' = 3$ for all $t \in T$ and $j_t'' = i_t$ for all $t \notin T$. That is, regardless of how the intervals $S_1, \ldots, S_r$ are filled, we may switch the label on any subset of the intervals from 2 to 3 without changing the colour of the word.

To complete the proof, we consider the $r$-colouring of $[2]^r$ where the word $v = v(1) \ldots v(r)$ receives the colour of the word $w(S_1, v(1); \ldots; S_r, v(r))$. By the $m = 2$ case of the theorem, there is a monochromatic combinatorial line determined by an active set $T \subseteq [r]$. This implies that there are $i_1, \ldots, i_r \in [2]$ such that the word $w(S_1, j_1; \ldots; S_r, j_r)$ with $j_t = 1$ for all $t \in T$ and $j_t = i_t$ for all $t \notin T$ has the same colour as the word $w(S_1, j_1'; \ldots; S_r, j_r')$ with $j_t' = 2$ for all $t \in T$ and $j_t' = i_t$ for all $t \notin T$. But we already know that this latter word has the same colour as the word $w(S_1, j_1''; \ldots; S_r, j_r'')$ with $j_t'' = 3$ for all $t \in T$ and $j_t'' = i_t$ for all $t \notin T$. Therefore, we have a monochromatic combinatorial line with active set $S = \bigcup_{t \in T} S_t$.

In particular, this proof shows that it is possible to find monochromatic combinatorial lines in $[3]^n$ where the active set has a comparatively simple structure: it is the union of at most $r$ intervals. The main result of this chapter says that there are situations where one can do no better, suggesting that the proof strategy described above is, at least in some sense, necessary.

Theorem 7.2. For any $n$ and any odd $r > 1$, there is an $r$-colouring of $[3]^n$ containing no monochromatic combinatorial line whose active set is the union of fewer than $r$ intervals.

Proof. Fix a vector $(t_1, t_2, t_3) \in \mathbb{Z}_r^3$ and, for a word $w \in [3]^n$, let $T'(w) = \sum_{j \in [n]} t_{w(j)}$. In words, $t$ assigns a weight to each letter in $[3]$ and $T'(w)$ is then the sum of the weights over all letters of $w$, where the sum is taken modulo $r$. Let the word $\overline{w}$ be obtained from $w$ by contracting each interval on which $w$ is constant to a single letter. Set $T(w) = T'(\overline{w})$. Finally, we construct the word $w^+$ by inserting a letter 1 at the start and end of $w$. Our colouring of $[3]^n$ will be $T^+(w) = T(w^+)$.
For example, for $w = 11122133$, we have $w^+ = 1111221331$, $\overline{w^+} = 12131$, and $T^+(w) = T(w^+) = t_1 + t_2 + t_1 + t_3 + t_1$. We claim that for $t_1 = t_3 = 2$ and $t_2 = -1$, the colouring $T^+ : [3]^n \to \mathbb{Z}_r$ contains no monochromatic combinatorial line whose active set is the union of fewer than $r$ intervals.

Let us introduce some more notation. Consider a combinatorial line $(x_1, x_2, x_3)$ with $x_i = w(S, i)$, where $S$ is a union of $q$ disjoint non-consecutive intervals in $[n]$. Although any word that coincides with $x_i$ outside the set $S$ can be chosen as the representative $w$, let us set $w = x_1$ to avoid ambiguity. Outside $S$, the word $w^+$ consists of a collection of non-empty subwords $w_0, w_1, \ldots, w_q$, where $w_{j-1}$ precedes $w_j$ for all $j = 1, \ldots, q$. Note that the subwords $w_0$ and $w_q$ are non-empty by the construction of $w^+$.
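The colouring $T^+$ is simple to compute. The following sketch reproduces the worked example $w = 11122133$ with $t_1 = t_3 = 2$, $t_2 = -1$; taking $r = 3$ and the function names `contract` and `T_plus` are assumptions made for illustration.

```python
from itertools import groupby

def contract(word):
    """Collapse each maximal constant run of letters to a single letter."""
    return [letter for letter, _ in groupby(word)]

def T_plus(word, weights, r):
    """Colour of `word`: pad with a 1 on both sides, contract, and sum the
    letter weights modulo r."""
    padded = [1] + list(word) + [1]
    return sum(weights[letter] for letter in contract(padded)) % r

weights = {1: 2, 2: -1, 3: 2}          # t1 = t3 = 2, t2 = -1
w = [1, 1, 1, 2, 2, 1, 3, 3]           # the word 11122133
contracted = contract([1] + w + [1])   # contraction of w+
colour = T_plus(w, weights, r=3)
print(contracted, colour)
```

The contraction is 12131, so the weights sum to $2 - 1 + 2 + 2 + 2 = 7$, which is 1 modulo 3.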

Denote the first letter of $w_j$ by $f_j$ and the last letter by $\ell_{j+1}$ (such an indexing will be more convenient below). We now show that the difference $T(x_i^+) - (T(w_0) + \cdots + T(w_q))$ depends only on $i$ and the letters $\ell_j$, $f_j$.

Claim 11. For any $t_1, t_2, t_3 \in \mathbb{Z}_r$ and $i \in [3]$,

$$T^+(x_i) = T(x_i^+) = T(w_0) + h_i(\ell_1, f_1) + T(w_1) + h_i(\ell_2, f_2) + \cdots + h_i(\ell_q, f_q) + T(w_q), \qquad (7.1)$$

where the $h_i(\ell, f)$ are $\mathbb{Z}_r$-valued functions specified in the proof.

Proof. We write $y_i = x_i^+$ for $i \in [3]$ so that the identity (7.1) is just a statement about how $T$ can be computed from the decomposition of the word $y_i$. Let us first take $q = 1$. There are essentially two cases. Firstly, suppose $\ell_1 \ne f_1$. For concreteness, we take $\ell_1 = 1$, $f_1 = 2$. Then $\overline{y_1} = \overline{y_2} = \overline{w_0}\,\overline{w_1}$ and, therefore, $T(y_1) = T(y_2) = T(w_0) + T(w_1)$ and $h_1(1, 2) = h_2(1, 2) = 0$. Moreover, $h_3(1, 2) = t_3$. Suppose now that $\ell_1 = f_1$ and consider the special case $\ell_1 = f_1 = 1$. For a word $u$ ending in 1, let $u \setminus 1$ be the word obtained from $u$ by removing its final letter. Then $\overline{y_1} = (\overline{w_0} \setminus 1)\,\overline{w_1}$. Hence, $h_1(1, 1) = T(y_1) - T(w_0) - T(w_1) = -t_1$. Moreover, $h_i(1, 1) = t_i$ for $i = 2, 3$. Since $h_i(\ell, f) = h_i(f, \ell)$, all possible cases are summarised in the following table:

(ℓ, f)       (1,1)   (2,2)   (3,3)   (2,3)   (3,1)   (1,2)
h_1(ℓ, f)    −t_1    t_1     t_1     t_1     0       0
h_2(ℓ, f)    t_2     −t_2    t_2     0       t_2     0
h_3(ℓ, f)    t_3     t_3     −t_3    0       0       t_3

The general case now follows by a simple induction. Indeed, suppose that (7.1) holds for $q - 1$. We will verify that it also holds for $q$. By the $q = 1$ case discussed above,

$$T(y_i) = T(w_0) + h_i(\ell_1, f_1) + T(w_1 \, i \, w_2 \ldots i \, w_q). \qquad (7.2)$$

Since $w_1$ is non-empty, we can apply the induction hypothesis to the term $T(w_1 \, i \, w_2 \ldots i \, w_q)$, which completes the proof. A careful reader may notice that in (7.2), the intervals of the active set $S$ have been replaced by a single letter $i$, which just facilitates the notation and makes no difference since $T$ is computed from a contraction of the word.

Suppose now that $t_1 = t_3 = 2$, $t_2 = -1$ and there is a combinatorial line $(x_1, x_2, x_3)$ such that $T^+(x_1) = T^+(x_2) = T^+(x_3)$. Then, for $i \in \{1, 3\}$,

$$T^+(x_i) - T^+(x_2) = \sum_{j=1}^{q} \left( h_i(\ell_j, f_j) - h_2(\ell_j, f_j) \right) = 0.$$

In particular, summing these two equalities,

$$T^+(x_1) + T^+(x_3) - 2T^+(x_2) = \sum_{j=1}^{q} \left( h_1(\ell_j, f_j) + h_3(\ell_j, f_j) - 2h_2(\ell_j, f_j) \right) = 0.$$

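But we can verify that with $t_1 = t_3 = -2t_2 = 2$, for each $\ell$ and $f$ we have $h_1(\ell, f) + h_3(\ell, f) - 2h_2(\ell, f) = 2$, so $0 = T^+(x_1) + T^+(x_3) - 2T^+(x_2) = 2q$. Since $r$ is odd, 2 is invertible in $\mathbb{Z}_r$, so $2q = 0$ in $\mathbb{Z}_r$ forces $r$ to divide $q$. As $q \ge 1$, this implies $q \ge r$, as required.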
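Both the table identity and the theorem itself can be checked by brute force in small cases. The sketch below recomputes $h_i(\ell, f)$ directly from the definition of $T$ as $T(\ell\, i\, f) - T(\ell) - T(f)$, and then searches $[3]^n$ for monochromatic lines whose active set has fewer than $r$ maximal intervals. The choices $r = 3$, $n \le 5$ and all function names are assumptions made for illustration.

```python
from itertools import groupby, product

def contract(word):
    """Collapse each maximal constant run of letters to a single letter."""
    return [a for a, _ in groupby(word)]

def T_plus(word, weights, r):
    """Pad with 1s, contract, and sum the letter weights modulo r."""
    padded = (1,) + tuple(word) + (1,)
    return sum(weights[a] for a in contract(padded)) % r

def h(i, last, first, weights):
    """h_i(l, f) for q = 1, computed over the integers."""
    joined = sum(weights[a] for a in contract([last, i, first]))
    return joined - weights[last] - weights[first]

def count_runs(positions):
    """Number of maximal intervals in a set of integer positions."""
    return sum(1 for p in positions if p - 1 not in positions)

def monochromatic_lines(n, r, weights):
    """Yield (fixed letters, active set) for every monochromatic line whose
    active set is a union of fewer than r intervals."""
    for mask in range(1, 2 ** n):
        active = {j for j in range(n) if (mask >> j) & 1}
        if count_runs(active) >= r:
            continue
        fixed = [j for j in range(n) if j not in active]
        for letters in product((1, 2, 3), repeat=len(fixed)):
            base = dict(zip(fixed, letters))
            colours = {
                T_plus([base.get(j, i) for j in range(n)], weights, r)
                for i in (1, 2, 3)
            }
            if len(colours) == 1:
                yield base, active

weights = {1: 2, 2: -1, 3: 2}  # t1 = t3 = 2, t2 = -1
table_sums = {
    (l, f): h(1, l, f, weights) + h(3, l, f, weights) - 2 * h(2, l, f, weights)
    for l in (1, 2, 3) for f in (1, 2, 3)
}
bad_lines = [line for n in range(1, 6) for line in monochromatic_lines(n, 3, weights)]
print(set(table_sums.values()), len(bad_lines))
```

Every entry of `table_sums` equals 2, matching the identity used in the proof, and the search finds no offending line for $r = 3$ and $n \le 5$.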

concluding remarks

the even case  Surprisingly, Imre Leader and Eero Räty [121] have shown that our result does not extend to $r = 2$ colours. That is, in any two-colouring of $[3]^n$, there exists a combinatorial line whose active set is an interval. For even $r$, our result implies that for any $n$ there is an $r$-colouring of $[3]^n$ containing no monochromatic combinatorial line whose active set is the union of fewer than $r - 1$ intervals. Whether the minimal number of intervals for even $r \ge 4$ is $r - 1$ or $r$ remains an open question.

higher m  Given $m$ and $r$, let $HJ(m, r)$ be the smallest dimension $n$ such that every $r$-colouring of $[m]^n$ contains a monochromatic combinatorial line. By following Shelah’s proof of the Hales–Jewett theorem [139], one can show that for $n$ sufficiently large depending on $m$ and $r$ there is a monochromatic combinatorial line in $[m]^n$ whose active set is the union of at most $HJ(m - 1, r)$ intervals. We do not know how tight this is, but even a marginal improvement on either side of the bound would be interesting.

Question 7.3. For $m \ge 4$, are there $r$-colourings of $[m]^n$ containing no monochromatic combinatorial line whose active set is the union of at most $100r$ intervals? Alternatively, choose $m \ge 3$, $r > 2$. Is it true that for sufficiently large $n$, any $r$-colouring of $[m]^n$ contains a monochromatic combinatorial line whose active set is the union of at most $HJ(m - 1, r) - 1$ intervals?

A MULTIDIMENSIONAL LU-SZÉKELY

We now present a proof of Theorem 6.11. Note that the case $m = 1$ is a result of Lu and Székely from [125]. As in their result, we actually show that the following slightly stronger choice of a graph gives a negative dependency graph. We say that two canonical events $\Omega(T_1, U_1, \tau_1)$ and $\Omega(T_2, U_2, \tau_2)$ in the probability space $\Omega$ conflict if

$$\exists x \in T_1 \cap T_2 : \tau_1(x) \ne \tau_2(x) \quad \text{or} \quad \exists y \in U_1 \cap U_2 : \tau_1^{-1}(y) \ne \tau_2^{-1}(y).$$

Clearly two conflicting events are disjoint, and therefore negatively correlated. We now show that just connecting the conflicting events suffices for a negative dependency graph, even if we require the injections to respect a given partition of $X$ and $Y$.

Theorem A.1. Let $X = X_1 \times \cdots \times X_m$ and $Y = Y_1 \times \cdots \times Y_m$, where the parts $X_k$ and $Y_k$ satisfy $|X_k| \le |Y_k|$ for all $k \in [m]$. Consider the probability space $\Omega$ generated by picking a uniformly random injection $\sigma : X \to Y$ satisfying $\sigma(X_k) \subset Y_k$ for all $k$. Denote the set of such injections by $\mathcal{S}$. Let $B_1, B_2, \ldots, B_N$ be canonical events in $\Omega$, and define the graph $D'$ on $[N]$ by

$$E(D') = \{ij : B_i \text{ and } B_j \text{ conflict}\}.$$

Then $D'$ is a negative dependency graph for the events $B_1, \ldots, B_N$.

Since the dependency graph $D'$ is a subgraph of the graph $D$ in Theorem 6.11 with the edges between pairs of $S$-intersecting events, it follows that $D$ is also a negative dependency graph for the same set of events.
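The conflict relation is easy to state in code. The sketch below treats only the case $m = 1$ (no partition) and assumes, as in the Lu–Székely setting, that a canonical event $\Omega(T, U, \tau)$ consists of all injections extending the partial bijection $\tau$; partial maps are represented as Python dicts, and the names `conflict` and `event` are hypothetical.

```python
from itertools import permutations

def conflict(tau1, tau2):
    """True if the partial bijections tau1, tau2 (dicts T_k -> U_k) conflict:
    they disagree on a common domain point, or send different points to a
    common image point."""
    if any(tau1[x] != tau2[x] for x in tau1.keys() & tau2.keys()):
        return True
    inv1 = {y: x for x, y in tau1.items()}
    inv2 = {y: x for x, y in tau2.items()}
    return any(inv1[y] != inv2[y] for y in inv1.keys() & inv2.keys())

def event(tau, X, Y):
    """All injections X -> Y, encoded as image tuples, that extend tau
    (assumed meaning of the canonical event for tau)."""
    return {
        image for image in permutations(Y, len(X))
        if all(image[X.index(x)] == y for x, y in tau.items())
    }

X, Y = ["a", "b", "c"], [0, 1, 2, 3]
tau1, tau2, tau3, tau4 = {"a": 0}, {"a": 1}, {"b": 0}, {"b": 2}

for other in (tau2, tau3, tau4):
    shared = event(tau1, X, Y) & event(other, X, Y)
    print(conflict(tau1, other), len(shared))
```

In this example the two conflicting pairs have disjoint events, while the non-conflicting pair `tau1`, `tau4` shares injections, in line with the remark that conflicting events are disjoint.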

Proof. Our proof follows the outline of [125, Theorem 1], but there are several claims we need to verify in the multidimensional setting.

A matching between $X$ and $Y$ is a triple $(T, U, \tau)$, where $\tau$ is a part-respecting bijection from $T \subset X$ to $U \subset Y$, that is $\tau(X_k \cap T) \subset Y_k$ for all $k$. All the functions we consider will be part-respecting. Fix an event $B_i = \Omega(T, U, \tau)$ for a matching $(T, U, \tau)$, and a set of indices $J = J(i) \subset \{j \in [N] : ij \notin E(D')\}$. We are to show the inequality $\mathbb{P}\left[B_i \mid \bigwedge_{j \in J} \overline{B_j}\right] \le \mathbb{P}[B_i]$, which is equivalent to

$$\mathbb{P}\Big[\bigwedge_{j \in J} \overline{B_j} \;\Big|\; B_i\Big] \le \mathbb{P}\Big[\bigwedge_{j \in J} \overline{B_j}\Big]. \qquad (A.1)$$

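Here we assume that $\mathbb{P}\left[\bigwedge_{j \in J} \overline{B_j}\right] > 0$, otherwise there is nothing to prove. The inequality follows immediately from the following claim.

Claim. For any canonical event $B' = \Omega(T, U', \tau')$,

$$\mathbb{P}\Big[\Big(\bigwedge_{j \in J} \overline{B_j}\Big) \wedge B_i\Big] \le \mathbb{P}\Big[\Big(\bigwedge_{j \in J} \overline{B_j}\Big) \wedge B'\Big]. \qquad (A.2)$$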

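Intuitively, the claim says that, conditioned on $\bigwedge_{j \in J} \overline{B_j}$, there is no mapping of $T$ that is less likely than $T \xrightarrow{\tau} U$.

Proof of Claim. Fix a canonical event $B' = \Omega(T, U', \tau')$. Let $J' = J'(J(i), B')$ be the set of indices $j \in J$ such that $B_j$ does not conflict with $B'$. If $j \in J \setminus J'$, then $B_j$ conflicts with $B'$, that is, $B'$ implies $\overline{B_j}$. Therefore

$$\overline{B_j} \wedge B' = B', \quad \text{so} \quad \Big(\bigwedge_{j \in J} \overline{B_j}\Big) \wedge B' = \Big(\bigwedge_{j \in J'} \overline{B_j}\Big) \wedge B'.$$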

The idea is that the functions $\sigma \in \bigwedge_{j \in J'} \overline{B_j}$ are equally likely to map $T$ via $\tau$ to $U$ as they are to map it via $\tau'$ to $U'$. Formally, we construct an automorphism of the probability space $\Omega$ which fixes each of the events $B_j = \Omega(T_j, U_j, \tau_j)$ for $j \in J'$, but maps $B'$ to $B_i$.

To do this, we set up some notation. Let $\rho$ be a part-respecting permutation of $Y$, i.e. a bijection $Y \to Y$ satisfying $\rho(Y_k) \subset Y_k$ for all $k \in [m]$. The permutation $\rho$ determines an action on the matchings

$$\pi_\rho(\sigma) = \rho \circ \sigma \quad \text{for all } (P, Q, \sigma).$$

Clearly $\pi_\rho$ takes any matching $(P, Q, \sigma)$ to a matching $(P, \rho(Q), \rho\sigma)$ with the same domain. Furthermore, it preserves the uniform measure on $\Omega$ (for this it is crucial to notice that $\rho$ preserves parts of $Y$). Finally, the action of $\pi_\rho$ on the canonical events in $\Omega$ is described by

$$\pi_\rho(\Omega(P, Q, \sigma)) = \Omega(P, \rho(Q), \rho\sigma).$$

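Now we are ready to show that

$$\mathbb{P}\Big[\Big(\bigwedge_{j \in J'} \overline{B_j}\Big) \wedge B_i\Big] = \mathbb{P}\Big[\Big(\bigwedge_{j \in J'} \overline{B_j}\Big) \wedge B'\Big]. \qquad (A.3)$$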

Let ρ : Y → Y be a part-respecting bijection defined by

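$$\rho(y) = y \text{ for any } y \in Y \setminus U' \quad \text{and} \quad \rho(y) = \tau(\tau'^{-1}(y)) \text{ for } y \in U'.$$

We first check that $\rho$ fixes points of $U_j$ for $j \in J'$. This is clear for $y \in U_j \setminus U'$. For $y \in U_j \cap U'$, there is an $x \in T$ with $\tau'(x) = y$. Since the events $B_j$ with $j \in J'$ conflict with neither $B_i$ nor $B'$, this $x$ satisfies $\tau'(x) = \tau_j(x) = \tau(x) = y$, so indeed $\rho(y) = y$, and therefore $\pi_\rho(T_j, U_j, \tau_j) = (T_j, U_j, \tau_j)$. Furthermore, $\rho(\tau'(x)) = \tau(x)$ for every $x \in T$, so $\pi_\rho(T, U', \tau') = (T, U, \tau)$. This proves Equation (A.3), since the two events correspond to each other under the automorphism $\pi_\rho$. Hence, using $J' \subset J$ in the first step,

$$\mathbb{P}\Big[\Big(\bigwedge_{j \in J} \overline{B_j}\Big) \wedge B_i\Big] \le \mathbb{P}\Big[\Big(\bigwedge_{j \in J'} \overline{B_j}\Big) \wedge B_i\Big] = \mathbb{P}\Big[\Big(\bigwedge_{j \in J'} \overline{B_j}\Big) \wedge B'\Big] = \mathbb{P}\Big[\Big(\bigwedge_{j \in J} \overline{B_j}\Big) \wedge B'\Big].$$

This proves the claim.

Keeping $T$ fixed, the events $B' = \Omega(T, U', \tau')$ across all the matchings $(T, U', \tau')$ partition the probability space $\Omega$, so summing up inequality (A.2) over all such $B'$ gives

$$\begin{aligned}
\mathbb{P}\Big[\bigwedge_{j \in J} \overline{B_j}\Big] &= \sum_{B'} \mathbb{P}\Big[\Big(\bigwedge_{j \in J} \overline{B_j}\Big) \wedge B'\Big] \\
&\ge \sum_{B'} \mathbb{P}\Big[\Big(\bigwedge_{j \in J} \overline{B_j}\Big) \wedge B_i\Big] \\
&= \sum_{B'} \mathbb{P}\Big[\bigwedge_{j \in J} \overline{B_j} \;\Big|\; B_i\Big] \mathbb{P}[B_i] \\
&= \sum_{B'} \mathbb{P}\Big[\bigwedge_{j \in J} \overline{B_j} \;\Big|\; B_i\Big] \mathbb{P}[B'] \\
&= \mathbb{P}\Big[\bigwedge_{j \in J} \overline{B_j} \;\Big|\; B_i\Big],
\end{aligned}$$

where in the fourth line we used the fact that $\mathbb{P}[B_i] = \mathbb{P}[B']$ by the uniformity of our probability space $\Omega$, and in the last line that the probabilities $\mathbb{P}[B']$ sum to 1. This is exactly (A.1).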

Since any two conflicting events in the space $\Omega$ are $S$-intersecting, Theorem 6.11 immediately follows.

BIBLIOGRAPHY

1. A dynamic catalog of spectral data for families of graphs, http://aimath.org/pastworkshops/catalog2.html (2017).
2. AIM Minimum Rank – Special Graphs Work Group. Zero forcing sets and the minimum rank of graphs. Linear Algebra and its Applications 428, 1628 (2008).
3. Albert, M., Frieze, A. & Reed, B. Comments on: “Multicoloured Hamilton cycles” [Electron. J. Combin. 2 (1995), Research Paper 10, 13 pp. (electronic); MR1327570 (96b:05058)]. Electron. J. Combin. 2, Research Paper 10, Comment 1, 1 HTML document (1995).
4. Alon, N. & Gutin, G. Properly colored Hamilton cycles in edge-colored complete graphs. Random Structures Algorithms 11, 179 (1997).
5. Alon, N. & Friedland, S. The maximum number of perfect matchings in graphs with a given degree sequence. Electron. J. Combin. 15, Note 13, 2 (2008).
6. Alon, N., Grytczuk, J., Hałuszczak, M. & Riordan, O. Nonrepetitive colorings of graphs. Random Structures & Algorithms 21, 336 (2002).
7. Alon, N., Grytczuk, J., Lasoń, M. & Michałek, M. Splitting necklaces and measurable colorings of the real line. Proceedings of the American Mathematical Society 137, 1593 (2009).
8. Alon, N., Hoory, S. & Linial, N. The Moore Bound for Irregular Graphs. Graphs and Combinatorics 18, 53 (2002).
9. Alon, N., Jiang, T., Miller, Z. & Pritikin, D. Properly colored subgraphs and rainbow subgraphs in edge-colorings with local constraints. Random Structures Algorithms 23, 409 (2003).
10. Alon, N., Rónyai, L. & Szabó, T. Norm-Graphs: Variations and Applications. Journal of Combinatorial Theory, Series B 76, 280 (1999).
11. Alon, N., Seymour, P. & Thomas, R. A separator theorem for nonplanar graphs. J. Amer. Math. Soc. 3, 801 (1990).
12. Alon, N. & Spencer, J. H. The probabilistic method Fourth, xiv+375 (John Wiley & Sons, Inc., Hoboken, NJ, 2016).


13. Amos, D., Caro, Y., Davila, R. & Pepper, R. Upper bounds on the k-forcing number of a graph. Discrete Applied Mathematics 181, 1 (2015).
14. Arenas, A., Díaz-Guilera, A., Kurths, J., Moreno, Y. & Zhou, C. Synchronization in complex networks. Physics Reports 469, 93 (2008).
15. Babai, L. personal communication.
16. Balogh, J., Bollobás, B. & Morris, R. Graph bootstrap percolation. Random Structures & Algorithms 41, 413 (2012).
17. Barrat, A., Barthélemy, M. & Vespignani, A. Dynamical Processes on Complex Networks (Cambridge University Press, Cambridge, UK, 2008).
18. Bean, D., Ehrenfeucht, A. & McNulty, G. Avoidable patterns in strings of symbols. Pacific Journal of Mathematics 85, 261 (1979).
19. Ben-Shimon, S., Krivelevich, M. & Sudakov, B. Local resilience and Hamiltonicity maker-breaker games in random regular graphs. Combin. Probab. Comput. 20, 173 (2011).
20. Bohman, T. The triangle-free process. Adv. Math. 221, 1653 (2009).
21. Bohman, T. & Frieze, A. Avoiding a giant component. Random Structures Algorithms 19, 75 (2001).
22. Bohman, T. & Keevash, P. The early evolution of the H-free process. Invent. Math. 181, 291 (2010).
23. Bohman, T. & Keevash, P. Dynamic concentration of the triangle-free process in The Seventh European Conference on Combinatorics, Graph Theory and Applications 16 (Ed. Norm., Pisa, 2013), 489.
24. Bollobás, B. & Chung, F. R. K. The diameter of a cycle plus a random matching. SIAM J. Discrete Math. 1, 328 (1988).
25. Bollobás, B. & Fernandez de la Vega, W. The diameter of random regular graphs. Combinatorica 2, 125 (1982).
26. Bollobás, B. The isoperimetric number of random regular graphs. European J. Combin. 9, 241 (1988).
27. Bollobás, B. Modern graph theory xiv+394 (Springer-Verlag, New York, 1998).
28. Bollobás, B. Random graphs Second edition, xviii+498 (Cambridge University Press, Cambridge, 2001).
29. Bollobás, B. & Erdős, P. Alternating Hamiltonian cycles. Israel J. Math. 23, 126 (1976).

30. Bollobás, B. & Frieze, A. M. in Random graphs ’83 (Poznań, 1983) 23 (North-Holland, Amsterdam, 1985).
31. Bollobás, B. & Thomason, A. in Random graphs ’83 (Poznań, 1983) 47 (North-Holland, Amsterdam, 1985).
32. Böttcher, J., Kohayakawa, Y. & Procacci, A. Properly coloured copies and rainbow copies of large graphs with small maximum degree. Random Structures Algorithms 40, 425 (2012).
33. Brown, T. Is there a sequence on four symbols in which no two adjacent segments are permutations of one another? The American Mathematical Monthly 78, 886 (1971).
34. Burgarth, D. K. Identifying combinatorially symmetric Hidden Markov Models. ArXiv e-prints. arXiv: 1709.02932 [math.CO] (2017).
35. Burgarth, D., Bose, S., Bruder, C. & Giovannetti, V. Local controllability of quantum networks. Physical Review A 79, 060305 (R) (2009).
36. Burgarth, D. & Giovannetti, V. Full control by locally induced relaxation. Physical Review Letters 99, 100501 (2007).
37. Burgarth, D. & Giovannetti, V. Full control by locally induced relaxation. Physical Review Letters 99, 100501 (2007).
38. Cano Vila, M. D. P. Rainbow matchings in hypergraphs MA thesis (Universitat Politècnica de Catalunya, 2015).
39. Cano, P., Perarnau, G. & Serra, O. Rainbow spanning subgraphs in bounded edge–colourings of graphs with large minimum degree. Electronic Notes in Discrete Mathematics 61, 199 (2017).
40. Caro, Y., Lev, A., Roditty, Y., Tuza, Z. & Yuster, R. On rainbow connection. Electron. J. Combin. 15, Research paper 57, 13 (2008).
41. Caro, Y. & Pepper, R. Dynamic approach to k-forcing. Theory and Applications of Graphs 2, Article 2 (2015).
42. Cassaigne, J., Currie, J. D., Schaeffer, L. & Shallit, J. Avoiding three consecutive blocks of the same size and same sum. J. ACM 61, Art. 10, 17 (2014).
43. Chakraborty, S., Fischer, E., Matsliah, A. & Yuster, R. Hardness and algorithms for rainbow connection. J. Comb. Optim. 21, 330 (2011).
44. Chalupa, J., Leath, P. L. & Reich, G. R. Bootstrap percolation on a Bethe lattice. Journal of Physics C: Solid State Physics 12, L31 (1979).

45. Chandran, L. S., Das, A., Rajendraprasad, D. & Varma, N. M. Rainbow connection number and connected dominating sets. J. Graph Theory 71, 206 (2012).
46. Chartrand, G., Johns, G. L., McKeon, K. A. & Zhang, P. Rainbow connection in graphs. Math. Bohem. 133, 85 (2008).
47. Chen, C. C. & Daykin, D. E. Graphs with Hamiltonian cycles having adjacent lines different colors. J. Combinatorial Theory Ser. B 21, 135 (1976).
48. Chen, C. C. & Quimpo, N. F. in Combinatorial mathematics, VIII (Geelong, 1980) 23 (Springer, Berlin-New York, 1981).
49. Coja-Oghlan, A., Feige, U. & Krivelevich, M. Contagious sets in expanders in Proceedings of the 26th Symposium on Discrete Algorithms (SODA) (2015), 1953.
50. Davila, R., Kalinowski, T. & Stephen, S. Proof of a conjecture of Davila and Kenter regarding a lower bound for the forcing number in terms of girth and minimum degree http://arxiv.org/abs/1611.06557. 2016.
51. Davila, R. & Kenter, F. Bounds for the Zero Forcing Number of Graphs with Large Girth. Theory and Applications of Graphs 2, Article 1 (2015).
52. Dekking, F. M. Strongly nonrepetitive sequences and progression-free sets. J. Combin. Theory Ser. A 27, 181 (1979).
53. Dubhashi, D. P. & Panconesi, A. Concentration of measure for the analysis of randomized algorithms xvi+196 (Cambridge University Press, Cambridge, 2009).
54. Dudek, A. & Ferrara, M. Extensions of results on rainbow Hamilton cycles in uniform hypergraphs. Graphs Combin. 31, 577 (2015).
55. Dudek, A., Frieze, A. M. & Tsourakakis, C. E. Rainbow connection of random regular graphs. SIAM J. Discrete Math. 29, 2255 (2015).
56. Dudek, A., Frieze, A. & Ruciński, A. Rainbow Hamilton cycles in uniform hypergraphs. Electron. J. Combin. 19, Paper 46, 11 (2012).
57. Egerváry, J. Matrixok kombinatorius tulajdonságairól. Matematikai és Fizikai Lapok 38, 16 (1931).
58. Erdős, P. & Lovász, L. Problems and results on 3-chromatic hypergraphs and some related questions, 609, Vol. 10 (1975).
59. Erdős, P. & Rényi, A. On random graphs. I. Publ. Math. Debrecen 6, 290 (1959).

60. Erdős, P. & Rényi, A. On the evolution of random graphs. Magyar Tud. Akad. Mat. Kutató Int. Közl. 5, 17 (1960).
61. Erdős, P. & Rényi, A. On random matrices. Magyar Tud. Akad. Mat. Kutató Int. Közl 8, 455 (1964).
62. Erdős, P. & Rényi, A. On the existence of a factor of degree one of a connected random graph. Acta Math. Acad. Sci. Hungar. 17, 359 (1966).
63. Erdős, P. & Rényi, A. On random matrices. II. Studia Sci. Math. Hungar. 3, 459 (1968).
64. Erdős, P. Some remarks on the theory of graphs. Bull. Amer. Math. Soc. 53, 292 (1947).
65. Erdős, P. Graph theory and probability. Canad. J. Math 11, 34 (1959).
66. Erdős, P. Graph theory and probability. II. Canad. J. Math. 13, 346 (1961).
67. Erdős, P. Some unsolved problems. Magyar Tud. Akad. Mat. Kutató Int. Közl. 6, 221 (1961).
68. Erdős, P., Pach, J., Pollack, R. & Tuza, Z. Radius, diameter, and minimum degree. J. Combin. Theory Ser. B 47, 73 (1989).
69. Erdős, P. & Rado, R. A combinatorial theorem. Journal of the London Mathematical Society 1, 249 (1950).
70. Erdős, P. & Spencer, J. Lopsided Lovász local lemma and Latin transversals. Discrete Appl. Math. 30, 151 (1991).
71. Erdős, P. & Stone, A. H. On the structure of linear graphs. Bulletin of the American Mathematical Society 52, 1087 (1946).
72. Erdős, P., Suen, S. & Winkler, P. On the size of a random maximal graph in Proceedings of the Sixth International Seminar on Random Graphs and Probabilistic Methods in Combinatorics and Computer Science, “Random Graphs ’93” (Poznań, 1993) 6 (1995), 309.
73. Evdokimov, A. A. Strongly asymmetric sequences generated by a finite number of symbols. Dokl. Akad. Nauk SSSR 179, 1268 (1968).
74. Fallat, S. & Hogben, L. Minimum rank, maximum nullity, and zero forcing number of graphs. Handbook of Linear Algebra, 2nd edition, L. Hogben editor, CRC Press, Boca Raton (2013).
75. Ferrante, M. & Saltalamacchia, M. Lecture Notes: The Coupon Collector’s Problem URL: http://mat.uab.cat/matmat/PDFv2014/v2014n02.pdf. 2014.

76. Freidlin, M. I. & Wentzell, A. D. Diffusion processes on graphs and the averaging principle. The Annals of Probability 21, 2215 (1993). 77. Friedman, J. A proof of Alon’s second eigenvalue conjecture and related problems. Mem. Amer. Math. Soc. 195, viii+100 (2008). 78. Frieze, A. M. On large matchings and cycles in sparse random graphs. Discrete Math. 59, 243 (1986). 79. Frieze, A. M. On the independence number of random graphs. Dis- crete Math. 81, 171 (1990). 80. Frieze, A. M. & Łuczak, T. On the independence and chromatic num- bers of random regular graphs. J. Combin. Theory Ser. B 54, 123 (1992). 81. Frieze, A. M. & Molloy, M. Splitting an expander graph. J. Algorithms 33, 166 (1999). 82. Frieze, A. & Karo´nski,M. Introduction to random graphs xvii+464 (Cambridge University Press, Cambridge, 2016). 83. Frieze, A. & Krivelevich, M. On rainbow trees and cycles. Electron. J. Combin. 15, Research paper 59, 9 (2008). 84. Frieze, A. & Tsourakakis, C. E. in Approximation, randomization, and combinatorial optimization 541 (Springer, Heidelberg, 2012). 85. Füredi, Z. & Simonovits, M. in Erd˝os Centennial (eds Lovász, L., Rusza, I. & Sós, V. T.) 169 (Springer, 2013). 86. Gentner, M., Penso, L. D., Rautenbach, D. & Souza, U. S. Extremal values and bounds for the zero forcing number. Discrete Applied Mathematics 214, 196 (2016). 87. Gentner, M. & Rautenbach, D. Some bounds on the zero forcing number of a graph. Discrete Applied Mathematics 236, 203 (2018). 88. Gerke, S., Schlatter, D., Steger, A. & Taraz, A. The random planar graph process. Random Structures Algorithms 32, 236 (2008). 89. Gilbert, E. N. Random graphs. Ann. Math. Statist. 30, 1141 (1959). 90. Godsil, C. & Royle, G. Algebraic Graph Theory (Springer New York, 2001). 91. Graham, R. L., Rothschild, B. L. & Spencer, J. H. Ramsey theory xiv+196 (John Wiley & Sons, Inc., Hoboken, NJ, 2013). 92. Granovetter, M. Threshold Models of Collective Behavior. American Journal of Sociology 83, 1420 (1978). 

93. Griffiths, S., Morris, R. & Pontiveros, G. F. The triangle-free process and R(3, k). To appear in Memoirs of the American Mathematical Society (2013).
94. Grytczuk, J. Nonrepetitive colorings of graphs—a survey. Int. J. Math. Math. Sci. Art. ID 74639, 10 (2007).
95. Grytczuk, J. Thue type problems for graphs, points, and numbers. Discrete Math. 308, 4419 (2008).
96. Hahn, G. & Thomassen, C. Path and cycle sub-Ramsey numbers and an edge-colouring conjecture. Discrete Math. 62, 29 (1986).
97. Hales, A. W. & Jewett, R. I. Regularity and positional games. Trans. Amer. Math. Soc. 106, 222 (1963).
98. Hall, H. T., Hogben, L., Martin, R. & Shader, B. Expected values of parameters associated with the minimum rank of a graph. Linear Algebra and its Applications 433, 101 (2010).
99. Harris, D. G. & Srinivasan, A. A constructive algorithm for the Lovász local lemma on permutations in Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms (ACM, New York, 2014), 907.
100. Harvey, N. J. A. & Vondrák, J. in 2015 IEEE 56th Annual Symposium on Foundations of Computer Science—FOCS 2015 1327 (IEEE Computer Soc., Los Alamitos, CA, 2015).
101. Hoffman, A. J. in Selected Papers of Alan J Hoffman 407 (World Scientific, 2003).
102. Janson, S. Random regular graphs: asymptotic distributions and contiguity. Combin. Probab. Comput. 4, 369 (1995).
103. Janson, S., Łuczak, T. & Ruciński, A. Random graphs (John Wiley & Sons, 2011).
104. Kamčev, N., Łuczak, T. & Sudakov, B. Anagram-Free Colourings of Graphs. Combinatorics, Probability and Computing, 1 (2017).
105. Kang, M. & Panagiotou, K. On the connectivity threshold of Achlioptas processes. J. Comb. 5, 291 (2014).
106. Karoński, M. & Ruciński, A. in The mathematics of Paul Erdős, I 311 (Springer, Berlin, 1997).
107. Keevash, P. The existence of designs. arXiv preprint arXiv:1401.3665 (2014).

108. Kempe, D., Kleinberg, J. & Tardos, É. Maximizing the spread of influence through a social network in Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’03 (ACM Press, 2003), 137.
109. Keränen, V. in Automata, languages and programming (Vienna, 1992) 41 (Springer, Berlin, 1992).
110. Kim, J. H. The Ramsey number R(3, t) has order of magnitude t^2/log t. Random Structures Algorithms 7, 173 (1995).
111. Konig, D. Gráfok és mátrixok. Matematikai és Fizikai Lapok 38, 116 (1931).
112. Kovári, T., Sós, V. T. & Turán, P. On a problem of K. Zarankiewicz. Colloquium Mathematicae 3, 50 (1954).
113. Krivelevich, M. & Sudakov, B. in More sets, graphs and numbers 199 (Springer, Berlin, 2006).
114. Krivelevich, M., Kwan, M., Loh, P.-S. & Sudakov, B. The random k-matching-free process. arXiv preprint arXiv:1708.01054 (2017).
115. Krivelevich, M., Lubetzky, E. & Sudakov, B. Hamiltonicity thresholds in Achlioptas processes. Random Structures Algorithms 37, 1 (2010).
116. Krivelevich, M., Lubetzky, E. & Sudakov, B. Hamiltonicity thresholds in Achlioptas processes. Random Structures Algorithms 37, 1 (2010).
117. Krivelevich, M., Reichman, D. & Samotij, W. Smoothed analysis on connected graphs. SIAM J. Discrete Math. 29, 1654 (2015).
118. Krivelevich, M. & Sudakov, B. in More sets, graphs and numbers (eds Gyori, E., Katona, G. & Lovász, L.) 199 (Springer, 2006).
119. Krivelevich, M., Sudakov, B. & Vilenchik, D. On the random satisfiable process. Combin. Probab. Comput. 18, 775 (2009).
120. Krivelevich, M. & Yuster, R. The rainbow connection of a graph is (at most) reciprocal to its minimum degree. J. Graph Theory 63, 185 (2010).
121. Leader, I. & Raty, E. A Note on Intervals in the Hales-Jewett Theorem. ArXiv e-prints. arXiv: 1802.03087 [math.CO] (2018).
122. Li, X., Shi, Y. & Sun, Y. Rainbow connections of graphs: a survey. Graphs Combin. 29, 1 (2013).
123. Lo, A. Properly coloured Hamiltonian cycles in edge-coloured complete graphs. Combinatorica 36, 471 (2016).

124. Lu, L., Mohr, A. & Székely, L. in Recent advances in harmonic analysis and applications 243 (Springer, New York, 2013).
125. Lu, L. & Székely, L. Using Lovász local lemma in the space of random injections. Electron. J. Combin. 14, Research Paper 63, 13 (2007).
126. Lubotzky, A. Expander graphs in pure and applied mathematics. Bulletin of the American Mathematical Society 49, 113 (2012).
127. Margulis, G. A. Explicit constructions of concentrators. Problemy Peredachi Informatsii 9, 71 (1973).
128. Molloy, M. & Reed, B. Graph colouring and the probabilistic method xiv+326 (Springer-Verlag, Berlin, 2002).
129. Moser, R. A. & Tardos, G. A constructive proof of the general Lovász local lemma. J. ACM 57, Art. 11, 15 (2010).
130. Novikov, P. S. & Adjan, S. I. Infinite periodic groups. I, II, III. Izv. Akad. Nauk SSSR Ser. Mat. 32, 212 (1968).
131. Perarnau, G. & Serra, O. Rainbow perfect matchings in complete bipartite graphs: existence and counting. Combin. Probab. Comput. 22, 783 (2013).
132. Pinsker, M. S. On the complexity of a concentrator in 7th International Teletraffic Conference 4 (1973), 1.
133. Pleasants, P. A. B. Non-repetitive sequences. Proc. Cambridge Philos. Soc. 68, 267 (1970).
134. Robinson, R. W. & Wormald, N. C. Almost all regular graphs are Hamiltonian. Random Structures Algorithms 5, 363 (1994).
135. Ruciński, A. & Wormald, N. C. Random graph processes with degree restrictions. Combin. Probab. Comput. 1, 169 (1992).
136. Ryser, H. J. Neuere Probleme der Kombinatorik. Vorträge über Kombinatorik, Oberwolfach, 69 (1967).
137. Serra, O. personal communication.
138. Shearer, J. A property of the colored complete graph. Discrete Math. 25, 175 (1979).
139. Shelah, S. Primitive recursive bounds for van der Waerden numbers. J. Amer. Math. Soc. 1, 683 (1988).
140. Sudakov, B. & Volec, J. Properly colored and rainbow copies of graphs with few cherries. J. Combin. Theory Ser. B 122, 391 (2017).

141. Szele, T. Kombinatorische Untersuchungen über den gerichteten vollständigen Graphen. Mat. Fiz. Lapok 50, 223 (1943).
142. Taklimi, F. A. Zero forcing sets for graphs PhD thesis (University of Regina, 2013).
143. Thue, A. Über unendliche Zeichenreihen. (na, 1906).
144. Tropp, J. A. Freedman’s inequality for matrix martingales. Electron. Commun. Probab. 16, 262 (2011).
145. Wilson, T. E. & Wood, D. R. Abelian square-free graph colouring. arXiv preprint arXiv:1607.01117 (2016).
146. Wilson, T. E. & Wood, D. R. Anagram-free Graph Colouring. arXiv preprint arXiv:1607.01117 (2016).
147. Wolfovitz, G. Lower bounds for the size of random maximal H-free graphs. Electron. J. Combin. 16, Research Paper 4, 26 (2009).
148. Wormald, N. C. in Surveys in combinatorics, 1999 (Canterbury) 239 (Cambridge Univ. Press, Cambridge, 1999).

PUBLICATIONS

Papers in peer-reviewed journals:

1. Kamčev, N., Krivelevich, M. & Sudakov, B. Some remarks on rainbow connectivity. J. Graph Theory 83, 372 (2016).
2. Kamčev, N., Łuczak, T. & Sudakov, B. Anagram-Free Colourings of Graphs. Combinatorics, Probability and Computing, 1 (2017).
3. Kamčev, N., Sudakov, B. & Volec, J. Bounded colorings of multipartite graphs and hypergraphs. European J. Combin. 66, 235 (2017).

Submitted papers:

1. Conlon, D. & Kamčev, N. Intervals in the Hales-Jewett theorem. ArXiv e-prints. arXiv: 1801.08919 [math.CO] (2018).
2. Kalinowski, T., Kamčev, N. & Sudakov, B. Zero forcing number of graphs. ArXiv e-prints. arXiv: 1705.10391 [math.CO] (2017).
3. Kamčev, N., Krivelevich, M., Morrison, N. & Sudakov, B. The König graph process. In preparation.
