IMPA

Master’s Thesis

Low-Complexity Decompositions of Combinatorial Objects

Author: Davi Castro-Silva
Supervisor: Roberto Imbuzeiro Oliveira

A thesis submitted in fulfillment of the requirements for the degree of Master in Mathematics at

Instituto de Matemática Pura e Aplicada

April 15, 2018


Contents

1 Introduction
  1.1 High-level overview of the framework
  1.2 Remarks on notation and terminology
  1.3 Examples and structure of the thesis

2 Abstract decomposition theorems
  2.1 Probabilistic setting
  2.2 Structure and pseudorandomness
  2.3 Weak decompositions
    2.3.1 Weak Regularity Lemma
  2.4 Strong decompositions
    2.4.1 Szemerédi Regularity Lemma

3 Counting subgraphs and the Graph Removal Lemma
  3.1 Counting subgraphs globally
  3.2 Counting subgraphs locally and the Removal Lemma
  3.3 Application to property testing

4 Extensions of graph regularity
  4.1 Regular approximation
  4.2 Relative regularity

5 Hypergraph regularity
  5.1 Intuition and definitions
  5.2 Regularity at a single level
  5.3 Regularizing all levels simultaneously

6 Dealing with sparsity: transference principles
  6.1 Subsets of pseudorandom sets
  6.2 Upper-regular functions
  6.3 Green-Tao-Ziegler Dense Model Theorem

7 Transference results for L1 structure
  7.1 Relationships between cut norm and L1 norm
  7.2 Inheritance of structure lemmas
  7.3 A “coarse” structural correspondence
  7.4 A “fine” structural correspondence
    7.4.1 Proof of Theorem 7.2

8 Extensions and open problems

Bibliography


Chapter 1

Introduction

In Mathematics and Computer Science we often deal with a large and general class of objects of a certain kind, and we wish to obtain non-trivial results which are valid for all objects belonging to this class. This may be a very hard task if the spectrum of possible behavior of the members of this class is very broad, since it is unlikely that any single argument will hold uniformly along this whole spectrum.

Such results may be easier to obtain when the class we are dealing with is highly structured, in the sense that one can encode its elements in such a way that the description of each object has a relatively small size; then it may be possible to use this structure to prove results valid uniformly over all objects in this class, or to do a case-by-case analysis to obtain such results. At the other end of the spectrum there are the random objects, which have a very high complexity in the sense that any description of a randomly chosen object must specify the random choices made at each point separately, and is thus very large if the object under consideration is large. However, for such objects there are various “concentration inequalities” which may be used to obtain results valid with high probability over the set of random choices made.

Therefore, if we can decompose every object belonging to the general class we are interested in into a “highly structured” component (which has low complexity) and a “pseudorandom” component (which mimics the behavior of random objects in certain key statistics), then we may analyze each of these components separately by different means and so be able to obtain results which are valid for all such objects.

An illustrative example of a “structure-pseudorandomness” decomposition of this kind is Szemerédi’s celebrated Regularity Lemma [31]. This important result roughly asserts that the vertices of any graph $G$ may be partitioned into a bounded number of equal-sized parts, in such a way that for almost all pairs of partition classes the bipartite graph between them is random-like. Both the upper bound we get for the order of this partition and the quality of the pseudorandom behavior of the edges between these pairs depend only on an accuracy parameter $\varepsilon$ we are at liberty to choose.

In this example, the object to be decomposed is the edge set $E$ of a given arbitrary graph $G = (V, E)$, which belongs to the “general class” of all graphs. The structured component then represents the pairs $(V_i, V_j)$ of partition classes together with the density of edges between them, and it has low complexity because the order of the partition is uniformly bounded for all graphs. The pseudorandom component represents the actual edges between these pairs, and has a random-like property known as $\varepsilon$-regularity which we will define in the next chapter.

This result has many applications in Combinatorics and Computer Science (see, e.g., [20, 21] for a survey), and it has inspired numerous other decomposition results in a similar spirit both inside and outside Graph Theory. In this work we aim to survey many decomposition theorems of this form present in the literature. We provide a unified framework for proving them and present some new results along these same lines.

1.1 High-level overview of the framework

In our setting, the combinatorial objects to be decomposed will be represented as functions defined over a discrete space $X$. This identification entails little loss of generality, since given a combinatorial object $O$ (such as a graph, hypergraph or additive group), we may usually identify some underlying discrete space $X$ for this kind of object and then represent $O$ as a function $f_O$ defined on $X$.

We endow $X$ with a probability measure $\mathbb{P}$, so that the objects considered may be viewed as random variables, and define a family $\mathcal{C}$ of “low-complexity” subsets of $X$. The specifics of both the probability measure $\mathbb{P}$ and the structured family $\mathcal{C}$ will depend on the application at hand, and it is from them that we will define our notions of complexity and pseudorandomness.

The sets belonging to $\mathcal{C}$ are seen as the basic structured sets, which have complexity 1, and any subset of $X$ which may be obtained by boolean operations from at most $k$ of these basic structured sets $A_1, \dots, A_k \in \mathcal{C}$ is said to have complexity at most $k$ according to $\mathcal{C}$. We then say two functions $g, h : X \to \mathbb{R}$ are $\varepsilon$-indistinguishable according to $\mathcal{C}$ if, for all sets $A \in \mathcal{C}$, we have that $|\mathbb{E}[(g - h) 1_A]| \le \varepsilon$. Intuitively, this means that we are not able to effectively distinguish between $h$ and $g$ by taking their empirical averages over random elements chosen from one of the basic sets in $\mathcal{C}$.

A function $f : X \to \mathbb{R}$ is then said to be $\varepsilon$-pseudorandom if it is $\varepsilon$-indistinguishable from the constant function 1 on $X$. Thus pseudorandom functions are in some sense uniformly distributed over structured sets, mimicking random functions of mean 1 defined on $X$.

These concepts are closely related to the notions of pseudorandomness and indistinguishability in the area of Computational Complexity Theory (in the non-uniform setting).
In this case, we have a collection $\mathcal{F}$ of “efficiently computable” boolean functions $f : X \to \{0, 1\}$ (which are thought of as adversaries), and two distributions $A$ and $B$ on $X$ are said to be $\varepsilon$-indistinguishable by $\mathcal{F}$ if
\[ |\mathbb{P}(f(A) = 1) - \mathbb{P}(f(B) = 1)| \le \varepsilon \quad \forall f \in \mathcal{F}. \]
A distribution $R$ is then said to be $\varepsilon$-pseudorandom for $\mathcal{F}$ if it is $\varepsilon$-indistinguishable from the uniform distribution $U_X$ on $X$. Intuitively, this means that no adversary from the class $\mathcal{F}$ is able to distinguish $R$ from $U_X$ with non-negligible advantage.

This is completely equivalent to our definitions, if we identify each function $f$ in $\mathcal{F}$ with its support $f^{-1}(1)$ in $X$, and identify the distributions $A$, $B$ with the functions $g(x) := \mathbb{P}(A = x) \cdot |X|$, $h(x) := \mathbb{P}(B = x) \cdot |X|$. Then
\[ |\mathbb{P}(f(A) = 1) - \mathbb{P}(f(B) = 1)| = \left| \mathbb{E}\left[ (g - h) 1_{f^{-1}(1)} \right] \right|, \]
where the expectation on the right-hand side is with respect to the uniform distribution.

In our abstract decomposition theorems given in Chapter 2, it will be convenient to deal with σ-algebras on $X$ rather than with subsets of $X$; since a σ-algebra on a finite space $X$ is a finite collection of subsets of $X$, the intuition will be essentially the same. However, this change will make it simpler to apply tools such as the Cauchy-Schwarz inequality and Pythagoras’ theorem, which will both be very important in our energy-increment arguments. Moreover, we will also require pseudorandom functions to have no correlation in average value with the structured sets, and thus to be $\varepsilon$-indistinguishable from the zero function on $X$. Since the expectation is linear, this “translation” in our definition makes no important difference.

The framework as described here will be taken up again in Chapter 6, when we talk about transference principles and the Dense Model Theorem.
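The equivalence between the two notions of indistinguishability can be checked numerically on a toy example. The following is only a minimal sketch: the four-point universe, the two distributions and the three adversaries are hypothetical choices made for illustration, and the helper names `mean_on` and `distinguishing_gap` are not taken from the text.

```python
from fractions import Fraction

def mean_on(func, subset, universe):
    """E[func * 1_subset] under the uniform measure on `universe`."""
    total = sum((Fraction(func[x]) for x in subset), Fraction(0))
    return total / len(universe)

def distinguishing_gap(g, h, family, universe):
    """Largest |E[(g - h) 1_A]| over the sets A in `family`.

    g and h are eps-indistinguishable according to `family`
    exactly when this value is at most eps.
    """
    return max(abs(mean_on(g, A, universe) - mean_on(h, A, universe))
               for A in family)

# Hypothetical toy data: a 4-point universe and two distributions on it.
universe = [0, 1, 2, 3]
p_a = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(0)}
p_b = {x: Fraction(1, 4) for x in universe}  # the uniform distribution

# Rescale the distributions as in the text: g(x) = P(A = x) * |X|.
g = {x: p_a[x] * len(universe) for x in universe}
h = {x: p_b[x] * len(universe) for x in universe}

# Each adversary f is identified with its support f^{-1}(1).
family = [{0, 1}, {2, 3}, {0, 2}]

gap = distinguishing_gap(g, h, family, universe)

# The same quantity computed directly from the two distributions.
prob_gap = max(abs(sum(p_a[x] for x in S) - sum(p_b[x] for x in S))
               for S in family)
```

Exact rational arithmetic is used so that the two sides of the identity can be compared with `==` rather than up to floating-point error.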

1.2 Remarks on notation and terminology

We will be mainly interested in very large objects, and use the usual asymptotic notation $O$, $\Omega$ and $\Theta$ with subscripts indicating the parameters on which the implied constant is allowed to depend. For instance, $O_{\alpha,\beta}(X)$ denotes a quantity bounded in absolute value by $C_{\alpha,\beta}|X|$ for some quantity $C_{\alpha,\beta}$ depending only on $\alpha$ and $\beta$.

We write $\mathbb{E}_{a \in A, b \in B}$ to denote the expectation when $a$ is chosen uniformly from the set $A$ and $b$ is chosen uniformly from the set $B$, both choices being independent. For any real numbers $a$ and $b$, we write $x = a \pm b$ to denote $a - b \le x \le a + b$. Given an integer $n \ge 1$, we write $[n]$ for the set $\{1, \dots, n\}$. If $A$ is a set and $k$ is an integer, we write $\binom{A}{k}$ to denote the collection of all $k$-element subsets of $A$. We write $A \triangle B$ to denote the symmetric difference $(A \setminus B) \cup (B \setminus A)$ of the sets $A$ and $B$.

Formally, a graph $G$ is given by a pair $(V, E)$, where $V$ is a finite set called the vertex set and $E \subseteq \binom{V}{2}$, called the edge set, is a subset of the (unordered) pairs of vertices. We will sometimes write $uv$ or $vu$ to denote an edge $\{u, v\} \in E$, and denote by $1_G$ the edge indicator function $1_G(x, y) := 1\{xy \in E\}$. For any graph $G$, we will refer to its vertex set as $V(G)$ and its edge set as $E(G)$; if there is no risk of confusion, we may denote them only by $V$ and $E$.

For subsets $A, B \subseteq V(G)$, we write $e_G(A, B)$ to denote the number of edges in $G$ with one vertex in $A$ and the other in $B$, counting twice the edges inside $A \cap B$. We also write $d_G(A, B) := e_G(A, B)/|A \times B|$ to denote the edge density of the pair $(A, B)$.

As customary, we say that $H$ is a subgraph of $G$, and write $H \subseteq G$, if $V(H) \subseteq V(G)$ and $E(H) \subseteq E(G)$. Moreover, if $V(H) = V(G)$, then we say that $H$ is a spanning subgraph of $G$. If $W \subseteq V(G)$, then the subgraph of $G$ induced by $W$ is the graph $G[W] := \left( W, E(G) \cap \binom{W}{2} \right)$. A subgraph $H$ of $G$ is an induced subgraph if $H = G[V(H)]$.

The collection of all partitions of a set $A$ is denoted by $\mathcal{P}(A)$. If $\mathcal{P}_0 := (V_i)_{i \in [k]} \in \mathcal{P}(V)$ is a partition of a vertex set $V$, we say a graph $G = (V, E)$ is $\mathcal{P}_0$-partite if there are no edges inside each vertex class in $\mathcal{P}_0$, i.e., if $e_G(V_i, V_i) = 0$ for all $i \in [k]$.
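The counting convention for the number of edges between two vertex sets can be made concrete with a short sketch. It assumes that an edge inside the intersection of the two sets is counted once for each of its two orientations, which is the same as summing the edge indicator function over all ordered pairs; the function names and the four-vertex example graph are hypothetical.

```python
from fractions import Fraction
from itertools import product

def edge_count(edges, A, B):
    """Number of ordered pairs (a, b) in A x B such that ab is an edge.

    An edge with both endpoints in the intersection of A and B shows up
    as (u, v) and as (v, u), so it is counted twice.
    """
    edge_set = {frozenset(e) for e in edges}
    return sum(1 for a, b in product(A, B)
               if a != b and frozenset((a, b)) in edge_set)

def edge_density(edges, A, B):
    """Edge count of the pair (A, B) divided by |A x B|."""
    return Fraction(edge_count(edges, A, B), len(A) * len(B))

# Hypothetical example: a triangle on {1, 2, 3} with a pendant edge 3-4.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
V = {1, 2, 3, 4}
A = {1, 2, 3}
B = {2, 3, 4}

count_AB = edge_count(edges, A, B)     # the edge 23 lies inside both sets
density_AB = edge_density(edges, A, B)
count_VV = edge_count(edges, V, V)     # every edge is counted twice
```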

1.3 Examples and structure of the thesis

In Combinatorics, probably the most important and widely known decomposition result of the kind we discussed is Szemerédi’s Regularity Lemma (Theorem 2.3, proved in the next chapter), already mentioned in the first part of this chapter. This lemma, in a slightly weaker earlier version, was originally used by Szemerédi to prove that all sets of integers with positive upper density contain arbitrarily long arithmetic progressions [30], a result now known as Szemerédi’s Theorem.

More recently, it was used by Lovász and B. Szegedy [22] to construct limit objects for infinite sequences of graphs. The Regularity Lemma was shown to imply the compactness of a metric space of two-variable functions in which finite graphs may be naturally embedded, and thus that every sequence of graphs has a converging subsequence in this space. The authors then showed how the compactness of this metric space may be used to prove a stronger version of the Regularity Lemma, known as the Regular Approximation Lemma, which we will prove in Section 4.1.

In Computer Science, Trevisan, Tulsiani and Vadhan [37] proved a general decomposition theorem with the same philosophy as the results presented here, and in a framework similar to the one discussed in Section 1.1 (but with a different notion of complexity more adapted to applications in Computer Science). They used this result to show that every high-entropy distribution is indistinguishable from an efficiently samplable distribution of the same entropy. They also showed how this decomposition theorem may be used to prove an important result in Computational Complexity Theory known as Impagliazzo’s Hardcore Theorem [17].

Still in the realm of Computer Science, an important theme which falls within the scope of our subject matter is that of graph property testing, introduced in the seminal paper of Goldreich, Goldwasser and Ron [13].
Let us now quickly present the main problem in this area and its connection to our philosophy and objectives; we will mention this theme again in Section 3.3 and in the introduction of Chapter 4. Given $\varepsilon > 0$, let us say a graph $G$ on $n$ vertices is $\varepsilon$-far from satisfying a graph property $\mathcal{P}$ if one needs to add and/or delete at least $\varepsilon n^2$ edges to $G$ in order to turn it into a graph that satisfies $\mathcal{P}$. An $\varepsilon$-test for $\mathcal{P}$ is a randomized algorithm $T$ making a total number of edge queries bounded by a function of $\varepsilon$ only, and which can distinguish with probability at least 2/3 between graphs satisfying $\mathcal{P}$ and graphs that are $\varepsilon$-far from satisfying $\mathcal{P}$. A graph property $\mathcal{P}$ is then said to be testable if, for any given $\varepsilon > 0$, there exists an $\varepsilon$-test for $\mathcal{P}$. The central problem in graph property testing is to determine which properties are testable, and also to devise efficient $\varepsilon$-tests for these properties.

To see its relation to our subject of study, suppose we have a decomposition of a graph $G$ into a structured low-complexity component and a pseudorandom component. Intuitively, if we query a large (constant) number of randomly chosen edges from $G$, then with high probability we will have queried the expected proportion of edges from each of the classes in the structured component, and the effects of the pseudorandom component will be averaged out; thus a property $\mathcal{P}$ should be testable if and only if knowing the structured component of a graph is enough to tell if $\mathcal{P}$ is satisfied (or is close to being satisfied) for that graph.

It turns out that this intuition may be formalized, and the testable graph properties were completely characterized in this sense in a great paper by Alon, Fischer, Newman and Shapira [1]. Their characterization roughly says that a graph property $\mathcal{P}$ is testable if and only if, for every $\varepsilon > 0$, $\varepsilon$-testing $\mathcal{P}$ can be reduced (in a specific property-testing sense) to the property of satisfying one of finitely many Szemerédi-partition instances.
This characterization is an illustrative example of our “low-complexity decomposition” interpretation of the Regularity Lemma, and this interpretation was key in proving the characterization in [1].

Another class of results which is closely related to our subject matter of low-complexity decompositions is that of transference principles, which allow us to transfer some combinatorial theorems from the “dense setting” over to the “sparse setting”, where the objects may be much harder to handle. To account for the vanishing density of the objects in the sparse setting, it is usual to renormalize the functions representing these objects so that they have average close to 1. This renormalization causes the functions considered to become unbounded as the size of the universe $X$ grows, which is a major source of difficulties. The transference principles then assert that, if the sparse (unbounded) functions satisfy some mild “uniformity” conditions, then they may be modeled by bounded functions which share the key properties of the original functions.

Transference principles of this form were essential ingredients in the papers [16, 36], where the authors transferred Szemerédi’s Theorem, and its generalization to polynomial progressions due to Bergelson and Leibman [4], to dense subsets of a sufficiently pseudorandom subset of the integers. Using this result, they were able to prove that the theorems mentioned above hold also for the set of primes, even though it has zero density inside the integers. In a subsequent paper [35], Tao transferred the Hypergraph Removal Lemma (see [32, 28]) to sub-hypergraphs of pseudorandom sparse hypergraphs, and then used this result to prove that the Gaussian primes contain arbitrarily shaped constellations.

In this work we will not focus on giving applications of the decomposition theorems mentioned, but rather concentrate on the abstract mathematical ideas behind these results.
These ideas may be viewed as representing a dichotomy between structure and randomness, as brilliantly advocated by Tao [34, 33], and which seems to permeate many areas of Mathematics. We will focus here on the case of (finite) sets, graphs and hypergraphs, with the main interest being the case of graphs. However, our methods are presented in a general context, and may also be used in other settings.

In Chapter 2 we will present the general framework for establishing our results and the precise definition of complexity and pseudorandomness we will use. We will then show how to apply our decomposition theorems by using them to prove a weaker form of the Regularity Lemma due to Frieze and Kannan [12], and then Szemerédi’s Regularity Lemma itself.

In Chapter 3 we show how the regularity properties of the partitions given in each form of the Regularity Lemma may be used to approximate the number of copies of any fixed graph $H$ inside a graph $G$. This approximate counting is then used to prove an important result in Graph Theory known as the Graph Removal Lemma, which roughly says that every graph $G$ on $n$ vertices having $o(n^{|V(H)|})$ copies of a given graph $H$ can be made $H$-free by deleting $o(n^2)$ edges.

We next prove two extensions of the Regularity Lemma for graphs in Chapter 4, each designed to handle a different issue that the original Regularity Lemma left unaddressed. The first extension, called the Regular Approximation Lemma (Theorem 4.1), intuitively asserts that it is possible to greatly enhance the regularity of a graph by making very few edge modifications, and so gives us more control over the pseudorandom component in terms of the complexity of the structured component. The second extension (Theorem 4.2) is a relative form of the Regularity Lemma, useful for dealing with arbitrary spanning subgraphs of a known fixed graph, and especially for dealing with very sparse graphs.
We will then give in Chapter 5 the generalization of the Regularity Lemma to the setting of uniform hypergraphs, which are “higher order” versions of graphs whose edges are now composed of $d$ vertices, for some integer $d \ge 3$ called the uniformity of the hypergraph. We remark that this higher number of vertices inside each edge introduces a much more intricate structure than that present in graphs, and the corresponding regularity lemma is accordingly much more involved than Szemerédi’s Regularity Lemma.

In Chapter 6 we will consider transference principles, which were already discussed above, and which permit us to transfer some combinatorial theorems from the usual “positive density” setting to objects having asymptotically negligible density but which satisfy some mild uniformity conditions. We will present three results in this direction, which concern different uniformity conditions we require the sparse objects to satisfy, but which give similar conclusions.

Chapter 7 will be dedicated to obtaining transference results in the graph setting for $L^1$ structure, which is stronger than the cut structure preserved by the transference principles given in Chapter 6. These results are in some sense a strengthening of the theorems given in Chapter 6 when applied to the setting of graphs, and may be seen as requiring the “transference function” from the sparse space to the dense space to be continuous in $L^1$ norm, so as to preserve the underlying $L^1$ geometry. The results presented in this chapter are the main original contributions of this work.

Chapter 8 then mentions some possible extensions of the results shown in Chapter 7, indicating a path to be taken for future work.


Chapter 2

Abstract decomposition theorems

This chapter is aimed at developing a general method to decompose an arbitrary object $f$ of some kind into a sum $f = f_{\mathrm{str}} + f_{\mathrm{psd}}$, where $f_{\mathrm{str}}$ is a low-complexity structured component and $f_{\mathrm{psd}}$ behaves randomly. As mentioned in the introduction, such a decomposition is useful because we may then use different methods to analyze each one of the components separately, taking advantage of their structure, which makes it much easier to analyze the arbitrary original object $f$.

In such a decomposition we must always perform a trade-off, increasing our control on one of the components at the expense of worsening our control on the other. In many situations, it turns out to be useful to allow a third term $f_{\mathrm{err}}$ into the decomposition $f = f_{\mathrm{str}} + f_{\mathrm{psd}} + f_{\mathrm{err}}$, which can be seen as an error term and may be made sufficiently small for the application at hand. We will see in Subsection 2.4.1 that the presence of the error component is in fact essential if we wish to use this decomposition to prove Szemerédi’s Regularity Lemma.

The method we will use to prove such “decomposition theorems” is a simple energy-increment argument (in this form due to Tao [33]), which will be described in the next sections.

2.1 Probabilistic setting

Let $(X, \mathcal{B}_{\max}, \mathbb{P})$ be a probability space, and for brevity let us call a sub-σ-algebra $\mathcal{B}$ of $\mathcal{B}_{\max}$ simply a factor of $\mathcal{B}_{\max}$. Given measurable sets $A_1, \dots, A_k \in \mathcal{B}_{\max}$, we denote by $\sigma(A_1, \dots, A_k)$ the smallest factor of $\mathcal{B}_{\max}$ which contains all these sets. Given factors $\mathcal{B}_1, \dots, \mathcal{B}_k \subseteq \mathcal{B}_{\max}$, we denote by $\mathcal{B}_1 \vee \dots \vee \mathcal{B}_k$ the join of these factors, which is the smallest factor of $\mathcal{B}_{\max}$ which contains all of them.

Given a square-integrable function $f \in L^2(\mathcal{B}_{\max})$ and a factor $\mathcal{B} \subseteq \mathcal{B}_{\max}$, we define the conditional expectation $\mathbb{E}[f|\mathcal{B}] \in L^2(\mathcal{B})$ as the orthogonal projection of $f$ onto the closed subspace $L^2(\mathcal{B})$ of $L^2(\mathcal{B}_{\max})$ consisting of the $\mathcal{B}$-measurable square-integrable functions. A simple application of Pythagoras’ theorem then gives the following lemma:

Lemma 2.1 (Pythagoras’ theorem). Let $\mathcal{B} \subseteq \mathcal{B}'$ be two factors of $\mathcal{B}_{\max}$. Then for any function $f \in L^2(\mathcal{B}_{\max})$ we have
\[ \|\mathbb{E}[f|\mathcal{B}']\|_{L^2}^2 = \|\mathbb{E}[f|\mathcal{B}]\|_{L^2}^2 + \|\mathbb{E}[f|\mathcal{B}'] - \mathbb{E}[f|\mathcal{B}]\|_{L^2}^2. \]

We remark that, even though the general decomposition theorems in this chapter are stated and proven in full generality, in applications we will only deal with finite probability spaces $X$ equipped with the discrete σ-algebra $\mathcal{B}_{\max} = 2^X$. In this restricted setting, $X$ will be a finite set, every subset $A \subseteq X$ will be measurable and a factor $\mathcal{B}$ of $\mathcal{B}_{\max}$ may be identified with the partition of $X$ induced by its atoms; this identification will be made throughout the rest of this work without further comment.

If in addition we suppose that the probability $\mathbb{P}$ is the uniform probability distribution over $X$, then for any partition $\mathcal{B} : X = X_1 \cup \dots \cup X_k$ of $X$ into $k$ atoms we have that
\[ \mathbb{E}[f|\mathcal{B}](x) = \frac{1}{|X_i|} \sum_{y \in X_i} f(y) \]
whenever $x \in X_i$; thus the conditional expectation is just an averaging of the function over the atoms of $\mathcal{B}$.
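In the finite uniform setting, both the averaging formula for the conditional expectation and the Pythagoras identity of Lemma 2.1 can be verified directly. The sketch below assumes a factor is stored as a list of atoms and a function as a dictionary on the points; the six-point example and the names `cond_exp` and `energy` are hypothetical.

```python
from fractions import Fraction

def cond_exp(f, partition):
    """Conditional expectation of f given the factor whose atoms are the
    blocks of `partition`, under the uniform measure: average f per atom."""
    out = {}
    for atom in partition:
        avg = sum(Fraction(f[x]) for x in atom) / len(atom)
        for x in atom:
            out[x] = avg
    return out

def energy(f):
    """Squared L2 norm E[f^2] under the uniform measure on the domain of f."""
    return sum(Fraction(v) ** 2 for v in f.values()) / len(f)

# Hypothetical example on a 6-point space.
f = {0: 1, 1: 0, 2: 1, 3: 1, 4: 0, 5: 0}
coarse = [{0, 1, 2}, {3, 4, 5}]      # atoms of the smaller factor
fine = [{0, 1}, {2}, {3}, {4, 5}]    # atoms of a factor refining `coarse`

proj_coarse = cond_exp(f, coarse)
proj_fine = cond_exp(f, fine)
difference = {x: proj_fine[x] - proj_coarse[x] for x in f}

# Pythagoras: the energy on the finer factor splits into the energy on the
# coarser factor plus the energy of the difference of the two projections.
lhs = energy(proj_fine)
rhs = energy(proj_coarse) + energy(difference)
```

Here `lhs` and `rhs` both evaluate to 5/12, with 5/18 coming from the coarse factor and 5/36 from the difference.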

As an important example, consider the case of a bipartite graph $G = (V_1 \cup V_2, E)$. We then take $X = V_1 \times V_2$, $\mathcal{B}_{\max} = 2^{V_1 \times V_2}$ and the uniform probability distribution over $(X, \mathcal{B}_{\max})$; this corresponds to picking a pair $(x_1, x_2) \in V_1 \times V_2$ uniformly at random. If we have a partition $\mathcal{B} : V_1 \times V_2 = \bigcup_{i,j} X_i \times Y_j$, then $\mathbb{E}[1_G|\mathcal{B}](x, y) = e_G(X_i, Y_j)/|X_i \times Y_j|$ is just the edge density between $X_i \ni x$ and $Y_j \ni y$. For any sets $A \subseteq V_1$, $B \subseteq V_2$ we then have that
\[ \left| \mathbb{E}\left[ (1_G - \mathbb{E}[1_G]) 1_{A \times B} \right] \right| = \frac{1}{|V_1 \times V_2|} \left| e_G(A, B) - \frac{e_G(V_1, V_2)}{|V_1 \times V_2|} |A \times B| \right|, \]
which may be seen as a kind of discrepancy of the edges over the pair $(A, B)$ and, if made smaller than $\varepsilon$, resembles the usual definition of $\varepsilon$-regularity (which we recall below in Remark 2.1). It will therefore be more appropriate in our setting to define $\varepsilon$-regularity in the following less standard (but essentially equivalent) way:

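Definition 2.1 ($\varepsilon$-regularity). A bipartite graph $G = (V_1 \cup V_2, E)$ is $\varepsilon$-regular for some $\varepsilon > 0$ if, for all sets $A \subseteq V_1$, $B \subseteq V_2$, we have that
\[ \left| e_G(A, B) - \frac{e_G(V_1, V_2)}{|V_1 \times V_2|} |A \times B| \right| \le \varepsilon |V_1 \times V_2|. \tag{2.1} \]

Similarly, a (non-bipartite) graph $G = (V, E)$ is $\varepsilon$-regular if, for all $A, B \subseteq V$, we have
\[ \left| e_G(A, B) - \frac{2|E|}{|V|^2} |A \times B| \right| \le \varepsilon |V|^2. \]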

Remark 2.1. The usual definition of $\varepsilon$-regularity requires instead the left-hand side of (2.1) to be smaller than $\varepsilon |A \times B|$ whenever $|A| \ge \varepsilon |V_1|$, $|B| \ge \varepsilon |V_2|$; this requirement implies $\varepsilon$-regularity in our definition, which in turn implies $\varepsilon^{1/3}$-regularity in the usual definition.
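For very small bipartite graphs, condition (2.1) can be checked by brute force over all pairs of subsets. The sketch below is only an illustration and not an efficient test: it enumerates exponentially many pairs, and the helper names and the two example graphs are hypothetical. It returns the largest discrepancy normalized by the size of the product of the two vertex classes, so the graph satisfies (2.1) for a given accuracy parameter exactly when the returned value is at most that parameter.

```python
from fractions import Fraction
from itertools import chain, combinations

def subsets(s):
    """All subsets of the collection s, as tuples."""
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def regularity_defect(V1, V2, edges):
    """Max over A in V1, B in V2 of |e(A,B) - d|A||B||, divided by |V1||V2|,
    where d is the overall edge density of the bipartite graph."""
    E = set(edges)  # pairs (u, v) with u in V1 and v in V2
    total = len(V1) * len(V2)
    density = Fraction(len(E), total)
    worst = Fraction(0)
    for A in subsets(V1):
        for B in subsets(V2):
            e_AB = sum(1 for a in A for b in B if (a, b) in E)
            worst = max(worst, abs(e_AB - density * len(A) * len(B)))
    return worst / total

V1, V2 = [0, 1], ["a", "b"]

# A perfect matching: density 1/2, but single vertices see 0 or 1 edges.
matching = [(0, "a"), (1, "b")]
defect_matching = regularity_defect(V1, V2, matching)

# The complete bipartite graph has no discrepancy at all.
complete = [(u, v) for u in V1 for v in V2]
defect_complete = regularity_defect(V1, V2, complete)
```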

2.2 Structure and pseudorandomness

We will now define the notions of structure and pseudorandomness we will need for the decomposition theorems in the next two sections; these notions were first introduced by Tao in [33]. The basic structured objects will be factors of $\mathcal{B}_{\max}$ belonging to a collection $\mathcal{S}$ fixed at the beginning. These factors are supposed to be of low complexity and represent the information we can efficiently obtain about our random variables. It is from them that we define the complexity of other objects.

Remark 2.2. As explained in Section 1.1, the idea was to define the complexity of arbitrary sets in our space $X$ by how they relate to some “basic structured sets” belonging to a given family $\mathcal{C}$. While this is indeed the spirit of our definitions, in order to apply the energy-increment method it is better to work with factors of $\mathcal{B}_{\max}$ instead of subsets of $X$; this is why the basic objects with which we define complexity are factors instead of sets. It may be instructive to think of these basic factors in $\mathcal{S}$ as each being generated by a single basic structured set in $\mathcal{C}$.

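Definition 2.2 (Complexity). We say that a factor $\mathcal{B} \subseteq \mathcal{B}_{\max}$ has $\mathcal{S}$-complexity at most $M$, and denote this by $\mathrm{complex}_{\mathcal{S}}(\mathcal{B}) \le M$, if it may be written as the join $\mathcal{B} = \mathcal{Y}_1 \vee \dots \vee \mathcal{Y}_m$ of $m$ factors $\mathcal{Y}_i \in \mathcal{S}$ for some $m \le M$.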

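Definition 2.3 (Pseudorandomness). Given $\varepsilon > 0$, we say that a function $f \in L^2(\mathcal{B}_{\max})$ is $\varepsilon$-pseudorandom according to $\mathcal{S}$ if $\|\mathbb{E}[f|\mathcal{Y}]\|_{L^2} \le \varepsilon$ holds for all $\mathcal{Y} \in \mathcal{S}$.

Intuitively, a function $f : X \to \mathbb{R}$ is pseudorandom if it has negligible correlation with all structured factors. This may be seen by using the Cauchy-Schwarz inequality: for every set $A \subseteq X$ which is measurable by some structured factor $\mathcal{Y} \in \mathcal{S}$, we have
\[ |\mathbb{E}[f 1_A]| = |\mathbb{E}[\mathbb{E}[f|\mathcal{Y}] 1_A]| \le \|\mathbb{E}[f|\mathcal{Y}]\|_{L^2} \|1_A\|_{L^2} \le \|\mathbb{E}[f|\mathcal{Y}]\|_{L^2}, \tag{2.2} \]
and so $|\mathbb{E}[f 1_A]| \le \varepsilon$ if $f$ is $\varepsilon$-pseudorandom.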

2.3 Weak decompositions

To ease the presentation, we will give in this chapter the general theorems for “one-dimensional” functions, i.e., functions whose image is in $\mathbb{R}$. However, this restriction is of no importance in the proofs, and the theorems may be straightforwardly generalized to functions whose image is in $\mathbb{R}^k$. We will say more about this generalization in a later chapter, where we will need a “multi-dimensional” structure theorem in order to prove the Hypergraph Regularity Lemma. From the definitions presented in the last section, we get the following energy-increment result:

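Lemma 2.2 (Lack of pseudorandomness implies energy increment [33]). Let $f \in L^2(\mathcal{B}_{\max})$, $\varepsilon > 0$ and $\mathcal{B} \subseteq \mathcal{B}_{\max}$ be such that $f - \mathbb{E}[f|\mathcal{B}]$ is not $\varepsilon$-pseudorandom. Then there exists a factor $\mathcal{Y} \in \mathcal{S}$ such that
\[ \|\mathbb{E}[f|\mathcal{B} \vee \mathcal{Y}]\|_{L^2}^2 > \|\mathbb{E}[f|\mathcal{B}]\|_{L^2}^2 + \varepsilon^2. \]

Proof. By hypothesis, there exists $\mathcal{Y} \in \mathcal{S}$ such that $\|\mathbb{E}[f - \mathbb{E}[f|\mathcal{B}] \,|\, \mathcal{Y}]\|_{L^2}^2 > \varepsilon^2$. By Pythagoras’ theorem, this implies that
\[ \|\mathbb{E}[f - \mathbb{E}[f|\mathcal{B}] \,|\, \mathcal{B} \vee \mathcal{Y}]\|_{L^2}^2 > \varepsilon^2. \]
By Pythagoras’ theorem again, we have
\begin{align*}
\|\mathbb{E}[f - \mathbb{E}[f|\mathcal{B}] \,|\, \mathcal{B} \vee \mathcal{Y}]\|_{L^2}^2 &= \|\mathbb{E}[f|\mathcal{B} \vee \mathcal{Y}] - \mathbb{E}[f|\mathcal{B}]\|_{L^2}^2 \\
&= \|\mathbb{E}[f|\mathcal{B} \vee \mathcal{Y}]\|_{L^2}^2 - \|\mathbb{E}[f|\mathcal{B}]\|_{L^2}^2,
\end{align*}
and so $\|\mathbb{E}[f|\mathcal{B} \vee \mathcal{Y}]\|_{L^2}^2 > \|\mathbb{E}[f|\mathcal{B}]\|_{L^2}^2 + \varepsilon^2$.

To draw a parallel with Graph Theory, note that in the bipartite graph setting, if
\[ \mathcal{B} : V_1 \times V_2 = \bigcup_{i,j} X_i \times Y_j \]
is a (product) partition of $V_1 \times V_2$, then (taking $f = 1_G$)
\[ \|\mathbb{E}[f|\mathcal{B}]\|_{L^2}^2 = \sum_i \sum_j \frac{|X_i|}{|V_1|} \frac{|Y_j|}{|V_2|} \left( \frac{e_G(X_i, Y_j)}{|X_i \times Y_j|} \right)^2 \]
is the so-called index of the partition $\mathcal{B}$, which is an essential ingredient in the usual proof of the Regularity Lemma; here, it will represent the “energy” we wish to maximize and is at the core of our energy-increment arguments.

By a simple iteration of Lemma 2.2, we easily obtain a “weak” decomposition theorem (which, following Tao, we will call the Weak Structure Theorem):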

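Lemma 2.3 (Weak Structure Theorem [33]). Let $f \in L^2(\mathcal{B}_{\max})$ be such that $\|f\|_{L^2} \le 1$, let $\mathcal{B}$ be a factor of $\mathcal{B}_{\max}$ and let $0 < \varepsilon \le 1$. Then there exists a decomposition $f = f_{\mathrm{str}} + f_{\mathrm{psd}}$ where:

• $f_{\mathrm{str}} = \mathbb{E}[f|\mathcal{B} \vee \mathcal{Z}]$ for some factor $\mathcal{Z}$ of $\mathcal{S}$-complexity less than $1/\varepsilon^2$

• $f_{\mathrm{psd}}$ is $\varepsilon$-pseudorandom according to $\mathcal{S}$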

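Proof. We will choose factors $\mathcal{Y}_1, \mathcal{Y}_2, \dots, \mathcal{Y}_m \in \mathcal{S}$, for some $m < 1/\varepsilon^2$, using the following algorithm (which relies on Lemma 2.2):

– Step 0: Initialize $i = 0$

– Step 1: Define $\mathcal{Z} := \mathcal{Y}_1 \vee \dots \vee \mathcal{Y}_i$, $f_{\mathrm{str}} := \mathbb{E}[f|\mathcal{B} \vee \mathcal{Z}]$ and $f_i := f - f_{\mathrm{str}}$

– Step 2: If $f_i$ is $\varepsilon$-pseudorandom, let $f_{\mathrm{psd}} := f_i$ and STOP. Otherwise, choose $\mathcal{Y}_{i+1} \in \mathcal{S}$ such that $\|\mathbb{E}[f|\mathcal{B} \vee \mathcal{Z} \vee \mathcal{Y}_{i+1}]\|_{L^2}^2 > \|\mathbb{E}[f|\mathcal{B} \vee \mathcal{Z}]\|_{L^2}^2 + \varepsilon^2$

– Step 3: Increment $i$ to $i + 1$ and return to Step 1

Since the energy $\|\mathbb{E}[f|\mathcal{B} \vee \mathcal{Z}]\|_{L^2}^2$ is bounded between 0 and 1 (by the hypothesis $\|f\|_{L^2} \le 1$) and increases by more than $\varepsilon^2$ at each iteration, the algorithm must terminate in fewer than $1/\varepsilon^2$ iterations and the lemma follows.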
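The energy-increment algorithm from this proof can be run directly on a finite space with the uniform measure. The following is a minimal sketch under simplifying assumptions that are not part of the text: each structured factor is taken to be generated by a single set (in the spirit of Remark 2.2), a factor is stored as its list of atoms, and the first witness found is the one used. The names `split` and `weak_structure` and the eight-point example are hypothetical.

```python
from fractions import Fraction

def cond_exp(f, partition):
    """Average f over each atom of the partition (uniform measure)."""
    out = {}
    for atom in partition:
        avg = sum(Fraction(f[x]) for x in atom) / len(atom)
        for x in atom:
            out[x] = avg
    return out

def energy(f):
    """Squared L2 norm E[f^2] under the uniform measure."""
    return sum(Fraction(v) ** 2 for v in f.values()) / len(f)

def split(partition, A):
    """Atoms of the join of `partition` with the factor generated by A."""
    pieces = []
    for atom in partition:
        for piece in (atom & A, atom - A):
            if piece:
                pieces.append(piece)
    return pieces

def weak_structure(f, family, eps, base=None):
    """Steps 0-3 of the proof: refine until the residual is eps-pseudorandom."""
    points = frozenset(f)
    partition = [points] if base is None else list(base)
    chosen = []
    while True:
        f_str = cond_exp(f, partition)                       # Step 1
        residual = {x: Fraction(f[x]) - f_str[x] for x in f}
        witness = None
        for A in family:                                     # Step 2
            if energy(cond_exp(residual, split([points], A))) > eps ** 2:
                witness = A
                break
        if witness is None:
            return f_str, residual, chosen
        chosen.append(witness)                               # Step 3
        partition = split(partition, witness)

# Hypothetical example: a 0/1 function on 8 points and two structured sets.
f = {0: 1, 1: 1, 2: 1, 3: 0, 4: 0, 5: 0, 6: 1, 7: 0}
A1 = frozenset({0, 1, 2, 3})
A2 = frozenset({0, 2, 4, 6})
eps = Fraction(1, 5)

f_str, f_psd, chosen = weak_structure(f, [A1, A2], eps)
```

On this input the energy of the structured part goes from 1/4 to 5/16 to 3/8, so each of the two refinements gains 1/16, which exceeds the squared accuracy parameter 1/25, and the loop stops after two steps, well below the bound of 25.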

2.3.1 Weak Regularity Lemma

As a simple and important application of Lemma 2.3, we will prove Frieze and Kannan’s Weak Regularity Lemma:

Theorem 2.1 (Weak Regularity Lemma [12]). For every $\varepsilon > 0$ and every graph $G = (V, E)$, there exists a partition $\mathcal{P} : V = V_1 \cup \dots \cup V_k$ of $V$ into $k \le 2^{2/\varepsilon^2}$ parts satisfying
\[ \left| e_G(A, B) - \sum_{i,j \in [k]} \frac{e_G(V_i, V_j)}{|V_i \times V_j|} |A \cap V_i| |B \cap V_j| \right| \le \varepsilon |V|^2, \quad \forall A, B \subseteq V. \tag{2.3} \]

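Remark 2.3. A partition $\mathcal{P}$ satisfying (2.3) is called a weak $\varepsilon$-regular partition of $V$.

Proof. This is basically a restatement of the Weak Structure Theorem specialized to the graph setting. The probability space here is given by $(V \times V, 2^{V \times V}, \mathbb{P})$, with $\mathbb{P}$ being the uniform probability distribution over $V \times V$. We define the collection of structured factors $\mathcal{S} := \{\sigma(A \times V, V \times B) : A, B \subseteq V\}$, which is chosen so that any product set $A \times B \subseteq V \times V$ will be measurable in some factor $\mathcal{Y} \in \mathcal{S}$, and any factor $\mathcal{Y} \in \mathcal{S}$ will have only product sets as its atoms.

By the Weak Structure Theorem applied to the edge indicator function $1_G$ and the trivial σ-algebra $\mathcal{B} = \{\emptyset, V \times V\}$, we obtain a factor $\mathcal{Z}$ of $\mathcal{S}$-complexity at most $1/\varepsilon^2$ for which
\[ \|\mathbb{E}[1_G - \mathbb{E}[1_G|\mathcal{Z}] \,|\, \mathcal{Y}]\|_{L^2} \le \varepsilon, \quad \forall \mathcal{Y} \in \mathcal{S}. \]
By construction, the factor $\mathcal{Z}$ will be a product σ-algebra, and each “coordinate” of $\mathcal{Z}$ induces a partition of $V$ into at most $2^{1/\varepsilon^2}$ atoms; we take the common refinement of these two partitions and so obtain a single partition $\mathcal{P} : V = V_1 \cup \dots \cup V_k$ into $k \le 2^{2/\varepsilon^2}$ parts. Since for any sets $A, B \subseteq V$ there exists a structured factor $\mathcal{Y} \in \mathcal{S}$ for which $A \times B \in \mathcal{Y}$, by Cauchy-Schwarz we obtain
\begin{align*}
\max_{A, B \subseteq V} \left| \mathbb{E}[(1_G - \mathbb{E}[1_G|\mathcal{Z}]) 1_{A \times B}] \right| &= \max_{A, B \subseteq V} \left| \mathbb{E}\left[ \mathbb{E}[1_G - \mathbb{E}[1_G|\mathcal{Z}] \,|\, \sigma(A \times V, V \times B)] \, 1_{A \times B} \right] \right| \\
&\le \max_{\mathcal{Y} \in \mathcal{S}} \|\mathbb{E}[1_G - \mathbb{E}[1_G|\mathcal{Z}] \,|\, \mathcal{Y}]\|_{L^2} \le \varepsilon.
\end{align*}
We then finish the proof by noting that
\[ \mathbb{E}[(1_G - \mathbb{E}[1_G|\mathcal{Z}]) 1_{A \times B}] = \frac{1}{|V|^2} \left( e_G(A, B) - \sum_{i,j \in [k]} \frac{e_G(V_i, V_j)}{|V_i \times V_j|} |A \cap V_i| |B \cap V_j| \right). \]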
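Condition (2.3) can be evaluated exhaustively on a very small graph, which makes it easy to see how the choice of partition affects the discrepancy. The sketch below is a brute-force illustration only: it runs over all pairs of vertex subsets, so it is limited to a handful of vertices, and the function names and the example graph (a complete bipartite graph with three vertices on each side) are hypothetical. Edge counts follow the ordered-pair convention, so edges inside the intersection of the two sets are counted twice.

```python
from fractions import Fraction
from itertools import chain, combinations

def subsets(s):
    """All subsets of the collection s, as tuples."""
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def cut(adj, A, B):
    """Number of ordered pairs (a, b) in A x B that are adjacent."""
    return sum(1 for a in A for b in B if b in adj[a])

def weak_regularity_defect(vertices, adj, parts):
    """Largest left-hand side of (2.3) over all A, B, divided by |V|^2."""
    n = len(vertices)
    dens = {(i, j): Fraction(cut(adj, Vi, Vj), len(Vi) * len(Vj))
            for i, Vi in enumerate(parts) for j, Vj in enumerate(parts)}
    all_subsets = [set(s) for s in subsets(vertices)]
    worst = Fraction(0)
    for A in all_subsets:
        for B in all_subsets:
            predicted = sum(dens[i, j] * len(A & Vi) * len(B & Vj)
                            for i, Vi in enumerate(parts)
                            for j, Vj in enumerate(parts))
            worst = max(worst, abs(cut(adj, A, B) - predicted))
    return worst / n ** 2

# Hypothetical example: complete bipartite graph between {0,1,2} and {3,4,5}.
vertices = list(range(6))
left, right = {0, 1, 2}, {3, 4, 5}
adj = {v: (right if v in left else left) for v in vertices}

defect_trivial = weak_regularity_defect(vertices, adj, [set(vertices)])
defect_bipartition = weak_regularity_defect(vertices, adj, [left, right])
```

With the trivial one-part partition the normalized discrepancy is 1/8, whereas the partition into the two sides of the graph predicts every edge count exactly.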

2.4 Strong decompositions

The Weak Structure Theorem (Lemma 2.3) already gives an interesting and non-trivial decomposition result, but its applications are limited because the pseudorandomness of the component $f_{\mathrm{psd}}$ is relatively weak compared to the complexity bound we have for the component $f_{\mathrm{str}}$. As already noted, the way to increase our control on both of these terms simultaneously is to allow for a small error term $f_{\mathrm{err}}$ in the decomposition. This is done in the following theorem:

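Theorem 2.2 (Strong Structure Theorem [33]). Let $f \in L^2(\mathcal{B}_{\max})$ be such that $\|f\|_{L^2} \le 1$, let $0 < \varepsilon \le 1$ and let $F : \mathbb{R}_+ \to \mathbb{R}_+$ be an arbitrary increasing function. Then there exists an integer $M = O_{\varepsilon,F}(1)$ and a decomposition $f = f_{\mathrm{str}} + f_{\mathrm{psd}} + f_{\mathrm{err}}$ such that:

• $f_{\mathrm{str}} = \mathbb{E}[f|\mathcal{B}]$, for some factor $\mathcal{B}$ of $\mathcal{S}$-complexity at most $M$

• $f_{\mathrm{psd}} = f - \mathbb{E}[f|\mathcal{B}']$, where $\mathcal{B} \subseteq \mathcal{B}' \subseteq \mathcal{B}_{\max}$, is $1/F(M)$-pseudorandom

• $f_{\mathrm{err}} = \mathbb{E}[f|\mathcal{B}'] - \mathbb{E}[f|\mathcal{B}]$ satisfies $\|f_{\mathrm{err}}\|_{L^2} \le \varepsilon$

Proof. By increasing $F$ if necessary, we may assume that $F(x) \ge x + 1$ for all $x \in \mathbb{R}_+$, and that $F$ is strictly increasing.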

We will recursively define real numbers $M_0 < M_1 < M_2 < \cdots$ and factors $\mathcal{B}_0 \subseteq \mathcal{B}_1 \subseteq \mathcal{B}_2 \subseteq \cdots \subseteq \mathcal{B}_{max}$ in the following way. First, set $M_0 := 0$ and $\mathcal{B}_0 := \{\emptyset, X\}$. Then, for every $i \ge 1$, use Lemma 2.3 with ε being $1/F(M_{i-1})$ and $\mathcal{B}$ being $\mathcal{B}_{i-1}$ to obtain a factor $\mathcal{Z}_i$ of $\mathcal{S}$-complexity at most $F(M_{i-1})^2$ such that $f - \mathbb{E}[f | \mathcal{B}_{i-1} \vee \mathcal{Z}_i]$ is $1/F(M_{i-1})$-pseudorandom; then set $M_i := M_{i-1} + F(M_{i-1})^2$ and $\mathcal{B}_i := \mathcal{B}_{i-1} \vee \mathcal{Z}_i$.

Note that $\mathrm{complex}_{\mathcal{S}}(\mathcal{B}_i) \le M_i$ for all $i \ge 0$. By Pythagoras' theorem and the hypothesis $\|f\|_{L^2} \le 1$, the energy $\|\mathbb{E}[f | \mathcal{B}_i]\|_{L^2}^2$ is increasing and is bounded between 0 and 1. By the pigeonhole principle, we may then find an index $1 \le j \le 1/\varepsilon^2$ such that

$$\|\mathbb{E}[f | \mathcal{B}_j]\|_{L^2}^2 - \|\mathbb{E}[f | \mathcal{B}_{j-1}]\|_{L^2}^2 \le \varepsilon^2$$

By Pythagoras' theorem, this implies that $\|\mathbb{E}[f | \mathcal{B}_j] - \mathbb{E}[f | \mathcal{B}_{j-1}]\|_{L^2} \le \varepsilon$. We may then set $f_{str} := \mathbb{E}[f | \mathcal{B}_{j-1}]$, $f_{psd} := f - \mathbb{E}[f | \mathcal{B}_j]$, $M := M_{j-1}$ and $f_{err} := \mathbb{E}[f | \mathcal{B}_j] - \mathbb{E}[f | \mathcal{B}_{j-1}]$ to obtain the claim.

We remark that the upper bound we get for $M$ in this theorem is extremely large. Indeed, this bound is obtained by iteratively applying $1/\varepsilon^2$ times the transformation $x \mapsto x + F(x)^2$ starting from $x = 0$. If $F$ is an exponential function (as will be the case in the proof of Szemerédi's Regularity Lemma), then we obtain an exponential tower of height $\Theta(\varepsilon^{-2})$. Unfortunately, as we will briefly discuss in Remark 2.5 at the end of this chapter, these terrible bounds cannot be improved in general.
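To get a feeling for this growth, the following Python sketch iterates the transformation $x \mapsto x + F(x)^2$ used in the proof. The choice $F(x) = 2^x$ is only an illustrative assumption (it satisfies $F(x) \ge x + 1$ on the non-negative integers), and the names `iterate_complexity_bound` and `F_exp` are ours.

```python
def iterate_complexity_bound(F, steps):
    """Apply x -> x + F(x)**2 `steps` times, starting from x = 0.

    Returns the list [M_0, M_1, ..., M_steps] of successive bounds.
    """
    x = 0
    history = [x]
    for _ in range(steps):
        x = x + F(x) ** 2
        history.append(x)
    return history


def F_exp(x):
    """Illustrative exponential growth function (an assumption)."""
    return 2 ** x


bounds = iterate_complexity_bound(F_exp, 4)
print(bounds[:4])              # the first four values are still small
print(bounds[4].bit_length())  # the fifth already needs thousands of bits
```

Python integers have arbitrary precision, so the values are exact; each further step roughly exponentiates the previous one, which is the tower-type behavior described above.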

2.4.1 Szemerédi Regularity Lemma

Using the Strong Structure Theorem, we may now prove Szemerédi's celebrated Regularity Lemma:

Theorem 2.3 (Szemerédi Regularity Lemma [31]). For every $\varepsilon > 0$ and $k_0 \ge 1$, there exists an integer $K_0$ such that the following holds. Every graph $G = (V, E)$ with $|V| \ge K_0$ admits a partition $\mathcal{P} : V = V_0 \cup V_1 \cup \cdots \cup V_k$ of its vertex set with the following properties:

• $k_0 \le k \le K_0$

• $|V_0| < \varepsilon |V|$ and $|V_1| = |V_2| = \cdots = |V_k|$

• all but at most $\varepsilon k^2$ of the pairs $(V_i, V_j)$ are ε-regular

Remark 2.4. The set $V_0$ is called the exceptional set, and a partition $\mathcal{P}$ satisfying the second and third properties above is called an ε-regular partition of $V$.

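Proof. We will apply the Strong Structure Theorem in the graph setting: $X = V \times V$, $\mathcal{B}_{max} = 2^{V \times V}$ and uniform probability distribution $\mathbb{P}$; this is the space generated by selecting pairs of vertices $(x, y)$ from $V^2$ independently and uniformly at random. The set of basic structured objects will be $\mathcal{S} := \{\sigma(A \times V, V \times B) : A, B \subseteq V\}$, which represents the information of whether vertex $x$ belongs to a subset $A$ and whether $y$ belongs to a subset $B$ of $V$. Define the error parameter α and the function $F$ by

$$\alpha := \frac{\varepsilon^{3/2}}{8}, \qquad F(x) := \frac{8}{\varepsilon} \left( k_0 + \frac{2^{2x}}{\alpha} \right)^2 \tag{2.4}$$

Applying the Strong Structure Theorem to the edge indicator function $\mathbf{1}_G$, with ε substituted by α and the function $F$ as defined above, we obtain an integer $M = O_{\alpha, F}(1)$ and a pair of factors $\mathcal{B} \subseteq \mathcal{B}' \subseteq 2^{V \times V}$ such that:

– $\mathrm{complex}_{\mathcal{S}}(\mathcal{B}) \le M$

– $\left\| \mathbb{E}\left[ \mathbf{1}_G - \mathbb{E}[\mathbf{1}_G | \mathcal{B}'] \,\middle|\, \mathcal{Y} \right] \right\|_{L^2} \le 1/F(M) \quad \forall \mathcal{Y} \in \mathcal{S}$

– $\left\| \mathbb{E}[\mathbf{1}_G | \mathcal{B}'] - \mathbb{E}[\mathbf{1}_G | \mathcal{B}] \right\|_{L^2} \le \alpha$

We note that, for any function $f$ and any $\mathcal{Y}$-measurable set $Y$, we can apply Cauchy-Schwarz to obtain

$$|\mathbb{E}[f \mathbf{1}_Y]| = |\mathbb{E}[\mathbb{E}[f | \mathcal{Y}] \mathbf{1}_Y]| \le \|\mathbb{E}[f | \mathcal{Y}]\|_{L^2} \|\mathbf{1}_Y\|_{L^2} \le \|\mathbb{E}[f | \mathcal{Y}]\|_{L^2}$$

By the definition of the set $\mathcal{S}$, the second property then implies that

$$\left| \mathbb{E}\left[ (\mathbf{1}_G - \mathbb{E}[\mathbf{1}_G | \mathcal{B}']) \mathbf{1}_{A \times B} \right] \right| \le \frac{1}{F(M)} \quad \forall A, B \subseteq V \tag{2.5}$$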

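By construction, the factor $\mathcal{B}$ is a product σ-algebra and each coordinate of $\mathcal{B}$ is generated by at most $\mathrm{complex}_{\mathcal{S}}(\mathcal{B}) \le M$ sets. Thus, each coordinate of $\mathcal{B}$ induces a partition of $V$ into at most $2^M$ parts, and their common refinement (which we will call $\mathcal{P}$) partitions $V$ into at most $2^{2M}$ atoms. We further refine this partition into a new more "equitable" partition $V = V_0 \cup V_1 \cup \cdots \cup V_k$ in such a way that:

– $k_0 \le k = O_{M, \alpha}(1)$

– $|V_0| < \alpha |V|$ and $|V_1| = |V_2| = \cdots = |V_k|$

– each non-exceptional set $V_i$ is entirely contained within an atom of $\mathcal{P}$

This may be accomplished by setting $k := \max\{k_0, \lceil 2^{2M}/\alpha \rceil\}$, partitioning $V$ greedily inside the atoms of $\mathcal{P}$, and uniting all remaining vertices into the exceptional set $V_0$; this way we have that

$$|V_0| \le \left( \frac{|V|}{k} - 1 \right) 2^{2M} < \alpha |V| \quad \text{and} \quad |V_i| = \frac{|V| - |V_0|}{k} \ge \frac{(1 - \alpha)|V|}{k} \quad \forall i \in [k]$$

As each $V_i \times V_j$ is contained inside an atom of $\mathcal{B}$, we see that $\mathbb{E}[\mathbf{1}_G | \mathcal{B}]$ is constant over $V_i \times V_j$ for all $i, j \in [k]$; let $d_{i,j}$ denote this value. To show that this pair $(V_i, V_j)$ is ε-regular, that is

$$\left| e_G(A, B) - \frac{e_G(V_i, V_j)}{|V_i \times V_j|} |A \times B| \right| \le \varepsilon |V_i \times V_j| \quad \forall A \subseteq V_i, B \subseteq V_j,$$

by the triangle inequality it suffices to show that

$$\left| e_G(A, B) - d_{i,j} |A \times B| \right| \le \frac{\varepsilon}{2} |V_i \times V_j| \quad \forall A \subseteq V_i, B \subseteq V_j$$

Dividing by $|V|^2$, this is equivalent to

$$\left| \mathbb{E}\left[ (\mathbf{1}_G - \mathbb{E}[\mathbf{1}_G | \mathcal{B}]) \mathbf{1}_{A \times B} \right] \right| \le \frac{\varepsilon |V_i \times V_j|}{2 |V|^2}$$

Since $|V_i \times V_j| \ge |V|^2 / 2k^2$, it suffices to show that $\left| \mathbb{E}\left[ (\mathbf{1}_G - \mathbb{E}[\mathbf{1}_G | \mathcal{B}]) \mathbf{1}_{A \times B} \right] \right| \le \varepsilon / 4k^2$. By our choice of $F$ and inequality (2.5), we have

$$\left| \mathbb{E}\left[ (\mathbf{1}_G - \mathbb{E}[\mathbf{1}_G | \mathcal{B}']) \mathbf{1}_{A \times B} \right] \right| \le \frac{\varepsilon}{8} \left( k_0 + \frac{2^{2M}}{\alpha} \right)^{-2} \le \frac{\varepsilon}{8k^2},$$

so by the triangle inequality it is sufficient to prove that

$$\left| \mathbb{E}\left[ (\mathbb{E}[\mathbf{1}_G | \mathcal{B}'] - \mathbb{E}[\mathbf{1}_G | \mathcal{B}]) \mathbf{1}_{A \times B} \right] \right| \le \frac{\varepsilon}{8k^2}$$

Because $\|\mathbf{1}_{A \times B}\|_{L^2}^2 \le \|\mathbf{1}_{V_i \times V_j}\|_{L^2}^2 \le 1/k^2$, by Cauchy-Schwarz it suffices to show that

$$\mathbb{E}\left[ (\mathbb{E}[\mathbf{1}_G | \mathcal{B}'] - \mathbb{E}[\mathbf{1}_G | \mathcal{B}])^2 \mathbf{1}_{V_i \times V_j} \right] \le \frac{\varepsilon^2}{64k^2}$$

This last inequality must be satisfied by all but at most $\varepsilon k^2$ pairs $(V_i, V_j)$, otherwise we will have

$$\mathbb{E}\left[ (\mathbb{E}[\mathbf{1}_G | \mathcal{B}'] - \mathbb{E}[\mathbf{1}_G | \mathcal{B}])^2 \right] > \varepsilon k^2 \cdot \frac{\varepsilon^2}{64k^2} = \frac{\varepsilon^3}{64} = \alpha^2,$$

contradicting the fact that $\|\mathbb{E}[\mathbf{1}_G | \mathcal{B}'] - \mathbb{E}[\mathbf{1}_G | \mathcal{B}]\|_{L^2} \le \alpha$, and finishing the proof.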
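The regularity condition verified in this proof can be checked directly on very small examples. The sketch below computes, by brute force over all subsets, the smallest ε for which a pair $(U, W)$ satisfies $|e_G(A, B) - d_G(U, W)|A \times B|| \le \varepsilon |U \times W|$ for all $A \subseteq U$, $B \subseteq W$. The function names and the representation of edges as a set of `frozenset` pairs are our own assumptions, and the running time is exponential in $|U| + |W|$.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a tuple (including the empty one)."""
    items = list(items)
    for size in range(len(items) + 1):
        yield from combinations(items, size)


def edge_count(edges, A, B):
    """e_G(A, B): number of pairs (x, y) in A x B such that xy is an edge."""
    return sum(1 for x in A for y in B if frozenset((x, y)) in edges)


def irregularity(edges, U, W):
    """Smallest eps such that the pair (U, W) is eps-regular."""
    density = edge_count(edges, U, W) / (len(U) * len(W))
    worst = max(
        abs(edge_count(edges, A, B) - density * len(A) * len(B))
        for A in subsets(U)
        for B in subsets(W)
    )
    return worst / (len(U) * len(W))


# A perfect matching between {0, 1} and {2, 3}.
matching = {frozenset((0, 2)), frozenset((1, 3))}
print(irregularity(matching, [0, 1], [2, 3]))
```

A complete bipartite pair has irregularity 0 under this definition, since every sub-pair has density exactly 1.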

Repeating the same proof but starting with a non-trivial partition of the vertex set into at most $k_0$ parts, we can make sure the final regular partition of the graph refines a given partition, which is important in some applications of the Regularity Lemma (especially when dealing with partite graphs). Also, by applying this theorem with a somewhat smaller value for ε and then equitably redistributing the vertices of the exceptional set $V_0$ into the other parts, we can easily obtain a (nearly) equitable partition of $V$ without an exceptional class which satisfies the same regularity conditions. We present these simple remarks as a corollary below:

Corollary 2.1. For every $\varepsilon > 0$ and $k_0 \ge 1$, there exists an integer $K_0$ such that the following holds. For every graph $G = (V, E)$ and every equitable partition $\mathcal{P}_0$ of $V$ into at most $k_0$ parts, there exists a partition $\mathcal{P} : V = V_1 \cup V_2 \cup \cdots \cup V_k$ which refines $\mathcal{P}_0$ and satisfies the following properties:

• $k_0 \le k \le K_0$

• $\left| |V_i| - |V_j| \right| \le 1$ for all $i, j \in [k]$

• all but at most $\varepsilon k^2$ of the pairs $(V_i, V_j)$ are ε-regular

Observe that our proof of the Szemerédi Regularity Lemma relied on the decomposition $\mathbf{1}_G = f_{str} + f_{psd} + f_{err}$ given by the Strong Structure Theorem (Theorem 2.2). As noted in the beginning of Chapter 1, the structured component $f_{str} = \mathbb{E}[\mathbf{1}_G | \mathcal{B}]$ gave us our regular partition of the vertex set and the edge densities $d_{i,j}$ between their classes, while the pseudorandom component $f_{psd} = \mathbf{1}_G - \mathbb{E}[\mathbf{1}_G | \mathcal{B}']$ gave us the random-like distribution of the actual edges of $G$ inside pairs of classes in a finer partition given by $\mathcal{B}'$. Here we also have the error term $f_{err}$, which essentially gives the difference in edge densities between pairs of the regular partition we constructed and those of the finer partition given by $\mathcal{B}'$, and it is this term which is responsible for the (possible) existence of up to $\varepsilon k^2$ irregular pairs. It is then a natural question whether the presence of these irregular pairs is truly necessary or may be eliminated.

It turns out that these irregular pairs are necessary: for a given $n \in \mathbb{N}$, let us define the half graph on $2n$ vertices as the bipartite graph $H_n$ with vertex classes $A = \{a_1, \cdots, a_n\}$ and $B = \{b_1, \cdots, b_n\}$, in which $a_i b_j$ is an edge if and only if $i \le j$. As was noted in [3], the half graphs give us an infinite family of graphs for which every ε-regular partition of their vertex sets into $k$ parts must contain many irregular pairs (at least $ck$, for some $c > 0$ depending only on ε). As the presence of irregular pairs comes from the error component $f_{err}$, this justifies our claim made at the beginning of this chapter that the existence of the error component is necessary if we wish to prove the Regularity Lemma.

Remark 2.5. As already noted above, because the function $F$ chosen in the proof of the Regularity Lemma has exponential growth, the upper bound $K_0$ we obtain for the number of parts in the regular partition is an exponential tower of height $\Theta(\varepsilon^{-2})$. By an ingenious probabilistic construction, Gowers [15] was able to prove a lower bound on $K_0$ which was a tower of exponents of height $\Theta(\varepsilon^{-1/48})$. More recently, Fox and Miklós Lovász [10] were able to improve this bound and show that the upper bound given by our proof is in fact tight, as there exist graphs where any ε-regular partition must have a number of parts at least as big as a tower of exponents of height $\Theta(\varepsilon^{-2})$.


Chapter 3

Counting subgraphs and the Graph Removal Lemma

The regularity lemmas we have seen may be viewed as approximation results stating that every graph G can be approximated by some structure of bounded complexity. This structure is essentially a “rounded” version of G, and may be identified with a weighted graph on the same vertex set as G whose weights are given by the edge densities between the classes of the regular partition. With this interpretation in mind, we will use the following notation:

Definition 3.1 (Rounded graph). If $G = (V, E)$ is a graph and $\mathcal{P} : V = V_1 \cup \cdots \cup V_k$ is a partition of $V$, we denote by $G_{\mathcal{P}} := \mathbb{E}[\mathbf{1}_G | \mathcal{P} \otimes \mathcal{P}]$ the function which maps each pair $(x, y) \in V^2$ to the edge density $d_G(V_i, V_j)$ between the classes $V_i \ni x$ and $V_j \ni y$.

The notion of approximation given by the regularity lemmas is related to indistinguishability by cuts. More precisely, following the discussion in Section 1.1, we say that $G$ and $G_{\mathcal{P}}$ are ε-indistinguishable by cuts if $|\mathbb{E}_{x, y \in V}[(\mathbf{1}_G(x, y) - G_{\mathcal{P}}(x, y)) \mathbf{1}_{A \times B}(x, y)]| \le \varepsilon$ holds for all cuts $A \times B \subseteq V \times V$. This suggests working with the following norm, which will greatly simplify the presentation and the proofs of some results to follow:

Definition 3.2 (Cut norm). Let $V_1, V_2$ be (not necessarily distinct) finite sets. Given any function $f : V_1 \times V_2 \to \mathbb{R}$, we define the cut norm of $f$ as

$$\|f\|_{\square} := \max_{A \subseteq V_1, B \subseteq V_2} \left| \mathbb{E}_{x \in V_1, y \in V_2}\left[ f(x, y) \mathbf{1}_{A \times B}(x, y) \right] \right| \tag{3.1}$$

It is easy to see that the cut norm is indeed a norm. Using this notation, our definition of ε-regularity of a graph $G$ is equivalent to the inequality $\|\mathbf{1}_G - \mathbb{E}[\mathbf{1}_G]\|_{\square} \le \varepsilon$. Moreover, a partition $\mathcal{P}$ of $V(G)$ is weak ε-regular for $G$ if and only if $\|\mathbf{1}_G - G_{\mathcal{P}}\|_{\square} \le \varepsilon$.

One can easily obtain the following equivalent expression for the cut norm, which will prove to be useful below:

$$\|f\|_{\square} = \max_{\substack{a : V_1 \to [0, 1] \\ b : V_2 \to [0, 1]}} \left| \mathbb{E}_{x \in V_1, y \in V_2}\left[ f(x, y) a(x) b(y) \right] \right| \tag{3.2}$$

Indeed, since the expectation above is bilinear in $a$ and $b$, the extrema occur when $a$ and $b$ are $\{0, 1\}$-valued and so (3.1) and (3.2) are equivalent.

The counting lemmas proven in the next sections are standard in Graph Theory, and concern approximately counting copies of a fixed graph $H$ inside a large graph $G$ using only the information given by the rounded graph $G_{\mathcal{P}}$ (we define a copy of $H$ in $G$ as being a subgraph of $G$ which is isomorphic to $H$).
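For small matrices the cut norm of Definition 3.2 can be evaluated by exhaustive search, which also lets us observe the equivalence of (3.1) and (3.2) numerically. The following is only a sketch: the names `bilinear_form` and `cut_norm` are ours, a function $f$ is stored as a list of rows, and the search takes time exponential in $|V_1| + |V_2|$.

```python
from itertools import product


def bilinear_form(f, a, b):
    """|E_{x,y}[f(x, y) a(x) b(y)]| for weights a, b with entries in [0, 1]."""
    n1, n2 = len(f), len(f[0])
    total = sum(f[x][y] * a[x] * b[y] for x in range(n1) for y in range(n2))
    return abs(total) / (n1 * n2)


def cut_norm(f):
    """Cut norm as in (3.1): maximize over 0/1 indicator vectors of A and B."""
    n1, n2 = len(f), len(f[0])
    return max(
        bilinear_form(f, a, b)
        for a in product((0, 1), repeat=n1)
        for b in product((0, 1), repeat=n2)
    )


f = [[1, -1], [-1, 1]]
print(cut_norm(f))

# Fractional weights, as allowed in (3.2), never beat the 0/1 maximum.
grid = (0.0, 0.5, 1.0)
fractional_max = max(
    bilinear_form(f, a, b)
    for a in product(grid, repeat=2)
    for b in product(grid, repeat=2)
)
print(fractional_max)
```

For this particular $f$ the form equals $|(a_0 - a_1)(b_0 - b_1)|/4$, so both maxima are attained at indicator vectors, in line with the bilinearity argument above.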

3.1 Counting subgraphs globally

Instead of counting copies of a subgraph $H$ in $G$, it will be more convenient (and essentially equivalent when $G$ is large) to count homomorphisms from $H$ to $G$.

Definition 3.3 (Homomorphism). Given two graphs $G$ and $H$, a homomorphism from $H$ to $G$ is a map $\varphi : V(H) \to V(G)$ which preserves adjacency between vertices:

$$\forall x, y \in V(H), \quad \{x, y\} \in E(H) \Rightarrow \{\varphi(x), \varphi(y)\} \in E(G)$$

We denote the number of homomorphisms from H to G by hom(H,G).
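For small graphs, $\hom(H, G)$ can be computed directly from the definition by enumerating all maps $V(H) \to V(G)$. The sketch below does exactly that; the function name and the edge-list representation are our own choices, and $G$ is assumed to be a simple graph without loops.

```python
from itertools import product


def count_homomorphisms(h_vertices, h_edges, g_vertices, g_edges):
    """Count maps V(H) -> V(G) sending every edge of H to an edge of G."""
    g_adj = {frozenset(e) for e in g_edges}
    count = 0
    for image in product(g_vertices, repeat=len(h_vertices)):
        phi = dict(zip(h_vertices, image))
        # A pair with phi[x] == phi[y] gives a one-element frozenset,
        # which is never an edge of a loopless graph.
        if all(frozenset((phi[x], phi[y])) in g_adj for x, y in h_edges):
            count += 1
    return count


triangle = [(0, 1), (1, 2), (0, 2)]
path = [(0, 1), (1, 2)]
# Homomorphisms from the path on three vertices into a triangle.
print(count_homomorphisms([0, 1, 2], path, [0, 1, 2], triangle))
```

In this example the count includes non-injective maps (those sending both endpoints of the path to the same vertex), which is precisely the discrepancy between homomorphisms and labeled copies.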

Denoting by $n$ the number of vertices in $G$ and by $h$ the number of vertices in $H$, we note that $\hom(H, G)$ differs from the number of labeled copies of $H$ in $G$ by at most $n^{h-1}$, which becomes negligible when compared to $n^h$ as $n$ grows large; this justifies our claim that counting copies of $H$ in a large graph $G$ is essentially equivalent to counting homomorphisms from $H$ to $G$.

We then have the following lemma, which roughly says we can count copies of $H$ inside $G$ up to an $o(n^h)$ additive error by only knowing a weak $o(1)$-regular partition for $G$:

Lemma 3.1 (Global counting lemma). Let $H$ be a graph with vertex set $V(H) = [h]$ and $\varepsilon > 0$ be a positive number. Then for any graph $G = (V, E)$ with $|V| = n$ and any partition $\mathcal{P} = (V_i)_{i \in [k]}$ of $V$ which is weak ε-regular for $G$, we have

$$\hom(H, G) = \sum_{\phi : [h] \to [k]} \prod_{ij \in E(H)} d_G(V_{\phi(i)}, V_{\phi(j)}) \prod_{i \in [h]} |V_{\phi(i)}| \pm \varepsilon |E(H)| n^h,$$

where the sum is over all functions from $[h]$ to $[k]$.

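Proof. By definition, for any $x_1 \in V_{\phi(1)}, x_2 \in V_{\phi(2)}, \cdots, x_h \in V_{\phi(h)}$ and any $i, j \in [h]$, we have $G_{\mathcal{P}}(x_i, x_j) = d_G(V_{\phi(i)}, V_{\phi(j)})$. It follows that

$$\prod_{ij \in E(H)} d_G(V_{\phi(i)}, V_{\phi(j)}) \prod_{i \in [h]} \mathbf{1}_{x_i \in V_{\phi(i)}} = \prod_{ij \in E(H)} G_{\mathcal{P}}(x_i, x_j) \prod_{i \in [h]} \mathbf{1}_{x_i \in V_{\phi(i)}} \tag{3.3}$$

We note also that

$$\prod_{i \in [h]} |V_{\phi(i)}| = \sum_{x_1, \cdots, x_h \in V} \prod_{i \in [h]} \mathbf{1}_{x_i \in V_{\phi(i)}} \tag{3.4}$$

Using equations (3.3) and (3.4), we obtain

$$\begin{aligned}
\sum_{\phi : [h] \to [k]} \prod_{ij \in E(H)} d_G(V_{\phi(i)}, V_{\phi(j)}) \prod_{i \in [h]} |V_{\phi(i)}|
&= \sum_{\phi : [h] \to [k]} \prod_{ij \in E(H)} d_G(V_{\phi(i)}, V_{\phi(j)}) \left( \sum_{x_1, \cdots, x_h \in V} \prod_{i \in [h]} \mathbf{1}_{x_i \in V_{\phi(i)}} \right) \\
&= \sum_{\phi : [h] \to [k]} \sum_{x_1, \cdots, x_h \in V} \prod_{ij \in E(H)} G_{\mathcal{P}}(x_i, x_j) \prod_{i \in [h]} \mathbf{1}_{x_i \in V_{\phi(i)}} \\
&= \sum_{x_1, \cdots, x_h \in V} \left( \sum_{\phi : [h] \to [k]} \prod_{i \in [h]} \mathbf{1}_{x_i \in V_{\phi(i)}} \right) \prod_{ij \in E(H)} G_{\mathcal{P}}(x_i, x_j) \\
&= \sum_{x_1, \cdots, x_h \in V} \prod_{ij \in E(H)} G_{\mathcal{P}}(x_i, x_j)
\end{aligned}$$

It then suffices to prove the inequality

$$\left| \mathbb{E}_{x_1, \cdots, x_h \in V}\left[ \prod_{ij \in E(H)} \mathbf{1}_G(x_i, x_j) - \prod_{ij \in E(H)} G_{\mathcal{P}}(x_i, x_j) \right] \right| \le \varepsilon |E(H)|$$

We will prove this by a simple telescoping sum argument, which was given in [5]. Let us arbitrarily label the edges of $H$ by $\{e_1, e_2, \cdots, e_{|E(H)|}\}$ and write $x_e := (x_i, x_j)$ for an edge $e = \{i, j\}$; we then have the identity

$$\prod_{ij \in E(H)} \mathbf{1}_G(x_i, x_j) - \prod_{ij \in E(H)} G_{\mathcal{P}}(x_i, x_j) = \sum_{t=1}^{|E(H)|} \left( \prod_{r=1}^{t-1} G_{\mathcal{P}}(x_{e_r}) \right) \left( \mathbf{1}_G(x_{e_t}) - G_{\mathcal{P}}(x_{e_t}) \right) \left( \prod_{s=t+1}^{|E(H)|} \mathbf{1}_G(x_{e_s}) \right)$$

Suppose for notational convenience that $e_t = \{1, 2\}$; then for any fixed $x_3, \cdots, x_h \in V$ we have

$$\left| \mathbb{E}_{x_1, x_2 \in V}\left[ \left( \prod_{r=1}^{t-1} G_{\mathcal{P}}(x_{e_r}) \right) \left( \mathbf{1}_G(x_1, x_2) - G_{\mathcal{P}}(x_1, x_2) \right) \left( \prod_{s=t+1}^{|E(H)|} \mathbf{1}_G(x_{e_s}) \right) \right] \right| \le \left| \mathbb{E}_{x_1, x_2 \in V}\left[ \left( \mathbf{1}_G(x_1, x_2) - G_{\mathcal{P}}(x_1, x_2) \right) a_t(x_1) b_t(x_2) \right] \right|,$$

where $a_t$ and $b_t$ are the functions given by

$$a_t(x_1) := \prod_{\substack{r < t \\ 1 \in e_r}} G_{\mathcal{P}}(x_{e_r}) \prod_{\substack{s > t \\ 1 \in e_s}} \mathbf{1}_G(x_{e_s}) \quad \text{and} \quad b_t(x_2) := \prod_{\substack{r < t \\ 2 \in e_r}} G_{\mathcal{P}}(x_{e_r}) \prod_{\substack{s > t \\ 2 \in e_s}} \mathbf{1}_G(x_{e_s}).$$

By hypothesis we have that $\|\mathbf{1}_G - G_{\mathcal{P}}\|_{\square} \le \varepsilon$, so by equation (3.2) the expression on the right is at most ε for all fixed $x_3, \cdots, x_h$. Applying this same reasoning for all edges $\{i, j\} \in E(H)$ and using the triangle inequality, we obtain

$$\begin{aligned}
\left| \mathbb{E}_{x_1, \cdots, x_h \in V}\left[ \prod_{ij \in E(H)} \mathbf{1}_G(x_i, x_j) - \prod_{ij \in E(H)} G_{\mathcal{P}}(x_i, x_j) \right] \right|
&\le \sum_{t=1}^{|E(H)|} \left| \mathbb{E}_{x_1, \cdots, x_h \in V}\left[ \left( \prod_{r=1}^{t-1} G_{\mathcal{P}}(x_{e_r}) \right) \left( \mathbf{1}_G(x_{e_t}) - G_{\mathcal{P}}(x_{e_t}) \right) \left( \prod_{s=t+1}^{|E(H)|} \mathbf{1}_G(x_{e_s}) \right) \right] \right| \\
&\le \varepsilon |E(H)|
\end{aligned}$$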
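As a sanity check, the statement of Lemma 3.1 can be verified numerically on a small graph. The sketch below computes $\hom(H, G)$, the main term of the lemma, and the exact value $\varepsilon = \|\mathbf{1}_G - G_{\mathcal{P}}\|_{\square}$ by brute force. Everything here is our own illustration: vertices are labeled $0, \ldots, n-1$ (and those of $H$ by $0, \ldots, h-1$), the function name is hypothetical, and the running time is exponential in $n$.

```python
from itertools import product


def global_counting_check(n, g_edges, parts, h, h_edges):
    """Return (hom, main_term, eps) for the Global Counting Lemma."""
    adj = [[0] * n for _ in range(n)]
    for x, y in g_edges:
        adj[x][y] = adj[y][x] = 1
    part_of = {v: i for i, part in enumerate(parts) for v in part}
    k = len(parts)
    # d[i][j] = e_G(V_i, V_j) / (|V_i| |V_j|), counting ordered pairs.
    d = [[sum(adj[x][y] for x in parts[i] for y in parts[j])
          / (len(parts[i]) * len(parts[j]))
          for j in range(k)] for i in range(k)]
    # hom(H, G): maps [h] -> V sending every edge of H to an edge of G.
    hom = sum(all(adj[xs[i]][xs[j]] for i, j in h_edges)
              for xs in product(range(n), repeat=h))
    # Main term: sum over phi of (product of densities) * (product of sizes).
    main = 0.0
    for phi in product(range(k), repeat=h):
        term = 1.0
        for i, j in h_edges:
            term *= d[phi[i]][phi[j]]
        for i in range(h):
            term *= len(parts[phi[i]])
        main += term
    # eps = cut norm of 1_G - G_P. For a fixed A, the best B collects
    # either all columns with positive sum or all with negative sum.
    diff = [[adj[x][y] - d[part_of[x]][part_of[y]] for y in range(n)]
            for x in range(n)]
    eps = 0.0
    for a in product((0, 1), repeat=n):
        rows = [x for x in range(n) if a[x]]
        col_sums = [sum(diff[x][y] for x in rows) for y in range(n)]
        pos = sum(c for c in col_sums if c > 0)
        neg = -sum(c for c in col_sums if c < 0)
        eps = max(eps, pos / n ** 2, neg / n ** 2)
    return hom, main, eps


# Triangles in the 6-cycle, with the partition {0,1,2}, {3,4,5}.
cycle = [(i, (i + 1) % 6) for i in range(6)]
triangle = [(0, 1), (1, 2), (0, 2)]
hom, main, eps = global_counting_check(6, cycle, [[0, 1, 2], [3, 4, 5]], 3, triangle)
print(hom, main, eps * len(triangle) * 6 ** 3)
```

The lemma guarantees that the first two printed numbers differ by at most the third; on such a tiny graph the bound is far from tight, since the error term $\varepsilon |E(H)| n^h$ is only meaningful when ε is small.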

Using this lemma and a new efficient algorithm for finding a weak regular partition of a graph, Fox, Miklós Lovász and Zhao [11] recently obtained a deterministic algorithm running in time $O(\varepsilon^{-O_H(1)} n^2)$ which, for any given graph $G$ on $n$ vertices, finds the number of copies of $H$ in $G$ up to an error of at most $\varepsilon n^h$. Note that the bounds obtained by the Weak Regularity Lemma (Theorem 2.1) are exponential in $1/\varepsilon$, while the running time of the algorithm mentioned is polynomial in $1/\varepsilon$ (for any fixed graph $H$). This is possible because the weak ε-regular partition obtained in the proof of Theorem 2.1 is actually generated by only $2/\varepsilon^2$ sets, and it is possible to use these generating sets in the algorithm for computing copies of $H$ instead of the partition they induce on $V$.

3.2 Counting subgraphs locally and the Removal Lemma

The Global Counting Lemma proven in the last section allows us to count the total number of copies of a given graph $H$ inside a large graph $G$ up to an additive error which is small when compared to $|V(G)|^{|V(H)|}$, the number of maps from $V(H)$ to $V(G)$. However, due to the global nature of this counting, this lemma is unsuitable for the purpose of counting copies of $H$ inside a small subset of $V(G)$, which will be needed in the proof of the Graph Removal Lemma. For this we need a somewhat stronger counting lemma, which counts the copies of $H$ locally instead of globally. The next definition makes this idea of "local counting" more precise.

Definition 3.4 (Canonical homomorphisms). Let $H$ be a graph on $[h]$ and $V_1, \cdots, V_h$ be (not necessarily distinct) vertex sets. If $G$ is a graph on $V_1 \cup \cdots \cup V_h$, then a homomorphism $\varphi$ from $H$ to $G$ is said to be canonical if it maps each $i \in [h]$ to a vertex in the corresponding set $V_i$. We then denote by $\hom^*(H, G)$ the number of canonical homomorphisms from $H$ to $G$.

The corresponding counting lemma for canonical homomorphisms is the following:

Lemma 3.2 (Local Counting Lemma). Let $H$ be a graph with vertex set $V(H) = [h]$, and let $V_1, \cdots, V_h$ be (not necessarily distinct) vertex sets. If $G$ is a graph on $V_1 \cup \cdots \cup V_h$ for which the pair $(V_i, V_j)$ is ε-regular whenever $ij \in E(H)$, then the number $\hom^*(H, G)$ of canonical homomorphisms of $H$ in $G$ satisfies

$$\left| \hom^*(H, G) - \prod_{ij \in E(H)} d_G(V_i, V_j) \prod_{i \in [h]} |V_i| \right| \le \varepsilon |E(H)| \prod_{i \in [h]} |V_i|$$

The proof of Lemma 3.2 is very simple and proceeds by the same telescoping sum argument as that used in the proof of Lemma 3.1, but now using the inequality

$$\left| \mathbb{E}_{x_i \in V_i, x_j \in V_j}\left[ \left( \mathbf{1}_G(x_i, x_j) - d_G(V_i, V_j) \right) a(x_i) b(x_j) \right] \right| \le \varepsilon,$$

valid for all $\{i, j\} \in E(H)$ and all functions $a : V_i \to [0, 1]$, $b : V_j \to [0, 1]$.

Note the asymmetry in the conditions required for each of the lemmas given above: while in the Global Counting Lemma we only need to know a weak ε-regular partition for the graph $G$, in the Local Counting Lemma we require all pairs $(V_i, V_j)$ for which $ij \in E(H)$ to be ε-regular, which is a much stronger condition. The reason for this asymmetry is that the Global Counting Lemma only estimates the average of $\hom^*_{\phi}(H, G)$ over all mappings $\phi : [h] \to [k]$ (where $\hom^*_{\phi}(H, G)$ denotes the number of homomorphisms where each $i \in [h]$ is mapped to a vertex $x_i \in V_{\phi(i)}$), while the Local Counting Lemma is equivalent to estimating a single one of these values.

Using Lemma 3.2, we may now state and prove the Graph Removal Lemma:

Theorem 3.1 (Graph Removal Lemma [9]). For every graph $H$ and every $\varepsilon > 0$, there exist $n_0 \in \mathbb{N}$ and $\delta > 0$ such that the following holds. Any graph $G$ on $n \ge n_0$ vertices which contains at most $\delta n^{|V(H)|}$ copies of $H$ may be made $H$-free by removing at most $\varepsilon n^2$ edges.

Before proving Theorem 3.1, it is important to make an observation. This theorem may be informally thought of as saying that, if a given large graph $G$ contains "few" copies of a fixed graph $H$, then we can destroy all these copies of $H$ by removing "few" edges from $G$. However, while this is indeed a valid interpretation of the result, it hides the crucial fact that the two occurrences of the word "few" are actually very different in nature. Indeed, if $G$ has $n$ vertices, then "few" copies of $H$ mean a small constant times $n^{|V(H)|}$, while "few" edges mean a small constant times $n^2$. It is not at all clear why, say, $2n^{7/2}$ copies of the complete graph $K_4$ on four vertices cannot be fitted inside $G$ in such a way as to obtain at least $n^2/1000$ edge-disjoint copies of $K_4$.

Proof. Let $h$ be the number of vertices of $H$, and define $\gamma := \min\left\{ \frac{\varepsilon}{4}, \frac{\varepsilon^{|E(H)|}}{2|E(H)|} \right\}$. We will prove the theorem for $\delta := \varepsilon^{|E(H)|} (4h!)^{-1} (2K(\gamma))^{-h}$ and $n_0 := \delta^{-1}$, where $K(\gamma)$ is the bound given by the Szemerédi Regularity Lemma (Theorem 2.3) for error parameter γ and lower bound $m = 1$.

Suppose that $G$ has at most $\delta n^h$ copies of $H$. Applying the Szemerédi Regularity Lemma to $G$ with error parameter γ and lower bound $m = 1$, we obtain a γ-regular partition $V = V_0 \cup V_1 \cup \cdots \cup V_k$ into $k + 1 \le K(\gamma) + 1$ parts. We now construct a subgraph $G'$ of $G$ by deleting the following edges:

– Edges incident to a vertex in the exceptional set $V_0$;

– Edges between pairs $(V_i, V_j)$ which are not γ-regular;

– Edges between pairs $(V_i, V_j)$ with edge-density at most ε.

The number of edges deleted is then at most

$$\gamma n \cdot n + \gamma k^2 \cdot \left( \frac{n}{k} \right)^2 + \varepsilon \binom{n}{2} \le \varepsilon n^2$$

If $G'$ does not contain a copy of $H$ we are done; suppose for contradiction that $G'$ contains a copy of $H$, and fix such a copy. For every $i \in [h]$, let $\phi(i) \in [k]$ be the index of the set $V_{\phi(i)}$ which contains the $i$-th vertex of this copy of $H$ in $G'$; applying the local counting lemma (Lemma 3.2) on the graph $G'$ and vertex sets $V_{\phi(1)}, \cdots, V_{\phi(h)}$, we obtain the existence of at least

$$\begin{aligned}
\left( \prod_{ij \in E(H)} d_{G'}(V_{\phi(i)}, V_{\phi(j)}) - \gamma |E(H)| \right) \prod_{i \in [h]} |V_{\phi(i)}|
&\ge \left( \varepsilon^{|E(H)|} - \gamma |E(H)| \right) \left( \frac{(1 - \gamma) n}{k} \right)^h \\
&\ge \frac{\varepsilon^{|E(H)|}}{2} \left( \frac{n}{2K(\gamma)} \right)^h \\
&= 2 h! \delta n^h
\end{aligned}$$

homomorphisms of $H$ in $G'$, and hence at least $2 h! \delta n^h - n^{h-1} > h! \delta n^h$ labeled copies of $H$ in $G'$. This contradicts the fact that $G$ has at most $\delta n^h$ copies of $H$, completing the proof.

3.3 Application to property testing

As a simple application of Theorem 3.1 we will use it to give an interesting result in the area of property testing, namely that the property of being $H$-free is testable for any graph $H$. Let us first briefly recall some definitions given in Section 1.3.

Given a graph property $\mathcal{P}$ and a constant $\varepsilon > 0$, we say that a graph $G$ on $n$ vertices is ε-far from satisfying $\mathcal{P}$ if one needs to add and/or delete at least $\varepsilon n^2$ edges to $G$ in order to turn it into a graph that satisfies $\mathcal{P}$. An ε-test for $\mathcal{P}$ is a randomized algorithm which, by making only $O_{\varepsilon}(1)$ edge queries, can distinguish with probability at least 2/3 between graphs satisfying $\mathcal{P}$ and graphs that are ε-far from satisfying $\mathcal{P}$. Finally, a graph property $\mathcal{P}$ is testable if, for any given $\varepsilon > 0$, there exists an ε-test for $\mathcal{P}$. For a given graph $H$, we then say that a graph $G$ satisfies the property of being $H$-free if it has no subgraph which is isomorphic to $H$.

Let us now use Theorem 3.1 to show that the property of being $H$-free is testable for any fixed graph $H$. Let $\delta = \delta(\varepsilon)$ be the constant obtained in Theorem 3.1 applied to a given graph $H$ and quantity $\varepsilon > 0$. Our ε-test for $H$-freeness then proceeds as follows: take $k := \lceil 2/\delta \rceil$ sets of $h := |V(H)|$ vertices each from $G$, chosen uniformly and independently at random, and declare $G$ to be $H$-free if and only if none of them contains a copy of $H$.

If $G$ is indeed $H$-free, it is clear that we will not obtain a copy of $H$ this way, so the algorithm will give the correct answer with probability 1. If $G$ is ε-far from being $H$-free, by Theorem 3.1 there exist at least $\delta n^h$ copies of $H$ in $G$. Since each set of $h$ vertices in $G$ can contain at most $h!$ copies of $H$, it follows that the probability of finding a copy of $H$ inside a uniformly chosen set of $h$ vertices in $G$ is at least

$$\frac{\delta n^h / h!}{\binom{n}{h}} \ge \delta$$

The probability that the algorithm errs in this case (that is, fails to find a copy of $H$) is then at most $(1 - \delta)^k \le e^{-\delta k} \le e^{-2} < 1/3$, proving this algorithm is indeed an ε-test.
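The test just described is short enough to write down explicitly. In the Python sketch below, the graph $G$ is given as a dictionary mapping each vertex to its set of neighbors, $H$ has vertex set $0, \ldots, h-1$, and δ is passed in as a parameter: the value of δ guaranteed by Theorem 3.1 is far too small to be used in practice, so any concrete value supplied here is an assumption of the caller. The function names are ours.

```python
import math
import random
from itertools import permutations


def contains_copy(h, h_edges, adj, vertices):
    """True if some injective map [h] -> vertices sends edges of H to edges of G."""
    return any(
        all(image[j] in adj[image[i]] for i, j in h_edges)
        for image in permutations(vertices, h)
    )


def looks_h_free(h, h_edges, adj, delta, rng):
    """One-sided tester: declare G to be H-free iff no sampled h-set contains H."""
    vertices = list(adj)
    k = math.ceil(2 / delta)  # number of independent samples
    for _ in range(k):
        sample = rng.sample(vertices, h)  # uniform set of h distinct vertices
        if contains_copy(h, h_edges, adj, sample):
            return False
    return True


triangle = [(0, 1), (1, 2), (0, 2)]
# Complete bipartite graph K_{3,3}, which is triangle-free.
k33 = {v: {u for u in range(6) if (u < 3) != (v < 3)} for v in range(6)}
print(looks_h_free(3, triangle, k33, 0.5, random.Random(0)))
```

The tester only ever inspects the $\binom{h}{2}$ potential edges inside each sample, so the number of edge queries depends on $h$ and δ but not on $n$. It never rejects an $H$-free graph, which matches the one-sided error observed above.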
Using a stronger variant of the Regularity Lemma which is very close in spirit to our Strong Structure Theorem (when specialized to the graph setting), Alon, Fischer, Krivelevich and M. Szegedy [2] were able to prove the natural generalization of the Graph Removal Lemma for induced subgraphs (where we are now allowed to add and/or remove up to $\varepsilon n^2$ edges in order to destroy all induced copies of $H$ in $G$), and thus that being induced $H$-free is also testable. They in fact obtained a much stronger result in property testing, as we will briefly discuss in the introductory part of the next chapter.


Chapter 4

Extensions of graph regularity

There have been various extensions of Szemerédi's Regularity Lemma for graphs (see [27] for some of them), which involve many of the same principles and ideas as the original Regularity Lemma but are each tailored for a specific application. We have already seen Frieze and Kannan's Weak Regularity Lemma (Theorem 2.1), which has a weaker notion of regularity but provides much better bounds for the size of the regular partitions, and may thus be used for algorithmic purposes (see, for instance, [12] and [11]).

In the opposite direction, there is the Strong Regularity Lemma of Alon, Fischer, Krivelevich and M. Szegedy [2] mentioned at the end of the last chapter, which provides partitions with stronger regularity properties at the cost of worsening still the upper bound we get for the order of these partitions. Roughly speaking, given a constant $\nu > 0$ and any decreasing function $\epsilon : \mathbb{N} \to (0, 1]$, this lemma provides two equitable partitions $\mathcal{P} \subset \mathcal{Q}$ of the graph such that $\mathcal{P}$ is ν-regular, $\mathcal{Q}$ is $\epsilon(|\mathcal{P}|)$-regular, and $\mathcal{P}$ is ν-close to $\mathcal{Q}$ in some sense which is related to the edge densities between the classes inside each partition.

This lemma was originally obtained in order to prove a far-reaching generalization of the Induced Graph Removal Lemma for a family of colored graphs, and with this result (and some additional ideas) prove that every first-order graph property not containing a quantifier alternation of the form ∀∃ is testable (in the sense described in Section 1.3). It isn't hard to prove the Strong Regularity Lemma by iterating our Strong Structure Theorem or the Szemerédi Regularity Lemma (we will essentially do it in our proof of Theorem 4.1), but its statement is somewhat technical and we refrain from giving it in order not to make our presentation too repetitive. We only remark that, even in the tame case when the function $\epsilon(k)$ is a polynomial in $1/k$, the bound we get for the size of the partitions $\mathcal{P}$ and $\mathcal{Q}$ is a wowzer-type function (one level higher in the Ackermann hierarchy than the exponential tower function) in a power of $1/\nu$. Moreover, this cannot be improved (see [6]).

In this chapter we will focus on two variants of the Regularity Lemma which have a somewhat distinct flavor.

4.1 Regular approximation

As noted in Remark 2.5, the number of parts in an ε-regular partition of an arbitrary graph cannot be guaranteed to be smaller than a tower of exponents of height $\Theta(\varepsilon^{-2})$; even a weak ε-regular partition can only be guaranteed to have size at most $2^{\Theta(\varepsilon^{-2})}$ [6], which is still a reasonably large function of ε. Despite these results, the next theorem allows us to have an arbitrarily good control on the size of the regular partition in terms of its regularity parameter, as long as we are allowed to modify a small fraction of the edges of the graph. It was first obtained by Rödl and Schacht [26] in a more general form (stated for uniform hypergraphs), and is a byproduct of the hypergraph generalization of the Regularity Lemma, which we will see in the next chapter.

Theorem 4.1 (Regular Approximation Lemma [26]). For every $\nu > 0$ and every function $\epsilon : \mathbb{N} \to (0, 1]$ there exist integers $n_0$ and $K_0$ such that the following holds. For every graph $G = (V, E)$ on $|V| = n \ge n_0$ vertices, there exists an equitable partition $V = V_1 \cup V_2 \cup \cdots \cup V_k$ into $k \le K_0$ parts and a graph $H = (V, E')$ on the same vertex set as $G$ such that:

• $|E \triangle E'| \le \nu n^2$

• On the graph $H$, every pair $(V_i, V_j)$ is $\epsilon(k)$-regular

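Proof. We will apply an energy-increment argument similar to that used in the proof of Theorem 2.2, the main difference being that we now iterate the Szemerédi Regularity Lemma (in the form in which it appears in Corollary 2.1) instead of Lemma 2.3.

Start with the trivial partition $\mathcal{P}_0 := \{V\}$ and $k_0 := |\mathcal{P}_0| = 1$. For each $i \ge 0$, having $\mathcal{P}_i$ and $k_i$ already known, apply Corollary 2.1 to the graph $G$ with initial partition $\mathcal{P}_i$ and error parameter $\epsilon(k_i)/4$. We then obtain an equitable partition $\mathcal{P}_{i+1}$ of size $k_{i+1} = O_{k_i, \epsilon(k_i)}(1)$ which refines $\mathcal{P}_i$ and such that all but at most $\epsilon(k_i) k_{i+1}^2 / 4$ pairs of classes in $\mathcal{P}_{i+1}$ are $\epsilon(k_i)/4$-regular for $G$.

Since each $\mathcal{P}_{i+1}$ refines $\mathcal{P}_i$, if we denote by $\mathcal{P}_i \otimes \mathcal{P}_i$ the product σ-algebra on $V \times V$ generated by the products of classes in $\mathcal{P}_i$, we see by Pythagoras' theorem that $\|\mathbb{E}[\mathbf{1}_G | \mathcal{P}_i \otimes \mathcal{P}_i]\|_{L^2}^2$ is increasing and bounded between 0 and 1. By the pigeonhole principle, there must exist a value of $j \le 9/\nu^2$ such that

$$\|\mathbb{E}[\mathbf{1}_G | \mathcal{P}_{j+1} \otimes \mathcal{P}_{j+1}]\|_{L^2}^2 \le \|\mathbb{E}[\mathbf{1}_G | \mathcal{P}_j \otimes \mathcal{P}_j]\|_{L^2}^2 + \frac{\nu^2}{9} \tag{4.1}$$

For such a value of $j$, let us denote $\mathcal{P} := \mathcal{P}_j$ and $\mathcal{Q} := \mathcal{P}_{j+1}$, and note that $k := |\mathcal{P}_j|$ is bounded by a function depending only on $\epsilon(\cdot)$ and ν. By Pythagoras' theorem and equation (4.1), we conclude that

$$\|\mathbb{E}[\mathbf{1}_G | \mathcal{P} \otimes \mathcal{P}] - \mathbb{E}[\mathbf{1}_G | \mathcal{Q} \otimes \mathcal{Q}]\|_{L^2} \le \frac{\nu}{3}$$

We relabel the classes of the partitions $\mathcal{P}$ and $\mathcal{Q}$ in such a way that

$$\mathcal{P} = (V_i)_{i \in [k]}, \quad \mathcal{Q} = (V_{i,r})_{i \in [k], r \in [m]}, \quad \text{and} \quad V_i = \bigcup_{r \in [m]} V_{i,r} \text{ for all } i \in [k].$$

Since all refining partitions $\mathcal{P}_i$ in our argument are required to be equitable and they have bounded order, this can always be done if $|V| = n$ is sufficiently large.

Now, for every pair of classes $(V_{i,r}, V_{j,s})$ with $i, j \in [k]$, $r, s \in [m]$, we add or delete edges randomly to change the (expected) density of this pair to $d_G(V_i, V_j)$. Note that the expected number of changed edges in each pair $(V_{i,r}, V_{j,s})$ is

$$|d_G(V_i, V_j) - d_G(V_{i,r}, V_{j,s})| \cdot |V_{i,r}| |V_{j,s}| \tag{4.2}$$

Because for every $(x, y) \in V_{i,r} \times V_{j,s}$ we have that $\mathbb{E}[\mathbf{1}_G | \mathcal{Q} \otimes \mathcal{Q}](x, y) = d_G(V_{i,r}, V_{j,s})$ and $\mathbb{E}[\mathbf{1}_G | \mathcal{P} \otimes \mathcal{P}](x, y) = d_G(V_i, V_j)$, it follows from (4.2) that the expected total number of edges changed in $G$ is

$$\begin{aligned}
\sum_{i, j \in [k]} \sum_{r, s \in [m]} |d_G(V_i, V_j) - d_G(V_{i,r}, V_{j,s})| \cdot |V_{i,r}| |V_{j,s}|
&= n^2 \, \mathbb{E}_{x, y \in V}\left[ \left| \mathbb{E}[\mathbf{1}_G | \mathcal{P} \otimes \mathcal{P}](x, y) - \mathbb{E}[\mathbf{1}_G | \mathcal{Q} \otimes \mathcal{Q}](x, y) \right| \right] \\
&= n^2 \left\| \mathbb{E}[\mathbf{1}_G | \mathcal{P} \otimes \mathcal{P}] - \mathbb{E}[\mathbf{1}_G | \mathcal{Q} \otimes \mathcal{Q}] \right\|_{L^1} \\
&\le \frac{\nu n^2}{3}
\end{aligned}$$

By concentration inequalities, this value will be less than $\nu n^2 / 2$ with high probability. Moreover, it follows from the Chernoff bound (see the proof of Lemma 7.2 in Chapter 7) that, after these changes, with high probability all pairs $(V_{i,r}, V_{j,s})$ which were $\epsilon(k)/4$-regular in $G$ will be $\epsilon(k)/2$-regular and have density $d_G(V_i, V_j) \pm \epsilon(k)/4$.

By definition, at most $(\epsilon(k)/4) k^2 m^2$ pairs $(V_{i,r}, V_{j,s})$ are not $\epsilon(k)/4$-regular in $G$. For each of them, we substitute the graph $G[V_{i,r}, V_{j,s}]$ by a random graph on $V_{i,r} \times V_{j,s}$ with expected density $d_G(V_i, V_j)$. Then with high probability we change a further at most

$$\frac{\epsilon(k) k^2 m^2}{4} \cdot \frac{2n^2}{k^2 m^2} \le \frac{\nu n^2}{2}$$

edges, and all of these pairs $(V_{i,r}, V_{j,s})$ will also be $\epsilon(k)/2$-regular and have density $d_G(V_i, V_j) \pm \epsilon(k)/4$ on the modified graph.

Denote by $H$ the graph obtained after all these modifications, and note that

$$|E(G) \triangle E(H)| \le \nu n^2$$

Consider now a pair of indices $i, j \in [k]$. For all $r, s \in [m]$, we know that $(V_{i,r}, V_{j,s})$ is $\epsilon(k)/2$-regular in $H$ and has density $d_H(V_{i,r}, V_{j,s}) = d_G(V_i, V_j) \pm \epsilon(k)/4$. Thus, for any sets $A \subseteq V_i$, $B \subseteq V_j$ we have

$$\begin{aligned}
e_H(A, B) &= \sum_{r, s \in [m]} \left( d_H(V_{i,r}, V_{j,s}) |A \cap V_{i,r}| |B \cap V_{j,s}| \pm \frac{\epsilon(k)}{2} |V_{i,r}| |V_{j,s}| \right) \\
&= \sum_{r, s \in [m]} \left( \left( d_G(V_i, V_j) \pm \frac{\epsilon(k)}{4} \right) |A \cap V_{i,r}| |B \cap V_{j,s}| \pm \frac{\epsilon(k)}{2} |V_{i,r}| |V_{j,s}| \right) \\
&= d_G(V_i, V_j) |A| |B| \pm \frac{3 \epsilon(k)}{4} |V_i| |V_j| \\
&= d_H(V_i, V_j) |A| |B| \pm \epsilon(k) |V_i| |V_j|,
\end{aligned}$$

showing that $(V_i, V_j)$ is $\epsilon(k)$-regular in $H$ and completing the proof.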

4.2 Relative regularity

In some applications, we are dealing with spanning subgraphs of a fixed graph $G$ (which may be easier for us to analyze) and we wish to obtain some regularity result for these graphs relative to the host graph $G$. One important example of this is when we are dealing with subgraphs of the random graph $G(n, p)$, with $p = p(n)$ tending to zero as $n$ grows, and we wish to obtain results valid with high probability for all spanning subgraphs of $G(n, p)$ (see [8] for many such results).

A closely related situation is that of subgraphs of sparse pseudorandom graphs (see [7]). In this case, instead of having a random model for graphs and obtaining results valid with high probability over the random choices made, we have a fixed (very sparse) graph $G$ which exhibits random-like behavior in the distribution of its edges, and we wish to use this behavior to extend results from the usual "dense" setting of graphs to all spanning subgraphs of $G$ (this philosophy will be taken up again in Chapters 6 and 7).

In this section we will try to obtain the most general conditions possible that the host graph $G$ must satisfy which allow us to use our framework to prove "relative regularity" results for its spanning subgraphs. This is done so that we may use the properties we know the graph $G$ satisfies in order to obtain similar properties satisfied by its subgraphs. The precise notion of relative regularity we will use is defined below:

Definition 4.1 ((ε, H, G)-regularity). Let G be a graph on V and H be a spanning subgraph of G. We say a pair (U, W ) of subsets of V is (ε, H, G)-regular if

\[
\left| \frac{e_H(A, B)}{e_G(A, B)} - \frac{e_H(U, W)}{e_G(U, W)} \right| \le \varepsilon \qquad \forall A \subseteq U,\ B \subseteq W : |A \times B| \ge \varepsilon |U \times W|,
\]

where we define $e_H(A, B)/e_G(A, B) := 0$ when $e_G(A, B) = 0$.

In our analysis we will only consider the case where G is bipartite, but it is easy to extend this analysis to the non-partite case by using the same arguments that we used in the proofs of Theorems 2.1 and 2.3. Let then $G = (V_1 \cup V_2, E(G))$ be a bipartite graph; we wish to see under which conditions we are able to obtain regularity-like results for a subgraph $H \subseteq G$ relative to G. Let $\mathbb{P}_G$ be the probability distribution on $V_1 \times V_2$ given by $\mathbb{P}_G(x, y) := 1_G(x, y)/|E(G)|$, and denote the corresponding $L^2$ norm by $\|\cdot\|_{L^2(G)}$. Applying the Strong Structure Theorem to $1_H$ in this space with a small error parameter $\alpha$, an increasing function $F$, and the structured set

\[
\mathcal{S} := \{\sigma(A \times V_2, V_1 \times B) : A \subseteq V_1,\ B \subseteq V_2\},
\]

we obtain a factor $\mathcal{B} \subset 2^{V_1 \times V_2}$ of $\mathcal{S}$-complexity at most $M = O_{F,\alpha}(1)$ and a decomposition $1_H = f_{\mathrm{str}} + f_{\mathrm{psd}} + f_{\mathrm{err}}$, where $f_{\mathrm{str}} = \mathbb{E}_G[1_H \mid \mathcal{B}]$, $f_{\mathrm{psd}}$ is $1/F(M)$-pseudorandom and $\|f_{\mathrm{err}}\|_{L^2(G)} \le \alpha$. The factor $\mathcal{B}$ is formed by the join of at most M factors of the form $\sigma(A_i \times V_2, V_1 \times B_i)$, with $A_i \subseteq V_1$ and $B_i \subseteq V_2$. If $V_1 = U_1 \cup U_2 \cup \dots \cup U_K$ is the partition of $V_1$ induced by the sets $A_i$ and $V_2 = W_1 \cup W_2 \cup \dots \cup W_L$ is the partition of $V_2$ induced by the sets $B_i$, then we know their orders K, L are at most $2^M$ and every atom of $\mathcal{B}$ is of the form $U_r \times W_s$. We may refine these partitions in order to obtain equitable partitions

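\[
V_1 = V_{1,0} \cup V_{1,1} \cup \dots \cup V_{1,k}, \qquad V_2 = V_{2,0} \cup V_{2,1} \cup \dots \cup V_{2,k}
\]

into $k := \lceil 2^M/\alpha \rceil$ sets of equal size plus the exceptional sets $V_{1,0}, V_{2,0}$ satisfying $|V_{1,0}| < \alpha|V_1|$, $|V_{2,0}| < \alpha|V_2|$. We then wish to show that

\[
\left| \frac{e_H(A, B)}{e_G(A, B)} - \frac{e_H(V_{1,i}, V_{2,j})}{e_G(V_{1,i}, V_{2,j})} \right| \le \varepsilon
\]

whenever $A \subseteq V_{1,i}$, $B \subseteq V_{2,j}$ satisfy $|A \times B| \ge \varepsilon |V_{1,i} \times V_{2,j}|$, for some $i, j \in [k]$. Because each $V_{1,i} \times V_{2,j}$ is contained within a single atom $U_r \times W_s$ of $\mathcal{B}$ (if $i, j \in [k]$) and

\[
|V_{1,i} \times V_{2,j}| \ge \frac{(1 - \alpha)^2 |V_1 \times V_2|}{k^2} \ge \frac{|V_1 \times V_2|}{2k^2},
\]

by the triangle inequality it suffices to show that

\[
\left| \frac{e_H(A, B)}{e_G(A, B)} - \frac{e_H(U_r, W_s)}{e_G(U_r, W_s)} \right| \le \frac{\varepsilon}{2}
\]

holds whenever $A \times B \subseteq U_r \times W_s$, for an atom $U_r \times W_s$ of $\mathcal{B}$, satisfies $|A \times B| \ge \varepsilon |V_1 \times V_2|/2k^2$. Now we note that, for every $(x, y) \in U_r \times W_s$, we have

\[
\mathbb{E}_G[1_H \mid \mathcal{B}](x, y) = \frac{\mathbb{E}_G[1_H 1_{U_r \times W_s}]}{\mathbb{P}_G(U_r \times W_s)} = \frac{e_H(U_r, W_s)}{e_G(U_r, W_s)}
\]

This implies that, whenever $A \times B \subseteq U_r \times W_s$, we have

\[
\mathbb{E}_G[(1_H - f_{\mathrm{str}}) 1_{A \times B}] = \left( \frac{e_H(A, B)}{e_G(A, B)} - \frac{e_H(U_r, W_s)}{e_G(U_r, W_s)} \right) \frac{e_G(A, B)}{|E(G)|}
\]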

As 1H − fstr = fpsd + ferr, by the triangle inequality it then suffices to show that

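\[
|\mathbb{E}_G[f_{\mathrm{psd}} 1_{A \times B}]| \le \frac{\varepsilon}{4} \frac{e_G(A, B)}{|E(G)|} \quad \text{and} \quad |\mathbb{E}_G[f_{\mathrm{err}} 1_{A \times B}]| \le \frac{\varepsilon}{4} \frac{e_G(A, B)}{|E(G)|} \tag{4.3}
\]

whenever $|A \times B| \ge \varepsilon |V_1 \times V_2|/2k^2$ and $A \times B$ is contained inside a single product set $V_{1,i} \times V_{2,j}$. For the first inequality, we note that

\[
|\mathbb{E}_G[f_{\mathrm{psd}} 1_{A \times B}]| \le \|\mathbb{E}_G[f_{\mathrm{psd}} \mid \sigma(A \times V_2, V_1 \times B)]\|_{L^2(G)} \le \frac{1}{F(M)} \tag{4.4}
\]

For the second inequality, we apply Cauchy-Schwarz and obtain

\[
|\mathbb{E}_G[f_{\mathrm{err}} 1_{A \times B}]|^2 \le \|f_{\mathrm{err}} 1_{V_{1,i} \times V_{2,j}}\|_{L^2(G)}^2 \, \|1_{A \times B}\|_{L^2(G)}^2 = \mathbb{E}_G\!\left[f_{\mathrm{err}}^2 1_{V_{1,i} \times V_{2,j}}\right] \frac{e_G(A, B)}{|E(G)|},
\]

so the second inequality of (4.3) would be satisfied if we could ascertain that

\[
\mathbb{E}_G\!\left[f_{\mathrm{err}}^2 1_{V_{1,i} \times V_{2,j}}\right] \le \frac{\varepsilon^2}{16} \min_{\substack{A \times B \subset V_1 \times V_2 \\ |A \times B| \ge \varepsilon |V_1 \times V_2|/2k^2}} \frac{e_G(A, B)}{|E(G)|} \tag{4.5}
\]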

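Suppose then we have a lower bound $\gamma > 0$ on the relative density of product sets $A \times B$ of size at least $\varepsilon |V_1 \times V_2|/2k^2$, i.e.

\[
\frac{d_G(A, B)}{d_G(V_1, V_2)} = \frac{e_G(A, B)/|A \times B|}{e_G(V_1, V_2)/|V_1 \times V_2|} \ge \gamma, \qquad \forall A \subseteq V_1,\ B \subseteq V_2 : |A \times B| \ge \frac{\varepsilon |V_1 \times V_2|}{2k^2}
\]

Then, if we take $\alpha = \varepsilon^2 \sqrt{\gamma/32}$, equation (4.5) must be satisfied by all but at most $\varepsilon k^2$ pairs $(V_{1,i}, V_{2,j})$, otherwise we would have

\[
\alpha^2 \ge \|f_{\mathrm{err}}\|_{L^2(G)}^2 = \mathbb{E}_G[f_{\mathrm{err}}^2] > \varepsilon k^2 \cdot \frac{\varepsilon^2}{16} \frac{\gamma |A \times B|}{|V_1 \times V_2|} \ge \frac{\varepsilon^4}{32} \gamma = \alpha^2
\]

This proves that the second inequality in (4.3) holds for all but at most $\varepsilon k^2$ of the pairs $(V_{1,i}, V_{2,j})$. Likewise, if we take the function $F(x) := \varepsilon^{-2} \alpha^{-2} \gamma^{-1} 2^{2x+5}$, we have that

\[
\frac{1}{F(M)} = \frac{\varepsilon^2 \gamma}{8} \left( \frac{\alpha}{2 \cdot 2^M} \right)^2 \le \frac{\varepsilon^2 \gamma}{8k^2} \le \frac{\varepsilon}{4} \frac{d_G(A, B)}{d_G(V_1, V_2)} \frac{|A \times B|}{|V_1 \times V_2|} = \frac{\varepsilon}{4} \frac{e_G(A, B)}{|E(G)|}
\]

holds whenever $|A \times B| \ge \varepsilon |V_1 \times V_2|/2k^2$. Together with equation (4.4), this proves the first inequality in (4.3).

It is a simple exercise to extend this argument to the multi-partite case by repeating it for each pair (with a smaller error parameter) and refining the partitions obtained. We then obtain the following theorem of relative regularity.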

Theorem 4.2 (Relative Regularity Lemma). For every $\varepsilon, \gamma > 0$ and $k_0, \ell \ge 1$, there exist constants $\eta > 0$ and $K_0 \ge k_0$ such that the following holds. Let $G = (V, E(G))$ be a $P_0$-partite graph on n vertices, where $P_0 : V = V_1 \cup \dots \cup V_\ell$ is an equitable $\ell$-partition of V, and suppose that

\[
\frac{d_G(A, B)}{d_G(V_i, V_j)} \ge \gamma \qquad \forall A \subseteq V_i,\ B \subseteq V_j : |A \times B| \ge \eta |V_i \times V_j| \tag{4.6}
\]

is valid for every $i, j \in [\ell]$. Then every spanning subgraph $H \subseteq G$ admits an equitable partition P into k parts refining $P_0$ which satisfies:

• $k_0 \le k \le K_0$

• All but at most $\varepsilon k^2$ pairs of parts in P are $(\varepsilon, H, G)$-regular

The condition (4.6) given in the statement of the theorem amounts to saying that the graph G contains no reasonably large sets of vertices having density much smaller than its expected density. It may be intuitively thought of as having “no sparse spots” (apart from those given by the partition $P_0$). One of the simplest classes of graphs satisfying this condition is the class of η-uniform graphs, which are graphs that satisfy a natural kind of pseudorandomness condition which takes into consideration the graph’s edge density. Below we give its definition in the more general situation of partite graphs, as given in [19]:

Definition 4.2 ($(P_0, \eta)$-uniform graphs). Let a partition $P_0 = (V_i)_{i \in [\ell]}$ of V be fixed. We write $(A, B) \prec P_0$ if either $\ell = 1$ or $A \subset V_i$, $B \subset V_j$ for some $i \ne j$ in $[\ell]$. Given a constant $\eta > 0$, we then say a graph $G = (V, E)$ of density $p := 2|E|/|V|^2$ is $(P_0, \eta)$-uniform if

\[
\left| \frac{e_G(A, B)}{|A \times B|} - p \right| \le \eta p \qquad \forall (A, B) \prec P_0 : |A \times B| \ge \eta |V|^2
\]

If $P_0$ is the trivial partition of V into a single part, we say simply that G is η-uniform.

The Sparse Regularity Lemma of Kohayakawa and Rödl then follows as an immediate corollary of Theorem 4.2:

Corollary 4.1 (Sparse Regularity Lemma I [18, 19]). For every $\varepsilon > 0$ and $k_0 \ge 1$, there exist constants $\eta > 0$ and $K_0 \ge k_0$ such that the following holds.

Suppose G is a $(P_0, \eta)$-uniform graph, where $P_0$ is an equitable partition of V(G) into at most $k_0$ parts. Then every spanning subgraph $H \subseteq G$ admits an equitable partition P into k parts refining $P_0$ which satisfies:

• $k_0 \le k \le K_0$

• All but at most $\varepsilon k^2$ pairs of parts in P are $(\varepsilon, H, G)$-regular

Another version of the Sparse Regularity Lemma considered in these same papers deals with upper-regular graphs, a condition which may be roughly described as having “no dense spots” and is in some sense dual to the condition given by equation (4.6). We will show how to derive this version in Section 6.2.
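To make Definition 4.1 concrete, the following is a minimal brute-force sketch that checks whether a pair (U, W) is (ε, H, G)-regular on a very small example. It is only an illustration under stated assumptions: U and W are taken to be disjoint (the bipartite case considered in this section), edges are stored as `frozenset` pairs, and the helper names (`edges_between`, `ratio`, `is_relatively_regular`) are hypothetical and not taken from the thesis. The search enumerates all pairs of non-empty subsets, so it is exponential in |U| + |W|.

```python
from itertools import combinations


def nonempty_subsets(s):
    """Yield every non-empty subset of the finite set s."""
    items = sorted(s)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield set(combo)


def edges_between(edges, A, B):
    """Count pairs (a, b) in A x B that form an edge (A and B assumed disjoint)."""
    return sum(1 for a in A for b in B if frozenset((a, b)) in edges)


def ratio(edges_H, edges_G, A, B):
    """e_H(A, B) / e_G(A, B), with the convention that the ratio is 0 when e_G(A, B) = 0."""
    e_g = edges_between(edges_G, A, B)
    return 0.0 if e_g == 0 else edges_between(edges_H, A, B) / e_g


def is_relatively_regular(edges_H, edges_G, U, W, eps):
    """Brute-force test of (eps, H, G)-regularity of the pair (U, W)."""
    base = ratio(edges_H, edges_G, U, W)
    for A in nonempty_subsets(U):
        for B in nonempty_subsets(W):
            # Only "large" product sets A x B are tested, as in Definition 4.1.
            if len(A) * len(B) >= eps * len(U) * len(W):
                if abs(ratio(edges_H, edges_G, A, B) - base) > eps:
                    return False
    return True


# Toy example: G is the complete bipartite graph between U and W.
U, W = {0, 1}, {2, 3}
G = {frozenset((u, w)) for u in U for w in W}
H_single = {frozenset((0, 2))}  # a spanning subgraph of G with one edge
```

With these toy inputs, H = G is regular for every ε, while the single-edge subgraph fails for ε = 1/4 because the pair ({0}, {2}) has relative density 1 against an overall relative density of 1/4.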

Chapter 5

Hypergraph regularity

In this chapter we will extend the Regularity Lemma to uniform hypergraphs. This result, called the Hypergraph Regularity Lemma, was first obtained by Nagle, Rödl, Schacht and Skokan [23, 29, 28] and, independently, by Gowers [14]. The version we will present here is due to Tao [32], who proved it in order to obtain his result that the Gaussian primes contain arbitrarily shaped constellations [35], which is a version of the Green-Tao theorem for the Gaussian primes. The proof presented here is closely related to the one given by Tao, but adapted to our setting and using the methods already developed in earlier sections.

5.1 Intuition and definitions

Definition 5.1. Given an integer $d \ge 2$, a d-uniform hypergraph is a pair $H = (V, \mathcal{E})$ where V is a vertex set and $\mathcal{E} \subseteq \binom{V}{d}$ is a collection of unordered d-tuples of vertices (which we will call edges).

The Hypergraph Regularity Lemma may be seen as a “higher-order” version of Szemerédi’s Regularity Lemma for graphs (Theorem 2.3); while the graph version seeks to regularize the set of edges of a graph (which is of “second order”, as a subset of the pairs of vertices) by partitioning its vertex set (which is then of “first order”), the hypergraph regularity lemma seeks to regularize the d-th order set of edges of a d-uniform hypergraph by (d − 1)-th order sets of (d − 1)-tuples of vertices, and then regularize these new sets by (d − 2)-th order sets of (d − 2)-tuples of vertices and so on, until we end up in a partition of its vertex set. This way, we get a sequence

\[
P_d = \left\{ \mathcal{E},\ \binom{V}{d} \setminus \mathcal{E} \right\}, \quad P_{d-1} \subset \mathcal{P}\!\left( \binom{V}{d-1} \right), \quad \cdots, \quad P_2 \subset \mathcal{P}\!\left( \binom{V}{2} \right), \quad P_1 \subset \mathcal{P}(V)
\]

of partitions at each order such that the j-th order partition $P_j$ is well approximated in a certain sense by the (j − 1)-th order partition $P_{j-1}$, for all $2 \le j \le d$.

Remark 5.1. It might seem strange that we need so many partitions, and at all different “orders”. It is possible to obtain a “regularity lemma” only partitioning the set of vertices and not higher-order tuples, but the regularity properties of the partitions obtained are not sufficiently strong to imply some important applications such as a hypergraph counting lemma. This is related to a similar problem when trying to construct limit objects for hypergraphs using the Hypergraph Regularity Lemma (see [38]), which requires us to consider a $(2^d - 2)$-dimensional object for limits of d-uniform hypergraphs.

To obtain such partitions, we will make use of a “multidimensional” generalization of the Strong Structure Theorem (SST, Theorem 2.2) given in Section 2.4, and whose proof is essentially identical to that of the original SST. For this, we will need to define a kind of multidimensional conditional expectation:

Definition 5.2. Let $f := (f^{(1)}, \cdots, f^{(k)}) \in \bigotimes_{i \in [k]} L^2(\mathcal{B}_{\max})$ be a k-tuple of square-integrable real functions. Given a factor $\mathcal{B} := \bigotimes_{i \in [k]} \mathcal{B}^{(i)}$ of $\mathcal{B}_{\max}^k := \bigotimes_{i \in [k]} \mathcal{B}_{\max}$, define $\mathbb{E}[f \mid \mathcal{B}] \in \bigotimes_{i \in [k]} L^2(\mathcal{B}^{(i)})$ by

\[
\mathbb{E}[f \mid \mathcal{B}](x) := \left( \mathbb{E}[f^{(1)} \mid \mathcal{B}^{(1)}](x),\ \cdots,\ \mathbb{E}[f^{(k)} \mid \mathcal{B}^{(k)}](x) \right), \qquad \forall x \in X
\]

We also define the norm

\[
\|f\|_{L^{2*k}} := \sqrt{\sum_{i=1}^{k} \|f^{(i)}\|_{L^2}^2}
\]

With these definitions, we obtain:

Theorem 5.1 (Multidimensional Strong Structure Theorem). Let $\mathcal{S}$ be a collection of “structured factors” of $\mathcal{B}_{\max}^k$. Suppose $f := (f^{(1)}, \cdots, f^{(k)}) \in \bigotimes_{i \in [k]} L^2(\mathcal{B}_{\max})$ satisfies $\|f\|_{L^{2*k}}^2 \le k$, let $0 < \varepsilon \le 1$ and $m \ge 1$ be constants, and let $F : \mathbb{R}^+ \to \mathbb{R}^+$ be an arbitrary increasing function. Then there exists an integer $M = O_{\varepsilon,F,m,k}(1)$ satisfying $M \ge m$, factors $\mathcal{B} \subseteq \mathcal{B}' \subseteq \mathcal{B}_{\max}^k$, and a decomposition $f = f_{\mathrm{str}} + f_{\mathrm{psd}} + f_{\mathrm{err}}$ such that:

• $f_{\mathrm{str}} = \mathbb{E}[f \mid \mathcal{B}]$, with $\mathrm{complex}_{\mathcal{S}}(\mathcal{B}) \le M$

• $f_{\mathrm{psd}} = f - \mathbb{E}[f \mid \mathcal{B}']$ is $1/F(M)$-pseudorandom

• $f_{\mathrm{err}} = \mathbb{E}[f \mid \mathcal{B}'] - \mathbb{E}[f \mid \mathcal{B}]$ satisfies $\|f_{\mathrm{err}}\|_{L^{2*k}} \le \varepsilon$

The main difference between this multidimensional version of SST and simple repeated applications of the original SST is that in Theorem 5.1 the individual factors $\mathcal{B}^{(1)}, \cdots, \mathcal{B}^{(k)}$ may be made correlated to each other, depending on the structure of the set $\mathcal{S}$. This correlation cannot be obtained simply by applying k times the original SST, and it will be crucial for establishing the Hypergraph Regularity Lemma.

Proof. We repeat the proof of the Strong Structure Theorem (Theorem 2.2), but using the new conventions. Set $M_0 := m$ and $\mathcal{B}_0 := \bigotimes_{i \in [k]} \{\emptyset, X\}$. For each $i \ge 1$, use (a multidimensional version of) the Weak Structure Theorem (Lemma 2.3) with ε being $1/F(M_{i-1})$ and $\mathcal{B}$ being $\mathcal{B}_{i-1}$. We obtain a factor $\mathcal{Z}_i$ of complexity at most $kF(M_{i-1})^2$ relative to $\mathcal{S}$, and such that $f - \mathbb{E}[f \mid \mathcal{B}_{i-1} \vee \mathcal{Z}_i]$ is $1/F(M_{i-1})$-pseudorandom; set then $\mathcal{B}_i := \mathcal{B}_{i-1} \vee \mathcal{Z}_i$ and $M_i := M_{i-1} + kF(M_{i-1})^2$.

Because $\|f\|_{L^{2*k}}^2 \le k$, by the pigeonhole principle there exists $j \le k/\varepsilon^2$ such that

\[
\|\mathbb{E}[f \mid \mathcal{B}_{j+1}] - \mathbb{E}[f \mid \mathcal{B}_j]\|_{L^{2*k}}^2 = \|\mathbb{E}[f \mid \mathcal{B}_{j+1}]\|_{L^{2*k}}^2 - \|\mathbb{E}[f \mid \mathcal{B}_j]\|_{L^{2*k}}^2 \le \varepsilon^2
\]

We may then take $\mathcal{B} := \mathcal{B}_j$, $\mathcal{B}' := \mathcal{B}_{j+1}$ and $M := M_j$.

We will now fix some notation for the hypergraph setting before stating the Hypergraph Regularity Lemma. It will be convenient to restrict ourselves to the case of partite hypergraphs. Let then $H = ((V_j)_{j \in [\ell]}, \mathcal{E})$ be an $\ell$-partite d-uniform hypergraph whose vertex set is indexed by $[\ell]$; this means that each hyperedge will have exactly d vertices, no two of which belong to the same class $V_j$.

For any subset $f \subseteq [\ell]$, we define $V_f = (V_j)_{j \in f} := \prod_{j \in f} V_j$ and let $\pi_f : V_{[\ell]} \to V_f$ be the canonical projection map onto the coordinates in f. We then define on $V_{[\ell]}$ the σ-algebra $\mathcal{A}_f := \{\pi_f^{-1}(E) : E \subseteq V_f\}$, which is the collection of all subsets of $V_{[\ell]}$ which depend only on the elements whose index belongs to the set f. As an example, consider a 3-partite 3-uniform hypergraph $H = ((V_1, V_2, V_3), \mathcal{E})$. Then $\mathcal{E} \subseteq V_1 \times V_2 \times V_3$, $V_{\{1,3\}} = V_1 \times V_3$, $V_{\{1\}} = V_1$,

π{1,3}(x1, x2, x3) = (x1, x3) ∈ V{1,3} ∀(x1, x2, x3) ∈ V1 × V2 × V3

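and $\mathcal{A}_{\{1\}} = \{E_1 \times V_2 \times V_3 : E_1 \subseteq V_1\}$, $\mathcal{A}_{\{1,2\}} = \{E_{12} \times V_3 : E_{12} \subseteq V_1 \times V_2\}$, $\mathcal{A}_{\{1,2,3\}} = 2^{V_1 \times V_2 \times V_3}$.

Given a factor $\mathcal{B} \subseteq \mathcal{A}_{[\ell]}$, define the complexity of $\mathcal{B}$ (written $\mathrm{complex}(\mathcal{B})$) as the smallest number of sets needed to generate $\mathcal{B}$ as a σ-algebra. Given a set $e \subseteq [\ell]$, define the skeleton $\partial e$ of e as the collection $\{f \subset e : |f| = |e| - 1\}$.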
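The notation above can be tried out on a tiny partite example. The sketch below is purely illustrative and rests on assumptions of ours: vertex classes are stored in a dict `parts` mapping each index j to a list of vertices, points of $V_{[\ell]}$ are tuples ordered by index, and the function names (`skeleton`, `project`, `pullback`) are hypothetical. `pullback(E, f, parts)` computes $\pi_f^{-1}(E)$, i.e. a typical element of the σ-algebra $\mathcal{A}_f$.

```python
from itertools import combinations, product


def skeleton(e):
    """Return the skeleton of e: all subsets f of e with |f| = |e| - 1."""
    e = frozenset(e)
    return {frozenset(c) for c in combinations(sorted(e), len(e) - 1)}


def project(x, f):
    """Canonical projection pi_f: keep the coordinates of x whose index lies in f.

    x is a dict mapping each index j to the chosen vertex of V_j.
    """
    return tuple(x[j] for j in sorted(f))


def pullback(E, f, parts):
    """Compute pi_f^{-1}(E): all points of V_[l] whose f-coordinates lie in E."""
    indices = sorted(parts)
    result = set()
    for point in product(*(parts[j] for j in indices)):
        x = dict(zip(indices, point))
        if project(x, f) in E:
            result.add(point)
    return result


# Toy 3-partite vertex set: V_1 = {a, b}, V_2 = {c}, V_3 = {d, e}.
parts = {1: ["a", "b"], 2: ["c"], 3: ["d", "e"]}
```

For instance, `skeleton({1, 2, 3})` returns the three pairs {1,2}, {1,3}, {2,3}, and `pullback({("a",)}, {1}, parts)` is the set $\{a\} \times V_2 \times V_3$, an element of $\mathcal{A}_{\{1\}}$.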

5.2 Regularity at a single level

Let us now give a high-level overview of our Hypergraph Regularity Lemma to be proven in the next section.

Consider a subset $g \in \binom{[\ell]}{d}$ of d indices in $[\ell]$, and let $\mathcal{E}_g := \mathcal{E} \cap V_g$ be the edges of the hypergraph H with one vertex in each $V_j$, $j \in g$. We then start with the factor

\[
\mathcal{B}_g := \{\emptyset,\ \pi_g^{-1}(\mathcal{E}_g),\ \pi_g^{-1}(V_g \setminus \mathcal{E}_g),\ V_{[\ell]}\} \subset \mathcal{A}_g
\]

generated by the edges $\mathcal{E}_g$ (which may be an arbitrary subset of $V_g$), and try to capture some of its structure by lower-order σ-algebras $\mathcal{B}_e \subset \mathcal{A}_e$, for $e \in \binom{g}{d-1}$. We then try to model these σ-algebras $\mathcal{B}_e$ by still lower-order approximations $\mathcal{B}_f \subset \mathcal{A}_f$, for $f \in \binom{g}{d-2}$, and repeat this procedure of regularizing higher-order σ-algebras by lower-order ones until we end up in a partition of each vertex class $V_j$, $j \in g$, which is represented by a factor $\mathcal{B}_{\{j\}}$ of $\mathcal{A}_{\{j\}}$. The constructed σ-algebras $\mathcal{B}_e$ should capture most of the structure of the hyperedges $\mathcal{E}_g$ present on the |e|-th order level which is measurable by $\mathcal{A}_e$, so that the actual hyperedges between the atoms of $\mathcal{B}_e$ behave randomly, while still maintaining bounded complexity.

This must be performed for all sets of indices $g \in \binom{[\ell]}{d}$, and the lower-order σ-algebras $\mathcal{B}_e$ we construct must depend only on the set of indices e but not on the sets $g \supset e$ we started with. This is the reason we need the dependencies between the factors $\mathcal{B}^{(i)}$ given by Theorem 5.1, which are not guaranteed by only repeatedly applying the Strong Structure Theorem.

Our first step is then to obtain a lemma to regularize each “level” of our construction separately. As before, we will need to construct two σ-algebras (the coarse and fine approximations) to obtain an optimal result.

Lemma 5.1 (Regularity at the j-th Level [32]). Let $m \ge 1$ and $\ell \ge d \ge 2$ be integers, and let $H = ((V_i)_{i \in [\ell]}, \mathcal{E})$ be an $\ell$-partite d-uniform hypergraph on $V = V_1 \cup \dots \cup V_\ell$.

Let $2 \le j \le d$ be an integer and, for each $e \in \binom{[\ell]}{j}$, let $\mathcal{B}_e \subseteq \mathcal{A}_e$ be a σ-algebra satisfying $\mathrm{complex}(\mathcal{B}_e) \le m$. Let $\varepsilon > 0$ be a positive number and $F : \mathbb{R}^+ \to \mathbb{R}^+$ be an arbitrary increasing function.

Then there exists $M = O_{\varepsilon,F,m,\ell}(1)$ satisfying the bound $M \ge F(m)$ and, for each $f \in \binom{[\ell]}{j-1}$, there exists a pair of σ-algebras $\mathcal{B}_f \subseteq \mathcal{B}'_f \subseteq \mathcal{A}_f$ (the coarse and fine approximations) such that the following holds.

Every j-th level measurable set $E_e \in \mathcal{B}_e$, $e \in \binom{[\ell]}{j}$, admits a lower-order decomposition $1_{E_e} = f_{\mathrm{str}}^{(E_e)} + f_{\mathrm{psd}}^{(E_e)} + f_{\mathrm{err}}^{(E_e)}$ where:

• $f_{\mathrm{str}}^{(E_e)} := \mathbb{E}\big[1_{E_e} \mid \bigvee_{f \in \partial e} \mathcal{B}_f\big]$, with $\mathrm{complex}(\mathcal{B}_f) \le M$ $\forall f \in \binom{[\ell]}{j-1}$

• $f_{\mathrm{psd}}^{(E_e)} := 1_{E_e} - \mathbb{E}\big[1_{E_e} \mid \bigvee_{f \in \partial e} \mathcal{B}'_f\big]$ satisfies

\[
\sup_{E_f \in \mathcal{A}_f\ \forall f \in \partial e} \left| \mathbb{E}\left[ f_{\mathrm{psd}}^{(E_e)} \prod_{f \in \partial e} 1_{E_f} \right] \right| \le \frac{1}{F(M)}
\]

• $f_{\mathrm{err}}^{(E_e)} := \mathbb{E}\big[1_{E_e} \mid \bigvee_{f \in \partial e} \mathcal{B}'_f\big] - \mathbb{E}\big[1_{E_e} \mid \bigvee_{f \in \partial e} \mathcal{B}_f\big]$ satisfies $\|f_{\mathrm{err}}^{(E_e)}\|_{L^2} \le \varepsilon$

This lemma is basically a restatement of the Multidimensional Strong Structure Theorem (Theorem 5.1) applied to the partite hypergraph setting. Here $\mathcal{B}_{\max} = \mathcal{A}_{[\ell]}$ is the discrete σ-algebra on $V_{[\ell]} = \prod_{i \in [\ell]} V_i$, we have a function $1_{E_e}$ for each set $E_e$ measurable with respect to one of the original σ-algebras $\mathcal{B}_e$ we are given at the beginning, and we wish to decompose all of them at once by using bounded-complexity σ-algebras $\mathcal{B}_f$ one level lower than the $\mathcal{B}_e$. This follows essentially by enumerating all measurable sets $E_e \in \mathcal{B}_e$ and taking the collection $\mathcal{S}$ to be a suitable “multidimensional” version of

\[
\mathcal{S}_j = \left\{ \bigvee_{f \in \partial e} \sigma(E_f) : e \in \binom{[\ell]}{j},\ E_f \in \mathcal{A}_f\ \forall f \in \partial e \right\}, \tag{5.1}
\]

which represents the information in $\mathcal{B}_e$ which is measurable with respect to a lower-order σ-algebra $\mathcal{B}_f$, $f \in \partial e$.

Proof. We enumerate the “original” measurable sets $E_e$ by an index from 1 to $k := \sum_{e \in \binom{[\ell]}{j}} |\mathcal{B}_e|$, so that for every $e \in \binom{[\ell]}{j}$, each set $E_e \in \mathcal{B}_e$ is mapped to an index $\imath(E_e) \in [k]$. We denote also by $e : [k] \to \binom{[\ell]}{j}$ the membership function which associates each index $i \in [k]$ to the set $e(i) \in \binom{[\ell]}{j}$ from whose σ-algebra $\mathcal{B}_{e(i)}$ came the set $E_e$ indexed by i.

With this enumeration, we join all these measurable sets’ indicator functions into a single k-tuple $f = \big(1_{E_e} : e \in \binom{[\ell]}{j},\ E_e \in \mathcal{B}_e\big)$, so that the i-th coordinate $f^{(i)}$ is the indicator function of the set indexed by i. Note that k is bounded, since

\[
k = \sum_{e \in \binom{[\ell]}{j}} |\mathcal{B}_e| \le \sum_{e \in \binom{[\ell]}{j}} 2^{2^{\mathrm{complex}(\mathcal{B}_e)}} \le 2^{\ell} 2^{2^m}
\]

The version of the set $\mathcal{S}_j$ (equation (5.1)) which is adapted to our setting is

\[
\mathcal{S}_j := \left\{ \bigotimes_{i=1}^{k} \left( \bigvee_{f \in \partial e(i) \cap \partial g} \sigma(E_f) \right) : g \in \binom{[\ell]}{j},\ E_f \in \mathcal{A}_f\ \forall f \in \partial g \right\}
\]

It represents taking a factor $\mathcal{Y}_g = \bigvee_{f \in \partial g} \sigma(E_f) \in \mathcal{S}_j \cap \mathcal{A}_g$ for some $g \in \binom{[\ell]}{j}$ and pulling it back to a factor of the whole space of f in the natural way. Any factor $\mathcal{B} = \mathcal{Y}_1 \vee \dots \vee \mathcal{Y}_r$ with complexity r relative to $\mathcal{S}_j$ may then be written as $\mathcal{B} = \bigotimes_{i \in [k]} \bigvee_{f \in \partial e(i)} \mathcal{B}_f$, for some factors $\mathcal{B}_f \subseteq \mathcal{A}_f$ generated by at most r sets $E_f \in \mathcal{A}_f$, $\forall f \in \binom{[\ell]}{j-1}$.

The Multidimensional Strong Structure Theorem (Theorem 5.1) applied to f and $\mathcal{S}_j$ then permits us to conclude the proof, since for every $e \in \binom{[\ell]}{j}$ and $E_f \in \mathcal{A}_f$, $\forall f \in \partial e$, the set $\bigcap_{f \in \partial e} E_f$ is measurable by a factor $\mathcal{Y}_e \in \mathcal{S}_j$, and so

\[
\left| \mathbb{E}\left[ f_{\mathrm{psd}}^{(\imath(E_e))} \prod_{f \in \partial e} 1_{E_f} \right] \right| \le \left\| \mathbb{E}\left[ f_{\mathrm{psd}}^{(\imath(E_e))} \mid \mathcal{Y}_e \right] \right\|_{L^2} \le \sup_{\mathcal{Y} \in \mathcal{S}_j} \|\mathbb{E}[f_{\mathrm{psd}} \mid \mathcal{Y}]\|_{L^{2*k}} \le \frac{1}{F(M)}.
\]

Intuitively, this lemma provides “coarse” low-order σ-algebras $\{\mathcal{B}_f\}_{f \in \partial e}$ and “fine” low-order σ-algebras $\{\mathcal{B}'_f\}_{f \in \partial e}$ which approximate the higher-order σ-algebras $\mathcal{B}_e$ given at the beginning. The coarse σ-algebras have bounded complexity, the fine approximation is close to the coarse approximation in $L^2$ norm, and the higher-order σ-algebras are very pseudorandom with respect to the fine lower-order σ-algebras.

5.3 Regularizing all levels simultaneously

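The full regularity lemma now follows from the previous lemma by recursion, made to regularize all different levels at once:

Theorem 5.2 (Hypergraph Regularity Lemma [32]). Let $\ell \ge d \ge 2$ be integers and let $H = ((V_i)_{i \in [\ell]}, \mathcal{E})$ be an $\ell$-partite d-uniform hypergraph on $V = V_1 \cup \dots \cup V_\ell$.

Let $F : \mathbb{R}^+ \to \mathbb{R}^+$ be an arbitrary increasing function and, for all $e \in \binom{[\ell]}{d}$, define $\mathcal{E}_e := \mathcal{E} \cap V_e$ as the set of edges of H in $V_e$ and $\mathcal{B}_e := \{\emptyset, \pi_e^{-1}(\mathcal{E}_e), \pi_e^{-1}(V_e \setminus \mathcal{E}_e), V_{[\ell]}\}$ as the factor of $\mathcal{A}_e$ generated by $\pi_e^{-1}(\mathcal{E}_e)$. Define $M_d := 1$. Then there exist numbers $M_1, M_2, \cdots, M_{d-1}$ satisfying

\[
F(1) \le M_{d-1} \le F(M_{d-1}) \le M_{d-2} \le \dots \le F(M_2) \le M_1 = O_{F,\ell}(1)
\]

and, for each $1 \le j < d$ and each $f \in \binom{[\ell]}{j}$, there exists a pair of σ-algebras $\mathcal{B}_f \subseteq \mathcal{B}'_f \subseteq \mathcal{A}_f$ such that:

• $\mathrm{complex}(\mathcal{B}_f) \le M_j$ for all $1 \le j \le d$, $f \in \binom{[\ell]}{j}$

• for all $2 \le j \le d$, $e \in \binom{[\ell]}{j}$, $E_e \in \mathcal{B}_e$:

\[
\left\| \mathbb{E}\Big[ 1_{E_e} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}'_f \Big] - \mathbb{E}\Big[ 1_{E_e} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}_f \Big] \right\|_{L^2} \le \frac{1}{F(M_j)}
\]

• for all $2 \le j \le d$, $e \in \binom{[\ell]}{j}$, $E_e \in \mathcal{B}_e$:

\[
\sup_{E_f \in \mathcal{A}_f\ \forall f \in \partial e} \left| \mathbb{E}\left[ \Big( 1_{E_e} - \mathbb{E}\Big[ 1_{E_e} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}'_f \Big] \Big) \prod_{f \in \partial e} 1_{E_f} \right] \right| \le \frac{1}{F(M_1)}
\]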

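Proof. We proceed by recursion on the level j, from j = d to j = 2.

We first use Regularity at the j-th Level (Lemma 5.1) with j = d, σ-algebras $\mathcal{B}_e$ for $e \in \binom{[\ell]}{d}$, m = 1, $\varepsilon = 1/F(1)$ and the function F in the lemma being substituted by a function $F_{d-1}$ we will choose at the end. We then obtain a number $M_{d-1}$ satisfying the bounds $F_{d-1}(1) \le M_{d-1} = O_{\ell,F(1),F_{d-1}}(1)$ and, for each $f \in \binom{[\ell]}{d-1}$, we obtain a pair of σ-algebras $\mathcal{B}_f \subset \mathcal{B}'_f \subset \mathcal{A}_f$ such that:

− $\mathrm{complex}(\mathcal{B}_f) \le M_{d-1}$ $\forall f \in \binom{[\ell]}{d-1}$

− $\Big\| \mathbb{E}\big[ 1_{E_e} \mid \bigvee_{f \in \partial e} \mathcal{B}'_f \big] - \mathbb{E}\big[ 1_{E_e} \mid \bigvee_{f \in \partial e} \mathcal{B}_f \big] \Big\|_{L^2} \le \frac{1}{F(1)} = \frac{1}{F(M_d)}$ $\forall e \in \binom{[\ell]}{d}$, $E_e \in \mathcal{B}_e$

− $\sup_{E_f \in \mathcal{A}_f\ \forall f \in \partial e} \Big| \mathbb{E}\Big[ \big( 1_{E_e} - \mathbb{E}\big[ 1_{E_e} \mid \bigvee_{f \in \partial e} \mathcal{B}'_f \big] \big) \prod_{f \in \partial e} 1_{E_f} \Big] \Big| \le \frac{1}{F_{d-1}(M_{d-1})}$ $\forall e \in \binom{[\ell]}{d}$, $E_e \in \mathcal{B}_e$

Supposing we have already constructed the σ-algebras $\mathcal{B}_e \subset \mathcal{B}'_e \subset \mathcal{A}_e$ for all $e \in \binom{[\ell]}{j}$, with $2 \le j < d$, together with the number $M_j$, we now use Regularity at the j-th Level for these σ-algebras $(\mathcal{B}_e)_{e \in \binom{[\ell]}{j}}$ with $m = M_j$, $\varepsilon = 1/F(M_j)$ and F being $F_{j-1}$ (for some function $F_{j-1}$ we will choose at the end). We then obtain a number $M_{j-1}$ satisfying $F_{j-1}(M_j) \le M_{j-1} = O_{M_j,\ell,F(M_j),F_{j-1}}(1)$ and, for each $f \in \binom{[\ell]}{j-1}$, a pair of σ-algebras $\mathcal{B}_f \subset \mathcal{B}'_f \subset \mathcal{A}_f$ such that:

− $\mathrm{complex}(\mathcal{B}_f) \le M_{j-1}$ $\forall f \in \binom{[\ell]}{j-1}$

− $\Big\| \mathbb{E}\big[ 1_{E_e} \mid \bigvee_{f \in \partial e} \mathcal{B}'_f \big] - \mathbb{E}\big[ 1_{E_e} \mid \bigvee_{f \in \partial e} \mathcal{B}_f \big] \Big\|_{L^2} \le \frac{1}{F(M_j)}$ $\forall e \in \binom{[\ell]}{j}$, $E_e \in \mathcal{B}_e$

− $\sup_{E_f \in \mathcal{A}_f\ \forall f \in \partial e} \Big| \mathbb{E}\Big[ \big( 1_{E_e} - \mathbb{E}\big[ 1_{E_e} \mid \bigvee_{f \in \partial e} \mathcal{B}'_f \big] \big) \prod_{f \in \partial e} 1_{E_f} \Big] \Big| \le \frac{1}{F_{j-1}(M_{j-1})}$ $\forall e \in \binom{[\ell]}{j}$, $E_e \in \mathcal{B}_e$

Now it suffices to choose the functions $F_1, F_2, \cdots, F_{d-1}$ in such a way that $F_i(M_i) \ge F(M_1)$ for all $i \in [d-1]$ (and that $M_1$ remains bounded by $O_{F,\ell}(1)$). We then choose $F_1 = F$ and, for each j from 2 to d − 1, choose $F_j$ in a way that $F_j(M_j) \ge F_{j-1}(M_{j-1})$; this is possible because $F_{j-1}$ was already chosen and $M_{j-1}$ depends only on $\ell$, $M_j$, F and $F_{j-1}$, so it suffices to choose $F_j$ sufficiently large depending on $\ell$, F and $F_{j-1}$ (and so ultimately only on $\ell$ and F). This way we have that $F_i(M_i) \ge F(M_1)$ for all $i \in [d-1]$, and

\[
M_1 = O_{\ell,F,F_1,F_2,\cdots,F_{d-1}}(1) = O_{\ell,F}(1).
\]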

This theorem may be used to give a Hypergraph Counting Lemma and a Hypergraph Removal Lemma which generalize those proven for graphs in Chapter 3 to arbitrary uniform hypergraphs. We will not do so here, but instead refer the interested reader to [32] or [28].


Chapter 6

Dealing with sparsity: transference principles

This chapter is devoted to proving results which allow us to transfer some combinatorial theorems from the usual “positive density” setting to the very sparse setting, when we are dealing with objects which have asymptotically negligible density as the size of our universe increases. The moral of these results is that, if a sparse object S satisfies some mild regularity conditions, then it may be modeled by a dense object D which is in some sense indistinguishable from S.

For the rest of this chapter, fix a finite set X to be our universe and a probability distribution $\mathbb{P}$ on X. While $\mathbb{P}$ may be arbitrary, it is instructive to think of it as being the uniform probability distribution over the elements of X, and this is what we will assume in our informal discussions.

We will use the ideas and definitions discussed in Section 1.1, which we briefly recall below. Let $\mathcal{C} \subseteq 2^X$ be a collection of subsets of X, which we will think of as being the basic structured subsets of X and which are supposed to be of low complexity. We say that two functions $g, h : X \to \mathbb{R}$ are ε-indistinguishable according to $\mathcal{C}$ if for all $A \in \mathcal{C}$ we have that $|\mathbb{E}[(g - h) 1_A]| \le \varepsilon$. A set $A' \subseteq X$ is said to have complexity at most K with respect to $\mathcal{C}$ if it may be expressed as a boolean combination of at most K sets of $\mathcal{C}$; we denote this by $\mathrm{complex}_{\mathcal{C}}(A') \le K$.
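The definition of ε-indistinguishability can be checked directly on a finite universe. The sketch below is a minimal illustration, assuming that functions and the probability distribution are given as dicts keyed by the elements of X and that the collection of distinguishers is a list of sets; the names `max_distinguishing_gap` and `is_indistinguishable` are hypothetical.

```python
def max_distinguishing_gap(g, h, family, prob):
    """Return the maximum over A in family of |E[(g - h) 1_A]| under prob."""
    return max(
        abs(sum((g[x] - h[x]) * prob[x] for x in A))
        for A in family
    )


def is_indistinguishable(g, h, family, prob, eps):
    """Check whether g and h are eps-indistinguishable according to family."""
    return max_distinguishing_gap(g, h, family, prob) <= eps


# Toy example on X = {0, 1, 2, 3} with the uniform distribution.
prob = {x: 0.25 for x in range(4)}
g = {0: 2.0, 1: 0.0, 2: 2.0, 3: 0.0}  # an unbounded "measure" with mean 1
h = {x: 1.0 for x in range(4)}        # the constant function 1
coarse_family = [{0, 1}, {2, 3}]      # cannot tell g and h apart
fine_family = [{0, 2}]                # isolates the support of g
```

In this toy example g and h agree on average over every set of `coarse_family`, so they are 0-indistinguishable according to it, while the set {0, 2} separates them with gap 1/2. Whether two functions are indistinguishable thus depends entirely on the choice of the collection of distinguishers.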

6.1 Subsets of pseudorandom sets

The aim of this section is to show that every subset D of a (possibly very sparse) pseudorandom set $R \subset X$ may be modeled by a set $M \subset X$ whose density inside X is the same as the density of D in R. We will actually prove a slightly more general result, regarding arbitrary positive functions $g : X \to \mathbb{R}^+$ (which we will henceforth call measures) instead of sets. This result concerns when it is possible to approximate a measure g by a bounded function $h : X \to [0, 1]$ in such a way that g is indistinguishable from h according to the collection $\mathcal{C}$.

To see the relation of this problem to that described in the first paragraph, let us represent the pseudorandom set R by its normalized indicator function $g_R := 1_R \cdot |X|/|R|$, and represent any subset $D \subset R$ by $g_D := 1_D \cdot |X|/|R|$. This normalization is made to account for the possibly very small density of R in X, and for us to have the expectation of $g_D$ equal to the relative density of D in R. If we can approximate the measure $g_D$ by a bounded function $h : X \to [0, 1]$, then by sampling independently each element $x \in X$ with probability h(x), we can construct a set M which (with high probability) will have the same density in X as that of D in R, and also have the same proportion of elements intersecting each of the sets in $\mathcal{C}$.

The main difference between this problem and that considered before in Chapter 2 is that the measure g we wish to model is no longer bounded, which introduces some additional complications. What our next theorem will show is that, if instead g is bounded by a sufficiently pseudorandom measure $\nu : X \to \mathbb{R}^+$ (which may itself be unbounded), then there exists a function $h : X \to [0, 1]$ which is indistinguishable from g according to $\mathcal{C}$. The notion of pseudorandomness we will need here is that of being indistinguishable from the constant function 1 by a somewhat larger collection than $\mathcal{C}$, comprising all sets of a given complexity K with respect to $\mathcal{C}$:

Definition 6.1. A measure $\nu : X \to \mathbb{R}^+$ is (η, K)-pseudorandom according to $\mathcal{C}$ if

\[
|\mathbb{E}[(\nu - 1) 1_A]| \le \eta \qquad \forall A \subseteq X : \mathrm{complex}_{\mathcal{C}}(A) \le K
\]

With this notion of pseudorandomness, we obtain the following theorem:

Theorem 6.1 (Pseudorandom Transference Principle [24]). Let $\varepsilon \in (0, 1]$ be an error parameter and suppose $\nu : X \to \mathbb{R}^+$ is $(\varepsilon, \lceil 1/\varepsilon^2 \rceil)$-pseudorandom according to $\mathcal{C}$. Then any measure $g : X \to \mathbb{R}^+$ satisfying $0 \le g \le \nu$ admits a “dense model” $h : X \to [0, 1]$ which is 3ε-indistinguishable from g:

\[
|\mathbb{E}[(g - h) 1_A]| \le 3\varepsilon, \qquad \forall A \in \mathcal{C}
\]

Proof. Define the probability distribution $\mathbb{P}_\nu$ on X by $\mathbb{P}_\nu(A) := \mathbb{E}[\nu 1_A]/\mathbb{E}[\nu]$ for all $A \subseteq X$, and let $\mathbb{E}_\nu$ be the associated expected value function. Define the bounded function

\[
f(x) = \begin{cases} g(x)/\nu(x) & \text{if } \nu(x) > 0 \\ 0 & \text{if } \nu(x) = 0 \end{cases}
\]

and apply the Weak Structure Theorem (Lemma 2.3) to f, with probability distribution $\mathbb{P}_\nu$ and structured set $\mathcal{S} = \{\sigma(A) : A \in \mathcal{C}\}$. We then obtain a factor $\mathcal{B} \subseteq 2^X$ of complexity less than $1/\varepsilon^2$ with respect to $\mathcal{C}$ and such that the function $h := \mathbb{E}_\nu[f \mid \mathcal{B}]$ satisfies

\[
|\mathbb{E}_\nu[(f - h) 1_A]| \le \|\mathbb{E}_\nu[f - h \mid \sigma(A)]\|_{L^2(\nu)} \le \varepsilon, \qquad \forall A \in \mathcal{C}
\]

Since $0 \le g \le \nu$, it is clear that h satisfies $0 \le h \le 1$; let us now prove that $|\mathbb{E}[(g - h) 1_A]| \le 3\varepsilon$ for all $A \in \mathcal{C}$. Fixing any $A \in \mathcal{C}$, by the triangle inequality we have

\[
|\mathbb{E}[(g - h) 1_A]| = |\mathbb{E}[(\nu f - h) 1_A]| \le |\mathbb{E}[\nu (f - h) 1_A]| + |\mathbb{E}[(\nu - 1) h 1_A]|
\]

The first term on the right-hand side is easily seen to be at most 2ε, since

\[
|\mathbb{E}[\nu (f - h) 1_A]| = |\mathbb{E}_\nu[(f - h) 1_A]| \cdot \mathbb{E}[\nu] \le \varepsilon (1 + \varepsilon) \le 2\varepsilon
\]

For the second term, note that $\mathcal{B} \vee \sigma(A)$ has complexity at most $\lceil 1/\varepsilon^2 \rceil$ and $h 1_A$ is measurable with respect to $\mathcal{B} \vee \sigma(A)$. Because $0 \le h \le 1$ and the function $y \mapsto \mathbb{E}[(\nu - 1) y 1_{A'}]$ is linear for any fixed set $A' \subseteq X$, there exists a set $A' \in \mathcal{B} \vee \sigma(A)$ such that

\[
|\mathbb{E}[(\nu - 1) h 1_A]| \le |\mathbb{E}[(\nu - 1) 1_{A'}]|
\]

By the pseudorandomness hypothesis of ν we obtain that $|\mathbb{E}[(\nu - 1) 1_{A'}]| \le \varepsilon$, thus completing the proof.

This theorem is closely related to the transference principle used by Green, Tao and Ziegler [16, 36] to transfer Additive Combinatorics results from dense subsets of the integers to dense subsets of sparse pseudorandom sets of integers. We will give their result, called here the Dense Model Theorem, in Section 6.3.

We remark that Theorem 6.1 applied to the case $X = V \times V$ and $\mathcal{C} = \{A \times B : A, B \subseteq V\}$ allows us to transfer the Szemerédi Regularity Lemma (Theorem 2.3) to the “subgraph of an η-uniform graph” setting and with this reprove Corollary 4.1; see the details in [24].
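The dense model in the proof of Theorem 6.1 is the conditional expectation $h = \mathbb{E}_\nu[f \mid \mathcal{B}]$ with $f = g/\nu$, which on each atom of $\mathcal{B}$ equals $\mathbb{E}[g 1_{\text{atom}}]/\mathbb{E}[\nu 1_{\text{atom}}]$. The sketch below computes only this averaging step. It assumes the atoms of the factor are supplied by the caller (finding a suitable factor is the job of the Weak Structure Theorem and is not implemented here), and the function name `dense_model` and the dict-based representation are our own choices for illustration.

```python
def dense_model(g, nu, atoms, prob):
    """Compute h = E_nu[g / nu | B] for the factor B whose atoms are given.

    On each atom, h takes the constant value E[g 1_atom] / E[nu 1_atom]
    (and 0 if nu vanishes on the atom). When 0 <= g <= nu this value
    lies in [0, 1].
    """
    h = {}
    for atom in atoms:
        mass_nu = sum(nu[x] * prob[x] for x in atom)
        mass_g = sum(g[x] * prob[x] for x in atom)
        value = mass_g / mass_nu if mass_nu > 0 else 0.0
        for x in atom:
            h[x] = value
    return h


# Toy example: X = {0, 1, 2, 3}, sparse set R = {0, 1}, subset D = {0}.
X = range(4)
prob = {x: 0.25 for x in X}
nu = {0: 2.0, 1: 2.0, 2: 0.0, 3: 0.0}   # normalized indicator of R
g_D = {0: 2.0, 1: 0.0, 2: 0.0, 3: 0.0}  # normalized indicator of D
h_trivial = dense_model(g_D, nu, [{0, 1, 2, 3}], prob)
h_split = dense_model(g_D, nu, [{0, 2}, {1, 3}], prob)
```

With the trivial factor, h is constantly 1/2, which is the relative density of D in R. The toy measure ν here is of course not pseudorandom in the sense of Definition 6.1; the example only shows why $0 \le g \le \nu$ forces $0 \le h \le 1$.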

6.2 Upper-regular functions

In some situations, the sparse set or unbounded measure we are interested in analyzing is not majorized by some fixed pseudorandom measure. In such cases, it is also possible to obtain a similar transference result as that given in the last section if the object in question satisfies some mild uniformity condition which we call upper-regularity:

Definition 6.2 (Upper regularity). Given constants $\eta > 0$ and $D, K \ge 1$, we say that a function $f : X \to \mathbb{R}^+$ is (η, D, K)-upper regular with respect to $\mathcal{C}$ if

\[
\mathbb{E}[f 1_A] \le D\, \mathbb{P}(A)\, \|f\|_{L^1} \qquad \forall A \subseteq X : \mathrm{complex}_{\mathcal{C}}(A) \le K,\ \mathbb{P}(A) \ge \eta
\]

Intuitively this definition says that, while the function f may sometimes take values much higher than its average, inside reasonably large sets of bounded complexity these values are to some extent averaged out.

As a more “analytical” example of upper regular functions, note that whenever we have $\|f\|_{L^p} \le C \|f\|_{L^1}$ for some value $p > 1$ and some constant $C \ge 1$, by Hölder’s inequality we obtain that

\[
\mathbb{E}[f 1_A] \le \|f\|_{L^p} \|1_A\|_{L^q} \le C \|f\|_{L^1} \mathbb{P}(A)^{1 - 1/p}
\]

The value on the right-hand side is at most $C \eta^{-1/p}\, \mathbb{P}(A)\, \|f\|_{L^1}$ if $\mathbb{P}(A) \ge \eta$, showing that in this case f is $(\eta, C\eta^{-1/p}, K)$-upper regular for any $\eta > 0$, $K \ge 1$ and any class of distinguishers $\mathcal{C} \subseteq 2^X$.

The definition of upper regularity may be seen as a weak “$L^1$-Cauchy-Schwarz” inequality, and it will allow us to apply (with some care) the energy-increment method for a function f of bounded $L^1$ norm even when we have no bounds for the $L^2$ norm of f. The following lemma uses this idea in order to obtain an analogue of the Weak Structure Theorem (Lemma 2.3) for upper-regular functions:

Lemma 6.1. Given $\varepsilon > 0$ and $D \ge 1$, define $K := 9D^4/\varepsilon^2$ and $\eta := (\varepsilon/3D)^{9D^4/\varepsilon^2}$. Then for every (η, D, K)-upper regular function $f : X \to \mathbb{R}^+$ there exists a factor $\mathcal{B} \subseteq 2^X$ such that:

• $\mathrm{complex}_{\mathcal{C}}(A') \le K$, $\forall A' \in \mathcal{B}$

• $\mathbb{P}(A') \ge \eta$, $\forall A' \in \mathcal{B} \setminus \{\emptyset\}$

• $|\mathbb{E}[(f - \mathbb{E}[f \mid \mathcal{B}]) 1_A]| \le \varepsilon \|f\|_{L^1}$, $\forall A \in \mathcal{C}$

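Proof. We will recursively choose sets $A_1, A_2, \cdots, A_m \subset X$, for some $1 \le m \le K$, in the following way. First, define $\mathcal{Z}_0 := \{\emptyset, X\}$ and set i = 1. Given $\mathcal{Z}_{i-1}$, we will construct a collection $\mathcal{C}_i$ which approximates $\mathcal{C}$ in the following sense:

1. For all $A \in \mathcal{C}$ there exists $A' \in \mathcal{C}_i$ such that $\mathbb{P}(A \triangle A') \le \alpha$ and $A' \in \mathcal{Z}_{i-1} \vee \sigma(A)$

2. $\mathbb{P}(A' \cap A_j)/\mathbb{P}(A_j) \in [\alpha, 1 - \alpha] \cup \{0, 1\}$ for all $A' \in \mathcal{C}_i$ and every atom $A_j$ of $\mathcal{Z}_{i-1}$

To do this we decompose $\mathcal{Z}_{i-1}$ into atoms $A_1 \cup A_2 \cup \dots \cup A_{M_{i-1}}$ and construct, for each set $A \in \mathcal{C}$, the “approximating set” $A' := \bigcup_{j=1}^{M_{i-1}} A'_j$ where

\[
A'_j := \begin{cases} \emptyset, & \text{if } \mathbb{P}(A \cap A_j) < \alpha\, \mathbb{P}(A_j) \\ A \cap A_j, & \text{if } \mathbb{P}(A \cap A_j)/\mathbb{P}(A_j) \in [\alpha, 1 - \alpha] \\ A_j, & \text{if } \mathbb{P}(A \cap A_j) > (1 - \alpha)\, \mathbb{P}(A_j) \end{cases}
\]

This way, for every $j \in [M_{i-1}]$ we have that

\[
\begin{aligned}
\mathbb{P}((A \cap A_j) \triangle A'_j) \le \alpha\, \mathbb{P}(A_j) \ &\Rightarrow\ \mathbb{P}(A \triangle A') \le \alpha \\
A'_j \in \mathcal{Z}_{i-1} \vee \sigma(A) \ &\Rightarrow\ A' \in \mathcal{Z}_{i-1} \vee \sigma(A),
\end{aligned}
\]

and condition 2 is satisfied by construction. The set $\mathcal{C}_i$ is then formed by all these sets $A'$.

If $\|\mathbb{E}[f - \mathbb{E}[f \mid \mathcal{Z}_{i-1}] \mid \sigma(A')]\|_{L^2} \le \alpha \|f\|_{L^1}$ for all $A' \in \mathcal{C}_i$, then set $\mathcal{B} = \mathcal{Z}_{i-1}$. Otherwise, choose (any) $A_i \in \mathcal{C}_i$ such that

\[
\|\mathbb{E}[f \mid \mathcal{Z}_{i-1} \vee \sigma(A_i)]\|_{L^2}^2 \ge \|\mathbb{E}[f \mid \mathcal{Z}_{i-1}]\|_{L^2}^2 + \alpha^2 \|f\|_{L^1}^2
\]

(the existence of such a set is guaranteed by Lemma 2.2). Define $\mathcal{Z}_i := \mathcal{Z}_{i-1} \vee \sigma(A_i)$ and increment i to i + 1.

We see that, for all $i \le K$, any non-empty set $B \in \mathcal{Z}_i$ will have complexity at most i and probability at least $\alpha^i \ge \alpha^K = \eta$, so

\[
\|\mathbb{E}[f \mid \mathcal{Z}_i]\|_{L^\infty} = \max_{A \text{ atom of } \mathcal{Z}_i} \frac{|\mathbb{E}[f 1_A]|}{\mathbb{P}(A)} \le D \|f\|_{L^1}
\]

This way, since $\|\mathbb{E}[f \mid \mathcal{Z}_i]\|_{L^2}^2$ is bounded by $D^2 \|f\|_{L^1}^2$ and increases by at least $\alpha^2 \|f\|_{L^1}^2$ at each step, the algorithm must terminate at a time $m \le D^2/\alpha^2$. At the end, by Cauchy-Schwarz and our stopping condition, we must have

\[
\forall A' \in \mathcal{C}_{m+1}, \qquad |\mathbb{E}[(f - \mathbb{E}[f \mid \mathcal{B}]) 1_{A'}]| \le \|\mathbb{E}[f - \mathbb{E}[f \mid \mathcal{B}] \mid \sigma(A')]\|_{L^2} \le \alpha \|f\|_{L^1}
\]

Given any $A \in \mathcal{C}$, we then have

\[
\begin{aligned}
|\mathbb{E}[(f - \mathbb{E}[f \mid \mathcal{B}]) 1_A]| &\le |\mathbb{E}[(f - \mathbb{E}[f \mid \mathcal{B}]) 1_{A'}]| + \left| \mathbb{E}\left[(f - \mathbb{E}[f \mid \mathcal{B}]) 1_{A \setminus A'}\right] \right| + \left| \mathbb{E}\left[(f - \mathbb{E}[f \mid \mathcal{B}]) 1_{A' \setminus A}\right] \right| \\
&\le \alpha \|f\|_{L^1} + D \|f\|_{L^1} (\mathbb{P}(A \triangle A') + \eta) \le 3D\alpha \|f\|_{L^1}
\end{aligned}
\]

It then suffices to take $\alpha = \varepsilon/3D$ and $K = 9D^4/\varepsilon^2$, $\eta = (\varepsilon/3D)^{9D^4/\varepsilon^2}$.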
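The rounding step in the proof above, which replaces a set A by an approximating set A' whose relative density inside each atom avoids the ranges (0, α) and (1 − α, 1), can be sketched as follows. This is only an illustration of that single step, not of the whole energy-increment iteration; the function name `approximate_by_atoms` and the representation of the distribution as a dict are assumptions made for the example.

```python
def approximate_by_atoms(A, atoms, prob, alpha):
    """Build the approximating set A' of A relative to the given atoms.

    On each atom A_j:
      * if A occupies less than an alpha fraction of A_j, drop it (A'_j is empty);
      * if A occupies more than a (1 - alpha) fraction, take the whole atom;
      * otherwise keep the intersection of A with A_j unchanged.
    """
    A = set(A)
    approx = set()
    for atom in atoms:
        atom = set(atom)
        p_atom = sum(prob[x] for x in atom)
        p_meet = sum(prob[x] for x in atom & A)
        if p_meet < alpha * p_atom:
            continue                # A'_j is empty
        elif p_meet > (1 - alpha) * p_atom:
            approx |= atom          # A'_j = A_j
        else:
            approx |= atom & A      # A'_j = A ∩ A_j
    return approx


# Toy example: X = {0, ..., 9} with the uniform distribution and two atoms.
prob = {x: 0.1 for x in range(10)}
atoms = [set(range(5)), set(range(5, 10))]
```

For example, with α = 1/4 the set {0, 5, 6, 7, 8} is rounded to {5, 6, 7, 8, 9}: it is too thin on the first atom and almost full on the second. The symmetric difference {0, 9} has probability 0.2, which is at most α, in line with condition 1 of the proof.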

By taking the “dense model” function $h = \mathbb{E}[f \mid \mathcal{B}]$ obtained in Lemma 6.1, we immediately obtain the corresponding transference principle for upper-regular functions:

Theorem 6.2 (Upper-regular Transference Principle). For every $\varepsilon > 0$ and $D \ge 1$, there exist $K = (D/\varepsilon)^{O(1)}$ and $\eta = 2^{-(D/\varepsilon)^{O(1)}}$ such that the following holds.

If $f : X \to \mathbb{R}^+$ is an (η, D, K)-upper regular function with respect to $\mathcal{C}$ satisfying $\|f\|_{L^1} \le 1$, then f admits a 1/D-dense model $h : X \to [0, D]$ which is ε-indistinguishable from f:

\[
|\mathbb{E}[(f - h) 1_A]| \le \varepsilon, \qquad \forall A \in \mathcal{C}
\]

This notion of upper-regularity given for functions may be naturally specialized to the case of graphs. Given constants $0 < \eta \le 1$ and $D \ge 1$, we say a graph $G = (V, E)$ is (η, D)-upper regular if $d_G(A, B) \le Dp$ for all subsets $A, B \subseteq V$ satisfying $|A||B| \ge \eta |V|^2$, where $p := 2|E|/|V|^2$ is the edge density of G. Intuitively, this condition means that there are no reasonably large sets $A, B \subseteq V$ which have a density much higher than the average density of the graph.

It is easy to see that every subgraph G with relative density 1/D inside an η-uniform graph Γ is (η, (1 + η)D)-upper regular. We will next use Theorem 6.2 to prove a partial converse to this observation:

Lemma 6.2. For every $\varepsilon > 0$ and $D \ge 1$, there exist $\eta = 2^{-(D/\varepsilon)^{O(1)}}$ and $n_0 \in \mathbb{N}$ such that the following holds. For every (η, D)-upper regular graph G on $n \ge n_0$ vertices and with density $1/n \ll p \ll 1$, there exists an ε-uniform graph $\Gamma \supseteq G$ on the same vertex set such that G is 1/D-dense on Γ.

Proof. Let us denote by V the vertex set of G. Apply the Upper-regular Transference Principle (Theorem 6.2) to the normalized edge indicator function $p^{-1} 1_G$ on the space $V \times V$, with uniform probability distribution and collection of distinguishers $\mathcal{C} = \{A \times B : A, B \subseteq V\}$. We then obtain a function $h : V \times V \to [0, D]$ which satisfies the inequality $\|p^{-1} 1_G - h\|_\square \le \varepsilon$, and by specializing the proof of Theorem 6.2 we may easily require h to be symmetric (that is, h(x, y) = h(y, x) for all $x, y \in V$).

Define the function $f_\Gamma := 1_G + p(1 - 1_G)(D - h)$, and note that $0 \le f_\Gamma \le 1$. Let Γ be a random graph on V with $\mathbb{P}(xy \in \Gamma) = f_\Gamma(x, y)$ for all pairs $\{x, y\}$ of vertices in V, all choices being independent. With probability 1 the graph Γ will contain G as a subgraph, and if $p \gg 1/n$ then standard concentration inequalities imply that $\|1_\Gamma - f_\Gamma\|_\square \le \varepsilon p$ with high probability (see the details in the proof of Lemma 7.2 in the next chapter). Then

\[
\begin{aligned}
\|p^{-1} 1_\Gamma - D\|_\square &\le p^{-1} \|1_\Gamma - f_\Gamma\|_\square + \|p^{-1} f_\Gamma - D\|_\square \\
&\le \varepsilon + \|p^{-1} 1_G - h\|_\square + \|1_G (D - h)\|_\square \\
&\le \varepsilon + \varepsilon + Dp,
\end{aligned}
\]

which is at most 3ε if $p \le \varepsilon/D$. With positive probability, the value of $\|1_\Gamma\|_{L^1}$ will be at most its expectation $\|f_\Gamma\|_{L^1} \le Dp$; thus there exists a graph $\Gamma \supseteq G$ of density at most Dp and which is $(3\varepsilon)^{1/2}$-uniform.

This lemma together with Corollary 4.1 allows us to immediately prove a sparse regularity lemma for upper-regular graphs, provided the density of the upper-regular graph satisfies $1/n \ll p \ll 1$. Iterating Lemma 6.1 instead of Lemma 2.3 in the proof of the Strong Structure Theorem (Theorem 2.2) and then repeating our proof of the Szemerédi Regularity Lemma with minimal changes, it is easy to obtain the full sparse regularity lemma we give below without these further conditions. We note that this version of sparse regularity was also proven first by Kohayakawa and Rödl [18, 19].

Corollary 6.1 (Sparse Regularity Lemma II [18, 19]). For every $\varepsilon > 0$, $k_0 \geq 1$ and $D \geq 1$ there exist constants $\eta > 0$ and $K_0 \geq k_0$ such that the following holds. Every $(\eta, D)$-upper regular graph $G = (V, E)$ admits an equitable partition $\mathcal{P} = (V_i)_{i \in [k]}$ into $k$ parts such that:

• $k_0 \leq k \leq K_0$

• $\big||V_i| - |V_j|\big| \leq 1$ for all $i, j \in [k]$

• all but at most $\varepsilon k^2$ pairs $(V_i, V_j)$ are $\varepsilon p$ regular, where $p := 2|E|/|V|^2$
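The $(\eta, D)$-upper regularity hypothesis of Corollary 6.1 can be checked by exhaustive search on very small graphs. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, $d_G(A, B)$ is assumed to mean $e_G(A, B)/(|A||B|)$ with $e_G(A, B)$ counting ordered pairs $(x, y) \in A \times B$ joined by an edge, and the running time is exponential in the number of vertices.

```python
from fractions import Fraction
from itertools import combinations


def nonempty_subsets(n):
    """Yield every non-empty subset of range(n) as a tuple."""
    for size in range(1, n + 1):
        yield from combinations(range(n), size)


def is_upper_regular(n, edges, eta, D):
    """Brute-force test of (eta, D)-upper regularity on the vertex set range(n).

    Assumption: d_G(A, B) = e_G(A, B) / (|A||B|), counting ordered pairs,
    and p = 2|E| / n^2 is the edge density.
    """
    adj = set()
    for x, y in edges:
        adj.add((x, y))
        adj.add((y, x))
    p = Fraction(2 * len(edges), n * n)
    for A in nonempty_subsets(n):
        for B in nonempty_subsets(n):
            if len(A) * len(B) < eta * n * n:
                continue  # the condition only constrains large pairs
            e_AB = sum(1 for x in A for y in B if (x, y) in adj)
            if Fraction(e_AB, len(A) * len(B)) > D * p:
                return False
    return True


# Toy example: the 4-cycle has density p = 1/2.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(is_upper_regular(4, cycle, Fraction(1, 4), 2))
print(is_upper_regular(4, cycle, Fraction(1, 4), 1))
```

In this toy example the pair $A = \{0, 2\}$, $B = \{1, 3\}$ has density 1, which is twice the average density, so the 4-cycle fails the condition for $D = 1$ but satisfies it for $D = 2$.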

6.3 Green-Tao-Ziegler Dense Model Theorem

One of the key ingredients in Green and Tao’s proof that the primes contain arbitrarily long arithmetic progressions [16] was a relative version of Szemerédi’s Theorem. Szemerédi’s Theorem [30] states that any subset of the integers with positive upper density contains arbitrarily long arithmetic progressions. The relative version of this theorem proven by Green and Tao gives the same conclusion when the ground set is no longer the integers, but instead an arbitrary pseudorandom subset (or more generally some pseudorandom measure).

The proof of this relative version is split in two parts, the Dense Model Theorem and the Counting Lemma. The Dense Model Theorem will be given below, and asserts that any relatively dense subset $A$ of a sufficiently pseudorandom subset of $\mathbb{N}$ may be modeled by a dense subset $\tilde{A}$ of $\mathbb{N}$ (this is the same idea as Theorem 6.1 given in Section 6.1, but uses a slightly different measure of pseudorandomness). The Counting Lemma then says that the number of arithmetic progressions in $A$ is close (when properly normalized) to the number of arithmetic progressions in $\tilde{A}$. Szemerédi’s Theorem applied to the set $\tilde{A}$ then permits us to conclude that $A$ has arbitrarily long arithmetic progressions.

The Dense Model Theorem was made more explicit in a subsequent paper of Tao and Ziegler [36], where they used similar methods to obtain the stronger result that the primes contain arbitrarily long polynomial progressions. The pseudorandomness condition they use in this theorem is equivalent to the one given below, taken from [25]:

Definition 6.3. If $\mathcal{F}$ is a collection of bounded functions $f : X \to [-1, 1]$, we denote by $\mathcal{F}^k$ the collection of all functions of the form $\prod_{i=1}^{k'} f_i$, where $f_i \in \mathcal{F}$ and $k' \leq k$. We say $\nu : X \to \mathbb{R}^+$ is $\eta$-pseudorandom according to $\mathcal{F}^k$ if

\[ |\mathbb{E}[(\nu - 1)f']| \leq \eta \quad \forall f' \in \mathcal{F}^k \]

In this context, the transference principle they proved says roughly the following: if $\nu$ is $\eta(\varepsilon)$-pseudorandom according to $\mathcal{F}^{K(\varepsilon)}$, then any measure $g$ satisfying $0 \leq g \leq \nu$ admits a dense model $h : X \to [0, 1]$ which is $\varepsilon$-indistinguishable from $g$ according to $\mathcal{F}$ (where $\eta(\varepsilon), K(\varepsilon)$ depend only on $\varepsilon$). The full statement of their theorem will be given below (Theorem 6.3), but before stating it formally we will discuss how to obtain such a result using our setting and previous theorems.

By using the transformation $f \mapsto (1 + f)/2$, we may assume without loss of generality that all functions $f \in \mathcal{F}$ have image in $[0, 1]$; this assumption will slightly simplify the exposition.

If all functions in the family $\mathcal{F}$ were boolean functions, then Theorem 6.1 would easily permit us to conclude. Indeed, in this case we could associate each boolean function to its support and take for $\mathcal{C}$ the family of the supports of all $f \in \mathcal{F}$; the product of $k$ functions in $\mathcal{F}$ would then simply be the indicator of the intersection of their supports. Since for $k$ sets $A_1, \cdots, A_k \in \mathcal{C}$ the atoms of $\sigma(A_1, \cdots, A_k)$ are exactly the (non-empty) intersections $\bigcap_{i \leq k} B_i$, where each $B_i$ is either $A_i$ or $A_i^C$, and since there are at most $2^{1/\varepsilon^2}$ atoms in a factor generated by $k = 1/\varepsilon^2$ sets, it follows that if $\nu$ is $\varepsilon 2^{-1/\varepsilon^2}$-pseudorandom according to $\mathcal{F}^{1/\varepsilon^2}$, then $\nu$ is also $(\varepsilon, 1/\varepsilon^2)$-pseudorandom according to $\mathcal{C}$. Theorem 6.1 immediately allows us to conclude (substituting $\varepsilon$ by $\varepsilon/3$).

If the functions in $\mathcal{F}$ are not boolean, we construct a family $\mathcal{C}$ of sets having the same distinguishing power as $\mathcal{F}$ and try to apply Theorem 6.1 to the family $\mathcal{C}$. To construct $\mathcal{C}$, let us first bound the complexity of the functions in $\mathcal{F}$ by approximating each $f \in \mathcal{F}$ by a step-function $\tilde{f}$ with steps of size $\varepsilon/2$; more precisely, given $f \in \mathcal{F}$ we define the function

\[ \tilde{f}(x) := \left\lfloor \frac{f(x)}{\varepsilon/2} \right\rfloor \frac{\varepsilon}{2}, \]

and note that $0 \leq f(x) - \tilde{f}(x) < \varepsilon/2$ for all $x \in X$. By the bounds we have on $g$ and the searched dense model $h$, we obtain

\[ 0 \leq \mathbb{E}[g(f - \tilde{f})] < \frac{\varepsilon}{2}, \qquad 0 \leq \mathbb{E}[h(f - \tilde{f})] < \frac{\varepsilon}{2}, \]

from which we get $|\mathbb{E}[(g - h)f] - \mathbb{E}[(g - h)\tilde{f}]| < \frac{\varepsilon}{2}$. We now write the approximating function $\tilde{f}$ as the sum of a bounded number of boolean functions:

\[ \tilde{f}(x) = \frac{\varepsilon}{2} \sum_{j=1}^{\lfloor 2/\varepsilon \rfloor} 1_{f(x) \geq j\varepsilon/2} \]

Then, if $|\mathbb{E}[(g - h)f]| > \varepsilon$, we have that

\[ \frac{\varepsilon}{2} < |\mathbb{E}[(g - h)\tilde{f}]| \leq \frac{\varepsilon}{2} \sum_{j=1}^{\lfloor 2/\varepsilon \rfloor} |\mathbb{E}[(g - h)1_{f(x) \geq j\varepsilon/2}]|, \]

so there must be some $1 \leq j \leq \lfloor 2/\varepsilon \rfloor$ such that $|\mathbb{E}[(g - h)1_{f(x) \geq j\varepsilon/2}]| > \varepsilon/2$. This way, whenever the family of functions $\mathcal{F}$ $\varepsilon$-distinguishes $g$ from $h$, the family of sets

\[ \mathcal{C}_{\mathcal{F},\varepsilon} := \{\{x \in X : f(x) \geq j\varepsilon/2\} : f \in \mathcal{F},\ 1 \leq j \leq \lfloor 2/\varepsilon \rfloor\} \]

will $\varepsilon/2$-distinguish $g$ from $h$. By Theorem 6.1 we just need to make sure $|\mathbb{E}[(\nu - 1)1_{A'}]| \leq \varepsilon/6$ for all sets $A' \subseteq X$ of complexity at most $36/\varepsilon^2$ relative to $\mathcal{C}_{\mathcal{F},\varepsilon}$ to conclude this cannot happen.

Suppose then there exists a set $A' \subseteq X$ of complexity $k \leq 36/\varepsilon^2$ relative to $\mathcal{C}_{\mathcal{F},\varepsilon}$ for which $|\mathbb{E}[(\nu - 1)1_{A'}]| > \varepsilon/6$. By definition, there exist

\[ A_1 = \{x \in X : f_1(x) \geq j_1\varepsilon/2\}, \cdots, A_k = \{x \in X : f_k(x) \geq j_k\varepsilon/2\} \in \mathcal{C}_{\mathcal{F},\varepsilon} \]

and a set $\Omega \subseteq \{-1, +1\}^k$ such that we may write $A'$ as a disjoint union of atoms

\[ A' = \bigcup_{\omega \in \Omega} \bigcap_{i=1}^{k} A_i^{\omega_i}, \]

where we define $A_i^{+1} := A_i$ and $A_i^{-1} := X \setminus A_i$. This implies there exists $\omega \in \Omega$ for which

\[ \left| \mathbb{E}\left[ (\nu - 1) \prod_{i=1}^{k} 1_{A_i^{\omega_i}} \right] \right| > \frac{\varepsilon}{6 \cdot 2^k} = 2^{-1/\varepsilon^{O(1)}} \]

By a “quantitative” version of the Weierstrass polynomial approximation theorem, we may approximate any threshold function $1_{x \geq t}$ inside $[0, 1]$ by a polynomial $p : [0, 1] \to [0, 1]$ whose degree and height depend only on the desired accuracy. To be more precise, given an accuracy parameter $\alpha > 0$, the notion of approximation we require is for the polynomial $p$ to have distance at most $\alpha$ to $1_{x \geq t}$ in $L^\infty$ norm inside the smaller set $[0, 1] \setminus [t - \alpha, t]$, where we take away a small interval around $t$ to account for the discontinuity of the function $1_{x \geq t}$ at that point.

Approximating each indicator function $1_{A_i^{\omega_i}}(x) := 1_{\{f_i(x) \geq j_i\varepsilon/2\}^{\omega_i}}$ in this way by a polynomial $p_i(f_i(x))$ with sufficient accuracy, we obtain

\[ \left| \mathbb{E}\left[ (\nu - 1) \prod_{i=1}^{k} p_i(f_i) \right] \right| > 2^{-1/\varepsilon^{O(1)}} \]

Expanding the product into a sum of monomials in $f_1, \cdots, f_k$ and using the fact that the degree and height of each polynomial is bounded as a function of $\varepsilon$, we see that one of the terms $|\mathbb{E}[(\nu - 1)f_1^{n_1} f_2^{n_2} \cdots f_k^{n_k}]|$ must be greater than some $c_\varepsilon > 0$ depending only on $\varepsilon$.

If we then take $K = K(\varepsilon)$ as $36/\varepsilon^2$ times the largest possible degree of such a polynomial $p_i$, we will have obtained a function $f' \in \mathcal{F}^K$ which satisfies $|\mathbb{E}[(\nu - 1)f']| > c_\varepsilon$. It then suffices to require the pseudorandomness hypothesis

\[ |\mathbb{E}[(\nu - 1)f']| \leq c_\varepsilon \quad \forall f' \in \mathcal{F}^K \]

to guarantee that every function $0 \leq g \leq \nu$ has a dense model $h : X \to [0, 1]$ which is $\varepsilon$-indistinguishable from $g$ by the family of functions $\mathcal{F}$. With some more care (see [24] or [25]), we may obtain better bounds, as in the theorem below:

Theorem 6.3 (Green-Tao-Ziegler Dense Model Theorem [16, 36]). For every $\varepsilon > 0$, there exist $k = 1/\varepsilon^{O(1)}$ and $\eta = 2^{-1/\varepsilon^{O(1)}}$ such that the following holds. Suppose that $\mathcal{F}$ is a finite collection of bounded functions $f : X \to [-1, 1]$, $\nu : X \to \mathbb{R}^+$ is $\eta$-pseudorandom according to $\mathcal{F}^k$, and $g : X \to \mathbb{R}^+$ is a function such that $g \leq \nu$. Then there exists a bounded function $h : X \to [0, 1]$ such that

\[ |\mathbb{E}[(g - h)f]| \leq \varepsilon \quad \forall f \in \mathcal{F} \]

We remark that in the papers [16, 36] cited above the result appeared without an explicit bound on the constants involved (as they were only interested in asymptotic results), and with very different notation. The theorem stated here appeared in [24, 25] and, apart from the differences noted above, is equivalent to Theorem 7.1 in [36].
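The step-function approximation used in the discussion leading to Theorem 6.3 can be verified numerically. The following sketch (function names are hypothetical, and exact rational arithmetic is used to avoid rounding issues) rounds a value in $[0, 1]$ down to a multiple of $\varepsilon/2$ and checks that the result agrees with the sum of threshold indicators.

```python
from fractions import Fraction
from math import floor


def step_approximation(value, eps):
    """Round `value` in [0, 1] down to the nearest multiple of eps/2."""
    half = eps / 2
    return floor(value / half) * half


def threshold_sum(value, eps):
    """Compute (eps/2) * sum over j of the indicator [value >= j * eps/2]."""
    half = eps / 2
    levels = floor(2 / eps)
    return half * sum(1 for j in range(1, levels + 1) if value >= j * half)


eps = Fraction(1, 5)
for k in range(101):
    v = Fraction(k, 100)
    s = step_approximation(v, eps)
    # The two expressions coincide, and the rounding error is in [0, eps/2).
    assert s == threshold_sum(v, eps)
    assert 0 <= v - s < eps / 2
```

Each indicator in the sum corresponds to one set $\{x : f(x) \geq j\varepsilon/2\}$ of the family built from the thresholds, which is why a bounded number of sets suffices to capture the distinguishing power of a bounded function.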


Chapter 7

Transference results for L1 structure

The transference principles shown in the last chapter provide a means of transferring some combinatorial results from the dense setting (where they are easier to prove) to the sparse setting, provided the sparse objects satisfy some mild regularity conditions.

Consider for instance the Pseudorandom Transference Principle (Theorem 6.1) specialized to the case of graphs. The sparse setting (which we will call “sparse space”) may be identified with the space of all subgraphs of a given sparse graph $\Gamma$, while the “dense space” is identified with the set of all graphs on the same vertex set $V$ as $\Gamma$. This result is then a kind of correspondence between these two spaces, roughly saying that every subgraph $G$ of $\Gamma$ admits a model graph $f(G)$ on the same vertex set which is dense and indistinguishable from $G$ by cuts (when $G$ is properly normalized).

The aim of this chapter is to show that this dense model function $f$ may be made “continuous” in $L^1$ norm, so that sparse graphs whose edge sets are close to each other will have dense models whose edge sets are close to each other, and also to obtain the same result for a “sparse model” function $g$ from the dense space to the sparse space. This will allow us to pass from one space of graphs to the other while preserving the underlying $L^1$ geometry, which may be important in some applications.

7.1 Relationships between cut norm and L1 norm

The notion of approximation given by the transference principles (when applied to the graph setting) is most naturally expressed in terms of the cut norm, and it will be crucial for our objectives to understand the relationship between the cut norm and the $L^1$ norm for functions defined on $V \times V$. We recall from Chapter 3 that, for any $f : V \times V \to \mathbb{R}$, the cut norm of $f$ is defined as

\[ \|f\|_\square := \max_{A, B \subseteq V} |\mathbb{E}_{x,y \in V}[f(x, y)1_{A \times B}(x, y)]|, \]

and that by linearity this definition is equivalent to

\[ \|f\|_\square = \max_{u, v : V \to [0,1]} |\mathbb{E}_{x,y \in V}[f(x, y)u(x)v(y)]| \]

The $L^1$ norm of $f$ is here (and for the rest of this chapter) defined as

\[ \|f\|_{L^1} := \mathbb{E}_{x,y \in V}[|f(x, y)|] \]
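For very small vertex sets, both norms can be computed directly from these definitions. The sketch below is a brute-force illustration (hypothetical function names, exponential running time): a function on $V \times V$ is represented as an $n \times n$ list of lists, and the cut norm is found by enumerating all pairs of subsets $A, B \subseteq V$.

```python
from fractions import Fraction
from itertools import product


def l1_norm(f):
    """Average of |f(x, y)| over all ordered pairs (x, y)."""
    n = len(f)
    return Fraction(sum(abs(v) for row in f for v in row)) / (n * n)


def cut_norm(f):
    """Maximum of |E[f 1_{A x B}]| over all subsets A, B (brute force)."""
    n = len(f)
    best = Fraction(0)
    for in_a in product((False, True), repeat=n):
        for in_b in product((False, True), repeat=n):
            total = sum(f[x][y]
                        for x in range(n) if in_a[x]
                        for y in range(n) if in_b[y])
            best = max(best, abs(Fraction(total)) / (n * n))
    return best


nonnegative = [[1, 0], [0, 1]]
signed = [[1, -1], [-1, 1]]
print(l1_norm(nonnegative), cut_norm(nonnegative))
print(l1_norm(signed), cut_norm(signed))
```

For the non-negative example the two norms coincide, while for the signed example cancellations inside every rectangle $A \times B$ make the cut norm strictly smaller than the $L^1$ norm.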

It is easy to see that $\|f\|_\square \leq \|f\|_{L^1}$, with equality holding if $f$ is either non-negative or non-positive. However, there is no inequality in the other direction which holds generally for all functions, as can be seen by taking a random uniform assignment of $1$ or $-1$ for each pair $(x, y) \in V^2$. This random function will clearly have unitary $L^1$ norm, but with high probability its cut norm will be $O(|V|^{-1/2})$.

The main reason why such an assignment $f$ has unbounded ratio $\|f\|_{L^1}/\|f\|_\square$ is that there is no correlation between the values taken by $f$ and the cuts $A \times B$ used to give the cut norm. If we require $f$ to be correlated to such structures, then it is possible to obtain some bounds on the ratio $\|f\|_{L^1}/\|f\|_\square$, which will be important for our results. We then define the following notion:

Definition 7.1 (Step function). We say a function $f : V \times V \to \mathbb{R}$ is a step function on $k$ steps if there exists a partition $\mathcal{P} = (V_i)_{i \in [k]}$ of $V$ into $k$ parts such that $f$ is constant inside each set $V_i \times V_j$, $i, j \in [k]$.

If we suppose that the function $f$ is a step function on $k$ steps, then a simple relationship between the cut norm and the $L^1$ norm is that $\|f\|_{L^1} \leq 2k\|f\|_\square$. Indeed, suppose the steps are $V_1, V_2, \cdots, V_k$ and, for each $i \in [k]$, define the sets

\[ P_i := \bigcup_{\substack{j \in [k] \\ f|_{V_i \times V_j} > 0}} V_j \quad \text{and} \quad N_i := \bigcup_{\substack{j \in [k] \\ f|_{V_i \times V_j} < 0}} V_j \]

Then

\[ \|f\|_{L^1} = \sum_{i=1}^{k} \|f1_{V_i \times V}\|_{L^1} = \sum_{i=1}^{k} \left( |\mathbb{E}[f1_{V_i \times P_i}]| + |\mathbb{E}[f1_{V_i \times N_i}]| \right) \leq 2k\|f\|_\square \]

This way, functions with a relatively small number of steps play an important role when we wish to have a relationship between the cut norm and the $L^1$ norm. This suggests considering the following important operator, related to the “rounded graph” as defined in Chapter 3:

Definition 7.2 (Stepping operator). Let $\mathcal{P}$ be a partition of the set $V$, whose atoms are $V_1, V_2, \cdots, V_k$. Given any function $f : V^2 \to \mathbb{R}$, we define $(f)_{\mathcal{P}} := \mathbb{E}[f \mid \mathcal{P} \otimes \mathcal{P}]$ as the function

\[ (f)_{\mathcal{P}}(x, y) = \frac{1}{|V_i||V_j|} \sum_{u \in V_i, v \in V_j} f(u, v), \]

where $V_i$ is the part which contains $x$ and $V_j$ is the part which contains $y$.

It is easy to prove that the stepping operator is a contraction in cut norm. Moreover, since it is a conditional expectation, we know from Probability Theory that it is also a contraction in $L^p$ norm for all $p \geq 1$. Using these facts, we may prove the following crucial “rigidity” result:

Lemma 7.1 (Rigidity of Strong Regularity). Let $f_1, f_2 : V^2 \to \mathbb{R}$ be two functions and $\mathcal{P}_1, \mathcal{P}_2$ be two partitions of $V$ into at most $k$ parts. If $\|f_1 - (f_1)_{\mathcal{P}_1}\|_\square, \|f_2 - (f_2)_{\mathcal{P}_2}\|_\square \leq \varepsilon/4k^2$, then

\[ \|(f_1)_{\mathcal{P}_1} - (f_2)_{\mathcal{P}_2}\|_{L^1} \leq \|f_1 - f_2\|_{L^1} + \varepsilon \]

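Proof. Let $\mathcal{Q} := \mathcal{P}_1 \vee \mathcal{P}_2$ be the common refinement of $\mathcal{P}_1$ and $\mathcal{P}_2$. Because the stepping operator is a contraction in $L^1$ norm, we conclude that

\[ \|(f_1)_{\mathcal{Q}} - (f_2)_{\mathcal{Q}}\|_{L^1} \leq \|f_1 - f_2\|_{L^1} \]

Also, because the stepping operator is a contraction in cut norm, we have

\[ \|(f_1)_{\mathcal{P}_1} - (f_1)_{\mathcal{Q}}\|_\square = \|((f_1)_{\mathcal{P}_1} - f_1)_{\mathcal{Q}}\|_\square \leq \|(f_1)_{\mathcal{P}_1} - f_1\|_\square \leq \frac{\varepsilon}{4k^2} \]

Notice the partition $\mathcal{Q}$ has at most $k^2$ steps (since each of $\mathcal{P}_1$ and $\mathcal{P}_2$ has at most $k$ steps), and so

\[ \|(f_1)_{\mathcal{P}_1} - (f_1)_{\mathcal{Q}}\|_{L^1} \leq 2k^2 \|(f_1)_{\mathcal{P}_1} - (f_1)_{\mathcal{Q}}\|_\square \leq \frac{\varepsilon}{2} \]

The same reasoning as above similarly implies that $\|(f_2)_{\mathcal{P}_2} - (f_2)_{\mathcal{Q}}\|_{L^1} \leq \varepsilon/2$, and thus by the triangle inequality

\[ \begin{aligned} \|(f_1)_{\mathcal{P}_1} - (f_2)_{\mathcal{P}_2}\|_{L^1} &\leq \|(f_1)_{\mathcal{P}_1} - (f_1)_{\mathcal{Q}}\|_{L^1} + \|(f_1)_{\mathcal{Q}} - (f_2)_{\mathcal{Q}}\|_{L^1} + \|(f_2)_{\mathcal{Q}} - (f_2)_{\mathcal{P}_2}\|_{L^1} \\ &\leq \|(f_1)_{\mathcal{Q}} - (f_2)_{\mathcal{Q}}\|_{L^1} + \varepsilon \\ &\leq \|f_1 - f_2\|_{L^1} + \varepsilon \end{aligned} \]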
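The stepping operator of Definition 7.2 is simply block averaging, and the contraction property in $L^1$ norm used in the proof above can be observed on small examples. The sketch below uses hypothetical names and represents a partition as a list of lists of vertex indices; it illustrates the operator and does not prove its properties.

```python
from fractions import Fraction


def stepping_operator(f, parts):
    """Replace f by its average on each block V_i x V_j of the partition."""
    n = len(f)
    out = [[Fraction(0)] * n for _ in range(n)]
    for Vi in parts:
        for Vj in parts:
            block_sum = sum(f[u][v] for u in Vi for v in Vj)
            avg = Fraction(block_sum) / (len(Vi) * len(Vj))
            for x in Vi:
                for y in Vj:
                    out[x][y] = avg
    return out


def l1_norm(f):
    """Average of |f(x, y)| over all ordered pairs (x, y)."""
    n = len(f)
    return Fraction(sum(abs(v) for row in f for v in row)) / (n * n)


f = [[1, -1, 1, 0],
     [-1, 1, 0, 1],
     [1, 0, 0, 0],
     [0, 1, 0, 0]]
parts = [[0, 1], [2, 3]]
stepped = stepping_operator(f, parts)
print(l1_norm(f), l1_norm(stepped))
```

Applying the operator a second time with the same partition changes nothing, which reflects the fact that it is a conditional expectation.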

7.2 Inheritance of structure lemmas

This section is devoted to establishing two important lemmas on “inheriting structure” by taking random choices inside a sufficiently uniform graph $\Gamma$.

Recall from Section 4.2 that a graph $\Gamma = (V, E)$ with density $p := 2|E|/|V|^2$ is $\eta$-uniform if $|d_\Gamma(A, B) - p| \leq \eta p$ holds for all sets $A, B \subseteq V$ satisfying $|A \times B| \geq \eta|V|^2$. We note the simple fact that this condition implies $\|p^{-1}1_\Gamma - 1\|_\square \leq 2\eta$.

Suppose then we are given a symmetric function $T : V^2 \to [0, 1]$ with a bounded number of steps and a sufficiently uniform graph $\Gamma$ on $V$. The lemmas proven in this section roughly say that, if we randomly choose a subgraph $G$ of $\Gamma$ by adding each edge $xy \in \Gamma$ to $G$ with probability $T(x, y)$, then $G$ will inherit both the cut and the $L^1$ structures from the function $T$. Below we state the first of these lemmas, which deals with the cut structure:

Lemma 7.2 (Inheritance of cut structure). Let $\varepsilon > 0$, $k \geq 1$ be given and define $\eta := \varepsilon/4k$. Then the following holds for every sufficiently large $n \in \mathbb{N}$. Let $V$ be a set of $n$ vertices, $\Gamma$ be an $\eta$-uniform graph on $V$ with density $p \geq \Omega_\varepsilon(1/n)$, and $T : V^2 \to [0, 1]$ be a symmetric function with at most $k$ steps. If $G$ is a random (spanning) subgraph of $\Gamma$ with $\mathbb{P}(xy \in G) = T(x, y)$ for all $xy \in \Gamma$, all choices being independent, then

\[ \mathbb{P}\left( \|p^{-1}1_G - T\|_\square > \varepsilon \right) < e^{-\frac{\varepsilon^2}{8}pn^2} \]

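In this lemma, the condition on the density $p \geq \Omega_\varepsilon(1/n)$ means that $p \geq C_\varepsilon/n$ for some quantity $C_\varepsilon > 0$ depending only on $\varepsilon$.

The proof of Lemma 7.2 (and also that of Lemma 7.3 below) proceeds as follows. We first break the statement into a part that uses only the “pseudorandomness” of $\Gamma$ and a part that involves only the “real randomness” of $G$. The pseudorandom part is in fact deterministic, and is dealt with by using the uniformity of $\Gamma$ and the triangle inequality for the cut norm. The random part follows by standard applications of the Chernoff bound and the union bound. We give the details below.

Proof. Because $G \subseteq \Gamma$, we have that

\[ \begin{aligned} \|p^{-1}1_G - T\|_\square &\leq \|p^{-1}1_\Gamma(1_G - T)\|_\square + \|(p^{-1}1_\Gamma - 1)T\|_\square \\ &\leq p^{-1}\|1_\Gamma(1_G - T)\|_\square + \sum_{i \in [\ell]} \|(p^{-1}1_\Gamma - 1)T1_{V_i \times V}\|_\square, \end{aligned} \tag{7.1} \]

where $V_1, V_2, \cdots, V_\ell$ are the steps of $T$ (and so $\ell \leq k$). For a fixed $i \in [\ell]$, let $A \subseteq V_i$, $B \subseteq V$ be sets satisfying

\[ \left| \mathbb{E}\left[ (p^{-1}1_\Gamma - 1)T1_{A \times B} \right] \right| = \|(p^{-1}1_\Gamma - 1)T1_{V_i \times V}\|_\square, \]

let $\sigma \in \{-1, +1\}$ be the sign of $\mathbb{E}[(p^{-1}1_\Gamma - 1)T1_{A \times B}]$ and let $T_{i,j} \in [0, 1]$ be the value of $T$ over $V_i \times V_j$. Note that for every $j \in [\ell]$ we must have

\[ T_{i,j}\,\sigma\mathbb{E}\left[ (p^{-1}1_\Gamma - 1)1_{A \times (B \cap V_j)} \right] \leq \sigma\mathbb{E}\left[ (p^{-1}1_\Gamma - 1)1_{A \times (B \cap V_j)} \right], \]

otherwise for a contradicting value of $j$ we would have

\[ \sigma\mathbb{E}\left[ (p^{-1}1_\Gamma - 1)T1_{A \times (B \setminus V_j)} \right] > \sigma\mathbb{E}\left[ (p^{-1}1_\Gamma - 1)T1_{A \times B} \right] = \|(p^{-1}1_\Gamma - 1)T1_{V_i \times V}\|_\square, \]

which is impossible. Then

\[ \begin{aligned} \|(p^{-1}1_\Gamma - 1)T1_{V_i \times V}\|_\square &= \sigma\mathbb{E}\Big[ \sum_{j \in [\ell]} (p^{-1}1_\Gamma - 1)T1_{A \times (B \cap V_j)} \Big] \\ &= \sum_{j \in [\ell]} T_{i,j}\,\sigma\mathbb{E}\left[ (p^{-1}1_\Gamma - 1)1_{A \times (B \cap V_j)} \right] \\ &\leq \sum_{j \in [\ell]} \sigma\mathbb{E}\left[ (p^{-1}1_\Gamma - 1)1_{A \times (B \cap V_j)} \right] \\ &= \sigma\mathbb{E}\left[ (p^{-1}1_\Gamma - 1)1_{A \times B} \right] \\ &\leq \|p^{-1}1_\Gamma - 1\|_\square \end{aligned} \]

From inequality (7.1) we then obtain

\[ \begin{aligned} \|p^{-1}1_G - T\|_\square &\leq p^{-1}\|1_\Gamma(1_G - T)\|_\square + \sum_{i \in [\ell]} \|p^{-1}1_\Gamma - 1\|_\square \\ &\leq p^{-1}\|1_\Gamma(1_G - T)\|_\square + \ell \cdot 2\eta \\ &\leq p^{-1}\|1_\Gamma(1_G - T)\|_\square + \frac{\varepsilon}{2}, \end{aligned} \]

and so $\mathbb{P}\left( \|p^{-1}1_G - T\|_\square > \varepsilon \right) \leq \mathbb{P}\left( \|1_\Gamma(1_G - T)\|_\square > \varepsilon p/2 \right)$.

For any given sets $A, B \subseteq V$, the Chernoff bound implies that

\[ \mathbb{P}\Big( \Big| \sum_{xy \in \Gamma \cap A \times B} (1_{xy \in G} - T(x, y)) \Big| > \frac{\varepsilon pn^2}{2} \Big) \leq 2\exp\Big( -2\Big( \frac{\varepsilon pn^2/2}{|\Gamma \cap A \times B|} \Big)^2 |\Gamma \cap A \times B| \Big) \leq 2e^{-\frac{\varepsilon^2}{4}pn^2} \]

Since there are only $2^{2n}$ pairs $(A, B)$ of subsets of $V$, by the union bound we obtain that

\[ \begin{aligned} \mathbb{P}\left( \|1_\Gamma(1_G - T)\|_\square > \frac{\varepsilon p}{2} \right) &= \mathbb{P}\left( \exists A, B \subseteq V : |\mathbb{E}[1_\Gamma(1_G - T)1_{A \times B}]| > \frac{\varepsilon p}{2} \right) \\ &\leq \sum_{A, B \subseteq V} \mathbb{P}\Big( \Big| \sum_{xy \in \Gamma \cap A \times B} (1_{xy \in G} - T(x, y)) \Big| > \frac{\varepsilon pn^2}{2} \Big) \\ &\leq 2^{2n} \cdot 2e^{-\frac{\varepsilon^2}{4}pn^2}, \end{aligned} \]

which is smaller than $e^{-\frac{\varepsilon^2}{8}pn^2}$ for sufficiently large $n \in \mathbb{N}$, provided $p \geq \Omega_\varepsilon(1/n)$.

The second lemma has a similar philosophy, but deals with the $L^1$ structure:

Lemma 7.3 (Inheritance of $L^1$ structure). Let $\varepsilon > 0$, $k \geq 1$ be given and define $\eta := \varepsilon/4k^4$. Then the following holds for every $n \geq 1$. Let $V$ be a set of $n$ vertices, $\Gamma$ be an $\eta$-uniform graph on $V$ with density $p$, and let $T_1, T_2 : V^2 \to [0, 1]$ be symmetric functions with at most $k$ steps. For each edge $xy \in \Gamma$, we draw $\sigma_{xy}$ uniformly and independently at random from $[0, 1]$, and construct the (random) graphs $G_1, G_2$ on $V$ by including edge $xy \in \Gamma$ in $G_1$ (resp. $G_2$) if and only if $\sigma_{xy} \leq T_1(x, y)$ (resp. $\sigma_{xy} \leq T_2(x, y)$). Then

\[ \mathbb{P}\left( \left| p^{-1}\|1_{G_1 \triangle G_2}\|_{L^1} - \|T_1 - T_2\|_{L^1} \right| > \varepsilon \right) < 2e^{-\frac{\varepsilon^2}{4}pn^2} \]

Proof. Define $T := T_1 - T_2$, so that $T$ has at most $k^2$ steps (which we will call $V_1, V_2, \cdots, V_\ell$, for some $\ell \leq k^2$) and $\|T\|_{L^\infty} \leq 1$. We have that

\[ \left| p^{-1}\|1_{G_1 \triangle G_2}\|_{L^1} - \|T_1 - T_2\|_{L^1} \right| \leq p^{-1}\left| \|1_{G_1 \triangle G_2}\|_{L^1} - \|1_\Gamma T\|_{L^1} \right| + \left| p^{-1}\|1_\Gamma T\|_{L^1} - \|T\|_{L^1} \right| \]

Let $T_{i,j}$ be the value of $T$ over $V_i \times V_j$ for $i, j \in [\ell]$. Then

\[ \begin{aligned} \left| p^{-1}\|1_\Gamma T\|_{L^1} - \|T\|_{L^1} \right| &= \frac{1}{n^2} \Big| \sum_{x,y \in V} p^{-1}1_{xy \in \Gamma}|T(x, y)| - \sum_{x,y \in V} |T(x, y)| \Big| \\ &= \frac{1}{n^2} \Big| \sum_{i,j \in [\ell]} \left( p^{-1}e_\Gamma(V_i, V_j) - |V_i||V_j| \right) |T_{i,j}| \Big| \\ &\leq \frac{1}{n^2} \sum_{i,j \in [\ell]} \left| p^{-1}e_\Gamma(V_i, V_j) - |V_i||V_j| \right| \end{aligned} \]

If $|V_i||V_j| \geq \eta n^2$, then

\[ \left| p^{-1}e_\Gamma(V_i, V_j) - |V_i||V_j| \right| \leq \eta|V_i||V_j| \leq \eta n^2, \]

while if $|V_i||V_j| < \eta n^2$ we have

\[ \left| p^{-1}e_\Gamma(V_i, V_j) - |V_i||V_j| \right| \leq (1 + \eta)\eta n^2 \leq 2\eta n^2 \]

We then conclude that

\[ \left| p^{-1}\|1_\Gamma T\|_{L^1} - \|T\|_{L^1} \right| \leq \frac{1}{n^2} \sum_{i,j \in [\ell]} 2\eta n^2 = 2\ell^2\eta \leq \frac{\varepsilon}{2}, \]

and so by the Chernoff bound

\[ \begin{aligned} \mathbb{P}\left( \left| p^{-1}\|1_{G_1 \triangle G_2}\|_{L^1} - \|T_1 - T_2\|_{L^1} \right| > \varepsilon \right) &\leq \mathbb{P}\left( \left| \|1_{G_1 \triangle G_2}\|_{L^1} - \|1_\Gamma T\|_{L^1} \right| > \frac{\varepsilon p}{2} \right) \\ &= \mathbb{P}\Big( \Big| \sum_{xy \in \Gamma} \left( 1_{G_1 \triangle G_2}(x, y) - \mathbb{P}(xy \in G_1 \triangle G_2) \right) \Big| > \frac{\varepsilon}{2} \cdot \frac{pn^2}{2} \Big) \\ &\leq 2e^{-2\left( \frac{\varepsilon}{2} \right)^2 \frac{pn^2}{2}} = 2e^{-\frac{\varepsilon^2}{4}pn^2} \end{aligned} \]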
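The coupling in Lemma 7.3 is easy to simulate: a single uniform draw $\sigma_{xy}$ per edge of $\Gamma$ decides membership in both random subgraphs, so an edge lands in the symmetric difference exactly when $\sigma_{xy}$ falls between the two template values. The sketch below is a toy illustration under assumptions that are not in the text: the names are hypothetical, the templates are constant functions, and $\Gamma$ is taken to be a complete graph on 8 vertices.

```python
import random
from itertools import combinations


def coupled_subgraphs(gamma_edges, T1, T2, rng):
    """Sample G1 and G2 inside Gamma using one shared uniform draw per edge."""
    G1, G2 = set(), set()
    for x, y in gamma_edges:
        sigma = rng.random()  # shared draw sigma_xy for the edge xy
        if sigma <= T1(x, y):
            G1.add((x, y))
        if sigma <= T2(x, y):
            G2.add((x, y))
    return G1, G2


rng = random.Random(0)
gamma = list(combinations(range(8), 2))  # toy choice: Gamma is complete
G1, G2 = coupled_subgraphs(gamma, lambda x, y: 0.3, lambda x, y: 0.7, rng)
print(len(G1), len(G2), len(G1 ^ G2))
```

Because the same draw is reused, pointwise comparable templates yield nested graphs, and each edge of $\Gamma$ belongs to the symmetric difference with probability $|T_1(x, y) - T_2(x, y)|$; this is what ties the size of the symmetric difference to $\|T_1 - T_2\|_{L^1}$.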

7.3 A “coarse” structural correspondence

In this section we give a first transference result between the dense and the sparse settings which also takes into account the $L^1$ structure of the graphs. To state this result, it will be convenient to first establish a notation for the different spaces of graphs we will work with:

Definition 7.3 (Spaces of graphs). Given a graph Γ = (V (Γ),E(Γ)), we denote by S(Γ) the collection of all spanning subgraphs of Γ, that is

S(Γ) := {G = (V (Γ),E(G)) : E(G) ⊆ E(Γ)}

There are two different kinds of graphs which will be used here to generate these spaces. The first is the complete graph on a given vertex set $V$, which we will denote by $K_V$, and whose space $S(K_V)$ will represent the “dense space” of graphs on this vertex set. The second is a sparse $\eta$-uniform graph for some small constant $\eta$, which we will usually denote by $\Gamma$, and whose space $S(\Gamma)$ represents the “sparse space” of graphs.

We then wish to prove that the sparse space $S(\Gamma)$ and the dense space $S(K_{V(\Gamma)})$ on the same vertex set are in some sense equivalent, having the same “geometries” given both by the cut structure and by the $L^1$ structure. For this first result, we will try to construct a bijection between representative subsets of each space, in such a way that this bijection preserves both the cut structure of each graph and the pairwise $L^1$ distance between them.

Perhaps the most natural way of obtaining representative subsets suitable to our purposes is to take an $\varepsilon$-net of each space in $L^1$ norm, since this would imply that every graph has a representative in the corresponding set at a distance of at most $\varepsilon$ in both $L^1$ and cut norms. However, it is not hard to prove that any $\varepsilon$-net of $S(K_{V(\Gamma)})$ will be larger than the entire space $S(\Gamma)$ if the density of $\Gamma$ is sufficiently small compared to $\varepsilon$, which makes such a bijection impossible for sparse graphs $\Gamma$.

Fortunately, our proof of Theorem 6.1 implies that both spaces are equivalent in cut norm, so we may take $\varepsilon$-nets in cut norm from each of these spaces and make a bijection preserving the cut structure of each graph. The following theorem says we can do this in such a way as to also preserve the pairwise $L^1$ distance between them:

Theorem 7.1. For every $\varepsilon > 0$ there exists a constant $\eta = 2^{-(1/\varepsilon)^{O(1)}}$ and a function $M : \mathbb{N} \to \mathbb{N}$ such that the following holds.

Suppose $(\Gamma_n)_{n \in \mathbb{N}}$ is a sequence of $\eta$-uniform graphs with $|V(\Gamma_n)| = n$ and density $p = p(n) \geq \Omega_\varepsilon(1/n)$. Then there exist subsets

\[ S_n^\varepsilon = \{G_1, \cdots, G_{M(n)}\} \subset S(\Gamma_n), \quad D_n^\varepsilon = \{H_1, \cdots, H_{M(n)}\} \subset S(K_{V(\Gamma_n)}) \]

which satisfy the following conditions for every sufficiently large $n \in \mathbb{N}$:

• $S_n^\varepsilon$ and $D_n^\varepsilon$ are $\varepsilon$-nets in cut norm of $S(\Gamma_n)$ and $S(K_{V(\Gamma_n)})$, respectively:

\[ \begin{aligned} &\forall G \in S(\Gamma_n)\ \exists i \in [M(n)] : \ p^{-1}\|1_G - 1_{G_i}\|_\square \leq \varepsilon \\ &\forall H \in S(K_{V(\Gamma_n)})\ \exists i \in [M(n)] : \ \|1_H - 1_{H_i}\|_\square \leq \varepsilon \end{aligned} \]

• For all $i$, $H_i$ is a dense model of $G_i$:

\[ \forall i \in [M(n)], \quad \|p^{-1}1_{G_i} - 1_{H_i}\|_\square \leq \varepsilon \]

• $S_n^\varepsilon$ and $D_n^\varepsilon$ have the same $L^1$ structure:

\[ \forall i, j \in [M(n)], \quad \|1_{H_i \triangle H_j}\|_{L^1} = (1 \pm \varepsilon)p^{-1}\|1_{G_i \triangle G_j}\|_{L^1} \]

Proof. Suppose $n$ is sufficiently large, and let us call a symmetric function $T : V(\Gamma_n)^2 \to [0, 1]$ a template on $V(\Gamma_n)$. Then the Weak Regularity Lemma (Theorem 2.1) and the Pseudorandom Transference Principle (Theorem 6.1) imply that all graphs in $S(\Gamma_n)$ and $S(K_{V(\Gamma_n)})$ are $\varepsilon$-close in cut norm to a template with at most $K = 2^{1/\varepsilon^{O(1)}}$ steps.

Take an $\varepsilon$-net in $L^1$ norm of the space of templates on $V(\Gamma_n)$ with at most $K$ steps and which does not contain two templates that are $\varepsilon$-close in $L^1$ norm. As there exists one such net with at most $K^n(1/\varepsilon)^{K^2} = e^{\Theta_\varepsilon(n)}$ elements, we may apply the inheritance of structure lemmas at each template in this net (using the same random choices) and then apply the union bound to obtain two “model sets” $S_n^\varepsilon \subset S(\Gamma_n)$, $D_n^\varepsilon \subset S(K_{V(\Gamma_n)})$ which have the same $L^1$ and cut structures as the templates they model. The theorem follows by changing $\varepsilon$ to $\varepsilon/3$.

7.4 A “fine” structural correspondence

Theorem 7.1 proven in the last section provides a “coarse” structural correspondence between the sparse space $S(\Gamma)$ and the dense space $S(K_{V(\Gamma)})$, given by means of a bijection between $\varepsilon$-nets in cut norm of each space, and which preserves the cut structure of graphs and their pairwise $L^1$ distance. However, these $\varepsilon$-nets contain only a vanishing fraction of the graphs in each space, and we have no guarantees that they are “well-spread” in $L^1$ norm.

This section is devoted to showing that, if we do not require the correspondence map between the two spaces to be bijective, then we can get a much “finer” structural correspondence, which concerns almost all graphs from each space and almost all “$L^1$ structures” in a sense we will now define.

For a given graph $\Gamma$ and an integer $m$, let us define a constellation of order $m$ in $S(\Gamma)$ as being simply a collection of $m$ graphs $\{G_1, \cdots, G_m\} \subseteq S(\Gamma)$. A constellation should be seen as being characterized by the cut structure of its elements and the (relative) $L^1$ structure between them. This definition is made so that we can talk about approximating a collection of several graphs at once while preserving the overall structure of this collection. This is the philosophy of the next definition:

Definition 7.4 ($\varepsilon$-similarity). A constellation $\{G_1, \cdots, G_m\}$ is $\varepsilon$-similar to a constellation $\{H_1, \cdots, H_m\}$ in $S(\Gamma)$ if:

– $\forall i \in [m]$: $p^{-1}\|1_{G_i} - 1_{H_i}\|_\square \leq \varepsilon$, and

– $\forall i, j \in [m]$: $p^{-1}\|1_{G_i \triangle G_j}\|_{L^1} = p^{-1}\|1_{H_i \triangle H_j}\|_{L^1} \pm \varepsilon$,

where $p := 2|E(\Gamma)|/|V(\Gamma)|^2$ is the density of $\Gamma$.

We say that a subset $S \subseteq S(\Gamma)$ $\varepsilon$-contains all constellations of order $m$ in $S(\Gamma)$ if, for any collection of $m$ graphs $\{G_1, \cdots, G_m\} \subseteq S(\Gamma)$, there exists a collection $\{H_1, \cdots, H_m\} \subseteq S$ which is $\varepsilon$-similar to $\{G_1, \cdots, G_m\}$.

Note that a set $S$ being an $\varepsilon$-net of $S(\Gamma)$ in cut norm is equivalent to $S$ $\varepsilon$-containing all constellations of order 1 in $S(\Gamma)$, but is strictly weaker than $\varepsilon$-containing all constellations of order $m$ for any $m \geq 2$. Intuitively, if we are dealing with at most $m$ graphs at a time, $\varepsilon$-containing all constellations of order $m$ is virtually the same as being an $\varepsilon$-net in $L^1$ norm. With this notion, we may now state the “fine” structural correspondence principle:

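Theorem 7.2. For every $\varepsilon > 0$, $m \geq 1$, there exists a constant $\eta = 2^{-(2^m/\varepsilon)^{O(1)}}$ such that the following holds.

Suppose $(\Gamma_n)_{n \in \mathbb{N}}$ is a sequence of $\eta$-uniform graphs with $|V(\Gamma_n)| = n$ and density $p = p(n) \geq \Omega_{\varepsilon,m}(1/n)$. Then for every sufficiently large $n \in \mathbb{N}$ there exist subsets $S'_n \subseteq S(\Gamma_n)$, $D'_n \subseteq S(K_{V(\Gamma_n)})$ and functions

\[ f : S'_n \to D'_n, \quad g : D'_n \to S'_n \]

which satisfy the following conditions:

• $S'_n$ and $D'_n$ contain almost all graphs in $S(\Gamma_n)$ and $S(K_{V(\Gamma_n)})$:

\[ |S'_n| > (1 - \varepsilon)|S(\Gamma_n)|, \quad |D'_n| > (1 - \varepsilon)|S(K_{V(\Gamma_n)})| \]

• $S'_n$ and $D'_n$ $\varepsilon$-contain all constellations of order $m$ in $S(\Gamma_n)$ and $S(K_{V(\Gamma_n)})$, respectively

• $f(G)$ is a dense model of $G$ and $g(H)$ is a sparse model of $H$:

\[ \begin{aligned} &\forall G \in S'_n : \ \|p^{-1}1_G - 1_{f(G)}\|_\square \leq \varepsilon \\ &\forall H \in D'_n : \ \|1_H - p^{-1}1_{g(H)}\|_\square \leq \varepsilon \end{aligned} \]

• $f$ and $g$ are both continuous in $L^1$-norm:

\[ \begin{aligned} &\forall G_1, G_2 \in S'_n : \ \|1_{f(G_1) \triangle f(G_2)}\|_{L^1} \leq p^{-1}\|1_{G_1 \triangle G_2}\|_{L^1} + \varepsilon \\ &\forall H_1, H_2 \in D'_n : \ p^{-1}\|1_{g(H_1) \triangle g(H_2)}\|_{L^1} \leq \|1_{H_1 \triangle H_2}\|_{L^1} + \varepsilon \end{aligned} \]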

7.4.1 Proof of Theorem 7.2

For the proof of Theorem 7.2 we will need to define some additional spaces:

Definition 7.5. Given $\varepsilon > 0$, $k \geq 1$ and a graph $\Gamma$ of density $p := 2|E(\Gamma)|/|V(\Gamma)|^2$, we define:

• The space $\mathcal{T}_{V(\Gamma)}$ of templates on $V(\Gamma)$:

\[ \mathcal{T}_{V(\Gamma)} := \left\{ T : V(\Gamma)^2 \to [0, 1],\ T \text{ symmetric} \right\} \]

• The space $\mathcal{T}_{V(\Gamma);k}$ of templates on $V(\Gamma)$ with at most $k$ steps:

\[ \mathcal{T}_{V(\Gamma);k} := \left\{ T \in \mathcal{T}_{V(\Gamma)} : T \text{ has at most } k \text{ steps} \right\} \]

• The space $S(\Gamma; k, \varepsilon)$ of well-structured graphs with a partition of order $k$:

\[ S(\Gamma; k, \varepsilon) := \left\{ G \subseteq \Gamma : \exists\, T \in \mathcal{T}_{V(\Gamma);k},\ \|p^{-1}1_G - T\|_\square \leq \frac{\varepsilon}{16k^2} \right\} \]

The following simple lemma is an easy consequence of Lemma 7.2 applied to the constant template $T \equiv 1/2$:

Lemma 7.4. For every $\varepsilon > 0$, $k \geq 1$ there exists $\eta = (\varepsilon/k)^{O(1)}$ such that the following holds. For every sequence $(\Gamma_n)_{n \geq 1}$ of $\eta$-uniform graphs with $|V(\Gamma_n)| = n$ and density $p = p(n) \geq \Omega_\varepsilon(1/n)$, we have

\[ \lim_{n \to \infty} \frac{|S(\Gamma_n; k, \varepsilon)|}{|S(\Gamma_n)|} = 1 \]

The following lemma is the main technical step in the proof of Theorem 7.2, and roughly says that the space $S(\Gamma_n; K, \varepsilon)$ is very “well spread” in $L^1$ norm:

Lemma 7.5. For every $\varepsilon > 0$, $m \geq 1$ there exist $\eta = 2^{-(2^m/\varepsilon)^{O(1)}}$, $K = 2^{(2^m/\varepsilon)^{O(1)}}$ such that the following holds. If $\Gamma_n$ is $\eta$-uniform, then $S(\Gamma_n; K, \varepsilon)$ $\varepsilon$-contains all constellations of order $m$ in $S(\Gamma_n)$.

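Proof. Let $\{G_1, \cdots, G_m\} \subset S(\Gamma_n)$ be any constellation of order $m$. For every subset $I \subseteq [m]$, define the graph

\[ G_I := \Big( \bigcap_{i \in I} G_i \Big) \cap \Big( \bigcap_{j \in [m] \setminus I} (\Gamma \setminus G_j) \Big) \]

As $G_I \subseteq \Gamma$ for all $I \subseteq [m]$, we may apply the Weak Regularity Lemma with error parameter $\varepsilon/2^{m+1}$ for each of these graphs. We then obtain, for each $I \subseteq [m]$, a partition $\mathcal{P}_I$ of $V(\Gamma)$ into at most $2^{(2^{m+1}/\varepsilon)^2}$ parts such that $p^{-1}\|1_{G_I} - (G_I)_{\mathcal{P}_I}\|_\square \leq \varepsilon/2^{m+1}$.

Let $\mathcal{Q} := \bigvee_{I \subseteq [m]} \mathcal{P}_I$ be the common refinement of these partitions; then

\[ |\mathcal{Q}| \leq \left( 2^{(2^{m+1}/\varepsilon)^2} \right)^{2^m} = 2^{\varepsilon^{-2}2^{3m+2}} =: K \]

and

\[ \begin{aligned} p^{-1}\|1_{G_I} - (G_I)_{\mathcal{Q}}\|_\square &\leq p^{-1}\|1_{G_I} - (G_I)_{\mathcal{P}_I}\|_\square + p^{-1}\|(G_I)_{\mathcal{P}_I} - (G_I)_{\mathcal{Q}}\|_\square \\ &\leq \frac{\varepsilon}{2^{m+1}} + p^{-1}\|((G_I)_{\mathcal{P}_I})_{\mathcal{Q}} - (G_I)_{\mathcal{Q}}\|_\square \\ &\leq \frac{\varepsilon}{2^m}, \quad \forall I \subseteq [m] \end{aligned} \]

For each $i \in [m]$ we then conclude that

\[ p^{-1}\|1_{G_i} - (G_i)_{\mathcal{Q}}\|_\square \leq \sum_{I \subseteq [m] :\, i \in I} p^{-1}\|1_{G_I} - (G_I)_{\mathcal{Q}}\|_\square \leq \frac{\varepsilon}{2} \]

Note that all the $G_I$ are disjoint and $\Gamma = \bigcup_{I \subseteq [m]} G_I$, so $\sum_{I \subseteq [m]} (G_I)_{\mathcal{Q}} = (\Gamma)_{\mathcal{Q}}$. For each $I \subseteq [m]$, we may then construct a random graph $H_I \subseteq \Gamma$ in the following way: for each edge $xy \in \Gamma$, independently from all other choices, we put $xy$ in exactly one of the $H_I$ with

\[ \forall I \subseteq [m], \quad \mathbb{P}(xy \in H_I) = \frac{(G_I)_{\mathcal{Q}}(x, y)}{(\Gamma)_{\mathcal{Q}}(x, y)} \]

If $\eta \leq \frac{\varepsilon/2^{m+3}K^2}{4K} = 2^{-(2^m/\varepsilon)^{O(1)}}$, then by inheritance of cut structure (Lemma 7.2) we have that

\[ \mathbb{P}\left( \left\| p^{-1}1_{H_I} - \frac{(G_I)_{\mathcal{Q}}}{(\Gamma)_{\mathcal{Q}}} \right\|_\square > \frac{\varepsilon}{2^{m+3}K^2} \right) < e^{-(\varepsilon/2^{m+4}K^2)^2pn^2} \quad \forall I \subseteq [m] \]

In particular,

\[ \mathbb{P}\left( \left| p^{-1}\|1_{H_I}\|_{L^1} - \left\| \frac{(G_I)_{\mathcal{Q}}}{(\Gamma)_{\mathcal{Q}}} \right\|_{L^1} \right| > \frac{\varepsilon}{2^{m+3}K^2} \right) < e^{-\Theta_{\varepsilon,m}(pn^2)} \quad \forall I \subseteq [m] \tag{7.2} \]

Let $Q_1, \cdots, Q_\ell$ (where $\ell \leq K$) be the atoms of $\mathcal{Q}$; then

\[ \begin{aligned} \left| \left\| \frac{(G_I)_{\mathcal{Q}}}{(\Gamma)_{\mathcal{Q}}} \right\|_{L^1} - p^{-1}\|1_{G_I}\|_{L^1} \right| &\leq \left\| \frac{(G_I)_{\mathcal{Q}}}{(\Gamma)_{\mathcal{Q}}} - p^{-1}(G_I)_{\mathcal{Q}} \right\|_{L^1} \\ &= \sum_{i,j \leq \ell} \frac{|Q_i||Q_j|}{n^2} \left| \frac{e_{G_I}(Q_i, Q_j)}{e_\Gamma(Q_i, Q_j)} - \frac{p^{-1}e_{G_I}(Q_i, Q_j)}{|Q_i||Q_j|} \right| \\ &= \sum_{i,j \leq \ell} \frac{e_{G_I}(Q_i, Q_j)}{n^2} \left| \frac{|Q_i||Q_j|}{e_\Gamma(Q_i, Q_j)} - p^{-1} \right| \\ &\leq \sum_{\substack{i,j \leq \ell \\ |Q_i||Q_j| \geq \eta n^2}} 2\eta p^{-1}\frac{e_{G_I}(Q_i, Q_j)}{n^2} + \sum_{\substack{i,j \leq \ell \\ |Q_i||Q_j| < \eta n^2}} \frac{p^{-1}(1 + \eta)p \cdot \eta n^2}{n^2} \\ &\leq 2\eta K^2 \\ &\leq \frac{\varepsilon}{2^{m+3}K} \quad \forall I \subseteq [m] \end{aligned} \]

Together with (7.2), this implies

\[ \mathbb{P}\left( \left| p^{-1}\|1_{H_I}\|_{L^1} - p^{-1}\|1_{G_I}\|_{L^1} \right| > \frac{\varepsilon}{2^{m+2}K} \right) < e^{-\Theta_{\varepsilon,m}(pn^2)} \]

We may then find $2^m$ graphs $H_I$, $I \subseteq [m]$, such that

\[ \left\| p^{-1}1_{H_I} - \frac{(G_I)_{\mathcal{Q}}}{(\Gamma)_{\mathcal{Q}}} \right\|_\square \leq \frac{\varepsilon}{2^{m+3}K^2} \quad \text{and} \quad p^{-1}\left| \|1_{H_I}\|_{L^1} - \|1_{G_I}\|_{L^1} \right| \leq \frac{\varepsilon}{2^{m+2}K} \]

Fixing these graphs, we define for each $i \in [m]$ the graph $H_i := \bigcup_{I \ni i} H_I$ and the template $T_i := \sum_{I \ni i} \frac{(G_I)_{\mathcal{Q}}}{(\Gamma)_{\mathcal{Q}}}$. We note that $H_i \subseteq \Gamma$, $T_i \in \mathcal{T}_{V(\Gamma);K}$ for all $i \in [m]$, and

\[ \|p^{-1}1_{H_i} - T_i\|_\square \leq \sum_{I \ni i} \left\| p^{-1}1_{H_I} - \frac{(G_I)_{\mathcal{Q}}}{(\Gamma)_{\mathcal{Q}}} \right\|_\square \leq 2^{m-1}\frac{\varepsilon}{2^{m+3}K^2} = \frac{\varepsilon}{16K^2}; \]

by definition we conclude that $\{H_1, \cdots, H_m\} \subset S(\Gamma; K, \varepsilon)$.

Let us now prove $\{H_1, \cdots, H_m\}$ is $\varepsilon$-similar to $\{G_1, \cdots, G_m\}$. First,

\[ \begin{aligned} p^{-1}\|1_{G_i} - 1_{H_i}\|_\square &\leq p^{-1}\|1_{G_i} - (G_i)_{\mathcal{Q}}\|_\square + \|p^{-1}(G_i)_{\mathcal{Q}} - T_i\|_\square + \|T_i - p^{-1}1_{H_i}\|_\square \\ &\leq \frac{\varepsilon}{2} + \sum_{I \ni i} \left\| p^{-1}(G_I)_{\mathcal{Q}} - \frac{(G_I)_{\mathcal{Q}}}{(\Gamma)_{\mathcal{Q}}} \right\|_{L^1} + \frac{\varepsilon}{16K^2} \\ &\leq \frac{\varepsilon}{2} + 2^{m-1}\frac{\varepsilon}{2^{m+3}K} + \frac{\varepsilon}{16K^2} \leq \varepsilon \quad \forall i \in [m] \end{aligned} \]

Also, because

\[ 1_{H_i \triangle H_j} = \sum_{\substack{I \subseteq [m] \\ |I \cap \{i,j\}| = 1}} 1_{H_I} \quad \text{and} \quad 1_{G_i \triangle G_j} = \sum_{\substack{I \subseteq [m] \\ |I \cap \{i,j\}| = 1}} 1_{G_I}, \]

we have that

\[ p^{-1}\left| \|1_{H_i \triangle H_j}\|_{L^1} - \|1_{G_i \triangle G_j}\|_{L^1} \right| \leq \sum_{\substack{I \subseteq [m] \\ |I \cap \{i,j\}| = 1}} p^{-1}\left| \|1_{H_I}\|_{L^1} - \|1_{G_I}\|_{L^1} \right| \leq 2^{m-1}\frac{\varepsilon}{2^{m+2}K} \leq \varepsilon \]

for all $i, j \in [m]$, thus finishing the proof.

With these lemmas, we may now proceed to the proof of Theorem 7.2.

Proof of Theorem 7.2. Given $\varepsilon > 0$, $m \geq 1$, let us define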

−2 3m+2 m O(1) ε m O(1) K := 2ε 2 = 2(2 /ε) and η := = 2−(2 /ε) 2m+5K4

Suppose Γ is an η-uniform graph with |V (Γ)| = n, density p ≥ Ωε(1/n) and n is sufficiently big. 0 0 (ε) Let S := S(Γ; K, ε), D := S(KV (Γ); K, ε) and let T ⊂ TV (Γ);K be an (ε/16)-net of TV (Γ);K in L1 norm containing eΘε,K (n) elements. We apply the inheritance of structure lemmas at each template in this net (using the same random choices) and then apply union bound to obtain the (ε) existence of “model graphs” φ(T ) ∈ S(KV (Γ)) and ψ(T ) ∈ S(Γ) for any T ∈ T which satisfy the following conditions: i) ∀T ∈ T (ε) : k1 − T k ≤ ε/16K2, φ(T )  50 Chapter 7. Transference results for L1 structure

kp−11 − T k ≤ ε/16K2 ψ(T )  ii) ∀T ,T ∈ T (ε) : 1 − kT − T k ≤ ε/8, 1 2 φ(T1)4φ(T2) L1 1 2 L1 p−1 1 − kT − T k ≤ ε/8 ψ(T1)4ψ(T2) L1 1 2 L1

Condition i) tells us that φ(T (ε)) ⊆ D0 and ψ(T (ε)) ⊆ S0; for each T ∈ T (ε), we define 0 0 f(ψ(T )) := φ(T ) and g(φ(T )) := ψ(T ). For every other G ∈ S , H ∈ D , we may find TG,TH ∈ T such that kp−11 − T k , k1 − T k ≤ ε/16K2, and also T 0 ,T 0 ∈ T (ε) such that V (Γ);K G G  H H  G H 0 0 kTG − TGkL1 , kTH − TH kL1 ≤ ε/16 (if several templates satisfy these properties, take any one of them); we then define 0 0 f(G) := φ(TG), g(H) := ψ(TH ) By Lemma 7.4, we already know that

|S(Γ; K, ε)| > (1 − ε)|S(Γ)| and |S(KV (Γ); K, ε)| > (1 − ε)|S(KV (Γ))| if n is sufficiently large. By Lemma 7.5, we also know that S(Γ; K, ε) and S(KV (Γ); K, ε) ε-contain all constellations of order m in S(Γ) and S(KV (Γ)) respectively. To finish the proof of the theorem it then suffices to prove the following two facts:

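1. For all $G \in S'$, $H \in D'$, we have
\[
\|p^{-1}1_G - 1_{f(G)}\|_\square \le \varepsilon, \qquad \|1_H - p^{-1}1_{g(H)}\|_\square \le \varepsilon.
\]
This follows from
\begin{align*}
\|p^{-1}1_G - 1_{f(G)}\|_\square
  &= \|p^{-1}1_G - 1_{\varphi(T'_G)}\|_\square \\
  &\le \|p^{-1}1_G - T_G\|_\square + \|T_G - T'_G\|_\square + \|T'_G - 1_{\varphi(T'_G)}\|_\square \\
  &\le \frac{\varepsilon}{16K^2} + \frac{\varepsilon}{8} + \frac{\varepsilon}{16K^2} \le \varepsilon, \\
\|1_H - p^{-1}1_{g(H)}\|_\square
  &= \|1_H - p^{-1}1_{\psi(T'_H)}\|_\square \\
  &\le \|1_H - T_H\|_\square + \|T_H - T'_H\|_\square + \|T'_H - p^{-1}1_{\psi(T'_H)}\|_\square \\
  &\le \frac{\varepsilon}{16K^2} + \frac{\varepsilon}{8} + \frac{\varepsilon}{16K^2} \le \varepsilon.
\end{align*}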

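2. For all $G_1, G_2 \in S'$, $H_1, H_2 \in D'$, we have
\[
\|1_{f(G_1) \triangle f(G_2)}\|_{L^1} \le p^{-1}\|1_{G_1 \triangle G_2}\|_{L^1} + \varepsilon, \qquad
p^{-1}\|1_{g(H_1) \triangle g(H_2)}\|_{L^1} \le \|1_{H_1 \triangle H_2}\|_{L^1} + \varepsilon.
\]
This follows from the inequalities
\[
\|T_{G_1} - T_{G_2}\|_{L^1} \le p^{-1}\|1_{G_1 \triangle G_2}\|_{L^1} + \frac{3\varepsilon}{4}
\quad\text{and}\quad
\left| \|1_{f(G_1) \triangle f(G_2)}\|_{L^1} - \|T'_{G_1} - T'_{G_2}\|_{L^1} \right| \le \frac{\varepsilon}{8},
\]
so that we have
\begin{align*}
\|1_{f(G_1) \triangle f(G_2)}\|_{L^1}
  &\le \|T'_{G_1} - T'_{G_2}\|_{L^1} + \frac{\varepsilon}{8} \\
  &\le \|T'_{G_1} - T_{G_1}\|_{L^1} + \|T_{G_1} - T_{G_2}\|_{L^1} + \|T_{G_2} - T'_{G_2}\|_{L^1} + \frac{\varepsilon}{8} \\
  &\le \|T_{G_1} - T_{G_2}\|_{L^1} + \frac{\varepsilon}{4} \\
  &\le p^{-1}\|1_{G_1 \triangle G_2}\|_{L^1} + \varepsilon.
\end{align*}
The inequality $p^{-1}\|1_{g(H_1) \triangle g(H_2)}\|_{L^1} \le \|1_{H_1 \triangle H_2}\|_{L^1} + \varepsilon$ is proven similarly.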
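The constants in both facts can be checked directly, and the chain for $g$ mirrors the one for $f$. The block below is a sketch rather than part of the original argument: it uses $K \ge 1$, and for the last step of the second chain it assumes the dense analogue $\|T_{H_1} - T_{H_2}\|_{L^1} \le \|1_{H_1 \triangle H_2}\|_{L^1} + 3\varepsilon/4$ of the first inequality used for $f$, which the text does not state explicitly.

```latex
% Constants in fact 1, using K >= 1:
\[
\frac{\varepsilon}{16K^2} + \frac{\varepsilon}{8} + \frac{\varepsilon}{16K^2}
  \le \frac{\varepsilon}{16} + \frac{\varepsilon}{8} + \frac{\varepsilon}{16}
  = \frac{\varepsilon}{4} \le \varepsilon .
\]
% Constants in fact 2: two net errors of eps/16 plus the eps/8 from condition ii).
\[
\frac{\varepsilon}{16} + \frac{\varepsilon}{16} + \frac{\varepsilon}{8} = \frac{\varepsilon}{4},
\qquad
\frac{\varepsilon}{4} + \frac{3\varepsilon}{4} = \varepsilon .
\]
% Analogous chain for g, where g(H) = \psi(T'_H).
% ASSUMPTION (not in the text): the dense analogue of the first inequality,
%   \|T_{H_1} - T_{H_2}\|_{L^1} \le \|1_{H_1 \triangle H_2}\|_{L^1} + 3\varepsilon/4 .
\begin{align*}
p^{-1}\|1_{g(H_1) \triangle g(H_2)}\|_{L^1}
  &\le \|T'_{H_1} - T'_{H_2}\|_{L^1} + \frac{\varepsilon}{8}
     && \text{(condition ii) for } \psi\text{)} \\
  &\le \|T'_{H_1} - T_{H_1}\|_{L^1} + \|T_{H_1} - T_{H_2}\|_{L^1}
       + \|T_{H_2} - T'_{H_2}\|_{L^1} + \frac{\varepsilon}{8} \\
  &\le \|T_{H_1} - T_{H_2}\|_{L^1} + \frac{\varepsilon}{4} \\
  &\le \|1_{H_1 \triangle H_2}\|_{L^1} + \varepsilon
     && \text{(assumed dense analogue)} .
\end{align*}
```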

Chapter 8

Extensions and open problems

In this last chapter we briefly mention some possible extensions of the work done in Chapter 7 which seem amenable to the methods presented here and indicate a possible path for future work. We also state some natural questions which our results leave open and which do not seem to follow from our methods, requiring substantially different ideas.

While the results presented in Chapter 7 focused solely on graphs (more specifically, on subgraphs of pseudorandom graphs), the arguments used in their proofs and the framework developed in this work seem to make it possible to extend them to other combinatorial objects and other notions of pseudorandomness.

Perhaps the simplest and most natural extension of these results would be to generalize them to the space of upper-regular graphs (which were defined in Section 6.2). Indeed, our Lemma 6.2 shows that, under the conditions $1/n \ll p \ll 1$ we are interested in, every upper-regular graph $G$ is a dense subgraph of some uniform graph $\Gamma_G$; it then seems likely that our methods will work in the space of upper-regular graphs too. The main difficulty is that there is no longer a fixed uniform graph $\Gamma$ which contains all graphs in this space, so we cannot use it as a ground set for the random choices and probabilistic estimates as we did before. However, with a little more care and slightly different concentration inequalities, it seems possible to overcome these difficulties.

It seems natural to expect that the robust properties of the dense model function obtained in Theorem 7.2, as well as the structure-preserving correspondences obtained in Theorem 7.1, should facilitate or optimize some stability results for graphs in the sparse setting we have worked in. Finding such applications of our results would of course be very important.

The robustness of the dense model function given in Theorem 7.2 may also be valuable in settings other than graphs.
It would be interesting to see whether more general results can be obtained in this direction, working abstractly with the framework as described in Section 1.1 or in Chapter 2.

In a different direction, it might also be interesting to combine our theorems with other known results in the same setting of subgraphs of pseudorandom graphs. Indeed, our theorems assume only a very weak pseudorandomness condition on the host graph $\Gamma$, which is therefore satisfied in most cases where some pseudorandomness condition is required.

A natural question which our results leave open is whether a structural correspondence as given in Theorem 7.2 can be proven encompassing all graphs. Since our arguments require all graphs in each space to have a strongly regular partition into essentially the same number of parts, it does not seem likely that the energy-increment methods we have used throughout this work could yield such a universal result.

Another natural question is whether in Theorem 7.2 we could make the “sparse model” function $g$ a left-inverse of the “dense model” function $f$ (the opposite direction is clearly impossible because of the distinct cardinalities of the two spaces). However, in some sense it is impossible to obtain such a result by using regularity, as we have done throughout; more specifically, we cannot obtain this result by passing through the space of templates on a bounded number of steps, which are the approximations of graphs we get from the regularity lemmas. Indeed, to get such a stronger result we would need to obtain, for each graph $G$ belonging to the domain $S'$ of the dense model function $f$, a template $T(G)$ for which the inequalities

\[
p^{-1}\|1_{G_1} - 1_{G_2}\|_{L^1} - \varepsilon \le \|T(G_1) - T(G_2)\|_{L^1} \le p^{-1}\|1_{G_1} - 1_{G_2}\|_{L^1} + \varepsilon
\]

are satisfied for all pairs $G_1, G_2 \in S'$.

Unfortunately, the next lemma shows that unless $S'$ contains only a negligible proportion of graphs from the sparse space (as was the case in Theorem 7.1), the space of templates is too small in $L^1$ norm to obtain such a correspondence:

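Lemma 8.1. For every $0 < \varepsilon, \alpha < 1/2$ and every $k \in \mathbb{N}$ there exists $\eta > 0$ such that the following holds. Let $(\Gamma_n)_{n \in \mathbb{N}}$ be a sequence of $\eta$-uniform graphs with $|V(\Gamma_n)| = n$ and density $p = p(n) \ge \Omega_{\varepsilon,\alpha}(1/n)$. Then for all sufficiently large $n \in \mathbb{N}$ and all sets $S'_n \subseteq S(\Gamma_n)$ with $|S'_n| \ge \varepsilon|S(\Gamma_n)|$, there exists no function $T : S'_n \to T_{V(\Gamma_n);k}$ satisfying
\[
\|T(G_1) - T(G_2)\|_{L^1} \ge p^{-1}\|1_{G_1} - 1_{G_2}\|_{L^1} - \alpha \qquad \forall G_1, G_2 \in S'_n.
\]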

Proof. Let $\gamma > 0$ be a small constant depending on $\alpha$ and let $M$ be a large integer, both of which we will choose later. Fix a $\gamma$-net of $T_{V(\Gamma_n);k}$ in $L^1$ norm with $2^{\Theta_{\gamma,k}(n)}$ elements, and choose $M$ graphs $G_1, \dots, G_M \in S'_n$ satisfying
\[
p^{-1}\|1_{G_i} - 1_{G_j}\|_{L^1} \ge \frac{1}{2} - \gamma \qquad \forall i, j \in [M],\ i \ne j.
\]
Suppose the function $T : S'_n \to T_{V(\Gamma_n);k}$ satisfies the condition in the statement of the lemma. For every $i \in [M]$, let $T_i$ be a template in the fixed $\gamma$-net such that $\|T_i - T(G_i)\|_{L^1} \le \gamma$. For all $i \ne j$ we then have
\begin{align*}
\|T_i - T_j\|_{L^1}
  &\ge \|T(G_i) - T(G_j)\|_{L^1} - \|T_i - T(G_i)\|_{L^1} - \|T_j - T(G_j)\|_{L^1} \\
  &\ge \|T(G_i) - T(G_j)\|_{L^1} - 2\gamma \\
  &\ge p^{-1}\|1_{G_i} - 1_{G_j}\|_{L^1} - \alpha - 2\gamma \\
  &\ge \frac{1}{2} - \alpha - 3\gamma.
\end{align*}
By choosing $\gamma = \frac{1/2 - \alpha}{4} > 0$ we make sure these templates are all different. However, if $\eta > 0$ is small enough and $n$ is large enough, we obtain that almost all pairs of graphs $G_i, G_j \in S(\Gamma_n)$ satisfy $p^{-1}\|1_{G_i} - 1_{G_j}\|_{L^1} \ge 1/2 - \gamma$; this gives a contradiction for large $n$, since the number of graphs in $S'_n$ satisfying this last inequality will be at least $\frac{\varepsilon}{2}|S'_n| \gg 2^{\Theta_{\gamma,k}(n)}$.
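The choice of $\gamma$ and the final counting step can be spelled out as follows. This is only a sketch in the notation of the proof; the symbol $N$ for the number of templates in the fixed $\gamma$-net is introduced here for illustration and does not appear in the original argument.

```latex
% Sketch. N := number of templates in the fixed gamma-net (notation introduced here).
\[
\gamma = \frac{1/2 - \alpha}{4}
\quad\Longrightarrow\quad
\frac{1}{2} - \alpha - 3\gamma = 4\gamma - 3\gamma = \gamma > 0 .
\]
% Hence, for i \ne j, the net templates chosen for G_i and G_j are distinct:
\[
\|T_i - T_j\|_{L^1} \ge \gamma > 0 \quad (i \ne j)
\quad\Longrightarrow\quad
i \mapsto T_i \ \text{is injective}
\quad\Longrightarrow\quad
M \le N .
\]
% Pigeonhole: a family of M > N graphs in S'_n that are pairwise far apart
% cannot all receive distinct net templates, which is the contradiction.
```

In other words, the contradiction is a pigeonhole argument: the proof exhibits more pairwise separated graphs in $S'_n$ than there are templates in the net.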

Bibliography

[1] Noga Alon, Eldar Fischer, Ilan Newman, and Asaf Shapira. “A combinatorial characterization of the testable graph properties: it’s all about regularity”. In: SIAM Journal on Computing 39.1 (2009), pp. 143–167.
[2] Noga Alon, Eldar Fischer, Michael Krivelevich, and Mario Szegedy. “Efficient testing of large graphs”. In: Combinatorica 20.4 (2000), pp. 451–476.
[3] Noga Alon, Richard A Duke, Hanno Lefmann, Vojtěch Rödl, and Raphael Yuster. “The algorithmic aspects of the regularity lemma”. In: Journal of Algorithms 16.1 (1994), pp. 80–109.
[4] Vitaly Bergelson and Alexander Leibman. “Polynomial extensions of van der Waerden’s and Szemerédi’s theorems”. In: Journal of the American Mathematical Society 9.3 (1996), pp. 725–753.
[5] Christian Borgs, Jennifer T Chayes, László Lovász, Vera T Sós, and Katalin Vesztergombi. “Convergent sequences of dense graphs I: Subgraph frequencies, metric properties and testing”. In: Advances in Mathematics 219.6 (2008), pp. 1801–1851.
[6] David Conlon and Jacob Fox. “Bounds for graph regularity and removal lemmas”. In: Geometric and Functional Analysis 22.5 (2012), pp. 1191–1256.
[7] David Conlon, Jacob Fox, and Yufei Zhao. “Extremal results in sparse pseudorandom graphs”. In: Advances in Mathematics 256 (2014), pp. 206–290.
[8] David Conlon and William Timothy Gowers. “Combinatorial theorems in sparse random sets”. In: arXiv preprint arXiv:1011.4310 (2010).
[9] Paul Erdős, Peter Frankl, and Vojtěch Rödl. “The asymptotic number of graphs not containing a fixed subgraph and a problem for hypergraphs having no exponent”. In: Graphs and Combinatorics 2.1 (1986), pp. 113–121.
[10] Jacob Fox and László Miklós Lovász. “A tight lower bound for Szemerédi’s regularity lemma”. In: arXiv preprint arXiv:1403.1768 (2014).
[11] Jacob Fox, László Miklós Lovász, and Yufei Zhao. “A fast new algorithm for weak graph regularity”. In: arXiv preprint arXiv:1801.05037 (2018).
[12] Alan Frieze and Ravi Kannan. “Quick approximation to matrices and applications”. In: Combinatorica 19.2 (1999), pp. 175–220.
[13] Oded Goldreich, Shafi Goldwasser, and Dana Ron. “Property testing and its connection to learning and approximation”. In: Journal of the ACM (JACM) 45.4 (1998), pp. 653–750.
[14] William T Gowers. “A new proof of Szemerédi’s theorem”. In: Geometric & Functional Analysis GAFA 11.3 (2001), pp. 465–588.
[15] William T Gowers. “Lower bounds of tower type for Szemerédi’s uniformity lemma”. In: Geometric & Functional Analysis GAFA 7.2 (1997), pp. 322–337.
[16] Ben Green and Terence Tao. “The primes contain arbitrarily long arithmetic progressions”. In: Annals of Mathematics (2008), pp. 481–547.
[17] Russell Impagliazzo. “Hard-core distributions for somewhat hard problems”. In: Foundations of Computer Science, 1995. Proceedings., 36th Annual Symposium on. IEEE. 1995, pp. 538–545.
[18] Yoshiharu Kohayakawa. “Szemerédi’s regularity lemma for sparse graphs”. In: Foundations of computational mathematics. Springer, 1997, pp. 216–230.
[19] Yoshiharu Kohayakawa and Vojtěch Rödl. “Szemerédi’s regularity lemma and quasi-randomness”. In: Recent advances in algorithms and combinatorics. Springer, 2003, pp. 289–351.
[20] János Komlós and Miklós Simonovits. “Szemerédi’s regularity lemma and its applications in graph theory”. In: (1996).
[21] János Komlós, Ali Shokoufandeh, Miklós Simonovits, and Endre Szemerédi. “The regularity lemma and its applications in graph theory”. In: Theoretical aspects of computer science. Springer. 2002, pp. 84–112.
[22] László Lovász and Balázs Szegedy. “Szemerédi’s lemma for the analyst”. In: GAFA Geometric And Functional Analysis 17.1 (2007), pp. 252–270.
[23] Brendan Nagle, Vojtěch Rödl, and Mathias Schacht. “The counting lemma for regular k-uniform hypergraphs”. In: Random Structures & Algorithms 28.2 (2006), pp. 113–179.
[24] Omer Reingold, Luca Trevisan, Madhur Tulsiani, and Salil Vadhan. “Dense subsets of pseudorandom sets”. In: Foundations of Computer Science, 2008. FOCS’08. IEEE 49th Annual IEEE Symposium on. IEEE. 2008, pp. 76–85.
[25] Omer Reingold, Luca Trevisan, Madhur Tulsiani, and Salil Vadhan. “New proofs of the Green-Tao-Ziegler dense model theorem: An exposition”. In: arXiv preprint arXiv:0806.0381 (2008).
[26] Vojtěch Rödl and Mathias Schacht. “Regular partitions of hypergraphs: regularity lemmas”. In: Combinatorics, Probability and Computing 16.6 (2007), pp. 833–885.
[27] Vojtěch Rödl and Mathias Schacht. “Regularity lemmas for graphs”. In: Fete of combinatorics and computer science. Springer, 2010, pp. 287–325.
[28] Vojtěch Rödl and Jozef Skokan. “Applications of the regularity lemma for uniform hypergraphs”. In: Random Structures & Algorithms 28.2 (2006), pp. 180–194.
[29] Vojtěch Rödl and Jozef Skokan. “Regularity Lemma for k-uniform hypergraphs”. In: Random Structures & Algorithms 25.1 (2004), pp. 1–42.
[30] Endre Szemerédi. “On sets of integers containing no k elements in arithmetic progression”. In: Acta Arith 27 (1975), pp. 299–345.
[31] Endre Szemerédi. Regular partitions of graphs. Tech. rep. STANFORD UNIV CALIF DEPT OF COMPUTER SCIENCE, 1975.
[32] Terence Tao. “A variant of the hypergraph removal lemma”. In: Journal of combinatorial theory, Series A 113.7 (2006), pp. 1257–1280.
[33] Terence Tao. “Structure and randomness in combinatorics”. In: Foundations of Computer Science, 2007. FOCS’07. 48th Annual IEEE Symposium on. IEEE. 2007, pp. 3–15.
[34] Terence Tao. Structure and randomness: pages from year one of a mathematical blog. American Mathematical Soc., 2008.
[35] Terence Tao. “The Gaussian primes contain arbitrarily shaped constellations”. In: Journal d’Analyse Mathématique 99.1 (2006), pp. 109–176.
[36] Terence Tao and Tamar Ziegler. “The primes contain arbitrarily long polynomial progressions”. In: Acta Mathematica 201.2 (2008), pp. 213–305.
[37] Luca Trevisan, Madhur Tulsiani, and Salil Vadhan. “Regularity, boosting, and efficiently simulating every high-entropy distribution”. In: Computational Complexity, 2009. CCC’09. 24th Annual IEEE Conference on. IEEE. 2009, pp. 126–136.
[38] Yufei Zhao. “Hypergraph limits: a regularity approach”. In: Random Structures & Algorithms 47.2 (2015), pp. 205–226.