IMPA
Master’s Thesis
Low-Complexity Decompositions of Combinatorial Objects
Author: Davi Castro-Silva
Supervisor: Roberto Imbuzeiro Oliveira
A thesis submitted in fulfillment of the requirements for the degree of Master in Mathematics at
Instituto de Matemática Pura e Aplicada
April 15, 2018
Contents
1 Introduction
  1.1 High-level overview of the framework
  1.2 Remarks on notation and terminology
  1.3 Examples and structure of the thesis
2 Abstract decomposition theorems
  2.1 Probabilistic setting
  2.2 Structure and pseudorandomness
  2.3 Weak decompositions
    2.3.1 Weak Regularity Lemma
  2.4 Strong decompositions
    2.4.1 Szemerédi Regularity Lemma
3 Counting subgraphs and the Graph Removal Lemma
  3.1 Counting subgraphs globally
  3.2 Counting subgraphs locally and the Removal Lemma
  3.3 Application to property testing
4 Extensions of graph regularity
  4.1 Regular approximation
  4.2 Relative regularity
5 Hypergraph regularity
  5.1 Intuition and definitions
  5.2 Regularity at a single level
  5.3 Regularizing all levels simultaneously
6 Dealing with sparsity: transference principles
  6.1 Subsets of pseudorandom sets
  6.2 Upper-regular functions
  6.3 Green-Tao-Ziegler Dense Model Theorem
7 Transference results for L1 structure
  7.1 Relationships between cut norm and L1 norm
  7.2 Inheritance of structure lemmas
  7.3 A "coarse" structural correspondence
  7.4 A "fine" structural correspondence
    7.4.1 Proof of Theorem 7.2
8 Extensions and open problems
Bibliography
Chapter 1
Introduction
Often in Mathematics and Computer Science we deal with a large and general class of objects of a certain kind, and we wish to obtain non-trivial results valid for all objects belonging to this class. This may be a very hard task if the spectrum of possible behaviors for the members of this class is very broad, since it is unlikely that any single argument will hold uniformly along this whole spectrum.

Such results may be easy (or easier) to obtain when the class we are dealing with is highly structured, in the sense that its elements can be encoded so that the description of each object has relatively small size; it may then be possible to use this structure to prove results valid uniformly over all objects in the class, or to do a case-by-case analysis to obtain such results. At the other end of the spectrum there are the random objects, which have very high complexity, in the sense that any description of a randomly chosen object must specify the random choices made at each point separately, and thus be very large if the object in question is large. However, for such objects there are various "concentration inequalities" which may be used to obtain results valid with high probability over the set of random choices made.

Therefore, if we can decompose every object belonging to the general class we are interested in into a "highly structured" component (which has low complexity) and a "pseudorandom" component (which mimics the behavior of random objects in certain key statistics), then we may analyze each of these components separately by different means, and so obtain results which are valid for all such objects. An illustrative example of a "structure-pseudorandomness" decomposition of this kind is Szemerédi's celebrated Regularity Lemma [31].
This important result roughly asserts that the vertices of any graph G may be partitioned into a bounded number of equal-sized parts, in such a way that for almost all pairs of partition classes the bipartite graph between them is random-like. Both the upper bound on the order of this partition and the quality of the pseudorandom behavior of the edges between these pairs depend only on an accuracy parameter ε we are at liberty to choose.

In this example, the object to be decomposed is the edge set E of a given arbitrary graph G = (V, E), which belongs to the "general class" of all graphs. The structured component then represents the pairs (Vi, Vj) of partition classes together with the density of edges between them, and it has low complexity because the order of the partition is uniformly bounded for all graphs. The pseudorandom component represents the actual edges between these pairs, and has a random-like property known as ε-regularity which we will define in the next chapter.

This result has many applications in Combinatorics and Computer Science (see, e.g., [20, 21] for a survey), and it has inspired numerous other decomposition results in a similar spirit both inside and outside Graph Theory. In this work we aim to survey many decomposition theorems of this form present in the literature. We provide a unified framework for proving them and present some new results along these same lines.
1.1 High-level overview of the framework
In our setting, the combinatorial objects to be decomposed will be represented as functions defined over a discrete space X. This identification loses little generality: given a combinatorial object O (such as a graph, hypergraph or additive group), we may usually identify some underlying discrete space X for this kind of object and then represent O as a function fO defined on X. We endow X with a probability measure P, so that the objects considered may be viewed as random variables, and define a family C of "low-complexity" subsets of X. The specifics of both the probability measure P and the structured family C will depend on the application at hand, and it is from them that we will define our notions of complexity and pseudorandomness.

The sets belonging to C are seen as the basic structured sets, which have complexity 1, and any subset of X which may be obtained by boolean operations from at most k of these basic structured sets A1, ..., Ak ∈ C is said to have complexity at most k according to C. We then say two functions g, h : X → R are ε-indistinguishable according to C if, for all sets A ∈ C, we have that |E[(g − h) 1_A]| ≤ ε. Intuitively, this means that we are not able to effectively distinguish between g and h by taking their empirical averages over random elements chosen from one of the basic sets in C. A function f : X → R is then said to be ε-pseudorandom if it is ε-indistinguishable from the constant function 1 on X. Thus pseudorandom functions are in some sense uniformly distributed over structured sets, mimicking random functions of mean 1 defined on X.

These concepts are closely related to the notions of pseudorandomness and indistinguishability in Computational Complexity Theory (in the non-uniform setting).
In that setting, one has a collection F of "efficiently computable" boolean functions f : X → {0, 1} (which are thought of as adversaries), and two distributions A and B on X are said to be ε-indistinguishable by F if

|P(f(A) = 1) − P(f(B) = 1)| ≤ ε  ∀f ∈ F

A distribution R is then said to be ε-pseudorandom for F if it is ε-indistinguishable from the uniform distribution U_X on X. Intuitively, this means that no adversary from the class F is able to distinguish R from U_X with non-negligible advantage. This is completely equivalent to our definitions if we identify each function f in F with its support f⁻¹(1) in X, and identify the distributions A, B with the functions g(x) := P(A = x)·|X| and h(x) := P(B = x)·|X|. Then

|P(f(A) = 1) − P(f(B) = 1)| = |E[(g − h) 1_{f⁻¹(1)}]|,

where the expectation on the right-hand side is with respect to the uniform distribution.

In our abstract decomposition theorems given in Chapter 2, it will be convenient to deal with σ-algebras on X rather than with subsets of X; since a σ-algebra on a finite space X is a finite collection of subsets of X, the intuition will be essentially the same. However, this change will make it simpler to apply tools such as the Cauchy-Schwarz inequality and Pythagoras' theorem, which will both be very important in our energy-increment arguments. Moreover, we will also require pseudorandom functions to have no correlation in average value with the structured sets, and thus be ε-indistinguishable from the zero function on X. Since the expectation is linear, this "translation" in our definition makes no important difference.

The framework as described here will be retaken in Chapter 6, when we talk about transference principles and the Dense Model Theorem.
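The ε-indistinguishability condition above is easy to make concrete on a toy finite space. The following Python sketch (all names hypothetical, uniform measure on X) checks whether two functions are ε-indistinguishable with respect to a family of basic structured sets:

```python
def indistinguishable(g, h, family, X, eps):
    """Return True if g, h : X -> R are eps-indistinguishable according
    to the family of basic structured sets: for every set A in the
    family, |E[(g - h) 1_A]| <= eps, expectation under the uniform
    measure on X.  (Toy sketch; all names are illustrative.)"""
    for A in family:
        # 1_A zeroes out everything outside A, so sum only over A.
        avg = sum(g[x] - h[x] for x in A) / len(X)
        if abs(avg) > eps:
            return False
    return True

# Toy space X = {0,...,9}; basic structured sets are the intervals.
X = list(range(10))
family = [set(range(i, j)) for i in range(10) for j in range(i + 1, 11)]
g = {x: 1.0 for x in X}                     # the constant function 1
h = {x: 1.0 + 0.1 * (-1) ** x for x in X}   # small alternating perturbation

print(indistinguishable(g, h, family, X, 0.2))  # True: interval averages nearly cancel
```

The alternating perturbation in h is invisible to interval averages (it nearly cancels on every interval), which is exactly the sense in which h is "pseudorandom relative to" this structured family.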
1.2 Remarks on notation and terminology
We will be mainly interested in very large objects, and use the usual asymptotic notation O, Ω, and Θ, with subscripts indicating parameters the implied constant is allowed to depend on. For instance, O_{α,β}(X) denotes a quantity bounded in absolute value by C_{α,β}|X| for some quantity C_{α,β} depending only on α and β. We write E_{a∈A, b∈B} to denote the expectation when a is chosen uniformly from the set A and b is chosen uniformly from the set B, both choices being independent. For real numbers a and b, we write x = a ± b to denote a − b ≤ x ≤ a + b.

Given an integer n ≥ 1, we write [n] for the set {1, ..., n}. If A is a set and k is an integer, we write \binom{A}{k} to denote the collection of all k-element subsets of A. We write A △ B to denote the symmetric difference (A \ B) ∪ (B \ A) of the sets A and B.
Formally, a graph G is given by a pair (V, E), where V is a finite set called the vertex set and E ⊆ \binom{V}{2}, called the edge set, is a subset of the (unordered) pairs of vertices. We will sometimes write uv or vu to denote an edge {u, v} ∈ E, and denote by 1_G the edge indicator function 1_G(x, y) := 1{xy ∈ E}. For any graph G, we will refer to its vertex set by V(G) and its edge set by E(G); if there is no risk of confusion, we may denote them simply by V and E. For subsets A, B ⊆ V(G), we write e_G(A, B) to denote the number of edges in G with one vertex in A and the other in B, counting twice the edges inside A ∩ B. We also write d_G(A, B) := e_G(A, B)/|A × B| to denote the edge density of the pair (A, B).

As customary, we say that H is a subgraph of G, and write H ⊆ G, if V(H) ⊆ V(G) and E(H) ⊆ E(G). Moreover, if V(H) = V(G), then we say that H is a spanning subgraph of G. If W ⊆ V(G), then the subgraph of G induced by W is the graph G[W] := (W, E(G) ∩ \binom{W}{2}). A subgraph H of G is an induced subgraph if H = G[V(H)].

The collection of all partitions of a set A is denoted by P(A). If P0 := (Vi)_{i∈[k]} ∈ P(V) is a partition of a vertex set V, we say a graph G = (V, E) is P0-partite if there are no edges inside each vertex class of P0, i.e., if e_G(Vi, Vi) = 0 for all i ∈ [k].
1.3 Examples and structure of the thesis
In Combinatorics, probably the most important and widely known decomposition result of the kind we discussed is Szemerédi's Regularity Lemma (Theorem 2.3, proven in the next chapter), already mentioned in the first part of this chapter. This lemma, in a slightly weaker earlier version, was originally used by Szemerédi to prove that all sets of integers with positive upper density contain arbitrarily long arithmetic progressions [30], a result now known as Szemerédi's Theorem.

More recently, it was used by Lovász and B. Szegedy [22] to construct limit objects for infinite sequences of graphs. The Regularity Lemma was shown to imply the compactness of a metric space of two-variable functions in which finite graphs may be naturally embedded, and thus that every sequence of graphs has a convergent subsequence in this space. The authors then showed how the compactness of this metric space may be used to prove a stronger version of the Regularity Lemma, known as the Regular Approximation Lemma, which we will prove in Section 4.1.

In Computer Science, Trevisan, Tulsiani and Vadhan [37] proved a general decomposition theorem with the same philosophy as the results presented here, and in a framework similar to the one discussed in Section 1.1 (but with a different notion of complexity, more adapted to applications in Computer Science). They used this result to show that every high-entropy distribution is indistinguishable from an efficiently samplable distribution of the same entropy. They also showed how this decomposition theorem may be used to prove an important result in Computational Complexity Theory known as Impagliazzo's Hardcore Theorem [17].

Still in the realm of Computer Science, an important theme which falls within the scope of our subject matter is that of graph property testing, introduced in the seminal paper of Goldreich, Goldwasser and Ron [13].
Let us now quickly present the main problem in this area and its connection to our philosophy and objectives; we will mention this theme again in Section 3.3 and in the introduction of Chapter 4.

Given ε > 0, we say a graph G on n vertices is ε-far from satisfying a graph property P if one needs to add and/or delete at least εn² edges of G in order to turn it into a graph that satisfies P. An ε-test for P is a randomized algorithm T making a total number of edge queries bounded by a function of ε only, and which can distinguish with probability at least 2/3 between graphs satisfying P and graphs that are ε-far from satisfying P. A graph property P is then said to be testable if, for every ε > 0, there exists an ε-test for P. The central problem in graph property testing is to determine which properties are testable, and also to devise efficient ε-tests for these properties.

To see its relation to our subject of study, suppose we have a decomposition of a graph G into a structured low-complexity component and a pseudorandom component. Intuitively, if we query a large (constant) number of randomly chosen edges from G, then with high probability we will have queried the expected proportion of edges from each of the classes in the structured component, and the effects of the pseudorandom component will be averaged out; thus a property P should be testable if and only if knowing the structured component of a graph is enough to tell whether P is satisfied (or is close to being satisfied) by that graph.

It turns out that this intuition may be formalized, and the testable graph properties were completely characterized in this sense in a remarkable paper by Alon, Fischer, Newman and Shapira [1]. Their characterization roughly says that a graph property P is testable if and only if, for every ε > 0, ε-testing P can be reduced (in a specific property-testing sense) to the property of satisfying one of finitely many Szemerédi-partition instances.
This characterization is an illustrative example of our "low-complexity decomposition" interpretation of the Regularity Lemma, and this interpretation was key in proving the characterization in [1].

Another class of results closely related to our subject matter of low-complexity decompositions is that of transference principles, which allow us to transfer some combinatorial theorems from the "dense setting" over to the "sparse setting", where the objects may be much harder to handle. To account for the vanishing density of the objects in the sparse setting, it is usual to renormalize the functions representing these objects so that they have average close to 1. This renormalization causes the functions considered to become unbounded as the size of the universe X grows, which is a major source of difficulties. The transference principles then assert that, if the sparse (unbounded) functions satisfy some mild "uniformity" conditions, then they may be modeled by bounded functions which share the key properties of the original functions.

Transference principles of this form were essential ingredients in the papers [16, 36], where the authors transferred Szemerédi's Theorem, and its generalization to polynomial progressions due to Bergelson and Leibman [4], to dense subsets of a sufficiently pseudorandom subset of the integers. Using this result, they were able to prove that the theorems mentioned above also hold for the set of primes, even though it has zero density inside the integers. In a subsequent paper [35], Tao transferred the Hypergraph Removal Lemma (see [32, 28]) to sub-hypergraphs of pseudorandom sparse hypergraphs, and then used this result to prove that the Gaussian primes contain arbitrarily shaped constellations.

In this work we will not focus on giving applications of the decomposition theorems mentioned, but rather concentrate on the abstract mathematical ideas behind these results.
These ideas may be viewed as representing a dichotomy between structure and randomness, as brilliantly advocated by Tao [34, 33], which seems to permeate many areas of Mathematics. We will focus here on the case of (finite) sets, graphs and hypergraphs, with the main interest being the case of graphs. However, our methods are presented in a general context, and may also be used in other settings.

In Chapter 2 we will present the general framework for establishing our results and the precise definitions of complexity and pseudorandomness we will use. We will then show how to apply our decomposition theorems by using them to prove a weaker form of the Regularity Lemma due to Frieze and Kannan [12], and then Szemerédi's Regularity Lemma itself.

In Chapter 3 we show how the regularity properties of the partitions given by each form of the Regularity Lemma may be used to approximate the number of copies of any fixed graph H inside a graph G. This approximate counting is then used to prove an important result in Graph Theory known as the Graph Removal Lemma, which roughly says that every graph G on n vertices having o(n^{|V(H)|}) copies of a given graph H can be made H-free by deleting o(n²) edges.

We next prove two extensions of the Regularity Lemma for graphs in Chapter 4, each made to handle a different issue that the original Regularity Lemma left unaddressed. The first extension, called the Regular Approximation Lemma (Theorem 4.1), intuitively asserts that it is possible to greatly enhance the regularity of a graph by making very few edge modifications, and so gives us better control of the pseudorandom component in terms of the complexity of the structured component. The second extension (Theorem 4.2) is a relative form of the Regularity Lemma, useful for dealing with arbitrary spanning subgraphs of a known fixed graph, and is especially useful for dealing with very sparse graphs.
We will then give in Chapter 5 the generalization of the Regularity Lemma to the setting of uniform hypergraphs, which are "higher order" versions of graphs whose edges are composed of d vertices, for some integer d ≥ 3 called the uniformity of the hypergraph. We remark that this higher number of vertices inside each edge introduces a much more intricate structure than that present in graphs, and the corresponding regularity lemma is accordingly much more involved than Szemerédi's Regularity Lemma.
In Chapter 6 we will consider transference principles, which were already discussed above, and which permit us to transfer some combinatorial theorems from the usual "positive density" setting to objects having asymptotically negligible density but satisfying some mild uniformity conditions. We will present three results in this direction, which concern different uniformity conditions imposed on the sparse objects but give similar conclusions.

Chapter 7 will be dedicated to obtaining transference results in the graph setting for L1 structure, which is stronger than the cut structure preserved by the transference principles given in Chapter 6. These results are in some sense a strengthening of the theorems of Chapter 6 when applied to the setting of graphs, and may be seen as requiring the "transference function" from the sparse space to the dense space to be continuous in the L1 norm, so as to preserve the underlying L1 geometry. The results presented in this chapter are the main original contributions of this work.

Chapter 8 then mentions some possible extensions of the results shown in Chapter 7, indicating a path for future work.
Chapter 2
Abstract decomposition theorems
This chapter is aimed at developing a general method to decompose an arbitrary object f of some kind into a sum f = fstr + fpsd, where fstr is a low-complexity structured component and fpsd behaves randomly. As mentioned in the introduction, such a decomposition is useful because we may then use different methods to analyze each of the components separately, taking advantage of their structure, thereby making the arbitrary original object f much easier to analyze. In such a decomposition we must always perform a trade-off, increasing our control on one of the components at the expense of worsening our control on the other.

In many situations, it turns out to be useful to allow a third term ferr into the decomposition f = fstr + fpsd + ferr, which can be seen as an error term and may be made sufficiently small for the application at hand. We will see in Subsection 2.4.1 that the presence of the error component is in fact essential if we wish to use this decomposition to prove Szemerédi's Regularity Lemma. The method we will use to prove such "decomposition theorems" is a simple energy-increment argument (in this form due to Tao [33]), which will be described in the next sections.
2.1 Probabilistic setting
Let (X, Bmax, P) be a probability space, and for brevity let us call a sub-σ-algebra B of Bmax simply a factor of Bmax. Given measurable sets A1, ..., Ak ∈ Bmax, we denote by σ(A1, ..., Ak) the smallest factor of Bmax which contains all these sets. Given factors B1, ..., Bk ⊆ Bmax, we denote by B1 ∨ · · · ∨ Bk the join of these factors, which is the smallest factor of Bmax containing all of them.

Given a square-integrable function f ∈ L²(Bmax) and a factor B ⊆ Bmax, we define the conditional expectation E[f|B] ∈ L²(B) as the orthogonal projection of f onto the closed subspace L²(B) of L²(Bmax) consisting of the B-measurable square-integrable functions. A simple application of Pythagoras' theorem then gives the following lemma:
Lemma 2.1 (Pythagoras' theorem). Let B ⊆ B′ be two factors of Bmax. Then for any function f ∈ L²(Bmax) we have

‖E[f|B′]‖²_{L²} = ‖E[f|B]‖²_{L²} + ‖E[f|B′] − E[f|B]‖²_{L²}

We remark that, even though the general decomposition theorems in this chapter are stated and proven in full generality, in applications we will only deal with finite probability spaces X equipped with the discrete σ-algebra Bmax = 2^X. In this restricted setting, X will be a finite set, every subset A ⊆ X will be measurable, and a factor B of Bmax may be identified with the partition of X induced by its atoms; this identification will be made throughout the rest of this work without further comment. If in addition we suppose that P is the uniform probability distribution over X, then for any partition B : X = X1 ∪ · · · ∪ Xk of X into k atoms we have

E[f|B](x) = (1/|Xi|) Σ_{y∈Xi} f(y)  whenever x ∈ Xi;

thus the conditional expectation is just an averaging of the function over the atoms of B.
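In the finite uniform setting, both the atom-averaging formula and Lemma 2.1 are easy to verify directly. The following Python sketch (a toy example, all names hypothetical) computes E[f|B] by averaging over the atoms of a partition and checks the Pythagoras identity for a partition B and a refinement B′:

```python
def cond_exp(f, partition):
    """E[f|B] on a finite uniform space: average f over each atom."""
    Ef = {}
    for atom in partition:
        avg = sum(f[x] for x in atom) / len(atom)
        for x in atom:
            Ef[x] = avg
    return Ef

def l2_norm_sq(f, X):
    """Squared L2 norm under the uniform measure on X."""
    return sum(f[x] ** 2 for x in X) / len(X)

# Toy check of Lemma 2.1: B' refines B, i.e. B ⊆ B' as σ-algebras.
X = list(range(8))
f = {x: float(x % 3) for x in X}
B  = [{0, 1, 2, 3}, {4, 5, 6, 7}]          # coarse partition
Bp = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]      # its refinement
EfB, EfBp = cond_exp(f, B), cond_exp(f, Bp)
diff = {x: EfBp[x] - EfB[x] for x in X}
lhs = l2_norm_sq(EfBp, X)
rhs = l2_norm_sq(EfB, X) + l2_norm_sq(diff, X)
print(abs(lhs - rhs) < 1e-12)  # True
```

The identity holds exactly because E[f|B′] − E[f|B] is orthogonal to L²(B): averaging the difference over any atom of B gives zero.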
As an important example, consider the case of a bipartite graph G = (V1 ∪ V2, E). We then take X = V1 × V2, Bmax = 2^{V1×V2} and the uniform probability distribution over (X, Bmax); this corresponds to picking a pair (x1, x2) ∈ V1 × V2 uniformly at random. If we have a partition B : V1 × V2 = ⋃_{i,j} Xi × Yj, then E[1_G|B](x, y) = e_G(Xi, Yj)/|Xi × Yj| is just the edge density between the classes Xi ∋ x and Yj ∋ y. For any sets A ⊆ V1, B ⊆ V2 we then have that
|E[(1_G − E[1_G]) 1_{A×B}]| = (1/|V1 × V2|) · |e_G(A, B) − (e_G(V1, V2)/|V1 × V2|)·|A × B||,

which may be seen as a kind of discrepancy of the edges over the pair (A, B) and, if made smaller than ε, resembles the usual definition of ε-regularity (which we recall below in Remark 2.1). It will therefore be more appropriate in our setting to define ε-regularity in the following less standard (but essentially equivalent) way:
Definition 2.1 (ε-regularity). A bipartite graph G = (V1 ∪ V2, E) is ε-regular for some ε > 0 if, for all sets A ⊆ V1, B ⊆ V2, we have that

|e_G(A, B) − (e_G(V1, V2)/|V1 × V2|)·|A × B|| ≤ ε|V1 × V2|   (2.1)
Similarly, a (non-bipartite) graph G = (V, E) is ε-regular if, for all A, B ⊆ V, we have

|e_G(A, B) − (2|E|/|V|²)·|A × B|| ≤ ε|V|²
Remark 2.1. The usual definition of ε-regularity requires instead the left-hand side of (2.1) to be smaller than ε|A × B| whenever |A| ≥ ε|V1| and |B| ≥ ε|V2|; this requirement implies ε-regularity in our definition, which in turn implies ε^{1/3}-regularity in the usual definition.
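Definition 2.1 can be checked by exhaustive search on very small graphs, which makes the two extremes tangible. The sketch below (a toy illustration, not an algorithm — the check is exponential in the number of vertices; all names hypothetical) verifies (2.1) for every pair A, B:

```python
from itertools import chain, combinations

def subsets(S):
    """All subsets of S, including the empty set."""
    S = list(S)
    return chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))

def is_eps_regular(V1, V2, edges, eps):
    """Brute-force check of Definition 2.1 for a small bipartite graph
    given as a set of pairs (u, v) with u in V1, v in V2."""
    d = len(edges) / (len(V1) * len(V2))   # global edge density
    bound = eps * len(V1) * len(V2)        # right-hand side ε|V1 × V2|
    for A in subsets(V1):
        for B in subsets(V2):
            e_AB = sum((u, v) in edges for u in A for v in B)
            if abs(e_AB - d * len(A) * len(B)) > bound:
                return False
    return True

V1, V2 = [0, 1, 2], ["a", "b", "c"]
complete = {(u, v) for u in V1 for v in V2}   # density 1: zero discrepancy
star = {(0, v) for v in V2}                   # all edges meet one vertex
print(is_eps_regular(V1, V2, complete, 0.05))  # True
print(is_eps_regular(V1, V2, star, 0.05))      # False: A = {0}, B = V2 witnesses
```

The complete bipartite graph is ε-regular for every ε > 0, while the star is witnessed irregular by the pair A = {0}, B = V2, whose edge count far exceeds the global density prediction.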
2.2 Structure and pseudorandomness
We will now define the notions of structure and pseudorandomness we will need for the decomposition theorems in the next two sections; these notions were first introduced by Tao in [33]. The basic structured objects will be factors of Bmax belonging to a collection S fixed at the beginning. These factors are supposed to be of low complexity and represent the information we can efficiently obtain about our random variables. It is from them that we define the complexity of other objects.

Remark 2.2. As explained in Section 1.1, the idea was to define the complexity of arbitrary sets in our space X by how they relate to some "basic structured sets" belonging to a given family C. While this is indeed the spirit of our definitions, in order to apply the energy-increment method it is better to work with factors of Bmax instead of subsets of X; this is why the basic objects with which we define complexity are factors instead of sets. It may be instructive to think of these basic factors in S as each being generated by a single basic structured set in C.
Definition 2.2 (Complexity). We say that a factor B ⊆ Bmax has S-complexity at most M, and denote this by complexS (B) ≤ M, if it may be written as the join B = Y1 ∨ · · · ∨ Ym of m factors Yi ∈ S for some m ≤ M.
Definition 2.3 (Pseudorandomness). Given ε > 0, we say that a function f ∈ L²(Bmax) is ε-pseudorandom according to S if ‖E[f|Y]‖_{L²} ≤ ε holds for all Y ∈ S.

Intuitively, a function f : X → R is pseudorandom if it has negligible correlation with all structured factors. This may be seen by using the Cauchy-Schwarz inequality: for every set A ⊆ X which is measurable with respect to some structured factor Y ∈ S, we have
|E[f 1_A]| = |E[E[f|Y] 1_A]| ≤ ‖E[f|Y]‖_{L²} ‖1_A‖_{L²} ≤ ‖E[f|Y]‖_{L²},   (2.2)

and so |E[f 1_A]| ≤ ε if f is ε-pseudorandom.
2.3 Weak decompositions
To ease the presentation, we will give in this chapter the general theorems for "one-dimensional" functions, i.e., functions taking values in R. However, this restriction plays no role in the proofs, and the theorems may be straightforwardly generalized to functions taking values in R^k. We will say more about this generalization in a later chapter, where we will need a "multi-dimensional" structure theorem in order to prove the Hypergraph Regularity Lemma.

From the definitions presented in the last section, we get the following energy-increment result:
Lemma 2.2 (Lack of pseudorandomness implies energy increment [33]). Let f ∈ L²(Bmax), ε > 0 and B ⊆ Bmax be such that f − E[f|B] is not ε-pseudorandom. Then there exists a factor Y ∈ S such that

‖E[f|B ∨ Y]‖²_{L²} > ‖E[f|B]‖²_{L²} + ε²
Proof. By hypothesis, there exists Y ∈ S such that ‖E[f − E[f|B] | Y]‖²_{L²} > ε². Since Y ⊆ B ∨ Y, by Pythagoras' theorem this implies that

‖E[f − E[f|B] | B ∨ Y]‖²_{L²} > ε²

By Pythagoras' theorem again, we have

‖E[f − E[f|B] | B ∨ Y]‖²_{L²} = ‖E[f|B ∨ Y] − E[f|B]‖²_{L²} = ‖E[f|B ∨ Y]‖²_{L²} − ‖E[f|B]‖²_{L²},

and so ‖E[f|B ∨ Y]‖²_{L²} > ‖E[f|B]‖²_{L²} + ε².

To draw a parallel with Graph Theory, note that in the bipartite graph setting, if B : V1 × V2 = ⋃_{i,j} Xi × Yj is a (product) partition of V1 × V2, then

‖E[1_G|B]‖²_{L²} = Σ_{i,j} (|Xi||Yj| / (|V1||V2|)) · (e_G(Xi, Yj)/|Xi × Yj|)²

is the so-called index of the partition B, which is an essential ingredient in the usual proof of the Regularity Lemma; here, it will represent the "energy" we wish to maximize, and is at the core of our energy-increment arguments.

By a simple iteration of Lemma 2.2, we easily obtain a "weak" decomposition theorem (which, following Tao, we will call the Weak Structure Theorem):
Lemma 2.3 (Weak Structure Theorem [33]). Let f ∈ L²(Bmax) be such that ‖f‖_{L²} ≤ 1, let B be a factor of Bmax and let 0 < ε ≤ 1. Then there exists a decomposition f = fstr + fpsd where:

• fstr = E[f|B ∨ Z] for some factor Z of S-complexity less than 1/ε²

• fpsd is ε-pseudorandom according to S
Proof. We will choose factors Y1, Y2, ..., Ym ∈ S, for some m < 1/ε², using the following algorithm (which relies on Lemma 2.2):

– Step 0: Initialize i = 0

– Step 1: Define Z := Y1 ∨ · · · ∨ Yi, fstr := E[f|B ∨ Z] and fi := f − fstr

– Step 2: If fi is ε-pseudorandom, let fpsd := fi and STOP. Otherwise, choose Yi+1 ∈ S such that ‖E[f|B ∨ Z ∨ Yi+1]‖²_{L²} > ‖E[f|B ∨ Z]‖²_{L²} + ε²

– Step 3: Increment i to i + 1 and return to Step 1

Since the energy ‖E[f|B ∨ Z]‖²_{L²} is bounded between 0 and 1 (by the hypothesis ‖f‖_{L²} ≤ 1) and increases by more than ε² at each iteration, the algorithm must terminate in fewer than 1/ε² iterations, and the lemma follows.
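Steps 0–3 above translate almost line by line into code. The following Python sketch is a toy instantiation (all names hypothetical) on a finite uniform space, where each basic structured set A generates the two-atom factor {A, X \ A}:

```python
def cond_exp(f, partition):
    """E[f|B] on a finite uniform space: average f over each atom."""
    Ef = {}
    for atom in partition:
        avg = sum(f[x] for x in atom) / len(atom)
        for x in atom:
            Ef[x] = avg
    return Ef

def join(P, Q):
    """Common refinement of two partitions (join of their σ-algebras)."""
    return [a & b for a in P for b in Q if a & b]

def weak_structure(f, X, basic_sets, eps):
    """Energy-increment loop of Lemma 2.3.  While the residual
    f - E[f|Z] correlates with some basic factor (the L2 norm of its
    projection exceeds eps), refine Z by that factor."""
    Z = [set(X)]                          # Step 0: trivial factor
    while True:
        f_str = cond_exp(f, Z)            # Step 1
        resid = {x: f[x] - f_str[x] for x in X}
        violator = None
        for A in basic_sets:              # Step 2: test pseudorandomness
            Y = [S for S in (set(A), set(X) - set(A)) if S]
            proj = cond_exp(resid, Y)
            if (sum(p * p for p in proj.values()) / len(X)) ** 0.5 > eps:
                violator = Y
                break
        if violator is None:
            return Z, f_str, resid        # f = f_str + f_psd
        Z = join(Z, violator)             # Step 3: energy rises by > eps^2

# f is the indicator of {0,1,2,3}; a single refinement step suffices,
# after which the residual is identically zero.
X = list(range(8))
f = {x: 1.0 if x < 4 else 0.0 for x in X}
Z, f_str, f_psd = weak_structure(f, X, [{0, 1, 2, 3}], 0.1)
print(len(Z), max(abs(v) for v in f_psd.values()))  # 2 0.0
```

The bounded-energy argument of the proof is what guarantees this loop terminates: each refinement strictly increases ‖E[f|Z]‖², which cannot exceed ‖f‖².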
2.3.1 Weak Regularity Lemma

As a simple and important application of Lemma 2.3, we will prove Frieze and Kannan's Weak Regularity Lemma:

Theorem 2.1 (Weak Regularity Lemma [12]). For every ε > 0 and every graph G = (V, E), there exists a partition P : V = V1 ∪ · · · ∪ Vk of V into k ≤ 2^{2/ε²} parts satisfying

|e_G(A, B) − Σ_{i,j∈[k]} (e_G(Vi, Vj)/|Vi × Vj|)·|A ∩ Vi||B ∩ Vj|| ≤ ε|V|²,  ∀A, B ⊆ V   (2.3)
Remark 2.3. A partition P satisfying (2.3) is called a weak ε-regular partition of V.

Proof. This is basically a restatement of the Weak Structure Theorem specialized to the graph setting. The probability space here is given by (V × V, 2^{V×V}, P), with P being the uniform probability distribution over V × V. We define the structured family S := {σ(A × V, V × B) : A, B ⊆ V}, which is chosen so that any product set A × B ⊆ V × V is measurable in some factor Y ∈ S, and any factor Y ∈ S has only product sets as its atoms. By the Weak Structure Theorem applied to the edge indicator function 1_G and the trivial σ-algebra B = {∅, V × V}, we obtain a factor Z of S-complexity at most 1/ε² for which

‖E[1_G − E[1_G|Z] | Y]‖_{L²} ≤ ε  ∀Y ∈ S

By construction, the factor Z will be a product σ-algebra, and each "coordinate" of Z induces a partition of V into at most 2^{1/ε²} atoms; we refine these partitions and so obtain a single partition P : V = V1 ∪ · · · ∪ Vk into k ≤ 2^{2/ε²} parts. Since for any sets A, B ⊆ V there exists a structured factor Y ∈ S for which A × B ∈ Y, by Cauchy-Schwarz we obtain

max_{A,B⊆V} |E[(1_G − E[1_G|Z]) 1_{A×B}]| = max_{A,B⊆V} |E[(E[1_G − E[1_G|Z] | σ(A × V, V × B)]) 1_{A×B}]| ≤ max_{Y∈S} ‖E[1_G − E[1_G|Z] | Y]‖_{L²} ≤ ε

We then finish the proof by noting that

E[(1_G − E[1_G|Z]) 1_{A×B}] = (1/|V|²) · (e_G(A, B) − Σ_{i,j∈[k]} (e_G(Vi, Vj)/|Vi × Vj|)·|A ∩ Vi||B ∩ Vj|)
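The quantity controlled by (2.3) can be computed exhaustively on tiny graphs, which makes the statement concrete. The sketch below (toy illustration only — exponential in |V|; all names hypothetical) evaluates the worst-case deviation of the density-matrix approximation over all pairs A, B:

```python
from itertools import chain, combinations

def subsets(S):
    S = list(S)
    return chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))

def weak_discrepancy(V, edges, parts):
    """Largest deviation in (2.3) over all A, B ⊆ V, by brute force.
    `edges` holds ordered pairs in both orientations, so e_G(A, B) is
    counted as in the text (edges inside A ∩ B counted twice)."""
    def e(A, B):
        return sum((u, v) in edges for u in A for v in B)
    dens = {(i, j): e(Vi, Vj) / (len(Vi) * len(Vj))
            for i, Vi in enumerate(parts) for j, Vj in enumerate(parts)}
    worst = 0.0
    for A in subsets(V):
        for B in subsets(V):
            approx = sum(dens[i, j] * len(set(A) & Vi) * len(set(B) & Vj)
                         for i, Vi in enumerate(parts)
                         for j, Vj in enumerate(parts))
            worst = max(worst, abs(e(A, B) - approx))
    return worst

# Complete bipartite graph with its two sides as the partition: the
# density matrix reproduces e_G(A, B) exactly, so the discrepancy is 0.
P1, P2 = {0, 1}, {2, 3}
V = sorted(P1 | P2)
edges = {(u, v) for u in P1 for v in P2} | {(v, u) for u in P1 for v in P2}
print(weak_discrepancy(V, edges, [P1, P2]))  # 0.0
```

For the complete bipartite graph, the partition into its two sides is weak ε-regular for every ε > 0; a partition into equal-sized parts cutting across the two sides would instead leave a large discrepancy.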
2.4 Strong decompositions
The Weak Structure Theorem (Lemma 2.3) already gives an interesting and non-trivial decom- position result, but its applications are limited because the pseudorandomness of the component fpsd is relatively weak compared to the complexity bound we have for the component fstr. As already noted before, the way to increase our control on both of these terms simultaneously is to allow for a small error term ferr on the decomposition. This is done in the following theorem:
Theorem 2.2 (Strong Structure Theorem [33]). Let f ∈ L²(Bmax) be such that ‖f‖_{L²} ≤ 1, let 0 < ε ≤ 1 and let F : R₊ → R₊ be an arbitrary increasing function. Then there exists an integer M = O_{ε,F}(1) and a decomposition f = fstr + fpsd + ferr such that:

• fstr = E[f|B], for some factor B of S-complexity at most M

• fpsd = f − E[f|B′], where B ⊆ B′ ⊆ Bmax, is 1/F(M)-pseudorandom

• ferr = E[f|B′] − E[f|B] satisfies ‖ferr‖_{L²} ≤ ε

Proof. By increasing F if necessary, we may assume that F(x) ≥ x + 1 for all x ∈ R₊, and that F is strictly increasing.
We will recursively define real numbers M₀ < M₁ < M₂ < ⋯ and factors B₀ ⊆ B₁ ⊆ B₂ ⊆ ⋯ ⊆ B_max in the following way. First, set M₀ := 0 and B₀ := {∅, X}. Then, for every i ≥ 1, use Lemma 2.3 with ε being 1/F(M_{i−1}) and B being B_{i−1} to obtain a factor Z_i of S-complexity at most F(M_{i−1})² such that f − E[f|B_{i−1} ∨ Z_i] is 1/F(M_{i−1})-pseudorandom; set then M_i := M_{i−1} + F(M_{i−1})² and B_i := B_{i−1} ∨ Z_i. Note that complex_S(B_i) ≤ M_i for all i ≥ 0.

By Pythagoras' theorem and the hypothesis ‖f‖_{L²} ≤ 1, the energy ‖E[f|B_i]‖²_{L²} is increasing and bounded between 0 and 1. By the pigeonhole principle, we may then find an index 1 ≤ j ≤ 1/ε² such that

‖E[f|B_j]‖²_{L²} − ‖E[f|B_{j−1}]‖²_{L²} ≤ ε²

By Pythagoras' theorem, this implies that ‖E[f|B_j] − E[f|B_{j−1}]‖_{L²} ≤ ε. We may then set f_str := E[f|B_{j−1}], f_psd := f − E[f|B_j], M := M_{j−1} and f_err := E[f|B_j] − E[f|B_{j−1}] to obtain the claim.

We remark that the upper bound we get for M in this theorem is extremely large. Indeed, this bound is obtained by iteratively applying at most 1/ε² times the transformation x ↦ x + F(x)², starting from x = 0. If F is an exponential function (as will be the case in the proof of Szemerédi's Regularity Lemma), then we obtain an exponential tower of height Θ(ε⁻²). Unfortunately, as we will briefly discuss in Remark 2.5 at the end of this chapter, these terrible bounds cannot be improved in general.
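To get a feeling for how these bounds arise, the recursion M_i = M_{i−1} + F(M_{i−1})² can be simulated directly. The following Python sketch is purely illustrative; the exponential choice F(x) = 2^x is a stand-in for the kind of function used later in the proof of the Regularity Lemma, not the exact one.

```python
def strong_structure_bound(F, steps):
    """Iterate x -> x + F(x)**2 starting from x = 0, as in the proof of the
    Strong Structure Theorem; `steps` plays the role of 1/eps**2."""
    xs = [0]
    for _ in range(steps):
        xs.append(xs[-1] + F(xs[-1]) ** 2)
    return xs

# With an exponential F the iterates already behave like a tower of 2's:
xs = strong_structure_bound(lambda x: 2 ** x, 4)  # [0, 1, 5, 1029, 1029 + 2**2058]
```

After only four steps the bound has roughly 620 decimal digits, which is the tower-type growth described above.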
2.4.1 Szemerédi Regularity Lemma

Using the Strong Structure Theorem, we may now prove Szemerédi's celebrated Regularity Lemma:
Theorem 2.3 (Szemerédi Regularity Lemma [31]). For every ε > 0 and k₀ ≥ 1, there exists an integer K₀ such that the following holds. Every graph G = (V, E) with |V| ≥ K₀ admits a partition P : V = V₀ ∪ V₁ ∪ ⋯ ∪ V_k of its vertex set with the following properties:
• k0 ≤ k ≤ K0
• |V₀| < ε|V| and |V₁| = |V₂| = ⋯ = |V_k|

• all but at most εk² of the pairs (V_i, V_j) are ε-regular
Remark 2.4. The set V0 is called the exceptional set, and a partition P satisfying the second and third properties above is called an ε-regular partition of V .
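For intuition, ε-regularity of a single pair can be verified by brute force on very small examples. The following Python sketch is illustrative only: the check is exponential in |V_i| + |V_j|, and the notion tested is the one spelled out in the proof below, namely |e_G(A, B) − d(V_i, V_j)|A||B|| ≤ ε|V_i||V_j| for all A ⊆ V_i, B ⊆ V_j.

```python
from itertools import chain, combinations

def subsets(s):
    """All subsets of s, as tuples."""
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def count_edges(E, A, B):
    """Number of pairs (a, b) in A x B with {a, b} an edge of E."""
    return sum(1 for a in A for b in B if frozenset((a, b)) in E)

def is_regular(E, Vi, Vj, eps):
    """Brute-force check that the pair (Vi, Vj) is eps-regular."""
    d = count_edges(E, Vi, Vj) / (len(Vi) * len(Vj))
    bound = eps * len(Vi) * len(Vj)
    return all(abs(count_edges(E, A, B) - d * len(A) * len(B)) <= bound
               for A in subsets(Vi) for B in subsets(Vj))
```

A complete bipartite pair passes even with ε = 0, while a "half graph"-type pair (edges a_i b_j exactly when i ≤ j) fails for small ε.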
Proof. We will apply the Strong Structure Theorem in the graph setting: X = V × V, B_max = 2^{V×V} and uniform probability distribution P; this is the space generated by selecting pairs of vertices (x, y) from V² independently and uniformly at random. The set of basic structured objects will be S := {σ(A × V, V × B) : A, B ⊆ V}, which represents the information of whether vertex x belongs to a subset A and whether y belongs to a subset B of V. Define the error parameter α and the function F by

α := ε^{3/2}/8,  F(x) := (8/ε)(k₀ + 2^{2x}/α)²  (2.4)

Applying the Strong Structure Theorem to the edge indicator function 1_G, with ε substituted by α and the function F as defined above, we obtain an integer M = O_{α,F}(1) and a pair of factors B ⊆ B′ ⊆ 2^{V×V} such that:

– complex_S(B) ≤ M
– ‖E[1_G − E[1_G|B′] | Y]‖_{L²} ≤ 1/F(M)  ∀Y ∈ S
– ‖E[1_G|B′] − E[1_G|B]‖_{L²} ≤ α
We note that, for any function f and any Y-measurable set Y, we can apply Cauchy-Schwarz to obtain

|E[f 1_Y]| = |E[E[f|Y] 1_Y]| ≤ ‖E[f|Y]‖_{L²} ‖1_Y‖_{L²} ≤ ‖E[f|Y]‖_{L²}

By the definition of the set S, the second property then implies that

|E[(1_G − E[1_G|B′]) 1_{A×B}]| ≤ 1/F(M)  ∀A, B ⊆ V  (2.5)
By construction, the factor B is a product σ-algebra and each coordinate of B is generated by at most complex_S(B) ≤ M sets. Thus, each coordinate of B induces a partition of V into at most 2^M parts, and their common refinement (which we will call P) partitions V into at most 2^{2M} atoms. We further refine this partition into a new, more "equitable" partition V = V₀ ∪ V₁ ∪ ⋯ ∪ V_k in such a way that:

– k₀ ≤ k = O_{M,α}(1)
– |V₀| < α|V| and |V₁| = |V₂| = ⋯ = |V_k|
– each non-exceptional set V_i is entirely contained within an atom of P

This may be accomplished by setting k := max{k₀, ⌈2^{2M}/α⌉}, partitioning V greedily inside the atoms of P, and uniting all remaining vertices into the exceptional set V₀; this way we have that

|V₀| ≤ (⌈|V|/k⌉ − 1)·2^{2M} < α|V|  and  |V_i| = (|V| − |V₀|)/k ≥ (1 − α)|V|/k  ∀i ∈ [k]
As each V_i × V_j is contained inside an atom of B, we see that E[1_G|B] is constant over V_i × V_j for all i, j ∈ [k]; let d_{i,j} denote this value. To show that the pair (V_i, V_j) is ε-regular, that is,

|e_G(A, B) − (e_G(V_i, V_j)/|V_i × V_j|)·|A × B|| ≤ ε|V_i × V_j|  ∀A ⊆ V_i, B ⊆ V_j,

by the triangle inequality it suffices to show that

|e_G(A, B) − d_{i,j}|A × B|| ≤ (ε/2)|V_i × V_j|  ∀A ⊆ V_i, B ⊆ V_j

Dividing by |V|², this is equivalent to

|E[(1_G − E[1_G|B]) 1_{A×B}]| ≤ ε|V_i × V_j|/(2|V|²)

Since |V_i × V_j| ≥ |V|²/2k², it suffices to show that |E[(1_G − E[1_G|B]) 1_{A×B}]| ≤ ε/4k². By our choice of F and inequality (2.5), we have

|E[(1_G − E[1_G|B′]) 1_{A×B}]| ≤ (ε/8)(k₀ + 2^{2M}/α)^{−2} ≤ ε/8k²,

so by the triangle inequality it is sufficient to prove that

|E[(E[1_G|B′] − E[1_G|B]) 1_{A×B}]| ≤ ε/8k²

Because ‖1_{A×B}‖²_{L²} ≤ ‖1_{V_i×V_j}‖²_{L²} ≤ 1/k², by Cauchy-Schwarz it suffices to show that

E[(E[1_G|B′] − E[1_G|B])² 1_{V_i×V_j}] ≤ ε²/64k²

This last inequality must be satisfied by all but at most εk² pairs (V_i, V_j), since otherwise we would have

E[(E[1_G|B′] − E[1_G|B])²] > εk² · ε²/64k² = ε³/64 = α²,

contradicting the fact that ‖E[1_G|B′] − E[1_G|B]‖_{L²} ≤ α, and finishing the proof.
Repeating the same proof but starting with a non-trivial partition of the vertex set into at most k₀ parts, we can make sure that the final regular partition of the graph refines a given partition, which is important in some applications of the Regularity Lemma (especially when dealing with partite graphs). Also, by applying this theorem with a somewhat smaller value of ε and then equitably redistributing the vertices of the exceptional set V₀ among the other parts, we can easily obtain a (nearly) equitable partition of V without an exceptional class which satisfies the same regularity conditions. We present these simple remarks as a corollary below:
Corollary 2.1. For every ε > 0 and k0 ≥ 1, there exists an integer K0 such that the following holds. For every graph G = (V,E) and every equitable partition P0 of V into at most k0 parts, there exists a partition P : V = V1 ∪ V2 ∪ · · · ∪ Vk which refines P0 and satisfies the following properties:
• k0 ≤ k ≤ K0
• ||Vi| − |Vj|| ≤ 1 for all i, j ∈ [k]
• all but at most εk² of the pairs (V_i, V_j) are ε-regular

Observe that our proof of the Szemerédi Regularity Lemma relied on the decomposition 1_G = f_str + f_psd + f_err given by the Strong Structure Theorem (Theorem 2.2). As noted in the beginning of Chapter 1, the structured component f_str = E[1_G|B] gave us our regular partition of the vertex set and the edge densities d_{i,j} between their classes, while the pseudorandom component f_psd = 1_G − E[1_G|B′] gave us the random-like distribution of the actual edges of G inside pairs of classes in a finer partition given by B′. Here we also have the error term f_err, which essentially gives the difference in edge densities between pairs of the regular partition we constructed and those of the finer partition given by B′, and it is this term which is responsible for the (possible) existence of up to εk² irregular pairs. It is then a natural question whether the presence of these irregular pairs is truly necessary or may be eliminated.

It turns out that these irregular pairs are necessary: for a given n ∈ ℕ, let us define the half graph on 2n vertices as the bipartite graph H_n with vertex classes A = {a₁, …, a_n} and B = {b₁, …, b_n}, in which a_i b_j is an edge if and only if i ≤ j. As was noted in [3], the half graphs give us an infinite family of graphs for which every ε-regular partition of their vertex sets into k parts must contain many irregular pairs (at least ck, for some c > 0 depending only on ε). As the presence of irregular pairs comes from the error component f_err, this justifies our claim made at the beginning of this chapter that the existence of the error component is necessary if we wish to prove the Regularity Lemma.

Remark 2.5. As already noted above, because the function F chosen in the proof of the Regularity Lemma has exponential growth, the upper bound K₀ we obtain for the number of parts in the regular partition is an exponential tower of height Θ(ε⁻²).
By an ingenious probabilistic construction, Gowers [15] was able to prove a lower bound on K₀ which was a tower of exponents of height Θ(ε^{−1/48}). More recently, Fox and Miklós Lovász [10] were able to improve this bound and show that the upper bound given by our proof is in fact tight, as there exist graphs where any ε-regular partition must have a number of parts at least as big as a tower of exponents of height Θ(ε⁻²).
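The half graph H_n from the discussion above is easy to construct explicitly; the following Python sketch is illustrative only, encoding the vertices as ('a', i) and ('b', j).

```python
def half_graph(n):
    """Half graph H_n: vertex classes a_1..a_n and b_1..b_n, with a_i b_j an
    edge if and only if i <= j.  Returns the edge set."""
    return {(('a', i), ('b', j)) for i in range(1, n + 1)
                                 for j in range(1, n + 1) if i <= j}

E = half_graph(4)  # n(n+1)/2 = 10 edges
```

The nested "staircase" structure of these edge sets is exactly what forces irregular pairs in every regular partition.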
Chapter 3
Counting subgraphs and the Graph Removal Lemma
The regularity lemmas we have seen may be viewed as approximation results stating that every graph G can be approximated by some structure of bounded complexity. This structure is essentially a “rounded” version of G, and may be identified with a weighted graph on the same vertex set as G whose weights are given by the edge densities between the classes of the regular partition. With this interpretation in mind, we will use the following notation:
Definition 3.1 (Rounded graph). If G = (V, E) is a graph and P : V = V₁ ∪ ⋯ ∪ V_k is a partition of V, we denote by G_P := E[1_G | P ⊗ P] the function which maps each pair (x, y) ∈ V² to the edge density d_G(V_i, V_j) between the classes V_i ∋ x and V_j ∋ y.

The notion of approximation given by the regularity lemmas is related to indistinguishability by cuts. More precisely, following the discussion in Section 1.1, we say that G and G_P are ε-indistinguishable by cuts if

|E_{x,y∈V}[(1_G(x, y) − G_P(x, y)) 1_{A×B}(x, y)]| ≤ ε

holds for all cuts A × B ⊆ V × V. This suggests working with the following norm, which will greatly simplify the presentation and the proofs of some results to follow:
Definition 3.2 (Cut norm). Let V₁, V₂ be (not necessarily distinct) finite sets. Given any function f : V₁ × V₂ → ℝ, we define the cut norm of f as

‖f‖_□ := max_{A⊆V₁, B⊆V₂} |E_{x∈V₁, y∈V₂}[f(x, y) 1_{A×B}(x, y)]|  (3.1)

It is easy to see that the cut norm is indeed a norm. Using this notation, our definition of ε-regularity of a graph G is equivalent to the inequality ‖1_G − E[1_G]‖_□ ≤ ε. Moreover, a partition P of V(G) is weak ε-regular for G if and only if ‖1_G − G_P‖_□ ≤ ε.

One can easily obtain the following equivalent expression for the cut norm, which will prove to be useful below:

‖f‖_□ = max_{a:V₁→[0,1], b:V₂→[0,1]} |E_{x∈V₁, y∈V₂}[f(x, y) a(x) b(y)]|  (3.2)

Indeed, since the expectation above is bilinear in a and b, the extrema occur when a and b are {0, 1}-valued, and so (3.1) and (3.2) are equivalent.

The counting lemmas proven in the next sections are standard in Graph Theory, and concern approximately counting copies of a fixed graph H inside a large graph G using only the information given by the rounded graph G_P (we define a copy of H in G as a subgraph of G which is isomorphic to H).
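For very small vertex sets, definition (3.1) can be evaluated exactly by exhausting all pairs (A, B); the following Python sketch is purely illustrative of the definition (the maximization is NP-hard in general).

```python
from itertools import chain, combinations

def cut_norm(f, V1, V2):
    """Brute-force cut norm (3.1) of f : V1 x V2 -> R, i.e. the maximum of
    |E_{x,y}[f(x, y) 1_{A x B}(x, y)]| over all A in V1, B in V2."""
    def subsets(s):
        s = list(s)
        return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))
    n = len(V1) * len(V2)
    return max(abs(sum(f(x, y) for x in A for y in B)) / n
               for A in subsets(V1) for B in subsets(V2))
```

On the 2 × 2 "checkerboard" function f(x, y) = (−1)^{x+y} the cut norm is 1/4 while the L¹ norm is 1, illustrating the cancellation inside cuts that this norm measures.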
3.1 Counting subgraphs globally
Instead of counting copies of a subgraph H in G, it will be more convenient (and essentially equivalent when G is large) to count homomorphisms from H to G.

Definition 3.3 (Homomorphism). Given two graphs G and H, a homomorphism from H to G is a map ϕ : V(H) → V(G) which preserves adjacency between vertices:

∀x, y ∈ V(H): {x, y} ∈ E(H) ⇒ {ϕ(x), ϕ(y)} ∈ E(G)
We denote the number of homomorphisms from H to G by hom(H,G).
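The quantity hom(H, G) can be computed directly from the definition by enumerating all |V(G)|^{|V(H)|} maps; the following brute-force Python sketch is illustrative only and is feasible only for very small graphs.

```python
from itertools import product

def hom(H_vertices, H_edges, G_vertices, G_edges):
    """Count homomorphisms from H to G: maps phi such that
    {phi(x), phi(y)} is an edge of G whenever {x, y} is an edge of H."""
    adj = {frozenset(edge) for edge in G_edges}
    count = 0
    for image in product(G_vertices, repeat=len(H_vertices)):
        phi = dict(zip(H_vertices, image))
        if all(frozenset((phi[x], phi[y])) in adj for (x, y) in H_edges):
            count += 1
    return count
```

For example, hom(K₂, G) = 2|E(G)| (each edge is hit in both orientations), and hom(K₃, K₃) = 6, the automorphisms of the triangle.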
Denoting by n the number of vertices in G and by h the number of vertices in H, we note that hom(H, G) differs from the number of labeled copies of H in G by at most n^{h−1}, which becomes negligible when compared to n^h as n grows large; this justifies our claim that counting copies of H in a large graph G is essentially equivalent to counting homomorphisms from H to G.

We then have the following lemma, which roughly says we can count copies of H inside G up to an o(n^h) additive error by only knowing a weak o(1)-regular partition for G:

Lemma 3.1 (Global Counting Lemma). Let H be a graph with vertex set V(H) = [h] and ε > 0 be a positive number. Then for any graph G = (V, E) with |V| = n and any partition P = (V_i)_{i∈[k]} of V which is weak ε-regular for G, we have

hom(H, G) = Σ_{φ:[h]→[k]} ∏_{ij∈E(H)} d_G(V_{φ(i)}, V_{φ(j)}) ∏_{i∈[h]} |V_{φ(i)}| ± ε|E(H)|n^h,

where the sum is over all functions from [h] to [k].
Proof. By definition, for any x₁ ∈ V_{φ(1)}, x₂ ∈ V_{φ(2)}, …, x_h ∈ V_{φ(h)} and any i, j ∈ [h], we have G_P(x_i, x_j) = d_G(V_{φ(i)}, V_{φ(j)}). It follows that

∏_{ij∈E(H)} d_G(V_{φ(i)}, V_{φ(j)}) ∏_{i∈[h]} 1_{x_i∈V_{φ(i)}} = ∏_{ij∈E(H)} G_P(x_i, x_j) ∏_{i∈[h]} 1_{x_i∈V_{φ(i)}}  (3.3)

We note also that

∏_{i∈[h]} |V_{φ(i)}| = Σ_{x₁,…,x_h∈V} ∏_{i∈[h]} 1_{x_i∈V_{φ(i)}}  (3.4)

Using equations (3.3) and (3.4), we obtain

Σ_{φ:[h]→[k]} ∏_{ij∈E(H)} d_G(V_{φ(i)}, V_{φ(j)}) ∏_{i∈[h]} |V_{φ(i)}|
  = Σ_{φ:[h]→[k]} ∏_{ij∈E(H)} d_G(V_{φ(i)}, V_{φ(j)}) Σ_{x₁,…,x_h∈V} ∏_{i∈[h]} 1_{x_i∈V_{φ(i)}}
  = Σ_{φ:[h]→[k]} Σ_{x₁,…,x_h∈V} ∏_{ij∈E(H)} G_P(x_i, x_j) ∏_{i∈[h]} 1_{x_i∈V_{φ(i)}}
  = Σ_{x₁,…,x_h∈V} Σ_{φ:[h]→[k]} ∏_{i∈[h]} 1_{x_i∈V_{φ(i)}} ∏_{ij∈E(H)} G_P(x_i, x_j)
  = Σ_{x₁,…,x_h∈V} ∏_{ij∈E(H)} G_P(x_i, x_j)
It then suffices to prove the inequality

|E_{x₁,…,x_h∈V}[ ∏_{ij∈E(H)} 1_G(x_i, x_j) − ∏_{ij∈E(H)} G_P(x_i, x_j) ]| ≤ ε|E(H)|
We will prove this by a simple telescoping sum argument, which was given in [5]. Let us arbitrarily label the edges of H as {e₁, e₂, …, e_{|E(H)|}}, and write x_e := (x_i, x_j) for an edge e = ij; we then have the identity

∏_{ij∈E(H)} 1_G(x_i, x_j) − ∏_{ij∈E(H)} G_P(x_i, x_j) = Σ_{t=1}^{|E(H)|} ( ∏_{r=1}^{t−1} G_P(x_{e_r}) ) (1_G(x_{e_t}) − G_P(x_{e_t})) ∏_{s=t+1}^{|E(H)|} 1_G(x_{e_s})
Suppose for notational convenience that e_t = {1, 2}; then for any fixed x₃, …, x_h ∈ V we have

|E_{x₁,x₂∈V}[ ( ∏_{r=1}^{t−1} G_P(x_{e_r}) ) (1_G(x₁, x₂) − G_P(x₁, x₂)) ∏_{s=t+1}^{|E(H)|} 1_G(x_{e_s}) ]| ≤ |E_{x₁,x₂∈V}[(1_G(x₁, x₂) − G_P(x₁, x₂)) a_t(x₁) b_t(x₂)]|,

where a_t and b_t are the [0, 1]-valued functions given by

a_t(x₁) := ∏_{r<t : 1∈e_r} G_P(x_{e_r}) ∏_{s>t : 1∈e_s} 1_G(x_{e_s})  and  b_t(x₂) := ∏_{r<t : 2∈e_r} G_P(x_{e_r}) ∏_{s>t : 2∈e_s} 1_G(x_{e_s})

(the factors corresponding to edges containing neither vertex 1 nor vertex 2 are constants in [0, 1] once x₃, …, x_h are fixed, and dropping them can only increase the absolute value).
By hypothesis we have that ‖1_G − G_P‖_□ ≤ ε, so by equation (3.2) the expression on the right is at most ε for all fixed x₃, …, x_h. Applying this same reasoning to every edge {i, j} ∈ E(H) and using the triangle inequality, we obtain

|E_{x₁,…,x_h∈V}[ ∏_{ij∈E(H)} 1_G(x_i, x_j) − ∏_{ij∈E(H)} G_P(x_i, x_j) ]|
  ≤ Σ_{t=1}^{|E(H)|} |E_{x₁,…,x_h∈V}[ ( ∏_{r=1}^{t−1} G_P(x_{e_r}) ) (1_G(x_{e_t}) − G_P(x_{e_t})) ∏_{s=t+1}^{|E(H)|} 1_G(x_{e_s}) ]|
  ≤ ε|E(H)|
Using this lemma and a new efficient algorithm for finding a weak regular partition of a graph, Fox, Miklós Lovász and Zhao [11] recently obtained a deterministic algorithm running in time O(ε^{−O_H(1)} n²) which, for any given graph G on n vertices, finds the number of copies of H in G up to an error of at most εn^h. Note that the bounds obtained by the Weak Regularity Lemma (Theorem 2.1) are exponential in 1/ε, while the running time of the algorithm mentioned is polynomial in 1/ε (for any fixed graph H). This is possible because the weak ε-regular partition obtained in the proof of Theorem 2.1 is actually generated by only 2/ε² sets, and it is possible to use these generating sets in the algorithm for counting copies of H instead of the partition they induce on V.
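For intuition (and quite unlike the efficient algorithm just mentioned), the main term of Lemma 3.1 can be evaluated naively from the partition data alone. The following sketch is purely illustrative; on a perfectly regular example, such as a complete bipartite graph partitioned into its two sides, it recovers hom(H, G) exactly.

```python
from itertools import product

def density(E, Vi, Vj):
    """Edge density between vertex classes Vi and Vj."""
    return sum(1 for a in Vi for b in Vj
               if frozenset((a, b)) in E) / (len(Vi) * len(Vj))

def counting_estimate(H_vertices, H_edges, parts, E):
    """Main term of Lemma 3.1: sum over maps phi : V(H) -> [k] of
    prod_{ij in E(H)} d(V_phi(i), V_phi(j)) * prod_i |V_phi(i)|."""
    total = 0.0
    for phi in product(range(len(parts)), repeat=len(H_vertices)):
        assign = dict(zip(H_vertices, phi))
        term = 1.0
        for (i, j) in H_edges:
            term *= density(E, parts[assign[i]], parts[assign[j]])
        for i in H_vertices:
            term *= len(parts[assign[i]])
        total += term
    return total
```

For H = K₂ and G = K_{2,2} partitioned into its two sides, the estimate equals hom(K₂, K_{2,2}) = 2|E(G)| = 8, with no error term.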
3.2 Counting subgraphs locally and the Removal Lemma
The Global Counting Lemma proven last section allows us to count the total number of copies of a given graph H inside a large graph G up to an additive error which is small when compared to |V (G)||V (H)|, the number of maps from V (H) to V (G). However, due to the global nature of this counting, this lemma is unsuitable for the purpose of counting copies of H inside a small subset of V (G), which will be needed in the proof of the Graph Removal Lemma. For this we need a somewhat stronger counting lemma, which counts the copies of H locally instead of globally. The next definition makes this idea of “local counting” more precise.
Definition 3.4 (Canonical homomorphisms). Let H be a graph on [h] and V₁, …, V_h be (not necessarily distinct) vertex sets. If G is a graph on V₁ ∪ ⋯ ∪ V_h, then a homomorphism ϕ from H to G is said to be canonical if it maps each i ∈ [h] to a vertex in the corresponding set V_i. We then denote by hom*(H, G) the number of canonical homomorphisms from H to G.

The corresponding counting lemma for canonical homomorphisms is the following:

Lemma 3.2 (Local Counting Lemma). Let H be a graph with vertex set V(H) = [h], and let V₁, …, V_h be (not necessarily distinct) vertex sets. If G is a graph on V₁ ∪ ⋯ ∪ V_h for which the pair (V_i, V_j) is ε-regular whenever ij ∈ E(H), then the number hom*(H, G) of canonical homomorphisms of H in G satisfies

|hom*(H, G) − ∏_{ij∈E(H)} d_G(V_i, V_j) ∏_{i∈[h]} |V_i|| ≤ ε|E(H)| ∏_{i∈[h]} |V_i|

The proof of Lemma 3.2 is very simple and proceeds by the same telescoping sum argument as that used in the proof of Lemma 3.1, but now using the inequality

|E_{x_i∈V_i, x_j∈V_j}[(1_G(x_i, x_j) − d_G(V_i, V_j)) a(x_i) b(x_j)]| ≤ ε,

valid for all {i, j} ∈ E(H) and all functions a : V_i → [0, 1], b : V_j → [0, 1].

Note the asymmetry in the conditions required for each of the lemmas given above: while in the Global Counting Lemma we only need to know a weak ε-regular partition for the graph G, in the Local Counting Lemma we require all pairs (V_i, V_j) for which ij ∈ E(H) to be ε-regular, which is a much stronger condition. The reason for this asymmetry is that the Global Counting Lemma only estimates the average of hom*_φ(H, G) over all mappings φ : [h] → [k] (where hom*_φ(H, G) denotes the number of homomorphisms where each i ∈ [h] is mapped to a vertex x_i ∈ V_{φ(i)}), while the Local Counting Lemma is equivalent to estimating a single one of these values.

Using Lemma 3.2, we may now state and prove the Graph Removal Lemma:

Theorem 3.1 (Graph Removal Lemma [9]). For every graph H and every ε > 0, there exist n₀ ∈ ℕ and δ > 0 such that the following holds. Any graph G on n ≥ n₀ vertices which contains at most δn^{|V(H)|} copies of H may be made H-free by removing at most εn² edges.

Before proving Theorem 3.1, it is important to make an observation. This theorem may be informally thought of as saying that, if a given large graph G contains "few" copies of a fixed graph H, then we can destroy all these copies of H by removing "few" edges from G. However, while this is indeed a valid interpretation of the result, it hides the crucial fact that the two occurrences of the word "few" are actually very different in nature. Indeed, if G has n vertices, then "few" copies of H means a small constant times n^{|V(H)|}, while "few" edges means a small constant times n². It is not at all clear why, say, 2n^{7/2} copies of the complete graph K₄ on four vertices cannot be fitted inside G in such a way as to obtain at least n²/1000 edge-disjoint copies of K₄.
Proof. Let h be the number of vertices of H, and define γ := min{ε/4, ε^{|E(H)|}/(2|E(H)|)}. We will prove the theorem for δ := ε^{|E(H)|}(4h!)^{−1}(2K(γ))^{−h} and n₀ := δ^{−1}, where K(γ) is the bound given by the Szemerédi Regularity Lemma (Theorem 2.3) for error parameter γ and lower bound m = 1.

Suppose that G has at most δn^h copies of H. Applying the Szemerédi Regularity Lemma to G with error parameter γ and lower bound m = 1, we obtain a γ-regular partition V = V₀ ∪ V₁ ∪ ⋯ ∪ V_k into k + 1 ≤ K(γ) + 1 parts. We now construct a subgraph G′ of G by deleting the following edges:
– Edges incident to a vertex in the exceptional set V0;
– Edges between pairs (Vi,Vj) which are not γ-regular;
– Edges between pairs (Vi,Vj) with edge-density at most ε. The number of edges deleted is then at most
The number of edges deleted is then at most

γn·n + γk²·(n/k)² + ε·n²/2 ≤ εn²
If G′ does not contain a copy of H we are done; suppose for contradiction that G′ contains a copy of H, and fix such a copy.

For every i ∈ [h], let φ(i) ∈ [k] be the index of the set V_{φ(i)} which contains the i-th vertex of this copy of H in G′; applying the Local Counting Lemma (Lemma 3.2) on the graph G′ and vertex sets V_{φ(1)}, …, V_{φ(h)}, we obtain the existence of at least

( ∏_{ij∈E(H)} d_{G′}(V_{φ(i)}, V_{φ(j)}) − γ|E(H)| ) ∏_{i∈[h]} |V_{φ(i)}| ≥ (ε^{|E(H)|} − γ|E(H)|) ((1 − γ)n/k)^h
  ≥ (ε^{|E(H)|}/2) (n/2K(γ))^h
  = 2h!δn^h

homomorphisms of H in G′, and hence at least 2h!δn^h − n^{h−1} > h!δn^h labeled copies of H in G′. This contradicts the fact that G has at most δn^h copies of H, completing the proof.
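The edge-deletion step of the proof above is mechanical once the regular partition is known; the following Python sketch is illustrative only: it performs the three deletions, taking the set of irregular index pairs as an input (detecting irregularity is the expensive part), and for simplicity it also drops edges inside a single class.

```python
def clean_graph(E, parts, V0, irregular_pairs, eps):
    """Delete from E: (i) edges meeting the exceptional set V0, (ii) edges in
    pairs declared irregular, (iii) edges in pairs of density at most eps."""
    def dens(Vi, Vj):
        return sum(1 for a in Vi for b in Vj
                   if frozenset((a, b)) in E) / (len(Vi) * len(Vj))

    k = len(parts)
    bad = {frozenset(p) for p in irregular_pairs}
    bad |= {frozenset((i, j)) for i in range(k) for j in range(i + 1, k)
            if dens(parts[i], parts[j]) <= eps}
    index = {v: i for i, Vi in enumerate(parts) for v in Vi}
    keep = set()
    for edge in E:
        a, b = tuple(edge)
        if a in V0 or b in V0:
            continue
        i, j = index[a], index[b]
        if i == j or frozenset((i, j)) in bad:
            continue
        keep.add(edge)
    return keep
```

Whatever survives consists only of edges between dense regular pairs, which is exactly the situation in which the Local Counting Lemma applies.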
3.3 Application to property testing
As a simple application of Theorem 3.1, we will use it to give an interesting result in the area of property testing, namely that the property of being H-free is testable for any graph H. Let us first briefly recall some definitions given in Section 1.3.

Given a graph property P and a constant ε > 0, we say that a graph G on n vertices is ε-far from satisfying P if one needs to add and/or delete at least εn² edges of G in order to turn it into a graph that satisfies P. An ε-test for P is a randomized algorithm which, by making only O_ε(1) edge queries, can distinguish with probability at least 2/3 between graphs satisfying P and graphs that are ε-far from satisfying P. Finally, a graph property P is testable if, for any given ε > 0, there exists an ε-test for P. For a given graph H, we then say that a graph G satisfies the property of being H-free if it has no subgraph which is isomorphic to H.

Let us now use Theorem 3.1 to show that the property of being H-free is testable for any fixed graph H. Let δ = δ(ε) be the constant obtained in Theorem 3.1 applied to a given graph H and quantity ε > 0. Our ε-test for H-freeness then proceeds as follows: take k := ⌈2/δ⌉ sets of h := |V(H)| vertices each from G, chosen uniformly and independently at random, and declare G to be H-free if and only if none of them contains a copy of H.

If G is indeed H-free, it is clear that we will not obtain a copy of H this way, so the algorithm will give the correct answer with probability 1. If G is ε-far from being H-free, by Theorem 3.1 there exist at least δn^h copies of H in G. Since each set of h vertices in G can contain at most h! copies of H, it follows that the probability of finding a copy of H inside a uniformly chosen set of h vertices in G is at least

(δn^h/h!) / (n choose h) ≥ δ

The probability that the algorithm errs in this case (that is, fails to find a copy of H) is then at most (1 − δ)^k ≤ e^{−δk} ≤ e^{−2} < 1/3, proving that this algorithm is indeed an ε-test.
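The ε-test just described can be sketched directly; the code below is illustrative, takes δ = δ(ε) as an input parameter, and uses brute force to look for a copy of H (with V(H) = {0, …, h−1}) inside each sample.

```python
import math
import random
from itertools import permutations

def has_copy(G_adj, vertices, H_edges):
    """Check whether G restricted to `vertices` spans a copy of H, by brute
    force over all injective placements of V(H) = {0, ..., h-1}."""
    for image in permutations(vertices):
        if all(frozenset((image[x], image[y])) in G_adj for (x, y) in H_edges):
            return True
    return False

def h_freeness_test(G_adj, n, H_edges, h, delta, rng=None):
    """One-sided eps-test of Section 3.3: sample k = ceil(2/delta) sets of h
    vertices and accept iff none spans a copy of H."""
    rng = rng or random.Random(0)
    k = math.ceil(2 / delta)
    return not any(has_copy(G_adj, rng.sample(range(n), h), H_edges)
                   for _ in range(k))
```

The test is one-sided: it never rejects an H-free graph, and it rejects an ε-far graph with probability at least 1 − (1 − δ)^k > 2/3.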
Using a stronger variant of the Regularity Lemma which is very close in spirit to our Strong Structure Theorem (when specialized to the graph setting), Alon, Fischer, Krivelevich and M. Szegedy [2] were able to prove the natural generalization of the Graph Removal Lemma for induced subgraphs (where we are now allowed to add and/or remove up to εn² edges in order to destroy all induced copies of H in G), and thus that being induced H-free is also testable. They in fact obtained a much stronger result in property testing, as we will briefly discuss in the introductory part of the next chapter.
Chapter 4
Extensions of graph regularity
There have been various extensions of Szemerédi's Regularity Lemma for graphs (see [27] for some of them), which involve many of the same principles and ideas as the original Regularity Lemma but are each tailored for a specific application. We have already seen Frieze and Kannan's Weak Regularity Lemma (Theorem 2.1), which has a weaker notion of regularity but provides much better bounds for the size of the regular partitions, and may thus be used for algorithmic purposes (see, for instance, [12] and [11]).

In the opposite direction, there is the Strong Regularity Lemma of Alon, Fischer, Krivelevich and M. Szegedy [2] mentioned at the end of the last chapter, which provides partitions with stronger regularity properties at the cost of further worsening the upper bound we get for the order of these partitions. Roughly speaking, given a constant ν > 0 and any decreasing function ε : ℕ → (0, 1], this lemma provides two equitable partitions P ⊂ Q of the graph such that P is ν-regular, Q is ε(|P|)-regular, and P is ν-close to Q in a sense which is related to the edge densities between the classes inside each partition.

This lemma was originally obtained in order to prove a far-reaching generalization of the Induced Graph Removal Lemma for a family of colored graphs, and with this result (and some additional ideas) prove that every first-order graph property not containing a quantifier alternation of the form ∀∃ is testable (in the sense described in Section 1.3). It is not hard to prove the Strong Regularity Lemma by iterating our Strong Structure Theorem or the Szemerédi Regularity Lemma (we will essentially do it in our proof of Theorem 4.1), but its statement is somewhat technical and we refrain from giving it in order not to make our presentation too repetitive.
We only remark that, even in the tame case when the function ε(k) is a polynomial in 1/k, the bound we get for the size of the partitions P and Q is a wowzer-type function (one level higher in the Ackermann hierarchy than the exponential tower function) in a power of 1/ν. Moreover, this cannot be improved (see [6]).

In this chapter we will focus on two variants of the Regularity Lemma which have a somewhat distinct flavor.
4.1 Regular approximation
As noted in Remark 2.5, the number of parts in an ε-regular partition of an arbitrary graph cannot be guaranteed to be smaller than a tower of exponents of height Θ(ε⁻²); even a weak ε-regular partition can only be guaranteed to have size at most 2^{Θ(ε⁻²)} [6], which is still a reasonably large function of ε. Despite these results, the next theorem allows us to have arbitrarily good control on the size of the regular partition in terms of its regularity parameter, as long as we are allowed to modify a small fraction of the edges of the graph. It was first obtained by Rödl and Schacht [26] in a more general form (stated for uniform hypergraphs), and is a byproduct of the hypergraph generalization of the Regularity Lemma, which we will see in the next chapter.

Theorem 4.1 (Regular Approximation Lemma [26]). For every ν > 0 and every function ε : ℕ → (0, 1] there exist integers n₀ and K₀ such that the following holds. For every graph G = (V, E) on |V| = n ≥ n₀ vertices, there exists an equitable partition V = V₁ ∪ V₂ ∪ ⋯ ∪ V_k into k ≤ K₀ parts and a graph H = (V, E′) on the same vertex set as G such that:

• |E△E′| ≤ νn²

• On the graph H, every pair (V_i, V_j) is ε(k)-regular
Proof. We will apply an energy-increment argument similar to that used in the proof of Theorem 2.2, the main difference being that we now iterate the Szemerédi Regularity Lemma (in the form in which it appears in Corollary 2.1) instead of Lemma 2.3.

Start with the trivial partition P₀ := {V} and k₀ := |P₀| = 1. For each i ≥ 0, having P_i and k_i already known, apply Corollary 2.1 to the graph G with initial partition P_i and error parameter ε(k_i)/4. We then obtain an equitable partition P_{i+1} of size k_{i+1} = O_{k_i, ε(k_i)}(1) which refines P_i and such that all but at most ε(k_i)k_{i+1}²/4 pairs of classes in P_{i+1} are ε(k_i)/4-regular for G.

Since each P_{i+1} refines P_i, if we denote by P_i ⊗ P_i the product σ-algebra on V × V generated by the products of classes in P_i, we see by Pythagoras' theorem that ‖E[1_G | P_i ⊗ P_i]‖²_{L²} is increasing and bounded between 0 and 1. By the pigeonhole principle, there must exist a value of j ≤ 9/ν² such that

‖E[1_G | P_{j+1} ⊗ P_{j+1}]‖²_{L²} ≤ ‖E[1_G | P_j ⊗ P_j]‖²_{L²} + ν²/9  (4.1)

For such a value of j, let us denote P := P_j and Q := P_{j+1}, and note that k := |P_j| is bounded by a function depending only on ε(·) and ν. By Pythagoras' theorem and equation (4.1), we conclude that

‖E[1_G | P ⊗ P] − E[1_G | Q ⊗ Q]‖_{L²} ≤ ν/3

We relabel the classes of the partitions P and Q in such a way that

P = (V_i)_{i∈[k]},  Q = (V_{i,r})_{i∈[k], r∈[m]},  and  V_i = ⋃_{r∈[m]} V_{i,r} for all i ∈ [k].
Since all refining partitions Pi in our argument are required to be equitable and they have bounded order, this can always be done if |V | = n is sufficiently large. Now, for every pair of classes (Vi,r,Vj,s) with i, j ∈ [k], r, s ∈ [m], we add or delete edges randomly to change the (expected) density of this pair to dG(Vi,Vj). Note that the expected number of changed edges in each pair (Vi,r,Vj,s) is
|dG (Vi,Vj) − dG (Vi,r,Vj,s)| · |Vi,r||Vj,s| (4.2)
Because for every (x, y) ∈ V_{i,r} × V_{j,s} we have E[1_G|Q ⊗ Q](x, y) = d_G(V_{i,r}, V_{j,s}) and E[1_G|P ⊗ P](x, y) = d_G(V_i, V_j), it follows from (4.2) that the expected total number of edges changed in G is

Σ_{i,j∈[k]} Σ_{r,s∈[m]} |d_G(V_i, V_j) − d_G(V_{i,r}, V_{j,s})| · |V_{i,r}||V_{j,s}|
  = n² E_{x,y∈V}[ |E[1_G|P ⊗ P](x, y) − E[1_G|Q ⊗ Q](x, y)| ]
  = n² ‖E[1_G|P ⊗ P] − E[1_G|Q ⊗ Q]‖_{L¹}
  ≤ νn²/3

By concentration inequalities, this value will be less than νn²/2 with high probability. Moreover, it follows from the Chernoff bound (see the proof of Lemma 7.2 in Chapter 7) that, after these changes, with high probability all pairs (V_{i,r}, V_{j,s}) which were ε(k)/4-regular in G will be ε(k)/2-regular and have density d_G(V_i, V_j) ± ε(k)/4.

By definition, at most (ε(k)/4)k²m² pairs (V_{i,r}, V_{j,s}) are not ε(k)/4-regular in G. For each of them, we substitute the graph G[V_{i,r}, V_{j,s}] by a random graph on V_{i,r} × V_{j,s} with expected density d_G(V_i, V_j). Then with high probability we change a further at most

(ε(k)/4) · k²m² · 2n²/(k²m²) ≤ νn²/2

edges, and all of these pairs (V_{i,r}, V_{j,s}) will also be ε(k)/2-regular and have density d_G(V_i, V_j) ± ε(k)/4 in the modified graph.
Denote by H the graph obtained after all these modifications, and note that
|E(G)△E(H)| ≤ νn²
Consider now a pair of indices i, j ∈ [k]. For all r, s ∈ [m], we know that (V_{i,r}, V_{j,s}) is ε(k)/2-regular in H and has density d_H(V_{i,r}, V_{j,s}) = d_G(V_i, V_j) ± ε(k)/4. Thus, for any sets A ⊆ V_i, B ⊆ V_j we have

e_H(A, B) = Σ_{r,s∈[m]} ( d_H(V_{i,r}, V_{j,s}) |A ∩ V_{i,r}||B ∩ V_{j,s}| ± (ε(k)/2)|V_{i,r}||V_{j,s}| )
  = Σ_{r,s∈[m]} ( (d_G(V_i, V_j) ± ε(k)/4) |A ∩ V_{i,r}||B ∩ V_{j,s}| ± (ε(k)/2)|V_{i,r}||V_{j,s}| )
  = d_G(V_i, V_j)|A||B| ± (3ε(k)/4)|V_i||V_j|
  = d_H(V_i, V_j)|A||B| ± ε(k)|V_i||V_j|,

showing that (V_i, V_j) is ε(k)-regular in H and completing the proof.
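The re-randomization used in the proof above can be sketched as follows. This is illustrative only: it resamples a pair entirely, as done for the irregular pairs, whereas for the regular pairs one would only add or delete the smaller number of random edges needed to reach the target density.

```python
import random

def adjust_density(E, U, W, target, rng=None):
    """Resample the bipartite graph between U and W so that each pair (u, w)
    is an edge independently with probability `target`; returns the new edge
    set together with the number of modified pairs."""
    rng = rng or random.Random(0)
    new_E, changed = set(E), 0
    for u in U:
        for w in W:
            edge = frozenset((u, w))
            want = rng.random() < target
            if want != (edge in new_E):
                changed += 1
                new_E.symmetric_difference_update({edge})
    return new_E, changed
```

By the Chernoff bound, the achieved density concentrates around `target` and the number of modifications around the expectation in (4.2), which is what the proof needs.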
4.2 Relative regularity
In some applications, we are dealing with spanning subgraphs of a fixed graph G (which may be easier for us to analyze) and we wish to obtain some regularity result for these graphs relative to the host graph G. One important example is that of subgraphs of the random graph G(n, p), with p = p(n) tending to zero as n grows, where we wish to obtain results valid with high probability for all spanning subgraphs of G(n, p) (see [8] for many such results).

A closely related situation is that of subgraphs of sparse pseudorandom graphs (see [7]). In this case, instead of having a random model for graphs and obtaining results valid with high probability over the random choices made, we have a fixed (very sparse) graph G which exhibits random-like behavior in the distribution of its edges, and we wish to use this behavior to extend results from the usual "dense" setting of graphs to all spanning subgraphs of G (this philosophy will be taken up again in Chapters 6 and 7).

In this section we will try to find the most general conditions possible on the host graph G which allow us to use our framework to prove "relative regularity" results for its spanning subgraphs. This is done so that we may use the properties we know the graph G satisfies in order to obtain similar properties satisfied by its subgraphs. The precise notion of relative regularity we will use is defined below:
Definition 4.1 ((ε, H, G)-regularity). Let G be a graph on V and H be a spanning subgraph of G. We say a pair (U, W ) of subsets of V is (ε, H, G)-regular if
eH (A, B) eH (U, W ) − ≤ ε ∀A ⊆ U, B ⊆ W : |A × B| ≥ ε|U × W |, eG(A, B) eG(U, W ) where we define eH (A, B)/eG(A, B) := 0 when eG(A, B) = 0. In our analysis we will only consider the case where G is bipartite, but it is easy to extend this analysis to the non-partite case by using the same arguments that we used in the proofs of Theorems 2.1 and 2.3. Let then G = (V1 ∪V2,E(G)) be a bipartite graph, and we wish to see under which conditions we are able to obtain regularity-like results for a subgraph H ⊆ G relative to G. Let PG be the probability distribution on V1 × V2 given by PG(x, y) := 1G(x, y)/|E(G)|, and denote the 2 corresponding L norm by k · kL2(G). Using the Strong Structure Theorem to 1H in this space with a small error parameter α, increasing function F , and with structured set
\[ \mathcal{S} := \{\sigma(A \times V_2,\ V_1 \times B) : A \subseteq V_1,\ B \subseteq V_2\}, \]
we obtain a factor $\mathcal{B} \subseteq 2^{V_1\times V_2}$ of S-complexity at most M = O_{F,α}(1) and a decomposition 1_H = f_str + f_psd + f_err, where f_str = E_G[1_H|B], f_psd is 1/F(M)-pseudorandom and ‖f_err‖_{L²(G)} ≤ α. The factor B is formed by the join of at most M factors of the form σ(A_i × V_2, V_1 × B_i), with A_i ⊆ V_1 and B_i ⊆ V_2. If V_1 = U_1 ∪ U_2 ∪ ··· ∪ U_K is the partition of V_1 induced by the sets A_i and V_2 = W_1 ∪ W_2 ∪ ··· ∪ W_L is the partition of V_2 induced by the sets B_i, then we know their orders K, L are at most 2^M and every atom of B is of the form U_r × W_s. We may refine these partitions in order to obtain equitable partitions
\[ V_1 = V_{1,0} \cup V_{1,1} \cup \cdots \cup V_{1,k}, \qquad V_2 = V_{2,0} \cup V_{2,1} \cup \cdots \cup V_{2,k} \]
into $k := \lceil 2^M/\alpha \rceil$ sets of equal size plus the exceptional sets V_{1,0}, V_{2,0} satisfying |V_{1,0}| < α|V_1|, |V_{2,0}| < α|V_2|. We then wish to show that
\[ \left| \frac{e_H(A,B)}{e_G(A,B)} - \frac{e_H(V_{1,i},V_{2,j})}{e_G(V_{1,i},V_{2,j})} \right| \le \varepsilon \]
whenever A ⊆ V_{1,i}, B ⊆ V_{2,j} satisfy |A × B| ≥ ε|V_{1,i} × V_{2,j}|, for i, j ∈ [k]. Because each V_{1,i} × V_{2,j} is contained within a single atom U_r × W_s of B (if i, j ∈ [k]) and
\[ |V_{1,i} \times V_{2,j}| \ge \frac{(1-\alpha)^2 |V_1\times V_2|}{k^2} \ge \frac{|V_1\times V_2|}{2k^2}, \]
by the triangle inequality it suffices to show that
\[ \left| \frac{e_H(A,B)}{e_G(A,B)} - \frac{e_H(U_r,W_s)}{e_G(U_r,W_s)} \right| \le \frac{\varepsilon}{2} \]
holds whenever A × B ⊆ U_r × W_s ∈ B satisfies |A × B| ≥ ε|V_1 × V_2|/2k². Now we note that, for every (x, y) ∈ U_r × W_s, we have
\[ \mathbb{E}_G[1_H \mid \mathcal{B}](x,y) = \frac{\mathbb{E}_G[1_H 1_{U_r\times W_s}]}{\mathbb{P}_G(U_r\times W_s)} = \frac{e_H(U_r,W_s)}{e_G(U_r,W_s)}. \]
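The identity $\mathbb{E}_G[1_H|\mathcal{B}] = e_H(U_r,W_s)/e_G(U_r,W_s)$ on each atom can also be checked numerically. The following is a minimal sketch, not from the thesis: the host graph, the subgraph, and the 2×2 product partition are ad hoc choices made only for illustration.

```python
import random

# Illustrative sketch (not from the thesis): under the edge distribution
# P_G(x, y) = 1_G(x, y)/|E(G)|, the conditional expectation of 1_H with
# respect to a product partition equals e_H/e_G on each atom U_r x W_s.

random.seed(0)
n = 10
V1, V2 = range(n), range(n)
G = {(x, y) for x in V1 for y in V2 if random.random() < 0.7}  # host graph
H = {e for e in G if random.random() < 0.4}                    # spanning subgraph

half = n // 2
checks = []
for r in (0, 1):
    for s in (0, 1):
        atom = [(x, y) for x in V1 for y in V2
                if (x >= half) == r and (y >= half) == s]
        # Conditional expectation computed directly from P_G:
        mass = sum(1 for e in atom if e in G) / len(G)   # P_G(atom)
        num = sum(1 for e in atom if e in H) / len(G)    # E_G[1_H 1_atom]
        eG = sum(1 for e in atom if e in G)              # e_G(U_r, W_s)
        eH = sum(1 for e in atom if e in H)              # e_H(U_r, W_s)
        checks.append((num / mass, eH / eG))

# The two computations agree on every atom.
assert all(abs(a - b) < 1e-12 for a, b in checks)
print("identity holds on all", len(checks), "atoms")
```

The normalization by |E(G)| cancels in the ratio, which is exactly why the conditional expectation under P_G produces relative edge densities rather than absolute ones.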
This implies that, whenever A × B ⊆ U_r × W_s, we have
\[ \mathbb{E}_G[(1_H - f_{\mathrm{str}})1_{A\times B}] = \left( \frac{e_H(A,B)}{e_G(A,B)} - \frac{e_H(U_r,W_s)}{e_G(U_r,W_s)} \right) \frac{e_G(A,B)}{|E(G)|}. \]
As 1H − fstr = fpsd + ferr, by the triangle inequality it then suffices to show that
\[ |\mathbb{E}_G[f_{\mathrm{psd}} 1_{A\times B}]| \le \frac{\varepsilon}{4}\,\frac{e_G(A,B)}{|E(G)|} \qquad \text{and} \qquad |\mathbb{E}_G[f_{\mathrm{err}} 1_{A\times B}]| \le \frac{\varepsilon}{4}\,\frac{e_G(A,B)}{|E(G)|} \tag{4.3} \]
whenever |A × B| ≥ ε|V_1 × V_2|/2k² and A × B is contained inside a single product set V_{1,i} × V_{2,j}. For the first inequality, we note that
\[ |\mathbb{E}_G[f_{\mathrm{psd}} 1_{A\times B}]| \le \big\| \mathbb{E}_G[f_{\mathrm{psd}} \mid \sigma(A\times V_2,\ V_1\times B)] \big\|_{L^2(G)} \le \frac{1}{F(M)}. \tag{4.4} \]
For the second inequality, we apply Cauchy-Schwarz and obtain
\[ |\mathbb{E}_G[f_{\mathrm{err}} 1_{A\times B}]|^2 \le \|f_{\mathrm{err}} 1_{V_{1,i}\times V_{2,j}}\|_{L^2(G)}^2\, \|1_{A\times B}\|_{L^2(G)}^2 = \mathbb{E}_G[f_{\mathrm{err}}^2 1_{V_{1,i}\times V_{2,j}}]\, \frac{e_G(A,B)}{|E(G)|}, \]
so the second inequality of (4.3) would be satisfied if we could ascertain that
\[ \mathbb{E}_G[f_{\mathrm{err}}^2 1_{V_{1,i}\times V_{2,j}}] \le \frac{\varepsilon^2}{16} \min_{\substack{A\times B \subseteq V_1\times V_2 \\ |A\times B| \ge \varepsilon|V_1\times V_2|/2k^2}} \frac{e_G(A,B)}{|E(G)|}. \tag{4.5} \]
Suppose then we have a lower bound γ > 0 on the relative density of product sets A × B of size at least ε|V_1 × V_2|/2k², i.e.
\[ \frac{d_G(A,B)}{d_G(V_1,V_2)} = \frac{e_G(A,B)/|A\times B|}{e_G(V_1,V_2)/|V_1\times V_2|} \ge \gamma, \quad \forall A\subseteq V_1,\ B\subseteq V_2:\ |A\times B| \ge \frac{\varepsilon|V_1\times V_2|}{2k^2}. \]
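On small examples, this "no sparse spots" hypothesis can be verified by brute force, enumerating every product set above the size threshold. A hedged sketch (all helper names below are my own, not from the thesis):

```python
from itertools import chain, combinations

# Illustrative sketch (not from the thesis): brute-force check of the
# lower bound d_G(A,B)/d_G(V1,V2) >= gamma over all product sets A x B
# above a size threshold, on a tiny bipartite graph.

def powerset(s):
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def density(E, A, B):
    """Edge density of the pair (A, B): e(A,B) / |A x B|."""
    if not A or not B:
        return 0.0
    return sum((a, b) in E for a in A for b in B) / (len(A) * len(B))

def min_relative_density(E, V1, V2, size_threshold):
    """Smallest d_G(A,B)/d_G(V1,V2) over product sets with |A x B| >= threshold."""
    d_global = density(E, V1, V2)
    best = float("inf")
    for A in powerset(V1):
        for B in powerset(V2):
            if len(A) * len(B) >= size_threshold:
                best = min(best, density(E, A, B) / d_global)
    return best

# Complete bipartite graph: every product set has relative density exactly 1.
V1, V2 = [0, 1, 2, 3], [0, 1, 2, 3]
K = {(a, b) for a in V1 for b in V2}
print(min_relative_density(K, V1, V2, size_threshold=4))  # -> 1.0
```

Deleting even one edge from K makes some product set fall below relative density 1, illustrating that the infimum γ is sensitive to local sparsity exactly as the condition intends.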
Then, if we take $\alpha = \varepsilon^2\sqrt{\gamma/32}$, inequality (4.5) must be satisfied by all but at most εk² pairs (V_{1,i}, V_{2,j}); otherwise we would have
\[ \alpha^2 \ge \|f_{\mathrm{err}}\|_{L^2(G)}^2 = \mathbb{E}_G[f_{\mathrm{err}}^2] > \varepsilon k^2 \cdot \frac{\varepsilon^2 \gamma}{16}\,\frac{|A\times B|}{|V_1\times V_2|} \ge \frac{\varepsilon^4 \gamma}{32} = \alpha^2, \]
a contradiction.
This proves that the second inequality in (4.3) holds for all but at most εk² of the pairs (V_{1,i}, V_{2,j}). Likewise, if we take the function $F(x) := \varepsilon^{-2}\alpha^{-2}\gamma^{-1} 2^{2x+5}$, we have that
\[ \frac{1}{F(M)} = \frac{\varepsilon^2 \alpha^2 \gamma}{2^{2M+5}} \le \frac{\varepsilon^2 \gamma}{8k^2} \le \frac{\varepsilon}{4}\,\frac{d_G(A,B)}{d_G(V_1,V_2)}\,\frac{|A\times B|}{|V_1\times V_2|} = \frac{\varepsilon}{4}\,\frac{e_G(A,B)}{|E(G)|} \]
holds whenever |A × B| ≥ ε|V_1 × V_2|/2k². Together with equation (4.4), this proves the first inequality in (4.3). It is a simple exercise to extend this argument to the multipartite case by repeating it for each pair (with a smaller error parameter) and refining the partitions obtained. We then obtain the following relative regularity theorem.
Theorem 4.2 (Relative Regularity Lemma). For every ε, γ > 0 and k_0, ℓ ≥ 1, there exist constants η > 0 and K_0 ≥ k_0 such that the following holds. Let G = (V, E(G)) be a P_0-partite graph on n vertices, where P_0 : V = V_1 ∪ ··· ∪ V_ℓ is an equitable ℓ-partition of V, and suppose that
\[ \frac{d_G(A,B)}{d_G(V_i,V_j)} \ge \gamma \quad \forall A \subseteq V_i,\ B \subseteq V_j:\ |A\times B| \ge \eta|V_i\times V_j| \tag{4.6} \]
is valid for every i, j ∈ [ℓ]. Then every spanning subgraph H ⊆ G admits an equitable partition P into k parts refining P_0 which satisfies:
• k_0 ≤ k ≤ K_0
• All but at most εk² pairs of parts in P are (ε, H, G)-regular

The condition (4.6) given in the statement of the theorem amounts to saying that the graph G contains no reasonably large sets of vertices having density much smaller than its expected density. It may intuitively be thought of as having "no sparse spots" (apart from those given by the partition P_0). One of the simplest classes of graphs satisfying this condition is the class of η-uniform graphs, which satisfy a natural kind of pseudorandomness condition that takes into consideration the graph's edge density. Below we give its definition in the more general situation of partite graphs, as given in [19]:
Definition 4.2 ((P_0, η)-uniform graphs). Let a partition P_0 = (V_i)_{i∈[ℓ]} of V be fixed. We write (A, B) ≺ P_0 if either ℓ = 1 or A ⊂ V_i, B ⊂ V_j for some i ≠ j in [ℓ]. Given a constant η > 0, we then say a graph G = (V, E) of density p := 2|E|/|V|² is (P_0, η)-uniform if
\[ \left| \frac{e_G(A,B)}{|A\times B|} - p \right| \le \eta p \quad \forall (A,B) \prec P_0:\ |A\times B| \ge \eta|V|^2. \]
If P_0 is the trivial partition of V into a single part, we say simply that G is η-uniform. The Sparse Regularity Lemma of Kohayakawa and Rödl then follows as an immediate corollary of Theorem 4.2:
Corollary 4.1 (Sparse Regularity Lemma I [18, 19]). For every ε > 0 and k_0 ≥ 1, there exist constants η > 0 and K_0 ≥ k_0 such that the following holds.
Suppose G is a (P0, η)-uniform graph, where P0 is an equitable partition of V (G) into at most k0 parts. Then every spanning subgraph H ⊆ G admits an equitable partition P into k parts refining P0 which satisfies:
• k_0 ≤ k ≤ K_0
• All but at most εk² pairs of parts in P are (ε, H, G)-regular
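The uniformity condition of Definition 4.2 can be probed numerically. The sketch below is not from the thesis: it samples a dense G(n, 1/2) (the interesting regime is of course sparse, with p → 0, but a dense sample already illustrates the inequality), takes P_0 to be the trivial partition, and tests the condition on random large vertex subsets; all helper names and parameter values are ad hoc.

```python
import random

# Illustrative sketch (not from the thesis): empirically testing the
# eta-uniformity condition |e_G(A,B)/|A x B| - p| <= eta * p on random
# vertex subsets of a sample of G(n, 1/2), with P0 the trivial partition.

random.seed(1)
n, p = 60, 0.5
V = list(range(n))
E = {(u, v) for u in V for v in V if u < v and random.random() < p}

def pairs_spanned(A, B):
    """Number of pairs (u, v) in A x B, u != v, forming an edge of G."""
    return sum((min(u, v), max(u, v)) in E for u in A for v in B if u != v)

eta = 0.25
ok = True
for _ in range(30):
    A = random.sample(V, n // 2)
    B = random.sample(V, n // 2)
    dens = pairs_spanned(A, B) / (len(A) * len(B))
    if abs(dens - p) > eta * p:
        ok = False
print(ok)
```

For a truly dense random graph the deviations of e_G(A,B)/|A × B| from p on sets of linear size are of order n^{-1}, so the check passes comfortably; for sparse G(n, p) with p → 0 the same condition is exactly what (P_0, η)-uniformity demands.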
Another version of the Sparse Regularity Lemma considered in these same papers deals with upper-regular graphs, a condition which may be roughly described as having "no dense spots" and is in some sense dual to the condition given by equation (4.6). We will show how to derive this version in Section 6.2.
Chapter 5
Hypergraph regularity
In this chapter we will extend the Regularity Lemma to uniform hypergraphs. This result, called the Hypergraph Regularity Lemma, was first obtained by Nagle, Rödl, Schacht and Skokan [23, 29, 28] and, independently, by Gowers [14]. The version we will present here is due to Tao [32], who proved it in order to obtain his result that the Gaussian primes contain arbitrarily shaped constellations [35], a version of the Green-Tao theorem for the Gaussian primes. The proof presented here is closely related to the one given by Tao, but adapted to our setting and using the methods already developed in earlier sections.
5.1 Intuition and definitions
Definition 5.1. Given an integer d ≥ 2, a d-uniform hypergraph is a pair H = (V, E) where V is a vertex set and $E \subseteq \binom{V}{d}$ is a collection of unordered d-tuples of vertices (which we will call edges).

The Hypergraph Regularity Lemma may be seen as a "higher-order" version of Szemerédi's Regularity Lemma for graphs (Theorem 2.3): while the graph version seeks to regularize the set of edges of a graph (which is of "second order", as a subset of the pairs of vertices) by partitioning its vertex set (which is then of "first order"), the hypergraph regularity lemma seeks to regularize the d-th order set of edges of a d-uniform hypergraph by (d − 1)-th order sets of (d − 1)-tuples of vertices, then regularize these new sets by (d − 2)-th order sets of (d − 2)-tuples of vertices, and so on, until we reach a partition of the vertex set. This way, we get a sequence
\[ \mathcal{P}_d = \left\{ E,\ \binom{V}{d}\setminus E \right\},\quad \mathcal{P}_{d-1} \subset \mathcal{P}\!\left(\binom{V}{d-1}\right),\ \cdots,\ \mathcal{P}_2 \subset \mathcal{P}\!\left(\binom{V}{2}\right),\quad \mathcal{P}_1 \subset \mathcal{P}(V) \]
of partitions at each order such that the j-th order partition P_j is well approximated in a certain sense by the (j − 1)-th order partition P_{j−1}, for all 2 ≤ j ≤ d.

Remark 5.1. It might seem strange that we need so many partitions, at all different "orders". It is possible to obtain a "regularity lemma" partitioning only the set of vertices and not the higher-order tuples, but the regularity properties of the partitions thus obtained are not strong enough to imply some important applications such as a hypergraph counting lemma. This is related to a similar problem when trying to construct limit objects for hypergraphs using the Hypergraph Regularity Lemma (see [38]), which requires us to consider a (2^d − 2)-dimensional object for limits of d-uniform hypergraphs.

To obtain such partitions, we will make use of a "multidimensional" generalization of the Strong Structure Theorem (SST, Theorem 2.2) given in Section 2.4, whose proof is essentially identical to that of the original SST. For this, we will need to define a kind of multidimensional conditional expectation:

Definition 5.2. Let $f := (f^{(1)}, \cdots, f^{(k)}) \in \bigotimes_{i\in[k]} L^2(\mathcal{B}_{\max})$ be a k-tuple of square-integrable real functions. Given a factor $\mathcal{B} := \bigotimes_{i\in[k]} \mathcal{B}^{(i)}$ of $\mathcal{B}_{\max}^k := \bigotimes_{i\in[k]} \mathcal{B}_{\max}$, define $\mathbb{E}[f|\mathcal{B}] \in \bigotimes_{i\in[k]} L^2(\mathcal{B}^{(i)})$ by
\[ \mathbb{E}[f|\mathcal{B}](x) := \big( \mathbb{E}[f^{(1)}|\mathcal{B}^{(1)}](x),\ \cdots,\ \mathbb{E}[f^{(k)}|\mathcal{B}^{(k)}](x) \big), \quad \forall x \in X. \]
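On a finite set, a factor can be modeled concretely as a partition, and the conditional expectation E[f|B] is simply the function that averages f over each cell; the multidimensional operator of Definition 5.2 then acts componentwise. A minimal sketch (all names ad hoc, not from the thesis):

```python
# Illustrative sketch (not from the thesis): factors on a finite set X are
# modeled as partitions of {0, ..., |X|-1}, and E[f|B] averages f over each
# cell. The multidimensional version applies this componentwise.

def cond_exp(f, partition):
    """Conditional expectation of f (a list over X) w.r.t. a partition of indices."""
    out = [0.0] * len(f)
    for cell in partition:
        avg = sum(f[x] for x in cell) / len(cell)
        for x in cell:
            out[x] = avg
    return out

def cond_exp_tuple(fs, partitions):
    """Componentwise conditional expectation E[f|B] for a k-tuple of functions."""
    return [cond_exp(f, B) for f, B in zip(fs, partitions)]

f1 = [1.0, 3.0, 2.0, 2.0, 5.0, 7.0]
f2 = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
B1 = [[0, 1], [2, 3], [4, 5]]     # factor for the first coordinate
B2 = [[0, 1, 2], [3, 4, 5]]       # factor for the second coordinate

g1, g2 = cond_exp_tuple([f1, f2], [B1, B2])
print(g1)  # -> [2.0, 2.0, 2.0, 2.0, 6.0, 6.0]
```

The point of Theorem 5.1 below is that the partitions B1, B2 (i.e. the factors B^{(i)}) need not be chosen independently of each other: the structured set S may force them to be correlated.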
We also define the norm
\[ \|f\|_{L^2 * k} := \sqrt{\sum_{i=1}^{k} \|f^{(i)}\|_{L^2}^2}. \]
With these definitions, we obtain:

Theorem 5.1 (Multidimensional Strong Structure Theorem). Let S be a collection of "structured factors" of $\mathcal{B}_{\max}^k$. Suppose $f := (f^{(1)}, \cdots, f^{(k)}) \in \bigotimes_{i\in[k]} L^2(\mathcal{B}_{\max})$ satisfies $\|f\|_{L^2*k}^2 \le k$, let 0 < ε ≤ 1 and m ≥ 1 be constants, and let F : ℝ⁺ → ℝ⁺ be an arbitrary increasing function. Then there exist an integer M = O_{ε,F,m,k}(1) satisfying M ≥ m, factors $\mathcal{B} \subseteq \mathcal{B}' \subseteq \mathcal{B}_{\max}^k$, and a decomposition f = f_str + f_psd + f_err such that:
• f_str = E[f|B], with complex_S(B) ≤ M
• f_psd = f − E[f|B′] is 1/F(M)-pseudorandom
• f_err = E[f|B′] − E[f|B] satisfies ‖f_err‖_{L²∗k} ≤ ε

The main difference between this multidimensional version of the SST and simple repeated applications of the original SST is that in Theorem 5.1 the individual factors B^{(1)}, ···, B^{(k)} may be correlated with each other, depending on the structure of the set S. This correlation cannot be obtained simply by applying the original SST k times, and it will be crucial for establishing the Hypergraph Regularity Lemma.

Proof. We repeat the proof of the Strong Structure Theorem (Theorem 2.2), but using the new conventions. Set M_0 := m and $\mathcal{B}_0 := \bigotimes_{i\in[k]} \{\emptyset, X\}$. For each i ≥ 1, use (a multidimensional version of) the Weak Structure Theorem (Lemma 2.3) with ε being 1/F(M_{i−1}) and B being B_{i−1}. We obtain a factor Z_i of complexity at most kF(M_{i−1})² relative to S, such that f − E[f|B_{i−1} ∨ Z_i] is 1/F(M_{i−1})-pseudorandom; set then B_i := B_{i−1} ∨ Z_i and M_i := M_{i−1} + kF(M_{i−1})².

Because ‖f‖²_{L²∗k} ≤ k, by the pigeonhole principle there exists j ≤ k/ε² such that
\[ \|\mathbb{E}[f|\mathcal{B}_{j+1}] - \mathbb{E}[f|\mathcal{B}_j]\|_{L^2*k}^2 = \|\mathbb{E}[f|\mathcal{B}_{j+1}]\|_{L^2*k}^2 - \|\mathbb{E}[f|\mathcal{B}_j]\|_{L^2*k}^2 \le \varepsilon^2. \]
We may then take B := B_j, B′ := B_{j+1} and M := M_j.

We will now fix some notation for the hypergraph setting before stating the Hypergraph Regularity Lemma. It will be convenient to restrict ourselves to the case of partite hypergraphs. Let then H = ((V_j)_{j∈[ℓ]}, E) be an ℓ-partite d-uniform hypergraph whose vertex classes are indexed by [ℓ]; this means that each hyperedge has exactly d vertices, no two of which belong to the same class V_j.

For any subset f ⊆ [ℓ], we define $V_f = (V_j)_{j\in f} := \prod_{j\in f} V_j$ and let π_f : V_{[ℓ]} → V_f be the canonical projection map onto the coordinates in f. We then define on V_{[ℓ]} the σ-algebra $\mathcal{A}_f := \{\pi_f^{-1}(E) : E \subseteq V_f\}$, which is the collection of all subsets of V_{[ℓ]} that depend only on the coordinates whose index belongs to the set f. As an example, consider a 3-partite 3-uniform hypergraph H = ((V_1, V_2, V_3), E). Then E ⊆ V_1 × V_2 × V_3, V_{{1,3}} = V_1 × V_3, V_{{1}} = V_1,
\[ \pi_{\{1,3\}}(x_1, x_2, x_3) = (x_1, x_3) \in V_{\{1,3\}} \quad \forall (x_1, x_2, x_3) \in V_1 \times V_2 \times V_3, \]
and $\mathcal{A}_{\{1\}} = \{E_1 \times V_2 \times V_3 : E_1 \subseteq V_1\}$, $\mathcal{A}_{\{1,2\}} = \{E_{12} \times V_3 : E_{12} \subseteq V_1 \times V_2\}$, $\mathcal{A}_{\{1,2,3\}} = 2^{V_1\times V_2\times V_3}$. Given a factor B ⊆ A_{[ℓ]}, define the complexity of B (written complex(B)) as the smallest number of sets needed to generate B as a σ-algebra. Given a set e ⊆ [ℓ], define the skeleton ∂e of e as the collection {f ⊂ e : |f| = |e| − 1}.
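The projections π_f, the cylinder sets generating A_f, and the skeleton ∂e can all be made concrete on a tiny 3-partite ground set. A hedged sketch with ad hoc names (not from the thesis):

```python
from itertools import combinations, product

# Illustrative sketch (not from the thesis): projections pi_f, cylinder
# sets pi_f^{-1}(E) generating the sigma-algebra A_f, and the skeleton of
# an edge, on a tiny 3-partite ground set V1 x V2 x V3.

V1, V2, V3 = [0, 1], [0, 1], [0, 1]
V = list(product(V1, V2, V3))

def pi(f, x):
    """Projection onto the coordinates in f (f is a set of indices in 1..3)."""
    return tuple(x[i - 1] for i in sorted(f))

def cylinder(f, E):
    """Preimage pi_f^{-1}(E): the A_f-measurable set determined by E."""
    return {x for x in V if pi(f, x) in E}

def skeleton(e):
    """The skeleton of e: all subsets of e of size |e| - 1."""
    return [set(c) for c in combinations(sorted(e), len(e) - 1)]

# A set in A_{1,3} depends only on coordinates 1 and 3:
S = cylinder({1, 3}, {(0, 1)})
print(sorted(S))            # -> [(0, 0, 1), (0, 1, 1)]
print(skeleton({1, 2, 3}))  # -> [{1, 2}, {1, 3}, {2, 3}]
```

Note how the cylinder set is invariant under changes to the second coordinate, which is exactly what A_{1,3}-measurability means; the skeleton lists the lower-order faces along which the next level of the regularization will take place.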
5.2 Regularity at a single level
Let us now give a high-level overview of our Hypergraph Regularity Lemma to be proven in the next section.