
FURSTENBERG’S ERGODIC THEORY PROOF OF SZEMERÉDI’S THEOREM

ZIJIAN WANG

Abstract. We introduce the basics of ergodic theory and illustrate Furstenberg’s proof of Szemerédi’s theorem.

Contents
1. Introduction
2. A brief introduction to ergodic theory
2.1. Ergodicity and weak mixing
2.2. Compact systems
2.3. Factor and extension
2.4. Conditional measures
2.5. Weak mixing and compactness for extensions
2.6. The structure theorem
3. Furstenberg’s proof of Szemerédi’s theorem
3.1. General strategy
3.2. Szemerédi’s theorem
3.3. Correspondence
3.4. Two fundamental systems
3.5. Extension principles
3.6. Conclusion
Acknowledgments
4. Bibliography
References

Date: August 28, 2018.

1. Introduction

The statement of Szemerédi’s theorem is very simple.

Theorem 1.1 (Szemerédi). A set of integers with positive upper Banach density contains arbitrarily long arithmetic progressions.

It was first proved by Szemerédi in 1975 using a combinatorial and completely elementary approach. Although his method was extremely complicated, some important ideas, such as Szemerédi’s regularity lemma in graph theory, came out of his proof. Two years later, a totally different approach was introduced by Furstenberg. He turned Szemerédi’s theorem, a problem that looks extremely combinatorial, into an ergodic puzzle about multiple recurrence of a measure-preserving system. Later, in 2002, Gowers gave a Fourier-analytic proof. The fact that the original question, asked by Erdős and Turán in 1936, has been answered in three completely distinct ways already makes this problem highly interesting.

In this paper, we discuss Furstenberg’s ergodic proof of Szemerédi’s theorem. Beyond its elegance, the value of this proof goes well past solving the problem per se. It sheds light on many important topics in ergodic theory, for instance the classification of dynamical systems, conditional measures, and extensions.

2. A brief introduction to ergodic theory

Ergodic theory studies dynamical systems. By dynamical systems, we mean certain “good” actions on measure spaces that exhibit interesting long-term behavior. A $\mathbb{Z}$-action is just a function from the space to itself, or in other words, a dynamics. Obviously, not all functions are “well-behaved”. In this section, we cover the basics of ergodic theory to lay the foundation for our later discussions.

Definition 2.1. A measure space $(X, \mathcal{B}, \mu)$ is a space $X$ with a measure $\mu$ and the σ-algebra $\mathcal{B}$ of measurable sets. Sometimes we ignore the σ-algebra associated to the measure space and just write $(X, \mu)$ when it is not so important. However, one should treat the σ-algebra with great caution when dealing with conditional measures, which will be discussed later in this paper.

Remark 2.2. In this paper, we mostly assume that we are dealing with probability spaces, in which the measure of the entire space is 1.

Definition 2.3. A map $T : (X, \mathcal{B}_X, \mu) \to (Y, \mathcal{B}_Y, \nu)$ is measure-preserving if $\mu(T^{-1}A) = \nu(A)$ for every set $A \in \mathcal{B}_Y$.

Definition 2.4. A measure-preserving map $\phi$ is an invertible measure-preserving map if the inverse of $\phi$ is measurable and well-defined almost everywhere.

Definition 2.5. We call $(X, \mathcal{B}, T, \mu)$ a measure-preserving system, or equivalently a dynamical system, if $T$ is a measure-preserving map on $X$.

Example 2.6. We define $2^{\mathbb{Z}}$ to be the infinite product of copies of $\{0, 1\}$ indexed by $\mathbb{Z}$. This space is compact by Tychonoff’s theorem. Given an element $x \in 2^{\mathbb{Z}}$, we denote the $k$th coordinate of $x$ by $x[k]$. We define a measure $\mu$ on the space $2^{\mathbb{Z}}$ by an infinite product. Let $\pi_j$ be the projection onto the $j$th coordinate. On each copy of $\{0, 1\}$, we use the “half-half” measure $\nu$, i.e. for each measurable set $A$$^{1}$,
\[
\nu(A) = \begin{cases} 1 & \text{if } A = \{0, 1\}, \\ 0 & \text{if } A \text{ is empty}, \\ \tfrac{1}{2} & \text{otherwise.} \end{cases}
\]
We define $\mu$ by $\mu(B) = \prod_{n \in \mathbb{Z}} \nu(\pi_n B)$ for all measurable rectangles of the form $\prod_{j \in \mathbb{Z}} A_j$$^{2}$ and extend the measure to the entire σ-algebra of measurable sets. This way of defining a measure is valid, as explained in Remark 2.8. Moreover, one can define the Bernoulli shift $T_k$ on this space for any integer $k$. $T_k$ acts on an element $x \in 2^{\mathbb{Z}}$ by shifting each coordinate of $x$ to the left by $k$ bits, i.e. $x[s] = (T_k x)[s - k]$ for all $s \in \mathbb{Z}$.

1 Since we have a finite set, every subset is measurable.
2 Each $A_j$ is measurable in its own copy of $\{0, 1\}$.

Example 2.7. Given a circle $\mathbb{T} = \mathbb{R}/\mathbb{Z}$ equipped with the Haar measure $\mu$, we can define the rotation $R_\alpha$ acting by addition, i.e. $R_\alpha : x \mapsto x + \alpha$. This forms a measure-preserving system. In order to prove that $R_\alpha$ is measure-preserving, it suffices to show that $R_\alpha$ preserves the measure of all intervals$^{3}$. Notice that for any interval$^{4}$ $(a, b) \subset \mathbb{T}$,
\[
\mu(R_\alpha^{-1}(a, b)) = (b - \alpha) - (a - \alpha) = b - a = \mu((a, b)).
\]

Remark 2.8. Proving the measure-preserving property for every measurable set can be painful. However, it suffices to prove this property for a collection of sets that generates the σ-algebra. This is a standard trick that we will use repeatedly.

Example 2.9. Instead of rotation, we can define a different dynamics on the circle $\mathbb{T}$, namely the circle doubling map $M_2$, where $M$ stands for multiplication. $M_2 : \mathbb{T} \to \mathbb{T}$ is defined by $M_2(a) = 2a$. For an arbitrary interval $(a, b) \subset \mathbb{T}$,
\[
\mu(M_2^{-1}(a, b)) = \mu\left(\left(\tfrac{a}{2}, \tfrac{b}{2}\right) \cup \left(\tfrac{a+1}{2}, \tfrac{b+1}{2}\right)\right) = \left(\tfrac{b}{2} - \tfrac{a}{2}\right) + \left(\tfrac{b+1}{2} - \tfrac{a+1}{2}\right) = b - a = \mu((a, b)).
\]
Therefore, the circle doubling map $M_2$ is also a dynamics on $\mathbb{T}$. In fact, we can show that $M_k$, multiplication by $k$, is measure-preserving for every natural number $k$.
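The computation in Example 2.9 extends to $M_k$ in the obvious way, and it can be checked with exact rational arithmetic. The following is a minimal sketch, not part of the original argument: the helper names `preimage_intervals` and `total_length` are hypothetical, and the circle is identified with $[0, 1)$ as in Example 2.7.

```python
from fractions import Fraction

def preimage_intervals(a, b, k):
    """Intervals whose union is M_k^{-1}((a, b)), assuming 0 <= a < b <= 1.

    The preimage of (a, b) under x -> k*x mod 1 consists of the k disjoint
    intervals ((a + j)/k, (b + j)/k) for j = 0, ..., k - 1.
    """
    return [((a + j) / k, (b + j) / k) for j in range(k)]

def total_length(intervals):
    """Lebesgue measure of a disjoint union of intervals."""
    return sum(hi - lo for lo, hi in intervals)

a, b = Fraction(1, 5), Fraction(2, 3)
# Measure of the preimage for k = 1, ..., 5; each should equal b - a.
lengths = {k: total_length(preimage_intervals(a, b, k)) for k in range(1, 6)}
```

By Remark 2.8, agreement on intervals is all that is needed, so this check mirrors the structure of the proof rather than replacing it.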

Remark 2.10. Now we have defined two different dynamics on the same space $\mathbb{T}$ (or $\mathbb{R}/\mathbb{Z}$). One natural question to ask is whether these two systems are equivalent. Here it is quite obvious that they are different, given that the action $a \mapsto a + \alpha$ is not even close to the action $a \mapsto 2a$. However, it is hard to tell whether two dynamical systems are different or “behave in some similar ways” when they live on different spaces. Therefore we introduce the notion of measurable isomorphism.

Definition 2.11. Let $(X, \mathcal{B}, \mu)$ be a probability measure space and $A \in \mathcal{B}$ a measurable set. $A$ is null if $\mu(A) = 0$. On the other hand, $A$ is conull if $\mu(A) = 1$.

This gives us a convenient way to talk about the special sets that have zero or full measure, which we will encounter often in our discussion of ergodic theory.

Definition 2.12. Let $(X, \mathcal{B}, T, \mu)$ be a dynamical system and $A \in \mathcal{B}$ a measurable set. We call $A$ $T$-invariant, or invariant to $T$, if $TA \subset A$. Moreover, if $TA = A$, we say that $A$ is strictly $T$-invariant, or strictly invariant to $T$.

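Definition 2.13. Two systems $(X, \mathcal{B}_X, T_X, \mu)$ and $(Y, \mathcal{B}_Y, T_Y, \nu)$ are measurably isomorphic if there exist conull sets $X' \in \mathcal{B}_X$ invariant to $T_X$ and $Y' \in \mathcal{B}_Y$ invariant to $T_Y$ and an invertible measure-preserving map $f : X' \to Y'$ such that $f \circ T_X(x) = T_Y \circ f(x)$ for every $x \in X'$, i.e. such that the following diagram commutes a.e.
\[
\begin{array}{ccc}
X' & \xrightarrow{\;T_X\;} & X' \\
\downarrow{\scriptstyle f} & & \downarrow{\scriptstyle f} \\
Y' & \xrightarrow{\;T_Y\;} & Y'
\end{array}
\]
Notice that the above commutative diagram may only be defined on a set of full measure.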

3 See Remark 2.8.
4 Here we view the circle as the interval $[0, 1)$ with endpoints identified.

Example 2.14. $(\mathbb{T}, \mathcal{B}_{\mathbb{T}}, M_4, \mu)$ is isomorphic to $(\mathbb{T}^2, \mathcal{B}_{\mathbb{T}^2}, M_2 \otimes M_2, \mu \otimes \mu)$, where $\mu \otimes \mu$ is the product measure and $M_2 \otimes M_2 : \mathbb{T}^2 \to \mathbb{T}^2$ is defined by $M_2 \otimes M_2(t_1, t_2) = (2t_1, 2t_2)$. It is clear that $M_2 \otimes M_2$ is a measure-preserving map on $\mathbb{T}^2$. Here we construct a measure-preserving map $\phi$ from $\mathbb{T}$ to $\mathbb{T}^2$ such that the diagram below commutes.
\[
\begin{array}{ccc}
\mathbb{T} & \xrightarrow{\;M_4\;} & \mathbb{T} \\
\downarrow{\scriptstyle \phi} & & \downarrow{\scriptstyle \phi} \\
\mathbb{T}^2 & \xrightarrow{\;M_2 \otimes M_2\;} & \mathbb{T}^2
\end{array}
\]
We construct a sequence $\{\phi_n\}_{n \in \mathbb{N}}$ of maps from $\mathbb{T}$ to $\mathbb{T}^2$ where each $\phi_n$ is a measure-preserving map on a “small σ-algebra”. When $n = 1$, we define $\mathcal{C}_1 \subset \mathcal{B}_{\mathbb{T}}$ to be the trivial σ-algebra, which only contains the entire interval $[0, 1)$ and the empty set. Similarly, we define $\mathcal{D}_1 \subset \mathcal{B}_{\mathbb{T}^2}$ to be the σ-algebra that only contains the unit square and the empty set. We define $\phi_1$ to be some bijective map$^{5}$ from $[0, 1)$ to $[0, 1)^2$. It is clearly measure-preserving when viewed as a map from $(\mathbb{T}, \mathcal{C}_1)$ to $(\mathbb{T}^2, \mathcal{D}_1)$.

When $n = 2$, we divide the interval into four subintervals $\{[0, \tfrac14), [\tfrac14, \tfrac12), [\tfrac12, \tfrac34), [\tfrac34, 1)\}$ and define $\mathcal{C}_2 \subset \mathcal{B}_{\mathbb{T}}$ to be the σ-algebra generated by these four subintervals. Similarly, we can divide $\mathbb{T}^2$ into four squares and define the σ-algebra $\mathcal{D}_2$. The function $\phi_2$ is defined by sending the four subintervals of $\mathbb{T}$ into the four subsquares of $\mathbb{T}^2$ in counterclockwise order starting at the top left square. Again, we can use some bijective map to ensure that the map $\phi_2 : (\mathbb{T}, \mathcal{C}_2) \to (\mathbb{T}^2, \mathcal{D}_2)$ is measure-preserving. By construction, the sequence $\{\phi_n\}_{n \in \mathbb{N}}$ converges to some measurable function $\phi$, and the limit $\phi : (\mathbb{T}, \mathcal{B}_{\mathbb{T}}) \to (\mathbb{T}^2, \mathcal{B}_{\mathbb{T}^2})$ is measure-preserving. We can prove that such a map is an isomorphism by viewing the points in $\mathbb{T}$ in base-4 expansions and the points in $\mathbb{T}^2$ in binary expansions, which is another standard trick in the theory of dynamical systems.

Remark 2.15. We can think of the bijection in the other direction, i.e. $[0, 1)^2 \to [0, 1)$. There are many constructions of such a bijective map. One of the most direct ways is to interleave the terms in the continued fraction expansion of each coordinate to get a single number in $[0, 1)$. However, it is not very clear that this map is onto. Actually, we only need to know the existence of a bijective map rather than its actual form. Therefore, we can use the Cantor-Schröder-Bernstein theorem. The map $x \mapsto (x, 0)$ gives an injection from $[0, 1)$ to $[0, 1)^2$. Interleaving the digits of the decimal expansions of the two coordinates, i.e. $(0.a_1a_2a_3\ldots,\ 0.b_1b_2b_3\ldots) \mapsto 0.a_1b_1a_2b_2\ldots$, gives an injection from $[0, 1)^2$ to $[0, 1)$. When the decimal expansion is not unique, we use the Axiom of Choice to pick one.
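The digit-expansion viewpoint mentioned at the end of Example 2.14 can be made concrete. The sketch below is one illustrative choice and not necessarily the map $\phi$ built above: it splits each base-4 digit $d = 2a + b$ of $x$ into binary digits $a$ and $b$, and checks on points with finitely many digits that multiplying by 4 on the circle matches doubling both coordinates. The function names (`digits_base4`, `phi`, `M4`, `M2xM2`) are hypothetical.

```python
from fractions import Fraction

def digits_base4(x, n):
    """First n base-4 digits of x in [0, 1)."""
    digits = []
    for _ in range(n):
        x *= 4
        d = int(x)          # integer part; x is nonnegative
        digits.append(d)
        x -= d
    return digits

def phi(x, n):
    """Split each base-4 digit d = 2a + b into binary digits (a, b).

    Returns the pair (0.a1 a2 ... an, 0.b1 b2 ... bn) read in base 2.
    """
    s = t = Fraction(0)
    for i, d in enumerate(digits_base4(x, n), start=1):
        a, b = divmod(d, 2)
        s += Fraction(a, 2 ** i)
        t += Fraction(b, 2 ** i)
    return s, t

def M4(x):
    """Multiplication by 4 on the circle [0, 1)."""
    return (4 * x) % 1

def M2xM2(p):
    """Doubling in each coordinate of the torus [0, 1)^2."""
    return ((2 * p[0]) % 1, (2 * p[1]) % 1)

# Points with at most 3 base-4 digits; 10 digits of precision is then exact.
points = [Fraction(j, 64) for j in range(64)]
commutes = all(phi(M4(x), 10) == M2xM2(phi(x, 10)) for x in points)
```

Both maps act as a one-step shift on digit sequences, which is why the square commutes on these points.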
As we have seen in the previous examples, the measure-preserving property is already highly nontrivial. There is already a lot to say about these systems directly from the definition of being measure-preserving. Here we prove some elementary but useful results about measure-preserving maps.

Definition 2.16. Given a measure-preserving system $(X, \mathcal{B}, T, \mu)$, we define $U_T$, the associated operator, or equivalently the operator associated to $T$, by $U_T f(x) = f(Tx)$. Moreover, we define $U_T^*$ to be the adjoint of $U_T$, i.e. the operator such that $\langle U_T f, g \rangle = \langle f, U_T^* g \rangle$ for all $f, g \in L^2(X)$.

5 See Remark 2.15 for more details.

Remark 2.17. Notice that in Definition 2.16 we did not specify the spaces that the operator lives in. This is a rather loose definition since the actual spaces depend on the context. For example, when we are dealing with $L^2$ spaces, as in Theorem 2.27, we assume that $U_T : L^2(X) \to L^2(X)$.

Theorem 2.18. Given a measure-preserving system $(X, \mathcal{B}, T, \mu)$, for any $L^1$ function $f$, we have
\[
\int_X U_T f \, d\mu = \int_X f \, d\mu.
\]

Proof. For a characteristic function $\chi_B$, $\int_X U_T \chi_B \, d\mu = \mu(T^{-1}B) = \mu(B) = \int_X \chi_B \, d\mu$ since $T$ is measure-preserving. Now for any nonnegative $L^1$ function $f$, we can take a sequence of simple functions that approximate $f$ and apply the monotone convergence theorem. Finally, for a general $L^1$ function $f$, we can write $f = f_+ - f_-$, where both parts are nonnegative, and then apply the previous result. □

Remark 2.19. Another way to state Theorem 2.18 is to say that $U_T : L^1(X) \to L^1(X)$ is an isometry, i.e. $\|U_T f\|_1 = \|f\|_1$.

Theorem 2.20 (Poincaré Recurrence). Given a dynamical system $(X, \mathcal{B}, T, \mu)$ and $A \in \mathcal{B}$, almost every point in $A$ returns to $A$ infinitely often.

Proof. We need to show that there exists a set $A' \subset A$ with $\mu(A') = \mu(A)$ such that for every point $a \in A'$ there exists a strictly increasing integer sequence $\{n_i\}_{i \in \mathbb{N}}$ such that $T^{n_i} a \in A$ for all $i$. In order to find such an $A'$, we remove from $A$, step by step, a countable collection of sets with measure 0. Let
\[
N_1 = \{x \in A \mid T^n x \notin A \text{ for all } n \geq 1\} = A \cap \bigcap_{i=1}^{\infty} T^{-i}A^c \subset A.
\]
We claim that the set $N_1$ has measure zero. Indeed,
\[
T^{-n}N_1 = T^{-n}A \cap \bigcap_{i=n+1}^{\infty} T^{-i}A^c.
\]
For $0 \leq m < n$, we have $T^{-n}N_1 \subset T^{-n}A$ but $T^{-m}N_1 \subset T^{-n}A^c$ since $m < n$. Therefore, the countable family $\{T^{-k}N_1\}_{k \geq 0}$ consists of mutually disjoint sets. We know that
\begin{align*}
1 &\geq \mu\left(\bigcup_{i=0}^{\infty} T^{-i}N_1\right) \\
&= \sum_{i=0}^{\infty} \mu(T^{-i}N_1) && \text{(they are disjoint)} \\
&= \sum_{i=0}^{\infty} \mu(N_1) && \text{($T$ is measure-preserving)},
\end{align*}
so $\mu(N_1) = 0$. We now define $A_1 = A - N_1$. By construction, every point in $A_1$ returns to $A$ at least once and $\mu(A) = \mu(A_1)$. Now we define
\[
N_2 = \{x \in A_1 \mid T^{2n} x \notin A_1 \text{ for all } n \geq 1\} = A_1 \cap \bigcap_{i=1}^{\infty} T^{-2i}A_1^c
\]
and let $A_2 = A_1 - N_2$. Similarly, $\mu(A_2) = \mu(A_1) = \mu(A)$ since $N_2$ has measure zero. Moreover, every point in $A_2$ returns to the set $A$ at least twice. Continuing in this way, let $A' = \bigcap_{i=1}^{\infty} A_i$. Then $A'$ has the same measure as $A$ and every point in $A'$ returns to $A$ infinitely often. □

Remark 2.21. Theorem 2.20 does not require any high-level techniques. We can generalize the result of Poincaré’s theorem from the recurrence of points to the recurrence of a part of the set $A$ that has positive measure. We call this property the multiple recurrence property. As we will see later, such a generalization applies to arbitrary measure-preserving systems, just like Poincaré recurrence.

Definition 2.22. A measure-preserving system $(X, \mathcal{B}, T, \mu)$ has multiple recurrence of order $k$ if for each $A \in \mathcal{B}$ that has positive measure, there exists $n \in \mathbb{N}$ such that
\[
\mu\left(\bigcap_{i=0}^{k-1} T^{-in}A\right) > 0.
\]
Despite the similarity between Theorem 2.20 and Definition 2.22, the latter is actually much more sophisticated and involves some of the deeper results in ergodic theory. In fact, the whole point of this paper is to prove the multiple recurrence property for a general measure-preserving system. We will show in Theorem 3.9 that this is equivalent to proving Szemerédi’s theorem.

2.1. Ergodicity and weak mixing.

Definition 2.23. A measure-preserving system $(X, \mathcal{B}, T, \mu)$ is ergodic if for any strictly invariant set $A \in \mathcal{B}$, i.e. any set such that $T^{-1}A = A$, either $\mu(A) = 1$ or $\mu(A) = 0$.

In the above definition of ergodicity, we can actually “replace” sets with functions and give a different characterization of ergodicity. The proof is similar to that of Theorem 2.18, where we prove the proposition for simple functions first and then pass to limits.

Theorem 2.24. A measure-preserving system $(X, \mathcal{B}, T, \mu)$ is ergodic if and only if any measurable function $f$ such that $f(Tx) = f(x)$ for almost every $x \in X$ is constant almost everywhere.

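Example 2.25. Consider the dynamical system $(\mathbb{T}, R_\alpha)$ where $\mathbb{T}$ is the circle and $R_\alpha : x \mapsto x + \alpha$ with $\alpha$ irrational. For any $R_\alpha$-invariant function $f \in L^2(\mathbb{T})$, let $\sum_{n=-\infty}^{\infty} c_n e^{2\pi i n t}$ be the Fourier expansion of $f(t)$. Then
\begin{align*}
f(R_\alpha t) &= \sum_{n=-\infty}^{\infty} c_n e^{2\pi i n (t + \alpha)} \\
&= \sum_{n=-\infty}^{\infty} c_n e^{2\pi i n \alpha} e^{2\pi i n t}.
\end{align*}
By the uniqueness of Fourier coefficients, $c_n = c_n e^{2\pi i n \alpha}$ for every $n$. Since $\alpha$ is irrational, $e^{2\pi i n \alpha} \neq 1$ for all $n \neq 0$, so $c_n = 0$ for all $n$ except $n = 0$, where the identity holds trivially since $e^{2\pi i n \alpha} = 1$. Therefore, $f$ is constant and irrational rotation on the circle is ergodic.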
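As a numerical companion to Example 2.25, one can watch the averages of a nonconstant character along an orbit of an irrational rotation shrink toward its integral, which is 0. This is a hedged sketch only: the choices $\alpha = \sqrt{2} - 1$, the starting point $0.3$, and the helper name `ergodic_average` are illustrative assumptions. For $f(t) = e^{2\pi i t}$ the average is a geometric sum, so its modulus is at most $2 / (N |e^{2\pi i \alpha} - 1|)$.

```python
import cmath
import math

def ergodic_average(f, x, alpha, N):
    """Average of f along the first N points of the orbit x, x + alpha, ..."""
    return sum(f((x + n * alpha) % 1.0) for n in range(N)) / N

alpha = math.sqrt(2) - 1                       # an irrational rotation number
f = lambda t: cmath.exp(2j * math.pi * t)      # nonconstant character, integral 0

N = 10_000
avg = ergodic_average(f, 0.3, alpha, N)

# Geometric-sum bound: |avg| <= 2 / (N * |e^{2 pi i alpha} - 1|).
bound = 2 / (N * abs(cmath.exp(2j * math.pi * alpha) - 1))
```

The bound decays like $1/N$ precisely because $e^{2\pi i \alpha} \neq 1$, the same fact that forces $c_n = 0$ for $n \neq 0$ in the example.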

Example 2.26. Now consider the circle doubling system $(\mathbb{T}, M_2)$ where $M_2$ acts on $\mathbb{T}$ by $M_2(x) = 2x$. We use the same technique as in the previous example. Suppose we have an invariant $L^2$ function $f(t)$ with Fourier expansion $\sum_{n=-\infty}^{\infty} c_n e^{2\pi i n t}$. Now we have
\[
f(2t) = \sum_{n=-\infty}^{\infty} c_n e^{2\pi i n (2t)} = \sum_{n=-\infty}^{\infty} c_n e^{2\pi i (2n) t}.
\]
Therefore, $c_k = c_{2k} = c_{4k} = \cdots$ for every integer $k$. Notice that the function $f(t)$ is in $L^2$, so the sequence $\{c_n\}_{n \in \mathbb{Z}}$ should be square summable. This is not possible if there is some $k \neq 0$ such that $c_k \neq 0$. As a result, $f$ must be constant and the system $(\mathbb{T}, M_2)$ is ergodic.

Theorem 2.27 (Mean ergodic theorem). Given a measure-preserving system $(X, \mathcal{B}, T, \mu)$, we define $P$ to be the orthogonal projection onto the closed subspace
\[
V = \{h \in L^2(X) \mid U_T h = h\} \subset L^2(X).
\]
Then we have
\[
\frac{1}{N} \sum_{n=0}^{N-1} U_T^n f \to P f
\]
in $L^2$ for all $f \in L^2(X)$.

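Proof. Observe that $L^2(X) = V \oplus \overline{W}$, where $W = \{U_T g - g \mid g \in L^2(X)\} \subset L^2(X)$. To see this, it suffices to show that $W^\perp = V$. If $h \in V$, then
\begin{align*}
\langle h, U_T g - g \rangle &= \langle h, U_T g \rangle - \langle h, g \rangle \\
&= \langle U_T h, U_T g \rangle - \langle h, g \rangle \\
&= \langle h, g \rangle - \langle h, g \rangle && \text{(by Theorem 2.18)} \\
&= 0.
\end{align*}
If $h \in W^\perp$, we need to show that $h = U_T h$. Indeed, we know that $0 = \langle h, U_T g - g \rangle = \langle h, U_T g \rangle - \langle h, g \rangle = \langle U_T^* h, g \rangle - \langle h, g \rangle$ for all $g \in L^2(X)$. This means that
\[
h = U_T^* h, \tag{2.28}
\]
where $U_T^*$ is the adjoint of $U_T$ defined in Definition 2.16. Therefore
\begin{align*}
\|U_T h - h\|_2^2 &= \langle U_T h - h, U_T h - h \rangle \\
&= \langle U_T h, U_T h \rangle + \langle h, h \rangle - 2\langle U_T h, h \rangle \\
&= 2\langle h, h \rangle - 2\langle h, U_T^* h \rangle \\
&= 2\langle h, h \rangle - 2\langle h, h \rangle && \text{(by (2.28))} \\
&= 0.
\end{align*}
Now we have proved our observation and we are ready to prove the mean ergodic theorem. Given any $f \in L^2(X)$, there exists a sequence of $L^2$ functions $\{h_i\}_{i \in \mathbb{N}} \subset W$ such that $f = Pf + h$ where $h_i \to h$$^{6}$. Notice that
\begin{align*}
\frac{1}{N} \sum_{n=0}^{N-1} U_T^n f &= \frac{1}{N} \sum_{n=0}^{N-1} U_T^n P f + \frac{1}{N} \sum_{n=0}^{N-1} U_T^n h \\
&= \frac{1}{N} \sum_{n=0}^{N-1} P f + \frac{1}{N} \sum_{n=0}^{N-1} U_T^n h && (Pf \in V) \\
&= P f + \frac{1}{N} \sum_{n=0}^{N-1} U_T^n h.
\end{align*}
It suffices to show that $\frac{1}{N} \sum_{n=0}^{N-1} U_T^n h \to 0$. Write $h_i = U_T g_i - g_i$ for each $h_i$. The sum telescopes, so
\begin{align*}
\left\| \frac{1}{N} \sum_{n=0}^{N-1} U_T^n h_i \right\|_2 &= \left\| \frac{1}{N} \sum_{n=0}^{N-1} U_T^n (U_T g_i - g_i) \right\|_2 \\
&= \frac{1}{N} \left\| U_T^N g_i - g_i \right\|_2 \\
&\leq \frac{2\|g_i\|_2}{N},
\end{align*}
which goes to 0 as $N \to \infty$. For any $\varepsilon > 0$, we choose $i$ large enough such that $\|h - h_i\|_2 < \frac{\varepsilon}{2}$ and then choose $N$ large enough such that $\left\| \frac{1}{N} \sum_{n=0}^{N-1} U_T^n h_i \right\|_2 < \frac{\varepsilon}{2}$. Finally,
\begin{align*}
\left\| \frac{1}{N} \sum_{n=0}^{N-1} U_T^n h \right\|_2 &\leq \left\| \frac{1}{N} \sum_{n=0}^{N-1} U_T^n (h - h_i) \right\|_2 + \left\| \frac{1}{N} \sum_{n=0}^{N-1} U_T^n h_i \right\|_2 \\
&< \frac{1}{N} \sum_{n=0}^{N-1} \|U_T^n (h - h_i)\|_2 + \frac{\varepsilon}{2} \\
&= \frac{1}{N} \sum_{n=0}^{N-1} \|h - h_i\|_2 + \frac{\varepsilon}{2} && \text{(by Theorem 2.18)} \\
&< \varepsilon.
\end{align*}
□

6 Whenever we talk about convergence in this proof, we assume it is convergence in $L^2$ norm.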



Remark 2.29. Even though Theorem 2.27 bears the name “mean ergodic”, it actually applies to general measure-preserving systems that are not necessarily ergodic. Notice that when using the mean ergodic theorem on ergodic systems, we have a stronger result, namely that
\[
\frac{1}{N} \sum_{n=0}^{N-1} U_T^n f \xrightarrow{\;L^2\;} \int_X f \, d\mu
\]
for any $L^2$ function $f$, by Theorem 2.24. Indeed, this is a direct consequence of the fact that the only strictly invariant sets in an ergodic system have either zero or full measure, so that $V$ consists of the constant functions. Combining this with the property that $L^2$ convergence implies convergence in the weak topology of the Hilbert space $L^2$, we obtain a weaker version of the mean ergodic theorem.

Corollary 2.30. Given an ergodic system $(X, \mathcal{B}, T, \mu)$ and $f, g \in L^2$, we have
\[
\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X f \, U_T^n g \, d\mu = \int_X f \, d\mu \int_X g \, d\mu.
\]

Proof. By the mean ergodic theorem, we know that
\[
\frac{1}{N} \sum_{n=0}^{N-1} U_T^n g \xrightarrow{\;L^2\;} \int_X g \, d\mu.
\]
Actually, we only need that
\[
\frac{1}{N} \sum_{n=0}^{N-1} U_T^n g \rightharpoonup \int_X g \, d\mu,
\]
which is true since $L^2$ convergence implies weak convergence. Now we have
\begin{align*}
\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \langle f, U_T^n g \rangle &= \left\langle f, \int_X g \, d\mu \right\rangle \\
&= \langle f, 1 \rangle \int_X g \, d\mu \\
&= \int_X f \, d\mu \int_X g \, d\mu.
\end{align*}
□

Corollary 2.31. A measure-preserving system $(X, \mathcal{B}, T, \mu)$ is ergodic if and only if $\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu(A \cap T^{-n}B) = \mu(A)\mu(B)$ for all $A, B \in \mathcal{B}$.

Proof. If $(X, \mathcal{B}, T, \mu)$ is ergodic, we apply Corollary 2.30 by taking $f, g$ to be the characteristic functions of the sets $A, B$. It is clear that $\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu(A \cap T^{-n}B) = \mu(A)\mu(B)$. On the other hand, suppose $\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu(A \cap T^{-n}B) = \mu(A)\mu(B)$ for all $A, B \in \mathcal{B}$. Recall that in order to show that $(X, \mathcal{B}, T, \mu)$ is ergodic, we need to show that any strictly invariant set $V \in \mathcal{B}$ has either zero or full measure. We take $A = V^c$ and $B = V$. Notice that $\mu(A \cap T^{-n}B) = \mu(A \cap B) = \mu(B^c \cap B) = 0$ for any integer $n$. Therefore, $\mu(A)\mu(B) = \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu(A \cap T^{-n}B) = 0$, meaning that either $A = B^c$ or $B$ has zero measure, i.e. $T$ is ergodic. □

While the mean ergodic theorem only applies to $L^2$ functions, the theorem below is more general since it is true for all $L^1$ functions. The trade-off here is that the convergence is only guaranteed to be pointwise almost everywhere and in $L^1$.

Theorem 2.32 (Birkhoff). Given a measure-preserving ergodic system $(X, \mathcal{B}, T, \mu)$ and any $L^1$ function $f$, we have
\[
\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} U_T^n f = \int_X f \, d\mu
\]
almost everywhere and in $L^1$.

Definition 2.33. A measure-preserving system $(X, \mathcal{B}, T, \mu)$ is mixing, or strong mixing, if $\lim_{n \to \infty} \mu(A \cap T^{-n}B) = \mu(A)\mu(B)$ for all $A, B \in \mathcal{B}$.

Definition 2.34. A measure-preserving system $(X, \mathcal{B}, T, \mu)$ is weak mixing if $\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} |\mu(A \cap T^{-n}B) - \mu(A)\mu(B)| = 0$ for all $A, B \in \mathcal{B}$.

Remark 2.35. Recall that it is a fact in analysis that if a sequence of real numbers $\{a_n\} \subset \mathbb{R}$ converges to some real number $a$, then its Cesàro means have the same limit, i.e. $\frac{1}{N} \sum_{n=0}^{N-1} a_n \to a$. Applying this to $a_n = |\mu(A \cap T^{-n}B) - \mu(A)\mu(B)|$, we see that strong mixing implies weak mixing.

Theorem 2.36. If a measure-preserving system $(X, \mathcal{B}, T, \mu)$ is weak mixing, then it is ergodic.

Proof. Take a strictly invariant set $A \in \mathcal{B}$. By Definition 2.34, we have
\[
\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} |\mu(A \cap T^{-n}A) - \mu(A)\mu(A)| = 0.
\]
Since $A = T^{-n}A$ for any integer $n$, every term of the sum equals $|\mu(A) - \mu(A)^2|$, so $\mu(A) = \mu(A)^2$. Therefore, $\mu(A)$ is either 0 or 1. This implies the ergodicity of $(X, \mathcal{B}, T, \mu)$. □

Remark 2.37. We can finally conclude that: Mixing ⇒ Weak mixing ⇒ Ergodic.

Now we provide some characterizations of weak mixing systems via product spaces. The following lemma, which characterizes bounded real sequences whose Cesàro means tend to zero, is a fundamental and elementary result in real analysis.

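Lemma 2.38. If $\{a_n\} \subset \mathbb{R}$ is a nonnegative bounded sequence whose Cesàro means tend to zero, i.e. $\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} a_n = 0$, then there exists an index set $J \subset \mathbb{N}$ with density zero$^{7}$ such that
\[
\lim_{n \to \infty,\, n \notin J} a_n = 0.
\]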

This lemma yields a direct corollary on weak mixing systems when we take $a_n$ to be $|\mu(A \cap T^{-n}B) - \mu(A)\mu(B)|$.

Corollary 2.39. Given a weak mixing system $(X, \mathcal{B}_X, T_X, \mu)$ and $A, B \in \mathcal{B}_X$, there exists an index set $J \subset \mathbb{N}$ with density zero such that
\[
\lim_{n \to \infty,\, n \notin J} |\mu(A \cap T^{-n}B) - \mu(A)\mu(B)| = 0.
\]

Remark 2.40. Corollary 2.39 shows that weak mixing systems are very similar to strong mixing systems when we ignore an index set with zero density.
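A toy sequence helps to see what Lemma 2.38 does and does not say. In the sketch below (an illustration only, with hypothetical helper names), $a_n$ is the indicator of the perfect squares: the sequence itself does not converge, its Cesàro means tend to 0, and it vanishes identically off the density-zero set $J$ of squares.

```python
import math

def is_square(n):
    """True if n is a perfect square (n >= 0)."""
    r = math.isqrt(n)
    return r * r == n

def a(n):
    """Indicator sequence of the perfect squares: bounded and nonnegative."""
    return 1.0 if is_square(n) else 0.0

N = 10 ** 5

# Cesaro mean of a_0, ..., a_{N-1}.
cesaro_mean = sum(a(n) for n in range(N)) / N

# Proportion of indices below N that lie in J = {perfect squares}.
density_J = sum(1 for n in range(N) if is_square(n)) / N

# Largest value of a_n over indices outside J.
sup_off_J = max(a(n) for n in range(N) if not is_square(n))
```

Since there are about $\sqrt{N}$ squares below $N$, both the Cesàro mean and the proportion of indices in $J$ are of order $N^{-1/2}$.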

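Theorem 2.41. Given a measure-preserving system $(X, \mathcal{B}_X, T_X, \mu)$, the following are equivalent:
1. $(X, \mathcal{B}_X, T_X, \mu)$ is weak mixing.
2. Given any ergodic system $(Y, \mathcal{B}_Y, T_Y, \nu)$, $(X \times Y, \mathcal{B}_X \otimes \mathcal{B}_Y, T_X \times T_Y, \mu \otimes \nu)$ is ergodic.
3. $(X \times X, \mathcal{B}_X \otimes \mathcal{B}_X, T_X \times T_X, \mu \otimes \mu)$ is weak mixing.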

Proof. (3) ⇒ (1): Take $A, B \in \mathcal{B}_X$ and consider the sets $A \times X, B \times X \in \mathcal{B}_X \otimes \mathcal{B}_X$. Notice that for any measurable set $C \in \mathcal{B}_X$, $\mu \otimes \mu(C \times X) = \mu(C)$. We can conclude that $X$ is weak mixing by the weak mixing of $X \times X$.

(1) ⇒ (3): It suffices to check the definition of weak mixing for rectangle sets $A \times B, C \times D \in \mathcal{B}_X \otimes \mathcal{B}_X$. By Corollary 2.39, there exist index sets $K, L \subset \mathbb{N}$ with density zero such that
\[
\lim_{n \to \infty,\, n \notin K} |\mu(A \cap T^{-n}C) - \mu(A)\mu(C)| = 0,
\]
\[
\lim_{n \to \infty,\, n \notin L} |\mu(B \cap T^{-n}D) - \mu(B)\mu(D)| = 0.
\]
Notice that a finite union of zero density sets has zero density. We take $J = K \cup L$ and have
\[
\lim_{n \to \infty,\, n \notin J} |\mu(A \cap T^{-n}C) - \mu(A)\mu(C)| = 0,
\]
\[
\lim_{n \to \infty,\, n \notin J} |\mu(B \cap T^{-n}D) - \mu(B)\mu(D)| = 0.
\]

7 We have defined density in Remark 3.2.

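Therefore,
\begin{align*}
&\lim_{n \to \infty,\, n \notin J} |\mu \otimes \mu((A \times B) \cap T^{-n}(C \times D)) - \mu \otimes \mu(A \times B)\, \mu \otimes \mu(C \times D)| \\
&\quad = \lim_{n \to \infty,\, n \notin J} |\mu \otimes \mu((A \cap T^{-n}C) \times (B \cap T^{-n}D)) - \mu(A)\mu(B)\mu(C)\mu(D)| \\
&\quad = \lim_{n \to \infty,\, n \notin J} |\mu(A \cap T^{-n}C)\mu(B \cap T^{-n}D) - \mu(A)\mu(B)\mu(C)\mu(D)| \\
&\quad = 0.
\end{align*}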

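This proves that $T_X \times T_X$ is weak mixing since $J$ has zero density.

(1) ⇒ (2): It suffices to verify the criterion of Corollary 2.31 for rectangular sets in $X \times Y$. For any pair of rectangular sets $A \times B, C \times D \in \mathcal{B}_X \otimes \mathcal{B}_Y$, we need to show that
\[
\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu \otimes \nu((A \times B) \cap T^{-n}(C \times D)) = \mu \otimes \nu(A \times B)\, \mu \otimes \nu(C \times D).
\]
Indeed,
\begin{align*}
\frac{1}{N} \sum_{n=0}^{N-1} \mu \otimes \nu((A \times B) \cap T^{-n}(C \times D)) &= \frac{1}{N} \sum_{n=0}^{N-1} \mu(A \cap T^{-n}C)\, \nu(B \cap T^{-n}D) \\
&= \frac{1}{N} \sum_{n=0}^{N-1} \mu(A)\mu(C)\, \nu(B \cap T^{-n}D) \\
&\quad + \frac{1}{N} \sum_{n=0}^{N-1} \left(\mu(A \cap T^{-n}C) - \mu(A)\mu(C)\right) \nu(B \cap T^{-n}D).
\end{align*}
By the ergodicity of $(Y, \mathcal{B}_Y, T_Y, \nu)$ and Corollary 2.31, the first sum converges to $\mu(A)\mu(C)\nu(B)\nu(D)$. Since $\nu(B \cap T^{-n}D) \leq 1$, the second sum is bounded in absolute value by
\[
\frac{1}{N} \sum_{n=0}^{N-1} \left|\mu(A \cap T^{-n}C) - \mu(A)\mu(C)\right|,
\]
which tends to 0 since $(X, \mathcal{B}_X, T_X, \mu)$ is weak mixing. Hence the limit exists and equals
\[
\mu(A)\mu(C)\nu(B)\nu(D) = \mu \otimes \nu(A \times B)\, \mu \otimes \nu(C \times D).
\]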

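This shows that $(X \times Y, \mathcal{B}_X \otimes \mathcal{B}_Y, T_X \times T_Y, \mu \otimes \nu)$ is ergodic.

(2) ⇒ (1): Assume that for any ergodic system $(Y, \mathcal{B}_Y, T_Y, \nu)$, $(X \times Y, \mathcal{B}_X \otimes \mathcal{B}_Y, T_X \times T_Y, \mu \otimes \nu)$ is ergodic. We first consider the ergodic system $(Y', \mathcal{B}_{Y'}, \mathrm{id}_{Y'}, \nu)$ where $Y'$ only contains a single element $e$ and $\mathrm{id}_{Y'}$ is the identity map on $Y'$. There is a canonical isomorphism $\phi : X \times Y' \to X$ that sends $(x, e) \in X \times Y'$ to $x \in X$. Therefore $(X, T_X)$ is ergodic, since it is isomorphic to $(X \times Y', T_X \times \mathrm{id}_{Y'})$ and $(X \times Y', T_X \times \mathrm{id}_{Y'})$ is ergodic by (2). We can further deduce that $(X \times X, T_X \times T_X)$ is ergodic by applying (2) again, knowing that $(X, T_X)$ is ergodic. Recall that in order to show that $(X, \mathcal{B}_X, T_X, \mu)$ is weak mixing, we need to prove that $\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} |\mu(A \cap T_X^{-n}B) - \mu(A)\mu(B)| = 0$ for all $A, B \in \mathcal{B}_X$. Actually, it suffices to show that $\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} |\mu(A \cap T_X^{-n}B) - \mu(A)\mu(B)|^2 = 0$, by a general result in real analysis (the Cauchy-Schwarz inequality). Indeed,
\begin{align*}
&\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \left(\mu(A \cap T_X^{-n}B) - \mu(A)\mu(B)\right)^2 \tag{2.42} \\
&\quad = \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu(A \cap T_X^{-n}B)^2 + \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu(A)^2\mu(B)^2 \tag{2.43} \\
&\qquad - 2 \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu(A \cap T_X^{-n}B)\mu(A)\mu(B) \tag{2.44} \\
&\quad = \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu(A \cap T_X^{-n}B)^2 + \mu(A)^2\mu(B)^2 \tag{2.45} \\
&\qquad - 2\mu(A)\mu(B) \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu(A \cap T_X^{-n}B). \tag{2.46}
\end{align*}
In order to compute the two terms
\[
\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu(A \cap T_X^{-n}B)^2 \quad \text{and} \quad \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu(A \cap T_X^{-n}B),
\]
we apply the ergodicity of $(X \times X, T_X \times T_X)$ together with Corollary 2.31 and deduce that
\begin{align*}
\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu(A \cap T_X^{-n}B) &= \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu \otimes \mu((A \times X) \cap (T_X \times T_X)^{-n}(B \times X)) \\
&= (\mu \otimes \mu)(A \times X)\,(\mu \otimes \mu)(B \times X) \\
&= \mu(A)\mu(B),
\end{align*}
\begin{align*}
\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu(A \cap T_X^{-n}B)^2 &= \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} (\mu \otimes \mu)((A \times A) \cap (T_X \times T_X)^{-n}(B \times B)) \\
&= (\mu \otimes \mu)(A \times A)\,(\mu \otimes \mu)(B \times B) \\
&= \mu(A)^2\mu(B)^2.
\end{align*}
Substituting these two limits into (2.45) and (2.46), we obtain
\begin{align*}
\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \left(\mu(A \cap T_X^{-n}B) - \mu(A)\mu(B)\right)^2 &= \mu(A)^2\mu(B)^2 + \mu(A)^2\mu(B)^2 - 2\mu(A)\mu(B)\mu(A)\mu(B) \\
&= 0.
\end{align*}
□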

Corollary 2.47. If $(X, \mathcal{B}_X, T_X, \mu)$ and $(Y, \mathcal{B}_Y, T_Y, \nu)$ are both weak mixing, then $(X \times Y, \mathcal{B}_X \otimes \mathcal{B}_Y, T_X \times T_Y, \mu \otimes \nu)$ is weak mixing.

Proof. Take an arbitrary ergodic system $(Z, \mathcal{B}_Z, T_Z, \kappa)$. Since $(Y, \mathcal{B}_Y, T_Y, \nu)$ is weak mixing, by the equivalence that we established in Theorem 2.41, the system $(Y \times Z, \mathcal{B}_Y \otimes \mathcal{B}_Z, T_Y \times T_Z, \nu \otimes \kappa)$ is ergodic. We apply Theorem 2.41 again and we know that $(X \times Y \times Z, \mathcal{B}_X \otimes \mathcal{B}_Y \otimes \mathcal{B}_Z, T_X \times T_Y \times T_Z, \mu \otimes \nu \otimes \kappa)$ is ergodic because $(X, \mathcal{B}_X, T_X, \mu)$ is weak mixing. Since $(Z, \mathcal{B}_Z, T_Z, \kappa)$ is chosen arbitrarily, we can conclude that $(X \times Y, \mathcal{B}_X \otimes \mathcal{B}_Y, T_X \times T_Y, \mu \otimes \nu)$ is weak mixing. □

2.2. Compact systems. We have seen some compact systems already, but here we provide the formal definition of what we mean by compact systems.

Definition 2.48. A system $(X, \mathcal{B}, T, \mu)$ is compact if the orbit of every $L^2$ function on $X$ is precompact.

Remark 2.49. We sometimes call functions that have precompact orbits almost periodic functions. Another way of saying that a dynamical system is compact is that every $L^2$ function on it is almost periodic.

We now give a different characterization of dynamical compactness, which we will use to prove that compact systems are SZ.

Theorem 2.50. A system $(X, \mathcal{B}, T, \mu)$ is compact if and only if for any $f \in L^2(X)$ and $\varepsilon > 0$ the set $\{k \mid \|f - U_T^k f\|_2 < \varepsilon\} \subset \mathbb{N}$ has bounded gaps.

Remark 2.51. Integer sets that have bounded gaps are called syndetic sets. Below is a formal definition of syndetic sets, but it is much easier to just think of them as sets that have bounded gaps.

Definition 2.52. A set $A \subset \mathbb{N}$ is syndetic if there exists a finite set of integers $\{a_i\}_{1 \leq i \leq k}$ such that $\mathbb{N} \subset \bigcup_{i=1}^{k} (A - a_i)$.

Proof of Theorem 2.50. Given an arbitrary $L^2$ function $f$, let $A \subset \mathbb{N}$ be $\{k \mid \|f - U_T^k f\|_2 < \varepsilon\}$. It suffices to show that $\mathrm{Orb}(f)$ is totally bounded. Indeed, for any $\varepsilon > 0$ there exists a finite set of integers $\{a_i\}_{1 \leq i \leq k} \subset \mathbb{N}$ such that $\mathbb{N} \subset \bigcup_{i=1}^{k} (A - a_i)$. This means that $\mathrm{Orb}(f) \subset \bigcup_{i=1}^{k} B(U_T^{-a_i} f, \varepsilon)$$^{8}$. This proves total boundedness since $\varepsilon$ is arbitrary. □

8 $B(x, r)$ means the open ball around the point $x$ with radius $r$.

Example 2.53. It is helpful to think of rotations when we are dealing with compact systems. Recall the rotation system on the circle $(\mathbb{T}, R_\alpha, \mu)$ where $\alpha \notin \mathbb{Q}$. Given an $L^2$ function $f$, the orbit of $f$ is $\mathrm{Orb}(f) = \{f(x + n\alpha)\}_{n \in \mathbb{N}}$. It might not be obvious at first glance that the set $\mathrm{Orb}(f)$ is precompact. However, the conclusion is quite clear if we use the second characterization of compactness given in Theorem 2.50. This follows from the fact that the orbit of any point on a circle is equidistributed under irrational rotations.

Remark 2.54. Rotations do not have to be on the circle. In fact, we can define rotation on any compact abelian group.
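The bounded-gaps criterion of Theorem 2.50 can be observed numerically for the rotation in Example 2.53. The sketch below is an illustration under stated assumptions: it takes $f(x) = e^{2\pi i x}$, for which $U_T^k f = e^{2\pi i k \alpha} f$ and hence $\|f - U_T^k f\|_2 = |1 - e^{2\pi i k \alpha}|$, and it uses $\alpha = \sqrt{2} - 1$, $\varepsilon = 0.1$, and a finite search range, all of which are arbitrary choices. The helper name `return_times` is hypothetical.

```python
import cmath
import math

def return_times(alpha, eps, K):
    """Indices k in [1, K] with ||f - U^k f||_2 < eps for f(x) = exp(2 pi i x).

    For this f the norm equals |1 - exp(2 pi i k alpha)|.
    """
    return [k for k in range(1, K + 1)
            if abs(1 - cmath.exp(2j * math.pi * k * alpha)) < eps]

alpha = math.sqrt(2) - 1
times = return_times(alpha, 0.1, 5000)

# Gaps between consecutive return times; syndeticity means these stay bounded.
gaps = [b - a for a, b in zip(times, times[1:])]
max_gap = max(gaps)
```

A finite computation cannot prove syndeticity, but the gaps staying small over a long range is consistent with the theorem.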

Definition 2.55. A Kronecker system is defined by $(G, R_\alpha)$ where $G$ is a compact abelian group and $R_\alpha$ acts on $G$ by translation by $\alpha$, i.e. $R_\alpha(g) = g\alpha$ for some $\alpha \in G$.

We conclude the introduction to basic dynamical systems by stating two useful results about compact systems and weak mixing systems [1].

Theorem 2.56. If a compact system is ergodic, then it is isomorphic to a Kronecker system.

Remark 2.57. Now we have seen the two fundamental structures of dynamical systems, namely the weak mixing systems and the compact systems. Interestingly enough, the two fundamental systems are “opposite” to each other in the following sense.

Theorem 2.58. A dynamical system is weak mixing if and only if it has no nontrivial compact factors.

Remark 2.59. In other words, a measure-preserving system is either weak mixing or has at least one nontrivial compact factor.

2.3. Factor and extension. We have already encountered a special kind of extension in Example 2.14. Here, we generalize the idea of an isomorphism and introduce the notion of an extension.

Definition 2.60. Given two measure-preserving systems $\mathbf{X} = (X, \mathcal{B}_X, \mu, T_X)$ and $\mathbf{Y} = (Y, \mathcal{B}_Y, \nu, T_Y)$, we say that $\mathbf{Y}$ is a factor of $\mathbf{X}$ if there is some measure-preserving factor map $\phi$ defined almost everywhere such that the diagram below commutes.
\[
\begin{array}{ccc}
X & \xrightarrow{\;T_X\;} & X \\
\downarrow{\scriptstyle \phi} & & \downarrow{\scriptstyle \phi} \\
Y & \xrightarrow{\;T_Y\;} & Y
\end{array}
\]
We say that $\mathbf{X}$ is an extension of $\mathbf{Y}$.

Remark 2.61. A factor map is weaker than an isomorphism in the sense that we do not require the factor map to be invertible.

Example 2.62. Every measure-preserving system has a trivial factor, namely the factor that consists of a single element. Factors may also be created by taking sub-σ-algebras. Given a measure-preserving system $(X, \mathcal{B}, \mu, T)$ and a proper sub-σ-algebra $\mathcal{B}' \subset \mathcal{B}$, we can view the system $(X, \mathcal{B}', \mu, T)$ as a factor of $(X, \mathcal{B}, \mu, T)$ where the factor map is given by the identity map $\mathrm{id}_X$. Despite the fact that $\mathrm{id}_X^{-1}$ is clearly defined almost everywhere, $\mathrm{id}_X$ is not an isomorphism. Note that $\mathcal{B}'$ is strictly smaller than $\mathcal{B}$. We will see a generalization of this example later in Example 2.71, where we talk about conditional measures.

2.4. Conditional measures. The concept of conditional measure is useful when we are working with more than one dynamical system at the same time, e.g. relatively weak mixing extensions. It allows us to construct a measure with a given sub-σ-algebra. In our setting, the smaller σ-algebra is usually generated either by the fibers of the measure-preserving map that we are working with or by the pullback of the σ-algebra of the factor, as we will see in Example 2.71. Before going directly into conditional measures, we first take a look at conditional expectations.

Definition 2.63. Given a probability space $(X, \mathcal{B}, \mu)$ and an integrable function $f$, the expectation of $f$ is defined by $E(f) = \int_X f \, d\mu$$^{9}$. If $\mathcal{C} \subset \mathcal{B}$ is a sub-σ-algebra, the conditional expectation of $f$ on $\mathcal{C}$ is $\mathbb{E}(f|\mathcal{C})$, where $\mathbb{E}(f|\mathcal{C})$ is the unique element in $L^1(X, \mathcal{C}, \mu)$ such that the following is true:
(1) The function $\mathbb{E}(f|\mathcal{C})$ is a measurable function on $(X, \mathcal{C}, \mu)$.
(2) For each $C \in \mathcal{C}$, $\int_C \mathbb{E}(f|\mathcal{C}) \, d\mu = \int_C f \, d\mu$.

Remark 2.64. The existence and uniqueness of the conditional expectation is a consequence of the Radon-Nikodym theorem. We will only use the notions discussed in this section as black boxes since they are not the focus of this paper.

Example 2.65. Let $(X, \mathcal{B}, \mu)$ be a probability space and $\{P_i\}_{1 \leq i \leq n}$ a partition of $X$$^{10}$. We consider $\mathcal{C}$, the finite sub-σ-algebra generated by the given partition $\{P_i\}_{1 \leq i \leq n}$. The conditional expectation of an integrable function $f$ is given by
\[
\mathbb{E}(f|\mathcal{C})(x) = \frac{\int_{P_i} f \, d\mu}{\mu(P_i)} \quad \text{if } x \in P_i.
\]
This is clearly measurable under $\mathcal{C}$. Moreover, for any measurable set $C \in \mathcal{C}$, we have that $\int_C \mathbb{E}(f|\mathcal{C}) \, d\mu = \int_C f \, d\mu$. This is trivial for each $C \in \{P_i\}_{1 \leq i \leq n}$, and every set in $\mathcal{C}$ is a finite union of elements of the partition. Note that the conditional expectation is defined almost everywhere, therefore it does not matter if an element in the partition has measure zero.

Theorem 2.66. Given a probability space $(X, \mathcal{B}, \mu)$, sub-σ-algebras $\mathcal{D} \subset \mathcal{C} \subset \mathcal{B}$, and $f \in L^\infty(X, \mathcal{C}, \mu)$, $g \in L^\infty(X, \mathcal{B}, \mu)$, the following are true:
(1) $\mathbb{E}(fg|\mathcal{C}) = f\,\mathbb{E}(g|\mathcal{C})$,
(2) $\mathbb{E}(\mathbb{E}(g|\mathcal{C})|\mathcal{D}) = \mathbb{E}(g|\mathcal{D})$.

Remark 2.67. The first statement in Theorem 2.66 implies that an integrable function that is measurable in the smaller sub-σ-algebra can be considered as a “constant” and thus can be pulled out. One can also think of a σ-algebra as a collection of information; the conditional expectation with respect to a given sub-σ-algebra is then just the expectation of an event given the information that we have at hand. If we have no information at hand, i.e. the sub-σ-algebra is trivial, then the best guess we can give is the usual expectation.
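On a finite probability space the formula of Example 2.65 can be computed directly, which also gives a quick check of the tower property (2) of Theorem 2.66. The following sketch is illustrative only: the six-point space, the uniform measure, the function $f(x) = x^2$, and the helper name `cond_exp` are all assumptions made for the example.

```python
from fractions import Fraction

def cond_exp(f, mu, partition):
    """Conditional expectation of f given the sigma-algebra of a finite partition.

    f and mu are dicts keyed by the points of a finite space; on each cell P
    with positive mass the result is the constant (integral of f over P) / mu(P).
    """
    out = {}
    for cell in partition:
        mass = sum(mu[x] for x in cell)
        if mass == 0:
            continue  # null cells: the value there is irrelevant
        average = sum(f[x] * mu[x] for x in cell) / mass
        for x in cell:
            out[x] = average
    return out

X = range(6)
mu = {x: Fraction(1, 6) for x in X}        # uniform probability measure
f = {x: Fraction(x * x) for x in X}

fine = [{0, 1}, {2, 3}, {4, 5}]            # generates C
coarse = [{0, 1, 2, 3}, {4, 5}]            # generates D, a sub-sigma-algebra of C

E_fine = cond_exp(f, mu, fine)
E_coarse = cond_exp(f, mu, coarse)
tower = cond_exp(E_fine, mu, coarse)       # E(E(f|C)|D)
```

Here `tower` and `E_coarse` agree, and integrating `E_fine` over any cell of the fine partition gives the same value as integrating `f`, matching condition (2) of Definition 2.63.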

Example 2.68. When $\mathcal{D}$ is the trivial σ-algebra, we have $E(E(f|\mathcal{C})) = E(f)$, which is exactly condition (2) of Definition 2.63 applied to the set $C = X$.
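The partition formula of Example 2.65 and the two identities just discussed can be checked by hand on a finite space. The following is a minimal sketch, assuming a hypothetical six-point space with uniform measure; the names `conditional_expectation`, `integral`, `fine` and `coarse` are illustrative and do not come from the paper.

```python
from fractions import Fraction

def conditional_expectation(f, mu, partition):
    """Return E(f|C) as a dict, where C is generated by `partition`.

    f, mu: dicts mapping each point of a finite space X to a value / a mass.
    partition: list of disjoint sets of points whose union is X.
    """
    cond = {}
    for cell in partition:
        mass = sum(mu[x] for x in cell)
        if mass == 0:
            # Null cells do not matter: E(f|C) is only defined a.e.
            value = Fraction(0)
        else:
            value = sum(f[x] * mu[x] for x in cell) / mass
        for x in cell:
            cond[x] = value
    return cond

def integral(g, mu, subset):
    """Integral of g over `subset` with respect to mu."""
    return sum(g[x] * mu[x] for x in subset)

# Hypothetical data: six points with uniform measure and f(x) = x^2.
X = range(6)
mu = {x: Fraction(1, 6) for x in X}
f = {x: Fraction(x * x) for x in X}
fine = [{0, 1}, {2, 3}, {4, 5}]     # generates C
coarse = [{0, 1, 2, 3}, {4, 5}]     # generates D, a sub sigma-algebra of C

Ef_C = conditional_expectation(f, mu, fine)
Ef_D = conditional_expectation(f, mu, coarse)
tower = conditional_expectation(Ef_C, mu, coarse)   # E(E(f|C)|D)
```

Using exact fractions keeps the comparison of `tower` with `Ef_D` free of rounding issues.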

9. We use $E(f)$ for the usual expectation and $E(f|\mathcal{C})$ for the conditional expectation.
10. A collection of subsets of a given space is called a partition if the elements of the collection are mutually disjoint and the union of all the elements is the entire space.

Definition 2.69. Let $(X, \mathcal{B}, \mu)$ be a probability space and $\mathcal{C} \subset \mathcal{B}$ a sub σ-algebra. Then for almost every $x \in X$, we can define a family of probability measures (on $X$) $\{\mu_x\}_{x \in X}$ (footnote 11). We call them the conditional measures. The conditional measures satisfy the following properties:
(1) $E(f|\mathcal{C})(x) = \int_X f(t) \, d\mu_x(t)$,
(2) the map $x \mapsto \mu_x$ is measurable with respect to $\mathcal{C}$ (footnote 12),
(3) when $\mathcal{C}$ is countably generated, $\mu_x = \mu_y$ if and only if $x$ and $y$ are in the same atom of $\mathcal{C}$ (footnote 13).

Remark 2.70. Some regularity condition on the sub σ-algebra is required to make sense of statement (3) in Definition 2.69. Although countable generation of the sub σ-algebra is a fair assumption to make, e.g. in Example 2.71, conditional measures can still be defined without it. However, (3) would not work if the sub σ-algebra is not countably generated, because the intersection of uncountably many measurable sets is not necessarily measurable.

Example 2.71. As mentioned at the start of this subsection, we use conditional measures when we need to work with extensions. Suppose $\phi : (X, \mathcal{B}_X, \mu, T_x) \to (Y, \mathcal{B}_Y, \nu, T_y)$ is an extension between two dynamical systems. Then $\mathcal{C} = \phi^{-1}\mathcal{B}_Y$ is a sub σ-algebra of $\mathcal{B}_X$. Using $\mathcal{C}$ we can define a family of conditional measures $\{\mu_x\}_{x \in X}$ for almost every $x$ and a $\mathcal{C}$-measurable map $\delta_x : x \mapsto \mu_x$. Since we are working with $\phi^{-1}\mathcal{B}_Y$, the measures $\mu_x$ and $\mu_{x'}$ are the same if $x$ and $x'$ are in the same fiber of $\phi$. Hence we can think of the conditional measures as a family $\{\mu_y\}_{y \in Y}$ indexed by $Y$. A canonical measurable map $\delta_y : y \mapsto \mu_y$ is given by the commutative diagram below.

$$\begin{array}{ccc}
X & \xrightarrow{\ \delta_x\ } & \{\mu_x\}_{x \in X} \\
{\scriptstyle \phi} \downarrow & & \downarrow {\scriptstyle \phi'} \\
Y & \xrightarrow{\ \delta_y\ } & \{\mu_y\}_{y \in Y}
\end{array}$$

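Since the spaces $L^2(X, \mathcal{C}, \mu, T_x)$ and $L^2(Y, \mathcal{B}_Y, \nu, T_y)$ are isomorphic, we can introduce the following notation used by Furstenberg [1]. Given a function $f \in L^2(X, \mathcal{B}, \mu)$, we define the "conditional" expectation

$$E(f|\mathcal{Y})(y) = \int f \, d\mu_y.$$

Recall that in order for $\phi$ to be an extension map, we need the following diagram to commute a.e.

$$\begin{array}{ccc}
X & \xrightarrow{\ T_x\ } & X \\
{\scriptstyle \phi} \downarrow & & \downarrow {\scriptstyle \phi} \\
Y & \xrightarrow{\ T_y\ } & Y
\end{array}$$

Therefore, we have the following identities, the first deduced from Theorem 2.66 and the second from the commuting diagram. Given a $\mathcal{C}$-measurable function $g$,

$$E(gf|\mathcal{Y}) = g E(f|\mathcal{Y}),$$

$$U_{T_y} E(f|\mathcal{Y}) = E(U_{T_x} f|\mathcal{Y}).$$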

11. Note that the family of measures depends on the choice of $\mathcal{C}$.
12. We can think of the space of measures on $X$ as the linear functionals on $L^\infty(X)$.
13. Given a measure space $(X, \mathcal{B}, \mu)$ where $\mathcal{B}$ is countably generated, the atom containing the point $x$ is defined to be the intersection of all the measurable sets containing $x$.

2.5. Weak mixing and compactness for extensions. We can generalize the compactness and weak mixing properties of dynamical systems to relative compactness and relative weak mixing with the help of extensions and conditional measures. All the definitions below assume that $\mathcal{X} = (X, \mathcal{B}_X, \mu, T_x)$ is an extension of $\mathcal{Y} = (Y, \mathcal{B}_Y, \nu, T_y)$.

Definition 2.72. A function $f \in L^2(X, \mu)$ is almost periodic with respect to $\mathcal{Y}$ if for every $\epsilon > 0$, there exists a finite collection of functions $\{g_i\}_{1 \le i \le k} \subset L^2(X, \mu)$ such that

$$\min_{1 \le i \le k} \| U_T^n f - g_i \|_{L^2(\mu_y)} < \epsilon$$

for almost every $y \in Y$ and for all $n \ge 1$.

Remark 2.73. Note that the definition of relative almost periodicity involves the conditional measures $\mu_y$ constructed in Example 2.71. They are probability measures on $X$ instead of $Y$, but there is a measurable map $y \mapsto \mu_y$.

Definition 2.74. The extension X → Y is compact if the set of functions almost periodic with respect to Y is dense in L2(X, µ).

In order to generalize weak mixing, we first need to introduce the notion of a relatively independent joining. There are several different constructions of the relatively independent joining; here we introduce the formulation used by Einsiedler and Ward [2].

Definition 2.75. Given two measure-preserving systems $(X, \mathcal{B}_X, \mu, T_X)$ and $(Y, \mathcal{B}_Y, \nu, T_Y)$, a joining is a $T_X \times T_Y$-invariant measure $\delta$ defined on the space $(X \times Y, \mathcal{B}_X \otimes \mathcal{B}_Y)$ such that the projections of $\delta$ onto the $X$ and $Y$ coordinates are $\mu$ and $\nu$ respectively, i.e.
(1) $\delta(A_X \times Y) = \mu(A_X)$ for all $A_X \in \mathcal{B}_X$,
(2) $\delta(X \times A_Y) = \nu(A_Y)$ for all $A_Y \in \mathcal{B}_Y$.

Example 2.76. The product measure µ × ν is always a joining by construction.

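Definition 2.77. Given two invertible measure-preserving systems $(X, \mathcal{B}_X, \mu, T_X)$ and $(Y, \mathcal{B}_Y, \nu, T_Y)$ that share a common non-trivial factor $(Z, \mathcal{B}_Z, \delta, T_Z)$ via factor maps $\phi_X$ and $\phi_Y$, we define the relatively independent joining $\mu \times_\delta \nu$ (or $X \times_Z Y$) in the following way. Let $\mathcal{B}'_X$ be $\phi_X^{-1}\mathcal{B}_Z$ and $\mathcal{B}'_Y$ be $\phi_Y^{-1}\mathcal{B}_Z$. We can define conditional measures $\mu_x$ on $X$ and $\nu_y$ on $Y$ using the sub σ-algebras $\mathcal{B}'_X$ and $\mathcal{B}'_Y$. Let $\mu'_{\phi_X(x)}$ be $\mu_x$ and $\nu'_{\phi_Y(y)}$ be $\nu_y$, and set

$$\mu \times_\delta \nu = \int_Z \mu'_z \times \nu'_z \, d\delta(z).$$

Definition 2.78. The extension $\mathcal{X} \to \mathcal{Y}$ is weak mixing relative to $\mathcal{Y}$ if the system $(X \times X, \mu \times_Y \mu, T \times T)$ is ergodic.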

Remark 2.79. Recall that in Theorem 2.41, we have shown that a dynamical system $(Z, \mathcal{C}, \delta, S)$ is weak mixing if and only if the product system $(Z \times Z, \mathcal{C} \otimes \mathcal{C}, \delta \times \delta, S \times S)$ is ergodic. Now back to our definition of relative weak mixing. Note that the relatively independent joining $\mu \times_Y \mu$ is exactly $\mu \times \mu$ if $\mathcal{Y}$ is a trivial factor of $\mathcal{X}$. In other words, $\mathcal{X}$ is relatively weak mixing with respect to its trivial factor if and only if the system $(X \times X, \mathcal{B}_X \otimes \mathcal{B}_X, \mu \times \mu, T_X \times T_X)$ is ergodic, i.e. $\mathcal{X}$ is weak mixing in the usual sense.
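On finite spaces the integral in Definition 2.77 becomes a finite sum, which makes both the joining property and the observation of Remark 2.79 easy to verify. The sketch below is only an illustration under assumptions not made in the paper: the spaces are finite, the dynamics are ignored, and the names `relatively_independent_joining`, `parity` and `trivial` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def relatively_independent_joining(mu, nu, phi_x, phi_y):
    """Return mu x_Z nu on X x Y as a dict {(x, y): mass}.

    mu, nu: dicts of point masses on finite spaces X and Y.
    phi_x, phi_y: dicts sending points of X and Y to a common factor Z;
    both are assumed to push mu and nu forward to the same measure on Z.
    """
    # Factor measure delta on Z, computed as the push-forward of mu.
    delta = {}
    for x, m in mu.items():
        delta[phi_x[x]] = delta.get(phi_x[x], 0) + m
    joining = {}
    for x, y in product(mu, nu):
        z = phi_x[x]
        if phi_y[y] != z or delta[z] == 0:
            # Points lying over different points of Z never pair up.
            joining[(x, y)] = Fraction(0)
        else:
            # mu_z({x}) * nu_z({y}) * delta({z}), where the conditional
            # measures are mu and nu restricted to the fiber over z and
            # normalized by delta({z}).
            joining[(x, y)] = mu[x] * nu[y] / delta[z]
    return joining

def marginal_x(joining, x):
    """Mass that the joining gives to {x} x Y."""
    return sum(m for (a, _), m in joining.items() if a == x)

# Hypothetical example: X = Y = {0, 1, 2, 3} with uniform measure and
# common factor Z = {0, 1} through the parity map.
uniform = {p: Fraction(1, 4) for p in range(4)}
parity = {p: p % 2 for p in range(4)}
trivial = {p: 0 for p in range(4)}

joined = relatively_independent_joining(uniform, uniform, parity, parity)
product_case = relatively_independent_joining(uniform, uniform, trivial, trivial)
```

With the parity factor, pairs of equal parity receive mass 1/8 and all other pairs receive mass 0; with the trivial factor, the construction returns the product measure, in line with Remark 2.79.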

2.6. The structure theorem. The theorem that we are going to introduce is usually referred to as the Furstenberg-Zimmer structure theorem. It is a general result in ergodic theory and at first glance has nothing to do with Szemerédi's theorem. Instead of proving the structure theorem, we will only state the simplified version of it used by Furstenberg in his proof of Szemerédi's theorem [1].

Theorem 2.80. Let $(X, \mathcal{B}_X, \mu_X, T_X)$ be a measure-preserving system. Suppose it has a proper factor $(Y, \mathcal{B}_Y, \mu_Y, T_Y)$. Then one of the following is true:
(1) The extension $\mathcal{X} \to \mathcal{Y}$ is relatively weak mixing.
(2) There exists some intermediate factor $(Z, \mathcal{B}_Z, \mu_Z, T_Z)$ such that the extension $\mathcal{Z} \to \mathcal{Y}$ is compact and proper (i.e. $\mathcal{Y}$ and $\mathcal{Z}$ are not isomorphic).

Remark 2.81. The structure theorem can be viewed as a way of decomposing an arbitrary dynamical system, where the decomposition results in a tower of extensions and each extension is either relatively weak mixing or compact, as the picture below shows.

$$\mathcal{X} \xrightarrow{\ \phi_\infty\ } \cdots \xrightarrow{\ \phi_2\ } \mathcal{X}_2 \xrightarrow{\ \phi_1\ } \mathcal{X}_1 \xrightarrow{\ \phi_0\ } \mathcal{X}_0$$

Each $\mathcal{X}_i$ represents a factor and each extension $\phi_i$ is either weak mixing or compact. The structure theorem becomes very powerful with the help of transfinite induction. If we need to prove some property of a general dynamical system, we only need to check that the property lifts through both weak mixing and compact extensions and that there exists a maximal factor that satisfies the property. This is exactly how we are going to prove Szemerédi's theorem.

3. Furstenberg's proof of Szemerédi's theorem

In this section, we illustrate Furstenberg's proof of Szemerédi's theorem. The proof consists of three parts: showing that multiple recurrence of every measure-preserving system implies Szemerédi's theorem; proving the multiple recurrence property for two basic classes of dynamical systems; and finally using the structure theorem from Section 2.6 together with the extension principles that we are going to establish to prove Szemerédi's theorem.

3.1. General Strategy. In order to prove Szemerédi's theorem using ergodic theory, the first step is to establish a correspondence between a sequence of numbers and a measure-preserving system. This is done by proving the correspondence principle in Theorem 3.9, which reduces our problem to proving that every measure-preserving system has the property SZ, to be defined in Definition 3.12. As we have seen in Theorem 2.80, an arbitrary dynamical system can be decomposed into a tower of factors, where each extension is either relatively weak mixing or compact.

$$\mathcal{X} \xrightarrow{\ \phi_\infty\ } \cdots \xrightarrow{\ \phi_2\ } \mathcal{X}_2 \xrightarrow{\ \phi_1\ } \mathcal{X}_1 \xrightarrow{\ \phi_0\ } \mathcal{X}_0$$

Next, we show that the SZ property passes through both relatively weak mixing extensions and compact extensions by the extension principles in Theorem 3.27 and Theorem 3.29. Finally, using Zorn's lemma and a lemma by Furstenberg, we will be able to prove Szemerédi's theorem.

3.2. Szemerédi's theorem. Before starting to prove the theorem, we introduce some conventions and background.

Definition 3.1. Given a set of integers $A \subset \mathbb{Z}$, the upper Banach density of $A$ is defined to be

$$\limsup_{N \to \infty} \frac{\mu(A \cap [-N, N])}{2N + 1},$$

where $\mu$ is the usual counting measure on the integers.

Remark 3.2. This notion of upper density might seem odd at first glance. There is also a notion of natural density, which replaces the lim sup with lim. There exist sets that do not possess a natural density, whereas the upper density is defined for every set. The upper Banach density is more conventionally defined through arbitrary intervals, e.g. as $\limsup_{M - N \to \infty} \frac{\mu(A \cap [N, M])}{M - N}$. A set with positive upper density in the sense of Definition 3.1 also has positive upper density in this second sense, since the intervals $[-N, N]$ are among the intervals considered. We do not care about the exact numerical value of the upper density, only about its positivity. Throughout this paper, we are going to use the first definition.

Example 3.3. The set of odd numbers has natural density $\frac{1}{2}$ and has the same upper density.
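As a quick sanity check of Example 3.3, the ratio in Definition 3.1 can be evaluated exactly on a few windows. This is a minimal sketch; the helper name `window_density` and the chosen values of N are illustrative assumptions.

```python
from fractions import Fraction

def window_density(indicator, N):
    """Size of A intersected with [-N, N], divided by 2N + 1,
    where A = {n : indicator(n) is true}."""
    count = sum(1 for n in range(-N, N + 1) if indicator(n))
    return Fraction(count, 2 * N + 1)

def is_odd(n):
    return n % 2 != 0

densities = [window_density(is_odd, N) for N in (10, 100, 1000)]
```

For even N the window [-N, N] contains exactly N odd numbers, so the ratios are N/(2N + 1), which approach 1/2 from below.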

Definition 3.4. Given a set of integers $A \subset \mathbb{Z}$, we say that it contains a k-term arithmetic progression if there exist integers $n, d \in \mathbb{Z}$ with $d \neq 0$ such that $n + id \in A$ for all $i \in [0, k - 1]$. Alternatively, we say that the set $A$ contains a k-AP.

Theorem 3.5 (Szemerédi). Any set of integers $A \subset \mathbb{Z}$ that has positive upper density contains a k-AP for all $k \in \mathbb{N}$.
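For a finite set, Definition 3.4 can be tested by brute force. The following sketch is only meant to make the definition concrete and is not part of the proof; the function name `find_k_ap` is hypothetical and the search assumes k is at least 2.

```python
def find_k_ap(A, k):
    """Return (n, d) with d > 0 and n, n + d, ..., n + (k-1)d all in A,
    or None if the finite set A contains no k-term progression (k >= 2)."""
    elements = sorted(set(A))
    members = set(elements)
    for idx, n in enumerate(elements):
        # The second term of the progression determines the common difference.
        for m in elements[idx + 1:]:
            d = m - n
            if all(n + i * d in members for i in range(k)):
                return (n, d)
    return None
```

For instance, the powers of two {1, 2, 4, 8, 16} contain no 3-AP, while {1, 3, 4, 5, 7, 20} contains the progression 1, 3, 5.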

3.3. Correspondence. In this part, we introduce the notion of multiple recurrence. In fact, multiple recurrence of every order for every measure-preserving system is equivalent to Szemerédi's theorem. However, for the sake of our argument, it suffices to show that multiple recurrence of all orders implies Szemerédi's theorem. Once we have established the correspondence between Szemerédi's theorem and multiple recurrence, we can use the ergodic theory tools that we introduced in the second section to prove Szemerédi's theorem.

To get started, we first recall the definition of multiple recurrence.

Definition 3.6. A measure-preserving system $(X, \mathcal{B}, T, \mu)$ has multiple recurrence of order k if for each $A \in \mathcal{B}$ that has positive measure, there exists $n \in \mathbb{N}$ such that

$$\mu\left( \bigcap_{i=0}^{k-1} T^{-in} A \right) > 0.$$

In order to prove the correspondence principle, we need to introduce a classical result from functional analysis.

Theorem 3.7 (Banach-Alaoglu). Given a separable normed linear space $X$, the closed unit ball in the dual space $X^*$ is sequentially compact under the weak* topology.

Remark 3.8. In Euclidean spaces, closed unit balls are closed and bounded and therefore compact. However, a theorem of Riesz states that closed unit balls in infinite dimensional normed spaces are not compact in the usual norm topology. Compactness of the unit ball of the dual space is regained under the weak* topology by the Banach-Alaoglu theorem. This theorem is extremely useful here since we are always working with infinite dimensional linear spaces of the form $C(Y)$ where $Y$ is a compact metric space, so that $C(Y)$ is separable.

Theorem 3.9. Multiple recurrence of any order for all measure-preserving systems implies Szemerédi's theorem.

Proof. Given an integer sequence $\{a_n\} \subset \mathbb{Z}$ with positive upper Banach density and a fixed natural number $k \in \mathbb{N}$, we show that $\{a_n\}$ contains a k-AP. Recall the dynamical system $(2^{\mathbb{Z}}, T)$, where $T$ is the left Bernoulli shift. The sequence $\{a_n\}$ naturally corresponds to a point $x$ in our space $2^{\mathbb{Z}}$, where (footnote 14)

$$(3.10) \qquad x[j] = \begin{cases} 1 & \text{if } j \in \{a_n\}, \\ 0 & \text{otherwise.} \end{cases}$$

Let $A = \overline{\{T^n x\}_{n \in \mathbb{Z}}}$ be the closure of the two-sided orbit of $x$ (footnote 15). The subspace $A$ is still compact since it is closed and $2^{\mathbb{Z}}$ is compact. Let $B \subset A$ be the subset of elements whose 0th coordinate is 1. Observe that the orbit of $x$ is an at most countable union of singletons, so it has measure 0 under the usual measure defined in Section 2, and that measure carries no information about $\{a_n\}$. Therefore, we need to find a suitable measure $\mu$ under which $\mu(B) > 0$ in order to use the multiple recurrence property. The construction we give here is a standard one. Moreover, we need to relate the measure to the upper Banach density of the corresponding sequence. For any point $y \in A$, we define $\delta_y$ to be the delta measure supported at the point $y$. Observe that

$$\frac{1}{2N + 1} \sum_{n=-N}^{N} \delta_{T^n x}(B) = \frac{|\{a_j\} \cap [-N, N]|}{2N + 1}.$$

We call the measure defined by the average of the delta measures

$$\nu_N = \frac{1}{2N + 1} \sum_{n=-N}^{N} \delta_{T^n x}.$$
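The identity above only uses the facts that $(T^n x)[j] = x[j + n]$ and that $T^n x \in B$ exactly when $x[n] = 1$. The following sketch spells this out for a hypothetical finite sample of integers; points of $2^{\mathbb{Z}}$ are modelled as Python functions, and all names (`point_of`, `shift`, `in_B`, `nu_N_of_B`, `orbit_point_in_intersection`) are illustrative assumptions.

```python
from fractions import Fraction

def point_of(A):
    """The point x attached to A: x[j] = 1 if j is in A, else 0."""
    A = set(A)
    return lambda j: 1 if j in A else 0

def shift(x, n):
    """n-fold left shift: (T^n x)[j] = x[j + n]."""
    return lambda j: x(j + n)

def in_B(y):
    """B is the set of points whose 0th coordinate is 1."""
    return y(0) == 1

def nu_N_of_B(x, N):
    """nu_N(B): the average of delta_{T^n x}(B) over -N <= n <= N."""
    hits = sum(1 for n in range(-N, N + 1) if in_B(shift(x, n)))
    return Fraction(hits, 2 * N + 1)

def orbit_point_in_intersection(x, m, n, k):
    """Is T^m x in the intersection of the sets T^{-in} B, 0 <= i < k?"""
    return all(in_B(shift(x, m + i * n)) for i in range(k))

# Hypothetical sample: the multiples of 3 between -30 and 30.
A = {j for j in range(-30, 31) if j % 3 == 0}
x = point_of(A)
```

Here `nu_N_of_B(x, 30)` equals the proportion 21/61 of multiples of 3 in the window, and `orbit_point_in_intersection(x, 3, 6, 4)` holds precisely because 3, 9, 15, 21 is a 4-term progression inside A, which is the mechanism used at the end of the proof.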

14. We have defined $x[j]$ to be the jth coordinate of $x$.
15. Usually when we refer to orbits, we assume that the number of times that we iterate is always positive, i.e. we consider sets of the form $\{T^n x\}_{n \in \mathbb{N}}$.

Since the sequence $\{a_n\}$ has positive upper Banach density, there exists a sequence of natural numbers $\{N_j\}$ such that

$$\begin{aligned}
\lim_{j \to \infty} \nu_{N_j}(B)
&= \lim_{j \to \infty} \frac{1}{2N_j + 1} \sum_{n=-N_j}^{N_j} \delta_{T^n x}(B) \\
&= \lim_{j \to \infty} \frac{|\{a_n\} \cap [-N_j, N_j]|}{2N_j + 1} \\
&= \limsup_{N \to \infty} \frac{|\{a_n\} \cap [-N, N]|}{2N + 1} > 0.
\end{aligned}$$

By construction, $\{\nu_{N_j}\}$ is a subset of the unit sphere in the space of measures. Therefore, by Theorem 3.7, there is a subsequence $\{N_{j_l}\}_{l \in \mathbb{N}}$ such that the sequence $\{\nu_{N_{j_l}}\}_{l \in \mathbb{N}}$ converges to some measure $\mu$ under the weak* topology. Moreover, $\mu(B) > 0$ by weak* convergence since $B$ is closed. We also need to verify that $\mu$ makes the system measure-preserving. Indeed,

$$\begin{aligned}
\nu_{N_{j_l}} - U_T \nu_{N_{j_l}}
&= \frac{1}{2N_{j_l} + 1} \left( \sum_{n=-N_{j_l}}^{N_{j_l}} \delta_{T^n x} - \sum_{n=-N_{j_l}+1}^{N_{j_l}+1} \delta_{T^n x} \right) \\
&= \frac{1}{2N_{j_l} + 1} \left( \delta_{T^{-N_{j_l}} x} - \delta_{T^{N_{j_l}+1} x} \right),
\end{aligned}$$

which goes to 0 as $l$ approaches infinity. Therefore, the limit $\mu$ is invariant under $T$. Now we are ready to use the multiple recurrence hypothesis on the dynamical system $(A, T, \mu)$ and the set $B$. By multiple recurrence of order k, there exists some positive integer $n$ such that $\mu\left(\bigcap_{i=0}^{k-1} T^{-in} B\right) > 0$. The set $\bigcap_{i=0}^{k-1} T^{-in} B$ is open in $A$ since $B$ is open in $A$, and it is nonempty since it has positive measure. As the orbit of $x$ is dense in $A$, there is at least one point $y \in \{T^n x\}_{n \in \mathbb{Z}} \cap \bigcap_{i=0}^{k-1} T^{-in} B$. Suppose $y = T^m x$; then the set $\{T^{m+in} x\}_{0 \le i \le k-1}$ is contained in $B$. In other words, $\{a_n\}$ contains the k-term arithmetic progression $\{m + in\}_{0 \le i \le k-1}$. Since k is arbitrary, we have Szemerédi's theorem. □

Remark 3.11. Recall that we have shown Poincaré recurrence (Theorem 2.20). It is a very elementary result, yet it is similar to the complicated Szemerédi's theorem. Notice that for weak mixing systems, we automatically have multiple recurrence of order 2 by definition. Weak mixing systems seem easier to deal with. Therefore, we prove a stronger result which looks quite similar to the weak mixing property and which also involves Cesàro limits. We call this property SZ for simplicity, which is short for Szemerédi.

Definition 3.12. We call a measure-preserving system $(X, \mathcal{B}, T, \mu)$ SZ if for each $A \in \mathcal{B}$ that has positive measure and any positive integer k,

$$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mu\left( \bigcap_{i=0}^{k-1} T^{-in} A \right) > 0.$$

Remark 3.13. Notice that in the proof of Szemerédi's theorem, we provide different criteria that characterize the "Szemerédi" property, for instance, Definition 3.12.

Some of these criteria are equivalent but others are not. The following theorem shows that being SZ is at least as strong as having multiple recurrence of every order.

Theorem 3.14. A measure-preserving system $(X, \mathcal{B}, T, \mu)$ has multiple recurrence of any order if it is SZ.

Proof. Fix a positive integer k and a set $A \in \mathcal{B}$ of positive measure. Since $\mathcal{X}$ is SZ,

$$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mu\left( \bigcap_{i=0}^{k-1} T^{-in} A \right) > 0.$$

Then there must exist some integer $n \ge 1$ such that $\mu\left(\bigcap_{i=0}^{k-1} T^{-in} A\right) > 0$. If not, $\frac{1}{N} \sum_{n=1}^{N} \mu\left(\bigcap_{i=0}^{k-1} T^{-in} A\right)$ would be 0 for all $N$, and the lim inf as $N \to \infty$ would also be 0. □

Now we give an alternative characterization of the SZ property using functions. We will use it to prove that weak mixing systems are SZ. As we will see later, this formulation also allows us to exploit certain special properties of compactness and relative compactness, since they are characterized by almost periodic functions.

Theorem 3.15. A measure-preserving system $(X, \mathcal{B}, T, \mu)$ is SZ if and only if for any function $f \in L^\infty$ such that $f$ is nonnegative almost everywhere and $\int_X f \, d\mu > 0$, and for any positive integer k, we have

$$(3.16) \qquad \liminf_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X \prod_{i=0}^{k-1} U_T^{in} f \, d\mu > 0,$$

where $U_T$ is the associated operator of $T$.

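Proof. Suppose (3.16) holds, and let $f = \chi_A$ be the characteristic function of a set $A$ with positive measure. Since $\prod_{i=0}^{k-1} U_T^{in} \chi_A$ is the characteristic function of $\bigcap_{i=0}^{k-1} T^{-in} A$, we obtain the SZ property. On the other hand, the SZ property guarantees (3.16) for all characteristic functions of sets that have positive measure. It then follows that (3.16) is also true for all nonnegative $L^\infty$ functions $f$ with positive integral if we bound $f$ from below by a simple function: there is a constant $c > 0$ such that $A = \{f \ge c\}$ has positive measure, so $f \ge c \chi_A$ and $\prod_{i=0}^{k-1} U_T^{in} f \ge c^k \prod_{i=0}^{k-1} U_T^{in} \chi_A$ almost everywhere (footnote 16). □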
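To see what the averages in Definition 3.12 look like in the simplest possible case, one can take the rotation $T(x) = x + 1$ on the finite cyclic group $\mathbb{Z}_m$ with the uniform measure. This system is not discussed in the paper; it is chosen here, as an assumption, only because every quantity can be computed exactly. The function names below are hypothetical.

```python
from fractions import Fraction

def recurrence_measure(A, m, n, k):
    """mu of the intersection of T^{-in} A, 0 <= i < k,
    for T(x) = x + 1 on Z_m with the uniform measure mu."""
    A = {a % m for a in A}
    good = sum(1 for x in range(m)
               if all((x + i * n) % m in A for i in range(k)))
    return Fraction(good, m)

def sz_average(A, m, k, N):
    """(1/N) times the sum over 1 <= n <= N of the recurrence measure."""
    return sum(recurrence_measure(A, m, n, k) for n in range(1, N + 1)) / N
```

Whenever n is a multiple of m, the map $T^n$ is the identity and the intersection is A itself. Hence for A = {0, 1, 2} in $\mathbb{Z}_{12}$ and N a multiple of 12, the average is at least $\frac{1}{12} \cdot \frac{3}{12} = \frac{1}{48}$, which already exhibits the bounded-gap mechanism used later for compact systems.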

Example 3.17. Consider the dynamical system $(\mathbb{T}, R_\alpha)$, where $\mathbb{T}$ is the circle and $R_\alpha : x \mapsto x + \alpha$ is an irrational rotation acting on the circle. We use the function formulation of SZ in Theorem 3.15. For a fixed $N$, the Furstenberg ergodic average looks like

$$\frac{1}{N} \sum_{n=0}^{N-1} \int_{\mathbb{T}} \prod_{i=0}^{k-1} f(t - in\alpha) \, d\mu(t).$$

We have proved in Example 2.25 that irrational rotations are ergodic, which means that $\{n\alpha\}_{n \in \mathbb{N}}$ is equidistributed on the circle. Therefore, as $N \to \infty$ the ergodic average converges to

$$\int_{\mathbb{T}} \int_{\mathbb{T}} \prod_{i=0}^{k-1} f(x - it) \, d\mu(x) \, d\mu(t).$$

We can see that the double integral is positive by considering $t$ sufficiently small. Note that the expectation of $f$ is strictly positive by assumption. When $t$ is restricted to a sufficiently small neighborhood of 0, the inner integral is close to $\int_{\mathbb{T}} f^k \, d\mu \ge E(f)^k$, so the double integral is bounded below by roughly $C E(f)^k$, where $C$ is some positive constant depending on the restriction that we put on $t$.
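The average in Example 3.17 can also be approximated numerically. The sketch below replaces the integral over the circle by a Riemann sum on a uniform grid and takes f to be the indicator of a half-circle; the grid size, the rotation angle and the names `furstenberg_average` and `arc` are assumptions made for illustration only.

```python
import math

def furstenberg_average(f, alpha, k, N, grid=500):
    """Approximate (1/N) * sum_{n<N} of the integral over the circle of
    prod_{i<k} f(t - i*n*alpha), using a Riemann sum with `grid` points."""
    total = 0.0
    for n in range(N):
        term = sum(
            math.prod(f((t / grid - i * n * alpha) % 1.0) for i in range(k))
            for t in range(grid)
        ) / grid
        total += term
    return total / N

def arc(t):
    """Indicator function of the half-circle [0, 1/2)."""
    return 1.0 if t < 0.5 else 0.0

alpha = math.sqrt(2) - 1          # an irrational rotation angle
avg = furstenberg_average(arc, alpha, k=3, N=200)
```

Each term of the sum is at most the integral of f, which is 1/2 here, and the n = 0 term equals 1/2, so the computed average is strictly positive, as the example predicts for the limit.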

16. This is the reason that we require $f$ to be nonnegative a.e.

3.4. Two fundamental systems. We prove that two classes of very well-structured dynamical systems have the SZ property, namely the compact systems and the weak mixing systems.

3.4.1. Weak mixing systems. In order to prove that weak mixing systems are SZ, we need to introduce the following lemma, which is a very classical result in real analysis. It is sometimes known as the van der Corput trick.
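Before the formal statement, a numerical sketch of the classical situation in which the trick is used may help. Take the one-dimensional Hilbert space $\mathbb{C}$ and $v_n = e^{2\pi i n^2 \theta}$ with $\theta$ irrational. This example is not taken from the paper and the names `e`, `v`, `correlation` and `mean_norm` are hypothetical. The correlations $\langle v_{n+i}, v_n \rangle = e^{2\pi i (2ni + i^2)\theta}$ average out to 0 for every $i \ge 1$, and the conclusion of the lemma is that the averages of $v_n$ themselves tend to 0.

```python
import cmath
import math

THETA = math.sqrt(2)   # an irrational number, chosen for illustration

def e(t):
    """e(t) = exp(2 pi i t)."""
    return cmath.exp(2j * math.pi * t)

def v(n):
    """v_n = e(n^2 * theta), a bounded sequence in the Hilbert space C."""
    return e((n * n * THETA) % 1.0)

def correlation(i, N):
    """| (1/N) * sum_{n=1}^{N} <v_{n+i}, v_n> |, with <u, w> = u * conj(w)."""
    return abs(sum(v(n + i) * v(n).conjugate() for n in range(1, N + 1)) / N)

def mean_norm(N):
    """| (1/N) * sum_{n=1}^{N} v_n |."""
    return abs(sum(v(n) for n in range(1, N + 1)) / N)
```

For i = 0 the correlation is identically 1, while for i between 1 and 5 the correlations at N = 20000 are already tiny, so the Cesàro averages of the numbers $a_i$ tend to 0; correspondingly `mean_norm(20000)` is small.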

Lemma 3.18 (van der Corput). Suppose we have a bounded sequence $\{v_n\}_{n \ge 1}$ in some Hilbert space $X$. We define a sequence of real numbers $\{a_n\}_{n \ge 0} \subset \mathbb{R}$ by

$$a_i = \limsup_{N \to \infty} \left| \frac{1}{N} \sum_{n=1}^{N} \langle v_{n+i}, v_n \rangle \right|.$$

If $\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} a_n = 0$, then we have $\lim_{N \to \infty} \left\| \frac{1}{N} \sum_{n=1}^{N} v_n \right\| = 0$.

Theorem 3.19. Given a weak mixing system $(X, \mathcal{B}, T, \mu)$ and a finite set of k functions $\{f_i\}_{1 \le i \le k} \subset L^\infty(X)$, we have

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \prod_{i=1}^{k} U_T^{in} f_i = \prod_{i=1}^{k} \int_X f_i \, d\mu$$

in $L^2$.

Remark 3.20. It is clear that Theorem 3.19 implies the "function formulation of SZ" stated in (3.16), where we take a single function instead of k different functions. It is not surprising that in the case of weak mixing systems we not only know that the limit is larger than zero when all the functions are nonnegative and have positive expectation, but also know exactly what the limit is. Indeed, we know that if $(X, \mathcal{B}, T, \mu)$ is weak mixing, then $\frac{1}{N} \sum_{n=0}^{N-1} |\mu(A \cap T^{-n} B) - \mu(A)\mu(B)|$ goes to 0 for any measurable sets $A, B \in \mathcal{B}$.

Proof. The proof goes by induction. When $k = 1$, by the mean ergodic theorem and the fact that weak mixing implies ergodicity, we know that $\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} U_T^n f_1$ exists and equals the expectation of $f_1$. For simplicity, we will only show how the $k = 1$ case implies the $k = 2$ case instead of doing a general inductive step. Therefore, we need to show that

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} U_T^n f_1 U_T^{2n} f_2 = \int_X f_1 \, d\mu \int_X f_2 \, d\mu.$$

Notice that if $f_1 = c$ for some constant $c$, then

$$\begin{aligned}
\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} U_T^n f_1 U_T^{2n} f_2
&= c \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} U_T^{2n} f_2 \\
&= c \int_X f_2 \quad \text{(by the } k = 1 \text{ case)} \\
&= \int_X f_1 \int_X f_2.
\end{aligned}$$

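We can also reduce to the base case when $f_2$ is constant. In order to apply the van der Corput lemma, we need some normalization to make sure that $\int_X f_1 \int_X f_2 = 0$. Actually, we can assume without loss of generality that $f_1$ has expectation zero. Indeed, we can always replace $f_1$ by $f_1 - \int_X f_1 \, d\mu$, since the constant part has been handled above. It then suffices to show that $\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} U_T^n f_1 U_T^{2n} f_2 = 0$. We define a sequence $\{v_n\}_{n \ge 1}$ in the Hilbert space $L^2$ by $v_n = U_T^n f_1 U_T^{2n} f_2$. Now we compute the real numbers $\{a_i\}_{i \ge 0}$ defined in the van der Corput lemma and we have

$$\begin{aligned}
a_i &= \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \langle v_n, v_{n+i} \rangle \\
&= \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X U_T^n f_1 \, U_T^{2n} f_2 \, U_T^{n+i} f_1 \, U_T^{2n+2i} f_2 \, d\mu \\
&= \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X f_1 \, U_T^{i} f_1 \, U_T^{n} f_2 \, U_T^{n+2i} f_2 \, d\mu \quad (T \text{ is measure-preserving}) \\
&= \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X (f_1 U_T^{i} f_1) \, U_T^{n} (f_2 U_T^{2i} f_2) \, d\mu \\
&= \int_X f_1 U_T^{i} f_1 \, d\mu \int_X f_2 U_T^{2i} f_2 \, d\mu \quad \text{(by Corollary 2.30)}.
\end{aligned}$$

By the characterizations of weak mixing systems in Section 2, we know that $(X \times X, T \times T^2, \mu \otimes \mu)$ is also weak mixing (footnote 17). We define $g \in L^2(X \times X)$ by $g(x_1, x_2) = f_1(x_1) f_2(x_2)$. Recall that in the van der Corput lemma, we require that $\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} a_n = 0$. Indeed, we have

$$\begin{aligned}
\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} a_n
&= \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X f_1 U_T^{n} f_1 \, d\mu \int_X f_2 U_T^{2n} f_2 \, d\mu \\
&= \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X \int_X g(x_1, x_2) \, U_{T \times T^2}^{n} g(x_1, x_2) \, d\mu(x_1) \, d\mu(x_2) \\
&= \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_{X \times X} g \, U_{T \times T^2}^{n} g \, d\mu \otimes \mu \\
&= \int_{X \times X} g \, d\mu \otimes \mu \int_{X \times X} g \, d\mu \otimes \mu \quad \text{(by ergodicity and Corollary 2.30)} \\
&= \left( \int_{X \times X} g \, d\mu \otimes \mu \right)^2 \\
&= \left( \int_X f_1 \, d\mu \int_X f_2 \, d\mu \right)^2 \\
&= 0 \quad \text{(since the expectation of } f_1 \text{ is 0)}.
\end{aligned}$$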

17. We will need $\left(X^k, \prod_{i=1}^{k} T^i\right)$ to be weak mixing in the general inductive step, but this is also clear directly from the properties of weak mixing systems.

By the van der Corput lemma, we have

$$0 = \lim_{N \to \infty} \left\| \frac{1}{N} \sum_{n=0}^{N-1} v_n \right\|_2 = \lim_{N \to \infty} \left\| \frac{1}{N} \sum_{n=0}^{N-1} U_T^n f_1 U_T^{2n} f_2 \right\|_2.$$

This completes the proof. □

Theorem 3.21. If a measure-preserving system $(X, \mathcal{B}, T, \mu)$ is weak mixing, then it is SZ.

Proof. In order to show that $\mathcal{X}$ is SZ, we need to prove that for any function $f \in L^\infty$ such that $f$ is nonnegative a.e. and $\int_X f > 0$,

$$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X \prod_{i=0}^{k-1} U_T^{in} f \, d\mu > 0.$$

By Theorem 3.19, applied to the $k - 1$ functions $f_1 = \dots = f_{k-1} = f$ and paired with the remaining factor $f$ in $L^2$, we have

$$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X \prod_{i=0}^{k-1} U_T^{in} f \, d\mu = \left( \int_X f \, d\mu \right)^k > 0. \qquad \square$$

3.4.2. Compact systems. Recall that Theorem 2.56 gives a strong and useful characterization of compact systems. Kronecker systems are much easier to visualize since they can be viewed as rotations. However, we will prove that compact systems are SZ directly from the definition.

Theorem 3.22. If the measure-preserving system $(X, \mathcal{B}, T, \mu)$ is compact, then it is SZ.

Proof. Given any nonnegative $L^\infty$ function $f$ with positive expectation and a fixed integer k, we need to show that

$$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X \prod_{i=0}^{k-1} U_T^{in} f \, d\mu > 0.$$

Without loss of generality, we may assume that $0 < \|f\|_\infty \le 1$. Indeed, we can always replace $f$ by $\frac{f}{\|f\|_\infty}$. As before, define $A_\epsilon \subset \mathbb{N}$ to be $\{n : \|f - U_T^n f\|_2 < \epsilon\}$. By Theorem 2.50, we know that for each $0 < \epsilon < 1$ (footnote 18) there is some large finite integer $g_\epsilon \in \mathbb{N}$ such that $g_\epsilon$ is larger than any gap in the set $A_\epsilon$.

18. It is not strictly required for $\epsilon$ to be smaller than 1. It is just a technical issue which will appear later in the proof.

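If $n \in A_{\epsilon/(k2^k)}$, then for $1 \le i \le k$ we have

$$\begin{aligned}
\| f - U_T^{in} f \|_2
&\le \sum_{j=0}^{i-1} \| U_T^{jn} f - U_T^{(j+1)n} f \|_2 \quad \text{(triangle inequality)} \\
&\le \sum_{j=0}^{k-1} \| U_T^{jn} f - U_T^{(j+1)n} f \|_2 \\
&= \sum_{j=0}^{k-1} \| f - U_T^{n} f \|_2 \quad (T \text{ is measure-preserving}) \\
&= k \| f - U_T^{n} f \|_2 \\
&\le \frac{\epsilon}{2^k}.
\end{aligned}$$

Let $h_i$ be $U_T^{in} f - f$. It has $L^2$ norm no greater than $\frac{\epsilon}{2^k}$ (footnote 19). Observe that for each fixed $1 \le s \le k$ and for a finite index set $\{j_l\}_{1 \le l \le s} \subset [1, k]$,

$$\begin{aligned}
\left| \int_X f^{k-s} \prod_{l=1}^{s} h_{j_l} \, d\mu \right|
&\le \int_X \prod_{l=1}^{s} |h_{j_l}| \, d\mu \quad (\text{because } \|f\|_\infty \le 1) & (3.23) \\
&= \left\langle |h_{j_1}|, \prod_{l=2}^{s} |h_{j_l}| \right\rangle & (3.24) \\
&\le \frac{\epsilon}{2^k} \quad \text{(by Hölder's inequality)}, & (3.25)
\end{aligned}$$

where we assume $s \ge 2$ in (3.24) since the result is trivial when $s = 1$. Now, expanding the product into $2^k$ terms, we have

$$\begin{aligned}
\left| \int_X \prod_{i=0}^{k-1} U_T^{in} f \, d\mu - \int_X f^k \, d\mu \right|
&= \left| \int_X \prod_{i=0}^{k-1} (f + h_i) \, d\mu - \int_X f^k \, d\mu \right| \\
&\le (2^k - 1) \frac{\epsilon}{2^k} \quad \text{(by (3.23)-(3.25))} \\
&< \epsilon.
\end{aligned}$$

In other words, $\int_X \prod_{i=0}^{k-1} U_T^{in} f \, d\mu$ is at least $\int_X f^k \, d\mu - \epsilon$. We can take $\epsilon$ to be smaller than $\int_X f^k \, d\mu$ so that

$$\int_X \prod_{i=0}^{k-1} U_T^{in} f \, d\mu > \int_X f^k \, d\mu - \epsilon > 0.$$

Recall that since $f$ is nonnegative a.e., $\int_X \prod_{i=0}^{k-1} U_T^{in} f \, d\mu \ge 0$ even when $n$ is not in $A_{\epsilon/(k2^k)}$. The biggest gap in $A_{\epsilon/(k2^k)}$ is smaller than the constant $g_{\epsilon/(k2^k)}$ by the definition of $g$. Therefore, if we take the average, we get

$$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X \prod_{i=0}^{k-1} U_T^{in} f \, d\mu \ge \frac{\int_X f^k \, d\mu - \epsilon}{g_{\epsilon/(k2^k)}} > 0. \qquad \square$$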
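The bounded-gap property of the set $A_\epsilon$ that drives this proof can be observed directly for a rotation of the circle. The sketch below assumes the specific function $f(x) = \cos(2\pi x)$ and the rotation $T(x) = x + \alpha$, for which a short computation gives $\|f - U_T^n f\|_2 = \sqrt{2}\,|\sin(\pi n \alpha)|$; the angle, the threshold and the name `return_times` are illustrative choices, not taken from the paper.

```python
import math

def return_times(alpha, eps, n_max):
    """All 0 <= n <= n_max with ||f - U_T^n f||_2 < eps, where
    f(x) = cos(2 pi x) and T(x) = x + alpha on the circle.

    For this f the norm equals sqrt(2) * |sin(pi * n * alpha)|.
    """
    return [n for n in range(n_max + 1)
            if math.sqrt(2) * abs(math.sin(math.pi * n * alpha)) < eps]

alpha = (math.sqrt(5) - 1) / 2     # hypothetical choice of rotation angle
times = return_times(alpha, eps=0.1, n_max=5000)
gaps = [b - a for a, b in zip(times, times[1:])]
```

The list `times` plays the role of $A_\epsilon$ and `max(gaps)` plays the role of the constant $g_\epsilon$: however far the search is extended, the gaps between consecutive return times stay bounded.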

19. Notice that $h_0 = 0$.

Remark 3.26. From the proof of Theorem 2.50 we can see that compactness is a very strong property, since it provides us with the tools to approximate a large portion of the points on the orbit. One may observe that the proof above is quite similar to Example 3.17, which proves SZ for irrational rotations.

3.5. Extension principles. Now we introduce the two extension principles. We will not prove the extension principles in full detail since their proofs are analogous to the proofs for the two fundamental systems.

Theorem 3.27. If $\mathcal{X} = (X, \mathcal{B}_X, \mu, T_x)$ is a compact extension of $\mathcal{Y} = (Y, \mathcal{B}_Y, \nu, T_y)$ and $\mathcal{Y}$ is SZ, then $\mathcal{X}$ is also SZ.

Remark 3.28. Just as in the proof for compact systems, given a function $f \in L^2(X)$ we try to find a set $B_n$ for each $n$ such that on the set $B_n$, $\int_X \prod_{i=0}^{k-1} U_{T_x}^{in} f \, d\mu$ is very close to $\int_{B_n} f^k$. This is strictly positive, so the ergodic average

$$\frac{1}{N} \sum_{n=0}^{N-1} \int_X \prod_{i=0}^{k-1} U_{T_x}^{in} f \, d\mu$$

is also positive as $N \to \infty$. The fact that the extension $\mathcal{X} \to \mathcal{Y}$ is compact ensures that the constant does not get too close to zero, just as the gap $g$ is bounded in Theorem 3.22. However, since we are working with relative compactness instead of compactness, we need to make some adjustments. We consider the space $\bigoplus_{i=0}^{k-1} L^2(X, \mu_y)$, where the norm is given by $\|(f_0, f_1, \dots, f_{k-1})\| = \max_j \|f_j\|_{L^2(\mu_y)}$. Next, we investigate the set

$$S = \{(f, U_{T_x}^{n} f, \dots, U_{T_x}^{(k-1)n} f)\}_{n \in \mathbb{Z}} \subset \bigoplus_{i=0}^{k-1} L^2(X, \mu_y).$$

We are able to use the hypothesis that $\mathcal{Y}$ is SZ now since there is a measurable map $y \mapsto \mu_y$ (footnote 20). We start with some set $B \subset X$ such that $E(f|\mathcal{Y})$ is not too small. Then we keep reducing $B$ if necessary and finally reach some set $B_n$ such that there is a subset of $S$ that is at most $\epsilon$ separated for some small $\epsilon$ in the space $S$. Finally, we are able to find some integer $n$ depending on $B_n$ and approximate $\frac{1}{N} \sum_{n=0}^{N-1} \int_X \prod_{i=0}^{k-1} U_{T_x}^{in} f \, d\mu$ by $\int_{B_n} f^k \, d\mu$, which is positive. Similar to Theorem 3.22, the set $\{n_m\}$ of such integers has gaps bounded by some $d$, and $\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X \prod_{i=0}^{k-1} U_{T_x}^{in} f \, d\mu$ is larger than $C \frac{1}{d} \int_X f^k \, d\mu$, where $C$ is some constant. This is strictly larger than zero since $f$ has positive expectation. As we can see, this theorem is just a relative version of Theorem 3.22.

Theorem 3.29. If $\mathcal{X} = (X, \mathcal{B}_X, \mu, T_x)$ is a weak mixing extension of $\mathcal{Y} = (Y, \mathcal{B}_Y, \nu, T_y)$ and $\mathcal{Y}$ is SZ, then $\mathcal{X}$ is also SZ.

Remark 3.30. In the case of weak mixing extensions, the limit does not necessarily exist as in Theorem 3.21. However, we are able to approximate the product $\prod_{i=0}^{k-1} U_{T_x}^{in} f$ by $\prod_{i=0}^{k-1} U_{T_y}^{in} E(f|\mathcal{Y})$. This uses the following lemma stated in Furstenberg's proof [1]. Theorem 3.29 is just a relative version of Theorem 3.21.

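Lemma 3.31. Let $(X, \mathcal{B}_X, \mu, T_X)$ be a relatively weak mixing extension of $(Y, \mathcal{B}_Y, \nu, T_Y)$, and write $T = T_X$ and $S = T_Y$. Fix any integer $k \in \mathbb{N}$. Then if $f_l \in L^\infty(X, \mathcal{B}, \mu)$, $l = 0, 1, \dots, k$, we have the following two equalities:

(1) $\displaystyle \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \left\| E\left( \prod_{l=0}^{k} U_T^{ln} f_l \,\Big|\, \mathcal{Y} \right) - \prod_{l=0}^{k} U_S^{ln} E(f_l|\mathcal{Y}) \right\|^2 = 0$,

(2) $\displaystyle \lim_{N \to \infty} \left\| \frac{1}{N} \sum_{n=1}^{N} \left( \prod_{l=1}^{k} U_T^{ln} f_l - \prod_{l=1}^{k} U_T^{ln} E(f_l|\mathcal{Y}) \right) \right\|_{L^2} = 0$.

20. This is defined in Example 2.71.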

3.6. Conclusion. To complete the proof of Szemerédi's theorem, we need the following lemma proved by Furstenberg [1].

Lemma 3.32. Let $\mathcal{X} = (X, \mathcal{B}_X, \mu, T_x)$ be a measure-preserving system and $S$ be the family of factors of $\mathcal{X}$ that are SZ. Suppose $R \subset S$ is a totally ordered (footnote 21) family of factors. Then $\sup R \in S$, i.e. $\sup R$ is SZ.

A direct application of Zorn's lemma combined with the above result proves the existence of a maximal factor.

Theorem 3.33. Let $\mathcal{X} = (X, \mathcal{B}_X, \mu, T_x)$ be a measure-preserving system. The family of factors of $\mathcal{X}$ that are SZ contains a maximal element.

Theorem 3.34. Every measure-preserving system is SZ.

Proof. Let $\mathcal{X}$ be an arbitrary measure-preserving system. If $\mathcal{X}$ is weak mixing, then by Theorem 3.21, $\mathcal{X}$ is already SZ. Suppose $\mathcal{X}$ is not weak mixing. We know that $\mathcal{X}$ has some nontrivial compact factor $\mathcal{Y}_0$ by Theorem 2.58. Moreover, $\mathcal{Y}_0$ is SZ according to Theorem 3.22. Let $\mathcal{Y}$ be a maximal SZ factor of $\mathcal{X}$, which exists by Theorem 3.33. Suppose $\mathcal{Y}$ is a proper factor of $\mathcal{X}$. If the extension $\mathcal{X} \to \mathcal{Y}$ is relatively weak mixing, we know that $\mathcal{X}$ is also SZ by Theorem 3.29. In this case, $\mathcal{Y}$ cannot be maximal. On the other hand, if the extension $\mathcal{X} \to \mathcal{Y}$ is not relatively weak mixing, then by Theorem 2.80 there exists an intermediate factor $\mathcal{Z}$ with $\mathcal{X} \to \mathcal{Z} \to \mathcal{Y}$ such that $\mathcal{Z} \to \mathcal{Y}$ is a proper compact extension. By Theorem 3.27, we know that $\mathcal{Z}$ is SZ, thus $\mathcal{Y}$ is not maximal. We reach a contradiction in both situations. Therefore $\mathcal{Y} = \mathcal{X}$, and we can conclude that every measure-preserving system is SZ. □

Acknowledgments

It is a pleasure to thank my mentor Brian Chung for his time and effort in guiding me and helping me understand the material. I have been interested in Szemerédi's theorem for a long time, but I had no experience with either the theorem or ergodic theory before this summer. I would not have been able to see the elegance of Furstenberg's proof without the help of Brian. I also want to thank Professor May for helping me with my paper writing skills and letting me be a part of the REU program.

4. Bibliography

References
[1] H. Furstenberg, Y. Katznelson, D. Ornstein. The Ergodic Theoretical Proof of Szemerédi's Theorem. 1982.
[2] Manfred Einsiedler, Thomas Ward. Ergodic Theory With A View Towards Number Theory. Springer London Dordrecht Heidelberg New York. 2011.
[3] Yufei Zhao. Szemerédi's Theorem via Ergodic Theory. 2011.

21. Given factors $P$ and $Q$, we write $P < Q$ if there exists some extension map $\phi_{P,Q} : P \to Q$, i.e. $Q$ is a factor of $P$.