Sparse Bounds in Harmonic Analysis and Semiperiodic Estimates

by Alexander Barron

May 2019 2 c Copyright 2019 by Alexander Barron

i This dissertation by Alex Barron is accepted in its present form by the Department of

Mathematics as satisfying the dissertation requirement for the degree of Doctor of Philosophy

Date

Jill Pipher, Ph.D., Advisor

Recommended to the Graduate Council

Date

Benoit Pausader, Ph.D., Reader

Date

Sergei Treil, Ph.D., Reader

Approved by the Graduate Council

Date

Andrew G. Campbell, Dean of the Graduate School

ii Vitae

Alexander Barron graduated from Colby College with a B.A. in Mathematics in May 2013

(summa cum laude, Phi Beta Kappa). He also earned an M.S. in mathematics from Brown

University in May 2016. At Brown he was a teaching assistant and instructor for multiple sections of Calculus I and II, and also for multivariable calculus. He has given research talks at various conferences and institutions, including Georgia Tech, the University of Virginia, the

University of New Mexico, and the American Institute of Math (AIM). Starting in August 2019 he will be serving as a Doob Visiting Assistant Professor at the University of Illinois Urbana-

Champaign.

Below is a list of publications and preprints by the author.

1. Weighted Estimates for Rough Bilinear Singular Integrals via Sparse Domination, N.Y.

Journal of Math 23 (2017) 779-811.

2. Sparse bounds for bi-parameter operators using square functions, with Jill Pipher. Preprint

arXiv:1709.05009.

3. Sparse domination and the strong maximal function, with Jos´eConde Alonso, Yumeng

Ou, and Guillermo Rey. Adv. Math. 345 (2019), 1-26.

4. On Global-in-time Strichartz estimates for the semiperiodic Schr¨odingerequation. Preprint,

arXiv:1901.01663

iii iv Preface

Part I of this thesis studies some results related to sparse bounds in harmonic analysis. We provide a brief overview of sparse bounds in Chapter 1. In Chapter 2 we study sparse bounds for certain ‘rough’ bilinear singular integrals with limited smoothness assumptions. In Chapters

3 and 4 we study the extent to which the sparse theory applies in the setting of operators with a bi- or multi-parameter structure. In particular in Chapter 4 we show that the simplest one- parameter sparse bound (for the dyadic Hardy-Littlewood maximal function) does not directly generalize to the bi-parameter setting. Chapter 2 is based on [1]. Chapter 3 is based on [2] and is joint work with Jill Pipher. Chapter 4 is based on [3] and is joint work with Jos´eConde

Alonso, Yumeng Ou, and Guillermo Rey.

In Part II we study certain global-in-time Strichartz-type estimates for solutions to the linear

n d semiperiodic Schr¨odingerequation (on manifolds of the type R × T ). In Chapter 5 we survey n d the related theory on R and T . In Chapter 6 we discuss some earlier results on R × T and n d then prove a certain generalization on R × T (see Chapter 6 for a precise statement). A large part of Chapter 6 is based on [69].

v vi Acknowledgments

This thesis would not have been possible without the generous help and support of my advisor

Jill Pipher. I also benefited greatly from discussions with other faculty at Brown, including Jos´e

Conde Alonso, Francesco Di Plinio, Benoit Pausader and Sergei Treil. The same can be said of all the students in analysis that I’ve had the privilege of talking to during my time at Brown:

Amalia Culiuc, Milen Ivanov, Spyridon Kakaroumpas, Linhan Li, and Yumeng Ou.

vii viii Contents

I Sparse Bounds in Harmonic Analysis5

1 A Brief Overview of Sparse Bounds7

2 Sparse Bounds for Rough Bilinear Operators 11

2.0.1 Structure of the Chapter...... 15

2.0.2 Notation and Definitions...... 15

2.1 Abstract Sparse Theorem in Multilinear Setting...... 16

2.1.1 Remark on Truncations...... 18

2.1.2 The Abstract Theorem...... 19

2.1.3 Some Remarks on Theorem 2.2...... 21

2.2 Multilinear Calder´on-Zygmund Operators...... 23

2.2.1 Single-Scale Estimates...... 25

2.2.2 Cancellation Estimates...... 28

2.3 Rough Bilinear Singular Integrals...... 33

2.3.1 Interpolation Lemmas...... 34

2.3.2 The Sparse Bound...... 38

2.4 Weighted Estimates...... 40

2.4.1 Single Weight - Proof of Corollary 2.2...... 40

2.4.2 Multiple Weights...... 42

2.5 Proof of Theorem The Abstract Theorem...... 44

1 2.5.1 Construction of the Sparse Collection...... 44

2.5.2 Multiplication Operators...... 49

3 Sparse Bounds in the Multi-Parameter Setting 53

3.1 Some Difficulties...... 54

3.2 An Approach Using Square Functions...... 56

3.2.1 Remarks on Theorem 3.1...... 58

3.3 The Lacey-Ou Example...... 59

3.4 The Bi-Parameter Martingale Transform...... 62

3.4.1 The Geometric Proof of Theorem 3.1...... 63

3.5 Bi-Parameter Dyadic Shifts and Singular Integrals...... 67

3.5.1 Definitions...... 68

3.5.2 The Sparse Bound...... 69

3.5.3 Bi-Parameter Singular Integrals...... 73

3.6 Weighted Estimates for Bi-Parameter Martingale Transform...... 74

3.7 Weighted Estimates for the Shifted Square Function...... 78

3.7.1 Preliminary Results...... 79

3.7.2 The Sparse Bound...... 82

4 A Counterexample for the Strong Maximal Function 93

4.1 Lower bounds for MS and upper bounds for sparse forms...... 96

4.1.1 Lower bound for MS ...... 99 4.1.2 Upper bound for sparse forms...... 100

4.2 Construction of the extremal sets...... 104

4.2.1 Construction of P and the proof of Theorem 4.2...... 105

4.2.2 Construction of Z ...... 107

4.2.3 Proof of lemma 4.5 and (Z.5)...... 109

4.3 Extensions and open questions...... 114

2 4.3.1 Failure of sparse bound for bi-parameter martingale transform...... 114

4.3.2 Failure of sparse bound in higher dimensions...... 116

4.3.3 Other sparse forms...... 119

II Semiperiodic Strichartz Etimates 123

n d 5 Strichartz Estimates on R and T 125 5.0.1 Strichartz Estimates on Euclidean Space...... 125

5.0.2 Strichartz Estimates on the Torus...... 127

6 The Semiperiodic Case 131

6.1 Local Estimates on R × T ...... 132 n d 6.2 Global Estimates on R × T ...... 137 6.3 Preliminary Results...... 142

6.4 The Decoupling Argument...... 147

6.4.1 Main Lemmas...... 148

6.4.2 Main Argument...... 150

6.4.3 Proof of Decoupling Lemma 6.7...... 152

6.5 The -Removal Argument...... 156

6.5.1 The Local-In-Time Case...... 156

6.5.2 The Global Case...... 160

6.6 Some Applications...... 167

6.6.1 Function Spaces...... 167

6.6.2 The Quintic Equation on R × T ...... 169 2 6.6.3 The Cubic Equation on R × T ...... 176 6.7 Additional Remarks...... 178

3 4 Part I

Sparse Bounds in Harmonic Analysis

5

Chapter 1

A Brief Overview of Sparse Bounds

Over the past several years there has been a lot of activity concerning the sparse domination of various operators arising in harmonic analysis. Given an operator T and suitable test functions

n f, g on R one typically shows that there exists a sparse collection of cubes S such that

X 1 T f(x) . hfiQ Q(x) Q∈S or such that X |hT f, gi| ≤ C hfir,Qhgis,Q|Q| (1.1) Q∈S

1 R r 1/r for some 1 ≤ r, s < ∞, where hfir,Q = ( |Q| Q |f| dx) and hfiQ = hfi1,Q (if T is multilinear one needs to work with a more general multilinear version of this estimate, see below). The sparse collection S is almost disjoint in the following sense: there exists η ∈ (0, 1) such that for each Q ∈ S there is a subset EQ ⊂ Q such that |EQ| ≥ η|Q|, and moreover EQ ∩ EQ0 = ∅ for distinct Q, Q0 ∈ S.

The sparse theory offers a new approach to the analysis of many classical (and non-classical) operators in harmonic analysis, and indeed most of the major results in the area were proved within the last five years. We now know that bounds of the type (1.1) hold for a wide variety of operators: smooth and rough Calder´on-Zygmund operators, singular integrals along curves, the

7 spherical maximal function, the variational Carleson operator, the bilinear Hilbert transform, certain Bochner-Riesz multipliers, pseudodifferential operators, singular operators associated to elliptic operators on manifolds, and various discrete analogues of classical operators (this list is far from exhaustive, see for example [53], [48], [4], [6], [19], [1] and the references therein).

Sparse bounds typically recover the full range of known strong and weak-type Lp bounds for

T , and also yield vector-valued estimates as a corollary. Moreover, a bound of type (1.1) implies a range of quantitative weighted estimates for T with respect to Muckenhoupt and reverse H¨older weight classes, and these bounds are often sharp in terms of the weight characteristic (e.g. the sparse bound for Calder´on-Zygmund operators yields a simple proof of the sharp weighted bounds originally proved in [41]). Note that the operators T listed above are oscillatory or non- local, yet the sparse bound (1.1) gives control of T by an operator that is positive and localized without losing information about the mapping properties of T . The sparse arguments also generalize to the probabilistic setting where T is a discrete or continuous martingale transform, as long as one replaces averages over cubes with conditional expectations adapted to iterated stopping times [27].

In subsequent chapters we will work with positive sparse forms. Let S be a sparse collection

d of cubes in R and suppose ~p = (p1, p2, ..., pm+1) is an (m + 1)-tuple of Lebesgue exponents. We define the form m+1 ~p X Y PSFS (f1, ..., fm+1) := |Q| (fi)pi,Q, Q∈S i=1 where

−1/q (f)q,Q = |Q| kf1Qkq.

These operators were initially studied in [6], [24], [25], motivated by earlier pointwise estimates in the sparse domination theory (see [48], [53], [54], and [55] for some examples). It is important to note that there are examples of operators that are bounded by sparse forms, but which may not admit a pointwise sparse bound (for example, the bilinear Hilbert transform [24]). It is therefore typically easier to prove sparse-form estimates in place of pointwise sparse bounds,

8 and in practice sparse form bounds are not much weaker than pointwise estimates. Working with sparse forms also helps to avoid certain technical obstacles (for example, one can avoid the reliance on maximal truncation estimates present in [48], [54]). We refer the reader to [19], [47], and [51] for further discussion and examples of recent developments.

In the next chapter we study sparse-form bounds for a certain class of rough bilinear singular integrals. In Chapters 3 and 4 we turn our attention to sparse bounds in the setting of multi- parameter operators, where very little is known.

9 10 Chapter 2

Sparse Bounds for Rough Bilinear Operators

The study of rough singular integral operators dates back to Calder´onand Zygmund’s classic

n−1 R papers [10] and [11]. In [11] the authors proved that if Ω ∈ L log L(S ) with Sn−1 Ω dσ = 0, then the operator Z Ω(y/|y|) RΩf(x) := p.v. n f(x − y)dy Rn |y|

p n is bounded on L (R ) for 1 < p < ∞. Hoffman [38] and Christ and Rubio de Francia [15], [16] proved such operators are bounded from L1 → L1,∞ for small dimensions, and weak-(1, 1) boundedness in arbitrary dimensions was later proved by Seeger [63] and Tao [67] (in a more general setting). The weighted theory for such operators was developed during this time as well; for some examples, see the work by Duoandikoetxea [29] and Watson [68]. More recently,

Conde-Alonso, Culiuc, Di Plinio, and Ou [19] have shown that the bilinear form associated to a rough operator RΩ can be bounded by positive sparse forms, proving quantitative weighted estimates for RΩ as an easy corollary. Also see the recent work by Hyt¨onen,Roncal, and Tapiola [44], where the authors establish quantitative estimates using a different method.

We are interested in the bilinear analogues of the operators RΩ. The study of these operators originates with work by Coifman and Meyer [17]. Suppose Ω ∈ Lq(S2d−1) for some q > 1 with

11 R S2d−1 Ω dσ = 0, and define the rough bilinear operator

Z Z Ω((y1, y2)/|(y1, y2)|) TΩ(f1, f2)(x) = p.v. f1(x − y1)f2(x − y2) 2d dy1dy2. (2.1) Rd Rd |(y1, y2)|

Grafakos, He, and Honz´ık[34] have proved using a wavelet decomposition that if Ω ∈ L∞(S2d−1) then

kTΩkLp1 (Rd)×Lp2 (Rd)→Lp(Rd) < ∞ when 1 < p , p < ∞ and 1 + 1 = 1 . Also see [34] for references to earlier work. In this 1 2 p1 p2 p chapter we develop a weighted theory for the rough bilinear operators TΩ, using a multilinear generalization of the sparse domination theory from [19] along with the results by Grafakos, He, and Honz´ık.The main result is the following theorem.

Theorem 2.1. Suppose TΩ is the rough bilinear singular integral operator defined in (2.1), with

∞ 2d−1 R Ω ∈ L (S ) and S2d−1 Ω = 0. Then for any 1 < p < ∞, there is a constant Cp > 0 so that

(p,p,p) |hTΩ(f1, f2), f3i| ≤ CpkΩkL∞(S2d−1) sup PSFS (f1, f2, f3). S

Here the supremum is taken over all sparse collections S with some fixed sparsity constant η that does not depend on the functions. This theorem is a consequence of a more general multilinear sparse domination result, which is stated in Section 2. As an application of Theorem 2.1 we derive weighted estimates for TΩ. Recall the Ap class of weights w, where w ∈ Ap for 1 < p < ∞

1 if w > 0, w ∈ Lloc, and

p−1  1 Z   1 Z 1  − p−1 [w]Ap := sup w dx w dx < ∞. d |Q| |Q| Q⊂R Q Q cubes

Below we write Lp(w) for the space Lp with measure w(x)dx.

Corollary 2.1. Fix 1 < p , p < ∞ and 1 < p < ∞ such that 1 + 1 = 1 . Then for all weights 1 2 2 p1 p2 p p p (w 1 , w 2 ) in (A ,A ), there is a constant C depending on [wp1 ] , [wp2 ] , d, p and p such 1 2 p1 p2 Ap1 Ap2 1 2

12 that

p p p p kTΩ(f1, f2)k p ≤ Ckf1k p1 1 kf2k p2 2 L (w1 w2 ) L (w1 ) L (w2 )

pi pi for all fi ∈ L (wi ).

These estimates were originally proved by Cruz-Uribe and Naibo in [23] with a different tech- nique. Our proof uses the sparse domination from Theorem 2.1 along with methods from [24],

[55], [56] and extrapolation. Note the following special case of Corollary 2.1 in the single-weight case.

∞ 2d−1 R Corollary 2.2. Suppose 1 < q < ∞ and Ω ∈ L (S ) with S2d−1 Ω dσ = 0. Then if w ∈ Aq, there is a constant C = C(w, q, Ω) such that

kTΩ(f, g)kLq/2(w) ≤ CkfkLq(w)kgkLq(w) for all f, g ∈ Lp(w).

We provide a separate proof of this corollary in Section 5.1 that indicates how to track the dependence of C on [w]Ap . The proof of Corollary 2.2 is again a consequence of sparse domination and extrapolation.

We can also prove weighted estimates with respect to the more general multilinear Mucken- houpt classes. In particular, suppose P3 1 = 1 with 1 < q , q , q < ∞ and let v , v , v be i=1 qi 1 2 3 1 2 3 strictly positive functions such that 3 1 Y qi vi = 1. i=1 Define 1 1 3 pi −  Z  pi qi Y 1 pi−qi [~v]A~p := sup vi ~q |Q| Q i=1 Q for any tuple ~p = (p1, p2, p3) with 1 ≤ pi < qi < ∞ for i = 1, 2, 3. Notice that we assume vi > 0

1 for each i, but we do not assume that vi ∈ Lloc. This multilinear class was originally introduced in [56] with ~p = (1, 1, 1), and used in [24] for more general ~p. In the case when ~p = (1, 1, 1), it

13 is the natural weight class associated to the maximal operator

2 Y 1 Z M(f~)(x) := sup |f (y )|dy . |Q| i i i x∈Q i=1 Q

We will prove the following in Section 5.2.

Corollary 2.3. Suppose the tuples ~p,~q and the functions v1, v2, v3 are defined as above, with pi > 1 for i = 1, 2, 3. Also let

q q 0 − 2 − 1 −q3/q3 q1+q2 q1+q2 σ = v3 = v1 v2 ,

0 ∞ 2d−1 R where q3 is the conjugate of q3. Then if [~v] ~p < ∞ and Ω ∈ L (S ) with S2d−1 Ω = 0, we A~q have n o max qi qi−pi kTΩ(f1, f2)k q0 ≤ CΩ,d,p ,q [~v] kf1k q1 kf2k q2 L 3 (σ) i i ~p L (v1) L (v2) A~q

q for all fi ∈ L i (vi), i = 1, 2.

We will see in Section 5.2 below that Corollary 2.1 and Corollary 2.3 can both be deduced from the same lemma, which is a consequence of the sparse domination. However, the class [~v] ~p is A~q 1 in general strictly larger than Aq1 × Aq2 since, for example, v1 and v2 do not have to be in Lloc (see [56] for some examples when ~p = (1, 1, 1)).

Sample Application. We note that the Calder´oncommutator

Z A(x) − A(y) C(a, f)(x) = p.v. 2 f(y)dy, R (x − y) where a is the derivative of A, is an example of the rough bilinear operators considered in this paper [17]. Let e(t) = 1 if t > 0 and e(t) = 0 if t < 0. Then we can write this operator as

Z Z p.v. K(x − y, x − z)f(y)a(z)dydz R R 14 e(z)−e(z−y) Ω((y,z)/|(y,z)|) with K(y, z) = y2 =: |(y,z)|2 . Moreover, Ω is odd and bounded, so Theorem 2.1 and the weighted corollaries all apply to C(a, f).

2.0.1 Structure of the Chapter

The proof of Theorem 2.1 is broken up into two parts. We begin by formulating an abstract sparse domination theorem in the multilinear setting (Theorem 2.2 below), a result that gener- alizes Theorem C in [19] by Conde-Alonso, Culiuc, Di Plinio, and Ou. The proof of this theorem is similar to their result, so it is deferred to Section 2.5. In the second part of the paper we use the deep results of Grafakos, He, and Honz´ıkto show that the assumptions of Theorem 2.2 are satisfied by the rough bilinear operators TΩ. Along the way we also prove a sparse domination result for multilinear Calder´on-Zygmund operators with a standard smoothness assumption, as defined by Grafakos and Torres in [36]. This is the content of Theorem 2.3 below. This sparse domination result for multilinear Calder´on-Zygmund operators uses different methods than the recent paper by K. Li [57], and in particular avoids reliance on maximal truncations. As a corollary we can partially recover the weighted estimates from [55]. Finally, in Section 2.4 we prove the weighted estimates outlined in the corollaries above.

Theorem 2.2 is independently interesting, and due to its generality we expect it to be useful for analyzing other multilinear operators in the future. We also note that Theorem 2.1 only applies when Ω ∈ L∞(S2d−1), since we need some analogue of the classical Calder´on-Zygmund size condition. Sparse domination when Ω ∈ Lq(S2d−1) for other values of q, or when Ω is in some Orlicz-Lorentz space, is still an open problem. In fact, the full range of boundedness of TΩ

q 2d−1 when Ω ∈ L (S ) for 2 < q < ∞ is not known. It is also unknown whether TΩ is bounded anywhere for q < 2. See [34] for more details.

2.0.2 Notation and Definitions

ˆ 5 Given a dyadic cube L, we let sL = log2(length(L)) and let L be the 2 -fold dilate of L. d Throughout the paper we fix a dyadic lattice D in R . A collection of disjoint cubes P ⊂ D will

15 be called a stopping collection with top Q if its elements are contained in 3Q and satisfy the following separation properties:

1. If L, R ∈ P and |sL − sR| ≥ 8 then 7L ∩ 7R = ∅

S S 2. L∈P 9L ⊂ L∈P L. 3L∩2Q6=∅

This definition is taken from [19] (the particular constants here are chosen for technical reasons related to the proof of Theorem 2.2 below).

Throughout the paper we use cα to represent a positive constant depending on the parameter

α that may change line to line. We often write A . B to A ≤ cB, where c is a positive constant depending on the dimension d, the multilinearity term m, or relevant exponents. For

d E ⊂ R we let |E| denote its Lebesgue measure and 1E its indicator function. Finally, we will use Mp(f)(x) to denote the p-th Hardy-Littlewood maximal function

 Z 1/p 1 p Mp(f)(x) = sup |f(y)| dy x∈Q |Q| Q

d (here the supremum is taken over cubes Q ⊂ R containing x). Recall that Mp is bounded on r d w L (R ) when r > p. We also write Mp (f) to denote the p-th maximal function associated to a weight w. This operator satisfies the same boundedness properties as the standard maximal operator when w is doubling (and in particular when w ∈ Aq for some q).

2.1 Abstract Sparse Theorem in Multilinear Setting

In this section we formulate an abstract sparse domination result which we will apply in Section

2.3 to prove Theorem 1. Let T be a bounded m-linear operator mapping Lr1 × ... × Lrm → Lα for some r , α ≥ 1 with 1 + ... + 1 = 1 , and assume T is given by integration against a kernel i r1 rm α

K(x1, ..., xm+1) away from the diagonal. We will assume the kernel of T has a decomposition

X K(x1, ..., xm, xm+1) = Ks(x1, ..., xm, xm+1), (2.2) s∈Z

16 s such that in the support of Ks we have |xk − xl| ≤ 2 for all l, k. We also define

mds p0 [K]p := sup 2 sup kKs(y, · + y, ... , · + y)kLp(Rmd)+ s∈Z y∈Rd  kKs(· + y, y, · + y, ... , · + y)kLp(Rmd) + ...

(the last ‘...’ indicates the other possible symmetric terms), and require that

[K]p < ∞. (2.3)

This is an abstract analogue of the basic size estimate for multilinear Calder´on-Zygmund kernels, see [36] and Section 2.2 below. We will refer to conditions (2.2) and (2.3) as the (S) (single-scale) properties.

Since the operators under consideration are m-linear, we will have to deal with (m+1)-linear forms of type Z T (f1, ..., fm)(x)fm+1(xm+1) dxm+1. Rn

ν We define Λµ(f1, ..., fm+1) to be the form

Z X Ks(x1, ..., xm+1)f1(x1)...fm+1(xm+1) dx1...dxm+1, (m+1)d R µ 0 and ν < ∞. Finally, we will assume the following uniform estimate on the truncations: given 1 + ... + 1 = 1 as above, r1 rm α

ν C (r , ..., r , α) := sup kΛ k r 0 < ∞. T 1 m µ L 1 ×...×Lrm ×Lα →C

Here the supremum is taken over all finite truncations. From now on the truncation bounds

µ, ν will be omitted from the notation unless explicitly needed, and we will write CT in place of

CT (r1, ..., rm, α).

17 2.1.1 Remark on Truncations

We assume below that our operator is truncated to finitely many scales, and prove estimates that are uniform in the number of truncations. The justification for this assumption is sketched below. The argument is a straightforward generalization of a result that can be found in [65]

Ch. 1, section 7.2.

 Let T denote our operator truncated at scales larger than  and smaller than 1/. Fix fi ∈

r L i with compact support, 1 ≤ i ≤ m. By the uniform bound assumption on CT , after passing

 α to a subsequence we can assume T (f1, ..., fm) converges weakly in L to some T0(f1, ..., fm). It is clear from the weak convergence that T0 must be multilinear. We claim that if Qi is a cube

n in R then

(T − T0)(f11Q1 , ..., fm1Qm )(xm+1) (2.4)

= 1Q1 (xm+1)...1Qm (xm+1)(T − T0)(f1, ..., fm)(xm+1) a.e.

In fact, the support restriction on the kernel of T − T  shows that the integral vanishes if xm+1 ∈/ Qi for any i, provided  is small enough. Then the identity follows from the weak convergence of T . Using multilinearity we can extend (2.4) to simple functions, and then using

ri j the boundedness of T we see that (2.4) holds with gi ∈ L in place of 1Qi . In particular, let E d r be an increasing sequence of open sets that exhaust R , and suppose fi ∈ L i with support in

Eji . Then we must have

(T − T0)(f1, ..., fm)(xm+1)

= f1(xm+1)...fm(xm+1)(T − T0)(1Ej1 , ..., 1Ejm )(xm+1) a.e.

It follows that T differs from the limit T0 by a multiplication operator, provided

(T − T0)(1Ej1 , ..., 1Ejm )(xm+1) =: φ(xm+1)

18 forms a coherent function with respect to the Eji . This is clear as in the linear case. Moreover,

∞ φ ∈ L since there exists cT > 0 such that

k(T − T0)(f1, ..., fm)kLα cT > sup r r r fi∈L i kf1kL 1 ...kfmkL m cpt. supp

kφ · f1...fmkLα = sup ≥ kφkL∞ . r r r fi∈L i kf1kL 1 ...kfmkL m cpt. supp

Therefore, as long as we can prove admissible bounds for multiplication operators of the form

∞ Aφ(f1, ..., fm) = φ · f1...fm, with φ ∈ L , we are justified in working with a finite (but otherwise arbitrary) number of scales.

2.1.2 The Abstract Theorem

Assume we are given some stopping collection of cubes P with top Q. We will use the space

Yp = Yp(P) from [19], with norm k · kYp defined by

    S max h1Rd\ L , sup inf Mph(x) if p < ∞ L∈P ∞ L∈P x∈Lˆ khkYp := (2.5)  khk∞ if p = ∞

ˆ 5 (recall from above that L is the 2 -fold dilate of L). We let kbkXp denote the Yp-norm of b P ˙ R when b = L∈P bL with bL supported on L ∈ P, and use Xp to signal that bL = 0 for each L. Observe that these norms are increasing in p, a property we use many times below. Also recall that if h ∈ Yp there is a natural Calder´on-Zygmund decomposition of h associated to the stopping collection P: we can split h = g + b, where

X  1 Z  b = h − h 1 (2.6) |L| L L∈P L and

kgk ≤ 25dkhk , b ∈ X˙ , kbk ≤ 25d+1khk . Y∞ Yp p X˙p Yp

19 Given a cube L, define

min(sL,top trunc) ΛL(f1, ..., fm, fm+1) := Λ (f11L, f2, ..., fm, fm+1)

min(sL,top trunc) = Λ (f11L, f213L, ..., fm13L, fm+113L), meaning the truncation from above never exceeds the scale of L. The second equality follows from the support condition on the kernel imposed by (2.2). For simplicity we often write ΛsL to indicate that all truncations are at or below level sL. We will work with

X ΛP (f1, ..., fm, fm+1) := ΛQ(f1, ..., fm+1) − ΛL(f1, ..., fm+1) (2.7) L∈P L⊂Q

(as above, Q is the top cube of the stopping collection P). Note that ΛP is not symmetric in all of its arguments, due to some lack of symmetry in definitions. However, we will see below that

ΛP is ‘almost’ symmetric, meaning it is symmetric up to a controllable error term.

We can now state the abstract sparse domination theorem.

Theorem 2.2. Let T be an m-linear operator with kernel K as above, such that K can be decomposed as in (2.2) and CT < ∞. Also let Λ be the (m + 1)-linear form associated to

T . Assume there exist 1 ≤ p1, ..., pm, pm+1 ≤ ∞ and some positive constant CL such that the following estimates hold uniformly over all finite truncations, all dyadic lattices D, and all stopping collections P:

|ΛP (b, g2, g3, ..., gm+1)| ≤ CL|Q|kbk ˙ kg2kYp kg3kYp ...kgm+1kYp Xp1 2 3 m+1

|ΛP (g1, b, g3, ..., gm+1)| ≤ CL|Q|kg1kY∞ kbk ˙ kg3kYp ...kgm+1kYp (2.8) Xp2 3 m+1

|ΛP (g1, g2, b, g4, ..., gm+1)| ≤ CL|Q|kg1kY∞ kg2kY∞ kbk ˙ kg4kYp ...kgm+1kYp Xp3 4 m+1 . .

|ΛP (g1, g2, ..., gm, b)| ≤ CL|Q|kg1kY∞ kg2kY∞ ...kgmkY∞ kbk ˙ . Xpm+1

Also let ~p = (p1, ..., pm+1). Then there is some constant cd depending on the dimension d such

20 that

ν sup |Λµ(f1, ..., fm, fm+1)| ≤ cd [CT + CL] sup PSFS;~p(f1, ..., fm, fm+1) µ,ν S

p d for all fj ∈ L j (R ) with compact support, where the supremum is taken with respect to all sparse collections S with some fixed sparsity constant that depends only on d, m.

2.1.3 Some Remarks on Theorem 2.2

We will prove in Section 2.5.1 that the multiplication operators Aφ considered above in Remark 2.1 satisfy admissible PSF bounds. Therefore Theorem 2.2 implies

~p |Λ(f1, ..., fm, fm+1)| ≤ cd [CT + CL] sup PSFS (f1, ..., fm, fm+1) S

∞ d q d q d when fj ∈ L (R ) with compact support. If Λ extends boundedly to L 1 (R ) × ... × L m+1 (R ), q d we can use standard density arguments to lift the PSF bound to the case where fi ∈ L i (R ).

We also provide some more motivation for the estimates (2.8). Let P be a stopping collection P of dyadic cubes with top Q and suppose b = L∈P bL with bL supported in L. Also assume g1, g2, ..., gm+1 are functions supported in 3Q. If we fix L ∈ P with scale sL, then by definition

ΛP (bL, g2, ..., gm+1) splits as

sQ sL ΛP (bL, g2, ..., gm+1) = Λ (bL, g2, ..., gm+1) − Λ (bL, g213L, ..., gm+113L).

Now let s be a fixed scale with s ≤ sL. Then the piece of ΛP (bL, g2, ..., gm+1) corresponding to this scale can be written as

Z KsbL(x1)(g2(x2)...gm+1(xm+1) − g213L(x2)...gm+113L(xm+1)) d~x R(m+1)d Z = KsbL (g213L...gm+113L − g213L...gm+113L) d~x R(m+1)d = 0.

21 s s Here we’ve used the truncation of the kernel: since x1 ∈ L and |x1 − xi| ≤ 2 ≤ 2 L for any i, we must have xi ∈ 3L for each i. Therefore all scales s entering into ΛP (bL, g2, ..., gm+1) satisfy s > sL, and as a consequence we have the decomposition

Z X ΛP (bL, g2, ..., gm+1) = KsL+lbL(x1)g2(x2)...gm+1(xm+1)d~x. (m+1)d R l≥1

Summing over L ∈ P then gives

X Z X ΛP (b, g2, ..., gm+1) = KsL+lbL(x1)g2(x2)...gm+1(xm+1)d~x (2.9) (m+1)d L∈P R l≥1 Z X X = Ksbs−l(x1)g2(x2)...gm+1(xm+1)d~x, (m+1)d R s l≥1 where X bs−l = bL. L∈P sL=s−l The representation (2.9) often enables one to verify the estimates (2.8) in practice. We will see this in the case of multilinear Calder´on-Zygmund operators below in Section 2.2. In the

Appendix we prove the following result about the adjoint forms:

Proposition 2.1. Let b, g1, ..., gm+1 be defined as above, fix 1 < p ≤ ∞, and suppose [K]p0 < ∞. If we set

in X b = bL, L∈P 3L∩2Q6=∅ then

ΛP (g1, b, g3, ..., gm+1) Z X X in = Ksbs−l(x2)g1(x1)g3(x3)...gm+1(xm+1)d~x + φ, (m+1)d R s l≥1

22 where φ is an error term that satisfies the estimate

0 |φ| . [K]p |Q|kbkX1 kg1kYp kg3kYp ...kgm+1kYp .

An analogous decomposition holds for each of the other forms

ΛP (g1, g2, b, g4, ..., gm+1), ..., ΛP (g1, ..., gm−1, b, gm+1), ΛP (g1, ..., gm, b), with error terms satisfying the same bounds (with obvious changes in indicies).

In practice the error term φ does not cause any issues. This quantifies the ‘almost symmetry’ of ΛP mentioned above.

The proof of Theorem 2.2 is similar to the proof of the abstract Theorem C in [19], so we postpone the argument to Section 2.5 below.

2.2 Multilinear Calder´on-Zygmund Operators

In this section we use Theorem 2.2 to prove sparse bounds for multilinear Calder´on-Zygmund

(CZ) operators that satisfy a certain smoothness conditon (defined below). For the theory of these operators see [33] or [36]. In the process we prove an estimate which is necessary for the proof of Theorem 2.1 given in Section 2.3 below. The reader who who wishes to skip to the proof of Theorem 2.1 only needs to be familiar with the statement of Proposition 2.4 and the list of inequalities immediately following its proof.

Fix an m-linear CZ operator T with kernel K. We have to suitably decompose the kernel to verify the initial assumptions of Theorem 2.2. Recall that we have the basic size estimate

CK |K(x1, ..., xm, xm+1)| ≤ md , (max |xi − xj|) i6=j where 1 ≤ i, j ≤ m + 1. This estimate motivates the following decomposition. Let ~x =

23 d (x1, ..., xm, xm+1) with xi ∈ R , and define

md M~x = (max |xi − xj|) i6=j and

(m+1)d Mij = {~x ∈ R : M~x = |xi − xj|}.

∗ (m+1)d Note that there are m(m − 1)/2 such sets Mij. Also define Mij be the open subset of R where 1 |x − x | > |x − x | i j 2 k l

ij (m+1)d for all k, l. Let ψ be a smooth partition of unity of R \{0} subordinated to the open ∗ ij ij cover {Mij}, such that ψij = 1 on Mij. Let K = ψ K, so that our operator splits as

Z X ij T (f1, ..., fm)(xm+1) = K (x1, ..., xm, xm+1)f1(x1)...fm(xm)d~x. md i,j R

Now choose the integer l so that 2l−1 < m(m − 1)/2 ≤ 2l and set

m m (m+1)d s−l−1 X X s−1 As = {~x ∈ R : 2 < |xi − xj| ≤ 2 }. i=1 j=i+1

P −s Let φ be a smooth radial function supported in A0 such that s φ(2 ~x) = 1 for ~x 6= 0, and −s write φs(~x) = φ(2 ~x). Finally, let

ij ij Ks (~x) = ψ (~x)φs(~x)K(~x).

s Notice that if ~x ∈ Mij ∩ As, then |xi − xj| ∼ 2 . This leads to the decomposition

ij X ij K (x1, ..., xm, xm+1) = Ks (x1, ..., xm, xm+1) s X ij = K (x1, ..., xm, xm+1)φs(~x). s

24 s ∗ s Observe that if |xi − xj| ∼ 2 in the region Mij then necessarily |xk − xl| ≤ 2 for all other pairs, ∗ since |xk − xl| ≤ 2|xi − xj| in Mij. Given 1 ≤ p < ∞ we can easily prove the single-scale size estimate

 1/p Z − mds ij p p0 |Ks (~x)| . 2 (2.10) Rmd for each i, j. This bound holds uniformly in the free variable. One also has

ij −mds kKs kL∞(Rmd) < 2 ,

with a similar interpretation for the free variable. Therefore [K]p < ∞ in this setting, for 1 ≤ p ≤

∞. We are also free to assume that T satisfies the uniform truncation bound CT (r1, ..., rm, α) < ∞ for some collection of exponents with 1 + ... + 1 = 1 (see [33], [36]). r1 rm α

Assume for the rest of the section that P is a fixed stopping collection of cubes with top

Q. We will work towards a proof of the estimates (2.8) needed to apply Theorem 2.2 in this context. We begin by proving a single-scale estimate in a slightly more abstract setting, which will be useful for the rest of the paper.

2.2.1 Single-Scale Estimates

P Suppose b = L∈P bL with bL supported in L. Also pick functions g1, ..., gm+1 supported in 3Q. The following lemma applies to all kernels K = P K satisfying the (S) properties from s∈Z s section 2.

Lemma 2.2. Suppose b, g1, g2, ..., gm+1 are defined as above and 1 < β ≤ ∞. Let T be an m- linear operator with kernel K, such that K satisfies the (S) properties with [K]β0 < ∞. For a

25 fixed l ≥ 1 we have

Z X |Ks| · |bs−l|(x1)|g2(x2)|...|gm+1(xm+1)| d~x (m+1)d R s

0 . [K]β |Q|kbkX1 kg2kYβ ...kgm+1kYβ .

Symmetric estimates also hold for the other tuples (g1, b, g3, ..., gm+1), ..., (g1, ..., gm, b).

Proof. From (2.9), we see that it is enough to show that if L ∈ P and s = sL + l, then

Z |Ks||bL(x1)||g2(x2)|...|gm+1(xm+1)|d~x R(m+1)d

0 . [K]β |L|kbkX1 kg2kYβ ...kgm+1kYβ .

We can then sum over L and use disjointness to complete the proof. Begin by fixing x1 ∈ L,

∞ with the goal of proving an Lx1 bound for

Z |Ks(x1, ..., xm+1)||g2(x2)|...|gm+1(xm+1)|dx2...dxm+1. Rmd

We change variables and set zi = xi − x1 for i = 2, ..., m + 1. Notice that the support of Ks

s implies |zi| . 2 for each i. Now for all such x1,

26 Z |Ks(x1, x2, ..., xm+1)||g2(x2)|...|gm+1(xm+1)|dz2...dzm+1. Rmd Z 1/β0 β0 ≤ |Ks(x1, z2 + x1, ..., zm+1 + x1)| dz2...dzm+1 Rmd m+1 Z !1/β Y β × |gi| dzi s+10 i=2 B(x1,2 ) m+1 Z !1/β − mds Y β ≤ 2 β [K]β0 · |gi| dzi s+10 i=2 B(x1,2 ) m+1 Y [K]β0 inf Mβ(gi)· . ˆ i=2 L

0 . [K]β kg2kYβ ...kgm+1kYβ .

s+10 Note that s > sL and x1 ∈ L, so Lˆ is contained in B(x1, 2 ). This justifies adding the infinums in the fourth line. As a consequence, we get

Z |Ks||bL(x1)||g2(x2)|...|gm+1(xm+1)|d~x R(m+1)d

0 . [K]β |L|kbkX1 kg2kYβ ...kgm+1kYβ ,

since kbLkL1 . |L|kbkX1 . This completes the proof of the main estimate. The relevant estimates for the tuples (g1, b, g3, ..., gm+1), etc., can be proved in just the same way using the representa- tions from Proposition 2.1. Notice that the error term φ from this proposition has an acceptable contribution.

If Λ is the (m + 1)-linear form associated to an m-linear CZ operator T , then from (2.9) we see that

Z X X X ij ΛP (b, g2, ..., gm+1) = Ks bs−l(x1)g2(x2)...gm+1(xm+1)d~x. (2.11) (m+1)d R ij s l≥1

27 The number of pairs i, j is bounded by a constant depending only on m, so it is clear that the result of Lemma 2.2 applies to this operator as well (with a possibly different constant).

2.2.2 Cancellation Estimates

Let T be an m-linear Calder´on-Zygmund operator as defined above, such that

0  0 A|x1 − x | |K(x1, x2, ..., xm+1) − K(x , x2, ..., xm+1)| ≤ md+ (2.12) (max|xi − xj|) i6=j

0 1 for some 0 <  ≤ 1 whenever |x1 − x | ≤ m+1 (|x2 − x1| + ... + |xm+1 − x1|), and suppose that symmetric estimates hold with respect to the other variables xi (with the same constant). If CK is the constant from the basic Calder´on-Zygmund size estimate, we also assume that CK ≤ A. We then call (2.12) the ‘-smoothness’ property of T , and A the ‘smoothness’ constant. We will use (2.12) to sum over l in the context of Lemma 2.2, and as a result prove the required estimates

(2.8) for the sparse bound. It is possible that this smoothness condition may be relaxed, but it is all we need below for the proof of Theorem 2.1.

P Suppose b = L∈P bL and g1, ..., gm+1 are as in the last section, with the additional property R that bL = 0 for each L. We will use the cancellation of b and (2.12) to estimate

ΛP (b, g2, ..., gm+1), ..., ΛP (g1, ..., gm, b).

Before proceeding, we need to prove a technical lemma related to the truncation.

0 s−l Lemma 2.3. Suppose |x1 − x | ≤ 2 for some integer l ≥ 1, and let

ij ij ψs (x1, ..., xm+1) = ψ (x1, ..., xm+1)φs(x1, ..., xm+1).

28 Then

X ij K(x1, x2, ..., xm+1)ψs (x1, x2, ..., xm+1) i,j

0 ij 0 − K(x , x2, ..., xm+1)ψs (x , x2, ..., xm+1)

X 0 1 ∗ (~x)φ (x , ..., x ) K(x , ..., x ) − K(x , ..., x ) (2.13) . Mij s 1 m+1 1 m+1 m+1 i,j

−l 0 + 2 k∇φ0k∞|K(x , ..., xm+1)|.

Symmetric estimates also hold for the other variables under suitable assumptions.

0 Proof. We will prove the lemma for the pair x1, x . The proof for the other variables is identical.

0 For notational simplicity, below we will write K = K(x1, ..., xm+1) and Ke = K(x , x2, ..., xm+1). ij After adding and subtracting Ke · ψs to each term of the sum, we can estimate (2.13) by

X ij X ij 0 ij |K − Ke|ψ (~x) + (ψ (x , x2, ...xm+1) − ψ (x1, ..., xm+1)) · |Ke|. s s s ij ij

So we have to show that

X ij 0 ij −l 0 (ψ (x , x2, ...xm+1) − ψ (x1, ..., xm+1)) 2 kφ k∞ s s . 0 ij

0 s−l when |x1 − x | ≤ 2 . Note that under this hypothesis

0 −s 0 |φs(x1, x2, ..., xm+1) − φs(x , x2, ..., xm+1)| . k∇φ0k∞2 |x − x1|

−l . 2 k∇φ0k∞,

by the scaling and smoothness of φs. Summing over i, j and applying this estimate proves the lemma.

We can now prove the main result of the section.

29 Proposition 2.4. Let K be an m-linear Calder´on-Zygmundkernel such that the -smoothness condition (2.12) holds with constant A. Fix 1 < p ≤ ∞. Then

Z X X X ij Ks · bs−l(x1)g2(x2)...gm+1(xm+1)d~x (m+1)d ij l≥1 R s

A |Q|kbk kg k ...kg k . .  X˙1 2 Yp m+1 Yp

Analogous estimates also hold for Λ(g1, b, g3, ..., gm+1), ..., Λ(g1, ..., gm, b).

Proof. By (2.9), it suffices to prove

Z X X ij Ks +lbL(x1)g2(x2)...gm+1(xm1 )d~x (2.14) (m+1)d L l≥cm R i,j A |L|kbk kg k ...kg k , .  X˙1 2 Yp m+1 Yp

where cm is some constant depending on the multilinearity parameter m, to be determined below. We can then use disjointness of the collection to sum over L, and finitely many applications of

0 Lemma 2.2 to handle the cases where l < cm. Let x = cL, the center of L. Note that if

0 s−l s = sL + l then |x1 − x | ≤ 2 since x1 ∈ L. We use the mean-zero condition on bL to replace P Kij (x , x , ..., x ) by i,j sL+l 1 2 m+1

X   Kij (x , x , ..., x ) − Kij (x0, x , ..., x ) , (2.15) sL+l 1 2 m+1 sL+l 2 m+1 i,j and then estimate (2.15) using Lemma 2.3. To complete the proof, we can argue as in the proof

ij of Lemma 2.2, using the cancellation estimate (2.12) in place of the basic size estimate for |Ks |

(with s = sL + l).

We now sketch the rest of the proof. Set F = K − Ke and observe that the first term from

30 the estimate in (2.13) contributes

X Z 1 ∗ (~x)φ (~x)|F (x , ..., x )b (x )g (x )...g (x )|d~x. Mij s 1 m+1 L 1 2 2 m+1 m+1 (m+1)d ij R

∞ We now fix x1 ∈ L, with the goal of proving an Lx1 bound for

Z |F (x1, ..., xm+1)||g2(x2)|...|gm+1(xm+1)|dx2...dxm+1. ij suppKs

Once again make the change of variables zi = xi − x1 for i = 2, ..., m + 1, so in particular

s |zi| . 2 . Then we proceed as in Lemma 2.2, with F (x1, x1 + z2, ..., x1 + z2) in place of the Ks term. The only change is in the estimate involving the kernel term. Since we are working at

∗ scale ∼ s, in any region Mij there is some uniform constant αm depending only on m such that

s−αm s−αm−1 s−αm−1 |xi − xj| > 2 . Therefore we must have either |x1 − xi| ≥ 2 or |x1 − xj| ≥ 2 .

0 s−l It follows that we can choose some uniform cm such that if l ≥ cm and |x1 − x | ≤ 2 , then

1 |x − x0| ≤ (|z | + ... + |z |). 1 m + 1 2 m+1

Hence we can apply the cancellation estimate (2.12) for |F (x1, x1 +z2, ..., x1 +zm+1)|. A straight- forward calculation shows that for this choice of s and any i, j and x1 ∈ L, we have

1/p0 Z 0 p0 ! |x1 − x | mds −l − p 0 0 dx2...dxm+1 2 2 . (2.16) ij mdp +p . supp Ks (x1, ·) (|xi − xj|)

This allows us to proceed as in the proof of Lemma 2.2, with an addition of the 2−l term. The second term from Lemma 2.3 can be handled using the same methods from Lemma 2.2. Since the number of pairs i, j is bounded by some cm, we ultimately see that the left side of (2.14) is

31 bounded by

X c (2−l + 2−lk∇φ k )A |L|kbk kg k ...kg k m 0 ∞  X˙1 2 Yp m+1 Yp l≥1

A |L|kbk kg k ...kg k , .  X˙1 2 Yp m+1 Yp and summing over L proves the desired estimate. To prove the remaining part of the proposition, use the representations from Proposition 2.1 and repeat the argument just given with respect to the new variables.

As a consequence of the preceding proposition and (2.9), we see that

|Λ (b, g , g , ..., g )| ≤ A |Q|kbk kg k kg k ...kg k P 2 3 m+1  X˙1 2 Yp 3 Yp m+1 Yp |Λ (g , b, g , ..., g )| ≤ A |Q|kg k kbk kg k ...kg k P 1 3 m+1  1 Yp X˙1 3 Yp m+1 Yp |Λ (g , g , b, g , ..., g )| ≤ A |Q|kg k kg k kbk kg k ...kg k P 1 2 4 m+1  1 Yp 2 Yp X˙1 4 Yp m+1 Yp . .

|Λ (g , g , ..., g , b)| ≤ A |Q|kg k kg k ...kg k kbk , P 1 2 m  1 Yp 2 Yp m Yp X˙1

where A is the smoothness constant from (2.12). These estimates hold uniformly over finite truncations and stopping collections. We can therefore apply Theorem 2.2 to prove the following.

Theorem 2.3. Let T be an m-linear Calder´on-Zygmund operator satisfying the -smoothness

r d rm d α d condition (2.12) with constant A. Suppose T is bounded from L 1 (R )×...×L (R ) → L (R ), such that CT (r1, ..., rm, α) < ∞. Also fix 1 < p ≤ ∞. If Λ is the (m + 1)-linear form associated

1 d p d to T , then for any f1, ..., fm+1 with f1 ∈ L (R ) and fi ∈ L (R ) for i = 2, ..., m + 1, we have

(1,p,...,p) |Λ(f1, f2, ..., fm+1)| ≤ cd [CT + A] sup PSFS (f1, f2, ..., fm+1). S

As we mentioned in the introduction, sparse domination results for multilinear Calder´on-Zygmund operators have been known for a few years. The main novelty here is that we do not appeal

32 to local mean oscillation [55] or maximal truncation estimates [57]. We obtain a subset of the known weighted estimates for multilinear Calder´on-Zygmund operators as an easy corollary of

Theorem 2.3. See [9] and [55] for some examples.

Remark. The same sparse domination result will hold if we assume an abstract Dini condition for the kernel, as in Lemma 3.2 in [19]. Given 1 < p ≤ ∞, let

Γp(h) = (kKs(x, x + ·, ..., x + ·) − Ks(x + h, x + ·, ..., x + ·)kp + ...) .

The last ‘...’ before the end of the parenthesis indicates the other possible symmetric terms.

Now define sdm 0 ωj,p(K) := sup 2 p sup sup Γp(h). d d s∈Z x∈R h∈R s−j−c khk∞<2 m

Then if [K]p < ∞ and ∞ X [K]1,p := ωj,p(K) < ∞, j=1 we can prove an analogue of Proposition 2.4 and show that the assumptions of Theorem 2.2 are satisfied. The argument is similar to the proof of Lemma 3.2 in [19]. However, note that we would have to do essentially the same amount of work as above to check that this condition is satisfied by the -smooth Calder´on-Zygmund operators considered in this section.

2.3 Rough Bilinear Singular Integrals

In this section we use our sparse domination theorem to prove Theorem 2.1. Recall that

Z Z Ω((y1, y2)/|(y1, y2)|) TΩ(f1, f2)(x) = p.v. f1(x − y1)f2(x − y2) 2d dy1dy2, Rd Rd |(y1, y2)|

∞ 2d−1 R with Ω ∈ L (S ) and S2d−1 Ω = 0. We will use the results from the paper [34] by Grafakos,

He, and Honz´ık.We quickly review their initial decomposition of the kernel K of TΩ. Let {βj}j∈Z

33 2d j−1 j+1 be a smooth partition of unity on R \{0}, with βj adapted to the annulus {2 < |z| < 2 }. j Let ∆j denote the standard Littlewood-Paley operator localizing frequencies at scale 2 . Then define Ki = ∆ β K and decompose K = P K , with j j−i i j∈Z j

X i Kj = Kj. i∈Z

j Write T for the operator associated to the kernel Kj.

Proposition 2.5 (Prop. 3 and 5 in [34]). There exists 0 < δ < 1 so that

−δj kTj(f1, f2)kL1 ≤ C2 kΩk∞kf1kL2 kf2kL2 when j ≥ 0, and

−(1−δ)|j| kTj(f1, f2)kL1 ≤ C2 kΩk∞kf1kL2 kf2kL2 when j < 0.

Proposition 2.6 (Lem. 10 in [34]). Fix any j ∈ Z and 0 < η < 1. The kernel of Tj is a bilinear Calder´on-Zygmundkernel that satisfies the η-smoothness condition (2.12) with constant

|j|η Aj,η ≤ Cd,ηkΩk∞2 .

By Proposition 2.6 we can apply Proposition 2.4 at each scale j, with resulting constant Aj,η ≤

|j|η Cd,ηkΩk∞2 . The η can be arbitrarily small, so we will be able to interpolate with the bound from Proposition 2.5 and then sum over j to get the required estimates for the sparse-form bound of TΩ.

2.3.1 Interpolation Lemmas

Assume below that a stopping collection P with top Q has been fixed. The following lemmas allow us to interpolate bounds between various X·(P) and Y·(P) spaces in the trilinear setting. A more general interpolation theorem for these spaces should be available, but we only prove two particular results needed for the proof of Theorem 2.1.

34 1 Lemma 2.7. Fix any 0 <  < 8 and suppose p = 1 + 4. Let Λ be a (sub)-trilinear form such that

|Λ(b, g, h)| ≤ A kbk kgk khk 1 X˙1 Yp Yp and

|Λ(b, g, h)| ≤ A kbk kgk khk . 2 X˙2 Y2 Y∞

Then if q = p + 4 and q < r < ∞, we have

|Λ(f , f , f )| ≤ (A )1−(A )kf k kf k kf k . 1 2 3 1 2 1 X˙q 2 Yq 3 Yr

Proof. We can assume A2 < A1. We also make the normalizations A1 = 1 and

kf k = kf k = kf k = 1. 1 X˙q 2 Yq 3 Yr

It will now be enough to prove that

 |Λ(f1, f2, f3)| . A2.

Pick λ ≥ 1 and define f>λ = f1|f|>λ. We decompose f1 = b1 + g1, where

X  1 Z  b := (f ) − (f ) 1 . 1 1 >λ |R| 1 >λ R R∈P R

For i = 2, 3 we also decompose fi = bi +gi with bi := (fi)>λ. Using the normalization assumption and the definition of the Y spaces, it is straightforward to verify the following estimates:

35 1− q 1−q kg k 1, kg k λ 2 , kb k λ (2.17) 1 X˙p . 1 X˙2 . 1 X˙1 . q 1− q 1− 2 p kg2kY2 . λ , kb2kYp . λ

r 1− p kg3kY∞ . λ, kb3kYp . λ .

Note that g1 ∈ X˙2 since f1 and b1 are supported on cubes in P, with mean zero on each cube.

We now estimate |Λ(f1, f2, f3)| by the sum of eight terms

|Λ(g1, g2, g3)| + |Λ(b1, g2, g3)| + |Λ(g1, b2, g3)| + |Λ(g1, g2, b3)|

+ |Λ(b1, b2, g3)| + |Λ(b1, g2, b3)| + |Λ(g1, b2, b3)| + |Λ(b1, b2, b3)|.

Applying the estimates in (2.17), we get

|Λ(g , g , g )| A kg k kg k kg k A λ3−q. 1 2 3 . 2 1 X˙2 2 Y2 3 Y∞ . 2

For the remaining seven terms, we use the first assumption |Λ(b, g, h)| ≤ kbk kgk khk , along X˙1 Yp Yp with suitable estimates from (2.17). Here it is crucial that g1, b1 ∈ X˙2. Ultimately we conclude

q q q 1−q 1− 1− r 1−q 1− 1−q 1− r 1− 1− r |Λ(f1, f2, f3)| . λ +λ p + λ p + λ λ p + λ λ p + λ p λ p q 1−q 1− 1− r 3−q + λ λ p λ p + A2λ .

q q r Let α = 1 − p and notice that −4 < α < − and α = max(1 − q, 1 − p , 1 − p ). We can simplify

36 the previous estimate to get

α 2α 3α 3−q |Λ(f1, f2, f3)| . 3λ + 3λ + λ + A2λ

α 3−q−α . λ 7 + A2λ

− 3−q+4 . λ 7 + A2λ .

−3+q−4 We set λ = A2 . Since A2 < 1 and q < 2 this is admissible and we conclude that

(−3+q−4)(−) 3+42−q  |Λ(f1, f2, f3)| . A2 . A2 . A2.

This completes the proof.

1 Lemma 2.8. Fix any 0 <  < 8 and set p = 1 + 4. Let Λ be a (sub)-trilinear form such that

|Λ(g, h, b)| ≤ A kgk khk kbk 1 Yp Yp X˙1 and

|Λ(g, h, b)| ≤ A2kgkY2 khkY2 kbkY∞ .

Then if q = p + 4 and q < r < ∞, we have

|Λ(f , f , f )| ≤ (A )1−(A )kf k kf k kf k . 1 2 3 1 2 1 Yq 2 Yq 3 X˙r

Proof. The argument is almost identical to the proof of Lemma 2.7. In this case we decompose f3 = g3 + b3 with X  1 Z  b = (f ) − (f ) 1 . 3 3 >λ |R| 3 >λ R R∈P R

For i = 1, 2 we let fi = gi + bi with gi = (fi)≤λ, and then proceed as in Lemma 2.7.

37 2.3.2 The Sparse Bound

We can now prove Theorem 2.1. Let TΩ be a rough bilinear operator defined as above, with

∞ 2d−1 Ω ∈ L (S ). Such operators TΩ satisfy the standard Calder´on-Zygmund size estimate, so the single-scale properties (S) hold. The proof that CTΩ < ∞ for some tuple of exponents is also straightforward. In particular, inspection of the proofs of Propositions 2.5 and 2.6 in [34] shows that the same estimates hold for Tj if one replaces the kernel of Tj by

∞ X i Kej := ai · Kj, i=−∞ with ai ∈ {0, 1} but otherwise arbitrary. Moreover, these estimates are uniform over the choice of ai. Any truncation of the kernel of T leads to a special case of this small modification, and therefore the L2 × L2 → L1 estimate of Proposition 2.5 still holds. Likewise, the Calder´on-

Zygmund smoothness estimate from Proposition 2.6 holds uniformly over the choice of ai. This is once again obvious from the proof in [34]. It follows that CTΩ (2, 2, 1) < ∞. We must now verify the estimates (2.8) for (q1, q2, q3) = (r, r, r), where r > 1.

Fix 0 < η < 1. Since T j is a bilinear Calder´on-Zygmund kernel satisfying the η-smoothness

|j|η condition (2.12) with constant A ≤ Cd,ηkΩk∞2 , we see from Proposition 2.4 that

|Λj(b, g, h)| ≤ C kΩk 2|j|η|Q|kbk kgk khk d,η,p ∞ X˙1 Yp Yp for any 1 < p ≤ ∞. From Proposition 2.5 we also have

|Λj(b, g, h)| 2−c|j|kΩk |Q|kbk kgk khk , . ∞ X˙2 Y2 Y∞ where c is some positive constant independent of j. Interpolating via Lemma 2.7, we find that

38 1 for any 0 <  < 8 there are 1 < q < r so that

|Λj(b, g, h)| (2η|j|)1−(2−c|j|)|Q|kΩk kbk kgk khk .,η ∞ X˙q Yq Yr 2η|j|2−η|j|2−c|j||Q|kΩk kbk kgk khk . .,η ∞ X˙q Yq Yr

c η 1 Suppose we have chosen η so that η < 8 . If we let  = c then 0 <  < 8 , and therefore

|Λj(b, g, h)| 2−η|j||Q|kΩk kbk kgk khk . .,η ∞ X˙q Yq Yr

This is summable over j ∈ Z. Hence if ΛP is the form associated to TΩ, truncated to some finite number of scales, we can conclude that

|Λ (b, g, h)| |Q|kΩk kbk kgk khk . (2.18) P . ∞ X˙q Yq Yr

By symmetry, the same argument also yields

|Λ (g, b, h)| |Q|kΩk kgk kbk khk . (2.19) P . ∞ Yq X˙q Yr

These estimates are uniform over all finite truncations and stopping collections. Finally, we argue as above and interpolate using Lemma 2.8 to prove

|Λ (g, h, b)| |Q|kΩk kgk khk kbk , (2.20) P . ∞ Yq Yq X˙r which is once again uniform over truncations and stopping collections.

The estimates (2.18), (2.19), (2.20) show that the form associated to a rough bilinear operator

TΩ satisfies assumption (2.8) of Theorem 2.2 with tuple (r, r, r), since the Yq norm is increasing in q. It is clear that we can take any r > 1, so this completes the proof of Theorem 2.1.

39 2.4 Weighted Estimates

We now prove the weighted estimates claimed in Corollary 2.2 and Corollary 2.1. We assume that the reader is familiar with basic results from the theory of Ap weights, for example the openness property

0 [w] [w] when η = c [w]1−p (2.21) Ap−η . Ap d,p Ap

(see [32] or [65] for a proof). Below we always assume that Ω ∈ L∞(S2d−1) with mean zero.

2.4.1 Single Weight - Proof of Corollary 2.2

The argument is a combination of known results from the sparse domination theory. It suffices to prove the desired estimate for fixed p > 2. We can then apply the multilinear extrapolation theory due to Grafakos and Martell, in particular Theorem 2 in [35]. Fix 2 < p < ∞ and w ∈ Ap, with the goal of showing that

kTΩ(f, g)kLp/2(w) ≤ Cp,ΩcwkfkLp(w)kgkLp(w).

Define − 2 σ = w p−2 and notice that

2 p−2 w p σ p = 1.

0 p Let r = (p/2) = p−2 , and choose qi = 1+i for i > 0, such that qi < r and qi < p. By Theorem 2.1 and duality considerations, it will be enough to show that for any sparse collection S,

(q1,q2,q3) PSFS (f, g, h) .p cwkfkLp(w)kgkLp(w)khkLr(σ). (2.22)

40 We begin by quoting the estimate

max( p , p ) (1,1,q3) p−1 2 PSFS (g1, g2, g3) .p γw kg1kLp(w)kg2kLp(w)kg3kLr(σ) with 2− 2 1 − 1  1 Z 1  p  1 Z q3  q3 r − q −r γw = sup w p−1 σ 3 . Q |Q| Q |Q| Q

This can be proved using techniques from [24] or [55] (see, for example, Theorem 3 in [24] and its proof). Now write the term involving σ as

1 − 1 1  1 Z  q3 r 1 − w1+α , 1 + α = r . |Q| 1 − 1 Q q3 r

We can apply the reverse H¨olderinequality if we choose 3 correctly (as we are free to do), and after some straightforward calculation this leads to the estimate

2 γ c [w] p , w .p w,p Ap

where cw,p is a power of the constant appearing in the reverse H¨olderinequality. Hence

2 ·max( p , p ) (1,1,q3) p p−1 2 PSF (g , g , g ) c [w] kg k p kg k p kg k r , (2.23) S 1 2 3 .p w,p Ap 1 L (w) 2 L (w) 3 L (σ)

and cw,p can be computed explicitly in terms of [w]Ap (see, for example, Chapter 7.2 in [32]).

q1,q2,q3 We can lift (2.23) to a bound for PSFS (f, g, h) using an due to Di Plinio and Lerner ([26], Proposition 4.1):

d (f)1+,Q ≤ (f)1,Q + 2 (M1+f)1,Q. (2.24)

Recall also that q p−q p p kMqkL (w)→L (w) . [w]A p q

41 when p > q (see [8]). Using the openness of the At classes, we see that if 1, 2 are chosen properly then in fact  qi  1  p−qi 1−p kM k p p [w] (2.25) qi L (w)→L (w) . Ap

(q1,q2,q3) for i = 1, 2. Now apply (2.24) to the q1- and q2-averages occurring in the form PSFS (f, g, h) to get

(q1,q2,q3) (1,1,q3) 0 (1,1,q3) PSF (f, g, h) . PSF (f, g, h) + cwPSF (f, M1+2 g, h)

0 (1,1,q3) + cwPSF (M1+1 f, g, h)

0 (1,1,q3) + cwPSF (M1+1 f, M1+2 g, h),

0 with cw appearing from (2.24) and the openness estimate. Then (2.23) and (2.25) imply the claimed boundedness. If αw is the constant appearing in (2.23), we see as a consequence that

p p p/2 TΩ : L (w) × L (w) → L (w) with constant

 q1  1   q2  1  C (α + c0 ) · [w] p−q1 1−p [w] p−q2 1−p . w . w w Ap Ap

0 Note that cw can also be computed explicitly in terms of [w]Ap , after solving for the 1, 2 used above and using (2.21).

2.4.2 Multiple Weights

We prove Corollaries 2.1 and 2.3. Suppose v1, v2, v3 are strictly positive functions such that

3 1 Y qi vi = 1 i=1 for some q ≥ 1 with P3 1 = 1. Let ~p = (p , p , p ) be any tuple of exponents with 1 < p < q , i i=1 qi 1 2 3 i i and recall that 1 1 3 pi −  Z  pi qi Y 1 pi−qi [~v]A~p = sup vi , ~q |Q| Q i=1 Q

42 d with the supremum taken over cubes Q ⊂ R . We can argue as in the proof of Lemma 6.1 in [24] to prove that for any sparse collection S we have

n q o! 3 max i ~p qi−pi Y PSF (f1, f2, f3) ≤ c~p,~q[~v] ~p kfik qi S A L (vi) ~q i=1

when [~v] ~p is finite. Applying Theorem 2.1 yields the following result. A~q

Lemma 2.9. Let v1, v2, v3 and ~p,~q be as above. Also assume [~v] ~p < ∞. Then A~q

n q o 3 max i qi−pi Y |hTΩ(f1, f2), f3i| ≤ CΩ,~p,~q[~v] ~p kfik qi A L (vi) ~q i=1

q for fi ∈ L i (vi).

The reader can consult [24] for the explicit value of the constant. Corollary 2.3 immediately follows from this lemma by duality. We now show that Corollary 2.1 is also a consequence.

By using multilinear extrapolation techniques from [30], [35], it is enough to prove the claimed estimate of Corollary 2.1 when qi = 3 for each i = 1, 2, 3. Write ~3 = (3, 3, 3), and suppose ~p = (1 + , 1 + , 1 + ) for some small  > 0. Observe that if t > 0 is chosen properly, then there is some absolute C > 0 such that

1+t [vi ]A3 ≤ C[vi]A3 , i = 1, 2. (2.26)

This estimate can be proved using the openness property and reverse H¨olderestimates as in the last section; see [22], Section 3.7, for a more general version of this inequality. Now Lemma 2.9 implies that

3 3 3/2 −1/2 TΩ : L (v1) × L (v2) → L (v3 )

3 with operator norm bounded by C[~v] . Hence it will be enough to prove A~p ~3

1 1 1+t 3(1+t) 1+t 3(1+t) [~v] ~p ≤ [v ] · [v ] (2.27) A 1 A3 2 A3 ~3

43 2(1+) with 1 + t = 2− , since we can then apply (2.26) after choosing  correctly (as we are free to do). One can prove (2.27) by using H¨older’sinequality; we omit the details.

2.5 Proof of Theorem The Abstract Theorem

Here we prove our multilinear sparse domination theorem, and also resolve the technical issue related to multiplication operators discussed in Remarks 2.1 and 2.3 above.

2.5.1 Construction of the Sparse Collection

Suppose we are given a form Λ as in the statement of Theorem 2.2 with CT = CT (r1, ..., rm, α),

p d and Λ truncated to a finite (but otherwise arbitrary) number of scales. Given fi ∈ L i (R ) with compact support, we would like to construct a sparse collection of cubes S so that

m+1 X Y |Λ(f1, ..., fm+1)| ≤ cd [CL + CT ] |R| (fi)pi,R. R∈S i=1

We will generalize the iterative argument from [19]. The following lemma is crucial for the induction step.

Lemma 2.10. Let Q be a fixed dyadic cube and P a stopping collection with top Q. If the estimates (2.8) hold, then

|ΛsQ (h 1 , h 1 , ..., h 1 )| ≤ C |Q|kh k kh k ...kh k 1 Q 2 3Q m+1 3Q d 1 Yp1 2 Yp2 m+1 Ypm+1

X sL + |Λ (h11L, h213L, ..., hm+113L)|. L∈P L⊂Q

Proof. We can assume supph1 ⊂ Q and supphi ⊂ 3Q for 2 ≤ i ≤ m + 1. By definition of ΛP we

44 have

sQ Λ (h1, h2, ..., hm+1) = ΛP (h1, h2, ..., hm+1)

X sL + Λ (h11L, h213L, ..., hm+113L), L∈P L⊂Q so it is enough to estimate ΛP (h1, h2, ..., hm+1). For each j perform the Calder´on-Zygmund (CZ) decomposition (2.6) of hj with respect to P. Then hj = bj + gj with

X bj = bjL, bjL := (hj − (hj)L) 1L L∈P and

kgjkY∞ khjkYp , kbjk ˙ khjkYp . . j Xpj . j

We now decompose ΛP (h1, h2, ..., hm+1) using the CZ decomposition. If we let fhi (0) = gi and m+1 fhi (1) = bi, we see that the form breaks up into the 2 terms

X ΛP (fh1 (k1), fh2 (k2), ..., fhm+1 (km+1)). (2.28) 0≤k1,...,km+1≤1

From the definition of ΛP and the pairwise disjointness of elements of P, we see that finiteness of CT implies

|ΛP (g1, g2, ...gm+1)| ≤ CT kg1kLr1 ...kgmkLrm kgm+1kLα0 X + CT kg11LkLr1 ...kgm+113LkLα0 L∈P L⊂Q

≤ c C |Q|kg k kg k ...kg k d T 1 Yr1 2 Yr2 m+1 Yα0 ≤ c C |Q|kh k kh k ...kh k . d T 1 Yp1 2 Yp2 m+1 Ypm+1

45 Here we’ve used 1 1 1 1 1 + ... + + 0 = + 0 = 1, r1 rm α α α

the disjointness of L ∈ P, and the kgjkY∞ bound. We can now use (2.8) and the CZ estimates to control the remaining terms by the desired quantity. For example,

|ΛP (b1, g2, ..., gm+1)| ≤ CL|Q|kb1k ˙ kg2kYp ...kgm+1kYp Xp1 2 m+1 ≤ c C |Q|kh k kh k ...kh k d L 1 Yp1 2 Yp2 m+1 Ypm+1 by the first estimate in (2.8) and the CZ properties. The other estimates are similar. We use the last estimate in (2.8) to control the single term ΛP (g1, g2, ..., gm, bm+1):

|ΛP (g1, g2, ..., gm, bm+1)| ≤ CL|Q|kg1kY∞ kg2kY∞ ...kgmkY∞ kbm+1k ˙ Xpm+1 ≤ c C |Q|kh k kh k ...kh k kh k . d L 1 Yp1 2 Yp2 m Ypm m+1 Ypm+1

To bound the remaining terms, we choose from the first m inequalities in (2.8). If there are more than one bi terms in the particular piece of (2.28) we wish to control, we let bn be the first mean-zero term that appears (ordered from left to right) and then apply the n-th estimate in the list (2.8), ordered from top to bottom. Then applying the CZ properties as above yields the desired estimates.

The construction of the collection S now proceeds as in [19], with minor changes to account for the addition of more functions. For this reason we sketch the proof here, and send the reader to [19] for the finer details.

We will construct the required sparse collection S iteratively by decomposing exceptional sets of the form   Mpi (fi13Q)(x) EQ := x ∈ 3Q : max ≥ Cd , (2.29) i=1,...,m+1 (fi)pi,3Q

46 where Cd is some constant depending on the dimension. Notice that if Cd is chosen large enough, then by the maximal theorem we can assume

−cd |EQ| ≤ 2 |Q| (2.30) for some uniform c > 0.

p d We begin the argument by fixing fi ∈ L j (R ) with compact support, i = 1, ..., m + 1. We may assume we have chosen our dyadic lattice D so that there is Q0 ∈ D with supp(f1)⊂ Q0 and supp(fi)⊂ 3Q0 for i = 2, ..., m+1, such that sQ0 is bigger than the largest scale occurring in the truncation of Λ. We let S0 = {Q0} and E0 = 3Q0, and then define (using definition (2.29))

E1 := EQ0 ,S1 := maximal cubes L ∈ D such that 9L ⊂ E1.

It is easy to verify that P1(Q0) := S1 is a stopping collection with top Q0, such that |Q0\E1| ≥

−cd (1 − 2 )|Q0|. By maximality and the definition of the exceptional set EQ0 we have

kfjk ≤ C (fi)p ,3Q , i = 1, ..., m + 1, Ypj (P1(Q0)) d i 0 so applying Lemma 2.10 yields

sQ0 |Λ(f1, f2, ..., fm+1)| = |Λ (f11Q0 , f213Q0 , ..., fm+113Q0 )| m+1 Y ≤ Cd|Q0| (fi)pi,3Q0 i=1

X sL + |Λ (f11L, f213L, ..., fm+113L)|.

L∈P1(Q0) L⊂Q0

This is the base case that sets up the recursive argument. For each L ∈ P1(Q0) we apply the

47 above argument with L in place of Q0, defining

[ E2 = EL

L∈P1(Q0) and

S2 := maximal cubes R ∈ D such that 9R ⊂ E2.

For each L ∈ P1(Q0) we also define

P2(L) = {R ∈ S2 : R ⊂ 3L}, which can be shown to be a stopping collection with top L (the argument is the same as in [19]).

Now we apply Lemma 2.10 to estimate each piece of the sum

X sL |Λ (h11L, h213L, ..., hm+113L)|,

L∈P1(Q0) L⊂Q0 using the stopping collection P2(L) in the application to the term

sL |Λ (f11L, f213L, ..., fm+113L)|.

We then repeat the process just described at the next level.

Since we are working with finitely many scales the process eventually terminates with the desired sparse form bound. The sparse collection S consists of all cubes chosen in the various stopping collections constructed along the way. Condition (2.30) at each level guarantees the sparsity of these cubes.

48 2.5.2 Multiplication Operators

r1 rm α Let Aφ : L × ... × L → L be the multiplication operator described in Remark 2.1, with φ ∈ L∞ and 1 + ... + 1 = 1 . We let r1 rm α

A(f1, ...fm+1) = hAφ(f1, ..., fm), fm+1i.

Using a simpler version of the stopping-time argument from the previous section, it is easy to see that (1,1,...,1) |A(f1, ..., fm, fm+1)| .φ sup PSFS (f1, ..., fm+1) S

r ∞ when fi ∈ L i ∩ L with compact support. Let P be any stopping collection with top Q and P b = L∈P bL with bL supported on L, and observe that if

AP (b, g2, ..., gm+1) := A(b13Q, g213Q, ..., gm+113Q) X − A(b13R, g213R, ..., gm+113R) R∈P R⊂Q then in fact

AP (b, g2, ..., gm+1) = 0.

Similarly AP (g1, b, g3, ..., gm+1), ..., AP (g1, ..., gm, b) all vanish. Hence we can repeat the argu- ment given in the proof of Theorem 2.2, avoiding most of the technical complications. In particular, if fi = gi + bi with gi the ‘good’ terms from the Calder´on-Zygmund decomposition relative to P, then we can argue as in the proof of Lemma 2.10 to show

∞ |AP (g1, ..., gm+1)| . kφkL |Q|kf1kY1 kf2kY1 ...kfm+1kY1 .

We just saw that all other terms of AP (g1 + b1, ..., gm+1 + bm+1) vanish, so we can easily run the (1,...,1) stopping-time argument given in the last section to prove the claimed PSFS bound.

49 Appendix: Adjoint Forms

We prove Proposition 2.1. The argument is almost identical to the proof of the linear variant from [19], so we only provide a sketch. Below we will call two dyadic cubes L, R neighbors and write L ∼ R if 7L ∩ 7R 6= ∅ and |sL − sR| < 8. By separation property (i), if L, R ∈ P are distinct cubes with 7L ∩ 7R 6= ∅, L ∼ R. P Let P be a stopping collection with top Q. Let b = L∈P bL as in the last sections. We want to show that

ΛP (g1, b, g3, ..., gm+1), ..., ΛP (g1, ..., gm, b)

can be decomposed in the same way as ΛP (b, g2, ..., gm+1), up to a controllable error term. We

first analyze ΛP (g1, ..., gm, b). Below we assume g1 is supported in Q and gi is supported in 3Q for i ≥ 2.

As in the appendix in [19], split b = bin + bout with

in X b = bL. L∈P 3L∩2Q6=∅

Fix any R ⊂ Q. Using the support of the kernel, it is easy to see that if s < sR then

Z out Ks(x1, ..., xm)g1(x1)1R(x1)g2(x2)...gm(xm)b (xm+1)dx1...dxm+1 R(m+1)d is identically zero. This is because dist(supp bout, R) ≥ l(R)/2, since bout is supported on L with

3L ∩ 2Q = ∅, but R ⊂ Q. Therefore

Z X Λ (g , ..., g ,bout) = Kij (~x)g (x )...g (x )bout(x )d~x P 1 m sQ 1 1 m m m+1 (m+1)d R i,j X Z X − Kij (~x)g 1 (x )...g (x )bout(x )d~x sR 1 R 1 m m m+1 (m+1)d R∈P R i,j R⊂Q

50 There is only one scale in the kernel in each integral, so we can estimate each term as in Lemma

2.2 and sum over R to see that

out 0 |ΛP (g1, ..., gm, b )| . [K]p |Q|kbkX1 kg1kYp ...kgmkYp .

in We now have to analyze ΛP (g1, ..., gm, b ). As in the appendix of [19], it is enough to show that

 

in  sQ in X sL  ΛP (g1, ..., gm, b ) = Λ (g1, ..., gm, b ) − Λ (g1, ..., gm, bL) (2.31)   L∈P 3L∩2Q6=∅

+ φ(g1, ..., gm, b),

out where φ is some error term satisfying the same estimate as |ΛP (g1, ..., gm, b )|. Then we can decompose the term in parenthesis as in Section 2.2, and use symmetry of the kernel to prove the desired estimates for ΛP (g1, ..., gm, b).

The proof of (2.31) is almost the same as the linear case in [19]. The main observation is that

X sR in X X sR Λ (g11R, g2, ..., gm, b ) = Λ (g11R, g2, ..., gm, bL), (2.32) R∈P R∈P L∈P R⊂Q R⊂Q 3L∩3R6=∅ 3L∩2Q6=∅ since ΛR(g11R, g2, ..., gm, bL) = 0 if 3L ∩ 3R = ∅. In particular this implies that L ∼ R, so if R is fixed then the number of terms in the second sum in (2.32) is bounded by a universal dimensional constant. It is easy to show that

sR sL |Λ (g11R, g2, ..., gm, bL) − Λ (g11R, g2, ..., gm, bL)|

0 . |L|[K]p kg1kYp ...kgmkYp kbkX1 ,

since |sL − sR| < 8 (apply single-scale estimates, noting that the number of such estimates will be bounded by a dimensional constant). With the help of the separation properties we then can

51 replace each ΛsR in (2.32) by ΛsL , up to an admissible error term φ. Now repeat the rest of the argument from [19], with trivial changes to account for the addition of more functions, to show that the remaining terms are of the form (2.31).

Also observe that there was nothing special about the choice of the position of b in the above argument, so the same reasoning applies to ΛP (g1, b, g3, ..., gm+1), ΛP (g1, g2, b, g4, ..., gm+1), etc. This proves Proposition 2.1, up to the trivial steps we have omitted.

52 Chapter 3

Sparse Bounds in the Multi-Parameter Setting

Despite all of the recent activity surrounding sparse domination of operators in harmonic analy- sis, little is known about how to appropriately generalize ‘sparse bounds’ to the product setting where T is a singular operator that commutes with a multi-parameter family of dilations. Some simple examples of these operators include the bi-parameter Hilbert transform

1 H ⊗ Hf(x, y) = p.v. f ∗ , xy the strong maximal function

1 Z MSf(x) = sup |f(y)|dy x∈R |R| R

(where R are axis-parallel rectangles), and multi-parameter Haar multipliers (see below for a precise definition). The analysis of this family of operators dates back to work of Chang, Feffer- man, Journ´e,and Stein (see for example [12], [45]). In recent years, multi-parameter harmonic analysis and its generalizations have been used to study problems related to discrepancy theory

(see Chapter 4), PDE (bi-parameter Leibniz estimates, hypoelliptic operators [59], [66]), and

53 problems in several complex variables [61]. A major theme in the field is that one cannot simply

‘iterate’ results from the one-parameter theory to fully understand multi-parameter operators.

In particular, the one-parameter sparse theory says essentially nothing about the possible (or im- possible) extensions of bounds of type (1.1) to the multi-parameter setting, even in the product setting on low-dimensional Euclidean space.

This chapter is devoted to some attempts at understanding the extent to which sparse bounds are possible for multi-parameter operators. In Chapter 4 we provide a counterexample demonstrating that the (dyadic) strong maximal function does not admit a sparse bound with respect to L1-type forms.

3.1 Some Difficulties

As a first step one could hope that it would be possible to ‘iterate’ the one-parameter sparse estimates to obtain some positive results for tensor-product type operators. Unfortunately there does not appear to be a straightforward way to carry out this argument. Suppose that

Z f(y1, y2) T f(x1, x2) = p.v. dy1dy2, R2 (x1 − y1)(x2 − y2) so that T = H1 ⊗H2 with H1 a Hilbert transform in the x1 direction and H2 a Hilbert transform in the x2 direction. If we fix the variable x1 then we can apply the one-parameter pointwise sparse bound for H to show that there is a sparse collection of intervals Sx1 such that

X |T f(x1, x2)| . (|H1f(x1, ·)|)Q1Q(x2)

Q∈Sx1 for almost every x2. However, the collection of intervals depends on x1 and so there is no obvious way to iterate this estimate to get a sparse bound for the full operator H1 ⊗ H2.

We therefore need to find a more direct approach that does not rely on the one-parameter results. It is clear that any analogue of the one-parameter sparse bound must involve collections

54 of rectangles rather than cubes, due to the underlying of bi-parameter singular integrals like H1 ⊗ H2. At this point one encounters substantial difficulties adapting the one-parameter methods. For example, to prove a sparse-form estimate of type (1.1) we construct the sparse collection of cubes using a stopping-time argument that is intimately related to the Calder´on-

Zygmund decomposition and the Hardy-Littlewood maximal operator. In the bi-parameter setting the natural maximal operator to work with is the strong maximal function

1 Z MSf(x) = sup |f(y)|dy, x∈R |R| R where the supremum is taken over axis-parallel rectangles containing x. However, this operator lacks the martingale structure that enables the type of stopping-time arguments used in the one- parameter setting. In the bi-parameter setting, for both singular integral theory and martingale theory, it turns out that the square function is a more natural operator to work with. For example, in [5] Bernard used the square function to obtain an atomic decomposition for bi- parameter Hardy spaces and to prove H1 − BMO duality. These results were later extended to the continuous setting of bi-parameter singular integrals, Hardy spaces, and product BMO in [14] by Chang and R. Fefferman. Once again square functions played a key role in their arguments.

There is another related geometric problem one encounters: in the bi-parameter setting, it is also not immediately clear what the proper definition of a ‘sparse collection of rectangles’ should be. In this chapter and the next we will work with the following two notions of ‘sparse collection,’ which turn out to be equivalent:

n Definition 3.1. A collection S of rectangles in R is said to be sparse in the disjoint-pieces sense if there is some η > 0 such that for all R ∈ S there is ER ⊂ R with |ER| > η|R|, and such that

0 if R 6= R then ER ∩ ER0 = ∅.

n Definition 3.2. A collection S of rectangles in R satisfies the Carleson packing condition if

55 n there is some Λ > 0 such that for all open sets U ⊂ R ,

X |R| ≤ Λ|U|. R∈S R⊂U

The structure of the packing condition in definition 3.2 is natural in light of the fact that the definition of bi-parameter BMO requires a similar packing condition on Haar or wavelet coefficients relative to rectangles contained in open sets (see the next section for more on bi- parameter BMO spaces). Both definitions are equivalent for collections of cubes [55] and much more generally, as shown by Dor [28]. We include a proof of this equivalence in an appendix at the end of the chapter.

3.2 An Approach Using Square Functions

Let S be the dyadic bi-parameter square function given by

X 1R(x) Sf(x)2 = |hf, h i|2 , R |R| R∈D

2 where D = D1 × D2 is the collection of dyadic rectangles in R (relative to two grids D1 and D2 in R), and hR is the bi-parameter Haar function associated to R. Recall that if R = I × J then

1 h (x , x ) = h (x )h (x ), h = (1 − 1 ), R 1 2 I 1 J 2 I |I|1/2 Il Ir

where Il and Ir are the left and right children of I. Also recall that the functions hR form a

2 basis of L (R) for any D. Given the role of square functions in the bi-parameter theory, it is natural to attempt to prove a sparse form bound of the type

X |hT f, gi| . |R|(Sf)R(Sg)R, (3.1) R∈S

56 where S is a collection of rectangles that satisfies either the disjoint-pieces or Carleson packing condition.

We begin by studying the bi-paramater martingale transform

X T f = Rhf, hRihR, sup |R| ≤ C. R∈D R

The following theorem can be proved by a simple argument using Cauchy-Schwarz and the sparse bound for the identity operator (and in fact this proof gives a collection of cubes, not just rectangles).

Theorem 3.1. Let T be the bi-parameter martingale transform defined above, and suppose f and g are functions with finitely many Haar coefficients. Then there exists a collection of rectangles

S that is sparse in the sense of Definitions 3.1 and 3.2 such that

X |hT f, gi| . (sup |R|) |R|(Sf)R(Sg)R. (3.2) R R∈S

The implicit constant does not depend on f or g.

In the next section we will present a geometric proof of this theorem based on the the underlying bi-parameter geometry. The hope is that this argument is more robust and will be useful in contexts where T is more complicated than a Haar multiplier, and where a simple Cauchy-

Schwarz argument does not suffice.

The methods used to prove Theorem 3.1 generalize to the case where T is a bi-parameter dyadic shift as long as we replace the dyadic square functions S by certain shifted square func- tions Si,j. Since the statement of this result is somewhat technical, we defer the detailed defini- tions until Section 3.5. As a consequence of Martikainen’s representation theorem [58], we are then able to deduce a type of sparse bound for paraproduct-free bi-parameter singular integrals belonging to the Journ´eclass. See Corollary 3.1 for a precise statement of this result.

The one-parameter theory indicates that we should be able to easily prove weighted estimates once we have established a sparse bound. This is still the case for the square-function sparse form

57 estimate (3.1), and we derive weighted corollaries of our main results in Sections 3.6 and 3.7. It is also straightforward to track the dependence of the constants on the Ap characteristic of the weight. However, due to the addition of the square functions S and some extra complications related to the strong maximal function, this approach does not give weighted estimates that are sharp in terms of the Ap characteristic (see Section 3.6 for definitions). Nevertheless, our sparse bounds provide an alternative approach to proving Ap estimates for bi-parameter martingale transforms and cancellative dyadic shifts (see [39] for another recent method).

3.2.1 Remarks on Theorem 3.1

(1) For simplicity the results and the proofs are stated for R × R, but our methods all extend n m directly to the product space R × R once suitable modifications are made to the definition of the Haar functions. We also do not see any obstacles to carrying out the arguments below in the multi-parameter setting.

(2) The sparse bound (3.1) is true in the one-parameter setting when we are working with intervals (or cubes), but in a stronger sense. That is, (3.1) holds with localized square functions, so that X |hT f, gi| . |I|(SI f)I (SI g)I . I∈S

Here the square function SI only involves dyadic intervals J contained in I (see Theorem 15 in [4]). This localized square-function sparse bound cannot hold in the bi-parameter setting, as observed by M. Lacey and Y. Ou [50]. We prove this claim in the following section.

(3) It is clear from our proofs of the weighted corollaries in Sections 3.6 and 3.7 that there are still significant obstacles to overcome if we wish to develop a sharp weighted theory for multi- parameter operators by using sparse domination (see, for example, the comments after the proof of Theorem 3.3 and the first appendix). A different notion of ‘sparse operator’ in the multi- parameter setting may be needed, possibly one that allows us to circumvent the obstructions caused by the strong maximal function.

58 3.3 The Lacey-Ou Example

Let T be a bi-parameter Haar multiplier, and let SRf be the localized bi-parameter dyadic square function X 1L(x) S f(x)2 = |hf, h i|2 . R L |L| L⊂R

Here R and L are dyadic rectangles with hL the associated product Haar function. We show that in general there can be no sparse bound of the type

X |hT f, gi| ≤ C |R|(SRf)R(SRg)R, R∈S where the collection S is η-sparse with respect to some fixed but arbitrary 0 < η < 1. We learned of this example from Michael Lacey and Yumeng Ou. Before continuing it will be helpful to review the definitions of two bi-parameter BMO spaces. Recall that dyadic BMOrect is the space of all functions such there is some Λ > 0 with

X 2 |hf, hLi| ≤ Λ|R| L⊂R

for all dyadic rectangles R, with kfkBMOrect the smallest such Λ. Also recall that dyadic 0 BMOprod is the space of functions such that there is some Λ > 0 with

X 2 0 |hf, hLi| ≤ Λ |Ω| L⊂Ω

0 for all open sets Ω, with kfkBMOprod the smallest such Λ . One has strict containment BMOprod (

BMOrect, as can be shown by using the function considered below.

We will assume that T is just the identity operator (though suitable modifications show that

P −1 this argument works for any Haar multiplier T f = R Rhf, hRihR(x) with R ∈ [λ , λ] for some fixed λ > 0.) Recall that for each N > 1 we can construct a collection C of rectangles in the unit cube Q with the following properties:

59 P • L∈C |L| = 1

P • If R0 is any dyadic rectangle then L⊂R0 |L| ≤ |R0| R∈C

S • For Ω = L∈C L we have |Ω| ∼ 1/N

This is the Carleson ‘quilt’ example that shows there is a distinction between rectangle and product BMO (see for example Chapter 3 in [60] for a construction). Now assume for a con- tradiction that the localized-square-function sparse bound holds, with some uniform constant

C and sparse parameter η. We will also make the extra assumption that the rectangles from the sparse collection are contained in 3Q. Note that in the one-parameter setting one typically gets a sparse collection contained in some dilate of the support of the functions. It is, however, possible to remove this assumption; see below.

For an arbitrary N > 1 we take

X 1/2 f = |L| hL(x), L∈C a normalized Haar series adapted to C with |Ω| ∼ 1/N (this is the same function that demon- strates the difference between BMOrect and BMOprod). If the localized sparse bound held we would have

X 2 1 = hf, fi . |R|(SRf)R R∈S

X 2 X 2 . |R|(SRf)R + |R|(SRf)R R∈S R∈S |R∩Ω|≥α|R| |R∩Ω|<α|R|

:= I + II, where 0 < α < 1 is some parameter to be chosen later. To estimate I, observe that by Cauchy-

60 Schwarz and the definition of SR we must have

X X X I . |L| . |R|, R∈S L⊂R R∈S |R∩Ω|≥α|R| L∈C |R∩Ω|≥α|R| the second inequality following from the rectangle packing property of C. Hence by the sparseness of the collection S we conclude

−1 −1 I . |{M(1Ω) > α}| . α log(α )|Ω|.

S To estimate II, we first define CR = L⊂R L, and then observe that by Cauchy-Schwarz (as L∈C before) X |CR| X X II |L| |C |. . |R| . R R∈S L⊂R R∈S |R∩Ω|<α|R| L∈C |R∩Ω|<α|R|

But for each R ∈ S we have CR ⊂ R ∩ Ω, and therefore |CR| < α|R| in the above sum. Since the collection S is sparse and contained in 3Q we conclude

X II . α |R| . α|Q| . α. R∈S

In summary we have shown that

−1 −1 1 . α log(α )|Ω| + α, which translates to

−1 −1 N . α log(α ) + αN for any 0 < α < 1. Now take α = N −1/2 to get

1/2 1/2 1/2 N . N log(N ) + N .

61 We can assume that the implicit constant is negligible compared to N (by taking N large enough in the beginning), so this is the desired contradiction.

To remove the assumption that rectangles from S are contained in 3Q, one can further decompose the sum II according to pieces with both |R ∩ Ω| ≤ α|R| and |R ∩ Q| ∼ 2−k|R| for k ≥ 0. In this case CR ⊂ R ∩ Q ∩ Ω = R ∩ Ω, so that

1/4 3/4 −3k/4 1/4 |CR| ≤ |R ∩ Ω| ≤ |R ∩ Ω| |R ∩ Q| ≤ 2 α |R|.

Hence these extra terms contribute a term c · α1/4 to the final estimate (after summing over k ≥ 1). This gives

−1 −1 1/4 1 . α log(α )|Ω| + α + α , or

−1 −1 1/4 N . α log(α ) + αN + α N.

Taking α = N −1/2 as before gives

1/2 1/2 1/2 7/8 N . N log(N ) + N + N , which is once again a contradiction.

3.4 The Bi-Parameter Martingale Transform

Fix two dyadic lattices D1, D2 in R and let D = D1 × D2 be the associated dyadic rectangles in 2 R . We give a geometric proof of the square-function sparse form bound claimed in Theorem 3.1 for the bi-parameter martingale transform

X T f = Rhf, hRihR, (3.3) R∈D

62 where as above supR |R| ≤ C. The argument begins by decomposing the form hT f, gi according to the Chang-Fefferman variant of the Calder´on-Zygmund decomposition from [13]. We then select a certain sparse collection of rectangles using the C´ordoba-Fefferman algorithm from [20], and further decompose the operator in terms of these rectangles. The structure of the square function allows us to absorb the ‘error’ terms (i.e., the rectangles not belonging to the sparse collection).

There are a few similarities between our basic approach and the standard sparse domination scheme in the one-parameter setting. For example, the bi-parameter analogue of the Calder´on-

Zygmund decomposition plays an important role in the first step. Additionally, we select the sparse collection of rectangles via a covering lemma that is equivalent to the boundedness of the strong maximal function; in the one-parameter setting, sparse cubes are typically chosen via a similar covering lemma associated to the Hardy-Littlewood maximal function.

3.4.1 The Geometric Proof of Theorem 3.1

2 We fix two test functions f, g on R with support in some large cube Q0, and assume there are only finitely many dyadic rectangles R with hf, hRi or hg, hRi nonzero. Let αf = c · (Sf)Q0 and

αg = c · (Sg)Q0 , where c is some large constant. Define

Ω0 = {x ∈ Q0 : Sf(x) > αf } ∪ {x ∈ Q0 : Sg(x) > αg},

1 and assume c has been chosen so that |Ω0| ≤ 2 |Q0|. Let R0 be the collection of rectangles R 1 such that |R ∩ Ω0| < 2 |R|, and for positive integers k define

k k Ωk = {x ∈ Q0 : Sf(x) > 2 αf } ∪ {x ∈ Q0 : Sg(x) > 2 αg}.

Also set 1 1 F = {R : |R ∩ Ω | > |R| and |R ∩ Ω | ≤ |R|}. k k 2 k+1 2

63 We begin with the case where R = 1 for all R. We wish to estimate

X X X X αR = αR + αR

R R∈R0 k R∈Fk by a sparse form (with square function averages). Observe that since |Ωk| → 0 as k → ∞ there are only finitely many Fk that contribute to the sum (recall that f, g have only finitely many nonzero Haar coefficients). Therefore it suffices to fix a large N and bound

N X X X αR + αR := I + II

R∈R0 k=0 R∈Fk by a sparse form, provided all constants are independent of N. Note that I corresponds to the

‘good’ piece in the Chang-Fefferman variant of the Calder´on-Zygmund decomposition, and II corresponds to the ‘bad’ piece. The estimate for I is straightforward:

Z 1 c (y) X X R∩Ω0 hf, hRihg, hRi ≤ |hf, hRihg, hRi| c dy R∩Ωc |R ∩ Ω0| R∈R0 R∈R0 0 Z X 1R(y) ≤ 2 |hf, hRihg, hRi| dy (3.4) R∩Ωc |R| R∈R0 0

. |Q0|(Sf)Q0 (Sg)Q0 .

The last inequality follows from Cauchy-Schwarz and the definition of Ω0. To handle the re- maining term II, we construct a sparse collection of rectangles using the C´ordoba-Fefferman selection algorithm from [20], and decompose II in terms of these rectangles.

Fix β ∈ (0, 1) and begin at level N. Order the rectangles {Ri} in FN according to size (for

∗ ∗ example), and set R1 = R1. Proceeding inductively, choose those Rk such that

∗ [ ∗ ∗ |Rk ∩ Rj | < β|Rk|, j

64 ∗ and such that Rk is minimal with this property relative to the initial order. Relabel the collection ∗ (N) {Rk} as {Rk } (the rectangles in the collection at level N). Now suppose we have added rectangles to the collection up until level l +1. Let Λl+1 denote the union of all rectangles added (l) to this point. Order the rectangles in Fl as before, and let R1 be the first rectangle relative to this order such that

(l) l+1 (l) |R1 ∩ Λ | < β|R1 |.

(l) Inductively, choose Rk such that

(l) [ (l) l+1 (l) |Rk ∩ ( Rj ∪ Λ )| < β|Rk |, j

(l) and such that Rk is minimal with this property (relative to the initial order). The resulting (m) collection {Rj }m,j is sparse in the disjoint-pieces sense, with sparse parameter 1 − β. In (l) S (l) l+1 particular, for R = Rk we can choose ER = R\( j

It remains to be shown that

N X X X (m) α |R |(Sf) (m) (Sg) (m) . R . j R R j j k=0 R∈Fk m,j

Break up this sum as N X X X αR + αR := A + B. k=0 (k) rest R=Ri

To estimate A, first observe that if αR = α (k) then Ri

65 1 (k) (y) Z R ∩Ωc α = α · i k+1 dy R R (k) R(k)∩Ωc c i k+1 |Ri ∩ Ωk+1| Z 1R(y) ≤ 2 αR dy (k) c |R| Ri ∩Ωk+1 Z ≤ 2 Sf(x)Sg(x)dx. (k) c Ri ∩Ωk+1

k k c Now recall that Sf . 2 (Sf)Q0 and Sg . 2 (Sg)Q0 in Ωk+1. Moreover, by construction we must k k (k) have either Sf & 2 (Sf)Q0 or Sg & 2 (Sg)Q0 in more than a quarter of Ri . Without loss of k generality suppose Sf & 2 (Sf)Q0 . Then

Z   1 Z Sf(x)Sg(x)dx 2k(Sf) · |R(k)| Sg(y) dy . Q0 i (k) R(k)∩Ωc R(k) i k+1 |Ri | i (k) . |Ri |(Sf) (k) (Sg) (k) , (3.5) Ri Ri and we can ultimately conclude that

X X (k) A . |Ri |(Sf) (k) (Sg) (k) . (3.6) Ri Ri k i

We now turn to the term B. Observe that a rectangle R ∈ Fl contributes to the sum B if R was never chosen in the C´ordoba-Fefferman selection process. It follows that for such an R ∈ Fl we must have

[ [ (k) c |R ∩ Ri ∩ Ωl+1| ≥ (β − 1/2)|R|, k≥l i

1 provided we have chosen β > 2 . This is because

1 |R ∩ Ωc | ≥ |R| l+1 2

66 and [ [ (k) |R ∩ Ri | ≥ β|R|. k≥l i

We then have

S S (k) c X X |R ∩ k≥l i Ri ∩ Ωl+1| B ≤ (β − 1/2)−1 α R |R| l R∈Fl (k) c X X X X |R ∩ R ∩ Ω | ≤ (β − 1/2)−1 α i k+1 (3.7) R |R| l R∈Fl k≥l i Z ! −1 X X X 1R ≤ (β − 1/2) αR dy. (k) c |R| k i Ri ∩Ωk+1 R

c c We’ve used the fact that Ωl+1 ⊂ Ωk+1 when k ≥ l in the second line. We can now finish the c estimate by applying Cauchy-Schwarz and using the properties of the Ωk+1, as in (3.5). Hence

X X (k) B . |Ri |(Sf) (k) (Sg) (k) Ri Ri k i as well, completing the proof of the case where R = 1 for all R.

For the general martingale transform, we simply remark that we have not used any cancel- lation in the above argument. Hence the argument is exactly the same if we replace αR with

|αR|, and as a consequence we may repeat the above argument with (supR |R|)|αR| in place of P αR. The sparse bound for R Rhf, hRihR follows.

3.5 Bi-Parameter Dyadic Shifts and Singular Integrals

The argument from Section 3.4 generalizes to the case where T is a cancellative bi-parameter dyadic shift if one replaces the usual square function by certain shifted variants. This ulti- mately leads to a type of sparse bound for paraproduct-free bi-parameter singular integrals via

Martikainen’s representation theorem [58].

67 3.5.1 Definitions.

We let D1 and D2 be two dyadic grids in R (not necessarily the standard grids), and let D =

D1 ×D2 be the collection of dyadic rectangles relative to these grids. If I is a dyadic interval and k ∈ N, we let (I)k denote the children of I at level k, so that J ∈ (I)k if and only if J ⊂ I and −k |J| = 2 |I|. We also denote the bi-parameter Haar wavelets by hR = hR1 ⊗ hR2 for rectangles

R = R1 × R2, and use fˆ(R) to denote the Haar coefficient of a function f.

Given tuples of non-negative integers i = (i1, i2) and j = (j1, j2), we define the cancellative bi-parameter dyadic shift of complexity (i, j) by

i,j X X X ˆ T f(y) = aP QR · f(P )hQ(y) (3.8) R ∈D 1 1 P1∈(R1)i1 Q1∈(R1)j1 R2∈D2 P2∈(R2)i2 Q2∈(R2)j2

Here P = P1 × P2,Q = Q1 × Q2 and R = R1 × R2 are dyadic rectangles, and aP QR is a constant satisfying the bound

p p |P ||Q | |P ||Q | 1 1 1 2 2 − (i1+j1+i2+j2) |aP QR| ≤ = 2 2 . (3.9) |R1||R2|

We also define the dyadic shifted square function adapted to the shift parameters i, j by

 2  X X X 1Q 1Q (Si,jf(y))2 = |fˆ(P )| 1 ⊗ 2 (y) . (3.10) |Q1| |Q2| R ∈D 1 1 P1∈(R1)i1 Q1∈(R1)j1 R2∈D2 P2∈(R2)i2 Q2∈(R2)j2

This clearly depends on the choice of D, but we omit this dependence from the notation since our bounds will be independent of D. Also note that this definition is not symmetric in i, j.

The same is true for the bi-parameter shift of complexity (i, j). (The definition (3.10) is taken from the paper [39] by Holmes, Petermichl, and Wick).

68 3.5.2 The Sparse Bound

We will now adapt the argument from Section 3.4 to prove the following sparse bound.

Theorem 3.2. Let D be an arbitrary system of dyadic rectangles. Let T i,j be the cancellative shift defined above (using rectangles from D), and fix test functions f, g. Then there exists a sparse collection S of D-dyadic rectangles such that

i,j −(i1+i2+j1+j2) X i,j j,i |hT f, gi| . 2 |R|(S f)R(S g)R. R∈S

The collection S depends on f, g and (i, j), but the implicit constant does not.

Note that the order of i, j is switched in the term containing g. From [39] we know that

1 i,j (i1+i2+j1+j2) kS fkLp(w) . cw2 2 kfkLp(w), so we need the factor in front of the sparse form for applications to weighted estimates. In the last section we will show that in the case p = 2 we

5 can at least take cw = [w]A2 .

The proof of the theorem is similar to what we have seen above. Begin by assuming that

i,j j,i f, g are supported in some cube L. Let αf = c · (S f)L and αg = c · (S g)L, where c is some large constant. Define

i,j j,i Ω0 = {x ∈ L : S f(x) > αf } ∪ {x ∈ Q0 : S g(x) > αg}.

Notice that Z Z 1 i,j 1 j,i 2 |Ω0| ≤ S f + S g ≤ |L|, αf L αg L c

1 so we can assume c has been chosen independent of (i, j) such that |Ω0| ≤ 2 |L|. For positive integers k, also define

i,j k j,i k Ωk = {x ∈ L : S f(x) > 2 αf } ∪ {x ∈ L : S g(x) > 2 αg},

69 1 and let R0 be the collection of rectangles R such that |R ∩ Ω0| < 2 |R|. Finally, for k ≥ 0 define

1 1 F = {R : |R ∩ Ω | > |R| and |R ∩ Ω | ≤ |R|}. k k 2 k+1 2

We first estimate the ‘good’ part of our form corresponding to the rectangles in R0. To further simplify the notation, if R = R1 × R2 is a dyadic rectangle and P = P1 × P2 is a dyadic rectangle contained in R, we write P ∈ R~i to mean P1 ∈ (R1)i1 and P2 ∈ (R2)i2 .

Lemma 3.1. Let R be a dyadic rectangle. Then

P 1/2 P 1/2 P ∈R 1P1 ⊗ 1P2 (y) Q∈R 1Q1 ⊗ 1Q2 (y) 1R(y) − 1 (i +i +j +j ) ~i ~j = 2 2 1 2 1 2 . 1/2 1/2 |R| (|P1||P2|) (|Q1||Q2|)

Proof. This is a simple consequence of the fact that R is a disjoint union of all rectangles P such that P ∈ R~i, and similarly R is a disjoint union of all rectangles Q such that Q ∈ R~j. Since 1 1 1/2 (i1+i2) 1/2 1/2 (j1+j2) 1/2 |R| = 2 2 (|P1||P2|) and |R| = 2 2 (|Q1||Q2|) the identity follows.

Now let i,j X X ˆ hT f, gigood = aP QRf(P )ˆg(Q).

R∈R0 P ∈R~i Q∈R~j

Arguing as in the beginning of the proof of Theorem 3.1 we find that

1 i,j − (i1+i2+j1+j2) X X ˆ |hT f, gigood| ≤ 2 2 |f(P )ˆg(Q)|

R∈R0 P ∈R~i Q∈R~j

Z c 1 X X 1R∩Ω (y) − 2 (i1+i2+j1+j2) ˆ 0 = 2 |f(P )ˆg(Q)| c dy c |R ∩ Ω | R∩Ω0 0 R∈R0 P ∈R~i Q∈R~j

1 Z 1 (y) − (i1+i2+j1+j2) X X ˆ X R . 2 2 |f(P )| |gˆ(Q)| dy c |R| Ω0 R∈R0 P ∈R~i Q∈R~j Z −(i +i +j +j ) i,j j,i . 2 1 2 1 2 S f(y) · S g(y)dy. c Ω0

70 To get to the last line, we applied Lemma 3.1 and then used Cauchy-Schwarz and the definition

c of the shifted square functions. But the integral is over Ω0, so we can conclude that

i,j −(i1+i2+j1+j2) i,j j,i |hT f, gigood| . 2 |L|(S f)L(S g)L.

It remains to estimate

i,j i,j i,j hT f, gibad = hT f, gi − hT f, gigood.

As in Section 3.4, this is where the sparse collection enters into the picture. We will assume that fˆ(R) andg ˆ(R) are nonzero for only finitely many R, and let N denote the largest integer ˆ such that f(R) andg ˆ(R) are nonzero for some R ∈ FN . All bounds will be independent of N, so density arguments will allow us to extend the results to more general f, g. We construct the sparse collection using the C´ordoba-Fefferman selection algorithm as before. The only change is

i,j that our exceptional sets Ωk now depend on the shifted square function S , but otherwise the construction proceeds in exactly the same way as in the case of the martingale transform. We (k) omit the details since the argument would be a copy of what appears in Section 3.4. Let {Rn } (k) denote the resulting collection, with Rn ∈ Fk. We can break up the ‘bad’ part of the form as

i,j X X ˆ X ˆ hT f, gibad = aP QRf(P )ˆg(Q) + aP QRf(P )ˆg(Q) := A + B. (k) P ∈R rest R=Rn ~i Q∈R~j

71 (k) Estimating A. Fix R = Rn and observe that since R ∈ Fk we have

Z 1 c (y) X 1 X R∩Ωk+1 ˆ − 2 (i1+i2+j1+j2) ˆ aP QRf(P )ˆg(Q) ≤ 2 |f(P )ˆg(Q)| c dy c |R ∩ Ω | R∩Ωk+1 k+1 P ∈R~i P ∈R~i Q∈R~j Q∈R~j

1 Z 1 (y) − (i1+i2+j1+j2) X ˆ X R . 2 2 |f(P )| |gˆ(Q)| dy c |R| R∩Ωk+1 P ∈R~i Q∈R~j Z −(i +i +j +j ) i,j j,i . 2 1 2 1 2 S f(y)S g(y)dy, c R∩Ωk+1 applying Lemma 3.1 as above to get to the last line. Now argue as in (3.5) and sum over all (k) Rn to get

−(i1+i2+j1+j2) X (k) i,j j,i |A| 2 |Rn |(S f) (k) (S g) (k) . . Rn Rn n,k

Estimating B. The term B involves a sum of P a fˆ(P )ˆg(Q) over all R not chosen P ∈R~i P QR Q∈R~j in the C´ordoba-Fefferman selection process. For any such R ∈ Fl we must have

[ [ (k) c |R ∩ Rn ∩ Ωl+1| ≥ (β − 1/2)|R| k≥l n

1 as in Section 3.4, provided we have chosen β > 2 .

The proof now proceeds as in (3.7). One repeats the argument given in (3.7) and makes modifications similar to what we’ve seen the the proof of A in order to insert the shifted operators

Si,jf, Sj,ig. It follows that

−(i1+i2+j1+j2) X (k) i,j j,i |B| β 2 |Rn |(S f) (k) (S g) (k) , . Rn Rn n,k completing the proof of Theorem 3.2.

72 3.5.3 Bi-Parameter Singular Integrals

We can now use Martikainen’s representation theorem [58] to show that if T is a paraproduct- free bi-parameter singular integral belonging to the Journ´eclass, then T can be estimated by an average of sparse forms of the type appearing in Theorem 3.2. Loosely speaking, T is a

2 2 bi-parameter Journ´eoperator on R × R if it has a kernel K(x1, x2, y1, y2) on R × R that satisfies analogues of the Calder´on-Zygmund kernel conditions in each variable separately, along with mixed H¨olderand size conditions involving the two parameters. We send the reader to

Martikainen’s paper [58] for the precise definition of a Journ´eoperator T . We say that such a T is paraproduct-free if the bi-parameter version T (1) = T ∗(1) = 0 holds, meaning there are only cancellative shifts in the dyadic representation from [58].

We briefly recall one more definition. Let D0 denote the standard dyadic intervals in R, and

Z for every η = (ηj)j∈Z ∈ {0, 1} define the shifted grid

X −k Dη := {I + η : I ∈ D0}, with I + η := I + 2 ηk. 2−k<|I|

We assign {0, 1}Z the natural Bernoulli(1/2) product measure. This gives us a probability measure on the space of shifted grids, and hence a probability measure on the space of shifted dyadic rectangles Dη × Dη0 (see [41] or [58] for more properties of these random grids).

2 Corollary 3.1. Let T be a paraproduct-free bi-parameter Journ´esingular integral on R and suppose f, g are test functions with finitely many (bi-parameter) Haar coefficients. For each pair

−(i +i +j +j ) i,j of tuples of non-negative integers i, j set τi,j = 2 1 2 1 2 . Also let Sω be the shifted square function (3.10) defined with respect to the random dyadic system Dω = Dω1 × Dω2 . Then there exists δ > 0 and sparse collections of rectangles Λi,j (depending on f, g) such that

X − max(i1,j1)δ/2 − max(i2,j2)δ/2 X i,j j,i |hT f, gi| . Eω1 Eω2 2 2 τi,j |R|(Sω f)R(Sω g)R. i,j≥0 R∈Λi,j

i,j Proof. Given a system of shifted dyadic rectangles Dω = Dω1 × Dω2 , we let Tω denote the shift

73 operator (3.8) defined with respect to rectangles from Dω. Martikainen proved that if T is a paraproduct-free operator in the Journ´eclass then

X − max(i1,j1)δ/2 − max(i2,j2)δ/2 i,j hT f, gi = Eω1 Eω2 2 2 hTω f, gi. i,j≥0

i,j The claimed result now follows by applying Theorem 3.2 to each form hTω f, gi (recall that

Theorem 3.2 applies for any dyadic system Dω1 × Dω2 ).

It should be possible to extend the result of Corollary 3.1 to arbitrary Journ´eoperators T , although the weighted estimates that follow would be far from optimal. We briefly outline one approach. It would be sufficient to prove analogues of Theorem 3.2 for the various paraproducts that show up in the dyadic representation of T . To this end, one can work with the mixed operators SM and MS, where S and M are one-parameter square and maximal functions in different directions. By combining methods from [39] or [59] with our sparse domination scheme, it should be possible to prove bounds of the type

X |hΠf, gi| . |R|(SMf)R(SMg)R, (3.11) R∈S where Π is a bi-parameter paraproduct. One may also have to work with shifted variants of the mixed operators (see [39]), and prove analogues of (3.11) involving these operators.

3.6 Weighted Estimates for Bi-Parameter Martingale Transform

Let w(x1, x2) be a positive, locally integrable weight on R×R. Recall that w is a two-parameter

Ap(R × R) weight for 1 < p < ∞ if and only if

 Z   Z p−1 1 1 1−p0 [w]Ap(R×R) := sup w(x1, x2) dx w(x1, x2) dx < ∞. (3.12) R |R| R |R| R

74 By the Lebesgue differentiation theorem this condition is equivalent to w(·, x2) ∈ Ap(R) uni- formly in x2 and w(x1, ·) ∈ Ap(R) uniformly in x1, and in fact

[w] max(k[w(·, x2)] kL∞ , k[w(x1, ·)] kL∞ ). Ap(R×R) w Ap(R) x2 Ap(R) x1

Write [w]Ap = [w]Ap(R×R). As we noted in the introduction, in the one-parameter setting sparse bounds lead to weighted estimates that are sharp in terms of the Ap characteristic. Here we use our sparse bound (3.1) to derive Ap estimates in terms of the bi-parameter characteristic. Un- fortunately, the square-function sparse bound we have proved does not seem to imply estimates that are sharp. By using known methods, for example the arguments in [39], one can prove

8 kT fk p [w] kfk p L (w) . Ap(R×R) L (w) for a Journ´e-type operator [62], whereas our methods yield a power that is much worse. On the other hand, our method of proof simplifies the somewhat technical arguments that currently exist in the literature.

Lemma 3.2. Let S be the bi-parameter square function with respect to some fixed dyadic grid

D, and let M be the strong maximal function. Then if w ∈ A2,

2 2 2 kSfkL (w) . [w]A2 kfkL (w) and

2 2 2 kMfkL (w) . [w]A2 kfkL (w) for all f ∈ L2(w).

Proof. Recall that the dyadic one-parameter square function satisfies the weighted estimate

2 2 kS1(f)kL (w) . [w]A2(R)kfkL (w)

75 and the Hardy-Littlewood maximal operator satisfies the estimate

2 2 kM1(f)kL (w) . [w]A2(R)kfkL (w).

Both of the claimed estimates follow by iterating the one-parameter results, using the pointwise bound Mf ≤ M1(M2f) for the strong maximal function (here M1 is the Hardy-Littlewood operator in the direction x1, and M2 is the Hardy-Littlewood operator in the direction x2).

Proposition 3.3. Let R be a uniformly bounded sequence indexed over dyadic rectangles with supR |R| ≤ C, and let T f be the following bi-parameter martingale transform:

X T f(x) = Rhf, hRihR(x). R

2 Then for all w ∈ A2 = A2(R × R) and f ∈ L (w) we have

8 2 2 kT fkL (w) . C[w]A2 kfkL (w).

Proof. The estimate follows from sparse domination. Let σ = w−1. By duality it is enough to show that for all g ∈ L2(σ) we have

8 2 2 |hT f, gi| . C[w]A2 kfkL (w)kgkL (σ).

We know from above that there is a sparse collection of rectangles S so that

X |hT f, gi| . C |R|(Sf)R(Sg)R. R∈S

We now a repeat a version of the standard argument from the one-parameter theory, using the

76 strong maximal function in place of the Hardy-Littlewood maximal function. We have

X X |R|(Sf)R(Sg)R |ER|( inf M(Sf)(x))( inf M(Sg)(x)) . x∈R x∈R R∈S R∈S Z X 1/2 1/2 . M(Sf)(x)M(Sg)(x)w (x)σ (x)dx R∈S ER Z 1/2 1/2 . M(Sf)(x)M(Sg)(x)w (x)σ (x)dx R2

. kM(Sf)kL2(w)kM(Sg)kL2(σ)

4 2 2 . [w]A2 kSfkL (w)kSgkL (σ)

8 2 2 . [w]A2 kfkL (w)kgkL (σ), as desired.

By passing through a square function, it is not too hard to show that the bi-parameter martingale transform satisfies the A2 bound

3 2 2 kT fkL (w) . [w]A2 kfkL (w), hence the constants in Proposition 3.3 are far from optimal. We do not know if the power of

8 appearing in Proposition 3.3 can be pushed down further using our methods. In the one- parameter setting, the usual argument that produces the sharp A2 bound invokes the weighted maximal operator Z µ 1 M1 f(x) = sup |f(y)|µ(y)dy, (3.13) x∈I µ(I) I which is bounded on L2(µ) for any positive function µ, with norm independent of µ (this follows from the Besicovitch covering lemma or martingale theory, see [82] for example). However, the bi-parameter analogue of (3.13) is in general not bounded on L2(µ), due to the more complicated geometry. R. Fefferman proved in [31] that µ ∈ A∞(R × R) is sufficient for the strong weighted

77 maximal function M µ to be bounded on L2(µ), but the sharp dependence of the operator norm on [µ]A∞ is unclear from his argument. It is somewhat surprising that if we trace the dependence in his argument and use some recent sharp results related to A∞ ([75], [42]), we uncover a dependence that is exponential in the A∞ characteristic. Recall that w is in the bi-parameter weight class A∞(R × R) if w is in the one-parameter class A∞(R) uniformly in each variable.

w Proposition 3.4. Suppose w ∈ Ap(R × R) and let M be the two-dimensional weighted strong maximal function 1 Z M w(f)(y) = sup |f(x)| w(x)dx. y∈R w(R) R

w c[w]A∞ Then for all 1 < p < ∞ we have kM kLp(w)→Lp,∞(w) .p [w]Ap e .

We prove this proposition in the first appendix.

The sparse bounds from Section 3.5 also allow us to derive weighted estimates for dyadic shifts and paraproduct-free Journ´eoperators. The argument is almost the same as the proof of

Theorem 3.3, but in this case we have to work with weighted estimates for the shifted square

i,j functions S . We know from [39] that if w ∈ Ap(R × R) there is some cw > 0 such that

i,j (i+j)/2 kS fkLp(w) ≤ 2 cwkfkLp(w)

p for all f ∈ L (w), but we would like to track the dependence of cw on [w]Ap . This is the content of the next section.

3.7 Weighted Estimates for the Shifted Square Function

It was proved in [40] that the one-parameter shifted square function

 2 X X X 1Q(x) Si,jf(x)2 = |fˆ(P )| 1 |Q| R∈D P ∈(R)i Q∈(R)j

78 i,j (i+j)/2 satisfies the weighted estimate kS1 fkL2(w) ≤ 2 CwkfkL2(w) for w ∈ A2. In particular, the argument in [40] gives C ≤ [w]2 . In this section we prove a type of sparse bound for Si,j that w A2 allows us to show C ≤ [w]1/2[w]1/2 . An iteration argument then shows that the bi-parameter w A2 A∞ analogue of Si,j satisfies a weighted bound with constant c [w]4 [w] . The method of 1 w . A2 A∞ proof is an adaptation of the scalar case of the argument by Hyt¨onen,Petermichl, and Volberg in [43].

3.7.1 Preliminary Results

Fix an arbitrary (one-paramter) dyadic lattice D.

i,j Lemma 3.5. Suppose fk is a sequence of functions such that S1 (fk) is defined for each k. i,j P P i,j Then S1 ( k fk) ≤ k S (fk).

Proof. Fix an arbitrary x ∈ R. The lemma is a simple consequence of Minkowski’s inequal-

2 2 P 2 1R ity for the weighted space ` (1R(x)/|R|), where k{αR}k 2 = α . Let F = ` (1R/|R|) R R |R| k,(R)i P |fˆ (P )|. Then P ∈(R)i k

i,j X j/2 X 1 S1 ( fk) = 2 k Fk,(R)i k 2 R ` ( |R| ) k k j/2 X 1 ≤ 2 kFk,(R)i k 2 R ` ( |R| ) k !1/2 X X 1R X = 2j/2 (F )2 = Si,j(f ). k,(R)i |R| 1 k k R k

i,j 1 1,∞ i,j (i+j)/2 Proposition 3.6. The operator S1 maps L (R) into L (R) with kS1 kL1→L1,∞ . i2 .

Proof. The argument is a variation of the standard approach via the Calder´on-Zygmund decom-

1 1 R position. Fix f ∈ L (R) and λ, α > 0. Choose maximal dyadic intervals J such that |J| J |f| > P αλ and let Ω denote the union of such intervals. Then f = g + b, with g = f1Ωc + J (f)J 1J P and b = J (f − (f)J )1J . Moreover kgkL∞ . αλ and kbJ kL1 . αλ|J|.

79 By scaling invariance we may normalize kfkL1 = 1. By Lemma 3.5 we have

i,j i,j i,j |{S1 f > λ}| ≤ |{S1 g > λ/2}| + |{S1 b > λ/2}|.

2 i,j Using the L -boundedness of S1 we can immediately conclude that

α |{Si,jg > λ/2}| λ−22i+jkgk2 2i+j . 1 . L2 . λ

S −1 −1 Let E = J 5J and note |E| . α λ . We also claim that

1 |{x ∈ Ec : Si,jb(x) > λ/2}| · i2(i+j)/2. (3.14) . λ

R To prove the last claim, we first note that if J ( P then bbJ = 0 since bJ = 0 and hP is constant on dyadic sub-intervals of P . Moreover, only the intervals R with R ⊃ J contribute to

i,j c c S c S (bJ )(x) when x ∈ E , since E = ( J 5J) (and if J ∩ R = ∅ and P ∈ (R)i then bbJ (P ) = 0). It follows that

Z Z   2 1/2 i,j j/2 X X 1R(x) S1 (bJ )(x)dx = 2 |bbJ (P )| dx. Ec Ec |R| R⊃J P ∈(R)i P ⊂J

Now since

Z   2 1/2 Z X X 1R(x) X X 1R(x) 2j/2 |b (P )| dx ≤ 2j/2 |b (P )| dx bJ bJ 1/2 Ec |R| Ec |R| R⊃J P ∈(R)i R⊃J P ∈(R)i P ⊂J P ⊂J

80 we get

Z Z i,j X i,j S1 (b)(x)dx ≤ S1 (bJ )(x)dx c c E J E Z X X X 1R(x) ≤ 2j/2 |b (P )| dx bJ 1/2 Ec |R| J R⊃J P ∈(R)i P ⊂J

j/2 X X X i/2 1/2 ≤ 2 2 |P | |bbJ (P )|

J R⊃J P ∈(R)i 2i|J|≥|R| P ⊂J Z (i+j)/2 X X X 1/2 ≤ 2 |bJ (x)||P | |hP (x)|dx

J R⊃J P ∈(R)i 2i|J|≥|R| P ⊂J

(i+j)/2 X X . 2 kbJ kL1 J R⊃J 2i|J|≥|R| using additionally the fact that P |P |1/2|h (x)| is bounded independent of i (due to the P ∈(R)i P disjointness of P ∈ (R)i). Now for each J there are no more than i dyadic intervals R such that R ⊃ J and 2i|J| ≥ |R|, and therefore we conclude

Z i,j (i+j)/2 X (i+j)/2 S1 (b)(x)dx . i2 kbJ kL1 . i2 , c E J which implies (3.14).

In summary, we have shown

α 1 1 |{Si,jf > λ}| 2i+j + + i2(i+j)/2 1 . λ αλ λ for arbitrary α > 0. Setting α = 2−(i+j)/2 yields

i,j (i+j)/2 1 |{S f > λ}| i2 kfk 1 , 1 . λ L completing the proof.

81 3.7.2 The Sparse Bound

i,j The weak bound for S1 allows us to mimic the sparse domination scheme from [43]. A simple computation shows that

 2 i,j 2 j X X ˆ kS1 fkL2(w) = 2 |f(P )| (w)R. (3.15) R∈D P ∈(R)i

We will estimate the term on the right by the norm of a sparse operator. We assume there are only finitely many Haar coefficients of f, so there is some large interval J that contains every interval contributing to the sum in (3.15). Fix large constants C1,C2 > 0 to be determined below, and begin by choosing maximal dyadic intervals L such that either

2 X  X  2j |fˆ(P )| > 2i+jC (|f|)2 (3.16) |R| 1 J R⊃L P ∈(R)i or

(w)L > C2(w)J . (3.17)

0 00 Let S1 denote the collection of maximal intervals from (3.16), and let S1 denote the collection 0 00 of maximal intervals from (3.17). The initial collections are S0 = {J} and S1 = S1 ∪ S1 . We have

 2  2 j X X ˆ j X X ˆ 2 |f(P )| (w)R = 2 |f(P )| (w)R

R P ∈(R)i R s.t. ∀L∈S1 P ∈(R)i R*L  2 j X X ˆ + 2 |f(P )| (w)R

R s.t. ∃L∈S1 P ∈(R)i R⊂L := A + B.

82 To estimate A we use the stopping conditions (3.16) and (3.17):

 2 j X X ˆ A = 2 |f(P )| (w)R

R s.t. ∀L∈S1 P ∈(R)i R*L  2 j X X ˆ ≤ 2 C2 |f(P )| (w)J

R s.t. ∀L∈S1 P ∈(R)i R*L 2 X  X  |J| ≤ C |fˆ(P )| 2j (w) 2 |R| J R s.t. ∀L∈S1 P ∈(R)i R*L

i+j 2 ≤ 2 C1C2|J|(|f|)J (w)J .

The term B may be handled by recursion, by decomposing it as a sum of operators of type (3.15) localized to each L. The same selection process is used at each iteration, with the same constants

C1,C2. It remains to check that the stopping intervals actually form a sparse collection if we choose C1 and C2 correctly, and also that we can choose C1,C2 independent of i, j.

i,j 2 i+j 2 We claim that all of the intervals L chosen in (3.16) are contained in {(S1 f) > 2 C1(|f|)J }. In fact,

 2 X X 1R(x)1L(x) (Si,jf(x))21 (x) ≥ 2j |fˆ(P )| > 2i+jC (|f|)2 · 1 (x) 1 L |R| 1 J L R⊃L P ∈(R)i

2 by selection, which proves the claim. It follows from Proposition 3.6 that we can choose C1 ∼ i P 1 such that 0 |L| ≤ |J|. For the intervals chosen in (3.17), we directly estimate the following L∈S1 4 sum:

Z X X −1 C2 |L| ≤ (w)J w(x)dx ≤ |J|, 00 00 L L∈S1 L∈S1

00 P 1 using the disjointness of the L ∈ S . Hence if C2 = 4 then 00 |L| ≤ |J|, and as a 1 L∈S1 4 consequence P |L| ≤ 1 |J|. Moreover, we can choose C and C independently of j and the L∈S1 2 1 2 dependence on i is only quadratic. The same choice of C1,C2 at each iteration guarantees that

83 the collection is sparse. We have proved the following:

Proposition 3.7. Suppose f has finitely many Haar coefficients and fix non-negative integers i, j. Then there exists a sparse collection S of dyadic intervals such that

i,j 2 2 i+j X 2 kS1 fkL2(w) . i 2 |J|(w)J (|f|)J . (3.18) J∈S

The implicit constant is independent of i, j and f, w.

2 Corollary 3.2. Suppose f ∈ L (w) with w ∈ A2 and fix non-negative integers i, j. Then

i,j 1/2 1/2 (i+j)/2 kS fk 2 [w] [w] i2 kfk 2 . 1 L (w) . A2 A∞ L (w)

Proof. We use a special case of the general argument outlined in [43]. Note that we have

i,j estimated kS1 fkL2(w) by the (scalar version of the) same sparse object appearing in that paper.

By plugging w−1/2f into the sparse bound (3.18), we get

i,j −1/2 2 2 i+j X −1/2 2 kS1 (w f)kL2(w) . i 2 |J|(w)J (|f|w )J . J∈S

2 Hence it will be enough to show that the sum on the right above is no more than C[w]A [w]A kfk 2 . 2 ∞ L (R) Let δ = 1 and r = 2(1 + δ). By H¨older’sinequality c[w]A∞

0 1 −1/2 2 r0 2/r −(1+δ) 1+δ (|f|w )J ≤ (|f| )J (w )J , hence the reverse H¨olderinequality yields

−1/2 2 r0 2/r0 −1 (|f|w )J . (|f| )J (w )J

84 (see [42]). It follows that

X −1/2 2 X r0 2/r0 −1 |J|(w)J (|f|w )J . |J|(|f| )J (w )J (w)J J∈S J∈S

X r0 2/r0 . [w]A2 |J|(|f| )J J∈S Z r0 2/r0 . [w]A2 M(|f| ) (x)dx, R

0 using the sparsity of the collection to get the integral over R in the last line. Note that r < 2, hence

0 0 kM r (f)k2 ((2/r0)0)2/r kfk2 L2(R) . L2(R)

0 0 0 0 0 2/r0 with (2/r ) the dual exponent to 2/r . Using the definition of r we see that ((2/r ) ) . [w]A∞ , and as a consequence we can conclude that

X |J|(w) (|f|w−1/2)2 [w] [w] kfk2 J J . A2 A∞ L2(R) J∈S as desired.

Remark. It is now straightforward to prove weighted estimates for bi-parameter dyadic shifts and the type of Journ´eoperators considered in Corollary 3.1, although these estimates are far from sharp. In particular, by using Corollary 3.2 and the iteration argument from

i,j Section 3 in [39], one can show that the bi-parameter shifted square function S satisfies the A2

i,j 4 bound kS fk 2 [w] [w] (the extra powers come from passing through a martingale L (w) . A2 A∞ transform). Then argue as in the proof of Theorem 3.3, with the Si,j replacing the simpler square functions S. Note that the dependence on [w]A2 would be improved if we could prove a sharper weighted estimate for Si,j.

85 Appendix: The Weighted Strong Maximal Function

Here we prove Proposition 3.4. As in [31], the idea is to bootstrap the boundedness of the maximal function in dimension one with the help of the A∞ property of the weight. We follow R. Fefferman’s argument and also use some recent ‘weighted Solyanik estimates’ due to P.

Hagelstein and I. Parissis [75], which rely on the sharp reverse H¨olderestimates in [42].

Fix 1 < p < ∞. By the covering-lemma argument in [20], it is enough to show that if

R1,R2, ... is a sequence of rectangles with sides parallel to the axes, then there is a subcollection

{Rej} of {Rj} such that Z Z w(x)dx ≥ C1 w(x)dx (3.19) S S Rej Rj and 1/p0 X  Z  k 1 k p0 ≤ C2 w(x)dx , (3.20) Rej L (w) S j Rj where p0 is the dual exponent to p. In this case one has

kfk p w({M wf > α})1/p ≤ C−1C L (w) , 1 2 α

w −1 so that kM kLp(w)→Lp,∞(w) ≤ C1 C2.

By monotone convergence we may assume the initial sequence {Rj} is finite. We choose the subcollection {Rej} using the C´ordoba-Fefferman selection algorithm from [20]. Assume the rectangles {Rj} have been ordered with decreasing sidelengths in the x2 direction, and take

Re1 = R1. Proceeding inductively, let Rej be the first Rk occurring after Rej−1 so that

[ ∗ |Rk ∩ Rel | < (1 − )|Rk|, l

86 where  ∈ (0, e−c[w]A∞ ). Then arguing as in the proof of Cor. 5.3 from [75], we conclude that

−1 [ (c[w]A ) [ w( Rj) ≤ (1 + c ∞ )w( Rek) j k [ ≤ A · w( Rek), k with A independent of w (we’ve used the assumed upper bound on ). Therefore we can take

C1 independent of w in (3.19).

Now take a pointx ¯ = (α, β) inside a rectangle Rk which does not occur among the Rej. Let

{Ti} denote the intervals obtained by slicing the two-dimensional rectangles {Ri} with a line perpendicular to the x2 axis at height given by β (the x2-coordinate ofx ¯). Given any rectangle

∗ R = I × J, write R = I × 3J. We claim that for each such Ri ∼ Ti × Ji we must have

[ ∗ |Ti ∩ Tej | ≥ (1 − )|Ti|, (3.21)

∗ ∗ with Tej the slices corresponding to Rej . This follows from the assumption about decreasing ∗ sidelengths. In fact, we may assume all Tej appearing in the union correspond to Rej that intersect Ri and were chosen before Ri relative to the initial order (the full union is only larger). By the selection criterion

[ ∗ |Ri ∩ ( Rej )| ≥ (1 − )|Ri| = (1 − )|Ti||Ji|.

But the sidelengths of the Rej parallel to the x2 axis are longer than Ji, and in particular their S ∗ S ∗ three-fold dilates contain Ji. It follows that Ri ∩ ( Rej ) = (Ti ∩ Tej ) × Ji. This implies (3.21), |R ∩(S R∗)| i ej S ∗ since we must have = |Ti ∩ T |. |Ji| ej

S ∗ Next, observe that if Ej = Tej − l . Since w ∈ Ap we

87 can conclude that !p p |Ej| w(Ej)  ≤ ≤ [w]Ap |Tej| w(Tej)

p uniformly in the free variable. Hence for any f ∈ L (wdx1) with kfkp ≤ 1 we have

Z Z Z X −p 1 1 (x1)f(x1)w(x)dx1 ≤  [w]A w(x)dx1 · f(x1)w(x)dx1 Tej p R w(x)dx1 j Ej Tej Tej Z −p w ≤  [w]Ap M1 (f)(x1)w(x)dx1 S Tej 0 Z !1/p −p w ≤  [w]Ap kM1 fkLp(wdx ) w(x)dx1 1 S Tej 0 Z !1/p −p  [w]A w(x)dx1 , . p S Tej using the fact that the weighted one-dimensional maximal operator is bounded independent of w. We also used the disjointness of the sets Ej to sum. After integrating in x2 it follows that

−p we can take C2 =  [w]Ap in (3.20).

Combining the above results yields

w −p kM kLp(w)→Lp,∞(w) .  [w]Ap

−c[w]A∞ w c[w]A∞ for any  ∈ (0, e ). Hence kM kLp(w)→Lp,∞(w) . [w]Ap e as claimed.

We do not know if there is an alternative approach the the boundedness of M w that yields a smaller dependence on [w]A∞ . Note, however, that Fefferman’s covering lemma is equivalent to the boundedness of M w on Lp(w), up to constants (see [20]).

88 Appendix: The Equivalence of Disjoint-Parts and Packing Con- ditions

We prove that Definitions 3.1 and 3.2 are equivalent for rectangles. This proof is contained in a much more general argument due to L. Dor [28]. We owe the presentation here to Jose Cond´e

Alonso, Yumeng Ou, and Jill Pipher. Note that it follows almost from the definition that that disjoint-parts condition implies the packing condition. The main theorem is then the following:

Theorem 3.3. Let R be a Λ-Carleson collection of axis-parallel rectangles. Then for each

−1 R ∈ R there exists a subset ER ⊂ R such that |ER| ≥ Λ |R| and such that the sets in the family {ER}R∈R are disjoint.

We state and prove this theorem for Lebesgue measure, but in fact it remains true for any measure µ that has no atoms. Note that even in this simplified setting the proof still requires the axiom of choice.

Proof. We can assume the collection R is finite as long as our estimates are independent of the

N number of rectangles, so suppose R = {Rj}j=1. Let Σ = σ(R) be the σ- generated by the L collection R. Then Σ is finite and has a finite number of atoms, which we denote by {Bi}i=1. S Finally we let Ω = j Rj. N We will study families A = {Aj}j=1 of sets such that

[ 0 Aj ⊂ Rj for all j, Ω = Aj,Aj ∩ Aj0 = ∅ if j 6= j . (3.22) j

Begin by assuming there exists a partition A as in (3.22) that maximizes the quantity

|Aj| cA = min . (3.23) 1≤j≤N |Rj|

−1 We will show that if A is such a maximizer then cA ≥ Λ , which implies the desired result. We then prove that maximizers actually exist.

89 We first claim that if A maximizes (3.23) then there must be two different indicies j0 and j1 such that

|Aj0 | |Aj1 | cA = = . (3.24) |Rj0 | |Rj1 |

Indeed, suppose this is not the case and (after relabeling)

|A1| |Aj| cA = < , 2 ≤ j ≤ N. |R1| |Rj|

If cA = 1 we have nothing to prove, and otherwise there must be some j with Aj intersecting

R1 in a set of positive measure. But then we can modify A1 → Af1 and Aj → Afj by removing some of Aj and adding it to A1 in such a way that

|Af1| |Af1| |Afj| > cA, < , |R1| |R1| |Rj|

|Af1| |Ak| < k 6= 1, j. |R1| |Rk|

This contradicts the assumption that A maximized cA, and so in our case (3.24) must hold.

Next, we define a set  |Aj| I = IA = j : = cA . |Rj|

Note that |I| ≥ 2. We assume our maximizer A has been chosen so that |IA| is as small as possible. In this case we see that if j ∈ I and k∈ / I then |Rj ∩ Ak| = 0, since otherwise we could modify A and A as in the last paragraph and obtain a maximizer A with |I | smaller. j k e Ae

In particular, since Aj ⊂ Rj we can conclude that

[ [ Rj = Aj, j∈I j∈I

90 which follows from the identity

[ [ [ Rj = (Rj ∩ Ak). j∈I j∈I k∈I

Then using the packing condition we have

1 X [ [ X X |R | ≤ | R | = | A | = |A | = c |R |, Λ j j j j A j j∈I j∈I j∈I j∈I j∈I as desired.

It remains to show that maximizers A actually exist. To each measurable Aj ⊂ Rj we

L associate a point vRj ∈ R whose components are

vRj (i) = |Aj ∩ Bi|, 1 ≤ i ≤ L.

Of course given such a point vRj there exist uncountably many subsets of Rj associated to vRj . L Note, however, that as long as vj ∈ R satisfies

0 ≤ vj(i) ≤ |Rj ∩ Bi|, 1 ≤ i ≤ L,

there exists some subset Avj ⊂ Rj associated to vj (since Lebesgue measure is non-atomic).

Since we are looking for ‘disjoint parts’ we will only consider families of N points v1, ..., vN

L in R such that

N X vj(i) = |Bi|, 1 ≤ i ≤ L, vj(i) ≥ 0, 1 ≤ i ≤ L, 1 ≤ j ≤ N. (3.25) j=1

N Given a collection of points {vj}j=1 satisfying (3.25) we claim that we may associate a N partition {Avj }j=1 satisfying (3.22). Indeed, suppose we have such a collection with 0 ≤ vj(i) ≤

91 i |Rj ∩ Bi|. Using (3.25) we split each Bi into disjoint pieces bj such that

i i |bj| = vj(i) and bj ⊂ Rj,

i with bj = ∅ if vj(i) = 0. We then set [ i Avj = bj i for each j, and note that these are disjoint unions. Clearly Avj ⊂ Rj, and moreover

[ [ i [ Avj = bj = Bi = Ω. j j,i i

This proves the claim. The collection {Avj } is non-unique, but this will not affect the argu- ment; we only need some collection with the given properties associated to each tuple of points

LN (v1, ..., vN ) ⊂ R . N Now assume (by the axiom of choice...) that we fix one collection {Avj }j=1 for each tuple LN of points (v1, ...vN ) satisfying (3.25). The collection of such tuples in R define a compact set LN M ⊂ R . We define the function f : M → R+ by

PL i=1 vj(i) f(v1, ..., vN ) = min , 1≤j≤N |Rj| and note that f is continuous. It follows that f attains a maximum on M and therefore there must exist a partition A as in (3.22) that maximizes (3.23). This completes the proof.

92 Chapter 4

A Counterexample for the Strong Maximal Function

In this chapter we continue to study analogues of sparse bounds in the multi-parameter setting.

We will focus on the bi-parameter analgoue of the dyadic maximal function, the dyadic strong maximal function

MSf(x) := suph|f|iR. (4.1) R3x

Here supremum is taken over all dyadic rectangles containing x with sides parallel to the axes.

The strong maximal function is one of the most important operators in the theory of multi- parameter singular integrals, and also perhaps the simplest to analyze since it is positive.

We note that axis-parallel rectangles are the natural geometric objects to consider if one wishes to develop some kind of sparse bound for (4.1), since MS will in general be large on axis-parallel rectangles of arbitrary eccentricity. It is then reasonable to ask if the following sparse estimate holds: X |hMSf, gi| . h|f|iRh|g|iR|R|, (4.2) R∈S where the collection S consists of dyadic axis-parallel rectangles. Recall that the direct analogue

93 of (4.2) for the dyadic Hardy-Littlewood maximal function is true (in fact this operator satisfies a stronger pointwise bound with respect to a sparse collection of dyadic cubes). It is hopeful to note that the strong maximal function is indeed dominated by sparse forms when restricted to a single point mass. Indeed, one can make sense of (4.1) when applied to finite positive measures, and then it is easy to see that 1 M (δ )(x, y) ≤ . (4.3) S 0 |x||y|

m This function can then be dominated by a sparse operator: if we define Im = [0, 2 ) and

m−1 m Jm = [2 , 2 ) then

1 X 1 = 1 (x, y) |x||y| |x||y| Jm×Jn m,n∈Z X 2−m−n1 ≤ 2 Jm×Jn (x, y) m,n∈Z X 1 = 4 hδ0iIm×In Im×In (x, y). m,n∈Z

1 Note that the collection S = {Im × In : m, n ∈ Z} is 4 -sparse since for every R = Im × In we can define E(R) = Jm × Jn.

This example is relevant because (approximate) point masses are extremal examples for the

1 weak-type behavior of MS and the Hardy-Littlewood maximal functions near the L endpoint. Moreover, we know from the one-parameter theory that there is a connection between sparse bounds and weak-type endpoint estimates; for example, a sparse bound of type (1.1) with

(r, s) = (1, 1) implies that T maps L1 into weak L1 (see the appendix in [24]). We also note that there is no immediate contradiction implied by a bound of type (4.2); indeed, an estimate of the type X |hT f, gi| . h|f|iRh|g|iR|R| R∈S implies that T maps L log L into weak L1, but known arguments do not imply that T maps

L1 into weak L1. The proof is similar to the argument in the appendix of [24], though for

94 completeness we include the details in an appendix at the end of the chapter.

The main result of this chapter answers the question raised above in the negative: there can be no domination by positive sparse forms of the type (4.2) for the strong maximal function.

Theorem 4.1. For every C > 0 and 0 < η < 1 there exist a pair of compactly supported integrable functions f and g such that

X |hMSf, gi| ≥ C h|f|iRh|g|iR|R|, R∈S for all η-sparse collections S of dyadic rectangles with sides parallel to the axes.

The proof of Theorem 4.1 is based on the construction of pairs of extremal functions for which the sparse bound cannot hold. These extremal examples take advantage of the behavior of MS when applied to sums of several point masses. When we apply MS to a single point mass we get level sets that look like dyadic stairs, and we will describe a way to place many point masses sufficiently far from each other in such a way that these stairs are packed too tightly to be sparse. Since one needs to ‘cover’ these stairs in order to control hMSf, gi, it will follow that no sparse form is large enough to bound hMSf, gi for all f, g. In particular, we will construct a set of points P which are maximally separated with respect to a quantity that captures natural bi-parameter geometry of the problem. Our first extremal function f is then the sum of point masses at the points in P. As we said above, this will guarantee that MSf is uniformly large on the unit square. We complement the construction of the set P with another set, Z, with the following structure: for each point p ∈ P there exist a large amount of points z ∈ Z that are both near p and far away from all the other points in P.

Moreover, the family of points in Z associated with a given p ∈ P are not clustered together, but are instead spread out over the boundary of a hyperbolic ball centered at p. Our second extremal function is then the normalized sum of point masses at the points in Z. We will show that the placement of the points in Z implies that if a form given by a family S dominates the pair hMSf, gi then the rectangles in S need to cover the portion of the stair centered at p that

95 contains each z, for all z ∈ Z. But then the resulting collection of rectangles cannot be sparse.

We remark that our result actually extends to the situation in which we take slightly larger averaged norms of f. In particular our methods allow us to push Theorem 4.1 to the case of

Orlicz φ(L)-norms with φ(x) = x log(x)α for α < 1/2. We carry out this extension in Theorem

4.6.

We have learned that our construction of P is closely connected to discrepancy theory. In particular it is a special construction of a low-discrepancy sequence. Our results show that these kind of sets are particularly well-suited to the study of the strong maximal function (see also [7] for more connections between discrepancy theory and bi-parameter harmonic analysis). It would be interesting to see if tools related to discrepancy theory can be used to analyze other typoes of maximal operators. In particular, Zygmund-type maximal operators (defined like the strong maximal function, but with rectangles with fewer “parameters” than the ambient dimension) could be amenable to this connection.

The rest of the chapter is devoted to the proof of Theorem 4.1 and some extensions. In

Section 4.1 we prove lower bounds for the form associated to MS and also upper bounds for the sparse form 4.2 when f, g are the approximate uniform measures described above. We assume the existence of the key sets of points P and Z, which are needed to construct f and g. We then construct the sets P and Z in detail in Section 4.2, completing the proof of Theorem

4.1. Finally, in Section 4.3 we show how our results extend to higher dimensions by a tensor product argument. We also show that the strong maximal function is in a sense supercritical for L1-sparse forms, which allows us to slightly strengthen our result.

4.1 Lower bounds for MS and upper bounds for sparse forms

We start by introducing a quantity which is intimately related with the geometry of the bi- parameter setting: for every pair of points p, q in [0, 1)2 we set

1/2 distH(p, q) = inf{|R| : R ∈ D × D is a dyadic rectangle containing p and q}.

96 One can also define a distance between two (dyadic) rectangles:

distH(R1,R2) := inf distH(p, q). p∈R1,q∈R2

Note that distH is not a distance function, and not even a quasidistance, as the is completely false in general. We will, however, refer to it as the distance between two points.

Using this language, the (dyadic) strong maximal function of a point mass (4.3) can now be written as 1 MS(δ0)(p) = 2 . distH(p, 0)

We are going to study the precise behavior of MS when applied to uniform probability measures concentrated on certain point sets: given a finite set of points F let

1 X µ = µ = δ . (4.4) F #(F) p p∈F

If the points in F are ‘uniformly’ spread out, then at large enough scales one expects the averages

µ(R) #(R ∩ F) hµi = = R |R| #F|R| to be relatively small. At smaller scales we will be able to exploit some of the key properties of the collections P, Z described in the following two theorems. We postpone their proofs to

Section 4.2.

Theorem 4.2. For every m ≥ 0 there exists a collection of points P ⊂ [0, 1)2 such that

−m distH(p, q) ≥ 2 , ∀p 6= q in P, (P.1) and

#(P) = 22m+1. (P.2)

Theorem 4.3. For every k  m and every 2−m-separated set P ⊂ [0, 1)2 there exists a set

97 Z ⊂ [0, 1)2 satisfying the following properties:

2m (Z.1)#( Z) & m2 .

(Z.2) For every z ∈ Z there exists exactly one point p(z) ∈ P such that

2 −2m−1 distH(p(z), z) < 2 .

(Z.3) For every z ∈ Z,

2 −2m−k distH(p(z), z) ∼ 2 .

(Z.4) For every dyadic rectangle R with |R| ≤ 2−2m−2 that intersects P we have

#(R ∩ Z) . k.

(Z.5) For every dyadic rectangle R with |R| ≥ 2−2m−1 we have

2m #(R ∩ Z) . 2 mk|R|.

The implied constants are independent of k and m.

Remark 4.1.1. A simple pigeonholing argument shows that if P is the collection from Theorem

4.2 and R is a dyadic rectangle contained in the unit cubte with |R| ≥ 2−2m−1, then #(R∩P) =

22m+1|R|. Indeed, if |R| = 2−2m−1 then we may tile the unit cube with R and 22m translates

Re. But no rectangle in this collection can contain more than one point since the area of each is too small; since there are 22m+1 rectangles in the tiling then every rectangle must contain exactly one point from P, and in particular R must contain exactly one point from P. Now if

|R| > 2−2m−1 the claim follows by dyadically sub-dividing R into 22m+1|R| distinct rectangles of area 2−2m−1.

Remark 4.1.2. Above, we deliberately omitted the precise dependence of m on k. It turns out

98 that any m ≥ k + C is admissible as one can check from the proofs in Section 4.2. This justifies our choices of m and k in the next subsections.

4.1.1 Lower bound for MS

For the rest of this section k  m will be some fixed large numbers, and (P, Z) will be the sets given by Theorems 4.2 and 4.3. Also, µ = µP and ν = νZ will always denote the associated uniform probability measures introduced in (4.4). The proofs of Theorems 4.3 and 4.3 in the next section show that µ and ν can be arbitrarily well-approximated by L1 functions, in particular because we will be able to choose the points from P and Z in arbitrarily small cubes. Because of this we will prove Theorem 4.1 with f and g replaced by the measures µ and ν, respectively, and the full result can then be recovered by the limiting argument found at the end of the section in Lemma 4.3.

Proposition 4.1. Under these conditions we have

k hMS(µ), νi & 2 . (4.5)

Proof. The positivity of MS makes the proof almost trivial since we do not need to care about possible interactions among the points in P. In particular for all z ∈ Z let p(z) be the point guaranteed by (Z.2) of Theorem 4.3. Then by ((Z.3)),

1 22m+k MS(µ)(z) ≥ 2 ∼ . #(P)distH(p(z), z) #(P)

Now by (P.2) of Theorem 4.2,

k MS(µ)(z) & 2 and the claim follows from the fact that ν is a uniform probability measure over the set Z.

99 4.1.2 Upper bound for sparse forms

We now prove upper bounds for sparse forms when acting on µ and ν. We recall from the last chapter the definition of Carleson sequences:

Definition 4.1. Let α be a non-negative function defined on all rectangles that is zero for all but a finite collection of rectangles. We say that α is Λ-Carleson if for all open sets Ω we have

X αR|R| ≤ Λ|Ω|, R⊆Ω where the sum is taken over all rectangles contained in Ω.

We call a collection of dyadic rectangles S a Λ-Carleson collection if the sequence

  1 if R ∈ S, αR =  0 otherwise is a Λ-Carleson sequence. The notions of sparse and Carleson collections are equivalent as we saw in the last chapter: a Λ-Carleson collection is Λ−1-sparse and vice-versa.

Proposition 4.2. For every Λ-Carleson collection S and all k and m we have

X  22k  hµi hνi |R| Λk 1 + . (4.6) R R . m R∈S

Proof. For any rectangle R let R be the intersection of R with the unit square [0, 1)2. We will show that  22k |R|2 hµi hνi k 1 + . (4.7) R R . m |R|

Assume first that R is large, i.e.: |R| ≥ 2−2m−1. Then by the 2−m-separation of P

1 #(P ∩ R) |R| hµi = = . R 22m+1 |R| |R|

100 Similarly, for ν we have by (Z.5):

|R| hνi k . R . |R|

Therefore, for large rectangles we have

|R|2 hµi hνi k . R R . |R|

If instead R is small, i.e.: |R| < 2−2m−1, then we have to be more careful. If

hµiRhνiR 6= 0 then R must contain at least one point p ∈ P. In fact, since R is sufficiently small, there exists exactly one p ∈ P ∩ R. Hence 1 hµiR . . 22m|R|

Since hνiR 6= 0 the rectangle R must contain at least one point from Z. This implies a lower bound on the size of R. To see this observe that if z is any point in R ∩ Z, then by the smallness

−2m−k of R and (Z.2) we must have p = p(z), and hence distH(p, z) ∼ 2 by (Z.3), therefore

−2m−k |R| & 2 .

Now, by (Z.4) we have

#(R ∩ Z) . k, so k2k hµi 2k and hνi , R . R . m from which inequality (4.7) follows.

101 Now we can finish the proof by splitting S into families Sj as follows:

−j−1 −j Sj = {R ∈ S : 2 |R| ≤ |R| < 2 |R|}.

Note that the rectangles in Sj are contained in

2 1 −j Ωj = {p ∈ R : MS( [0,1)2 ) & 2 }, which, by the weak-type boundedness of the strong maximal function, satisfies

j |Ωj| . j2 . (4.8)

Then by estimate (4.7), the Carleson condition, and (4.8),

∞ X X X hµiRhνiR|R| = hµiRhνiR|R|

R∈S j=0 R∈Sj ∞  22k  X X k 1 + 2−2j |R| . m j=0 R∈Sj ∞  22k  X ≤ k 1 + 2−2jΛ|Ω | m j j=0 ∞  22k  X Λk 1 + j2−j . m j=0  22k  Λk 1 + . . m

Remark 4.1.3. An examination of the proof above shows that we do not use the full power of the Carleson condition. In particular, the collection S can be assumed to satisfy the Λ-Carleson packing condition with respect to the unit cube and the level sets Ωj, but it does not need to be Λ-Carleson with respect to any other open sets (and in particular need not be Λ-Carleson at

102 small scales).

We can now formally conclude the proof of Theorem 4.1 conditionally on Theorems 4.2 and

4.3.

Proof of Theorem 4.1. Choose m = k22k. Proposition 4.1 implies that

k hMS(µ), νi & 2 , so Proposition 4.2 forces Λ → ∞ if we make k → ∞, which leads to a contradiction.

Finally we resolve the issue of approximating the measures µ, ν.

Lemma 4.3. Let µ, ν be the uniform measures described above. Then if there is no sparse bound of the form X hMSµ, νi ≤ C hµiRhνiR|R| R∈S there can be no sparse bound of the form

X hMSf, gi ≤ C hfiRhgiR|R| R∈S for all f, g ∈ L1.

Proof. Fix a sparse parameter η ∈ (0, 1) and fix C > 0. We construct µ, ν as described above for fixed m, k. From the proofs of Theorems 4.2 and 4.3 in the next section we see that we may vary points p ∈ P and z ∈ Z in a small neighborhood and still preserve the properties of the theorems.

For each p ∈ P and z ∈ Z we let Qn,p and Qn,z be descending sequences of dyadic cubes contained in these neighborhoods, with p ∈ Qn,p and z ∈ Qn,z. We then set

1 1 X Qn,p fn = #P |Qn,p| p∈P

103 and 1 1 X Qn,z gn = . #Z |Qn,z| z∈Z Then we have the uniform bound

X X hfniRhgniR|R| ≤ hµiRhνiR|R|. R∈S R∈S

On the other hand 1 X 1 Z hMsfn, gni = MSfn(y)dy, #Z |Qn,z| z∈Z Qn,z and from the construction of Z below we see that for any y ∈ Qn,z there exists p ∈ P with dist2 (y, x) ≥ 2−2m−k for any x ∈ Q . By arguing as in Proposition 4.1 we see that H n,z

22m+k hM f , g i ≥ c ∼ 2k s n n #P

(for each z test against the smallest rectangle containing both Qn,p and Qn,z). Therefore if we had an η-sparse form bound with constant C for all L1 functions, it would imply that

k X c2 ≤ C hµiRhνiR|R|. R∈S

Arguing as above shows that this forces C ≥ c02kη, and we may let k → ∞ to obtain a contradiction. This works for any η ∈ (0, 1) and so we are done.

4.2 Construction of the extremal sets

In this section, we will sometimes need to work with the projections of a rectangle R onto the axes. To that end, if R = I × J we denote

π1(R) := I, π2(R) := J.

104 For dyadic intervals I we let Ib denote the dyadic parent of I. We also let I(j) denote the j-fold dyadic dilation of I, so that I(j) is the dyadic interval containing I with |I| = 2j|I|.

4.2.1 Construction of P and the proof of Theorem 4.2

The following observation will be useful in the construction:

Lemma 4.4. Let R1 = I1 × J1,R2 = I2 × J2 be two dyadic rectangles such that I1 ∩ I2 =

J1 ∩ J2 = ∅. Then

distH(p1, p2) = distH(q1, q2), ∀p1, q1 ∈ R1, p2, q2 ∈ R2.

Proof. Given any p1 ∈ R1, p2 ∈ R2, it suffices to show that any dyadic rectangle R = I × J containing both p1 and p2 needs to contain R1 and R2. Since I ∩Ii 6= ∅, i = 1, 2, and I1 ∩I2 = ∅, one has I ⊃ Ii, i = 1, 2. The same holds for J: J ⊃ Ji for i = 1, 2.

Proof of theorem 4.2. We are going to show the following claim by induction: for every non-

2m+1 m m 2 negative integer m there exist 2 dyadic squares Q1 ,...,Q22m+1 in [0, 1) satisfying

m −2m−1 1. `(Qi ) = 2 for all i,

m m −m 2. distH(Qi ,Qj ) ≥ 2 for all i 6= j .

Assuming the claim, it is enough to take

  m m P = pQ : Q ∈ Q1 ,...,Q22m+1 ,

where pQ denotes the center of the cube Q —in fact, any point of Q would work—. Then, by Lemma 4.4 we immediately get (P.1) and (P.2) and we end the proof.

0 2 We turn to the proof of the claim. The case m = 0 is easy: it suffices to take Q1 = [0, 1/2) 0 2 and Q2 = [1/2, 1) . Assume now by induction that the theorem is true for m − 1. Scale the family obtained in that step by 1/2 in both dimensions and place a translated copy of the result

105 2 Q Q in each square Q ∈ D1([0, 1) ). Denote the cubes so constructed by P1 ,...,P22m−1 . By the dilation invariance of distH we have

Q −1 −2(m−1)−1 −2m `(Pi ) = 2 · 2 = 2 .

Q Therefore, it is enough to choose a first-generation child from each of these squares {Pi }i,Q in 2 0 1 0 1 such a way that (2) is satisfied. Write D1([0, 1) ) = {Q0,Q0,Q1,Q1}, where the children are 0 listed in the order of upper left, upper right, lower left, and lower right. We first consider Q0 1 Q 0 1 and Q1. For each Pi with Q ∈ {Q0,Q1} we choose an arbitrary first-generation child and call Q it Qi . By induction we have

Q Q Q Q −1 −(m−1) −m distH(Qi ,Qj ) ≥ distH(Pi ,Pj ) ≥ 2 · 2 = 2 for all i 6= j,

0 1 while the distance between any square in Q0 and any other square in Q1 must be exactly 1 according to Lemma 4.4.

Now we must choose the children from the squares in the other diagonal. This choice is more delicate since the squares from the first diagonal could be much closer. By the pigeonhole 0 1 Q 1 0 Q0 Q1 principle, for each P = Pi , Q ∈ {Q0,Q1}, there exist exactly two other squares, Pj and Pk , which are at distance at most 2−m. Indeed, according to Lemma 4.4, the distance condition 0 1 Q0 Q1 Q 0 forces Pj ,Pk to be in the same row (or column) as Pi , while each row (or column) of Q0 (or 1 Q Q1) contains exactly one square by induction hypothesis. In fact, the distance between Pi and 0 1 Q0 Q1 Pj is precisely equal to 0, and the same holds for Pk .

0 1 0 1 Q0 Q1 Q0 Q1 We now project Qj and Qk , the children chosen from Pj and Pk onto P . Note that Q their projection leaves exactly one first-generation child of P untouched, which we select as Qi . Q 0 −m It suffices to show that the distance from Qi to any square in Q0 is larger than 2 , since the 1 case of Q1 is symmetric and any other combination can be dealt with in the same way as above. 0 0 Q0 To see this, if the square in Q0 does not come from the Pj chosen above, the distance has to 0 −m Q0 be larger than 2 by Lemma 4.4. Otherwise, the square is exactly Qj . By the construction

106 Q above, both its horizontal projection and its vertical projection and the respective ones of Qi are disjoint, which allows one to apply Lemma 4.4 once more to conclude the proof.

4.2.2 Construction of Z

There are two aspects of the construction of Z that will require special care: each point in Z has to be close to exactly one point from P (but not too close), while two different points in Z cannot be too close to each other. We begin by constructing certain special rectangles inside which we shall place the points forming Z.

Definition 4.2. We say that a rectangle R is a standard rectangle if it belongs to the collection

E = {R ⊆ [0, 1)2 : R dyadic, |R| = 2−2m−2, and R ∩ P 6= ∅}.

If R ∈ E we let pR denote the unique point in R∩P. Also let Ep denote the collection of standard rectangles containing p ∈ P.

One can obtain Ep by area-preserving dilations of one fixed R ∈ Ep. Indeed, for any dyadic

R = I × J and any j ∈ Z define 0 0 DiljR = I × J ,

0 0 j −j where I × J is the unique dyadic rectangle in Ep of dimensions 2 |I| × 2 |J|. Then it is easy

2 to see that for each standard rectangle R, Ep = {DiljR ⊆ [0, 1) }. In particular, for any p ∈ P there are approximately m distinct standard rectangles in Ep. Given x, y ∈ R, let

δ(x, y) = inf{|Q| : Q ∈ D is a dyadic interval containing x and y}.

Also, for dyadic intervals K define

δ(x, K) := inf δ(x, y). y∈K

Now suppose R = I × J ∈ E, with pR = p = (x, y). Given k ≥ 1, we define I(k) to be the largest

107 dyadic interval I0 such that

δ(x, I0) = 2−k+1|I|,

and define J(k) similarly. Note that I(1) is the child of I that does not contain x, and for all k ≥ 1, I(k) is the unique dyadic offspring of I of generation k satisfying x∈ / I(k) and x ∈ Id(k)

(the dyadic parent of I(k)). The following result is the main technical lemma needed for our construction.

Lemma 4.5. Fix an integer k ≥ 1 with k  m. For any dyadic rectangle R = I × J ∈ Ep there

∗ exists a sub-rectangle R ⊂ I(1) × J(k) such that for all q 6= p in P:

∗ 2 −2m−1 distH(q, R ) ≥ 2 .

We will apply this lemma to construct the points Z used in the proof of the main theorem as

∗ S follows: given R = I × J ∈ E, we choose one point zR in R ⊂ I(1) × J(k) and let Z = R∈E zR. Note that this immediately guarantees properties (Z.1), (Z.2) and (Z.3) of Theorem 4.3. This placing also allows us to prove the remaining two properties. We start by proving directly (Z.4):

Lemma 4.6. For every R ∈ E we have

#(R ∩ Z) . k.

0 Proof. First observe that if zR0 ∈ R with R,R ∈ E then by Lemma 4.5 we must have pR0 = pR = p. So let ZR be defined by

ZR = {T ∈ Ep : zT ∈ R}.

It suffices to show that #(ZR) . k.

Observe that any T ∈ ZR must have π1(T ) ⊆ π1(R). Indeed, if π1(T ) ) π1(R), by definition zT ∈ π1(T )(1) × π2(T )(k) and one has π1(R) ⊂ π1(T )(1). But this is impossible since π1(T )(1) ⊆

108 π1(T ) \ π1(R). In addition, there must hold

k−1 |π2(T )| ≤ 2 |π2(R)|,

as the smallest interval containing both π2(pT ) and π2(zT ) is π2\(T )(k). The desired estimate then follows from the fact that there are O(k) such standard rectangles.

4.2.3 Proof of lemma 4.5 and (Z.5)

In this subsection, given two dyadic rectangles R1,R2 that are not contained in one another but which do intersect at some nontrivial set, we say that R2 intersects R1 horizontally if

π2(R2) ( π2(R1),

and similarly R2 intersects R1 vertically if

π1(R2) ( π1(R1).

One can see that these two options are mutually exclusive for R1 and R2. Also, one can see that if R2 intersects R1 horizontally then π1(R1) ( π1(R2), and if R2 intersects R1 vertically then

π2(R1) ( π2(R2). We start by recording the following information that will be used later:

Lemma 4.7. Let R = I × J ∈ E be such that I × J (`+1) ⊆ [0, 1)2, ` ≥ 1. Then

(`+1) I(`) × J

contains exactly one q ∈ P, q 6= pR, and furthermore

(`+1) (`) q ∈ I(`) × (J \ J ).

(`+1) −`+1 `+1 −2m Proof. We have that |Ic(`) × J | = 2 2 |I||J| = 2 , so there are exactly two points

109 (`+1) from P in Ic(`) × J . One of these points must be pR, so the other one is q. To prove the (`+1) (`) second part of the assertion, note that we cannot have q ∈ (Ic(`)\I(`)) × J or q ∈ I(`) × J , since in each case we would have pR and q contained in a dyadic rectangle of area smaller or equal than 2−2m−1, violating (P.1).

Lemma 4.8. Let R = I × J ∈ E. For any ` ≥ 1 with 2`+1|J| ≤ 1, there exists exactly one

−1 T = W × H ∈ E intersecting R vertically such that pT 6= pR, W ⊂ I(`), and |W | = 2 |I(`)|.

(`+1) Proof. By Lemma 4.7 the rectangle I(`) × J contains exactly one q ∈ P and q 6= pR. (`+1) (`) Moreover, we know that q ∈ I(`) × (J \J ). We define W as the child of I(`) that contains −2m−2 π1(q), and choose H so that π2(q) ∈ H and T = W × H has area 2 . In particular, we have H = J (`+1). By construction, T ∈ E.

We now claim that T is the unique standard rectangle intersecting R vertically with pT 6= pR,

1 0 0 0 W ⊂ I(`), and |W | = 2 |I(`)|. By contradiction, suppose there existed another T = W × H 0 0 satisfying the same properties. Then since W, W ⊂ I(`) with |W | = |W | we know that T and 0 0 0 T are disjoint and therefore pT 6= pT 0 . Since T is a standard rectangle and T,T both intersect R vertically, we also know that H0 = H = J (`+1). But this contradicts Lemma 4.7: that there

(`+1) can only be one q ∈ P in I(`) × J with q 6= pR, so pT = pT 0 , which is a contradiction.

Lemma 4.8 holds for rectangles intersecting R horizontally with exactly the same proof, which we omit for brevity. Assume now that the rectangle R = I × J has been fixed. Let

`,v  ER = T ∈ E : T intersects R vertically and π1(T ) ⊂ I(`) . and `,h  ER = T ∈ E : T intersects R horizontally and π2(T ) ⊂ J(`) .

`,v We will analyze the vertical rectangles ER ; essentially the same arguments apply to the hori- `,h zontal collection ER . `,v If S and T are in ER , we say that S  T if π1(S) ⊂ π1(T ) (an analogous order can be

110 `,v `,v,∗ defined in ER ). Denote by ER = {T1,T2,... } the set of maximal elements with respect to the ordering . Assume they are ordered so that the sequence ai := |π1(Ti)| is non-increasing.

Lemma 4.9. The sequence ai satisfies

−i ai = 2 |I(`)|, 1 ≤ i ≤ m − `.

−1 Proof. By Lemma 4.8 we find that T1 satisfies π1(T1) ⊂ I(`) and a1 = 2 |I(`)|. Moreover, we −2 know that there can be no other Tj with aj = a1, and so in particular a2 ≤ 2 |I(`)|.

We assume inductively that the desired result holds for T1,...,Ti−1, and aim to show that

−i ai = 2 |I(`)|. Note that {π1(T1), . . . , π1(Ti−1)} are pairwise disjoint. Then by induction

i−1 G π1(Ti) ⊂ I(`)\ π1(Tj), j=1

−1 −1 and we also have that aj = 2 aj−1 and a1 = 2 |I(`)|. Therefore, π1(Ti) must be contained in −i+1 some interval K with |K| = 2 |I(`)|. Moreover, by induction K and π1(Ti−1) must have the same dyadic parent.

We first show that ai < |K|. To see this, suppose by contradiction that π1(Ti) = K. Then ai = ai−1 and

π\1(Ti) = π1\(Ti−1).

Since both Ti−1 and Ti must also have the same height, their union is a dyadic rectangle of area

−2m−1 2 which contains both pTi−1 and pTi , which is impossible by (P.1).

−1 `,v,∗ Finally, we have to show that ai ≥ 2 ai−1 or, equivalently, that there exists T ∈ ER with |π (T )| = 2−1a . To that end, let T˜ ∈ E be the unique rectangle with π (T˜ ) = K. If 1 i−1 i−1 pTi−1 1 i−1 b 2 2 |π2(T˜i−1)| ≤ 1, one can apply Lemma 4.8 with ` = 1 to T˜i−1 to conclude that there is exactly ˜ one T ∈ E intersecting Ti−1 vertically with pT 6= pTi−1 and

−1 −i |π1(T )| = 2 |Kb(1)| = 2 |I(`)|.

111 2 Ti = T by maximality, and this closes the induction. If on the other hand 2 |π2(T˜i−1)| > 1, then we claim that i > m − ` and we have finished. Indeed, if Ti exists in this case, one has

−1 |π1(Ti)| < |K| = 2 |π1(T˜i−1)|.

Therefore,

2 |π2(Ti)| > 2|π2(T˜i−1)| ≥ 2 |π2(T˜i−1)| > 1,

2 which it is impossible for Ti to be contained in [0, 1) .

Proof of lemma 4.5. It is enough to show the following slightly stronger claim: for any pair of integers k, ` ≥ 1, any R = I × J ∈ E contains a non-empty rectangle E × F such that

[ 0 E × F ⊆ I(`) × J(k)\ R . R0∈E pR0 6=pR

0 `,v k,h Any R ∈ E which intersects I(`) × J(k) must be in one of the two collections ER , ER , when pR 6= pR0 . `,v,∗ We start by constructing E, so now we only need to take rectangles belonging to ER into −i account. From Lemma 4.9 we know that ai = |π1(Ti)| satisfies ai = 2 |I(`)|. It thus follows `,v,∗ that |π2(Ti)| must increase in size exponentially, and therefore the total number of Ti ∈ ER is bounded by some constant that depends on R (since all rectangles are contained in [0, 1)2).

Since there are only finitely many π1(Ti) ⊂ I(`), the estimate on their sizes from Lemma 4.9 implies that there must be some dyadic subinterval E ⊂ I(`) with

π1(Ti) ∩ E = ∅ for all i,

0 k,h,∗ as desired. The same procedure yields an interval F ⊂ J(k) which no rectangles T ∈ ER can intersect.

Finally, we turn to the proof of (Z.5), which is the content of the next lemma.

112 Lemma 4.10. For every dyadic rectangle R ⊂ [0, 1)2 with |R| ≥ 2−2m−1 we have

2m #(R ∩ Z) . 2 mk|R|.

Proof. First, we may assume that |R| = 2−2m−1 and show

#(R ∩ Z) . mk.

Indeed, if it is larger we can just write it as a disjoint union of dyadic rectangles of area 2−2m−1 and use the estimate for rectangles of that size. Since |R| = 2−2m−1 we know that it contains exactly one point p0 ∈ P, and from Lemma 4.6 we know R contains about k points z ∈ Z such that p(z) = p0

For every z ∈ R ∩ Z such that p(z) 6= p0, there exists T ∈ E for which zT = z, which we denote by Tz. Since both R and Tz are dyadic rectangles and |Tz| < |R| we must have

Z ∩ R = V t H t O,

where V consists of those points z ∈ Z ∩ R such that Tz intersects R vertically, H consists of those points z ∈ Z ∩ R such that Tz intersects R horizontally, and O is the collection of points z ∈ Z whose associated point p(z) = p0. We will only estimate the size of V, since the argument for H is similar.

`,v,∗ By the construction of Z and the previous arguments, for each z ∈ V there exists T ∈ ER containing z. By Lemma 4.6 it suffices to show that there are no more than some constant times m such rectangles.

Let R˜ = I˜× J˜ be the unique rectangle in E satisfying

pR ∈ R˜ ⊂ R and π1(R˜) = π1(R).

`,v,∗ Observe that any rectangle T ∈ ER intersecting R vertically and whose zT is inside R must

113 ˜ ˜ intersect (I)(1) × J vertically (since othwerwise zT would be too close to p0). According to Lemma 4.8 with ` = 1, the maximal rectangle with shortest height must have height at least

4|J˜| = |π2(R)| and the heights of these maximal rectangles increase exponentially. Therefore ˜ the number of maximal rectangles going through (I)(1) is at most

2m log2(2 |π2(R)|) ≤ 2m + log2(|π2(R)|) ≤ 2m.

The proof of Theorem 4.3 is complete.

4.3 Extensions and open questions

4.3.1 Failure of sparse bound for bi-parameter martingale transform

In this subsection, we extend our main theorem to the bi-parameter martingale transform, which is a 0-complexity dyadic shift that resembles the behavior of the bi-parameter Hilbert transform. We recall the definition of such operators from the last chapter: given a sequence

σ = {σR}R∈D×D satisfying |σR| ≤ 1, the corresponding martingale transform is defined as

X Tσ(f) := σRhf, hRihR. R∈D

We have the following lower bound result.

Theorem 4.4. Let measures µ, ν be the same as above. Then there exists a martingale transform

Tσ such that k |hTσ(µ), νi| & 2 .

Recalling the upper estimate of sparse forms obtained in Subsection 4.1.2, one immediately gets the following corollary:

114 Corollary 4.1. There cannot be any (1, 1) sparse domination for all bi-parameter martingale transforms.

2 Proof of Theorem 4.4. We first assume that σR = 0 unless R ⊂ [0, 1] . For any R = I × J ∈ E that gives rise to some point z ∈ Z we let TR := Ic(`) × Jd(k). This is the smallest rectangle that contains both z and p(z) ∈ P. We also denote the standard rectangle that gives rise to z ∈ Z by Rz. Now for any fixed z ∈ Z,

1 X X T (µ)(z) = σ h (p)h (z). σ #P R R R p∈P R⊂[0,1]2

Set σT = 0 unless T = TR for some R ∈ E. In particular, {TR}R∈E and {R}R∈E are one-to-one, and each TR contains exactly one pR ∈ P. Hence,

−2m X Tσ(µ)(z) ∼ 2 σTR hTR (pR)hTR (z). R∈E

Note that z cannot be contained in TR unless R = Rz according to Lemma 4.11 below, and therefore

T (µ)(z) ∼ 2−2mσ h (p )h (z) = 2−2mσ |T |−1ξ , σ TRz TRz Rz TRz TRz Rz TRz where ξ = ±1. Choosing σ = ξ , one has TRz TRz TRz

−1 −2m −1 −2m  −k+1  k Tσ(µ)(z) ∼ 2 |TRz | = 2 · 2 |Rz| ∼ 2 , and the desired estimate follows immediately.

In the proof above, we have used the key observation that even though R ∈ E can contain many points z ∈ Z with pR = pRz , TR can only contain one zR ∈ Z. This is justified by the following result.

Lemma 4.11. For any integers k, ` ≥ 1 and R = I × J ∈ E, let zR ∈ Z be the point chosen in

115 R as in Subsection 4.2.2, and TR = Ic(`) × Jd(k). Then

zT ∈/ TR, ∀R,T ∈ E,R 6= T.

0 0 0 Proof. Fix R = I × J ∈ E and denote z = zR. Given any other R = I × J ∈ E, our goal is to show zR0 ∈/ TR. Obviously, if pR0 6= pR, then by Lemma 4.5 zR0 is not even contained in R.

0 0(j) It thus suffices to assume pR0 = pR = p. Suppose |I| > |I |, i.e. I = I for some j ≥ 1, then

−j 0 0 0 0 |J| = 2 |J | and J ⊂ J . Since both Jd(k), Jd(k) contain π2(p), one has Jd(k) ) Jd(k). 0 We claim that π2(zR0 ) ∈/ Jd(k), hence zR0 ∈/ TR. Indeed, π2(zR0 ) ∈ J(k), and Jd(k) is either 0 0 contained in J(k) or its dyadic sibling. But it is impossible for Jd(k) to be contained in J(k) as it 0 would imply π2(p) ∈ J(k), which is an obvious contradiction. Therefore, π2(zR0 ) ∈/ Jd(k) and the proof for the case |I| > |I0| is complete.

The case |I| < |I0| can be treated symmetrically, as one has |J| < |J 0| and can show in the same way as above that π1(zR0 ) ∈/ Ic(`).

Remark 4.3.1. Using the same method, one can easily show that there exists a bi-parameter dyadic shift of any given complexity which cannot have a (1, 1) sparse bound. We omit the details.

4.3.2 Failure of sparse bound in higher dimensions

For n ≥ 1, define the n-parameter strong maximal function

Z 1 n Mn(f)(x) := sup |f(y)| dy, f : R → C, R3x |R| R where R is any n-dimensional dyadic rectangle. Clearly, MS equals M2. We have the following theorem.

Theorem 4.5. Let (Dn) denote the following statement: there exists C > 0 such that for all

n compactly supported integrable functions f, g on R , there exists a sparse collection of dyadic

116 rectangles S such that X |hMn(f), gi| ≤ C h|f|iRh|f|iR|R|. R∈S Then for all n ≥ 2, there holds

(Dn) =⇒ (Dn−1).

Proof. The desired result follows from a tensor product argument. For the sake of simplicity, we will only prove (D3) =⇒ (D2). Our goal is to show (D2): for any given functions f, g on

2 R , we would like to find a sparse dominating form for hM2(f), gi. Define fe = f ⊗ χ[0,1) and 3 ge = g ⊗ χ[0,1), both are compactly supported integrable functions on R . By the assumption 3 (D3), there exists a sparse collection Se of rectangles in R such that

X |hM (f), gi| ≤ C h|f|i h|g|i |R|, 3 e e e Re e Re e Re=R×J∈Se

2 where R ∈ R ,J ∈ R.

3 By definition, for all xe = (x, x3) ∈ R with x3 ∈ [0, 1),

1 Z M3(fe)(x) = sup |fe(y)| dy = M2(f)(x)M(1 )(x3) = M2(f)(x). e |R| · |J| e e [0,1) R×J3xe R×J

Therefore,

hM (f), gi 3 = hM (f), gi 2 . 3 e e R 2 R

3 2 It thus suffices to show that the sparse form in R can be dominated by a sparse form in R .

To see this, rewrite

X X h|f|i h|g|i |R| = α h|f|i h|g|i |R| e Re e Re e Re e Re e Re e Re=R×J∈Se Re=R×J⊂R3 where α = 1 if R ∈ R, and 0 otherwise. The sparsity of R thus means for some constant Λ Re e e e

117 there holds X α |R| ≤ Λ|Ω|, ∀ open set Ω ⊂ 3. Re e e e R Re⊂Ωe One can further simplify the expression

X α h|f|i h|g|i |R| Re e Re e Re e Re=R×J⊂R3 2 X X |[0, 1) ∩ J| = h|f|i h|g|i |R| α |J| · R R R×J |J| R⊂R2 J⊂R X =: h|f|iRh|g|iR|R|βR. R⊂R2

It suffices to show that {βR}R⊂R2 is a Carleson sequence. Indeed, decomposing ∞ X −2j X βR = 2 αR×J |J|, j=0 J⊂R |J∩[0,1)|∼2−j |J|

2 one has for all open set Ω ⊂ R that

∞ X X −2j X X βR|R| = 2 αR×J |R × J| R⊂Ω j=0 R⊂Ω J⊂R |J∩[0,1)|∼2−j |J| ∞ X −2j X ≤ 2 αR×J |R × J|, j=0 R×J⊂Ωe

−j 1 where Ωe := Ω × {x3 ∈ R : M(χ[0,1))(x3) > C2 }. By the weak L boundedness of M, one has j |Ωe| . 2 |Ω|, hence ∞ X X −j βR|R| . Λ 2 |Ω| . Λ|Ω|. R⊂Ω j=0

The proof is complete.

Similar results hold true for multi-parameter martingale transforms as well. One can define fe = f ⊗ h[0,1) and ge = g ⊗ h[0,1). Then one observes that |hTσ(h[0,1)), h[0,1)i| ≥ C for some martingale transform Tσ in the third variable and therefore can argue similarly as above. The

118 details are left to the reader.

4.3.3 Other sparse forms

As mentioned in the introduction, it is also interesting to consider the possibility of domination of the bi-parameter operators by bigger sparse forms, for instance those of type (φ, 1):

X h|f|iR,φh|g|iR|R|, R∈S where ( ! ) 1 Z f(x) hfiR,φ = inf λ > 0 : φ ≤ 1 dx. |R| R λ

Interesting norms include the cases φ(x) = x[log(e + x)]α, α > 0, and φ(x) = xp, p > 1. We can extend our main theorem to the following:

α 1 Theorem 4.6. Let φ(x) = x[log(e + x)] , 0 < α < 2 . Then there cannot be any (φ, 1)-sparse domination for the strong maximal function or the martingale transform Tσ.

Proof. Fix k  m. We need to first slightly modify the measure µ to get a function that lies in the correct normed space. Let ν be the same as before and define

1 1 X Qp f = , #P |Qp| p∈P

−2m where for each p ∈ P, Qp 3 p is the small cube of side-length ∼ 2 that is constructed in the proof of Theorem 4.2. By the exact same argument in Section 2, it is easy to check that the same lower bound for the bilinear forms associated to MS and the special martingale transform

Tσ still hold true, i.e. k k hMS(f), νi & 2 , hTσ(f), νi & 2 .

We shall now focus on showing the upper bound. For every Λ-Carleson collection S we will show

119 that X  22k  hfi hνi |R| Λkmα 1 + . (4.9) R,φ R . m R∈S

2k 1 If (4.9) holds true, then by taking m = 2 and by noting that α < 2 , one immediately sees that Λ blows up to infinity as k → ∞, which completes the proof.

We now prove (4.9). Compared to the estimate of hµiR in the proof of Theorem 4.2, it suffices to show that there is at most an extra factor of mα in the upper estimate of the new

2 −2m−1 average form hfiR,φ. Specifically, in the case |R| = |R ∩ [0, 1) | ≥ 2 , we would like to show that   α |R| α |R| hfiR,φ . · max m , log . (4.10) |R| |R|

Note that if this estimate holds true, then one can proceed as in the proof of Theorem 4.2 to P conclude that contribution to R∈S hfiR,φhνiR|R| from rectangles of this type is controlled by Λk(1 + mα). Similarly, in the case |R| < 2−2m−1, we will show that

mα hfiR,φ . , (4.11) 22m|R| which, combined with the estimates for large rectangles, completes the proof of (4.9).

To see why (4.10) holds, let λ > 0 and one has

 1 (x)  Z   Z P Qq 1 f(x) 1 X q∈P |Qq| φ dx = φ   dx |R| R λ |R| R∩Q λ · #P p∈P∩R p 1  1  = · #{P ∩ R} · |R ∩ Qp| · φ |R| λ · #P · |Qp| |R| 22m  ≤ · 2−2m · φ . |R| λ

 h iα Take λ = |R| · max mα, log |R|  , it suffices to show that the expression above with this λ |R| |R|

120 22m value is . 1. Indeed, note that e  λ , so

α h |R|  i  2m  2m + log − α log max m, log(|R|/|R|) |R| −2m 2 |R| · 2 · φ .  h iα . 1. |R| λ max mα, log |R|  |R|

To see (4.11), again let λ > 0 and one can calculate similarly as above to obtain

1 Z f(x) 22m  1  22m α φ dx ≤ 2−2mφ = log e +  . |R| R λ λ λ λ

α Take λ = m , then it suffices to show that the expression above with this λ value is 1. 22m|R| . −2m−k 22m Indeed, the lower bound |R| & 2 guarantees that λ  e, hence

2m α 2m 1  2  2 |R| α log e +  4m + log |R| − α log m 22m|R| 1. λ λ . mα . .

The argument in the theorem above is not strong enough to produce a contradiction when

1 α ≥ 2 . In other words, it could be possible that the bi-parameter operators that are considered 1 have a (φ, 1)-type sparse bound with some α ≥ 2 . Moreover, it is easy to check that a very similar argument as in Theorem 4.6 barely fails for concluding a contradiction when φ(x) = xp, p > 1. We think that positive or negative results for those types of bounds would be very interesting.

Appendix: Endpoint Properties of the Sparse Form

We will explain why the sparse-form bound considered in this chapter would not obviously

1 contradict well-known mapping properties of MS near L .

Recall that if T is an operator such that for all bounded, compactly supported f, g there is

121 a sparse collection of cubes S such that

X |hT f, gi| ≤ C hfiQhgiQ|Q| Q∈S then T must map L1 into L1,∞. The argument proceeds by using the standard duality result for the quasi-Banach space L1,∞. Indeed, in order to show that T f ∈ L1,∞ it suffices to show that

n 0 0 that for every set G ⊂ R with 0 < |G| < ∞ there exists G ⊂ G with |G| ≤ 2|G | such that if g = 1G0 then

|hT f, gi| . 1 (4.12)

(see for example Section 2.4 in [60]). In particular one applies this estimate to the level sets

G = {T f > µ}, Ge = {−T f > µ}, so g above is usually taken to be the indicator function of a significant subset of some level set of T f. One can use the sparse bound to prove (4.12) after picking G0 ⊂ G to be a certain subset of G where Mf ≤ c|G|−1 (here M is Hardy-Littlewood maximal function); see the appendix in

[19] for details.

We briefly indicate why the argument outlined above does not extend to the bi-parameter 1 −1 setting. We let f = Q0 with Q0 the unit cube, and note that if G = {MSf > N } then |G| ∼ N log N (indeed the level set is essentially a ‘dyadic staircase’ which contains ∼ log N rectangles containing Q0, each of area N). But similar reasoning shows that

−1 |{MSf > c|G| }| ∼ (N log N) log(N log N)  |G|,

0 0 −1 and so it will not be possible to extract a subset G with |G | ∼ |G| such that MSf ≤ c|G| in G0. This step is a crucial part of the proof in [19] that a (one-parameter) sparse bound implies a weak (1,1) mapping, and this argument will not extend to the bi-parameter setting.

122 Part II

Semiperiodic Strichartz Etimates

123

Chapter 5

n d Strichartz Estimates on R and T

In this part of the thesis we discuss some recent results related to Strichartz-type estimates for

n d solutions to the linear Schr¨odingerequation on product manifolds of the form R × T , where d T is a d-dimensional torus (the ‘semiperiodic’ Schr¨odingerequation). We also include a few applications at the end of the next chapter.

We begin with a brief overview of some important background material, in particular re- lated to the linear Schr¨odingerequation in the Euclidean and periodic cases. This summary of

Strichartz estimates in the Euclidean and periodic setting is far from complete, though we hope that it gives some motivation for the main results and arguments in the following chapter.

5.0.1 Strichartz Estimates on Euclidean Space

n If u is a solution to the linear Schr¨odingerequation on R

  −i∂tu + ∆u = 0, (5.1)  u(x, t) = u0

125 then by taking the Fourier transform and solving the resulting ODE, we see that solutions may be represented by the integral equation

Z 2πi(x·ξ+t|ξ|2) u(x, t) = uc0(ξ)e dξ, Rn at least if u0 is sufficiently smooth. We may therefore interpret solutions to (5.1) as the space- time Fourier extension of the measure

2 dµ = uc0(ξ)δ(τ − |ξ| )dξdτ,

n n 2 n+1 which is supported on the paraboloid P = {(ξ, τ) ∈ R × R : τ = |ξ| } in R . In particular we have Z u(t, x) = e2πi(x·ξ+tτ)dµ(ξ, τ). P n

This allows us to apply some deep results from Fourier restriction theory to analyze solutions to (5.1) (and also solutions to nonlinear versions of this equation). A fundamental result in this direction is the Stein-Tomas theorem, which in particular implies that if

S = {(ξ, τ) ∈ P n : |ξ| ≤ 1} then Z 2πi(x·ξ+tτ) e dµ(ξ, τ) Lp( n) ≤ Cku0kL2(Rn×R) (5.2) S R

2(n+2) 2(n+2) for p ≥ n (see, for example, [65]). Moreover, in the endpoint case p = n a parabolic rescaling (x, t) → (λx, λ2t) shows that (5.2) remains true with the same constant if we replace

S by any compact subset of P n. Then by a simple limiting argument we see that our solution u satisfies the following space-time estimate:

kuk 2(n+2) ≤ Cku0kL2(Rn). L n (Rn×R)

126 This is the simplest example of a Strichartz estimate. It turns out to be one of many possible space-time estimates one can prove for u:

it∆ Proposition 5.1 (Euclidean Strichartz estimates). Let u = e u0 be a solution to (5.1). Sup-

2 n n pose q + p = 2 . Then it∆ ke u0k q p ku0k 2 n . (5.3) Lt Lx . L (R )

n The mixed norm is taken on the whole space Rt ×Rx, and so is in particular global in time. The original proof of the case q = p due to Strichartz can be found in [82], and a proof of the more general case can be found in [84]. By considering rescaled functions u0(x) → u0(λx) it is easy

2 n n to check that the relation q + p = 2 is necessary. A crucial element of the proof of (5.3) is the − n fact that solutions disperse in time like t 2 , which can be seen, for example, via the identity

Z 2 − n i|x−y| u(x, t) = ct 2 u0(y)e 4t dy Rn for u0 sufficiently nice (or from a stationary phase argument). Indeed, it is because of this exact 4p decay in t that one has q = n(p−2) in (5.3). Strichartz estimates are also important tools in the analysis of nonlinear Schr¨odingerequations of the form

r−1 iut + ∆u = ±|u| u

(see [84], for example).

5.0.2 Strichartz Estimates on the Torus

Over the last few decades there has been a wide range of research concerning Strichartz estimates

n of the type (5.3) for solutions to Schr¨odingerequations on manifolds other than R . A particular d example of interest is the periodic equation on the torus T .

d In the case where T is the standard torus one checks that solutions to the analogue of (5.1)

127 have the representation

X 2πi(x·m+t|m|2) u(x, t) = ame , (5.4) m∈Zd where

X 2πim·x u0(x) = ame . m∈Zd Since the solution is periodic in t it is impossible to prove analogues of (5.3) which are global in time, although one can hope to prove local variants of the form

kuk p d ≤ CI ku0k 2 d , Lx,t(T ×I) L (T ) where I is a bounded time interval. The study of these type of estimates dates back to work of

p Bourgain [70], though it is only very recently that the full range of (essentially sharp) local Lt,x estimates have been proved as a consequence of Bourgain and Demeter’s `2 decoupling theorem

[71]. Their result is the following:

it∆ d Proposition 5.2 ([71]). Let u(x, t) = e T u0 be a solution to the Schr¨odingerequation on a

d 2(d+2) (possibly irrational) torus T . Then if p ≥ d one has

1 it∆ d p ke T u0kLp(Td×I) . |I| ku0kHs(Td) (5.5)

d d+2 for any s > 2 − p .

d d+2 See also the work of Killip and Visan [81], which sharpens this estimate to s = 2 − p in the 2(d+2) non-endpoint case p > d . This dependence on s is essentially sharp, as can be seen by considering a Knapp-type example (see Proposition 6.5 in the next chapter).

We note that functions of the form (5.4) have Fourier support on lattice points of the form

(m, |m|2) ⊂ P d, although the Stein-Tomas theorem is no longer enough to prove all cases of

(5.5) (it is possible to prove partial results with a larger loss in s by using the Stein-Tomas theorem, however). Bourgain was also able to obtain certain cases of (5.5) in [70] by using

128 number-theoretic arugments, inspired in part by the proof of the Stein-Tomas theorem. There were also several other partial results by other authors in the subsequent years (see the references

2(d+2) 2 in [71], [81]). The only known proof of (5.5) in the full range p ≥ d requires the deep ` decoupling theorem of Bourgain and Demeter, which we will discuss further in the next chapter.

We end the section by pointing out a key difference from the Euclidean setting which will be relevant in the next chapter. If PN is a smooth Littlewood-Paley frequency cut-off to scale N ≥ 1, then (5.5) says that for any  > 0 one has

it∆ d  ke T PN u0k 2(d+2) ≤ CN kPN u0kL2(Td). L d (Td×[0,1])

This contrasts the Euclidean setting, where there is no loss of N  at the Stein-Tomas endpoint.

It turns out that in the periodic case this loss is necessary, as shown by Bourgain. Indeed, in the case d = 1, p = 6 Bourgain showed that if

it∆T ke PN fkL6(T×[0,1]) ≤ AkPN fkL2(T)

1 PN 2πimx then A ≥ c(log N) 6 . In particular this lower bound holds when f = m=1 e , as can be shown by restricting the L6 norm to ‘major arcs’ where x, t are well-approximated by certain rational numbers. See [70] for details. Bourgain’s argument also extends to higher dimensions

(see [83] for details in the case d = 2, p = 4).

129 130 Chapter 6

The Semiperiodic Case

n d d We now turn to the setting of product manifolds of the form R × T , where T is a (rational or irrational) d-dimensional torus. There has been recent interest in the behavior of solutions to the linear and nonlinear Schr¨odingerequation on these manifolds (see for example [72],[75],[76],[78],

[79], [85]). In particular, one can exploit dispersive effects coming from the Euclidean component

d of the manifold to obtain stronger asymptotic results than in the setting of T . Indeed, as a starting point one can hope to prove global-in-time Strichartz-type estimates for solutions to

n d d p the linear equation on R × T ([75]). This contrasts the situation on T , where no global Lt,x estimates are possible as we saw above. One can also hope to prove stronger results than in the

n d d more general setting of R × M , where M is a d-dimensional Riemannian manifold, since the presence of the torus facilitates Fourier-analytic and number-theoretic methods in the vein of

[70], [71].

The main object of study for the rest of the chapter is the linear semiperiodic Schr¨odinger equation   −i∂tu + ∆ n× d u = 0, R T (6.1)  n d u(x, y, 0) = f(x, y) x ∈ R , y ∈ T .

We assume that our initial data f is smooth and rapidly decaying, which we can do with no loss of generality as long as our estimates do not depend on the smoothness of f. For most

131 d of our arguments in the following sections we will also assume that T is the standard torus d d T ∼ [0, 1] . In this case solutions to (6.1) can be represented by

Z 2 2 it∆ n d X 2πi(x·ξ+y·m+t(|ξ| +|m| )) e R ×T f(x, y) = fcm(ξ)e dξ, n d R m∈Z where

X 2πiy·m f(x, y) = fm(x)e . m∈Zd

d Qd In the more general setting of an irrational torus T ∼ i=1[0, βi] we have a similar represen- 2 P 2 2 tation formula obtained by replacing |m| above by i βi mi . We will explain when necessary d Qd how to adapt our arguments to the case T ∼ i=1[0, βi]. We will begin by surveying some earlier results in lower dimensions.

6.1 Local Estimates on R × T

Since solutions to (6.1) are periodic in at least one direction, it is not immediately clear how one can generalize the classical Euclidean Strichartz estimates in the semiperiodic case. A starting point is to consider local variants of these estimates, as in the case of the torus. The first result in this direction was proved by Takaoka and Tzvetkov [83], who studied (6.1) and the cubic NLS

2 2 on R × T. This case is in some sense ‘intermediate’ between R and T , and it turns out that one can prove the direct analogue of (5.5) with no extra loss of derivatives. The main geometric

2 insight in their proof is the following lemma. Note that the corresponding result on Z (which 2 is dual to T ) fails, since classical lattice-point-counting arguments show that there must be a loss of an extra logarithmic factor (recall the logarithmic loss in Bourgain’s example from the end of the last chapter).

Lemma 6.1. Suppose A ≥ 1 and ω > 0. We have

X 2 2 |{(ξ, m) ∈ R × Z : ξ + m ∈ [A, A + ω]}| . ω, m∈Z

132 and likewise

X 2 2 |{(ξ, m) ∈ R × Z : ξ + (m + 1/2) ∈ [A, A + ω]}| . ω. m∈Z

2 2 2 2 Proof. Let Lm = |{(ξ, m): ξ + m ∈ [A, A + ω]}|. Note that if ξ + m ∈ [A, A + ω] then √ |m| ≤ A + ω, hence we need to estimate

X Lm. √ |m|≤ A+ω

By symmetry it suffices to consider the positive m values. √ If m ≤ A is fixed then

p 2 p 2 Lm = 2( A + ω − m − A − m ) 2ω = √ √ . A + ω − m2 + A − m2

In this case it follows that

√ b Ac √ X Z A ω Lm √ dx . 2 m=1 0 A − x Z 1 1 ω √ dz ω. . 2 . 0 1 − z √ √ Now if A < m ≤ A + ω we instead have

p 2 Lm = 2 A + ω − m

Therefore by bounding the sum by an integral and applying a simple change of variables we get

√ b A+ωc Z 1 X p 2 (A + ω)ω Lm . (A + ω) √ 1 + x dx . √ √ . √ √ A (A + ω) + A + ω A m=b Ac+1 A+ω

The last term is smaller than ω, so this completes the proof of the first part of the lemma. The

133 proof of the second part is essentially the same, since m is just shifted by a small constant factor.

An immediate consequence of this lemma is the following.

0 Corollary 6.1. Suppose c ∈ R and c ∈ {0, 1/2}, and let R > 0 be arbitrary. Then

Z X −π((η−c)2+(m−c0)2−R2)2 e dη . 1, m η with implicit constant independent of c, c0,R.

Proof. Decompose the summation and integration into dyadic shells where

2 0 2 2 k |(η − c) + (m − c ) − R | ∼ 2 , k ∈ N.

In the case R ≥ 1 we can directly apply Lemma 6.1 and then use exponential decay to sum over k. If 0 < R < 1 and |m| ≥ 2 then the exponential is just smaller than the previous case, so we can apply the same argument. Finally if m = −1, 0, 1 we use the simple fact that the integral in η is uniformly bounded with respect to m.

We now turn to the first example of a Strichartz estimate on R × T.

Theorem 6.1 (Takaoka-Tzvetkov [83]). One has

it∆R×T ke fbkL4(R×T×[0,1]) ≤ CkfkL2(R×Z).

The proof below is not quite the original argument used by Takaoka and Tzvetkov, although it is similar. In particular the argument relies on the key geometric estimate found in Lemma 6.1, which was also used in [83]. We learned of this argument from discussions with B. Pausader and

M. Christ.

2 −πt 4 it∆ × Proof. After inserting a Gaussian factor e to the Lx,y,t(R × T × [0, 1]) norm of e R T fb

134 it∆ 4 it∆ 4 and expanding |e R×T f| in terms of f we can estimate ke fk by the following b b L4(R×T×[0,1]) integral:

Z  Z 2 X 0 0 0 0 e−πt f(ξ, k)f(ξ0, k0)f(η, m)f(η0, m0)ei(x(ξ−η+ξ −η )+y(k−m+k −m )) ξ,ξ0 × × Rx Ty Rt k,m η,η0 k0,m0

2 2 0 2 0 2 2 2 0 2 0 2  eit(ξ −η +(ξ ) −(η ) +k −m +(k ) −(m ) )dξd~η~ dxdydt where ξ~ = (ξ, ξ0) and ~η = (η, η0). Interchanging the order of integration and computing the

Fourier transforms in the x, y, and t variables, this integral simplifies to

Z X 2 f(ξ, k)f(ξ0, k0)f(η, m)f(η0, m0)δ(k − m + k0 − m0)δ(ξ − η + ξ0 − η0)e−πQ dξd~η,~ ξ,ξ0 k,k0 η,η0 m,m0 where

Q = ξ2 − η2 + (ξ0)2 − (η0)2 + k2 − m2 + (k0)2 − (m0)2 and we interpret the continuous-frequency Dirac mass as

0 0 1 δ(ξ − η + ξ − η ) = lim 1|ξ−η+ξ0−η0|<. →0 2

We generalize a little and note that it will be enough to show that the quadrilinear form

Z X 0 0 0 0 0 0 0 0 0 0 0 −πQ2 f1(ξ, k)f (ξ , k )f (η, m)f (η , m )δ(k − m + k − m )δ(ξ − η + ξ − η )e dξd~η~ (6.2) ξ,ξ0 1 2 2 k,k0 η,η0 m,m0

0 0 is bounded by kf1kL2 kf1kL2 kf2kL2 kf2kL2 .

Assume first that the variables ξ, k, ξ0, k0 are held fixed. Because of the relations given by the Dirac masses we may write η0 = ξ − η + ξ0 + O() and m0 = k − m + k0. Taking the limit as

135  → 0 shows that we can factor

−ξ2 + η2 − (ξ0)2 + (η0)2 = η2 − ξ2 − (ξ0)2 + (ξ − η + ξ0)2

= 2[(η − c)2 − r2], where c = (ξ + ξ0)/2 and r = (ξ − ξ0)/2 are independent of η, η0. We can similarly factor k2 − m2 + (k0)2 − (m0)2. Using these reductions we see that (6.2) is equal to

Z X 0 0 0 0 0 ~ f1(ξ, k)f1(ξ , k )I(ξ, ξ , k, k )dξ, 0 k,k0 ξ,ξ where Z X 0 0 0 −π((η−c)2+(m−c0)2−R2)2 I = f2(η, m)f2(ξ − η + ξ , k − m + k )e dη m η with c, c0 and R independent of m and η. In particular

Z 0 X −π((η−c)2+(m−c0)2−R2)2 I ≤ kf2k∞kf2k∞ · sup e dη. 0 ξ,ξ m η k,k0

The supremum is uniformly bounded by Corollary 6.1, and therefore

0 0 (6.2) ≤ kf1kL1 kf1kL1 kf2kL∞ kf2kL∞ .

But this argument was symmetric in the choice of fixed variables, and so the same reasoning also shows that

0 0 (6.2) ≤ kf2kL1 kf2kL1 kf1kL∞ kf1kL∞ .

Interpolating between these two bounds yields the desired L2 × L2 × L2 × L2 estimate.

136 Note that this argument proves the slightly more general bound

1 1 it∆ × 2 2 ke R T fbkL4( × ×[0,1]) kfk p kfk 0 R T . L (R×Z) Lp (R×Z) for 1 ≤ p ≤ ∞, although it is not clear if this is useful.

6.2 Global Estimates on Rn × Td

n Since a solution u(x, y, t) to (6.1) behaves like a solution to the Schr¨odingerequation on R for each fixed y, we expect dispersion on the order of what we saw in the purely Euclidean case as t → ∞. We therefore expect to be able to find some global-in-time (and possibly mixed)

Lebesgue space that u lies in if the initial data is sufficiently nice. Indeed, by applying the triangle inequality to eliminate the periodic component it is easy to check that

it∆R×T ke fkL6(R×T×R) . kfkH1(R×T), (6.3) where p = q = 6 are the admissible exponents in dimension n = 1 from (5.3). Although this loss in derivatives is too large to be useful in practice, the estimate at least shows that if the initial data f = u(x, y, 0) is in some Sobolev space then one has a global space-time estimate for u; the question then becomes (i) what is the lowest regularity we can assume for u(x, y, 0) to obtain an estimate like (6.3), and (ii) for which types of (possibly mixed) Lebesgue spaces we can prove an estimate like (6.3). Of course we should be able to justify our answers to (i) and

(ii) with some applications that parallel the Euclidean case.

It is possible to improve the Hs exponent in (6.3) by using duality and Sobolev embedding, although this approach only works for a limited range of Lp exponents. The following result is due to Tzvetkov and Visciglia.

137 Proposition 6.2 ([85]). One has

it∆ × ke R T fk 6 2 kfk 2 , Lx,tLy . L (Rx×Ty) and hence by Sobolev embedding

it∆ × ke R T fkL6( × × ) kfk 1 . R T R . H 3

The proof proceeds by considering the dual form of the first estimate, which enables us to exploit

it∆ × it∆ it∆ 6 the fact that e R T f = e R (e T f) and then use the (dual) Euclidean Lt,x estimate. Note 1 that one cannot do better than H 3 , as we can see by arguing as in Proposition 6.5 below.

n d n+d n+d However, since R × T locally looks like R (or T ) one hopes to be able to prove ∗ 2(n+d+2) ∗ estimates near the Stein-Tomas endpoint p = n+d . On R × T we have p = 4, and we already saw that a sharp local-in-time estimate does hold for this value of p. On the other hand, since periodic solutions to the Schr¨odingerequation do not disperse in time like in the Euclidean case, we do not expect the periodic component of the manifold to help with global estimates.

q 4p We therefore expect the admissible Lt exponents to be dictated by the relation q = n(p−2) , as in the Euclidean case.

q p n d Unfortunately there does not appear to be any easy way to prove Lt Lx,y estimates on R ×T 2(n+2) for p < n , due to significant obstructions resulting from the periodic component of the solution.

The Hani-Pausader Estimate

Hani and Pausader approached the problem outlined and the end of the last section by changing the norm to include both a local and global component [75]. Their main Strichartz-type estimate is the following, which should be compared with Propositions 5.3 and 5.5:

3 5 it∆ 2 − ke R×T P u k q p 2 N 2 p ku k 2 2 (6.4) N 0 `γ L (R×T ×[γ,γ+1]) . 0 L (R×T )

138 whenever 2 1 1 p > 4 and + = . q p 2

This estimate was then used as a starting point for their analysis of the defocusing quintic

2 nonlinear Schr¨odingerequation on R × T . q p Notice that the value of q in (6.4) is exactly the Strichartz-admissible time endpoint for Lt Lx 3 3 estimates for the Schr¨odingerequation on R, while the loss in N is the same as on T or R (and is in fact the best one can hope for, see Proposition 6.5 below). However, from the theory on

3 3 10 R or T we expect to be able to push the exponent p down to values larger than 3 (or equal 10  to 3 with a possible loss of N ). The rest of the chapter is devoted to the proof that (6.4) is 10 10 indeed true for p > 3 , and also true for p = 3 with an arbitrarily small loss in the power of N. We also extend this result to higher (and lower) dimensions and prove the scale-

n d 2(n+d+2) analogue of (6.4) on R × T for p away from the Stein-Tomas endpoint n+d .

The Main Theorem

Our main theorem is the following.

d Theorem 6.2. Let T be a d-dimensional rational or irrational torus and ∆Rn×Td the Laplacian n d ∗ 2(n+d+2) ∗ 4p on R × T . Let p = n+d and fix p > p . Then if q = q(p) := n(p−2) and q > 2,

 1/q X it∆ n d q ke R ×T u0k ku0k s n d , (6.5) Lp(Rn×Td×[γ−1,γ+1]) . H (R ×T ) γ∈Z

n+d n+d+2 ∗ ∗ where s = 2 − p . Moreover, if p = p then the result holds with q = q(p ) for any s > 0 (with a constant that blows up as s → 0).

∗ 2(n+d+2) p Note that when p = p we have q(p) = n . Setting d = 0 or n = 0, we recover the usual Lt,x n d  Strichartz estimates on R and T , respectively (modulo a loss of N in the Euclidean case). Also q p note that q(p) is exactly the admissible q value corresponding to the Lt Lx Strichartz estimates n on R , which is expected since we heuristically have n directions contributing to dispersion.

139 Indeed, one cannot prove an estimate of type (6.5) for q < q(p) (see Proposition 6.4 below for a proof).

The proof of Theorem 6.2 is in Section 6.4, following a brief review of some preliminary material in Section 6.3. Our argument combines the approach of Hani and Pausader with the decoupling method of Bourgain and Demeter (see the next section for a precise statement of their decoupling theorem). We will initially prove (6.5) in the case u0 = P≤N u0 with an extra loss of N , but we show in Section 6.5 that this loss can be removed away from the Stein-Tomas endpoint p∗. In Section 6.6 we study some applications of Theorem 6.2 to the nonlinear theory

n d of Schr¨odingerequations on R × T . Our two main results in this direction are contained in the following theorem. Recall that the quintic NLS is the equation

4 −i∂tu + ∆Rn×Td u = ±|u| u, u(x, y, 0) = u0(x, y) and the cubic NLS is the equation

2 −i∂tu + ∆Rn×Td u = ±|u| u, u(x, y, 0) = u0(x, y),

1 2 1 and that these equations are H 2 critical on R×T and R ×T, respectively. Below X 2 is a Banach 1 1 ∞ 1 space of functions u : R → H 2 defined in Section 6.6, with the property that X 2 ,→ L (R,H 2 ).

1 Theorem 6.3. The quintic NLS on R×T with initial data u0 ∈ H 2 (R×T) is locally well-posed. 1 Moreover, there exists δ > 0 such that if ku0k 1 < δ then the solution u ∈ X 2 is unique, exists H 2 1 globally in time, and scatters as t → ±∞ in the sense that there are v± ∈ H 2 such that

it∆ × lim ku − e R T v±k 1 = 0. t→±∞ H 2

2 1 2 The cubic NLS on R × T with initial data u0 ∈ H 2 (R × T) is also locally well-posed. 1 Moreover, there exists δ > 0 such that if ku0k 1 < δ then the solution u ∈ X 2 is unique, exists H 2 globally in time, and scatters as t → ±∞

140 Once we have proved Theorem 6.2 we can prove Theorem 6.3 by using some standard ma- chinery which we review at the beginning of Section 6.6. In this setting it is essential to have a global estimate of type (6.5) without any extra loss of N . In particular the result in Proposition

6.6 below, which has a shorter proof than Theorem 6.2, is not sufficient to establish Theorem 6.3.

We note that the local-in-time results in Theorem 6.3 do not require global Strichartz estimates; we will only use the full strength of Theorem 6.2 to establish the small-data global existence and scattering results.

Finally, in Section 6.7 we collect some additional remarks and related open problems.

Notation and Basic Assumptions

As is standard, we write A . B if there is some constant c > 0 depending only on the dimension and various Lebesgue exponents such that A ≤ cB. If A . B and B . A we will write A ∼ B. Moreover, if A ≤ c(α)B where the constant c(α) depends on some parameter α we will write

 A .α B. We will also often write A . N B as short-hand for the expression ‘for all  > 0 there  is c such that A ≤ cN B,’ to avoid having to write expressions involving constant multiples of an arbitrarily small parameter  > 0.

Let S be a rectangle and let S−1 denote the dual rectangle centered at the origin obtained by inverting the side lengths. We let wS be a weight adapted to S in the following sense: wS(x)

−1 decays rapidly for x∈ / S, and wcS(ξ) is supported in a fixed dilate of S . We similarly define S P wS if S is a ball, and if Ω = S S then we let wΩ = S wS. Note that we can construct wS by taking a bump function w adapted to the unit ball such that

1 |w(x)| . (1 + |x|)1000(n+d) and then applying a suitable affine transformation. If c > 0 and S is a ball or rectangle, we also let cS denote the centered dilate by c.

For a dyadic integer N ≥ 1 we let P≤N f = f ∗ ϕN , where ϕN is a smooth function such that

ϕcN is supported in B2N (0). We also let PN denote a smooth Littlewood-Paley cut-off to the

141 annulus AN = {ξ : N/2 ≤ |ξ| < N}. Finally, we assume that all functions are smooth and rapidly decaying. We can do so with no loss of generality as long as our estimates are independent of the smoothness and decay parameters.

6.3 Preliminary Results

The `2 decoupling theorem of Bourgain and Demeter will be an important tool in our argument.

n+1 Theorem 6.4 ([71]). Let N ≥ 1 and suppose f is a smooth function on R such that fb is supported in an O(N −2) neighborhood of the truncated paraboloid

n n 2 P = {(ξ, τ) ∈ R × R : τ = |ξ| , |ξ| ≤ 1}.

−1 Let {θ} be a finitely-overlapping collection of N -caps covering the support of fb, let {ϕθ} be a partition of unity subordinate to this cover, and define fθ = f ∗ ϕˇθ. Then for any ball BN 2 of 2 n+1 radius N in R one has

1   2 +αp X 2 kfk p N kf k p L (wB ) . θ L (wB ) N2 N2 θ

2(n+2) n n+2 for p ≥ 2, where αp = 0 if 2 ≤ p ≤ n and αp = 2 − p otherwise.

Decoupling inequalities date back to work of Wolff [86], who proved an analogue of the above theorem for the cone (in a limited Lp range) and used this estimate to make progress on the local smoothing conjecture for the wave equation. There was partial progress on Theorem 6.4 in certain ranges before the proof of Bourgain and Demeter, and we send the reader to [71] for references. Theorem 6.4, which also implies Wolff’s result in the full range, is proved via a complicated induction-on-scales argument that incorporates some major results in contemporary harmonic analysis (in particular the multilinear restriction and Kakeya theorems play a key role).

We will also use the following discrete analogue of the Hardy-Littlewood-Sobolev inequality.

142 Proposition 6.3 (Discrete Hardy-Littlewood-Sobolev). Suppose 1 < p, q < ∞ and 0 < µ < 1 such that 1 1 + + µ = 2. p q

Then X f(j)g(k) kfk p kgk q . |j − k|µ . ` ` j6=k

Proof. We learned of the following proof from unpublished lecture notes by E.M. Stein. The argument is a modification of one of the possible proofs of the continuous Hardy-Littlewood-

Sobolev inequality.

We let λ = 1 − µ. By duality it suffices to show that the operator

X λ−1 Iλ(f)(n) = f(n − m)|m| m6=0 m∈Z satisfies a bound

kIλfk`q ≤ Ckfk`p

1 1 for q = p − λ. This will follow by H¨older’sinequality if we can prove the following pointwise bound: p 1− p d q q Iλ(f)(n) . M (f)(n) kfk`p , (6.6) where M df is the discrete Hardy-Littlewood maximal operator1

d 1 X M f(n) = sup |f(n − m)|,N(r) = {n ∈ Z : |n| < r}. r>0 N(r) |m|≤r

1 d p p Note that M is bounded from ` (Z) to ` (Z) for 1 < p ≤ ∞ and also weak-type (1,1); the weak-type bound can be proved by imitating the standard covering-lemma argument from the continuous case, and then the `p boundedness follows by interpolating with the trivial `∞ → `∞ bound.

143 To prove (6.6) we write

X λ−1 X λ−1 Iλf(n) = f(n − m)|m| + f(n − m)|m| . 0<|m|

For the first sum we have

∞ X λ−1 X X λ−1 f(n − m)|m| . f(n − m)|m| 0<|m|

λ d . R M f(n).

On the other hand, by H¨older’s inequality

0 1 X λ−1 X (λ−1)p  0 f(n − m)|m| ≤ kfk`p |m| p |m|≥R |m|≥R

p0 1 X −( +1) 0 = kfk`p |m| q p |m|≥R

− 1 . kfk`p R q ,

1 1 using the fact that q = p − λ. Now choose R such that

kfkp R = `q . M df(n)p

1 1 Using the fact that λ + q = p we conclude that for this choice

λ d − 1 Iλf(n) . R M f(n) + kfk`p R q p 1− p d q q . (M f(n)) kfk`p as desired.

144 On the Sharpness of Theorem 6.2

In this subsection we include two arguments demonstrating that one cannot improve on the `q exponent in (6.5), and in the non-endpoint case one cannot improve on the value of s.

4p Proposition 6.4. The estimate (6.5) fails for q < n(p−2) .

q p Proof. This is essentially a consequence of the sharpness of the Lt Lx Strichartz estimates on

n n it∆ n d it∆ n R . Indeed, suppose our initial data f is a function on R . Then e R ×T f(x, y) = e R f(x) and

it∆ n it∆ n R R ke fkLp(Rn×Td×[γ,γ+1]) = ke fkLp(Rn×[γ,γ+1]).

Now an application of Minkowski’s inequality shows that if q ≥ p then for any function F (x, t) one has p q 1   Z  Z γ+1  q  p  q X q kF k q p n ≤ |F (x, t)| dt dx . Lt Lx(R×R ) n γ γ∈Z R

Hence if there is some function Gγ(x) such that |F (x, t)| ∼ |Gγ(x)| for t ∈ [γ, γ + 1] then

kF k q p n kF k q p n . (6.7) Lt Lx(R×R ) . `γ L (R ×[γ,γ+1])

We choose f so that fb is a smooth function supported in a ball of radius one centered at the origin. A stationary phase argument (exploiting standard uncertainty principle heuristics)

it ∆ it ∆ shows that |e 1 f(x)| ∼ |e 2 f(x)| uniformly for t1, t2 ∈ [0, 1]. This property extends to

i(γ+t)∆ it∆ iγ∆ iγ∆ t1, t2 ∈ [γ, γ + 1] since e f = e (e f) and e f has the same Fourier support as f. But then by (6.7) we have

it∆ n it∆ n ke R fk q p n ke R fk q p n , Lt Lx(R×R ) . `γ L (R ×[γ,γ+1])

145 and therefore if Theorem 6.2 held for this choice of p, q we would obtain

it∆ n ke R fk q p n kfk 2 n . Lt Lx(R×R ) . L (R )

This estimate remains true for the rescaled functions fλ(x) = f(λx) when λ ≤ 1, since the

Fourier support of fλ remains inside the unit ball. But then a simple change of variables implies that

2 n n it∆ n + − ke R fk q p n λ q p 2 kfk 2 n , Lt Lx(R×R ) . L (R )

2 n n 4p and in particular we must have q + p ≤ 2 and hence q ≥ n(p−2) .

2(n+d+2) n+d n+d+2 Proposition 6.5. Suppose p ≥ n+d . Then (6.5) is false for s < 2 − p .

Proof. The example demonstrating the sharpness of s is essentially the Knapp counterexample, although there are some minor extra technicalities involved. The basic idea is that if our initial data is supported in a very small ball centered at the origin, then for a short time interval there is no difference between the Euclidean and semi-periodic cases; but since there are extremal

n+d examples for the equation on R that are highly localized in space (variants of the Knapp example), one cannot do any better than in the Euclidean setting.

d 1 1 d It suffices to prove the claim for q = ∞. We identify T ∼ [− 2 , 2 ] and choose a smooth and c c n+d positive function φN (x, y) = φ(Nx, Ny) supported in [− N , N ] such that φcN decays rapidly n+d outside the ball of radius N in R centered at the origin. Note that we can also view φN (x, y) n d as a function on R × T . Now let ϕN (y) = ϕ(Ny) be a smooth function that rapidly decays c c d outside [− 10N , 10N ] , with ϕcN supported in a ball of radius ∼ N centered at the origin in d 2 c c R . Also let χN 2 (t) = χ(N t) be a smooth function decaying outside [ 20N 2 , 10N 2 ] with Fourier support in an interval of length ∼ N 2. We have

it∆ n d it∆ n d |e R ×T φN (x, y)| ≥ c|e R ×T φN (x, y) · ϕN (y)χN 2 (t)|,

146 and by computing Fourier transforms we see that

Z it∆ n d X 2 2 2πi(x·ξ+y·η+tτ) e R ×T φN (x, y)·ϕ(y) = ϕN (η−m)φcN (ξ, m)χN 2 (τ−|ξ| −|m| )e dηdξdτ. n+d+1 c d R m (6.8)

After rescaling (ξ, η, τ) → (N −1ξ, N −1η, N −2τ), we see that (6.8) is equal to

Z X 2 ϕ(η − N −1m)φb(ξ, N −1m)χ(τ − |ξ|2 − N −2|m|2)e2πi((Nx)·ξ+(Ny)·η+(N t)·τ)dηdξdτ n+d+1 b b R m := E(Nx, Ny, N 2t).

The key point is that E(x, y, t) has space-time Fourier support in a ball of constant radius, independent of N. We have therefore shown that

1 1 Z N2 Z Z N2 Z it∆ n d p 2 p |e R ×T φ| & |E(Nx, Ny, N t)| dxdydt 1 n× d 1 Bn+d 2N2 R T 2N2 1 N Z 1 Z N −(n+d+2) |E(x, y, t)|pdxdydt. & 1 n+d 2 B1

p p(n+d) − 2 Now if the estimate (6.5) holds with constant A, then since kφkL2 ∼ N we must have

Z 1 Z p p(n+d) −(n+d+2) p A N 2 |E(x, y, t)| dxdydt. & 1 n+d 2 B1

Finally, since E has space-time Fourier support in a ball of bounded radius, one easily checks

n+d n+d+2 1 n+d 2 − p that |E| ≥ c when (t, x, y) ∈ [ 2 , 1] × B1 . Therefore A & N as claimed.

6.4 The Decoupling Argument

We begin by proving Theorem 6.2 with a loss of N .

147 ∗ 4p Proposition 6.6. For all p ≥ p and q ≥ n(p−2) with q > 2 we have

1/q   n+d n+d+2 X it∆ n d q + − ke R ×T P≤N fk  N 2 p kfk 2 n d . Lp(Rn×Td×[γ−1,γ+1]) . L (R ×T ) γ∈Z

We show in Section 6.5 that this loss of N  can be removed for p away from the endpoint.

Since our arguments use the `2 decoupling theorem to handle the torus component of the

d manifold, the proposition is true regardless of whether T is the standard torus or irrational torus. We include some more details in comments below.

6.4.1 Main Lemmas

In this subsection we collect some preliminary results. For the rest of the section we will write

∆ = ∆Rn×Td .

n d Lemma 6.7. Suppose g, gl are Schwartz functions on R × T with g = P≤N g, such that

Z X 2πi(x·ξ+y·l) g(x, y) = gl(ξ)e dξ. n b d [−N,N] l∈Z |l|≤N

n Cover [−N,N] by finitely-overlapping cubes Qk of side-length ∼ 1, let {φk} be a partition of unity adapted to the {Q }, and define g = e2πiy·mF −1(g φ ). Also let p∗ = 2(n+d+2) and k θm,k x bm k n+d p ≥ p∗. Then for any time interval I of length ∼ 1 we have

 1/2 it∆ + n+d − n+d+2 X it∆ 2 ke gk p n d N 2 p ke g w k , L (R ×T ×I) . θm,k I Lp(Rn×Td×R) m,k where wI is a bump function adapted to I.

The proof of Lemma 6.7 is similar to the proof of the discrete restriction theorem in [71]. One

n d n+d 2 approximates functions on R × T by functions on R , applies the ` decoupling theorem, and then takes limits. The details are in Section 6.4.3 below.

148 P 2πiny n d Lemma 6.8. Let h(x, y, t) = l hl(x, t)e be a Schwartz function on R × T × [−1, 1] such n that hl(x, t) 6= 0 for at most R values of l. Let ψ be a bump function on R supported in B1 and define Z X −1 2πi(x·ξ+y·l+(t+γ)(|ξ|2+|l|2)) Kγ(x, y, t) = ψ(N ξ)e dξ n d R l∈Z |l|≤cN for γ ∈ Z with |γ| ≥ 1. Then for all p > 2

p−2 −µ kK ∗ hk p n d R p |γ| khk 0 , (6.9) γ L (R ×T ×[−1,1]) . Lp (Rn×Td×[−1,1])

n(p−2) ∗ n where µ = 2p . In particular, if p = p we have µ = n+d+2 .

d Proof. Let S ⊂ Z be the set of l such that hl 6= 0. By examining the Fourier coefficients we see that Kγ ∗ h = Kγ,S ∗ h, where

Z X −1 2πi(x·ξ+y·l+(t+γ)(|ξ|2+|l|2)) Kγ,S = ψ(N ξ)e dξ. n l∈S R

−n/2 By a stationary phase argument and the triangle inequality we have kKγ,SkL∞ . R|γ| , and −n/2 therefore kKγ ∗ hkL∞ . R|γ| khkL1 . On the other hand, since t is in a bounded interval we can use Plancharel’s theorem to get kKγ,S ∗ hkL2 . khkL2 . Interpolating these estimates yields (p−2) p −µ kKγ,S ∗ hkLp . R |γ| khkLp0 , and this completes the proof since Kγ ∗ h = Kγ,S ∗ h.

Finally, we record the following local Strichartz estimate.

Proposition 6.9. For any bounded time interval I and p ≥ p∗ one has

n+d n+d+2 it∆ n d + − R ×T 2 p ke P≤N fkLp(Rn×Td×I) .I, N kfkL2(Rn×Td) and Z it∆ + n+d − n+d+2 e P h(x, y, t)dt N 2 p khk 0 . ≤N L2 ( n× d) .I, Lp ( n× d×I) I x,y R T x,y,t R T

149 Proof. We prove the first estimate since the second follows easily by duality.

Suppose f = P≤N f. By interpolating with p = ∞ (via Bernstein’s inequality) it suffices to prove the endpoint case p = p∗. By Lemma 6.7

 1/2 it∆  X it∆ 2 ke fk p n d N ke f · w k . L (R ×T ×I) .,I θm,k I Lp(Rn×Td×R) m,k

By Plancharel’s theorem it suffices to prove the desired estimate when f = Pθf and θ = θm,k. In this case we apply H¨older’sinequality in time to get

it∆ it∆ ke f · wI k p n d ke f k q p , θ L (R ×T ×R) . θ Lt Lx

2(n+d+2) q p n where q = n is the admissible time exponent for the Lt Lx Strichartz estimate on R . Applying this Strichartz bound completes the proof.

Remark 6.4.1. In the special case n = d = 1 we can instead apply the stronger local Strichartz estimate

it∆R×T ke fkL4(R×T×I) ≤ CI kfkL2(R×T) due to Takaoka and Tzvetkov [83]. This will simplify the -removal argument in Section 4 when n = d = 1.

6.4.2 Main Argument

We prove Proposition 6.6 in the case p = p∗. The the case p > p∗ is essentially the same (or

4p∗ 2(n+d+2) can be obtained by interpolation). Let q = n(p∗−2) = n and let wγ be a bump function adapted to [γ − 1, γ + 1]. We wish to show that

 1/q X it∆ q  ke P≤N f · wγk ∗  N kfk 2 n d . (6.10) Lp (Rn×Td×R) . L (R ×T ) γ∈Z

150 We will assume throughout the section that f = P≤N f. Let M(f) denote the mixed norm on the

 left-hand side of (6.10). In order to show that M(f) . N kfkL2 , we decouple the frequencies to reduce to the case where f has small Fourier support. Let fθm,k be defined as in Lemma 6.7. q Then by applying Lemma 6.7 for each γ and using Minkowski’s inequality in ` 2 we obtain

  q/21/q  X X it∆ 2 M(f)  N ke fθ · wγk p∗ n d . m,k L (R ×T ×R) γ m,k   2/q1/2  X X it∆ q  N ke fθ · wγk ∗ . m,k Lp (Rn×Td×R) m,k γ  1/2  X 2 = CN M(fθm,k ) . (6.11) m,k

To complete the proof, we claim that

M(f ) kf k 2 n d (6.12) θm,k . θm,k L (R ×T ) for each m, k. This will be enough, since by (6.11) and Plancharel’s theorem we then have

 1/2  1/2  X 2  X 2  2 M(f) . N M(fθm,k ) . N kfθm,k kL2 . N kfkL . m,k m,k

Now (6.12) shows that in order to prove (6.10) we can in fact assume that the Fourier transform

n d of f is supported in a cube of side length ∼ 1 in the frequency space R × Z .

Let Pθf denote a smooth frequency cut-off of f onto θ. Then by H¨older’sinequality in time one has

 1/q  1/q X it∆ q X it∆ q ke Pθf · wγk p∗ n d . ke Pθf · wγk q p∗ n d , L (R ×T ×R) Lt L (R ×T ) γ∈Z γ∈Z

151 2(n+d+2) ∗ P where q = n > p . It is straightforward to check that γ wγ . 1, and therefore

 1/q X it∆ q it∆ ∗ ke Pθf · wγk q p∗ n d . ke Pθfk q p . Lt L (R ×T ) Lt Lx,y γ∈Z

q p∗ But Pθf only has one non-zero Fourier coefficient, and therefore we can use the Euclidean Lt Lx Strichartz estimate to finally obtain

it∆ ke Pθfk q p∗ . kPθfkL2( n) . kPθfkL2( n× d), Lt Lx,y R R T as desired.

6.4.3 Proof of Decoupling Lemma 6.7

n+d n+d+2 n Recall αp = 2 − p . Let Bl ⊂ R be a fixed ball of radius N, and let wl be a smooth weight adapted to Bl × [−1, 1]. To prove the lemma it will suffice to show that

 1/2 it∆ +αp X it∆ 2 ke gk p d N ke g k p . (6.13) L (Bl×T ×[−1,1]) . θm,k L (wl) m,k

n d Then to prove the full estimate on R × T , we can choose a finitely-overlapping collection n of balls Bl of radius N that cover R , and then apply (6.13) in each Bl and use Minkowski’s inequality to sum:

it∆ p X it∆ p ke gk p n d ≤ ke gk p d L (R ×T ×[−1,1]) L (Bl×T ×[−1,1]) l  p/2 +αp X X it∆ 2 N ke g k p . θm,k L (wl) l m,k  p/2 +αp X it∆ 2 . N ke gθm,k kLp(w) , m,k

P where the wl are weights adapted to Bl × [−1, 1] and w = l wl. Since w ≤ Cw[−1,1] this will complete the proof of Lemma 6.7.

152 it∆ n+d Let u(x, y, t) = e g(x, y). We begin by rescaling u0 to have frequency support in [−1, 1] . Note that

Z X 2 2 u(N −1x, N −1y, N −2t) = N n g(Nξ, Nm)ei(x·ξ+y·m+t(|ξ| +|m| )) dξ, n b B1 −1 d d m∈N Z ∩B1

n d n d where B1 and B1 are [−1, 1] and [−1, 1] , respectively. We let f(ξ, m) = gb(Nξ, Nm), so if −1 d d n ΛN = N Z ∩ [−1, 1] then f is supported on B1 × ΛN . Below we will exploit the fact that f can be viewed as a function on n-dimensional cubes of size ∼ 1 that are ∼ N −1-separated

n+d in B1 (this follows from the support property of gb; we have one cube for each m). Let Ef denote the extension operator

Z X 2 2 Ef = f(ξ, m)ei(x·ξ+y·m+t(|ξ| +|m| )) dξ. Bn m∈ΛN 1

After applying a change of variables on the spatial side and using periodicity in the y variable, we see that

n −(n+d+2)/p kuk p d = N N kEfk p d 2 L (BN ×T ×[0,1]) L (BN2 ×NT ×[0,N ]) n(1− 1 )− d+2 − d = N p p N p kEfk p 2 d 2 . L (BN2 ×N T ×[0,N ])

∗ 2(n+d+2) In particular, for p = n+d we have

n−d d 2 − p∗ kuk p∗ d = N kEfk p∗ 2 d 2 (6.14) L (BN ×T ×[0,1]) L (BN2 ×N T ×[0,N ]) for any solution u with initial data u0 and f = uc0(Nξ, Nm). We also introduce the operator Ee on C∞([−1, 1]n+d) defined by

Z Z 2 2 i(x·ξ1+y·ξ2+t(|ξ1| +|ξ2| )) Ehe (x, y, t) = h(ξ1, ξ2)e dξ1dξ2 d n B1 B1

n+d n+d+1 (this is the usual extension operator associated to the paraboloid P in R ). Given a

153 n function f on [−1, 1] × ΛN , let

X 1 f δ(ξ , ξ ) = c δ−d1 f(ξ , m), δ < 1 2 d {|ξ2−m|≤δ} 1 N m∈ΛN where cd is a dimensional constant chosen for normalization. Then by Lebesgue differentiation and Fatou’s lemma we have

δ kEfkLp(B ×N 2 d×[0,N 2]) ≤ lim inf kEfe kLp(B ×[0,N 2]d×[0,N 2]), (6.15) N2 T δ→0 N2

2 d 2 d where as usual we identify N T with [0,N ] . We will begin by estimating Efe for arbitrary f n+d δ on B1 before specializing to f and passing to the limit later in the argument.

Fix a parameter R . N and choose a finitely-overlapping collection of (n + d)-dimensional n d boxes θm,k = Ik × Im ⊂ B1 × B1 with the following properties:

−1 (i) Each θm,k has side lengths ∼ R

(ii) Im is centered at m ∈ ΛN

−1 d (iii) m varies over an ∼ R -separated subset of ΛN such that the cubes Im cover [−1, 1] .

n+d Such a collection of θm,k yields a finitely-overlapping tiling of B1 by boxes of side length −1 n+d −1 ∼ R , which corresponds to a covering of the paraboloid P by R -caps. Let {ϕm,k} be a smooth partition of unity subordinate to this cover, and let fm,k = f · ϕm,k. By Theorem 6.4

1   2 +αp X 2 kEfe kLp(B ×[0,N 2]d×[0,N 2])  R kEfe m,kk p , N2 . L (wN2 ) m,k

2 d 2 ∗ where wN 2 is a bump function adapted to BN 2 × [0,N ] × [0,N ]. Setting R = N and p = p yields

1   2  X 2 kEfe kLp∗ (B ×[0,N 2]d×[0,N 2])  N kEfe m,kk p∗ , (6.16) N2 . L (wN2 ) m,k

154 1 δ with the fm,k supported on finitely-overlapping boxes of side lengths ∼ N . Specializing to f and taking a limit as in (6.15), we get

1   2  X δ 2 kEfkLp∗ (B ×N 2 d×[0,N 2]) . lim inf N kEfe m,kk p∗ . N2 T δ→0 L (wN2 ) m,k

Then as a consequence of the scaling (6.14)

1 n−d d   2 2 − p∗ + X δ 2 kukLp∗ (B × d×[0,1]) . lim inf N kEfe m,kk p∗ , (6.17) N T δ→0 L (wN2 ) m,k

where as above f(ξ, n) = uc0(Nξ, Nn) for n ∈ ΛN . Now by rescaling as before and writing out the definition of f δ we get

n−d d d 2 − p∗ δ − p∗ δ N kEfe k p∗ ≈ N ku k p∗ , m,k L (wN2 ) L (v) where

Z Z 2 2 δ X −d −1 −1 2πi(x·ξ1+y·ξ2+t(|ξ1| +|ξ2| )) u = c(Nδ) 1{|ξ2−l|≤Nδ}fm,k(N ξ1,N l)e dξ1dξ2 d n d R R l∈Z |l−m|.1

d and v is a suitable bump function adapted to BN × [0,N] × [−1, 1]. The desired result now follows from the pointwise estimate

Z 2 δ X −1 −1 2πi(x·ξ1+t|ξ1| ) |u | ≤ fm,k(N ξ1,N l)e dξ1 , n d R l∈Z |l−m|.1 which is uniform in δ.

The case p > p∗ follows by a similar argument (or interpolation).

155 6.5 The -Removal Argument

In this section we show that the N  loss from our Strichartz estimates can be removed for

∗ ∗ ∗ 2(n+d+2) p > p , where as before p is the endpoint p = n+d . The argument has a local and global component. In the local case the philosophy is the same as in [70] and [81]: one applies the

Strichartz estimate with  loss in the region where the operator is in some sense ‘small,’ and this leaves a ‘large’ portion of the operator which we can control with direct estimates for the associated kernel K, at least for p away from the endpoint. To extend the -removal to the global case we combine the local estimates with an interpolation argument to handle the global portion of a relevant bilinear form.

6.5.1 The Local-In-Time Case

We begin the argument by first showing that one can remove the N  loss from the local-in-time estimate from Proposition 6.9.

2(n+d+2) Proposition 6.10. Suppose p > n+d . Then

n+d n+d+2 it∆ n d − R ×T 2 p ke P≤N fkLp(Rn×Td×[0,1]) . N kfkL2(Rn×Td).

To prove the proposition we reduce the problem to a situation where we can apply ideas due to Killip and Visan. The argument in this local setting is almost the same as their proof of

-removal in the case n = 0 from [81].

it∆ n d Normalize kfkL2 = 1 and let F = e R ×T P≤N f. Using the level set characterization of the Lp norm and Bernstein’s inequality we get

n+d Z CN 2 p p−1 kF kLp( n× d×[0,1]) = p µ |{(x, y, t): |F (x, y, t)| > µ}| dµ. R T 0

Now if δ > 0 then we can apply Chebyshev’s inequality and Proposition 6.9 to see that

156 n+d N 2 −δ Z ∗ n+d n+d+2 ∗ n+d p−1 p ( − ∗ ) (p−p )( −δ)+ µ |{F > µ}| dµ . N 2 p N 2 0 p( n+d − n+d+2 ) −δ(p−p∗) ≤ CN 2 p N

p( n+d − n+d+2 ) ≤ CN 2 p provided  is chosen small enough. It remains to estimate the large portion of the integral

n+d Z CN 2 p−1 A := n+d µ |{F > µ}| dµ. (6.18) N 2 −δ

n+d −δ iω We let Ω = {F > µ} for µ ≥ N 2 fixed, and set Ωω = {Re(e F ) > µ/2}. Note that there

π 3π is some choice of ω ∈ {0, 2 , π, 2 } such that |Ω| ≤ 4|Ωω|, so we will estimate |Ωω| instead. In particular

2 2 µ |Ωω| h1Ω ,K ∗ 1Ω i 2 , (6.19) . ω ω Lx,y,t where K is the kernel

Z X 2 2 K(x, y, t) = ψ(N −1l)e2πi(y·l+t|l| ) · ψ(N −1ξ)e2πi(x·ξ+t|ξ| )dξ n d R l∈Z from above.

n Since t ∈ [0, 1] we cannot take advantage of the dispersion coming from the R component n+d of K. Instead we discretize the kernel and proceed as if we were working on T .

R −1 2πi(x·ξ+t|ξ|2) Lemma 6.11. Let K (x, t) = n ψ(N ξ)e dξ. Then if Q0 is the cube of side-length R R 1 centered at the origin we have

X −1 2πi[(x+2tα)·k+t|k|2] |KR(x, t)| . sup ψ(N (α + k))e α∈Q0 n k∈Z |k|≤cN

157 n Proof. Let Qk = Q0 + k, where k ∈ Z . We can write

Z X −1 2πi(x·ξ+t|ξ|2) KR(x, t) = ψ(N ξ)e dξ. Q k∈Zn k

Now making the change of variables ξ = α + k for each k, where α ∈ Q0, we get

Z 2πi(x·α+t|α|2) X −1 2πi[(x+2tα)·k+t|k|2] KR(x, t) = e · ψ(N (α + k))e dα. Q n 0 k∈Z |k|≤cN

The lemma follows by taking absolute values inside the integral and taking the supremum in

α.

We will now use a pointwise estimate for K originally due to Bourgain [70]. Given integers

a 1 1 ≤ q < N and 1 ≤ a < q with (a, q) = 1, let Sq,a = {t ∈ [0, 1] : |t − q | ≤ qN }. We apply Lemma 6.11 and the proof of Lemma 3.18 in [70] for fixed α to obtain 2

n+d 1 N |K(x, y, t)| Sq,a (t) . n+d n+d . (6.20) 2 n+d a 2 q (1 + N |t − q | )

Following Killip and Visan [81], define the ‘large’ set

T = t ∈ [0, 1] : qN 2|t − a/q| ≤ N 2ρ for some q ≤ N 2ρ, and (a, q) = 1 , where ρ > 0 is a small parameter to be determined below. Also define

Ke(x, y, t) = K(x, y, t)1T (t), and observe that from (6.20)

n+d−(n+d)ρ |K − Ke| . N . (6.21)

2 Recall that Dirichlet’s approximation theorem implies that either t ∈ Sq,a for some pair (q, a), or t is close to 0.

158 This implies that

n+d−(n+d)ρ 2 |h1Ω , (K − K) ∗ 1Ω i 2 | N |Ωω| , ω e ω Lx,y,t .

n+d −δ and since µ ≥ N 2 we can absorb the contribution of K − Ke to the left-hand side of (6.19) 2δ provided we take ρ < n+d . To estimate the contribution from Ke we use the following result due to Killip and Visan:

2(n+d+2) Proposition 6.12 ([81]). Suppose r > n+d . Then

2( n+d − n+d+2 ) kK ∗ F k r n d N 2 r kF k 0 , e L (R ×T ×[0,1]) . Lr (Rn×Td×[0,1]) provided ρ is chosen small enough (depending only on n, d, r).

Proof. The estimate follows by applying the argument from [81], Section 2, after bounding the kernel K pointwise via Lemma 6.11 and (6.20).

Applying this proposition to (6.19), we obtain

2 2 2 2( n+d − n+d+2 ) µ |Ωω| |h1Ω , K ∗ 1Ω i 2 | |Ωω| r0 N 2 r . . ω e ω Lx,y,t .

We can then conclude that

r (n+d− 2(n+d+2) ) −r |Ω| ≤ 4|Ωω| . N 2 r µ (6.22)

2(n+d+2) for any r ∈ ( n+d , p). Plugging (6.22) into (6.18) gives

n+d CN 2 r 2(n+d+2) Z 2 (n+d− r ) p−r−1 A . N n+d µ dµ N 2 −δ p( n+d − n+d+2 ) . N 2 p , completing the proof of Proposition 6.10.

159 d Remark 6.5.1. Proposition 6.12 remains true if T is replaced by a d-dimensional irrational torus, after a suitable modification of the pointwise estimate (6.20) to take into account the irrational parameters. See [81] for more details.

6.5.2 The Global Case

We now extend Proposition 6.10 to the global setting and show that for all p > p∗ and q =

4p q(p) = n(p−2) ,

 1/q X it∆ q n+d − n+d+2 ke P≤N fk N 2 p kfk 2 n d . Lp(Rn×Td×[γ−1,γ+1]) . L (R ×T ) γ∈Z

This will complete the proof of Theorem 6.2.

The argument is an adaptation of the bilinear interpolation approach of Keel and Tao [80], and is also inspired by the TT ∗ argument of Hani and Pausader [75]. We will use the fact that we already have both the local estimate without -loss, and the global estimate for q = q(p) with

∗ 2(n+d+2) ∗  loss. We assume below that p > p = n+d is fixed. We are free to take p as close to p as needed, since the remaining cases can be handled by interpolation.

Initial Reductions

−i(t+α)∆ n d Let Uα(t) = e R ×T , and for functions h(x, y, t) let hα(x, y, t) = h(x, y, t + α). By appealing to duality, we see that if T (h, g) is the bilinear form

X X Z 1 Z 1 T (h, g) = hU (s)P h (s),U (t)P g (t)i 2 dsdt (6.23) α ≤N α α+γ ≤N α+γ Lx,y 0 0 α∈Z γ∈Z

2(n+d+2) 4p then it suffices to show that for p > n+d and q = q(p) = n(p−2) we have

2αp 0 0 |T (h, g)| . N khk q p0 n d kgk q p0 n d . (6.24) `γ L (R ×T ×[γ,γ+1]) `γ L (R ×T ×[γ,γ+1])

Note that we can immediately prove (6.24) for the diagonal portion of T where |γ| ≤ 10 by using

160 Cauchy-Schwarz and Proposition 6.10 (in fact this argument gives a stronger estimate with an

2 ` sum). Hence we can assume in (6.23) that |γ| ≥ 10. We can also assume that h = P≤N h and g = P≤N g.

We dyadically decompose T and for j ≥ 3 define

X X Z 1 Z 1 T (h, g) = hU (s)h (s),U (t)g (t)i 2 dsdt, j α α α+γ α+γ Lx,y 0 0 α∈Z γ∈Z 2j ≤|γ|<2j+1 with the goal of showing that

X 2αp 0 0 |Tj(h, g)| . N khk q p0 n d kgk q p0 n d . `γ L (R ×T ×[γ,γ+1]) `γ L (R ×T ×[γ,γ+1]) j≥3

j We claim that we can assume hα(s) is zero for α outside an interval of length 2 . Indeed, let

j j Il = {α : l2 ≤ α < (l + 1)2 } and suppose that for some pair of exponents (a, b) we have

1 0 0 0 0 sup |Tj(h Il , g)| ≤ Akhk`q La kgk`q Lb . l∈Z

Note that for fixed l the terms in Tj(h, g) are zero unless α + γ ∈ 2Il+1. Then

X 1 0 0 1 0 0 |Tj(h, g)| ≤ A kh Il k`q La kg 2Il+1 k`q Lb l X q0 1 X q 1 ≤ A kh1 k  q0 kg1 k  q Il `q0 La0 2Il+1 `q0 Lb0 l l X q0 1 X q0 1 ≤ A kh1 k  q0 kg1 k  q0 Il `q0 La0 2Il+1 `q0 Lb0 l l

≤ cAkhk`q0 La0 kgk`q0 Lb0 ,

0 using the fact that q > q in the second-to-last inequality. Hence it suffices to estimate Tj(h, g) when hα is supported with α ∈ Il for some l, which additionally implies that we can assume g

j is time-supported in an interval of J length ∼ 2 . Note that with these assumptions Tj(h, g) =

161 1 1 T (h Il , g J ), so we will be able to take advantage of the dual of Proposition 6.6.

The First Interpolation

The first step is to prove the following two-parameter family of estimates. The result is similar to Lemma 4.1 in Keel-Tao [80], though we have to interpolate in a smaller range to avoid too large of a loss in N.

1 1 1 1 ∗ Lemma 6.13. For all ( a , b ) in a neighborhood of ( p , p ) with a, b > p we have

c(a,b)+ jβ(a,b) |Tj(h, g)| . N 2 khk`q0 La0 kgk`q0 Lb0 , where   p∗ n+d+2 n+d+2 (1 − a )d + a − b , if a ≤ b c(a, b) =  p∗ n+d+2 n+d+2 (1 − b )d + b − a , if a > b and n n n β(a, b) = + − . 2a 2b p

Proof. We begin by proving the lemma in the case (a, b) = (∞, ∞) and then in the two symmetric cases where a = r and r < b < p, and where b = r and r < a < p. The full range of estimates is then obtained by interpolating between these cases. Recall from above that we can assume h and g have time support in an interval of length ∼ 2j.

First consider (a, b) = (∞, ∞). Since |γ| ≥ 10 in the definition of Tj we can use kernel estimates as in Lemma 6.8 to get

d X −n/2 |Tj(h, g)| . N |γ| khαkL1 kgα+γkL1 . α,γ

n(p−2) j Letting µ = 2p and using the fact that |γ| ∼ 2 , this implies

n d −j p X −µ |Tj(h, g)| . N 2 |γ| khαkL1 kgα+γkL1 , α,γ

162 and then by the discrete Hardy-Littlewood-Sobolev inequality

n d −j p |Tj(h, g)| . N 2 khk`q0 L1 kgk`q0 L1 . (6.25)

This proves the lemma when (a, b) = (∞, ∞).

Next, suppose that a = p∗ and p∗ < b < p. By bringing the sums and time integrals into the inner product in Tj and then applying Cauchy-Schwarz we get

Z Z −it∆ 1 −it∆ 1 |Tj(h, g)| sup e (h(t) I )dt 2 e (g(t) I0 )dt 2 , . Lx,y Lx,y I,I0 R R where I,I0 are time intervals of length ∼ 2j. Applying the dual form of Theorem 6.2 and using

H¨older’sinequality in time (taking into account the fact that both functions have bounded time support) yields the following:

j( 1 − 1 ) j( 1 − 1 ) αb+ q(p∗)0 q0 q(b)0 q0 |Tj(h, g)| . N 2 2 khk`q0 L(p∗)0 kgk`q0 Lb0 .

Here we are using the fact that if s < p then q(s) > q(p), hence q(s)0 < q(p)0. Now

n + d n + d + 2 n + d + 2 n + d + 2 α = − = − , b 2 b p∗ b so this gives the desired power of N. Moreover, we have

1 1 1 1 2 1 1 −  + −  = − − . q(p∗)0 q0 q(b)0 q0 q q(p∗) q(b)

It is easy to check using the definition of q(s) that the last expression simplifies to β(p∗, b) =

β(a, b), giving the desired power of 2j as well. The proof of the symmetric case b = p∗ and p∗ < a < p is essentially the same.

163 The Second Interpolation

1 1 The fact that we have a two-parameter family of estimates in a neighborhood of ( p , p ) in Lemma 6.13 will allow us to sum in j, but we first have to suitably decompose the input functions h, g.

In particular we will use the atomic decomposition due to Keel and Tao.

Lemma 6.14 ([80]). Let (X, µ) be a measure space and 0 < p < ∞. Then any f ∈ Lp(X) can be written as f = P c χ where each χ is a function bounded by O(2−k/p) and supported k∈Z k k k k on a set of measure O(2 ), and the ck are non-negative constants such that kckk`p . kfkLp .

p0 n d We apply the lemma on L (R × T × [γ, γ + 1]) for each γ. This allows us to write

1 X γ γ 1 h(x, y, t) [γ,γ+1](t) = hkχk(x, y, t) [γ,γ+1](t) k∈Z

γ k γ −k/p0 with χk supported on a set of measure O(2 ) and |χk| . 2 for each k, such that

X γ p0 1/p0 |h | khk 0 k . Lp (Rn×Td×[γ,γ+1]) k and hence

X γ p0 1/p0 0 0 |hk| q . khk q p0 n d . (6.26) `γ `γ L (R ×T ×[γ,γ+1]) k Likewise we can decompose

1 X γ γ 1 g(x, y, t) [γ,γ+1](t) = gmϕm(x, y, t) [γ,γ+1](t) m∈Z

γ m γ −m/p0 with ϕm supported on a set of measure O(2 ) and |ϕm| . 2 for each m, such that

X γ p0 1/p0 0 0 |gm| q . kgk q p0 n d . (6.27) `γ `γ L (R ×T ×[γ,γ+1]) m

γ γ Define hkχk such that hkχk = hkχk when t ∈ [γ, γ + 1), and similarly define gmϕm. Then

164 we have X X X X |Tj(h, g)| ≤ |Tj(hkχk, gmϕm)|. (6.28) j≥3 j≥3 k∈Z m∈Z

1 1 1 1 Fix one such pair (k, m). Then for all ( a , b ) in a neighborhood of ( p , p ) Lemma 6.13 implies

jβ(a,b) c(a,b)+ 0 0 |Tj(hkχk, gmϕm)|  2 N kχkhkk q a0 kϕmgmk q b0 . `γ L `γ L − k − m k m jβ(a,b) c(a,b)+ p0 p0 0 0 γ γ  2 N 2 2 2 a 2 b kh k q0 kgmk q0 . k `γ `γ 2 n n 2 n n c(a,b)+ (j− n k)( 2a − 2p ) (j− n m)( 2b − 2p ) γ γ  N 2 2 kh k q0 kgmk q0 (6.29) . k `γ `γ

(with implicit constants depending on , a, b, p). We now optimize in a and b to get

(j− 2 k)( n − n ) −η|j− 2 k| 2 n 2a 2p = 2 n (6.30)

(j− 2 m)( n − n ) −η|j− 2 m| 2 n 2b 2p = 2 n (6.31) for some uniform η > 0. We explain how to control the power of N appearing in (6.29) in the following lemma.

Lemma 6.15. If p is close enough to p∗ and (a, b) is in a small ball around (p, p) then

c(a, b) < 2αp. (6.32)

Moreover, the size of the ball only depends on n, d.

Proof. Suppose without loss of generality that a ≤ b. Then c(a, b) < 2αp is equivalent to the inequality p∗ n + d + 2 n + d + 2 2(n + d + 2) (1 − )d + − + < n + d. (6.33) a a b p

1 1 Note that p < p∗ , hence

2(n + d + 2) 2(n + d + 2) < = n + d. p p∗

165 The lemma will then follow if we can choose a, b such that

p∗ n + d + 2 n + d + 2 2(n + d + 2) (1 − )d + − < n + d − . (6.34) a a b p

This is possible for any a, b close enough to p, provided p has been chosen sufficiently close to p∗

(which we are always free to assume). In particular, if we set a = b = p∗ then the left-hand side of (6.34) is simply 0. But by continuity the strict inequality (6.34) is preserved if we vary a, b in a small neighborhood around p, as long as p is close enough to p∗. This proves (6.33) for (a, b) in some small neighborhood of (p, p), and the size of the neighborhood clearly only depends on p∗ and hence only on n, d.

By combining Lemma 6.15 with (6.30) and (6.31),

2 2 2αp −η|j− k| −η|j− m| γ γ |Tj(hkχk, gmϕm)| N 2 n 2 n kh k q0 kgmk q0 . (6.35) . k `γ `γ

Indeed, by Lemma 6.15 we can choose  small enough (depending only on p, n, d and the choice of a, b) such that c(a, b) +  ≤ 2α(p).

Summing in j then yields

2 2 X X X 2αp X X X −η|j− k| −η|j− m| γ γ |Tj(hkχk, gmϕm)| N 2 n 2 n kh k q0 kgmk q0 . k `γ `γ k m j≥3 k m j≥3

X X 2 2 γ 2αp − n η|k−m| γ . N (1 + |k − m|)2 khkk q0 kgmk q0 . n `γ `γ k m

The right-hand side of this bound is of the form

X X γ q0 1/q0 X γ q0 1/q0 X f(m − k) |hk| |gm| := f(m − k)ckdm m,k γ γ m,k where f(m) is summable in m. By H¨older’sand Young’s inequality we can control this by

166 0 0 Cf kck`p0 kdk`p . Now p > q so by Minkowski’s inequality

0 0 0 p 1 0 q 1 X X γ q  q0  p0 X X γ p  p0  q0 kck`p0 = |hk| ≤ |hk| , k γ γ k

0 which we can control by khk`q0 Lp0 using (6.26). Also note kdk`p ≤ kdk`p0 since p > p , so the same argument (using (6.27)) works to give an acceptable bound for this term as well. We ultimately get

X X X 2αp |Tj(hkχk, gmϕm)| . N khk`q0 Lp0 kgk`q0 Lp0 , k m j≥3 as desired.

6.6 Some Applications

In this section we show how Theorem 6.2 can be used to prove Theorem 6.3. Below all mixed norms of the type `qLp are defined as in Theorem 6.2.

6.6.1 Function Spaces

We will employ the atomic and variational spaces that have frequently been used to study well- posedness problems for dispersive equations (for examples see [74], [77], [75], and [81]). We recall some basic definitions and properties and refer the reader to [74] for proofs.

p Let I ⊂ R be a time interval. Given 1 ≤ p < ∞ and a Hilbert space H, a U (I,H) atom a is defined to be a function a : I → H such that

K X 1 a(t) = [tk−1,tk)φk−1, φk ∈ H k=1 for some partition −∞ < t0 < t1 < ... < tK ≤ ∞, with the additional property that PK−1 p p k=0 kφkkH = 1. Then U (I,H) is the Banach space of functions u : I → H with a de-

167 composition of the form X u = λjaj, j

1 p p where {λj} ∈ ` (C) and aj are U (I,H) atoms. The norm on U is taken to be

X X p kukU p = inf{ |λj| : u = λjaj with aj U -atoms}. j j

We also define the variational space V p(I,H) to be the Banach space of functions v such that

 K  1 X p p kvkV p := sup kv(tk) − v(tk−1)kH < ∞. partitions {tk} k=1

We have the important duality relationship

0 (U p)∗ = V p .

We introduce two futher spaces Xs and Y s which we will use to carry out the main iteration

p −it∆ p argument. First let U∆ denote the space of functions such that e u ∈ U , and similarly define p s V∆. Let {Cz}z∈Zn+d be a tiling of frequency space by cubes of side-length ∼ 1, and define X0(R) s and Y (R) to be the spaces of functions u and v, respectively, such that the following norms are finite:

2 X 2s 2 kukXs( ) := hzi kPCz uk 2 2 n d 0 R U∆(R,L (R ×T )) z∈Zn+d

2 X 2s 2 kvkY s( ) := hzi kPCz vk 2 2 n d . R V∆(R,L (R ×T )) z∈Zn+d s s One can likewise define the time-restriction norms X0(I),Y (I) for I ⊂ R (see [75]). Recall that we have the sequence of embeddings

2 s s s 2 s p s ∞ s U∆(R,H ) ,→ X0(R) ,→ Y (R) ,→ V∆(R,H ) ,→ U∆(R,H ) ,→ L (R,H )

168 for any p > 2. Finally, since we wish to study scattering we introduce the modified space Xs as in [75] defined to be

s −it∆ s it∆ s X ( ) := {u : φ−∞ := lim e u(t) exists in H , and u(t) − e φ−∞ ∈ X0( )}, R t→−∞ R with the norm

2 2 it∆ 2 kuk s := kφ−∞k s n d + ku − e φ−∞k s . X (R) H (R ×T ) X0 (R)

s s One can also define time-restriction spaces X (I) similarly to X0(I). Note that it is not suf- s s ficient to work with X0(R) if we wish to study scattering as t → −∞ since if u ∈ X0 then 3 limt→−∞ u(t) = 0 in norm.

6.6.2 The Quintic Equation on R × T

p As a first step we can transfer our Strichartz estimates to a result about U∆ spaces.

n d Lemma 6.16. Let q, p be as in Theorem 6.2. Let C be a cube in frequency space R × Z with side-length ∼ N ≥ 1 and let I ⊂ R be a time interval. Then

n+d n+d+2 2 − p k1 · P uk q p n d N kP uk min(p,q) I C `γ L ( × ×[γ,γ+1]) . C 2 n d R T U∆ (I,L (R ×T ))

−it∆ The lemma is a direct consequence of the atomic decomposition of e PC u.

1 We begin with the case of the quintic equation on R × T with initial data in H 2 (recall this 1 4 equation is H 2 -critical). Let F (u) = |u| u. We will apply an iteration argument to the Duhamel operator Z t i(t−s)∆ u0 + e F (u)(s)ds. (6.36) 0

The presentation here is similar to [81], although we will be able to prove more in this semiperi- odic setting since we have global-in-time estimates.

3 Indeed, given  > 0 one can use the atomic decomposition to decompose u = u1 + u2 where u1(t) = 0 for t < T and ku2(t)kH ≤ ; see Proposition 2.2 in [74].

169 A first step is the following multilinear estimate, which we will see is a corollary of Theorem

6.2 and the basic properties of the function spaces outlined above.

(j) 1 − 1 Lemma 6.17. Let I ⊂ R be a time interval and suppose u ∈ X 2 (I) and v ∈ Y 2 (I). Then

Z Z 5 5 Y (j) Y (j) v · u dxdydt . kvk − 1 ku k 1 . (6.37) Y 2 X 2 (I) I R×T j=1 j=1

Proof. The proof of this lemma follows the same basic approach as in [77] (see also [81], [75], and

[74]). Below all norms are taken with respect to the time interval I (note we can have I = R). 1 1 1 Since we have embeddings X 2 ,→ Y 2 it suffices to prove (6.37) with the X 2 norms replaced by

1 Y 2 norms.

We will use the short-hand uN = P≤N u. By performing Littlewood-Paley decompositions on all six functions and exploiting symmetry of the resulting expression, we reduce matters to showing that

Z Z 5 5 X X Y (j) Y (j) vN · u dxdydt kvk 1 ku k 1 , (6.38) 0 Nj . − 2 2 I × Y Y (I) N0≥1 N1≥N2≥...≥N5 R T j=1 j=1 where Nj ≥ 1 are dyadic integers. Moreover, by Plancharel’s theorem we know that for fixed

N0, ..., N5 the corresponding term in (6.38) vanishes unless the two largest frequencies are com- parable. Hence we have two cases to consider, when N0 ∼ N1 ≥ N2 ≥ ... ≥ N5 (Case I), and when N0 . N1 ∼ N2 ≥ ... ≥ N5 (Case II).

Case I. We fix N0,N1 with N0 ∼ N1 and decompose the support of P≤N0 and P≤N1 into subcubes {Cj} of side-lengths N2. We say Cj ∼ Ck if the sumset Cj + Ck overlaps the Fourier support of P . After decomposing v and u(1) with respect to the C ’s, we see by Plan- ≤2N2 N0 N1 j charel’s theorem that it suffices to bound

Z Z 5 X X Y (j) P v P u(1) · u dxdydt (6.39) Cj N0 Ck Nj I × Cj ∼Ck N2≥N3≥N4≥N5 R T j=2 by a factor that will be summable over N0 and N1 (in particular, if Cj is not comparable to Ck

170 then the integral is 0). We turn to this task now.

We apply the mixed-norm H¨older’sinequality with Lp exponents corresponding to

4 4 4 4 1 1 + + + + + = 1 18 18 18 18 18 18 and `q exponents corresponding to

5 5 5 5 2 2 + + + + + = 1 36 36 36 36 9 9 to obtain

X X (1) (2) (3) (6.39) ≤ kPC vN k 36 9 kPC u k 36 9 ku k 36 9 ku k 36 9 j 0 ` 5 L 2 k N1 ` 5 L 2 N2 ` 5 L 2 N3 ` 5 L 2 Cj ∼Ck N2≥N3≥N4≥N5

(4) (5) · ku k 9 ku k 9 , N4 ` 2 L18 N5 ` 2 L18

where the norms are all localized in time to I. By applying Lemma 6.16 and the embedding

Y 0 ,→ U r for r > 2 we ultimately obtain

5 5 18 18 X X N4 N5 (1) (2) (3) (6.39) kPC vN k 1 kPC u k 1 ku k 1 ku k 1 . 3 7 j 0 − 2 k N1 2 N2 2 N3 2 18 18 Y Y Y Y Cj ∼Ck N2≥N3≥N4≥N5 N2 N3 (4) (5) · ku k 1 ku k 1 . N4 Y 2 N5 Y 2

Now by using Cauchy-Schwarz (or Schur’s test) to sum, we conclude that

X (1) (2) (3) (4) (5) (6.39) kPC vN k 1 kPC u k 1 ku k 1 ku k 1 ku k 1 ku k 1 . j 0 Y − 2 k N1 Y 2 Y 2 Y 2 Y 2 Y 2 Cj ∼Ck 5 (1) Y (j) kvN k 1 ku k 1 ku k 1 . 0 Y − 2 N1 Y 2 Y 2 j=2

171 and therefore

5 X (1) Y (j) (6.38) kvN k 1 ku k 1 ku k 1 . 0 Y − 2 N1 Y 2 Y 2 N0∼N1 j=2 5 Y (j) kvk 1 ku k 1 , . Y − 2 Y 2 j=1 as desired.

Case II. In this case we do not decompose into subcubes, and instead directly apply the mixed-norm H¨older inequality with the same exponents as in Case I and then Lemma 6.16. This leads to a bound of the form

11 5 5 18 18 18 X X X N0 N4 N5 (1) (2) (3) (6.38) kvN k 1 ku k 1 ku k 1 ku k 1 . 7 7 7 0 − 2 N1 2 N2 2 N3 2 18 18 18 Y Y Y Y N1 N0.N1 N1∼N2≥N3≥N4≥N5 N1 N2 N3 (4) (5) · ku k 1 ku k 1 . N4 Y 2 N5 Y 2

Then repeatedly using Cauchy-Schwarz or Schur’s test as before to sum, we get

4 18 5 X N1 (1) (2) Y (j) (6.38) kvk 1 ku k 1 ku k 1 ku k 1 . 4 − 2 N1 2 N2 2 2 18 Y Y Y Y N1∼N2 N2 j=3 5 Y (j) kvk 1 ku k 1 , . Y − 2 Y 2 j=1 completing the proof.

Below we will write Z t I(u) = ei(t−s)∆F (u)(s)ds. 0

172 Proposition 6.18. For any time interval I ⊂ R we have

5 kI(u)k 1 kuk 1 (6.40) X 2 (I) . X 2 (I) and

4 kI(u + w) − I(u)k 1 kwk 1 (kuk 1 + kwk 1 ) . (6.41) X 2 (I) . X 2 (I) X 2 (I) X 2 (I)

Proof. By duality Z Z

kI(u)k 1 ≤ sup v · F (u)dxdydt X 2 (I) v∈Y −1/2(I) I R×T kvk =1 Y −1/2

1 for any u ∈ X 2 (the proof is similar to the proof of Proposition 2.11 in [77], see also [75]). We prove the second part of the proposition, since the first is then a simple consequence.

It suffices to prove that

Z Z 4 4 4 v · (|u + w| (u + w) − |u| u)dxdydt kvk − 1 kwk 1 (kuk 1 + kwk 1 ) . Y 2 X 2 X 2 X 2 I R×T

for a fixed v with kvk 1 = 1. By expanding the expression inside the integral we see that Y − 2 this estimate is an immediate consequence of Lemma 6.17 with u(j) ∈ {±u, ±w, ±u,¯ ±w¯}. This completes the proof.

We can now prove Theorem 6.3 using a standard iteration argument. As above we fix a time interval I ⊂ R containing 0 (in particular we could have I = R).

The Global Case for Small Data

1 Suppose one has small initial data u0 ∈ H 2 (R × T) with

ku0k 1 ≤ η < η0, H 2

173 with η0 some fixed parameter to be determined. We apply a contraction mapping argument to the operator

it∆ Φ(u)(t) := e u0 ± iI(u)(t) on the ball

1 1 2 B := {u ∈ X 2 (I) ∩ CtHx,y(I, × ): kuk 1 ≤ 4η} R T X 2 (I) with respect to the metric d(u, v) = ku − vk 1 . By Proposition 6.18 we have X 2 (I)

it∆ kΦ(u)k 1 ≤ ke u0k 1 + kI(u)k 1 X 2 (I) X 2 (I) X 2 (I)

5 ≤ 3ku0k 1 + Ckuk 1 H 2 X 2 (I)

≤ 3η + C(4η)5 ≤ 4η

provided η0 is chosen small enough. Therefore Φ maps B into itself. Now the second part of Proposition 6.18 implies that

4 d(Φ(u), Φ(v)) ku − vk 1 (kuk 1 + kvk 1 ) . X 2 (I) X 2 (I) X 2 (I)

4 . d(u, v)(8η) 1 ≤ d(u, v), 2

again provided η0 is chosen sufficiently small. Now apply the contraction mapping theorem on

B to obtain a solution. By taking I = R we see in particular that for small initial data in 1 H 2 (R × T) one has global-in-time solutions to the quintic equation. We can additionally show 1 1 that such solutions scatter in H 2 (R × T) by appealing to the fact that if G ∈ X 2 (R) then 1 limt→±∞ G(t) exists in H 2 (for the case t → ∞ see Proposition 2.2 in [74]). Indeed, to prove scattering it suffices to show that if u is a solution to the quintic equation then the Duhamel

1 integral I(u)(t) is conditionally convergent in H 2 as t → ±∞.

We can also prove scattering directly from the argument above with a small amount of

174 additional work. We record some details since these estimates are also useful in the local theory for large data. To facilitate the argument we introduce the following time-divisible norm as in

[75]:   1 X X p( 4 − 1 ) p p kuk = N p 2 k1 P uk , Z(I) I N `q(p)Lp p=9/2,18 N≥1

4p where q(p) = p−2 as above. We also let

3 1 4 4 kukZ0(I) = kukZ(I)kuk 1 . X 2 (I)

Note that kukZ(I) kuk 1 by Lemma 6.16. By repeating the argument given in the proofs . X 2 (I) of Lemma 6.17 and Proposition 6.18 we obtain the stronger results

4 kI(u)k 1 kuk 1 kuk 0 (6.42) X 2 (I) . X 2 Z (I) and

3 kI(u + w) − I(u)k 1 kwk 1 (kuk 1 + kwk 1 )(kukZ(I) + kwkZ(I)) . X 2 (I) . X 2 (I) X 2 (I) X 2 (I)

5 Now if u is a solution in B with kuk 1 ≤ η0 then the norm kuk 0 is uniformly bounded H 2 Z (I) independent of I. Given  > 0, we can therefore find a time T such that if r, t > T then

kI(1[t,r]u)k 1 < . X 2 (R)

t 1 R −is∆ 2 But this implies that I(u)(t) = 0 e F (u)(s)ds is Cauchy in H , and hence converges in norm as t → ∞.

175 The Local Case for Large Data

Suppose ku0k 1 ≤ A and fix δ ≥ 0 to be determined. We apply a contraction mapping argument H 2 to the operator Φ on

1 1 0 2 B := {u ∈ X 2 (I) ∩ CtHx,y(I, × ): kuk 1 ≤ 2A and kukZ(I) ≤ 2δ}. R T X 2 (I)

it∆ 0 0 Now if ke u0kZ(I) ≤ δ then by (6.42) we see that Φ : B → B if Aδ is small enough, and likewise 1 kΦ(u) − Φ(v)k 1 ≤ ku − vk 1 X 2 (I) 2 X 2 (I)

1 if Aδ is small enough. But for any u0 ∈ H 2 we can find a time interval I containing 0 such that

it∆ ke u0kZ(I) ≤ δ, and this completes the argument.

6.6.3 The Cubic Equation on R2 × T

The skeleton of the argument in this setting is essentially the same as what we saw in the last section, so we only provide a brief sketch of the details. The key point here is that the `q(p) exponent in dimension (n, d) = (2, 1) is q(p) = 4p = 2p . It follows that if P3 1 = 1 then 2(p−2) p−2 i=0 pi P3 1 = 1, and therefore one has the quadrilinear estimate i=0 q(pi)

Z Z 3 3 Y (i) Y (i) v · u dxdydt kvk q(p ) ku k q(p ) . (6.43) . ` 0 Lp0 i pi 2 I `I L I R ×T i=1 i=1

2 Note that (6.43) will not hold in general for the admissible exponents r, q(r) on R × T , since 4r with these dimensions q(r) = r−2 . The main nonlinear estimate is the following.

(j) 1 − 1 Lemma 6.19. Let I ⊂ R be a time interval and suppose u ∈ X 2 (I) and v ∈ Y 2 (I). Then

Z Z 3 3 Y (j) Y (j) v · u dxdydt . kvk − 1 ku k 1 . (6.44) 2 Y 2 X 2 (I) I R ×T j=1 j=1

176 Proof. We use the same Littlewood-Paley decomposition argument as the proof of Lemma 6.17 with two cases corresponding to N0 ∼ N1 ≥ N2 ≥ N3 (Case I), and N0 . N1 ∼ N2 ≥ N3 (Case 7 II). In both cases we apply (6.43) with exponents p0, p1, p2 = 2 and p3 = 7.

Case I. As in the proof of Lemma 6.17 we decompose the support of PN0 and PN1 into subcubes Cj of side length N2. It then suffices to estimate

Z Z X X (1) (2) (3) P v P u · u u dxdydt . (6.45) Ck N0 Cj N1 N2 N3 I 2× N0∼N1≥N2≥N3 Cj ∼Ck R T

By (6.43) and then Lemma 6.16 we have

X X (1) (2) (3) (6.45) ≤ kPC vN k 14 7 kPC u k 14 7 ku k 14 7 ku k 14 k 0 ` 3 L 2 j N1 ` 3 L 2 N2 ` 3 L 2 N3 ` 5 L7 N0∼N1≥N2≥N3 Cj ∼Ck N 2 X X 3  7 (1) (2) (3) 1 1 1 1 . kPCk vN0 k − kPCj uN k kuN k kuN k N2 Y 2 1 Y 2 2 Y 2 3 Y 2 N0∼N1≥N2≥N3 Cj ∼Ck which is bounded above by the desired quantity by Cauchy-Schwarz or Schur’s test. Here all

2 spacetimes norms are taken relative to I × R × T. Case II. In this case we estimate

Z Z X (1) (2) (3) v u u u dxdydt (6.46) N0 N1 N2 N3 I R2×T N0.N1∼N2≥N3 by applying (6.43) and Lemma 6.16 with the same exponents as in Case I. We get

4 2 7 7 X N0 N3 (1) (2) (3) (6.46) kvN k 1 ku k 1 ku k 1 ku k 1 . 3 3 0 − 2 N1 2 N2 2 N3 2 7 7 Y Y Y Y N0.N1∼N2≥N3 N1 N2 which is bounded by the desired quantity (again by Cauchy-Schwarz or Schur’s test).

The rest of the cubic case of Theorem 6.3 can now be proved by routine modifications of arguments from the last section, using Lemma 6.19 in place of Lemma 6.17.

177 6.7 Additional Remarks

Remark 6.7.1. Theorem 6.2 remains true with a loss of N  if the operator eit∆ is replaced by

Z itφ(D) X 2πi(x·ξ+y·m+tφ(ξ,m)) e f := fcm(ξ)e dξ, m

3 n+d 2 n+d where φ(ξ, η) is a C function on R such that D φ is uniformly positive-definite on R and

1 |∂I φ(ξ, η)| . 1 1 + (|ξ|2 + |η|2) 2

−2 for any triple index I. Let ρ = (ξ, η) and define the rescaled function φ1(ρ) = N φ(Nξ, Nη).

− 2 By Taylor expansion of the phase we see that if τ ⊂ B1 has radius ∼ N 3 then

Z 2πi(z·ρ+tφ1(ρ)) gb(ρ)e dρ τ

−2 n+d+1 has space-time Fourier support in an O(N )-neighborhood of the paraboloid in R . One then obtains a decoupling result for eitφ(D) by following the iteration scheme outlined in Section

7 of [71], and then completes the argument by following the same steps from Section 3.

It is not clear if one can extend the -removal argument from Section 4 to operators with these types of phases. The main issue is with the local Strichartz estimate in this context. The argument of Killip and Visan relies on some subtle number-theoretic properties of the kernel associated to eit∆, and it is not immediately clear that the same argument will work even if eitφ(D) is a small perturbation of eit∆.

 2(n+d+2) Remark 6.7.2. We do not know if the loss of N is necessary in the endpoint case p = n+d 2(n+d+2) and q = n of Theorem 6.2 for the case when n + d > 2. In the case d = 0 the theorem is true without any loss (this is just the Stein-Tomas theorem), while in the case n = 0 some loss in N is necessary (as shown by Bourgain [70]).

n A simple argument using Sobolev embedding and the Strichartz estimate from R shows

178 2(n+2)  that Theorem 6.2 holds for p = n with no loss of N (note that in this case q = p). We also know that at least in the low-dimensional case n = d = 1 the local Strichartz estimate holds with no loss in N for p = 4, as shown by Takaoka and Tzvetkov [83]. However, as far as we know there are no higher-dimensional analogues of the Takaoka-Tzvetkov result beyond what is present in this paper.

It is possible to prove the endpoint case of Theorem 6.2 when n = d = 1 with no loss of derivatives (that is, with s = 0) by inserting Gaussians adapted to the time intervals [γ, γ + 1] and then expanding out the L4 and `8 norms and computing Fourier transforms [73]. One then obtains a uniform bound on the resulting kernel. This argument heavily relies on the fact that p = 4 and q = 8 are even exponents, which no longer holds as soon as n+d > 2. It is not clear if the endpoint case of the local or global estimate is true without any loss of derivatives in higher dimensions; we at least expect Theorem 6.2 to be true at the endpoint with s = 0 when n > d, although this may be hard to prove.

179 180 Bibliography

[1] Barron, A. Weighted Estimates for Rough Bilinear Singular Integrals via Sparse Domination,

N.Y. Journal of Math 23 (2017) 779-811.

[2] Barron, A., Pipher, J. Sparse bounds for bi-parameter operators using square functions.

Preprint arXiv:1709.05009.

[3] Barron, A. Conde Alonso, J.M., Ou, Y. Rey, G. Sparse domination and the strong maximal

function, Adv. Math. 345 (2019), 1-26.

[4] Benea, C. Bernoct, F. Conservation de certaines propri´et´es`atravers un contrˆole ´epars d’un

op´erateur et applications au projecteur de Leray-Hopf , preprint arXiv:1703.00228

[5] Bernard, A. Espaces H1 de Martingales a Dues Indices. Dualit´eAvec Les Martingales De

Type BMO , Bull. Sc. Math. 103, 1979, p. 297-303

[6] Bernicot, F.; Frey, D.; Petermichl, S. Sharp weighted norm estimates beyond Calder´on-

Zygmund theory, Anal. PDE 9 (2016), no. 5, 1079-1113.

[7] Bilyk, D. On Roth’s orthogonal function method in discrepancy theory. Uniform Distribution

Theory 6 (2011), no.1, 143-184

[8] Buckley, S.M. Estimates for operator norms on weighted spaces and reverse Jensen in- equalities, Trans. Amer. Math. Soc., 340 (1993), no. 1, 253-272.

181 [9] Bui, T.A; Conde-Alonso, J.M.; Duong, X.T.; Hormozi, M. Weighted bounds for multilinear operators with non-smooth kernels, preprint arXiv:1506.07752

[10] Calderon,´ A.P; Zygmund, A. On the existence of certain singular integrals, Acta Math. 88 (1952), 85-139.

[11] Calderon,´ A.P.; Zygmund, A. On singular integrals, Amer. J, Math. 78 (1956), 289-309.

[12] Chang, S-Y.A, Fefferman, R. A Continuous Version of Duality of H1 with BMO on the

Bi-disc. Annals of Mathematics, Second Series, Vol. 112, No. 1 (Jul., 1980), pp. 179-201

[13] Chang, S-Y.A, Fefferman, R. The Calder´on-ZygmundDecomposition on Product Domains,

American Journal of Mathematics, Vol. 104, No. 3 (Jun., 1982), pp. 455-468.

[14] Chang, S-Y.A, Fefferman, R. A Continuous Version of Duality of H1 with BMO on the

Bi-disc, Annals of Mathematics, Second Series, Vol. 112, No. 1 (Jul., 1980), pp. 179-201

[15] Christ, M. Weak type (1, 1) bounds for rough operators, Ann. of Math. (2) 128 (1988), no. 1, 19-42.

[16] Christ, M.; Rubio de Francia, J.L. Weak type (1, 1) bounds for rough operators II, Invent. Math. 93 (1988), no. 1, 225-237.

[17] Coifman, R.; Meyer, Y. On commutators of singular integrals and bilinear singular integrals, Trans. Amer. Math. Soc. 212 (1975), 315-331,

[18] Conde-Alonso, J., Rey, G. A pointwise estimate for positive dyadic shifts and some appli-

cations, Mathematische Annalen August 2016, Volume 365, Issue 3–4, pp 1111-1135

[19] Conde-Alonso, J.M.; Culiuc, A.; Di Plinio, F.; Ou, Y. A sparse domination principle for rough singular integrals, arXiv:1612.09201 (2016).

[20] C´ordoba,A., Fefferman, R. A geometric proof of the strong maximal theorem, Annals of

Mathematics, 102 (1975), 95-100.

182 [21] Culiuc, A.; Di Plinio, F.; Ou, Y. Domination of Multilinear Singular Integrals by Positive

Sparse Forms, arXiv:1603.05317 (2016).

[22] Cruz-Uribe, D.; Martell, J.M.; Perez,´ C. Weights, extrapolation and the the- ory of Rubio de Francia, Operator Theory: Advances and Applications, vol. 215,

Birkh¨auser/SpringerBasel AG, Basel, 2011.

[23] Cruz-Uribe, D.; Naibo, V. Kato-Ponce inequalities on weighted and variable Lebesgue spaces, Differential Integral Equations 29 (2016), no. 9/10, 801-836.

[24] Culiuc, A.; Di Plinio, F.; Ou, Y. Domination of Multilinear Singular Integrals by Positive Sparse Forms, arXiv:1603.05317 (2016).

[25] Culiuc, A.; Di Plinio, F.; Ou, Y. Uniform sparse domination of singular integrals via dyadic shifts, preprint arXiv:1610.01958.

[26] Di Plinio, F.; Lerner, A. K. On weighted norm inequalities for the Carleson and Walsh- Carleson operator, J. Lond. Math. Soc. (2) 90 (2014), no. 3, 654-674.

[27] Domelevo, K, and Petermichl, S. A sharp maximal inequality for differentially subordinate

martingales under a change of law. arXiv:1607.06319

[28] On Projections in L1. Annals of Math. Vol 102, No. 3 (1975), 463-474.

[29] Duoandikoetxea, J. Weighted norm inequalities for homogeneous singular integrals, Trans. Amer. Math. Soc. 336 (1993), 869-880.

[30] Duoandikoetxea, J. Extrapolation of weights revisited: New proofs and sharp bounds, Journal of Functional Analysis Volume 260, Issue 6, 2011, 1886-1901.

[31] Fefferman, R. Strong Differentiation With Respect to Measures, American Journal of Math-

ematics, Vol. 103, No.1 (Feb 1981), pp. 33-40.

[32] Grafakos, L. Classical Fourier Analysis, Third edition, Graduate Texts in Mathematics, 249, Springer, New York, 2014.

183 [33] Grafakos, L. Modern Fourier Analysis, Third edition, Graduate Texts in Mathematics, 250, Springer, New York, 2014.

[34] Grafakos, L.; He, D.; Honz´ık, P. Rough Bilinear Singular Integrals, preprint arXiv:1509.06099 (2015).

[35] Grafakos, L; Martell, J.M. Extrapolation of weighted norm inequalities for multivari- able operators, Journal of Geometric Analysis 14 (2004), no. 1, 19-46.

[36] Grafakos, L.; Torres, R. H. Multilinear Calderon-Zygmund theory, Adv. Math. 165 (2002), 124–164.

[37] Hagelstein, P and Parissis, I. Weighted Solyanik Estimates for the Strong Maximal Function,

preprint arXiv:1410.3402

[38] Hofmann, S. Weak type (1,1) boundedness of singular integrals with nonsmooth kernels, Proc. Amer. Math. Soc. 103 (1988), 260 - 264

[39] Holmes, I., Petermichl, S., and Wick, B. Weighted little bmo and two-weight inequalities for

Journ´ecommutators, preprint arXiv:1701.06526

[40] Holmes, I., Lacey, M., Wick, B. Commutators in the Two-Weight Setting, preprint

arXiv:1506.05747

[41] Hyt¨onen,T. The sharp weighted bound for general Calderon-Zygmund operators, Annals of

Mathematics, vol. 175(3), 1473-1506, 2012.

[42] Hyt¨onen,T., P´erez,C., and Rela, E. Sharp Reverse H¨olderproperty for A∞ weights on spaces of homogeneous type, preprint arXiv:1207.2394

[43] Hyt¨onen,T., Petermichl, S., and Volberg, A. The Sharp Square Function Estimate with

Matrix Weight, preprint arXiv:1702.04569

[44] Hytonen,¨ T.; Roncal, L.; Tapiola, O. Quantitative weighted estimates for rough homogeneous singular integrals, preprint arxiv:1510.05789, to appear, Israel J. Math.

184 [45] Journ´e,J-L, Two problems of Calder´on-Zygmund theory on product-spaces. Annales de

l’institut Fourier, no. 1 (1988), 111-132.

[46] Krause, B.; Lacey, M. T. Sparse Bounds for Maximally Truncated Oscillatory Singular Integrals, preprint, arXiv:1701.05249

[47] Krause, B. Lacey, M.T. Sparse bounds for random discrete Carleson theorems, preprint

arXiv:1609.08701

[48] Lacey, M.T. An elementary proof of the A2 Bound, Israel J. Math. 217 (2017), no. 1,

181–195.

[49] Lacey, M.T. Sparse Bounds for Spherical Maximal Functions, to appear in J D’Analyse

Math.

[50] Lacey, M.T. Personal Communication.

[51] Lacey, M.T.; Mena Arias, D. The sparse T1 Theorem, preprint arXiv:1610.01531.

[52] Lerner, A.K. On an estimate of Calder´on-Zygmundoperators by dyadic positive operators,

J. Anal. Math. 121 (2013), 141–161

[53] Lerner, A.K. A simple proof of the A2 conjecture, Int. Math. Res. Not. (2013), no. 14,

3159-3170.

[54] Lerner, A. K. On pointwise estimates involving sparse operators, New York J. Math. 22

(2016), 341-349.

[55] Lerner, A., Nazarov, F. Intuitive dyadic calculus: the basics, preprint arXiv:1508.05639

(2015).

[56] Lerner, A. K.; Ombrosi, S.; Perez,´ C.; Torres, R. H; Trujillo-Gonzalez,´ R. New maximal functions and multiple weights for the multilinear Calder´on-Zygmund theory,

Adv. Math. 220 (2009), no. 4, 1222-1264.

185 [57] Li, K. Sparse domination theorem for multilinear singular integral operators with Lr- H¨ormandercondition, preprint arXiv:1606.03340.

[58] Martikainen, H. Representation of bi-parameter singular integrals by dyadic operators, Ad-

vances in Mathematics, Volume 229, Issue 3, 15 February 2012, Pages 1734-1761.

[59] Muscalu, C., Pipher, J., Tao, T., Thiele, C. Bi-Parameter Paraproducts, Acta Math. Volume

193, Number 2 (2004), 269-296.

[60] Muscalu, C., Schlag, W. Classical and Multilinear Harmonic Analysis vol 2. Cambridge

University Press, 2013.

¯ n [61] Nagel, A., and Stein, E.M. The ∂b-complex on decoupled boundaries in C . Annals of Math. 164 (2006), 649 - 713.

[62] Ou, Y. Personal Communication.

[63] Seeger, A. Singular integral operators with rough convolution kernels, Jour. Amer. Math. Soc., 9 (1996), 95 - 105.

n [64] Sj¨ogren,P. A Remark on the Maximal Function for Measures in R . American Journal of Mathematics, Vol. 105, No. 5 (Oct., 1983), pp. 1231-1233.

[65] Stein, E. M. Harmonic analysis: real-variable methods, orthogonality, and oscillatory integrals, Princeton Mathematical Series, vol. 43, Princeton University Press, Princeton, NJ,

1993, With the assistance of Timothy S. Murphy, Monographs in Harmonic Analysis, III.

MR 1232192

[66] Street, B. Multi-parameter Singular Integrals, Annals of Mathematics Studies 189.

[67] Tao, T. The weak type (1, 1) of L log L homogeneous convolution operators, Indiana Univ. Math. J. 48 (1999), 1547 - 1584.

186 [68] Watson, D.K. Weighted estimates for singular integrals via Fourier transform estimates, Duke Math. J. 60 (1990), 389-399.

[69] Barron, A. On Global-in-time Strichartz estimates for the semiperiodic Schr¨odinger equa-

tion. Preprint, arXiv:1901.01663

[70] Bourgain, J. Fourier transform restriction phenomena for certain lattice subsets and appli-

cations to nonlinear evolution equations part I: Schr¨odinger equations. GAFA Vol. 3, No. 2

(1993).

[71] Bourgain, J., Demeter, C. The proof of the l2 decoupling conjecture. Annals of Math. 182

(2015), 351-389.

[72] Cheng, X., Guo, Z., Zhao Z. On scattering for the defocusing quintic nonlinear Schr¨odinger

equation on the two-dimensional cylinder. arxiv preprint arXiv:1809.01527.

[73] Christ, M. Pausader, B. Personal communication.

[74] Hadac, M., Herr, S., Koch, H. Well-posedness and scattering for the KP-II equation in a

critical space, Ann. Inst. H. Poincar´eAnal. Non Lin´eaire 26 (2009), 917-941.

[75] Hani, Z., Pausader, B. On scattering for the quintic defocusing nonlinear Schr¨odingerequa-

2 tion on R × T . Com. on Pure and Applied Math. 67 (9), 1466-1542

[76] Hani, Z., Pausader, B., Tzvetkov, N., Visciglia, N. Modified scattering for the cubic non-

linear Schr¨odingerequation on product space and applications, Forum of Mathematics, Pi,

Vol. 3 (2015)

[77] Herr, S., Tataru, D., Tzvetkov, N. Global well-posedness of the energy-critical nonlinear

1 3 Schr¨odingerequation with small initial data in H (T ). Duke Math J., Volume 159, Number 2 (2011), 329 - 349.

187 [78] Herr, S. Tatuaru, D., Tzvetkov, N. Strichartz estimates for partially periodic solutions to

Schr¨odingerequations in 4d and applications. J. Reine Angew. Math. (2012), Vol. 2014, No.

690, 65-78.

[79] Ionescu, A., Pausader, B. Global well-posedness of the energy-critical defocusing NLS on

3 R × T , Comm. Math. Phys., 312 (2012), no. 3, 781-831.

[80] Keel, M., Tao, T. Endpoint Strichartz estimates. American J. of Math. 120 (1998), 955-980

[81] Killip, R., Visan, M. Scale-invariant Strichartz estimates on tori and applications. Math.

Res. Lett. 23 (2016), 445 – 472.

[82] Strichartz, R. Restrictions of Fourier transforms to quadratic surfaces and decay of solutions

of wave equations. Duke Math. J., Volume 44, Number 3 (1977), 705-714.

[83] Takaoka, H., Tzvetkov, N. On 2D Nonlinear Schr¨odingerEquations with Data on R × T. Journal of Functional Analysis, Volume 182, Issue 2, 1 June 2001, Pages 427-442

[84] Tao, T. Nonlinear dispersive equations: local and global analysis. CBMS Vol. 106 (2006),

373 pp.

[85] Tzvetkov, N., Visciglia, N. Small data scattering for the nonlinear Schr¨odingerequation on

product spaces. Communications in Partial Differential Equations 37(1):125 - 135, January

2012.

[86] Wolff, T. Local smoothing type estimates on Lp for large p. GAFA Volume 10, Issue 5,

Dec. 2000, pp 1237 - 1288.

188